Scientists merge statistics, biology to produce important new gene computational tool

April 3, 2018, University of California, Los Angeles
UCLA researchers Wei 'Vivian' Li, left, and Jingyi 'Jessica' Li have designed statistical analysis software called 'scImpute,' which is more precise and reliable than previous tools. Credit: Reed Hutchinson/UCLA

The cells in our bodies express themselves in different ways. One cell might put a chunk of genetic code to work, while another cell ignores the same information entirely. Understanding why could spur new stem cell therapies, or lead to a more fundamental understanding of how organisms develop. But zeroing in on these cell-to-cell differences can be challenging.

Now, two UCLA researchers have come up with a computational tool that increases the reliability of measuring how strongly genes are expressed in an individual cell, even when the cell is barely reading certain genes. The research was published last month in the journal Nature Communications.

"The DNA sequence is the same in a brain cell, a liver cell and a heart cell," said Jingyi "Jessica" Li, the study's corresponding author and a UCLA assistant professor of statistics. "Why do those cells look so different? The key thing is gene expression."

DNA encodes the information needed to create and operate an organism. But the task of reading and acting on that information falls to RNA, long strands of mobile molecules that transport genetic instructions to other parts of a cell. By tallying the various RNA molecules in a cell, researchers can tell which genes are active—or "expressed"—and to what degree.

However, if RNA molecules are present only in trace amounts, analysis tools can be fooled into thinking that the corresponding genes aren't active at all. Unless corrected for, these "dropouts" can paint a misleading picture about actual differences between cells.

"If you want to obtain useful biological information at the individual cell level, then you need to do some statistical inferences," said Li, who is also head of the Junction of Statistics and Biology laboratory. "Otherwise your conclusions may be wrong."

Li and Wei "Vivian" Li, a doctoral candidate in the UCLA department of statistics, have designed statistical analysis software for handling dropouts in RNA sequencing. Their tool, called "scImpute," estimates which genes in a cell are most likely to drop out based on studying all individual cells in an experiment. The tool then uses information from similar cells to make an educated guess about what the level of gene expression should be.
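The two steps described above (flag likely dropouts, then borrow values from similar cells) can be illustrated with a toy sketch. This is not the scImpute algorithm or its R interface. The function name `impute_dropouts`, its parameters, the nearest-neighbor rule and the example numbers are all assumptions made up for illustration. A zero is treated as a likely dropout only when most of a cell's nearest neighbors express that gene, and it is then filled with the neighbors' average.

```python
import numpy as np

def impute_dropouts(expr, n_neighbors=2, min_expressed_frac=0.75):
    """Toy dropout imputation (hypothetical, not scImpute itself).

    expr: cells x genes array of expression values, where 0 means "not detected".
    """
    expr = np.asarray(expr, dtype=float)
    imputed = expr.copy()

    # Pairwise Euclidean distances between cells.
    diffs = expr[:, None, :] - expr[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor

    for i in range(expr.shape[0]):
        neighbors = np.argsort(dist[i])[:n_neighbors]
        neighbor_vals = expr[neighbors]

        # A zero is a likely dropout if enough similar cells express the gene.
        n_expressed = (neighbor_vals > 0).sum(axis=0)
        expressed_frac = n_expressed / len(neighbors)
        likely_dropout = (expr[i] == 0) & (expressed_frac >= min_expressed_frac)

        # Fill value: mean over the neighbors that do express the gene.
        sums = neighbor_vals.sum(axis=0)
        fill = np.divide(sums, n_expressed,
                         out=np.zeros_like(sums), where=n_expressed > 0)
        imputed[i, likely_dropout] = fill[likely_dropout]

    return imputed

# Made-up data: rows are cells, columns are genes.
expr = np.array([
    [5.0, 4.0, 0.0, 0.0],  # cell A: third gene reads zero (suspected dropout)
    [5.0, 4.0, 3.0, 0.0],  # cell B: similar to A
    [5.0, 4.0, 3.0, 0.0],  # cell C: similar to A
    [0.0, 1.0, 6.0, 0.0],  # cells D-F: a different cell type
    [0.0, 1.0, 6.0, 0.0],
    [0.0, 1.0, 6.0, 0.0],
])
imputed = impute_dropouts(expr)
print(imputed[0])  # only cell A's third gene is changed
```

In this example only cell A's third gene is replaced. Its fourth gene stays at zero because no similar cell expresses it, and the zeros in cells D through F are left alone for the same reason.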

Utilizing estimates isn't new. But available tools are either too broad—swapping out all gene expressions of one cell with another—or hyper-specialized for a particular type of study. The advantages of scImpute are "flexibility and universality," Jessica Li said. The tool acts with surgical precision to replace only abundances that have most likely dropped out and can be used in any type of single-cell gene-expression analysis.

In Vivian Li's comprehensive tests on both simulated and actual data—some of which provide empirical evidence for actual levels of gene expression—scImpute is more accurate than other methods. The software reliably distinguishes dropout genes from those that aren't expressed at all, and it provides accurate estimates of the actual abundances.

The open-source software is available for free online as an add-on package for R, a widely used programming environment for statistical analysis.

The two researchers have shown that scImpute works well in small groups of cells when dropout rates are low. But in large populations, dropout rates can exceed 90 percent of the genes. Their next goal is to make the tool just as reliable in those situations. By borrowing information from other genes, not just other cells, and from online databases, they believe that scImpute can become a robust tool for all situations.

More information: Wei Vivian Li et al. An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nature Communications (2018). DOI: 10.1038/s41467-018-03405-7
