Scientists merge statistics, biology to produce important new gene computational tool

April 3, 2018, University of California, Los Angeles
UCLA researchers Wei 'Vivian' Li, left, and Jingyi 'Jessica' Li have designed statistical analysis software called 'scImpute,' which is more precise and reliable than previous tools. Credit: Reed Hutchinson/UCLA

The cells in our bodies express themselves in different ways. One cell might put a chunk of genetic code to work, while another cell ignores the same information entirely. Understanding why could spur new stem cell therapies, or lead to a more fundamental understanding of how organisms develop. But zeroing in on these cell-to-cell differences can be challenging.

Now, two UCLA researchers have come up with a computational tool that increases the reliability of measuring how strongly genes are expressed in an individual cell, even when the cell is barely reading certain genes. The research was published last month in the journal Nature Communications.

"The DNA sequence is the same in a brain cell, a liver cell and a heart cell," said Jingyi "Jessica" Li, the study's corresponding author and a UCLA assistant professor of statistics. "Why do those cells look so different? The key thing is gene expression."

DNA encodes the information needed to create and operate an organism. But the task of reading and acting on that information falls to RNA, long strands of mobile molecules that transport genetic instructions to other parts of a cell. By tallying the various RNA molecules in a cell, researchers can tell which genes are active—or "expressed"—and to what degree.

However, if RNA molecules are present only in trace amounts, analysis tools can be fooled into thinking that the corresponding genes aren't active at all. Unless corrected for, these "dropouts" can paint a misleading picture about actual differences between cells.
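
To see why trace amounts are a problem, consider a back-of-the-envelope sketch. It assumes, purely for illustration, that each RNA molecule in a cell is detected independently with the same probability. The function name and the 10 percent capture rate are hypothetical and are not taken from the study.

```python
def dropout_probability(n_molecules: int, capture_rate: float) -> float:
    """Chance that none of a gene's RNA molecules is detected.

    Assumes every molecule is captured independently with probability
    `capture_rate` (an illustrative simplification, not the paper's model).
    """
    if n_molecules < 0 or not 0.0 <= capture_rate <= 1.0:
        raise ValueError("need n_molecules >= 0 and 0 <= capture_rate <= 1")
    return (1.0 - capture_rate) ** n_molecules


# Hypothetical capture rate of 10 percent per molecule.
for n in (2, 10, 100):
    print(n, dropout_probability(n, 0.10))
```

Under this assumption, a gene with only two molecules in the cell would read as zero about 81 percent of the time, while a gene with 100 molecules would almost never do so.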

"If you want to obtain useful biological information at the individual cell level, then you need to do some statistical inferences," said Li, who is also head of the Junction of Statistics and Biology laboratory. "Otherwise your conclusions may be wrong."

Li and Wei "Vivian" Li, a doctoral candidate in the UCLA department of statistics, have designed statistical analysis software for handling dropouts in RNA sequencing. Their tool, called "scImpute," estimates which genes in a cell are most likely to drop out based on studying all individual cells in an experiment. The tool then uses information from similar cells to make an educated guess about what the level of gene expression should be.
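
The two-step idea described above (flag likely dropouts, then borrow from similar cells) can be sketched in a few lines of Python. This is a simplified illustration, not the scImpute implementation: it takes the dropout probabilities as given rather than estimating them, and it substitutes a plain nearest-neighbor average for whatever statistical model the tool actually uses. The function name, parameters and toy numbers are all hypothetical.

```python
from statistics import mean


def impute_dropouts(cells, dropout_prob, threshold=0.5, n_neighbors=2):
    """Replace likely-dropout values with an average from similar cells.

    cells        -- list of cells, each a list of expression values per gene
    dropout_prob -- same shape; assumed probability that a value is a dropout
    """
    imputed = [row[:] for row in cells]  # work on a copy of the input
    n_genes = len(cells[0])
    for j, row in enumerate(cells):
        # Split this cell's genes into likely dropouts and reliable values.
        suspect = [g for g in range(n_genes) if dropout_prob[j][g] > threshold]
        reliable = [g for g in range(n_genes) if dropout_prob[j][g] <= threshold]
        if not suspect or not reliable:
            continue

        # Rank the other cells by squared distance on the reliable genes only.
        def distance(k):
            return sum((row[g] - cells[k][g]) ** 2 for g in reliable)

        others = sorted((k for k in range(len(cells)) if k != j), key=distance)
        neighbors = others[:n_neighbors]

        # Replace only the suspect values, using neighbors whose own value
        # for that gene is considered reliable.
        for g in suspect:
            donors = [cells[k][g] for k in neighbors
                      if dropout_prob[k][g] <= threshold]
            if donors:
                imputed[j][g] = mean(donors)
    return imputed


# Toy data: 4 cells x 3 genes. Cell 0 reads zero for gene 1, a likely dropout.
cells = [
    [5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0],
    [0.0, 0.0, 9.0],  # a different cell type; its zeros are judged genuine
    [5.5, 5.0, 2.5],
]
dropout_prob = [
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0],
    [0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0],
]
result = impute_dropouts(cells, dropout_prob)
print(result[0])
print(result[2])
```

In this toy example, the zero in the first cell is replaced with the average of its two closest neighbors, while the zeros in the third cell, which are judged to be genuine, are left alone.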

Utilizing estimates isn't new. But available tools are either too broad—swapping out all gene expressions of one cell with another—or hyper-specialized for a particular type of study. The advantages of scImpute are "flexibility and universality," Jessica Li said. The tool acts with surgical precision to replace only abundances that have most likely dropped out and can be used in any type of single-cell gene-expression analysis.

In Vivian Li's comprehensive tests on both simulated and actual data—some of which provide empirical evidence for actual levels of gene expression—scImpute is more accurate than other methods. The software reliably distinguishes dropout genes from those that aren't expressed at all, and it provides accurate estimates of the actual abundances.

The open-source software is available free online as a package for R, a widely used programming environment for statistical analysis.

The two researchers have proven that scImpute works well in small groups of cells when dropout rates are low. But in large populations, dropout rates can exceed 90 percent of the genes. Their next goal is to make the tool just as reliable in those situations. By borrowing from other genes, not just other cells, and from online databases, they believe that scImpute can become a robust tool for all situations.

More information: Wei Vivian Li et al. An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nature Communications (2018). DOI: 10.1038/s41467-018-03405-7
