Scientists merge statistics, biology to produce important new gene computational tool

April 3, 2018, University of California, Los Angeles
UCLA researchers Wei 'Vivian' Li, left, and Jingyi 'Jessica' Li have designed statistical analysis software called 'scImpute,' which is more precise and reliable than previous tools. Credit: Reed Hutchinson/UCLA

The cells in our bodies express themselves in different ways. One cell might put a chunk of genetic code to work, while another cell ignores the same information entirely. Understanding why could spur new stem cell therapies, or lead to a more fundamental understanding of how organisms develop. But zeroing in on these cell-to-cell differences can be challenging.

Now, two UCLA researchers have come up with a computational tool that increases the reliability of measuring how strongly genes are expressed in an individual cell, even when the cell is barely reading certain genes. The research was published last month in the journal Nature Communications.

"The DNA sequence is the same in a brain cell, a liver cell and a heart cell," said Jingyi "Jessica" Li , the study's corresponding author and a UCLA assistant professor of statistics. "Why do those cells look so different? The key thing is ."

DNA encodes the information needed to create and operate an organism. But the task of reading and acting on that information falls to RNA, long strands of mobile molecules that transport genetic instructions to other parts of a cell. By tallying the various RNA molecules in a cell, researchers can tell which genes are active—or "expressed"—and to what degree.

However, if RNA molecules are present only in trace amounts, analysis tools can be fooled into thinking that the corresponding genes aren't active at all. Unless corrected for, these "dropouts" can paint a misleading picture about actual differences between cells.

"If you want to obtain useful biological information at the individual cell level, then you need to do some statistical inferences," said Li, who is also head of the Junction of Statistics and Biology laboratory. "Otherwise your conclusions may be wrong."

Li and Wei "Vivian" Li, a doctoral candidate in the UCLA department of statistics, have designed statistical analysis software for handling dropouts in RNA sequencing. Their tool, called "scImpute," estimates which genes in a cell are most likely to drop out based on studying all individual cells in an experiment. The tool then uses information from similar cells to make an educated guess about what the level of gene expression should be.
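To make the idea concrete, here is a minimal Python sketch of the two steps described above: flag zeros that are probably dropouts, then fill them in from similar cells. It is a simplified stand-in and not the statistical model scImpute itself uses. The function name `impute_dropouts`, the parameters `k` and `min_expressing`, the nearest-neighbor rule and the toy data are all assumptions made for illustration.

```python
import math


def impute_dropouts(counts, k=2, min_expressing=0.5):
    """Illustrative dropout imputation (hypothetical helper, not scImpute).

    counts: list of cells, each a list of per-gene expression values.
    A zero is treated as a likely dropout only if more than
    `min_expressing` of the cell's k most similar cells express that gene.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    imputed = [row[:] for row in counts]  # leave the input untouched

    for i in range(n_cells):
        for g in range(n_genes):
            if counts[i][g] != 0:
                continue  # observed values are never modified

            # Similarity is judged on every gene except the one in question.
            def dist(j):
                return math.sqrt(sum(
                    (counts[i][h] - counts[j][h]) ** 2
                    for h in range(n_genes) if h != g
                ))

            neighbors = sorted(
                (j for j in range(n_cells) if j != i), key=dist
            )[:k]
            expressed = [counts[j][g] for j in neighbors if counts[j][g] > 0]

            # Impute only when most similar cells express the gene;
            # otherwise keep the zero as a genuinely inactive gene.
            if len(expressed) / len(neighbors) > min_expressing:
                imputed[i][g] = sum(expressed) / len(expressed)

    return imputed


# Toy data: three cells of one type, two cells of another (made-up numbers).
cells = [
    [5, 3, 0],  # resembles the next two cells, but gene 3 reads as zero
    [5, 3, 4],
    [6, 3, 6],
    [0, 9, 0],
    [0, 8, 0],
]
result = impute_dropouts(cells)
```

In this toy example, only the zero in the first cell is replaced, because its two closest neighbors both express that gene. The zeros in the last two cells are left alone, since the cells most similar to them mostly do not express those genes.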

Utilizing estimates isn't new. But available tools are either too broad—swapping out all gene expressions of one cell with another—or hyper-specialized for a particular type of study. The advantages of scImpute are "flexibility and universality," Jessica Li said. The tool acts with surgical precision to replace only abundances that have most likely dropped out and can be used in any type of single-cell gene-expression analysis.

In Vivian Li's comprehensive tests on both simulated and actual data—some of which provide empirical evidence for actual levels of gene expression—scImpute is more accurate than other methods. The software reliably distinguishes dropout genes from those that aren't expressed at all, and it provides accurate estimates of the actual abundances.

The open-source software is freely available online as a package for R, a widely used programming environment for statistical analysis.

The two researchers have shown that scImpute works well in small groups of cells when dropout rates are low. But in large populations, dropout rates can exceed 90 percent of the genes. Their next goal is to make the tool just as reliable in those situations. They believe that by borrowing information from other genes, not just other cells, and from online databases, scImpute can become a robust tool for all situations.


More information: Wei Vivian Li et al. An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nature Communications (2018). DOI: 10.1038/s41467-018-03405-7
