Using artificial intelligence for error correction in single cell analyses

January 24, 2019, Helmholtz Zentrum Muenchen

Modern technology makes it possible to sequence individual cells and to identify which genes are currently being expressed in each cell. These methods are sensitive and consequently error prone. Devices, environment and biology itself can be responsible for failures and differences between measurements. Researchers at Helmholtz Zentrum München joined forces with colleagues from the Technical University of Munich (TUM) and the British Wellcome Sanger Institute and have developed algorithms that make it possible to predict and correct such sources of error. The work was published in Nature Methods and Nature Communications.

A visionary project of enormous scope, the Human Cell Atlas aims to map out all the tissues of the human body at various time points with the goal of creating a reference database for the development of personalized medicine, i.e. the ability to distinguish healthy from diseased cells. This is made possible by a technology known as single-cell RNA sequencing, which helps researchers understand exactly which genes are switched on or off at any given moment in these tiny components of life. "From a methodological point of view, this represents an enormous leap forward. Previously, such data could only be obtained from large groups of cells because the measurements required so much RNA," Maren Büttner explains. "So the results were always only the average of all the cells used. Now we're able to get precise data for every single cell," says the doctoral student at the Institute of Computational Biology (ICB) of the Helmholtz Zentrum München.

The increased sensitivity of the technique, however, also means increased susceptibility to the batch effect. "The batch effect describes fluctuations between measurements that can occur, for example, if the temperature of the device deviates even slightly or the processing time of the cells changes," Maren Büttner explains. Although several models exist for the correction of these deviations, those methods are highly dependent on the actual magnitude of the effect. "We therefore developed a user-friendly, robust and sensitive measure called kBET that quantifies differences between experiments and therefore facilitates the comparison of different correction results," Büttner says.
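
To make the idea of quantifying a batch effect more concrete, here is a minimal Python sketch in the spirit of kBET. It is not the authors' implementation. It assumes the general approach of comparing the batch composition in each cell's nearest-neighbour group with the overall batch composition using a chi-squared test, and it reports the fraction of tested neighbourhoods in which the two differ. The function name, its parameters and the restriction to exactly two batches are simplifications chosen for this illustration.

```python
import math
import numpy as np

def kbet_rejection_rate(X, batch, k=10, alpha=0.05, n_samples=100, seed=0):
    """Illustrative batch-mixing score (hypothetical helper, two batches only).

    For randomly sampled cells, compare the batch counts among the k nearest
    cells with the counts expected from the global batch frequencies.
    Returns the fraction of neighbourhoods where the chi-squared test rejects.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two batches")

    # Expected number of neighbours per batch if the batches are well mixed.
    global_freq = np.array([(batch == b).mean() for b in labels])
    expected = global_freq * k

    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    rejected = 0
    for i in idx:
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(dist)[:k]  # includes the cell itself
        observed = np.array([(batch[neighbours] == b).sum() for b in labels])
        stat = ((observed - expected) ** 2 / expected).sum()
        # Chi-squared survival function for one degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2.0))
        rejected += p_value < alpha
    return rejected / len(idx)

# Toy usage: two batches drawn from the same distribution, then one shifted.
rng = np.random.default_rng(1)
cells = rng.normal(size=(200, 2))
batch_labels = np.array([0, 1] * 100)
mixed_score = kbet_rejection_rate(cells, batch_labels)
shifted = cells.copy()
shifted[batch_labels == 1] += 10.0  # artificial batch effect
shifted_score = kbet_rejection_rate(shifted, batch_labels)
```

A score near zero suggests the batches are well mixed, while a score near one suggests a strong batch effect. Computing such a score before and after applying a correction method is one way to compare correction results.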

Besides the batch effect, a phenomenon known as dropout events poses a major challenge in single-cell sequencing. "Let's say we sequence a cell and observe that a particular gene in the cell does not emit any signal at all," explains Dr. Dr. Fabian Theis, ICB Director and professor of Mathematical Modeling of Biological Systems at the TUM. "The underlying cause of this can be biological or technical in nature: either the gene is not being read by the sequencer because it is simply not expressed, or it was not detected for technical reasons," he explains.

To recognize these cases, bioinformaticians Gökcen Eraslan and Lukas Simon from Theis's group used a large number of sequences of many single cells and developed what is known as a deep learning algorithm, i.e. artificial intelligence which simulates learning processes that occur in humans (neural networks).

Drawing on a new probabilistic model and comparing the original and reconstructed data, the algorithm determines whether the absence of a gene signal is due to a biological or technical failure. "This model even allows cell type-specific corrections to be determined without two different cell types becoming artificially similar," Fabian Theis says. "As one of the first deep learning methods in the field of single-cell genomics, the algorithm has the added benefit that it scales up well to handle data sets containing millions of cells."
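
How a probabilistic model can separate the two kinds of zeros can be illustrated with a short Python sketch. The article does not spell out the model, so the sketch assumes a zero-inflated negative binomial (ZINB) noise model, in which an observed zero comes either from a dropout component with probability pi or from a negative binomial count distribution with mean mu and dispersion theta. The function names and the example parameter values are hypothetical, and in a real method these parameters would be estimated per gene and per cell by the neural network rather than set by hand.

```python
import numpy as np

def nb_zero_prob(mu, theta):
    """P(count = 0) under a negative binomial with mean mu and dispersion theta."""
    mu = np.asarray(mu, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return (theta / (theta + mu)) ** theta

def technical_zero_posterior(pi, mu, theta):
    """P(zero came from the dropout component | observed count is 0).

    Assumes a ZINB model: with probability pi the count is forced to zero,
    otherwise it is drawn from NB(mu, theta).
    """
    pi = np.asarray(pi, dtype=float)
    p0 = nb_zero_prob(mu, theta)
    return pi / (pi + (1.0 - pi) * p0)

# Hypothetical parameters for two genes with the same dropout probability.
# Gene A is expected to be highly expressed, gene B barely expressed.
posterior_a = technical_zero_posterior(pi=0.5, mu=100.0, theta=1.0)
posterior_b = technical_zero_posterior(pi=0.5, mu=0.01, theta=1.0)
```

Under these assumptions, a zero for a gene with a high expected expression is far more likely to be technical than a zero for a gene that is barely expressed, since the count distribution itself rarely produces a zero in the first case and frequently does in the second.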

But there is one thing the method is not, and this is important to emphasize: "We're not developing software to smooth out results. Our chief goal is to identify and correct errors," Fabian Theis explains. "We're able to share these data, which are as accurate as possible, with our colleagues worldwide and compare our results with theirs," for example when the Helmholtz researchers contribute their algorithms and analyses to the Human Cell Atlas, where reliability and comparability of the data are of paramount importance.

More information: Maren Büttner et al. A test metric for assessing single-cell RNA-seq batch correction, Nature Methods (2018). DOI: 10.1038/s41592-018-0254-1

Gökcen Eraslan et al. Single-cell RNA-seq denoising using a deep count autoencoder, Nature Communications (2019). DOI: 10.1038/s41467-018-07931-2
