Overview of scLENS (single-cell Low-dimensional embedding using effective Noise Subtraction). (Left) Current dimensionality reduction methods for scRNA-seq data involve conventional data preprocessing steps, such as log normalization, followed by manual selection of signals from the scaled data. However, this study reveals that the high levels of sparsity and variability in scRNA-seq data can lead to signal distortion during data preprocessing, compromising the accuracy of downstream analyses. (Right) To address this issue, the researchers integrated L2 normalization into the conventional preprocessing pipeline, effectively mitigating signal distortion. Moreover, they developed a novel signal detection algorithm that eliminates the need for user intervention by leveraging random matrix theory-based noise filtering and signal robustness testing. Credit: Nature Communications (2024). DOI: 10.1038/s41467-024-47884-3

Unlocking biological information from complex single-cell genomic data has just become easier and more precise, thanks to the innovative scLENS tool developed by the Biomedical Mathematics Group within the IBS Center for Mathematical and Computational Sciences led by Chief Investigator Kim Jae Kyoung, who is also a Professor at KAIST. This represents a significant leap forward in the field of single-cell transcriptomics.

The research is published in the journal Nature Communications.

Single-cell genomic analysis is an advanced technique that measures gene expression at the individual cell level, revealing cellular changes and interactions that are not observable with traditional genomic analysis methods. When applied to cancer tissues, this analysis can delineate the composition of diverse cell types within a tumor, providing insights into how cancer progresses and identifying key genes involved during each stage of progression.

Despite the immense potential of single-cell genomic analysis, handling the vast amount of data that it generates has always been challenging. The data cover the expression of tens of thousands of genes across hundreds to thousands of individual cells. This not only results in large datasets but also introduces noise-related distortions, which arise in part due to current measurement limitations.

Corresponding author Kim Jae Kyoung highlighted, "There has been a remarkable advancement in experimental technologies for analyzing single-cell transcriptomes over the past decade. However, due to limitations in data analysis methods, there has been a struggle to fully utilize valuable data obtained through extensive cost and time."

Researchers have developed numerous analysis methods over the years to discern biological signals from this noise. However, the accuracy of these methods has been less than satisfactory. A critical issue is that determining signal and noise thresholds often depends on subjective decisions from the users.

The newly developed scLENS tool harnesses random matrix theory and a signal robustness test to automatically differentiate signals from noise without relying on subjective user input.
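To give a feel for how random matrix theory can separate signal from noise without a hand-picked cutoff, here is a minimal Python sketch. It is not the scLENS implementation: it uses a generic textbook rule, the Marchenko-Pastur upper edge, and the function name `count_signal_components` and the synthetic data are assumptions made purely for illustration. The signal robustness test mentioned above is not covered.

```python
import numpy as np

def count_signal_components(X):
    """Count covariance eigenvalues above the Marchenko-Pastur upper edge.

    X is a (cells x genes) matrix. Illustrative only: this is a generic
    random-matrix-theory cutoff, not the scLENS algorithm.
    """
    n_cells, n_genes = X.shape
    # Standardize each gene so that pure noise has unit variance.
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    # Eigenvalues of the gene-gene sample covariance matrix.
    eigvals = np.linalg.eigvalsh(Z.T @ Z / n_cells)
    # Asymptotic largest eigenvalue expected from unit-variance noise alone.
    gamma = n_genes / n_cells
    mp_upper_edge = (1.0 + np.sqrt(gamma)) ** 2
    return int(np.sum(eigvals > mp_upper_edge)), mp_upper_edge

# Synthetic example: 500 cells, 200 genes, two planted group structures.
rng = np.random.default_rng(0)
noise = rng.standard_normal((500, 200))
group_a = np.where(np.arange(500) < 250, 1.0, -1.0)
group_b = np.where(np.arange(500) % 2 == 0, 1.0, -1.0)
signal = np.zeros((500, 200))
signal[:, :50] += group_a[:, None]      # genes 0-49 follow grouping A
signal[:, 50:100] += group_b[:, None]   # genes 50-99 follow grouping B

k_noise, edge = count_signal_components(noise)
k_signal, _ = count_signal_components(noise + signal)
print(k_noise, k_signal, round(edge, 2))
```

The point of the sketch is that the cutoff comes from the shape of the data matrix itself (the ratio of genes to cells) rather than from a number the user chooses. In finite samples a noise eigenvalue can occasionally land slightly above this asymptotic edge, which is one reason a practical tool would need more careful testing than this.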

First author Kim Hyun stated, "Previously, users had to arbitrarily decide the threshold for signal and noise, which compromised the reproducibility of analysis results and introduced subjectivity. scLENS eliminates this problem by automatically detecting signals using only the inherent structure of the data."

During the development of scLENS, researchers identified the fundamental reasons for inaccuracies in existing analysis methods. They found that commonly used data preprocessing methods distort both biological signals and noise. The new preprocessing approach that scLENS offers is free from such distortions.
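According to the figure caption, the preprocessing change consists of adding L2 normalization to the conventional log-normalization pipeline. The Python sketch below shows what such a combination could look like in its simplest form. The function names, the `scale` value, and the exact order of steps are assumptions for illustration and may differ from what scLENS actually does.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Conventional step: scale each cell by its total count, then log-transform."""
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells to avoid division by zero.
    return np.log1p(counts / np.maximum(totals, 1.0) * scale)

def l2_normalize_cells(X, eps=1e-12):
    """Added step (assumed form): rescale each cell's vector to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

# Toy count matrix: 3 cells (rows) x 4 genes (columns).
counts = np.array([
    [1.0, 0.0, 3.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
])
processed = l2_normalize_cells(log_normalize(counts))
print(np.linalg.norm(processed, axis=1))
```

After the second step every cell's vector has the same length, so cells with very different sparsity or sequencing depth are put on a comparable scale before any signal detection is attempted.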

By resolving issues related to noise threshold determined by subjective user choice and signal distortion in conventional data preprocessing, scLENS significantly outperforms existing methods in accuracy. Additionally, scLENS automates the laborious process of signal dimension selection, allowing researchers to extract biological signals conveniently and automatically.

CI Kim added, "scLENS solves major issues in single-cell transcriptome data analysis, substantially improving the accuracy and efficiency throughout the analysis process. This is a prime example of how fundamental mathematical theories can drive innovation in life sciences research, allowing researchers to more quickly and accurately answer biological questions and uncover secrets of life that were previously hidden."

More information: Hyun Kim et al, scLENS: data-driven signal detection for unbiased scRNA-seq data analysis, Nature Communications (2024). DOI: 10.1038/s41467-024-47884-3

Journal information: Nature Communications