New computational tool reliably differentiates between cancer and normal cells from single-cell RNA-sequencing data


In an effort to address a major challenge when analyzing large single-cell RNA-sequencing datasets, researchers from The University of Texas MD Anderson Cancer Center have developed a new computational technique to accurately differentiate between data from cancer cells and the variety of normal cells found within tumor samples. The work was published today in Nature Biotechnology.

The new tool, dubbed CopyKAT (copy number karyotyping of aneuploid tumors), allows researchers to more easily examine the complex data obtained from large single-cell RNA-sequencing experiments, which deliver gene expression data from many thousands of individual cells.

CopyKAT uses that gene expression data to look for aneuploidy, or the presence of abnormal chromosome numbers, which is common in most cancers, said study senior author Nicholas Navin, Ph.D., associate professor of Genetics and Bioinformatics & Computational Biology. The tool also helps to identify distinct subpopulations, or clones, within the tumor.

"We developed CopyKAT as a tool to infer genetic information from the transcriptome data. By applying this tool to several datasets, we showed that we could unambiguously identify, with about 99% accuracy, tumor cells versus the other immune or stromal cells present in a mixed tumor sample," Navin said. "We could then go one step further to discover the subclones present and understand their genetic differences."
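The general idea of reading chromosome-scale gains and losses out of expression data can be illustrated with a toy sketch. This is not CopyKAT's algorithm, which the article does not describe in detail. It is a simplified illustration that assumes genes are listed in genomic order, that a baseline expression profile from normal cells is available, and that a fixed variance threshold separates flat (diploid-like) profiles from uneven (aneuploid-like) ones. All function names, the window size, the threshold and the data are hypothetical.

```python
import math
from statistics import mean, pvariance


def smoothed_log_ratios(cell, baseline, window=3):
    """Log2 ratio of a cell's expression to a baseline, smoothed along gene order."""
    # A pseudocount of 1 avoids taking the log of zero.
    ratios = [math.log2((c + 1) / (b + 1)) for c, b in zip(cell, baseline)]
    half = window // 2
    # Sliding-window mean over neighbouring genes (assumed to be in genomic order).
    return [
        mean(ratios[max(0, i - half): i + half + 1])
        for i in range(len(ratios))
    ]


def aneuploidy_score(cell, baseline, window=3):
    """Variance of the smoothed profile: close to 0 for a flat, diploid-like cell."""
    return pvariance(smoothed_log_ratios(cell, baseline, window))


def classify_cells(cells, baseline, threshold=0.1, window=3):
    """Label each cell using a fixed, hypothetical variance threshold."""
    return {
        name: "aneuploid"
        if aneuploidy_score(expr, baseline, window) > threshold
        else "diploid"
        for name, expr in cells.items()
    }


# Toy data: 8 genes in genomic order, invented for illustration only.
baseline = [10, 10, 10, 10, 10, 10, 10, 10]
cells = {
    "normal_like": [10, 11, 9, 10, 10, 9, 11, 10],
    # Elevated expression across the first half, reduced across the second.
    "tumor_like": [21, 20, 22, 21, 4, 5, 4, 5],
}
labels = classify_cells(cells, baseline)
print(labels)
```

In this toy example the cell with a block of elevated expression followed by a block of reduced expression is labeled aneuploid, while the cell that tracks the baseline is labeled diploid. Real single-cell data are far noisier and need more careful normalization and statistics than a single variance cutoff.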

Historically, tumors have been studied as a mixture of all cells present, many of which are not cancerous. The advent of single-cell RNA sequencing in recent years has enabled researchers to analyze tumors in much greater resolution, examining the gene expression of each individual cell to develop a picture of the tumor landscape, including the surrounding microenvironment.

However, it's not easy to distinguish between cancer cells and normal cells without a reliable computational approach, Navin explained. Former postdoctoral fellow Ruli Gao, Ph.D., now assistant professor of Cardiovascular Sciences at Houston Methodist Research Institute, developed the CopyKAT algorithms, which improve upon older techniques by increasing accuracy and adjusting for the newest generation of single-cell RNA-sequencing data.

The team first benchmarked its tool by comparing results to whole-genome sequencing data, which showed high accuracy in predicting copy number changes. In three additional datasets from pancreatic cancer, triple-negative breast cancer and anaplastic thyroid cancer, the researchers showed that CopyKAT was accurate in distinguishing between tumor cells and normal cells in mixed samples.

These analyses were made possible through collaborations with Stephen Y. Lai, M.D., Ph.D., professor of Head and Neck Surgery, as well as Stacy Moulder, M.D., professor of Breast Medical Oncology, and the Breast Cancer Moon Shot, part of MD Anderson's Moon Shots Program, a collaborative effort to rapidly develop scientific discoveries into meaningful clinical advances that save patients' lives.

In analyzing these samples, the researchers also showed the tool is effective in identifying subpopulations of cancer cells within the tumor based on copy number differences, as confirmed by experiments in triple-negative breast cancers.
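Grouping tumor cells into subpopulations by copy number differences can also be sketched in a few lines. The example below is an assumption-laden illustration and not the method used in the study. It takes hypothetical per-cell copy number profiles (one value per genomic segment) and greedily assigns each cell to the first group whose centroid lies within a chosen Euclidean distance. The function names, the distance cutoff and the profiles are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two copy number profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_subclones(profiles, max_distance=0.5):
    """Greedy grouping: a cell joins the first cluster whose centroid is close enough."""
    clusters = []  # each entry: {"centroid": [...], "members": [...]}
    for name, profile in profiles.items():
        for cluster in clusters:
            if distance(profile, cluster["centroid"]) <= max_distance:
                cluster["members"].append(name)
                n = len(cluster["members"])
                # Incremental update of the running mean profile.
                cluster["centroid"] = [
                    c + (p - c) / n for c, p in zip(cluster["centroid"], profile)
                ]
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"centroid": list(profile), "members": [name]})
    return [c["members"] for c in clusters]


# Hypothetical profiles over four genomic segments (positive = gain, negative = loss).
profiles = {
    "cell_a": [1.0, 1.0, -1.0, 0.0],
    "cell_b": [0.9, 1.1, -1.0, 0.1],
    "cell_c": [1.0, 1.0, 0.0, -1.0],
    "cell_d": [1.1, 0.9, 0.1, -0.9],
}
subclones = group_subclones(profiles)
print(subclones)
```

Here the four cells share a gain on the first two segments but split into two groups according to which later segment is lost. A greedy pass like this depends on the order of the cells and on the cutoff, so it is only meant to show the idea of separating clones by their copy number differences.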

"By using CopyKAT, we were able to identify rare subpopulations within triple-negative breast cancers that have unique genetic alterations not widely reported, including those with potential therapeutic implications," Gao said. "We hope this tool will be useful to the research community to make the most of their single-cell RNA-sequencing data and to drive new discoveries in cancer."

The tool is freely available to researchers. The authors note that the tool is not applicable to the study of all cancer types. Aneuploidy, for example, is relatively rare in pediatric and hematologic cancers.

More information: Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes, Nature Biotechnology (2021). DOI: 10.1038/s41587-020-00795-2

Journal information: Nature Biotechnology

Citation: New computational tool reliably differentiates between cancer and normal cells from single-cell RNA-sequencing data (2021, January 18)