March 25, 2024

GPT-4 for identifying cell types in single cells matches and sometimes outperforms expert methods

by Columbia University's Mailman School of Public Health

GPT-4 can accurately annotate cell types in single-cell RNA sequencing data, a step fundamental to interpreting such analyses, and its results are highly consistent with the time-consuming manual annotation that human experts perform from gene information, according to a study at Columbia University Mailman School of Public Health. The findings are published in the journal Nature Methods.

GPT-4 is a large language model designed for language understanding and generation. Assessed across numerous tissue and cell types, GPT-4 produced cell type annotations that closely align with manual annotations by human experts and surpass those of existing automatic algorithms.

This capability has the potential to significantly reduce the effort and expertise needed for annotating cell types, a process that can take months. The researchers have also developed GPTCelltype, an R software package, to facilitate the automated annotation of cell types using GPT-4.

"The process of annotating cell types for single cells is often time-consuming, requiring human experts to compare genes across cell clusters," said Wenpin Hou, Ph.D., assistant professor of Biostatistics at Columbia Mailman School.

"Although automated cell type annotation methods have been developed, manual methods to interpret scientific data remain widely used, and such a process can take weeks to months. We hypothesized that GPT-4 can accurately annotate cell types, transitioning the process from manual to a semi- or even fully automated procedure and be cost-efficient and seamless."

The researchers assessed GPT-4's performance across ten datasets covering five species and hundreds of tissue and cell types, including both normal and cancer samples. GPT-4 was queried using GPTCelltype, the software tool developed by the researchers. For comparison, they also evaluated other GPT versions and used manual annotations as a reference.

As a first step, the researchers explored the factors that may affect the annotation accuracy of GPT-4. They found that GPT-4 performs best when using the top 10 differential genes and exhibits similar accuracy across various prompt strategies, including a basic prompt strategy, a chain-of-thought-inspired prompt strategy that includes reasoning steps, and a repeated prompt strategy. GPT-4 matched manual analyses in over 75% of cell types in most studies and tissues, demonstrating its competency in generating expert-comparable cell type annotations.
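To make the "basic prompt strategy" concrete, here is a minimal Python sketch of how the top-ranked marker genes for each cell cluster might be assembled into a single text prompt. The function name, the prompt wording, and the example gene lists are illustrative assumptions, not the wording used by GPTCelltype, and the sketch stops short of sending anything to a model.

```python
def build_annotation_prompt(markers_by_cluster, tissue, top_n=10):
    """Build one text prompt asking for a cell type per cluster.

    Assumption: markers_by_cluster maps a cluster label to its marker genes,
    already ranked from most to least differentially expressed.
    The prompt wording below is hypothetical.
    """
    lines = [
        f"Identify the cell types of {tissue} cells using the following markers.",
        "Give one cell type name per line, in the same order as the input.",
    ]
    for cluster, genes in markers_by_cluster.items():
        top_genes = genes[:top_n]  # keep only the top-ranked genes
        lines.append(f"{cluster}: {', '.join(top_genes)}")
    return "\n".join(lines)


# Illustrative input only: two clusters with hand-picked marker genes.
example_markers = {
    "cluster0": ["CD3D", "CD3E", "IL7R", "CCR7", "LTB", "TRAC",
                 "CD2", "CD5", "CD7", "TCF7", "LEF1", "SELL"],
    "cluster1": ["MS4A1", "CD79A", "CD79B"],
}
prompt = build_annotation_prompt(example_markers, tissue="human blood")
print(prompt)
```

Truncating each list to its first ten entries mirrors the study's finding about the top 10 differential genes; a cluster with fewer markers is simply passed through unchanged.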

In addition, low agreement between GPT-4 and manual annotations in some cell types does not necessarily imply that GPT-4's annotation is incorrect. In one example involving stromal, or connective tissue, cells, GPT-4 provided more accurate cell type annotations. GPT-4 was also notably faster.

Hou and her colleague also assessed GPT-4's robustness in complex real data scenarios and found that GPT-4 can distinguish between pure and mixed cell types with 93% accuracy and between known and unknown cell types with 99% accuracy. They also evaluated the reproducibility of GPT-4's annotations using prior simulation studies. GPT-4 generated identical annotations for the same marker genes in 85% of cases.
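The article does not spell out how the 85% reproducibility figure is computed. One plausible reading, sketched below in Python, is to query the model several times with the same marker genes and measure how often it returns its own most common answer. The function name, the normalization step, and the example answers are assumptions made for illustration.

```python
from collections import Counter


def reproducibility(repeated_annotations):
    """Per cluster, return the share of repeated queries that gave the
    most common annotation (an assumed definition of reproducibility)."""
    scores = {}
    for cluster, answers in repeated_annotations.items():
        # Ignore differences in case and surrounding whitespace.
        normalized = [a.strip().lower() for a in answers]
        modal_count = Counter(normalized).most_common(1)[0][1]
        scores[cluster] = modal_count / len(normalized)
    return scores


# Hypothetical answers from five repeated queries per cluster.
runs = {
    "cluster0": ["T cells", "T cells", "t cells ", "T cells", "NK cells"],
    "cluster1": ["B cells"] * 5,
}
scores = reproducibility(runs)
mean_score = sum(scores.values()) / len(scores)
print(scores, mean_score)
```

Averaging the per-cluster scores gives a single summary number comparable in spirit to the figure reported above, though the study's exact procedure may differ.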

"All of these results demonstrate GPT-4's robustness in various scenarios," observed Hou.

While GPT-4 surpasses existing methods, there are limitations to consider, according to Hou, including the difficulty of verifying GPT-4's quality and reliability because little is disclosed about its training procedures.

"Since our study focuses on the standard version of GPT-4, fine-tuning GPT-4 could further improve cell type annotation performance," said Hou.

Zhicheng Ji of Duke University School of Medicine is a co-author.

More information: Wenpin Hou et al, Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis, Nature Methods (2024). DOI: 10.1038/s41592-024-02235-4

Journal information: Nature Methods

Provided by Columbia University's Mailman School of Public Health

Citation: GPT-4 for identifying cell types in single cells matches and sometimes outperforms expert methods (2024, March 25) retrieved 5 July 2024 from https://phys.org/news/2024-03-gpt-cell-cells-outperforms-expert.html

