CoNGA finds relationships between two graphs. Here, T cells are grouped by similarities in their T-cell receptor genes and the genes they turn on (clonotypes, left panel). A CoNGA score map shows the top-scoring clonotypes (middle). The relationships can also be graphed to show which CoNGA hits derive from which TCR and gene expression clusters using bi-colored disks (right panel). Credit: Bradley Lab

Immune cells have many jobs to do: Some identify infected cells and eliminate them. Others help rein in inflammation to prevent damage to healthy tissue. And many are critical components of cancer treatment. Researchers know that the specialized receptors of a type of immune cell called a T cell help regulate T cells' activity and immune roles. A new computational method called CoNGA, published today in Nature Biotechnology, could help bring into focus the hidden biological patterns that link T-cell receptor, or TCR, gene sequences and T-cell function.

Developed by a collaborative team of scientists from Fred Hutchinson Cancer Research Center, St. Jude Children's Research Hospital, 10X Genomics and the University of Southern California, CoNGA can analyze data gathered from individual T cells to reveal new populations of T cells and unearth TCR characteristics that shape T-cell development.

Recent technological developments allow researchers to peer inside individual cells and reveal which genes are turned on and off. But single-cell data from millions of cells creates enormous and complex datasets that are beyond the interpretive power of individual people, said Dr. Phil Bradley, a Hutch computational biologist who co-led the project with St. Jude immunologist Dr. Paul Thomas.

"CoNGA is an algorithm inspired by this new class of [single-cell] data that's just coming online, laying out a way that we can analyze it," Bradley said.

Bradley and his collaborators envision other researchers using CoNGA to better understand complex groups of T cells, including those responding to a tumor. CoNGA could give scientists working to improve cancer immunotherapies—such as CAR T-cell immunotherapy or checkpoint inhibitors—a better understanding of the factors that drive successful responses in patients, a first step toward designing better versions of these treatments.

T cells: Multifaceted immune cells

T cells are generally part of what's known as adaptive immunity: the immunity that changes after you've had an infection, creating a "memory" carried in long-lived cells that react more quickly and effectively if you encounter that infection again. Part of this ability lies in a specialized protein on T cells' surface, the T-cell receptor or TCR. Each new T cell produced by our bodies has its own unique TCR, made by shuffling together bits of TCR genes from an array of options.

"T cells are these really interesting cells—they've got this receptor, and the nature of that receptor or what it likes to stick to, in principle, determines the fate of that cell in the context of infection or cancer," Bradley said.

T cells use their TCRs to survey the cells in our bodies, hunting for cells that are infected or diseased. Cells display molecular "tags" on their surface that T cells sample. When a T cell's TCR binds a tag on a cell, it's a sign that something is wrong.

Different T cells have many different duties. "Killer" T cells use their TCRs to pinpoint cells that should be killed off. Others give a helping hand to immune cells that produce protective proteins called antibodies. Yet others tamp down immune responses to prevent damage to healthy tissue. And there are many flavors of T cell within each category, helping the body to tailor its immune responses as needed.

The TCR and what it "sees" play a role in determining whether a T cell will become a killer, a helper, or an immune traffic controller. But much of how this process works remains mysterious. Researchers are still seeking ways to predict TCR targets from TCR sequences and to understand how those targets shape T-cell development. Being able to match TCR sequences to what's going on inside a T cell (which genes are turned on, or transcribed) gives insights into what role the cell is playing and whether it's currently on active duty or waiting to be called up.

"What we were really interested in knowing is, if you have this profile of the transcription in the cell, how does that relate to the sequence of the T-cell receptor on the surface?" Bradley said. "The eventual goal would be that you could take a T-cell receptor sequence and predict what that T cell is doing. That's the Holy Grail for this field, and it's really, really, hard. We're a ways away from that."

To find this Holy Grail, researchers need information from lots of individual T cells—the more they find that T cells with specific TCR characteristics also turn on a specific gene, the more likely the link is real—but these complex datasets are challenging to analyze.

Recent technological leaps allow scientists to glean information from millions of cells. These datasets can include TCR sequences, which genes are turned on (and how strongly), as well as which proteins are on the surface of the cells. Each new layer of data makes analysis more challenging.

It's more than a person can analyze on their own. We need math.

That's why Bradley, Thomas, Dr. Stefan Schattgen, a postdoctoral fellow in Thomas' group who spearheaded the study, and University of Southern California undergraduate student Kate Guion developed CoNGA, which stands for clonotype neighbor graph analysis.

Basically, the team created algorithms that can compare two graphs. One graph groups T cells that have similar transcriptional profiles (the genes they have turned on), and the other groups T cells based on similarities in their TCR sequences.

CoNGA algorithms define the relationships between these two graphs, finding correlations between TCR "neighborhoods" (TCR sequences that share similar biochemical features) and shared genes that are turned on in these cells. These patterns could give clues as to how TCRs help determine T-cell behavior.
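
To make the two-graph idea concrete, here is a toy sketch in Python. It is not the CoNGA code itself. It assumes each clonotype can be placed at a point in a made-up 2-D "gene expression" space and a made-up 2-D "TCR" space, builds a nearest-neighbor graph in each, and asks whether a clonotype's two sets of neighbors overlap more than chance would predict. The function names (knn_sets, hypergeom_tail, overlap_scores), the Euclidean distance, the hypergeometric test and the multiply-by-n correction are all illustrative assumptions.

```python
import math
import random

def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in ranked[:k]})
    return result

def hypergeom_tail(overlap, n_total, size_a, size_b):
    """P(X >= overlap) when size_b items are drawn from n_total, size_a of them marked."""
    hits = sum(
        math.comb(size_a, x) * math.comb(n_total - size_a, size_b - x)
        for x in range(overlap, min(size_a, size_b) + 1)
    )
    return hits / math.comb(n_total, size_b)

def overlap_scores(gex_points, tcr_points, k):
    """Score each clonotype by how surprising its neighbor overlap is (lower = stronger)."""
    n = len(gex_points)
    gex_nbrs = knn_sets(gex_points, k)   # neighbors by gene expression
    tcr_nbrs = knn_sets(tcr_points, k)   # neighbors by TCR similarity
    scores = []
    for g, t in zip(gex_nbrs, tcr_nbrs):
        p = hypergeom_tail(len(g & t), n - 1, k, k)
        scores.append(p * n)             # crude multiple-testing correction (assumption)
    return scores

# Hypothetical data: clonotypes 0-3 sit close together in BOTH spaces;
# the other eight are scattered independently in each space.
rng = random.Random(0)
offsets = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]
gex = [(5 + dx, 5 + dy) for dx, dy in offsets]
tcr = [(-3 + dx, -3 + dy) for dx, dy in offsets]
for _ in range(8):
    gex.append((rng.random(), rng.random()))
    tcr.append((rng.random(), rng.random()))

scores = overlap_scores(gex, tcr, k=3)
for i, s in enumerate(scores):
    print(f"clonotype {i}: score = {s:.3f}")
```

In this toy setup, the four clonotypes that are neighbors in both spaces share all three of their neighbors, so they get the smallest possible score. The scattered clonotypes can only match that score or do worse. Real data would need distance measures suited to gene expression profiles and TCR sequences, which this sketch does not attempt.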

New patterns, new possibilities

The team drew on publicly available T-cell datasets generated primarily by biotech company 10X Genomics, which produces single-cell analysis platforms. (Bradley served as an unpaid consultant to 10X Genomics during the initial analysis of this data.) The single-cell data included information about the TCR sequences and which genes were turned on in each cell. A subset of the data also included information on which molecular targets the TCRs stuck to.

When they applied CoNGA to the data, the team found correlations between TCR sequences and key genes turned on in the cells. CoNGA analysis also enabled the team to find some new, unexpected signatures in T cells related to the cells' activity and signaling. Although the biological significance of the signatures is still unclear, one may point to a previously unknown type of T cell, while another may provide clues as to how certain TCR characteristics may shape T-cell development.

Bradley and teammates' initial goal was to demonstrate that CoNGA can find meaningful patterns in complex data. Now that it's passed its first test, the team is working to follow up on the new correlations it revealed.

First of all, the team wants to confirm that the unusual population of T cells CoNGA highlighted are truly new to science. (Bradley noted that rediscovery of previously described cells happens frequently when new tools are rolled out.) If the cells really are new, the scientists will dig into the role that they play in immune responses.

Bradley and his collaborators are also thinking bigger. They want to apply CoNGA to even larger datasets (more data means more confidence that statistical patterns reflect real biology). And they're working to adapt CoNGA to apply it to B cells, a type of immune cell that makes protective proteins called antibodies. B cells also have a specialized receptor, the B-cell receptor, that regulates their development, behavior and function—and CoNGA could help scientists understand how.

The researchers are also excited about CoNGA's potential to facilitate other groups' analyses. While the team concentrated on the TCR, CoNGA could be applied to single-cell datasets that integrate other types of information, such as specific molecules on T-cell surfaces, or TCR targets.

The team will make their CoNGA algorithms freely available to the public for scientists to use or modify as they need.

"I do think it will be a useful tool for immunologists to have," Bradley said.

More information: Stefan A. Schattgen et al, Integrating T cell receptor sequences and transcriptional profiles by clonotype neighbor graph analysis (CoNGA), Nature Biotechnology (2021). DOI: 10.1038/s41587-021-00989-2

Journal information: Nature Biotechnology