New study maps protein interactions for a quarter of the human genome

May 17, 2017, Harvard Medical School

Harvard Medical School researchers have mapped the interaction partners for proteins encoded by more than 5,800 genes, representing over a quarter of the human genome, according to a new study published online in Nature on May 17.

The network, dubbed BioPlex 2.0, identifies more than 56,000 unique protein-to-protein interactions, 87 percent of them previously unknown, making it the largest such network to date.

BioPlex reveals protein communities associated with fundamental cellular processes and diseases such as hypertension and cancer, and highlights new opportunities for efforts to understand human biology and disease.

The work was done in collaboration with Biogen, which also provided partial funding for the study.

"A gene isn't just a sequence of a piece of DNA. A gene is also the protein it encodes, and we will never understand the genome until we understand the proteome," said co-senior author Wade Harper, the Bert and Natalie Vallee Professor of Molecular Pathology and chair of the Department of Cell Biology at Harvard Medical School. "BioPlex provides a framework with the depth and breadth of data needed to address this challenge."

"This project is an atlas of human protein interactions, spanning almost every aspect of biology," said co-senior author Steven Gygi, professor of cell biology and director of the Thermo Fisher Center for Multiplexed Proteomics at Harvard Medical School. "It creates a social network for each protein and allows us to see not only how proteins interact, but also possible functional roles for previously unknown proteins."

Bait and prey

Of the roughly 20,000 protein-coding genes in the human genome, scientists have studied only a fraction in detail. To work toward a description of the entire cast of proteins in a cell and the interactions between them—known as the proteome and interactome, respectively—a team led by Harper and Gygi developed BioPlex, a high-throughput approach for the identification of protein interplay.

BioPlex uses so-called affinity purification, in which a single tagged "bait" protein is expressed in human cells in the laboratory. The bait protein binds with its interaction partners, or "prey" proteins, which are then fished out from the cell and analyzed using mass spectrometry, a technique that identifies and quantifies proteins based on their unique molecular signatures. In 2015, an initial effort (BioPlex 1.0) used approximately 2,600 different bait proteins, drawn from the Human ORFeome database, to identify nearly 24,000 protein interactions.

In the current study, the team expanded the network to include a total of 5,891 bait proteins, which revealed 56,553 interactions involving 10,961 different proteins. An estimated 87 percent of these interactions have not been previously reported.
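As a quick check on the headline figures, the short sketch below works through the arithmetic using only numbers quoted in this article. The 20,000 gene count is the article's rough estimate, so the results are approximate.

```python
# Figures quoted in the article; the gene count is a rough estimate.
protein_coding_genes = 20_000
bait_proteins = 5_891
interactions = 56_553
fraction_previously_unreported = 0.87

# Share of protein-coding genes used as baits ("over a quarter").
share_of_genes = bait_proteins / protein_coding_genes

# Approximate number of interactions not previously reported.
estimated_new = round(interactions * fraction_previously_unreported)

print(f"Baits cover about {share_of_genes:.0%} of protein-coding genes")
print(f"Roughly {estimated_new:,} interactions were previously unreported")
```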

Guilt by association

By mapping these interactions, BioPlex 2.0 identifies groups of functionally related proteins, which tend to cluster into tightly interconnected communities. Such "guilt-by-association" analyses suggested possible roles for previously unknown proteins, as these communities often commingle proteins with both known and unknown functions.
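The guilt-by-association idea can be illustrated with a toy example. The sketch below is not the study's method. It assumes a simple majority vote among a protein's direct interaction partners, and the protein names, interactions and annotations are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data, not taken from BioPlex: each pair is one interaction.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "X1"), ("P3", "X1"),
    ("Q1", "Q2"), ("Q1", "X2"), ("Q2", "X2"),
]

# Hypothetical annotations; X1 and X2 have no known function.
known_function = {
    "P1": "transcription",
    "P2": "transcription",
    "P3": "transcription",
    "Q1": "energy production",
    "Q2": "energy production",
}


def build_neighbors(pairs):
    """Turn a list of interacting pairs into a protein -> partners map."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def guess_function(protein, neighbors, annotations):
    """Return the most common annotation among a protein's partners, or None."""
    votes = Counter(
        annotations[partner]
        for partner in neighbors.get(protein, ())
        if partner in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


neighbors = build_neighbors(interactions)
for unknown in ("X1", "X2"):
    print(unknown, "->", guess_function(unknown, neighbors, known_function))
```

A real analysis would weigh the strength of each interaction and test for statistical significance; the vote here only shows why an unannotated protein inherits a plausible role from the community it sits in.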

The team mapped numerous protein clusters associated with basic cellular processes, such as DNA transcription and energy production, and a variety of human diseases. Colorectal cancer, for example, appears to be linked to protein networks that play a role in abnormal cell growth, while hypertension is linked to protein networks for ion channels, transcription factors and metabolic enzymes.

"With the upgraded network, we can make stronger predictions because we have a more complete picture of the interactions within a cell," said first author Edward Huttlin, instructor of cell biology at Harvard Medical School. "We can pick out statistical patterns in the data that might suggest disease susceptibility for certain proteins, or others that might suggest function or localization properties. It makes a significant portion of the human proteome accessible for study."

Launching point

The entire BioPlex network and accompanying data are publicly available, supporting both large-scale studies of protein interaction and targeted studies of the function of specific proteins.

Although the network serves as the largest collection of such data gathered to date, the authors caution it remains an incomplete model. The current pipeline expresses bait proteins in only one cell type (human embryonic kidney cells) grown under one set of conditions, for example, and distinct interactions may occur in different cell types or microenvironments.

As the network increases in size and more human proteins are used as baits, scientists can better judge the accuracy of each individual protein interaction by considering its context in the larger network. Isolating the same protein complex several times, each time using a different member as a bait, can provide multiple independent experimental observations to confirm each protein's membership. Moreover, by using prey proteins as bait, many protein interactions can be observed in the opposite direction as well. Both of these scenarios greatly reduce the likelihood that particular interactions were identified due to chance. The team continues to add to BioPlex, with a goal of around 10,000 bait proteins, which would cover half of the genome and would further increase the predictive power of the network.
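The two kinds of cross-checking described above can be sketched on toy data. The example assumes each experiment is recorded as a bait protein and the set of prey proteins pulled down with it. The protein names are hypothetical and the functions illustrate the logic only, not the BioPlex pipeline.

```python
# Hypothetical bait -> prey observations; "D" is never used as a bait.
observations = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
}


def reciprocal_pairs(obs):
    """Pairs seen in both directions: each protein pulled down the other."""
    pairs = set()
    for bait, preys in obs.items():
        for prey in preys:
            if bait in obs.get(prey, set()):
                pairs.add(frozenset((bait, prey)))
    return pairs


def supporting_baits(obs, protein):
    """Baits that independently pulled down the given protein."""
    return {bait for bait, preys in obs.items() if protein in preys}


print(sorted(sorted(pair) for pair in reciprocal_pairs(observations)))
print("C is supported by baits:", sorted(supporting_baits(observations, "C")))
print("D is supported by baits:", sorted(supporting_baits(observations, "D")))
```

In this toy set, the A-D interaction rests on a single experiment, while every interaction among A, B and C is seen from both sides, which is the kind of added support that more baits provide.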

"We certainly aren't seeing all the interactions, but it's a launching point. We think it's important to continue to build this map, to see how much of it is reproduced in other cell types under different conditions, to see whether the interactions are similar or dynamic," Gygi said. "Because whether you're interested in cancer or neurodegenerative disease, basic development or evolutionary fitness—you can make new hypotheses and learn something from this network."

More information: Architecture of the human interactome defines protein communities and disease networks, Nature (2017).
