September 13, 2023

Revealing the secrets of protein evolution using the AlphaFold database

by Vicky Hatch, European Molecular Biology Laboratory

By developing an efficient way to compare all predicted protein structures in the AlphaFold database, researchers have revealed similarities between proteins across different species. This work aids our understanding of protein evolution and has uncovered new insights into the origin of human immunity proteins.

The research was conducted by EMBL's European Bioinformatics Institute (EMBL-EBI), the Institute of Molecular Systems Biology at ETH Zurich, and the School of Biological Sciences at Seoul National University.

The AlphaFold database is a transformative resource in the field of protein research, serving as a comprehensive repository of AI-predicted 3D structures for nearly all catalogued proteins. The database fills a critical gap in understanding protein function and evolution by offering high-quality structural predictions. Although AI predictions are not a substitute for experimentally determined structures, they do provide invaluable insights for the scientific community.

For this study, published in the journal Nature, the researchers developed a new algorithm known as Foldseek Cluster that can be used to analyze large sets of protein structures all at once. Foldseek Cluster was applied to the 200 million predicted protein structures in the AlphaFold database, identifying over 2 million unique structural clusters, that is, groups of protein structures with similar three-dimensional shapes. One third of these clusters lack any previous annotations, meaning they had not previously been described or categorized.
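To give a feel for what grouping items by similarity means, here is a toy sketch in Python. It is not the Foldseek Cluster algorithm: the function name `greedy_cluster`, the 2D points standing in for structure descriptors, and the distance threshold are all assumptions made for illustration. Each item joins the first cluster whose representative lies within the threshold, and otherwise starts a new cluster.

```python
from math import dist


def greedy_cluster(points, threshold):
    """Toy greedy clustering (illustrative only, not Foldseek Cluster).

    Each point joins the first cluster whose representative is within
    `threshold`; otherwise it becomes the representative of a new cluster.
    """
    clusters = []  # list of (representative, members) pairs
    for p in points:
        for rep, members in clusters:
            if dist(rep, p) <= threshold:
                members.append(p)
                break
        else:
            # No existing representative was close enough.
            clusters.append((p, [p]))
    return clusters


# Hypothetical 2D "shape descriptors"; real structures are far richer.
shapes = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
clusters = greedy_cluster(shapes, threshold=0.5)
for rep, members in clusters:
    print(rep, len(members))
```

In this toy the last point ends up alone in its cluster, because nothing else lies within the threshold.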

Bridging the gap in protein science

Proteins are critical to processes that take place in the cell, and understanding protein structures is pivotal for studying protein function and evolution. Despite significant advancements in sequence-based predictions of protein structures, computational limitations have made it difficult to study these structures at scale. Foldseek Cluster now enables structural comparisons and clustering at an unprecedented scale, reducing the time for such tasks by several orders of magnitude.
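To put that speedup in rough numbers, the short Python calculation below uses the researchers' own estimate, quoted later in this article, of about a decade for established methods against five days for Foldseek Cluster. The 365.25-day year is an assumption, and the result covers only this one clustering run, not every task the method can handle.

```python
from math import log10

# Figures quoted in the article; the days-per-year value is an assumption.
established_days = 10 * 365.25   # roughly a decade with established methods
foldseek_days = 5                # reported run time with Foldseek Cluster

speedup = established_days / foldseek_days
orders_of_magnitude = log10(speedup)

print(f"speedup: about {speedup:.0f}x")
print(f"orders of magnitude: about {orders_of_magnitude:.1f}")
```

On these figures the full clustering run comes out several hundred times faster, or close to three orders of magnitude.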

"We've entered a new era in structural biology where computational methods unlock unprecedented access to explore the protein universe," said Martin Steinegger, Assistant Professor at the School of Biological Sciences Seoul National University.

"We estimated that clustering all structures with established methods would have taken a decade when compared to the five days it took using our new method, Foldseek Cluster. Our algorithm can sift through millions of predicted protein structures in the AlphaFold database and cluster them based on their 3D shapes. This acceleration in computational power doesn't just make things faster; it makes things possible."

Protein evolution and immunity

The study also delves into the evolutionary implications of these clusters. While most clusters are ancient in origin, around 4% appear to be species-specific. This offers new insights into evolutionary phenomena such as de novo gene birth—when new genes arise from non-coding regions of the genome. The work also illustrates several examples of evolutionary relationships that could enrich our understanding of protein function across different species, including their role in human immunity.

"This work isn't just about making comparisons more efficiently, it's about gaining new insights into the evolutionary history of proteins," said Pedro Beltrao, Associate Professor at the Institute of Molecular Systems Biology, ETH Zurich.

"One of the most interesting findings from this study is our detection of structural similarities between human immune system proteins and those found in bacteria. This suggests that proteins involved in the immune system may have ancient evolutionary origins that we share with bacterial species. If true, this could reshape our understanding of immunity. Our research not only advances current knowledge but also lays out a roadmap for future investigations into the mysteries of protein function and evolution."

Improving the AlphaFold database functionality

As the AlphaFold database and other life science databases continue to grow, there is a significant need to help users sift through the vast amount of data while reducing the computational costs of analyzing and managing these data. Approaches such as the Foldseek Cluster algorithm, which is scalable to billions of structures, will be invaluable in helping researchers navigate this wealth of information.

"Foldseek Cluster is more than just a technological advancement; it's an enhancement that elevates the entire AlphaFold database experience for researchers worldwide," said Sameer Velankar, Team Leader at EMBL-EBI.

"With the explosion of predicted protein structures we have in AFDB, managing and navigating these data efficiently has been a significant challenge," he continued. "Foldseek Cluster has revolutionized this process. We are working on integrating Foldseek clusters into AFDB to streamline the analysis of large sets of protein structures and make it easier for our user community to find exactly what they're looking for."

More information: Martin Steinegger, Clustering predicted structures at the scale of the known protein universe, Nature (2023). DOI: 10.1038/s41586-023-06510-w. www.nature.com/articles/s41586-023-06510-w

Journal information: Nature

Provided by European Molecular Biology Laboratory

Citation: Revealing the secrets of protein evolution using the AlphaFold database (2023, September 13) retrieved 27 April 2024 from https://phys.org/news/2023-09-revealing-secrets-protein-evolution-alphafold.html

