March 23, 2023 report

Predicting protein folding from single sequences with Meta AI ESM-2

by Justin Jackson, Phys.org

Researchers from Facebook AI Research (FAIR) at Meta AI have published a paper in the journal Science detailing a machine-learning-created database of 617 million predicted protein structures. The ESMFold language model predicted the structures 60 times faster than DeepMind's AlphaFold2, though with less reported accuracy.

The fold predictions were completed in just two weeks on a cluster of about 2,000 GPUs. The initial sequence lengths ranged from 20 to 1,024 amino acids. About 365 million predictions were made with good confidence, and ∼225 million of them with high confidence.

According to the report, "Evolutionary-scale prediction of atomic-level protein structure with a language model," a random sample of 1 million high-confidence results showed that 767,580 proteins have a sequence identity below 90% to any sequence in UniRef90, a database of known protein sequences. Researchers believe this indicates that the proteins are distinct from existing UniRef90 sequences.

The Meta AI team then compared the sample of predicted structures with known structures in the Protein Data Bank, a database of three-dimensional protein structures. At a TM-score threshold of 0.5, 12.6% (125,765 proteins) had no match to a known structure. Based on this, researchers estimate that about 28 million proteins (12.6% of 225 million) with high-confidence predictions could characterize regions of protein structure that are distant from existing knowledge.
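The extrapolation from the 1 million sample to the full high-confidence set is simple proportional arithmetic. A minimal sketch in Python, using only the figures quoted above (the variable names are illustrative, not taken from the paper):

```python
# Figures quoted in the article; variable names are illustrative.
sample_size = 1_000_000                # random sample of high-confidence predictions
below_90_identity = 767_580            # sample proteins with <90% identity to UniRef90
no_structure_match = 125_765           # sample proteins with no PDB match at TM-score 0.5
high_confidence_total = 225_000_000    # approximate count of high-confidence predictions

# Fractions observed in the sample
novel_sequence_fraction = below_90_identity / sample_size
no_match_fraction = no_structure_match / sample_size

# Scale the sample fraction up to the whole high-confidence set
estimated_distant = no_match_fraction * high_confidence_total

print(f"Below 90% sequence identity: {novel_sequence_fraction:.1%}")
print(f"No structural match: {no_match_fraction:.1%}")
print(f"Estimated distant structures: {estimated_distant / 1e6:.1f} million")
```

The scaling assumes the random sample is representative of all high-confidence predictions, which is the same assumption behind the roughly 28 million estimate.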

Predictions based on sequences

A protein begins as a linear sequence of nucleotides copied from DNA (transcription), creating messenger RNA, a wish list of raw ingredients for the protein it will become. The mRNA nucleotides are then translated, three at a time, into amino acids (the raw ingredients). This chain of amino acids then undergoes an incredible transformation into a complex three-dimensional folded shape that, depending on its folded structure, carries out specific intricate cellular functions.
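The two steps from DNA to an amino acid chain can be illustrated with a toy example. This is a simplified sketch, not part of the Meta AI work: it treats transcription as swapping T for U on the coding strand, uses a made-up 12-nucleotide sequence, and includes only the four standard-genetic-code codons that sequence needs.

```python
# Toy illustration of transcription and translation.
# Only four entries of the standard genetic code are included here;
# the full table has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "UGA": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three nucleotides at a time until a stop codon."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        amino_acids.append(residue)
    return "".join(amino_acids)


dna = "ATGGCCAAATGA"          # hypothetical example sequence
mrna = transcribe(dna)        # "AUGGCCAAAUGA"
protein = translate(mrna)     # "MAK"
print(mrna, protein)
```

The resulting string of one-letter amino acid codes is the kind of single-sequence input from which a model such as ESMFold predicts a folded structure.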

How a protein or enzyme folds in part determines its function because the fold limits and optimizes what the molecule can interact with. The structure creates an opening or "lock" that only operates with the correct molecular "key." People have been using these lock-and-key enzymes for everything from food production and beer brewing to textiles and biofuel without a detailed understanding of how the proteins are actually folded.

Laundry detergents typically contain several types of enzymes, some of which will be cellulases that break down plant material. When the cellulase enzyme encounters cellulose from a grass stain, the cellulose becomes the key that fits the lock. The enzyme triggers a chemical reaction that breaks down the bonds within the grass stain. The same enzyme will do nothing when encountering a lipstick or grease stain; that may be a job for another enzyme.

A single protein enzyme might perform a task thousands or even millions of times per second without breaking, offering industries a low-energy powerhouse of a catalyst and making enzymes an instrumental technology.

Every system in our body also relies on proteins to carry out biological functions. Because the folded structure of a protein is crucial to the activity it can engage in, understanding this structure is critical to understanding how proteins work when investigating causes of disease.

The ability to predict how a protein will fold based on the primary sequence of amino acids (raw ingredients) would allow medical researchers to better understand protein-metabolite interactions and biological functions throughout the body. This higher-resolution understanding could identify hidden disease traits, accelerate research into new or better treatments and, to some degree, revolutionize modern medicine. Understanding precisely how structure follows from the sequence of raw ingredients (translated mRNA) would also allow researchers to build custom proteins to perform specific tasks in healthcare and industry.

In the decades preceding AI prediction models, scientists modeled the structures of about 190,000 proteins of interest. Machine learning has now generated hundreds of millions of predictions that still need to be confirmed and studied to be useful. While AI is still not reliable enough to replace the slower, methodical X-ray crystallography for structure or a controlled assay experiment for function, it is just getting started. The knowledge gained in the decades to come will likely eclipse everything that came before.

More information: Zeming Lin et al, Evolutionary-scale prediction of atomic-level protein structure with a language model, Science (2023). DOI: 10.1126/science.ade2574

Journal information: Science

Citation: Predicting protein folding from single sequences with Meta AI ESM-2 (2023, March 23) retrieved 28 April 2024 from https://phys.org/news/2023-03-protein-sequences-meta-ai-esm-.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
