March 9, 2023

A new tool for protein sequence generation and design

by Nik Papageorgiou, Ecole Polytechnique Federale de Lausanne

Image: molecule structure. Credit: Pixabay/CC0 Public Domain

EPFL researchers have developed a new technique that uses a protein language model to generate protein sequences with properties comparable to those of natural sequences. The method outperforms traditional models and offers promising potential for protein design.

Designing new proteins with specific structure and function is a highly important goal of bioengineering, but the vast size of protein sequence space makes the search for new proteins difficult. However, a new study by the group of Anne-Florence Bitbol at EPFL's School of Life Sciences has found that a deep-learning neural network, MSA Transformer, could be a promising solution.

Developed in 2021, MSA Transformer works much like the natural language processing models used by the now-famous ChatGPT. The team, composed of Damiano Sgarbossa, Umberto Lupo, and Anne-Florence Bitbol, proposed and tested an "iterative method," which relies on the ability of the model to predict missing or masked parts of a sequence based on the surrounding context.

The team found that through this approach, MSA Transformer can generate new protein sequences from given protein "families" (groups of proteins with similar sequences), and that these sequences have properties similar to those of natural sequences.
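
To make the idea of iterative masking concrete, here is a minimal sketch in Python. It is not the authors' code and does not use MSA Transformer: the function `toy_predict` is a hypothetical stand-in that simply picks the most common residue in an alignment column, and the names `MASK`, `iterative_masking`, `mask_fraction`, and `n_iters`, as well as their default values, are assumptions chosen for illustration. Only the overall loop (mask some positions, let a model fill them in from context, repeat) follows the description above.

```python
import random
from collections import Counter

MASK = "#"  # hypothetical placeholder symbol for a masked residue


def toy_predict(msa, col):
    """Stand-in for a protein language model (an assumption, not MSA Transformer).

    Returns the most common unmasked residue in the given alignment column,
    breaking ties alphabetically. A real model would also use the context
    of the row being filled in.
    """
    counts = Counter(seq[col] for seq in msa if seq[col] != MASK)
    if not counts:
        return "A"  # arbitrary fallback when the whole column is masked
    best = max(counts.values())
    return min(res for res, n in counts.items() if n == best)


def iterative_masking(msa, mask_fraction=0.1, n_iters=20, seed=0):
    """Repeatedly mask random positions of an alignment and refill them."""
    rng = random.Random(seed)
    msa = [list(seq) for seq in msa]  # work on a copy of the input
    n_rows, n_cols = len(msa), len(msa[0])
    for _ in range(n_iters):
        # Choose the positions to mask in this iteration.
        masked = [(r, c) for r in range(n_rows) for c in range(n_cols)
                  if rng.random() < mask_fraction]
        for r, c in masked:
            msa[r][c] = MASK
        # Fill each masked position from the surrounding context.
        for r, c in masked:
            msa[r][c] = toy_predict(msa, c)
    return ["".join(seq) for seq in msa]


if __name__ == "__main__":
    family = ["MKTAY", "MKSAY", "MRTAY", "MKTAF"]  # made-up toy alignment
    print(iterative_masking(family))
```

With a real protein language model in place of `toy_predict`, each pass would nudge the alignment toward sequences the model considers plausible for the family, which is the intuition behind generating new sequences this way.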

In fact, protein sequences generated from large families with many homologs had properties similar to or better than those of sequences generated by Potts models. "A Potts model is an entirely different type of generative model not based on natural language processing or deep learning, which was recently experimentally validated," explains Bitbol. "Our new MSA Transformer-based approach allowed us to generate proteins even from small families, where Potts models perform poorly."
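
For readers unfamiliar with Potts models, the standard textbook form is shown below. The notation is an assumption for illustration and does not come from the article: $a_i$ is the amino acid at position $i$ of a sequence of length $L$, $h_i$ are single-site "field" parameters, $J_{ij}$ are pairwise "coupling" parameters, and $Z$ is a normalization constant.

```latex
P(a_1, \dots, a_L) = \frac{1}{Z} \exp\!\left( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j) \right)
```

Because such a model has a parameter for every pair of positions and every pair of amino acids, it needs many sequences to fit well, which is consistent with Bitbol's remark about small families.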

The MSA Transformer reproduces the higher-order statistics and the distribution of sequences in natural data more accurately than other models, which makes it a strong candidate for protein sequence generation and protein design.

"This work can lead to the development of new proteins with specific structures and functions; such approaches will hopefully enable important medical applications in the future," says Bitbol. "The potential of the MSA Transformer as a strong candidate for protein design provides exciting new possibilities for the field of bioengineering."

The study is published in eLife, whose editors commented, "This important study proposes a method to sample novel sequences from a protein language model that could have exciting applications in protein sequence design. The claims are supported by a solid benchmarking of the designed sequences in terms of quality, novelty and diversity."

More information: Damiano Sgarbossa et al., Generative power of a protein language model trained on multiple sequence alignments, eLife (2023). DOI: 10.7554/eLife.79854

Journal information: eLife

Provided by Ecole Polytechnique Federale de Lausanne

Citation: A new tool for protein sequence generation and design (2023, March 9) retrieved 6 August 2024 from https://phys.org/news/2023-03-tool-protein-sequence-generation.html

