Evolution reveals missing link between DNA and protein shape

Dec 07, 2011
Studying related proteins, researchers identified pairs of amino acid residues (left) that seemed to change in lockstep in the evolutionary record. These co-varying pairs indicated points on the protein (middle) likely to be in contact after folding, giving researchers enough clues to create a computational model of the protein's three-dimensional structure (right). Credit: Terry Helms/Memorial Sloan-Kettering Cancer Center

Fifty years after the pioneering discovery that a protein's three-dimensional structure is determined solely by the sequence of its amino acids, an international team of researchers has taken a major step toward fulfilling the tantalizing promise: predicting the structure of a protein from its DNA alone.

The team at Harvard Medical School (HMS), Politecnico di Torino / Human Genetics Foundation Torino (HuGeF) and Memorial Sloan-Kettering Cancer Center in New York (MSKCC) has reported substantial progress toward solving a classical problem of molecular biology: the computational protein folding problem.

The results will be published Dec. 7 in the journal PLoS ONE.

In molecular biology and biomedical engineering, knowing the shape of protein molecules is key to understanding how they perform the work of life, the mechanisms of disease and drug design. Normally the shape of protein molecules is determined by expensive and complicated experiments, and for most proteins these experiments have not yet been done. Computing the shape from genetic information alone is possible in principle. But despite limited success for some smaller proteins, this challenge has remained essentially unsolved. The difficulty lies in the enormous size of the search space: the number of possible shapes is astronomically large. Without any shortcuts, it would take a supercomputer many years to explore all possible shapes of even a small protein.
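A back-of-the-envelope estimate gives a feel for the size of that search space. The sketch below is only illustrative: the three conformations per residue, the 100-residue chain and the rate of 10^15 shapes checked per second are assumptions chosen for the arithmetic, not figures from the study.

```python
# Rough estimate of how long a brute-force search over protein shapes would take.
# All constants are illustrative assumptions, not values from the study.
STATES_PER_RESIDUE = 3          # assumed conformations available to each residue
N_RESIDUES = 100                # assumed length of a small protein
SHAPES_PER_SECOND = 10**15      # assumed throughput of a very fast computer
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_search_years(n_residues,
                            states=STATES_PER_RESIDUE,
                            rate=SHAPES_PER_SECOND):
    """Years needed to enumerate every shape at the given rate."""
    total_shapes = states ** n_residues
    return total_shapes / (rate * SECONDS_PER_YEAR)


years = exhaustive_search_years(N_RESIDUES)
print(f"Exhaustive search would take about {years:.1e} years")
```

Under these assumptions the search would run for something on the order of 10^25 years, which is why any practical method needs a shortcut that rules out nearly all shapes in advance.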

"Experimental structure determination has a hard time keeping up with the explosion in genetic sequence information," said Debora Marks, a mathematical biologist at HMS, who worked closely with Lucy Colwell, a mathematician who recently moved from Harvard to Cambridge University. They collaborated with physicists Riccardo Zecchina and Andrea Pagnani in Torino in a team effort initiated by Marks and computational biologist Chris Sander of the Computational Biology Program at MSKCC, who had earlier attempted a similar solution to the problem, when substantially fewer sequences were available.

"Collaboration was key," Sander said. "As with many important discoveries in science, no one could provide the answer in isolation."

The international team tested a bold premise: That evolution can provide a roadmap to how the protein folds. Their approach combined three key elements: evolutionary information accumulated for many millions of years; data from high-throughput genetic sequencing; and a key method from statistical physics, co-developed in the Torino group with Martin Weigt, who recently moved to the University of Paris.

Using the accumulated evolutionary information in the form of the sequences of thousands of proteins, grouped in protein families that are likely to have similar shapes, the team found a way to solve the problem: an algorithm to infer which parts of a protein interact to determine its shape. They used a principle from statistical physics called "maximum entropy" in a method that extracts information about microscopic interactions from measurement of system properties.
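What "changing in lockstep" means can be shown on a toy example. The sketch below does not implement the team's maximum-entropy method; it swaps in plain mutual information, a simpler statistic that only measures how strongly two alignment columns vary together. The four-column alignment and the function names are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment: each string is one protein in a family,
# each column is one residue position.
TOY_ALIGNMENT = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # Ratio of the observed pair frequency to the one expected
        # if the two columns varied independently.
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def rank_covarying_pairs(alignment):
    """Score every pair of columns and sort from most to least co-varying."""
    columns = list(zip(*alignment))
    scores = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_covarying_pairs(TOY_ALIGNMENT)
for (i, j), score in ranked:
    print(f"columns {i} and {j}: {score:.3f} bits")
```

In this toy alignment, columns 1 and 3 always change together (K with E, R with D), as do columns 0 and 2, so those pairs score highest and would be the candidate contacts. A statistic this simple can also flag pairs that are only indirectly linked through a third position, which is the kind of confusion a global statistical model of the whole sequence is meant to reduce.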

"The protein folding problem has been a huge combinatorial challenge for decades," said Zecchina, "but our statistical methods turned out to be surprisingly effective in extracting essential information from the evolutionary record."

With these internal protein interactions in hand, widely used molecular simulation software developed by Axel Brunger at Stanford University generated the atomic details of the protein shape. For the first time, the team was able to compute remarkably accurate shapes from sequence information alone for a test set of 15 diverse proteins, with no protein size limit in sight.

"Alone, none of the individual pieces are completely novel, but apparently nobody had put all of them together to predict 3D protein structure," Colwell said.

To test their method, the researchers initially focused on the Ras family of signaling proteins, which has been extensively studied because of its known link to cancer. The structures of several Ras-type proteins have already been solved experimentally, but the proteins in the family are larger--with about 160 amino acid residues--than any proteins previously modeled computationally from sequence alone.

"When we saw the first computationally folded Ras protein, we nearly went through the roof," Marks said. To the researchers' amazement, their model folded within about 3.5 angstroms of the known structure with all the structural elements in the right place. And there is no reason, the authors say, that the method couldn't work with even larger proteins.

The researchers caution that there are other limits, however: Experimental structures, when available, generally are more accurate in atomic detail. And, the method works only when researchers have genetic data for large protein families. But advances in DNA sequencing have yielded a torrent of such data that is forecast to continue growing exponentially in the foreseeable future.

The next step, the researchers say, is to predict the structures of unsolved proteins currently being investigated by structural biologists, before exploring the large uncharted territory of currently unknown protein structures.

"Synergy between computational prediction and experimental determination of structures is likely to yield increasingly valuable insight into the large universe of protein shapes that crucially determine their function and evolutionary dynamics," Sander said.

More information: "Protein 3D structure computed from evolutionary sequence variation," Marks et al. PLoS ONE, December 6, 2011

User comments : 12

210
2.6 / 5 (8) Dec 07, 2011
HOLY COW! In the last six months, it is as if someone stepped on the 'discovery' gas pedal! At once the fields of computation and mathematics appeared to be leaving all other sciences in their wake...now this. And look at how multidisciplinary the research teams have had to become in the last little while. Protein folding, is one of those fields where there was never enough compute cycles available, even if we took all the desktops computers in North America as a pooled resource. Now, we can see the future: We need to get supercomputing to hi petabyte or low exabyte ranges to master this 'tool.' We will need 400-500 exabyte storage per runtime.And its own unique database with a search/discovery engine that can bench-press the SUN! Graphics that can churn out folding action in time intervals as small as the Planck constant and automate the process for several Trillion protein and gene components...TaaDaa!
Seems easy enough to me..? Wha?

word-to-ya-muthas
nanotech_republika_pl
5 / 5 (2) Dec 07, 2011
One of the top 10 science achievement of the year!
nanotech_republika_pl
3 / 5 (2) Dec 07, 2011
How come the owner of this site allows ranjou to post these ads all the time? Could be that ranjou is the owner of the Physorg.com? Why not just have a picture with whatever he sells on the side, like any other ad? When I see his post I consider it to be a scam so I don't read it. Why would that type of ad work, anybody?
210
1 / 5 (2) Dec 07, 2011
How come the owner of this site allows ranjou to post these ads all the time? Could be that ranjou is the owner of the Physorg.com? Why not just have a picture with whatever he sells on the side, like any other ad? When I see his post I consider it to be a scam so I don't read it. Why would that type of ad work, anybody?

I believe "ranjou" is automated but the filtering of such spam may not be - that's one possibility. Or, you keep the name but change the ranjou password, or the opposite, and the basic filtering activity for the site, if it had programmed filtering, fails big time!
Try not to spend any cycles on it...he loves for us to get mad and talk about him...it's more 'free press.' Hey, I'm going to the store......you want anything from the store???

word-
Sean_W
1 / 5 (1) Dec 07, 2011
Is it possible (or will it soon be possible) to decide on a shape you want--maybe as an antigen or an artificial enzyme--and be able to compute a genetic sequence which would code for the peptide chain which would fold into that shape?
aroc91
not rated yet Dec 07, 2011
Is it possible (or will it soon be possible) to decide on a shape you want--maybe as an antigen or an artificial enzyme--and be able to compute a genetic sequence which would code for the peptide chain which would fold into that shape?


Not currently, but eventually. It takes a lot of computing power to simulate folding even on the nanosecond scale at this point, so completely figuring out the process enough to be able to predict and reverse-engineer it is quite a ways off.
kevinrtrs
1 / 5 (7) Dec 08, 2011
Computing the shape from genetic information alone is possible in principle. But despite limited success for some smaller proteins, this challenge has remained essentially unsolved. The difficulty lies in the enormous complexity of the search space, an astronomically large number of possible shapes. Without any shortcuts, it would take a supercomputer many years to explore all possible shapes of even a small protein.

So here you have a real life situation where it shows just how impossible step-wise evolution really is. How is a random process going to figure out the shape for a required protein by trial and error within the few billion years that is available to it. On top of that remember that each "try" uses up energy. There's a limit to the amount of available energy as well as to the fact that the "try-out" has to actually work to be useful.
If you just blissfully ignore this problem and still adhere to evolutionary philosophy you must surely believe in the fairy godmother.
MarkyMark
5 / 5 (1) Dec 08, 2011
As apposed to the 'Fairy Godfather' called Jesus hmm Bible Humper?

At least Evolution is supported by verifiable Facts which is more than can be said by your dribble Kev!
aroc91
5 / 5 (2) Dec 08, 2011
So here you have a real life situation where it shows just how impossible step-wise evolution really is. How is a random process going to figure out the shape for a required protein by trial and error within the few billion years that is available to it. On top of that remember that each "try" uses up energy. There's a limit to the amount of available energy as well as to the fact that the "try-out" has to actually work to be useful.
If you just blissfully ignore this problem and still adhere to evolutionary philosophy you must surely believe in the fairy godmother


If something works, it's conserved. It's that simple.

However, I'm still convinced you're a troll though. You've been dropping these blatantly obvious "how can you believe in something without any evidence?" comments for some time now that, at least to me, seem to be very subtle parodies of religious fundamentalists. Either that, or you really are a retard.
Jaeherys
not rated yet Dec 08, 2011
This method in combination with mass human input from the FoldIt project could REALLY speed up determination of the 3°/4° structure of proteins. Ahhhhh it's a good time to be going into biochem.
rawa1
1 / 5 (1) Dec 08, 2011
Protein folding is very dynamic stuff, as the protein molecules change their shape depending of pH, presence of ions and another molecules in the solution. It can help for protein identification in vitro during gene expression experiments, but it's of low relevance to real protein shape in vivo conditions.
CHollman82
3 / 5 (4) Dec 08, 2011
So here you have a real life situation where it shows just how impossible step-wise evolution really is.


This is not what it shows at all...

How is a random process


Evolution is not a random process, it has a random and a non-random component. The non-random component is significant, you can't just ignore it to push your nonsense.