(PhysOrg.com) -- Scientists at Penn State have developed a new computational method that they say will help them to understand how life began on Earth. The team's method has the potential to trace the evolutionary histories of proteins all the way back to either cells or viruses, thus settling the debate once and for all over which of these life forms came first.
"We have just begun to tap the potential power of this method," said Randen Patterson, a Penn State assistant professor of biology and one of the project's leaders. "We believe, if it is possible at all, that it is within our grasp to determine whether viruses evolved from cells or vice-versa."
The new computational method will be described in a paper to be published in a future issue of the journal Proceedings of the National Academy of Sciences. The journal also will post the paper on the early on-line section of its Web site sometime during the week ending 6 September 2008.
The team is focusing on an ancient group of proteins, called retroelements, which make up approximately 50 percent of the human genome by weight and are a crucial component in a number of diseases, including AIDS. "Retroelements are an ancient and highly diverse class of proteins; therefore, they provide a rigorous benchmark for us to test our approach. We are happy with the results we derived, even though our method is in an early stage," said Patterson. The team plans to release the algorithms used in their method as open-source software, freely available on the Web.
Scientists map out the evolutionary histories of organisms by comparing their genetic and/or protein sequences. Those organisms that are closely related and share a recent common ancestor have greater degrees of similarity among their sequences. In their paper, the researchers describe how they used 11 groups of the retroelement proteins -- ranging from bacteria to human HIV -- to trace the evolutionary histories of retroelements. Their method uses a computer algorithm to generate evolutionary profiles -- also called phylogenetic profiles -- that are compared all-against-all.
For example, given four sequences, the new method compares profile A to profiles B, C, and D; it compares profile B to profiles C and D; and so on, for a total of six comparisons. The method then selects the regions of the profiles that match and creates a tree-like diagram, called a phylogenetic tree, based on the retroelements' similarities to one another. The tree provides evolutionary distance estimates and, hence, phylogenetic relationships among retroelements. Patterson said that the results from this study help to clarify many existing theories on retroelement evolution.
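The all-against-all step described above can be illustrated with a minimal Python sketch. It is not the team's algorithm: the four short sequences, the function names, and the `toy_distance` score (the fraction of positions at which two equal-length strings differ) are hypothetical stand-ins chosen only to show that four profiles yield six pairwise comparisons.

```python
from itertools import combinations


def all_against_all(profiles, score):
    """Score every unordered pair of profiles exactly once."""
    return {
        (a, b): score(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def toy_distance(p, q):
    """Hypothetical stand-in score: fraction of positions that differ.

    Assumes both strings have the same length; the real method compares
    phylogenetic profiles, not raw strings.
    """
    return sum(x != y for x, y in zip(p, q)) / len(p)


# Made-up sequences used purely for illustration.
profiles = {"A": "MKVLAT", "B": "MKVLGT", "C": "MRVLGS", "D": "TRPLGS"}

distances = all_against_all(profiles, toy_distance)
closest_pair = min(distances, key=distances.get)

for pair, d in distances.items():
    print(pair, round(d, 2))
print("closest pair:", closest_pair)
```

In general, n profiles give n(n-1)/2 comparisons, so the work grows quadratically with the number of sequences. A table of pairwise distances like this one is the kind of input from which a tree-building procedure can group the most similar entries first.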
The conventional method for estimating evolutionary relationships, called multiple sequence alignment, also produces evolutionary trees, but it can be insensitive to relationships among the most distantly related proteins, in large part because it makes only one simultaneous comparison across all of the genetic/protein sequences. Obtaining more detailed information about possible relationships among the sequences requires a human expert who can search for them manually. But Patterson said that relying on humans to do the work is not ideal.
"Although the human mind is the most powerful tool for pattern recognition, human-based measurements often are hard to reproduce," he said. "For example, if you do something and I do something, we're going to do it differently. It's better to have a standardized method for gauging relationships among ancient proteins, and that's exactly what we've created." According to Damian van Rossum, Penn State research associate/assistant professor of biology and another leader on the project, the new method can be used in conjunction with the conventional method to get a clearer picture of the evolutionary histories of proteins. "The more independent measures you have, the better view of the world you can get," he said.
In addition to searching for the origins of life, the team also is using its method to gather data on the shapes of proteins, their functions in the body, and their evolutionary histories. In another paper, published in 2008 in the online journal Physics Archives, members of the team demonstrated that their new method can measure all three of these characteristics simultaneously. "Previously, people have shown that profiling methods can resolve functional and structural differences and similarities between proteins, but to date no one has shown that you can measure evolutionary distances," said van Rossum. "Not only can our method measure evolutionary distances, but it also can measure functional and structural characteristics at the same time."
Patterson said that there are about 30,000 profiles in an online scientific repository that they can use to generate their phylogenetic profiles. He expects that the team's method will become even more powerful as additional sequences are added to this protein bank. In fact, the method already has become more refined in the short time since the team submitted its manuscript to the journal. "We already are producing evolutionary trees with much more detail than what we show in the paper," he said. "In fact, we are surprised at our progress so far in our goal of tracing these histories all the way back to the beginning of life."
Provided by Penn State