Program models more detailed evolutionary networks from genetic data
The tree has been an effective model of evolution for 150 years, but a Rice University computer scientist believes it's far too simple to illustrate the breadth of current knowledge.
Rice researcher Luay Nakhleh and his group have developed PhyloNet, an open-source software package that accounts for horizontal as well as vertical inheritance of genetic material among genomes. His "maximum likelihood" method, detailed this month in the Proceedings of the National Academy of Sciences, allows PhyloNet to infer network models that better describe the evolution of certain groups of species than do tree models.
"Inferring" in this case means analyzing genes to find the evolutionary history, including the connections between species, under which the observed genetic data are most probable - the maximum likelihood. Nakhleh and Rice colleague Christopher Jermaine recently won a $1.1 million National Science Foundation grant to analyze evolutionary patterns using Bayesian inference, a statistical technique that estimates probabilities from a data set.
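For readers unfamiliar with the term, the idea behind maximum likelihood can be shown with a toy example. The sketch below is not PhyloNet's algorithm, which searches over network models; it only assumes a hypothetical data set in which 30 of 100 sampled genes support a given relationship, and it searches for the proportion that makes those observations most probable. The function names and the numbers are illustrative assumptions.

```python
import math

def log_likelihood(p, k, n):
    """Binomial log-likelihood (up to a constant) of k supporting genes out of n."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def max_likelihood_estimate(k, n, steps=999):
    """Grid-search candidate values of p in (0, 1) and keep the most likely one."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: log_likelihood(p, k, n))

# Hypothetical data (an assumption for illustration): 30 of 100 genes agree.
best_p = max_likelihood_estimate(k=30, n=100)
print(f"most likely proportion: {best_p:.3f}")
```

The same principle applies when the candidates are evolutionary histories rather than a single number: each candidate is scored by how probable it makes the data, and the highest-scoring one is reported.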
To build networks that account for all of the genetic connections between species, the software infers the probability of variations that phylogenetic trees can't illustrate, such as horizontal gene transfers. These transfers circumvent simple parent-to-offspring evolution and allow genetic variations to move from one species to another by means other than reproduction.
Biologists want to know when and how these transfers happened, but tree structures conceal such information. "When horizontal transfer occurs, as with the hybridization of two species, the tree model becomes inadequate to describe the evolutionary history, and networks that incorporate horizontal gene transfer become the more appropriate model," Nakhleh said.
Nakhleh's Java-based software also accounts for incomplete lineage sorting, in which the evolutionary histories of individual genes in the genetic record don't match the established lineage of the species that carry them.
"We are the first group to develop a general model that will allow biologists to estimate hybridization while accounting for all these complexities in evolution," Nakhleh said.
Most existing programs for phylogenetics (the study of evolutionary relationships) ignore such complexities. "They end up overestimating the amount of hybridization," Nakhleh said. "They start seeing lots of complexities in the data and say, 'Oh, it's complex here; it must be hybridization,' and end up inferring too much. Our method acknowledges that part of the complexity has nothing to do with hybridization; it has to do with other random processes that happened during evolution."
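A short calculation illustrates why conflicting gene histories alone do not prove hybridization. The sketch below is not taken from PhyloNet; it uses the standard three-species result from coalescent theory, in which a gene's history matches the species tree with probability 1 - (2/3)e^(-t) and each of the two conflicting histories arises with probability (1/3)e^(-t), where t is the length of the internal branch in coalescent units. The function name and the sample values of t are assumptions chosen for illustration.

```python
import math

def gene_tree_probabilities(t):
    """Return (matching, conflicting_a, conflicting_b) probabilities for three species.

    t is the internal branch length of the species tree in coalescent units;
    no hybridization is assumed, so all conflict comes from lineage sorting.
    """
    conflicting_each = math.exp(-t) / 3.0
    matching = 1.0 - 2.0 * conflicting_each
    return matching, conflicting_each, conflicting_each

# Short branches (rapid speciation) produce a lot of conflict by chance alone.
for t in (0.1, 1.0, 3.0):
    match, alt_a, alt_b = gene_tree_probabilities(t)
    print(f"t={t}: matches species tree {match:.3f}, "
          f"each conflicting history {alt_a:.3f}")
```

Under this simple model the two conflicting histories are expected to be equally common, so a method that ignores lineage sorting may read ordinary conflict as a hybridization signal.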
The Rice researchers used two data sets to test the new program. One, a computer-generated set of data that mimics a realistic model of evolution, allowed them to evaluate the accuracy of the program. The second involved multiple genomes of mice found across Europe and Asia. "There have been stories about mice hybridizing," Nakhleh said. "Now that we have the first method to allow for systematic analysis, we ran it on a very large amount of data from five mouse samples and we detected hybridization" - most notably in the presence of a genetic signal from a mouse in Kazakhstan that found its way to mice in France and Germany, he said.
Nakhleh hopes evolutionary biologists will use PhyloNet to take a fresh look at the massive amount of genomic data collected over the past few decades. "The exciting thing for me about this is that biologists can now systematically go through lots of data they have generated and check to see if there has been hybridization."