Trimming the Tree of Life
In the world of science, the Tree of Life depicts the evolutionary relationship among all the species on earth. The information it contains is the central organizing principle of biology and is now known as a phylogenetic tree. It originated from the only illustration in Darwin's On the Origin of Species: a branched diagram that looks remarkably like a tree.
Rokas, an assistant professor of biological sciences, has received a CAREER award from the National Science Foundation. Faculty Early Career Development awards are considered NSF's most prestigious honor for junior faculty members. Rokas will receive $688,000 in Recovery Act funding over five years to develop new statistical methods that make it easier to accurately determine the relationships between various species.
"In many cases we are trying to resolve events that transpired billions of years ago," says Rokas. "During the intervening period, the DNA of the species involved wasn't just sitting there inertly: It was shaped by powerful forces like natural selection and genetic drift. It's amazing how much information you can get by comparing species, but it is not a magic bullet."
Fifteen years ago, scientists did not know the entire genome of a single species. So they had to use short snippets of DNA from different species in their attempts to establish evolutionary relationships. Today, however, scientists have sequenced the genomes of nearly 4,000 species. Most of these are bacteria and other microorganisms called prokaryotes. However, 400 to 600 are eukaryotes, organisms whose cells contain complex structures enclosed within membranes, including animals, plants and fungi. This has provided scientists with a lot of genetic information to analyze, and the next generation of DNA sequencers is much faster and cheaper, so the amount of this kind of information is likely to grow exponentially.
If beetle A evolved from beetle B, which evolved from beetle C, it stands to reason that beetle A's DNA would match beetle B's DNA more closely than it matches beetle C's. Unfortunately, it is not that simple.
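The naive expectation in the beetle example can be sketched in a few lines of Python. This is only an illustration, not the method Rokas uses: the function name p_distance and the three short sequences are invented here, and the sequences are assumed to be already aligned to the same length. The measure shown is the plain fraction of positions at which two sequences differ.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Hypothetical aligned snippets, invented for illustration only.
beetle_c = "ATGCCGTAAGCT"
beetle_b = "ATGCCGTAAGGT"  # one change relative to beetle C
beetle_a = "ATGACGTAAGGT"  # one further change relative to beetle B

print("A vs B:", p_distance(beetle_a, beetle_b))
print("A vs C:", p_distance(beetle_a, beetle_c))
```

In this toy case A differs from B at fewer positions than it differs from C, which is the simple pattern one would hope for. Real genes often do not behave this neatly.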
For one thing, genes are not all created equal. Some genes may mutate too quickly to be useful for inter-species comparisons. Others may change too slowly. For example, there are "highly conserved" genes that are passed intact from species to species over millennia because any mutations are lethal.
Another factor is an organism's lifestyle. The genetic code is built from four basic "letters" called nucleotide bases. These are adenine (A), guanine (G), cytosine (C) and thymine (T). It turns out that bacteria that live in hot springs (thermophiles) have more A's in their DNA than relatives living in the Arctic.
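Differences in base usage of this kind can be measured by counting how often each letter appears in a sequence. The sketch below is a minimal illustration under stated assumptions: the function name base_composition and the example string are hypothetical, and characters other than A, C, G and T are simply ignored.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the fraction of A, C, G and T among the recognised bases in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical sequence, invented for illustration only.
example = "ATGAAACGTTAA"
for base, fraction in base_composition(example).items():
    print(base, round(fraction, 3))
```

Comparing such fractions between two genomes shows whether one of them leans toward particular bases, a bias that can make unrelated species look more alike than they are.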
The function of individual genes may also have an effect. Transcription factors, for example, may not be good for making these comparisons, while housekeeping genes may be great.
"No one has ever looked at these factors," Rokas says.
Rokas will be investigating these factors by analyzing the relationships among more than two dozen of the 85 known species of yeast. Not only is the typical yeast genome relatively compact (it is less than one-tenth the size of that of fruit flies and nematodes and one-hundredth the size of that of humans), but its gene functions are the most completely studied of any eukaryote. "Every gene has been deleted one by one and the consequences have been recorded," says Rokas.
"Our goal is to use all this information to identify genes that are good predictors of phylogeny and those that are poor predictors and see if we can identify any underlying principles that we can apply to other clades, including our own twig of the tree of life, mammals," he says.
Although ambiguity in evolutionary relationships is greatest among the most ancient species, there are cases of recent events that still need untangling. A case in point is the relationship among the hyrax, the manatee and the elephant. Despite their dramatic differences, the three species all come from a common ancestor. The problem is that scientists who have studied them can't tell whether the hyrax is more closely related to the manatee or to the elephant, or whether the manatee and the elephant are each other's closest relatives. Different genes support different relationships.
"This problem is caused when speciation events take place too rapidly," Rokas explains. When the ancestral population is polymorphic, meaning it has several versions of some genes, it is possible for one daughter species to contain one version of a gene and a second daughter species to have a different version. "The mammal tree is plagued with these kinds of events."
The phylogeneticist argues that an approach based on rare genomic changes (RGCs) is the best bet for straightening out the mammal and animal trees. These are cases in which a large number of bases are inserted or deleted at one time. When changes of this sort take place in an ancestor a short time before a speciation event, their presence or absence in closely related species can identify parent and daughter species.
"They offer an independent way to look at molecular data that is less likely to happen by chance," says Rokas. "What is the chance that you will see a 10-base pair deletion in exactly the same place in the genome? The chances are very small. We are less likely to be fooled by random events."
Although phylogenetic trees are the cornerstones of evolutionary theory, Rokas is concerned that they are frequently misinterpreted by students and professionals alike. So he will be using part of his grant to train high school students and teachers on the concepts of phylogenetics and will develop an undergraduate course that will integrate research results into the curriculum.