Some aspects of evolution are like the real estate business in that it’s all about location, location, location! Researchers with the U.S. Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and the DOE Joint Genome Institute (DOE JGI) have shown that when it comes to comparing evolutionarily conserved DNA sequences that regulate the expression of genes, more closely related species are best.
“While one can compare distant vertebrates to humans and identify sequences that are highly evolutionarily conserved, such elements are few and far between,” said Len Pennacchio, a geneticist with Berkeley Lab’s Genomics Division and the head of JGI’s genome analysis program. “In contrast, by comparing species that are more closely related, such as other mammals, we can find much more DNA sequence alignment.”
Pennacchio and Shyam Prabhakar are the principal authors of a paper that appears in the June issue of the publication Genome Research, which presents the results of a comparative genomics study that quantified the advantages of staying close to the evolutionary home. Other co-authors of the paper were Francis Poulin, Malak Shoukry, Veena Afzal, Edward Rubin and Olivier Couronne.
When Mother Nature develops something that works, she tends to stick with it. Hence sequences of DNA that serve as protein-coding genes, or as enhancers that regulate the expression of those genes, have been conserved through millions of years of evolution. Gene hunters have capitalized on this tendency by comparing the DNA of different species to identify genes and determine their functions. For example, the genome of the Fugu fish contains essentially the same genes as the human genome but carries them in approximately 400 million bases as compared to the three billion bases that make up human DNA.
Cross-species DNA sequence comparisons have also been used to identify the enhancers that regulate genes, meaning they control whether a gene is switched on or off, but until now, the relative merits of comparing species as diverse as humans and fish were not known.
“To address this problem, we identified evolutionarily conserved non-coding regions in primate, mammalian and more distant species using a uniform approach that facilitates an unbiased assessment of the impact of evolutionary distance on predictive power,” said Pennacchio. “We benchmarked computational predictions against previously identified regulatory elements at diverse genomic loci, and also tested numerous extremely conserved sequences in humans and rodents for enhancer activity.”
The computational algorithm, which is used to provide a uniform evaluation of the benefits and limitations of DNA sequence comparisons between close versus distant species, was developed by Prabhakar. He dubbed this program “Gumby,” after a mathematical concept called the Gumbel distribution. Prabhakar’s Gumby program has now been incorporated into VISTA, the comprehensive suite of programs and databases for comparative analysis of genomic sequences that was developed and is maintained at Berkeley Lab.
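For readers unfamiliar with the Gumbel distribution, it is the classic model for the largest value seen among many random draws, which is why it is often used to ask how surprising a top score is. The sketch below is only an illustration of that idea, not the Gumby program itself: the function name and the `location` and `scale` parameters are assumptions made for the example, and the article does not describe how Gumby actually scores conservation.

```python
import math


def gumbel_tail_probability(score, location, scale):
    """Return P(X >= score) for a Gumbel (type I extreme value) distribution.

    `location` and `scale` are hypothetical fitted parameters; in practice
    they would be estimated from background (e.g. neutrally evolving) data.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (score - location) / scale
    # Gumbel CDF is exp(-exp(-z)); the tail is 1 - CDF.
    # -expm1(-y) computes 1 - exp(-y) accurately when y is tiny.
    return -math.expm1(-math.exp(-z))


# Higher scores are less likely to arise by chance under this model.
for score in (0.0, 2.0, 5.0):
    print(score, gumbel_tail_probability(score, location=0.0, scale=1.0))
```

A small tail probability would suggest that a score is unlikely to be the best of many chance outcomes, which is the general sense in which an extreme-value distribution helps separate signal from background.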
Using the Gumby program, Prabhakar, Pennacchio and their colleagues were able to identify human regulatory DNA sequences with a sensitivity that ranged from 53 to 80 percent, and a true-positive rate that ran as high as 67 percent based on comparisons with primates and other eutherian (placental) mammals. By contrast, comparisons with more distant species, including marsupial, avian, amphibian and fish, failed to identify most of the empirically defined functional non-coding DNA sequences.
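The two figures quoted above can be made concrete with a toy benchmark. The sketch below assumes that predictions and known regulatory elements are each given as (start, end) intervals on the same chromosome, and it treats "true-positive rate" as the fraction of predictions that overlap a known element, which is one plausible reading of the article's wording. All coordinates and function names are invented for illustration and are not data from the study.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def benchmark(predicted, known):
    """Score predictions against known elements (both lists must be non-empty).

    Returns (sensitivity, confirmed_fraction):
      sensitivity        = share of known elements hit by at least one prediction
      confirmed_fraction = share of predictions that hit at least one known element
    """
    detected = sum(any(overlaps(k, p) for p in predicted) for k in known)
    confirmed = sum(any(overlaps(p, k) for k in known) for p in predicted)
    return detected / len(known), confirmed / len(predicted)


# Hypothetical coordinates, for illustration only.
known_elements = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
predictions = [(150, 250), (550, 580), (2000, 2100)]

sensitivity, confirmed_fraction = benchmark(predictions, known_elements)
print(f"sensitivity: {sensitivity:.0%}, confirmed predictions: {confirmed_fraction:.0%}")
```

In this toy case two of the four known elements are recovered, and two of the three predictions are confirmed. Keeping the two measures separate matters because a method can recover many known elements while also making many unconfirmed predictions, or the reverse.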
Said Prabhakar, “Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. The intuitive relationship we derived between ancient and recent non-coding sequence conservation from whole-genome comparative analysis explains most of the observations from empirical benchmarking.”
Source: UC Berkeley