Scientists find identical DNA codes in different plant species

Apr 09, 2012

Analyzing massive amounts of data officially became a national priority recently when the White House Office of Science and Technology Policy announced the Big Data Research and Development Initiative. A multi-disciplinary team of University of Missouri researchers rose to the big data challenge when they solved a major biological question by using a groundbreaking computer algorithm to find identical DNA sequences in different plant and animal species.

"Our algorithm found identical sequences of DNA located at completely different places on multiple plant genomes," said Dmitry Korkin, lead author and assistant professor of computer science. "No one has ever been able to do that before on such a scale."

"Our discovery helps solve some of the mysteries of ," said Gavin Conant, co-author and assistant professor of animal sciences. "Basic research on the provides raw materials and improves techniques for creating medicines and crops."

Previous studies had found long strings of identical code in the DNA of different animal species. But before this new MU research, which was published in the Proceedings of the National Academy of Sciences (PNAS), computer programs had never been powerful enough to find identical sequences in plant DNA, because the identical sections did not occur at the same positions in the different genomes.
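To show why the position of a shared segment matters, here is a toy sketch. It is not the MU team's algorithm, which the article does not describe. It assumes two made-up sequences and a fixed segment length `k`. Comparing only aligned offsets misses a segment that sits at different positions, while a hash index of every length-`k` substring finds it wherever it occurs. The function names and sequences are hypothetical.

```python
def same_position_matches(a: str, b: str, k: int) -> list:
    """Offsets i where a[i:i+k] == b[i:i+k] (compares aligned positions only)."""
    n = min(len(a), len(b))
    return [i for i in range(n - k + 1) if a[i:i + k] == b[i:i + k]]


def any_position_matches(a: str, b: str, k: int) -> list:
    """(i, j) pairs where a[i:i+k] == b[j:j+k], found via a hash index of b's k-mers."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    hits = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            hits.append((i, j))
    return hits


a = "CCGATTACACC"   # hypothetical sequence 1
b = "TTTTTGATTACA"  # hypothetical sequence 2: same 7-base segment, different offset
print(same_position_matches(a, b, 7))  # []
print(any_position_matches(a, b, 7))   # [(2, 5)]
```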

The genomes of six animals (dog, chicken, human, mouse, macaque and rat) were compared with each other. Likewise, the genomes of six plants (Arabidopsis, soybean, rice, cottonwood, sorghum and grape) were compared with each other. Comparing all the genetic sequences took 4 weeks on 48 computer processors, each performing about 1 million searches per hour, for a grand total of approximately 32 billion searches.
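A quick arithmetic check of that total, assuming the rate of 1 million searches per hour is per processor and that 4 weeks means 4 × 7 × 24 = 672 hours:

```python
HOURS_PER_WEEK = 7 * 24
weeks = 4
processors = 48
searches_per_processor_per_hour = 1_000_000  # assumption: the rate is per processor

total = weeks * HOURS_PER_WEEK * processors * searches_per_processor_per_hour
print(f"{total:,}")  # 32,256,000,000
```

That is about 32.3 billion, consistent with the "approximately 32 billion" figure.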

Although the scientists found identical sequences shared among plant species, just as they had among animals, they suggested that these sequences evolved differently in plants and animals.

"You would expect to see convergent evolution, but we don't," Conant said. "Plants and animals are both complex multi-cellular organisms that have to deal with many of the same environmental conditions, like taking in air and water and dealing with weather variations, but their genomes code for solutions to these challenges in different ways."

The MU team's research laid the groundwork for future studies into the reasons plants and animals developed different genetic mechanisms and how they function. Their basic research created a foundation for discoveries that may improve human life. Besides advancing genetic science's potential to fight disease, the code-analyzing computer program itself could help in the development of new medicines.

"The same algorithm can be used to find identical sequential patterns in an organism's entire set of proteins," said Korkin. "That could potentially lead to finding new targets for existing drugs or studying these drugs' side effects."

The PNAS paper, titled "Long Identical Multispecies Elements in Plant and Animal Genomes," involved collaboration between the Universities of Missouri, California and Arizona. The algorithm was developed by Jeff Reneker, a senior research informatician at MU's Center for Computational Biology and Medicine, during his doctoral study at the MU Computer Science Department under the supervision of Chi-Ren Shyu, Director of the MU Informatics Institute.




User comments : 7


wealthychef
Apr 09, 2012
I can hardly wait until computers are 10 times more powerful than they are now. :-)
Lurker2358
Apr 09, 2012
"You would expect to see convergent evolution, but we don't," Conant said.


Your expectations are flawed.

Organisms in drastically different niches (fruit tree, grass, cottonwood, rice, and a legume) have no good reason to develop similar macroscopic traits, even if they share some similar microscopic genes.

Hasn't it already been shown that genes don't always do the same thing in different organisms? That changes in one gene can affect how another gene is expressed?

One should not confuse primary molecular structure with secondary, tertiary, or quaternary, etc.

string
stroke
strike
stroll
strip
stray
straight
pastrami

All have "str" and yet nothing in common. Heck some are verbs, some are adjectives, some are nouns, and one can even be an adverb.

So obviously an identical piece of "information" can have a different function based on surrounding "information".

===

Wow, he wrote a search program. Anyone in comp sci 101 could do that if they had the data input...
Graeme
Apr 09, 2012
And it sounds like an inefficient algorithm was used. They could have put all the possible sequences of DNA (up to some length limit) in an index, then gone through the index to find how many entries there are for each sequence. You would expect random sequences of, say, 10 bases or fewer to be repeated, but longer matching ones would be extremely rare by chance.
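A minimal sketch of the indexing idea in this comment, plus the chance argument behind it. It assumes a uniformly random sequence over A, C, G and T (real genomes are not random) and a hypothetical genome length of 100 million bases; `kmer_counts`, `repeated_kmers` and `expected_occurrences` are illustrative names, not anything from the paper.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Index every length-k substring (k-mer) of seq with its number of occurrences."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(seq: str, k: int) -> dict:
    """Return only the k-mers that occur more than once."""
    return {kmer: n for kmer, n in kmer_counts(seq, k).items() if n > 1}


def expected_occurrences(genome_length: int, k: int) -> float:
    """Expected count of one specific k-mer in a uniformly random sequence over
    {A, C, G, T}: a 1-in-4**k chance at each of the L - k + 1 start positions."""
    return (genome_length - k + 1) / 4 ** k


print(repeated_kmers("ACGTACGTTT", 4))                   # {'ACGT': 2}
print(round(expected_occurrences(100_000_000, 10), 1))  # 95.4
print(expected_occurrences(100_000_000, 30) < 1e-9)     # True
```

Under that random model a given 10-base word is expected about 95 times in 100 million bases, while a given 30-base word is expected far less than once, which is the intuition behind "extremely rare by chance."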

The other thing to look for is sequences that are not there. These sequences would likely code for things that would kill the organism, perhaps binding things in the wrong spot to the DNA.
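The "sequences that are not there" idea can be checked directly for short lengths. This sketch lists every DNA word of length `k` that never appears in a sequence; such words are sometimes called nullomers. Because it enumerates all 4**k words, it is only practical for small `k`, and the function name is illustrative.

```python
from itertools import product


def absent_kmers(seq: str, k: int) -> list:
    """Return, in lexicographic order, every length-k word over ACGT missing from seq."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return [w for w in words if w not in present]


print(absent_kmers("AACCGGTT", 2))
# ['AG', 'AT', 'CA', 'CT', 'GA', 'GC', 'TA', 'TC', 'TG']
```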
Lurker2358
Apr 10, 2012


Yes, very good catch.

Genes that are not there could represent lost traits, or as you are trying to say, they may cause "auto-immune" diseases and therefore are de-selected, or else just never existed.

Very good.
DaFranker
Apr 10, 2012
Another possible thing to try (still basic comp sci 101, maybe with an intro to crypto class in there):

Rather than indexing and brute-force searching through all the combinations, use mnemonic codes (see assembly languages for examples) for the combinations and replace them in a "searchable" DNA copy, leaving uncommon strands and codes as "raw data". Search through this. If two pieces of raw data match, they'll be obvious. The mnemonics will also be much easier to find and match (computationally speaking) than a brute-force, step-by-step comparison with all other DNA sequences in every genome.
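One way to read the "mnemonic codes" suggestion is as dictionary encoding: give short codes to chunks that occur often and keep rare chunks as raw text. The sketch below assumes fixed-length, non-overlapping chunks and a hypothetical `min_count` threshold; the names are illustrative, and this is not what the paper did.

```python
from collections import Counter


def chunks(seq: str, size: int) -> list:
    """Split seq into non-overlapping chunks of `size` bases (the last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_codebook(seqs: list, size: int, min_count: int) -> dict:
    """Give an integer code to every chunk seen at least `min_count` times across seqs."""
    counts = Counter(c for s in seqs for c in chunks(s, size))
    common = sorted(c for c, n in counts.items() if n >= min_count)
    return {chunk: code for code, chunk in enumerate(common)}


def encode(seq: str, size: int, codebook: dict) -> list:
    """Replace common chunks with their code; leave rare chunks as raw strings."""
    return [codebook.get(c, c) for c in chunks(seq, size)]


seqs = ["ACGTACGTTTTT", "GGGGACGTACGT"]  # hypothetical toy sequences
book = build_codebook(seqs, size=4, min_count=2)
print(book)                               # {'ACGT': 0}
print([encode(s, 4, book) for s in seqs])
# [[0, 0, 'TTTT'], ['GGGG', 0, 0]]
```

A caveat of this design: because chunks are cut at fixed boundaries, the same segment shifted by one base is split differently and gets different tokens, which is one reason indexing approaches often consider every starting position (overlapping k-mers) instead.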

Seriously, didn't these people ever think of this stuff while being bored to death by that lame math teacher in their freshman year? Oh wait...
tadchem
Apr 10, 2012
I can hardly wait until computers are 10 times more powerful than they are now. :-)

According to Moore's Law, that will be in about August of 2018. :-)
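The date depends on the assumed doubling period. A 10x increase takes log2(10), about 3.32, doublings; this sketch converts that into months for a given period (the function name is illustrative, and treating "10 times more powerful" as a Moore's-law doubling curve is an assumption).

```python
import math


def months_to_speedup(factor: float, doubling_months: float) -> float:
    """Months until capability grows by `factor` if it doubles every `doubling_months`."""
    return math.log2(factor) * doubling_months


print(round(months_to_speedup(10, 24), 1))  # 79.7
print(round(months_to_speedup(10, 18), 1))  # 59.8
```

With a two-year doubling period, about 80 months after April 2012 lands in late 2018, close to the comment's estimate; with the often-quoted 18 months it would be around spring 2017.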
Strgzr2009
Apr 11, 2012
Gosh! Lurker2358, did you even bother to read the paper? I admit, I only skimmed it. But it was enough to realize, when reading your comments, that you actually have no idea what is going on there.

"One should not confuse primary molecular structure with secondary, tertiary, or quaternary, etc."

Why in the world are you talking about the protein structure hierarchy here, if the findings are in _DNA_ sequences? Have you ever heard of noncoding, "junk" DNA, which has recently become very important? Did you know that many of the original "ultras" (ultraconserved elements) reported in Science were in noncoding DNA?

"Wow, he wrote a search program. Anyone in comp sci 101 could do that if they had the data input..."

Have YOU ever written a search program that solves a common substring problem? The algorithm they used, from what I understand, is a version of hash-based search, which, along with the compressed suffix tree, is among the current top approaches to this problem. I personally would choose the suffix tree one, though.
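For readers curious about the hash-based approach mentioned here, this is a generic sketch of finding the longest substring shared by several sequences, not the authors' algorithm. It binary-searches the length and, for each candidate length, intersects Python sets of substrings (the sets do the hashing). Binary search is valid because if a common substring of length m exists, a common substring of every shorter length exists too. A production version would typically use a rolling hash (Rabin-Karp style) or a suffix tree instead of slicing every substring; the function names are illustrative.

```python
from typing import Optional


def shared_substrings(seqs: list, length: int) -> set:
    """Hash every substring of the given length in each sequence and intersect the sets."""
    shared: Optional[set] = None
    for s in seqs:
        subs = {s[i:i + length] for i in range(len(s) - length + 1)}
        shared = subs if shared is None else shared & subs
        if not shared:
            return set()
    return shared or set()


def longest_common_substring(seqs: list) -> str:
    """Binary-search the largest length that has a substring present in every sequence."""
    lo, hi, best = 1, min(len(s) for s in seqs), ""
    while lo <= hi:
        mid = (lo + hi) // 2
        found = shared_substrings(seqs, mid)
        if found:
            best = min(found)  # deterministic pick among equally long candidates
            lo = mid + 1
        else:
            hi = mid - 1
    return best


print(longest_common_substring(["TTGATTACACC", "AAGATTACAGG", "CGATTACAT"]))  # GATTACA
```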