Scientists encounter holes in tree of life, push for better data storage

Sep 03, 2013

When it comes to public access, the tree of life has holes. A new study co-authored by University of Florida researchers shows about 70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.

Scientists are using the genetic data to construct the largest open-access tree of life as part of the National Science Foundation's $5.6-million Assembling, Visualizing and Analyzing the Tree of Life project. Understanding organismal relationships is increasingly valuable for tracking the origin and spread of emerging diseases, creating agricultural and pharmaceutical products, studying climate change, controlling invasive species and establishing plans for conservation and ecosystem restoration.

The study appearing today in PLoS Biology describes a significant challenge for the project, which is expected to produce an initial draft tree by the end of the year. It highlights the need for developing more effective methods for storing data for long-term use and urges journals to adopt more stringent data-sharing policies.

"I think what we need is a major change in our mindset about just how important it is to deposit your data – this has to be a standard part of what we do," said co-author Doug Soltis, a distinguished professor at the Florida Museum of Natural History on the UF campus and in UF's biology department. "Because if it's not there, it's lost forever. These are really, really important for long-term use, as we're seeing now in our efforts to build a tree."

Estimates of the amount of missing data were based on 7,539 peer-reviewed studies about animals, fungi, plants, bacteria and various other organisms. Soltis said the missing genetic data has required project collaborators to contact hundreds of researchers to request information, or to attempt to reproduce the sequence alignments and analyses, which is extremely labor-intensive.

"There are ambiguities with the alignments, you have to make certain judgment calls, and so an alignment that I do is not going to be the same as an alignment that somebody else does," said lead author Bryan Drew, a postdoctoral researcher in UF's biology department. "It's hard to assess a publication's validity in a lot of cases if you don't have access to the alignments. To me, that's the biggest problem with all of this."

Challenges include complicated mechanisms for uploading data and inconsistencies between journals – some require or strongly recommend data be stored in an online database and others do not, Drew said. The most widely used, publicly accessible databases include GenBank, TreeBASE and Dryad. Most journals require DNA sequences be deposited in GenBank, but comparatively few require the sequence alignments to be publicly archived. When study co-authors emailed researchers to obtain missing information, a majority did not respond, and the co-authors were rarely successful in retrieving the data.

"A lot of the authors I contacted said their data was in TreeBASE, but they were unaware of the next step needed after acceptance by the journal – the researchers didn't know they had to go back into TreeBASE and actually make the data available to the public," Drew said.

Elizabeth Kellogg, a professor in the department of biology at the University of Missouri-St. Louis who was not involved with the study, said she is not surprised about the large amount of missing information.

"They're absolutely right that when people are publishing papers, you want to document your results as much as you can," Kellogg said. "But many journals aren't requiring that extra step, so some researchers are only submitting the minimum to have their studies published. There are databases for archiving, but some of their interfaces are somewhat cumbersome, and if you haven't previously done this, it can appear to be a daunting task."

More information: www.opentreeoflife.org/
