DNA sequences need quality time too - guidelines for quality control published

Sep 05, 2012
This is the cover for the latest MycoKeys issue. Credit: Pensoft Publishers

Like all sources of information, DNA sequences come in various degrees of quality and reliability. To identify, proof, and discard compromised molecular data has thus become a critical component of the scientific endeavor - one that everyone generating sequence data is assumed to carry out before using the sequences for research purposes.

"Many researchers find sequence quality control difficult, though", says Dr. Henrik Nilsson of the University of Gothenburg, the lead author of a new article on sequence reliability published in the Open Access journal MycoKeys. "There just isn't any straightforward document to put in their hands to give them a flying start. As a result, scientists differ in the degree to which they are aware of the need to exercise sequence quality control and in what measures they take." Previous studies have highlighted several shortcomings of publicly available DNA sequences - more than ten percent of the fungal DNA sequences may be misidentified at the species level, for example.

"A second complication", adds co-author Prof. Urmas Kõljalg of the University of Tartu, "is that the software available for sequence quality management tends to be very complex and resource intensive. It borders on the unfair to expect everyone to have access to, and to master, such computer environments. Fortunately, a whole lot can be done towards quality control of DNA sequences using just manual means and a web browser. The current MycoKeys paper describes these means to help those who do not have a strong background in computer science."

The article, "Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences", compiles principles and observations to assist the reader in the quality management of newly generated sequences. Although focusing on fungal ITS sequences, the guidelines are general and apply to most groups of organisms and genes. The guidelines target traditional DNA sequencing and are broadly applicable to datasets used in systematics, taxonomy, and ecology.

Co-author Dr. Martin Hartmann of the Swiss Federal Research Institute WSL concludes, "We hope that our guidelines will assist the readers in sharpening their datasets so that, eventually, the trend of increasing noise in the public sequence databases can be arrested. Molecular data offer so much promise that we simply cannot afford to lose accuracy to bias and artifacts."

More information: Nilsson RH, Tedersoo L, Abarenkov K, Ryberg M, Kristiansson E, Hartmann M, Schoch CL, Nylander JAA, Bergsten J, Porter TM, Jumpponen A, Vaishampayan P, Ovaskainen O, Hallenberg N, Bengtsson-Palme J, Eriksson KM, Larsson K-H, Larsson E, Kõljalg U (2012) Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences. MycoKeys 4: 37-62. doi: 10.3897/mycokeys.4.3606
