September 3, 2014

A new tool to correct DNA sequencing errors using consensus and context

by Paul Greenfield, Microsoft

The rapid development of next-generation DNA sequencing has revolutionized biological and ecological research in the last few years. The cost of DNA sequencing has fallen dramatically, and sequencing machines are becoming a standard piece of lab equipment. Low-cost sequencing is enabling researchers to uncover the gene differences that make some people more susceptible to diseases; to explore the genetic makeup of microbial communities from the human gut or the bottom of the ocean; and to rapidly identify the organism responsible for a life-threatening infection.

But while the costs of sequencing have plummeted, the accuracy of the data produced has improved only slowly: about 1 percent of the bases generated are still called incorrectly. The bioinformatics community has responded to this problem by building specialized error correction tools that use the inherent redundancy in sequence data to find and repair miscalls and other sequencing errors. Tests have shown that incorporating the best of these error-correction tools into standard bioinformatics analytical pipelines can result in much better quality genomes and more accurately called gene variants.

However, accurately correcting errors turns out to be a difficult problem, largely because of the repetitive and ambiguous nature of genomes. It is easy to correct simple substitution errors, such as when 50 sequence reads say that a given base is an A, and only the read being corrected says it's a G. Such simple errors are well handled by downstream tools such as assemblers and aligners. The challenge is making the right correction when there are multiple plausible corrections (such as when 50 reads say A, 49 say G, and the read being corrected says T), as happens whenever reads fall across the end of a repeated region within a genome. Just to make things more challenging, this correction has to be done without any knowledge of the genomes being sequenced, and the only clues about which corrections are "right" come from the sequence data itself.
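The difference between the easy case and the hard case can be made concrete with a small sketch. The Python below is an illustration only, not Blue's algorithm: the function name `classify_base`, the `min_ratio` threshold, and the idea of passing in the base calls from the other reads as a simple string are all assumptions made for this example.

```python
from collections import Counter

def classify_base(observed, pileup, min_ratio=5.0):
    """Compare one base call with the calls made by the OTHER reads at that position.

    observed  -- the base in the read being corrected, e.g. 'G'
    pileup    -- the bases the other reads report at this position, e.g. 'AAAA...'
    min_ratio -- hypothetical threshold: the top base must outnumber the
                 runner-up by at least this factor to count as a clear consensus

    Returns (status, suggestion), where status is 'ok', 'correctable' or 'ambiguous'.
    """
    if not pileup:
        # No evidence from other reads, so leave the base alone.
        return ("ok", observed)

    ranked = Counter(pileup).most_common(2)
    best, best_n = ranked[0]

    if len(ranked) > 1 and best_n / ranked[1][1] < min_ratio:
        # Two well-supported alternatives, as at the edge of a repeat.
        runner_up = ranked[1][0]
        if observed in (best, runner_up):
            return ("ok", observed)
        return ("ambiguous", None)

    # A single clear consensus base.
    if observed == best:
        return ("ok", observed)
    return ("correctable", best)


# The two situations described above:
print(classify_base("G", "A" * 50))             # clear consensus
print(classify_base("T", "A" * 50 + "G" * 49))  # two plausible corrections
```

In the first call the sketch proposes A as the correction. In the second it only reports that the position is ambiguous: a per-position vote cannot choose between A and G, and resolving that choice requires looking at the surrounding context of the read.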

My colleagues and I at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) have just released a new error correction tool we've developed for use by the research community. We call it "Blue." Blue is a high-performance C# application that runs natively on Windows systems, and under Mono on Linux and OS X. As we reported in a paper published in Bioinformatics, test results show that Blue is significantly faster than other available tools—especially on Windows—and is also more accurate as it recursively evaluates possible alternative corrections in the context of the read being corrected.

Another uncommon feature of Blue is that it can correct all three types of possible errors (substitutions, deletions, and insertions), making it suitable for use with data produced by the Roche 454 and Life Technologies Ion Torrent systems. Blue also allows for the correction of one set of reads with a consensus derived from another set of reads, and this capability has been used to correct small numbers of long (and expensive) Roche 454 reads with a consensus derived from a large file of cheaper (but shorter) Illumina reads. This "cross-correction" method has been used very effectively to improve the quality of several reference assemblies, ranging in size from bacteria to moths and grasses.
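The cross-correction idea can be illustrated with a deliberately simplified sketch: build a table of short subsequences (k-mers) from one set of reads, then use that table as the consensus when scanning a read from another set. This is not Blue's implementation. The function names, the `min_count` threshold, and the toy sequences are assumptions for illustration, and the sketch handles substitutions only, whereas Blue also corrects insertions and deletions.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer in the reads that supply the consensus."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def cross_correct(read, counts, k, min_count=2):
    """Fix substitutions in `read` using k-mer counts from another read set.

    Scans left to right. When a k-mer is poorly supported, tries the other
    three bases in its last position and applies the change only if exactly
    one alternative is well supported. Ambiguous positions are left alone,
    and an error within the first k-1 bases of the read is not repaired.
    """
    bases = list(read)
    for i in range(len(bases) - k + 1):
        kmer = "".join(bases[i:i + k])
        if counts[kmer] >= min_count:
            continue  # this k-mer agrees with the consensus
        prefix = kmer[:-1]
        candidates = [b for b in "ACGT"
                      if b != kmer[-1] and counts[prefix + b] >= min_count]
        if len(candidates) == 1:
            bases[i + k - 1] = candidates[0]
    return "".join(bases)


# Toy data: short, accurate reads covering a made-up 18-base sequence,
# and one long read with a single substitution (C -> A at index 10).
reference = "ACGTTGCAAGCTTAGGCT"
short_reads = [reference[0:12], reference[6:18]] * 3
counts = kmer_counts(short_reads, k=5)

long_read = "ACGTTGCAAGATTAGGCT"
print(cross_correct(long_read, counts, k=5) == reference)
```

Because the k-mer table is built separately from the reads being corrected, the same structure works whether the consensus comes from the reads themselves or, as in cross-correction, from a different and deeper set of reads.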

More information: Paul Greenfield, Konsta Duesing, Alexie Papanicolaou, and Denis C. Bauer. "Blue: correcting sequencing errors using consensus and context." Bioinformatics first published online June 11, 2014 DOI: 10.1093/bioinformatics/btu368

Blue and its associated tools can be downloaded from CSIRO Bioinformatics: www.bioinformatics.csiro.au/blue/

Journal information: Bioinformatics

Provided by Microsoft

Citation: A new tool to correct DNA sequencing errors using consensus and context (2014, September 3) retrieved 26 April 2024 from https://phys.org/news/2014-09-tool-dna-sequencing-errors-consensus.html
