September 3, 2014

A new tool to correct DNA sequencing errors using consensus and context

by Paul Greenfield, Microsoft

The rapid development of next-generation DNA sequencing has revolutionized biological and ecological research in the last few years. The cost of DNA sequencing has fallen dramatically, and sequencing machines are becoming a standard piece of lab equipment. Low-cost sequencing is enabling researchers to uncover the gene differences that make some people more susceptible to diseases; to explore the genetic makeup of microbial communities from the human gut or the bottom of the ocean; and to rapidly identify the organism responsible for a life-threatening infection.

But while the costs of sequencing have plummeted, the accuracy of the data produced has improved only slowly: about 1 percent of the bases generated are still called incorrectly. The bioinformatics community has responded to this problem by building specialized error correction tools that use the inherent redundancy in sequence data to find and repair miscalls and other sequencing errors. Tests have shown that incorporating the best of these error-correction tools into standard bioinformatics analytical pipelines can result in much better quality genomes and more accurately called gene variants.

However, accurately correcting errors turns out to be a difficult problem, largely because of the repetitive and ambiguous nature of genomes. It is easy to correct simple substitution errors, such as when 50 sequence reads say that a given base is an A, and only the read being corrected says it's a G. Such simple errors are well handled by downstream tools such as assemblers and aligners. The challenge is making the right correction when there are multiple plausible corrections (such as when 50 reads say A, 49 say G, and the read being corrected says T), as happens whenever reads fall across the end of a repeated region within a genome. Just to make things more challenging, this correction has to be done without any knowledge of the genomes being sequenced, and the only clues about which corrections are "right" come from the sequence data itself.
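
The two situations described above can be made concrete with a toy sketch in Python. This is only an illustration of consensus voting at a single base position, not Blue's code: the function name classify_base and the min_support threshold are assumptions introduced for the example.

```python
from collections import Counter

def classify_base(observed, other_reads_bases, min_support=3):
    """Decide what to do with one base of a read, given the bases that
    other reads show at the same position (hypothetical helper)."""
    counts = Counter(other_reads_bases)
    # The observed base is itself well supported: nothing to correct.
    if counts[observed] >= min_support:
        return ("keep", [observed])
    # Alternative bases that enough other reads agree on.
    candidates = sorted(b for b, n in counts.items()
                        if n >= min_support and b != observed)
    if len(candidates) == 1:
        return ("correct", candidates)    # one clear consensus
    if len(candidates) > 1:
        return ("ambiguous", candidates)  # several plausible corrections
    return ("keep", [observed])           # no evidence either way

# 50 reads say A, the read being corrected says G: an easy fix.
simple = classify_base("G", "A" * 50)
# 50 reads say A, 49 say G, the read being corrected says T.
repeat_boundary = classify_base("T", "A" * 50 + "G" * 49)
print(simple)
print(repeat_boundary)
```

In the second case, voting alone cannot choose between A and G; something beyond the counts at that one position is needed to pick the right correction.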

My colleagues and I at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) have just released a new error correction tool we've developed for use by the research community. We call it "Blue." Blue is a high-performance C# application that runs natively on Windows systems, and under Mono on Linux and OS X. As we reported in a paper published in Bioinformatics, test results show that Blue is significantly faster than other available tools—especially on Windows—and is also more accurate as it recursively evaluates possible alternative corrections in the context of the read being corrected.
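
To give a feel for what evaluating a correction "in the context of the read" can mean, here is a deliberately simplified Python sketch. It is not Blue's algorithm (which is recursive and written in C#); it is a single-pass, k-mer based stand-in. The names kmer_counts, supported_kmers and correct_in_context, the k-mer length K and the threshold MIN_COUNT are all assumptions made for the example, as are the two made-up sequences that share the repeat GATTACA.

```python
from collections import Counter

K = 4          # hypothetical k-mer length, chosen only to keep the example small
MIN_COUNT = 2  # hypothetical: a k-mer seen at least this often counts as trusted

def kmer_counts(reads, k=K):
    """Count every k-mer across a set of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def supported_kmers(read, counts, k=K):
    """Number of k-mers in the read that are trusted."""
    return sum(counts[read[i:i + k]] >= MIN_COUNT
               for i in range(len(read) - k + 1))

def correct_in_context(read, pos, counts):
    """Try each substitution at pos and keep the one whose whole read,
    including the bases after pos, is best supported."""
    best, best_score = read, supported_kmers(read, counts)
    for base in "ACGT":
        if base == read[pos]:
            continue
        candidate = read[:pos] + base + read[pos + 1:]
        score = supported_kmers(candidate, counts)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Two made-up genome regions that share the repeat GATTACA and then diverge.
copy1 = "TTCGGATTACAACCGT"
copy2 = "CCATGATTACAGTTGC"
counts = kmer_counts([copy1] * 3 + [copy2] * 3)

# A read from copy1 with a miscall (A read as T) just past the repeat.
noisy = "TTCGGATTACATCCGT"
pos = 11

# Looking only at the k-mer that ends at pos, both A and G look fine.
locally_plausible = [b for b in "ACGT"
                     if counts[noisy[pos - K + 1:pos] + b] >= MIN_COUNT]
fixed = correct_in_context(noisy, pos, counts)
print(locally_plausible)
print(fixed == copy1)
```

Both A and G are plausible when only the bases up to the error are considered, but only A is consistent with the rest of the read, so the context breaks the tie.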

Another uncommon feature of Blue is that it can correct all three types of possible errors (substitutions, deletions, and insertions), making it suitable for use with data produced by the Roche 454 and Life Technologies Ion Torrent systems. Blue also allows for the correction of one set of reads with a consensus derived from another set of reads, and this capability has been used to correct small numbers of long (and expensive) Roche 454 reads with a consensus derived from a large file of cheaper (but shorter) Illumina reads. This "cross-correction" method has been used very effectively to improve the quality of several reference assemblies, ranging in size from bacteria to moths and grasses.
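
The idea of cross-correction with all three error types can be sketched in a few lines of Python. Again, this is an illustrative toy under stated assumptions, not how Blue is implemented: the consensus is a plain k-mer count table built from a second set of reads, and the names candidate_edits and cross_correct, the constants K and MIN_COUNT, and the sequences are hypothetical.

```python
from collections import Counter

K = 4          # hypothetical k-mer length for this toy example
MIN_COUNT = 2  # hypothetical threshold for a trusted k-mer

def kmer_counts(reads, k=K):
    """Build the consensus table from the reads used as the reference set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def unsupported(read, counts, k=K):
    """Start positions of k-mers in the read that the consensus does not trust."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < MIN_COUNT]

def candidate_edits(read, pos):
    """Yield (error type, repaired read) pairs for a suspect position."""
    yield ("none", read)
    for base in "ACGT":
        if base != read[pos]:
            yield ("substitution", read[:pos] + base + read[pos + 1:])
        # The read lost a base (deletion error): put one back before pos.
        yield ("deletion", read[:pos] + base + read[pos:])
    # The read gained a base (insertion error): drop the base at pos.
    yield ("insertion", read[:pos] + read[pos + 1:])

def cross_correct(long_read, counts):
    """Repair the first suspect position using the other read set's consensus."""
    bad = unsupported(long_read, counts)
    if not bad:
        return ("none", long_read)
    pos = bad[0] + K - 1  # last base of the first untrusted k-mer
    # min() keeps the earliest candidate on ties, so an unchanged read
    # wins unless some edit strictly reduces the untrusted k-mers.
    return min(candidate_edits(long_read, pos),
               key=lambda c: len(unsupported(c[1], counts)))

# Stand-in for many short, accurate reads covering one region.
short_reads = ["ACGTTGCATGCCAGT"] * 3
# Stand-in for a long read that lost one T from the "TT" run.
long_read = "ACGTGCATGCCAGT"

result = cross_correct(long_read, kmer_counts(short_reads))
print(result)
```

Here the long read is repaired by restoring the missing T, because that is the only candidate edit that leaves every k-mer in the read supported by the short-read consensus.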

More information: Paul Greenfield, Konsta Duesing, Alexie Papanicolaou, and Denis C. Bauer. "Blue: correcting sequencing errors using consensus and context." Bioinformatics first published online June 11, 2014 DOI: 10.1093/bioinformatics/btu368

Blue and its associated tools can be downloaded from CSIRO Bioinformatics: www.bioinformatics.csiro.au/blue/

Journal information: Bioinformatics

Provided by Microsoft

Citation: A new tool to correct DNA sequencing errors using consensus and context (2014, September 3) retrieved 5 July 2024 from https://phys.org/news/2014-09-tool-dna-sequencing-errors-consensus.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
