Taking the gamble out of DNA sequencing

February 24, 2013

Two USC scientists have developed an algorithm that could help make DNA sequencing affordable enough for clinics – and could be useful to researchers of all stripes.

Andrew Smith, a computational biologist at the USC Dornsife College of Letters, Arts and Sciences, developed the algorithm with USC graduate student Timothy Daley to help predict the value of sequencing more DNA. The work is to be published on February 24.

Extracting information from DNA means deciding how much to sequence: sequence too little and you may not get the answers you are looking for, but sequence too much and you will waste both time and money. That expensive gamble is a big part of what keeps DNA sequencing out of clinics. But not for long, according to Smith.

"It seems likely that some clinical applications of DNA sequencing will become routine in the next five to 10 years," Smith said. "For example, diagnostic sequencing to understand the properties of a tumor will be much more effective if the right mathematical methods are in place."

The beauty of Smith and Daley's algorithm, which predicts the size and composition of an unseen population based on a small sample, lies in its broad applicability.

"This is one of those great instances where a specific challenge in our research led us to uncover a powerful algorithm that has surprisingly broad applications," Smith said.

Think of it: how often do scientists need to predict what they haven't seen based on what they have? The algorithm could be used to estimate the population of HIV-positive individuals; astronomers could use it to determine how many exoplanets exist in our galaxy based on the ones they have already discovered; and it could be used to estimate diversity within an individual.

The mathematical underpinnings of the algorithm rely on a model of sampling from ecology known as capture-recapture. In this model, individuals are captured and tagged so that a recapture of the same individual will be known – and the number of times each individual was captured can be used to make inferences about the population as a whole.

In this way scientists can estimate, for example, the number of gorillas remaining in the wild. In DNA sequencing, the individuals are the different genomic molecules in a sample. However, the mathematical models used for counting gorillas don't work on the scale of DNA sequencing.
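To make the capture-recapture idea concrete, here is a minimal sketch of how capture counts can hint at the size of an unseen population. It uses the classical Chao1 lower-bound estimator from ecology, which is not the method Smith and Daley developed; it only illustrates the general principle. The function name `chao1_estimate` and the toy list of molecule labels are hypothetical, invented for this example.

```python
from collections import Counter


def chao1_estimate(observations):
    """Chao1 lower-bound estimate of the number of distinct individuals.

    `observations` is a list of labels, one per capture (or per sequenced
    read). This is a textbook ecology estimator used purely for
    illustration, not the algorithm described in the article.
    """
    # How many times was each distinct individual captured?
    captures_per_individual = Counter(observations)
    observed = len(captures_per_individual)

    # Histogram of capture counts: f1 = seen exactly once, f2 = exactly twice.
    histogram = Counter(captures_per_individual.values())
    f1, f2 = histogram[1], histogram[2]

    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Bias-corrected form, which avoids division by zero when f2 == 0.
        unseen = f1 * (f1 - 1) / 2
    return observed + unseen


# Toy data (hypothetical): each letter stands for one distinct molecule.
reads = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
print(chao1_estimate(reads))
```

Many individuals seen only once suggest that many more were never seen at all, which is why the estimate grows with the number of singletons.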

"The basic model has been known for decades, but the way it has been used makes it highly unstable in most applications. We took a different approach that depends on lots of computing power and seems to work best in large-scale applications like modern DNA sequencing," Daley said.

Scientists faced a similar problem in the early days of the human genome sequencing project. Michael Waterman of USC provided a mathematical solution in 1988 that found widespread use. Recent advances in sequencing technology, however, require thinking differently about the mathematical properties of DNA sequencing data.

"Huge data sets required a novel approach. I'm very please it was developed here at USC," said Waterman.
