Scientists re-imagine how genomes are assembled

Nov 25, 2013
3D interaction frequencies can be used to bridge gaps in the DNA sequence and assemble genomes. Credit: Noam Kaplan, PhD, UMass Medical School

Scientists at the University of Massachusetts Medical School (UMMS) have developed a new method for piecing together the short DNA reads produced by next-generation sequencing technologies that are the basis for building complete genome sequences. Job Dekker, PhD, and colleagues have shown that entire genomes can be assembled faster and more accurately by measuring the frequency of interactions between DNA segments and by using their three-dimensional shape as a guide. Employing this technique, they have been able to place 65 previously unaccounted for DNA fragments in incomplete regions of the human genome.

Details of the study appear online in Nature Biotechnology.

"The ability of next-generation sequencing technologies to produce hundreds of millions of short reads of DNA sequences has been an incredible boon for biomedical researchers," said Dr. Dekker, co-director of the Program in Systems Biology, professor of biochemistry and molecular pharmacology at UMMS and senior author of the study. "As these DNA sequences have become shorter and shorter, however, assembling complete genomes has become increasingly challenging. After 20 years of intense efforts, even the human genome still has gaps.

"Using the 3D structure of the genome as a guide, we have shown that it's possible for these snippets of DNA sequences to be assembled quickly, cheaply and more accurately than current methodologies allow. This elegant and powerful technique will allow us to complete the human genome, assemble the genomes of any other species and facilitate new genetic discoveries more quickly."

In the last decade, as the cost of high-throughput DNA sequencing has come down to as little as a few thousand dollars, sequencing of new genomes has become almost routine. Next-generation sequencing techniques can easily read hundreds of millions of DNA sequences at a time. However, these sequences are randomly broken into extremely short pieces and need to be assembled into larger pieces using computer algorithms that can match up overlapping pieces. The end result of this initial assembly is typically a set of as many as 100,000 DNA fragments, which then need to be placed in the correct order relative to one another to create a complete genome.
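The overlap-matching step described above can be illustrated with a toy sketch. This is a simplified greedy merge written for illustration only, not the algorithm used by any particular assembler; the function names and the `min_len` threshold are assumptions, and real assemblers must also handle sequencing errors, reverse complements and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest match is shorter than `min_len`
    (a hypothetical threshold chosen for this toy example).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Stops when no pair overlaps by at least `min_len`, so the result
    may be several unjoined fragments rather than one sequence.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATTAGAC", "GACCTGA", "CTGAATC"]))
```

When reads share no sufficiently long overlap, the sketch returns them as separate fragments, which mirrors the situation described above: the initial assembly yields many fragments that still need to be ordered.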

Hindering this final task is the fact that genomes are full of highly repetitive sequences that appear in a multitude of places. Finding precisely where, among the thousands of possible locations, a particular fragment of DNA resides is a daunting task. To complete this second step, often referred to as genome scaffolding, scientists rely on labor-intensive, low-throughput experimental techniques to build reasonably accurate, complete genomes.

"How to assemble these snippets of DNA has become a bottleneck for researchers that can take weeks or months to solve," said Noam Kaplan, PhD, postdoctoral research fellow in the Dekker lab and first author of the Nature Biotechnology study.

Tackling this problem, Dekker and Kaplan looked to the three-dimensional structure of the genome as a guide for assembling the linear DNA sequences. Using Hi-C technology, developed by the Dekker lab, they measured how frequently each DNA fragment in the genome interacts with others. DNA sequences that are located near each other in the three-dimensional genome tend to interact more frequently, while DNA sequences that are farther apart interact less frequently. Computational methods are then used to determine the linear genomic position of each fragment from its 3D interaction frequency data.

For example, said Kaplan, a sequence may fit into the one-dimensional linear genome in several places. But using the interaction frequency data, it is possible to determine the relationship it has with other sequences and whether it is close to or far away from those sequences. "So while a particular sequence may fit in many places in a linear genome, we can determine if a particular sequence is a better fit, three dimensionally, in one location versus another, based on this interaction data," said Dr. Kaplan.
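The idea of choosing the best-fitting location from interaction data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model from the study: it assumes that expected contact frequency falls off as a power law of linear distance (the exponent `alpha` is a made-up default), that counts are Poisson distributed, and that the positions of the already-placed "anchor" regions are known. All names and numbers below are hypothetical.

```python
import math


def expected_profile(candidate, anchor_positions, alpha=1.0):
    """Relative expected contacts with each anchor if the fragment sat at
    `candidate`, assuming frequency ~ (distance + 1) ** -alpha."""
    return [(abs(candidate - x) + 1.0) ** -alpha for x in anchor_positions]


def poisson_loglik(counts, profile):
    """Poisson log-likelihood of the observed counts (up to a constant),
    with the overall scale set to its maximum-likelihood value."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one observed contact")
    scale = total / sum(profile)
    return sum(c * math.log(scale * p) - scale * p
               for c, p in zip(counts, profile))


def best_position(counts, anchor_positions, candidates, alpha=1.0):
    """Return the candidate position whose expected contact profile
    best explains the observed counts."""
    return max(
        candidates,
        key=lambda pos: poisson_loglik(
            counts, expected_profile(pos, anchor_positions, alpha)),
    )


if __name__ == "__main__":
    anchors = [0, 10, 20, 30, 40]    # known positions (arbitrary units)
    counts = [13, 23, 100, 33, 16]   # invented contact counts
    print(best_position(counts, anchors, candidates=[5, 22, 35]))
```

In this invented example the fragment interacts most with the anchor at position 20 and progressively less with more distant anchors, so the candidate near 20 scores highest even if the fragment's sequence alone would fit all three candidate locations.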

With this new approach Kaplan and Dekker were able to predict the positions of 65 previously unlocalized fragments.

"We were surprised how well our method worked," said Kaplan. "It is satisfying to see how a simple idea can solve such a difficult and common problem."

Dekker added, "This new approach to genome assembly can help produce higher-quality genome sequences faster and easier than current methods. It will be especially interesting to apply this method to identify chromosomal aberrations, which are a hallmark of cancer."

More information: High-throughput genome scaffolding from in vivo DNA interaction frequency, DOI: 10.1038/nbt.2768
