Improving accuracy in genomic mapping with time-series data

December 29, 2015, American Institute of Physics
DNA molecules extended in nanochannels. The backbone has been stained in green, and sequence-specific sites have been labeled in red. Credit: Julian Sheats/UMN

If you already have the sequenced map of an organism's genome but want to look for structural oddities in a sample, you can check the genomic barcode (a series of distances between known, targeted sites) by cutting a DNA sequence at those sites and examining the distance between the cuts. However, if the original map, obtained through next-generation sequencing involving PCR, contains any amplification biases, there is room for systematic error across studies. To remedy this, researchers at the University of Minnesota and BioNano Genomics have improved a nanochannel-based form of mapping by using dynamic time-series data to measure the probability distribution of the distance between two labels, which reflects how much genetic material separates them and whether the strand between them is stretched or compressed.

"Imagine that two labels on the DNA backbone are connected together by a spring that models the configurational entropy of the DNA between them," said Kevin Dorfman, a professor in the University of Minnesota's College of Science & Engineering. "If this was a harmonic spring ... then we would expect to see an equal probability of positive and negative displacements about the rest length of the spring."

Rather than this normal curve, however, Dorfman and his colleagues observed greater compression than extension between the labels, and found that the majority of thermal fluctuations between the labels are short-lived events, information that could help improve the accuracy of genome mapping.
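
One way to picture the difference between a harmonic spring and the asymmetry the researchers describe is to compare the skewness of the two kinds of fluctuation. The sketch below is only an illustration on synthetic numbers, not the authors' analysis: the rest length, the noise scale, and the exponential model for compression are all assumptions chosen to make the contrast visible.

```python
import random


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic data only (not from the study): label separations in nm over time.
rng = random.Random(0)
rest_length = 1000.0  # hypothetical rest length of the "spring"

# Harmonic spring: Gaussian fluctuations, symmetric about the rest length.
harmonic = [rng.gauss(rest_length, 30.0) for _ in range(20000)]

# Toy asymmetric case: compressions outweigh extensions
# (rest length minus an exponentially distributed shortfall).
compressed = [rest_length - rng.expovariate(1 / 30.0) for _ in range(20000)]

print(f"harmonic skewness:   {skewness(harmonic):+.2f}")
print(f"compressed skewness: {skewness(compressed):+.2f}")
```

A skewness near zero is what the harmonic picture predicts, while a clearly negative value signals that the labels spend more of their excursions closer together than farther apart.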

"Such improvements are especially important for complicated samples like cancer, where the cells are heterogeneous, so we need high accuracy to find rare events," Dorfman said.

Dorfman and his lab have been working with collaborators at San Diego-based BioNano Genomics over the past three years, through grants supported by the National Institutes of Health and National Science Foundation. He and his colleagues detail their work this week in Biomicrofluidics.

A problem the researchers encountered with the traditionally used pulsed-field gel electrophoresis method, in which genome maps are constructed by dicing DNA sequences with restriction enzymes, lay in reassembling the maps, as the conventional process sorts the fragments as a function of their size. In the nanochannel method, however, the fluorescent labels stay ordered on each chain throughout. This allows the researchers to determine the content of entire strands from their fluorescent barcodes without having to reassemble them, removing the reliance on a previously obtained map.

The researchers started by labeling the DNA, which involved extracting the genomic DNA from E. coli cells, removing a single nucleotide and a piece of the backbone at various targeted locations, and inserting fluorescent nucleotides in their places. Each DNA strand, typically around 300,000 base pairs long, was then injected into a 45 nm-wide nanochannel. This forces the molecule to stretch, since the bending length scale for DNA, at which it still moves in a rod-like, quantifiable manner, is about 50 nm.

They then imaged the location of the labels using a digital camera. Whereas typical single-molecule studies of DNA in nanochannels report the statistics from dozens of molecules, the researchers' method involves thousands of molecules, each covered in a flurry of labels—leading to millions of measurements of distances between the labels, which are essential to determining the probability distributions.
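
To see how the number of measurements grows so quickly, consider that every pair of labels on a molecule yields one separation per camera frame. The following sketch shows that bookkeeping for a single molecule. The function name, the data layout, and the positions are invented for illustration and are not taken from the paper.

```python
from itertools import combinations


def pair_separation_series(frames):
    """Map each label pair (i, j) to its separation in every frame.

    frames: list of frames for one molecule; each frame lists the positions
    of the same labels, in the same order along the channel.
    """
    n_labels = len(frames[0])
    series = {pair: [] for pair in combinations(range(n_labels), 2)}
    for positions in frames:
        for i, j in series:
            series[(i, j)].append(positions[j] - positions[i])
    return series


# Made-up positions (micrometers) for one molecule with 4 labels over 3 frames.
frames = [
    [0.0, 1.2, 3.1, 4.0],
    [0.1, 1.1, 3.0, 4.2],
    [0.0, 1.3, 3.2, 4.1],
]
series = pair_separation_series(frames)
n_measurements = sum(len(s) for s in series.values())
print(len(series), "pairs,", n_measurements, "separation measurements")
```

With n labels and T frames, one molecule contributes n(n-1)/2 × T separations, so thousands of densely labeled molecules imaged over time plausibly reach the millions of measurements mentioned above.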

Future work for Dorfman and his colleagues includes using these distributions as an input into the genome mapping algorithm. This can be used to assign a confidence that a particular sequence of dots maps to a particular region of the genome, as well as to help understand the effect of the knots, folds, and loops of the stretched DNA on genome mapping.
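
As a rough picture of what "assigning a confidence" could look like, the sketch below scores an observed run of label separations against measured distributions for two candidate regions and normalizes the scores into probabilities. This is a generic likelihood calculation under a uniform prior, not the mapping algorithm used by the researchers or by BioNano Genomics. The region names, histograms, bin width, and floor value are all hypothetical.

```python
import math


def log_likelihood(observed, interval_histograms, bin_width, floor=1e-6):
    """Sum of log-probabilities of observed separations under per-interval histograms."""
    total = 0.0
    for d, hist in zip(observed, interval_histograms):
        bin_start = int(d // bin_width) * bin_width
        # Unseen bins get a small floor probability instead of zero.
        total += math.log(hist.get(bin_start, floor))
    return total


def region_confidences(observed, candidates, bin_width):
    """Posterior probability of each candidate region, assuming a uniform prior."""
    scores = {name: log_likelihood(observed, hists, bin_width)
              for name, hists in candidates.items()}
    peak = max(scores.values())  # subtract the peak for numerical stability
    weights = {name: math.exp(s - peak) for name, s in scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# Invented histograms (bin start in nm -> probability) for two regions,
# each with two consecutive label intervals.
candidates = {
    "region_A": [{900: 0.3, 1000: 0.6, 1100: 0.1}, {400: 0.2, 500: 0.7, 600: 0.1}],
    "region_B": [{700: 0.5, 800: 0.4, 900: 0.1}, {400: 0.6, 500: 0.3, 600: 0.1}],
}
observed = [1020.0, 530.0]  # hypothetical separations measured on one molecule
confidences = region_confidences(observed, candidates, bin_width=100)
print(confidences)
```

Using measured histograms rather than an assumed Gaussian lets any asymmetry between compression and extension carry through to the score.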

More information: 'Measurements of DNA barcode label separations in nanochannels from time-series data,' by Julian Sheats, Jeffrey G. Reifenberger, Han Cao and Kevin D. Dorfman, Biomicrofluidics on Dec. 29, 2015. DOI: 10.1063/1.4938732
