Massive data for minuscule communities

Aug 01, 2012

It's relatively easy to collect massive amounts of data on microbes. But the files are so large that it takes days just to transmit them to other researchers and months to analyze them once they are received.

Researchers at Michigan State University have developed a new computational technique, featured in the current issue of the Proceedings of the National Academy of Sciences, that relieves the logjam that these "big data" issues create.

Microbial communities living in soil or the ocean are quite complicated. Their genomic data is easy enough to collect, but the data sets are so big that they actually overwhelm today's computers. C. Titus Brown, MSU assistant professor in bioinformatics, demonstrates a general technique that can be applied to most microbial communities.

The interesting twist is that the team created a solution using small computers, a novel approach considering most bioinformatics research focuses on supercomputers, Brown said.

"To thoroughly examine a gram of soil, we need to generate about 50 terabases of genomic sequence – about 1,000 times more data than generated for the initial human genome project," said Brown, who co-authored the paper with Jim Tiedje, University Distinguished Professor of microbiology and molecular genetics. "That would take about 50 laptops to store that much data. Our paper shows the way to make it work on a much smaller scale."

Analyzing DNA data using traditional computing methods is like trying to eat a large pizza in a single bite. The huge influx of data bogs down computers' memory and causes them to choke. The new method employs a filter that folds the pizza up compactly using a special data structure. This allows computers to nibble at slices of the data and eventually digest the entire sequence. The technique yields a 40-fold decrease in memory requirements, allowing scientists to plow through reams of data without using a supercomputer.
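The article does not name the "special data structure." As an illustration only, the sketch below assumes a Bloom filter, a fixed-size bit array commonly used to record which short DNA subsequences (k-mers) have been seen without storing the sequences themselves. The class name, function name, filter size and hash count are hypothetical choices for this example, not the researchers' software.

```python
import hashlib


class KmerBloomFilter:
    """Fixed-size bit array recording which k-mers have been seen.

    Hypothetical illustration; not the researchers' actual implementation.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive several bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May report a false positive, but never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


# Assumed toy input: two short reads and a tiny filter.
bloom = KmerBloomFilter(num_bits=8192, num_hashes=3)
reads = ["ACGTACGTGACC", "GTGACCTTAGGA"]
for read in reads:
    for kmer in kmers(read, 5):
        bloom.add(kmer)

print("ACGTA" in bloom)   # True: an inserted k-mer is always found
print(len(bloom.bits))    # 1024 bytes, however many k-mers are added
```

The memory used is set by the size of the bit array rather than by the number of sequences added, which is the general idea behind trading a small chance of false positives for a large saving in memory.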

Brown and Tiedje will continue to pursue this line of research, and they are encouraging others to improve upon it as well. The researchers made the complete source code and the ancillary software available to the public to encourage extension.

"We want this program to continue to evolve and improve," Brown said. "In fact, it already has. Other researchers have taken our approach in a new direction and made a better genome assembler."
