August 1, 2012

Massive data for miniscule communities

It's relatively easy to collect massive amounts of data on microbes. But the files are so large that it takes days simply to transmit them to other researchers and months to analyze them once they are received.

Researchers at Michigan State University have developed a new computational technique, featured in the current issue of the Proceedings of the National Academy of Sciences, that relieves the logjam that these "big data" issues create.

Microbial communities living in soil or the ocean are quite complicated. Their genomic data is easy enough to collect, but the data sets are so big that they actually overwhelm today's computers. C. Titus Brown, MSU assistant professor in bioinformatics, demonstrates a general technique that can be applied to most microbial communities.

The interesting twist is that the team created a solution using small computers, a novel approach considering most bioinformatics research focuses on supercomputers, Brown said.

"To thoroughly examine a gram of soil, we need to generate about 50 terabases of genomic sequence – about 1,000 times more data than generated for the initial human genome project," said Brown, who co-authored the paper with Jim Tiedje, University Distinguished Professor of microbiology and molecular genetics. "That would take about 50 laptops to store that much data. Our paper shows the way to make it work on a much smaller scale."

Analyzing DNA data using traditional computing methods is like trying to eat a large pizza in a single bite. The huge influx of data bogs down computers' memory and causes them to choke. The new method employs a filter that folds the pizza up compactly using a special data structure. This allows computers to nibble at slices of the data and eventually digest the entire sequence. This technique creates a 40-fold decrease in memory requirements, allowing scientists to plow through reams of data without using a supercomputer.
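The article does not name the "special data structure." One well-known compact structure of this general kind is a Bloom filter, which records set membership in a fixed-size bit array instead of storing the items themselves. The Python sketch below is an illustration under that assumption, not the researchers' code: the class `BloomFilter`, the helper `kmers`, the sample reads, and the sizes chosen are all hypothetical. It stores short DNA substrings (k-mers) in 1 KB of memory, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives.

    Illustrative only; names and parameters are assumptions, not from the article.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every bit for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


# Hypothetical reads; real data sets contain billions of these.
reads = ["ACGTACGTGA", "TTGACCAGTA"]

seen = BloomFilter(num_bits=8 * 1024, num_hashes=3)  # 1 KB of bits
for read in reads:
    for kmer in kmers(read, k=5):
        seen.add(kmer)

print("ACGTA" in seen)  # a k-mer that was added is always reported present
```

The memory footprint is set once by `num_bits` and does not grow as more k-mers are added; what grows instead is the chance that a k-mer never seen is reported as present, which is the trade-off such structures make.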

Brown and Tiedje will continue to pursue this line of research, and they are encouraging others to improve upon it as well. The researchers have made the complete source code and the ancillary software publicly available so that others can extend it.

"We want this program to continue to evolve and improve," Brown said. "In fact, it already has. Other researchers have taken our approach in a new direction and made a better genome assembler."

Journal information: Proceedings of the National Academy of Sciences

Provided by Michigan State University

Citation: Massive data for miniscule communities (2012, August 1) retrieved 20 July 2024 from https://phys.org/news/2012-08-massive-miniscule.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
