October 13, 2015

New tool: How to get meaningful information out of big data

Every second, trillions of bits of data are accumulated and stored. All that data is meaningless if you don't know how to sort it. Researchers at the University of Southern Denmark (SDU) now present a tool that helps scientists sort data and retrieve meaningful knowledge from the data jungle. Their work is published in the journal Nature Methods.

Pretend for a second that you work in obesity research and that you have a trillion bits of obesity-related data stored on a server: What do overweight people eat? How do they sleep? What time of day do they eat?

You suspect that the patients' lifestyle may influence their weight, and you can ask your computer to compare weight change and the number of consumed cheese sandwiches to see if there is a link. Then you can ask for another comparison. And yet another. And so you can continue for a very long time and collect a wide range of comparisons for your research.
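A single comparison of this kind can be sketched in a few lines of Python. This is only an illustration: the patient values are made up, the variable names are hypothetical, and the Pearson correlation coefficient is just one possible way to measure a link between two quantities.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, one entry per patient (not real study data).
sandwiches_per_week = [2, 5, 7, 1, 9, 4]
weight_change_kg = [-1.0, 0.5, 1.5, -2.0, 2.5, 0.0]

r = pearson(sandwiches_per_week, weight_change_kg)
print(f"correlation between sandwiches and weight change: {r:.2f}")
```

A value near 1 or -1 suggests a strong linear link, a value near 0 suggests none. Each further question (sleep, time of day, type of cheese) needs another comparison like this one, which is why the one-at-a-time approach takes so long.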

Or you can approach your data in a way that is not only much faster but will also uncover links you might not even have considered. You will then be able to put your own suspicions about weight and lifestyle to the test, and you may also discover completely unexpected links, for instance that patients who are losing weight eat gouda sandwiches more often than cheddar sandwiches.

Looking for the hidden patterns

This is what clustering is about: looking for hidden patterns that we are unable to see ourselves by asking a computer to group together objects that share common traits.

In principle, it could be any kind of data: patients, proteins or maybe planets in distant galaxies.
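To make the idea of grouping concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The article does not say which algorithms the SDU researchers work with, so the choice of k-means, the function name and the toy data are all assumptions made for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Toy data: two measurements per object, forming two visibly separate groups.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels = kmeans(points, k=2)
print(labels)
```

Note that the number of groups, k, has to be chosen by the user before the algorithm runs. Settings like this are part of what makes clustering tools hard to use well.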

At SDU, Assistant Professor Richard Röttger, head of the research group Practical Computer Science & Bioinformatics, and his colleagues from the Department of Mathematics and Computer Science use clustering to, for example, find regulatory networks in pathogenic organisms. This allows for a fundamental understanding of these organisms without the need for dangerous and expensive wet-lab studies.

But clustering is complicated to work with, even for a computer scientist, and even though it is a long-standing problem in computer science and one of the most fundamental data analysis procedures.

Clustering should be easy for all scientists, not just computer scientists

"Today there are hundreds of comparable but different clustering tools out there, but each of them requires very specific settings and often a deep understanding of the underlying algorithm. There is no overview of what is out there, what should be used when, and there is no objective comparison of the available possibilities," explains Richard Röttger.

Therefore, he and his colleagues, Ph.D. student Christian Wiwie and Associate Professor Jan Baumbach, have now created a tool that provides an objective overview of all available clustering tools, giving researchers unbiased suggestions as to which tool to use with which parameters in which setting. "The entire process is sped up tremendously and made more objective now," says Röttger.
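What an objective comparison can look like is sketched below. The example scores two candidate groupings of the same data with the silhouette value, a common quality measure for clusterings, and picks the better one. This is not ClustEval's code or interface: the article does not say which measures the tool uses, and the candidate names, labels and data here are hypothetical.

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette value over all points (between -1 and 1, higher is better).

    Assumes at least two clusters are present in `labels`.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data with two visibly separate groups.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]

# Hypothetical outputs of two tools or parameter settings on the same data.
candidates = {
    "setting_a": [0, 0, 0, 1, 1, 1],  # follows the visible groups
    "setting_b": [0, 1, 0, 1, 0, 1],  # mixes the groups
}

scores = {name: silhouette(points, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because every candidate is scored by the same measure on the same data, the choice between tools and settings no longer depends on the user's intuition about any single algorithm.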

The tool is called ClustEval and it is described in the journal Nature Methods.

More information: Christian Wiwie et al. Comparing the performance of biomedical clustering methods, Nature Methods (2015). DOI: 10.1038/nmeth.3583

Journal information: Nature Methods

Provided by University of Southern Denmark

Citation: New tool: How to get meaningful information out of big data (2015, October 13) retrieved 26 April 2024 from https://phys.org/news/2015-10-tool-meaningful-big.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
