From molecules to the Milky Way: dealing with the data deluge

Nov 07, 2007

Most people have a few gigabytes of files on their PC. In the next decade, astronomers expect to be processing 10 million gigabytes of data every hour from the Square Kilometre Array telescope.

And with DNA sequencing getting cheaper, scientists may soon be mining hundreds of thousands of personal human genome databases, each 50 gigabytes in size.

CSIRO has a new research program aimed at helping science and business cope with masses of data from areas like astronomy, gene sequencing, surveillance, image analysis and climate modelling.

The research program, which began this year, is called ‘Terabyte Science’, named for the now-commonplace data sets that start at terabytes (thousands of gigabytes) in size.

“CSIRO recognises that, for its science to be internationally competitive, the organisation needs to be able to analyse large volumes of complex, even intermittently available, data from a broad range of scientific fields,” says program leader Dr John Taylor, from CSIRO Mathematical and Information Sciences.

One aspect of the problem is that methods that work with small data sets don’t necessarily work with large ones.

The program aims to develop completely new mathematical approaches and processes that scientists in a range of disciplines can use to further their research, and to boost Australia’s position as a world science leader.

“Large and complex data is emerging almost everywhere in science and industry and it will hold back Australian research and business if it isn’t dealt with in a timely way,” Dr Taylor says.

Countries like the US also recognise the challenges, as Dr Taylor saw first hand during his ten years working in laboratories there.

“This will need major developments in computer infrastructure and computational tools. It involves IT people, mathematicians and statisticians, image technologists, and other specialists from across CSIRO all working together in a very focussed way,” he says.

Following a workshop in September, specific research areas were identified, and projects are now progressing in advanced manufacturing, high-throughput image analysis, modelling ocean biogeochemical cycles, situation analysis and environmental modelling.

Source: CSIRO Australia
