Breaking data records bit by bit

December 14, 2017, CERN
Magnetic tapes, retrieved by robotic arms, are used for long-term storage. Credit: Julian Ordan/CERN

This year, CERN's data centre broke its own record by collecting more data than ever before.

During October 2017, the data centre stored a colossal 12.3 petabytes of data. To put this in context, one petabyte is equivalent to the capacity of around 15,000 64GB smartphones. Most of this data comes from the Large Hadron Collider's experiments, so this record is a direct result of the outstanding LHC performance. The rest is made up of data from other experiments and backups.
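
The smartphone comparison can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the helper name `phones_needed` is made up for illustration and does not come from CERN.

```python
# Assumption: decimal (SI) units, as storage vendors usually quote them.
GB = 10**9   # bytes in one gigabyte
TB = 10**12  # bytes in one terabyte
PB = 10**15  # bytes in one petabyte


def phones_needed(data_bytes: int, phone_capacity_bytes: int = 64 * GB) -> float:
    """Return how many phones of the given capacity would hold data_bytes."""
    return data_bytes / phone_capacity_bytes


# One petabyte expressed in 64GB smartphones (the article rounds this to 15,000).
per_petabyte = phones_needed(PB)

# The October 2017 record of 12.3 petabytes, written as 12,300 terabytes
# so that the calculation stays in whole numbers.
october_2017 = phones_needed(12_300 * TB)

print(f"1 PB    = {per_petabyte:,.0f} phones")
print(f"12.3 PB = {october_2017:,.0f} phones")
```

Under these assumptions, one petabyte works out to 15,625 phones, which matches the article's "around 15,000", and the October record corresponds to roughly 192,000 phones.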

"For the last ten years, the data volume stored on tape at CERN has been growing at an almost exponential rate. By the end of June we had already passed a data storage milestone, with a total of 200 petabytes of data permanently archived on tape," explains German Cancio, who leads the tape, archive & backups storage section in CERN's IT department.

The CERN data centre is at the heart of the Organization's infrastructure. Here data from every experiment at CERN is collected, the first stage in reconstructing that data is performed, and copies of all the experiments' data are archived to long-term tape storage.

Most of the data collected at CERN will be stored forever. The physics data is so valuable that it will never be deleted, and it needs to be preserved for future generations of physicists.

"An important characteristic of the CERN data archive is its longevity," Cancio adds. "Even after an experiment ends all recorded data has to remain available for at least 20 years, but usually longer. Some of the archive files produced by previous CERN experiments have been migrated across different hardware, software and media generations for over 30 years. For archives like CERN's, that do not only preserve existing data but also continue to grow, our data preservation is particularly challenging."

While tapes may sound like an outdated mode of storage, they are actually the most reliable and cost-effective technology for large-scale archiving of data, and have always been used in this field. One copy of data on a tape is considered much more reliable than the same copy on a disk.

"CERN currently manages the largest scientific data archive in the High Energy Physics (HEP) domain and keeps innovating in storage," concludes Cancio.
