CERN Data Centre passes the 200-petabyte milestone
On 29 June 2017, the CERN Data Centre (DC) passed the milestone of 200 petabytes of data permanently archived in its tape libraries. Where do these data come from? Particles collide in the Large Hadron Collider (LHC) detectors approximately 1 billion times per second, generating about one petabyte of collision data per second. Such quantities of data are impossible for current computing systems to record, so the experiments filter them, keeping only the most "interesting" events. The filtered LHC data are then aggregated in the CERN DC, where initial data reconstruction is performed and a copy is archived to long-term tape storage. Even after this drastic data reduction by the experiments, the CERN DC processes on average one petabyte of data per day. This is how the milestone of 200 petabytes permanently archived was reached on 29 June.
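As a back-of-the-envelope illustration (using only the round figures quoted above, not official CERN numbers), the filtering implies an enormous reduction factor between what the detectors generate and what the data centre processes:

```python
# Rough check of the data-reduction factor implied by the article's
# round figures: ~1 PB/s generated in the detectors vs ~1 PB/day
# processed by the CERN DC after the experiments' filtering.
RAW_PB_PER_SECOND = 1.0      # ~1 petabyte of collision data per second
FILTERED_PB_PER_DAY = 1.0    # ~1 petabyte per day reaching the DC
SECONDS_PER_DAY = 86_400

raw_pb_per_day = RAW_PB_PER_SECOND * SECONDS_PER_DAY
reduction_factor = raw_pb_per_day / FILTERED_PB_PER_DAY
kept_fraction = 1.0 / reduction_factor

print(f"Raw data per day:  {raw_pb_per_day:,.0f} PB")
print(f"Reduction factor: ~{reduction_factor:,.0f}x "
      f"(only ~{kept_fraction:.2e} of the raw volume is kept)")
```

In other words, on these figures only about one part in 86,000 of the raw collision data survives the experiments' selection.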
The four big LHC experiments have produced unprecedented volumes of data in the last two years. This is due in large part to the outstanding performance and availability of the LHC itself. In 2016, expectations were initially for around 5 million seconds of data taking, while the final total was around 7.5 million seconds, a very welcome 50% increase. 2017 is following a similar trend.
Furthermore, as the luminosity is higher than in 2016, many collisions overlap and the events are more complex, requiring increasingly sophisticated reconstruction and analysis. This has a strong impact on computing requirements. Consequently, records are being broken in many aspects of data acquisition, data rates and data volumes, placing exceptional demands on computing and storage resources.
To face these challenges, the computing infrastructure at large, and notably the storage systems, went through major upgrades and consolidation during the two years of Long Shutdown 1. These upgrades enabled the data centre to cope with the 73 petabytes of data received in 2016 (49 of which were LHC data) and with the flow of data delivered so far in 2017. They also allowed the CERN Advanced STORage system (CASTOR) to pass the challenging milestone of 200 petabytes of permanently archived data. These archived data represent an important fraction of the total received in the CERN DC, the rest being temporary data that are periodically cleaned up.
Another consequence of the greater data volumes is an increased demand for data transfer and thus a need for higher network capacity. Since early February, a third 100 Gb/s (gigabits per second) fibre-optic circuit has linked the CERN DC to its remote extension hosted at the Wigner Research Centre for Physics (RCP) in Hungary, 1800 km away. The additional bandwidth and redundancy provided by this third link help CERN benefit reliably from the computing power and storage at the remote extension, a must-have given ever-increasing computing needs.
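To put that capacity in perspective, here is a purely illustrative estimate (ignoring protocol overhead and using the round figures above) of how long moving the DC's average daily petabyte would take over these circuits:

```python
# Illustrative only: time to transfer the DC's average daily petabyte
# over 100 Gb/s circuits, ignoring protocol overhead and contention.
PETABYTE_BITS = 10**15 * 8     # 1 PB expressed in bits (decimal petabyte)
LINK_BPS = 100 * 10**9         # one fibre circuit: 100 gigabits per second

hours_one_link = PETABYTE_BITS / LINK_BPS / 3600
hours_three_links = hours_one_link / 3   # three parallel circuits

print(f"1 PB over one 100 Gb/s link: ~{hours_one_link:.1f} h")
print(f"1 PB over three links:       ~{hours_three_links:.1f} h")
```

Even under these idealised assumptions, a single link would need most of a day to ship one petabyte, which is why the extra circuit matters for keeping up with the daily data flow.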