Why keep the raw data?

December 7, 2016
Graphical image of re-using data. Credit: Kroon-Batenburg et al.

The increasingly popular subject of raw diffraction data deposition is examined in a Topical Review in IUCrJ. Building on the 2015 workshop organised by the IUCr Diffraction Data Deposition Working Group (DDDWG), the authors bring the story up to date with accounts of new subject-specific and institutional data repositories, and of growing policy pressures on research data management such as the European Open Science initiative.

The article is, however, more than just a workshop report or a survey of evolving policy. It seeks to inform the cost-benefit arguments over diffraction data deposition with examples from real front-line research. For example, Kroon-Batenburg and Helliwell have collaborated on studies of protein binding of the chemotherapeutic agent cisplatin, and have made all 34 of their raw data sets available through the University of Manchester Data Library. Some of these data sets have since been reanalysed, resulting in a fresh understanding of cisplatin-lysozyme models.

The prospect of extracting further information from archived primary data sets in this way, whether through the insights of fresh pairs of eyes or through subsequent improvements in analysis software, has implications for structural databases. It supports the idea of continuous improvement of published studies, such as macromolecular structure models, an idea long championed by Terwilliger.

It is not only in the field of macromolecular structure determination that these considerations are important. One of the greatest challenges to reusing any raw data is the need for complete metadata associated with any set, to allow its subsequent interpretation and full evaluation.

Various IUCr Commissions are actively publishing their summaries of the essential metadata that need to be captured alongside all experimental data sets. These initiatives and their relationship to the IUCr's standard for data characterization (CIF, the Crystallographic Information Framework) are reviewed within the article. Again, practical pointers are given to the essential metadata that need to be captured alongside diffraction data.

While there are encouraging signs that the scientific community is taking a more informed interest in data management and its scientific potential, fresh challenges are being thrown up by the latest generation of instrumentation, which is capable of generating vast amounts of data at very high rates. It may not be possible to archive, or even thoroughly analyse, all the data that are being produced. However, the article should help to build a deeper understanding of why society should invest effort and resources in extracting the greatest possible value from the data deluge, in crystallography as in any science.

More information: Loes M. J. Kroon-Batenburg et al, Raw diffraction data preservation and reuse: overview, update on practicalities and metadata requirements, IUCrJ (2017). DOI: 10.1107/S2052252516018315
