April 4, 2022

Chemical data management: An open way forward

by Ecole Polytechnique Federale de Lausanne

One of the most challenging aspects of modern chemistry is managing data. For example, when synthesizing a new compound, scientists typically go through many rounds of trial and error to find the right reaction conditions, generating large amounts of raw data in the process. Such data are extremely valuable because, like humans, machine-learning algorithms can learn a great deal from failed and partially successful experiments.

Current practice, however, is to publish only the most successful experiments, since no human can meaningfully process the vast number of failed ones. AI has changed this: processing such volumes is exactly what machine-learning methods can do, provided the data are stored in a machine-actionable format that anyone can use.

"For a long time, we needed to compress information due to the limited page count in printed journal articles," says Professor Berend Smit, who directs the Laboratory of Molecular Simulation at EPFL Valais Wallis. "Nowadays, many journals do not even have printed editions anymore; however, chemists still struggle with reproducibility problems because journal articles are missing crucial details. Researchers 'waste' time and resources replicating 'failed' experiments of authors and struggle to build on top of published results as raw data are rarely published."

Volume is not the only problem; data diversity is another. Research groups use different tools, such as Electronic Lab Notebook software, that store data in proprietary formats which are sometimes incompatible with one another. This lack of standardization makes it nearly impossible for groups to share data.

Now Smit, together with Luc Patiny and Kevin Jablonka at EPFL, has published a perspective in Nature Chemistry presenting an open platform for the entire chemistry workflow, from the inception of a project to its publication.

The scientists envision the platform as "seamlessly" integrating three crucial steps (data collection, data processing, and data publication), all at minimal cost to researchers. The guiding principle is that data should be FAIR: easily findable, accessible, interoperable, and reusable. "At the moment of data collection, the data will be automatically converted into a standard FAIR format, making it possible to automatically publish all 'failed' and partially successful experiments together with the most successful experiment," says Smit.

The authors go a step further, however, proposing that data should also be machine-actionable. "We are seeing more and more data-science studies in chemistry," says Jablonka. "Indeed, recent results in machine learning try to tackle some of the problems chemists believe are unsolvable. For instance, our group has made enormous progress in predicting optimal reaction conditions using machine-learning models. But those models would be much more valuable if they could also learn from reaction conditions that fail; otherwise, they remain biased because only the successful conditions are published."

Finally, the authors propose five concrete steps that the field must take to create a FAIR data-management plan:

1. The chemistry community should embrace its own existing standards and solutions.
2. Where community standards exist, journals need to make the deposition of reusable raw data mandatory.
3. We need to embrace the publication of "failed" experiments.
4. Electronic Lab Notebooks that do not allow all data to be exported in an open, machine-actionable form should be avoided.
5. Data-intensive research must enter our curricula.

"We think there is no need to invent new file formats or technologies," says Patiny. "In principle, all the technology is there, and we need to embrace existing technologies and make them interoperable."

The authors also point out that simply storing data in an electronic lab notebook, which is the current trend, does not necessarily mean that humans and machines can reuse them. Rather, the data must be structured and published in a standardized format, and must contain enough context to enable data-driven actions.
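To illustrate what "structured, with enough context" might look like in practice, here is a minimal Python sketch. It assumes a hypothetical record layout: the class `ReactionAttempt` and field names such as `reaction_id` and `yield_percent` are invented for illustration and come neither from the perspective nor from any community standard. The sketch stores failed, partial and successful attempts side by side and round-trips them through JSON, an open format that any program can read.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical record layout for illustration only; not a community standard.
@dataclass
class ReactionAttempt:
    reaction_id: str       # identifier of the target reaction (hypothetical)
    temperature_c: float   # reaction temperature in degrees Celsius
    time_h: float          # reaction time in hours
    solvent: str
    yield_percent: float   # 0.0 for a failed attempt
    outcome: str           # "success", "partial" or "failed"
    metadata: dict = field(default_factory=dict)  # context: instrument, operator, ...


def to_json(attempts: list[ReactionAttempt]) -> str:
    """Serialize all attempts, failed ones included, to a JSON string."""
    return json.dumps([asdict(a) for a in attempts], indent=2, sort_keys=True)


def from_json(text: str) -> list[ReactionAttempt]:
    """Parse the JSON back into typed records so other software can reuse them."""
    return [ReactionAttempt(**d) for d in json.loads(text)]


attempts = [
    ReactionAttempt("rxn-001", 25.0, 2.0, "ethanol", 0.0, "failed"),
    ReactionAttempt("rxn-001", 60.0, 4.0, "ethanol", 35.0, "partial"),
    ReactionAttempt("rxn-001", 80.0, 6.0, "toluene", 82.0, "success",
                    {"instrument": "example-hplc"}),
]

exported = to_json(attempts)
restored = from_json(exported)
failed = [a for a in restored if a.outcome == "failed"]
print(len(restored), len(failed))  # → 3 1
```

Because the failed attempt is kept alongside the successful one, a model trained on such records sees which conditions did not work. A real platform would rely on an existing community standard rather than a home-grown schema like this one, in line with the authors' view that no new file formats are needed.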

"Our perspective offers a vision of what we think are the key components to bridge the gap between data and machine learning for core problems in chemistry," says Smit. "We also provide an open science solution in which EPFL can take the lead."

More information: Luc Patiny, Making the collective knowledge of chemistry open and machine actionable, Nature Chemistry (2022). DOI: 10.1038/s41557-022-00910-7. www.nature.com/articles/s41557-022-00910-7

Journal information: Nature Chemistry

Provided by Ecole Polytechnique Federale de Lausanne

Citation: Chemical data management: An open way forward (2022, April 4) retrieved 24 April 2024 from https://phys.org/news/2022-04-chemical.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.