Scientists working in the field of organic chemistry create and study new molecules using magnetic resonance. The standards used to transcribe the collected data are, however, specific to each laboratory or publication, which makes it difficult to export the information electronically and, in turn, for the scientific community to use it. An international team headed by chemists from the University of Geneva (UNIGE) has developed a new common electronic language that translates the data of each molecule in exactly the same way and makes it simple to export from one information system to another. This means that chemists everywhere can easily access directly reusable data, resulting in significant time savings for future research. This study, published in the journal Magnetic Resonance in Chemistry, paves the way for creating an international, open-access database and specific tools, including artificial intelligence analysis.

Organic chemists create new molecules based on carbon atoms; these are so small, however, that chemists cannot see what they synthesise. Researchers use magnetic resonance to verify these compositions, which are made "blind": every atom that makes up the molecule emits a signal, whose frequency is translated into a spectrum that the chemists can then decode. To determine the structure of a molecule, researchers must be able to "read" the magnetic resonance spectra.

Chemists have a specific vocabulary for describing spectra and detailing the resonance of the atoms. But the way the raw data is translated into a written form varies depending on the individual laboratory, the software used and the particular publication. In short, there is no database available for assigned molecular structures, and no uniformity in the way the spectra are processed and the data attributed to them. "That's why it is very difficult to reuse data generated by other laboratories," explains Damien Jeannerat, a researcher in the Department of Organic Chemistry in UNIGE's Faculty of Science. "So we came up with the idea of devising a single electronic language that can be used to switch from one system to another without losing any precision, and to build an international, open-access database."

The UNIGE chemists teamed up with field specialists and introduced a new electronic language that can serve as the standard for processing organic molecule data. "Our new format, called NMReDATA, operates according to a system of labels that are assigned to each item of data extracted from the spectra in a defined order, and which can be easily read by a computer," says Marion Pupier, a chemical engineer in the Department of Organic Chemistry at UNIGE. The frequency of each atom will be described in a sequence showing the chemical shift, the number of atoms, the couplings, the interatomic correlations and finally the assignments. "Until now, everyone has used his own sequence to transmit the same information, making electronic transfer from one computer to another impossible and forcing the researchers to monitor and constantly reorganise the information. But there will be no need to do this with our system, thanks to the uniform nature of the language," says Damien Jeannerat.
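
As a rough illustration of the labelled, fixed-order idea described above, the following Python sketch writes the data for one signal under named labels in a set sequence and reads it back without loss. The label names, the `Signal` class and the text layout are assumptions invented for this example; they are not taken from the NMReDATA specification.

```python
from dataclasses import dataclass, field

# Hypothetical label order, mirroring the sequence named in the article:
# chemical shift, number of atoms, couplings, correlations, assignment.
LABEL_ORDER = ["SHIFT", "N_ATOMS", "COUPLINGS", "CORRELATIONS", "ASSIGNMENT"]


@dataclass
class Signal:
    """One signal extracted from a spectrum (illustrative fields only)."""
    shift: float                                       # chemical shift, ppm
    n_atoms: int                                       # atoms behind the signal
    couplings: list = field(default_factory=list)      # coupling constants, Hz
    correlations: list = field(default_factory=list)   # correlated atom names
    assignment: str = ""                               # assigned atom name


def encode(sig):
    """Serialise a signal as 'LABEL=value' items in the fixed label order."""
    values = {
        "SHIFT": f"{sig.shift:.4f}",
        "N_ATOMS": str(sig.n_atoms),
        "COUPLINGS": ",".join(f"{j:.2f}" for j in sig.couplings),
        "CORRELATIONS": ",".join(sig.correlations),
        "ASSIGNMENT": sig.assignment,
    }
    return "; ".join(f"{label}={values[label]}" for label in LABEL_ORDER)


def decode(line):
    """Parse a line written by encode(), rejecting any other label order."""
    pairs = dict(item.split("=", 1) for item in line.split("; "))
    if list(pairs) != LABEL_ORDER:
        raise ValueError(f"unexpected labels or order: {list(pairs)}")
    return Signal(
        shift=float(pairs["SHIFT"]),
        n_atoms=int(pairs["N_ATOMS"]),
        couplings=[float(j) for j in pairs["COUPLINGS"].split(",") if j],
        correlations=[c for c in pairs["CORRELATIONS"].split(",") if c],
        assignment=pairs["ASSIGNMENT"],
    )
```

Because every writer emits the same labels in the same order, any reader can parse a record without knowing which laboratory or software produced it, which is the property the researchers describe.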

The idea of a common electronic language is closely linked to the desire to create open-access databases. "This would enable chemists to find the exact composition of the molecules they're studying without having to re-do work that has already been done in the past," says Marion Pupier. The information will be visible and available anywhere and at any time, saving research considerable time and money.

All that now remains is to disseminate the new format and to establish it as the norm for publishing articles in the major international journals. "We hope that all the software will be fully operational in around a year, and that NMReDATA will be used by everyone," says Jeannerat by way of conclusion.

More information: Marion Pupier et al, NMReDATA, a standard to report the NMR assignment and parameters of organic compounds, Magnetic Resonance in Chemistry (2018). DOI: 10.1002/mrc.4737