Researchers review the state-of-the-art text mining technologies for chemistry

June 22, 2017, Centro Nacional de Investigaciones Oncológicas (CNIO)

In a recent Chemical Reviews article, Spanish researchers have published the first exhaustive review of the state-of-the-art methodologies underlying chemical search engines, named entity recognition and text mining systems.

The rapidly growing field of big data applications in biomedical research, together with the use of machine learning and artificial intelligence technologies for text data mining, has resulted in promising tools. The authors write, "This review is organised to serve as a practical guide to researchers entering in this field but also to help them to envision the next steps in this emerging data science field."

"Through the release of Gold Standard datasets and the organisation of several community challenge benchmark events, the Biological Text Mining Unit has played a critical role in the development and evaluation of current text mining systems, as highlighted in this article," explains Martin Krallinger, head of the unit and co-first author of the review.

A huge amount of unstructured data

A considerable fraction of biomedically relevant data is available only in unstructured form. This includes the rapidly growing scientific literature, medicinal chemistry patents, electronic health records and clinical trial documents. Every year, for example, over 20,000 new compounds are published in medicinal and biological chemistry journals.

Transforming unstructured biomedical research data into structured databases, which machines can process more efficiently and humans can query more easily, is critical for a range of heterogeneous applications. These include identifying new drug targets and the chemical probes needed to validate or discard them, repurposing approved drugs, detecting adverse drug events, and building the chemical-disease and chemical-gene networks used in systems biology.
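To make the idea of turning free text into structured records concrete, here is a minimal sketch of the simplest form of chemical named entity recognition: a dictionary lookup that scans text for known compound names and returns each hit with its character offsets. It is an illustration only, not the method of the review or of any system it covers; the lexicon, the `ChemicalMention` record and the function names are hypothetical, and practical systems typically add machine-learned recognisers and chemistry-aware tokenisation on top of (or instead of) dictionary matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalMention:
    """One structured record extracted from free text (hypothetical schema)."""
    name: str   # dictionary entry that matched, in its canonical spelling
    start: int  # character offset where the mention begins
    end: int    # character offset just past the end of the mention


def build_matcher(lexicon):
    """Compile one case-insensitive regex from a list of compound names.

    Names are sorted longest first so that, at a given position, a longer
    entry such as "acetylsalicylic acid" is preferred over a shorter one.
    """
    names = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def extract_mentions(text, lexicon):
    """Turn unstructured text into a list of ChemicalMention records."""
    if not lexicon:
        return []
    # Map lower-cased surface forms back to the canonical dictionary entry.
    canonical = {n.lower(): n for n in lexicon}
    matcher = build_matcher(lexicon)
    return [
        ChemicalMention(canonical[m.group(0).lower()], m.start(), m.end())
        for m in matcher.finditer(text)
    ]


if __name__ == "__main__":
    lexicon = ["aspirin", "acetylsalicylic acid", "ibuprofen"]  # toy dictionary
    text = "Aspirin (acetylsalicylic acid) and ibuprofen are both NSAIDs."
    for mention in extract_mentions(text, lexicon):
        print(mention)
```

Each record could then be stored as a database row (document ID, compound, offsets), which is the kind of structured output the applications above depend on. A plain dictionary approach misses unseen compounds and systematic names, which is one reason the field has moved towards learned models.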

As the basis of therapeutic strategies for treating medical needs, chemical compounds constitute a key entity type of critical relevance. "The construction of large chemical knowledge bases, integrating chemical information with biological and clinical data, is crucial to identify and validate new therapeutic targets for unmet medical needs as well as to speed up the drug discovery process," says Julen Oyarzabal, director of Translational Sciences at CIMA and co-leader of this report.


More information: Martin Krallinger et al, Information Retrieval and Text Mining Technologies for Chemistry, Chemical Reviews (2017). DOI: 10.1021/acs.chemrev.6b00851
