Toward language inference in medicine

October 29, 2018 by Chaitanya Shivade, IBM
Prompt shown to clinicians for annotations. Credit: IBM

Recent times have witnessed significant progress in natural language understanding by AI, such as machine translation and question answering. A vital reason behind these developments is the creation of datasets, which machine learning models use to learn to perform a specific task. Construction of such datasets in the open domain often begins with text originating from news articles. This is typically followed by collection of human annotations from crowd-sourcing platforms such as CrowdFlower or Amazon Mechanical Turk.

However, language used in specialized domains such as medicine is entirely different. The vocabulary used by a physician while writing a clinical note is quite unlike the words in a news article. Thus, annotation in these knowledge-intensive domains cannot be crowd-sourced, since such annotations demand domain expertise. Collecting annotations from domain experts, however, is very expensive. Moreover, clinical data is privacy-sensitive and hence cannot be shared easily. These hurdles have inhibited the creation of language datasets in the medical domain. Owing to these challenges, validation of high-performing algorithms from the open domain on medical text remains uninvestigated.

In order to address these gaps, we worked with the Massachusetts Institute of Technology to build MedNLI, a dataset annotated by doctors, targeting the natural language inference (NLI) task and grounded in the medical history of patients. Most importantly, we make it publicly available for researchers to advance natural language processing in medicine.

We worked with the MIT Critical Data research labs to construct a dataset for natural language inference in medicine. We used clinical notes from their "Medical Information Mart for Intensive Care" (MIMIC) database, which is arguably the largest publicly available database of patient records. The clinicians in our team suggested that the past medical history of a patient contains vital information from which useful inferences can be drawn. Therefore, we extracted the past medical history from clinical notes in MIMIC and presented a sentence from this history as a premise to a clinician. They were then requested to use their medical expertise and generate three sentences: a sentence that was definitely true about the patient, given the premise; a sentence that was definitely false, and finally a sentence that could possibly be true.
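The annotation protocol above can be sketched as a simple data structure: one premise from the past medical history paired with three clinician-written hypotheses, one per inference label. The sentences below are invented for illustration, not drawn from MIMIC:

```python
# One annotation unit: a premise sentence from a patient's past medical
# history paired with three clinician-written hypotheses, one per NLI label.
# (Illustrative, invented sentences -- not actual MIMIC data.)

premise = "Patient has a history of type 2 diabetes mellitus."

annotation = [
    # Definitely true, given the premise
    {"hypothesis": "The patient has an endocrine disorder.", "label": "entailment"},
    # Definitely false, given the premise
    {"hypothesis": "The patient has no chronic illnesses.", "label": "contradiction"},
    # Possibly true, given the premise
    {"hypothesis": "The patient takes metformin daily.", "label": "neutral"},
]

# Flatten into (premise, hypothesis, label) triples, the usual NLI format
pairs = [(premise, a["hypothesis"], a["label"]) for a in annotation]
```

Each premise thus yields exactly three labeled premise-hypothesis pairs, which is why 4,683 premises produce roughly three times as many pairs in the final dataset.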

Over a few months, we randomly sampled 4,683 such premises and worked with four clinicians to construct MedNLI, a dataset of 14,049 premise-hypothesis pairs. In the open domain, other examples of similarly built datasets include the Stanford Natural Language Inference dataset, which was curated with the help of 2,500 workers on Amazon Mechanical Turk and consists of 0.5M premise-hypothesis pairs whose premise sentences were drawn from captions of Flickr photos. MultiNLI is another such dataset; it consists of premise text from specific genres such as fiction, blogs, and phone conversations.

Dr. Leo Anthony Celi (Principal Scientist for MIMIC) and Dr. Alistair Johnson (Research Scientist) from MIT Critical Data worked with us for making MedNLI publicly available. They created the MIMIC Derived Data repository, to which MedNLI acted as the first dataset contribution. Any researcher with access to MIMIC can also download MedNLI from this repository.

Although of a modest size compared with the open-domain datasets, MedNLI is large enough to inform researchers as they develop new machine learning models for language inference in medicine. Most importantly, it presents interesting challenges that call for innovative ideas. Consider a few examples from MedNLI:

[Figure: example premise-hypothesis pairs from MedNLI. Credit: IBM]

In order to conclude entailment in the first example, one should be able to expand the abbreviations ALT, AST, and LFTs; understand that they are related; and further conclude that an elevated measurement is abnormal. The second example depicts a subtle inference of concluding that emergence of an infant is a description of its birth. Finally, the last example shows how common world knowledge is used to derive inferences.

State-of-the-art deep learning algorithms can perform well on language tasks because they have the potential to become very good at learning an accurate mapping from inputs to outputs. Thus, training on a large dataset with crowd-sourced annotations is often a recipe for success. However, these models still lack generalization capabilities in conditions that differ from those encountered during training. This is even more challenging in specialized and knowledge-intensive domains such as medicine, where training data is limited and language is much more nuanced.

Finally, although great strides have been made in learning a task end-to-end, there is still a need for additional techniques that can incorporate expert curated knowledge bases into these models. For example, SNOMED-CT is an expert curated medical terminology with 300K+ concepts and relations between the terms in its dataset. Within MedNLI, we made simple modifications to existing deep neural network architectures to infuse knowledge from knowledge bases such as SNOMED-CT. However, a large amount of knowledge still remains untapped.
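One simple way to infuse terminology knowledge of this kind is to retrofit word vectors toward the embeddings of their knowledge-base neighbors, in the style of Faruqui et al.'s retrofitting method. The sketch below is an illustration under the assumption that SNOMED-CT relations have already been extracted into a neighbor map; the two-dimensional vectors and the neighbor lists are toy placeholders, not the architecture modifications used in the MedNLI paper:

```python
import numpy as np

def retrofit(vectors, neighbors, iterations=10, alpha=1.0, beta=1.0):
    """Pull each word's vector toward the average of its knowledge-base
    neighbors while staying close to the original embedding.

    vectors:   dict word -> np.ndarray, the original embeddings
    neighbors: dict word -> list of related words (e.g. from SNOMED-CT)
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, related in neighbors.items():
            related = [r for r in related if r in new]
            if not related:
                continue
            # Weighted combination of the original vector (weight alpha)
            # and the current vectors of its neighbors (weight beta each)
            agg = alpha * vectors[word] + beta * sum(new[r] for r in related)
            new[word] = agg / (alpha + beta * len(related))
    return new

# Toy example: suppose the knowledge base relates "hepatic" and "liver";
# retrofitting moves their vectors closer together.
vecs = {"hepatic": np.array([1.0, 0.0]), "liver": np.array([0.0, 1.0])}
rels = {"hepatic": ["liver"], "liver": ["hepatic"]}
fitted = retrofit(vecs, rels)
```

After retrofitting, related medical terms end up nearer in the embedding space, which gives a downstream inference model a head start on relationships (such as ALT and AST both being liver function tests) that are rare or absent in the training text.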

We hope MedNLI opens up new directions of research in the natural language processing community.
