Researchers design machine learning technique to improve consumer medical searches

Nov 17, 2010

Medical websites give consumers more access than ever to comprehensive health and medical information, but the sites become less useful when users describe conditions in a site search with unclear or unorthodox language. A group of Georgia Tech researchers has created a machine-learning model that enables the sites to "learn" dialect and other medical vernacular, thereby improving their performance for users who use such language themselves.

Called "diaTM" (short for "dialect topic modeling"), the system learns by comparing multiple medical documents written at different levels of technical language. By comparing enough of these documents, diaTM (or "dia" for short) eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the "language gap" between consumers with health questions and the medical databases they turn to for answers.

"The language gap problem seems to be the most acute in the medical domain," said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. "Providing a solution for this domain will have a high impact on maintaining and improving people's health."

To educate dia in various modes of medical language, Crain and his fellow researchers pulled publicly available documents not only from WebMD but also from Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, he said, dia can learn that the word "gunk," for example, is often a vernacular term for "discharge," and it can then appropriately process user searches that include the word "gunk."

In this initial study using small-scale experiments, the researchers found that dia can achieve a 25 percent improvement in nDCG ("normalized discounted cumulative gain"), a standard measure of how well a search engine ranks relevant results near the top of its result list. Zha, whose research focuses on Internet search engines and their related algorithms, said a 5 percent improvement in nDCG is "very significant."
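For readers unfamiliar with the metric, the following is a minimal sketch of one common form of nDCG (linear gain with a log2 rank discount); other variants exist, and the article does not say which one the researchers used. The relevance grades in the example are invented purely for illustration and are not data from the study.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each result's relevance grade is
    divided by log2(rank + 1), so relevant results count for more
    when they appear near the top of the list."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (best possible) ordering,
    giving a score between 0 and 1 for non-negative grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant)
# for the same four results under two different rankings.
baseline_ranking = [0, 2, 1, 3]   # best result buried at rank 4
improved_ranking = [3, 2, 1, 0]   # best result at rank 1

print(f"baseline nDCG: {ndcg(baseline_ranking):.3f}")
print(f"improved nDCG: {ndcg(improved_ranking):.3f}")
```

Because the score is normalized against the ideal ordering, it can be compared across queries that have different numbers of relevant documents.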

"Dia figures out enough language relationships that over time it does quite well," said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes dia. "Another benefit is we're not doing word-for-word equivalencies, so 'gunk' doesn't necessarily have to be connected to 'discharge,' as long as it's recognized that 'gunk' is related to infections."
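The idea of matching on topics rather than on word-for-word equivalents can be illustrated with a toy sketch. This is not the diaTM model itself: the two topics, the word-to-topic weights, and the helper names (`WORD_TOPICS`, `topic_vector`, `cosine`) are all hypothetical, with values made up for illustration. In a real topic model such weights would be learned from documents.

```python
import math

# Hypothetical topics and word-to-topic weights (invented values).
TOPICS = ("infection", "nutrition")
WORD_TOPICS = {
    "gunk":      {"infection": 0.8, "nutrition": 0.2},
    "discharge": {"infection": 0.9, "nutrition": 0.1},
    "vitamin":   {"infection": 0.1, "nutrition": 0.9},
    "diet":      {"infection": 0.0, "nutrition": 1.0},
}


def topic_vector(text):
    """Sum the topic weights of every known word in the text."""
    totals = [0.0] * len(TOPICS)
    for word in text.lower().split():
        weights = WORD_TOPICS.get(word, {})
        for i, topic in enumerate(TOPICS):
            totals[i] += weights.get(topic, 0.0)
    return totals


def cosine(a, b):
    """Cosine similarity between two topic vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = topic_vector("gunk")
infection_doc = topic_vector("discharge")
nutrition_doc = topic_vector("vitamin diet")

# The query shares no words with either document, yet it scores
# closer to the infection document because their topics overlap.
print(cosine(query, infection_doc), cosine(query, nutrition_doc))
```

The query and the infection document have no word in common, so an exact keyword match would fail, while the comparison in topic space still ranks the infection document first.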

Also, dia is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating dia into their search engines, Crain said one next step is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects, including a text-speak dialect (e.g. "b4" as a substitute for "before"), but the dialects were mixed in with topically related groups of words.

"We're trying to get to where you can isolate just the dialects," Crain said.

"This feature will help common users of medical websites," Zha said. "It will help enable people with a relatively low level of health literacy to access the critical medical information they need."

Provided by Georgia Institute of Technology
