Researchers Use Wikipedia To Make Computers Smarter

January 6, 2007


Researchers at the Technion-Israel Institute of Technology have found a way to give computers encyclopedic knowledge of the world to help them “think smarter,” making common sense and broad-based connections between topics just as the human mind does.

The new method will help computers filter e-mail spam, perform Web searches and even conduct electronic intelligence gathering at a much more sophisticated level than current programs, according to researchers Evgeniy Gabrilovich and Shaul Markovitch of the Technion Faculty of Computer Science. The findings will be presented next week in Hyderabad, India, at the Twentieth International Joint Conference on Artificial Intelligence.

The program devised by the Technion researchers helps computers map single words and larger fragments of text to a database of concepts built from the online encyclopedia Wikipedia, which has over one million articles in its English-language version. The Wikipedia-based concepts act as “background knowledge” to help computers figure out the meaning of the text entered into a Web search, for instance.
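As a rough illustration of what mapping text to a database of concepts could look like, the sketch below builds a tiny word-to-concept index from three made-up "articles" and scores a piece of text against it. This is not the researchers' program: the article texts, the weighting scheme (plain relative word frequency) and the names `ARTICLES`, `build_index` and `concept_vector` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stand-ins for encyclopedia articles (concept title -> text).
# A real system would draw on the full text of Wikipedia articles.
ARTICLES = {
    "Vitamin": "vitamin b12 nutrient supplement deficiency",
    "Computer mouse": "mouse pointing device computer cursor",
    "House mouse": "mouse rodent animal fur tail",
}


def build_index(articles):
    """Map each word to {concept: weight}.

    The weight is the word's relative frequency in the concept's article,
    a simplifying assumption chosen only for this sketch.
    """
    index = defaultdict(dict)
    for concept, text in articles.items():
        words = text.lower().split()
        for word in set(words):
            index[word][concept] = words.count(word) / len(words)
    return index


def concept_vector(text, index):
    """Sum the concept weights of every known word in the text."""
    vector = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in index.get(word, {}).items():
            vector[concept] += weight
    return dict(vector)


INDEX = build_index(ARTICLES)

# "B12" never co-occurs with the literal word "vitamin" in the input,
# yet the text still lands on the "Vitamin" concept.
print(concept_vector("cheap B12 supplement", INDEX))

# An ambiguous word scores against both senses; context words tip the balance.
print(concept_vector("mouse cursor", INDEX))
```

In this toy setup, the first call returns only the "Vitamin" concept, and the second ranks "Computer mouse" above "House mouse" because "cursor" appears only in the former's text.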

Giving computers this deeper knowledge has been a long-standing problem in artificial intelligence, according to Markovitch. “Humans use a significant amount of background knowledge” to understand text, “but we didn’t know how to have computers access such knowledge,” he said.

Most Web search and e-mail filter programs appear smart by calculating how often certain words appear in two texts, Markovitch explained. “But what is common to all these applications is that the programs that actually do this kind of thing don’t understand text. They treat text as a collection of words, but they don’t understand the meaning of words.”
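The word-counting approach Markovitch describes can be sketched as a bag-of-words comparison. The example below is a generic illustration rather than the code of any particular search engine or spam filter; the function name `cosine_bow` and the sample phrases are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_bow(text_a, text_b):
    """Cosine similarity between two texts treated as bags of words."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words that appear in both texts contribute to the score.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Three of four words match, so the texts look similar.
print(cosine_bow("buy cheap vitamin pills", "buy cheap b12 pills"))  # 0.75

# No words match, so the score is zero even though the meanings are related.
print(cosine_bow("vitamin", "b12"))  # 0.0
```

The second call shows the limitation: with no shared words, the comparison has no way to register that the two terms are related.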

This shallow understanding is what makes an e-mail spam filter block all messages containing the word “vitamin,” but fail to block messages containing the word “B12.” “If the program never saw ‘B12’ before, it’s just a word without any meaning. But you would know it’s a vitamin,” Markovitch said.

“With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that ‘B12’ is strongly associated with the concept of vitamins, and will correctly identify the message as spam,” he added.

Or, computers could look at a chunk of text about Saddam Hussein and weapons of mass destruction and know that it is conceptually related to topics such as the Iraq war and U.S. Senate debates on intelligence—even if those terms do not appear anywhere in the original text.

The method also helps computers figure out ambiguous terms—deciding, for instance, whether the word “mouse” refers to the computer device or the fuzzy animal. This can be especially important in translated documents, Markovitch said.

In the near future, the Technion researchers hope to improve their method by adding information from the Web page links inside Wikipedia articles. They are already pursuing a patent on their work, which they say will be of interest to the intelligence community and Web search engine companies, among others.

Source: American Technion Society
