Researchers Use Wikipedia To Make Computers Smarter

Jan 06, 2007

Using Wikipedia, Technion researchers have developed a way to give computers knowledge of the world to help them “think smarter,” making common sense and broad-based connections between topics just as the human mind does. The new method will help computers filter e-mail spam, perform Web searches and even conduct intelligence gathering at more sophisticated levels than current programs.

Researchers at the Technion-Israel Institute of Technology have found a way to give computers encyclopedic knowledge of the world to help them “think smarter,” making common sense and broad-based connections between topics just as the human mind does.

The new method will help computers filter e-mail spam, perform Web searches and even conduct electronic intelligence gathering at a much more sophisticated level than current programs, according to researchers Evgeniy Gabrilovich and Shaul Markovitch of the Technion Faculty of Computer Science. The findings will be presented next week in Hyderabad, India, during the Twentieth International Joint Conference on Artificial Intelligence.

The program devised by the Technion researchers helps computers map single words and larger fragments of text to a database of concepts built from the online encyclopedia Wikipedia, which has over one million articles in its English-language version. The Wikipedia-based concepts act as “background knowledge” to help computers figure out the meaning of the text entered into a Web search, for instance.

Giving computers this deeper knowledge has been a long-standing problem in artificial intelligence, according to Markovitch. “Humans use a significant amount of background knowledge” to understand text, “but we didn’t know how to have computers access such knowledge,” he said.

Most Web search and e-mail filter programs appear smart by calculating how often certain words appear in two texts, Markovitch explained. “But what is common to all these applications is that the programs that actually do this kind of thing don’t understand text. They treat text as a collection of words, but they don’t understand the meaning of words.”
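The word-counting approach Markovitch describes can be illustrated with a short sketch. This is not code from any particular search engine or spam filter; it is a minimal bag-of-words comparison, and the function names and example sentences are made up for illustration. Two texts are scored by how many words they share, so two sentences with the same meaning but different vocabulary score zero.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example sentences, used only to illustrate the idea.
original = word_counts("the cat chased the mouse")
same_words = word_counts("a cat chased a mouse")
same_meaning = word_counts("a kitten pursued a rodent")

print(cosine_similarity(original, same_words))    # shares "cat", "chased", "mouse"
print(cosine_similarity(original, same_meaning))  # shares no words at all
```

The second comparison comes out as exactly zero even though a human reader would call the sentences near-paraphrases, which is the limitation the Technion work aims to address.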

This shallow understanding is what makes an e-mail spam filter block all messages containing the word “vitamin,” but fail to block messages containing the word “B12.” “If the program never saw ‘B12’ before, it’s just a word without any meaning. But you would know it’s a vitamin,” Markovitch said.

“With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that ‘B12’ is strongly associated with the concept of vitamins, and will correctly identify the message as spam,” he added.
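The B12 example can be sketched in a few lines of code. The sketch below is only an illustration of the idea, not the researchers' program: the small `CONCEPT_DB` table is a hand-written, hypothetical stand-in for a concept database built from Wikipedia, and every weight, concept name and threshold in it is invented.

```python
from collections import defaultdict

# Hypothetical stand-in for a Wikipedia-derived concept database.
# Each word maps to concepts with an invented association weight.
CONCEPT_DB = {
    "vitamin": {"Vitamin": 1.0, "Dietary supplement": 0.6},
    "b12": {"Vitamin": 0.9, "Vitamin B12": 1.0},
    "pills": {"Dietary supplement": 0.7, "Medication": 0.8},
    "meeting": {"Meeting": 1.0},
    "agenda": {"Meeting": 0.7},
}


def concept_vector(text):
    """Sum the concept weights of every known word in the text."""
    vector = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in CONCEPT_DB.get(word, {}).items():
            vector[concept] += weight
    return dict(vector)


def looks_like_spam(text, blocked_concept="Vitamin", threshold=0.5):
    """Flag a message whose text is strongly tied to the blocked concept.

    The blocked concept and the threshold are assumptions for this sketch.
    """
    return concept_vector(text).get(blocked_concept, 0.0) >= threshold


print(looks_like_spam("cheap B12 pills"))          # never mentions "vitamin"
print(looks_like_spam("meeting agenda attached"))  # unrelated to vitamins
```

The first message is flagged even though the word "vitamin" never appears in it, because "B12" carries weight on the "Vitamin" concept in the toy table.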

Or, computers could look at a chunk of text about Saddam Hussein and weapons of mass destruction and know that it is conceptually related to topics such as the Iraq war and U.S. Senate debates on intelligence—even if those terms do not appear anywhere in the original text.

The method also helps computers figure out ambiguous terms—deciding, for instance, whether the word “mouse” refers to the computer device or the fuzzy animal. This can be especially important in translated documents, Markovitch said.
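One simple way to picture this kind of disambiguation is to compare the words around an ambiguous term with cue words for each possible meaning. The sketch below does that with a hand-made table; the sense labels and cue words are invented for illustration, and the approach is a simplified overlap count rather than the method the researchers describe.

```python
import re

# Hypothetical cue words for each sense of "mouse" (invented for this sketch).
SENSES = {
    "mouse (computer device)": {"click", "cursor", "usb", "keyboard", "computer"},
    "mouse (animal)": {"cheese", "cat", "tail", "fur", "rodent"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears in the sentence at all.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Plug the mouse into the USB port next to the keyboard"))
print(disambiguate("The cat chased the mouse and grabbed its tail"))
print(disambiguate("I saw a mouse"))
```

A sentence with no cue words, such as the last one, is left undecided instead of being forced into one meaning.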

In the near future, the Technion researchers hope to improve their method by adding information from the Web page links inside Wikipedia articles. They are already pursuing a patent on their work, which they say will be of interest to the intelligence community and Web search engine companies, among others.

Source: American Technion Society
