Through analysis of 'named entities', computers can extract more information from texts

May 23, 2014 by Kim Bekmann

Mena B. Habib, a researcher at the University of Twente's CTIT research institute, teaches computers to improve their reading comprehension. He developed a method by which computers can detect and interpret 'named entities' in a text. These are, for example, names of people, places and organizations, whose meaning depends on the context. Habib's method allows computers to analyze the context and thus determine what is meant by the named entity.

Named entities

Maurice van Keulen, senior lecturer for Data Management Technology at the University of Twente, supervised Habib during his doctoral research. He explains: "An example of a named entity is rijksmuseum. The context determines which 'rijksmuseum' (national museum) is referred to. This may be related to the author, the subject of discussion, what was said before or after and sometimes even the location or the time. If the author lives in Enschede, then he or she is probably referring to the 'rijksmuseum' in Enschede. But he or she could also be referring to one of the many other national museums in the Netherlands. Another example is Paris Hilton: does this refer to the celebrity, the hotel in Paris, or something else?" With Habib's method, the computer detects which part of the text is a named entity and what is meant by that named entity.
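
The idea that context decides which entity a name refers to can be illustrated with a minimal Python sketch. This is not Habib's method, which the article does not describe in detail. It is a toy keyword-overlap approach, and the candidate list, the keywords and the function names are all invented here for illustration.

```python
import re

# Hypothetical toy knowledge base (invented for this illustration):
# each ambiguous name maps to candidate entities, and each candidate
# is described by a handful of context keywords.
CANDIDATES = {
    "paris hilton": {
        "Paris Hilton (celebrity)": {"celebrity", "heiress", "television", "party"},
        "Hilton hotel in Paris": {"hotel", "room", "booking", "stay", "france"},
    },
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention, text):
    """Pick the candidate whose keywords overlap most with the text.

    Returns None if the mention is unknown or if no candidate shares
    any keyword with the surrounding text.
    """
    candidates = CANDIDATES.get(mention.lower())
    if not candidates:
        return None
    words = tokenize(text)
    best, best_score = None, 0
    for entity, keywords in candidates.items():
        score = len(keywords & words)  # number of shared context words
        if score > best_score:
            best, best_score = entity, score
    return best


if __name__ == "__main__":
    print(disambiguate("Paris Hilton",
                       "We booked a room at the Paris Hilton for our stay in France."))
    print(disambiguate("Paris Hilton",
                       "Paris Hilton appeared on television last night."))
```

In this sketch the first sentence shares the words "room", "stay" and "france" with the hotel candidate, and the second shares "television" with the celebrity candidate. A real system would draw on far richer context, such as the author, the location and the time mentioned in the quote above.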

Reading comprehension

There is considerable demand for new methods to extract information from texts. At present, computers can already retrieve quite a lot of information from texts, including the mood and even the age of the writer. Van Keulen: "These techniques are often based on a superficial analysis of plain words. As a result, most of the information remains 'hidden' and is only accessible to computers to a limited extent, unless they learn to read with understanding. With greater understanding of the entities referred to and the information available about these entities, computers are able to extract much more information from texts for analysis purposes."


Van Keulen: "We are involved in a number of projects in which we will apply the method. For the TEC4SE project, for example, we will use the software in the emergency rooms of the Twente fire brigade and police. At major events, the emergency services would like to be aware of what is happening. For example, if there is a disturbance, it is interesting to monitor a channel like Twitter. Our software can read all tweets with some understanding, and is thus able to better detect where and when something is wrong."

Van Keulen: "Habib made sure his method is as strong and robust as possible. The method also works well even if you do not have a lot of texts available to learn from. In addition, his approach is language independent: it doesn't only work for texts in Dutch; it works for texts in any language."

With this research, Habib won the Making Sense of Microposts challenge (#Microposts2013) in 2013 and came second in 2014. This challenge is an international competition in which research groups perform a joint task with their research prototypes.

The title of Mena Badieh Habib Morgan's PhD thesis is: 'Named Entity Extraction and Disambiguation for Informal Text - The Missing Link'. Habib will defend his PhD thesis on 9 May at the Databases department of the University of Twente CTIT research institute. He conducted his research under the supervision of dr. ir. Maurice van Keulen and prof. dr. Peter Apers.

