Big data for text: Next-generation text understanding and analysis

March 7, 2016

News portals and social media are rich sources of information, for example for predicting stock market trends. Today, numerous service providers let users search large text collections by feeding descriptive keywords into their search engines. Keywords tend to be highly ambiguous, though, and quickly reveal the limits of current search technologies. Computer scientists from Saarbruecken have developed a novel text analysis technology that uses artificial intelligence to considerably improve searching in very large text collections.

Beyond search, this technology also assists authors in researching and even in writing texts by automatically providing background information and suggesting links to relevant web sites. In the age of business smartphones and enterprise chat rooms, most information in companies is distributed not through the spoken word but through e-mails, databases, and internal news portals. "According to a survey by the market analyst Gartner, a mere quarter of all companies are using automatic methods to analyze their textual information. By 2021, Gartner predicts 65 per cent will do so. This is because the amount of data inside companies is continuously growing and hence, it becomes more and more costly to have it structured and to search it successfully," says Johannes Hoffart, a researcher at the Max Planck Institute for Informatics and founder of Ambiverse. His team developed a novel text analysis technology for analyzing huge amounts of text, in which massive computing power and artificial intelligence (AI) are continuously "thinking along" in the background.

"For analyzing texts, we rely on extremely large knowledge graphs which are built upon freely available sources such as Wikipedia or large media portals on the web. These graphs can be augmented with domain- or company-specific knowledge, such as product catalogs or customer correspondences," says Hoffart. By applying complex algorithms, these texts are screened further and analyzed with linguistic tools. "Our software then assigns companies and areas of business to their corresponding categories, which allows us to gather valuable insights on how well one's own products are positioned in the market in comparison to those of the competitors," he explains. Particularly challenging hereby is the fact that product or company names are anything but unique and tend to have completely different meanings in different contexts, making them highly ambiguous.

"Our technology helps to map words and phrases to their correct objects of the real-world, that way resolving ambiguities automatically," explains the computer scientist. "Paris" for example stands for the city of light and the French capital, but also for a figure from Greek mythology or a millionfold-mentioned party girl with German ancestors - always depending on context. "Efficiently searching huge text collections is only possible if the different meanings of a name or a concept are correctly resolved," says Hoffart. The smart search engine developed by his team continuously learns and improves over time, and also automatically associates new text entries to matching categories. "These algorithms are hence attractive for companies that analyze online media or social networks to measure the degree of brand awareness for a product or the success of a marketing campaign," says Hoffart further.

At Cebit, Ambiverse will also present a smart authoring platform that assists authors in researching and writing texts. Users who enter texts are automatically provided with background information, for example company-internal guidelines and manuals or web links. "Relevant concepts are linked automatically and links for further research are shown," says the computer scientist.

Visitors to the Ambiverse booth at Cebit (hall 6, booth 28) will also have the opportunity to compete against the company's novel AI technology in a question-answering game. Ambiverse is funded by the German Federal Ministry for Economic Affairs through an EXIST Transfer of Research grant.

Ambiverse, a spin-off company from the Max Planck Institute for Informatics in Saarbruecken, will be presenting this novel technology during Cebit 2016 in Hannover from 14 to 18 March at Saarland's research booth.
