Fast algorithm extracts, compares document meaning

September 25, 2012

A computer program could compare two documents and spot the differences in their meaning using a fast semantic algorithm developed by information scientists in Poland.

Writing in the International Journal of Intelligent Information and Database Systems, Andrzej Sieminski of the Technical University of Wroclaw explains that extracting meaning and calculating the level of semantic similarity between two pieces of text is a very difficult task. Various methods have been proposed for addressing this problem, but they all suffer from complexity, he says.

Sieminski has now attempted to reduce this complexity by merging a computationally efficient approach to text analysis with a semantic component. Tests of the algorithm on English and Polish texts worked well. The test set consisted of 4,890 English sentences with 142,116 words and 11,760 Polish sentences with 184,524 words, scraped from online services via their newsfeeds over the course of five days. Sieminski points out that the algorithm used on the Polish documents required an additional level of sophistication in terms of computing word meanings and disambiguation.

Traditional "manual" methods of indexing simply cannot cope with the vast quantities of information now generated on a daily basis by humanity as a whole, and in scientific research more specifically. The new algorithm, once optimised, could radically change the way in which we make archived documents searchable and allow knowledge to be extracted far more readily than is possible with standard indexing and search tools.

The approach also circumvents three critical problems faced by most users of conventional search engines. First, many users are unfamiliar with the advanced search options of search engines; with a semantic algorithm, advanced options become almost unnecessary. Second, those options are rigid and unable to catch the subtle nuances of a user's information needs; again, a tool that understands the meaning of a search and the meaning of the results it offers avoids this problem. Finally, users are unwilling to type a long query, or find the time needed to do so unacceptably long; a semantically aware tool will require only simple input.

Sieminski points out that the key virtue of the research is the idea of using statistical similarity measures to assess semantic similarity. He explains that the semantic similarity of words could be inferred from the WordNet database, and he proposes using this database only during text indexing. "Indexing is done only once so the inevitably long processing time is not an issue," he says. "From that point on we use only statistical algorithms, which are fast and high performance."
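
The split described above, a slow semantic step at indexing time followed by fast statistical comparison, can be illustrated with a minimal sketch. This is not Sieminski's algorithm, whose details are not given here. It assumes a tiny hand-made dictionary (the hypothetical `TOY_CONCEPTS`) standing in for a WordNet lookup, and it uses cosine similarity of word counts as one common statistical measure; the function names are likewise illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a WordNet lookup (an assumption for illustration):
# each word is mapped to a shared concept label for its group of synonyms.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle",
    "fast": "quick", "rapid": "quick", "quick": "quick",
}

def index_text(text):
    """One-off indexing step: tokenize, then replace words by concept labels."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(TOY_CONCEPTS.get(tok, tok) for tok in tokens)

def cosine_similarity(a, b):
    """Purely statistical step: cosine of two concept-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Indexing happens once per document; comparisons afterwards touch only counts.
doc1 = index_text("A fast car")
doc2 = index_text("A rapid automobile")
doc3 = index_text("A slow train")

print(cosine_similarity(doc1, doc2))  # synonyms collapse to the same concepts
print(cosine_similarity(doc1, doc3))  # only the article "a" is shared
```

In this toy setup the first pair scores higher than the second even though the two sentences share no content words, because the synonym lookup was applied once during indexing and the comparison itself involves no dictionary access.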

More information: "Fast algorithm for assessing semantic similarity of texts" in Int. J. Intelligent Information and Database Systems, 2012, 6, 495-512

3 comments

Squirrel
not rated yet Sep 26, 2012
A copy of the paper can be found here: http://www.alphag...eCode=en
jason_ratzlaff
not rated yet Sep 26, 2012
Google has had this for a few days ;)
beedaan
not rated yet Sep 27, 2012
Google has had this for a few days ;)

Do you have a link?
