Team develops software for automatic summarization of long texts

August 7, 2014

While long-form writing, epic cinematic tales and hefty tomes have had something of a renaissance recently, the continued popularity of the so-called microblogging platform Twitter and other such tools highlights the fact that many people still like to be very succinct. The terse commentary, the abstract, the executive summary: all still favored by many of us at some time or in some context.

Moreover, who has the time to read long texts when a handful of pithy sound bites is all that is needed? Thankfully, researchers in India are developing new software that can make long-winded prose short and sweet.

Esther Hannah of St. Joseph's College of Engineering in Sholinganallur and Saswati Mukherjee of Anna University, Guindy Campus, both in Chennai, have developed a classification-based summarization model that performs automatic summarization of text documents. The direct application of the software will be to remove extraneous "noise" sentences from bulk text, allowing faster and more efficient text mining. Of course, the same summarization would also allow a reader to extract the salient points from any given text. The team suggests that the automatic summarization is comparable to the work of an expert editor in terms of removing redundancy and irrelevance.

The team "trained" their software with 60% of the 105 English-language documents from the Document Understanding Conference (DUC-2002), checking and correcting the errors the algorithm made along the way and thus teaching the software what would be an appropriate summarization and what would not. They then tested it on the remaining documents at various levels of summarization: 10%, 20% and 30%. The system works well with grammatically well-constructed documents, even very long ones, but has problems if the text contains extensive mathematical and scientific symbols.
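To make the numbers concrete, the split and the summarization levels can be sketched in a few lines of Python. This is only an illustration: the article does not say how the documents were assigned to training and testing, nor how a percentage level was turned into a sentence count, so the function names `split_corpus` and `summary_length`, the in-order split and the round-down rule are all assumptions.

```python
def split_corpus(doc_ids, train_percent=60):
    """Split a list of document ids into training and test lists.

    Assumption: documents are taken in order; the article does not
    describe how the 60% used for training was chosen.
    """
    n_train = len(doc_ids) * train_percent // 100
    return doc_ids[:n_train], doc_ids[n_train:]


def summary_length(n_sentences, level_percent):
    """Number of sentences to keep at a given summarization level.

    Assumption: the level is a percentage of the sentence count,
    rounded down, with at least one sentence always kept.
    """
    return max(1, n_sentences * level_percent // 100)


# 105 documents, as in the DUC-2002 set described above.
doc_ids = list(range(105))
train_ids, test_ids = split_corpus(doc_ids)
print(len(train_ids), len(test_ids))

# Sentences kept from a hypothetical 40-sentence document at each level.
for level in (10, 20, 30):
    print(level, summary_length(40, level))
```

On these assumptions, 63 documents go to training and 42 are held back for testing.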

Precision was optimal at summarization levels between 20% and 30%, which is a significant reduction in text length for parsing by text-mining software. The team obtained a precision value of about 0.65, which is significantly better than fuzzy logic summarization, which scores around 0.47, and far better than the built-in summarization tool in Microsoft Word 2007, which scores a little over 0.46.
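The article does not define how precision was measured. A common reading in extractive summarization is the fraction of sentences chosen by the system that also appear in a reference summary; the short Python sketch below assumes that definition, and the function name `precision` and the sentence indices are purely hypothetical.

```python
def precision(selected, reference):
    """Fraction of system-selected sentences that are in the reference summary.

    Assumption: sentences are identified by their index in the document,
    and precision is |selected AND reference| / |selected|.
    """
    selected = set(selected)
    reference = set(reference)
    if not selected:
        return 0.0
    return len(selected & reference) / len(selected)


# Hypothetical example: the system keeps sentences 0, 2, 5 and 7,
# while a human-written reference extract keeps 0, 2, 3 and 7.
score = precision([0, 2, 5, 7], [0, 2, 3, 7])
print(score)
```

Under this reading, a score of 0.65 would mean that roughly two out of every three sentences the software keeps are ones the reference summary keeps too.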


More information: Esther Hannah, M. and Mukherjee, S. (2014) 'A classification-based summarization model for summarising text documents', Int. J. Information and Communication Technology, Vol. 6, Nos. 3/4, pp.292–308.
