Mining the blogosphere—Researchers develop tools that make sense of social media

Sep 06, 2012

Can a computer "read" an online blog and understand it? Several Concordia computer scientists are helping to get closer to that goal.

Leila Kosseim, associate professor in Concordia's Faculty of Engineering and Computer Science, and a recently-graduated doctoral student, Shamima Mithun, have developed a system called BlogSum that has potentially vast applications. It allows an organization to pose a question and then find out how a large number of people talking online would respond. The system is capable of gauging things like and voter intentions by sorting through websites, examining real-life self-expression and conversation, and producing summaries that focus exclusively on the original question.

"Huge quantities of electronic texts have become easily available on the Internet, but people can be overwhelmed, and they need help to find the real content hiding in the mass of information," explains Kosseim, one of the lead researchers at Concordia's Laboratory (CLaC lab).

Analyzing informally-written language poses unique challenges compared to analyzing, for example, a news article. Blogs, forums and the like contain opinions, emotions and , not to mention spelling errors and poor grammar. A summarization tool must address two particular problems, question irrelevance ( that are not relevant to the main question), and discourse incoherence, (sentences in which the intent of the writer is unclear).

BlogSum met these challenges with demonstrable efficiency. The researchers developed and tested their tool by examining a set of blogs and review sites. BlogSum used "discourse relations" to crunch the data – ways of filtering and ordering sentences into coherent summaries. BlogSum was measured against prior computational rankings and achieved mostly superior results. In addition, it was evaluated by actual human subjects, who also found it to be superior. Summaries produced by BlogSum reduced question irrelevance and discourse incoherence, successfully distilling large amounts of text into highly readable summaries.

This study is an example of Natural Language Processing (NLP), in which Concordia, through the CLaC lab, is a leader. NLP stands at the intersection of artificial intelligence and linguistics, seeking to enable computers to derive meaning from human language.

"The field of natural language processing is starting to become fundamental to , with many everyday applications – making search engines find more relevant documents or making smart phones even smarter," explained Kosseim.

Explore further: Computerized emotion detector

Related Stories

Can't Make it to a Meeting? Send a Computer Instead

Aug 06, 2009

(PhysOrg.com) -- If you’ve ever wished you had an assistant to attend meetings with you, take notes and produce a concise summary, then you’ll be pleased to know that UT Dallas computer scientist Yang ...

Mining the language of science

Nov 18, 2011

(PhysOrg.com) -- Scientists are developing a computer that can read vast amounts of scientific literature, make connections between facts and develop hypotheses.

Machines to compare notes online?

Jul 15, 2011

The best way for autonomous machines, networks and robots to improve in future will be for them to publish their own upgrade suggestions on the Internet. This transparent dialogue will help humans to both guide and trust ...

Recommended for you

Computerized emotion detector

15 hours ago

Face recognition software measures various parameters in a mug shot, such as the distance between the person's eyes, the height from lip to top of their nose and various other metrics and then compares it with photos of people ...

Cutting the cloud computing carbon cost

Sep 12, 2014

Cloud computing involves displacing data storage and processing from the user's computer on to remote servers. It can provide users with more storage space and computing power that they can then access from anywhere in the ...

Teaching computers the nuances of human conversation

Sep 12, 2014

Computer scientists have successfully developed programs to recognize spoken language, as in automated phone systems that respond to voice prompts and voice-activated assistants like Apple's Siri.

Mapping the connections between diverse sets of data

Sep 12, 2014

What is a map? Most often, it's a visual tool used to demonstrate the relationship between multiple places in geographic space. They're useful because you can look at one and very quickly pick up on the general ...

User comments : 0