Linguist uses Internet to study how we say things

January 4, 2010 By Linda Glaser

Mats Rooth, a Cornell linguist, will use software to study distinctions of prosody (rhythm, stress and intonation) in language by hunting for word patterns on the Internet.

How would you analyze the contents of a million books? Or a million podcasts? Mats Rooth, Cornell professor of linguistics and computing and information sciences, will do it by using software to search for word patterns in text transcriptions of audio and video files.
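The article does not describe the project's actual software, so the following is only a minimal sketch of what searching transcriptions for a word pattern might look like. The time-aligned transcript format, the sample sentences and the `find_pattern` helper are all assumptions made for illustration; the search phrase "than I did" is the one used in the project's sample dataset.

```python
import re

# Hypothetical time-aligned transcript: (start_seconds, text) per segment.
# Neither this format nor the helper below comes from Rooth's project.
SEGMENTS = [
    (0.0, "She finished the race faster than I did."),
    (4.2, "Nobody expected that result."),
    (7.9, "He liked it more than I did, honestly."),
]


def find_pattern(segments, phrase):
    """Return the (start_seconds, text) segments that contain the phrase.

    The phrase is matched as whole words, ignoring case and allowing any
    amount of whitespace between its words.
    """
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return [(start, text) for start, text in segments if pattern.search(text)]


hits = find_pattern(SEGMENTS, "than I did")
for start, text in hits:
    # The start time tells a researcher where to cut the matching audio clip.
    print(f"{start:6.1f}s  {text}")
```

In a sketch like this, the start times of the matching segments would point back to the stretch of audio where each example was spoken, which is the part a prosody researcher would actually need to listen to.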

Rooth is one of eight winners of an international competition, Digging into Data, that challenged scholars to devise innovative humanities and social science research projects using large-scale data analysis. His project, Harvesting Speech Datasets for Linguistic Research on the Web, is based on a pilot project Rooth conducted with graduate student Jonathan Howell. It will look at distinctions of prosody (rhythm, stress and intonation) in spoken language.

According to Rooth, native speakers easily identify what prosody is appropriate in a given sentence, but hypotheses explaining why people have this ability have been difficult to prove because it is hard to find enough examples of a given phenomenon. "Many of the things we study are so immediate and yet so subtle," he said.

Using the Internet to harvest hundreds or thousands of examples of spontaneous rather than lab-created use of word patterns will enable researchers to evaluate theories about the form and meaning of prosody on an unprecedented scale. Rooth expects his project to have a transformative effect on the understanding of prosody.

"I'm very excited," Rooth said. "It's a new methodology, and we think a lot of new information will come out."

Four leading research agencies sponsored the Digging into Data competition, with the intention of encouraging international partnerships: the National Endowment for the Humanities, the National Science Foundation, the United Kingdom's Joint Information Systems Committee, and Canada's Social Sciences and Humanities Research Council. Approximately $2 million will be divided among the eight winners.

Linguist Michael Wagner of McGill University is Rooth's international partner on the project. The Cornell team will be responsible for data retrieval and programming, while McGill researchers will focus on data analysis.

The computer programs, datasets and research products developed in the project will be openly available to the research community via a Web site, … ody/Prosody+Datasets . The Web site already contains a sample dataset which, when played, provides a fascinating cacophony of voices saying "than I did," demonstrating the wide range of meaning arising from varied intonation.
