Linguist uses Internet to study how we say things

Jan 04, 2010 By Linda Glaser

Mats Rooth, a Cornell linguist, will use software to study distinctions of prosody (rhythm, stress and intonation) in language by hunting for word patterns on the Internet.

How would you analyze the contents of a million books? Or a million podcasts? Mats Rooth, Cornell professor of linguistics and computing and information sciences, will do it by using software to search for word patterns in text transcriptions of audio and video files.

Rooth is one of eight winners of an international competition, Digging into Data, that challenged scholars to devise innovative humanities and social science research projects using large-scale data analysis. His project, Harvesting Speech Datasets for Linguistic Research on the Web, is based on a pilot project Rooth conducted with graduate student Jonathan Howell. It will look at distinctions of prosody (rhythm, stress and intonation) in spoken language.

According to Rooth, native speakers easily identify what prosody is appropriate in a given sentence, but hypotheses explaining why people have this ability have remained controversial and hard to prove because it is difficult to identify enough examples of a given phenomenon. "Many of the things we study are so immediate and yet so subtle," he said.

Using the Internet to harvest hundreds or thousands of examples of spontaneous rather than lab-created use of word patterns will enable researchers to evaluate theories about the form and meaning of prosody on an unprecedented scale. Rooth expects his project to have a transformative effect on the understanding of prosody.

"I'm very excited," Rooth said. "It's a new methodology, and we think a lot of new information will come out."

Four leading research agencies sponsored the Digging into Data competition, with the intention of encouraging international partnerships: the National Endowment for the Humanities, the National Science Foundation, the United Kingdom's Joint Information Systems Committee, and Canada's Social Sciences and Humanities Research Council. Approximately $2 million will be divided among the eight winners.

Linguist Michael Wagner of McGill University is Rooth's international partner on the project. The Cornell team will be responsible for data retrieval and programming, while McGill researchers will focus on data analysis.

The computer programs, datasets and research products developed in the project will be openly available to the research community via a Web site,… ody/Prosody+Datasets . The Web site already contains a sample dataset which, when played, provides a fascinating cacophony of voices saying "than I did," demonstrating the wide range of meaning arising from varied intonation.
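As a rough illustration of the kind of search described above, the sketch below looks for a word pattern such as "than I did" in time-aligned transcript segments and reports where each match occurs in the audio. It is not the project's actual software: the `Segment` class, the `find_phrase` function, the file names and the sample transcripts are all hypothetical, invented only for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of audio (hypothetical structure)."""
    source: str   # audio or video file the text came from
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed words


def find_phrase(segments, phrase):
    """Return the segments whose text contains the phrase as whole words.

    Matching ignores case and tolerates any amount of whitespace
    between the words of the phrase.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return [seg for seg in segments if pattern.search(seg.text)]


# Made-up sample data standing in for harvested transcripts.
segments = [
    Segment("podcast_a.mp3", 12.0, 15.5, "She finished faster than I did."),
    Segment("podcast_a.mp3", 15.5, 18.0, "We talked about the weather."),
    Segment("lecture_b.mp4", 301.2, 305.0, "You know more THAN I  DID back then."),
]

hits = find_phrase(segments, "than I did")
for hit in hits:
    print(f"{hit.source} [{hit.start:.1f}s to {hit.end:.1f}s]: {hit.text}")
```

Each match carries the file and time span of the audio clip, which is the piece a researcher would then extract and listen to when comparing intonation across speakers.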
