Linguist uses Internet to study how we say things

Jan 04, 2010 By Linda Glaser

(PhysOrg.com) -- Mats Rooth, a Cornell linguist, will use software to study distinctions of prosody (rhythm, stress and intonation) in language by hunting for word patterns on the Internet.

How would you analyze the contents of a million books? Or a million podcasts? Mats Rooth, Cornell professor of linguistics and computing and information sciences, will do it by using software to search for word patterns in text transcriptions of audio and video files.

Rooth is one of eight winners of an international competition, Digging into Data, that challenged scholars to devise innovative humanities and social science research projects using large-scale data analysis. His project, Harvesting Speech Datasets for Linguistic Research on the Web, is based on a pilot project Rooth conducted with graduate student Jonathan Howell. It will look at distinctions of prosody (rhythm, stress and intonation) in spoken language.

According to Rooth, native speakers easily identify what prosody is appropriate in a given sentence, but hypotheses explaining why people have this ability have remained controversial and hard to test because of the difficulty of finding enough examples of a given phenomenon. "Many of the things we study are so immediate and yet so subtle," he said.

Using the Internet to harvest hundreds or thousands of examples of spontaneous rather than lab-created use of word patterns will enable researchers to evaluate theories about the form and meaning of prosody on an unprecedented scale. Rooth expects his project to have a transformative effect on the understanding of prosody.

"I'm very excited," Rooth said. "It's a new methodology, and we think a lot of new information will come out."

Four leading research agencies sponsored the Digging into Data competition, with the intention of encouraging international partnerships: the National Endowment for the Humanities, the National Science Foundation, the United Kingdom's Joint Information Systems Committee, and Canada's Social Sciences and Humanities Research Council. Approximately $2 million will be divided among the eight winners.

Linguist Michael Wagner of McGill University is Rooth's international partner on the project. The Cornell team will be responsible for data retrieval and programming, while McGill researchers will focus on data analysis.

The computer programs, datasets and research products developed in the project will be openly available to the research community via a Web site, confluence.cornell.edu/display… ody/Prosody+Datasets . The Web site already contains a sample dataset which, when played, provides a fascinating cacophony of voices saying "than I did," demonstrating the wide range of meaning arising from varied intonation.
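As a rough illustration of the kind of word-pattern search described above, the sketch below scans a few text transcripts for the phrase "than I did" and records where each occurrence falls. It is not the project's actual software: the function name `find_phrase`, the file names, and the sample sentences are all hypothetical, and a real pipeline would also need to map each text hit back to the matching stretch of audio.

```python
import re
from typing import Dict, List


def find_phrase(transcripts: Dict[str, str], phrase: str) -> List[dict]:
    """Return one record per occurrence of `phrase` in each transcript.

    Hypothetical helper for illustration only; not part of the project's code.
    """
    # Match whole words, ignore case, and allow any whitespace between words.
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

    hits = []
    for source, text in transcripts.items():
        for match in pattern.finditer(text):
            # Keep a little surrounding text so a reader can judge the usage.
            start = max(0, match.start() - 30)
            end = min(len(text), match.end() + 30)
            hits.append(
                {
                    "source": source,
                    "offset": match.start(),
                    "context": text[start:end],
                }
            )
    return hits


# Made-up transcripts standing in for text harvested from podcasts.
transcripts = {
    "podcast_a.txt": "She finished the race faster than I did, which surprised us.",
    "podcast_b.txt": "They know more about it than I   did back then. Than I did!",
    "podcast_c.txt": "Nothing relevant here.",
}

hits = find_phrase(transcripts, "than I did")
for hit in hits:
    print(hit["source"], hit["offset"], repr(hit["context"]))
```

Matching on whitespace-tolerant, case-insensitive word boundaries is a design choice made here so that small differences in how transcripts are typed do not hide otherwise identical word patterns.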
