
Working group seeks a lingua franca for linguistics research

August 21st, 2015

Over time, English has swirled into dialects so different that speakers from the same country cannot always understand each other. Similarly, linguists – as they have catalogued words, spellings, pronunciations, and meanings – have stylized their individual academic databases to suit the needs of their own research.

In an age of computational linguistics, that can be a problem. Computers offer vastly improved capabilities for finding patterns and connections. But while human brains are good at smoothing over minor inconsistencies, computers tend to be very literal.

And data that can't be understood can't be part of the conversation. "Because of the large quantities of data that can be brought to bear on a problem, for many studies occasional data quality issues are not fatal," explains Santa Fe Institute (SFI) Professor Tanmoy Bhattacharya, who leads SFI's linguistics program. But, he says, "the next advance in linguistics will need to understand weak signals or complicated histories deep in the data, and in these situations data issues will be very important. We will need to understand how the data being used are selected, curated, and presented."

Further, language databases will need to adopt coding conventions that allow them to talk to one another. "We need to develop a lingua franca for all linguistics databases to speak," he says. "Whatever way databases organize their own data, or speak their own internal dialect, we should be able to translate them all into something universally understandable and answer queries using the same code all others use."

Bhattacharya, SFI Distinguished Fellow Murray Gell-Mann, and longtime SFI collaborator George Starostin are hosting an invitation-only working group this week at SFI to address this challenge. Conventional and computational linguists will evaluate existing relevant online and offline databases, explore optimal data formats, and discuss – perhaps even establish – the most useful programmed analysis tools for historical linguistics research.

"What is going to come of this is the preparation to enable the next big advance in computational linguistics," Bhattacharya says.

More information:
www.santafe.edu/gevent/detail/science/2098/

Provided by Santa Fe Institute

Citation: Working group seeks a lingua franca for linguistics research (2015, August 21) retrieved 23 April 2024 from https://sciencex.com/wire-news/201596009/working-group-seeks-a-lingua-franca-for-linguistics-research.html
