Tracing languages back to their earliest common ancestor through sound shifts
A team of researchers in the U.S. and U.K. has developed a statistical technique that sorts out when changes to words' pronunciations most likely occurred in the evolutionary history of related languages.
Their model, presented recently in the journal Current Biology, gives researchers a renewed opportunity to trace words and languages back to their earliest common ancestor or ancestors – potentially thousands of years further into prehistory than previous techniques can do with any statistical rigor.
Led by Santa Fe Institute Professor Tanmoy Bhattacharya and University of Reading Professor Mark Pagel, the technique detects historical "concerted sound changes" – a linguistic evolutionary phenomenon where a specific sound changes to another specific sound in many words simultaneously.
For example, the modern languages of English and Latin descended from a common predecessor called proto-Indoeuropean. In English, the words father and foot took on an initial f sound, but in Latin those words retained their p sound, as in pater and ped. This transition occurred across the English language in many words that had featured a p sound.
The researchers tested their new model on Turkic, a family of at least 35 languages spoken by Turkic peoples from Southeastern Europe and the Mediterranean to Siberia and Western China. Their computer analysis automatically considered and evaluated the likelihood (without the potentially biased input of humans) that more than 70 regular sound changes had occurred throughout the 2000-year history of the Turkic languages.
An example is the word pas (head in English) in the Khakassian language. In Turkish, Uzbek, and 16 other Turkic languages the initial sound is b instead, yielding baš. Similarly, pel- (meaning louse) in Khakassian is bil- or bel- in the other languages. The ubiquity of this sound difference strongly supports the hypothesis that a regular sound shift occurred.
"Computers so far have mainly used the presence or absence of words with a common origin in various languages to stitch together trees that describe the descent of the various languages from a common ancestor," say Bhattacharya. "This has left out the vastly richer data residing in sounds, primarily because sound changes in different words are not independent, as most mutations in genetics are, for example."
In the paper, the researchers developed the mathematics for detecting and evaluating hypotheses concerning concerted sound changes and showed that the technique provided vastly superior dated trees for the Turkic language family than previous methods.
"Regular correspondences between sounds in different languages have long been an important test for establishing linguistic relatedness in traditional historical linguistics," Bhattacharya says. "Being able to detect them automatically and score them within a probabilistic framework is a major step forward. Such a probabilistic framework may be able to help us detect and evaluate signals of very old regular sound changes that are much weaker than what is possible to determine unambiguously in a manual fashion."
"Our new method is another exciting step to understanding how languages and genes evolve," says Pagel. "It will allow us to go back in time further than before, making it possible to reconstruct ancient proto-languages, words that might have been spoken many thousands of years ago."