Four thousand years ago, an urban civilization lived and traded on what is now the border between Pakistan and India. During the past century, thousands of artifacts bearing hieroglyphics left by this prehistoric people have been discovered. Today, a team of Indian and American researchers is using mathematics and computer science to try to piece together information about the still-unknown script.

The team, led by a University of Washington researcher, has used computers to extract patterns in ancient Indus symbols. The study, published this week in the *Proceedings of the National Academy of Sciences,* shows distinct patterns in the symbols' placement in sequences and creates a statistical model for the unknown language.

"The statistical model provides insights into the underlying grammatical structure of the Indus script," said lead author Rajesh Rao, a UW associate professor of computer science. "Such a model can be valuable for decipherment, because any meaning ascribed to a symbol must make sense in the context of other symbols that precede or follow it."

Co-authors are Nisha Yadav and Mayank Vahia of the Tata Institute of Fundamental Research and Centre for Excellence in Basic Sciences in Mumbai; Hrishikesh Joglekar of Mumbai; R. Adhikari of the Institute of Mathematical Sciences in Chennai; and Iravatham Mahadevan of the Indus Research Centre in Chennai.

Despite dozens of attempts, nobody has yet deciphered the Indus script. The symbols are found on tiny seals, tablets and amulets, left by people inhabiting the Indus Valley from about 2600 to 1900 B.C. Each artifact is inscribed with a sequence that is typically five to six symbols long.

Some people have questioned whether the symbols represent a language at all, or are merely pictograms of political or religious icons.

The new study looks for mathematical patterns in the sequence of symbols. Calculations show that the order of symbols is meaningful; taking one symbol from a sequence found on an artifact and changing its position produces a new sequence that has a much lower probability of belonging to the hypothetical language. The authors said the presence of such distinct rules for sequencing symbols provides further support for the group's previous findings, reported earlier this year in the journal *Science,* that the unknown script might represent a language.

"These results give us confidence that there is a clear underlying logic in Indus writing," Vahia said.

Seals with sequences of Indus symbols have been found as far away as West Asia, in the region historically known as Mesopotamia, the site of modern-day Iraq. The statistical results showed that the West Asian sequences are ordered differently from sequences on artifacts found in the Indus Valley. This supports earlier theories that Indus traders in West Asia may have used the script to represent different information than it conveyed in the Indus region.

"The finding that the Indus script may have been versatile enough to represent different subject matter in West Asia is provocative. This finding is hard to reconcile with the claim that the script merely represents religious or political symbols," Rao said.

The researchers used a Markov model, a statistical method that estimates the likelihood of a future event (such as inscribing a particular symbol) based on patterns seen in the past. The method was first developed by Russian mathematician Andrey Markov a century ago and is increasingly used in economics, genetics, speech recognition and other fields.
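To make the idea concrete, here is a minimal sketch in Python, assuming a first-order (bigram) Markov model with add-one smoothing. The symbol names and the tiny corpus are invented for illustration and are not Indus data, and the model and smoothing used in the paper may well differ. The sketch estimates how likely each symbol is to follow another, then scores a whole sequence and a copy of it with one symbol moved.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: each inner list stands for one inscription.
# Symbol names are made up for illustration; they are not real Indus signs.
CORPUS = [
    ["fish", "jar", "arrow"],
    ["fish", "jar", "comb"],
    ["wheel", "fish", "jar"],
    ["wheel", "fish", "jar", "arrow"],
    ["fish", "jar", "arrow"],
]

START, END = "<s>", "</s>"  # markers for the beginning and end of a sequence


def train_bigram(corpus):
    """Count how often each symbol follows another, including start/end markers."""
    counts = defaultdict(Counter)
    vocab = {END}
    for seq in corpus:
        padded = [START] + seq + [END]
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def prob(counts, vocab, prev, cur):
    """P(cur | prev) with add-one (Laplace) smoothing, an assumed choice."""
    return (counts[prev][cur] + 1) / (sum(counts[prev].values()) + len(vocab))


def log_likelihood(counts, vocab, seq):
    """Sum of log transition probabilities along the sequence."""
    padded = [START] + seq + [END]
    return sum(math.log(prob(counts, vocab, p, c))
               for p, c in zip(padded, padded[1:]))


counts, vocab = train_bigram(CORPUS)
original = ["fish", "jar", "arrow"]
moved = ["jar", "fish", "arrow"]  # same symbols, with one moved to a new position

ll_original = log_likelihood(counts, vocab, original)
ll_moved = log_likelihood(counts, vocab, moved)
print(f"original: {ll_original:.2f}  moved: {ll_moved:.2f}")
```

On this toy data the reordered sequence scores lower than the original, which is the kind of signal the study describes when it says that symbol order is meaningful.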

"One of the main purposes of our paper is to introduce Markov models, and statistical models in general, as computational tools for investigating ancient scripts," Adhikari said.

One application described in the paper uses the statistical model to fill in missing symbols on damaged archaeological artifacts. Such filled-in texts can increase the pool of data available for deciphering the writings of ancient civilizations, Rao said.
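The fill-in idea can be sketched in a similar spirit: for a gap between two known symbols, try every candidate and keep the one that is most probable given its neighbors. The Python below is only an illustration under stated assumptions (a first-order Markov model with add-one smoothing); the corpus, the symbol names and the helper `fill_gap` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; symbol names are invented, not real Indus signs.
CORPUS = [
    ["fish", "jar", "arrow"],
    ["fish", "jar", "comb"],
    ["wheel", "fish", "jar"],
    ["wheel", "fish", "jar", "arrow"],
]


def train_bigram(corpus):
    """Count how often each symbol is immediately followed by another."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in corpus:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def prob(counts, vocab, prev, cur):
    """P(cur | prev) with add-one smoothing (an assumed choice)."""
    return (counts[prev][cur] + 1) / (sum(counts[prev].values()) + len(vocab))


def fill_gap(counts, vocab, seq):
    """Replace the single None in seq with the symbol that best fits its neighbors."""
    i = seq.index(None)

    def score(candidate):
        s = 1.0
        if i > 0:  # how well the candidate follows the symbol before the gap
            s *= prob(counts, vocab, seq[i - 1], candidate)
        if i < len(seq) - 1:  # how well the next symbol follows the candidate
            s *= prob(counts, vocab, candidate, seq[i + 1])
        return s

    best = max(sorted(vocab), key=score)  # sorted() keeps tie-breaking deterministic
    return seq[:i] + [best] + seq[i + 1:]


counts, vocab = train_bigram(CORPUS)
damaged = ["wheel", None, "jar"]  # None marks the unreadable symbol
restored = fill_gap(counts, vocab, damaged)
print(restored)  # ['wheel', 'fish', 'jar']
```

A restoration like this is a statistical guess rather than a reading, so a real workflow would presumably keep the candidate scores alongside the filled-in text.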

Source: University of Washington
