Entropy study suggests Pictish symbols likely were part of a written language

Jun 10, 2010, by Lisa Zyga
This Pictish symbol stone shows two symbols: a divided rectangle with a Z rod and a triple disc, as well as battle imagery. Image credit: Lee, et al.

(PhysOrg.com) -- How can you tell the difference between random pictures and an ancient, symbol-based language? A new study has shown that concepts from entropy can be used to measure the degree of repetitiveness in Pictish symbols from the Dark Ages, with results suggesting that the inscriptions appear to be much closer to a modern written language than to random symbols. The Picts, a group of Celtic tribes that lived in Scotland from around the 4th to the 9th centuries AD, left behind only a few hundred stones expertly carved with symbols. Although the symbols appear to convey information, it has so far been impossible to prove that this small sample of symbols represents a written language.

As the team of UK researchers, including Rob Lee of the University of Exeter, explains, a fundamental characteristic of any written language is that there is a degree of uncertainty about which character will occur. This uncertainty can also be thought of as entropy or information, and in a language its level differs from that of symbols arranged randomly or in a simple repetitive pattern.

In their study, the UK researchers analyzed different texts in terms of the average uncertainty of a character's appearance when the preceding character is known. The greater the uncertainty (that is, the more difficult it is to predict the second character), the greater the entropy of the character pair, and the greater the probability that the characters are part of a written language. The researchers call this measurement the Shannon di-gram entropy, where the term "di-gram" refers to two characters.
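The article does not spell out the formula, so the following is only a minimal sketch, assuming that "Shannon di-gram entropy" means the conditional entropy of a character given the one before it, estimated directly from pair frequencies in the sample. The function name `digram_entropy` is hypothetical, and the paper may define or normalize the quantity differently.

```python
from collections import Counter
from math import log2

def digram_entropy(symbols):
    """Average uncertainty (in bits) of a symbol given the symbol before it.

    Sketch of H(X2 | X1) = -sum over pairs p(a, b) * log2 p(b | a),
    using raw frequency (plug-in) estimates from the sequence itself.
    """
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    # How often each symbol appears as the first member of a pair.
    first_counts = Counter(a for a, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total               # joint probability of the pair
        p_b_given_a = n_ab / first_counts[a]  # probability of b after a
        h -= p_ab * log2(p_b_given_a)
    return h

print(digram_entropy("ABABABAB"))  # fully predictable: 0.0 bits
print(digram_entropy("AABB"))      # A is followed by A or B equally often
```

A strictly repeating sequence scores zero because every next symbol is certain; the more varied the successors of each symbol, the higher the score.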

However, when the researchers applied this measurement to more than 400 datasets of known written languages, they found that it didn't work for small samples, because such samples do not represent the characters sufficiently, a problem they call "incompleteness." This shortcoming posed a problem for the small corpus of Pictish symbols.
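To see why small samples are a problem, here is a minimal sketch, assuming uniformly random symbols drawn from a 26-letter alphabet (synthetic data, not the researchers' datasets) and the same frequency-based estimate of di-gram entropy as above. For such symbols the true value is log2(26), about 4.70 bits, but a short sample underestimates it badly because most character pairs never get a chance to appear.

```python
import random
from collections import Counter
from math import log2

def digram_entropy(symbols):
    # Plug-in estimate of H(next symbol | current symbol), in bits.
    pairs = list(zip(symbols, symbols[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    total = len(pairs)
    return -sum((n / total) * log2(n / first_counts[a])
                for (a, _), n in pair_counts.items())

rng = random.Random(0)  # fixed seed so the run is repeatable
alphabet = [chr(ord("A") + i) for i in range(26)]
true_value = log2(len(alphabet))  # exact for independent uniform symbols

for n in (20, 200, 2_000, 200_000):
    sample = [rng.choice(alphabet) for _ in range(n)]
    print(f"{n:>7} symbols: estimate {digram_entropy(sample):.2f} bits "
          f"(true {true_value:.2f})")
```

The estimate climbs toward the true value only as the sample grows, which is why a raw score from a few hundred carved symbols cannot be compared directly with one from a long text.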

To confront this challenge, the researchers had to modify the entropy measurement. They proposed that a sample's degree of incompleteness could be determined by the ratio of the number of different di-grams to the number of different uni-grams (single characters) in the sample. In other words, a sample whose characters combine into many different pairs is more complete than one whose characters combine into relatively few. Since the Shannon di-gram entropy depends on the degree of completeness, this ratio offers a way to normalize it. The researchers also accounted for the fact that some languages (e.g., Morse code) have more repetitious di-grams than others by proposing a di-gram repetition factor (the ratio of the number of di-grams that appear only once in a sample to the total number of di-grams in the sample).
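The two ratios described above can be computed in a few lines. This is a sketch based only on the article's verbal definitions: the function names are hypothetical, and "total number of di-grams" is assumed to count every pair occurrence, including repeats. How the paper folds these ratios into the entropy is not described here, so the sketch stops at the two parameters.

```python
from collections import Counter

def completeness_ratio(symbols):
    """Distinct di-grams divided by distinct uni-grams (non-empty input)."""
    distinct_unigrams = set(symbols)
    distinct_digrams = set(zip(symbols, symbols[1:]))
    return len(distinct_digrams) / len(distinct_unigrams)

def digram_repetition_factor(symbols):
    """Di-grams seen exactly once, divided by all di-gram occurrences.

    Assumes at least two symbols; repeated pairs count toward the total.
    """
    digrams = list(zip(symbols, symbols[1:]))
    counts = Counter(digrams)
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(digrams)

# A stream that cycles through the same two pairs versus one where every pair is new.
for text in ("ABABABAB", "ABCDEFGH"):
    print(text, completeness_ratio(text), digram_repetition_factor(text))
```

A sequence that keeps reusing the same few pairs, as in a highly repetitive code, scores 0.0 on the repetition factor, while one in which every pair is new scores 1.0.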

The researchers showed that the degree of completeness and the di-gram repetition factor could be calculated for any sample size. They demonstrated how to use the two parameters to identify whether characters represent words, syllables, letters, or other elements of language. For the small set of Pictish symbols, the researchers concluded that the symbols likely correspond to words, based on their Shannon di-gram entropy as modified by these two parameters. Beyond showing that the Pictish symbols are very unlikely to be simply random pictures, these methods, developed using verbal datasets, could also be applied to investigating the level of information communicated by animal languages, whose study is often hampered by small sample datasets.

More information: Rob Lee, Philip Jonathan, and Pauline Ziman. “Pictish symbols revealed as a written language through application of Shannon entropy.” Proc. R. Soc. A, doi:10.1098/rspa.2010.0041
via: CERN Courier

User comments: 12

3 / 5 (1) Jun 10, 2010
"This uncertainty can also be thought of as entropy or information"

I know entropy has a lot of differing uses in different fields, but the classical definition is "a measure of change in an ordered system", or chaos... That sure doesn't fit their description "a measure of order" which is basically the opposite of the classical definition.
not rated yet Jun 10, 2010
The definition of entropy has changed significantly based on what sort of system you're referring to. Within closed systems entropy is defined as the transit from a more ordered to less ordered state.

Entropy can also be seen as the increase in information of a system (open or closed).
3 / 5 (2) Jun 10, 2010
In this case I'd say they were using Shannon's entropy from information theory.

That said this is not really an impressive piece of scientific work, since it simply means they tallied up how often each type of symbol was used and applied the entropy formula to it (which is an effort that should take...oh...a good hour or two)
3.4 / 5 (5) Jun 10, 2010
It's not surprising to me.

I have often found it absurd when archeologists and anthropologists claim that certain advanced ancient civilizations had no form of writing (particularly South American and Central American natives).

While they may not have had pen and paper, it is completely unbelievable to claim that their accuracy in astronomy, as well as their engineering feats in pyramid making, city building, and irrigation, were all achieved with no written language.

One example is Puma Punku, whose makers, archeologists claim, had no writing. This is impossible, because the original temple structures at Puma Punku would rival even the Egyptian pyramids in terms of advancement if they stood to this day. This requires the Pythagorean theorem and other advanced geometry and trigonometry principles that nobody could learn and teach for generations without writing of some form.
3.7 / 5 (3) Jun 11, 2010
The earliest symbols, including rock paintings, are most likely prompts and aids for story tellers and story telling. Thus the ratio of information presented to interpretation needed changed progressively over time from 100% story teller with no symbols to near 100% symbolic representation with no interpretation needed.

Considering the evolution of language and art in this context puts art, storytelling, and symbolic language into perspective. The three forms, once inseparably entwined, now have separate and independent forms as well as various mixtures/blends depending on the medium of presentation, e.g., film, novel, text, or reference book.
1 / 5 (3) Jun 11, 2010

The word ENTROPY in its simple version of explication does offer a meaning to understand = "AN ORDER TO BREAK INTO DISORDER"

The integral element of an Entropy is the Orderliness created by the Thermal-Valuity ( A Thermal Quality... dominating the conditioning values to conduct changes in the entity to ahead for completion of its beingness)

Example: Decrepitude, dilapidation or decay is a modulus of entropy to measure all of an age nearing the completion. A signature of an energy working within us whose latent powers viz... as an Orderliness of the Light, Heat, Electric and Tithonicity (chemical) are aheading to break into entities from which it was made-up of.

ENTROPY is a Quantum Creative Act... a process Annihilation to the process Creation for an entity throughout its journey of being a Being in limitation of Time & Space as a flux of Light rendering Heat and Electric for the Tithonicity is a process Procreation.

~astral scientist
5 / 5 (2) Jun 11, 2010
Entropy has a different meaning in information theory.

Since we are talking about language here we are talking about information - not thermodynamics.

People have tried, in the past, to equate the two but that has failed (the information paradox being presented by black holes being just one of the troubles).

So we must accept that there are two definitions of entropy: one for a physical state of order in a closed system and one for the information inherent in a string of symbols.

For the latter the entropy is computed as
H = -Sum[Pxi * log(Pxi)]

Where Pxi is the probability of symbol xi appearing.

If you take the log to base two then the unit is [bit].

E.g., if you have 8 symbols which are all equally probable (Pxi = 1/8 for all i), then the information inherent in the symbols is 3 bits.

A language would simply have an entropy lower than a random string of symbols (because the random string would contain all symbols equally likely and therefore have maximal entropy)
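The formula in this comment can be checked with a few lines of Python. This is a minimal sketch that estimates each Pxi from symbol counts and uses log base two, so the result is in bits; the function name `shannon_entropy` is hypothetical. Unlike the study's measure, it looks at single symbols only, without conditioning on the preceding one.

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """H = -sum(p_i * log2(p_i)) over observed symbol frequencies, in bits."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(shannon_entropy("ABCDEFGH"))  # 8 equiprobable symbols -> 3.0 bits
print(shannon_entropy("AAAAAAAB"))  # skewed frequencies -> about 0.54 bits
```

Uniform frequencies give the maximum value for a given number of symbols, which is the point made above about random strings.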
1 / 5 (2) Jun 11, 2010
RobertKarl, above, gives the obvious assessment. This stone is certainly a narration of a story, prompting the storyteller with chronological events or perhaps a partial life narration of an important elder. It can be thought of as the gray zone between a comic book and written language. The uniform size, spacing, and standardized but detail-varying patterns show these are not random drawings.
not rated yet Jun 11, 2010
"A language would simply have an entropy lower than a random string of symbols (because the random string would contain all symbols equally likely and therefore have maximal entropy)"

Perfect comprehension of open system entropy, bravo.
1 / 5 (2) Jun 12, 2010
"Entropy has a different meaning in information theory.

Since we are talking about language here we are talking about information - not thermodynamics.

People have tried, in the past, to equate the two but that has failed (the information paradox being presented by black holes being just one of the troubles)."

Professor Hawking solved this a decade ago, wot wot? Unless you can defeat his math.

Can you defeat his math? No? Pity.
5 / 5 (1) Jun 12, 2010
If you go to the Wikipedia definition of entropy in thermodynamics and information theory:


They list a number of points where the understanding of one doesn't completely mesh with the other.

Hawking's theories are still unconfirmed by observation, so, as with string theory, the jury is still out on that one.

Especially in quantum theory, the equality is dubious.
not rated yet Jun 13, 2010
"A language would simply have an entropy lower than a random string of symbols (because the random string would contain all symbols equally likely and therefore have maximal entropy)"

I agree, but aren't there states in between language and randomness?
Whatever a human draws won't be absolutely random (except in modern art, maybe), as he/she will be trying to tell something, but it could still lack the characteristics necessary to be a written language.
What is the entropy of poor grammar? =P
I don't think I'm getting it x)