New study suggests Voynich text is not a hoax

Jun 24, 2013 by Bob Yirka
Comparison of the Voynich manuscript and different information carrying sequences. A) Information in word distribution as a function of the scale for the Voynich manuscript compared to five other language and symbolic sequences (F: Fortran; C: Chinese; V: Voynich; E: English; L: Latin; Y: yeast DNA). The number of words in all sequences was equal to that of the Voynich text; if the original sequence was longer, the additional words were not considered. B) Scale of maximal information for the sequences considered in A. Credit: doi:10.1371/journal.pone.0066344.g001

(Phys.org) —Theoretical physicist Marcelo Montemurro and colleague Damián H. Zanette have published a paper in the journal PLOS ONE claiming that the Voynich text is likely not a hoax as some have suggested. The two researchers along with others at the University of Manchester in the U.K. analyzed a digital copy of the text and say that computer assisted analyses of the "book" suggest it does harbor meaning, though what that might be is still a mystery.

The Voynich text is a book made up of 104 folios; each page has graphemes (arrays of characters) and drawings on it. It first came to light in 1912 when Wilfrid Voynich claimed to have found it in an Italian monastery. The graphemes suggest words made up of characters that do not appear in any other known language. Since its discovery, various researchers have sought to determine whether the text is written in an unknown language or is instead a book created by someone as a hoax. Adding to the mystery of the text are the drawings of plants on most of the pages, none of which are known to exist in nature. Carbon dating of the text suggests it was created sometime in the 1400s, but that doesn't prove that the writing on the parchment was done during that period, leaving some to suggest it was Voynich himself who created the characters and drawings. To date, no one has been able to prove whether the text has meaning or if it is simply pages of gibberish. To learn more, Montemurro and his team turned to methods from information theory.

To analyze the text, researchers assign letters to characters; this allows for the application of algorithms. In this case, the team looked at global patterns of "words" that appear throughout the text. This process represents a novel way to view the semantics. One type of pattern distribution known as "entropy" allows researchers to compare documents to one another using a computer. The method offers a single number that describes the complexity of the text. The Voynich text received a score of 805, compared to 728 for text samples written in English and 580 for those written in Chinese. A comparison of the Voynich score to yeast DNA samples (25) and a program written in Fortran (285) suggests the Voynich text is more complicated than simple gibberish.
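To make the idea of measuring information in a word distribution more concrete, here is a minimal Python sketch. It is not the authors' algorithm and will not reproduce the scores quoted above; it only illustrates the general approach of comparing how unevenly words are spread across sections of a text with how they are spread after the word order is shuffled. The function names `section_entropy` and `word_information`, and the choice of equal-sized sections, are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def section_entropy(words, n_parts):
    """Frequency-weighted average entropy (in bits) of how each word
    is spread across n_parts equal-sized sections of the text."""
    size = len(words) // n_parts
    total = size * n_parts  # leftover words at the end are ignored
    per_word = {}  # word -> Counter mapping section index to count
    for i, w in enumerate(words[:total]):
        per_word.setdefault(w, Counter())[i // size] += 1
    h = 0.0
    for counts in per_word.values():
        n = sum(counts.values())
        h_word = -sum((k / n) * math.log2(k / n) for k in counts.values())
        h += (n / total) * h_word
    return h


def word_information(words, n_parts, seed=0):
    """Entropy of a shuffled copy minus entropy of the original text.
    Larger values mean words cluster in particular sections."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return section_entropy(shuffled, n_parts) - section_entropy(words, n_parts)
```

A text whose vocabulary changes from section to section (as it would if different sections covered different topics) gives a clearly positive value from `word_information`, while a text with no long-range structure gives a value near zero.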

The team notes that the text also conforms to Zipf's law, which states that the frequency of a word in a real language is inversely proportional to its rank in a frequency table. Taken together, these findings lead the researchers to conclude that the Voynich text most likely contains real information and thus is not a hoax.
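Zipf's law can be checked on any tokenized text by ranking words by frequency and fitting a straight line to log(frequency) against log(rank); a slope close to -1 is consistent with the law. The Python sketch below shows one simple way to do this with an ordinary least-squares fit. The function name `zipf_slope` is hypothetical, and this is only an illustration, not the procedure used in the paper.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).
    Needs at least two distinct words."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A word list built so that the word of rank r appears exactly 2520 / r times (for r from 1 to 10) gives a slope of -1 up to rounding error. Real texts only follow the law approximately, so the fitted slope is a rough indicator rather than a pass/fail test.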

More information: Montemurro MA, Zanette DH (2013) Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis. PLoS ONE 8(6): e66344. doi:10.1371/journal.pone.0066344

Abstract
The Voynich manuscript has remained so far as a mystery for linguists and cryptologists. While the text written on medieval parchment -using an unknown script system- shows basic statistical patterns that bear resemblance to those from real languages, there are features that suggested to some researchers that the manuscript was a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with those found in real language sequences. We are also able to extract some of the most significant semantic word-networks in the text. These results together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book.

User comments : 26

Infinite Fractal Consciousness
4.6 / 5 (7) Jun 24, 2013
The theory that rang with the most truth, to me, was that it was an expensive prop for a traveling "healer," drawn by someone who was illiterate but familiar with writing. The prop gives legitimacy to the healer's claims of "ancient and secret wisdom." Being familiar with writing, the artist emulated features of authentic writing.
praos
1.3 / 5 (13) Jun 24, 2013
Italy is next door to Croatia, with very strong bilingual ties; try Croatian.
fmfbrestel
5 / 5 (7) Jun 24, 2013
@ Praos

I can't tell if you're being serious and just skimmed the article, missing a bunch of key information, or you're just troll baiting.

It's not Croatian.
FredB
3.2 / 5 (11) Jun 24, 2013
Actually that test for complexity just says it is not random strings of gibberish. If you look at that complexity measured on works of fiction, you get a high number, as there is plenty of complexity. However that doesn't mean deliberately concocted works of untruth are somehow true, just because they are complicated. Where I come from a "tightly woven plot" means tons of complexity, but it doesn't imply true.
fmfbrestel
4.4 / 5 (7) Jun 24, 2013
"Actually that test for complexity just says it is not random strings of gibberish."


That's all that's being claimed -- that the text isn't gibberish. What are you trying to refute?
no fate
3.7 / 5 (3) Jun 24, 2013
"Actually that test for complexity just says it is not random strings of gibberish."

"That's all that's being claimed -- that the text isn't gibberish. What are you trying to refute?"

I think his point is that there is a large gap between "not gibberish" and "useful".
fmfbrestel
5 / 5 (3) Jun 24, 2013
@No fate:

Well, "useful" is not a term used in the article. They claim "genuine message" and "real information". Both terms are obvious analogues of "not gibberish".
gurloc
3 / 5 (4) Jun 24, 2013
Structure is not evidence for there being a message. For example if you used a real language to help keep track of what nonsensical symbols you are writing you would still pass this structure test without necessarily having meaning or being a workable translation of whatever you used as a guide.

Without knowing the details of their algorithm I would think that too high a score could indicate the author of the text was lazy and reused the same pattern repeatedly. Which, given human nature, is much more likely than someone putting down symbols in a random pattern.

It is a big leap from "not random" to "genuine message".

You can never prove that a one-off text isn't a personal cipher that you can't decode. The imaginary plants are what actually point to it being a hoax.
antialias_physorg
3.2 / 5 (6) Jun 24, 2013
"A comparison of the Voynich score to yeast DNA samples (25) and a program written in Fortran (285) suggests the Voynich text is more complicated than simple gibberish."

As programs go one written in Fortran is certainly a good standard for gibberish (couldn't resist)

"Well, 'useful' is not a term used in the article. They claim 'genuine message' and 'real information'. Both terms are obvious analogues of 'not gibberish'."

To be fair though, there is a difference between information (as defined by information theory) and meaningful information. You can create many sentences in English that have correct grammar (and hence would show up under such analysis as "information carrying") but which mean absolutely nothing.

ValeriaT
1.6 / 5 (15) Jun 24, 2013
IMO it's a list of prohibited herbal drugs, i.e. the plants whose use had been prohibited in medieval times for various "witchcraft" purposes, to such a degree that even their descriptions had to remain secret.
Shabs42
5 / 5 (3) Jun 24, 2013
Once again, XKCD wins: http://xkcd.com/593/

Probably just an obscure work of fiction by somebody with a talent for languages or ciphers. Still would be really fun to actually translate and figure out what it says. I don't think anyone is expecting life altering secrets to emerge from it.
baudrunner
2 / 5 (12) Jun 24, 2013
In my opinion, this is a work done by a partially deaf, illiterate person with a severe learning disability and a serious speech impediment, who had very little or no exposure to the printed word, or any form of education for that matter, because he would have been considered an idiot by the community's lay population. Armed nonetheless with a very high degree of intelligence, he created his own written language with its own inexplicable rules, as he would have yet been provided with the means to record things on paper in some form, as the authorities (clerics) in those days thought these people to have been 'touched', and that they were 'holy', and therefore that somehow God might communicate through them. The work may yet be some form of ordered nonsense, with a simple but undecipherable pattern. In this way, the author might acquire some modicum of respect, in spite of his disabilities.
Ober
5 / 5 (1) Jun 24, 2013
A test I like to do when examining encrypted data is a compressibility test. The basic idea behind encryption is to scramble the data so much that frequency analysis is useless. Compressing data makes use of frequency analysis, regardless of the algorithm you use. For example ZIP is merely a Huffman tree, i.e. a binary tree loaded via frequency analysis. So if data has been encrypted, rather than pure gibberish, the encrypted text will have less compressibility. Its "DISORDER" measure will be very high if it is encrypted well. So any text may have more than one level of "intelligence" invoked on it. The first can be plain text, i.e. a story. The next level is then encryption. This can go on and on.
Of course if it isn't encrypted (high compressibility), then we know there are many repeated constructs. Then it is up to a semantic processor to see if it matches the constructs of any known language. It's a form of "pattern" recognition if you like.
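The compressibility test described in the comment above can be sketched in a few lines of Python using the standard `zlib` module, whose DEFLATE format combines LZ77-style matching of repeated strings with Huffman coding. The function name `compression_ratio` is an assumption for illustration, and a ratio on its own says nothing about whether a text has meaning; it only indicates how much repeated structure a general-purpose compressor can find.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.
    Values near (or above) 1 mean the data is close to incompressible;
    small values mean there is a lot of repeated structure."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text compresses to a small fraction of its size,
# while random bytes (a stand-in for well-encrypted data) do not shrink.
repetitive = b"the quick brown fox " * 200
random_bytes = os.urandom(4000)
```

Calling `compression_ratio(repetitive)` gives a value far below 1, while `compression_ratio(random_bytes)` stays at roughly 1, since the compressor finds nothing to exploit and still has to add its own header.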
Infinum
2 / 5 (8) Jun 24, 2013
Statistical analysis of text is hardly high-end science. I am surprised it took 100 years to perform such a basic test. I wonder if "Lorem Ipsum ..." has the same properties.
Skepticus
1.6 / 5 (7) Jun 24, 2013
Decode this:
XY<-gacuu .
If you "experts" can't, I'd say all they are doing are just guessing and if they fail, they would blame "hoax" for their ignorance..., pride, stupidity, study grants, tenures??>
Ducklet
2 / 5 (8) Jun 24, 2013
I'm not sure I understand why they think high entropy is indicative of a real language. High entropy is indicative that it's a random sequence or encrypted (such as substitutions made from a random table). The result would match English where individual characters have been substituted from a small table, leading to a slight increase in entropy, but retaining the overall structure.
Howhot
5 / 5 (3) Jun 24, 2013
Ahh, the good old Voynich text. Great book to cozy up in bed with. Such an interesting plot with all of the little twists and artwork to keep the reader's interest (satire of course). In all seriousness, I'm a coder and an artistic type so this book fascinated me when I first saw it. I was hoping that chapter by chapter you could pick up broad meaning from the pictures in each: one section is about plants, another about women birthing, another looks like it's about astronomy and another looks to be about math. Every chapter seems to have a code-wheel that to me looks like the key to the cipher. So each chapter would have its own cipher. It's all hand painted page by page, and the whole book had to take years to make.

That's what makes me believe it's not a hoax either. It looks like a diary of what a monk might write as observations of subjects without the church's influences. For example, the taboo subject of women appearing in birthing tubs. Fascinating and weird.

NikFromNYC
1.5 / 5 (8) Jun 24, 2013
Terence Mckenna said the same.
Ober
5 / 5 (1) Jun 25, 2013
Skepticus, you obviously haven't dealt with cyphers before. You need LOTS of text to analyse to gain any sort of meaning.
Midcliff
2 / 5 (4) Jun 25, 2013
It was obviously dropped by an alien and it's just a keepsake from home.
antialias_physorg
3 / 5 (2) Jun 25, 2013
"Statistical analysis of text is hardly high-end science. I am surprised it took 100 years to perform such a basic test."

If you look at the manuscript you may notice that it's sometimes difficult to identify what makes up a 'letter', or a word. Getting at the basic units of the text is already very hard.

"You need LOTS of text to analyse to gain any sort of meaning."

And you also need some sort of crib or partial translation. Otherwise you just have the 'Chinese room' problem. Yes, you will eventually be able to produce valid 'Voynich-sentences' (and be able to distinguish valid from non-valid ones) - but it still won't tell you what they mean.

But in all seriousness: It looks like a (much better) version of the fantasy doodles of animals and star maps I used to draw as a kid. And the 'information' could just be the natural tendency to repeat pleasing shapes.

But nevertheless an interesting exercise in cryptography.
Doug_Huffman
1.4 / 5 (9) Jun 25, 2013
Quite a coincidence with the current NSA PRISM kerfuffle and the article here today on password fatigue.

So many are crowing the hypothetical omnipotence of the National Supercomputer Agency, perhaps the Voynich would be a better subject than your pornography.

A password of Roman analogues of Voynich 'letters' might be useful.
Feldagast
1.7 / 5 (6) Jun 25, 2013
I liked this explanation
http://edithsherw...ndex.php
lonewolfmtnz
3 / 5 (2) Jun 27, 2013
Misery Monkeys can't be bothered to pay-attention (lend credence) to demonstrable facts invading in their very faces, so how could anyone give a flaccid rat's ass what some mythical voodoo con-man crafted centuries ago? The 'good news' is things that cannot continue indefinitely DON'T.
Moebius
1.7 / 5 (6) Jun 30, 2013
"The theory that rang with the most truth, to me, was that it was an expensive prop for a traveling 'healer,' drawn by someone who was illiterate but familiar with writing. The prop gives legitimacy to the healer's claims of 'ancient and secret wisdom.' Being familiar with writing, the artist emulated features of authentic writing."


That would only make sense if the plants were real.
meBigGuy
5 / 5 (1) Jun 30, 2013
People sure get mixed up between information and meaning. A purely random string of characters is incompressible and as such contains more information (and higher entropy) than an equally long proper sentence. The latter has meaning, the former does not.
The article's use of entropy seems in reverse of what I expected. Lower entropy implies more structure and compressibility, and, possibly, meaning.

Higher entropy implies less structure/predictability, so unless the text is scrambled or compressed it would appear to be gibberish. I'm not sure what measure they are using such that yeast DNA and Fortran get lower numbers. They called it "entropy" with quotes.