Formula to detect an author's literary 'fingerprint'

Dec 10, 2009

Using literature written by Thomas Hardy, DH Lawrence and Herman Melville, physicists in Sweden have developed a formula to detect different authors' literary 'fingerprints'.

New research published today, Thursday 10 December, in New Journal of Physics (co-owned by the Institute of Physics and German Physical Society), describes a new concept called the meta book, developed by a group of physicists from the Department of Physics at Umeå University in Sweden. The meta book uses the frequency with which authors introduce new words in their texts to find distinct patterns in authors' written styles.

For more than 75 years, George Kingsley Zipf's maxim, based on a carefully selected compilation of American English called the Brown Corpus, has suggested a universal pattern for the frequency of new words used by authors. Zipf's law states that a word's frequency of occurrence is inversely proportional to its frequency rank, so the second most common word appears about half as often as the most common one.
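As a rough illustration of the rank-frequency relation described above, the following Python sketch counts words in a text and lists them by rank. The function name `rank_frequency`, the simple regular-expression tokenizer and the toy sample sentence are assumptions made for this example; they are not taken from the research. Under an idealised Zipf's law, the product of rank and count would stay roughly constant down the list for a large text.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ranks start at 1.
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Toy input only; a real check needs a text of many thousands of words.
sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count, rank * count)
```

A sentence this short cannot confirm or refute the law; the sketch only shows how the ranking is built.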

New research suggests, however, that the truth behind word frequency is less universal than Zipf asserted and has more to do with the author's linguistic ability than any over-arching linguistic rule.

The researchers first found that the occurrence of new words in the texts by Hardy, Lawrence and Melville did begin to drop off as each book got longer, despite new settings and plot twists.

Their evidence also shows, however, that the rate at which unique words drop off varies between authors and, most significantly, is consistent across the entire works of any one of the three authors they analysed.
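The drop-off in new words can be pictured as a vocabulary growth curve: the number of distinct words seen so far, plotted against the total number of words read. The Python sketch below is one minimal way to compute such a curve. The function name `vocabulary_growth`, the `step` parameter and the tokenizer are assumptions for illustration, and this is not the statistical analysis used by the researchers.

```python
import re

def vocabulary_growth(text, step=1000):
    """Return (words_read, distinct_words) pairs sampled every `step` words."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        # Record a point at every `step` words and at the end of the text.
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve

# Toy usage: the curve flattens as words start to repeat.
print(vocabulary_growth("a b a c a b d a", step=2))
# prints [(2, 2), (4, 3), (6, 3), (8, 4)]
```

Comparing such curves for different authors at the same text length is one plausible way to look for the author-specific drop-off rate the article describes.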

The statistical analysis was applied to entire novels, sections from novels, complete works and amalgamations from different works by the same authors - they all had a unique word-frequency 'fingerprint'.

By using the statistical patterns evident from their study, the researchers have pondered the idea of a meta-book - a code for each author which could represent their entire work, completed or in the mental pipeline.

As the researchers write, "These findings lead us towards the meta book concept - the writing of a text can be described by a process where the author pulls a piece of text out of a large mother book (the meta book) and puts it down on paper. This meta book is an imaginary infinite book which gives a representation of the word frequency characteristics of everything that a certain author could ever think of writing."
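The idea of pulling a piece of text out of a larger mother book can be imitated, very loosely, by drawing random excerpts of a fixed length from a long text and counting the distinct words in each. The Python sketch below does this. It is only an analogy under stated assumptions: the real meta book is imaginary and infinite, whereas here a finite word list stands in for it, and the names `tokenize`, `excerpt_vocabulary` and `mean_excerpt_vocabulary` are hypothetical helpers rather than anything defined in the paper.

```python
import random
import re

def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def excerpt_vocabulary(words, length, rng):
    """Distinct-word count in one random contiguous excerpt of `length` words."""
    if length > len(words):
        raise ValueError("excerpt is longer than the source text")
    start = rng.randrange(len(words) - length + 1)
    return len(set(words[start:start + length]))

def mean_excerpt_vocabulary(words, length, samples=100, seed=0):
    """Average distinct-word count over `samples` random excerpts."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = sum(excerpt_vocabulary(words, length, rng) for _ in range(samples))
    return total / samples
```

Given the collected words of one author as `words`, calling `mean_excerpt_vocabulary(words, 10000)` would estimate how many distinct words a typical 10,000-word excerpt contains, which could then be compared with a complete work of the same length.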

More information: The meta book and size-dependent properties of written language, Sebastian Bernhardsson et al. 2009 New J. Phys. 11 123015 (15pp); doi:10.1088/1367-2630/11/12/123015

Source: Institute of Physics

User comments : 5

not rated yet Dec 10, 2009
It will be great for distinguishing spammers in discussions, for various parental checks etc. Even better, computer keyboards can implement a protocol in which they would record the frequency of keystrokes and send it over the net, so that everyone on the net would be identifiable in this way. I'm pretty sure these controls will be implemented the moment technology enables it. And because evolution in this field is undeniable, this "Big Brother" perspective is undeniable, too.
1 / 5 (1) Dec 10, 2009
This resembles another idea I read about nearly 10 years ago: by first compressing a sufficiently large text corpus of individual authors, they gathered parameters of word patterns. By comparing the parameters of texts with unknown authors and the parameters of known authors, they could quite reliably identify the authors of those texts.

But, Alexa, this does not really have a Big Brother perspective, because the majority of internet texters don't use much more than the basic active vocabulary of about 200 words, and thus they are not really identifiable.
not rated yet Dec 11, 2009
Well, there is a "Big Brother" issue, frajo. It's not the low-vocabulary morons who make points with the reading public. To put it in context, while avoiding Godwin's Law, let's posit that the British intelligence service of the day would have had no interest in who painted anti-imperialist slogans on the bathroom walls of Boston taverns --- but great interest in who composed "Common Sense" or "The Rights of Man."
1 / 5 (1) Dec 11, 2009
let's posit that the British intelligence service of the day would have had no interest in who painted anti-imperialist slogans on the bathroom walls of Boston taverns --- but great interest in who composed "Common Sense" or "The Rights of Man."
Ok. But this is no reason for pessimism. They can eliminate the author but they can't eliminate the idea once it is published.
The internet is the next evolutionary step after the printing press. Many frown on it but nobody can stop it.
not rated yet Dec 12, 2009
...and a friend of mine was the first person to make a successful and communicative 'bot' chat program for his BBS, in the wayback days--back when we had BBS's, that is... It was pretty good, considering the limitations of programming back then, and to be done by one guy, well..
