Small words in an email can reveal a person's identity

October 30, 2017, Nottingham Trent University

It's possible to identify the author of an email by analysing as few as two words, research by Nottingham Trent University suggests.

Dr David Wright, an expert in forensic linguistics, examined thousands of emails to show it's possible to identify someone by analysing small sequences of words and to establish them as the author.

The research aims to address the challenges experts face when analysing language evidence in court proceedings or in reports.

Computer scientists use methods such as algorithms and statistical analysis to measure the similarity between texts. However, it can be difficult for experts to explain why these techniques could and should distinguish between people's unique writing styles.

As part of the study, Dr Wright analysed thousands of emails from 12 employees at a former energy company and correctly identified the authors 95% of the time when the samples were longer than around 1,000 words.

He did this by comparing how often employees used particular sequences of words in their emails.

These word sequences were between two and six words long and were as basic as "Please review and let's discuss" and "A clean and redlined version."
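As a rough illustration of comparing word sequences between texts, the sketch below counts every run of two to six consecutive words and scores how much two texts overlap. It is not Dr Wright's actual procedure: the function names, the Jaccard overlap measure and the toy emails are all assumptions made for the example.

```python
import re
from collections import Counter


def word_ngrams(text, n_min=2, n_max=6):
    """Count every sequence of n_min..n_max consecutive words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def jaccard(a, b):
    """Share of distinct word sequences that two profiles have in common.

    Assumption: Jaccard overlap is used here only as a simple stand-in
    for whatever similarity measure a real study would apply.
    """
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def most_similar_author(disputed_text, known_texts):
    """Return the author whose known writing overlaps most with the disputed text."""
    disputed = word_ngrams(disputed_text)
    scores = {
        author: jaccard(disputed, word_ngrams(text))
        for author, text in known_texts.items()
    }
    return max(scores, key=scores.get), scores


# Invented toy samples, not real Enron emails.
known = {
    "author_a": "Please review and let's discuss. "
                "Attached is a clean and redlined version.",
    "author_b": "Please review the attached draft and call me "
                "with any comments you have.",
}
disputed = "Here is a clean and redlined version. Please review and let's discuss."

best, scores = most_similar_author(disputed, known)
print(best, scores)
```

With samples this short the scores mean little. The study reports its 95% figure only for samples longer than around 1,000 words.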

The research is based on thousands of emails from American energy company Enron.

More than 1.7 million emails from the company were released into the public domain and have since been used for research purposes.

By analysing these emails, Dr Wright also found that the way people join small words together is unique to them and is influenced by the different speech and writing they are exposed to in their lifetime.

Dr Wright focused on one employee as a case study: a lawyer at the company.

He compared their emails against samples from 175 other employees and discovered that their most distinctive phrases were the five-word sequences "A clean and redlined version" and "Please review and let's discuss."

While other lawyers at the company used phrases beginning with "Please review," they didn't use them in exactly the same way as this lawyer, suggesting that these particular clusters of words were unique to that individual.
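One simple way to surface phrases like these is to keep only the word sequences that one author repeats and that no one else in a comparison set uses at all. The sketch below does that under stated assumptions: the helper names, the `min_uses` threshold and the sample emails are invented for illustration and are not taken from the study.

```python
import re
from collections import Counter


def ngram_counts(text, n_min=2, n_max=6):
    """Count sequences of n_min..n_max consecutive words in one email."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(
        " ".join(words[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(words) - n + 1)
    )


def distinctive_phrases(target_emails, other_emails, min_uses=2):
    """Sequences the target author repeats that no other email contains.

    min_uses is a hypothetical threshold for what counts as a repeated phrase.
    Results are ordered longest phrase first, then by how often it is used.
    """
    target = Counter()
    for email in target_emails:
        target.update(ngram_counts(email))
    others = set()
    for email in other_emails:
        others.update(ngram_counts(email))
    found = [
        (phrase, uses)
        for phrase, uses in target.items()
        if uses >= min_uses and phrase not in others
    ]
    return sorted(found, key=lambda pc: (-len(pc[0].split()), -pc[1], pc[0]))


# Invented toy samples, not real Enron emails.
lawyer_emails = [
    "Please review and let's discuss.",
    "Please review and let's discuss tomorrow.",
]
colleague_emails = [
    "Please review the attached and send comments.",
    "Please review and sign.",
]

phrases = distinctive_phrases(lawyer_emails, colleague_emails)
print(phrases[0])
```

In this toy data the colleagues also write "Please review", so that shorter sequence is filtered out and only the longer phrasing remains distinctive.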

Dr Wright, of the university's School of Arts and Humanities, said: "The repetitiveness of these phrases shows that the individual has developed their own tried and tested phrases, which they know will work to get a job done while working in their role of a lawyer.

"This shows that when faced with written evidence in cases, of which authorship is disputed, clues to the writer's identity can reside in small, common, everyday phrases. This may lead to improving the reliability of evidence given to the courts, and ultimately the delivery of justice."

