Identifying 'anonymous' email authors

Mar 08, 2011
Benjamin Fung, a professor of Information Systems Engineering at Concordia University, has developed an effective new technique to determine the authorship of anonymous emails. Credit: Concordia University

A team of researchers from Concordia University has developed an effective new technique to determine the authorship of anonymous emails. Tests showed their method has a high level of accuracy – and unlike many other methods of ascertaining authorship, it can provide presentable evidence in courts of law. Findings on the new technique are published in the journal Digital Investigation.

"In the past few years, we've seen an alarming increase in the number of cybercrimes involving anonymous emails," says study co-author Benjamin Fung, a professor of Information Systems Engineering at Concordia University and an expert in data mining – extracting useful, previously unknown knowledge from a large volume of raw data. "These emails can transmit threats or child pornography, facilitate communications between criminals or carry viruses."

While police can often use the IP address to locate the house or apartment where an email originated, they may find many people at that address. They need a reliable, effective way to determine which of several suspects has written the emails under investigation.

Fung and his colleagues developed a novel method of authorship attribution to meet this need, based on techniques used in speech recognition and data mining. Their approach relies on the identification of frequent patterns – unique combinations of features that recur in a suspect's emails.

To determine whether a suspect has authored the target email, they first identify the patterns found in emails written by that suspect. Then they filter out any of these patterns that are also found in the emails of other suspects.

The remaining frequent patterns are unique to the author of the emails being analyzed. They constitute the suspect's 'write-print,' a distinctive identifier like a fingerprint. "Let's say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters," says Fung. "We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author."
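
The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' implementation: the feature labels, the `min_support` threshold, and the function names `frequent_patterns` and `write_print` are all assumptions made for the example, and each email is assumed to have already been reduced to a set of discrete stylistic features.

```python
from itertools import combinations

def frequent_patterns(emails, min_support=0.75, max_size=2):
    """Return feature combinations found in at least min_support of the emails.

    Each email is a set of stylistic feature labels (hypothetical names).
    """
    counts = {}
    for features in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(features), size):
                counts[combo] = counts.get(combo, 0) + 1
    threshold = min_support * len(emails)
    return {combo for combo, n in counts.items() if n >= threshold}

def write_print(suspect, emails_by_suspect, min_support=0.75):
    """Frequent patterns of one suspect, minus any that are frequent for another suspect."""
    own = frequent_patterns(emails_by_suspect[suspect], min_support)
    for other, emails in emails_by_suspect.items():
        if other != suspect:
            own -= frequent_patterns(emails, min_support)
    return own

# Toy data: the feature labels are invented for illustration only.
emails_by_suspect = {
    "A": [{"all_lowercase", "no_greeting"},
          {"all_lowercase", "no_greeting", "typos"}],
    "B": [{"no_greeting", "long_sentences"},
          {"no_greeting", "long_sentences"}],
}
print(sorted(write_print("A", emails_by_suspect)))
# [('all_lowercase',), ('all_lowercase', 'no_greeting')]
```

In this toy case the pattern `no_greeting` is dropped from suspect A's write-print because it is also frequent in suspect B's emails, while the lowercase habit survives as a distinguishing trait.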

To test the accuracy of their technique, Fung and his colleagues examined the Enron Email Dataset, a collection which contains over 200,000 real-life emails from 158 employees of the Enron Corporation. Using a sample of 10 emails written by each of 10 subjects – 100 emails in all – they were able to identify authorship with an accuracy of 80 percent to 90 percent.
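
An accuracy figure of this kind is the fraction of test emails attributed to their true author. The sketch below shows one way such a figure could be computed; the matching rule (count how many write-print patterns an email contains and pick the highest-scoring suspect) is an assumption for illustration, as are the names `attribute` and `accuracy` and all of the toy data. The article does not describe the researchers' actual scoring procedure.

```python
def attribute(email_features, write_prints):
    """Pick the suspect whose write-print patterns match the email most often.

    Assumption: a pattern matches when all of its features appear in the email.
    Ties go to the suspect listed first in write_prints.
    """
    def score(patterns):
        return sum(1 for p in patterns if set(p) <= email_features)
    return max(write_prints, key=lambda s: score(write_prints[s]))

def accuracy(labelled_emails, write_prints):
    """Fraction of (author, features) pairs attributed to the true author."""
    correct = sum(1 for author, feats in labelled_emails
                  if attribute(feats, write_prints) == author)
    return correct / len(labelled_emails)

# Hypothetical write-prints and test emails, invented for illustration only.
write_prints = {
    "A": {("all_lowercase",), ("all_lowercase", "no_greeting")},
    "B": {("long_sentences",), ("long_sentences", "no_greeting")},
}
test_emails = [
    ("A", {"all_lowercase", "no_greeting"}),
    ("A", {"all_lowercase", "typos"}),
    ("B", {"long_sentences", "no_greeting"}),
    ("B", {"all_lowercase", "long_sentences"}),  # ambiguous: one pattern from each suspect
]
print(accuracy(test_emails, write_prints))  # 0.75
```

The last toy email shows how a misattribution can arise: it matches one pattern from each write-print, and the tie is resolved in favour of the wrong suspect.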

"Our technique was designed to provide credible evidence that can be presented in a court of law," says Fung. "For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this."

The new authorship identification technique was developed in collaboration with Mourad Debbabi, a Concordia expert in cyber forensics, and PhD student Farkhund Iqbal. "Our different backgrounds allowed us to apply data mining techniques to real-life problems in cyber forensics," says Fung. "This is an excellent illustration of how effective interdisciplinary research can be."

More information: Cited research: www.dfrws.org/2008/proceedings/p42-iqbal.pdf

Provided by Concordia University

User comments : 8

wealthychef
4.4 / 5 (5) Mar 08, 2011
80 to 90 percent accuracy and this is going to be used in a court of law? I can just hear a good defense lawyer shredding this to pieces. Also, these things seem pretty easy to fake.
elginz
3.7 / 5 (3) Mar 08, 2011
1. Compose an email
2. Run it through a translator
3. Translate back to English
4. Send email
that_guy
5 / 5 (3) Mar 08, 2011
80 to 90 percent accuracy and this is going to be used in a court of law? I can just hear a good defense lawyer shredding this to pieces. Also, these things seem pretty easy to fake.

It's not enough for a conviction, but it's good enough to catch a warrant in conjunction with the IP. It would be part of a case. It would be stupid and abusive to try to convict someone on one piece of evidence.

If someone sees a blue pickup truck leaving the site of a crime, do they ignore that piece of evidence because there is a small individual chance of that being the perpetrator's case, or do they use that in conjunction with the tire tracks, the shoe tracks, the hole in his story, etc?

This is similar to handwriting analysis, and that is admissible in court with even lower efficacy/reliability.
Newbeak
2 / 5 (2) Mar 08, 2011
Send your email through a proxy server..
Skeptic_Heretic
5 / 5 (2) Mar 08, 2011
Send your email through a proxy server..

Still traceable for the most part, depends on the proxy. Very few lvl 5 proxies aren't watched on both ends.
Froob
5 / 5 (1) Mar 09, 2011
Emails tend to be relatively short, especially those trying to do spam selling, therefore the statistical basis of any technique based on the content of such emails, is suspect.
frajo
5 / 5 (2) Mar 09, 2011
I'm underwhelmed. Text pattern analysis is very interesting but older than the internet.

[1] IP numbers can be faked. In order to find the true sender you'd have to trace back the whole chain of mail servers some of which could be compromised.
[2] Threats and CP? Rubbish. I never read or open spam. (Yes - no Microsoft inside.)
[3] To be precise, text analyses need huge bodies of text. Not one-liners like "buy cheap viagra www.cheap-viagra.bum".
[4] I'd really like to put those spamming bastards and their sponsors into jail. But this method won't help me.
J-n
not rated yet Mar 09, 2011
This, unfortunately, has nothing to do with spammers. They know who sends out the spam, they know from where, and they turn a blind eye because it would mean getting involved internationally, and someone somewhere down the line would lose money.

The biggest problem with this idea in my mind is that it will instill into people that this form of identification is plausible and as good as fingerprints. Fingerprints are very difficult to change, and changing them will usually not get them confused with someone else's. By changing your speech (text) patterns, you could impersonate someone, or completely change your "identity" in the eyes of this system.

While it wouldn't be the only thing used to convict someone of a crime, it's a heck of a good way to get a warrant to search their computer for whatever crimes they "might" be committing.