Automated analysis of digital records could expose intimate details, personality traits of millions, study finds

Mar 11, 2013
This is a graphic from the You Are What You Like Facebook app. Credit: David Stillwell, University of Cambridge

New research, published today in the journal PNAS, shows that surprisingly accurate estimates of Facebook users' race, age, IQ, sexuality, personality, substance use and political views can be inferred from automated analysis of only their Facebook Likes - information currently publicly available by default.

In the study, researchers describe Facebook Likes as a "generic class" of digital record - similar to web and browsing histories - and suggest that such techniques could be used to extract sensitive information for almost anyone regularly online.

Researchers at Cambridge's Psychometrics Centre, in collaboration with Microsoft Research Cambridge, analysed a dataset of over 58,000 US Facebook users, who volunteered their Likes, demographic profiles and psychometric testing results through the myPersonality application. Users opted in to provide data and gave consent to have profile information recorded for analysis.

Facebook Likes were fed into algorithms and corroborated with information from profiles and psychometric tests. Researchers created statistical models able to predict personal details using Facebook Likes alone.

Models proved 88% accurate for determining male sexuality, 95% accurate distinguishing African-American from Caucasian American and 85% accurate differentiating Republican from Democrat. Christians and Muslims were correctly classified in 82% of cases, and good prediction accuracy was achieved for relationship status and substance abuse – between 65 and 73%.

But few users clicked Likes explicitly revealing these attributes. For example, fewer than 5% of gay users clicked obvious Likes such as Gay Marriage. The predictions relied on 'inference' - aggregating huge amounts of less informative but more popular Likes such as music and TV shows to produce incisive personal profiles.
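
To illustrate why aggregating many weakly informative Likes can work, here is a minimal Python sketch. It is a toy simulation, not the researchers' model: the function name, the two hypothetical groups, the assumption that each Like is clicked independently, and the 55% versus 45% click rates are all invented for illustration. Each simulated Like barely separates the groups on its own, but summing the evidence from many of them separates the groups far more reliably.

```python
import math
import random


def simulate_accuracy(n_likes, n_users=2000, edge=0.05, seed=0):
    """Toy simulation (all numbers are assumptions, not data from the study).

    Users belong to hypothetical group A or B. Every Like is clicked with
    probability 0.5 + edge by group A and 0.5 - edge by group B, so a single
    Like is only weakly informative. The classifier sums log-likelihood
    ratios over all Likes and predicts group A when the total is positive.
    """
    rng = random.Random(seed)
    p_a, p_b = 0.5 + edge, 0.5 - edge
    # Evidence contributed by a clicked / not-clicked Like in favour of group A.
    llr_clicked = math.log(p_a / p_b)
    llr_not_clicked = math.log((1 - p_a) / (1 - p_b))

    correct = 0
    for _ in range(n_users):
        in_group_a = rng.random() < 0.5
        click_prob = p_a if in_group_a else p_b
        score = 0.0
        for _ in range(n_likes):
            clicked = rng.random() < click_prob
            score += llr_clicked if clicked else llr_not_clicked
        predicted_a = score > 0
        if predicted_a == in_group_a:
            correct += 1
    return correct / n_users


if __name__ == "__main__":
    for n in (1, 11, 51, 201):
        print(f"{n:>3} Likes per user: accuracy {simulate_accuracy(n):.2f}")
```

Under these assumptions, accuracy with a single Like stays close to chance, while accuracy with a couple of hundred Likes is much higher. Real Likes are neither independent nor equally informative, so the sketch shows only the general principle of aggregation.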

Even seemingly opaque personal details, such as whether users' parents separated before the user reached the age of 21, were predicted with 60% accuracy, enough to make the information "worthwhile for advertisers", suggest the researchers.

While they highlight the potential for personalised marketing to improve online services using predictive models, the researchers also warn of the threats posed to users' privacy.

They argue that many online consumers might feel such levels of digital exposure exceed acceptable limits - as corporations, governments, and even individuals could use predictive software to accurately infer highly sensitive information from Facebook Likes and other digital 'traces'.

The researchers also tested for personality traits including intelligence, emotional stability, openness and extraversion.

While such latent traits are far more difficult to gauge, the accuracy of the analysis was striking. Study of the openness trait – the spectrum of those who dislike change to those who welcome it – revealed that observation of Likes alone is roughly as informative as using an individual's actual personality test score.

Some Likes had a strong but seemingly incongruous or random link with a personal attribute, such as Curly Fries with high IQ, or That Spider is More Scared Than U Are with non-smokers.

When taken as a whole, researchers believe that the varying estimations of personal attributes and personality traits gleaned from Facebook Like analysis alone can form surprisingly accurate personal portraits of potentially millions of users worldwide.

They say the results suggest a possible revolution in psychological assessment which – based on this research – could be carried out at an unprecedented scale without costly assessment centres and questionnaires.

"We believe that our results, while based on Facebook Likes, apply to a wider range of online behaviours," said Michal Kosinski, Operations Director at the Psychometrics Centre, who conducted the research with his Cambridge colleague David Stillwell and Thore Graepel from Microsoft Research.

"Similar predictions could be made from all manner of digital data, with this kind of secondary 'inference' made with remarkable accuracy - statistically predicting sensitive information people might not want revealed. Given the variety of digital traces people leave behind, it's becoming increasingly difficult for individuals to control.

"I am a great fan and active user of new amazing technologies, including Facebook. I appreciate automated book recommendations, or Facebook selecting the most relevant stories for my newsfeed," said Kosinski. "However, I can imagine situations in which the same data and technology is used to predict political views or sexual orientation, posing threats to freedom or even life."

"Just the possibility of this happening could deter people from using digital technologies and diminish trust between individuals and institutions – hampering technological and economic progress. Users need to be provided with transparency and control over their information."

Thore Graepel from Microsoft Research said he hoped the research would contribute to the on-going discussions about user privacy:

"Consumers rightly expect strong privacy protection to be built into the products and services they use and this research may well serve as a reminder for consumers to take a careful approach to sharing information online, utilising privacy controls and never sharing content with unfamiliar parties."

David Stillwell from Cambridge University added: "I have used Facebook since 2005, and I will continue to do so. But I might be more careful to use the privacy settings that Facebook provides."

More information: "Private traits and attributes are predictable from digital records of human behavior," by Michal Kosinski, David Stillwell, and Thore Graepel, PNAS, 2013.

User comments : 22

DarkWingDuck
3 / 5 (10) Mar 11, 2013
Minority Report... It's coming for you!
baudrunner
2.8 / 5 (9) Mar 11, 2013
I never "like" anything. But I really don't like analysts who perceive all internet users as consumers who need to be directed to products that they might buy via commercial advertising intruding in on the relevant content that brought them to a site in the first place. The internet exists for the vast majority, if not for all of us, as a resource for information and research. It is not of necessity advertisement driven, as much as they like us to believe that it is. We pay our ISP's for the delivery of the information on sites that we would all be much happier to view without any commercial content.

Furthermore, people who browse sites on their cell phones are often paying up front for limited bandwidth. The bulk of the bytes that are sent to a mobile device constitute advertisement and that is just plain unjust.

See how many people would actually choose to visit sites that existed solely for their advertising content and nothing else. NOBODY!
gwrede
3.2 / 5 (9) Mar 11, 2013
"But I really don't like analysts who perceive all internet users as consumers who need to be directed to products that they might buy via commercial advertising intruding in on the relevant content that brought them to a site in the first place."- baudrunner
Who do you think pays for all the sites that you visit? Does Facebook, the news sites and others that you visit, let you use their server just out of love for you? These sites have to make a living, and that comes either from advertising or subscriptions.

Would you like to pay for the right to write comments here?
ValeriaT
1.4 / 5 (9) Mar 11, 2013
You can be sure, Google knows about all animal sex videos, you watched last year.. You can fool the God and your confessor - but not Google.
DarkWingDuck
1 / 5 (5) Mar 11, 2013
"Furthermore, people who browse sites on their cell phones are often paying up front for limited bandwidth. The bulk of the bytes that are sent to a mobile device constitute advertisement and that is just plain unjust."- baudrunner

I like this point but you do have a choice on whether or not to use the site. I for one find that sigalert is quite useful and accept that ads go with it. I have unlimited data though and never even get near the "alert" limit set before throttling begins. I don't like how google gets to auto update all the crud I don't want which uses up bytes but it's forced on you with a smart phone.
Doug_Huffman
2.6 / 5 (9) Mar 11, 2013
"Smartphone" beggars the meaning of smart. Maybe dummies-phone would be more apt.

When something online is free, you're not the customer, you are the product.

http://futureofth...-product

The Future of the Internet - And How to Stop It

http://futureofth...rnet.pdf
ScooterG
3 / 5 (10) Mar 11, 2013
I bet they could even lay out a profile of those of us who do not subscribe to facebook - all seven of us!
bredmond
3.7 / 5 (3) Mar 11, 2013
This is exceptionally convenient to help people in many ways. The kind of knowledge from these studies could potentially lead to ways to help people understand what they really like and dont like and what they are suited for and who they are suited for. Many people dont realize what work they would really like to be doing and this might help people know what kinds of jobs are out there that they didnt know before. facebook profiles could be linked to companies looking for employees. And I tell ya, trying to find material for further education now that i am out of school is a little difficult. If this kind of data analysis could be used to connect me with adult ongoing education materials that I never knew i liked or had trouble conveniently accessing, i would be better off. Especially if it could lead to some kind of standardized certificate. That could help me and many other people get better jobs.
bredmond
4.3 / 5 (3) Mar 11, 2013
this technology might also help to uncover patterns of diseases that could be solved simply by knowing what to look for. I dont know which diseases, but maybe looking for associated things such as whatever allows it to predict drug usage. maybe something like this could help those drug users connect with more effective diversions which then lead towards something constructive which they can use to get a better education or a better job and stay more stable so that drug dependency is gradually minimized or phased out. combining with technologies such as the netflix challenge recommendation software, people could lead much more comfortable lives. that is to say, when people want or need something, even if they dont know what they want or need it, it can be delivered to them quite quickly and accurately.
Telekinetic
2.9 / 5 (8) Mar 11, 2013
The analysts will also be able to peg masochists who love to read what their "friends" had for breakfast and sit quietly while watching an internet version of a slide show of someone else's vacation. They will also discover a new personality disorder manifested by the collecting of friends like sea shells- the more you have, the higher your status.
baudrunner
1.7 / 5 (6) Mar 11, 2013
"Who do you think pays for all the sites that you visit? Does Facebook, the news sites and others that you visit, let you use their server just out of love for you? These sites have to make a living, and that comes either from advertising or subscriptions."- gwrede

Facebook is a social networking site. It, and others like it, should be funded by subscription, not by advertising. Nobody reads Facebook for research or information gathering purposes unless they are doing an anthropological study.

The big ISP's are making money hand over fist from subscribers, to the tune of billions, especially from the mobile market. Advertising is overkill and hogs bandwidth and detracts from the quality of the internet experience. Sites that get visited a lot, such as news and information sites like this one, could get a share of that money.
Sanescience
2.3 / 5 (6) Mar 11, 2013
To most of the statements in the article and comments: duh.
Telekinetic
1.8 / 5 (5) Mar 12, 2013
"Facebook is a social networking site. It, and others like it, should be funded by subscription, not by advertising. Nobody reads Facebook for research or information gathering purposes unless they are doing an anthropological study."- baudrunner

Many users of Facebook have a commercial agenda- promoting their businesses of services and products at no cost for advertising. Oddly, fb users are more likely to trust a product in that atmosphere of "we're all friends here." The paying advertisers on fb and particularly Google are what make those corporations the juggernauts that they are, and why they're in business in the first place.

Professor Plum
4.3 / 5 (3) Mar 12, 2013
Maybe substance abuse scored lower because fewer people wanted to admit the results were correct ;-)
JRi
3 / 5 (4) Mar 12, 2013
"Maybe substance abuse scored lower because fewer people wanted to admit the results were correct ;-)"- Professor Plum


Maybe not all were _abusing_ substances, just enjoying them.
Expiorer
1 / 5 (4) Mar 12, 2013
And that is why I don`t "like" that
QuixoteJ
1 / 5 (6) Mar 12, 2013
"The kind of knowledge from these studies could potentially lead to ways to help people understand what they really like and dont like and what they are suited for and who they are suited for..."- bredmond
Umm... What?? You're actually saying it's better if people don't have to think for themselves?... "Helping" people understand what they really like and don't like? That's too close to blatant mind control for me.
bredmond
not rated yet Mar 12, 2013
"Umm... What?? You're actually saying it's better if people don't have to think for themselves?... 'Helping' people understand what they really like and don't like? That's too close to blatant mind control for me."- QuixoteJ


No. I mean it can help introduce people to things they hadn't encountered before. Additionally, maybe people will realize what they liked about something else was not direct, it was because they didnt realize what they were missing.
abinico_warez_7
5 / 5 (1) Mar 12, 2013
The relationship between IQ and Facebook use is pretty simple: inverse.
cyberCMDR
5 / 5 (1) Mar 16, 2013
I wonder if this technology would have identified those people who committed mass shootings last year. If there was a way to keep it from being abused, it could help identify people who need help before they make national news.
Lobo Tommy
1 / 5 (3) Mar 17, 2013
I don't use facebook for precisely that kind of reasons. Just because I'm paranoid it doesn't mean they're not out to get me. :)

On a serious note though, I think this is (potentially) unacceptable intrusion of privacy.
gwrede
1 / 5 (3) Mar 19, 2013
"I wonder if this technology would have identified those people who committed mass shootings last year. If there was a way to keep it from being abused, it could help identify people who need help before they make national news."- cyberCMDR
Definitely this could be used for it. But then, there are all kinds of moral, legal, and practical issues involved. And if word gets out that it's done, just guess if all those who "suspect they are suspect" will skip FB, just to be sure.

But yes, it could easily be done, and given enough data (i.e. that from a few actual mass shooters) it could be pretty accurate.
