Twitter behavior can predict users' income level, new research shows

September 29, 2015, University of Pennsylvania
Computer scientists from the University of Pennsylvania and elsewhere have linked the online behavior of more than 5,000 Twitter users to their income bracket. This graph shows salary in relation to how users display objectivity and emotion, what topics they discuss, and the ratio of followers to followees. Credit: University of Pennsylvania

The words people use on social media can reveal hidden meaning to those who know where to look.

Linguists have long been fascinated by this notion, connecting a person's words to age, gender, even socioeconomic status. Now computer scientists from the University of Pennsylvania and elsewhere have gone a step further, linking the online behavior of more than 5,000 Twitter users to their income bracket. They published their results in the journal PLOS ONE.

Daniel Preotiuc-Pietro, a post-doctoral researcher in Penn's Positive Psychology Center in the School of Arts & Sciences, led the research, collaborating with Svitlana Volkova of Johns Hopkins University, Vasileios Lampos and Nikolaos Aletras of University College London, and Yoram Bachrach of Microsoft Research.

The team took the opposite approach to what psychologists have historically done: Rather than asking direct questions, the scientists looked at participants' social media posts, often full of intimate details despite the lack of privacy these outlets afford. Researchers from Penn's World Well-Being Project, of which Preotiuc-Pietro is a part, are curious about social media as a research tool that can support, or even replace, expensive, limited and potentially biased surveying.

For this experiment, the researchers started by looking at Twitter users' self-described occupations.

This graph shows average income across groups, then reveals traits (age, gender, etc.) predicted from the tweets. Credit: University of Pennsylvania

In the United Kingdom, a job code system sorts occupations into nine classes. Using that hierarchy, the researchers determined the average income for each code, then sought a representative sample from each. After manually removing ambiguous profiles (for example, listings referencing the film Coal Miner's Daughter that had been grouped under "coal miner" as a profession), the team ended up with 5,191 Twitter users and more than 10 million tweets to analyze.

"It's the largest dataset of its kind for this type of research," said Preotiuc-Pietro. "The dataset enabled us to do something no one has really done before."

From there, they created a statistical natural language processing algorithm that picked out the words that people in each job class use distinctively. Most people tend to use the same or similar words, so the algorithm's job was to "understand" which words were most predictive for each class. Humans then analyzed these word groupings and assigned them qualitative labels.
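To make the idea of "distinctive" words concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, which the article does not spell out; it assumes a simple smoothed log-odds score that compares how often a word appears in one class versus all the others. The function name `distinctive_words`, the class labels and the toy tweets are all hypothetical.

```python
import math
from collections import Counter


def distinctive_words(tweets_by_class, top_n=3, alpha=1.0):
    """Rank words by smoothed log-odds of appearing in one class vs. the rest.

    Illustrative scoring only; an assumption, not the study's method.
    """
    # Word counts per class (lowercased, whitespace-tokenized).
    counts = {
        cls: Counter(word for tweet in tweets for word in tweet.lower().split())
        for cls, tweets in tweets_by_class.items()
    }
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    vocab_size = len(total)
    n_all = sum(total.values())

    ranked = {}
    for cls, cnt in counts.items():
        n_in = sum(cnt.values())
        n_out = n_all - n_in
        scores = {}
        for word in total:
            # Additive smoothing keeps unseen words from producing log(0).
            p_in = (cnt[word] + alpha) / (n_in + alpha * vocab_size)
            p_out = (total[word] - cnt[word] + alpha) / (n_out + alpha * vocab_size)
            scores[word] = math.log(p_in / p_out)
        # Highest score first; ties broken alphabetically for stable output.
        ranked[cls] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return ranked


# Hypothetical toy data: two made-up classes with two tweets each.
toy = {
    "class_a": ["the board met today", "the budget vote is today"],
    "class_b": ["the match was great", "great night at the match"],
}
top_words = distinctive_words(toy)
```

In this toy example a word such as "the", which both classes use equally, scores zero and drops out of the ranking, while words used mostly by one class rise to the top. That mirrors the point above that commonly shared words carry little signal.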

Some of the results validated what's already known: for instance, that a person's words can reveal age and gender, and that these are tied to income. But Preotiuc-Pietro said there were also some surprises. For example, those who earn more tend to express more fear and anger on Twitter, and perceived optimists have a lower mean income. Text from those in lower income brackets includes more swear words, whereas those in higher brackets more frequently discuss politics, corporations and the nonprofit world.

Aletras noted an overall picture that emerged about Twitter use.

"Lower-income users or those of a lower socioeconomic status use Twitter more as a communication means among themselves," he said. "High-income people use it more to disseminate news, and they use it more professionally than personally."

Strong correlations like these, between what the researchers describe as online expression and offline demographics—for example, occupation grouping or income level—also proved intriguing, Lampos added. "This work attempts to highlight some of the potential causal factors in these relationships."

Such findings will act as a baseline for future work, some of which will investigate how perceptions about users align with reality.


More information: PLOS ONE, … journal.pone.0138717
