IBM researchers' algorithm explores tweets for home location cues

March 24, 2014 by Nancy Owano weblog
Credit: arXiv:1403.2345 [cs.SI]

By drawing on the content of users' tweets and their tweeting behavior, a team of three IBM researchers say they have a new algorithm that infers the home location of Twitter users at different granularities, including city, state, time zone or geographic region. The algorithm uses each person's 200 most recent tweets to make its predictions. The scientists described their approach as an "ensemble of statistical and heuristic classifiers" that predicts locations and uses a geographic gazetteer dictionary (the USGS [United States Geological Survey] gazetteer) to identify place-name entities. They also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time, and used that to further improve their detection accuracy.
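To make the idea of combining a gazetteer lookup with other classifiers concrete, here is a minimal Python sketch. It is not the researchers' implementation: the tiny `TOY_GAZETTEER`, the substring matching, and the plain majority vote in `ensemble_predict` are all assumptions chosen for illustration, and the paper's actual classifiers and combination rule may differ.

```python
from collections import Counter

# Hypothetical miniature gazetteer: place name -> (city, state).
# The paper uses the USGS gazetteer; these entries are illustrative only.
TOY_GAZETTEER = {
    "seattle": ("Seattle", "WA"),
    "pike place": ("Seattle", "WA"),
    "austin": ("Austin", "TX"),
    "san jose": ("San Jose", "CA"),
}


def gazetteer_vote(tweets):
    """Heuristic classifier: return the city whose place names appear most often.

    Uses naive substring matching; returns None when no place name is found.
    """
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for name, (city, _state) in TOY_GAZETTEER.items():
            if name in lowered:
                counts[city] += 1
    return counts.most_common(1)[0][0] if counts else None


def ensemble_predict(tweets, classifiers):
    """Combine classifiers by simple majority vote, ignoring abstentions (None)."""
    votes = Counter()
    for clf in classifiers:
        guess = clf(tweets)
        if guess is not None:
            votes[guess] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    sample = [
        "Coffee at Pike Place this morning",
        "Seattle rain again",
        "Flying to Austin next week",
    ]
    print(ensemble_predict(sample, [gazetteer_vote]))
```

In this sketch a classifier may abstain by returning `None`, so a user whose tweets mention no known place names does not distort the vote.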

The paper, "Home Location Identification of Twitter Users," submitted earlier this month on arXiv, is by Jalal Mahmud, Jeffrey Nichols and Clemens Drews of IBM Research. They said they had experimental evidence to suggest their algorithm works well in practice. In fact, they said it "outperforms the best existing algorithms for predicting the home location of Twitter users."

From July 2011 to August 2011, they collected tweets from the top 100 US cities by population. They invoked the Twitter REST API to collect each user's 200 most recent tweets (fewer if that user had fewer than 200 tweets in total). Users found to have private profiles were eliminated. The final data set had about 1.52 million tweets by 9551 users.

In listing their contributions, the IBM researchers said that, when tested on the 1.52-million-tweet dataset from 9551 users in 100 US cities, their algorithm outperforms the best existing algorithms for home location prediction from tweets. "Our best method achieves accuracies of 64% for cities, 66% for states, 78% for time zones and 71% for regions."

Microblogging is of great interest to scientists, who regard Twitter in particular as an ideal laboratory for mining data. The paper's authors, however, noted that fewer than 1% of tweets are geotagged, and they said that information available from the location fields in users' profiles is unreliable at best.

"In this paper, we aim to overcome this location sparseness problem by developing algorithms to predict the home, or primary, locations of Twitter users from the content of their tweets and their tweeting behavior. Ultimately, we would like to be able to predict the location of each tweet and our work to predict a user's home location is a key step towards achieving that goal."

Among their future research goals are to incorporate more domain knowledge in their location prediction models, such as a landmark database. They said they hoped to integrate their algorithm "into various applications to explore its usefulness in real world deployments."

More information: Home Location Identification of Twitter Users, arXiv:1403.2345 [cs.SI]

We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone or geographic region, using the content of users tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities. We find that a hierarchical classification approach, where time zone, state or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We have also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time and use that to further improve the location detection accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the home location of Twitter users.
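The hierarchical approach mentioned in the abstract, where a coarse label is predicted first and the city second, can be sketched roughly as follows. This is only an illustration under stated assumptions: the word-overlap scoring in `best_label`, the function names, and the toy training examples are hypothetical and stand in for whatever statistical classifiers the authors actually trained.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build word counts per state and per city within each state.

    examples: iterable of (words, state, city) tuples (toy format, assumed here).
    """
    state_counts = defaultdict(Counter)
    city_counts = defaultdict(lambda: defaultdict(Counter))
    for words, state, city in examples:
        state_counts[state].update(words)
        city_counts[state][city].update(words)
    return state_counts, city_counts


def best_label(words, counts_by_label):
    """Pick the label whose training words overlap most with the input words."""
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in counts_by_label.items()
    }
    return max(scores, key=scores.get)


def predict_hierarchical(words, model):
    """Predict the state first, then choose only among cities in that state."""
    state_counts, city_counts = model
    state = best_label(words, state_counts)
    city = best_label(words, city_counts[state])
    return state, city


if __name__ == "__main__":
    # Toy training data, invented for illustration.
    model = train([
        (["alamo", "tacos"], "TX", "San Antonio"),
        (["bbq", "tacos", "sxsw"], "TX", "Austin"),
        (["fog", "bart"], "CA", "San Francisco"),
        (["beach", "tacos"], "CA", "San Diego"),
    ])
    print(predict_hierarchical(["tacos", "sxsw"], model))
```

The point of the two-stage design is that the second classifier only has to separate cities within one state, a smaller and usually easier problem than choosing among all 100 cities at once.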
