Predicting what topics will trend on Twitter

November 1, 2012 by Larry Hardesty
Credit: Christine Daniloff

Twitter's home page features a regularly updated list of topics that are "trending," meaning that tweets about them have suddenly exploded in volume. A position on the list is highly coveted as a source of free publicity, but the selection of topics is automatic, based on a proprietary algorithm that factors in both the number of tweets and recent increases in that number.

At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student, Stanislav Nikolov, will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter's algorithm puts them on the list—and sometimes as much as four or five hours before.

The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics, but it also represents a new approach to statistical analysis that could, in theory, apply to any quantity that varies over time: the duration of a bus ride, ticket sales for films, maybe even stock prices.

Like all machine-learning algorithms, Shah and Nikolov's needs to be "trained": it combs through data in a sample set—in this case, data about topics that previously did and did not trend—and tries to find meaningful patterns. What distinguishes it is that it's nonparametric, meaning that it makes no assumptions about the shape of patterns.

Let the data decide

In the standard approach to machine learning, Shah explains, researchers would posit a "model"—a general hypothesis about the shape of the pattern whose specifics need to be inferred. "You'd say, 'Series of trending things … remain small for some time and then there is a step,'" says Shah, the Jamieson Associate Professor in the Department of Electrical Engineering and Computer Science. "This is a very simplistic model. Now, based on the data, you try to train for when the jump happens, and how much of a jump happens.

"The problem with this is, I don't know that things that trend have a step function," Shah explains. "There are a thousand things that could happen." So instead, he says, he and Nikolov "just let the data decide."

In particular, their algorithm compares changes over time in the number of tweets about each new topic to the changes over time of every sample in the training set. Samples whose statistics resemble those of the new topic are given more weight in predicting whether the new topic will trend or not. In effect, Shah explains, each sample "votes" on whether the new topic will trend, but some samples' votes count more than others'. The weighted votes are then combined, giving a probabilistic estimate of the likelihood that the new topic will trend.
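
The voting scheme can be sketched in a few lines of Python. This is only an illustration, not Shah and Nikolov's implementation: the squared Euclidean distance, the exponential weighting, the `gamma` parameter, the function names and the toy series are all assumptions made for the example.

```python
import math

def distance(a, b):
    """Squared Euclidean distance between two equal-length series (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def trend_probability(new_series, trending, non_trending, gamma=1.0):
    """Estimate the chance that new_series will trend from weighted votes.

    Every training sample votes for its own label. A sample's weight decays
    exponentially with its distance from the new series, so samples that
    resemble the new topic count for more (the decay form is an assumption).
    """
    votes_yes = sum(math.exp(-gamma * distance(new_series, s)) for s in trending)
    votes_no = sum(math.exp(-gamma * distance(new_series, s)) for s in non_trending)
    total = votes_yes + votes_no
    # If every weight underflows to zero, fall back to an uninformative 0.5.
    return votes_yes / total if total > 0 else 0.5

# Made-up training data: tweet counts per time step for four past topics.
trending = [[1, 2, 4, 8], [1, 3, 5, 9]]
non_trending = [[1, 1, 1, 1], [2, 1, 2, 1]]

p_rising = trend_probability([1, 2, 5, 8], trending, non_trending, gamma=0.5)
p_flat = trend_probability([1, 1, 1, 1], trending, non_trending, gamma=0.5)
```

Here `p_rising` comes out close to 1 because the new series sits near the trending samples, while `p_flat` comes out close to 0. No shape for "a trending topic" is specified anywhere in the code; the training samples supply it.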

In Shah and Nikolov's experiments, the training set consisted of data on 200 Twitter topics that did trend and 200 that didn't. In real time, they set their algorithm loose on live tweets, predicting trending with 95 percent accuracy and a 4 percent false-positive rate.
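
The article does not spell out how the two figures were computed. As a rough sketch, assuming "accuracy" here means the share of truly trending topics that were flagged (the true-positive rate), both rates can be derived from a list of predictions and outcomes. The function name and the sample data below are made up for illustration.

```python
def rates(predicted, actual):
    """Return (true_positive_rate, false_positive_rate) for boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    # Guard against empty classes so the sketch never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical outcomes for eight topics: four trended, four did not.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]

tpr, fpr = rates(predicted, actual)
```

With these invented lists, three of the four trending topics are caught and one of the four non-trending topics is wrongly flagged.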

Shah predicts, however, that the system's accuracy will improve as the size of the training set increases. "The training sets are very small," he says, "but we still get strong results."

Keeping pace

Of course, the larger the training set, the greater the computational cost of executing Shah and Nikolov's algorithm. Indeed, Shah says, curbing computational complexity is the reason that machine-learning algorithms typically employ parametric models in the first place. "Our computation scales proportionately with the data," Shah says.

But on the Web, he adds, computational resources scale with the data, too: As Facebook or Google add customers, they also add servers. So his and Nikolov's algorithm is designed so that its execution can be split up among separate machines. "It is perfectly suited to the modern computational framework," Shah says.
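
One way to see why such a computation splits cleanly: a weighted vote is a sum over training samples, so each machine can total the votes for its own slice of the training set, and only those partial totals need to be combined. The sketch below is a hypothetical illustration, with a thread pool standing in for separate machines; the distance measure, the weighting, `gamma` and the data are assumptions, not details from the researchers' system.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_votes(new_series, shard, gamma=1.0):
    """Return (yes, no) weighted vote totals for one shard of (series, label) pairs."""
    yes = no = 0.0
    for series, did_trend in shard:
        d = sum((x - y) ** 2 for x, y in zip(new_series, series))
        w = math.exp(-gamma * d)  # closer samples get larger weights
        if did_trend:
            yes += w
        else:
            no += w
    return yes, no

def sharded_trend_probability(new_series, shards, gamma=1.0):
    """Combine per-shard vote totals into one probability estimate."""
    # Each shard is processed independently; here threads play the role of machines.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: partial_votes(new_series, s, gamma), shards))
    yes = sum(r[0] for r in results)
    no = sum(r[1] for r in results)
    total = yes + no
    return yes / total if total > 0 else 0.5

# Made-up training data, split into two shards.
shards = [
    [([1, 2, 4, 8], True), ([1, 1, 1, 1], False)],
    [([1, 3, 5, 9], True), ([2, 1, 2, 1], False)],
]
p_sharded = sharded_trend_probability([1, 2, 5, 8], shards, gamma=0.5)

# The same data in a single shard gives the same answer, up to rounding.
p_single = sharded_trend_probability([1, 2, 5, 8], [shards[0] + shards[1]], gamma=0.5)
```

Adding training data then means adding shards, and the work per shard stays the same.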

In principle, Shah says, the new algorithm could be applied to any sequence of measurements performed at regular intervals. But the correlation between historical data and future events may not always be as clear cut as in the case of Twitter posts. Filtering out all the noise in the historical data might require such enormous training sets that the problem becomes computationally intractable even for a massively distributed program. But if the right subset of training data can be identified, Shah says, "It will work."

"People go to social-media sites to find out what's happening now," says Ashish Goel, an associate professor of management science at Stanford University and a member of Twitter's technical advisory board. "So in that sense, speeding up the process is something that is very useful." Of the MIT researchers' nonparametric approach, Goel says, "it's very creative to use the data itself to find out what trends look like. It's quite creative and quite timely and hopefully quite useful."
