October 20, 2010 feature
Model describes Web page popularity
(PhysOrg.com) -- How do some Web pages become popular? In a recent study, researchers analyzed Wikipedia articles and a collection of all the Web pages of Chile to better understand the dynamics of online popularity. They observed that online popularity is characterized not by gradual accumulation, but by "bursts" that display many of the same features as critical systems, such as stock market crashes and certain natural phenomena. They also developed a model that captures these critical features of online popularity.
"We see that Internet popularity behaves in unpredictable ways, with big shifts in attention causing changes which have statistical signatures like those seen in earthquakes and avalanches," Jacob Ratkiewicz from Indiana University told PhysOrg.com.
Ratkiewicz and his coauthors from Indiana University and the Institute for Scientific Interchange in Torino, Italy, have published their study on online popularity in a recent issue of Physical Review Letters. As they explain, online information that becomes popular has formidable power to influence opinions, culture, and policy, and it can also earn higher advertising profits. Online popularity is highly sought after for these reasons, but as previous studies have found, very few sites become tremendously popular.
In the researchers' analysis, the popularity of a Wikipedia article or Web page is expressed by the number of clicks to that page and the number of external links to that page. While previous studies have found that the popularity distribution of Web pages follows power-law behavior, it has been difficult to observe the growth in popularity of individual pages due to the lack of data with temporal information. Here, the researchers gathered the traffic data of millions of pages (3 million Wikipedia articles with a one-second time resolution during 2001-2007; 3 million Wikipedia articles with a one-hour time resolution during 2008-2010; and 3 million Web pages from Chile's .cl domain with a one-year time resolution during 2002-2006). They obtained the Wikipedia data by mining the full edit history of every article and the Chilean Web page data using the country's TODOCL search engine.
Among their results, the researchers found that almost all pages experience a burst of popularity near the beginning of their lives. After that, some pages maintain steady exponential growth, while many others experience intermittent bursts. Looking at these bursts more closely, the researchers found that their sizes follow a heavy-tailed distribution, which is a common feature of critical systems. In a heavy-tailed distribution, most of the items exhibit small values, but a few items exhibit very large values that dominate the overall volume of traffic. As the researchers noted, these bursts are different from those observed in news-driven events, where attention fades rapidly; instead, sequences of bursts occur for certain Web pages, and these pages accumulate popularity.
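To make the idea of a heavy tail concrete, the following minimal Python sketch compares synthetic "burst sizes" drawn from a heavy-tailed (Pareto) distribution with sizes drawn from a light-tailed (exponential) one, and measures how much of the total comes from the largest 1% of items. The function names, the Pareto exponent, and the sample size are illustrative assumptions; none of them come from the study, and the numbers are simulated rather than taken from the Wikipedia or Chilean data.

```python
import random


def top_share(values, fraction=0.01):
    """Fraction of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


def sample_bursts(n=100_000, alpha=1.2, seed=0):
    """Draw synthetic burst sizes (hypothetical parameters, not fitted to data).

    heavy: Pareto-distributed sizes with tail exponent `alpha` (heavy tail)
    light: exponentially distributed sizes (light tail), for comparison
    """
    rng = random.Random(seed)
    heavy = [rng.paretovariate(alpha) for _ in range(n)]
    light = [rng.expovariate(1.0) for _ in range(n)]
    return heavy, light


if __name__ == "__main__":
    heavy, light = sample_bursts()
    print("top 1% share, heavy tail:", round(top_share(heavy), 3))
    print("top 1% share, light tail:", round(top_share(light), 3))
```

In the heavy-tailed sample the top 1% of items should account for a far larger share of the total than in the exponential sample, which is the sense in which a few very large bursts can dominate overall traffic.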
The researchers developed a ranking model that could reproduce some of the features of the popularity burst distribution, but they had to add a reranking mechanism to reproduce the heavy tail. The reranking mechanism randomly boosts the popularity value of a Web page, and enables the model to more closely represent the features in the actual data. Although the model is mostly descriptive, its ability to reproduce the dynamics of online popularity could lead to a better understanding of how online information becomes popular.
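The interplay between rank-driven traffic and random reranking can be sketched in a few lines of Python. This is a toy illustration only, not the model published by the researchers: it assumes a fixed set of pages, assumes that a page at rank r receives a click with probability proportional to r^(-alpha), and assumes that a reranking event moves a randomly chosen page to a random rank no worse than its current one. All names and parameter values (`simulate`, `alpha`, `rerank_prob`) are hypothetical.

```python
import random


def simulate(n_pages=200, steps=5_000, alpha=1.0, rerank_prob=0.1, seed=0):
    """Toy rank-based popularity simulation with random reranking.

    Returns (clicks, ranking), where clicks[p] is the number of clicks
    page p received and ranking[0] is the top-ranked page at the end.
    """
    rng = random.Random(seed)
    ranking = list(range(n_pages))  # ranking[i] = page currently at rank i + 1
    clicks = [0] * n_pages
    # Assumed click probability: proportional to rank ** -alpha.
    weights = [(r + 1) ** -alpha for r in range(n_pages)]
    positions = range(n_pages)

    for _ in range(steps):
        # Traffic step: choose a rank position, credit the page sitting there.
        pos = rng.choices(positions, weights=weights, k=1)[0]
        clicks[ranking[pos]] += 1

        # Reranking step: occasionally boost a random page to a better rank.
        if rng.random() < rerank_prob:
            old = rng.randrange(n_pages)
            new = rng.randint(0, old)  # new rank is never worse than the old one
            page = ranking.pop(old)
            ranking.insert(new, page)

    return clicks, ranking


if __name__ == "__main__":
    clicks, ranking = simulate()
    print("most-clicked page:", max(range(len(clicks)), key=clicks.__getitem__))
    print("top five pages by final rank:", ranking[:5])
```

Setting `rerank_prob` to 0 freezes the ranking, so traffic simply follows the initial order; raising it lets low-ranked pages jump ahead and then collect clicks at their new rank, which is the kind of sudden boost the reranking mechanism is meant to capture.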
"We hope that deeper understanding of how popularity evolves could lead to methods for predicting things that will become popular before they actually do," Ratkiewicz said.
"I'm not sure that this understanding could be used to legitimately improve the popularity of specific Web pages," he added. "However, recent experience in another project of ours suggests that people are trying to exploit social media to generate bursts of attention toward specific Web sites. It's been shown that these 'twitter-bombs' can catapult a page to the top of Google search results."