Mathematically ranking ranking methods

May 24, 2011

In a world where everything from placement in a Google search result to World Cup eligibility depends on ranking and numerical ratings of some kind, it is becoming increasingly important to analyze the algorithms and techniques that underlie such ranking methods in order to ensure fairness, eliminate bias, and tailor them to specific applications.

In a paper published this month in a SIAM journal, authors Timothy Chartier, Erich Kreutzer, Amy Langville, and Kathryn Pedings mathematically analyze three commonly used ranking methods. "We studied the sensitivity and stability of three popular ranking methods: PageRank, which is the method Google has used to rank web pages, and the Colley and Massey methods, which have been used by the Bowl Championship Series to rank U.S. college football teams," explains Langville.

All three methods analyzed (the Colley and Massey ranking techniques and the Markov method, a generalized version of PageRank) are based on linear algebra and have simple, elegant formulations. Here, the authors apply a modified version of PageRank to a sports season.

"Both web page authors and teams sometimes try to game, or spam, ranking systems to achieve a higher ranking. For instance, web page authors try to modify their incoming and outgoing links while teams try to run up the score against weak opponents," says Langville, pointing out the significance of studying such methods. "Mathematically, such spamming can be viewed as changes to the input data required by the ranking method."

Most methods, including the three studied here, produce a "rating" for each team: a numerical score that represents its playing ability. Sorting these ratings yields integer "ranks," which simply list the teams in order of rating.
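The step from ratings to ranks is just a sort. A minimal sketch, assuming hypothetical team names and rating values that do not come from the paper:

```python
def ranks_from_ratings(ratings):
    """Return 1-based ranks; the team with the highest rating gets rank 1."""
    # Sort team names by rating, highest first (ties keep their input order).
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {team: position for position, team in enumerate(ordered, start=1)}

# Hypothetical ratings, for illustration only.
ratings = {"A": 0.75, "B": 0.25, "C": 0.58, "D": 0.42}
ranks = ranks_from_ratings(ratings)
print(ranks)  # {'A': 1, 'C': 2, 'D': 3, 'B': 4}
```

The ranks keep only the order of the ratings; the size of the gaps between ratings is discarded.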

In the first step of their analysis, the authors assume a simple input rating in which scores differ by a constant 1 from one team to the next, and apply it to a perfect sports season. In a perfect season, each team plays every other team exactly once and there are no upsets, so a higher-ranked team always beats a lower-ranked team. Thus, in a system with teams numbered 1 through 4 by rank, team 1 would beat all other teams; team 2 would beat teams 3 and 4 and lose to team 1; team 3 would beat team 4 and lose to teams 1 and 2; and team 4 would lose to all other teams. The authors then compute the output ratings for each of the three methods and compare them to the input ratings.
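The perfect season can be written down in a few lines and fed to the Colley and Massey methods. The sketch below uses the standard textbook formulations of both methods, which may differ in detail from the authors' setup. It assumes that the margin of victory equals the gap between the input ratings; the function names are hypothetical.

```python
from fractions import Fraction

def perfect_season(n):
    """Round robin with no upsets, as (winner, loser, margin) tuples.

    Team 0 is the strongest. Assumption: the margin of victory equals
    the gap between the input ratings n, n-1, ..., 1.
    """
    return [(i, j, j - i) for i in range(n) for j in range(i + 1, n)]

def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def game_matrix(n, games):
    """Diagonal: games played per team; off-diagonal: minus games between a pair."""
    m = [[0] * n for _ in range(n)]
    for winner, loser, _ in games:
        m[winner][winner] += 1
        m[loser][loser] += 1
        m[winner][loser] -= 1
        m[loser][winner] -= 1
    return m

def colley(n, games):
    """Colley: (2I + M) r = b with b_i = 1 + (wins_i - losses_i) / 2."""
    c = game_matrix(n, games)
    b = [Fraction(1)] * n
    for i in range(n):
        c[i][i] += 2
    for winner, loser, _ in games:
        b[winner] += Fraction(1, 2)
        b[loser] -= Fraction(1, 2)
    return solve(c, b)

def massey(n, games):
    """Massey: M r = p, where p_i is team i's total point differential."""
    m = game_matrix(n, games)
    p = [0] * n
    for winner, loser, margin in games:
        p[winner] += margin
        p[loser] -= margin
    # M is singular, so the last equation is replaced by sum(r) = 0.
    m[-1] = [1] * n
    p[-1] = 0
    return solve(m, p)

games = perfect_season(4)
print([str(r) for r in colley(4, games)])  # ['3/4', '7/12', '5/12', '1/4']
print([str(r) for r in massey(4, games)])  # ['3/2', '1/2', '-1/2', '-3/2']
```

Under these assumptions, both methods place the teams in the input order, with a constant gap between neighbors: 1/6 for Colley and 1 for Massey.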

Applied to this ideal data, all three methods recover the input ranking. However, while the Colley and Massey methods produce uniformly spaced ratings, as would be desirable in a rating system, the Markov method produces non-uniformly spaced ratings.
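The uneven spacing of the Markov ratings can be reproduced with a small sketch. It is one common "losers vote for winners" construction, not necessarily the authors' exact variant. The assumptions are that each loss counts as one vote, that an undefeated team's empty row is replaced by a uniform one, and that the teleportation parameter `alpha` is 0.85; the function names are hypothetical.

```python
def perfect_season(n):
    """Round robin with no upsets, as (winner, loser) pairs; team 0 is strongest."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def markov_ratings(n, games, alpha=0.85, iterations=200):
    """Stationary vector of a 'losers vote for winners' Markov chain."""
    votes = [[0.0] * n for _ in range(n)]
    for winner, loser in games:
        votes[loser][winner] += 1.0  # assumption: each loss is one vote
    rows = []
    for row in votes:
        total = sum(row)
        # An undefeated team casts no votes; spread its row uniformly.
        rows.append([v / total for v in row] if total else [1.0 / n] * n)
    r = [1.0 / n] * n
    for _ in range(iterations):
        # One power-iteration step with PageRank-style teleportation.
        r = [alpha * sum(r[i] * rows[i][j] for i in range(n)) + (1 - alpha) / n
             for j in range(n)]
    return r

r = markov_ratings(4, perfect_season(4))
gaps = [r[i] - r[i + 1] for i in range(3)]
print([round(x, 3) for x in r])
print([round(g, 3) for g in gaps])
```

With these choices the ratings still decrease from team 0 to team 3, so the input ranking is recovered, but the gaps shrink toward the bottom of the list instead of staying constant.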

The authors then analyze the sensitivity of the methods to small perturbations and determine how much the ratings and rankings are affected by these changes. If, for instance, small changes in the input data cause large changes in the output ratings, the method is considered sensitive. Similar discrepancies between the input and output rankings indicate that the ranking method is unstable.

The authors conclude that while the Colley and Massey methods are insensitive to small changes, the Markov (or PageRank) method is highly sensitive to them, often producing anomalies in the rankings. For instance, because of this sensitivity, a single upset in a perfect season can rearrange the rankings of all teams. In the same cases, the Colley and Massey methods respond in isolation, changing the rankings of only the two teams involved.
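A toy experiment illustrates the difference in response. The sketch below flips one game in a four-team perfect season, so that the last-place team beats the first-place team, and lists which teams' ratings move under each method. It is an illustration under stated assumptions, not the authors' experiment: Colley is solved with plain Gauss-Seidel sweeps, the Markov chain uses one vote per loss with `alpha` set to 0.85, and all function names are hypothetical.

```python
def season(n, upset=None):
    """Round robin as (winner, loser) pairs; the lower index wins unless upset."""
    games = []
    for i in range(n):
        for j in range(i + 1, n):
            games.append((j, i) if upset == (j, i) else (i, j))
    return games

def colley(n, games, sweeps=200):
    """Colley ratings via Gauss-Seidel sweeps on (2 + t_i) r_i - sum r_opp = b_i."""
    b = [1.0] * n
    opponents = [[] for _ in range(n)]
    for winner, loser in games:
        b[winner] += 0.5
        b[loser] -= 0.5
        opponents[winner].append(loser)
        opponents[loser].append(winner)
    r = [0.5] * n
    for _ in range(sweeps):
        for i in range(n):
            r[i] = (b[i] + sum(r[j] for j in opponents[i])) / (2 + len(opponents[i]))
    return r

def markov(n, games, alpha=0.85, iterations=200):
    """'Losers vote for winners' chain with uniform rows for undefeated teams."""
    votes = [[0.0] * n for _ in range(n)]
    for winner, loser in games:
        votes[loser][winner] += 1.0
    rows = [[v / sum(row) for v in row] if sum(row) else [1.0 / n] * n
            for row in votes]
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [alpha * sum(r[i] * rows[i][j] for i in range(n)) + (1 - alpha) / n
             for j in range(n)]
    return r

def moved(before, after, tol=1e-9):
    """Indices of teams whose rating changed by more than tol."""
    return [i for i in range(len(before)) if abs(before[i] - after[i]) > tol]

base, upset = season(4), season(4, upset=(3, 0))  # team 3 beats team 0
print(moved(colley(4, base), colley(4, upset)))  # [0, 3]
print(moved(markov(4, base), markov(4, upset)))  # [0, 1, 2, 3]
```

In this toy case only the two teams in the upset game change their Colley ratings, while every Markov rating shifts and team 3 moves ahead of teams 1 and 2, which played no part in the upset.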

In addition, the sensitivity of the PageRank or Markov method becomes more pronounced further down in the rankings. "The PageRank vector is quite sensitive to small changes in the input data. Further, this sensitivity increases as the rank position increases," Langville explains. "In other words, values in the tail (low-ranked positions) of the PageRank vector are extremely sensitive, which calls into question PageRank's use to produce a full ranking, as opposed to a simply top-k ranking. It also partially explains PageRank's susceptibility to spam. On the other hand, the Colley and Massey methods are stable throughout the entire ranking."

PageRank, once used exclusively for web pages, is now applied to rank a variety of entities, from species to social networks, reinforcing the ubiquity of these ranking systems.

But the stability displayed by the Colley and Massey methods in this study suggests that these two methods, though originally conceived for sports rankings, could also be effective in ranking other entities, such as movies.

"As future work, we are exploring the use of the Colley and Massey methods in other settings beyond sports. For example, we have found that these two methods are more appropriate than PageRank for ranking in social networks such as Twitter," says Langville.

While ranking methods can be applied to a wide range of areas, modifications are often required to adapt a particular method to a specific application, making analyses of sensitivity and stability all the more important.
