Real March Madness is relying on seedings to determine Final 4
Think picking all the top-seeded teams as the Final Four in your March Madness bracket is your best bet for winning the office pool? Think again.
According to an operations research analysis model developed by Sheldon H. Jacobson, a professor of computer science and the director of the simulation and optimization laboratory at the University of Illinois, you're better off picking a combination of two top-seeded teams, a No. 2 seed and a No. 3 seed.
"There are patterns that exist in the seeds," Jacobson says. "As much as we like to believe otherwise, the fact of the matter is that we've uncovered a model that captures this pattern. As a result of that, in spite of what we emotionally feel about teams or who's going to win, the reality is that the numbers trump all of these things," Jacobson said. "It's more likely to be 1, 1, 2, 3 in the Final Four than four No. 1's."
Jacobson's model is unique in that it bases its predictions not on who the teams are, but on the seeds they hold. He describes his model in a forthcoming paper in the journal Omega with co-authors Alex Nikolaev, of the University at Buffalo; Adrian Lee, of CITERI (Central Illinois Technology and Education Research Institute); and Douglas King, a graduate student at Illinois.
Jacobson has also integrated the model into a user-friendly website to help March Madness fans determine the relative probability of their chosen team combinations appearing in the final rounds of the NCAA men's basketball tournament.
A number of websites offer assistance to budding bracketologists, such as game-by-game probabilities of certain match-ups or determining the spread on a given team reaching a particular point in the tournament. Jacobson's website is the only one to look at collective groups of seeds within the brackets.
"What we do is use the power of analytics to uncover trends in 'bracketology.' It really is a mathematical science," he said. "What our model enables us to do is look at the likelihood or probability that a certain set of seed combinations will occur as we advance deeper into the tournament."
Jacobson's team applied a statistical method called goodness-of-fit testing to NCAA tournament data from 1985 to 2010, identifying patterns in seed distribution in the Elite Eight, Final Four and national championship rounds. They found that the seeds themselves exhibit certain statistical patterns, independent of the team. They then fit the pattern to a stochastic model they can use to assess probabilities and odds.
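The goodness-of-fit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which distribution or test statistic the team used, so the truncated geometric form, the parameter 0.4, and the observed counts below are all hypothetical choices made for the example and are not tournament data.

```python
def truncated_geometric(p: float, n_seeds: int) -> list[float]:
    """Probability of each seed 1..n_seeds under a geometric law cut off at n_seeds.

    ASSUMPTION: this candidate distribution is illustrative only.
    """
    weights = [p * (1.0 - p) ** (k - 1) for k in range(1, n_seeds + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def chi_square_statistic(observed: list[int], expected_probs: list[float]) -> float:
    """Pearson statistic: sum of (observed - expected)^2 / expected over categories."""
    n = sum(observed)
    return sum(
        (obs - n * prob) ** 2 / (n * prob)
        for obs, prob in zip(observed, expected_probs)
    )


# HYPOTHETICAL counts of how often seeds 1..4 appeared in some round.
# These numbers are made up for illustration; they are not NCAA data.
made_up_counts = [40, 30, 20, 10]

candidate = truncated_geometric(p=0.4, n_seeds=4)
statistic = chi_square_statistic(made_up_counts, candidate)
print(f"chi-square statistic: {statistic:.3f}")
```

A small statistic means the observed counts sit close to what the candidate distribution predicts. In a real test the statistic would be compared with a chi-square critical value for the appropriate degrees of freedom before accepting or rejecting the fit.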
Two computer science undergraduates, Ammar Rizwan and Emon Dai, built the website bracketodds.cs.illinois.edu based on Jacobson's model. The publicly accessible website will be up through the entire tournament. Users can evaluate their brackets and can also compare the relative likelihood of two sets of seed combinations.
"For each of the rounds that we have available, you could put in what you have so far and even compare it to other possible sets," Rizwan said.
For example, the probability of the Final Four comprising the four top-seeded teams is 0.026, or about once every 39 years. Meanwhile, the probability of a Final Four of all No. 16 seeds, the lowest-seeded teams in the tournament, is so small that it would be expected to happen once every eight hundred trillion years. (The Milky Way contains an estimated one hundred billion stars.)
"Basically, if every star was given a year, the years it would take for this to occur is 8,000 times all the stars in the galaxy," Jacobson said. "It gives you perspective."
However, sets with long odds do happen. The most unlikely combination in the 26 years studied occurred in 2000, with a Final Four seed combination of 1, 5, 8 and 8. But such a bracket is only predicted to happen once every 32,000 years, so those filling out brackets at home shouldn't hope for a repeat.
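The "once every N years" figures follow from taking the reciprocal of a per-tournament probability, since the tournament is held once a year. Here is a minimal sketch of that arithmetic in Python. The function and variable names are illustrative and are not taken from Jacobson's website, and the published 0.026 is presumably a rounded value, which would explain why its reciprocal lands slightly below the 39 years quoted.

```python
def years_between(probability: float) -> float:
    """Expected number of yearly tournaments per occurrence (1 / probability)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return 1.0 / probability


# Four No. 1 seeds in the Final Four, using the rounded probability from the article.
print(round(years_between(0.026), 1))  # 38.5

# Star comparison: 800 trillion years against an estimated 100 billion stars.
years_all_16_seeds = 800 * 10**12
stars_in_milky_way = 100 * 10**9
stars_multiple = years_all_16_seeds // stars_in_milky_way
print(stars_multiple)  # 8000
```

The second calculation reproduces the "8,000 times all the stars in the galaxy" comparison from the figures quoted above.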
What amateur bracketologists can be confident of is upsets. For even the most probable Final Four combination of 1, 1, 2, 3 to occur, two top-seeded schools have to lose.
"In fact, upsets occur with great frequency and great predictability. If you look statistically, there's a certain number of upsets that occur in each round. We just don't know which team they're going to be or when they're going to occur," Jacobson said.
After the 2011 tournament, and in years to come, Jacobson will integrate the new data into the model to continually refine its prediction power. For 2012, Jacobson, Rizwan and Dai hope to integrate a comparative probability feature into the website to allow users to calculate, for example, the probability of a particular set of Final Four seeds if the Elite Eight seeds are given.
Until then, users can find out how likely their picks really are, and compare them against friends' picks or even sports commentators'.
"We're not here specifically to say 'Syracuse is going to beat Kentucky in the Elite Eight.' What we're saying is that the seed numbers have patterns," Jacobson said. "A 1, 1, 2, 3 is the most likely Final Four. I don't know which two 1's, I don't know which No. 2 and I don't know which No. 3. But I can tell you that if you want to go purely with the odds, choose a Final Four with seeds 1, 1, 2, 3."