Democrats and Republicans in recent years haven't seemed able to agree on the time of day, but there is one assertion on which they've found common ground: Polling and data analytics took a spectacular face-plant in the 2016 election.
On Election Day, nearly every public polling firm predicted that Hillary Clinton would win the presidency. The only real debate was by how large a margin. Even leading statistical analysis site FiveThirtyEight.com gave Donald Trump a less than 1 in 3 chance of winning. So when he surged to victory with 306 Electoral College votes, stunned political pundits blamed pollsters and forecasters, proclaiming "the death of data."
But statistician Nate Silver, the founder and editor in chief of FiveThirtyEight.com, says it wasn't data analytics that failed, but the major media outlets that didn't properly understand probability and instead leaned on shopworn conventional wisdom. Silver helped popularize the application of statistical analysis in baseball and then in politics, particularly with Barack Obama's 2008 election as president.
Silver will be a featured speaker at Harvard's second annual "Political Analytics Conference" on Friday (3/31), an event organized by Ryan Enos, associate professor in Harvard's government department, and Kirk Goldsberry, a visiting scholar at Harvard's Center for Geographic Analysis.
Silver spoke with The Gazette about what analysts got wrong—and right—in the 2016 election, and how careful observers could have seen Trump's victory coming.
GAZETTE: At last year's conference here, you were still skeptical of Trump's viability as the Republican Party nominee, which was fairly late. On election night, your site had Hillary's chances at 71 percent; almost everyone else had her up by even more. Why do you think Trump's victory blindsided so many?
SILVER: I think people shouldn't have been so surprised. Clinton was the favorite, but the polls showed, in our view, particularly at the end, a highly competitive race in the Electoral College. We had him with a 30 percent chance, and that's a pretty likely occurrence. Why did people think it was much less than that? I think there are a few things. One is that I don't think people have a good intuitive sense for how to translate polls to probabilities. In theory, that's the benefit of a model. But I think people thought "Well, Clinton's ahead in most of the polls in most states, and I remember that seems similar to Obama four years ago, and therefore I'm very confident that she'll win." It's ad hoc and not really very rigorous, that thought process.
The second part is that there is a certain amount of groupthink. People looking at the polls are mostly in newsrooms in Washington and Boston and New York. These are liberal cities, and so people tend to see evidence (in our view, it was kind of conflicting polling data) as pointing toward a certain thing. People have trouble taking different information about, for example, signs of decline in African-American turnout and reconciling that against supposedly good numbers among Hispanic turnout for Clinton. People weren't using the more thoughtful sides of their brains; they were using the more emotional sides of their brains.
One thing I think is a myth is the notion of, "Oh, polls got the election wrong." I don't think that's true at all. The polls pointed toward a competitive race. The national polls in particular were very close to where the race ended up. So I think it more reflects blind spots in people's thinking.
GAZETTE: What drove the seeming volatility in your model this cycle as compared with 2008 and 2012? Did late-breaking news events have a more dramatic swing effect than they had in the past, or were the high unfavorability ratings of both candidates a more complicating factor than anticipated?
SILVER: Yes. This race was not especially volatile. It was about as volatile as presidential elections have been on average since 1972. The problem is that people assumed that 2012 was the new normal. People have short memories, and they don't remember 1992, when Ross Perot got into and out of the race, and it would swing by 10 points at a time. They don't remember 2000, when there were a lot of surges back and forth between Al Gore and George Bush, including one right at the end that wasn't captured by polls. (In fact, Gore was not supposed to have won the popular vote that year, which he did.) So I think the problem is that people are just referencing 2012, which was a very unusually stable race, and, to a lesser extent, 2008 and 2004.
So the fact that Clinton would go from a 60 percent chance to a 70 percent chance to an 80 percent chance, and back and forth within that range, those aren't huge swings. But I think people were conditioned to have false ideas about how stable a presidential race is.
You should expect people to change their opinions as major, major news events unfold. Every week of the campaign was filled with dramatic events. It's not like any of the big polling shifts were coming out of nowhere. They were all triggered by big news events … that affected voters' decisions, including the last 10 days with FBI Director James Comey's letter. People reacted to that in ways you might expect them to. It wasn't like there were a lot of stories that people expected to be good for Clinton and they were good for Trump, or vice versa. You could see a lot of this coming. To have Clinton go from a 7-point lead to a 3-point lead over Trump only requires 1 in every 50 Americans to change their minds. In some ways, it's amazing that the polls aren't more volatile than that.
The fact that there were a lot of undecided voters was a tip off, and it's related to the fact that voters really didn't like either candidate. So you have this big chunk of the population, 15 or 20 percent, that were like, "Well, I'm not sure what to do. I've never really been in a situation where I dislike both candidates so much." And remember, they have four options: They can vote for Clinton, they can vote for Trump, they can vote for a third-party candidate, or they can not vote. It wasn't just people switching between Clinton and Trump—I think there were actually not that many of those people. It's the fact that you can go from leaning one way to undecided; you can go from undecided to out of the electorate; you can go from being a Gary Johnson or a Jill Stein voter to one of the major party candidates. Historically, when you have more undecided and when you have more third-party voters, then things do swing a little bit more and people aren't that locked in.
I go back to the fact that in the final national polling averages, it was something like Clinton 46 percent, Trump 43 percent. If you leave aside the fact that Clinton won the popular vote, when you see numbers in the mid-40s, you shouldn't assume that anybody's in a particularly safe position. In the Obama-Mitt Romney race, it was tight, but it was also Obama 49, Romney 46, or Obama 50, Romney 47 in a lot of those polls. So there was very little wiggle room. In this one, there were huge numbers of people who were giving every indication that they weren't happy with their choices and were taking their time to make up their minds. And that got lost, even though it really stood out in the data, the huge number of undecided voters.
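The arithmetic behind two of the figures in this answer can be checked with a short sketch. It is an illustration only, not FiveThirtyEight's model. It assumes that every voter who changes their mind moves directly from the leader to the trailer, and the 48-41 starting split is a hypothetical way of getting a 7-point lead; the interview gives only the size of the leads and the final 46-43 and 49-46 averages. The function names are invented for this example.

```python
def margin_after_switch(leader_pct, trailer_pct, switch_pct):
    """Leader's margin (in points) after switch_pct points of the
    electorate move directly from the leader to the trailer."""
    # Each switching voter costs the leader one vote and hands the
    # trailer one vote, so the margin moves by twice the switched share.
    return (leader_pct - switch_pct) - (trailer_pct + switch_pct)


def uncommitted_share(leader_pct, trailer_pct):
    """Share of respondents (in points) backing neither major candidate:
    undecided voters plus third-party supporters."""
    return 100 - leader_pct - trailer_pct


# Hypothetical 48-41 split (a 7-point lead); 1 in 50 voters is 2 points.
print(margin_after_switch(48, 41, 2))

# Uncommitted share at the final 2016 average vs. a typical 2012 poll.
print(uncommitted_share(46, 43))
print(uncommitted_share(49, 46))
```

Under these assumptions, a 2-point switch turns a 7-point lead into a 3-point lead, and the 46-43 average leaves 11 percent of respondents uncommitted, compared with 5 percent at 49-46.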
GAZETTE: So you don't buy into this idea that polling was a massive failure in 2016?
SILVER: Not only am I not on that bandwagon, I think it's pretty irresponsible when people in the mainstream media perpetuate that narrative. Everyone gets elections wrong. We think our take on the primary wasn't very good, whereas we think our model did a good job in the general election. We try and step back afterward and do a self-assessment and say, "Here's an outcome that occurred that was anywhere from 'somewhat unlikely' to 'very unlikely.' Is it because we looked at the world in the wrong way, or built a model in the wrong way, or did our reporting in the wrong way?" And big news organizations like The New York Times, for example, didn't do that. If you go back and read The Times, they say point-blank, basically time and time again, "This is a sure thing for Hillary Clinton." They don't attach a percentage to it, but that's extremely clear from their reporting. And then, the day after the election, they blame the polls and their data site. That's pretty irresponsible.
The Electoral College is something that a lot of people got very wrong. The assertion from the mainstream media was that the Electoral College was an advantage to Clinton, and of course it was a huge disadvantage to her. In fact, there's never been a candidate since Samuel Tilden who's been so disadvantaged by the Electoral College. We thought that was clear from the data, but the people who weren't looking at the data were instead taking narratives they had about Clinton's more diverse coalition and ascendant America and everything else and then interpreting data through that biased lens.
GAZETTE: When you saw a substantially higher GOP primary turnout in disparate blue states like Massachusetts and Michigan, for example, did that set off any alarm bells, or should it have in hindsight?
SILVER: Any time you have a general election and you don't have an incumbent and you have kind of an average economy, it wasn't ever that farfetched that Trump would win. The primary, at least for us, was a more remarkable occurrence because that did defy a lot of precedent from what Republican voters had done in the past and suggested that our conceptions of how the Republican Party behaved and how primaries worked were quite wrong. I don't think anything about the general election should have been that shocking to people. The shocking event was Trump winning the Republican primaries by a pretty healthy margin in the end, when we thought it was the party of Reagan and Bush, and he really ran against that message in a lot of ways. And the idea that the Republican establishment gets its way, it's how it always is in the Republican Party. Obviously, the opposite happened.
Ironically, the polls in the primaries were actually pretty good on Donald Trump. They correctly showed him leading and were correct in most states. They went high on Trump in a couple of states, like Iowa, but for the most part they did a pretty good job on the Republican primaries. So that was another case where people, including us, were ignoring data, or at least one type of data. Obviously, Trump himself is not a very data-driven president, but the notion that this was some big failure of data doesn't really match with the evidence. It's a giant, enormous, gaping failing for conventional wisdom. But people are often afraid to admit that their perspective on the world is sometimes wrong.
GAZETTE: Have you made any adjustments to your methodology or assumptions?
SILVER: We think our general election model was really good. It said there was a pretty good chance of Trump winning, and it correctly captured that the Electoral College was a big vulnerability for Clinton. So our viewpoint is that, in terms of the general election, if everyone says "Trump has no chance" and you use modeling to say "Hey, look at this more rigorously; he actually has a pretty good chance. Not 50 percent, but 30 percent is pretty good." To me, that's a highly successful application of modeling. That doesn't mean you can't go back and look at things. We provided a lot of insight to our readers to not be complacent about the general election outcome, whereas I think that wasn't true for a lot of places.
In terms of the primaries, we didn't actually have a model, per se. I think that was part of the problem. We weren't being that rigorous about it; we were kind of winging it ourselves. So in some ways I think the mistakes we made in the primaries paralleled the mistakes that other people made in the general election. I don't think either of those are modeling mistakes. I think they're mistakes of looking at things without enough discipline and rigor and getting attached to a narrative and not adjusting enough in the face of evidence. If you had said in June of 2015, when Trump first went down the escalator in Trump Tower, if you had said then, "I really don't expect he'll win the nomination," I can't really blame you for that. It was really early. But the fact that we and a lot of other people were still saying that in November of 2015, when he had survived a lot of trials and tribulations and he had been the central topic of conversation for several months, that I think was a sign that people should have adjusted a little bit quicker.
We think other people should look at our general election model and emulate it [laughs]. We eat a lot of crow when we think we're wrong about something; we did in the primaries, unambiguously. If we're telling people that Trump winning is a lot more likely than almost anyone gave him credit for, and he wins, then that's a case where you should be happy with that outcome. I know the politics of it aren't that way. Politically, it's like if anything less than 50 percent happens, you're wrong. But we think intellectually the story's pretty clear.
GAZETTE: Did data analytics oversell itself, and did this election do permanent reputational damage to the field?
SILVER: First of all, if you had a forecast that Trump had a 1 percent chance of winning, your reputation should be damaged by that. And if you're a newspaper like The New York Times that says that Clinton was a sure thing, then your reputation should be damaged by that. People in this day and age want information that confirms their prior beliefs and that they can take comfort from. If they came to FiveThirtyEight and were reading not just our forecasts but our coverage saying "this election is competitive" and they want to blame us later on, that's their own fault. I'm not sympathetic to it. But we also get reactions from a lot of people saying "I knew because of reading your site that Trump had a chance, and it wasn't a sure thing."
People are funny about how they view probability. I sympathize with the average reader and what the average person encounters when they see a 30 percent number. But I have no sympathy for journalists who don't get probability. It's unacceptable that there are a lot of illiterate statements made about probability by journalists. By the way, if you look at public opinion, people weren't actually all that confident in Clinton's chances. It was the media who were very confident in Clinton's chances. So it's a failure of conventional wisdom, first and foremost.
I think there are people who feel threatened by the more data-driven types covering campaigns because it undermines their authority, because we report in a transparent way, where we're clear about our assumptions and we present evidence for our claims and we don't just take things at face value.
Another reporting failure was one where people were talking about internal polls and campaign polls from Clinton that had her way ahead. If you had followed that strain of reporting, you were even more off on the result than if you had followed the public polls. If people want to engage in the politics of blame, they can. But if these news outlets are talking about the importance of truth in journalism, then they should go back and look at what we actually wrote about the campaign and what they actually wrote and be more honest with themselves about where the failures were.