Alarming error common in survey analyses

July 23, 2018, American Statistical Association

It is difficult to overstate the importance of survey data: They tell us who we are and—in the hands of policymakers—what to do.

It had long been apparent to Brady West, an expert on survey methodology at the University of Michigan, Ann Arbor, that the benefits of survey data coexisted with a lack of training in how to interpret them correctly, especially when it came to secondary analyses—researchers reanalyzing survey data that had been collected by a previous study.

"In my consulting work for organizations and businesses, people would come in and say, 'Well, here's my estimate of how often something occurs in a population,' such as the rate of a disease or the preferences for a political party. And they'd want to know how to interpret that. I would respond, 'Have you accounted for weighting in the survey data you're using—or, did you account for the sample design?' And I would say, probably 90 percent of the time, they'd look at me and have no idea what I was talking about. They had never learned about the fundamental principles of working with survey data in their standard Intro to Stats classes."

As a survey methodologist, West wondered whether his experience was indicative of a systemic problem. There wasn't much in the academic literature to answer the question, so he and his colleagues, Joseph Sakshaug and Guy Aurelien, sampled 250 papers, reports and presentations—all available online, all conducting secondary analyses of survey data—to see if these analytic errors were, indeed, common.

"It was quite shocking," says West. "Only about half of these analyses claimed to account for weighting; the impact of sample designs on variance estimates was widely misunderstood; and there was no sign of improvement in these problems over time." But possibly worst of all, these problems were just as prevalent in the peer-reviewed literature in their sample as they were in technical reports and conference presentations. "That's what was really most shocking to me," says West. "The peer-review process was not catching these errors."
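The variance point can be made concrete with a number. Respondents drawn from the same cluster (a school, a neighborhood) tend to resemble one another, so naive standard errors that assume simple random sampling come out too small. A minimal sketch using the standard design-effect approximation—the sample size, cluster size, and correlation below are illustrative assumptions, not figures from the study:

```python
import math

# Illustrative values only (not from West's study):
n = 2_000    # total respondents
m = 20       # respondents per cluster
rho = 0.05   # modest intracluster correlation
p = 0.30     # estimated proportion

# Standard error assuming simple random sampling.
naive_se = math.sqrt(p * (1 - p) / n)

# Design effect for cluster sampling: deff = 1 + (m - 1) * rho.
deff = 1 + (m - 1) * rho  # = 1.95 here

# Design-based standard error inflates the naive one by sqrt(deff).
design_se = naive_se * math.sqrt(deff)

print(f"naive SE:  {naive_se:.4f}")   # 0.0102
print(f"design SE: {design_se:.4f}")  # 0.0143, ~40% wider intervals
```

With these assumed values, ignoring the clustering makes confidence intervals roughly 40 percent too narrow—exactly the kind of misunderstood design impact the study flagged.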

An alarming example of what can happen when you compute an estimate but ignore the survey weighting can be found in the 2010 National Survey of College Graduates (NSCG). "This is a large national survey of college graduates, and they literally say in their documentation that they're oversampling individuals with science and engineering degrees," says West. "If you take account of the weighting, which corrects for this oversampling, about 30 percent of people are getting science and engineering degrees; if you forget about the weighting, you extrapolate the oversample to the entire population, and suddenly 55 percent of people have science and engineering degrees."

Ironically, better sampling of under-studied populations may be exacerbating the problem. "There's a lot of interest in under-represented populations, such as Hispanics," says West. "So, a lot of national surveys oversample these groups and others to create a big enough sample for researchers to adequately study. But when Average Joe Researcher grabs all the data—not just the data from the subpopulation they're interested in, but everybody, whites, African Americans, and Hispanics—and then they try to analyze all that data collectively, that's when oversampling can have a horrible effect on the overall picture if that feature of the sample design is not accounted for correctly in estimation."

Many easy-to-use software tools can account for the sampling and weighting complexities of survey data; the fact that they are not being used speaks to the underlying problem.

"This problem originates in the fact that the people publishing these articles just aren't told about any of this in their training," says West. "We've known about the importance of survey weighting for nearly a century—but somehow how to deal with weighted survey data hasn't penetrated the statistics classes that researchers take at the undergraduate or graduate level. We spend a fortune on doing national surveys—and who knows how much misinterpreting that data is costing us."

To solve that problem, West is helping design a MOOC (massive open online course) at the University of Michigan introducing statistics with Python. Weighting and correct survey analyses will be taught in the very first course of that specialization. "We're really focusing on making sure that before you jump into any analyses of data, you have a really firm understanding of how the data were collected and where they came from."

More information: JSM talk: http://ww2.amstat.org/meetings/jsm/2018/onlineprogram/AbstractDetails.cfm?abstractid=326973

Study link: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158120

2 comments

Cusco
Jul 23, 2018
Having conducted telephone surveys I can confidently say that the 'average' survey respondent is anything but an 'average' person. Many if not most telephone surveys, and almost all online ones, are garbage.
rrwillsj
Jul 24, 2018
Oh cusco, you missed an important insight. What you were actually being paid to produce were the results needed to meet the expectations of the client. And nothing more.

Yeah, I know, the Hangman Paradox masquerading as a self-fulfilling prophecy.

After all, right or wrong, the customer is always right!
