A statistician intent on sharing research to promote better science

Dec 18, 2013

For centuries, researchers in fields as disparate as astrophysics and political science have faced the same hurdle before they could win acceptance for their theories—their peers must replicate and verify their results.

"It's the only way we have to decide whether or not we are getting closer to the truth," says Victoria Stodden, an assistant professor of statistics. "Otherwise, how do we know if something is right?"

Yet most papers, whether on a new solar system or the impact of adding more cops on the streets of St. Louis, are published without the data sets and computer code used to generate the reported results.

Stodden, who arrived at Columbia in 2010 after earning a Ph.D. in statistics and a law degree at Stanford University, is at the forefront of a movement to convince journals, academics and policy makers alike to embrace a new era of data sharing. She has published widely on the subject, testified about it before Congress and is a primary collaborator behind ResearchCompendia.org, an online repository that academics use to create companion websites for their papers, allowing open access to code and data.

There's a lot at stake. Sharing code and data, Stodden argues, would not only dramatically speed up the pace at which academics could verify each other's work, but might produce new revelations and theories altogether.

"If we're not sharing code and data," Stodden says, "there's a lot of duplication. If I could get my hands on your data set and maybe combine it with some data I have, I can open up a whole new set of questions."

As an undergraduate at the University of Ottawa, Stodden was fascinated by policy issues and planned to pursue a doctorate in economics. But she soon realized that the questions that really fascinated her depended on the quality of the data she could get her hands on. If two states, for instance, have different welfare-to-work policies, reviewing the data would allow you to evaluate which one worked better.

So, instead of applying to grad school in economics, Stodden pursued statistics at Stanford, hoping to get the best possible "tool kit" to work with data. Fortuitously, her adviser required students to publish code and data with their academic papers, a policy that provided maximum transparency and let readers delve more deeply into subjects.

Stodden went to law school as a path to policy research. But she quickly began to focus on the legal issues that stand in the way of academics interested in publishing data and code with their papers. She's been working to make such an approach standard practice ever since, and a number of academic institutions and government agencies are, too.

In 2011, the National Science Foundation began requiring researchers to include a "data management plan" with grant applications. More recently, the agency began requiring some applicants to describe how they plan to make software available.

The National Institutes of Health has unveiled similar policies. And in February, the White House instructed federal funding agencies to develop plans to ensure public access to the results of federally sponsored research, including data and publications.

Stodden has been working to facilitate and understand the impact of these policy changes and others like them. One of her first papers analyzed the legal barriers to sharing code and data, such as patent, copyright and intellectual property regulations. She came up with an approach that she called the "Reproducible Research Standard," which laid out practices that might help overcome the barriers. Among them: an automatic university approval process for reusing research data and software.

"It should be the case that if you make a really useful algorithm that other people can use in their research, that should accrue to your stature as a researcher," she says.

This past summer, Stodden published a paper in the journal PLOS ONE comparing data and code disclosure requirements at 170 academic journals and demonstrating that the norms are rapidly shifting.

Thirty journals made a data policy change between 2011 and 2012, 12 made changes in their software policies, and 36 made changes in their supplementary data policy.

A second part of Stodden's research involves analyzing rigorous statistical methods for reproducibility, and empirical modeling to understand which policies are most effective at promoting verification efforts. Among other questions Stodden plans to address are: How hard is it to get code from an author if the journal's policy is to make it available upon request? And are the data and code provided under these policies useful in reproducing the results?

"If we're not sharing code and data, we're limiting avenues of inquiry," she says. "But this has to happen at the grassroots level. There has to be cooperation from researchers or it won't work."

