June 16, 2015

Statistics education, evidence-based data analysis practices needed to fight reproducibility crisis in science

by American Statistical Association

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

Perhaps the most infamous recent example is a 2006 Duke University cancer research project in which researchers published a paper claiming they had built an algorithm using genomic microarray data that predicted which cancer patients would respond to chemotherapy. A subsequent attempt to reproduce the results found a morass of poorly conducted data analyses, with errors ranging from trivial and strange to devastating. The original study was retracted by Nature Medicine in 2011.

"The common thread between each of these public failings was the poor or questionable quality of the original analysis. The errors that were made showed a lack of judgement, training and quality control," wrote Peng.

Peng said that to improve the quality of data analysis in science, stakeholders need to go beyond the call for reproducibility. They must increase the number of trained data analysts in the scientific community and identify statistical software and tools proven to improve study reproducibility and replicability. Such software and tools must be moderately robust to user error, noted Peng.

"If we could prevent problematic data analyses from being conducted, we could substantially reduce the burden on the [peer review] community of having to evaluate an increasingly heterogeneous and complex population of studies and research findings," asserted Peng.

Unfortunately, most scientists receive only basic to moderate training in data analysis, which can produce individuals with enough skill to perform a data analysis but not enough knowledge to prevent data mistakes.

To improve the global robustness of scientific data analysis, we must take a two-pronged approach and couple massive-scale education efforts with the identification of data-analytic strategies that are reproducible and replicable in the hands of basic or intermediate data analysts, explained Peng.

Peng said a fundamental component of scaling up data science education is performing empirical studies to identify statistical methods, analysis plans and software that lead to increased replicability and reproducibility by scientists.

"We call this approach 'evidence-based data analysis,'" said Peng. "Just as evidence-based medicine applies the scientific method to the practice of medicine, evidence-based data analysis applies the scientific method to the practice of data analysis. Combining massive-scale education with evidence-based data analysis can allow us to quickly test data-analytic practices in a population most at risk for data analytics mistakes."

Journal information: Significance, Nature Medicine

Provided by American Statistical Association

Citation: Statistics education, evidence-based data analysis practices needed to fight reproducibility crisis in science (2015, June 16) retrieved 17 July 2024 from https://phys.org/news/2015-06-statistics-evidence-based-analysis-crisis-science.html

