April 1, 2019

Is it the end of 'statistical significance'? The battle to make science more uncertain

The scientific world is abuzz following recommendations by two of the most prestigious scholarly journals – The American Statistician and Nature – that the term "statistical significance" be retired.

In their introduction to the special issue of The American Statistician on the topic, the journal's editors urge "moving to a world beyond 'p<0.05,'" the famous 5 percent threshold for determining whether a study's result is statistically significant. If a study passes this test, it means that a result at least as extreme as the one observed would occur less than 5 percent of the time if chance alone were at work. This has often been understood to mean that the study is worth paying attention to.

The journal's basic message – but not necessarily the consensus of the 43 articles in this issue, one of which I contributed – was that scientists first and foremost should "embrace uncertainty" and "be thoughtful, open and modest."

While these are fine qualities, I believe that scientists must not let them obscure the precision and rigor that science demands. Uncertainty is inherent in data. If scientists further weaken the already very weak threshold of 0.05, then that would inevitably make scientific findings more difficult to interpret and less likely to be trusted.

Piling difficulty on top of difficulty

In the traditional practice of science, a scientist generates a hypothesis and designs experiments to collect data in support of that hypothesis. He or she then collects the data and performs statistical analyses to determine whether the data do in fact support the hypothesis.

One standard statistical summary is the p-value. This is a number between 0 and 1 that is read as indicating strong, marginal or weak support of a hypothesis, with smaller values counting as stronger evidence of a real effect.
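To make the idea concrete, here is a minimal sketch of how a p-value can be computed. It assumes a test statistic z that follows a standard normal distribution when there is no real effect; the function name and the example z values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability, assuming no real effect, of a z statistic at least this extreme."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical z statistics, used only to show how the p-value shrinks
# as the observed result moves further from what chance alone predicts.
for z in (1.0, 1.96, 3.0):
    print(f"z = {z:.2f} -> p = {two_sided_p_value(z):.4f}")
```

Under these assumptions, a z statistic of about 1.96 sits right at the 0.05 threshold, which is why that value appears so often in textbooks.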

But I worry that abandoning evidence-driven standards for these judgments will make it even more difficult to design experiments, much less assess their outcomes. For instance, how could one even determine an appropriate sample size without a targeted level of precision? And how are research results to be interpreted?
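The link between a significance threshold and sample size can be sketched with the usual normal-approximation formula for comparing two group means. This is an illustration under stated assumptions, not a method taken from either journal: the standardized effect size of 0.5, the 80 percent power target and the function name are all hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided, two-group comparison.

    Uses n = 2 * ((z_alpha + z_power) / effect_size)^2, where effect_size is the
    difference in means divided by the common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # cutoff implied by the threshold
    z_power = NormalDist().inv_cdf(power)              # margin needed to reach the power target
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical planning scenario: the same effect under two different thresholds.
for alpha in (0.05, 0.005):
    n = sample_size_per_group(effect_size=0.5, alpha=alpha)
    print(f"alpha = {alpha}: about {n} participants per group")
```

With these inputs the formula asks for roughly 63 participants per group at a threshold of 0.05 and roughly 107 at 0.005. Without any threshold, the calculation has no starting point at all.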

These are important questions, not just for researchers at funding or regulatory agencies, but for anyone whose daily life is influenced by statistical judgments. That includes anyone who takes medicine or undergoes surgery, drives or rides in vehicles, is invested in the stock market, has life insurance or depends on accurate weather forecasts… and the list goes on. Similarly, many regulatory agencies rely on statistics to make decisions every day.

Scientists must have the language to indicate that a study, or group of studies, provided significant evidence in favor of a relationship or an effect. Statistical significance is the term that serves this purpose.

The groups behind this movement

Hostility to the term "statistical significance" arises from two groups.

The first is largely made up of scientists disappointed when their studies produce p=0.06. In other words, those whose studies just don't make the cut. These are largely scientists who find the 0.05 standard too high a hurdle for getting published in the scholarly journals that are a major source of academic knowledge – as well as tenure and promotion.

The second group is concerned over the failure to replicate scientific studies, and they blame significance testing in part for this failure.

For example, a group of scientists recently repeated 100 published psychology experiments. Ninety-seven of the 100 original studies reported a statistically significant finding (p<0.05), but far fewer of the repeated experiments achieved a significant result.

The failure of so many studies to replicate can be partially blamed on publication bias, which results when only significant findings are published. Publication bias causes scientists to overestimate the magnitude of an effect, such as the relationship between two variables, making replication less likely.
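A small simulation shows how this overestimate can arise. It is a sketch under invented assumptions: the true effect of 0.2, the standard error of 0.15, the number of studies and the function name are hypothetical, and every simulated study is treated as an unbiased but noisy measurement of the same effect.

```python
import random
from statistics import NormalDist, mean


def simulate_published_effects(true_effect, std_error, n_studies, alpha=0.05, seed=0):
    """Simulate noisy study estimates and keep only the statistically significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Each study's estimate is the true effect plus normally distributed sampling error.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # "Publication" here means passing the two-sided significance test.
    published = [e for e in estimates if abs(e) / std_error > z_crit]
    return estimates, published


all_estimates, published = simulate_published_effects(
    true_effect=0.2, std_error=0.15, n_studies=10_000
)
print(f"mean of all estimates:       {mean(all_estimates):.3f}")
print(f"mean of published estimates: {mean(published):.3f}")
print(f"share of studies published:  {len(published) / len(all_estimates):.1%}")
```

In this setup the average across all studies stays close to the true effect, while the average across the "published" subset is noticeably larger. A replication that measures the true effect is therefore likely to fall short of the published figure.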

Complicating the situation even further is the fact that recent research shows that a p-value just under the 0.05 cutoff doesn't provide much evidence that a real relationship has been found. In fact, in replication studies in social sciences, it now appears that p-values close to the standard threshold of 0.05 probably mean that a scientific claim is wrong. It's only when the p-value is much smaller, maybe less than 0.005, that scientific claims are likely to show a real relationship.
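One way to get a feel for the gap between 0.05 and 0.005 is a bound due to Sellke, Bayarri and Berger, which caps how strongly a given p-value can favor a real effect at 1 / (-e · p · ln p). This is not necessarily the method used in the research mentioned above; it is swapped in here as a simple illustration, and the function name is hypothetical.

```python
import math


def max_bayes_factor(p_value: float) -> float:
    """Upper bound on the odds a p-value can lend to a real effect over no effect.

    Implements 1 / (-e * p * ln p), which is valid only for 0 < p < 1/e.
    """
    if not 0.0 < p_value < 1.0 / math.e:
        raise ValueError("bound is only defined for 0 < p < 1/e")
    return 1.0 / (-math.e * p_value * math.log(p_value))


for p in (0.05, 0.005):
    print(f"p = {p}: evidence for a real effect is at most {max_bayes_factor(p):.1f} to 1")
```

By this bound, p=0.05 corresponds to odds of at most about 2.5 to 1 in favor of a real effect, while p=0.005 corresponds to at most about 14 to 1.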

The confusion leading to this movement

Many nonstatisticians confuse the p-value with the probability that no discovery was made.

Let's look at an example from the Nature article. Two studies examined the increased risk of disease after taking a drug. Both studies estimated that patients had a 20 percent higher risk of getting the disease if they took the drug than if they didn't. In other words, both studies estimated the relative risk to be 1.20.

However, the relative risk estimated from one study was more precise than the other, because its estimate was based on outcomes from many more patients. Thus, the estimate from one study was statistically significant, and the estimate from the other was not.

The authors cite this inconsistency – that one study obtained a significant result and the other didn't – as evidence that statistical significance leads to misinterpretation of scientific results.

However, I feel that a reasonable summary is simply that one study collected statistically significant evidence and one did not, but the estimates from both studies suggested that relative risk was near 1.2.
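The two-study situation can be reproduced with a short calculation. The patient counts below are invented for illustration and are not the figures from the Nature article; the sketch uses the standard log-scale approximation for the confidence interval of a relative risk, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_treated, n_treated, events_control, n_control, level=0.95):
    """Return (relative risk, lower bound, upper bound) using the log-scale approximation."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of log(RR) for two independent groups.
    se_log = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical studies with the same estimated relative risk of 1.2.
small_study = relative_risk_ci(120, 1_000, 100, 1_000)
large_study = relative_risk_ci(1_200, 10_000, 1_000, 10_000)

for name, (rr, low, high) in (("small", small_study), ("large", large_study)):
    verdict = "significant" if low > 1.0 or high < 1.0 else "not significant"
    print(f"{name} study: RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f} ({verdict})")
```

With these made-up counts, the smaller study's interval runs from roughly 0.93 to 1.54 and includes 1, so it is not significant. The larger study's interval runs from roughly 1.11 to 1.30 and excludes 1. Both point to the same estimate; only the precision differs.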

Where to go from here

I agree with the Nature article and The American Statistician editorial that data collected from all well-designed scientific studies should be made publicly available, with comprehensive summaries of statistical analyses. Along with each study's p-values, it is important to publish estimates of effect sizes and confidence intervals for these estimates, as well as complete descriptions of all data analyses and data processing.

On the other hand, only studies that provide strong evidence in favor of important associations or new effects should be published in premier journals. For these journals, standards of evidence should be increased by requiring smaller p-values for the initial report of relationships and new discoveries. In other words, make scientists publish results that they're even more certain about.

The bottom line is that dismantling accepted standards of statistical evidence will decrease the uncertainty that scientists have in publishing their own research. But it will also increase the public's uncertainty in accepting the findings that they do publish – and that can be problematic.

Journal information: Nature

Provided by The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: Is it the end of 'statistical significance'? The battle to make science more uncertain (2019, April 1) retrieved 4 July 2024 from https://phys.org/news/2019-04-statistical-significance-science-uncertain.html
