Redefine statistical significance: Large group of scientists, statisticians argue for changing p-value threshold from .05 to .005

August 1, 2017 by Bob Yirka
Relationship between the P-value threshold, power, and the false positive rate. Credit: PsyArXiv, 22 July 2017.

A large group of scientists and statisticians has uploaded a paper to the PsyArXiv preprint server arguing that the p-value threshold for statistical significance should be lowered from .05 to .005. The paper outlines their reasons for suggesting that this commonly used cutoff for declaring results significant be changed.

Some science is cut and dried: if you drop a ball from a tower, for example, it will fall to the ground under normal circumstances. Unfortunately, a lot of other science is not nearly so definitive; the investigation, production and use of pharmaceuticals is one example. Not all drugs work as expected in all people under all conditions. Uncertainty is prevalent in many areas, including astronomy, physics and economics. Because of this, the scientific community has settled on the p-value as a common way to judge whether an experimental result is likely to reflect more than chance. A p-value can be compared against any cutoff, but the most prominent cutoff is the one that defines what has come to be known as statistical significance, which has historically been set at .05. Now, the new paper suggests that this bar has been set too low, and that it is therefore contributing to the problem of irreproducible findings in research.

One of the main problems with the p-value, some in the statistics field have suggested, is that non-statisticians do not really understand it and, because of that, use it incorrectly. It cannot be used, for example, to declare that a new drug has a 95 percent chance of working if it is used in the prescribed way. Nor is it a measure of how likely a hypothesis is to be true, they note. Instead, it is defined as the probability, assuming the null hypothesis (nothing happened) is true, of obtaining a test result equal to or "more extreme" than the one actually observed.
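To make that definition concrete, here is a minimal Python sketch (not taken from the paper) built on a hypothetical coin-flip experiment: the null hypothesis is that the coin is fair, and the p-value is the total probability, under that null, of every head count that is at least as unlikely as the one observed. The function name `binomial_p_value` and the coin-flip setup are illustrative assumptions.

```python
from math import comb


def binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for `heads` out of `flips`, with a fair coin as the null.

    Hypothetical illustration: the coin stands in for any yes/no experiment.
    """
    # Probability of every possible head count k if the null hypothesis (p = 0.5) is true.
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # "Equal to or more extreme": add up every outcome no more likely than the observed one.
    # The tiny relative tolerance keeps exact ties from being dropped by rounding.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))


for heads in (50, 60, 65):
    print(f"{heads} heads in 100 flips: p = {binomial_p_value(heads, 100):.4f}")
```

With 100 flips, 60 heads gives a p-value just above .05, 61 is the smallest count that clears .05, and 65 is the smallest that clears the proposed .005 bar. None of these numbers is the probability that the coin is actually biased.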

But even when it is used correctly, the .05 threshold does not offer a strong enough measure of evidence, according to the authors. They therefore suggest lowering it to .005, which they claim would reduce the false positive rate from the current 33 percent down to 5 percent.
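Figures like 33 percent and 5 percent come from asking what share of "significant" results are false alarms, the relationship between threshold, power and false positive rate named in the figure caption. A minimal Python sketch of that arithmetic follows. The prior odds (1 real effect per 10 null hypotheses tested) and the power values are assumptions chosen for illustration, not parameters quoted from the paper.

```python
def false_positive_rate(alpha: float, power: float, prior_odds: float) -> float:
    """Fraction of results with p < alpha that come from true null hypotheses.

    alpha:      significance threshold (e.g. 0.05 or 0.005)
    power:      chance that a real effect reaches p < alpha
    prior_odds: assumed ratio of real effects to null effects among tested hypotheses
    """
    # Per tested null hypothesis, a fraction alpha crosses the threshold by chance.
    false_hits = alpha
    # Per tested null there are prior_odds real effects, each detected with probability power.
    true_hits = power * prior_odds
    return false_hits / (false_hits + true_hits)


# Hypothetical scenario: 1 real effect for every 10 null hypotheses tested.
PRIOR_ODDS = 1 / 10
for alpha in (0.05, 0.005):
    for power in (0.5, 0.8, 1.0):
        rate = false_positive_rate(alpha, power, PRIOR_ODDS)
        print(f"alpha={alpha}, power={power}: {rate:.1%} of significant results are false")
```

Under these assumed odds and full power, the rate works out to 0.05 / 0.15, about 33 percent, at the .05 threshold and 0.005 / 0.105, about 5 percent, at .005. Lower power raises both rates.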


More information: Benjamin, Daniel J., et al. "Redefine Statistical Significance." PsyArXiv, 22 July 2017.


Aug 01, 2017
"Instead, it is defined ....if the null 'hypnosis' (nothing happened) is true." -please spell check before uploading an article

Is this a self-driven initiative by scientists, or by experts from other fields?
Aug 01, 2017
If they do that, most of the stories on psychological studies (at least) will disappear from this site. They always claim a "significant" finding but almost never define what the degree of significance is. This leaves the reader with no way to judge the cost/benefit of whatever the article is pitching.
Aug 02, 2017
For experimental psychology, a p-value = 0.10 or 0.05 is common, but is too lenient. A p-value = 0.01 would be better. The p-value measures only the likelihood that in reality there is a nonnull relationship.

However, an equally big problem in published experimental psychology papers is the omission of the theta value, which is a measure of the strength of the effect. The strength is important for assessing the plausibility of the theoretical explanation for the relationship.
Aug 02, 2017
@BobSage - when papers talk about "statistical significance", they are referring to the .05 amount. I agree however that defining a value of .005 would be much better.
Aug 02, 2017
As a statistician I have always told clients* that when doing exploratory research you may want to use 0.05 or 0.10 to select what to test in your more formal testing. But the formal testing should use p = 0.001 or smaller, depending on how many cases they had looked at during the exploratory phase. For example, suppose you looked at 1000 compounds and three stood out, while another 50 fell near the p = 0.05 line. Then you might have three promising candidates plus the expected amount of noise. Now figure out how many test cases you need to get a trustworthy p = 0.001 value, and test the three standouts from before, plus controls and a couple of other samples that "looked interesting." The eye can often pick out patterns that the original gross filter might miss, so it is worth putting those extra samples in.

Oh, and get this approved beforehand.

* People I advised on papers: sometimes I was a member of the team, sometimes I was just asked to review a paper or presentation, and so on.
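The "expected amount of noise" in the comment above follows from a standard fact: when a compound has no real effect and the test statistic is continuous, its p-value is uniformly distributed between 0 and 1. Screening 1000 such compounds at p < 0.05 therefore flags about 1000 × 0.05 = 50 of them by chance alone, while a 0.001 threshold flags about one. The Python sketch below simulates that screen; the function names, the seed and the assumption that every compound is inert are hypothetical choices for illustration.

```python
import random


def screen_null_compounds(n_compounds: int, seed: int = 0) -> list[float]:
    """Simulate p-values for compounds with no real effect (hypothetical screen).

    Assumes a continuous test statistic, so under the null hypothesis p ~ Uniform(0, 1).
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_compounds)]


def count_hits(p_values: list[float], threshold: float) -> int:
    """Number of compounds that would be flagged at the given threshold."""
    return sum(p < threshold for p in p_values)


p_values = screen_null_compounds(1000)
for threshold in (0.10, 0.05, 0.001):
    expected = len(p_values) * threshold
    hits = count_hits(p_values, threshold)
    print(f"p < {threshold}: {hits} flagged, about {expected:g} expected by chance")
```

Any genuine candidates have to stand out against that background, which is the reason for confirming the standouts in a fresh, pre-approved test at a stricter threshold.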
Aug 03, 2017
This comment has been removed by a moderator.
Da Schneib
Aug 03, 2017
@dingbat, what's "negative causality?" Making up more imaginary but great-sounding terminology?

Sorry man, if we test it and we get very low probability, we got lots of other things to test and we move on. After whining that things with low probability get ignored, your comment is, to say the least, unseemly, incoherent, and inconsistent.
