Scholars take aim at false positives in research


A single change to a century-old statistical standard would dramatically improve the quality of research in many scientific fields, shrinking the number of so-called false positives, according to a commentary published Sept. 1 in Nature Human Behaviour.

The argument, co-authored by University of Chicago economist John List, represents the consensus of 72 scholars from institutions throughout the world and disciplines ranging from neurobiology to philosophy. Their recommendations could have a major effect on the publication of academic work and on the policy decisions that draw on it.

"We advertise interventions as working because statistically we think they're working. But they're actually not working. This is becoming a crisis in the sciences," said List, the Kenneth C. Griffin Distinguished Service Professor in Economics.

List and his co-authors suggest that scientists need to reset a statistical benchmark known as the p-value because the standards of evidence for claiming new discoveries in many fields are simply too low. The approach is damaging to the credibility of scientific claims, they said.

A p-value standard was adopted beginning in the 1920s, when British statistician Ronald Fisher proposed a value below 0.05 as a threshold to determine the validity of research findings. If the p-value falls below that threshold—meaning that results at least as extreme as those observed would occur less than 5 percent of the time if chance alone were at work—then the research is generally considered to be statistically significant.
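
For concreteness, here is a minimal sketch of that logic in Python using SciPy's two-sample t-test; the simulated data and the choice of test are illustrative, not the paper's own method.

```python
# Minimal sketch of significance testing (illustrative, not the paper's
# method): compare two groups with SciPy's two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=50)  # no treatment effect
treated = rng.normal(loc=0.5, scale=1.0, size=50)  # true effect of 0.5

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Fisher's convention: "significant" if p < 0.05.
# The commentary's proposal: require p < 0.005 for new discoveries.
```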

But the p-value threshold has become a target of criticism in response to a perceived replication crisis in scientific communities. Science journals frequently use statistical significance—and p-values—as a test for selecting which papers to publish. List said the current p-value threshold of 0.05 is allowing many studies to be published and influence economic and political decisions even though the results may not be reproducible by other researchers.

"If Ronald Fisher would have known that close to a 100 years later we would be using the 0.05 standard religiously to make 'informed' policy decisions, I don't think he would have advanced it," List said.

More reproducible studies

To be sure that an initial discovery will work when put into practice, results should be replicable. Previous studies have shown that only 24 percent of psychology studies with a p-value of 0.05 could be confirmed by further experiments, suggesting that three out of four studies presented false positive results. Similarly, only 44 percent of economics papers with the same p-value were reproducible.

The authors calculated that lowering the p-value threshold to 0.005 would roughly double rates of replication in psychology and economics, and other fields would see similar outcomes. "Changing the p-value threshold is simple, aligns with the training undertaken by many researchers and might quickly achieve broad acceptance," the authors said.
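
A rough way to see the effect of the tighter threshold is to simulate studies in which no real effect exists; the sketch below (an illustration, not the authors' calculation) counts how many pure-noise studies each cutoff would wave through.

```python
# Simulate many "studies" in which both groups come from the same
# distribution, so every significant result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_per_group = 100_000, 30
a = rng.normal(size=(n_studies, n_per_group))
b = rng.normal(size=(n_studies, n_per_group))

_, p = stats.ttest_ind(a, b, axis=1)
print((p < 0.05).mean())   # ~0.05: about 1 in 20 null studies pass
print((p < 0.005).mean())  # ~0.005: about 1 in 200 pass
```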

List agrees. "You want to set up a world where you have more people trying to replicate, and you want society to reward those people," he said. "And you also want more results that go into policy to be true results, to be replicable. Under the 0.005 more of them would be."

To further encourage publication and replication of studies, the authors of the paper propose that new findings that currently would be called "significant" but don't meet the revised 0.005 p-value should be called "suggestive" instead.

List and his co-authors are careful to point out that a change to the p-value is not the only step to improve scientific research. "We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data…are preferable to p-values," they said.



More information: Nature Human Behaviour (2017). www.nature.com/articles/s41562-017-0189-z
Journal information: Nature Human Behaviour

Citation: Scholars take aim at false positives in research (2017, September 4) retrieved 20 July 2019 from https://phys.org/news/2017-09-scholars-aim-false-positives.html

User comments

Sep 04, 2017
Face facts, among other things, once a "scientific" imprimatur became all but a guaranteed selling point for every corporate scam and political swindle, "scientific" fraud became a major commodity. New "medicines", new "economic theories", new "educational systems". Getting a fraud to look good became a priority. They talk about the p value not guaranteeing a degree of reproducibility of an "experiment". No one ever checks the source data for "experiments"! They could all be made of whole cloth and so many wouldn't even suspect! If you fabricate all the data for an "experiment" so it satisfies even a .005 p value, it doesn't mean the result is reliable! And the "experiment" will serve the interests of the scammers, namely, keeping the fraud going long enough to make a pile off of it, then let the "marks" desperately switch to another lie, forgetting the swindler who lied to them before!

Sep 04, 2017
Good idea, I like it.

Julian, sure lots of published science is not reproducible by other researchers, as the article states, but the things you rant about in your posts on this site are not amongst them.

I did not know economics was nearly as bad as psychology in terms of reproducibility.
44% is pretty bad.


Sep 04, 2017
Myself, I call this the Aging Editor Problem.

A researcher conducts a study or experiment. The team supervisor had set them to find a specific conclusion. The mentor for the researcher expects specific results. The RIAs monitoring the research expect specific results.

The paper produced will go through a vetting process that often censors out unwanted data and unanswered questions.

After that, a series of olders & wisers have to sign off on the paper for release to publication. Then the editors have their turn at mauling it.

One hopes, at least at the basic level of instruction and research, that the involved parties are reasonably current in their knowledge of the field.

Each generation is left decades behind in their knowledge base. As they age, so does the instruction set that taught them what they know. The textbooks and manuals become more obsolete.

Time never ceases, it remorselessly grinds at all of us.

Sep 04, 2017
It's silly that research into psychology and economics has a lower standard of statistical evidence than research into physics. This disparity fuels anti-science rhetoric by religious extremists, and denial by sociologists, psychologists, and economists merely exacerbates the situation. Numerous examples can be found in this very thread. If the Sokal Affair was not sufficient to prove it, I don't know what will. At this point I consider sociologists, psychologists, and economists unscientific. I think 0.005 may be too high. Physicists adhere to a higher standard.

Sep 04, 2017
It's silly that research into psychology and economics has a lower standard of statistical evidence than research into physics.

Not really. You can't get the statistical power that you can get in physics because you are almost always dealing
- with a much smaller cohort size
- with multivariate studies (you can't take humans apart and reduce them to one property and then run an experiment on that like you can take matter apart into fundamental particles)

Sometimes you can't just increase the cohort size willy nilly. E.g. in drug trials you'd be exposing suddenly 100 times more people to an untested drug. Costs for new drugs would explode.

If you were to demand the same standard as in physics then there wouldn't be a single study in these fields - and most of these studies are valuable.

Upping the requirements a bit might be OK, though, since it has gotten easier to run studies. But 0.005 seems a tad excessive to me. Maybe 0.01 or thereabouts is more realistic.

Sep 04, 2017
@anti, if they're using 0.05, then actually yes really. That's about 2 sigma, a level not considered conclusive evidence in physics.

I see no reason to allow statistical evidence that is not accepted in nuclear physics into economics or psychology. I think the same 5σ criterion should apply. And I think the Sokal Affair proves it. If economics and psychology wish to give up the title of "science", they are free to do so, but it should be noted everywhere that they cannot maintain the standards of rigor that real science requires. Sorry if it costs too much. Get over it. Physics pays the price to get the real results.
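
For reference, the sigma levels traded back and forth in this thread convert to tail probabilities roughly as follows; this quick check assumes a standard normal distribution, with the two-sided value matching the 0.05 convention and the one-sided value matching the physics 5-sigma figure.

```python
# Convert "sigma" levels to standard-normal tail probabilities, to
# check the figures quoted in this thread.
from scipy import stats

for sigma in (2, 3, 5):
    print(f"{sigma} sigma: one-sided p = {stats.norm.sf(sigma):.2e}, "
          f"two-sided p = {2 * stats.norm.sf(sigma):.2e}")

# 2 sigma two-sided is ~0.046 (the familiar 0.05); 3 sigma is ~0.003
# (close to the proposed 0.005); 5 sigma one-sided is ~2.9e-7,
# i.e. about 1 in 3.5 million, the figure cited below.
```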

Sep 05, 2017
The problem comes when you study living things. They are so complex that looking at just one variable, in one individual, under seemingly identical conditions can give wildly different results.
You are not as in control and cannot as easily isolate different variables as in physics & chemistry.
Life is a bit more messy.
But economics, no excuses.

Sep 05, 2017
@anti, if they're using 0.05, then actually yes really. That's about 2 sigma, a level not considered conclusive evidence in physics.

I know it's not enough for physics. But getting the 5 sigma level is not feasible in psychology or medicine.
5 sigma on a normal distribution would be a 1 in 3.5 million chance. If you have to do a multivariate study (which you *always* have in medicine, biology or psychology) then it's much worse
You can't do a second or third stage drug trial on 3.5 million (or more) people. That would be costly (and quite possibly criminal).
Sorry if it costs too much.

A current full drug trial costs on the order of 100 million dollars, for one drug that might not pan out, with 2-3k participants in the final phase. You're asking to up this by a factor of 1000. Who will pay the 100 billion?

standards of rigor that real science requires

I think you're committing a 'no true Scotsman' fallacy, here.

Sep 05, 2017
Life is a bit more messy.
But economics, no excuses.

Economics is also messy - for the very same reason. It depends a lot on the psychology of the people making investments. Not just the willingness thresholds of different people but even the speed at which they can make decisions.

The idea that everyone in the market is a 'rational player' with identical psychological framework, full information and identical decision making speeds might be a convenient fantasy - but it is just that: a fantasy.

Sep 05, 2017
Ds, truly I am not trying to insult you.

However, your criticism of the biological sciences is truly unfair.

BS includes but is not limited to bull shit. All living and dead organics, whatever activity they engage in and everything that results from those activities is biology.

When you have perfected your personal life to '5 Sigma'? When you have achieved perfection in body and mind?

Then, I would have a higher opinion of your opinion. That the biological sciences are not 'living' up to your rigorous demands.

Sep 05, 2017
@anti, if they're using 0.05, then actually yes really. That's about 2 sigma, a level not considered conclusive evidence in physics.

I know it's not enough for physics. But getting the 5 sigma level is not feasible in psychology or medicine.
Then they are not sciences. Simple as that.

Ds, truly I am not trying to insult you.

However, your criticism of the biological sciences is truly unfair.
No, it's not. They accept standards of evidence that are not accepted in real science. Sorry, just observing without quantifying is not science. What this paper shows is that accepting lesser standards of evidence leads to flawed conclusions on a regular basis. If we want these areas to give real science then we have to tighten the criteria; nothing else will fix it.

Sep 05, 2017
We used to call the 'p-value' 'alpha'. Either way, it represents a mere yes/no about whether the manipulated variable had an effect on results. The traditional value of .05 is too lax.
.
However, we also have the option of calculating 'theta', which is the strength of the variable, or the size of the variable's effect. Theta is almost never reported, yet it is more important than the p-value.
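
A minimal sketch of that point, assuming the commenter's 'theta' stands for an effect-size measure such as Cohen's d (the names and numbers here are invented for illustration): with a large enough sample, p can clear any threshold while the effect size stays negligible.

```python
# Report effect size alongside the p-value: a tiny true effect plus a
# huge sample can yield an impressive-looking p while Cohen's d shows
# the effect is negligible. (Illustrative values, not from the paper.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.00, 1.0, 100_000)
b = rng.normal(0.02, 1.0, 100_000)  # true effect: 0.02 standard deviations

_, p = stats.ttest_ind(b, a)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd
print(f"p = {p:.1e}, d = {cohens_d:.3f}")  # p is tiny; d stays ~0.02
```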

Sep 06, 2017
Then they are not sciences.

I guess the world disagrees.

Science is that which generates knowledge. While a study that has a 1-in-20 chance of a false positive is not certain knowledge, it's still useful knowledge.
Science isn't as black and white as you think. The 5 sigma level of physics is no less arbitrary than the 2 sigma level in medicine. It's better but it's still just a useful threshold that balances what measure of uncertainty we are willing to accept vs. feasibility of experiments. And experiments in physics are just way easier to control than those in medicine or economics.

accepting lesser standards of evidence leads to flawed conclusions on a regular basis

Which is exactly what alpha denotes. That's why the alpha value is given in each paper. The days of "this is the way it is" papers are over. Some papers have higher certainties and some lower ones - and the certainty measure is stated therein quite plainly. Always.

Sep 08, 2017
i am guessing you're replying to DS... but i'm throwing in my 1 cent
Science isn't as black and white as you think
especially not in biology, medicine, psychology and such!

yet, as you point out, it still generates useful knowledge

great example: profiling of serial killers
a useful tool based upon sound, factual, scientifically researched and validated information that is rigorously tested and proven in real life applications

... and yet you can't apply this same information to the society at large to predict who will become a serial killer

this reinforces your point above, which is
The idea that everyone in the market is a 'rational player' with identical psychological framework, full information and identical decision making speeds might be a convenient fantasy - but it is just that: a fantasy
you may mean this about economics, but this is also applicable to psychology

Sep 08, 2017
@anti, if they cannot generate statements that are reliable then they are not science. If it's only 50% reliable, then it's a guess, and guesses are not science unless identified as such; representing 50% as reliable science is why we have all these idiot #sciencedeniers on here in the first place. They don't know the difference between physics and psychology and they're never going to work it out. It's time for what is called real science to sort that out for them, and if we can't agree on significance, they're going to perceive it all as "the same" because they don't know what significance means and never will.

Sep 09, 2017
At the risk of making DS spurt his morning coffee out his nostrils, I have to say I agree with him on this question of what is science and what is actually (what I would call) a 'useful art', 'skill/technique' etc which is based on experience and success-rate 'evidence' which gives 'confidence' in the application of said 'art'.

In no way should even a very high-success-rate 'art/technique' etc be construed/represented as a SCIENCE in itself (even if the 'practice' of that 'art/technique' may IN PART/LOOSELY use SOME of mathematical/procedural tools/methodologies that the 'hard sciences' do....but without ALL the rigorous principles of objectivity and repeatability due to the very 'subjective' nature of the subjects and data interpretations etc).

but I also agree with the 'usefulness' aspect irrespective of it not being a 'hard' science field/discipline. :)

Sep 09, 2017
Not to take too hard a line, it's fine to collect data, and that's science too. But drawing conclusions from insufficient data is not. It doesn't matter how hard it is, it doesn't matter how long it takes, it doesn't matter how much it costs. If you have the data, then draw some conclusions; if you don't, then don't. That's science. The thing is, though, papers that don't draw conclusions have trouble getting published. So there's always the temptation to draw conclusions that the data don't support.

By increasing the reliability of conclusions, requiring enough data to draw them with sufficient significance, the number of papers in the "soft" sciences will be greatly reduced, unless papers that merely present data, without drawing conclusions the data cannot support, are allowed more access. Personally I don't think this is necessarily a Bad Thing.

Sep 09, 2017
@anti, if they're using 0.05, then actually yes really. That's about 2 sigma, a level not considered conclusive evidence in physics.

I know it's not enough for physics. But getting the 5 sigma level is not feasible in psychology or medicine.
5 sigma on a normal distribution would be a 1 in 3.5 million chance. If you have to do a multivariate study (which you *always* have in medicine, biology or psychology) then it's much worse


Also, p-hacking can be avoided without much effort. There's little inherently wrong with a 0.05 cutoff when the odds of outcomes falling into the 5% tail are small. P-hacking is an easy trap to fall into, but if you design protocols to avoid it then you can. There are people who p-hack deliberately, but in most cases it's unintentional. Awareness is the key. But, yes, you will have to publish more 'no result' papers, and no, that's not such a bad thing.
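
One concrete protocol of that kind is correcting for multiple comparisons; here is a minimal sketch using a Bonferroni adjustment, where the setup of 20 outcome measures is hypothetical.

```python
# Testing 20 outcomes with no real effects: uncorrected, about one will
# cross 0.05 by chance; a Bonferroni correction (compare each p to
# 0.05/20) usually lets none through.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m = 20
p_values = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30))[1]
    for _ in range(m)
])

print((p_values < 0.05).sum())      # naive count of "discoveries"
print((p_values < 0.05 / m).sum())  # Bonferroni-corrected count
```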

Sep 11, 2017
if they cannot generate statements that are reliable then they are not science

Depends on what you define as reliable. Being right 19 out of 20 times sounds pretty reliable to me. Think of it this way:
"This drug can cure your cancer - but this statement may only be true 19 out of 20 times. Do you want to try it out or do you want no treatment?"
Same could be said about the Higgs boson - only that the probabilities are different. It's a quantitative difference - not a qualitative one.

In the end: Anything that is better than random is knowledge (which is also in line with how information is defined). That medical/psychological/economic results are never as good as those in physics doesn't mean they're not knowledge worth having.

Science is about getting what is useful, not what is true. Capital-T 'Truth' is (provably) not part of science.
