Scientists 'bad at judging peers' published work,' says new study

October 8, 2013, Public Library of Science

Are scientists any good at judging the importance of the scientific work of others? According to a study published 8 October in the open access journal PLOS Biology (with an accompanying editorial), scientists are unreliable judges of the importance of fellow researchers' published papers.

The article's lead author, Professor Adam Eyre-Walker of the University of Sussex, says: "Scientists are probably the best judges of science, but they are pretty bad at it."

Prof. Eyre-Walker and Dr Nina Stoletzki studied three methods of assessing published scientific papers, using two sets of peer-reviewed articles. The three assessment methods the researchers looked at were:

  • Peer review: subjective post-publication peer review where other scientists give their opinion of a published work;
  • Number of citations: the number of times a paper is referenced as a recognised source of information in another publication;
  • Impact factor: a measure of a journal's importance, determined by the average number of times papers in a journal are cited by other scientific papers.
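The impact-factor definition above is just an average, so it can be sketched in a few lines. This is a minimal illustration with made-up numbers (the journal name and counts are hypothetical, not from the study):

```python
# Hypothetical sketch of a two-year impact factor: for year Y, it is the
# number of citations received in Y by items the journal published in the
# two preceding years, divided by the number of those items.
citations_in_2013 = 450   # citations in 2013 to the journal's 2011-2012 papers (made up)
items_2011_2012 = 150     # citable items the journal published in 2011-2012 (made up)

impact_factor = citations_in_2013 / items_2011_2012
print(round(impact_factor, 2))  # → 3.0
```

Note that this is a property of the journal as a whole; as the study argues, it says nothing direct about the merit of any individual paper published there.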

The findings, say the authors, show that scientists are unreliable judges of the importance of a scientific publication: they rarely agree on the importance of a particular paper and are strongly influenced by where the paper is published, over-rating science published in high-profile scientific journals. Furthermore, the authors show that the number of times a paper is subsequently referred to by other scientists bears little relation to the underlying merit of the science.

As Eyre-Walker puts it: "The three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased and expensive method by which to assess merit. While the impact factor may be the most satisfactory of the methods considered, since it is a form of prepublication review, it is likely to be a poor measure of merit, since it depends on subjective assessment."

The authors argue that the study's findings could have major implications for any future assessment of scientific output, such as that currently being carried out for the UK Government's forthcoming Research Excellence Framework (REF). Eyre-Walker adds: "The quality of the assessments generated during the REF is likely to be very poor, and calls into question whether the REF in its current format is a suitable method to assess scientific output."

PLOS Biology is also publishing an accompanying Editorial by Dr Jonathan Eisen of the University of California, Davis, and Drs Catriona MacCallum and Cameron Neylon from the Advocacy department of the open access organization the Public Library of Science (PLOS).

These authors welcome Eyre-Walker and Stoletzki's study as being "among the first to provide a quantitative assessment of the reliability of evaluating research", and encourage scientists and others to read it. They also support the authors' call for openness in research assessment processes. However, they caution that assessment of merit is intrinsically a complex and subjective process, with "merit" itself meaning different things to different people, and point out that Eyre-Walker and Stoletzki's study "purposely avoids defining what merit is".

Dr Eisen and co-authors also tackle the suggestion that the impact factor is the "least bad" form of assessment, recommending the use of multiple metrics that appraise the article rather than the journal ("a suite of article level metrics"), an approach that PLOS has been pioneering. Such metrics might include "number of views, researcher bookmarking, social media discussions, mentions in the popular press, or the actual outcomes of the work (e.g. for practice and policy)."

Explore further: Flawed sting operation singles out open access journals

More information: Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. DOI: 10.1371/journal.pbio.1001675

1.6 / 5 (7) Oct 08, 2013
To scientists' assessments of research papers we can add nearly every other kind of assessment, such as interviews, examination marking and blinded evaluation of art, where people's confident judgments about merit simply lack statistical robustness. Remarkably, each of our lives is shaped by evaluations, whether in getting qualifications, jobs or promotions, that are often little better than tossing a coin, in spite of the expense, seriousness and confidence with which they are made.

The papers are freely accessible.
1.8 / 5 (10) Oct 08, 2013
How many researchers, experimenters, and genuine scientific innovators have had their work hampered by bias from their peers? Once one looks behind the scenes at how a judgement is arrived at, it becomes evident that it's tainted with the jealous incredulity of the stodgy committee, grumbling, "If I didn't discover it, it must be impossible." For the sake of keeping science honest, I suggest a peer review of the peer reviewers.
1.6 / 5 (5) Oct 09, 2013
The obvious question: How carefully and fairly was this study reviewed and rated?
3.7 / 5 (3) Oct 09, 2013
Judging the merit of a scientific paper is hard. Remember that a paper contains material that has never been published before. So even other scientists aren't experts on THAT particular subject (at that point there is only one expert - the author) but merely knowledgeable in the general field. Peer review is simply about whether the paper is correct and (sometimes) how big the advance proposed within it is - not about the merit of a paper.
So a publication record (which indirectly leads to a referencing record) is not a good indicator of merit.

There are 'fashions' in science, too. And a paper in a hotly researched subject (e.g. buckyballs a few decades back or graphene now) will rack up references easily, even if its contribution is small.
If only a dozen people are working on the subject you're working on you won't get referenced much. No matter how great the paper.
1 / 5 (8) Oct 09, 2013
The consequences of this problem have been known to critical thinkers in science for a number of decades now. Even to this day, popular advocates of mainstream science like Phil Plait suggest that to judge science, we should mostly look at who made it and where it comes from. It's peculiar to watch for those of us who are more concerned with things like concepts, principles, models, arguments, anomalies, philosophy, etc. The "mainstream" is oftentimes a bunch of people looking around at what the "trustworthy" people are saying -- without regard for creating channels where new innovations that dramatically diverge from convention can routinely emerge. It's actually all incredibly transparent, but only once a person goes out of their way to locate and contemplate critiques which are directed at mainstream science. Up until that point, mainstream science seems rather invincible.

I do believe that there is a solution to this problem, btw, and it is NOT scientometrics.
1 / 5 (8) Oct 09, 2013
Re: "Dr Eisen and co-authors also tackle the suggestion that the impact factor is the "least bad" form of assessment, recommending the use of multiple metrics that appraise the article rather than the journal ("a suite of article level metrics"), an approach that PLOS has been pioneering. Such metrics might include "number of views, researcher bookmarking, social media discussions, mentions in the popular press, or the actual outcomes of the work (e.g. for practice and policy).""

But, how does this address the underlying problem of establishing & protecting critical and creative thinking within the sciences? I don't think these suggestions do that.

Here's a hint ... How about re-introducing philosophy of science back into our dialogues about science as a metric? What about visualizing concepts and models so that people can better see them? How about switching from the scientific paper to the concept, proposition & model as the fundamental units of discourse?
3 / 5 (6) Oct 09, 2013
How about re-introducing philosophy of science back into our dialogues about science as a metric?

What? After science has matured to the point of almost eliminating philosophical gobbledygook ya want to reintroduce it?

What about visualizing concepts and models so that people can better see them?

Sure I vote for the AWT (the aether wave theory) and the DAMN (the dense aether model nonsense.) That would help people to better see them.

How about switching from the scientific paper to the concept, proposition & model as the fundamental units of discourse?

That's what scientific papers are,,,, the concept, proposition & model. They are the fundamental units of discourse. They are not, and are not intended to be "textbooks". (But if ya write papers of philosophical gobbledygook, ya won't get a lot of discourse, working scientists have better ways to use their time.)
3.2 / 5 (5) Oct 10, 2013
the more it's probable, they do know its author in person and/or even they feel a competition or even jealousy for his work due to their own research. Which just means in essence, that above certain level of expertise we cannot have a qualified and unbiased peer-review at the same moment

Which is why peer review is anonymous. You don't know who the paper is from when you review it. It is stripped of all identifying information before it is handed to you.

Why don't you look into a subject you spout opinions about before spouting them? You might save yourself a lot of "made myself look like a fool yet again"-moments.

And you don't reference a paper because it is written by author X. You reference it because there is something in it that you need to reference (because you don't have space to rehash it in your own paper or didn't do that particular work).
That you first search for papers from well known authors when you need to reference some basics is only sensible.
3.5 / 5 (6) Oct 10, 2013
the more the opponents feels an experts in physics, the more they're dismissive regarding the dense aether model.

No. Papers get dismissed all the time (depending on the conference up to 60% of papers get rejected outright or relegated to the poster session)

The dismissal is not due to 'author seniority'

Example: when I was working in a group for biomechanics, haptics, stereolithography and surgical simulation our group sent in 8 papers to a conference. 7 passed peer review (eventually) and were accepted. The only one not accepted was the one by the head of the group.
He is well known in the field, but he didn't have time to put much effort into his paper and so just plumped for an overview/metastudy paper. The paper was just not good enough while we 'unknowns' (all PhD students) had spent much more time and effort on ours.

Sometimes papers (like the dense aether model) are just bunk. And bunk gets rejected because it's bunk - not because the author has 'opponents'.
2.8 / 5 (4) Oct 10, 2013
In theory (which you apparently love very much)

In practice. Since I have been a reviewer.

In reality it can be detected from many indicia (until the referee are really experts in their area).

You can infer which group a work came from (usually because they cite their own previous work). But that's by no means a certain thing. Even if you do that you can't tell which researcher it came from in particular. So you don't know whether it's a senior researcher or not.

And honestly: it doesn't matter. As a reviewer you don't care. YOUR reputation as a reviewer is on the line (as papers go out to MULTIPLE reviewers)...and if one trashes the paper for unfounded reasons while the others find no problems with it then you'll not be asked to review again. There is every incentive to do a honest review.

What the "just a bunk" criterion is supposed to mean?

Errors in the science (method, false/not merited conclusions, etc. ) - AWT being a prime example.
2.8 / 5 (4) Oct 10, 2013
One may be wondering, how is it possible, that the community of such honest people managed to boycott the cold fusion research http://www.scienc...22fb.htm and to promote the fringe ideas (like the stringy theories) for whole half of century....;-)

Cold fusion was plagued by false presentations, acrimonious retractions, and a failure to reproduce claimed results. No conspiracy Zeph, failure. Post your links (which I know ya will) and it will prove my point, they seem to be getting through to ya, so they are not being suppressed.

You can be only correct or wrong with your results - nothing like "method" or "merited" has a meaning for predicate logics.

Ya don't seem to know what the journals are for Zeph. They are not, and never have been meant to be arbitrators of "correct" or "wrong" results. They are only for reporting of methods, observations, and general conclusions of the AUTHORS, not repositories of last words or truths.
3.7 / 5 (3) Oct 10, 2013
In similar way, you're repeating the AWT is wrong, but did you ever tried to replicate my logics for to demonstrate, it leads into wrong conclusions?

Zeph, I don't mean this in a mean way, but NO ONE could replicate your "logics for to demonstrate". It cannot be done. The AWT is too fluid, too shifting, too shapeless to replicate; chaos theory precludes the AWT being replicated. Entropy precludes it.

It's very easy to distinguish such an ignorance from normal skepticism just by lack of replications.

It's not my fault that NO ONE can replicate your AWT. I will admit that I can't. But if NO ONE else can either, it puts me in a large group that I have the utmost esteem & respect for.

The replications, replications, replications - the omnipresent pain of mainstream science.

No it's not "the pain of the mainstream science." It's only a pain for the people who come up with stuff that CAN'T be replicated.

A bunch of parasitic hypocrites.

1 / 5 (3) Oct 11, 2013
They need to add another bullet point.
i.e. Amount of funding available for the particular area of research.
