Researchers uncover patterns in how scientists lie about their data

November 17, 2015 by Bjorn Carey, Stanford University
Stanford communication scholars have devised an 'obfuscation index' that can help catch falsified scientific research before it is published. Andrey Popov/Shutterstock

Even the best poker players have "tells" that give away when they're bluffing with a weak hand. Scientists who commit fraud have similar, but even more subtle, tells, and a pair of Stanford researchers have cracked the writing patterns of scientists who attempt to pass along falsified data.

The work, published in the Journal of Language and Social Psychology, could eventually help scientists identify falsified research before it is published.

There is a fair amount of research dedicated to understanding the ways liars lie. Studies have shown that liars generally tend to express more negative emotion terms and use fewer first-person pronouns. Fraudulent financial reports typically display higher levels of linguistic obfuscation – phrasing that is meant to distract from or conceal the fake data – than accurate reports.

To see if similar patterns exist in scientific academia, Jeff Hancock, a professor of communication at Stanford, and graduate student David Markowitz searched the archives of PubMed, a database of life sciences journals, from 1973 to 2013 for retracted papers. They identified 253, primarily from biomedical journals, that were retracted for documented fraud, and compared the writing in these to unretracted papers from the same journals and publication years that covered the same topics.

They then scored each paper using a customized "obfuscation index," which rated the degree to which the authors attempted to mask their false results. The index is a summary score built from the rates of causal terms, abstract language, jargon and positive emotion terms, combined with a standardized ease-of-reading score.
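A summary score of this kind could be sketched roughly as follows. The word lists, the sign of each component, and the syllable heuristic below are illustrative placeholders, not the categories or weights Markowitz and Hancock actually used (their index drew on established psycholinguistic dictionaries and a standardized readability measure):

```python
# Illustrative sketch of an obfuscation-index-style summary score.
# All lexicons here are tiny made-up examples, not the published ones.
import re

CAUSAL_TERMS = {"because", "therefore", "thus", "hence", "consequently"}
POSITIVE_TERMS = {"novel", "remarkable", "important", "successful", "robust"}
JARGON_TERMS = {"paradigm", "modality", "upregulation", "methodology"}

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rate(words, lexicon):
    """Fraction of tokens that fall in the given lexicon."""
    return sum(w in lexicon for w in words) / len(words)

def flesch_reading_ease(text, words):
    """Crude Flesch score: syllables approximated by vowel groups."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)
    return 206.835 - 1.015 * len(words) / sentences - 84.6 * syllables / len(words)

def obfuscation_index(text):
    """Higher score = more jargon, fewer causal and positive emotion
    terms, and lower reading ease -- the direction the study reports."""
    words = tokenize(text)
    return (rate(words, JARGON_TERMS)
            - rate(words, CAUSAL_TERMS)
            - rate(words, POSITIVE_TERMS)
            - flesch_reading_ease(text, words) / 100.0)
```

On this toy index, a jargon-dense, hard-to-read passage scores higher than a plain causal statement, which is the pattern the study associates with fraudulent papers.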

"We believe the underlying idea behind obfuscation is to muddle the truth," said Markowitz, the lead author on the paper. "Scientists faking data know that they are committing misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science."

The results showed that the fraudulent retracted papers scored significantly higher on the obfuscation index than the unretracted comparison papers. For example, fraudulent papers contained approximately 1.5 percent more jargon than unretracted papers.

"Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers," Markowitz said. "This is a non-trivial amount."

The researchers say that scientists might commit data fraud for a variety of reasons. Previous research points to a "publish or perish" mentality that may motivate researchers to manipulate their findings or fake studies altogether. The change the researchers found in the writing, however, is directly related to the authors' goal of covering up lies through the manipulation of language. For instance, a fraudulent author may use fewer positive emotion terms to curb praise for the data, for fear of triggering inquiry.

In the future, a computerized system based on this work might be able to flag a submitted paper so that editors could give it a more critical review before publication, depending on the journal's threshold for obfuscated language. But the authors warn that this approach isn't currently feasible given the false-positive rate.

"Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful," Hancock said. "But much more research is needed before considering this kind of approach. Obviously, there is a very high error rate that would need to be improved, but also science is based on trust, and introducing a 'fraud detection' tool into the publication process might undermine that trust."

More information: D. M. Markowitz and J. T. Hancock, "Linguistic Obfuscation in Fraudulent Science," Journal of Language and Social Psychology (2015). DOI: 10.1177/0261927X15614605

14 comments

NIPSZX
4 / 5 (4) Nov 18, 2015
It seems like the first place to look for fraud is the funding. Unfortunate but true.
antialias_physorg
4.7 / 5 (3) Nov 18, 2015
Such analysis methods are certainly a step in the right direction, but:
The results showed that fraudulent retracted papers scored significantly higher on the obfuscation index than papers retracted for other reasons. For example, fraudulent papers contained approximately 1.5 percent more jargon than unretracted papers.

Is this enough to qualify as (statistically) significant? I would think the standard deviation of jargon content is rather large between papers.

Vietvet
3.5 / 5 (8) Nov 19, 2015
This research describes JVK in spades.
cantdrive85
2.3 / 5 (3) Nov 19, 2015
Astrophysicists beware...

Oh, and drug company "scientists".
antigoracle
2.3 / 5 (3) Nov 20, 2015
Just one glance at AGW climate "science" and their heads will explode.
julianpenrod
1 / 5 (4) Nov 21, 2015
There are other forms of deceit in "science", such as taking a phenomenon which is all but guaranteed and misinterpreting it to mean something else. The crooks commissioned by New Jersey to "prove" that cell phone use causes accidents who "defined" a cell phone as causing an accident if it is used within ten minutes of the accident. So you can receive a call at a restaurant, get in your car, be plowed into by the stoned son of a Legislator and the cell phone message would be "interpreted" as having caused the accident. In fact, the "study" mentioned in this article is committing fraud by suggesting that falsifying data is the only way "scientists" lie.
julianpenrod
1 / 5 (4) Nov 21, 2015
Consider "Professor" Gary Wells, the murderer's friend. He insists that looking for the picture of a perpetrator through a book of photos causes people to blend features together. He recommends having the eyewitness sit staring intently for ten minutes at one photograph after another. He claims his method "reduces the number of false identifications" and the "news" misinterpreted that to say it increases the number of correct identifications. Even if they're looking at the actual perpetrator, this can cause them to see details they didn't see initially and assume that is not the criminal. Also, this can tire witnesses out and they can simply give up and say no picture was of the criminal. And, face it, if there are no attempted identifications, there can be absolutely no false ones!
julianpenrod
1 / 5 (4) Nov 21, 2015
And "Professor" Aaron Clauset's claim that "power laws" underlie everything from earthquakes to "terrorist" attacks. But that's all based on a fraud. To derive a "power law", you take the logarithm of data and show they fall on a straight line. But the logarithm function literally "squishes" data into tiny regions. As a result, data points that would be all over the place can be made to appear to fit in a line. Neutrons, raindrops, dollar bills, shoes, sofas, clouds, battleships and stars decay by different means, yet the logarithm of their lifetimes plotted against the logarithm of their mass will look like a straight line, "suggesting" a "power law" connection.
julianpenrod
1 / 5 (4) Nov 21, 2015
Margaret Beale Spencer had a fraudulent "experiment" in which she showed kids five identical cartoons of a child, differing only by the color of the skin, and she "informed" them that one child was happy and the other sad and asked the kids to tell her which one was which. The kids, of course, looked for any clue, and the only features differing were the colors. It's been independently established that children have a more maudlin feel around dark colors, likely related to fear of the dark. As a result, they chose the dark-colored cartoon to be sad. Spencer then lied and "concluded" that this "proved" children in America are programmed to hate dark-colored skin and people with that skin.
SuperThunder
4 / 5 (4) Nov 21, 2015
Now would be a great time to declare all the moon-howler pseudo-scientists to be real, genuine, bonafide scientists, and then declare them all frauds based on this article's criteria. Congratulations, you're now all washed-up scientists! I'm just kidding, these frauds are way closer to being real scientists than the moon-howlers.

Is this enough to qualify as (statistically) significant? I would think the standard deviation of jargon content is rather large between papers.

That's a good point. I wonder how many papers they compared? It has to be over 2000, surely, hopefully way more than that.
They identified 253,

Oh. That's a margin of error of around ±6%
Yeah, there's no way that's statistically valid.
Nik_2213
not rated yet Nov 21, 2015
Serious case of bad statistics... They selected a specific set, compared it to a non-random population and think the tiny margin's significant...

Would their report flag itself as suspect ? It should, given their liberties with those statistics...

As a phishing exploit to garner funding, well, yes, it could count as 'proof of concept' to hook a sponsor. And, yes, like colleges check for plagiarism, which keeps the 'usual suspects' honest...

Big flaw, IMHO, is it would not catch the really big frauds, who lack 'negative' opinions, who are enthusiastic, lucid and open, just not reproducible...
Uncle Ira
3.9 / 5 (7) Nov 21, 2015
@ jules-pinhead-Skippy. Why do you waste your racist postums here? They would be much more appreciated at the Bully-Tea-Skippy-Pulpit forums. Over there you won't have to worry about the bad karma votes like you get here. Choot, over there you don't have to worry about karma votes at all. They don't let you vote over there anymore.

Oh yeah, I almost forget. Don't tell them you know Ira-Skippy or Ira-Anything because if you do they will shut the door in your face and not let you in. To bug them I used to write nice things about Obama during the elections, in 08 and 12, and that really gets them hot, I tell you. I been banneded from there 12 or 11 times because they are not so nice like the peoples at physorg and if you don't agree with the Tea-Party-Line they give you the boot-a-roo.

Anyhoo. Please lay off the racist stuffs here for me, okayeei? I ask you nice this time.
gkam
1.8 / 5 (5) Nov 22, 2015
Ira, please take your silly game of goober-speak to a site more amenable to childish comments.

This is a technical site.
Uncle Ira
3.7 / 5 (6) Nov 22, 2015
Ira, please take your silly game of goober-speak to a site more amenable to childish comments.


If you don't like it Skippy, we got this thing called the "Don't-Make-Me-Look-At-This-Skippy's-Stuffs" option. Why you don't try him? Oh, you have. But you don't have the will power to follow through. So why you don't amend some of your own childish comments? Oh yeah, you never make any silly slogans or comments.

This is a technical site.


Really? I did not know that. I thought it was a place where glam-Skippy could post up Palin-Like slogans and brag about all the wonderful things he never did. My mistake.
