Scientists devise means to test for phony technical papers

Apr 24, 2006

Authors of bogus technical articles beware. A team of researchers at the Indiana University School of Informatics has designed a tool that distinguishes between real and fake papers. It's called the Inauthentic Paper Detector -- one of the first of its kind anywhere -- and it uses compression to determine whether technical texts are generated by man or machine.

"This is a potential problem since no existing systems, the Web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert. "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

Joining Dalkilic on the IPD project are Assistant Professor Predrag Radivojac, informatics doctoral student James Costello, and Wyatt T. Clark, who will graduate in May with a bachelor's degree in informatics.

The IPD system is based on a combination of compression algorithms that reduce the amount of data to save space and speed transmission time.

To begin their study, the team identified two kinds of texts they would analyze. "Authentic text" (or document) is a collection of several hundreds or thousands of syntactically correct sentences that are wholly meaningful. "Inauthentic text" (or document) is a collection of several hundreds of thousands of syntactically correct sentences that, taken all together, have no meaning.

The researchers' work is documented in the very authentic paper, "Using Compression to Identify Classes of Inauthentic Texts," which they presented at the Society for Industrial and Applied Mathematics Conference on Data Mining in Bethesda, Md., this weekend.

The informatics study largely was inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated fake computer science language, essentially a four-page compilation of gibberish. They submitted it as a research paper to an international conference on computer science and informatics – and it was accepted without review.

Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers tested, including the MIT students' spurious submission.

"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac. "We combined these with machine-learning methods to build a predictor of these kinds of papers."

In general, identifying meaning in a technical document is difficult, Dalkilic says. "We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."

Source: Indiana University School of Informatics

Explore further: Pandora posts in-line 1Q loss, upbeat sales

add to favorites email to friend print save as pdf

Related Stories

Recommended for you

Pandora posts in-line 1Q loss, upbeat sales

7 hours ago

(AP)—Internet radio company Pandora reported higher-than-expected revenue in the latest quarter, with losses in line with analysts' forecasts, as the number of subscribers who pay for ad-free listening rose above 2.5 million.

Google Drive sports new view and scan enhancements

7 hours ago

(Phys.org) —Google Drive has a new look and functions. The makeover in Google Drive features scanning and interface enhancements that put the user into "card" mode. The enhancements make it easy for the ...

Inventor creates Card Beams with 3D printer

8 hours ago

What are card beams, you may ask? They are the building toy that allows you to build gravity-defying houses of cards with the help of friction, gravity, and two types of beams - the cap and the connector.

Solar Kettle allows for boiling water off the grid

9 hours ago

(Phys.org) —A company called Contemporary Energy has unveiled a new device it calls the Solar Kettle. It looks very much like a normal coffee thermos, but has flaps on one side that open to allow for collecting ...

Review: Google music plan solid, serendipitous

11 hours ago

Google's new music service offers a lot of eye candy to go with the tunes. The song selection of around 18 million tracks is comparable to popular services such as Spotify and Rhapsody, and a myriad of playlists ...

User comments : 0

More news stories

Google Drive sports new view and scan enhancements

(Phys.org) —Google Drive has a new look and functions. The makeover in Google Drive features scanning and interface enhancements that put the user into "card" mode. The enhancements make it easy for the ...

Solar Kettle allows for boiling water off the grid

(Phys.org) —A company called Contemporary Energy has unveiled a new device it calls the Solar Kettle. It looks very much like a normal coffee thermos, but has flaps on one side that open to allow for collecting ...

Review: Google music plan solid, serendipitous

Google's new music service offers a lot of eye candy to go with the tunes. The song selection of around 18 million tracks is comparable to popular services such as Spotify and Rhapsody, and a myriad of playlists ...

Controlling mood through the motions of mitochondria

(Medical Xpress)—Regulating the distribution of power in neurons is done by a system that makes the national electric grid look simple by comparison. Each neuron has several thousand mitochondria confined ...