Computers learn to spot 'opinion spam' in online reviews

Jul 26, 2011 By Bill Steele

(PhysOrg.com) -- If you read online reviews before purchasing a product or service, you may not always be reading the truth. Review sites are becoming targets for "opinion spam" -- phony positive reviews created by sellers to help sell their products, or negative reviews meant to downgrade competitors.

The bad news: Human beings are lousy at identifying deceptive reviews. The good news: Cornell researchers are developing software that's pretty good at it. In a test on 800 reviews of Chicago hotels, a computer was able to pick out deceptive reviews with almost 90 percent accuracy. In the process, the researchers discovered an intriguing correspondence between the language of deceptive reviews and fiction writing.

The work was reported at the 49th annual meeting of the Association for Computational Linguistics in Portland, Ore., June 24, by Claire Cardie, professor of computer science; Jeff Hancock, associate professor of communication; and graduate students Myle Ott and Yejin Choi.

"This is the first look at this, and there's a lot more to be done, but I think there is a potential that [review sites] could apply it," Ott said.

The researchers created what they believe to be the first "gold standard" collection of opinion spam by asking a group of people to deliberately write false positive reviews of 20 Chicago hotels. These were compared with an equal number of carefully verified truthful reviews.

As a first step, the researchers submitted a set of reviews to three human judges -- volunteer Cornell undergraduates -- who scored no better than chance in identifying deception. The three did not even agree on which reviews they thought were deceptive, reinforcing the conclusion that they were doing no better than chance. Historically, Ott noted, humans suffer from a "truth bias," assuming that what they are reading is true until they find evidence to the contrary. When people are trained to detect deception, they may become overly skeptical and report deception too often, still scoring at chance levels.

The researchers then applied computer analysis based on subtle features of text. Truthful hotel reviews, for example, are more likely to use concrete words relating to the hotel, like "bathroom," "check-in" or "price." Deceivers write more about things that set the scene, like "vacation," "business trip" or "my husband." Truth-tellers and deceivers also differ in the use of keywords referring to human behavior and personal life, and sometimes in features like the amount of punctuation or frequency of "large words." In parallel with previous analysis of imaginative vs. informative writing, deceivers use more verbs and truth-tellers use more nouns.
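The keyword contrast described above can be sketched as a simple feature counter. The word lists here are illustrative stand-ins, not the study's actual lexicons:

```python
from collections import Counter

# Illustrative word lists -- NOT the lexicons used in the study.
CONCRETE = {"bathroom", "check-in", "price", "bed", "lobby"}
SCENE_SETTING = {"vacation", "business", "trip", "husband", "wife"}

def keyword_features(review: str) -> dict:
    """Count concrete hotel words vs. scene-setting words in a review."""
    words = Counter(review.lower().replace(",", " ").replace(".", " ").split())
    concrete = sum(words[w] for w in CONCRETE)
    scene = sum(words[w] for w in SCENE_SETTING)
    return {"concrete": concrete, "scene_setting": scene}

print(keyword_features("The bathroom was clean and the price was fair."))
# {'concrete': 2, 'scene_setting': 0}
```

A truthful review would tend to score higher on the first count, a deceptive one on the second; a real system would use many such word categories as input features to a trained classifier.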

Using these approaches, the researchers trained a computer on a subset of true and false reviews, then tested it against the rest of the database. The best results, they found, came from combining keyword analysis with an analysis of how certain words occur together in pairs. Combining these two scores identified deceptive reviews with 89.8 percent accuracy.
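The train-then-test idea can be illustrated with a toy classifier over unigrams and word pairs. This is a minimal naive Bayes sketch on made-up reviews, not the researchers' actual model or data:

```python
import math
from collections import Counter

def ngrams(text):
    """Unigrams plus adjacent word pairs, echoing the combined feature set."""
    words = text.lower().split()
    return words + [" ".join(p) for p in zip(words, words[1:])]

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label feature counts."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(ngrams(text))
    return counts

def classify(counts, text):
    """Pick the label with the higher add-one-smoothed log-likelihood."""
    vocab = set().union(*counts.values())
    best, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = sum(math.log((c[f] + 1) / (total + len(vocab)))
                    for f in ngrams(text))
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up training examples for illustration only.
train_set = [
    ("the bathroom was small but the price was great", "truthful"),
    ("front desk check in was quick and the bed was comfy", "truthful"),
    ("my husband and i loved our vacation it was amazing", "deceptive"),
    ("perfect for a business trip my husband says", "deceptive"),
]
model = train(train_set)
print(classify(model, "the bed and bathroom were fine for the price"))
# prints "truthful"
```

The study itself used a far larger labeled corpus and a more sophisticated learner; the point here is only the pipeline shape: extract features, fit on held-in reviews, evaluate on held-out ones.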

Ott cautions that the work so far is only validated for hotel reviews, and for that matter, only reviews of hotels in Chicago. The next step, he said, is to see if the techniques can be extended to other categories, starting perhaps with restaurants and eventually moving to consumer products. He also wants to look at negative reviews.

This sort of software might be used by review sites as a "first-round filter," Ott suggested. If, say, one particular hotel gets a lot of reviews that score as deceptive, the site should investigate further.
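A first-round filter of that kind could be as simple as flagging hotels whose share of deceptive-scoring reviews exceeds a threshold. The threshold below is an arbitrary illustrative choice, not a figure from the study:

```python
def flag_for_investigation(scores, threshold=0.5):
    """scores: per-review labels, 1 = scored deceptive, 0 = scored truthful.
    Flag the hotel when the deceptive share exceeds the (illustrative) threshold."""
    return sum(scores) / len(scores) > threshold

print(flag_for_investigation([1, 1, 1, 0, 1]))  # 4 of 5 deceptive -> True
```

The flag would trigger human review rather than automatic removal, consistent with Ott's framing of the tool as a filter rather than a judge.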

"I think cutting down on deception would help everyone," he said. "Customers would not be fooled, and it would help [sellers] and review sites because people would trust their reviews."

User comments : 3

Jimbaloid
5 / 5 (2) Jul 26, 2011
Tricky! 100% accuracy is not on the cards of course, so how would they deal with say a 90% accuracy rate for spotting bogus positive reviews but only a 60% accuracy for spotting bogus negative reviews? Unintended consequence could be to skew the collective representation for a product or service toward the negative. Has anyone even established the ratio of bogus positive to bogus negative reviews to start with? Perhaps all reviews should be shown but use the algorithm to give a confidence rating alongside? Would bogus reviewers then learn how to trick the system anyway?
paulthebassguy
2 / 5 (12) Jul 26, 2011
I don't think a technological system like this would be very effective. People are far superior at linguistic analysis than any algorithm so as soon as a system like this is implemented the reviewers would find a way around it.

A much better (and simpler) solution is to simply have a short, visible disclaimer that should invoke some scepticism in the reader.
Msean1941
not rated yet Jul 27, 2011
I was checking USB cables on Amazon where you can click on "see all my reviews". I noticed that three out of five reviewers that gave one star had no other reviews, so this would perhaps support the idea of competitors subverting the review process. Of course, one small test does not a theory make.