Stanford researchers outsmart captcha codes

Nov 03, 2011 by Nancy Owano
Real schemes learnability: Accuracy of Decaptcha using KNN vs the size of the training set. Logarithmic scale. Image: Elie Bursztein, Stanford University

(PhysOrg.com) -- Stanford researchers say that captcha security codes, which ask users signing up on a website to retype a string of letters to prove they are human, can be thwarted. As proof, they have defeated the captchas at big-name sites such as Visa, CNN, and eBay. In fact, they found that 13 out of 15 high-profile sites were vulnerable to automated attacks.

Captcha stands for Completely Automated Public Turing Test to tell Computers and Humans Apart. The test was created in 2000 by a Carnegie Mellon University computer science graduate student and his advisor as a security measure to safeguard websites from automated bot attacks and spammers.

Simply put, the test was supposed to be passable by humans, not machines. The Stanford team, however, found that its own automated captcha-breaking tool was able to strip away captcha’s protective cover.

The researchers Elie Bursztein, a postdoctoral researcher at the Stanford Security Laboratory, Matthieu Martin, and John C. Mitchell were able to crack the codes. In their study, they note that site owners should be taking a closer look at their captchas:

“As we substantiate by thorough study, many popular websites still rely on schemes that are vulnerable to automated attacks. For example, our automated Decaptcha tool breaks the Wikipedia scheme... approximately 25% of the time. 13 out of 15 of the most widely used current schemes are similarly vulnerable to automated attack by our tool. Therefore, there is a clear need for a comprehensive set of design and testing principles that will lead to more robust captchas.”

The Stanford automated tool, Decaptcha, removes background noise from the image and breaks the text string into single characters for easier recognition. The tool was run against selected websites. Visa's Authorize.net payment gateway was defeated 66 per cent of the time. eBay's captcha was sidestepped 43 per cent of the time. Lower success rates were recorded at other sites, including Wikipedia and Digg.
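
To make the steps described above concrete, here is a minimal Python sketch of that kind of pipeline: clean up the image, split it into characters, then recognize each one. It is an illustration only, not the researchers' code. The function names, the brightness threshold, the "isolated pixel" noise rule and the split-at-empty-columns segmentation are all assumptions chosen for simplicity. The nearest-neighbor (KNN) classifier is included because the figure caption mentions KNN; the feature vectors it would receive are hypothetical.

```python
from collections import Counter


def binarize(image, threshold=128):
    """Map grayscale pixels (0-255) to 1 for dark ink and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def remove_isolated_pixels(bitmap):
    """Clear ink pixels with no ink among their 8 neighbours (crude denoising)."""
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            window = sum(
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            if window == 1:  # only the pixel itself is set
                cleaned[y][x] = 0
    return cleaned


def segment_columns(bitmap):
    """Return (start, end) column ranges separated by ink-free columns."""
    width = len(bitmap[0])
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def knn_classify(sample, training_set, k=3):
    """Label a feature vector by majority vote among its k nearest examples."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = sorted(
        training_set, key=lambda item: squared_distance(sample, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 3x7 image: one stray dark pixel at the left, one two-column "character".
toy_image = [
    [0, 255, 255, 255, 0, 0, 255],
    [255, 255, 255, 255, 0, 0, 255],
    [255, 255, 255, 255, 0, 0, 255],
]
cleaned = remove_isolated_pixels(binarize(toy_image))
character_spans = segment_columns(cleaned)
```

In this toy case the stray pixel is discarded and a single character span remains. Splitting at empty columns only works when characters do not touch or overlap, so schemes that crowd or join their letters would defeat a segmenter this simple.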

Google and reCAPTCHA were the only two that beat out the Stanford team’s automated tool--no gotchas for either one.

Interestingly, reCAPTCHA also has its roots at Carnegie Mellon, and it was developed as a step up from captcha. The reCAPTCHA project added further protective distortions, such as random warping and lines, to produce something still readable by humans but harder for machines to decode. In 2009, Google acquired reCAPTCHA.

As for other sites using captcha, the three researchers suggest in their paper various ways to make the tests harder to outsmart. The Stanford team presented the results of their research last month at CCS 2011 (the ACM Conference on Computer and Communications Security) in Chicago.

What’s more, Visa’s Authorize.net and Digg have switched to reCAPTCHA since these tests were performed.


More information: Report: cdn.ly.tl/publications/text-based-captcha-strengths-and-weaknesses.pdf


User comments : 18

Aliensarethere
5 / 5 (5) Nov 03, 2011
I guess it will not be long before this security falls. At some point, captchas will need to be so complex that humans can't read them.
El_Nose
5 / 5 (3) Nov 03, 2011
I know, right -- I have a hard time deciphering some captchas
glenn_o
5 / 5 (1) Nov 03, 2011
Not long now until somebody makes a Firefox extension to solve Captchas automatically.
pm_cady
not rated yet Nov 03, 2011
Spambots have been defeating the Captcha on my forum board for months now, this is not news. These guys might want to get out of the lab once in a while and watch how the pros do it.
Temple
not rated yet Nov 03, 2011
pm_cady:
Spambots have been defeating the Captcha on my forum board for months now, this is not news. These guys might want to get out of the lab once in a while and watch how the pros do it.


Exactly, there's far more money at stake in developing bots that can beat these tests than was at the disposal of the researchers at the university.

This result does have value as an indication of how captcha codes can be beaten with relatively little effort. This strongly implies that those who can gain money from it will be happy to devote even greater resources to it.

Plus, there's a nearly foolproof method of beating these tests. One can capture the image (or whatever form the test is manifested as) and present it to a real human for deciphering. There are sites that offer free content (porn, etc) in return for solving a page full of captchas. With enough traffic, these systems (which do indeed exist) can bypass thousands of captchas per hour.

Unfortunately, Spam is here to stay.
Isaacsname
not rated yet Nov 03, 2011
Why don't captchas use pictures or images with a moving static overlay, instead of stationary letters ?
Nerdyguy
3 / 5 (2) Nov 03, 2011
glenn_o:
Not long now until somebody makes a Firefox extension to solve Captchas automatically.


As a Firefox user, I would applaud this. I'm all for security, and use a lot of tools myself. But captchas are in that category of tools that are so annoying they likely inhibit the goals of the sites that use them - namely, usage of the site.
that_guy
5 / 5 (2) Nov 03, 2011
Considering that sometimes you need to do two or three captchas because they are so indecipherable, I wouldn't be surprised if this algorithm actually does just as well as humans in many cases.

Also, any captcha that can be deciphered by a human can be broken - There are plenty of dark sites on the internet that ask you to "do a captcha" to see or do something on it - when really the captcha is from another site that someone is trying to spambot. They just side-step the issue and have a person do it for them. You're not going to let a captcha get in between you and your Jessica Alba picture, are you?
SmaryJerry
5 / 5 (1) Nov 03, 2011
I get so frustrated at those things. They never say I match even when I enter the letters exactly. This happens a ton on Google, which may explain why it was more difficult to break, because it rejects even valid answers, until like the 5th try through.
that_guy
5 / 5 (1) Nov 03, 2011
Isaacsname:
Why don't captchas use pictures or images with a moving static overlay, instead of stationary letters ?

Because moving pixels are ever so easy for an algorithm to catch and remove or vice versa. Or any combination of that you would want to do.

Quite simply, any captcha that had any kind of pixel transformation that applied differently to the masking part and the characters would be the simplest type for a program to separate and break.

That's why captchas are generally static and black and white.
Deesky
5 / 5 (1) Nov 03, 2011
A busy photographic background image, such as a forest, coral reef, etc, would be harder to filter out. Then you could have a mathematical captcha which not only relies on character recognition, but also the need to perform a simple calculation.

That way, the hacking software would need to perform feature extraction, character recognition and a math solver.

Mind you, it would probably also tick off even more human users!
abhishekbt
not rated yet Nov 03, 2011
@that_guy: How about one background image with a list of characters and one overlaying it with moving pixels in such a way that only certain parts of the background are revealed. The moving about might be controlled by another algo known only to the site.

This way, no computer can tell the captcha unless they manage to capture all the parts. Just capturing the moving pixels alone wouldn't help.

Excuse my programmer like thinking. I am one!
antonima
not rated yet Nov 03, 2011
thank you stanford for making the world a better place..
blazingspark
not rated yet Nov 04, 2011
I'm looking forward to the day when computers can solve a captcha as effectively as a human.

That will mean that AI is just around the corner at that point!
warra_warra
not rated yet Nov 04, 2011
A better way to do this kind of thing might be to require comprehension of a piece of text. Display a random sentence and then ask a simple question about it (whose answer is one or two words long). Simple image recognition has become too easy to automate.
JadedIdealist
not rated yet Nov 04, 2011
@Isaacsname has a very good idea - that will work until computer vision has truly matured, ie show a picture and get the user to identify all the objects in the picture.

Eventually we need to have problems that take strong AI to solve - and once they get broken - well, Mission Fucking Accomplished http://xkcd.com/810/
Isaacsname
not rated yet Nov 06, 2011
JadedIdealist:
@Isaacsname has a very good idea - that will work until computer vision has truly matured, ie show a picture and get the user to identify all the objects in the picture.

Eventually we need to have problems that take strong AI to solve - and once they get broken - well, Mission Fucking Accomplished http://xkcd.com/810/


I thought since stochastic resonance actually allows better visual processing of detail in humans, ie " visual static, or visual snow ", why couldn't it be used as an overlay for captcha images ?

I think it would make it extremely difficult for current AI to get around.

http://www.youtub...duEEoCaA
that_guy
not rated yet Nov 07, 2011
abhishekbt:
@that_guy: How about one background image with a list of characters and one overlaying it with moving pixels in such a way that only certain parts of the background are revealed.


I really think that you're looking at it the wrong way. Consider if the background is moving and the image is not - an algorithm can easily pick up that certain pixels change, while others do not. That is the crux of the problem. If there is anything that happens where some pixels are changed but others are changed in a different way, or some are color, and others are not, then it is really easy to pick those out and isolate or remove them.

To an algorithm, you have to have the relevant pixels and the masking pixels seem exactly the same. The same color scheme, the same movement. If any movement or color applies differently to the captcha and the background, then it can be picked out.

But if it applies exactly the same to all areas, then it doesn't really help people.
