A new kind of pub crawl

August 24, 2012 By Angela Herring
Engin Kirda, an associate professor of information assurance at Northeastern, developed new software for detecting and containing malicious web crawlers. Photo: Dreamstime.

Websites like Facebook, LinkedIn and other social-media networks contain massive amounts of valuable public information. Automated web tools called web crawlers sift through these sites, pulling out information on millions of people in order to tailor search results and create targeted ads or other marketable content.

But what happens when "the bad guys" employ web crawlers? For Engin Kirda, Sy and Laurie Sternberg Interdisciplinary Associate Professor for Information Assurance in the College of Computer and Information Science and the Department of Electrical and Computer Engineering, they then become tools for spamming, phishing or targeted attacks.

"You want to protect the information," Kirda said. "You want people to be able to use it, but you don't want people to be able to automatically download content and abuse it."

Kirda and his colleagues at the University of California–Santa Barbara have developed new software called PubCrawl to solve this problem. PubCrawl both detects and contains malicious web crawlers without limiting normal browsing capacities. The team joined forces with one of the major social-networking sites to test PubCrawl, which is now being used in the field to protect users' information.

Kirda and his collaborators presented a paper on their novel approach at the 21st USENIX Security Symposium in early August. The article will be published in the proceedings of the conference this fall.

In the cybersecurity arms race, Kirda explained, malicious web crawlers have become increasingly sophisticated in response to stronger protection strategies. In particular, they have become more coordinated: Instead of utilizing a single computer or IP address to crawl the web for valuable information, efforts are distributed across thousands of machines.

"That becomes a tougher problem to solve because it looks similar to benign user traffic," Kirda said. "It's not as straightforward."

Traditional protection mechanisms such as CAPTCHAs, which operate on an individual basis, are still useful, but their deployment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. As an alternative, nonintrusive approach, PubCrawl was specifically designed with distributed crawling in mind. By identifying IP addresses with similar behavior patterns, such as connecting at similar intervals and frequencies, PubCrawl detects what it suspects to be distributed web-crawling activity.
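The idea of grouping IP addresses by similar connection patterns can be illustrated with a small Python sketch. This is not PubCrawl's actual algorithm, which the article does not describe in detail. It assumes a log of (IP address, timestamp) pairs, counts each IP's requests per fixed time window, and groups IPs whose count vectors have a high cosine similarity. The function names, the window size and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def count_vectors(requests, window, num_windows):
    """Turn (ip, timestamp_in_seconds) pairs into per-IP request counts per window."""
    vectors = defaultdict(lambda: [0] * num_windows)
    for ip, ts in requests:
        idx = int(ts // window)
        if 0 <= idx < num_windows:  # ignore requests outside the observed period
            vectors[ip][idx] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar_ips(vectors, threshold=0.9):
    """Group IPs whose traffic shapes are similar; return only groups with 2+ IPs.

    Assumption: similarity is transitive enough that a simple union-find
    over all similar pairs gives useful groups. This is a simplification.
    """
    parent = {ip: ip for ip in vectors}

    def find(ip):
        while parent[ip] != ip:
            parent[ip] = parent[parent[ip]]  # path compression
            ip = parent[ip]
        return ip

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for ip in vectors:
        groups[find(ip)].add(ip)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical log: two IPs request a page every 10 seconds, a third sends one burst.
log = [("192.0.2.1", t) for t in (0, 10, 20, 30)]
log += [("192.0.2.2", t) for t in (1, 11, 21, 31)]
log += [("192.0.2.3", t) for t in (50, 51, 52, 53)]
suspect_groups = group_similar_ips(count_vectors(log, window=10, num_windows=6))
```

In this toy log, the first two addresses connect at the same steady rhythm and end up in one group, while the third has a different traffic shape and is left alone. Comparing every pair of IPs is quadratic in the number of addresses, so a sketch like this would not scale to a real social network's traffic without further work.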

Once a crawler is detected, the question is whether it is malicious or benign. "You don't want to block it completely until you know for sure it is malicious," Kirda explained. "Instead, PubCrawl essentially keeps an eye on it."

Potentially malicious connections can be rate-limited, and a human operator can take a closer look. If the operator decides that the activity is malicious, the IPs can also be blocked.
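The containment step described above, slowing suspect connections first and blocking them only after review, might look something like the following Python sketch. The article does not say how the rate limiting is implemented; this example assumes a token bucket per flagged IP, and the class names, rates and capacities are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to the time passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SuspectLimiter:
    """Unflagged IPs pass freely, flagged IPs are rate-limited, blocked IPs are refused."""

    def __init__(self, rate=1.0, capacity=5):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}     # flagged IP -> TokenBucket
        self.blocked = set()  # IPs an operator has judged malicious

    def flag(self, ip, now=None):
        """Start rate-limiting an IP that a detector considers suspicious."""
        if ip not in self.buckets:
            self.buckets[ip] = TokenBucket(self.rate, self.capacity, now)

    def block(self, ip):
        """Refuse all further requests from an IP after human review."""
        self.blocked.add(ip)

    def allow(self, ip, now=None):
        if ip in self.blocked:
            return False
        bucket = self.buckets.get(ip)
        return True if bucket is None else bucket.allow(now)
```

A caller would invoke `flag` when an address lands in a suspicious group and `block` only once an operator confirms the activity is malicious. The optional `now` argument exists so the behavior can be checked with fixed timestamps instead of the real clock.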

To evaluate the approach, Kirda and his colleagues used it to scan logs from a large-scale social network, which then provided feedback on its success. The social network next deployed it in real time for a more robust evaluation, and it is currently using the tool as part of its production system. Going forward, the team expects to identify areas where the software could be evaded and make it even stronger.
