A new kind of pub crawl

Aug 24, 2012 By Angela Herring
Engin Kirda, an associate professor of information assurance at Northeastern, developed new software for detecting and containing malicious web crawlers. Photo: Dreamstime.

Websites like Facebook, LinkedIn and other social-media networks contain massive amounts of valuable public information. Automated web tools called web crawlers sift through these sites, pulling out information on millions of people in order to tailor search results and create targeted ads or other marketable content.

But what happens when "the bad guys" employ web crawlers? For Engin Kirda, Sy and Laurie Sternberg Interdisciplinary Associate Professor for Information Assurance in the College of Computer and Information Science and the Department of Electrical and Computer Engineering, they then become tools for spamming, phishing or targeted attacks.

"You want to protect the information," Kirda said. "You want people to be able to use it, but you don't want people to be able to automatically download content and abuse it."

Kirda and his colleagues at the University of California–Santa Barbara have developed new software called PubCrawl to solve this problem. PubCrawl both detects and contains malicious web crawlers without limiting normal browsing capacities. The team joined forces with one of the major social-networking sites to test PubCrawl, which is now being used in the field to protect its users.

Kirda and his collaborators presented a paper on their novel approach at the 21st USENIX Security Symposium in early August. The article will be published in the proceedings of the conference this fall.

In the cybersecurity arms race, Kirda explained, malicious web crawlers have become increasingly sophisticated in response to stronger protection strategies. In particular, they have become more coordinated: Instead of using a single computer or IP address to crawl the web for valuable information, they distribute their efforts across thousands of machines.

"That becomes a tougher problem to solve because it looks similar to benign user traffic," Kirda said. "It's not as straightforward."

Traditional protection mechanisms such as CAPTCHAs, which operate on an individual basis, are still useful, but their deployment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. As an alternative, nonintrusive approach, PubCrawl was specifically designed with distributed crawling in mind. By identifying IP addresses with similar behavior patterns, such as connecting at similar intervals and frequencies, PubCrawl detects what it suspects to be distributed web-crawling activity.
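To make the idea of "similar behavior patterns" concrete, here is a minimal Python sketch. It is not PubCrawl's actual algorithm, which the article does not describe in detail. It assumes a hypothetical log of (IP address, timestamp) pairs, counts each address's requests per time bin, and groups addresses whose count series are strongly correlated. The function names, the bin size and the 0.9 threshold are all illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt


def request_counts(log, bin_seconds, num_bins):
    """Turn (ip, timestamp) pairs into a per-IP list of request counts per time bin."""
    series = defaultdict(lambda: [0] * num_bins)
    for ip, ts in log:
        idx = int(ts // bin_seconds)
        if 0 <= idx < num_bins:
            series[ip][idx] += 1
    return dict(series)


def correlation(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def group_similar_ips(series, threshold=0.9):
    """Greedy grouping: an IP joins the first group whose first member it correlates with.

    Only groups with more than one member are returned, since a lone IP
    is not evidence of distributed activity.
    """
    groups = []
    for ip in sorted(series):
        for group in groups:
            if correlation(series[ip], series[group[0]]) >= threshold:
                group.append(ip)
                break
        else:
            groups.append([ip])
    return [g for g in groups if len(g) > 1]


# Hypothetical log: two addresses connect every 20 seconds, one browses irregularly.
log = [
    ("10.0.0.1", 1), ("10.0.0.1", 21), ("10.0.0.1", 41),
    ("10.0.0.2", 2), ("10.0.0.2", 22), ("10.0.0.2", 42),
    ("192.0.2.7", 3), ("192.0.2.7", 4), ("192.0.2.7", 5), ("192.0.2.7", 55),
]
series = request_counts(log, bin_seconds=10, num_bins=6)
suspects = group_similar_ips(series)
print(suspects)  # [['10.0.0.1', '10.0.0.2']]
```

The two addresses with matching rhythms end up in one group, while the irregular visitor is left alone. A real system would have to cope with far noisier traffic than this toy log.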

Once a crawler is detected, the question is whether it is malicious or benign. "You don't want to block it completely until you know for sure it is malicious," Kirda explained. "Instead, PubCrawl essentially keeps an eye on it."

Potentially malicious connections can be rate-limited while a human operator takes a closer look. If the operator decides that the activity is malicious, the IP addresses involved can also be blocked.
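The containment step can be sketched in the same spirit. The Python example below assumes a token-bucket limiter, a common rate-limiting technique that the article does not attribute to PubCrawl. The class name, its methods and its parameters are hypothetical. Flagged addresses are throttled, unflagged traffic passes freely, and addresses an operator has judged malicious are refused outright.

```python
class SuspectLimiter:
    """Token-bucket limiter applied only to IP addresses flagged as suspicious."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second   # tokens refilled per second
        self.burst = burst            # maximum tokens a bucket can hold
        self.buckets = {}             # ip -> (tokens, time of last request)
        self.blocked = set()          # IPs an operator has confirmed as malicious

    def flag(self, ip, now):
        """Start rate-limiting an IP; a repeated flag keeps the existing bucket."""
        self.buckets.setdefault(ip, (float(self.burst), now))

    def block(self, ip):
        """Operator decision: refuse all further requests from this IP."""
        self.blocked.add(ip)

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) may proceed."""
        if ip in self.blocked:
            return False
        if ip not in self.buckets:
            return True  # unflagged traffic is never throttled
        tokens, last = self.buckets[ip]
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False


limiter = SuspectLimiter(rate_per_second=1, burst=2)
limiter.flag("203.0.113.5", now=0.0)
results = [limiter.allow("203.0.113.5", now=t) for t in (0.0, 0.0, 0.0, 1.0)]
print(results)  # [True, True, False, True]
```

Timestamps are passed in explicitly so the behavior is easy to follow: the flagged address spends its burst of two requests, is refused a third, and is allowed again once a second has passed.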

To evaluate the approach, Kirda and his colleagues used it to scan logs from a large-scale social network, which then provided feedback on its success. The social network then deployed it in real time for a more robust evaluation, and now uses the tool as part of its production system. Going forward, the team expects to identify areas where the software could be evaded and make it even stronger.
