A new kind of pub crawl

Aug 24, 2012 By Angela Herring
Engin Kirda, an associate professor of information assurance at Northeastern, developed new software for detecting and containing malicious web crawlers. Photo: Dreamstime.

Websites like Facebook, LinkedIn and other social-media networks contain massive amounts of valuable public information. Automated web tools called web crawlers sift through these sites, pulling out information on millions of people in order to tailor search results and create targeted ads or other marketable content.

But what happens when "the bad guys" employ web crawlers? For Engin Kirda, Sy and Laurie Sternberg Interdisciplinary Associate Professor for Information Assurance in the College of Computer and Information Science and the Department of Electrical and Computer Engineering, they then become tools for spamming, phishing or targeted attacks.

"You want to protect the information," Kirda said. "You want people to be able to use it, but you don't want people to be able to automatically download content and abuse it."

Kirda and his colleagues at the University of California–Santa Barbara have developed new software called PubCrawl to solve this problem. PubCrawl both detects and contains malicious web crawlers without limiting normal browsing capacities. The team joined forces with one of the major social-networking sites to test PubCrawl, which is now being used in the field to protect its users.

Kirda and his collaborators presented a paper on their novel approach at the 21st USENIX Security Symposium in early August. The article will be published in the proceedings of the conference this fall.

In the cybersecurity arms race, Kirda explained, malicious web crawlers have become increasingly sophisticated in response to stronger protection strategies. In particular, they have become more coordinated: Instead of utilizing a single computer or IP address to crawl the web for valuable information, efforts are distributed across thousands of machines.

"That becomes a tougher problem to solve because it looks similar to benign user traffic," Kirda said. "It's not as straightforward."

Traditional protection mechanisms such as CAPTCHAs, which operate on an individual basis, are still useful, but their deployment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. As an alternative, nonintrusive approach, PubCrawl was specifically designed with distributed crawling in mind. By identifying IP addresses with similar behavior patterns, such as connecting at similar intervals and frequencies, PubCrawl detects what it suspects to be distributed web-crawling activity.
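To make the idea of "similar behavior patterns" concrete, here is a minimal Python sketch. It is not PubCrawl's actual algorithm, which the article does not describe in detail. It assumes a hypothetical log of (IP, timestamp) pairs, counts each IP's requests per fixed time window, and groups IPs whose counts rise and fall together, using Pearson correlation and a greedy grouping rule chosen purely for illustration. The function names, the 10-second window and the 0.9 threshold are all assumptions.

```python
from collections import defaultdict
from math import sqrt


def request_series(log, bin_seconds, num_bins):
    """Turn (ip, timestamp) pairs into per-IP request counts per time bin."""
    series = defaultdict(lambda: [0] * num_bins)
    for ip, ts in log:
        idx = int(ts // bin_seconds)
        if 0 <= idx < num_bins:
            series[ip][idx] += 1
    return dict(series)


def pearson(a, b):
    """Pearson correlation of two equal-length count series (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def group_similar_ips(series, threshold=0.9):
    """Greedy grouping: an IP joins the first group whose first member it correlates with."""
    groups = []
    for ip in sorted(series):
        for group in groups:
            if pearson(series[ip], series[group[0]]) >= threshold:
                group.append(ip)
                break
        else:
            groups.append([ip])
    # Only groups of two or more IPs hint at coordinated activity.
    return [g for g in groups if len(g) > 1]


# Hypothetical traffic: three IPs requesting every 20 seconds, one irregular visitor.
log = []
for offset, ip in enumerate(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
    log += [(ip, t + offset) for t in range(0, 100, 20)]
log += [("192.168.1.5", t) for t in (3, 4, 5, 47, 95)]

suspects = group_similar_ips(request_series(log, bin_seconds=10, num_bins=10))
print(suspects)
```

In this toy data the three regularly timed IPs end up in one group, while the irregular visitor is left alone. A production system would need far more robust features and clustering than this sketch.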

Once a crawler is detected, the question is whether it is malicious or benign. "You don't want to block it completely until you know for sure it is malicious," Kirda explained. "Instead, PubCrawl essentially keeps an eye on it."

Potentially malicious connections can be rate-limited and a human operator can take a closer look. If the operators decide that the activity is malicious, IPs can also be blocked.
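The containment step described above can be sketched in Python as follows. The article does not say how PubCrawl implements rate limiting, so this example assumes a token bucket, a common general-purpose technique. The class name, its methods and the rate and burst parameters are hypothetical. Unflagged IPs pass freely, flagged IPs are throttled, and IPs an operator has judged malicious are refused outright.

```python
class Containment:
    """Illustrative throttle-then-block policy (an assumption, not PubCrawl's code)."""

    def __init__(self, rate, burst):
        self.rate = rate        # tokens refilled per second for a flagged IP
        self.burst = burst      # maximum tokens a flagged IP can accumulate
        self.buckets = {}       # ip -> (tokens, time of last request)
        self.flagged = set()    # suspected crawlers: rate-limited
        self.blocked = set()    # confirmed malicious by an operator: refused

    def flag(self, ip):
        self.flagged.add(ip)

    def block(self, ip):
        self.blocked.add(ip)

    def allow(self, ip, now):
        """Return True if a request from ip at time now (in seconds) may proceed."""
        if ip in self.blocked:
            return False
        if ip not in self.flagged:
            return True
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False


policy = Containment(rate=1.0, burst=2)
policy.flag("10.0.0.1")
decisions = [policy.allow("10.0.0.1", now=t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)  # [True, True, False, True]
```

Passing the time in explicitly keeps the sketch deterministic. A real deployment would read a clock and would likely track state per group of coordinated IPs rather than per address.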

In order to evaluate the approach, Kirda and his colleagues used it to scan logs from a large-scale social network, which then provided feedback on its success. Then, the social network deployed it in real time, for a more robust evaluation. Currently, the social network is using the tool as a part of its production system. Going forward, the team expects to identify areas where the software could be evaded and make it even stronger.
