Remember that time, a decade or so ago, when spam was the scourge of the Internet, when the sheer volume of junk email threatened to engulf legitimate correspondence and short-circuit the promise of the digital revolution?
Those concerns are a bit distant these days. Spam is no longer Public Enemy No. 1 among the digerati. People have devised ways to combat it effectively. Our attentions have shifted to other, more recent problems, such as click-spam.
Click-spam is defined as fraudulent or invalid clicks on online ads where the user has no actual interest in the advertiser's site, but don't be misled into thinking this is merely some sort of Internet mischief. Click-spam has the potential to see millions of dollars of ad revenue misappropriated by nefarious means.
The scope of the challenge is made clear in "Measuring and Fingerprinting Click-Spam in Ad Networks," written by Vacha Dave, Yin Zhang, and Saikat Guha. The first two are from The University of Texas at Austin, and Dave is spending the summer as an intern at Microsoft Research India, working alongside Guha, a researcher at that lab.
The paper is being presented in Helsinki on Aug. 15 during SIGCOMM 2012, the annual conference of the Association for Computing Machinery's Special Interest Group on Data Communication. It is part of a strong presence from Microsoft Research: six papers, representing 18 percent of the 34 accepted; a poster; a keynote address for the Workshop on Mobile Gaming by Victor Bahl, director of the Mobile Computing Research Center at Microsoft Research Redmond; and further Microsoft Research participation just about anywhere you look.
The nature of research means that, sometimes, paths of inquiry can lead in unfamiliar directions. That, Guha confirms, is what led him and his co-authors to focus on click-spam.
"For another project," he recalls, "where we signed up as an advertiser, we noticed a large number of clicks, but very little return on investment. That led us to ask the question, 'How would one, in a rigorous, systematic way, go about estimating the quality of traffic on an ad network?'
"To our surprise, we found this was extremely challenging, and, aside from a scattered few anecdotes, there was no well-known, systematic technique, so we decided to come up with one."
That technique, described in the SIGCOMM 2012 paper, not only breaks new ground in examining the click-spam problem, but also provides the first independent methodology for advertisers to measure click-spam on their ads and delivers an automated method for ad networks to detect simultaneous click-spam attacks proactively.
The latter is in response to one of the surprises the researchers encountered as they pursued the project: the poor quality of mobile-advertising-network traffic, as measured by how much time people spend on an advertiser's landing page.
"For a reputed ad network, only one out of 20 people clicking our ad stayed for longer than five seconds," Guha reports. "We suspect this is because people mis-clicked the ad due to the small mobile-screen sizes and quickly hit the back button.
"Given that mobile is the future of the Internet, this underscores the importance of our work of characterizing the problem today, so ad networks can make a concerted effort to fix it in years to come."
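To make the dwell-time measure concrete, here is a minimal sketch of how an advertiser might compute the share of clicks that stay on a landing page longer than five seconds. It is an illustration only, not the method from the paper: the function name `engaged_fraction`, the five-second default threshold, and the sample data are all assumptions made for this example.

```python
from typing import Iterable


def engaged_fraction(dwell_seconds: Iterable[float], threshold: float = 5.0) -> float:
    """Return the fraction of clicks whose dwell time exceeds `threshold` seconds.

    Hypothetical helper for illustration; the paper's methodology is more involved.
    """
    times = list(dwell_seconds)
    if not times:
        # No clicks recorded: report zero rather than dividing by zero.
        return 0.0
    engaged = sum(1 for t in times if t > threshold)
    return engaged / len(times)


# Made-up data: 20 clicks, only one of which stays longer than five seconds.
sample = [1.2] * 19 + [42.0]
print(engaged_fraction(sample))  # prints 0.05
```

A ratio this low, as in the made-up sample, would correspond to the "one out of 20" figure Guha describes.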
The researchers also conducted a large-scale measurement study of click-spam across 10 major ad networks and four types of ads, further setting the stage for new directions to pursue in attempting to stem the click-spam tide.
"Enabling researchers at universities and private institutions to do cutting-edge click-fraud research is, in my opinion, our most significant contribution," Guha says. "Until today, researchers assumed only companies the size of Microsoft that can run large ad networks could make an impact on this space.
"The techniques we designed allow anyone to discover and characterize problems in this space and present solutions. We believe this will spur intense research, much as in the early 2000s in the context of email spam, that will ultimately result in the broader community figuring out how to deal with click-fraud."