Researchers automate privacy compliance for big data systems

May 21, 2014

Web services companies, such as Facebook, Google and Microsoft, all make promises about how they will use personal information they gather. But ensuring that millions of lines of code in their systems operate in ways consistent with privacy promises is labor-intensive and difficult. A team from Carnegie Mellon University and Microsoft Research, however, has shown these compliance checks can be automated.

The researchers developed a prototype automated system that is now running on the data analytics pipeline of Bing, Microsoft's search engine. According to Saikat Guha, a researcher at Microsoft, it is the first time automated compliance analysis has been applied to the production code of an Internet-scale system, and it reflects Microsoft's commitment to creating the technology necessary to further safeguard the privacy of customers.

Employing a new, lawyer-friendly language to specify privacy policies and using a data inventory to annotate existing programs, the researchers showed that a team of just five people could manage a daily compliance check on millions of lines of code written by several thousand developers.

They presented their research findings at the 35th IEEE Symposium on Security & Privacy, May 18-21, in San Jose, Calif.

"Companies in the United States have a legal obligation to declare how they use the personal information they gather and it's also good business to establish a bond of trust with customers," said Anupam Datta, associate professor of computer science and electrical and computer engineering. "But these systems are constantly evolving and their scale can be daunting. The manual methods typically used for checking compliance are labor intensive, yet too often fail to catch all violations of policy."

"Tens of millions of lines of code are already in the pipeline," noted Shayak Sen, a Ph.D. student in computer science who interned at Microsoft Research India and is the lead student author on the study. "And during our implementation on Bing, we found that more than 20 percent of the code was changing on a daily basis." At these large scales, automated methods offer the best hope of verifying compliance.

"One reason that gaps exist between policies set by a company's privacy team and the code written by software developers is that the two groups don't speak the same language," Datta said. Lawyers and privacy champions typically have little experience in programming, and developers attempting to translate policies into code can get tripped up by ambiguities in the language of the privacy policies.

So the researchers developed a language – Legalease – that could be easily learned and used by privacy advocates. It employs allow-deny rules with exceptions, a structure that is found in many privacy policies and laws, such as the Health Insurance Portability and Accountability Act (HIPAA), and is expressive enough to capture the real policies of an industrial-scale system such as Bing.
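The allow-deny structure with exceptions can be illustrated with a short sketch. This is not Legalease syntax or the researchers' implementation; the `Clause` class, the `decide` function, and the labels such as "IPAddress" and "Truncated" are hypothetical names chosen for illustration. It assumes a clause applies when all of its attributes appear in a described data use, and that an applicable exception overrides the clause that contains it.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Clause:
    """One allow/deny rule; `attrs` and `exceptions` are hypothetical field names."""
    allow: bool
    attrs: FrozenSet[str]
    exceptions: List["Clause"] = field(default_factory=list)


def decide(clause: Clause, usage: FrozenSet[str]) -> Optional[bool]:
    """Return True (allowed) or False (denied) if the clause applies to
    `usage`, or None if the clause does not apply at all."""
    if not clause.attrs <= usage:
        return None
    # An applicable exception overrides the enclosing clause.
    for exc in clause.exceptions:
        verdict = decide(exc, usage)
        if verdict is not None:
            return verdict
    return clause.allow


# Hypothetical policy: deny using IP addresses for advertising,
# except allow it when the address has been truncated.
policy = Clause(
    allow=False,
    attrs=frozenset({"IPAddress", "Advertising"}),
    exceptions=[Clause(allow=True, attrs=frozenset({"Truncated"}))],
)

full_ip = decide(policy, frozenset({"IPAddress", "Advertising"}))
truncated_ip = decide(policy, frozenset({"IPAddress", "Advertising", "Truncated"}))
unrelated = decide(policy, frozenset({"SearchQuery", "Advertising"}))
```

Nesting exceptions inside clauses mirrors how a policy sentence such as "X is not permitted, unless Y" reads, which may be part of why the structure suits non-programmers.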

In preliminary usability testing, a dozen Microsoft employees were given a one-page document explaining Legalease and spent an average of under 5 minutes studying it. They then took an average of less than 15 minutes to encode nine Bing policy clauses regarding how user information can be used. "They were able to perform this task with a high degree of accuracy, which is encouraging," Sen said.

But encoding policies correctly means little if the encoding cannot be applied to large codebases written by large teams of programmers. To solve this problem, the researchers leveraged Grok, a data inventory that annotates existing programs written in languages typically employed by MapReduce-like systems, such as those Bing and Google use for backend analytics over user data.

Grok performs this automated annotation by combining information from different sources with varying levels of confidence. For instance, automated pattern-matching to column names can be performed across an entire database, but with low confidence, while annotations by developers have high confidence, but low coverage.
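A minimal sketch of that idea follows, assuming each source reports a label and a numeric confidence per column and that the highest-confidence label wins. Grok's actual mechanism is not described here; the regular expressions, the confidence values, the column names, and the functions `pattern_annotations` and `merge_annotations` are all hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

Annotation = Tuple[str, float]  # (label, confidence between 0 and 1)

# Hypothetical column-name patterns: broad coverage, low confidence.
PATTERNS = {
    r"(^|_)ip($|_)": "IPAddress",
    r"email": "Email",
}


def pattern_annotations(columns: Iterable[str],
                        confidence: float = 0.3) -> Dict[str, Annotation]:
    """Label every column whose name matches a known pattern."""
    found = {}
    for col in columns:
        for pattern, label in PATTERNS.items():
            if re.search(pattern, col, re.IGNORECASE):
                found[col] = (label, confidence)
                break
    return found


def merge_annotations(*sources: Dict[str, Annotation]) -> Dict[str, Annotation]:
    """Keep, for each column, the label from the most confident source."""
    merged: Dict[str, Annotation] = {}
    for source in sources:
        for col, (label, conf) in source.items():
            if col not in merged or conf > merged[col][1]:
                merged[col] = (label, conf)
    return merged


columns = ["client_ip", "user_email", "contact_email", "ts"]
# Hypothetical developer annotation: covers one column, with high confidence.
developer = {"contact_email": ("SupportAlias", 0.9)}

inventory = merge_annotations(pattern_annotations(columns), developer)
```

In this example the developer's label replaces the pattern-based guess for `contact_email`, while the pattern matcher still covers the columns no developer annotated.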

Grok had been developed by Microsoft Research and deployed by Bing the previous year for the express purpose of automating privacy compliance checking, but writing policies for Grok was cumbersome.

"Legalease was the final piece of the automated privacy compliance jigsaw puzzle," Guha said. "Developed over Sen's internship and subsequent collaboration with CMU, Legalease bridged privacy teams with Grok, and through Grok, with the developers."

Datta said automating the process of compliance checks could push the industry to adopt stronger privacy protection policies.

"Sometimes, companies want to make their policies stronger, but hesitate because they are not sure they can ensure compliance in these large systems," he explained, noting that online privacy policy compliance is enforced in the United States by the Federal Trade Commission.

