New system would allow individuals to pick and choose what data to share with websites, mobile apps

Jul 09, 2014
Credit: Christine Daniloff/MIT

Cellphone metadata has been in the news quite a bit lately, but the National Security Agency isn't the only organization that collects information about people's online behavior. Newly downloaded cellphone apps routinely ask to access your location information, your address book, or other apps, and of course, websites like Amazon or Netflix track your browsing history in the interest of making personalized recommendations.

At the same time, a host of recent studies have demonstrated that it's shockingly easy to identify unnamed individuals in supposedly "anonymized" data sets, even ones containing millions of records. So, if we want the benefits of data mining—like personalized recommendations or localized services—how can we protect our privacy?

In the latest issue of PLOS ONE, MIT researchers offer one possible answer. Their prototype system, openPDS—short for personal data store—stores data from your digital devices in a single location that you specify: It could be an encrypted server in the cloud, but it could also be a computer in a locked box under your desk. Any cellphone app, online service, or big-data research team that wants to use your data has to query your data store, which returns only as much as is required.

Sharing code, not data

"The example I like to use is personalized music," says Yves-Alexandre de Montjoye, a graduate student in media arts and sciences and first author on the new paper. "Pandora, for example, comes down to this thing that they call the music genome, which contains a summary of your musical tastes. To recommend a song, all you need is the last 10 songs you listened to—just to make sure you don't keep recommending the same one again—and this music genome. You don't need the list of all the songs you've been listening to."

With openPDS, de Montjoye says, "You share code; you don't share data. Instead of you sending data to Pandora, for Pandora to define what your musical preferences are, it's Pandora sending a piece of code to you for you to define your musical preferences and send it back to them."
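A minimal Python sketch of the "share code, not data" idea follows. Nothing in it is openPDS's actual API: the `PersonalDataStore` class, its `answer` method, and the `music_summary` question are hypothetical names chosen for illustration. The service supplies a function, the store runs it against the raw listening history, and only the function's small result leaves the store.

```python
from collections import Counter


class PersonalDataStore:
    """Hypothetical store that keeps raw data private and returns only answers."""

    def __init__(self, listening_history):
        # Raw records stay inside the store; each is a (song, genre) pair.
        self._history = list(listening_history)

    def answer(self, question):
        # Run the service's code locally and hand back only its result.
        return question(self._history)


def music_summary(history, recent=10):
    """Code a music service might send: a taste summary plus recent songs."""
    genres = Counter(genre for _, genre in history)
    total = sum(genres.values())
    return {
        # Share of listening per genre, standing in for a "music genome".
        "taste": {g: n / total for g, n in genres.items()},
        # Only the most recent songs, so they are not recommended again.
        "recent": [song for song, _ in history[-recent:]],
    }


# Made-up listening history: 40 songs, every fourth one tagged "rock".
history = [(f"song{i}", "jazz" if i % 4 else "rock") for i in range(40)]
store = PersonalDataStore(history)
reply = store.answer(music_summary)
print(len(reply["recent"]), reply["taste"])
```

The service receives ten song titles and two genre shares, not the 40-record history. A real deployment would also have to vet or sandbox the code it agrees to run, which this sketch does not attempt.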

De Montjoye is joined on the paper by his thesis advisor, Alex "Sandy" Pentland, the Toshiba Professor of Media Arts and Sciences; Erez Shmueli, a postdoc in Pentland's group; and Samuel Wang, a software engineer at Foursquare who was in the Department of Electrical Engineering and Computer Science when the research was done.

After an initial deployment involving 21 people who used openPDS to regulate access to their medical records, the researchers are now testing the system with several telecommunications companies in Italy and Denmark. Although openPDS can, in principle, run on any machine of the user's choosing, in the trials, data is being stored in the cloud.

Meaningful permissions

One of the benefits of openPDS, de Montjoye says, is that it requires applications to specify what information they need and how it will be used. Today, he says, "when you install an application, it tells you 'this application has access to your fine-grained GPS location,' or it 'has access to your SD card.' You as a user have absolutely no way of knowing what that means. The permissions don't tell you anything."

In fact, applications frequently collect much more data than they really need. Service providers and application developers don't always know in advance what data will prove most useful, so they store as much as they can against the possibility that they may want it later. It could, for instance, turn out that for some music listeners, album cover art is a better predictor of what songs they'll like than anything captured by Pandora's music genome.

OpenPDS preserves all that potentially useful data, but in a repository controlled by the end user, not the application developer or service provider. A developer who discovers that a previously unused bit of information is useful must request access to it from the user. If the request seems unnecessarily invasive, the user can simply deny it.
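The request-and-deny flow described above can be sketched in Python as follows. This is an illustration under assumptions, not openPDS code: `UserControlledStore`, `grant`, `deny`, and `read` are hypothetical names, and a real system would ask the user interactively instead of relying on direct method calls.

```python
class AccessDenied(Exception):
    """Raised when an app asks for a field the user has not approved."""


class UserControlledStore:
    """Hypothetical repository where the user, not the app, grants access."""

    def __init__(self, data):
        self._data = dict(data)
        self._grants = {}  # app name -> set of approved field names

    def grant(self, app, field):
        # The user approves one field for one app.
        self._grants.setdefault(app, set()).add(field)

    def deny(self, app, field):
        # The user refuses or withdraws approval; a missing grant is not an error.
        self._grants.get(app, set()).discard(field)

    def read(self, app, field):
        if field not in self._grants.get(app, set()):
            raise AccessDenied(f"{app} has no permission for {field!r}")
        return self._data[field]


# Made-up data and app name, for illustration only.
store = UserControlledStore({"play_counts": {"songA": 12}, "cover_art_views": 57})
store.grant("music_app", "play_counts")
print(store.read("music_app", "play_counts"))

# The developer later finds cover art useful and must ask; here the user has not agreed.
try:
    store.read("music_app", "cover_art_views")
except AccessDenied as err:
    print("refused:", err)
```

The data about cover art is still kept, so the user can grant access later without anything having been lost in the meantime.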

Of course, a nefarious developer could try to game the system, constructing requests that elicit more information than the user intends to disclose. A navigation application might, for instance, be authorized to identify the subway stop or parking garage nearest the user. But it shouldn't need both pieces of information at once, and by requesting them, it could infer more detailed location information than the user wishes to reveal.

Creating safeguards against such information leaks will have to be done on a case-by-case, application-by-application basis, de Montjoye acknowledges, and at least initially, the full implications of some query combinations may not be obvious. But "even if it's not 100 percent safe, it's still a huge improvement over the current state," he says. "If we manage to get people to have access to most of their data, and if we can get the overall state of the art to move from anonymization to interactive systems, that would be such a huge win."
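One possible per-application safeguard for the subway-stop and parking-garage example is to refuse a second location answer once a conflicting one has been given. The Python sketch below is an assumption about how such a rule might look, not the researchers' method; `QueryGuard` and the question names are hypothetical.

```python
class QueryGuard:
    """Hypothetical rule: some answers must not go to the same app together."""

    def __init__(self, exclusive_groups):
        # Each group is a set of questions of which an app may get at most one.
        self._groups = [frozenset(g) for g in exclusive_groups]
        self._answered = {}  # app name -> set of questions already answered

    def allow(self, app, question):
        seen = self._answered.setdefault(app, set())
        for group in self._groups:
            # Refuse if a different question from the same group was already answered.
            if question in group and (seen & group) - {question}:
                return False
        seen.add(question)
        return True


guard = QueryGuard([{"nearest_subway_stop", "nearest_parking_garage"}])
print(guard.allow("nav_app", "nearest_subway_stop"))       # True
print(guard.allow("nav_app", "nearest_parking_garage"))    # False
print(guard.allow("other_app", "nearest_parking_garage"))  # True
```

In this sketch the record of past answers never expires. A practical rule would probably need a time window, and as the article notes, which combinations are risky has to be worked out for each application.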

More information: * openpds.media.mit.edu/

* de Montjoye Y-A, Shmueli E, Wang SS, Pentland AS (2014) openPDS: Protecting the Privacy of Metadata through SafeAnswers. PLoS ONE 9(7): e98790. DOI: 10.1371/journal.pone.0098790

User comments : 1

rp142
Jul 09, 2014
A great concept. The fatal flaw is not in the concept, it is in the organisations that will block it.

Collected data is often sold or has value to the company collecting it. Advertising companies want as much data as possible and they are happy to buy or steal this data. That is why there are so many applications that request permissions that have absolutely nothing to do with their stated purpose.

Getting a concept like this implemented on Android, or any other strong privacy protections, is going to be a challenge, without strong legislation. Google is an advertising company. They recently reduced a user's ability to protect their privacy by "simplifying" permissions.

Good luck.