Random additions efficiently anonymize large data sets

December 29, 2015
Random additions efficiently anonymize large data sets
Original (left) and reconstructed anonymized data (right) for age and occupation using the proposed algorithm.

Balancing transparency and freedom of information with the right to privacy lays high demands on data handling methods. So far methods for anonymizing shared data sets have assumed that there is a distinction between details that can be used to identify an individual (quasi-identifiers) and details that are deemed 'sensitive' and private, but this is not always the case. Now Yuichi Sei and Akihiko Ohsuga from the University of Electro- Communications, alongside Takao Takenouchi from NEC Corporation in Japan, have devised an algorithm that efficiently anonymizes data sets without assuming this distinction.

The researchers use hospital lists as an example. A data set may include the name (direct identifier), address and age (quasi-identifier) and sensitive information (a medical condition). Even without giving the name for each entry, someone using the data set could identify entries from the age and address. In addition, anonymization should be resistant to attempts to identify particulars by comparing two anonymized sets for the same data.

One approach to anonymizing data is to add noise to a data set, where the frequency of each possible value for each attribute is presented in a histogram. However as Sei, Ohsuga and Takenouchi point out this can greatly increase the quantity of the data. "Because almost all of the categories have only a few people in the histogram, the noise added to each category of the histogram has a heavy impact."

The UEC-NEC Corporation researchers instead randomised the data set for each attribute and added random values to each entry. "Through simulations of real , we prove that our proposed method can anonymize and reconstruct databases while keeping a high quality of data within a realistic period." The approach may be useful for anonymizing public records such as the census and electronic electoral votes.

Explore further: Big Data analyses depend on starting with clean data points

More information: (l1, ..., lq)-diversity for anonymizing sensitive quasi-identifiers 2015 IEEE Trustcom/BigDataSE/ISPA 596-603. DOI: 10.1109/Trustcom-BigDataSe-ISPA.2015.424

Related Stories

Big Data analyses depend on starting with clean data points

August 5, 2015

Popularly referred to as "Big Data," mammoth sets of information about almost every aspect of our lives have triggered great excitement about what we can glean from analyzing these diverse data sets. Benefits range from better ...

Social network analysis privacy tackled

February 14, 2015

Protecting people's privacy in an age of online big data is difficult, but doing so when using visual representations of such things as social network data may present unique challenges, according to a Penn State computer ...

Intelligent data analysis with guaranteed privacy

February 26, 2015

Siemens is developing tools that ensure smart data applications abide by data protection regulations. The reliable protection of data privacy is very important because it is a precondition for people or institutions to provide ...

Simple errors limit scientific scrutiny

November 11, 2015

Researchers have found more than half of the public datasets provided with scientific papers are incomplete, which prevents reproducibility tests and follow-up studies.

Yahoo plans to keep search records for 18 months

April 18, 2011

(AP) -- Yahoo plans to extend the amount of time it retains user search records to 18 months from 90 days. The company says it will consider keeping other types of information about its users for longer durations, too.

Recommended for you

New method analyzes corn kernel characteristics

November 17, 2017

An ear of corn averages about 800 kernels. A traditional field method to estimate the number of kernels on the ear is to manually count the number of rows and multiply by the number of kernels in one length of the ear. With ...

Optically tunable microwave antennas for 5G applications

November 16, 2017

Multiband tunable antennas are a critical part of many communication and radar systems. New research by engineers at the University of Bristol has shown significant advances in antennas by using optically induced plasmas ...


Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.