Volume 29, Issue 7 e3902
Special Issue Paper

An improved anonymity model for big data security based on clustering algorithm

Chunyong Yin

School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, 210044 China

Sun Zhang

School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, 210044 China

Jinwen Xi

School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, 210044 China

Jin Wang

Corresponding Author

College of Information Engineering, Yangzhou University, Yangzhou, China

Correspondence to: Jin Wang, College of Information Engineering, Yangzhou University, Yangzhou, China.

First published: 23 June 2016
Citations: 25

Summary

The accumulation of massive data has given rise to the concept of big data. The relationships hidden in big data can bring great benefits and have attracted wide public attention. At the same time, the security challenges of big data are more serious than ever. Privacy disclosure is one of the problems of greatest concern, and protecting privacy in big data is more difficult than traditional information protection. Anonymous protection for data publishing provides privacy protection at the point of data release. K-anonymity and L-diversity are two anonymity models; their main idea is to generalize the values of the quasi-identifiers so that the data conform to the model. In this paper, we propose an improved model that integrates K-anonymity with L-diversity and addresses the problem of imbalanced sensitive attribute distribution. The K-member clustering algorithm translates the anonymity problem into a clustering problem: it finds a set of equivalence classes, and the records in each class are generalized to the same value. We use the K-member clustering algorithm to realize the improved anonymity model, which reduces algorithm execution time and information loss. Integrating the anonymity model with the clustering algorithm makes the generalization process more efficient, which is particularly important for big data. Copyright © 2016 John Wiley & Sons, Ltd.
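
To make the ideas in the summary concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes numeric quasi-identifiers, generalizes each cluster to a (min, max) range, and checks the simplest "distinct values" form of L-diversity. All function names, the greedy distance measure (the sum of range widths), and the handling of leftover records are illustrative assumptions; the summary does not specify how the improved model treats imbalanced sensitive attributes, so that part is not modeled here.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share identical (already generalized) quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[q] for q in quasi_identifiers)].append(rec)
    return list(classes.values())


def satisfies_k_and_l(records, quasi_identifiers, sensitive, k, l):
    """True if every equivalence class has >= k records and >= l distinct sensitive values."""
    for group in equivalence_classes(records, quasi_identifiers):
        if len(group) < k:
            return False
        if len({rec[sensitive] for rec in group}) < l:
            return False
    return True


def span(cluster, quasi_identifiers):
    """Assumed information-loss proxy: total width of the ranges a cluster would be generalized to."""
    return sum(
        max(r[q] for r in cluster) - min(r[q] for r in cluster)
        for q in quasi_identifiers
    )


def greedy_k_member(records, quasi_identifiers, k):
    """Greedy sketch: grow each cluster to k records, always adding the record that widens it least."""
    if len(records) < k:
        raise ValueError("need at least k records to form one cluster")
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        cluster = [remaining.pop(0)]
        while len(cluster) < k:
            best = min(remaining, key=lambda r: span(cluster + [r], quasi_identifiers))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster it widens least.
    for rec in remaining:
        target = min(
            clusters,
            key=lambda c: span(c + [rec], quasi_identifiers) - span(c, quasi_identifiers),
        )
        target.append(rec)
    return clusters


def generalize(cluster, quasi_identifiers):
    """Replace each quasi-identifier with the cluster-wide (min, max) range."""
    ranges = {
        q: (min(r[q] for r in cluster), max(r[q] for r in cluster))
        for q in quasi_identifiers
    }
    return [{**rec, **ranges} for rec in cluster]
```

A release would then be built by generalizing every cluster returned by `greedy_k_member` and passing the combined records to `satisfies_k_and_l`. Clustering alone guarantees only the size bound k; the diversity check can still fail, which is the gap an integrated K-anonymity and L-diversity model is meant to close.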
