Concurrency and Computation: Practice and Experience

Special Issue Paper

An improved anonymity model for big data security based on clustering algorithm

Chunyong Yin

School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, 210044 China

Sun Zhang

Jinwen Xi

Jin Wang (Corresponding Author)

College of Information Engineering, Yangzhou University, Yangzhou, China

Correspondence to: Jin Wang, College of Information Engineering, Yangzhou University, Yangzhou, China.

E-mail: [email protected]

First published: 23 June 2016

https://doi.org/10.1002/cpe.3902

Citations: 25

Summary

The accumulation of massive data has given rise to the concept of big data. The relationships hidden in big data can bring great benefits and have attracted public attention. Meanwhile, the challenges of big data security are more serious than ever. Privacy disclosure is one of the most pressing concerns, and protecting the privacy of big data is more difficult than traditional information protection. Anonymity-based protection for data publishing can preserve privacy when data are released. K-anonymity and L-diversity are two such anonymity models. Their main idea is to generalize the values of the quasi-identifier so that the data conform to the model. In this paper, we propose an improved model that integrates K-anonymity with L-diversity and can solve the problem of imbalanced sensitive attribute distribution. The K-member clustering algorithm transforms the anonymity problem into a clustering problem: it finds a set of equivalence classes in which the records are generalized to the same value. We use the K-member clustering algorithm to realize the improved anonymity model, which can reduce algorithm execution time and information loss. Integrating the anonymity model with the clustering algorithm makes the generalization process more efficient, which is particularly important for big data. Copyright © 2016 John Wiley & Sons, Ltd.
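To make the ideas in the summary concrete, the following is a minimal Python sketch, not the improved model proposed in the paper. It assumes numeric quasi-identifiers, measures information loss simply as the total width of the generalized ranges, and uses a simplified greedy variant of K-member clustering that always starts a cluster from the first remaining record. The function names, the field names (`age`, `zip`, `disease`), and the sample records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(records, qi, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[a] for a in qi) for r in records)
    return all(c >= k for c in counts.values())


def is_l_diverse(records, qi, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = {}
    for r in records:
        classes.setdefault(tuple(r[a] for a in qi), set()).add(r[sensitive])
    return all(len(values) >= l for values in classes.values())


def span(cluster, qi):
    """Assumed loss measure: total width of the ranges covering a cluster."""
    return sum(max(r[a] for r in cluster) - min(r[a] for r in cluster) for a in qi)


def k_member_clusters(records, qi, k):
    """Greedy grouping: grow each cluster with the record that widens it least."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        cluster = [remaining.pop(0)]  # simplification: no random or farthest seed
        while len(cluster) < k:
            best = min(remaining, key=lambda r: span(cluster + [r], qi))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    for r in remaining:  # fewer than k left: join the cluster they widen least
        min(clusters, key=lambda c: span(c + [r], qi) - span(c, qi)).append(r)
    return clusters


def generalize(clusters, qi):
    """Replace each quasi-identifier value with its cluster's (min, max) range."""
    out = []
    for c in clusters:
        ranges = {a: (min(r[a] for r in c), max(r[a] for r in c)) for a in qi}
        out.extend({**r, **ranges} for r in c)
    return out


# Hypothetical sample table: two quasi-identifiers and one sensitive attribute.
records = [
    {"age": 25, "zip": 21000, "disease": "flu"},
    {"age": 27, "zip": 21004, "disease": "cancer"},
    {"age": 26, "zip": 21002, "disease": "flu"},
    {"age": 52, "zip": 22010, "disease": "asthma"},
    {"age": 55, "zip": 22015, "disease": "flu"},
    {"age": 53, "zip": 22011, "disease": "asthma"},
]
qi = ["age", "zip"]
anonymized = generalize(k_member_clusters(records, qi, k=3), qi)
print(is_k_anonymous(anonymized, qi, k=3))           # True
print(is_l_diverse(anonymized, qi, "disease", l=2))  # True
```

In this sketch the clustering step only guarantees K-anonymity; the L-diversity check is applied afterwards as a separate test. A model that integrates both constraints, as the summary describes, would have to take the sensitive attribute into account while the clusters are being built.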

Volume 29, Issue 7

Combined Special Issues on Security and privacy in social networks (NSS2015) and 18th IEEE International Conference on Computational Science and Engineering (CSE2015)

10 April 2017

e3902
