Analysis of Outsourcing Data to the Cloud Using Autonomous Key Generation
Scholars' Mine
Masters Theses, Student Theses and Dissertations
Fall 2017

Mortada Abdulwahed Aman

Part of the Computer Engineering Commons

Recommended Citation
Aman, Mortada Abdulwahed, "Analysis of outsourcing data to the cloud using autonomous key generation" (2017). Masters Theses. 7713.
https://scholarsmine.mst.edu/masters_theses/7713

This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the permission of the copyright holder.

ANALYSIS OF OUTSOURCING DATA TO THE CLOUD USING AUTONOMOUS KEY GENERATION

by

MORTADA ABDULWAHED AMAN

A THESIS

Presented to the Graduate Faculty of the
MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE in COMPUTER ENGINEERING

2017

Approved by
Dr. Egemen K. Çetinkaya, Advisor
Dr. Maciej J. Zawodniok
Dr. Sanjay K. Madria

Copyright 2017
MORTADA ABDULWAHED AMAN
All Rights Reserved

ABSTRACT

Cloud computing, a technology that enables users to store and manage their data at low cost and with high availability, has emerged over the past few decades because of the many services it provides. One of these services is data storage. Most users of this service are still hesitant to outsource their data because of the integrity and confidentiality issues, as well as the performance and cost issues, that come with it. These issues make it necessary to encrypt data before outsourcing it to the cloud. However, encrypting data before outsourcing makes searching the data impractical, lowering the functionality of the cloud.
Most existing cloud storage schemes prioritize security over performance and functionality, or vice versa. In this thesis, the cloud storage service is explored, and the aspects of security, performance, and functionality are analyzed in order to investigate the trade-offs of the service. DSB-SEIS is developed in order to find a balance between performance, security, and functionality. The scheme provides encryption intensity selection, an autonomous key generation algorithm that allows users to control the encryption intensity of their files, along with three other features: deduplication, assured deletion, and searchable encryption. The effect of encryption intensity selection on encryption, decryption, and key generation is explored, and the performance and security of DSB-SEIS are evaluated. The MapReduce framework is also used to investigate the performance of the DSB-SEIS algorithm with big data. Analysis demonstrates that the encryption intensity selection algorithm generates a manageable number of encryption keys based on the confidentiality of data while not adding significant overhead on encryption or decryption.

ACKNOWLEDGMENTS

I would like to thank Dr. Egemen K. Çetinkaya, Dr. Sanjay K. Madria, and Dr. Maciej J. Zawodniok for their feedback and continuous help with this work. Furthermore, I thank the CoNetS research group for listening to the ideas presented in this document and giving feedback on them.

This thesis work was supported by the Department of Electrical and Computer Engineering at Missouri University of Science and Technology, which provided funding through a graduate teaching and research assistantship.

I would like to thank Missouri University of Science and Technology's IT database team and Perry Koob for their extensive help in providing the required equipment and support during the experimentation phase of this thesis. I would also like to thank the doctoral student Katrina Ward for supporting the ideas presented in this work.
Finally, I would like to thank ACM, NSF/CANSec, and NSF/Raytheon BBN Technologies for providing travel grants to attend DCC 2016, CANSec 2016, and GENI-NICE 2016, respectively.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES

SECTION

1. INTRODUCTION AND MOTIVATION
   1.1. CONTRIBUTIONS
   1.2. PUBLICATIONS
   1.3. ORGANIZATION OF THESIS
2. BACKGROUND AND RELATED WORK
   2.1. PRELIMINARIES
        2.1.1. Cryptography
        2.1.2. Deduplication
        2.1.3. Searchable Encryption
   2.2. CLOUD STORAGE SERVICE
   2.3. DEDUPLICATION
   2.4. SEARCHING ENCRYPTED DATA
   2.5. MAPREDUCE
   2.6. CLOUDLAB TESTBED
3. ARCHITECTURE
   3.1. WORKFLOW
   3.2. ENCRYPTION INTENSITY SELECTION
        3.2.1. Low Intensity
        3.2.2. Medium Intensity
        3.2.3. High Intensity
   3.3. DEDUPLICATION
   3.4. ASSURED DELETION
   3.5. SEARCHING ENCRYPTED DATA
        3.5.1. Keyword Search in Encrypted Index
        3.5.2. Duplicate Checking of Encrypted Data
   3.6. MAPREDUCE APPLICATIONS
        3.6.1. Indexing
        3.6.2. Index Search
        3.6.3. Disk Search
4. RESULTS
   4.1. WIKIMEDIA DUMP
   4.2. DEDUPLICATION PERFORMANCE
   4.3. KEY GENERATION ANALYSIS
   4.4. ENCRYPTION PERFORMANCE
   4.5. DECRYPTION PERFORMANCE
   4.6. INDEXING PERFORMANCE
   4.7. SEARCH PERFORMANCE
        4.7.1. Index Search
        4.7.2. Disk Search
   4.8. MAPREDUCE APPLICATIONS
        4.8.1. Indexing Performance
        4.8.2. Index Search Performance
        4.8.3. Disk Search Performance
5. CONCLUSIONS
6. FUTURE WORK

APPENDIX
REFERENCES
VITA

LIST OF ILLUSTRATIONS

Figure
3.1 DSB-SEIS architecture
3.2 Client application flowchart
3.3 Cloud application flowchart
3.4 Encryption intensity selection example
3.5 MapReduce indexing application workflow
3.6 MapReduce index search application workflow
3.7 MapReduce disk search application workflow
4.1 WikiMedia word distribution
4.2 Deduplication overhead
4.3 Size of data
4.4 Key generation
4.5 Encryption performance
4.6 Decryption performance
4.7 Index performance
4.8 Index search performance
4.9 Search hits
4.10 Disk search performance
4.11 MapReduce indexing performance
4.12 MapReduce index search performance
4.13 Index search hits