
High Performance Distributed Big File Cloud Storage


A Master’s Project Presented to Department of Computer and Information Sciences SUNY Polytechnic Institute Utica, New York

In Partial Fulfillment of the Requirements for the Master of Science Degree

by

Anusha Shakelli May 2016

© Anusha Shakelli 2016


HIGH PERFORMANCE DISTRIBUTED BIG FILE CLOUD STORAGE

Except where reference is made to the work of others, the work described here is my own or was done in collaboration with my advisor and/or members of the advisory committee. Further, the content of this work is truthful with regard to my own work and the portrayal of others' work. To the best of my knowledge, this work does not include proprietary or classified information.

______Anusha Shakelli


Abstract

Cloud storage services are growing at a fast rate and are an emerging force in the data storage field. These services are used for backing up data and sharing files through social networks such as Facebook [3] and Zing Me [2]. Users can upload data from a computer, mobile phone, or tablet, and then download or share it with others. Thus, the system load in cloud storage becomes huge. Nowadays, cloud storage has become a crucial requirement for many enterprises due to features like cost saving, performance, security, and flexibility.

To design an efficient storage engine for cloud-based systems, one must deal with requirements such as big file processing, lightweight metadata, deduplication, and high scalability. Here we propose a Big File Cloud (BFC) architecture to handle these problems. Basically, we propose to build a scalable distributed cloud data storage that supports files up to several terabytes in size.

In cloud storage, system load is usually heavy, and data deduplication is needed to reduce the storage space wasted by storing the same static data from different users. In order to solve these problems, a common method used in cloud storages is dividing a big file into small blocks, storing them on disks, and then managing them using a metadata system [1], [6], [19], [20]. Current cloud storage services have a complex metadata system; thereby, the space complexity of the metadata system is O(n), which is not scalable for big files. In this research, a new big file cloud storage architecture and a better solution to reduce the space complexity of metadata are suggested.


Contents

1 Introduction
   1.1 Background
   1.2 Motivation
   1.3 What are Issues
   1.4 BFC Architecture
      1.4.1 Metadata
      1.4.2 Data deduplication
      1.4.3 Data Security: Encryption and Decryption
      1.4.4 Distributed System
   1.5 Comparison between existing systems and proposed system
      1.5.1 Metadata comparison
      1.5.2 Deduplication comparison
   1.6 Compression (Project extension)
   1.7 The Scientific contribution of the project
   1.8 Project Structure

2 Aims and objectives of the project
   2.1 Aim
   2.2 Summary

3 Literature Survey
   3.1 Introduction
   3.2 Dropbox
   3.3 The
   3.4 Personal Cloud Storage Services
   3.5 Overall development by the project

4 Model Analysis and Design
   4.1 Introduction
   4.2 Analysis
      4.2.1 Existing System
      4.2.2 Proposed System
      4.2.3 Software Requirement Specification
   4.3 Design
   4.4 UML Diagrams
      4.4.1 Class diagram
      4.4.2 Use case diagram
      4.4.3 Sequence diagram
      4.4.4 Collaboration diagram
      4.4.5 Component diagram
      4.4.6 Deployment diagram
      4.4.7 Activity diagram
      4.4.8 Data Flow diagram

5 Implementation
   5.1 BFC Architecture
      5.1.1 Module Description
   5.2 Introduction of technologies used
   5.3 Java
      5.3.1 AWT and Swings
      5.3.2 Code

6 Results
   6.1 Screenshots
   6.2 Test Cases

7 Conclusion and Future Predictions
   7.1 Security in BFC
   7.2 Conclusion
   7.3 Future Predictions

References


List of Figures

Figure 1: Downloading Algorithm of application
Figure 2: BFC Architecture
Figure 3: Metadata Structure
Figure 4: Uploading and Deduplication Algorithm
Figure 5: Block diagram for Encryption and Decryption
Figure 6: Client-server architecture
Figure 7: Block diagram of data compression
Figure 8: Block diagram for code compilation and Interpretation of Java Source Code

List of Tables

Table 1: Test Cases


Chapter 1

Introduction

1.1 Background

The goal of this project is to design an efficient storage engine for cloud-based systems, taking into account the limitations of previously existing cloud storage systems such as Dropbox [6] and personal cloud storage services [5] such as Google Drive, OneDrive, and Cloud Drive, and overcoming them in the design of Big File Cloud (BFC) storage. In BFC, limitations of previous cloud-based storage systems such as metadata complexity and data deduplication have been resolved. As an extension, this project implements file compression [11], which reduces file sizes and enhances storage space utilization. The project is extendable to accommodate file decompression [11].

The project designs a simple metadata structure [23] in BFC in order to create a high-performance cloud storage. In BFC, the metadata is independent of the number of chunks for any file size, small or huge, because the solution stores only the id of the first chunk and the number of chunks produced from the original uploaded file. A chunk is a data segment generated from the uploaded file. If the file is bigger than the configured size, it is split into a number of fixed-size chunks (except the last chunk, which may be the same size or smaller). The contiguous id range of chunks makes it easy to distribute data and scale out the storage system. BFC also supports a global deduplication mechanism [21] in order to save storage space and network traffic when many users upload the same static data. A simple method, the SHA2 hash function [22], is used to detect duplicates across the whole system during upload.
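Because the metadata stores only the first chunk id and the chunk count, both quantities follow from simple arithmetic over the file size. The sketch below assumes an illustrative 4 MB fixed chunk size; the class and method names are mine, not BFC's:

```java
public class ChunkMath {
    static final long CHUNK_SIZE = 4L * 1024 * 1024; // illustrative fixed chunk size (4 MB)

    // Number of fixed-size chunks needed to cover a file (last chunk may be smaller).
    static long numChunks(long fileSize) {
        return (fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE; // ceiling division
    }

    // With a contiguous id range, the metadata only needs the first id and the count;
    // the last id is then startChunkId + numChunks - 1.
    static long lastChunkId(long startChunkId, long fileSize) {
        return startChunkId + numChunks(fileSize) - 1;
    }

    public static void main(String[] args) {
        long fileSize = 10L * 1024 * 1024;              // a 10 MB file
        System.out.println(numChunks(fileSize));        // 3 chunks (4 MB + 4 MB + 2 MB)
        System.out.println(lastChunkId(100, fileSize)); // 102: ids 100..102
    }
}
```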

1.2 Motivation

Cloud-based storage serves many users with storage capacity, and each user may store several terabytes of data. Cloud storage providers are responsible for keeping the information accessible and available and for keeping the physical environment secured and running. Cloud storage is used in many scenarios, such as backing up data and sharing files with others through social networks such as Facebook [3] and Zing Me [2]. Users upload data from different devices such as a computer, mobile phone, or tablet, and then download or share it with others. Thereby, the system load in cloud storage is huge. So, in order to assure excellent quality of service for users, the system has to deal with numerous requirements, such as efficient storage and management of big files and data deduplication to reduce the storage space wasted by storing the same static data from different users.

In conventional systems, there are many challenges in managing a large number of big files: scaling the system for incredible growth of data, distributing data over a large number of nodes, load balancing, fault tolerance, etc. In order to solve these problems, a common method used in many cloud storages is dividing a big file into smaller chunks, storing the chunks on distributed nodes, and managing them using a metadata system [1], [6], [19], [21].

In one such cloud storage system, Dropbox [1], the metadata size increases proportionally with the original file size, leading to difficulty when the file is big. Cloud storage providers face significant challenges such as efficient storage of chunks and metadata design. Current cloud storage services have a complicated metadata system in which the size of the metadata is at least proportional to the size of the file. The space complexity of this metadata system is O(n) in Dropbox [1] and HDFS [21]; thereby, these metadata systems are not scalable for huge files. In this research, a new big file cloud storage architecture and an improved solution to reduce the space complexity of metadata are suggested. In BFC, we came up with a solution where the metadata size is independent of the number of chunks regardless of file size, small or big. As an extension of the project, file compression is used: a huge file is first compressed and then stored on the cloud server.

1.3 What are Issues

High performance is the quality of performing extraordinarily well: better, faster, and more efficiently than others. If we measure the performance of processors, the one that works faster has the higher performance. High performance mainly depends on the response time and the query execution time.

High performance: In this project, in order to perform data deduplication, we use the metadata of the files stored on the cloud. BFC supports a lightweight metadata system in which the size of the metadata is smaller than in existing cloud storage systems like Dropbox [6]. Thereby, it is easier to check for data duplication at the server side, which reduces the execution and response time of the query. The metadata in BFC stores only the id of the first chunk and the number of chunks, whereas in existing cloud storage systems like Dropbox the metadata consists of a series of all chunks, from the first chunk to the last. This can be seen in the metadata size comparison chart between the existing and proposed systems below. Thereby, the limitations of previously existing cloud storage systems such as Dropbox [6] and Google Drive [5], like metadata complexity and data deduplication, are overcome by the proposed BFC system. Additionally, file compression [11], discussed in section 1.6 below, is implemented as a project extension, which reduces storage requirements; the comparison between a compressed file and a normal file is also shown below. Also, transmitting compressed data over a medium increases the rate of information transfer. The project is extendable to accommodate file decompression [11]. Therefore, we can conclude that the proposed system is high performance.


Metadata size comparison chart

File Compression comparison chart

Distributed system: A system is said to be distributed if it can share resources among multiple systems, possibly in different locations, in order to serve millions of users. The BFC project is a distributed application [18] with a client-server architecture, explained in the Distributed System section 1.4.4. In this project, a server application (Logical layer) and a client application (Application layer) are developed; we can run the server application on one system and the client application on many systems connected through a Local Area Network (LAN).

Server application (Logical layer)

Client application (Application Layer)

Scalability: A system is said to be scalable when it has the potential to expand in order to accommodate a growing amount of work. The scalability of a system can be assessed through its algorithms, design, and program. A system is scalable when it remains efficient and practical when applied to situations like a large input data set, a large number of outputs or users, or a large number of participating nodes in the case of a distributed system. In BFC, file-id and chunk-id are integer keys that can be automatically incremented. A simple hash function hash(key) = key is utilized for consistent hashing [27]. In this case, it is easier to scale out the system.
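The identity hash together with a ring of nodes can be sketched as follows. This is a toy illustration of consistent hashing [27] with hash(key) = key, not the BFC implementation; the ring size and node names are invented:

```java
import java.util.Map;
import java.util.TreeMap;

public class ChunkRing {
    static final long RING_SIZE = 1024;                     // illustrative ring size
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Place a storage node at a position on the ring.
    void addNode(String node, long position) {
        ring.put(position, node);
    }

    // hash(key) = key, as in the text: the integer chunk id is wrapped onto the
    // ring and assigned to the first node at or after it (clockwise successor).
    String nodeFor(long chunkId) {
        long point = chunkId % RING_SIZE;
        Map.Entry<Long, String> e = ring.ceilingEntry(point);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ChunkRing r = new ChunkRing();
        r.addNode("store-1", 0);
        r.addNode("store-2", 512);
        System.out.println(r.nodeFor(5));   // store-2: position 5 falls in (0, 512]
    }
}
```

Because chunk ids are contiguous integers, consecutive chunks land on long runs of the same node, which keeps distribution and rebalancing simple.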

Figure 1: Downloading Algorithm of application [8]

Chunk optimum size: A chunk is a data segment generated from a file uploaded by the user. If the file size is bigger than the configured size, the file is split into a list of chunks, all of the same size except the last chunk, which may be equal or smaller. After that, the ID generator generates ids for the file and its first chunk with an auto-increment mechanism; each subsequent chunk in the list is assigned an increasing id, up to the last chunk. The chunk-size computation from the project is shown below (cleaned up: the original left tot_blocks unset in the middle branch and did not handle lengths of exactly 500 or 1000 bytes):

public long getChunkSize(File file) {
    long length = file.length();
    tot_blocks = 0;
    long size = 0;
    if (length >= 1000) {          // large files: split into 10 chunks
        size = length / 10;
        tot_blocks = 10;
    } else if (length > 500) {     // medium files: 5 chunks
        size = length / 5;
        tot_blocks = 5;
    } else if (length > 1) {       // small files: 3 chunks
        size = length / 3;
        tot_blocks = 3;
    }
    return size;
}
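Once a chunk size is chosen, the split itself is a straightforward loop. A minimal sketch (the class name and in-memory stream are illustrative; a real implementation would stream the chunks to storage rather than hold them in memory):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSplitter {
    // Split a stream into chunks of at most chunkSize bytes;
    // every chunk is full-sized except possibly the last.
    static List<byte[]> split(InputStream in, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        try {
            int n;
            while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {
                chunks.add(Arrays.copyOf(buf, n));   // trim the final partial chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = split(new ByteArrayInputStream(new byte[1000]), 300);
        System.out.println(chunks.size());          // 4 chunks: 300 + 300 + 300 + 100
        System.out.println(chunks.get(3).length);   // 100: only the last chunk is smaller
    }
}
```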

1.4 BFC Architecture

The BFC architecture [8] comprises three layers: the Application layer, the Storage Logical layer, and the Object Store layer. Each layer comprises various coordinated components.

The Application layer consists of native software on desktop systems, mobile devices, and a web interface, which allows users to upload, download, and share their own files. This layer uses the API of the Storage Logical layer and applies various algorithms for the downloading and uploading process.

The Storage Logical layer contains numerous queuing services and worker services, ID-Generator services, and all logical APIs for the cloud storage system. This layer implements the business logic of BFC. The most essential components of this layer are the upload and download services. This layer provides a CloudApps [28] service which serves user requests. The Storage Logical layer stores and retrieves information from the Object Store layer.

The Object Store layer is the most essential layer; it stores and caches objects. This layer maintains the data of all objects in the system, including client data and, in particular, metadata. It consists of many distributed backend services. The Object Store layer has two important services: FileInfoService and ChunkStoreService. Information about files is stored in FileInfoService. ChunkStoreService stores the data chunks built by splitting the original files that clients have uploaded.


Figure 2: BFC Architecture

In the BFC system, we achieved a few enhancements to make the metadata less complicated. Metadata represents a file and how it is composed as a list of small chunks. The fundamental element in the BFC cloud storage system is the chunk. The original large file uploaded by the client is split into a list of chunks. This brings a great deal of advantages: it is simple to store, distribute, and replicate chunks; small chunks can be stored efficiently; and files can be uploaded and downloaded in parallel and resumed. It is hard to achieve these benefits with a huge file in a local file system.

1.4.1 Metadata

Metadata [23] is data that describes the original data. It consists of a series of elements, each holding information such as the chunk size and the hash value of a chunk. In conventional designs, the length of the series equals the number of chunks of the file, which becomes problematic when the file is large.

BFC suggests a solution in which the size of the metadata does not depend on the number of chunks, regardless of file size (very small or big). The solution stores the id of the first chunk and the number of chunks generated by splitting the original file. Since chunk ids are assigned progressively from the first chunk, we can easily calculate the ith chunk id by the following formula: chunkId[i] = fileInfo.startChunkID + i.

Metadata is primarily represented in the FileInfo structure, which comprises the following fields: fileName (the name of the file); fileId (the identification of the file); refFileId (the id of a file that already exists in the system; valid only if it is greater than zero); startChunkID (the id of the first chunk of the file; the next chunk has id startChunkID + 1); numChunks (the number of chunks); fileSize (the size of the file in bytes); and status, which shows the status of the file. The status takes one of four states: EUploading, when chunks are being uploaded to the server; ECompleted, when all chunks are uploaded but not yet checked for consistency; ECorrupted, when all chunks are uploaded but the consistency check fails; and EGoodCompleted, when all chunks are uploaded and the consistency check succeeds. The model below is translated into programming code.

Figure 3: Metadata Structure
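Translated directly into Java, the FileInfo model described above might look like the following sketch. Field names follow the text; the class itself is illustrative, not the project's actual code:

```java
public class FileInfo {
    // Status of a file as its chunks move through the upload pipeline.
    enum Status { EUploading, ECompleted, ECorrupted, EGoodCompleted }

    String fileName;      // name of the file
    long   fileId;        // identification of the file
    long   refFileId;     // id of an earlier identical file; valid only if > 0
    long   startChunkId;  // id of the first chunk of the file
    int    numChunks;     // number of chunks the file was split into
    long   fileSize;      // size of the file in bytes
    Status status;

    // The ith chunk id follows from the contiguous range:
    // chunkId[i] = startChunkId + i, for 0 <= i < numChunks.
    long chunkId(int i) {
        if (i < 0 || i >= numChunks) throw new IndexOutOfBoundsException();
        return startChunkId + i;
    }

    public static void main(String[] args) {
        FileInfo f = new FileInfo();
        f.startChunkId = 100;
        f.numChunks = 4;
        System.out.println(f.chunkId(3));   // 103: the last chunk of ids 100..103
    }
}
```

Note that the record's size does not depend on numChunks, which is exactly why the metadata stays O(1) per file.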

Thereby, the metadata of a file (the size of the FileInfo object) is almost the same for all files in the system, independent of whether the file is huge or small (the only difference between the metadata of two files is the length of the file name). By utilizing this approach, we have created lightweight metadata while designing a big file storage system.

1.4.2 Data deduplication

Eliminating identical data is a complicated challenge in cloud storage. BFC utilizes data deduplication, which can work on the client or the server side [17]; in BFC, we have implemented it on the server side. Data deduplication is one of the crucial mechanisms for minimizing identical copies of the same data. In order to use storage space effectively, identical data is detected, only one copy of the data is saved, and the other duplicates are replaced with a reference that addresses the first copy. In this way, we prevent identical files from being stored in the cloud by different users, thereby reducing data duplication.

The method used to detect duplicate data is the SHA2 [22] hash function, applied during upload. Say the file selected by the client to be uploaded is X. First, its SHA value is computed. Afterwards, the necessary information about the file is created, which includes the file name, file size, and SHA value. With deduplication implemented at the server side, the SHA value is used to look up a related fileID. Suppose a fileID with that SHA value is found in the system; call that file Y. This implies that file X and file Y are exactly the same. Therefore, we refer file X to file Y and assign the id of file Y to the refFileID field of file X: a field used to indicate that a file references another file. The necessary information is then sent to the client and the upload is finished; thereby, storage space is not wasted and is effectively utilized. In the other case, where no fileID is related to the SHA value of file X, or deduplication is not enabled, new fields are generated for the file: the id of the file, the id of the first chunk, and the number of chunks computed from the file size and chunk size. This information is utilized by the client to upload the file content to the server. After all chunks are uploaded, the SHA value computed by the client is compared with the SHA value computed on the server, and if everything matches, the status property of FileInfo is set to EGoodCompleted.

Figure 4: Uploading and Deduplication Algorithm
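The server-side duplicate check described above can be sketched with Java's built-in MessageDigest; SHA-256 is one member of the SHA-2 family [22]. The in-memory map here stands in for the server's SHA-to-fileID lookup, and the class and method names are illustrative:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class DedupIndex {
    // Maps hex-encoded SHA-256 digest -> fileId of the first upload with that content.
    private final Map<String, Long> shaToFileId = new HashMap<>();
    private long nextFileId = 1;

    static String sha256(byte[] content) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // SHA-256 is available on every JVM
        }
    }

    // Returns the id of the stored copy: an existing id if the content is a
    // duplicate (the new file only references it), otherwise a fresh id.
    long upload(byte[] content) {
        String sha = sha256(content);
        Long existing = shaToFileId.get(sha);
        if (existing != null) return existing;    // duplicate: reference, do not re-store
        long id = nextFileId++;
        shaToFileId.put(sha, id);
        return id;
    }

    public static void main(String[] args) {
        DedupIndex index = new DedupIndex();
        long x = index.upload("same bytes".getBytes());
        long y = index.upload("same bytes".getBytes());
        System.out.println(x == y);               // true: second upload is deduplicated
    }
}
```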


1.4.3 Data Security In BFC

Data security in BFC is provided in this project by using the AES algorithm. Encryption is the method of converting plaintext to ciphertext by applying mathematical transformations. These transformations are known as encryption algorithms and they require an encryption key. Decryption is the reverse process of obtaining the original data from the ciphertext using a decryption key.

AES [4] is based on a design principle known as a substitution-permutation network. It is fast in both software and hardware. It has a fixed block size of 128 bits and a key size of 128, 192, or 256 bits. AES operates on a 4x4 matrix of bytes known as the state. Most AES computations are performed in a special finite field. The AES cipher [4] is specified as a number of repetitions of transformation rounds that convert the input data into the final ciphertext output. Each round involves several processing steps, including one that depends on the encryption key. A set of reverse rounds is applied in order to transform the ciphertext back into the original text using the same encryption key.

Figure 5: Block diagram for Encryption and Decryption

Algorithm description:

1. Key Expansion - round keys are derived from the cipher key using Rijndael's key schedule (the key is expanded into several separate round keys).

2. Initial Round
   a. AddRoundKey - every byte of the state is combined with the round key using bitwise XOR.

3. Rounds
   a. SubBytes - a non-linear substitution step where every byte is replaced with another according to a lookup table.
   b. ShiftRows - a transposition step where each row of the state is shifted cyclically a certain number of steps.
   c. MixColumns - a mixing operation on the columns of the state, combining the four bytes in every column.
   d. AddRoundKey

4. Final Round (no MixColumns)
   a. SubBytes
   b. ShiftRows
   c. AddRoundKey
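A round trip through this cipher is available via the JDK's javax.crypto API; the sketch below uses AES in CBC mode with PKCS5 padding. The hard-coded key and IV are purely illustrative: in a real system, keys must be generated and stored securely:

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesDemo {
    // 128-bit key and IV; hard-coded here only for demonstration.
    static final byte[] KEY = "0123456789abcdef".getBytes();
    static final byte[] IV  = "fedcba9876543210".getBytes();

    static byte[] run(int mode, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(byte[] plaintext)  { return run(Cipher.ENCRYPT_MODE, plaintext); }
    static byte[] decrypt(byte[] ciphertext) { return run(Cipher.DECRYPT_MODE, ciphertext); }

    public static void main(String[] args) {
        byte[] ct = encrypt("chunk data".getBytes());
        System.out.println(new String(decrypt(ct)));   // chunk data
    }
}
```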

From the above, we can say that the BFC architecture is a high-performance system, as it overcomes major limitations faced by previous cloud storage systems, such as a complex metadata system and the lack of data deduplication.

1.4.4 Distributed System

BFC supports distributed operation. A distributed system [18] grants several users access to the same file and allows essential operations like create, delete, write, and modify. Each file is split into multiple chunks, which are stored on different machines, thereby facilitating parallel execution of applications. Many enterprises can store and access remotely stored data the same way they do locally. In BFC, a client-server architecture is built which grants file sharing between remote machines on a network as if they were located nearby. In BFC, metadata is stored and distributed; the FileInfoService and ChunkStoreService seen in the BFC architecture use consistent hashing [27] to distribute data. BFC can be designed to operate in network environments like a Local Area Network (LAN) or the Internet. Here, data can be distributed to the server so that it can serve various clients.

Figure 6: Client-server architecture

In BFC, the client-server architecture is implemented standalone, that is, the client and server are on the same system. The client is implemented at the application layer and the server at the logical layer. A MySQL database [31] is used to store data, and data can be retrieved whenever needed. This standalone application can also be deployed over a network where different systems are connected through a LAN (the server is located on another system and several clients access it). When any client sends a request, we can locate the IP address of that client at the server (logical layer).


1.5 Comparison between existing systems and BFC

The limitations faced by existing systems, like metadata complexity, data duplication, and scalability, are overcome in this project, and the systems are compared below. To overcome the issue of metadata complexity, BFC implements metadata of fixed size for each file. A data deduplication mechanism is also supported in BFC.

1.5.1 Metadata comparison

Dropbox [1] is a cloud-based storage system that grants users storage of documents, videos, and other files. Dropbox is accessible through a web interface and various client software on desktop and mobile operating systems. The user interface supports synchronization and sharing between different devices with personal storage. Dropbox was mainly written in Python. The basic object in Dropbox is a 4 MB chunk of data. If a file is larger than the configured size, it is split into multiple chunks. Each chunk is an independent element, is identified by a SHA-256 value, and is stored in Amazon Simple Storage Service (S3). In Dropbox, the metadata consists of a series of the SHA-256 values of its chunks [1], [6]; thereby, its size is proportional to the file size. For huge files, the metadata is big, as it comprises the series of all chunks (from first to last). To overcome this issue, BFC implements metadata of fixed size for each file: instead of listing all the chunks, the metadata stores only the id of the first chunk and the number of chunks, thereby reducing space and leading to effective utilization of storage. This also helps minimize the amount of data transferred for metadata between clients and servers.
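The scaling difference can be made concrete with rough arithmetic, using the 4 MB chunks and 32-byte SHA-256 digests described above. The fixed 64-byte FileInfo record below is an illustrative figure, not a measured one:

```java
public class MetadataSize {
    static final long CHUNK = 4L * 1024 * 1024;  // Dropbox-style 4 MB chunks
    static final int  SHA256_BYTES = 32;         // one SHA-256 digest per chunk

    // O(n): one digest per chunk, so the metadata grows with the file.
    static long dropboxStyle(long fileSize) {
        long chunks = (fileSize + CHUNK - 1) / CHUNK;
        return chunks * SHA256_BYTES;
    }

    // O(1): a fixed FileInfo record (first chunk id, chunk count, sizes, status),
    // independent of how many chunks the file produces.
    static long bfcStyle(long fileSize) {
        return 64;   // illustrative fixed record size in bytes
    }

    public static void main(String[] args) {
        long oneTB = 1L << 40;
        System.out.println(dropboxStyle(oneTB)); // 8388608 bytes (~8 MB of digests)
        System.out.println(bfcStyle(oneTB));     // 64 bytes, regardless of file size
    }
}
```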

Metadata Comparison


The metadata comparison between Dropbox and BFC, performed at the Logical layer in our implementation, can be seen above.

1.5.2 Deduplication Comparison

Previously developed cloud storage systems such as Dropbox [1], Google Drive [5], and OneDrive do not provide a deduplication mechanism. In BFC, a data deduplication mechanism is supported. Say a first user uploads a file to the cloud. If another user tries to upload the same file, it is not stored again, as the same file is already available at the server side; instead of uploading the file a second time, the system tags the new entry to the already existing file, thereby reducing the space used.

1.6 Compression(Project extension)

In cloud storage, various files may contain redundant data or information that adds little value. This leads to a huge amount of data being transmitted between the client and the server. One solution to this problem is to use additional storage devices and enhance the existing communication capacity; however, this increases operating costs for the organization.

A better solution to the problem of increased data storage and information transfer is to use efficient coding. This can be implemented in Java applications using the java.util.zip package [11]. Applications for data compression such as WinZip [24], gzip [25], and jar [26] already exist, but we cannot use them here because they are standalone tools.

Figure 7: Block diagram of data compression

By implementing data compression, storage in the cloud is reduced, thereby enhancing the utilization of storage space. The rate of data transfer also increases, as compressed data is transferred between the client and server systems.

Data compression is performed using the java.util.zip package provided by Java [11]. This package consists of classes to create, read, and modify files in the gzip and zip formats. On the whole, the package contains one interface, 14 classes, and 2 exception classes, covering both compression and decompression; for example, it provides a ZipInputStream [11] class to read ZIP files, which is created like any other input stream. We shall be using the following four classes from the java.util.zip package to work with the ZIP file format:

- ZipEntry
- ZipInputStream
- ZipOutputStream
- ZipFile

A ZipEntry object represents an entry in an archive in the ZIP file format; an entry can be compressed or uncompressed, and the ZipEntry class has methods to set and get information about an entry in a ZIP file. ZipInputStream reads data from a ZIP file entry by entry. ZipOutputStream writes data to a ZIP file entry by entry. ZipFile is a utility class used to read entries from a ZIP file.

The ZipEntry class describes a compressed file stored within a ZIP file; its methods can be used to set and get pieces of information about the entry. Data compression can be implemented on existing hardware using software, or with special hardware devices that incorporate compression techniques. The code written using this package is discussed in the file compression code in chapter 5.
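A minimal in-memory round trip using three of the four classes listed above (the entry name and content are illustrative; ZipFile is omitted because it reads from an on-disk archive rather than a stream):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipDemo {
    // Compress a single named entry into ZIP format, in memory.
    static byte[] zip(String name, byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));   // one ZipEntry per stored file
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Read the first entry back out of the ZIP data.
    static byte[] unzip(byte[] zipped) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            zis.getNextEntry();
            return zis.readAllBytes();              // reads to the end of this entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "redundant redundant redundant redundant".getBytes();
        byte[] zipped = zip("report.txt", data);
        System.out.println(new String(unzip(zipped)));   // the round trip restores the text
    }
}
```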

Compression is implemented as the extension of this project, and the project is extendable to accommodate decompression.

1.7 The Scientific Contributions of the project

1. The project overcomes the difficulties faced by previous cloud storage systems [6], [20], namely metadata complexity. This is overcome by implementing lightweight metadata in BFC.

2. The second contribution is developing a big file cloud storage that provides data deduplication [10].

3. The third contribution, an extension of the project, is incorporating data compression [11] of files before their storage in the cloud.

4. The fourth contribution is that data security is implemented by performing encryption and decryption [4].

5. The fifth contribution is that a front-end user interface is implemented at the Application layer and the server (back end) at the Logical layer; additionally, a MySQL database [31] is designed for storage of information. Thereby, a high-performance BFC is implemented.

6. Finally, the results present the metadata comparison between BFC and Dropbox [6] (a previous cloud storage system), as well as deduplication and compression comparisons. Thereby, the enhancement of BFC is observed.

1.8 Project Structure

Chapter 2 deals with the aims and objectives of the project, which form the basis for its design and implementation.


Chapter 3 deals with the analysis of existing cloud systems and their limitations, summary of limitations overcome by the project.

Chapter 4 deals with the analysis and design of the project and the requirements for the project design and implementation, the UML diagrams which are interpreted into programming code.

Chapter 5 deals with implementation of the project, Java technologies used in writing code for project implementation and execution.

Chapter 6 deals with the results demonstrating the effectiveness of the system: screenshots obtained when the project is executed, and test cases. The screenshots show a comparison between existing systems and the project, indicating that the limitations have been overcome by the high-performance distributed BFC.

Chapter 7 deals with the security provided in BFC which is implemented in the project, conclusion and future scope of cloud based storage systems.

The project concludes with a small summary of the findings and a discussion; the conclusions drawn from the project are placed at the end of this report.


Chapter 2

Aims and Objectives of the Project

As already stated in the abstract of the project:

"To design an efficient storage engine for cloud based systems, it is always required to deal with requirements like big file processing, lightweight metadata, deduplication, high scalability. Here, in this paper, we suggest a Big file cloud architecture to handle all problems in big file cloud system. Basically, here we propose to build a scalable distributed data cloud storage that supports big file with size up to several terabytes. In cloud storage, system load is usually heavy, and data deduplication [10] is needed to reduce wastage of storage space caused by storing same static data from different users. In order to solve the above problems, a common method used in Cloud storages is by dividing big file into small blocks, storing them on disks and then dealing with them using a metadata system. Current cloud storage services have a complex metadata system [6], [5]. Thereby, the space complexity of the metadata system is O(n) and it is not scalable for big files. In this research, a new big file cloud storage architecture and a better solution to reduce the space complexity of metadata is suggested. In order to further reduce storage, while storing a huge file in the cloud we first compress it and then store it on the cloud server."

From the above, the primary objective is to design BFC, with its algorithms and architecture, in order to solve most of these issues. This is achieved by proposing a fixed-size metadata design which supports high-speed, concurrent, distributed file I/O, several algorithms for resumable upload and download, and simple data deduplication for static data. The results can be used for building elastic distributed cloud storage that supports big files with sizes up to several terabytes.
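A minimal sketch of the fixed-size metadata idea (the class and field names below are our own illustration, not the BFC design's actual layout): each file is described by a start chunk id and a chunk count rather than a per-chunk list, so the metadata record has the same size whether the file is one kilobyte or one terabyte.

```java
// Hypothetical fixed-size metadata record; names are illustrative only.
public class FileMetadataSketch {
    long fileId;        // id of the file
    long fileSize;      // size of the file in bytes
    long startChunkId;  // id of the first chunk; chunk i has id startChunkId + i
    int  chunkCount;    // number of fixed-size chunks

    // Number of chunks needed to cover fileSize bytes with chunks of chunkSize bytes.
    static int chunkCount(long fileSize, long chunkSize) {
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        long chunkSize = 1_048_576L; // assume 1 MB chunks for illustration
        // The record itself is constant-size: only the chunkCount value differs.
        System.out.println(chunkCount(1_024L, chunkSize));             // 1 KB file -> 1
        System.out.println(chunkCount(1_099_511_627_776L, chunkSize)); // 1 TB file -> 1048576
    }
}
```

The point of the sketch is that metadata space stays O(1) per file, instead of O(n) in the number of chunks.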

The secondary objective is to implement the design developed above, test it and compare it with the previously existing models. This is to ensure that the issues faced by the existing models are solved, in order to meet the demands of the rapidly developing cloud storage field in information technology. The project is extended by developing and implementing file compression before cloud storage.

2.1 Aim


The aim of the project is to create a high performance big file cloud storage system that overcomes the limitations faced by cloud storage systems, resolves the further issues which arise from those limitations, and lets people enjoy the benefits of cloud storage systems.

2.2 Summary

The results of the project mentioned above are accomplished by working on the primary and secondary objectives. The design will be tested against existing systems so that a comparison can be made and a conclusion drawn, based on the resulting analysis, as to whether the issues have been resolved. The scope of this project is vast, as it covers many areas like security, networking and big data, which will enable further study in the field of cloud storage systems. Cloud storage systems can offer many benefits, such as low cost, reliability, flexibility, and reductions in space complexity and network traffic.


Chapter 3

Literature Survey

3.1 Introduction

In the past, there were many limitations [6], [5] which needed to be overcome. Today, an efficient cloud storage system can be developed by analyzing the design of existing systems and evaluating the impact of their design choices on performance.

3.2 Dropbox

Nowadays, personal cloud storage services are gaining importance. Cloud storage generates a huge amount of Internet traffic as more providers enter the market and cheap storage space is increasingly on offer. An understanding of the architecture, performance and workload of such systems is essential in order to design efficient cloud storage systems and predict their impact on the network. Here, we present a characterization of Dropbox [6], the leading solution in personal cloud storage in our datasets. Dropbox is the most widely used cloud storage system, accounting for a volume equivalent to one third of the traffic at campus networks. Analyzing the usage of Dropbox on the Internet shows an increasing interest in cloud-based storage systems; major companies like Google, Apple and Microsoft are offering this service. Dropbox service performance is highly impacted by the distance between the clients and the data centers, which are located in the U.S. In the client protocol, the use of per-chunk acknowledgements combined with small chunk sizes deeply limits the effective throughput of the service. Two possible improvements to the protocol are identified: a) the use of a chunk bundling scheme; b) the introduction of delayed acknowledgements. The recent deployment of the bundling mechanism has already improved system performance dramatically, and overall performance can be expected to improve further with the deployment of additional data centers in different locations.

3.3 The Google File System

The Google File System is a distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on economical commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as earlier distributed file systems, its design has been driven by observations of the application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This led to a reexamination of conventional choices and the exploration of radically different design points. The file system has strongly met Google's storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by its services, as well as for research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

The Google File System [7] demonstrates the qualities necessary to support large-scale data processing on commodity hardware. While some design decisions are specific to Google's unique setting, many may apply to data processing tasks of a similar magnitude and cost consciousness. Conventional file system assumptions are reexamined in light of current and anticipated application workloads and the technological environment, which leads to radically different points in the design space. To improve the overall system, component failures are treated as the norm rather than the exception, large files are optimized for, and the standard file system interface is extended. Fault tolerance is provided by constant monitoring, replication of crucial data, and fast, automatic recovery. Chunk replication allows the system to tolerate chunkserver failures. The frequency of these failures motivated a novel online repair mechanism that regularly repairs the damage and compensates for lost replicas as soon as possible.

3.4 Personal Cloud Storage Services

Personal cloud storage services are data-intensive applications creating huge amounts of Internet traffic. Let us analyze five of these services [5]. Dropbox implements most of the checked capabilities, and its sophisticated client clearly boosts performance, although some protocol tweaks seem possible to reduce network overhead. However, duplication of data and metadata complexity become major problems, wasting storage space and increasing network overhead. In Cloud Drive, the wastage of bandwidth is of a magnitude higher than in other services, and the lack of client performance results in bottlenecks. SkyDrive also shows some performance limitations, such as network latency, and in OneDrive we can see the limitation of data duplication. Google Drive follows a different approach, resulting in a mixed picture: it enjoys the benefits of Google's capillary infrastructure and private backbone, which reduce network latency and speed up the system. However, protocols and client features limit performance, especially when multiple files are considered.

3.5 Overall development by the Project

Above, we have seen what has been done in existing systems. Here, we discuss about the limitations of previous systems and how the contribution in this project relates to overall development.

Analysis of the usage of existing cloud storage systems on the Internet shows an increasing interest in cloud based storage systems. For instance, Dropbox, one of the most widely used cloud storage systems, accounts for a volume equivalent to one third of the YouTube traffic at campus networks on some days. The existing cloud storage systems have tried to implement distributed cloud storage and improve its performance. However, there are certain limitations like metadata complexity and data duplication, which in turn lead to the other issues discussed below. In our project implementation, we look to overcome the issues faced by the existing cloud storage systems.

The above discussed limitations like metadata complexity and data duplication are overcome in our project implementation by designing a high performance distributed big file cloud based storage system. It provides a lightweight metadata system and data deduplication; as an extension, data compression is also implemented, thereby achieving effective utilization of storage space. Data security through encryption and decryption is also provided.
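The project's AES and SHA helper classes are invoked later in the implementation chapter but their internals are not listed there; the following is only a plausible sketch, using the standard javax.crypto API, of what such an encrypt/decrypt helper pair could look like (the in-memory key and the ECB cipher mode are simplifying assumptions, not the project's actual choices):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustrative AES helper sketch; not the project's actual AES class.
public class AesSketch {
    static SecretKey key;

    static SecretKey getKey() throws Exception {
        if (key == null) {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);            // 128-bit AES key, generated once per run
            key = kg.generateKey();
        }
        return key;
    }

    static byte[] encrypt(byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB only for brevity
        c.init(Cipher.ENCRYPT_MODE, getKey());
        return c.doFinal(plain);
    }

    static byte[] decrypt(byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, getKey());
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "big file chunk".getBytes(StandardCharsets.UTF_8);
        byte[] enc = encrypt(data);
        // decrypting recovers the original bytes
        System.out.println(Arrays.equals(data, decrypt(enc))); // true
    }
}
```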


Chapter 4

Model Analysis and Design

4.1 Introduction

The Software Development Life Cycle (SDLC) [11], in information systems, software engineering and systems engineering, is the process of creating systems, together with the models and methodologies that people use to develop them. In software engineering, the SDLC concept supports many kinds of software development methodologies, which in turn form the framework for planning the creation of an information system during the software development process.

4.2 Analysis

4.2.1 Existing System

Cloud storage is used by people for daily demands, for instance backing up data and sending files to friends through social networks. Users may upload data from different types of devices, and they can download files or share them with others. In a cloud storage system, the load is usually very heavy; thereby, to assure excellent quality of service for users, the system has to deal with certain problems and requirements.

Disadvantages:
● Efficiently storing, retrieving and managing big files in the system.
● Data duplication, which wastes storage space because the same static data is stored by different users.
● Supporting parallel and resumable upload and download.

4.2.2 Proposed system

A common method for solving these problems is to divide a big file into multiple smaller chunks, store them on disks and then manage them using a metadata system. Cloud storage providers then face significant problems: storing chunks and metadata effectively, and designing a lightweight metadata system. Most current cloud storage services have a complex metadata system in which the size of the metadata grows linearly with the file size for every file. Thereby, the space complexity of these metadata systems is not scalable for big files. In this research, we implement a big file cloud storage architecture along with a superior solution for reducing the space complexity of metadata.


Advantages:
● A lightweight metadata design for big files: every file has approximately the same size of metadata.
● A logically contiguous chunk-id for the chunk collection of each file, which makes it easy to distribute data and scale out the storage system.
● File compression, which overcomes the problem of increased data storage and information transfer.
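The contiguous chunk-id advantage can be sketched as follows (an illustrative example of ours, not code from the BFC design): an ID generator reserves a consecutive range of ids for each file, so the id of chunk i is computed as start + i instead of being stored per chunk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of contiguous chunk-id allocation; names are our own.
public class ChunkIdAllocator {
    private final AtomicLong next = new AtomicLong(0);

    // Reserve `count` consecutive chunk ids and return the first one.
    public long allocate(int count) {
        return next.getAndAdd(count);
    }

    // With a contiguous range, a chunk's id is computed, not stored.
    public static long chunkId(long startChunkId, int index) {
        return startChunkId + index;
    }

    public static void main(String[] args) {
        ChunkIdAllocator gen = new ChunkIdAllocator();
        long fileA = gen.allocate(3); // file A owns ids 0, 1, 2
        long fileB = gen.allocate(2); // file B owns ids 3, 4
        System.out.println(fileA + " " + fileB); // 0 3
        System.out.println(chunkId(fileB, 1));   // 4
    }
}
```

Because only the start id and the count need to be kept, this pairs naturally with the fixed-size metadata goal above.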

4.2.3 Software Requirement Specification

External Interface Requirements:

User Interface: The User interface is a user friendly Java Graphical User Interface.

Hardware Interface: The interaction between the user and the console is achieved through Java capabilities.

Hardware Requirements:
● Processor : Intel(R) Core(TM) i5
● Speed : 1.1 GHz
● RAM : 256 MB (min)
● Hard Disk : 20 GB

Software Requirements:
● Operating System : Windows XP
● Programming Language : Java
● Front End : AWT, Swing
● Back End : MySQL

4.3 Design

The Unified Modeling Language allows the software engineer to express an analysis model using a modeling notation that is governed by a set of syntactic, semantic and pragmatic rules.

4.4 UML diagrams

4.4.1 Class diagram

The class diagram is the main building block of object oriented modeling. It can be used both for general conceptual modeling of the structure of the application and for detailed modeling, translating the models into programming code. In the diagram, classes are portrayed as boxes which comprise three parts:
a) The upper part holds the name of the class.
b) The middle part contains the attributes of the class.
c) The bottom part gives the methods or operations the class can undertake.

Class Diagram

4.4.2 Use case diagram

A use case diagram is a depiction of a user's interaction with the system, representing the specification of a use case. A use case diagram can show the different types of users of a system and the various ways in which they interact with it. This diagram is used in conjunction with a textual use case and is often accompanied by other types of diagrams as well.


Use case diagram

4.4.3 Sequence Diagram

A sequence diagram is an interaction diagram which shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects that are required to carry out the functionality of the scenario. Sequence diagrams are associated with use case realizations in the logical view of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios or timing diagrams.


Sequence Diagram

4.4.4 Collaboration diagram

A collaboration diagram describes interactions among objects in terms of sequenced messages. It represents a combination of information taken from the class, sequence and use case diagrams, depicting both the static structure and the dynamic behavior of a system.


Collaboration Diagram

4.4.5 Component diagram

In the Unified Modeling Language, a component diagram depicts how components are wired together to form larger components or software systems. Component diagrams are used to illustrate the structure of arbitrarily complex systems. Components are wired together using an assembly connector, which connects the required interface of one component with the provided interface of another.

Component Diagram


4.4.6 Deployment diagram

A deployment diagram in the Unified Modeling Language models the physical deployment of artifacts on nodes. To describe a website, for example, a deployment diagram would show what hardware components (nodes) exist (e.g., web application, database) and how the different pieces are connected (e.g., JDBC, REST, RMI). In the figure, the nodes appear as boxes, and the artifacts allocated to each node appear as rectangles within the boxes. Nodes may have sub-nodes, which appear as nested boxes. A single node in a deployment diagram may conceptually represent multiple physical nodes, such as a cluster of database servers.

Deployment Diagram


4.4.7 Activity Diagram

An important diagram in UML is the activity diagram, which depicts the dynamic aspects of the system. It is essentially a flow chart showing the flow from one activity to another. An activity can be described as an operation of the system, so the control flow in the chart is drawn from one operation to the next. This flow can be sequential, branched or concurrent.

Activity Diagram


4.4.8 Data Flow Diagram

A data flow diagram (DFD) is a representation that explains the passage of information through a process. A DFD can be drawn using a few simple symbols, and complicated processes can be modeled with easy-to-use, freely downloadable diagramming tools. A DFD is a model for building and analyzing information processes: it represents the flow of information in a process as determined by its inputs and outputs, and is therefore also called a process model. A DFD shows a business or technical process together with the external data stores, the data flowing from one process to another, and the end results.

Data Flow Diagram


Chapter 5

Implementation

This chapter deals with how the high performance distributed BFC architecture is implemented and with the Java technologies used in coding it.

5.1 BFC Architecture Modules

1. Application Layer
2. Storage Logical Layer
3. Object Store Layer

5.1.1 Module Description

Application Layer: It consists of native software on desktop systems, mobile devices and a web interface, which allows users to upload, download and share their own files.

Storage Logical Layer: It contains numerous queuing services, worker services, ID-Generator services and all the logical APIs for the cloud storage system. This layer implements the business logic of BFC.

Object Store Layer: It consists of many distributed backend services. The Object Store Layer has two important services, FileInfoService and ChunkStoreService. Information about files is stored in FileInfoService. ChunkStoreService stores the data chunks that are built by splitting the original files uploaded by users.
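As a rough in-memory illustration of how a chunk store could deduplicate static data (the class and method names below are our own, not the actual ChunkStoreService interface), each chunk can be keyed by a digest of its content, so that identical chunks from different users occupy storage only once:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory chunk store with deduplication; names are our own.
public class ChunkStoreSketch {
    private final Map<String, byte[]> chunks = new HashMap<>();

    // Store a chunk keyed by its SHA-256 digest; identical chunks are stored once.
    public String put(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) sb.append(String.format("%02x", b));
        String key = sb.toString();
        chunks.putIfAbsent(key, data); // duplicate content maps to the same key
        return key;
    }

    public int size() { return chunks.size(); }

    public static void main(String[] args) throws Exception {
        ChunkStoreSketch store = new ChunkStoreSketch();
        byte[] chunk = "same static data".getBytes(StandardCharsets.UTF_8);
        store.put(chunk);
        store.put(chunk); // duplicate, not stored again
        store.put("other data".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.size()); // 2
    }
}
```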

5.2 Introduction of technologies used

5.3 About Java

Java is a platform independent language that can be used to create software to be embedded in various consumer electronic devices. Java has had a great impact on the Internet because it introduced objects that can move around freely in cyberspace. There are two categories of such objects, passive information and dynamic, active programs, that are transmitted between the server and the personal computer. The latter led to a new form of program called the applet.


Applications and applets

An application is a program that runs on a computer under the operating system of that computer, similar to one created using C or C++. The ability of Java to create applets, however, makes it more crucial. An applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser.

Java Architecture

The Java architecture provides a simple, robust, object-oriented operating environment for development. Java supports portability by compiling programs to byte code for the Java Virtual Machine, which is then interpreted by the run-time environment on each platform. Java is a dynamic system, able to load code when required from a machine anywhere on the network.

Compilation of code

When code is compiled, the Java compiler creates machine code (byte code) for an imaginary machine called the Java Virtual Machine (JVM). The JVM was created to overcome the issue of portability: code is written and compiled once, for the JVM, and can then be interpreted on any machine. At run time, the Java interpreter executes the byte code file as though it were running on a real Java Virtual Machine. The host could be an Intel Pentium or an Apple Macintosh; all such systems can receive code from any device through the Internet and run applets.

Figure 8: Block diagram for code compilation and Interpretation of Java Source Code


5.3.1 AWT and Swings

AWT: Users interact with a program through its user interface. A graphical user interface (GUI) is one such type of user interface, which allows users to interact with electronic devices through images rather than text commands. Java provides a class library called the Abstract Window Toolkit (AWT) for writing graphical programs. AWT provides a common set of tools for the design of graphical user interfaces, although its components are not as platform independent as Java itself. A graphical user interface is composed of graphical elements known as components. A component is an object that has a graphical representation and can be displayed on the screen, through which a user can interact with the program and provide input to it. These user interface components are instances of class Component; typical components include items such as buttons, scrollbars and text fields.

Swings: Swing is essential for developing Java applications with a graphical user interface (GUI). The Swing toolkit is the fundamental tool for building GUIs in Java. Components in the Swing toolkit include list controls, buttons, labels and checkboxes. Java Swing can handle all of AWT's components flexibly, and it adds many advanced features, such as JTable, JTabbedPane and JTree, which are not available in AWT. Swing is developed in the Java language, whereas AWT components are developed using C and C++.

5.3.2 Code

Register.java : This code captures the user's data and processes it for registration.

package BFC;

import java.awt.Dimension;
import java.awt.Font;
import java.awt.Toolkit;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.net.Socket;
import javax.swing.JButton;
import javax.swing.JLabel;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.JPasswordField;
import javax.swing.JTextField;
import net.miginfocom.swing.MigLayout;
import java.awt.Color;
import javax.swing.JFrame;

public class Register extends JFrame {
    JPanel p1;
    Font f1;
    JLabel l1, l2, l3, l4, l5;
    JTextField tf1, tf2, tf3, tf4, tf5;
    JButton b1, b2;
    Dimension d;
    int w, h;
    int center;

    public Register() {
        super("New User Registration Screen");
        d = Toolkit.getDefaultToolkit().getScreenSize();
        w = (int) d.getWidth();
        h = (int) d.getHeight();
        center = w / 30;
        setBackground(Color.white);
        p1 = new JPanel();
        p1.setBackground(Color.white);
        p1.setLayout(new MigLayout("wrap 2", "", "[]15[]"));
        f1 = new Font("Verdana", Font.PLAIN, 14);
        p1.add(new JLabel(), "cell 0 30");
        p1.add(new JLabel(), "cell 1 30");

        l1 = new JLabel("Username");
        l1.setFont(f1);
        p1.add(l1, "gap left " + center);
        tf1 = new JTextField(12);
        tf1.setFont(f1);
        p1.add(tf1, "gap left 30");

        l2 = new JLabel("Password");
        l2.setFont(f1);
        p1.add(l2, "gap left " + center);
        tf2 = new JPasswordField(12);
        tf2.setFont(f1);
        p1.add(tf2, "gap left 30");

        l3 = new JLabel("Contact No");
        l3.setFont(f1);
        p1.add(l3, "gap left " + center);
        tf3 = new JTextField(12);
        tf3.setFont(f1);
        p1.add(tf3, "gap left 30");

        l4 = new JLabel("Email ID");
        l4.setFont(f1);
        p1.add(l4, "gap left " + center);
        tf4 = new JTextField(12);
        tf4.setFont(f1);
        p1.add(tf4, "gap left 30");

        l5 = new JLabel("Address");
        l5.setFont(f1);
        p1.add(l5, "gap left " + center);
        tf5 = new JTextField(30);
        tf5.setFont(f1);
        p1.add(tf5, "gap left 30");

        p1.add(new JLabel(""));
        b1 = new JButton("Register");
        b1.setFont(f1);
        p1.add(b1, "split 2");
        b1.addActionListener(new ActionListener() {
            @Override
            public void actionPerformed(ActionEvent ae) {
                process();
            }
        });

        b2 = new JButton("Clear");
        b2.setFont(f1);
        p1.add(b2);
        b2.addActionListener(new ActionListener() {
            @Override
            public void actionPerformed(ActionEvent ae) {
                clearFields();
            }
        });

        add(p1);
    }

    // Validate the form fields, then send a "register" request to the server.
    public void process() {
        try {
            String uname = tf1.getText();
            String pass = tf2.getText();
            String contact = tf3.getText();
            String email = tf4.getText();
            String address = tf5.getText();
            if (uname == null || uname.length() <= 0) {
                JOptionPane.showMessageDialog(this, "Username must be entered");
                tf1.requestFocus();
                return;
            }
            if (pass == null || pass.length() <= 0) {
                JOptionPane.showMessageDialog(this, "Password must be entered");
                tf2.requestFocus();
                return;
            }
            if (contact == null || contact.length() <= 0) {
                JOptionPane.showMessageDialog(this, "Contact no must be entered");
                tf3.requestFocus();
                return;
            }
            if (!validatePhoneNumber(contact.trim())) {
                JOptionPane.showMessageDialog(this, "Enter valid contact no");
                tf3.requestFocus();
                return;
            }
            if (email == null || email.length() <= 0) {
                JOptionPane.showMessageDialog(this, "Email id must be entered");
                tf4.requestFocus();
                return;
            }
            if (!CheckMail.checkMail(email)) {
                JOptionPane.showMessageDialog(this, "Enter valid mail id");
                tf4.requestFocus();
                return;
            }
            if (address == null || address.length() <= 0) {
                JOptionPane.showMessageDialog(this, "Address must be entered");
                tf5.requestFocus();
                return;
            }
            Socket socket = new Socket("localhost", 1200);
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            Object req[] = {"register", uname, pass, contact, email, address};
            out.writeObject(req);
            out.flush();
            Object res[] = (Object[]) in.readObject();
            String msg = res[0].toString();
            if (msg.equals("Registration process completed")) {
                JOptionPane.showMessageDialog(this, msg);
                setVisible(false);
            } else {
                JOptionPane.showMessageDialog(this, msg);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void clearFields() {
        tf1.setText("");
        tf2.setText("");
        tf3.setText("");
        tf4.setText("");
        tf5.setText("");
    }

    private static boolean validatePhoneNumber(String phoneNo) {
        // validate phone numbers of format "1234567890"
        if (phoneNo.matches("\\d{10}"))
            return true;
        // validate phone numbers with -, . or spaces
        else if (phoneNo.matches("\\d{3}[-\\.\\s]\\d{3}[-\\.\\s]\\d{4}"))
            return true;
        // validate phone numbers with an extension of length 3 to 5
        else if (phoneNo.matches("\\d{3}-\\d{3}-\\d{4}\\s(x|(ext))\\d{3,5}"))
            return true;
        // validate phone numbers where the area code is in braces ()
        else if (phoneNo.matches("\\(\\d{3}\\)-\\d{3}-\\d{4}"))
            return true;
        // return false if nothing matches the input
        else
            return false;
    }
}

Login.java : This code authenticates users when they log in.

package BFC;

import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import javax.swing.JTextField;
import javax.swing.JButton;
import java.awt.Font;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.BorderLayout;
import java.awt.Color;
import javax.swing.JPasswordField;
import javax.swing.UIManager;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.net.Socket;
import javax.swing.JOptionPane;

public class Login extends JFrame {
    JLabel l1, l2, l3, l4;
    JTextField tf1, tf2;
    JButton b1, b2, b3;
    Font f1, f2;
    JPanel p1, p2, p3, p4, p5, p6;
    ImageIcon icon;

    public Login() {
        super("Login Screen");
        p1 = new JPanel();
        p1.setBackground(Color.black);
        f1 = new Font("Monospaced", Font.BOLD, 22);
        l1 = new JLabel("User Login Screen");
        l1.setForeground(new Color(125, 54, 2));
        l1.setFont(f1);
        p1.add(l1);
        p1.setBackground(new Color(140, 150, 180));

        p2 = new JPanel();
        p2.setBackground(Color.black);
        icon = new ImageIcon("img/vista.jpg");
        JLabel label = new JLabel(icon);
        p2.add(label);

        p3 = new JPanel();
        p3.setLayout(new BorderLayout());

        p4 = new JPanel();
        f2 = new Font("Verdana", Font.PLAIN, 14);
        l3 = new JLabel("Username");
        l3.setFont(f2);
        p4.add(l3);

        tf1 = new JTextField(15);
        tf1.setFont(f2);
        p4.add(tf1);

        p5 = new JPanel();
        l4 = new JLabel("Password");
        l4.setFont(f2);
        p5.add(l4);
        tf2 = new JPasswordField(15);
        tf2.setFont(f2);
        p5.add(tf2);

        p6 = new JPanel();
        b1 = new JButton("Login");
        b1.setFont(f2);
        b1.setBackground(new Color(51, 51, 51));
        p6.add(b1);
        b1.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent evt) {
                login();
            }
        });

        b2 = new JButton("Reset");
        b2.setFont(f2);
        b2.setBackground(new Color(51, 51, 51));
        p6.add(b2);
        b2.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent evt) {
                reset();
            }
        });

        b3 = new JButton("New User");
        b3.setFont(f2);
        b3.setBackground(new Color(51, 51, 51));
        p6.add(b3);
        b3.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent evt) {
                Register register = new Register();
                register.pack();
                register.setVisible(true);
                register.setLocationRelativeTo(null);
            }
        });

        p3.add(p4, BorderLayout.NORTH);
        p3.add(p5, BorderLayout.CENTER);
        p3.add(p6, BorderLayout.SOUTH);
        getContentPane().add(p1, BorderLayout.NORTH);
        getContentPane().add(p2, BorderLayout.CENTER);
        getContentPane().add(p3, BorderLayout.SOUTH);
    }

    public static void main(String a[]) throws Exception {
        UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
        Login screen = new Login();
        screen.setVisible(true);
        screen.pack();
        screen.setLocationRelativeTo(null);
    }

    public void reset() {
        tf1.setText("");
        tf2.setText("");
    }

    // Send a "login" request to the server and open the user screen on success.
    public void login() {
        try {
            String user = tf1.getText();
            String pass = tf2.getText();
            if (user == null || user.trim().length() <= 0) {
                JOptionPane.showMessageDialog(this, "Username must be entered");
                tf1.requestFocus();
                return;
            }
            if (pass == null || pass.trim().length() <= 0) {
                JOptionPane.showMessageDialog(this, "Password must be entered");
                tf2.requestFocus();
                return;
            }
            Socket socket = new Socket("localhost", 1200);
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            Object req[] = {"login", user, pass};
            out.writeObject(req);
            out.flush();
            Object res[] = (Object[]) in.readObject();
            String msg = res[0].toString();
            if (msg.equals("success")) {
                setVisible(false);
                UserScreen us = new UserScreen(this, user);
                us.setVisible(true);
                us.setExtendedState(JFrame.MAXIMIZED_BOTH);
            } else {
                JOptionPane.showMessageDialog(this, "invalid login");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

UserScreen.java : This code is going to deal with the view of the user once he logins.

package BFC; import javax.swing.JFrame; import javax.swing.JPanel; import javax.swing.JLabel; import javax.swing.JTextField; import javax.swing.JButton; import java.awt.Font; import java.awt.event.ActionEvent; import java.awt.event.ActionListener; import java.awt.BorderLayout; import java.awt.Color; import javax.swing.JTextArea; import javax.swing.JScrollPane; import javax.swing.JOptionPane; import javax.swing.JFileChooser; import java.net.Socket; import java.io.ObjectOutputStream; import java.io.ObjectInputStream; import java.io.RandomAccessFile; import java.util.Random; import java.io.File; import java.io.FileInputStream; public class UserScreen extends JFrame{ JButton b1,b2,b3; JPanel p1,p2; Font f1; JTextArea area; JScrollPane jsp; Login login; String user; JFileChooser chooser; RandomAccessFile random; int tot_blocks; public UserScreen(Login log,String usr){ super("User Screen"); login = log; user = usr; p1 = new JPanel();

47 f1 = new Font("Monospaced",Font.BOLD,16); chooser = new JFileChooser(); b1 = new JButton("Upload File"); b1.setFont(f1); p1.add(b1); b1.addActionListener(new ActionListener(){ public void actionPerformed(java.awt.event.ActionEvent evt) { int option = chooser.showOpenDialog(UserScreen.this); if(option == JFileChooser.APPROVE_OPTION){ File file = chooser.getSelectedFile(); upload(file); } } }); b2 = new JButton("Download File"); b2.setFont(f1); p1.add(b2); b2.addActionListener(new ActionListener(){ public void actionPerformed(java.awt.event.ActionEvent evt) { DownloadFile df = new DownloadFile(area); df.setUser(user); df.setSize(300,100); df.setVisible(true); df.setLocationRelativeTo(null); df.getFileName(); } }); b3 = new JButton("Logout"); b3.setFont(f1); p1.add(b3); b3.addActionListener(new ActionListener(){ public void actionPerformed(java.awt.event.ActionEvent evt) { setVisible(false); login.setVisible(true); } }); p2 = new JPanel(); p2.setLayout(new BorderLayout()); area = new JTextArea(); area.setFont(f1); area.setEditable(false); jsp = new JScrollPane(area); p2.add(jsp,BorderLayout.CENTER);

48

getContentPane().add(p1,BorderLayout.NORTH); getContentPane().add(p2,BorderLayout.CENTER);

} public long getChunkSize(File file){ long length = file.length(); tot_blocks=0; long size = 0; if(length >= 1000){ size = length/10; tot_blocks = 10; } if(length < 1000 && length > 500){ size = length/5; tot_blocks = 5; } if(length < 500 && length > 1){ size = length/3; tot_blocks = 3; } return size; } public void upload(File file){ try{ FileInputStream fin = new FileInputStream(file); byte file_data[] = new byte[fin.available()]; fin.read(file_data,0,file_data.length); fin.close(); byte encrypt[] = AES.encrypt(file_data); String sha = SHA.ShaSignature(encrypt); long chunk_size = getChunkSize(file); random = new RandomAccessFile(file,"r"); Object row[][] = createChunks(chunk_size,file.getName()); random.close(); Socket socket=new Socket("localhost",1200); ObjectOutputStream out=new ObjectOutputStream(socket.getOutputStream()); ObjectInputStream in=new ObjectInputStream(socket.getInputStream()); Object req[]={"upload",row,user,file.getName(),sha,Long.toString(file.length())}; out.writeObject(req); out.flush(); Object res[]=(Object[])in.readObject(); String server_res = res[0].toString(); area.append(server_res+"\n"); out.close(); in.close(); socket.close();


    }catch(Exception e){
        e.printStackTrace();
    }
}

public Object[][] createChunks(long chunks_size, String name){
    area.append("Total chunks " + tot_blocks + " With chunk size " + chunks_size + "\n");
    Object row[][] = new Object[tot_blocks][2];
    try{
        String ext = name.substring(name.lastIndexOf(".") + 1, name.length());
        // Read the file chunk by chunk; the last chunk also absorbs any
        // bytes left over by the integer division in getChunkSize().
        for(int i = 0; i < tot_blocks; i++){
            int len = (int) chunks_size;
            if(i == tot_blocks - 1){
                len = (int)(random.length() - random.getFilePointer());
            }
            byte chunk[] = new byte[len];
            random.read(chunk, 0, chunk.length);
            // chunk name derived from the original file name and index
            row[i][0] = name.substring(0, name.lastIndexOf(".")) + i + "." + ext;
            row[i][1] = chunk;
        }
    }catch(Exception e){
        e.printStackTrace();
    }
    return row;
}

DBCon.java : This class performs all database operations, such as storing and retrieving data.

package BFC;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DBCon{
    private static Connection con;

    public static Connection getCon() throws Exception {
        if(con == null){
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://localhost/BFC", "root", "root");
        }
        return con;
    }

    public static String register(String[] input) throws Exception{
        String msg = "Error in registration";
        boolean flag = false;


        con = getCon();
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("select username from bfc_users where username='" + input[0] + "'");
        if(rs.next()){
            flag = true;
            msg = "Username already exist";
        }
        if(!flag){
            PreparedStatement stat = con.prepareStatement("insert into bfc_users values(?,?,?,?,?)");
            stat.setString(1, input[0]);
            stat.setString(2, input[1]);
            stat.setString(3, input[2]);
            stat.setString(4, input[3]);
            stat.setString(5, input[4]);
            int i = stat.executeUpdate();
            if(i > 0){
                msg = "Registration process completed";
            }
            stat.close();
        }
        rs.close();
        stmt.close();
        return msg;
    }

    public static String login(String input[]) throws Exception{
        String msg = "fail";
        con = getCon();
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("select username from bfc_users where username='" + input[0] + "' && password='" + input[1] + "'");
        if(rs.next()){
            msg = "success";
        }
        rs.close();
        stmt.close();
        return msg;
    }
}

Compress.java : Code for file compression and decompression in Java.


package BFC;

import java.io.*;
import java.util.zip.*;

public class Compress {
    static final int BUFFER = 1048;

    public static void main(String args[]){
        System.out.println(compress(new File("screens.docx")));
    }

    public static long compress(File file){
        long length = 0;
        try{
            BufferedInputStream origin = null;
            FileOutputStream dest = new FileOutputStream(file + ".compress");
            ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
            //out.setMethod(ZipOutputStream.DEFLATED);
            byte data[] = new byte[BUFFER];
            // get a list of files to add; a plain file is wrapped in an
            // array so the single-file call in main() also works
            File list[] = file.isDirectory() ? file.listFiles() : new File[]{ file };
            for(int i = 0; i < list.length; i++){
                FileInputStream fi = new FileInputStream(list[i]);
                origin = new BufferedInputStream(fi, BUFFER);
                ZipEntry entry = new ZipEntry(list[i].getName());
                out.putNextEntry(entry);
                int count;
                while((count = origin.read(data, 0, BUFFER)) != -1){
                    out.write(data, 0, count);
                }
                origin.close();
            }
            out.close();

            File fname = new File(file + ".compress");
            length = fname.length();


        }catch(Exception e){
            e.printStackTrace();
        }
        return length;
    }

    public static byte[] decompress(String file){
        byte b[] = null;
        try{
            final int BUFFER = 2048;
            BufferedOutputStream dest = null;
            FileInputStream fis = new FileInputStream(file);
            CheckedInputStream checksum = new CheckedInputStream(fis, new Adler32());
            ZipInputStream zis = new ZipInputStream(new BufferedInputStream(checksum));
            ZipEntry entry;
            while((entry = zis.getNextEntry()) != null){
                System.out.println("Extracting: " + entry);
                int count;
                byte data[] = new byte[BUFFER];
                String name = entry.getName();
                FileOutputStream fos = new FileOutputStream(name);
                dest = new BufferedOutputStream(fos, BUFFER);
                while((count = zis.read(data, 0, BUFFER)) != -1){
                    dest.write(data, 0, count);
                }
                dest.flush();
                dest.close();
                FileInputStream fin = new FileInputStream(name);
                b = new byte[fin.available()];
                fin.read(b, 0, b.length);
                fin.close();
                File fname = new File(name);
                fname.delete();
            }
            zis.close();
            System.out.println("Checksum: " + checksum.getChecksum().getValue());
        }catch(Exception e){
            e.printStackTrace();
        }


        return b;
    }
}


Chapter 6

Results

6.1 Screenshots

The following screenshots were taken during the execution of the implemented programs.

First, double-click run in the Logical layer folder; the logical layer server starts.

Next, go to the application layer folder and double-click run to start the user screen.


Click the “new user” button and register two or more users with their credentials, as follows.


The logical layer server displays an acknowledgement for the registered users.

Now log in as a user; the user can upload files to the cloud, download them, and log out.


Click the upload button and select the file to upload.


The uploaded file has been split into chunks and is saved on the cloud server in encrypted form. In a similar way, we upload the same file and a different file from another user account.

Log in as the second user and upload the file.


In the Logical layer window, we can see the users' activities, such as logins and file operations.

Click on “View File Info”.

Above, we can see the uploaded files and their information: the file name and ID, the SHA value, the reference file, the start chunk ID, the total number of chunks, the file size, and the status of the uploaded file. In this window, user “x” has uploaded the file “b.txt”, and user “a” has uploaded two files, “a.txt” and “b.txt”. The file “b.txt” was already uploaded by user “x”, so when user “a” tries to upload the same “b.txt”, it is not uploaded again; instead, it is tagged to the file already available at the server side. This is how deduplication is performed. The SHA values, the number of chunks for each file, the initial chunk ID, and the references are saved. We can view the uploaded file information at the server side as shown below:
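The content-hash deduplication described above can be sketched as follows. The `DedupIndex` class and its in-memory map are illustrative stand-ins for the server's metadata table, not the project's actual classes; only the idea of keying stored files by their SHA digest is taken from the text.

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of server-side deduplication: the SHA digest of the
// uploaded (encrypted) bytes is the lookup key; if a file with the same
// digest was already stored, the new upload is tagged as a reference to
// the existing copy instead of being stored again.
public class DedupIndex {
    // digest (hex) -> user who first uploaded the file
    private final Map<String, String> index = new HashMap<>();

    public static String sha1Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Returns true if the data was new, false if it was deduplicated.
    public boolean store(String user, byte[] data) throws Exception {
        String key = sha1Hex(data);
        if (index.containsKey(key)) {
            return false;          // same content: tag to the existing file
        }
        index.put(key, user);      // first upload: store the chunks
        return true;
    }
}
```

In the scenario from the screenshot, user “x” storing “b.txt” returns true, and user “a” storing the identical bytes returns false, which is exactly the tagging behavior described.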

The user can download an uploaded file. To download, open the user screen and click the download button; a window will pop up asking for the name of the file to download. Select the name and click download.


The file is downloaded to the D drive in decrypted form.

Metadata is stored in the logical layer metadata folder, as shown in the screenshot below.

In the Dropbox folder, each and every chunk is stored, from the last chunk to the first chunk.


In the Metadata folder, only the first and the last chunk generated for the uploaded file are stored.

Now let us see the Metadata and Dropbox comparison at the Logical layer home screen.


When uploading large files, we first compress them before storing them in the cloud in order to reduce space. Now let us see the file compression chart.
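The space saving that motivates this compression step can be demonstrated with the JDK's `Deflater`, the same DEFLATE algorithm that `ZipOutputStream` uses in Compress.java. This is a standalone in-memory sketch, independent of the project's own class; the actual ratio always depends on the input data.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Compress a byte buffer in memory with java.util.zip.Deflater.
// Highly repetitive data shrinks dramatically; already-compressed
// data (e.g. media files) may barely shrink at all.
public class CompressDemo {
    public static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));   // drain compressed bytes
        }
        d.end();
        return out.toByteArray();
    }
}
```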

6.2 Test Cases

Test Case 01: Logical layer Server (Priority: High, Status: High)
Step: Start the Logical layer server.
Expected: If the server is not started, server details will not be viewed here.
Actual: Server is started.

Test Case 02: Registration (Priority: High, Status: High)
Step: A new user enters all details to register.
Expected: If the user is not registered, the user cannot perform further operations.
Actual: Registration process is completed.

Test Case 03: Upload file (Priority: High, Status: High)
Step: The user uploads a file to the server.
Expected: If no file is uploaded, no file will be present at the server.
Actual: File is uploaded, split into chunks based on its size, and stored at the server.

Test Case 04: View File Info (Priority: High, Status: High)
Step: View the file in the logical layer window.
Expected: If no files are uploaded, the file cannot be viewed.
Actual: We can view all the file information.

Test Case 05: Download a file
Step: Select a file in the cloud and download it.
Expected: If no file can be downloaded, nothing will happen.
Actual: File is downloaded and saved in the D drive.
Table 1: Test Cases


Chapter 7

Conclusion and Future Predictions

7.1 Security in BFC

Data confidentiality is one of the crucial requirements for our system. To achieve it, this project uses the AES algorithm as well as the SHA-1 algorithm.

AES Algorithm

The Advanced Encryption Standard (AES) is one of the most widely used and most secure encryption algorithms available. It is openly available, and it is the cipher the NSA uses to secure documents with the classification “top secret”. The algorithm is based on several substitutions, permutations, and linear transformations, each executed on data blocks of 16 bytes, hence the term block cipher. These operations are repeated several times in “rounds”; during each round, a new round key is calculated from the encryption key and incorporated in the calculations. The difference between AES-128, AES-192 and AES-256 is the length of the key: 128, 192 or 256 bits, all drastic improvements over the 56-bit key of DES. Breaking a 128-bit AES key with a supercomputer could take longer than the assumed age of the universe, and no practicable attack against AES exists to this point. AES therefore remains the favored encryption standard for governments, banks and high-security systems around the world [4].
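As a minimal illustration of the block cipher described above, the JDK's standard `javax.crypto` API can perform an AES-128 round trip. The hard-coded key and ECB mode below are assumptions made purely to keep the sketch short; this is not the project's own `AES` helper class, and a real deployment should use a randomly generated key and an authenticated mode such as GCM with a random IV.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// AES-128 round trip via javax.crypto. The 16-byte key (128 bits) and
// ECB mode are for illustration only -- production code should derive
// keys securely and use AES/GCM with a fresh IV per message.
public class AesDemo {
    private static final byte[] KEY = "0123456789abcdef".getBytes(); // 16 bytes = 128 bits

    public static byte[] encrypt(byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"));
        return c.doFinal(plain);
    }

    public static byte[] decrypt(byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(KEY, "AES"));
        return c.doFinal(cipherText);
    }
}
```

Note how the output length is always a multiple of the 16-byte block size: PKCS5 padding rounds a 6-byte plaintext up to one full block.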

SHA-1

SHA-1 [32] is commonly used in cryptographic applications where data integrity is essential. It is also used to build hash indexes and to detect data corruption and checksum errors. It produces a 160-bit hash value, or message digest, from its input, analogous to the hash value of the MD5 [32] algorithm, and uses 80 rounds of cryptographic operations to compute the digest. Some of the protocols that use SHA-1 include: ● Transport Layer Security (TLS) ● Secure Sockets Layer (SSL) ● Pretty Good Privacy (PGP) ● Secure Shell (SSH) ● Secure/Multipurpose Internet Mail Extensions (S/MIME) ● Internet Protocol Security (IPSec)
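The 160-bit digest described above can be computed with the JDK's `MessageDigest`. This sketch mirrors what the project's `SHA.ShaSignature` helper does conceptually; that helper is not reproduced here, so the method name below is illustrative.

```java
import java.security.MessageDigest;

// SHA-1 via java.security.MessageDigest: 160 bits (20 bytes) out,
// conventionally rendered as 40 lowercase hex characters.
public class Sha1Demo {
    public static String sha1Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}
```

For example, the well-known test vector for the input "abc" is `a9993e364706816aba3e25717850c26c9cd0d89d`.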


7.2 Conclusion

BFC designs simple metadata in order to create a high performance cloud storage. Every file in the system has the same size of metadata, independent of file size. In BFC, every big file is split into numerous fixed-size chunks. The chunks of a file have a contiguous ID range, so it is easy to distribute data and scale out the storage system. The data deduplication method of BFC uses the SHA-2 hash function to quickly detect duplicate data on the server side, which saves storage space and network bandwidth when various users upload the same static data. Compression is implemented as an extension of the project, and the project is extendable to accommodate decompression.
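The fixed-size chunks with a contiguous ID range mentioned above are what keep the per-file metadata constant: given a file's start chunk ID and the chunk size, any chunk ID can be derived arithmetically rather than stored. A minimal sketch of that arithmetic follows (class and method names are illustrative, not taken from BFC's code):

```java
// With fixed-size chunks and a contiguous ID range, per-file metadata
// only needs the start chunk ID, the chunk size and the file length;
// every other chunk ID is computed, so metadata stays O(1) per file.
public class ChunkMath {
    // Number of chunks needed to cover fileSize bytes (ceiling division).
    public static long chunkCount(long fileSize, long chunkSize) {
        return (fileSize + chunkSize - 1) / chunkSize;
    }

    // ID of the chunk that contains the given byte offset of the file.
    public static long chunkIdForOffset(long startChunkId, long chunkSize, long offset) {
        return startChunkId + offset / chunkSize;
    }
}
```

For instance, a 1001-byte file with 100-byte chunks occupies 11 chunk IDs, and byte 250 of a file whose range starts at ID 500 lives in chunk 502.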

7.3 Future Predictions

Predictions for the future of the cloud are as follows:

Developers are building much new software for the cloud, and it is predicted that the differentiation between public and private clouds will disappear. We can also predict that in the coming years almost a quarter of all applications will be accessible on the cloud. This is plausible given that approximately 60 percent of enterprises are expected to spend more than 10 percent of their budget on cloud services (as predicted by IBM).

Cloud-based services are developing rapidly and have become a rising trend in the information technology field. As a result, enterprises rely more and more on cloud services for product development, marketing and sales, and supply chain management. Software as a Service (SaaS) is estimated to grow at an outstanding yearly rate of 20.2 percent, from $18.2 billion in 2012 to $45.6 billion in 2017 [14].

Hybrid cloud adoption will also increase: an estimated 50 percent of enterprises will implement hybrid clouds by 2017. The hybrid cloud, a combination of on- and off-premises resources, provides the best of both worlds, letting organizations attain the performance of on-premises solutions along with the management assistance of the cloud business model [12]. One example of such a dynamic cloud is connecting new apps to existing systems, such as running new mobile social apps on clouds [15].

Development for the cloud will also increase. According to Evans Data Corporation [13], there are currently more than 18 million software developers globally, yet fewer than 25 percent develop for the cloud. As the cloud continues to grow rapidly, more developers will develop for it; this is evident from the fact that 85 percent of the new software being built today is for cloud computing. Going forward, growth is expected among third-party, commercial and enterprise developers and in contributions to cloud application systems.


Rising competition among cloud storage systems is expected to lead to better products, services and innovation. When developers work collectively as a group, focusing on the quality of the work they deliver, innovative cloud services will emerge from code collaboration and dynamic developer ecosystems [30].

End users are drifting from separate physical servers connected to storage area networks toward more composed reference architectures that include every component essential to run their applications. Previously, servers and storage were sold as independent IT infrastructure elements; the shift to the cloud has changed that. Since everything is converged into one single solution and virtualized on a converged network, there may no longer be any need for a conventional storage-only network [29].


References

[1] Dropbox tech blog. https://tech.dropbox.com/. Accessed October 28, 2014.
[2] Zing Me. http://me.zing.vn. Accessed October 28, 2014.
[3] Facebook. http://facebook.com, 2014.
[4] Federal Information Processing Standards Publication 197: Advanced Encryption Standard (AES). November 26, 2001.
[5] I. Drago, E. Bocchi, M. Mellia, H. Slatman, and A. Pras. Benchmarking personal cloud storage. In Proceedings of the 2013 Internet Measurement Conference, pages 205–212. ACM, 2013.
[6] I. Drago, M. Mellia, M. M. Munafo, A. Sperotto, R. Sadre, and A. Pras. Inside Dropbox: understanding personal cloud storage services. In Proceedings of the 2012 ACM Internet Measurement Conference, pages 481–494. ACM, 2012.
[7] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In ACM SIGOPS Operating Systems Review, volume 37, pages 29–43. ACM, 2003.
[8] Thanh Trung Nguyen, Tin Khac Vu, Minh Hieu Nguyen. Distributed Big File Cloud Storage based on key-value store. Le Quy Don Technical University, Ha Noi, Viet Nam; VNG Research, Viet Nam.
[9] International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 12, December 2015.
[10] Yan Kit Li, Xiaofeng Chen, Patrick P. C. Lee. Secured Authorized Deduplication Based Hybrid Cloud Approach.
[11] Oracle. http://www.oracle.com/technetwork/articles/java/compress-1565076.html
[12] SDLC. http://www.veracode.com/security/software-development-lifecycle
[13] http://archive.thoughtsoncloud.com/2014/05/explained-dynamic-cloud-kids/
[14] http://evansdata.com/
[15] http://www.thoughtsoncloud.com/2014/03/what-is-software-as-a-service-saas/
[16] http://archive.thoughtsoncloud.com/2014/05/building-hybrid-cloud-3-ways-dynamic-cloud-powers-innovation
[17] J. Stanek, A. Sorniotti, E. Androulaki, and L. Kencl. A secure data deduplication scheme for cloud storage. 2014.
[18] M. Placek and R. Buyya. A taxonomy of distributed storage systems. Technical report, University of Melbourne, Grid Computing and Distributed Systems Laboratory, 2006.
[19] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI ’06, pages 307–320, Berkeley, CA, USA, 2006. USENIX Association.
[20] D. Borthakur. HDFS architecture guide. Hadoop Apache Project, http://hadoop.apache.org/common/docs/current/hdfs_design.pdf, 2008.
[21] https://storageservers.wordpress.com/2016/01/26/who-uses-data-deduplication-and-why/
[22] SHA-2 hash. https://w2.eff.org/Privacy/Digital_signature/?f=fips_sha_shs.standard.txt
[23] NISO. http://www.niso.org/publications/press/UnderstandingMetadata.pdf
[24] WinZip Computing Announces WinZip 16: The Easiest Way to Send Large Files Fast. New ZipSend and ZipShare services mark WinZip's expansion into file sharing through email and Facebook.

[25] http://www.freebsd.org/cgi/man.cgi?gzip
[26] Oracle. http://docs.oracle.com/javase/6/docs/technotes/guides/jar/index.html
[27] D. Karger, A. Sherman, A. Berkheimer, B. Bogstad, R. Dhanidina, K. Iwamoto, B. Kim, L. Matkins, and Y. Yerushalmi. Web caching with consistent hashing. Computer Networks, 31(11):1203–1213, 1999.
[28] http://www.androidauthority.com/best-cloud-storage-apps-for-android-657338/
[29] http://www.computerworld.com/article/2473557/data-center/will-cloud-computing-kill-the-storage-area-network-.html
[30] http://www.thoughtsoncloud.com/2014/05/future-cloud-computing-5-predictions/
[31] http://dev.mysql.com/doc/refman/5.6/en/mysql.html
[32] Marc Stevens. Attacks on Hash Functions and Applications. 19 June 2012.
