Provable Ownership of Encrypted Files in De-Duplication Cloud Storage
Chao Yang†‡, Jianfeng Ma† and Jian Ren‡
†School of CS, Xidian University, Xi’an, Shaanxi, 710071. Email: {chaoyang, [email protected]
‡Department of ECE, Michigan State University, East Lansing, MI 48824. Email: {chaoyang, [email protected]

Abstract: The rapid adoption of cloud storage services has created the problem that many duplicate copies of files are stored on remote storage servers, which not only wastes communication bandwidth on uploading duplicate files but also increases the cost of security data management. To solve this problem, client-side deduplication was introduced to prevent the client from uploading files that already exist on the remote servers. However, the existing scheme was recently found to be vulnerable to security attacks: by learning a small piece of information related to the file, such as its hash value, an attacker may be able to gain full access to the entire file, and the confidentiality of the data may be vulnerable to “honest-but-curious” attacks. In this paper, to solve the problems mentioned above, we propose a cryptographically secure and efficient scheme to support cross-user client-side deduplication over encrypted files. Our scheme utilizes spot checking, in which the client only needs to access small portions of the original file, dynamic coefficients, randomly chosen indices of the original files, and a subtle approach to distributing the file encrypting key among clients to satisfy the security requirements. Our extensive security analysis shows that the proposed scheme can generate provable ownership of the encrypted file (POEF) in the presence of the curious server, and maintain a high detection probability of client misbehavior. Both performance analysis and simulation results demonstrate that our proposed scheme is much more efficient than the existing schemes, especially in reducing the burden on the client.

Index Terms: Cloud storage, Deduplication, Encrypted File, Provable Ownership, Spot-checking

I. INTRODUCTION

With the rapid adoption of cloud services, a large volume of data is stored at remote servers, so techniques to save disk space and network bandwidth are needed. A key concept in this context is deduplication, in which the server stores only a single copy of a file, regardless of how many clients want to store that file. All clients possessing that file only use the link to the single copy of the file stored at the server. Furthermore, if the server already has a copy of the file, then clients do not have to upload it again, which saves bandwidth as well as storage capacity. This is called client-side deduplication [1] and is extensively employed by Dropbox [2] and Wuala [3]. It is reported that business applications can achieve deduplication ratios from 1:10 to as much as 1:500, resulting in disk and bandwidth savings of more than 90% [4].

However, client-side deduplication introduces some new security problems. Harnik et al. found that when a server tells a client that it does not have to upload the file, it means that some other clients have the same file, which could be sensitive information [5]. More seriously, Halevi et al. recently found some new attacks on the client-side deduplication system [6]. In these attacks, by learning just a small piece of information about the file, namely its hash value, an attacker is able to get the entire file from the server. These attacks are not just theoretical. Similar attacks implemented against Dropbox were also recently discovered by Mulazzani et al. [7]. The root cause of all these attacks is that there is a very short piece of information that represents the file, and an attacker that learns it can get access to the entire file.

Furthermore, the confidentiality of users’ sensitive data against the cloud storage server in client-side deduplication is another serious security problem. There are two kinds of straightforward methods: i) The user’s sensitive data can be encrypted by the cloud storage server, which chooses and maintains the encrypting key. But it is reported that Dropbox, a well-known cloud storage service, mistakenly kept all user data open to the public for almost 4 hours due to a new bug in its software [8]. It is also reported that a bug was discovered in Twitter’s client software that allows an adversary to access users’ private data [9]. ii) If users’ data are encrypted on the client side and the encrypting key is kept away from the cloud storage server, then there will be no such failure of privacy protection of sensitive data, even if the cloud storage server makes such mistakes or is hacked.

However, the second kind of straightforward client-side encryption with a randomly chosen encrypting key will stop deduplication [5]. The reason is twofold: 1) The cloud storage server no longer possesses the original file in plaintext, so it is hard for the server to authenticate whether a new client has the proof of ownership of the original file. 2) Encryptions of the same file by different users with different encrypting keys will result in different ciphertexts, which will prevent deduplication across multiple users from happening.

Recently there are only a few solutions to the new security problems mentioned above. Mulazzani et al. [7] discovered and implemented a new attack against Dropbox’s deduplication system and proposed a preliminary and simple revision to the communication protocol of Dropbox; Halevi et al. [6] put forward the proof-of-ownership (POW) model, in which, based on Merkle Hash Trees and error-control encoding, a client can prove to a server that it indeed has a copy of a file without actually uploading it. However, neither of the two methods above is able to tackle the problem of the confidentiality of users’ sensitive data against the cloud storage server. To overcome this deficiency, Jia et al. [10] recently proposed a solution to support cross-user client-side deduplication proof over encrypted data. Specifically, they proposed a method to distribute a randomly chosen per-file encrypting key to all owners of the same file, and combined it with the POW proof method [6] to form a new scheme. However, this combination makes their scheme inherit the drawbacks of the POW proof method: the scheme cannot guarantee the freshness of the proof in every challenge and has to build a Merkle Hash Tree on the original file, which is inherently inefficient. Moreover, their scheme fails to provide enough security protection against key exposure, because it encrypts the file encrypting key only with a static and constant hash value of the original file in all key distribution processes. As a result, the applicability of these schemes in the scenario of client-side deduplication over encrypted data is greatly limited.

In this paper, to solve the problem in the scenario of client-side deduplication over encrypted data mentioned above, we propose a cryptographically secure and efficient solution where a client proves to the server that it indeed has the encrypted file, which is called a Provable Ownership of Encrypted File (POEF). We achieve efficiency by relying on spot checking [20], in which the client only needs to access small portions of the original file to generate the proof of possessing the original file correctly, thus greatly reducing the burden of computation on the client and minimizing the I/O between the client and the server. At the same time, by utilizing dynamic coefficients and randomly chosen indices of the original files, our scheme mixes the sampled portions of the original file with the dynamic coefficients to generate a unique proof in every challenge. Furthermore, our scheme proposes

Computation requirements. The server typically has to handle a large number of files concurrently, so the solution should not impose too much burden on the server, even though it has more powerful computation capability. On the other hand, the client has limited storage as well as computation resources, and it is the leading actor in the deduplication scenario, which has to prove to the server that it possesses exactly the same file already stored at the server. So, the design of the solution should pay more attention to reducing the burden on the client in terms of computation and storage resources and, at the same time, keep the burden on the server at a relatively low level.

Security requirements. Although the server has only the encrypted data without the file encrypting key, the solution must insist that the verification and proof be based on the availability of the original data in its original form, instead of any stored message authentication code (MAC) or previously used verification results. Moreover, the requested parts of the original file should be randomly chosen every time and the generated proof should be totally different in each challenge, so that it is infeasible for anybody to forge or prepare the proof in advance and satisfy the verification challenge. Furthermore, the file encrypting key should be encrypted with fresh and different keys in every process of key distribution between clients, minimizing the risk of key exposure.

B. System Model

A typical network architecture for cloud data storage is illustrated in Figure 1. There are two entities as follows:

Storage Server. It provides cloud storage service to all kinds of users. Its computation and storage capability (CPU, I/O, network, etc.) is stronger than that of any single user. The Storage Server maintains the integrity of users’ data, regardless of whether it is plaintext or ciphertext, and the availability of the cloud service.

Client Users.