
Scalable and Efficient Provable Data Possession

Giuseppe Ateniese, The Johns Hopkins University, Department of Computer Science, [email protected]
Roberto Di Pietro∗, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, [email protected]
Luigi V. Mancini, Università di Roma "La Sapienza", Dipartimento di Informatica, [email protected]
Gene Tsudik, University of California Irvine, Department of Computer Science, [email protected]

∗The author is solely responsible for the views expressed in this paper, which do not necessarily reflect the position of UNESCO nor commit that organization. He is also with Università di Roma Tre, Dipartimento di Matematica.

ABSTRACT

Storage outsourcing is a rising trend which prompts a number of interesting security issues, many of which have been extensively investigated in the past. However, Provable Data Possession (PDP) is a topic that has only recently appeared in the research literature. The main issue is how to frequently, efficiently and securely verify that a storage server is faithfully storing its client's (potentially very large) outsourced data. The storage server is assumed to be untrusted in terms of both security and reliability. (In other words, it might maliciously or accidentally erase hosted data; it might also relegate it to slow or off-line storage.) The problem is exacerbated by the client being a small computing device with limited resources. Prior work has addressed this problem using either public key cryptography or by requiring the client to outsource its data in encrypted form.

In this paper, we construct a highly efficient and provably secure PDP technique based entirely on symmetric key cryptography, while not requiring any bulk encryption. Also, in contrast with its predecessors, our PDP technique allows outsourcing of dynamic data, i.e., it efficiently supports operations such as block modification, deletion and append.

Categories and Subject Descriptors
H.3.2 [Information Storage and Retrieval]: Information Storage; E.3 [Data Encryption]

General Terms
Security, Performance

Keywords
Provable data possession, probabilistic algorithm, archival storage, storage update, storage security

1. INTRODUCTION

In recent years, the concept of third-party data warehousing and, more generally, data outsourcing has become quite popular. Outsourcing of data essentially means that the data owner (client) moves its data to a third-party provider (server) which is supposed to – presumably for a fee – faithfully store the data and make it available to the owner (and perhaps others) on demand. Appealing features of outsourcing include reduced costs from savings in storage, maintenance and personnel, as well as increased availability and transparent up-keep of data.

A number of security-related research issues in data outsourcing have been studied in the past decade. Early work concentrated on data authentication and integrity, i.e., how to efficiently and securely ensure that the server returns correct and complete results in response to its clients' queries [14, 25]. Later research focused on outsourcing encrypted data (placing even less trust in the server) and the associated difficult problems, mainly having to do with efficient querying over the encrypted domain [20, 32, 9, 18].

More recently, however, the problem of Provable Data Possession (PDP) – also sometimes referred to as Proof of Data Retrievability (POR) – has appeared in the research literature. The central goal in PDP is to allow a client to efficiently, frequently and securely verify that a server – which purportedly stores the client's potentially very large amount of data – is not cheating the client. In this context, cheating means that the server might delete some of the data, or it might not store all data in fast storage, e.g., place it on CDs or other tertiary off-line media. It is important to note that a storage server might not be malicious; instead, it might be simply unreliable and lose or inadvertently corrupt hosted data. An effective PDP technique must be equally applicable to malicious and unreliable servers.
The problem is further complicated by the fact that the client might be a small device (e.g., a PDA or a cell-phone) with limited CPU, battery power and communication facilities. Hence the need to minimize bandwidth and local computation overhead for the client in performing each verification.

Two recent results, PDP [2] and POR [21], have highlighted the importance of the problem and suggested two very different approaches. The first [2] is a public-key-based technique allowing any verifier (not just the client) to query the server and obtain an interactive proof of data possession. This property is called public verifiability. The interaction can be repeated any number of times, each time resulting in a fresh proof. The POR scheme [21] uses special blocks (called sentinels) hidden among other blocks in the data. During the verification phase, the client asks for randomly picked sentinels and checks whether they are intact. If the server modifies or deletes parts of the data, then sentinels would also be affected with a certain probability. However, sentinels must be indistinguishable from other regular blocks; this implies that the blocks must be encrypted. Thus, unlike the PDP scheme in [2], POR cannot be used for public databases, such as libraries, repositories, or archives; in other words, its use is limited to confidential data. In addition, the number of queries is limited and fixed a priori, because sentinels, and their positions within the database, must be revealed to the server at each query – a revealed sentinel cannot be reused.

Motivation.
As pointed out in [16], data generation is currently outpacing storage availability; hence, there will be more and more need to outsource data.

A particularly interesting application for PDP is distributed data store systems. Consider, for example, the Freenet network [10], or the Cooperative Internet Backup Scheme [24]. Such systems rely and thrive on free storage. One reason to misbehave in such systems arises if storage space has real monetary cost attached to it. Moreover, in a distributed backup system, a user who grants usage of some of its own space to store other users' backup data is normally granted usage of a proportional amount of space somewhere else in the network, for his own backup. Hence, a user might try to obtain better redundancy for his own data. Furthermore, users will only place trust in such a system as long as the storage can be consistently relied upon, which is difficult in the event of massive cheating. A PDP scheme could act as a powerful deterrent to cheating, thus increasing trust in the system and helping spread its popularity and usage. Finally, note that the same considerations apply to alternative service models, such as peer-to-peer data archiving [11], where new forms of assurance of data integrity and data accessibility are needed. (Though simple replication offers one path to higher-assurance data archiving, it also involves unnecessary, and sometimes unsustainable, overhead.)

Another application of PDP schemes is in the context of censorship-resistance. There are cases where a duly authorized third party (censor) may need to modify a document in some controlled and limited fashion [3]. However, an important problem is to prevent unauthorized modifications. In this case, there are some well-known solutions, such as [33, 1] and [34]. But they either require data replication or on-demand computation of a function (e.g., a hash) over the entire outsourced data. Though the latter might seem doable, it does not scale to petabytes and terabytes of data. In contrast, a well-designed PDP scheme would be, at the same time, secure and scalable/efficient.

A potentially interesting new angle for motivating secure and efficient PDP techniques is the outsourcing of personal digital content. (For example, consider the rapidly increasing popularity of personal content outsourcing services, such as .MAC, GMAIL, PICASA and OFOTO.) Consider the following scenario: Alice wants to outsource her life-long collection of digital content (e.g., photos, videos, art) to a third party, giving read access to her friends and family. The combined content is precious to Alice and, whether or not the outsourcing service is free, Alice wants to make sure that her data is faithfully stored and readily available. To verify data possession, Alice could use a resource-constrained personal device, e.g., a PDA or a cell-phone. In this realistic setting, our two design requirements – (1) outsourcing data in cleartext and (2) bandwidth and computation efficiency – are very important.

Contributions.
This paper's contribution is two-fold:

1. Efficiency and Security: the proposed PDP scheme, like [21], relies only on efficient symmetric-key operations in both the setup (performed once) and verification phases. However, our scheme is more efficient than POR, as it requires no bulk encryption of outsourced data and no data expansion due to additional sentinel blocks (see Section 5 for details). Our scheme provides probabilistic assurance of the untampered data being stored on the server, with the exact probability fixed at setup, as in [2, 21]. Furthermore, our scheme is provably secure in the random oracle model (ROM).

2. Dynamic Data Support: unlike both prior results [2, 21], the new scheme supports secure and efficient dynamic operations on outsourced data blocks, including: modification, deletion and append. (These operations involve slightly higher computational and bandwidth costs with respect to our basic scheme.) Supporting such operations is an important step toward practicality, since many envisaged applications are not limited to data warehousing, i.e., dynamic operations need to be provided.

Roadmap.
The next section introduces our approach to provable data possession and discusses its effectiveness and security. Section 3 extends and enhances it to support dynamic outsourced data (i.e., operations such as block update, append and delete). Then, Section 4 discusses some features of our proposal, followed by Section 5, which overviews related work. Finally, Section 6 includes a discussion of our results and outlines avenues for future work.
namic operations on outsourced data blocks, includ- Such systems rely and thrive on free storage. One reason ing: modification, deletion and append.1 Supporting to misbehave in such systems would be if storage space had such operations is an important step toward practical- real monetary cost attached to it. Moreover, in a distributed ity, since many envisaged application are not limited backup system, a user who grants usage of some of its own to data warehousing, i.e., dynamic operations need to space to store other users’ backup data is normally granted be provided. usage of a proportional amount of space somewhere else in the network, for his own backup. Hence, a user might try to obtain better redundancy for his own data. Furthermore, Roadmap. users will only place trust in such a system as long as the The next section introduces our approach to provable data storage can be consistently relied upon, which is difficult in possession and discusses its effectiveness and security; Sec- the event of massive cheating. A PDP scheme could act tion 3 extends and enhances it to support dynamic out- as a powerful deterrent to cheating, thus increasing trust sourced data (i.e., operations such as block update, append in the system and helping spread its popularity and usage. and delete). Then, Section 4 discusses some features of Finally, note that same considerations can be applied to al- our proposal, followed by Section 5 which overviews related ternative service models, such as peer-to-peer data archiving work. Finally, Section 6 includes a discussion of our results [11], where new forms of assurance of data integrity and data and outlines avenues for future work. accessibility are needed. (Though simple replication offers one path to higher-assurance data archiving, it also involves unnecessary, and sometimes unsustainable, overhead.) 2. PROPOSED PDP SCHEME Another application of PDP schemes is in the context of In this section we describe the proposed scheme. It con- censorship-resistance. There are cases where a duly autho- sists of two phases: setup and verification (also called chal- rized third party (censor) may need to modify a document lenge in the literature). in some controlled and limited fashion [3]. However, an im- portant problem is to prevent unauthorized modifications. 2.1 Notation In this case, there are some well-known solutions, such as [33, 1] and [34]. But, they either require data replication • D – outsourced data. We assume that D can be rep- or on-demand computation of a function (e.g., a hash) over resented as a single contiguous file of d equal-sized the entire outsourced data. Though the latter might seem blocks: D[1],...,D[d]. The actual bit-length of a block doable, it does not scale to petabytes and terabytes of data. is not germane to the scheme. In contrast, a well-designed PDP scheme would be, at the same time, secure and scalable/efficient. •OWN – the owner of the data. A potentially interesting new angle for motivating secure and efficient PDP techniques is the outsourcing of personal • SRV – the server, i.e., the entity that stores out- digital content. (For example, consider the rapidly increas- sourced data on owner’s behalf. ing popularity of personal content outsourcing services, such as .MAC, GMAIL, PICASA and OFOTO.) Consider the fol- • H(·) – cryptographic . In practice, we lowing scenario: Alice wants to outsource her life-long col- use standard hash functions, such as SHA-1, SHA-2, lection of digital content (e.g. 
2.2 General Idea

Our scheme is based entirely on symmetric-key cryptography. The main idea is that, before outsourcing, OWN pre-computes a certain number of short possession verification tokens, each token covering some set of data blocks. The actual data is then handed over to SRV. Subsequently, when OWN wants to obtain a proof of data possession, it challenges SRV with a set of random-looking block indices. In turn, SRV must compute a short integrity check over the specified blocks (corresponding to the indices) and return it to OWN. For the proof to hold, the returned integrity check must match the corresponding value pre-computed by OWN. In our scheme, OWN has the choice of either keeping the pre-computed tokens locally or outsourcing them – in encrypted form – to SRV. Notably, in the latter case, OWN's storage overhead is constant regardless of the size of the outsourced data. Our scheme is also very efficient in terms of computation and bandwidth.

2.3 Setup Phase

We start with a database D divided into d blocks. We want to be able to challenge the storage server t times. We use a pseudo-random function f and a pseudo-random permutation g defined as:

f : {0,1}^c × {0,1}^k → {0,1}^L
g : {0,1}^l × {0,1}^L → {0,1}^l

In our case, l = log d, since we use g to permute indices, and c = log t. We note that both f and g can be built out of standard block ciphers, such as AES; in this case, L = 128. We use the PRF f with two master secret keys W and Z, both of k bits. The key W is used to generate session permutation keys, while Z is used to generate challenge nonces.

During the Setup phase, the owner OWN generates in advance t possible random challenges and the corresponding answers. These answers are called tokens. To produce the i-th token, the owner generates a set of r indices as follows:

1. First, generate a permutation key k_i = f_W(i) and a challenge nonce c_i = f_Z(i).
2. Then, compute the set of indices {I_j ∈ [1, ..., d] | 1 ≤ j ≤ r}, where I_j = g_{k_i}(j).
3. Finally, compute the token v_i = H(c_i, D[I_1], ..., D[I_r]).

Basically, each token v_i is the answer we expect to receive from the storage server whenever we challenge it on the randomly selected data blocks D[I_1], ..., D[I_r]. The challenge nonce c_i is needed to prevent potential pre-computations performed by the storage server. Notice that each token is the output of a cryptographic hash function, so its size is small.

Once all tokens are computed, the owner outsources the entire set to the server, along with the file D, after encrypting each token with an authenticated encryption function. (Alternatively, these values could be stored by the owner itself if the number of queries is not too large; this would require the owner to store just a short hash value per query.) The setup phase is shown in detail in Algorithm 1.

Algorithm 1: Setup phase
begin
  Choose parameters c, l, k, L and functions f, g;
  Choose the number t of tokens;
  Choose the number r of indices per verification;
  Generate random master keys W, Z, K ∈ {0,1}^k;
  for (i ← 1 to t) do
    begin Round i
      1: Generate k_i = f_W(i) and c_i = f_Z(i)
      2: Compute v_i = H(c_i, D[g_{k_i}(1)], ..., D[g_{k_i}(r)])
      3: Compute v'_i = AE_K(i, v_i)
    end
  Send to SRV: (D, {[i, v'_i] for 1 ≤ i ≤ t})
end
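For concreteness, here is a minimal sketch of Algorithm 1 under assumed instantiations: HMAC-SHA256 for the PRF f, the g() sketched in Section 2.1, SHA-256 for H, and AES-GCM (from the pycryptodome package) as the authenticated encryption AE_K. Function and variable names are ours, not the paper's.

# Sketch of the Setup phase (Algorithm 1); the primitives are assumed
# instantiations, not mandated by the paper.
import hashlib, hmac, os
from Crypto.Cipher import AES  # pycryptodome

def f(key: bytes, i: int) -> bytes:
    """PRF: derives per-round values k_i = f_W(i) and c_i = f_Z(i)."""
    return hmac.new(key, i.to_bytes(8, 'big'), hashlib.sha256).digest()

def make_token(W: bytes, Z: bytes, i: int, D: list, r: int) -> bytes:
    """v_i = H(c_i, D[I_1], ..., D[I_r]) with I_j = g_{k_i}(j)."""
    k_i, c_i = f(W, i), f(Z, i)
    h = hashlib.sha256(c_i)
    for j in range(r):
        h.update(D[g(k_i, j, len(D))])  # g() from the PRP sketch above
    return h.digest()

def setup(D: list, t: int, r: int):
    W, Z, K = os.urandom(32), os.urandom(32), os.urandom(16)
    tokens = []
    for i in range(1, t + 1):
        v_i = make_token(W, Z, i, D, r)
        nonce = os.urandom(12)
        ct, tag = AES.new(K, AES.MODE_GCM, nonce=nonce) \
                     .encrypt_and_digest(i.to_bytes(8, 'big') + v_i)
        tokens.append((i, nonce, ct, tag))  # encodes v'_i = AE_K(i, v_i)
    return (W, Z, K), tokens  # OWN keeps only the keys; (D, tokens) go to SRV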
2.4 Verification Phase

To perform the i-th proof of possession verification, OWN begins by re-generating the i-th token key k_i, as in step 1 of Algorithm 1. Note that OWN only needs to store the master keys W, Z and K, plus the current token index i. It also re-computes c_i as above. Then, OWN sends SRV both k_i and c_i (step 2 in Algorithm 2). Having received the message from OWN, SRV computes:

z = H(c_i, D[g_{k_i}(1)], ..., D[g_{k_i}(r)])

SRV then retrieves v'_i and returns [z, v'_i] to OWN who, in turn, computes v = AE^{-1}_K(v'_i) and checks whether v = (i, z). If the check succeeds, OWN assumes that SRV is storing all of D with a certain probability.

Algorithm 2: Verification phase
begin Challenge i
  1: OWN computes k_i = f_W(i) and c_i = f_Z(i)
  2: OWN sends {k_i, c_i} to SRV
  3: SRV computes z = H(c_i, D[g_{k_i}(1)], ..., D[g_{k_i}(r)])
  4: SRV sends {z, v'_i} to OWN
  5: OWN extracts v from v'_i. If decryption fails or v ≠ (i, z), then REJECT.
end

We point out that there is almost no cost for OWN to perform a verification: it only needs to re-generate the appropriate [k_i, c_i] pair (two PRF invocations) and perform one decryption in order to check the reply from SRV. Furthermore, the bandwidth consumed by the verification phase is constant (in both steps 2 and 4 of Algorithm 2). This represents truly minimal overhead. The computation cost for SRV, though slightly higher (r PRPs on short inputs, and one hash), is still very reasonable.
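The matching verification round, continuing the sketch above (same assumed primitives and token format as the setup sketch):

# Sketch of Algorithm 2. OWN re-derives (k_i, c_i) and sends them to SRV;
# SRV answers with z and the stored encrypted token v'_i.
def srv_respond(D: list, k_i: bytes, c_i: bytes, r: int) -> bytes:
    """SRV recomputes z = H(c_i, D[g_{k_i}(1)], ..., D[g_{k_i}(r)])."""
    h = hashlib.sha256(c_i)
    for j in range(r):
        h.update(D[g(k_i, j, len(D))])
    return h.digest()

def own_verify(keys, i: int, z: bytes, token) -> bool:
    """OWN decrypts v'_i and accepts only if v = (i, z)."""
    W, Z, K = keys
    _, nonce, ct, tag = token
    try:
        v = AES.new(K, AES.MODE_GCM, nonce=nonce).decrypt_and_verify(ct, tag)
    except ValueError:   # forged or corrupted token: REJECT
        return False
    return v == i.to_bytes(8, 'big') + z

# One round: k_i, c_i = f(W, i), f(Z, i); OWN accepts iff
# own_verify(keys, i, srv_respond(D, k_i, c_i, r), tokens[i - 1]) is True.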
2.5 Detection Probability

We now consider the probability P_esc that one run of our verification protocol completes successfully even though SRV has deleted, or tampered with, m data blocks. Note that SRV avoids detection only if none of the r data blocks involved in the i-th token verification process are deleted or modified. Thus, we have:

P_esc = (1 - m/d)^r    (1)

For example, if the fraction of corrupted blocks (m/d) is 1% and r = 512, then the probability of avoiding detection is below 0.6%. This is in line with the detection probability expressed in [2].

We stress that, as in [2], we are considering economically-motivated adversaries that are willing to modify or delete a percentage of the file; in this context, deleting or modifying a few bits won't provide any financial benefit. If detection of any modification or deletion of small parts of the file is important, then erasure codes could be used: the file is first encoded, and then the PDP protocol is run on the encoded file. Erasure codes in the context of checking storage integrity were used, for example, in OceanStore [23], PASIS [19], and by Schwarz and Miller [28] (see also references therein). Erasure codes help with minor changes to the file and possibly allow recovery of the entire original file from a subset of the blocks of the encoded file (as in [21]). However, erasure codes (even near-optimal ones) are very costly and impractical for files with a very large number of blocks (the type of files we are considering here). In addition, encoding will (1) significantly expand the original file and (2) render impractical any operations on dynamic files, such as updating, deleting, and appending file blocks.

Notice that the discussion above concerns erasure codes applied to the file by the client, and not by the server. Obviously, the server may (and should) use erasure codes independently, to prevent data corruption resulting from unreliable storage. Encoding the file via appropriate redundancy codes is a topic orthogonal to PDP, and we do not consider it further.
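Equation (1) is easy to sanity-check numerically; the snippet below reproduces the example figures given above:

# Escape probability of Equation (1): SRV passes one challenge although
# m of the d blocks are corrupted and the token covers r random blocks.
def p_escape(m: int, d: int, r: int) -> float:
    return (1.0 - m / d) ** r

print(p_escape(m=100, d=10_000, r=512))   # ~0.0058, i.e. below 0.6% as stated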
2.6 Security Analysis

We follow the security definitions in [2, 21], and in particular the one in [2], which is presented in the form of a security game. In practice, we need to prove that our protocol is a type of proof of knowledge of the queried blocks, i.e., if the adversary passes the verification phase, then we can extract the queried blocks. The most generic (extraction-based) definition that does not depend on the specifics of the protocol is introduced in [21]. Another formal definition, for the related concept of sub-linear authentication, is provided in [26].

We have a challenger that plays a security game (experiment) with the adversary A. There is a Setup phase where A is allowed to select data blocks D[i] for 1 ≤ i ≤ d. In the Verification phase, the challenger selects a nonce and r random indices and sends them to A. Then the adversary generates a proof of possession P for the blocks queried by the challenger. If P passes the verification, then the adversary wins the game. As in [2], we say that our scheme is a Provable Data Possession scheme if, for any probabilistic polynomial-time adversary A, the probability that A wins the PDP security game on a set of blocks is negligibly close to the probability that the challenger can extract those blocks. In this type of proof of knowledge, a "knowledge extractor" can extract the blocks from A via several queries, even by rewinding the adversary, which does not keep state.

At first, it may seem that we do not need to model the hash function as a random oracle and that we might just need its collision-resistance property (assuming we are using a "good" standard hash function and not some contrived hash scheme). However, the security definition for PDP is an extraction-type definition, since we are required to extract the blocks that were challenged. So we need the random oracle model, but not in a fundamental way: we just need the ability to see the inputs to the random oracle; we are not using its programmability feature (i.e., we are not "cooking up" values as outputs of the random oracle). We need to see the inputs so that we can extract the queried data blocks. Since we are using just a random oracle, our proof does not rely on any cryptographic assumption; it is information-theoretic.

We show the following:

Theorem 1. Our scheme is a secure Provable Data Possession scheme in the random oracle model.

Proof sketch. We assume that AE_K(·) is an ideal authenticated encryption. This implies that, given AE_K(X), the adversary cannot see or alter X; thus we can assume that X is stored directly by the challenger (i.e., there is no need for the adversary to send X to the challenger, and we can remove AE_K(X) from the picture). More formally, our game is equivalent to the following:

• A simulator S sets up a PDP system and chooses its security parameters.

• The adversary A selects values x_1, ..., x_n and sends them to the simulator S.

• The adversary can query the random oracle at any point in time. For each input to the random oracle, the simulator replies with a random value and stores the input and the corresponding output in a table (to be consistent with its replies).

• At the challenge phase, the simulator challenges A on the i-th value x_i and sends a random value c_i to A.

• A replies with a string P.

Note that, in our original game, the value x_i corresponds to the ordered sequence of r concatenated blocks which are queried during the i-th challenge. In addition, the simulator can query any x_i only once. Clearly, if the adversary wins the game (P = H(c_i, x_i)) with non-negligible probability, then x_i is extracted with non-negligible probability. This is because the best the adversary can do to impede the extraction of x_i is to guess the correct output of the random oracle, or to find a collision (a random oracle is trivially collision-resistant). Both of these events happen with negligible probability.

3. SUPPORTING DYNAMIC OUTSOURCED DATA

Thus far, we assumed that D represents static or warehoused data. Although this is the case for many envisaged settings, such as libraries, there are many potential scenarios where outsourced data is dynamic. This leads us to consider various data block operations (e.g., update, delete, append and insert) and the implications for our scheme which stem from supporting each operation. One obvious and trivial solution to all dynamic operations is, for each operation, for OWN to download the entire outsourced data D from SRV and to re-run the setup phase. This would clearly be secure but highly inefficient. We now show how to amend the basic scheme to efficiently handle dynamic operations on outsourced data.
3.1 Block Update

We assume that OWN needs to modify the n-th data block currently stored on SRV, from its current value D[n] to a new version, denoted D'[n]. In order to amend the remaining verification tokens, OWN needs to factor out every occurrence of D[n] and replace it with D'[n]. However, one subtle aspect is that OWN cannot disclose to SRV which (if any) verification tokens include the n-th block. The reason is simple: if SRV knows which tokens include the n-th block, it can simply discard D[n] if and when it knows that no remaining tokens include it. Another way to see the same problem is to consider the probability of a given block being included in any verification token: since there are r blocks in each token and t tokens, the probability of a given block being included in any token is (r·t)/d. Thus, if this probability is small, a single block update is unlikely to involve any verification token update. If SRV is allowed to see when a block update causes no changes in verification tokens, it can safely modify or delete the new block, since it knows (with absolute certainty) that it will not be demanded in future verifications.

Based on the above discussion, we require OWN to modify all remaining verification tokens. We also need to amend the token structure as follows (refer to line 2 in Alg. 1 and line 3 in Alg. 2):

from:
v_i = H(c_i, D[g_{k_i}(1)], ..., D[g_{k_i}(r)])
v'_i = AE_K(i, v_i)

to:
v_i = H(c_i, 1, D[g_{k_i}(1)]) ⊕ ... ⊕ H(c_i, r, D[g_{k_i}(r)])
v'_i = AE_K(ctr, i, v_i)

where ctr is an integer counter. We use ctr to keep track of the latest version of the token v_i (in particular, to avoid replay attacks) and to change the encryption of unmodified tokens. Alternatively, we could avoid storing ctr by simply replacing the key K of the authenticated encryption with a fresh key. The modifications described above are needed to allow the owner to efficiently perform the token update. The basic scheme does not allow it since, there, a token is computed by serially hashing the r selected blocks; hence, it is difficult to factor out any given block and replace it with a new version. (Of course, it can be done trivially by asking SRV to supply all blocks covered by a given token; but this would leak information and make the scheme insecure.)

One subtle but important change in the computation of v_i above is that each block is now hashed along with an index. This was not necessary in the original scheme, since blocks were ordered before being hashed together. Now that blocks are hashed individually, we need to take into account the possibility of two identical blocks (with the same content but different indices) being selected as part of the same verification token. If we computed each block hash as H(c_i, D[g_{k_i}(j)]) instead of H(c_i, j, D[g_{k_i}(j)]), then two blocks with identical content would produce the same hash and would cancel each other out due to the exclusive-or operation. Including a unique index in each block hash addresses this issue. We note that our XOR construction is basically the same as the XOR MAC defined by Bellare et al. [6]. In the random oracle model, H(c_i, ...) is a PRF under the key c_i. However, we do not need a randomized block as in XMACR [6], or a counter as in XMACC [6], since we use the key c_i only once (it changes at each challenge).

The update operation is illustrated in Algorithm 3. As shown at line 7, OWN modifies all tokens, regardless of whether they contain the n-th block; hence, SRV cannot discern which tokens cover the n-th block. The crucial part of the algorithm is line 6, where OWN simultaneously replaces the hash of the block's old version, H(c_i, j, D[n]), with the new one, H(c_i, j, D'[n]). This is possible due to the basic properties of the ⊕ operator.

Algorithm 3: Update
% Assume that block D[n] is being modified to D'[n] and
% that no tokens have been used up thus far.
begin
  1: Send to SRV: UPDATE REQUEST
  2: Receive from SRV: {[i, v'_i] | 1 ≤ i ≤ t}
  3: ctr = ctr + 1
  for (i ← 1 to t) do
    4: z' = AE^{-1}_K(v'_i)
       % if decryption fails, exit
       % if z' is not prefixed by (ctr − 1), i, exit
    5: extract v_i from z'; k_i = f_W(i); c_i = f_Z(i)
    for (j ← 1 to r) do
      if (g_{k_i}(j) == n) then
        6: v_i = v_i ⊕ H(c_i, j, D[n]) ⊕ H(c_i, j, D'[n])
    7: v'_i = AE_K(ctr, i, v_i)
  8: Send to SRV: {n, D'[n], {[i, v'_i] | 1 ≤ i ≤ t}}
end

We now consider the costs. The bandwidth consumed is minimal given our security goals: since OWN does not store verification tokens, it needs to obtain them in line 2; consequently, and for the same reason, it also needs to upload new tokens to SRV at line 8. Computation costs for OWN are similar to those of the setup phase, except that no hashing is involved. There is no computation cost for SRV. As for storage, OWN must have enough to accommodate all tokens, as in the setup phase.

Security Analysis.
The security of the Update algorithm in the random oracle model follows directly from the proof of the static case. The only difference is in the way we compute v_i: rather than hashing the concatenation of blocks, the adversary hashes each single block and XORs the resulting outputs. This does not change the ability of the simulator to extract the blocks queried during the challenge. Indeed, as in [6], we include a unique index in the hash (from 1 to r) that "forces" the adversary to query the random oracle on each single block indicated by the challenge. In addition, the analysis of the collision-resistance property of the XOR-ed hashes comes directly from [6], so we omit it here.
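The XOR structure makes the token surgery of line 6 a purely local computation. The sketch below shows how a single block's contribution is factored out and replaced; the helper names are ours, with h_block playing the role of H(c_i, j, ·) and g() taken from the PRP sketch in Section 2.1.

# Sketch of line 6 of Algorithm 3 under the XOR-style tokens.
import hashlib

def h_block(c_i: bytes, j: int, block: bytes) -> bytes:
    """H(c_i, j, block): per-slot hash, bound to the nonce and slot index."""
    return hashlib.sha256(c_i + j.to_bytes(8, 'big') + block).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def update_token(v_i, k_i, c_i, n, old_block, new_block, d, r):
    """Factor H(c_i, j, D[n]) out of v_i and fold H(c_i, j, D'[n]) in,
    for every slot j whose permuted index g_{k_i}(j) equals n. Tokens not
    covering block n come back unchanged, but all tokens are re-encrypted
    under the incremented counter, so SRV learns nothing either way."""
    for j in range(r):
        if g(k_i, j, d) == n:
            v_i = xor_bytes(v_i, h_block(c_i, j, old_block))
            v_i = xor_bytes(v_i, h_block(c_i, j, new_block))
    return v_i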

3.2 Block Deletion

After being outsourced, certain data blocks might need to be deleted. We assume that the number of blocks to be deleted is small relative to the size of the file; this is to avoid appreciably affecting the detection probability. If large portions of the file have to be deleted, then the best approach is to re-run the basic PDP scheme on the new file. Supporting block deletion turns out to be very similar to the update operation. Indeed, deleted blocks can be replaced by a predetermined special block in their respective positions via the update procedure. Line 6 of Algorithm 3 is modified as follows:

from:
v_i = v_i ⊕ H(c_i, j, D[n]) ⊕ H(c_i, j, D'[n])

to:
v_i = v_i ⊕ H(c_i, j, D[n]) ⊕ H(c_i, j, DBlock)

where DBlock is a fixed special block that represents deleted blocks. This small change results in the deleted block being replaced in all verification tokens where it occurred. Moreover, as in the update case, all tokens are modified so that SRV remains oblivious as to the inclusion of the special block. The costs associated with the delete operation are identical to those of the update. Storage space is reduced at each deletion, since only a single instance of the special block DBlock must be stored: SRV could either use an extra bit per file block to indicate whether the block was deleted, or store a list of pointers to all deleted blocks, or employ any other similar strategy.

3.3 Batching Updates and Deletions

Although the above demonstrates that the proposed technique can accommodate certain operations on outsourced data, it is clear that the cost of updating all remaining verification tokens for a single block update or deletion is not negligible for OWN. Fortunately, it is both easy and natural to batch multiple operations. Specifically, any number of block updates and deletes can be performed at the cost of a single update or delete. To do this, we need to modify the for-loop in Algorithm 3 to take care of both deletions and updates at the same time. Since the modifications are straightforward, we do not illustrate them in a separate algorithm.
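Under the same helpers, deletion is literally an update with a constant block. DBLOCK below is an assumed marker value (its exact contents are not fixed by the paper):

# Deletion as a special-case update (Section 3.2), reusing update_token().
DBLOCK = bytes(4096)   # one block-sized, well-known placeholder value

def delete_block(v_i, k_i, c_i, n, old_block, d, r):
    return update_token(v_i, k_i, c_i, n, old_block, DBLOCK, d, r)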

3.4 Single-Block Append

In some cases, the owner might want to increase the size of the outsourced database. We anticipate that the most common such operation is block append, e.g., if the outsourced data represents a log. Clearly, the cost of re-running the setup phase for each single block append would be prohibitive. We cannot easily increase the range of the PRP g_{k_i}(·) or other parameters, since these are hardcoded into the scheme – we pre-compute all answers to all challenges. However, rather than considering the file as a single contiguous array of d equal-sized blocks, we can consider a logical bi-dimensional structure of the outsourced data and append each new block to one of the original blocks D[1], ..., D[d] in round-robin fashion. In other words, assume that OWN has the outsourced data D[1], ..., D[d], and that it wants to append the blocks D[d+1], ..., D[d+k]. Then a logical matrix is built, column by column, as follows:

D'[1] = D[1], D[d+1]
D'[2] = D[2], D[d+2]
...
D'[k] = D[k], D[d+k]
...
D'[d] = D[d]

For the index j in the i-th challenge, the server will have to include in the computation of the XOR-ed hashes v_i any blocks linked to D[g_{k_i}(j)], i.e., the entire row g_{k_i}(j) of the logical matrix above. In particular, SRV will include:

H(c_i, j, D[g_{k_i}(j)]) ⊕ H(c_i, d+j, D[g_{k_i}(j)+d]) ⊕ ... ⊕ H(c_i, αd+j, D[g_{k_i}(j)+αd])

where α is the length of the row g_{k_i}(j) of the logical matrix. Clearly, OWN will have to update the XOR-ed hashes in a similar fashion, by invoking the Update algorithm whenever a certain block must be appended.

The advantage of this solution is that we can just run the Update operation to append blocks, so we can even batch several appends. Notice also that the probability of any given block being selected during a challenge is still 1/d (intuitively, this is because we consider d rows of several blocks combined together). The drawback is that the storage server has to access more blocks per query, and this may become increasingly expensive for SRV as the number of blocks appended to the database increases.

Insert.
A logical insert operation corresponds to an append, coupled with maintaining a data structure containing a logical-to-physical block number mapping for each "inserted" block. A true (physical) block insert operation is difficult to support, since it requires bulk block renumbering: inserting a block D[j] corresponds to shifting all blocks starting with index j+1 by one slot. This affects many (if not all) rows in the logical matrix described above and requires a substantial number of computations.
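The server-side effect of the logical matrix is that one challenge index now pulls in a whole row. A sketch of that row computation, reusing g(), h_block() and xor_bytes() from the earlier sketches (len(D) is the current total block count, including appended blocks):

# Sketch of SRV's per-index contribution after single-block appends
# (Section 3.4): fold in D[base], D[base+d], D[base+2d], ... while they exist.
def row_contribution(D: list, d: int, k_i: bytes, c_i: bytes, j: int) -> bytes:
    base = g(k_i, j, d)       # row index of the d-row logical matrix
    acc = bytes(32)           # 0^{|H(.)|} for SHA-256
    alpha = 0
    while base + alpha * d < len(D):
        acc = xor_bytes(acc, h_block(c_i, alpha * d + j, D[base + alpha * d]))
        alpha += 1
    return acc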
3.5 Bulk Append

Another variant of the append operation is bulk append, which takes place whenever OWN needs to upload a large number of blocks at one time. There are many practical scenarios where such operations are common. For example, in a typical business environment, a company might accumulate electronic documents (e.g., sales receipts) for a fixed term and might be unable to outsource them immediately, for tax and instant-availability reasons. However, as the fiscal year rolls over, the accumulated batch would need to be uploaded to the server in bulk.

To support the bulk append operation, we sketch out a slightly different protocol version. In this context, we require OWN to anticipate the maximum size (in blocks) of its data. Let d_max denote this value. In the beginning, however, OWN only outsources d blocks, where d < d_max. The idea behind bulk append is simple: OWN budgets for the maximum anticipated data size, including (and later verifying) not r but r_max = ⌈r · (d_max/d)⌉ blocks per verification token. However, in the beginning, when only d blocks exist, OWN, as part of setup, computes each token v_i in a slightly different fashion (the computation of v'_i remains unchanged):

v_i = ⊕_{j=1}^{r_max} Q_j,  where  Q_j = H(c_i, j, D[g_{k_i}(j)])  if g_{k_i}(j) ≤ d,
                                   Q_j = 0^{|H(·)|}               otherwise    (2)

The important detail in this formula is that whenever the block index function g_{k_i}(j) > d (i.e., it exceeds the current number of outsourced blocks), OWN simply skips it. If we assume that d_max/d is an integer, we know that, on average, r indices will fall into the range of the existing d blocks. Since both OWN and SRV agree on the number of currently outsourced blocks, SRV will follow exactly the same procedure when re-computing each token upon OWN's verification request.

Now, when OWN is ready to append a number (say, d) of new blocks, it first fetches all unused tokens from the server, as in the block update or delete operations described above. Then, OWN re-computes each unused token, updating the counter (as in Equation 2). The only difference is that OWN now factors in (does not skip) H(c_i, j, D[g_{k_i}(j)]) whenever g_{k_i}(j) falls in the ]d, 2d] interval. (It continues to skip, however, all block indices exceeding 2d.)

It is easy to see that this approach works, but only with sufficiently large append batches; small batches would involve high overhead, as each append requires OWN to fetch all unused tokens, which is a bandwidth-intensive activity.
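A sketch of the token computation of Equation (2), reusing g(), h_block() and xor_bytes() from the earlier sketches. Note that the PRP now ranges over the full anticipated index space [0, d_max); out-of-range slots contribute the all-zero string, so later appends can fill them in via the Update algorithm:

# Sketch of Equation (2) for bulk append.
def make_token_bulk(k_i, c_i, D, d, d_max, r_max):
    v_i = bytes(32)                 # 0^{|H(.)|} for SHA-256
    for j in range(r_max):
        idx = g(k_i, j, d_max)      # permute over the *maximum* domain
        if idx < d:                 # skip indices of not-yet-appended blocks
            v_i = xor_bytes(v_i, h_block(c_i, j, D[idx]))
    return v_i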
4. DISCUSSION

In this section we discuss some features of the proposed scheme.

4.1 Bandwidth-Storage Tradeoff

Our basic scheme, as presented in Section 2, involves OWN outsourcing to SRV not only the data D, but also the collection of encrypted verification tokens {[i, v'_i] for 1 ≤ i ≤ t}. An important advantage of this method is that OWN remains virtually stateless, i.e., it only maintains the master keys (W, Z and K). The storage burden is shifted entirely to SRV. The main disadvantage of this approach is that the dynamic operations described in Section 3 require SRV to send all unused tokens back to OWN, resulting in a bandwidth overhead of (t × |AE_key(·)|) bits. Of course, OWN must subsequently send all updated tokens back to SRV; however, as discussed earlier, this is unavoidable in order to preserve security.

One simple way to reduce the bandwidth overhead is for OWN to store some (or all) verification tokens locally. Then, OWN simply checks them against the responses received from SRV. We stress that each token is a hash computed over a set of blocks; thus, OWN has to store, in the worst case, a single hash value per challenge. Another benefit of this variant is that it obviates the need for encryption (i.e., AE_key is no longer needed) for tokens stored locally. Also, considering that SRV is a busy entity storing data for a multitude of owners, storing tokens at OWN has the further advantage of simplifying update, append, delete and other dynamic operations, by cutting down on the number of messages.

4.2 Limited Number of Verifications

One potentially glaring drawback of our scheme is that the number of verifications t is fixed at setup time. For OWN to increase this number would require re-running setup, which, in turn, requires obtaining D from SRV in its entirety. However, we argue that, in practice, this is not an issue. Assume that OWN wants to periodically (every M minutes) obtain a proof of possession, and wants the tokens to last Y years. The number of verification tokens required by our scheme is thus (Y × 365 × 24 × 60)/M. The graph in Figure 1 shows the storage requirements (for verification tokens) for a range of Y and M values, and clearly illustrates that this overhead is quite small. For example, assuming an AE_K(·) block size of 256 bits, with less than 128 Mbits (16 Mbytes) the outsourced data can be verified each hour for 64 years or, equivalently, every 15 minutes for 16 years.

[Figure 1: Token Storage Overhead (in bits): the X-axis shows verification frequency (in minutes) and the Y-axis shows desired longevity (in years).]

Furthermore, the number of tokens is independent of the number of data blocks. Thus, whenever the outsourced data is extremely large, the extra storage required by our approach is negligible. For instance, the Web Capture project of the Library of Congress had stored, as of May 2007, about 70 Terabytes of data (see http://www.loc.gov/webcapture/faq.html). As shown above, checking this content every 15 minutes for the next 16 years would require only 1 Mbyte of extra storage per year, which arguably could even be stored directly at OWN.

Notice that, if the tokens are stored at SRV, our scheme has another subtle drawback: SRV is always aware of the number of remaining tokens. As the number of unused tokens dwindles, the probability of SRV getting away with erasing (or losing) blocks grows. This is a non-issue with schemes (such as PDP [2]) where the number of verifications is unbounded.

4.3 Computation Overhead

Another important variable is the computation overhead of the setup phase in the basic scheme. Recall that, at setup, OWN needs to compute t hashes, each over a string of size r × |b|, where |b| is the block size. (In the modified scheme, the overhead is similar: t × r hashes over roughly one block each.) To assess the overhead, we assume that, to process a single byte, a hash function (e.g., SHA) requires just 20 machine cycles [4], which is a conservative assumption. Furthermore, we assume the following realistic parameters:

• OWN outsources 2^37 bytes of data, i.e., the content of a typical 128-GB hard drive.

• Each data block is 4 KB (|b| = 2^12), and thus d = 2^37/2^12 = 2^25.

• OWN needs to perform one verification daily for the next 32 years, i.e., t = 32 × 365 = 11,680.

• OWN requires a detection probability of 99% when 1% of the blocks are missing or corrupted; hence, r = 2^9 (see Equation 1).

Based on the above, each hash, computed over roughly 2^21 = 2^12 × 2^9 bytes, would take less than 0.04 seconds on a low-end computing device with a 1-GHz CPU [4]. The total number of hashes is 11,680; thus, setup time is about 11,680 × 0.04 = 467 seconds, which is less than 8 minutes. Note that setup also involves t × r PRP, 2t PRF, and t AE_K invocations. However, these operations are performed over short inputs (e.g., 128 bits), and their costs are negligible with respect to those of hashing.
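The arithmetic behind this estimate is easy to reproduce (parameter values as assumed above; small rounding differences aside):

# Back-of-the-envelope setup cost from Section 4.3.
cycles_per_byte, cpu_hz = 20, 10**9          # conservative SHA cost, 1-GHz CPU
block_bytes, r, t = 2**12, 2**9, 32 * 365    # 4-KB blocks, r = 512, t = 11,680
bytes_per_token = block_bytes * r            # 2^21 bytes hashed per token
sec_per_token = bytes_per_token * cycles_per_byte / cpu_hz
print(sec_per_token)                         # ~0.042 s (the paper rounds to 0.04)
print(t * sec_per_token / 60)                # ~8 minutes of total setup hashing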

5. RELATED WORK

Initial solutions to the PDP problem were provided by Deswarte et al. [13] and Filho et al. [15]. Both use RSA-based hash functions to hash the entire file at every challenge. This is clearly prohibitive for the server whenever the file is large. Similarly, Sebe et al. [29] give a protocol for remote file integrity checking based on the Diffie-Hellman problem in composite-order groups. However, the server must access the entire file and, in addition, the client is forced to store several bits per file block, so storage at the client is linear in the number of file blocks, as opposed to constant.

Schwarz and Miller [28] propose a scheme that allows a client to verify the storage integrity of data across multiple servers. However, even in this case, the server must access a linear number of file blocks per challenge.

Golle et al. [17] first proposed the concept of enforcement of storage complexity and provided efficient schemes. Unfortunately, the guarantee they provide is weaker than the one provided by PDP schemes, since it only ensures that the server is storing something at least as large as the original file, but not necessarily the file itself.

Provable data possession is a form of memory checking [8, 26]; in particular, it is related to the concept of sub-linear authentication introduced by Naor and Rothblum [26]. However, schemes that provide sub-linear authentication are quite inefficient.

Curtmola et al. [12] first proposed multiple-replica PDP schemes, where the client can verify that several copies of a file are actually stored by a storage service provider. The cost of the set-up phase in their scheme is optimal, since it does not depend on the number of replicas (authentication tags are generated only once).

Shacham and Waters [30] apply the ideas and schemes of [2] to files encoded via erasure codes, in order to achieve compact POR schemes [21] with an unbounded number of challenges. They also provide a more rigorous and stringent definition of extraction, and an elegant privately-verifiable scheme that requires only modular additions and multiplications.

Shah et al. [31] have recently proposed and realized privacy-preserving proof-of-storage protocols. Although tag privacy was already introduced and required in [12], these authors are the first to explicitly formulate the notion of privacy in this context and to highlight its importance. Using their scheme, an external auditor can periodically verify the integrity of files stored by a server without learning any of the file contents.

Compared to the PDP scheme in [2], our scheme is significantly more efficient in both the setup and verification phases, since it relies only on symmetric-key cryptography. The scheme in [2] allows unlimited verifications and offers public verifiability (which we do not achieve); however, we showed that limiting the number of challenges is not a concern in practice.
In addition, our scheme supports dynamic operations, while the PDP scheme in [2] works only for static databases.

Compared to the POR scheme [21], our scheme provides better performance on the client side, requires much less storage space, and uses less bandwidth (the sizes of challenges and responses in our scheme are a small constant, less than the size of a single block). To appreciate the difference in storage overhead, consider that POR needs to store s sentinels per verification, where each sentinel is one data block in size (hence, a total of s × t sentinels); in contrast, our scheme needs a single encrypted value (256 bits) per verification. Note that, in POR, for a detection probability of around 93% when at most 1% of the blocks have been corrupted, the suggested value of s is on the order of one thousand [21]. Like [2], POR is limited to static data (in fact, [21] considers supporting dynamic data an open problem). In summary, we provide more features than POR while consuming considerably fewer resources. (We have recently learned that a MAC-based variant of our first basic scheme was mentioned by Ari Juels during a presentation at CCS '07 in November 2007 [22]. This is an independent and concurrent discovery; indeed, an early version of our paper was submitted to NDSS 2008 in September 2007.)

To further compare the two schemes, we assess their respective tamper-detection probabilities. In [21], the outsourced data is composed of d blocks plus sentinels. We compute the probability that, m being the number of corrupted blocks, the system consumes all sentinels and the corruption goes undetected. Similarly, to evaluate our scheme, for the same amount of data, we assume t tokens, each based on r verifiers.

For our scheme, the probability of no corrupted block being detected after all t tokens are consumed is:

\binom{d-η}{m} / \binom{d}{m}    (3)

Note that η accounts for the number of different blocks used by at least one token. Indeed, to avoid detection, SRV cannot corrupt any block used to compute any token. It can be shown that, for a suitably large d and for every t, η ≥ min{tr/2, d/2} holds with overwhelming probability. Hence, the upper bound for Equation 3, when tr/2 ≤ d/2, is given by:

\binom{d - tr/2}{m} / \binom{d}{m}    (4)

In [21], SRV can avoid detection if it does not corrupt any sentinel block. Since a total of s × t sentinels are added to the outsourced data, the probability of avoiding detection is:

\binom{d}{m} / \binom{d + s×t}{m}    (5)

The above equations show that, in our scheme, for a given t, increasing r also increases the tamper-detection probability; however, r has no effect on the total number of tokens, i.e., it does not influence the storage overhead. As for POR, for a given t, increasing the tamper-detection probability requires increasing s, which, in turn, increases the storage overhead.
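For concreteness, the bounds (4) and (5) can be evaluated directly with exact binomial coefficients; the helpers below are ours (parameter names assumed):

# Escape probabilities after all verifications are exhausted:
# Equation (4) for our scheme (valid when t*r/2 <= d/2) vs. Equation (5) for POR.
from math import comb

def ours_escape_bound(d: int, m: int, t: int, r: int) -> float:
    return comb(d - t * r // 2, m) / comb(d, m)

def por_escape(d: int, m: int, s: int, t: int) -> float:
    return comb(d, m) / comb(d + s * t, m)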
6. CONCLUSIONS

We developed and presented a step-by-step design of a very light-weight and provably secure PDP scheme. It surpasses prior work on several counts, including storage, bandwidth and computation overheads, as well as support for dynamic operations. However, since it is based upon symmetric-key cryptography, it is unsuitable for public (third-party) verification. A natural solution to this would be a hybrid scheme combining elements of [2] and our scheme. To summarize, the work described in this paper represents an important step towards practical PDP techniques. We expect that the salient features of our scheme (very low cost and support for dynamic outsourced data) make it attractive for realistic applications.

7. ACKNOWLEDGMENTS

This work was partly supported by the Spanish Ministry of Science and Education through projects TSI2007-65406-C03-01 "E-AEGIS" and CONSOLIDER CSD2007-00004 "ARES", and by the Government of Catalonia under grant 2005 SGR 00446. This work was also supported in part by an award from the US Army Research Office (ARO) under contract W911NF0410280, as well as by a grant from the US Fulbright Foundation.

8. REFERENCES

[1] J. Aspnes, J. Feigenbaum, A. Yampolskiy, and S. Zhong. Towards a theory of data entanglement. In Proc. of the European Symposium on Research in Computer Security, 2004.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. In ACM CCS '07, 2007. Full paper available on ePrint (2007/202).
[3] G. Ateniese, D. Chou, B. de Medeiros, and G. Tsudik. Sanitizable signatures. In ESORICS '05, 2005.
[4] B. Preneel, et al. NESSIE D21 - performance of optimized implementations of the primitives.
[5] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In CRYPTO '96, 1996.
[6] M. Bellare, R. Guérin, and P. Rogaway. XOR MACs: New methods for message authentication using finite pseudorandom functions. In CRYPTO '95, 1995.
[7] J. Black and P. Rogaway. Ciphers with arbitrary finite domains. In Proc. of CT-RSA, volume 2271 of LNCS, pages 114-130. Springer-Verlag, 2002.
[8] M. Blum, W. Evans, P. Gemmell, S. Kannan, and M. Naor. Checking the correctness of memories. In Proc. of FOCS '95, 1995.
[9] D. Boneh, G. di Crescenzo, R. Ostrovsky, and G. Persiano. Public key encryption with keyword search. In EUROCRYPT '04, 2004.
[10] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: a distributed anonymous information storage and retrieval system. In International Workshop on Designing Privacy Enhancing Technologies, pages 46-66, New York, NY, USA, 2001. Springer-Verlag New York, Inc.
[11] B. Cooper and H. Garcia-Molina. Peer-to-peer data trading to preserve information. ACM ToIS, 20(2):133-170, 2002.
[12] R. Curtmola, O. Khan, R. Burns, and G. Ateniese. MR-PDP: Multiple-replica provable data possession. In Proc. of the 28th IEEE International Conference on Distributed Computing Systems (ICDCS '08), 2008, to appear.
[13] Y. Deswarte, J.-J. Quisquater, and A. Saidane. Remote integrity checking. In Proc. of the Conference on Integrity and Internal Control in Information Systems (IICIS '03), November 2003.
[14] P. Devanbu, M. Gertz, C. Martel, and S. Stubblebine. Authentic third-party data publication. In IFIP DBSec '03; also in Journal of Computer Security, 11(3):291-314, 2003.
[15] D. L. G. Filho and P. S. L. M. Barreto. Demonstrating data possession and uncheatable data transfer. IACR ePrint archive, Report 2006/150, 2006. http://eprint.iacr.org/2006/150.
[16] J. F. Gantz, D. Reinsel, C. Chute, W. Schlichting, J. McArthur, S. Minton, I. Xheneti, A. Toncheva, and A. Manfrediz. The expanding digital universe: A forecast of worldwide information growth through 2010. IDC white paper, sponsored by EMC, March 2007. http://www.emc.com/about/destination/digital_universe/.
[17] P. Golle, S. Jarecki, and I. Mironov. Cryptographic primitives enforcing communication and storage complexity. In Financial Cryptography, pages 120-135, 2002.
[18] P. Golle, J. Staddon, and B. Waters. Secure conjunctive keyword search over encrypted data. In ACNS '04, 2004.
[19] G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter. Efficient Byzantine-tolerant erasure-coded storage. In DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks, page 135, Washington, DC, USA, 2004. IEEE Computer Society.
[20] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In ACM SIGMOD '02, 2002.
[21] A. Juels and B. Kaliski. PORs: Proofs of retrievability for large files. In ACM CCS '07, 2007. Full paper available on ePrint (2007/243).
[22] A. Juels and B. S. Kaliski. http://www.rsa.com/rsalabs/staff/bios/ajuels/presentations/Proofs_of_Retrievability_CCS.ppt.
[23] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. OceanStore: an architecture for global-scale persistent storage. SIGPLAN Not., 35(11):190-201, 2000.
[24] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard. A cooperative Internet backup scheme. In USENIX '03, 2003.
[25] E. Mykletun, M. Narasimha, and G. Tsudik. Authentication and integrity in outsourced databases. In ISOC NDSS '04, 2004.
[26] M. Naor and G. N. Rothblum. The complexity of online memory checking. In Proc. of FOCS, 2005. Full version appears as ePrint Archive Report 2006/091.
[27] P. Rogaway, M. Bellare, and J. Black. OCB: A block-cipher mode of operation for efficient authenticated encryption. ACM TISSEC, 6(3), 2003.
[28] T. S. J. Schwarz and E. L. Miller. Store, forget, and check: Using algebraic signatures to check remotely administered storage. In Proceedings of ICDCS '06. IEEE Computer Society, 2006.
[29] F. Sebe, A. Martinez-Balleste, Y. Deswarte, J. Domingo-Ferrer, and J.-J. Quisquater. Time-bounded remote file integrity checking. Technical Report 04429, LAAS, July 2004.
[30] H. Shacham and B. Waters. Compact proofs of retrievability. Cryptology ePrint archive, Report 2008/073, June 2008.
[31] M. Shah, R. Swaminathan, and M. Baker. Privacy-preserving audit and extraction of digital contents. Cryptology ePrint archive, Report 2008/186, April 2008.
[32] D. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In IEEE S&P '00, 2000.
[33] M. Waldman and D. Mazières. TANGLER: a censorship-resistant publishing system based on document entanglements. In ACM CCS '01, 2001.
[34] M. Waldman, A. Rubin, and L. Cranor. PUBLIUS: a robust, tamper-evident, censorship-resistant web publishing system. In USENIX SEC '00, 2000.