
Scalable and Efficient Provable Data Possession

Giuseppe Ateniese1, Roberto Di Pietro2, Luigi V. Mancini3, and Gene Tsudik4

1 The Johns Hopkins University. E-mail: [email protected]
2 Università di Roma Tre. E-mail: [email protected]
3 Università di Roma "La Sapienza". E-mail: [email protected]
4 University of California, Irvine. E-mail: [email protected]

Abstract

Storage outsourcing is a rising trend which prompts a number of interesting security issues, many of which have been extensively investigated in the past. However, Provable Data Possession (PDP) is a topic that has only recently appeared in the research literature. The main issue is how to frequently, efficiently and securely verify that a storage server is faithfully storing its client's (potentially very large) outsourced data. The storage server is assumed to be untrusted in terms of both security and reliability. (In other words, it might maliciously or accidentally erase hosted data; it might also relegate it to slow or off-line storage.) The problem is exacerbated by the client being a small computing device with limited resources. Prior work has addressed this problem using either public key cryptography or requiring the client to outsource its data in encrypted form.

In this paper, we construct a highly efficient and provably secure PDP technique based entirely on symmetric key cryptography, while not requiring any bulk encryption. Also, in contrast with its predecessors, our PDP technique allows outsourcing of dynamic data, i.e., it efficiently supports operations such as block modification, deletion and append.

1. Introduction

In recent years, the concept of third-party data warehousing and, more generally, data outsourcing has become quite popular. Outsourcing of data essentially means that the data owner (client) moves its data to a third-party provider (server) which is supposed to – presumably for a fee – faithfully store the data and make it available to the owner (and perhaps others) on demand. Appealing features of outsourcing include reduced costs from savings in storage, maintenance and personnel, as well as increased availability and transparent up-keep of data.

A number of security-related research issues in data outsourcing have been studied in the past decade. Early work concentrated on data authentication and integrity, i.e., how to efficiently and securely ensure that the server returns correct and complete results in response to its clients' queries [1, 2]. Later research focused on outsourcing encrypted data (placing even less trust in the server) and the associated difficult problems, mainly having to do with efficient querying over an encrypted domain [3, 4, 5, 6].

More recently, however, the problem of Provable Data Possession (PDP) – also sometimes referred to as Proof of Data Retrievability (POR) – has appeared in the research literature. The central goal in PDP is to allow a client to efficiently, frequently and securely verify that a server – which purportedly stores the client's potentially very large amount of data – is not cheating the client. In this context, cheating means that the server might delete some of the data, or it might not store all data in fast storage, e.g., place it on CDs or other tertiary off-line media. It is important to note that a storage server might not be malicious; instead, it might simply be unreliable and lose or inadvertently corrupt hosted data. An effective PDP technique must be equally applicable to malicious and unreliable servers. The problem is further complicated by the fact that the client might be a small device (e.g., a PDA or a cell-phone) with limited CPU, battery power and communication facilities. Hence the need to minimize bandwidth and local computation overhead for the client in performing each verification.

Two recent results, PDP [7] and POR [8], have highlighted the importance of the problem and suggested two very different approaches. The first [7] is a public-key-based technique allowing any verifier (not just the client) to query the server and obtain an interactive proof of data possession. This property is called public verifiability. The interaction can be repeated any number of times, each time resulting in a fresh proof. The POR scheme [8] uses special blocks (called sentinels) hidden among other blocks in the data. During the verification phase, the client asks for randomly picked sentinels and checks whether they are intact. If the server modifies or deletes parts of the data, then sentinels would also be affected with a certain probability. However, sentinels must be indistinguishable from regular blocks; this implies that blocks must be encrypted. Thus, unlike the PDP scheme in [7], POR cannot be used for public databases, such as libraries, repositories, or archives. In other words, its use is limited to confidential data. In addition, the number of queries is limited and fixed a priori, because sentinels, and their positions within the database, must be revealed to the server at each query – a revealed sentinel cannot be reused.

Motivation: As pointed out in [9], data generation is currently outpacing storage availability; hence, there will be more and more need to outsource data.

A particularly interesting application for PDP is distributed data store systems. Consider, for example, the Freenet network [10], or the Cooperative Internet Backup Scheme [11]. Such systems rely and thrive on free storage. One reason to misbehave in such systems would be if storage space had real monetary cost attached to it. Moreover, in a distributed backup system, a user who grants usage of some of its own space to store other users' backup data is normally granted usage of a proportional amount of space somewhere else in the network, for his own backup. Hence, a user might try to obtain better redundancy for his own data. Furthermore, users will only place trust in such a system as long as the storage can be consistently relied upon, which is difficult in the event of massive cheating. A PDP scheme could act as a powerful deterrent to cheating, thus increasing trust in the system and helping spread its popularity and usage. Finally, note that the same considerations apply to alternative service models, such as peer-to-peer data archiving [12], where new forms of assurance of data integrity and data accessibility are needed. (Though simple replication offers one path to higher-assurance data archiving, it also involves unnecessary, and sometimes unsustainable, overhead.)

Another application of PDP schemes is in the context of censorship-resistance. There are cases where a duly authorized third party (censor) may need to modify a document in some controlled and limited fashion [13]. However, an important problem is to prevent unauthorized modifications. In this case, there are some well-known solutions, such as [14, 15] and [16]. But they either require data replication or on-demand computation of a function (e.g., a hash) over the entire outsourced data. Though the latter might seem doable, it does not scale to terabytes and petabytes of data. In contrast, a well-designed PDP scheme would be, at the same time, secure and scalable/efficient.

A potentially interesting new angle for motivating secure and efficient PDP techniques is the outsourcing of personal digital content. (For example, consider the rapidly increasing popularity of personal content outsourcing services, such as .MAC, GMAIL, PICASA and OFOTO.) Consider the following scenario: Alice wants to outsource her life-long collection of digital content (e.g., photos, videos, art) to a third party, giving read access to her friends and family. The combined content is precious to Alice and, whether or not the outsourcing service is free, Alice wants to make sure that her data is faithfully stored and readily available. To verify data possession, Alice could use a resource-constrained personal device, e.g., a PDA or a cell-phone. In this realistic setting, our two design requirements – (1) outsourcing data in cleartext and (2) bandwidth and computation efficiency – are very important.

Contributions: This paper's contribution is two-fold:

1. Efficiency and Security: the proposed PDP scheme, as [8], relies only on efficient symmetric-key operations in both the setup (performed once) and verification phases. However, our scheme is more efficient than POR, as it requires no bulk encryption of outsourced data and no data expansion due to additional sentinel blocks – see Section 5 for details. Our scheme provides probabilistic assurance that untampered data is being stored on the server, with the exact probability fixed at setup, as in [7, 8]. Furthermore, our scheme is provably secure in the random oracle model (ROM).

2. Dynamic Data Support: unlike both prior results [7, 8], the new scheme supports secure and efficient dynamic operations on outsourced data blocks, including modification, deletion and append (these operations involve slightly higher computational and bandwidth costs with respect to our basic scheme). Supporting such operations is an important step toward practicality, since many envisaged applications are not limited to data warehousing, i.e., dynamic operations need to be provided.

Roadmap: The next section introduces our approach to provable data possession and discusses its effectiveness and security; Section 3 extends and enhances it to support dynamic outsourced data (i.e., operations such as block update, append and delete). Then, Section 4 discusses some features of our proposal, followed by Section 5, which overviews related work. Finally, Section 6 includes a discussion of our results and outlines avenues for future work.

2. Proposed PDP Scheme

In this section we describe the proposed scheme. It consists of two phases: setup and verification (also called challenge in the literature).

2.1. Notation

• D – outsourced data. We assume that D can be represented as a single contiguous file of d equal-sized blocks: D[1],...,D[d]. The actual bit-length of a block is not germane to the scheme.

• OWN – the owner of the data.

• SRV – the server, i.e., the entity that stores outsourced data on the owner's behalf.

• H(·) – cryptographic hash function. In practice, we use standard hash functions, such as SHA-1, SHA-2, etc.

• AE_key(·) – an authenticated encryption scheme that provides both privacy and authenticity. In practice, privacy and authenticity can be achieved by encrypting the message first and then computing a Message Authentication Code (MAC) on the result. However, a less expensive alternative is to use a mode of operation for the cipher that provides authenticity in addition to privacy in a single pass, such as OCB, XCBC, IAPM [17].

• AE_key^{-1}(·) – decryption operation for the scheme introduced above.

• f_key(·) – pseudo-random function (PRF) indexed on some (usually secret) key. In practice, a "good" block cipher acts as a PRF; in particular, we could use AES, where keys, input blocks, and outputs are all of 128 bits (AES is a good pseudo-random permutation). Other even more practical constructions of PRFs deployed in standards use "good" MAC functions, such as HMAC [18].

• g_key(·) – pseudo-random permutation (PRP) indexed under key. AES is assumed to be a good PRP. To restrict the PRP output to sequence numbers in a certain range, one could use the techniques proposed by Black and Rogaway [19].

2.2. General Idea

Our scheme is based entirely on symmetric-key cryptography. The main idea is that, before outsourcing, OWN pre-computes a certain number of short possession verification tokens, each token covering some set of data blocks. The actual data is then handed over to SRV. Subsequently, when OWN wants to obtain a proof of data possession, it challenges SRV with a set of random-looking block indices. In turn, SRV must compute a short integrity check over the specified blocks (corresponding to the indices) and return it to OWN. For the proof to hold, the returned integrity check must match the corresponding value pre-computed by OWN. However, in our scheme OWN has the choice of either keeping the pre-computed tokens locally or outsourcing them – in encrypted form – to SRV. Notably, in the latter case, OWN's storage overhead is constant regardless of the size of the outsourced data. Our scheme is also very efficient in terms of computation and bandwidth.

2.3. Setup Phase

We start with a database D divided into d blocks. We want to be able to challenge the storage server t times. We use a pseudo-random function f and a pseudo-random permutation g, defined as:

f : {0,1}^c × {0,1}^k → {0,1}^L

g : {0,1}^l × {0,1}^L → {0,1}^l

In our case l = log d, since we use g to permute indices. The output of f is used to generate the key for g, and c = log t. We note that both f and g can be built out of standard block ciphers, such as AES; in this case L = 128. We use the PRF f with two master secret keys W and Z, both of k bits. The key W is used to generate session permutation keys, while Z is used to generate challenge nonces.

During the Setup phase, the owner OWN generates in advance t possible random challenges and the corresponding answers. These answers are called tokens. To produce the i-th token, the owner generates a set of r indices as follows:

1. First, generate a permutation key ki = f_W(i) and a challenge nonce ci = f_Z(i).

2. Then, compute the set of indices {I_j ∈ [1,...,d] | 1 ≤ j ≤ r}, where I_j = g_ki(j).

3. Finally, compute the token: vi = H(ci, D[I_1],..., D[I_r]).

Basically, each token vi is the answer we expect to receive from the storage server whenever we challenge it on the randomly selected data blocks D[I_1],..., D[I_r]. The challenge nonce ci is needed to prevent potential pre-computations performed by the storage server. Notice that each token is the output of a cryptographic hash function, so its size is small.

Once all tokens are computed, the owner outsources the entire set to the server, along with the file D, by encrypting each token with an authenticated encryption function. (Note that these values could instead be stored by the owner itself if the number of queries is not too large; this would require the owner to store just a short hash value per query.) The setup phase is shown in detail in Algorithm 1.

Algorithm 1: Setup phase
begin
    Choose parameters c, l, k, L and functions f, g;
    Choose the number t of tokens;
    Choose the number r of indices per verification;
    Generate random master keys W, Z, K ∈ {0,1}^k;
    for (i ← 1 to t) do
        begin Round i
            1: Generate ki = f_W(i) and ci = f_Z(i)
            2: Compute vi = H(ci, D[g_ki(1)],..., D[g_ki(r)])
            3: Compute v'i = AE_K(i, vi)
        end
    Send to SRV: (D, {[i, v'i] for 1 ≤ i ≤ t})
end

2.4. Verification Phase

To perform the i-th proof of possession verification, OWN begins by re-generating the i-th token key ki as in step 1 of Algorithm 1. Note that OWN only needs to store the master keys W, Z, and K, plus the current token index i. It also re-computes ci as above. Then, OWN sends SRV both ki and ci (step 2 in Algorithm 2). Having received the message from OWN, SRV computes:

z = H(ci, D[g_ki(1)],..., D[g_ki(r)])

SRV then retrieves v'i and returns [z, v'i] to OWN who, in turn, computes v = AE_K^{-1}(v'i) and checks whether v = (i, z). If the check succeeds, OWN assumes that SRV is storing all of D with a certain probability.

Algorithm 2: Verification phase
begin Challenge i
    1: OWN computes ki = f_W(i) and ci = f_Z(i)
    2: OWN sends {ki, ci} to SRV
    3: SRV computes z = H(ci, D[g_ki(1)],..., D[g_ki(r)])
    4: SRV sends {z, v'i} to OWN
    5: OWN extracts v from v'i. If decryption fails or v ≠ (i, z), then REJECT.
end

2.5. Detection Probability

We now consider the probability P_esc that a run of our verification protocol completes successfully even though SRV has either deleted, or tampered with, m data blocks. Note that SRV avoids detection only if none among the r data blocks involved in the i-th token verification process are deleted or modified. Thus, we have:

P_esc = (1 − m/d)^r        (1)

For example, if the fraction of corrupted blocks (m/d) is 1% and r = 512, then the probability of avoiding detection is below 0.6%. This is in line with the detection probability expressed in [7].

We stress that, as in [7], we are considering economically-motivated adversaries that are willing to modify or delete a percentage of the file. In this context, deleting or modifying a few bits won't provide any financial benefit. If detection of any modification/deletion of small parts of the file is important, then erasure codes could be used: the file is first encoded, and then the PDP protocol is run on the encoded file. Erasure codes in the context of checking storage integrity were used, for example, in OceanStore [20], PASIS [21], and by Schwarz and Miller [22] (see also references therein). Erasure codes can help detect small changes to the file and possibly recover the entire file from a subset of its blocks (as in [8]). However, erasure codes (even the near-optimal ones) are very costly and impractical for files with a very large number of blocks (the type of files we are considering here). In addition, encoding will (1) significantly expand the original file and (2) render impractical any operations on dynamic files, such as updating, deleting, and appending file blocks.

Notice that our discussion above concerns erasure codes applied to the file by the client and not by the server. Obviously, the server may (and should) use erasure codes independently to prevent data corruption resulting from unreliable storage. Encoding the file via appropriate redundancy codes is a topic orthogonal to PDP and we do not consider it further.

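The numeric example accompanying Equation (1) can be checked directly; this toy calculation assumes nothing beyond the formula itself.

```python
# Equation (1): P_esc = (1 - m/d)^r, the probability that a challenge of r
# blocks misses all m corrupted blocks.
def p_esc(m_over_d, r):
    return (1 - m_over_d) ** r

# The example from the text: 1% of blocks corrupted, r = 512 indices per token.
print(p_esc(0.01, 512))  # ~0.0058, i.e., below 0.6%
```

Doubling r to 1024 drives the escape probability below 0.004%, which illustrates the tradeoff between per-challenge cost and assurance.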
We point out that there is almost no cost for OWN to perform a verification. It only needs to re-generate the appropriate [ki, ci] pair (two PRF invocations) and perform one decryption in order to check the reply from SRV. Furthermore, the bandwidth consumed by the verification phase is constant (in both steps 2 and 4 of Algorithm 2). This represents truly minimal overhead. The computation cost for SRV, though slightly higher (r PRP invocations on short inputs, and one hash), is still very reasonable.

2.6. Security Analysis

We follow the security definitions in [7, 8], and in particular the one in [7], which is presented in the form of a security game. In practice, we need to prove that our protocol is a type of proof of knowledge of the queried blocks, i.e., if the adversary passes the verification phase, then we can extract the queried blocks. The most generic (extraction-based) definition that does not depend on the specifics of the protocol is introduced in [8]. Another formal definition, for the related concept of sub-linear authentication, is provided in [23].

We have a challenger that plays a security game (experiment) with the adversary A. There is a Setup phase where A is allowed to select data blocks D[i] for 1 ≤ i ≤ d. In the Verification phase, the challenger selects a nonce and r random indices and sends them to A. Then the adversary generates a proof of possession P for the blocks queried by the challenger. If P passes the verification, then the adversary has won the game.

As in [7], we say that our scheme is a Provable Data Possession scheme if, for any probabilistic polynomial-time adversary A, the probability that A wins the PDP security game on a set of blocks is negligibly close to the probability that the challenger can extract those blocks. In this type of proof of knowledge, a "knowledge extractor" can extract the blocks from A via several queries, even by rewinding the adversary, which does not keep state.

At first, it may seem that we do not need to model the hash function as a random oracle, and that we might just need its collision-resistance property (assuming we are using a "good" standard hash function and not some contrived hash scheme). However, the security definition for PDP is an extraction-type definition, since we are required to extract the blocks that were challenged. So we need to use the random oracle model, but not really in a fundamental way: we just need the ability to see the inputs to the random oracle, and we are not using its programmability feature (i.e., we are not "cooking up" values as outputs of the random oracle). We need to see the inputs so that we can extract the queried data blocks. Since we are using just a random oracle, our proof does not rely on any cryptographic assumption but is information-theoretic.

We show the following:

Theorem 1. Our scheme is a secure Provable Data Possession scheme in the random oracle model.

Proof sketch. We are assuming that AE_K(·) is an ideal authenticated encryption. This implies that, given AE_K(X), the adversary cannot see or alter X; thus we can assume that X is stored directly by the challenger (i.e., there is no need for the adversary to send X to the challenger, and we can remove AE_K(X) from the picture).

More formally, our game is equivalent to the following:

• A simulator S sets up a PDP system and chooses its security parameters.

• The adversary A selects values x1, ..., xn and sends them to the simulator S.

• The adversary can query the random oracle at any point in time. For each input to the random oracle, the simulator replies with a random value and stores the input and the corresponding output in a table (to be consistent with its replies).

• At the challenge phase, the simulator challenges A on the i-th value, xi, and sends a random value ci to A.

• A replies with a string P.

Note that, in our original game, the value xi corresponds to the ordered sequence of r concatenated blocks which are queried during the i-th challenge. In addition, the simulator can query any xi only once. Clearly, if the adversary wins the game (P = H(ci, xi)) with non-negligible probability, then xi is extracted with non-negligible probability. This is because the best the adversary can do to impede the extraction of xi is to guess the correct output of the random oracle or to find a collision (a random oracle is trivially collision-resistant). Both these events happen with negligible probability.

3. Supporting Dynamic Outsourced Data

Thus far, we assumed that D represents static or warehoused data. Although this is the case for many envisaged settings, such as libraries, there are many potential scenarios where outsourced data is dynamic. This leads us to consider various data block operations (e.g., update, delete, append and insert) and the implications for our scheme which stem from supporting each operation.

One obvious and trivial solution to all dynamic operations is (for each operation) for OWN to download from SRV the entire outsourced data D and to re-run the setup phase. This would clearly be secure, but highly inefficient. We now show how to amend the basic scheme to efficiently handle dynamic operations on outsourced data.

3.1. Block Update

We assume that OWN needs to modify the n-th data block which is currently stored on SRV, from its current value D[n] to a new version, denoted D'[n]. In order to amend the remaining verification tokens, OWN needs to factor out every occurrence of D[n] and replace it with D'[n]. However, one subtle aspect is that OWN cannot disclose to SRV which (if any) verification tokens include the n-th block. The reason is simple: if SRV knows which tokens include the n-th block, it can simply discard D[n] if and when it knows that no remaining tokens include it. Another way to see the same problem is to consider the probability of a given block being included in any verification token: since there are r blocks in each token and t tokens, the probability of a given block being included in any token is (r·t)/d. Thus, if this probability is small, a single block update is unlikely to involve any verification token update. If SRV is allowed to see when a block update causes no changes in verification tokens, it can safely modify/delete the new block, since it knows (with absolute certainty) that it will not be demanded in future verifications.

Based on the above discussion, we require OWN to modify all remaining verification tokens. We also need to amend the token structure as follows (refer to line 2 in Alg. 1 and line 3 in Alg. 2):

from:
    vi = H(ci, D[g_ki(1)],..., D[g_ki(r)])
    v'i = AE_K(i, vi)

to:
    vi = H(ci, 1, D[g_ki(1)]) ⊕ ... ⊕ H(ci, r, D[g_ki(r)])
    v'i = AE_K(ctr, i, vi)
    % where ctr is an integer counter

We use ctr to keep track of the latest version of the token vi (in particular, to avoid replay attacks) and to change the encryption of unmodified tokens. Alternatively, we could avoid storing ctr by just replacing the key K of the authenticated encryption with a fresh key.

The modifications described above are needed to allow the owner to efficiently perform the token update. The basic scheme does not allow it since, in it, a token is computed by serially hashing the r selected blocks; hence, it is difficult to factor out any given block and replace it by a new version. (Of course, it can be done trivially by asking SRV to supply all blocks covered by a given token; but this would leak information and make the scheme insecure.)

One subtle but important change in the computation of vi above is that now each block is hashed along with an index. This was not necessary in the original scheme, since blocks were ordered before being hashed together. Now, since blocks are hashed individually, we need to take into account the possibility of two identical blocks (with the same content but different indices) being selected as part of the same verification token. If we computed each block hash as H(ci, D[g_ki(j)]) instead of H(ci, j, D[g_ki(j)]), then two blocks with identical content would produce the same hash and would cancel each other out due to the exclusive-or operation. Including a unique index in each block hash addresses this issue. We note that our XOR construction is basically the same as the XOR MAC defined by Bellare et al. [24]. In the random oracle model, H(ci, ...) is a PRF under the key ci. However, we do not need a randomized block as in XMACR [24] or a counter as in XMACC [24], since we use the key ci only once (it changes at each challenge).

The update operation is illustrated in Algorithm 3. As shown at line 7, OWN modifies all tokens regardless of whether they contain the n-th block. Hence, SRV cannot discern tokens covering the n-th block. However, the crucial part of the algorithm is line 6, where OWN simultaneously replaces the hash of the block's old version, H(ci, j, D[n]), with the new one, H(ci, j, D'[n]). This is possible due to the basic properties of the ⊕ operator.

Algorithm 3: Update
% Assume that block D[n] is being modified to D'[n],
% and assume that no tokens have been used up thus far.
begin
    1: Send to SRV: UPDATE REQUEST
    2: Receive from SRV: {[i, v'i] | 1 ≤ i ≤ t}
    3: ctr = ctr + 1
    for (i ← 1 to t) do
        4: z = AE_K^{-1}(v'i)
           % if decryption fails, exit
           % if z is not prefixed by (ctr − 1), i, exit
        5: extract vi from z
           ki = f_W(i)
           ci = f_Z(i)
           for (j ← 1 to r) do
               if (g_ki(j) == n) then
        6:         vi = vi ⊕ H(ci, j, D[n]) ⊕ H(ci, j, D'[n])
        7: v'i = AE_K(ctr, i, vi)
    8: Send to SRV: {n, D'[n], {[i, v'i] | 1 ≤ i ≤ t}}
end

We now consider the costs. The bandwidth consumed is minimal given our security goals: since OWN does not store verification tokens, it needs to obtain them in line 2. Consequently, and for the same reason, it also needs to upload the new tokens to SRV at line 8. Computation costs for OWN are similar to those of the setup phase, except that no hashing is involved. There is no computation cost for SRV. As far as storage, OWN must have enough to accommodate all tokens, as in the setup phase.

Security Analysis. The security of the Update algorithm in the random oracle model follows directly from the proof of the static case. The only difference is in the way we compute vi: rather than hashing the concatenation of blocks, the adversary hashes each single block and XORs the resulting outputs. This does not change the ability of the simulator to extract the blocks queried during the challenge. Indeed, as in [24], we include a unique index in the hash (from 1 to r) that "forces" the adversary to query the random oracle on each single block indicated by the challenge. In addition, the analysis of the collision-resistance property of the XOR-ed hashes comes directly from [24], so we omit it here.
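The XOR-ed token structure and the in-place patch at line 6 of Algorithm 3 can be sketched as follows. This is again a toy illustration, not the authors' code: SHA-256 stands in for H, the nonce is a fixed toy value, and the AE layer and ctr bookkeeping are omitted. It also shows why each block is hashed with its index.

```python
import hashlib

def h(ci, j, block):
    # H(c_i, j, block): each block is hashed together with its index j, so
    # equal-content blocks at different positions do not cancel under XOR.
    return hashlib.sha256(ci + j.to_bytes(4, "big") + block).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

ci = b"challenge-nonce-i"                    # toy nonce; AE/ctr omitted
blocks = {1: b"AAA", 2: b"BBB", 3: b"AAA"}   # blocks 1 and 3 have equal content
sel = [1, 2, 3]                              # indices picked by g_ki for this token

def token(bs):
    # v_i = H(c_i, 1, D[I_1]) xor ... xor H(c_i, r, D[I_r])
    v = bytes(32)
    for j, n in enumerate(sel, start=1):
        v = xor(v, h(ci, j, bs[n]))
    return v

v = token(blocks)

# Update D[3] to a new value: XOR out the old per-block hash, XOR in the new
# one (line 6 of Algorithm 3), without touching the other blocks.
j = sel.index(3) + 1
v = xor(xor(v, h(ci, j, blocks[3])), h(ci, j, b"CCC"))
blocks[3] = b"CCC"

# The patched token equals one recomputed from scratch over the updated data.
assert v == token(blocks)
```

Note that without the per-block index, the two "AAA" blocks would contribute identical hashes and cancel each other out of the token entirely.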
coded into the scheme – we pre-compute all answers to all challenges. However, rather than considering a 3.2. Block Deletion single contiguous file of d as an array of equal-sized blocks, we could consider a logical bi-dimensional After being outsourced, certain data blocks might structure of the outsourced data, and append a new need to be deleted. We assume that the number of block to one of the original blocks D[1],...,D[d] in blocks to be deleted is small relatively to the size of a round-robin fashion. In other words, assume that the file. This is to avoid affecting the detection proba- OWN has the outsourced data D[1],...,D[d], and that bility appreciably. If large portions of the file have to it wants to append the blocks D[d + 1],...,D[d + k]. be deleted then the best approach in this case is to re- Then a logical matrix is built by column as follows: run the basic PDP scheme on the new file. Supporting 0 block deletion turns out to be very similar to the update D [1] = D[1] , D[d + 1] operation. Indeed, deleted blocks can be replaced by D0[2] = D[2] , D[d + 2] a predetermined special block in their respective posi- ... tions via the update procedure. Line 6 of Algorithm 3 D0[k] = D[k] , D[d + k] can be modified as follows: ... from: D0[d] = D[d] 0 vi = vi ⊕ H(ci, j, D[n]) ⊕ H(ci, j, D [n]) to: For the index j in the i-th challenge, the server will have vi = vi ⊕ H(ci, j, D[n]) ⊕ H(ci, j, DBlock) to include in the computation of the XOR-ed hashes vi any blocks linked to D[g ( j)], i.e., the entire row g ( j) where DBlock is a fixed special block that represents ki ki in the logical matrix above. In particular, will deleted blocks. This small change results in the deleted SRV include: block being replaced in all verification tokens where it has occurred. Moreover, as in the update case, all to- H(ci, j, D[gki ( j)]) ⊕ H(ci, d + j, D[gki ( j) + d]) ⊕ kens are modified such that SRV remains oblivious as ... 
⊕ H(ci, αd + j, D[gki ( j) + αd]) far as the inclusion of the special block. The costs as- sociated to the delete operation are identical to those of where α is the length of the row gki ( j) of the logical the update. matrix. Clearly, OWN will have to update the XOR- ed hashes in a similar fashion by invoking the Update 3.3. Batching Updates and Deletions algorithm whenever a certain block must be appended. The advantage of this solution is that we can just Although the above demonstrates that the proposed run the Update operation to append blocks so we can technique can accommodate certain operations on out- even batch several appends. Notice also that the prob- sourced data, it is clear that the cost of updating all re- ability for any block to be selected during a challenge maining verification tokens for a single block update is still 1/d (this is because intuitively we consider d or deletion is not negligible for OWN . Fortunately, rows of several blocks combined together). The draw- it is both easy and natural to batch multiple operations. back is that the storage server will have to access more Specifically, any number of block updates and deletes blocks per query and this may become increasingly ex- can be performed at the cost of a single update or delete. pensive for SRV as the number of blocks appended to To do this, we need to modify the for-loop in Algorithm the database increases. 3 to take care of both deletions and updates at the same time. Since the modifications are straight-forward, we Insert: A logical insert operation corresponds to an do not illustrate them in a separate algorithm. append coupled with maintaining a data structure con- taining a logical-to-physical block number mapping for 3.4. Single-Block Append each “inserted” block. 
A true (physical) block insert op- eration is difficult to support since it requires bulk block In some cases, the owner might want to increase the renumbering, i.e., inserting a block D[ j] corresponds to size of the outsourced database. We anticipate that the shifting by one slot all blocks starting with index j + 1. most common operation is block append, e.g., if the This affects many (if not all) rows in the logical matrix described above and requires a substantial number of would involve high overhead, as each append re- computations. quires OWN to fetch all unused tokens, which is a bandwidth-intensive activity. 3.5. Bulk Append 4. Discussion Another variant of the append operation is bulk ap- pend which takes place whenever OWN needs to up- In this section we discuss some features of the pro- load a large number of blocks at one time. There posed scheme. are many practical scenarios where such operations are common. For example, in a typical business environ- 4.1. Bandwidth-Storage Tradeoff ment, a company might accummulate electronic docu- ments (e.g., sales receipts) for a fixed term and might Our basic scheme, as presented in Section 2, in- be unable to outsource them immediately for tax and volves OWN outsourcing to SRV not only data D, instant availability reasons. However, as the fiscal year but also a collection of encrypted verification tokens: rolls over, the accummulated batch would need to be 0 {[i,vi] for 1 ≤ i ≤ t}. An important advantage of this uploaded to the server in bulk. method is that OWN remains virtually stateless, i.e., To support the bulk append operation, we sketch out it only maintains the “master-key” K. The storage bur- a slightly different protocol version. In this context, we den is shifted entirely to SRV . The main disadvantage require for OWN to anticipate the maximum size (in of this approach is that dynamic operations described blocks) for its data. Let dmax denote this value. 
How- in Section 3 require SRV to send all unused tokens ever, in the beginning, OWN only outsources d blocks back to OWN resulting in the bandwidth overhead of where d < dmax. (t × r × |AEkey|) bits. Of course, OWN must subse- The idea behind bulk append is simple: OWN bud- quently send all updated tokens back to SRV ; however, gets for the maximum anticipated data size, including as discussed earlier, this is unavoidable in order to pre- (and later verifying) not r but rmax = dr ∗ (dmax/d)e serve security. blocks per verification token. However, in the begin- One simple way to reduce bandwidth overhead is for ning, when only d blocks exist, OWN , as part of setup, OWN to store some (or all) verification tokens locally. computes each token vi in a slightly different fashion 0 Then, OWN simply checks them against the responses (the computation of vi remains unchanged): received from SRV . We stress that each token is a hash function computed on a set of blocks, thus will rmax OWN M have to store, in the worst case, a single hash per chal- vi = j where: j=1 lenge. Another benefit of this variant is that it obviates the need for encryption (i.e., AEkey is no longer needed)  H(c , j, D[g ( j)]) if g ( j) ≤ d for tokens stored locally. Also, considering that SRV Q = i ki ki j 0|H(·)| otherwise is a busy entity storing data for a multitude of owners, storing tokens at OWN has a further advantage of sim- The important detail in this formula is that, whenever plifying update, append, delete and other dynamic op- the block index function gki ( j) > d (i.e., it exceeds the erations by cutting down on the number of messages. current number of outsourced blocks), OWN simply skips it. If we assume that dmax/d is an integer, we know 4.2. Limited Number of Verifications that, on average, r indices will fall into the range of ex- isting d blocks. 
Since both OWN and SRV agree on One potentially glaring drawback of our scheme is the number of currently outsourced blocks, SRV will the pre-fixed (at setup time) number of verifications t. follow exactly the same procedure when re-computing For OWN to increase this number would require to re- each token upon OWN ’s verification request. running setup which, in turn, requires obtaining D from Now, when OWN is ready to append a number (say, SRV in its entirety. However, we argue that, in prac- d) of new blocks, it first fetches all unused tokens from tice, this is a not an issue. Assume that OWN wants to the server, as in the block update or delete operations periodically (every M minutes) obtain a proof of posses- described above. Then, OWN re-computes each un- sion and wants the tokens to last Y years. The number used token by updating the counter (as in Equation 2). of verification tokens required by our scheme is thus: The only difference is that now, OWN factors in (does (Y × 365 × 24 × 60)/M. The graph in Figure 1 shows not skip) H(ci, j, D[gki ( j)]) whenever gki ( j) falls in the the storage requirements (for verification tokens) for a ]d,2d] interval. (It continues to skip, however, all block range of Y and M values. The graph clearly illustrates indices exceeding 2d.) that this overhead is quite small. For example, assuming It is easy to see that this approach works but only AEK(·) block size of 256 bits, with less than 128 Mbits with sufficiently large append batches; small batches (16 Mbytes), outsourced data can be verified each hour Figure 1. Token Storage Overhead (in bits): X-axis shows verification frequency (in minutes) and Y-axis shows desired longevity (in years).

for 64 years, or, equivalently, every 15 minutes for 16 • Each data block is 4-KB (|b| = 212) and d = years. 237/212 = 225. Furthermore, the number of tokens is independent from the number of data blocks. Thus, whenever out- • OWN needs to perform one verification daily for sourced data is extremely large, extra storage required the next 32 years, i.e., t = 32 × 365 = 11,680. by our approach is negligible. For instance, the Web • OWN requires the detection probability of 99% Capture project of the Library of Congress has stored, 4 with 1% of the blocks being missing or corrupted; as of May 2007, about 70 Terabytes of data. As shown hence r = 29 (see Equation 1). above, checking this content every 15 minutes for the next 16 years would require only 1 Mbyte of extra stor- Based on the above, each hash, computed over roughly age per year, which arguably can be even stored directly 221 = 212 ×29 bytes, would take – on a low-end comput- at OWN . ing device with a 1-GHz CPU, less than 0.04 seconds [25]. The total number of hashes is 11,680. Thus, setup 4.3. Computation Overhead time is about 11,680 × 0.04 = 467 seconds, which is less than 8 minutes. Note that setup also involves t × r Another important variable is the computation over- PRP, 2t PRF, as well as t AEK invocations. However, head of the setup phase in the basic scheme. Recall that these operations are performed over short inputs (e.g., at setup OWN needs to compute t hashes, each over 128 bits) and their costs are negligible with respect to a string of size r ∗ |b| size, where |b| is the block size. those of hashing. (In the modified scheme, the overhead is similar: t × r hashes over roughly one block each.) 5. Related Work To assess the overhead, we assume that to process a single byte, a hash function (e.g., SHA), requires just 20 Initial solutions to the PDP problem were provided machine cycles [25], which is a conservative assump- by Deswarte et al. [26] and Filho et al. [27]. Both use tion. 
Furthermore, we assume the following realistic RSA-based hash functions to hash the entire file at ev- parameters: ery challenge. This is clearly prohibitive for the server whenever the file is large. • OWN outsources 237 bytes of data, i.e., the con- Similarly, Sebe et al. [28] give a protocol for remote tent of a typical hard drive – 128-GB. file integrity checking, based on the Diffie-Hellman problem in composite-order groups. However, the 4See http://www.loc.gov/webcapture/faq.html server must access the entire file and in addition the client is forced to store several bits per file block, so evaluate our scheme, for the same amount of data, we storage at the client is linear with respect to the number assume t tokens, each based on r verifiers. of file blocks as opposed to constant. For our scheme, the probability of no corrupted Schwarz and Miller [22] propose a scheme that al- block being detected after all t tokens are consumed is: lows a client to verify storage integrity of data across d−η multiple servers. However, even in this case, the server m (2) must access a linear number of file blocks per challenge. d  Golle et al. [29] first proposed the concept of en- m forcement of storage complexity and provided efficient Note that η accounts for the number of different blocks schemes. Unfortunately the guarantee they provide is used by at least one token. Indeed, to avoid detec- weaker than the one provided by PDP schemes since it tion, SRV cannot corrupt any block used to compute only ensures that the server is storing something at least any token. It can be shown that, for a suitably large as large as the original file but not necessarily the file d, for every t, it holds that: η ≥ min{tr/2,d/2} with itself. overwhelming probability. 
Hence, the upper bound for Provable data possession is a form of memory check- Equation 2, when tr/2 ≤ d/2, is given by: ing [30, 23] and in particular it is related to the concept tr d− 2  of sub-linear authentication introduced by Naor and m (3) Rothblum [23]. However, schemes that provide sub- d  linear authentication are quite inefficient. m Compared to the PDP scheme in [7], our scheme is In [8], SRV can avoid detection if it does not corrupt significantly more efficient in both setup and verifica- any sentinel block. Since a total of s × t sentinels are tion phases since it relies only on symmetric-key cryp- added to the outsourced data, the probability of avoiding tography. The scheme in [7] allows unlimited verifica- detection is: d  tions and offers public verifiability (which we do not m (4) achieve). However, we showed that limiting the num- d+(s×t) ber of challenges is not a concern in practice. In ad- m dition, our scheme supports dynamic operations, while The above equations show that, in our scheme, for a the PDP scheme in [7] works only for static databases. given t, increasing r also increases the tamper-detection Compared to the POR scheme [8], our scheme pro- probability. However, r has no effect on the total num- vides better performance on the client side, requires ber of tokens, i.e., does not influence storage overhead. much less storage space, and uses less bandwidth (sizes As for POR, for a given t, increasing tamper detection of challenges and responses in our scheme is a small probability requires increasing in s, which, in turn, in- constant, less than the size of a single block). To appre- creases storage overhead. ciate the difference in storage overhead, consider that, POR needs to store s sentinels per verification, where 6. Conclusions each sentinel is one data block in size (hence, a total of s × t sentinels). 
In contrast, our scheme needs a single We developed and presented a step-by-step design of encrypted value (256 bits) per verification. Note that, in a very light-weight and provably secure PDP scheme. It POR, for a detection probability of around 93%, where surpasses prior work on several counts, including stor- at most 1% of the blocks have been corrupted, the sug- age, bandwidth and computation overheads as well as gested value s is on the order of one thousand [8]. Like the support for dynamic operations. However, since it is [7], POR is limited to static data (in fact, [8] considers based upon symmetric key cryptography, it is unsuitable supporting dynamic data an open problem). In sum- for public (third-party) verification. A natural solution mary, we provide more features than POR by consum- to this would be a hybrid scheme combining elements ing considerably less resources 5. of [7] and our scheme. To further compare the two schemes we assess their To summarize, the work described in this paper rep- respective tamper detection probabilities. In [8], the resents an important step forward towards practical PDP outsourced data is composed of d blocks and there are techniques. We expect that the salient features of our s sentinels. We compute the probability that, m being scheme (very low cost and support for dynamic out- the number of corrupted blocks, the system consumes sourced data) make it attractive for realistic applica- all sentinels and corruption is undetected. Similarly, to tions.
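The tamper-detection bounds of Equations (2)-(4) are easy to check numerically. The following Python sketch is our own illustration (not part of the protocol): it evaluates the escape probability for our scheme, via the bound η ≥ min{tr/2, d/2}, and for POR's sentinel-based detection; the parameter values are arbitrary examples, not the paper's.

```python
from math import comb

def escape_prob_ours(d, m, t, r):
    """Eq. (3): probability that m corrupted blocks (out of d) avoid every
    block covered by the t tokens, using eta >= min(t*r/2, d/2)."""
    eta = min(t * r // 2, d // 2)
    return comb(d - eta, m) / comb(d, m)

def escape_prob_por(d, m, s, t):
    """Eq. (4): probability that none of the m corrupted blocks is one of
    the s*t sentinels hidden among the outsourced blocks."""
    return comb(d, m) / comb(d + s * t, m)

d = 2**15          # number of data blocks (toy example)
m = d // 100       # 1% of the blocks corrupted
p_few_tokens  = escape_prob_ours(d, m, t=10,  r=2**9)
p_many_tokens = escape_prob_ours(d, m, t=100, r=2**9)
```

As Section 5 notes, increasing r (or t) shrinks the escape probability in our scheme without adding tokens, whereas in POR shrinking the escape probability requires more sentinels and thus more storage.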

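As a back-of-the-envelope check of the figures in Sections 4.2 and 4.3, the short sketch below (our own illustration; the 256-bit token size, the 0.04 s/hash estimate and the parameter choices are taken from the text) computes the token count and the setup time:

```python
def num_tokens(years, interval_minutes):
    # Section 4.2: one token per verification, (Y * 365 * 24 * 60) / M tokens.
    return (years * 365 * 24 * 60) // interval_minutes

# "verified each hour for 64 years or, equivalently, every 15 minutes
# for 16 years": both schedules consume the same number of tokens.
hourly_64y = num_tokens(64, 60)
quarter_hourly_16y = num_tokens(16, 15)

# Section 4.3: one verification daily for 32 years, at roughly 0.04 s
# per token hash, reproduces the quoted setup time of about 467 s.
t = 32 * 365
setup_seconds = t * 0.04
```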
5 We have recently learned that a MAC-based variant of our first basic scheme was mentioned by Ari Juels during a presentation at CCS '07 in November 2007 ([31]). This is an independent and concurrent discovery. Indeed, an early version of our paper was submitted to NDSS 2008 in September 2007.

References

[1] P. Devanbu, M. Gertz, C. Martel, and S. Stubblebine, "Authentic third-party data publication," in IFIP DBSec'03; also in Journal of Computer Security, vol. 11, no. 3, pp. 291-314, 2003.
[2] E. Mykletun, M. Narasimha, and G. Tsudik, "Authentication and integrity in outsourced databases," in ISOC NDSS'04, 2004.
[3] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra, "Executing SQL over encrypted data in the database-service-provider model," in ACM SIGMOD'02, 2002.
[4] D. Song, D. Wagner, and A. Perrig, "Practical techniques for searches on encrypted data," in IEEE S&P'00, 2000.
[5] D. Boneh, G. di Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in EUROCRYPT'04, 2004.
[6] P. Golle, J. Staddon, and B. Waters, "Secure conjunctive keyword search over encrypted data," in ACNS'04, 2004.
[7] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable data possession at untrusted stores," in ACM CCS'07, 2007. Full paper available on e-print (2007/202).
[8] A. Juels and B. Kaliski, "PORs: Proofs of retrievability for large files," in ACM CCS'07, 2007. Full paper available on e-print (2007/243).
[9] J. F. Gantz, D. Reinsel, C. Chute, W. Schlichting, J. McArthur, S. Minton, I. Xheneti, A. Toncheva, and A. Manfrediz, "The expanding digital universe: A forecast of worldwide information growth through 2010. IDC white paper, sponsored by EMC," Tech. Rep., March 2007. http://www.emc.com/about/destination/digital_universe/.
[10] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, "Freenet: A distributed anonymous information storage and retrieval system," in International Workshop on Designing Privacy Enhancing Technologies, New York, NY, USA, 2001, pp. 46-66, Springer-Verlag New York, Inc.
[11] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard, "A cooperative internet backup scheme," in USENIX'03, 2003.
[12] B. Cooper and H. Garcia-Molina, "Peer-to-peer data trading to preserve information," ACM ToIS, vol. 20, no. 2, pp. 133-170, 2002.
[13] G. Ateniese, D. Chou, B. de Medeiros, and G. Tsudik, "Sanitizable signatures," in ESORICS'05, 2005.
[14] M. Waldman and D. Mazières, "TANGLER: A censorship-resistant publishing system based on document entanglements," in ACM CCS'01, 2001.
[15] J. Aspnes, J. Feigenbaum, A. Yampolskiy, and S. Zhong, "Towards a theory of data entanglement," in Proc. of European Symposium on Research in Computer Security, 2004.
[16] M. Waldman, A. Rubin, and L. Cranor, "PUBLIUS: A robust, tamper-evident, censorship-resistant web publishing system," in USENIX SEC'00, 2000.
[17] P. Rogaway, M. Bellare, and J. Black, "OCB: A block-cipher mode of operation for efficient authenticated encryption," ACM TISSEC, vol. 6, no. 3, 2003.
[18] M. Bellare, R. Canetti, and H. Krawczyk, "Keying hash functions for message authentication," in CRYPTO'96, 1996.
[19] J. Black and P. Rogaway, "Ciphers with arbitrary finite domains," in Proc. of CT-RSA, vol. 2271 of LNCS, pp. 114-130, Springer-Verlag, 2002.
[20] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, "OceanStore: An architecture for global-scale persistent storage," SIGPLAN Not., vol. 35, no. 11, pp. 190-201, 2000.
[21] G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter, "Efficient byzantine-tolerant erasure-coded storage," in DSN'04: Proceedings of the 2004 International Conference on Dependable Systems and Networks, Washington, DC, USA, 2004, p. 135, IEEE Computer Society.
[22] T. S. J. Schwarz and E. L. Miller, "Store, forget, and check: Using algebraic signatures to check remotely administered storage," in Proceedings of ICDCS'06, IEEE Computer Society, 2006.
[23] M. Naor and G. N. Rothblum, "The complexity of online memory checking," in Proc. of FOCS, 2005. Full version appears as ePrint Archive Report 2006/091.
[24] M. Bellare, R. Guérin, and P. Rogaway, "XOR MACs: New methods for message authentication using finite pseudorandom functions," in CRYPTO'95, 1995.
[25] B. Preneel et al., "NESSIE D21 - Performance of optimized implementations of the primitives."
[26] Y. Deswarte, J.-J. Quisquater, and A. Saidane, "Remote integrity checking," in Proc. of Conference on Integrity and Internal Control in Information Systems (IICIS'03), November 2003.
[27] D. L. G. Filho and P. S. L. M. Barreto, "Demonstrating data possession and uncheatable data transfer," IACR ePrint archive, Report 2006/150, 2006. http://eprint.iacr.org/2006/150.
[28] F. Sebe, A. Martinez-Balleste, Y. Deswarte, J. Domingo-Ferrer, and J.-J. Quisquater, "Time-bounded remote file integrity checking," Tech. Rep. 04429, LAAS, July 2004.
[29] P. Golle, S. Jarecki, and I. Mironov, "Cryptographic primitives enforcing communication and storage complexity," in Financial Cryptography, 2002, pp. 120-135.
[30] M. Blum, W. Evans, P. Gemmell, S. Kannan, and M. Naor, "Checking the correctness of memories," in Proc. of FOCS'95, 1995.
[31] A. Juels and B. S. Kaliski, http://www.rsa.com/rsalabs/staff/bios/ajuels/presentations/Proofs_of_Retrievability_CCS.ppt.