PoReps: Proofs of Space on Useful Data

Ben Fisch
Stanford University, Protocol Labs

Abstract

A proof-of-replication (PoRep) is an interactive proof system in which a prover defends a publicly verifiable claim that it is dedicating unique resources to storing one or more retrievable replicas of a data file. In this sense a PoRep is both a proof of space (PoS) and a proof of retrievability (PoR). This paper establishes a foundation for PoReps, exploring both their capabilities and their limitations. While PoReps may unconditionally demonstrate possession of data, they fundamentally cannot guarantee that the data is stored redundantly. Furthermore, as PoReps are proofs of space, they must rely either on rational time/space tradeoffs or timing bounds on the online prover's runtime. We introduce a rational security notion for PoReps called ε-rational replication based on the notion of an ε-Nash equilibrium, which captures the property that a server does not gain any significant advantage by storing its data in any other (non-redundant) format. We apply our definitions to formally analyze two recently proposed PoRep constructions based on verifiable delay functions and depth robust graphs. Lastly, we reflect on a notable application of PoReps: its unique suitability as a Nakamoto consensus mechanism that replaces proof-of-work with PoReps on real data, simultaneously incentivizing and subsidizing the cost of file storage.

1 Introduction

A proof-of-replication (PoRep) builds on the two prior concepts of proofs-of-retrievability (PoR) [30] and proofs-of-space (PoS) [24]. In the former a prover demonstrates that it can retrieve a file and in the latter the prover demonstrates that it is using some minimum amount of space to store information. Most proofs of space require the prover to use this space to store junk information that is only relevant to the PoS protocol. A PoRep, in essence, embeds a PoR within a PoS.
It enables the prover to demonstrate that it is using some minimum amount of space while simultaneously allowing it to actually use that space to store useful information.

An additional critical property of a PoRep is that the storage costs required to succeed in the protocol depend only on the size of the data inputs and are otherwise independent of the data inputs. In particular, the cost to succeed in the protocol should not depend on whether or not this data was privately preprocessed (e.g. encrypted by a client) or generated by the server itself. As another special case, if the input to the protocol were k redundant copies of the same file then this would cost the same as running the protocol on k distinct data files. Intuitively, this would achieve the following property: even if a PoRep prover could pass the protocol without storing the data redundantly (e.g. by deduplicating the k copies), there would be no advantage to doing so. In other words, it would be rational for a PoRep prover in this scenario to honestly store k copies of the data.

Consider as a thought exercise a simple composition of a PoR protocol and PoS protocol that would not succeed in achieving these goals. This simple protocol requires the prover to use a total of 2N space. The prover uses half of this space to produce a PoS (i.e. it runs a standard PoS protocol that requires it to fill this space with random data) and it uses the other half to actually store some useful data file of size N and produce PoRs of the file. This satisfies both a PoR and a PoS with only 2N storage and therefore shows both that the prover is using some minimum Ω(N) amount of storage and is able to retrieve the data of interest. However, it fails the "independent cost" criterion.
Namely, it is more expensive for the prover to run this protocol on useful data (requiring 2N space to store both the useful data and the random data) than to just store the useless random data required for the PoS and provide a PoR for this random data (requiring only N space). Moreover, consider if the prover were asked to store k redundant copies of the same file D and use this protocol to prove that it (a) is using at least kN space to store these copies and (b) is able to retrieve D. Following the protocol honestly requires 2kN space: kN space to store the k copies of D and kN space to store the random data required for the PoS. Unfortunately, the most rational strategy would be to store only the random data and a single copy of D as this uses only (k + 1)N space and still allows the prover to successfully pass the protocol.

While standard PoRs can provide proofs of data duplication in a private-verifier¹ setting where the client preprocesses its own data before sending it to the server, their security relies on a non-colluding client to privately preprocess the data. One advantage of PoReps over standard PoRs for proofs of data duplication is that multiple clients could contribute data to a single database and would not need to trust any single client to preprocess the data. PoReps could also be used to provide proofs of storage for publicly available data. For example, a consensus server in a massively distributed and open state-replication system such as Bitcoin could provide a PoRep that it is storing a complete history of the state-machine transcript (in blockchain systems like Bitcoin this is referred to as a "full node" storing the "chain"). Unlike PoRs, PoReps could be used to provide this proof without requiring all the verifiers to send the server their own preprocessed copy of the public transcript (incurring impractical communication).
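The storage accounting in the naive PoR-plus-PoS composition from the thought exercise above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text; the function names are hypothetical and do not come from the paper.

```python
def honest_space(k: int, n: int) -> int:
    """Space used when following the naive protocol honestly:
    k*n for the k copies of the file plus k*n of random PoS data."""
    return 2 * k * n


def deduplicated_space(k: int, n: int) -> int:
    """Space used by the rational deviation described in the text:
    k*n of random PoS data plus a single copy of the file."""
    return (k + 1) * n


def junk_only_space(n: int) -> int:
    """Space used when the prover stores only random PoS data of size n
    and provides a PoR for that random data instead of a useful file."""
    return n


def deviation_savings(k: int, n: int) -> int:
    """Space saved by deduplicating instead of storing honestly: (k - 1) * n."""
    return honest_space(k, n) - deduplicated_space(k, n)


if __name__ == "__main__":
    k, n = 3, 10  # illustrative values only
    print("honest:", honest_space(k, n))
    print("deduplicated:", deduplicated_space(k, n))
    print("savings:", deviation_savings(k, n))
```

The savings grow linearly in k, which is why the naive composition gives the prover a real incentive to deduplicate, whereas a PoRep aims to make this deviation unprofitable.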
PoReps provide a publicly verifiable proof of data duplication, secure against adversaries who will not deviate from an ε-rational honest strategy. The notion of an ε-equilibrium is used in game theory as a generalization of a Nash equilibrium where players gain at most an ε advantage from deviating. This solution concept is appropriate for a malicious-but-lazy adversary, or in conjunction with the status-quo-bias assumption: you are on the couch and the TV remote is across the room, so you continue to watch the same channel. This may seem like a strangely weak security property to achieve in a cryptographic protocol. For a very simple reason it is actually the best possible security that PoReps can achieve, at least in the standard² model of computation for cryptographic analysis. In short, any prover storing k independent replicas of the file could intentionally correlate these replicas in a way that it can still efficiently retrieve each in its original format. For example, it could encrypt them and store the key.

However, the primary use case of PoReps is probably not a proof of storage system that weakly discourages de-duplication. The properties achieved by PoReps make them uniquely suited for a Nakamoto consensus mechanism (also known as blockchain consensus) that uses PoReps as a useful proof of space in place of Bitcoin's proof-of-work to achieve sybil-resistance.

¹ Verification of PoRs can also be outsourced. Proofs of retrievability where a client can outsource the work of verifying proofs were previously referred to as publicly verifiable PoRs or "public PoRs"; however, the label "outsourceable" appropriately distinguishes their security property from the public verifiability of PoReps and proofs of space.

² This does not preclude secure hardware solutions or proofs based on network timing.
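For reference, the game-theoretic ε-equilibrium notion invoked above can be written out as follows. This is the standard textbook formulation; the notation (players i, strategy profile s, utility functions u_i) is introduced here for illustration and is not the paper's formal definition of ε-rational replication.

```latex
A strategy profile $s = (s_1, \dots, s_n)$ is an $\epsilon$-Nash equilibrium
if, for every player $i$ and every alternative strategy $s_i'$,
\[
  u_i(s_i', s_{-i}) \;\le\; u_i(s_i, s_{-i}) + \epsilon ,
\]
where $s_{-i}$ denotes the strategies of all players other than $i$.
```

Setting ε = 0 recovers the ordinary Nash equilibrium condition.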
Briefly, Nakamoto consensus mechanisms and their variants are a special type of state-machine replication process managed by an unpermissioned, asynchronous, and distributed network of consensus participants, with the additional feature that the state-machine itself encodes fungible tokens of value. In particular, a defining characteristic of these consensus mechanisms is their ability to mint new tokens in the state machine in order to reward and incentivize consensus participants (termed miners), taking for granted that these tokens represent real-world assets.

Blockchain systems based on proof-of-space have been proposed [19,42] and are in active development, pursuing goals including energy efficiency and more egalitarian distribution.³ PoReps target a different advantage entirely: they simultaneously incentivize useful peer-to-peer data storage. This is the basis of Filecoin [1]. The miners that manage the system's distributed state-machine are required to produce PoReps in order to append transactions, or equivalently, be elected as temporary consensus leaders, and are rewarded with freshly minted coins in return. Accepting the hypothesis that the miners will be incentivized by these rewards alone to produce PoReps, just as Bitcoin miners produce wasteful proofs-of-work, the rewards, in effect, subsidize the auxiliary useful work accomplished by the PoRep: file storage.

The utility of ε-rational replication in this context is immediately transparent. It characterizes the cost required to nudge a data replication strategy from a weak equilibrium strategy into a strong one. In other words, it represents the cost that clients must pay (in a stylized model that ignores other market variables) to convince miners to encode their real data inside PoReps rather than "useless" generated data, and therefore the degree to which a system such as Filecoin subsidizes storage costs.
PoReps thus sit at an exciting crossroads of economics and cryptography: PoReps epitomize a cryptographic mechanism that is concerned not solely with the actions and capabilities of an adversary, nor penalties levied for misbehavior, but rather its broader effect on behaviors in an economy.

1.1 Related work

Proofs of storage. Cryptographic proofs of storage have been proposed in a variety of flavors throughout the literature.