Authenticated Data Structures, Generically
Total Page:16
File Type:pdf, Size:1020Kb
Authenticated Data Structures, Generically Andrew Miller, Michael Hicks, Jonathan Katz, and Elaine Shi University of Maryland, College Park, USA Abstract Such a scenario can be supported using authenticated data An authenticated data structure (ADS) is a data structure whose structures (ADS) [5, 23, 30]. ADS computations involve two roles, operations can be carried out by an untrusted prover, the results of the prover and the verifier. The mirror plays the role of the prover, which a verifier can efficiently check as authentic. This is done storing the data of interest and answering queries about it. The by having the prover produce a compact proof that the verifier client plays the role of the verifier, posing queries to the prover can check along with each operation’s result. ADSs thus support and verifying that the returned results are authentic. At any point outsourcing data maintenance and processing tasks to untrusted in time, the verifier holds only a short digest that can be viewed as servers without loss of integrity. Past work on ADSs has focused summarizing the current contents of the data; an authentic copy of on particular data structures (or limited classes of data structures), the digest is provided by the data owner. When the verifier sends one at a time, often with support only for particular operations. the prover a query, the prover computes the result and returns it This paper presents a generic method, using a simple exten- along with a proof that the returned result is correct; both the proof sion to a ML-like functional programming language we call λ• and the time to produce it are linear in the time to compute the (lambda-auth), with which one can program authenticated oper- query result. The verifier can attempt to verify the proof (in time ations over any data structure defined by standard type construc- linear in the size of the proof) using its current digest, and will tors, including recursive types, sums, and products. The program- accept the returned result only if the proof verifies. If the verifier is mer writes the data structure largely as usual and it is compiled to also the data provider, the verifier may also update its data stored code to be run by the prover and verifier. Using a formalization of at the prover; in this case, the result is an updated digest and the λ• we prove that all well-typed λ• programs result in code that is proof shows that this updated digest was computed correctly. ADS secure under the standard cryptographic assumption of collision- computations have two properties. Correctness implies that when resistant hash functions. We have implemented λ• as an extension both parties execute the protocol correctly, the proofs given by the prover verify correctly and the verifier always receives the correct to the OCaml compiler, and have used it to produce authenticated 1 versions of many interesting data structures including binary search result. Security implies that a computationally bounded, malicious trees, red-black+ trees, skip lists, and more. Performance experi- prover cannot fool the verifier into accepting an incorrect result. ments show that our approach is efficient, giving up little compared Authenticated data structures can be traced back to Merkle [18]; to the hand-optimized data structures developed previously. the well-known Merkle hash tree can be viewed as providing an authenticated version of a bounded-length array. More recently, au- Categories and Subject Descriptors D.3.3 [Programming Lan- thenticated versions of data structures as diverse as sets [22, 26], guages]: Language Constructs and Features—Data types and struc- dictionaries [1, 12], range trees [16], graphs [13], skip lists [11, 12], tures B-trees [20], hash trees [25], and more [15] have been proposed. In each of these cases, the design of the data structure, the supporting General Terms Security, Programming Languages, Cryptogra- operations, and how they can be proved authentic have been recon- phy sidered from scratch, involving a new, potentially tricky proof of security. Arguably, this state of affairs has hindered the advance- 1. Introduction ment of new data-structure designs as previous ideas are not easily reused or reapplied. We believe that ADSs will make their way into Suppose data provider would like to allow third parties to mirror its systems more often if they become easier to build. data, providing a query interface over it to clients. The data provider This paper presents λ• (pronounced “lambda auth”), a language wants to assure clients that the mirrors will answer queries over the for programming authenticated data structures. λ• represents the data truthfully, even if they (or another party that compromises a first generic, language-based approach to building dynamic authen- mirror) have an incentive to lie. As examples, the data provider ticated data structures with provable guarantees. The key observa- might be providing stock market data, a certificate revocation list, tion underlying λ•’s design is that, whatever the data structure or the Tor relay list, or the state of the current Bitcoin ledger [21]. operation, the computations performed by the prover and verifier can be made structurally the same: the prover constructs the proof Permission to make digital or hard copies of all or part of this work for personal or at key points when executing a query, and the verifier checks a proof classroom use is granted without fee provided that copies are not made or distributed by using it to “replay” the query, checking at each key point that the for profit or commercial advantage and that copies bear this notice and the full citation computation is self-consistent. on the first page. Copyrights for components of this work owned by others than the λ• authenticated types author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or implements this idea using what we call , republish, to post on servers or to redistribute to lists, requires prior specific permission written •τ, with coercions auth and unauth for introducing and and/or a fee. Request permissions from [email protected]. eliminating values of an authenticated type. Using standard func- POPL ’14, January 22–24, 2014, San Diego, CA, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2544-8/14/01. $15.00. 1 This property is sometimes called soundness but we eschew this term to http://dx.doi.org/10.1145/2535838.2535851 avoid confusion with its standard usage in programming languages. tional programming features, the programmer writes her ADS’s N2,0: h1 = hash(h2,h3) π = str2 @ N0,2 :: datatype definition and its corresponding operations (e.g., queries h7 @ N0,3 :: and updates) to use authenticated types. For example, as we show N1,0: h2 = hash(h4,h5) N1,1: h3 = hash(h6,h7) h2 @ N1,0 later in the paper, the programmer could write an efficient authen- ticated binary search tree using the (OCaml-style) type definition N0,0: str0, h4 N0,1: str1, h5 N0,2: str2, h6 N0,3: str3, h7 type bst = Tip j Bin of •bst × int × •bst along with es- h4 = hash(str0) h5 = hash(str1) h6 = hash(str2) h7 = hash(str3) sentially standard routines for querying and insertion. Then, given such a program, the λ• compiler produces code for both a prover Figure 1. A Merkle tree and proof π for fetch(2) describing the and a verifier that will produce or confirm, respectively, a proof of highlighted path (with bold edge and node outlines). the correct execution of the corresponding operation. Proofs con- sist of a stream of what we call shallow projections of the data the prover visits while running its routine: the prover’s code adds 2. We formalize the semantics and type rules for λ• and prove to this stream at each unauth call, while the verifier’s code draws that all well-typed λ• programs produce ADS protocols that from the stream at the corresponding call, checking for consistency. are both correct and secure. We give a more detailed overview of how this approach works, and 3. We have implemented λ• as an extension to the OCaml com- how authenticated types are represented, in Section 2. Importantly, piler3 and used it to program a variety of existing and new as we show in Sections 3 and 4, any well-typed program written in ADSs, showing good asymptotic performance that is compet- λ• compiles to a prover and verifier which are correct and secure, itive with hand-rolled implementations. where security holds under the standard cryptographic assumption 2 of collision-resistant hash functions. 2. Overview λ• provides two key benefits over prior work. First, it is ex- tremely flexible. We can use λ• to implement any dynamic data This section presents an overview of our approach. We begin by structure, both queries and updates, expressible using ML-style describing Merkle trees, the canonical example of an authenticated data types (involving sums, products, and recursive types). Our data structure. Then we give a flavor for λ• by showing how we can theoretical development, though not our implementation, also sup- use it to implement Merkle trees. We conclude with a discussion of ports authenticated functions. Previous work by Martel et al. [17] the flexibility and ease-of-use benefits of using λ• to write efficient can also be used to build DAG-oriented ADSs, but it supports only authenticated data structures. queries and not (incremental) updates, requires the data structure have a single root, and does not support authenticated functions. 2.1 Background: Merkle trees λ•’s flexibility does not compromise its performance. To the best The canonical example of an ADS is a Merkle tree [19], which is of our knowledge the asymptotic performance of every prior ADS the authenticated version of a full binary tree where data is stored construction from the literature based on collision-resistant hash- at the leaves but not the interior nodes.