Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems

Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems Pingcheng Ruany, Gang Chenx, Tien Tuan Anh Dinhy, Qian Liny, Beng Chin Ooiy, Meihui Zhangz yNational University of Singapore xZhejiang University zBeijing Institute of Technology y{ruanpc, dinhtta, linqian, ooibc}@comp.nus.edu.sg [email protected] [email protected] ABSTRACT the previous block via a hash pointer. It was firstly in- The success of Bitcoin and other cryptocurrencies bring troduced in Bitcoin [27], where Satoshi Nakamoto employs enormous interest to blockchains. A blockchain system im- it to batch cryptocurrency transactions. Often referred to plements a tamper-evident ledger for recording transactions as decentralized ledger, the chain ensures integrity of the that modify some global states. The system captures entire complete transaction history. It is replicated over a peer-to- evolution history of the states. The management of that peer (P2P) network, and a distributed consensus protocol, history, also known as data provenance or lineage, has been namely Proof-of-Work (PoW), is used to ensure that honest studied extensively in database systems. However, query- nodes in the network have the same ledger. More recent ing data history in existing blockchains can only be done blockchains, for instance Ethereum [1] and Hyperledger [2], by replaying all transactions. This approach is applicable extend the original design to support applications beyond to large-scale, offline analysis, but is not suitable for online cryptocurrencies. In particular, they add smart contracts transaction processing. which encode arbitrary, Turing-complete computation on We present LineageChain, a fine-grained, secure and effi- top of the blockchain. A smart contract has its states stored cient provenance system for blockchains. LineageChain ex- on the blockchain, and the states are modified via transac- poses provenance information to smart contracts via sim- tions that invoke the contract. ple and elegant interfaces, thereby enabling a new class of Blockchains are disrupting many industries, including fi- blockchain applications whose execution logics depend on nance [34, 29], supply chain [24, 35], and healthcare [4]. provenance information at runtime. LineageChain captures These industries are exploiting two distinct advantages provenance during contract execution, and efficiently stores of blockchains over traditional data management systems. it in a Merkle tree. LineageChain provides a novel skip First, a blockchain is decentralized, which allows mutually list index designed for supporting efficient provenance query distrusting parties to manage the data together instead of processing. We have implemented LineageChain on top trusting a single party. Second, the blockchain provides of Hyperledger and a blockchain-optimized storage system integrity protection (tamper evidence) to all transactions called ForkBase. Our extensive evaluation of LineageChain recorded in the ledger. In other words, the complete trans- demonstrates its benefits to the new class of blockchain ap- action history is secure. plications, its efficient query, and its small storage over- The management of data history, or data provenance, has head. been extensively studied in databases, and many systems have been designed to support provenance [13, 14, 8, 30, PVLDB Reference Format: 5, 36]. In the context of blockchain, there is explicit, but Pingcheng Ruan, Gang Chen, Tien Tuan Anh Dinh, Qian Lin, only coarse-grained support for data provenance. In par- Beng Chin Ooi, Meihui Zhang. Fine-Grained, Secure and Effi- ticular, the blockchain can be seen as having some states cient Data Provenance on Blockchain Systems. PVLDB, 12(9): (with known initial values), and every transaction moves 975-988, 2019. the system to new states. The evolution history of the DOI: https://doi.org/10.14778/3329772.3329775 states (or provenance) can be securely and completely re- constructed by replaying all transactions. This reconstruc- tion can be done during offline analysis. During contract 1. INTRODUCTION execution (or runtime), however, no provenance informa- Blockchains are capturing attention from both academia tion is safely available to smart contracts. In other words, and industry. A blockchain is a chain of blocks, in which smart contracts cannot access historical blockchain states in each block contains many transactions and is linked with a tamper-evident manner. The lack of secure, fine-grained, runtime access to provenance therefore restricts the expres- This work is licensed under the Creative Commons Attribution- siveness of the business logic the contract can encode. NonCommercial-NoDerivatives 4.0 International License. To view a copy Consider an example smart contract shown in Figure 1, of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For which contains a method for transferring a number of tokens any use beyond those covered by this license, obtain permission by emailing from one user to another. Suppose user A wants to send [email protected]. Copyright is held by the owner/author(s). Publication rights tokens to B based on the latter’s historical balance in recent licensed to the VLDB Endowment. months. For example, A only sends token if B’s average Proceedings of the VLDB Endowment, Vol. 12, No. 9 ISSN 2150-8097. balance per day is more than t. It is not currently possible to DOI: https://doi.org/10.14778/3329772.3329775 975 contract Token { exposes simple access interface to smart contracts. method Transfer(sender, recipient, amount) { bal1 = gState[sender]; • We design a novel index optimized for querying bal2 = gState[recipient]; blockchain provenance. The index incurs small stor- if (amount < bal1) { age overhead, and its performance is independent of gState[sender] = bal1 - amount; the blockchain size. It is adapted from the skip list gState[recipient] = bal2 + amount; but we completely remove the randomness to fit for }}} deterministic blockchains. Figure 1: A smart contract that manages for token • We implement LineageChain for Hyperledger [2]. management. Our implementation builds on top of ForkBase, a blockchain-optimized storage [37]. We conduct exten- write a contract method for this operation. To work around sive evaluation of LineageChain. The results demon- this, A needs to first compute the historical balance of B strate its benefits to provenance-dependent applica- by querying and replaying all on-chain transactions, then tions, and its efficient query and small storage over- based on the result issues the Transfer transaction. Besides head. performance overhead incurred from multiple interactions LineageChain is a component of our Hyperledger++ sys- with the blockchain, this approach is not safe: it fails to tem [3], for which we improve Hyperledger’s execution and achieve transaction serializability. In particular, suppose A storage layer for the secure runtime provenance support. issues the Transfer transaction tx based on its computation Elsewhere, we have addressed the consensus bottleneck by of B’s historical balance. But before tx is received by the applying sharding efficiently and exploiting trusted hard- blockchain, another transaction is committed such that B’s ware to scale out system horizontally, to substantially im- average balance becomes t0 < t. Consequently, when tx prove the system throughput [15]. We have also improved is later committed, it will have been based on stale state, the storage efficiency by designing a tamper-evident stor- and therefore fails to meet the intended business logic. In age engine that supports efficient forking called Forkbase. blockchains with native currencies, serializability violation We are currently incorporating smart contract verification can be exploited for Transaction-Ordering attacks that cause to enhance the correctness of smart contracts. substantial financial loss to the users [25]. The remainder of the paper is organized as follows. Sec- In this paper, we design and implement a fine-grained, se- tion 2 provides background on blockchains. Section 3 de- cure and efficient provenance system for blockchains, called scribes our design for capturing provenance, and the inter- LineageChain. In particular, we aim to enable a new class face exposed to smart contracts. Section 4 discusses how of smart contracts that can access provenance information we store provenance, and Section 5 describes our new index. at runtime. Although our goal is similar to that of exist- Section 6 presents our implementation. Section 7 reports ing works in adding provenance to databases [5, 35, 31], we the performance of LineageChain. Section 8 discusses re- face three unique challenges due to the nature of blockchain. lated work, and Section 9 concludes this work. First, there is a lack of data operators whose semantics capture provenance in the form of input-output dependency. More specifically, for general data management workloads 2. BACKGROUND AND OVERVIEW (i.e., non-cryptocurrency), current blockchains expose only In this section, we present relevant background on generic operators, for example, put and get of key-value blockchain systems [18, 7], and design choices that affect tuples. These operators do not have input-output depen- index structure requirements. Following which, we present dency. In contrast, relational databases operators such as an overview of LineageChain. map, join, union, are defined as relations between input and output, which clearly capture their dependencies. To over- come this lack of provenance-friendly operators, we instru-

Load more