
BlockchainDB - A Shared Database on Blockchains

Muhammad El-Hindi (TU Darmstadt) [email protected]
Carsten Binnig (TU Darmstadt) [email protected]
Arvind Arasu (Microsoft) [email protected]
Donald Kossmann (Microsoft Research) [email protected]
Ravi Ramamurthy (Microsoft Research) [email protected]

ABSTRACT

In this paper we present BlockchainDB, which leverages blockchains as a storage layer and introduces a database layer on top that extends blockchains by classical data management techniques (e.g., sharding) as well as a standardized query interface to facilitate the adoption of blockchains for data sharing use cases. We show that by introducing the additional database layer, we are able to improve the performance and scalability when using blockchains for data sharing and also massively decrease the complexity for organizations intending to use blockchains for data sharing.

PVLDB Reference Format:
Muhammad El-Hindi, Carsten Binnig, Arvind Arasu, Donald Kossmann, Ravi Ramamurthy. BlockchainDB - A Shared Database on Blockchains. PVLDB, 12(11): 1597-1609, 2019.
DOI: https://doi.org/10.14778/3342263.3342636

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 12, No. 11. ISSN 2150-8097.
DOI: https://doi.org/10.14778/3342263.3342636

1. INTRODUCTION

Motivation: Blockchain technology emerged as the basis for crypto-currencies, like Bitcoin [20] or Ethereum [8], allowing parties that do not trust each other to exchange funds and agree on a common view of their balances. With the advent of smart contracts, blockchain platforms are being used for many other use cases beyond crypto-currencies and include applications in domains such as governmental, healthcare and IoT scenarios [7, 4, 26].

An important aspect that many scenarios have in common is that blockchains are being used to provide shared data access for parties that do not trust each other. For example, one use case is that the blockchain is used for tracking goods in a supply chain where independent parties log the location of individual goods. What makes blockchains attractive in those scenarios are two main characteristics: First, blockchains store data in an immutable append-only ledger that contains the history of all data modifications. That way, blockchains enable auditability and traceability in order to detect potential malicious operations on the shared data. Second, blockchains can be operated reliably in a decentralized manner without the need to involve a central trusted instance, which often does not exist in data sharing scenarios.

However, while there is a lot of excitement around blockchains in industry, they are still not being used as a shared database (DB) in many real-world scenarios. This has different reasons: First and foremost, a major obstacle is their limited performance and scalability. Recent benchmarks [12] have shown that state-of-the-art blockchain systems such as Ethereum or Hyperledger, which can be used for building general applications on top, can only achieve 10s or maximally 1000s of transactions per second, which is often way below the requirements of modern applications. Second, blockchains lack easy-to-use abstractions known from data management systems, such as a simple query interface, as well as other guarantees like well-defined consistency levels that specify when/how updates become visible. Instead, blockchains often come with proprietary programming interfaces and require applications to know about the internals of a blockchain to decide on the visibility of updates.

Contribution: In this paper, we present BlockchainDB, which tackles the before-mentioned issues. The main idea is that BlockchainDB leverages blockchains as the native storage layer and implements an additional database layer on top to enable access to data stored in shared tables. That way, existing blockchain systems can be used (without modification) as a tamper-proof and de-centralized storage. On top of the storage layer, BlockchainDB implements a database layer with the following functions:

• Partitioning and Partial Replication: A major performance bottleneck of blockchains today is that all peers hold a full copy of the state and still only provide (limited) sharding capabilities. In the database layer of BlockchainDB, we allow applications to define how data is replicated and partitioned across all available peers. Thus, applications can trade performance and security guarantees in a declarative manner.

• Query Interface and Consistency: In the DB layer, BlockchainDB additionally provides shared tables as easy-to-use abstractions including different consistency protocols (e.g., eventual and sequential consistency) as well as a simple key/value interface to read/write data without knowing the internals of a blockchain system. In future, we want to extend the query interface to shared tables to support SQL with full transactional semantics.
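To make the two functions above concrete, the following purely illustrative Python sketch mimics how an application might declare a shared table with a declarative sharding/replication configuration and access it through a key/value interface. All class, method, and parameter names are hypothetical and not part of BlockchainDB's actual codebase; plain in-memory dictionaries stand in for the blockchain-backed shards.

```python
# Hypothetical sketch of the abstractions described above: a shared table
# with declarative sharding/replication parameters and a put/get interface.
# Names are illustrative only, not BlockchainDB's real API.

class SharedTable:
    def __init__(self, name, num_shards=3, replication_factor=2,
                 consistency="eventual"):
        # The sharding/replication parameters let an application trade
        # performance against security guarantees declaratively; in this
        # mock they are only stored, not enforced.
        self.name = name
        self.num_shards = num_shards
        self.replication_factor = replication_factor
        self.consistency = consistency
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash-based sharding: each shard would be a separate blockchain
        # network in the real system.
        return hash(key) % self.num_shards

    def put(self, key, value):
        # Upsert semantics: a put for an existing key updates the row.
        self._shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self._shards[self._shard_for(key)].get(key)

orders = SharedTable("OrderDetails", num_shards=3, replication_factor=2)
orders.put(100, {"status": "new"})
orders.put(100, {"status": "ready"})
print(orders.get(100))  # the later put wins: {'status': 'ready'}
```

The point of the sketch is only the shape of the interface: the application chooses the sharding configuration once at table creation time and afterwards uses nothing but put/get on keys.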

In addition to these functions, the database layer of BlockchainDB comes with an off-chain verification procedure in which peers can easily verify the read- and write-set of their own clients. The idea of the verification procedure is that peers can detect other potentially misbehaving peers in the BlockchainDB network. This is needed since not all BlockchainDB peers hold the full copy of the database, and the storage layer of a remote peer could potentially drop puts or return a spurious value for a read operation (i.e., a value that was not persisted in the database).

By introducing a database layer on top of an existing blockchain, BlockchainDB is not only able to provide higher performance, but also to decrease the complexity for organizations intending to use blockchains for data sharing. While the idea of moving certain functions out of the blockchain into additional application layers has been studied previously (e.g., [14, 13, 19, 9]), to the best of our knowledge, BlockchainDB is the first system to provide a fully functional DB layer on top of blockchains. Our experiments show that BlockchainDB allows to increase the performance significantly to support many real-world applications.

Outline: The remainder of this paper is organized as follows: First, in Section 2 we give an overview of what functionality and security guarantees BlockchainDB provides for data sharing. Afterwards, in Section 3 we present the BlockchainDB architecture and discuss the trust assumptions as well as potential attacks. Then, in Section 4 and Section 5 we discuss the details of the database layer and how blockchains are being used as the storage layer. Section 6 afterwards outlines our off-chain verification protocol. The results of our evaluations with the YCSB benchmark are then presented in Section 7. Finally, we conclude with related work in Section 8 and a summary in Section 9.

2. OVERVIEW AND GUARANTEES

As explained already in the introduction, there are many different applications where untrusted parties need to have shared access to the same database. To enable such a shared access to data, BlockchainDB provides so-called shared tables. For accessing a shared table, clients can use the put/get interface of BlockchainDB to access tuples in the shared table based on their primary key. In order to understand the functionality and guarantees that BlockchainDB provides for untrusted parties to access data via shared tables, we will introduce a short motivating scenario and use this scenario also to outline the guarantees that applications get when using BlockchainDB for data sharing.

[Figure 1: A Data Sharing Scenario. The figure shows the two shared tables NewOrders(custkey, orderkey, product) and OrderDetails(orderkey, status) together with the numbered put/get operations (1)-(6) described in the text, including the spurious read in (6).]

Scenario: Figure 1 shows an example scenario for data sharing where three untrusted parties (WholeFoods, FedEx, and Lindt) have access to the same database. The scenario describes a typical supplier scenario where WholeFoods acts as customer, Lindt as supplier, and FedEx as shipping company. In this scenario, WholeFoods first places a new order by inserting two new entries into the shared database consisting of two shared tables (1). After the order is placed, Lindt processes the new order (2). To keep track of the order, Lindt updates the status from new to ready as shown in (3). Once the order is ready, FedEx starts its operation (4). After the order has been shipped to the customer, FedEx updates the status of the order to delivered as shown in (5).

A naïve way of implementing such a scenario would be that one of the parties is hosting the shared database; e.g., say WholeFoods hosts the shared database as a service for all their suppliers. In this setup, however, WholeFoods could easily leverage the fact that it can manipulate the shared data or return false values to the other parties about the order status, without any chance for the parties to verify that WholeFoods was in fact acting in a malicious manner. For example, WholeFoods could claim that the order was lost during transit by returning a spurious (i.e., false) order status to Lindt as shown in (6) to trigger that a replacement is sent (without paying for it). Even worse, WholeFoods could actually delete the order or not store it in the database in the first place. In case of a lawsuit, no evidence could thus be found that WholeFoods (or any other party) was actually acting maliciously.

Guarantees: In order to avoid these problems, a blockchain network such as Ethereum or Hyperledger could be used to implement a shared database. The benefit of using a blockchain is that every party (WholeFoods, Lindt, and FedEx) keeps a full copy of the database and the majority of parties needs to agree (based on the blockchain-based consensus protocol) on every update before its effect becomes visible. As a consequence of using blockchains, we get the following important guarantees for data sharing: First, the data in each peer is stored in a tamper-proof log. That way, any data modification of the log by any potentially malicious party could be detected since the cryptographic hashes used in the blockchain would not be valid anymore. Second, all parties read only state from their local copy of the database. That way, spurious reads (that are a problem if data needs to be read from a remote party) can be avoided.

However, as discussed in the introduction, using blockchains directly for data sharing comes with significant problems (e.g., w.r.t. performance) and complexities due to missing abstractions. The idea of BlockchainDB is thus to provide the same security guarantees as blockchains, namely (1) a tamper-proof (auditable) log as well as (2) verifiable reads and writes. At the same time, BlockchainDB enables high performance and provides an easy-to-use query interface.

3. SYSTEM ARCHITECTURE

3.1 Architecture Overview

The main idea of BlockchainDB is that it implements a database layer on top of an existing blockchain. The database layer provides clients with a simple-to-use abstraction (called a shared table) with a put/get interface and stores all data in its storage layer that relies on blockchains as discussed before. Figure 2 shows a possible deployment of BlockchainDB across four different BlockchainDB peers

(i.e., untrusted parties) to enable access to shared tables as discussed in the scenario before.

[Figure 2: A typical BlockchainDB Network. The figure shows the full peers A (WholeFoods), B (FedEx), and C (Lindt), each running a database layer (with Shard, Tx, and Verification Managers) and blockchain nodes for a subset of the shards 0-2, as well as the thin peer D (SmallMarket), which runs only a database layer. Clients connect to their own peer via put/get/verify.]

The key idea of BlockchainDB is that data is not replicated to all peers to avoid the high overhead of blockchain consensus. Instead, shared tables are partitioned (i.e., sharded), whereby each shard is implemented as a separate blockchain network. Moreover, shards are only replicated to a limited number of peers instead of replicating the data to all peers. For example, in the scenario explained in the previous section, both shared tables (NewOrders and OrderDetails) can be partitioned using the OrderKey as partitioning key and replicated only to a subset of the peers (WholeFoods, Lindt, and FedEx).

As a direct consequence of sharding to speed up the performance, not all peers store all data locally. When accessing the shared table, the database layer thus needs to redirect the request either to the local or a remote storage depending on the requested key. While storing data in a blockchain still gives us a tamper-proof log, the remote peer can drop a put or return a spurious value for a read. To verify all remote reads/writes, an additional verification procedure is thus provided by BlockchainDB.

In order to participate in a BlockchainDB network and allow clients of a party to read/write data into a shared database, a peer in BlockchainDB can be either deployed as a full peer, which hosts a database and a replica of at least one shard, or as a thin peer, which only connects to other remote peers to access data in a shard (i.e., the peer does not store a copy of a shard). Having thin peers enables parties with only limited resources to participate in a BlockchainDB network and access the shared tables (such as a small supermarket, called SmallMarket in our example).

Finally, similar to permissioned blockchains, BlockchainDB assumes that the parties who want to share data are previously authenticated and known to each other. However, parties do not need to trust each other (since they might have contrary goals). More details about our security assumptions will be provided in Section 3.3.

3.2 System Components

Next, we explain how clients interact with BlockchainDB as well as the functionality of each component.

Clients: Clients interact with a shared table via their own BlockchainDB peer (which they trust). Thus, instead of interacting with a blockchain network directly, in BlockchainDB clients interact with their peer (i.e., the database layer) through a simple put/get interface to read/write shared data. Furthermore, clients can use the verify method of the database layer to trigger an off-chain verification procedure for the online verification of the last read/write-operation of that client. This allows the client to detect potential misbehaving peers in a BlockchainDB network (e.g., to detect if another BlockchainDB peer, more precisely its storage layer, returned a spurious value for a get) and is needed in BlockchainDB since not all peers keep a full copy of the database state. So, all reads/writes that are being redirected to a remote storage layer must be additionally verified to get the same guarantees as local reads/writes. We additionally support a (deferred) offline verification procedure that is called from the database layer for batches of reads/writes from all clients. The deferred verification procedure provides a higher throughput than the online verification, since it performs the verification while new reads/writes are being executed. However, clients might work (for a limited amount of time) on an unverified database state. More details about the verification procedure(s) are discussed in Section 6.

Database Layer: The database layer in BlockchainDB is mainly responsible for executing the put/get calls from the clients. If a put/get call comes in, the database layer uses the Shard Manager to decide to which shard of a table the operation should be directed. The shard can be either stored in its local storage or remotely in another BlockchainDB peer, depending on the partitioning scheme of the shared table. Currently, BlockchainDB implements a hash-based sharding approach, in which the user defines the number of shards and their allocation to BlockchainDB peers when creating a shared table. Another major difference of BlockchainDB and a pure blockchain network is that the database layer of BlockchainDB implements a Transaction Manager that provides well-known consistency levels (i.e., eventual consistency and sequential consistency). That way, clients get a defined behavior for concurrent puts/gets without the need to know the internals of blockchains. Additionally, the database layer can re-order/batch puts/gets depending on the chosen consistency level to optimally leverage the underlying blockchain and thus further improve the performance. Details about the database layer are discussed in Section 4.

Storage Layer: The storage layer serves as a persistent, auditable storage backend of BlockchainDB and is based on existing blockchain systems. As depicted in Figure 3, the storage layer of BlockchainDB is able to parallelize data processing across different shards, whereas each shard is implemented as a separate blockchain network where the data in a shard is replicated to multiple (but not necessarily all) peers using the internal blockchain consensus protocols. For example, in Figure 2 the BlockchainDB network uses in total three different blockchain networks (one for each shard) with a replication factor of two. Details about the storage layer are discussed in Section 5.

3.3 Trust Assumptions and Threat Model

As mentioned previously, we assume a permissioned setting in which the set of participants is known at the beginning. For simplicity, we assume the set of participants is fixed; extending our techniques to a dynamic set and incorporating more complex consortium rules is orthogonal to our

work. Further, since we use an off-the-shelf blockchain such as Ethereum for storing data, we inherit several security characteristics and guarantees of the underlying blockchain system, e.g., w.r.t. peer authentication using public/private keys, replay protection, and the number of tolerable malicious nodes. Moreover, the need for an off-chain verification procedure stems from the fact that a BlockchainDB peer might need to read/write data from/to a remote peer; i.e., the local peer does not participate in the blockchain network for that shard. In particular, thin peers need to run the verification procedure for all read/write operations. For our threat model we thus make the following assumptions:

• Clients that connect to a BlockchainDB peer trust their local database and storage layer. This allows a BlockchainDB peer (i.e., the database layer) to perform verification on behalf of all locally connected clients.

• Moreover, a BlockchainDB peer can trust the data that is written to or read from a local shard. Those operations thus do not need to be verified.

• If the majority of the peers that keep a copy of a shard is not malicious, then a client can trust all puts/gets once verified. Thereby, the number of peers that can form a majority depends on the security assumptions of the used blockchain system. For example, some blockchains might require 2/3 of the nodes to be trusted while others have different properties.

In consequence, whenever a client accesses data that is stored on or written to a remote shard on another peer, the local peer will run an additional verification protocol to verify the operation and mitigate the following attacks: (1) The drop of a put operation that needs to write data to a remote peer will be detected. (2) Spurious/fake data returned for any get operation that needs to read data from a remote peer will also be detected. Details about the off-chain verification procedure will be explained in Section 6.

4. DATABASE LAYER

In this section, we describe the put/get interface of BlockchainDB and how these operations are being executed by the database layer to implement different consistency levels on top of blockchains as a storage layer.

4.1 Query Interface

As mentioned before, BlockchainDB provides shared tables as its main abstraction. Each table has multiple columns (attributes), whereas one is the dedicated primary key that can be used to access the table. More details about the data model and table creation are discussed in Section 5. The query interface of BlockchainDB enables clients to execute the following three operations on shared tables:

• get(t, k) → v: This call returns all attributes of the row in table t that has the key k.

• put(t, k, v): This call inserts a new row in table t with key k. All attribute values are encoded in v similar to what document stores do. In case a row with key k is already in table t, the row is updated with the new values in v. For simplicity, we assume that values for all attributes are given in v.

• verify() → bool: This call is for online verification; i.e., a client can call verify immediately after any get/put call and gets back true or false to indicate whether or not the verification was successful. For a put, it means whether or not the put was actually committed in the storage layer, and for a get it is checked whether or not the returned value was correct or a spurious value.

Next, we explain how get/put methods are implemented. The verification (online/offline) is explained in Section 6.

4.2 Query Execution

In the following we explain the execution process and discuss the involved components as depicted in Figure 3.

[Figure 3: Database and Storage Layer. The figure shows how client requests pass through the off-chain verifier (1), a request queue (2), and the transaction manager (3) to the shard manager (4) in the database layer, and reach the blockchain networks in the storage layer via backend connectors (5) for Ethereum, Sawtooth, and Fabric.]

For accessing a table, clients send put/get requests to their local BlockchainDB peer using the query interface discussed before. Requests arriving from the client in the database layer are first received by the off-chain verifier (1), which records the unverified reads/writes (i.e., all remote reads/writes) of a client. This information is used for the online/offline verification. The verifier then forwards the request to an internal RequestQueue (2). Next, the TransactionManager (Tx.Mgr) [1] polls the requests from the queue and processes them according to a specified consistency level [2] (3). Currently, we support sequential and eventual consistency as well as a version of eventual consistency (called bounded staleness) that guarantees a limited staleness of the accessed data, as discussed in the next section.

The different consistency levels differ in how quickly the database layer can process operations. For example, for eventual consistency a get-operation is immediately processed; however, no guarantee is given that a potential outstanding put-operation has already been committed to the blockchain. In sequential consistency, the database layer needs to execute put/get-requests in a global order and thus potentially blocks a get-request until an outstanding put-request has been committed to the blockchain.

Once a put/get-request is ready to be sent to the storage layer, it is forwarded to the ShardManager (4). The ShardManager service has two main purposes: First, it determines the correct shard for a given key, and second, it is responsible for sending requests as read/write operations in parallel to the table shards. To access data in blockchains, different BackendConnectors can be used (5) that allow BlockchainDB to read/write data from/into the underlying blockchain network. The details about the BackendConnectors are discussed in Section 5.

4.3 Consistency Levels

As mentioned before, BlockchainDB provides well-defined consistency levels on top of blockchains. In order to understand how different consistency levels can be implemented

[1] We call this component transaction manager since it translates every put/get of a client into a blockchain transaction.
[2] The consistency level can be specified for each table individually.

on top of blockchains, we first discuss how blockchains make updates visible to clients. Afterwards, we explain how sequential consistency is implemented and how eventual consistency is supported. Moreover, a version of eventual consistency that guarantees a limited staleness is introduced.

Processing Model in Blockchains: In general, blockchain networks (i.e., in our case all data that is stored in one shard) agree on a global order of writes (i.e., blockchain transactions) in all replicas in which they are appended to their log. Thus, a naïve way to implement sequential consistency in the database layer of BlockchainDB on top of a blockchain network would be to wait after every write (i.e., put) until the blockchain transaction is committed. However, as shown in [12], the latency until a blockchain transaction is committed can range from seconds to minutes and would severely limit the throughput of write-intensive workloads. Another challenge of blockchains is that some transactions might end up in a fork (e.g., if proof-of-work is used as in Ethereum). These transactions must be re-executed, which further increases latency under the blocking execution model discussed before.

Sequential Consistency: In BlockchainDB, we thus follow a different approach. Instead of waiting after each write (in an eager manner), we monitor all pending writes in the database layer of BlockchainDB to enable lazy waiting. This means that only in case a read operation comes in for a pending write (i.e., read and write share the same key), we wait for that write, and all other pending writes that have been issued before, to be committed to the blockchain. Reads can only be executed once the write is committed to the blockchain. This not only ensures that clients read their own writes, but also defines a global order of writes for all clients connected to the same database layer. Moreover, since the blockchain network (which is used to implement a shard of a table in BlockchainDB) orders all writes sequentially, we get a global order of all writes and thus even clients connected to different peers in BlockchainDB see the same global order of write operations.

Eventual Consistency and Bounded Staleness: Providing eventual consistency on top of the sequential model of blockchains is simple. Instead of waiting for pending writes, we execute each incoming read operation of a client (i.e., a get) immediately. This, however, could lead to two issues: First, the pending-write queue might grow quickly for write-intensive workloads. Second, for reads (i.e., get operations of clients), BlockchainDB might return stale values (without any bound on the staleness) since the time until a blockchain transaction is committed can take up to minutes (as mentioned before). In order to mitigate these issues, a user can define a maximum staleness-factor in BlockchainDB (which defines the maximum number of writes in the pending-write queue). That way, applications can control staleness and latency; i.e., with a longer queue the staleness will increase but the latency of reads decreases (as we will also show in our experiments). We call this version of eventual consistency bounded staleness, which is similar to the ideas discussed in [25].

5. STORAGE LAYER

The storage layer of BlockchainDB is responsible for all interactions with the blockchain networks that are used to store shared tables. In the following, we first explain the creation of shared tables as well as their data model. Afterwards, we discuss the interface that is exposed by the storage layer to the database layer for executing reads/writes on shared tables. Finally, we exemplarily discuss how the methods of the storage interface are implemented in so-called backend connectors that allow BlockchainDB to access the data in a blockchain. Currently, we provide connectors for Ethereum, Hyperledger Sawtooth and Hyperledger Fabric.

5.1 Shared Tables

As a first step of a data sharing scenario, a new shared table has to be created in BlockchainDB. A new table in BlockchainDB is defined by its schema and its sharding configuration.

In BlockchainDB, the schema of a new table is defined using a key/value data model where the key is the primary key and the value represents the payload of a tuple. The sharding information of a new table contains values for parameters such as the number of shards, the replication factor, and the allocation. These parameters not only have an influence on the overall performance but, more importantly, on the trust guarantees a new table provides. For example, the number of replicas directly dictates how many malicious peers can be tolerated; e.g., most blockchain networks (such as Ethereum) tolerate if less than half of the peers are malicious.

For creating a new shared table in BlockchainDB, one of the clients involved in the data sharing application proposes a new table and submits the information about the table name and its schema as well as the sharding information to its local peer. The local peer then coordinates the table creation process with all other peers on behalf of the initiating client. The main steps of the process are discussed below.

The first step of the table creation process is implemented as a smart contract which takes the information about the new table, including the sharding parameters, as input and updates the catalog (i.e., its metadata). The metadata of BlockchainDB is stored in a dedicated blockchain that is replicated to all peers. By storing the metadata in a dedicated blockchain network that is fully replicated and governed by a smart contract, we can not only guarantee that all peers have the same view on the metadata, but also that no peer can tamper with the metadata. Fully replicating the metadata is not a performance problem since metadata is typically small and updated less frequently.

As a second step of the table creation process, and once the metadata is updated successfully by the smart contract, the peer which coordinates the table creation process signals all other peers to deploy the shards for the new table. For each new shard that should be stored on a peer, a new blockchain node is started by the peer and connected to the other blockchain nodes, thus forming a new blockchain network for the shard. For finding out which shards need to be deployed for a new table and to which other peers the shard should be connected, each peer uses its local trusted copy of the metadata.

One important question of the table creation process is why clients of the other peers should trust the new table. One could think that the table creation process opens up a possible attack since the client and the local peer who coordinates the table creation could be already malicious at the time of table creation and thus could decide to create a new shared table with low trust guarantees that in the

extreme case has only one shard consisting of one replica (that might even be assigned to the local peer). In this case, the new table would not provide any trust guarantees to clients of other peers since the peer which created the table stores the only copy and thus could drop puts and return spurious values for get operations without the possibility for the other clients or thin peers to verify their operations. Thus, as a last step of the table creation process, all BlockchainDB peers have to confirm that they agree with the table (i.e., in particular the sharding information) proposed by the coordinating peer. The key for the confirmation step is that BlockchainDB provides trusted and replicated metadata across peers. That way, all other peers can check in a trusted manner whether they agree with the sharding information before using that table for data sharing. Once the trust guarantees for the new table are confirmed by all peers in BlockchainDB, they update the metadata (i.e., by incrementing a confirmation counter). Only once all peers have confirmed the new table can it be used by clients of any peer for actual data sharing by executing puts/gets on it.

A last point we want to mention is that BlockchainDB assumes a permissioned setup where only authenticated peers can participate in a network. The peers can work together while they do not necessarily need to trust each other.

5.2 Storage Interface

As shown in Figure 3, the database layer uses so-called backend connectors in the storage layer to access data in a shared table (i.e., a blockchain network). The idea of the backend connector is to provide a stable interface to the database layer to access the data independent of which blockchain is being used as backend. The main methods of the storage interface are:

- read(s, k) -> v: This method allows the database layer to read a value v (i.e., the tuple) for a given shard s (which is just a globally unique identifier in BlockchainDB) and a key k.
- write-async(s, k, v) -> tx-id: This method allows the database layer to write a value v (representing a tuple) with a key k into shard s. Important is that the write is an asynchronous operation and just returns an identifier tx-id of the blockchain transaction that was created for that write.
- check-tx-status(s, tx-id) -> TX-STATUS: In order to check if a write-async operation has been successfully committed, this operation can be called. This method takes a shard identifier s and a transaction identifier tx-id, and returns the status of the blockchain transaction in that shard. The status can either be COMMITED if the write was successful, ABORTED if the write failed (e.g., due to failed validation in the blockchain), or PENDING if the transaction is submitted but not yet added to a valid block in the blockchain.
- get-writeset(s, e) -> ws: Returns all writes that were executed on shard s in epoch e. This method is used for offline verification, which verifies the workload of all clients connected to one BlockchainDB peer in epochs as discussed in Section 6.

The first three methods of the storage interface are called by the database layer in order to implement different consistency levels. For example, under sequential consistency, the write-async method is called when a put(t, k, v) (for a table t, key k, and value v) from a client is processed by the database layer. The returned transaction identifier tx-id is put together with the table t and key k into a pending-write queue in the database layer. If a get(t, k, v) operation for the same key is coming in from a client after the put-call, the database layer has to check the pending-write queue, and if a pending write is found, it needs to call check-tx-status(tx-id) to see whether the transaction committed or not. If not, the database layer has to block until the transaction status changes to COMMITED.

5.3 Backend Connector

The methods discussed before need to be implemented by each backend connector for different blockchain systems. In the following, we discuss the implementation of those methods for Ethereum as an example.

It is important to note that the backend connector stores the connection information for each shard identifier and uses a native blockchain client (such as geth for Ethereum) to access a blockchain network which stores the data of a shard. The connection information contains the list of IP-addresses and ports of all BlockchainDB peers which host a copy of the shard (i.e., the peers which participate in the blockchain network that stores the data of the shard). For executing operations, the local IP-address is used if it exists in the connection information (which means that a shard copy is stored on the local peer). Otherwise, one of the remote IP-addresses is selected in a random manner to load-balance the execution across different peers.

For accessing shared data in a blockchain network, BlockchainDB needs to install a minimal smart contract definition which provides a simple read/write interface for each shard. These smart contracts are then called by the connector to implement the interface methods presented before. In the following, we show an extract of the smart contract code installed for an Ethereum network.

    contract KVContract {
        // state variables and constructor omitted

        // read method in contract
        function read-blockchain(bytes memory key)
            public view returns (bytes memory value) {
            return data[key];
        }

        // write method in contract
        function write-blockchain(bytes memory key, bytes memory val)
            public returns (bool success) {
            data[key] = val;
            return true;
        }
    }

The first method of the backend connector is the write-async method. This method takes the incoming tuple (i.e., a key/value pair) as well as the shard identifier s. Afterwards, the connection information is looked up for this shard and the key as well as the value is converted into a byte representation before sending it to the blockchain network for processing (or more precisely to the smart contract of the blockchain as discussed above). The byte-data is then sent to the write-blockchain method of the KVContract using geth as client for Ethereum.
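To make the connector interface concrete, the following Java sketch implements three of the storage-interface methods against an in-memory map instead of a real Ethereum shard (class and method names are ours for illustration; the prototype's actual API may differ, and a real connector would issue the calls through geth):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory stand-in for a backend connector (Sections 5.2/5.3).
// Illustrative only: a real connector talks to an Ethereum shard via geth.
class InMemoryConnector {
    enum TxStatus { COMMITED, ABORTED, PENDING } // spelling as used in the paper

    private final Map<String, Map<String, byte[]>> shards = new HashMap<>();
    private final Map<String, TxStatus> txStatus = new HashMap<>();
    private int nextTx = 0;

    // read(s, k) -> v
    byte[] read(String shard, String key) {
        Map<String, byte[]> data = shards.get(shard);
        return data == null ? null : data.get(key);
    }

    // write-async(s, k, v) -> tx-id (this mock "commits" immediately;
    // a real backend would first report PENDING until the tx is mined)
    String writeAsync(String shard, String key, byte[] value) {
        shards.computeIfAbsent(shard, s -> new HashMap<>()).put(key, value);
        String txId = "tx-" + (++nextTx);
        txStatus.put(txId, TxStatus.COMMITED);
        return txId;
    }

    // check-tx-status(s, tx-id) -> TX-STATUS
    TxStatus checkTxStatus(String shard, String txId) {
        return txStatus.getOrDefault(txId, TxStatus.PENDING);
    }
}
```

The database layer would use exactly this triple of calls to implement the pending-write queue described above: writeAsync on put, and checkTxStatus on a later get for the same key.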

Figure 4: Online Verification (client put(t,k,v) triggers write-async(s,k,v) and write-blockchain(k,v); verify() then performs a status check via check-tx-status(s, tx-id) across the peers of the shard)

We found that representing the data in a byte-format before storing it in a blockchain network not only leads to decreased storage cost, but also allows for more efficient processing of the transaction on the blockchain. The unique transaction identifier tx-id returned by the geth client is returned to the database layer as a return value. This identifier can be used to check the status of the transaction from the database layer.

The second method implemented in a backend connector is the read method. This method takes a shard s and a key k as input. For the execution, the connection information is looked up for this shard and the backend connector then converts the key into a byte representation and sends the data to the read-blockchain method of the KVContract. Different from the write-async method, the backend connector does not use a blockchain transaction to execute the read-blockchain contract but it uses a call (which is a read-only operation in Ethereum). The benefit of this method is that it does not require the heavy-weight processing of a blockchain transaction and usually only needs a few milliseconds to be executed. The result of the call is the byte representation of the value (i.e., tuple). Before sending the value back to the database layer, the byte representation is converted into the original data type of the table.

Finally, the third method implemented in a backend connector is the check-tx-status method. This method takes a shard s and a transaction identifier tx-id as input and returns the transaction status to the database layer. To check the status of a transaction, Ethereum provides different options: First, the storage layer can regularly poll the blockchain for the latest status of the transaction using geth. Second, the storage layer can subscribe to events and be notified when, e.g., a new block has been created. When notified, the storage layer can query the blockchain for details of the new block and determine the status of the transaction. However, in order to detect failed/rejected transactions, the storage layer still has to poll the blockchain regularly for the transaction status.

6. OFF-CHAIN VERIFICATION

In this section we describe the details of our off-chain verification protocol. The main goal of the verification procedures is to prevent (1) dropped puts and (2) spurious reads if a peer needs to read/write data from/to a remote shard.

6.1 Online Verification

Overview: In online verification every operation issued by a client is subsequently verified if the client calls the verify operation as shown in Figure 4. While all blockchain systems internally make use of Merkle trees and similar structures to store data in a verifiable way, only few blockchain systems expose an interface to clients that allows them to read data along with a verifiable proof. Hence, in order to verify the result of an operation, a BlockchainDB peer needs to contact the majority of the blockchain network to verify that the retrieved result is valid. In the following, we explain how get- and put-operations can be verified.

Verification Procedure: In order to verify the value v returned by a get-operation, we extend the storage interface to return the block number from which the value was read for the given shard. Afterwards, we read that block from the majority of peers which hold a copy of the shard and then verify if the value v for key k in those blocks matches the returned value. If the returned value does not match with that of the majority, a manipulated read was detected. Put-operations are verified differently, since they are executed as transactions on the blockchain as discussed before. Consequently, they are mined as part of blocks. Similar to get-operations, put-operations can be verified by querying the majority of the blockchain for the latest transaction status. If the transaction is not recorded as committed on the majority of peers, a dropped write was detected. We do not need to check the validity of the block content, since transactions are signed by clients. Thus, a manipulation of the transaction content by a remote peer is not possible.

6.2 Offline Verification

Overview: While online verification guarantees the validity of an operation right away, it has several inefficiencies and drawbacks. First, online verification is a blocking action that prevents any other operation from being executed by a client. Second, since transactions are grouped into blocks and thus mined in batches by blockchains, system throughput can be improved by verifying transactions in batches. The basic idea of offline verification is shown in Figure 5. Instead of calling the verify-method after every put- or get-call, offline verification defers verification. Deferring verification allows us to batch multiple put- and get-calls together and verify multiple operations at once instead of separately. In the following, we describe the procedure for offline verification and its main parameters.

Figure 5: Offline Verification (puts are executed without an immediate status check; the verifier later calls get-writeset(s, e) per epoch and returns a verification result)

Verification Procedure: The offline verification procedure is executed per shard in batches (called epochs). An epoch of a shard in BlockchainDB is defined by a fixed number of writes (called epoch-size |e|) that can be executed in one epoch in one shard. Once the maximum number of writes is executed in one epoch, the epoch of the shard is closed.
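The epoch bookkeeping just described can be modeled as a simple counter that closes an epoch after |e| writes (a stand-alone sketch with hypothetical names; in BlockchainDB this counter lives inside the shard's smart contract):

```java
// Simplified model of epoch-based write counting (epoch-size |e|).
// Illustrative only; the prototype keeps this state in the smart contract.
class EpochCounter {
    private final int epochSize;   // |e|: number of writes per epoch
    private int writesInEpoch = 0;
    private int currentEpoch = 0;

    EpochCounter(int epochSize) { this.epochSize = epochSize; }

    // Called on every write; returns the epoch the write belongs to.
    int recordWrite() {
        int epoch = currentEpoch;
        writesInEpoch++;
        if (writesInEpoch == epochSize) { // epoch is full -> close it
            currentEpoch++;
            writesInEpoch = 0;
        }
        return epoch;
    }

    int currentEpoch() { return currentEpoch; }
}
```

With |e| = 2, the first two writes land in epoch 0, after which the epoch closes and subsequent writes go to epoch 1, matching the example in Figure 6.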

Figure 6: Epoch-based Offline Verification (epoch-size = 2, # of unverified epochs = 2; Shard 2 over time: epoch e=0 (verified), e=1 (unverified), e=2 (unverified), e=3 (open); Reads/Writes of Peer C: write(x,2)->tx-2, write(y,3)->tx-6, read(x)->2, read(y)->15 (spurious read); Write-Set of Peers A-C: write(x,1)->tx-1, write(x,2)->tx-2, write(y,1)->tx-3, write(x,3)->tx-4, write(y,2)->tx-5, write(y,3)->tx-6, write(y,4)->tx-7)
Figure 6: Epoch-based Offline Verification together in a batch. To do so, we extend the smart contracts The verifier thread then checks, if the read-calls in the shown in the previous section to keep a global counter for the read-set of peer C match the value of the last committed epoch. Moreover, all writes are added in a separate variable write-operation in the global read-/write-set (to avoid spu- that keeps a separate list per epoch. The epoch counter is rious reads) and if all write-calls are found in the global increased automatically by the write-blockchain method read-/write-set (to avoid dropped writes). In order to enable in the smart contract every |e|-th write call. an efficient offline verification, a system peer caches (parts) One additional parameter of offline verification is that an of the write-sets that it reads for verification in the past. epoch does not need to be immediately verified by a peer What exactly needs to be cached depends on the isolation once it is full (i.e., once it is closed). Instead, a peer can level. Under sequential consistency, only the last write to have a maximal number of closed but not-yet verified epochs a key needs to be cached while under eventual consistency per shard (called number of unverified epochs ∆e). In case all pending writes for a key plus the latest committed write that the number of unverified epochs for a peer is growing to the same key are cached. Older committed writes can be larger than ∆e, it calls the halt(s) operation implemented evicted from the . as a method in the smart contract on the storage layer which Finally, if the checks fail, the database layer sets a flag for blocks all other peers from further writing into the shard s that shard to indicate that it is in a corrupted state. The (i.e., no more new epochs are created) until the peer caught application on top then has to decide how to react. One up with verifying the shard. 
Once the number of unveri- idea is that the client calls the operation halt(s) which fied epochs is smaller than ∆e, the peer calls continue(s) means that all further operations (from all peers) on the implemented as a smart contract on shard s. shard are blocked since the database is in a corrupted state One could argue that the halt(s) method opens up a and the tamper-proof log of the shard must be audited to backdoor for malicious peers to block other peers. How- see what went wrong. A way to restore the shard to a ever, since the blockchain will keep track of those calls other non-corrupted state is to reset the shard to the last verified peers can detect this behavior (as part of a potential au- epoch. Restoring the database from a blockchain is possible dit). Further, both parameters (|e| and ∆e) are important since the blockchain keeps all writes. Discussing possible as they allow to optimally tune the verification to the work- consequences and other variants for reactions is beyond the load characteristics of a given application as discussed later. scope of this paper though. In Figure 6, we show an example for offline verification. Discussion: In the following we first discuss how to set The example shows the write-sets of all peers over different the parameters |e| as well as ∆e for offline verification and epochs of one shard (Shard 2), as well as the read/write-set then discuss what influence the parameters have on the per- of peer C that should be verified. All epochs have an epoch formance that BlockchainDB can provide. size of |e| = 2 and we have four epochs in total. The first For setting |e|, we found out that the epoch size should epoch e = 0 has been already successfully verified, while be set at least to the number of transactions that fit into a epoch e = 1 and e = 2 are closed but not yet verified (i.e., block of the underlying blockchain (which allows to verify ∆e = 2). 
The current epoch e = 3 is still open since only all transaction of a block in one epoch). This information one write was executed so far. can be retrieved from many blockchain systems and thus be For verifying the next epoch in a shard, the database layer used to configure |e|. Setting |e| to a smaller value typically of a peer runs a verifier thread that runs in continuous in- decreases the throughput since new generated blocks can tervals and keeps track of the last verified epoch for that not be completely filled with transactions. peer (e.g., e = 0 in the example). The verifier thread calls Setting the second parameter, ∆e, has a different effect. the get-writeset method for a shard using the last unver- If ∆e = 0, a peer will not accept any new write-operation ified (and closed) epoch as a parameter. In our example, once an epoch is closed (i.e., all peers must verify the last peer C calls get-writeset(s=2, e=1) since e = 1 is the epoch first before new writes are accepted). Hence, ∆e can oldest not-yet verified epoch. The call returns the write- be used to overlap the actual writes and verification. More- set (write(y,1)→ tx-3,(write(x,3)→ tx-4). In order to over, ∆e can be also used for mitigating issues resulting make sure we have the correct write-set, peer C needs to read from skew between different peers. For example, a strag- it from the majority of peers (not shown in the example), gling peer of one party could block a faster peer of another which store that shard. For verifying its own read-/write- party just because it needs more time for executing the of- set, peer C compares its own read- and write-set of the same fline verification procedure. This is an issue in blockchains epoch ((read(x) → 2) against the write-set of epoch e = 1 where multiple parties that come with different hardware and all previous epochs; i.e., the write-set of epoch e = 0 in characteristics participate in the network. 
Thus setting ∆e our example ( (write(x,1)→ tx-1,(write(x,2)→ tx-2)).
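As a simplified illustration of the checks performed during offline verification, the following sketch flags spurious reads and dropped writes against a global write-set such as the one in Figure 6 (it deliberately ignores the pending-vs-committed distinctions that the real protocol handles per consistency level; all names are ours):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified check performed by the verifier thread (Section 6.2): a read is
// spurious if its value was never written to the key, and a write was dropped
// if it is missing from the global write-set.
class OfflineVerifier {
    static boolean isSpuriousRead(Map<String, List<Integer>> globalWrites,
                                  String key, int valueRead) {
        return !globalWrites.getOrDefault(key, Collections.emptyList()).contains(valueRead);
    }

    static boolean isDroppedWrite(Map<String, List<Integer>> globalWrites,
                                  String key, int valueWritten) {
        return !globalWrites.getOrDefault(key, Collections.emptyList()).contains(valueWritten);
    }

    // Global write-set of Figure 6 up to the closed epochs (e = 0..2):
    // x was written with 1, 2, 3 and y with 1, 2, 3.
    static Map<String, List<Integer>> figure6WriteSet() {
        Map<String, List<Integer>> ws = new HashMap<>();
        ws.put("x", Arrays.asList(1, 2, 3));
        ws.put("y", Arrays.asList(1, 2, 3));
        return ws;
    }
}
```

With the write-set of Figure 6, read(x)->2 passes the check while read(y)->15 is reported as spurious, since 15 was never written to y.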

Thus setting ∆e to higher values helps to mitigate the skewness of different peers in the system.

For understanding the performance impacts of these two parameters, it is important to note that only the verification procedure itself is batched, while the actual operations (i.e., the puts/gets of clients) are executed without batching as described in Section 4. To that end, offline verification does not increase the latency of individual put/get-operations, but it can have a negative impact on the overall throughput if |e| and ∆e are set to too low a value. However, when setting |e| and ∆e to too high a value, clients will operate for a longer period of time on an unverified state of a shared table and the time before a problem (i.e., a dropped put-operation or a spurious get-operation) can be detected increases. Thus, tuning those parameters is extremely important. In Section 7.5, we show how to set these parameters in an optimal manner for a given workload and setup of peers.

7. EXPERIMENTAL EVALUATION

In order to evaluate the different characteristics of BlockchainDB, we executed multiple experiments with the YCSB benchmark [10], which provides workloads with different read-/write characteristics. The main goal of the experiments is to study the effects of the different techniques implemented in BlockchainDB on the performance, and also to show that BlockchainDB allows applications with its configuration parameters to trade performance against trust.

7.1 Setup and Workloads

For the evaluation, we implemented a BlockchainDB prototype in Java 8. All experiments used an Ethereum backend (with Geth/v1.8.23-stable). Furthermore, we ran all experiments in Azure on virtual machines with 16 vcpus and 32 GB memory, running Ubuntu 16.04 LTS. To configure the blockchain network for a new shard in BlockchainDB, we used similar genesis parameters as reported in [12].

In every experiment, we varied different system parameters, which are briefly described in Table 1. We will explain the parameter values we have used for each experiment separately. Furthermore, in our experiments, we first concentrate on the performance characteristics of BlockchainDB. To isolate the effect of verification, we turned verification off in these experiments. In the second set of experiments, we then studied the overhead of verification as well as the different verification strategies in detail.

Table 1: Parameters of Evaluation

    shardCount   Number of shards in a table.
    repFactor    Number of replicas that are stored for each shard;
                 each replica is stored on a separate (full) peer.
    consistency  Consistency level configured for a shared table.
    numPeers     Total number of (full) peers that participate in the
                 BlockchainDB network.
    numClients   Total number of clients sending put/get operations.
    workload     The ratio and distribution of put/get operations.
    opsCount     Total number of operations issued in total.

7.2 Exp. 1: Scalability with Peers

In this experiment we evaluate the performance of BlockchainDB when the size of the network is increased (i.e., more and more peers join a BlockchainDB network to share data). In classical blockchains the performance of the system heavily degrades, since more peers increase the storage and consensus overhead of the network as shown in [12]. We also made a similar observation in our experiments shown in Figure 7, as we discuss below.

Figure 7: Scalability of BlockchainDB ("Scalability in the Configuration Space": throughput in Ops/s for 1 to 24 peers; configurations span the space between trust, e.g., a native-blockchain-like fully replicated setup, and performance, e.g., one shard per peer, with intermediate points such as 4 shards/4 replicas each and 2 shards/12 replicas each)

In order to make a fair comparison between different configurations of BlockchainDB, we use a fixed number of n peers (i.e., a fixed amount of compute and storage resources) and vary the sharding configuration of one table. For example, for a setup with a fixed number of 16 peers (shown as 16 on the x-axis of Figure 7), we evaluated the different configurations shown as individual points above the x-tic 16. The configurations used for 16 peers are: (1 shard with 16 replicas), (2 shards, each 8 replicas), (4 shards, each 4 replicas), ..., (16 shards, each 1 replica). We repeated the experiment for other setups with a lower/higher number of peers (ranging from 1 to 24). The setups (i.e., the number of peers used) are shown as different x-tics/x-labels in Figure 7 and all throughputs resulting from using different sharding configurations for one setup are shown as points on the vertical line above the corresponding x-tic.

For running the experiment, we created a shared table with the number of shards given in the configurations, distributed the shards to the different peers, and initially inserted 4,000 tuples into each shard. This results in the fact that in total the same number of tuples has to be stored in BlockchainDB for the different configurations used for a given number of peers. For example, for a fixed number of 16 peers, 64,000 tuples are being stored in total in the table for any sharding configuration, both when using 16 shards/1 replica on the one extreme and when using 1 shard/16 replicas on the other extreme.

In Figure 7, we show the resulting throughput of BlockchainDB for the different setups, each using a fixed number of peers (x-axis), and for each setup using different configurations (as indicated by the different points along the y-axis) as discussed before. The number of clients used in this experiment is equal to the number of peers (i.e., for every new peer a new client is added; numPeers=numClients). Furthermore, we use consistency=sequential and set opsCount=1000 · numClients using workload=100% write/0% read; i.e., every client sends a total of 1k put-operations to the database and waits until these put operations have been committed (e.g., by sending one get at the end to force that the writes are committed under sequential consistency). As explained above, the replication factor was set to repFactor = numPeers / shardCount.

Figure 7 shows the results of the experiment.
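The configuration space just described can be enumerated mechanically: for a fixed number of peers, every shard count that divides the peer count yields a configuration with repFactor = numPeers / shardCount. The following helper is our own illustration, not code from the prototype:

```java
import java.util.ArrayList;
import java.util.List;

// Enumerate (shardCount, repFactor) configurations for a fixed number of
// peers, following repFactor = numPeers / shardCount (illustrative helper).
class ShardingConfigs {
    static List<int[]> configsFor(int numPeers) {
        List<int[]> configs = new ArrayList<>();
        for (int shards = 1; shards <= numPeers; shards++) {
            if (numPeers % shards == 0) {
                configs.add(new int[] { shards, numPeers / shards });
            }
        }
        return configs;
    }
}
```

For 16 peers this yields (1, 16), (2, 8), (4, 4), (8, 2), and (16, 1), i.e., exactly the configurations evaluated above the x-tic 16 in Figure 7.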

Since BlockchainDB allows applications to apply different partitioning and replication strategies for shared tables, we can better trade performance and trust characteristics depending on the application's requirements. The two extremes, high trust and high performance, are highlighted by the green and blue line, respectively. For the green line, BlockchainDB only uses one shard to store the data and each newly added peer stores yet another replica of this shard and participates in its corresponding blockchain. Hence, this configuration represents a classical blockchain configuration and shows similarly poor scalability. In the other extreme, high performance, every peer that is added also adds a new shard to the network. This configuration is comparable to a classical distributed database, in which the parallelism and throughput of the system is increased with every new node. Yet, since only one replica exists per shard, the application does not get any trust guarantees.

An interesting aspect that is also shown in Figure 7 is that BlockchainDB can provide other configurations "in the middle" that provide a trade-off between trust and performance (shown as the area shaded in light-blue). For example, when using a configuration with 4 shards and 4 replicas each for 16 peers (shown as one point in Figure 7) we can provide a 7× speedup over the fully-replicated baseline and still provide some trust guarantees.

7.3 Exp. 2: Effect of Sharding

In this experiment, we show the effects of sharding where we fixed the number of peers in the network to 16 and varied the number of shards per table (with a fixed replication factor of 4 per shard). Different from the experiment before, we want to show that sharding provides a speed-up even for a setup with fixed trust guarantees.

For running the experiment, we created a shared table and increased the number of shards from 1 to 16, where we initially filled each shard with 4,000 tuples (i.e., 64,000 tuples are in the table in total for 16 shards) to simulate a setup of BlockchainDB with fixed trust guarantees. Furthermore, we used a constant replication factor of 4 (as mentioned before). The number of clients in this experiment is fixed to numClients=16. We used the same workload as in the previous experiment (workload=100% write/0% read) but each client sends 2,000 operations (i.e., opsCount=32,000) using consistency=sequential.

Figure 8 shows the result of this experiment. As we can see, the throughput of BlockchainDB increases linearly with the number of shards. This is because the degree of overall parallelism is increased.

Figure 8: Effect of Sharding on Throughput (throughput in Ops/s for 2 to 16 shards)

7.4 Exp. 3: Effect of Consistency Levels

In this experiment, we show the effect of using different consistency levels for clients. For this experiment, we used the parameters shown in Table 2.

Table 2: Parameters for Exp. 3

    parameter   value
    shardCount  2
    repFactor   2
    numPeers    4
    numClients  4
    opsCount    8000

Furthermore, in this experiment we are using different workload mixes: workload G (new mix/not in YCSB), a write-intensive workload (95% write/5% read); workload A (same as in YCSB), a workload with the same amount of reads and writes (50% write/50% read); and workload B (same as in YCSB), a read-intensive workload (5% write/95% read). In order to show the effect of the two main consistency levels (eventual and sequential), we measured the read latency on a client for the different workloads.

Exp 3a: Sequential vs. Eventual Consistency. Figure 9 shows the resulting latencies for the different workloads. As we can see, the read latency under eventual consistency is not affected by the change in workload at all (which we also expected). For sequential consistency, however, we can see that the performance increases with a higher number of read-operations. This is a direct consequence of the fact that with a lower number of writes we also have a lower frequency of blocking read-operations. For a read-intensive workload, we see a decrease in latency for sequential consistency by two orders of magnitude compared to the write-intensive workload, since there are only a few write-operations that could potentially block the execution of subsequent read-operations on the same key.

Figure 9: Effect of Consistency on Latency (read latency from 10 ms to 1 sec for workloads G, A, and B under eventual and sequential consistency)

Exp 3b: Bounded Staleness. While sequential consistency guarantees that a client sees fresh values, it has a much higher latency than eventual consistency since it forces a client to wait until pending writes for a key are committed. In contrast, eventual consistency has a significantly lower latency, but does not provide any guarantee on the staleness of retrieved values. As described in Section 4, eventual consistency with bounded staleness allows a client to trade off staleness for improved read latency.

We therefore repeated the experiment above but used bounded staleness. In this experiment, we set the size of the write-queue to different values ranging from 0 to 900.
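The write-queue bound used in this experiment can be modeled with a small sketch (our own simplification of the bounded-staleness rule from Section 4; class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified bounded-staleness read decision: a read only has to block if
// more writes to the key are still pending than the configured maximum.
class BoundedStaleness {
    private final int maxStaleness; // allowed number of pending writes per key
    private final Map<String, Integer> pendingWrites = new HashMap<>();

    BoundedStaleness(int maxStaleness) { this.maxStaleness = maxStaleness; }

    void onPut(String key)    { pendingWrites.merge(key, 1, Integer::sum); }
    void onCommit(String key) { pendingWrites.merge(key, -1, Integer::sum); }

    // true -> the read must wait until pending writes are committed
    boolean readMustBlock(String key) {
        return pendingWrites.getOrDefault(key, 0) > maxStaleness;
    }
}
```

A bound of 0 makes every pending write block the read (sequential-consistency-like freshness), while a bound of 900 corresponds to the most tolerant setting used in this experiment.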

The resulting effects on latency are shown in Figure 10. We see that the average read latency decreases as we increase the staleness. For example, while waiting until all pending transactions have been committed (i.e., staleness is 0) causes a maximal delay of about 35s, tolerating around 900 pending puts can improve the latency to around 20s. In the extreme case, a bounded staleness of 0 gives us the same guarantees as sequential consistency. However, we see that it results in a higher latency than sequential consistency as shown in Figure 9, since sequential consistency blocks lazily if a read for a pending write arrives.

Figure 10: Effect of Staleness on Latency (read latency in seconds for 0 to 900 max. pending write TXs)

7.5 Exp. 4: Verification Overhead

To measure the overhead of verification, we performed two experiments. First, we compared the overhead of online and offline verification. Second, we show the effects of the different parameters for offline verification.

Exp. 4a: Online vs. Offline Verification. In this experiment we evaluated the performance of the different verification strategies supported by BlockchainDB. Table 3 indicates the parameters used in this experiment.

Table 3: Parameters for Exp. 4

    parameter    value
    shardCount   2
    repFactor    2
    numPeers     4
    numClients   4
    consistency  sequential
    opsCount     4000

For showing the overhead of verification, we report the throughput based on the time it takes to execute and verify all 4,000 operations. As a baseline, we show a configuration without any verification (called no-verification).

As can be seen in Figure 11, online verification achieves the lowest throughput, since it directly waits until a put-operation (i.e., a transaction executing a put) is committed to verify the operation, which prevents other put requests from being executed. With the help of offline verification the throughput can be increased as more transactions are added into one block. As explained earlier, the overall throughput depends on the two parameters of the offline verification: epoch-size |e| and number of unverified epochs ∆e. In this experiment we use an optimal configuration and set |e| = 100 and ∆e = 10, which results in a throughput of approx. 40 put operations per second. As can be seen, this value is similar to the throughput of no-verification. This is because the verification can be performed once a new block is mined for all operations in the block. This is similar to the time it takes to commit a transaction to the DB plus a small overhead for verifying the retrieved read/write-sets (which is negligible since reading the read/write-sets is fast).

Figure 11: Online vs. Offline Verification (throughput in Ops/s for the online, offline, and no-verification strategies)

Exp. 4b: Offline Verification Parameters. In the second experiment we fixed the verification strategy to offline verification and varied the two parameters |e| and ∆e.

In a first micro-benchmark, we evaluated how |e| affects the throughput for a client and set ∆e = 0, i.e., only one epoch can be unverified at a time. For running the micro-benchmark we use a table with a single shard to show the effects of setting |e| on an isolated shard. Further, we varied the replication factor (i.e., the number of peers a shard is replicated to) to study its influence on the overall throughput.

As we can see in Figure 12, when using a replication factor of one (1 Peer, green line) the throughput increases significantly with increasing epoch size until |e| = 70 is reached, which corresponds to the maximum number of transactions that fit into one block (i.e., the block size) of the underlying blockchain. When increasing |e| further, the throughput increases much slower (and almost stagnates). Furthermore, if multiple peers participate in a shard the throughput is higher in total. We can see this effect in Figure 12 by the second line (2 Peers, blue line). The reason here is that the verification overhead is distributed across two peers and thus each peer only needs to verify half of the put-operations of an epoch on average. This leads to an overall higher throughput since the total elapsed time for verification is shorter.

In a second micro-benchmark we wanted to show the effect of the ∆e parameter. As explained previously, the ∆e parameter determines the number of unverified epochs a peer tolerates. In this experiment, we thus execute the same benchmark as before with two peers, but this time with one fast and one slow peer in order to analyze the effect of potential resource skew (e.g., a slower network or less computational power for one of them). A skew = x means that the slower peer is only able to verify transactions with a lag of x blocks on average behind the faster peer. In order to show the sensitivity of the overall throughput on ∆e, we set |e| = 70 (i.e., the optimal |e| of the previous experiment) and vary the ∆e parameter from 1 to 20.

In Figure 13, we show the throughput (including its standard deviation for 10 runs) for different skew factors where the slower peer has a lag of 4, 8, and 14 blocks on average. As we can see, the throughput improves with increasing ∆e, whereas for a higher skew a higher ∆e is required.
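The tuning rule suggested by these micro-benchmarks can be stated compactly (our own summary of the findings, not code from the prototype): set |e| to at least the number of transactions per block, and set ∆e to at least the verification lag of the slowest peer.

```java
// Heuristic parameter choice for offline verification, summarizing the
// findings of Exp. 4b (illustrative helper, hypothetical names).
class VerificationTuning {
    // |e|: at least the number of transactions that fit into one block.
    static int chooseEpochSize(int txPerBlock) {
        return txPerBlock;
    }

    // Delta e: at least the lag (in blocks) of the slowest peer.
    static int chooseDeltaE(int[] peerLags) {
        int max = 0;
        for (int lag : peerLags) max = Math.max(max, lag);
        return max;
    }
}
```

For the setup of Exp. 4b this gives |e| = 70 and, for skews of 4, 8, and 14 blocks, ∆e = 14.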

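To make the interplay of the two offline-verification parameters concrete, the following minimal Python sketch models how a peer could batch put operations into epochs of size |e| and apply back-pressure once ∆e sealed epochs are still unverified. All names (OfflineVerifier, submit_put, verify_oldest_epoch) are illustrative and not taken from the actual BlockchainDB implementation; the real protocol verifies the read/write-sets of mined blocks rather than simply counting operations.

```python
from collections import deque

class OfflineVerifier:
    """Illustrative model of epoch-based offline verification.

    Puts are grouped into epochs of size `epoch_size` (|e|). A peer
    tolerates at most `max_unverified` (Delta-e) sealed-but-unverified
    epochs; beyond that, new puts block until the oldest epoch is
    verified. Hypothetical sketch, not the paper's implementation.
    """

    def __init__(self, epoch_size, max_unverified):
        self.epoch_size = epoch_size          # |e|
        self.max_unverified = max_unverified  # Delta-e
        self.current_epoch = []               # puts not yet sealed
        self.unverified = deque()             # sealed, unverified epochs
        self.verified_ops = 0

    def submit_put(self, op):
        # Back-pressure: if Delta-e epochs are unverified, verify first.
        while len(self.unverified) >= self.max_unverified:
            self.verify_oldest_epoch()
        self.current_epoch.append(op)
        if len(self.current_epoch) == self.epoch_size:
            # Seal the epoch; its verification is deferred (offline).
            self.unverified.append(self.current_epoch)
            self.current_epoch = []

    def verify_oldest_epoch(self):
        # Stand-in for checking the epoch's read/write-sets against
        # the transactions committed in mined blocks.
        epoch = self.unverified.popleft()
        self.verified_ops += len(epoch)
```

With a configuration like |e| = 100 and ∆e = 10, a client can issue up to 1000 puts before any verification work is forced onto the critical path, which is why larger values of both parameters increase throughput up to the point where the slower peer's lag dominates.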
[Figure 12: Offline Verification with varying |e|; throughput (Ops/s) over the epoch size, for 1 Peer/Client and 2 Peers/Clients]

[Figure 13: Offline Verification with varying ∆e; throughput (Ops/s) over ∆e, for Skew=4, Skew=8, and Skew=14]

For example, the maximal throughput for a lag of 14 is achieved with ∆e = 14, whereas smaller lags (skew=4 and skew=8) can also tolerate a smaller ∆e. In general, we see that if ∆e is set to a value smaller than the lag, then the faster peer is always slowed down by the slower peer, which decreases the overall throughput.

8. RELATED WORK

Related work spans three major areas, namely, Verifiable Databases, Scalable Blockchains, and Distributed Databases. For Distributed Databases, there is a long line of work that covers relevant topics such as replication, sharding, and peer-to-peer approaches. Due to space constraints, we omit a detailed discussion and refer to [21] for an overview.

Verifiable Databases. The closest work to BlockchainDB is the work done by Allen et al. in [2]. In [2], the authors propose the idea of "Databases and tables that can be shared and verified". While their work makes use of the same abstraction of shared tables, they differ in how they implement this abstraction. While [2] also uses blockchains to implement a voting/consensus schema, they store the actual data in a traditional database on every peer and only a digest in the blockchain. In contrast, in BlockchainDB we store all data in blockchains, such that BlockchainDB not only uses the consensus and verification features provided by the blockchain, but also the capability of having all data in a tamper-proof ledger that allows us to audit all changes to the database. Another major difference is that BlockchainDB allows applications to navigate the trade-off between trust and scalability when using blockchains as a shared database.

BlockchainDB also relates to previous work on verifiable databases in the context of outsourcing [15, 3, 27, 29, 6, 30, 5]. This body of research addresses the question of how to securely delegate the management of data to untrusted third parties, such that the third party cannot manipulate the data or the result of queries on that data. In BlockchainDB, peers face the same challenges when accessing data on shards stored at remote peers. However, in BlockchainDB we do not aim to modify the underlying blockchain and thus build our verification protocol on top.

Scalable Blockchains. The second area of related work is in the context of blockchain systems, where various proposals have been made to improve the scalability and performance of a blockchain. A good overview of the bottlenecks and approaches to scale blockchains is given in [11]. In the following, we discuss recent results not covered in [11]. For example, several new protocols have been developed that make use of sharding as part of the blockchain consensus protocol [17, 1, 22, 16, 28, 23] to address the scalability challenge. BlockchainDB differs from this line of work as it does not propose a new consensus protocol, but implements sharding on top of existing blockchains. Hence, these new blockchain systems could also be used by BlockchainDB as a backend and improve the overall performance of the database. A further difference is that BlockchainDB acts as an abstraction layer for clients, such that normal users do not need to deal with new interfaces or programming models that might be introduced by a new blockchain system.

Other proposals to overcome the scalability problem of blockchains discuss the usage of off-chain computation [14, 13, 19, 9]. While BlockchainDB shares the idea of moving certain functions out of the blockchain, it does that on top of the blockchain layer and not as part of it.

Another direction of work comprises systems that aim to add blockchain-like functions to existing distributed and replicated databases. A prominent representative of this line of work is BigChainDB [18], which builds on MongoDB. While BigChainDB shows that it can provide a higher performance than native blockchain systems, it has been constantly criticized for not providing the same trust guarantees and fault-tolerance model as native blockchains. Some of the original shortcomings have recently been addressed in a newer version by using Tendermint to achieve Byzantine fault tolerance. Different from BigChainDB, BlockchainDB has hence chosen another route and instead builds directly on top of blockchain systems and their trusted execution model.

Finally, recent papers [24] have also looked into blockchains from a database angle and add database techniques to the blockchain itself (e.g., re-ordering transactions for higher throughput). Same as before, BlockchainDB does all its optimizations on top of the blockchain layer and not as part of it.

9. CONCLUSION AND FUTURE WORK

In this paper, we presented BlockchainDB, which introduces a database layer on top of blockchains for participating in data sharing scenarios. Our experiments show that BlockchainDB can provide up to two orders of magnitude higher throughput than native blockchains and enables a better scale-out with the number of peers. At the moment, BlockchainDB only provides a key/value interface on top of shared tables. In future work, we thus want to extend the query interface of shared tables to full SQL with verifiable transactional semantics.

10. ACKNOWLEDGEMENTS

This research work has been funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity.

11. REFERENCES

[1] M. Al-Bassam et al. Chainspace: A sharded smart contracts platform. In NDSS 2018, 2018.
[2] L. Allen et al. Veritas: Shared verifiable databases and tables in the cloud. In CIDR 2019, 2019.
[3] A. Arasu et al. Concerto: A high concurrency key-value store with integrity. In SIGMOD Conference 2017, pages 251–266, 2017.
[4] A. Azaria et al. MedRec: Using Blockchain for Medical Data Access and Permission Management. In OBD 2016, pages 25–30, Aug. 2016.
[5] S. Bajaj and R. Sion. CorrectDB: SQL engine with practical query authentication. PVLDB, 6(7):529–540, 2013.
[6] M. Benedikt et al. Verifiable Properties of Database Transactions. Information and Computation, 147(1):57–88, Nov. 1998.
[7] C. Berger et al. On Using Blockchains for Safety-Critical Systems. In SEsCPS 2018, pages 30–36, May 2018.
[8] V. Buterin. A next-generation smart contract and decentralized application platform. white paper, 2014.
[9] R. Cheng et al. Ekiden: A Platform for Confidentiality-Preserving, Trustworthy, and Performant Smart Contract Execution. arXiv:1804.05141 [cs], Apr. 2018.
[10] B. F. Cooper et al. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, pages 143–154. ACM, 2010.
[11] K. Croman et al. On scaling decentralized blockchains (A position paper). In Financial Cryptography and Data Security - FC 2016 International Workshops, BITCOIN, VOTING, and WAHC, Revised Selected Papers, pages 106–125, 2016.
[12] T. T. A. Dinh et al. BLOCKBENCH: A framework for analyzing private blockchains. In SIGMOD Conference 2017, pages 1085–1100, 2017.
[13] J. Eberhardt et al. On or Off the Blockchain? Insights on Off-Chaining Computation and Data. In F. De Paoli, S. Schulte, and E. Broch Johnsen, editors, Service-Oriented and Cloud Computing, Lecture Notes in Computer Science, pages 3–15. Springer International Publishing, 2017.
[14] J. Eberhardt et al. Off-chaining Models and Approaches to Off-chain Computations. In SERIAL 2018, pages 7–12. ACM Press, 2018.
[15] R. Jain and S. Prabhakar. Trustworthy data from untrusted databases. In ICDE 2013, pages 529–540. IEEE, 2013.
[16] E. Kokoris-Kogias et al. OmniLedger: A Secure, Scale-Out, Decentralized Ledger via Sharding. In SP 2018, pages 583–598, May 2018.
[17] L. Luu et al. A Secure Sharding Protocol For Open Blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pages 17–30. ACM, 2016.
[18] T. McConaghy et al. BigchainDB: a scalable blockchain database. white paper, BigChainDB, 2016.
[19] C. Molina-Jimenez et al. Implementation of Smart Contracts Using Hybrid Architectures with On and Off-Blockchain Components. In SC2 2018, pages 83–90, Nov. 2018.
[20] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008.
[21] M. T. Özsu and P. Valduriez. Principles of Distributed Database Systems, Third Edition. Springer, 2011.
[22] I. Puddu, A. Dmitrienko, and S. Capkun. µchain: How to forget without hard forks. IACR Cryptology ePrint Archive, 2017:106, 2017.
[23] Z. Ren et al. Implicit Consensus: Blockchain with Unbounded Throughput. arXiv:1705.11046 [cs], May 2017.
[24] A. Sharma et al. Blurring the lines between blockchains and database systems: the case of hyperledger fabric. In SIGMOD Conference 2019, pages 105–122, 2019.
[25] D. Terry. Replicated data consistency explained through baseball. Commun. ACM, 56(12):82–89, 2013.
[26] S. Underwood. Blockchain Beyond Bitcoin. Commun. ACM, 59(11):15–17, Oct. 2016.
[27] J. Wang et al. Verifiable Auditing for Outsourced Database in Cloud Computing. IEEE Transactions on Computers, 64(11):3293–3303, Nov. 2015.
[28] M. Zamani et al. RapidChain: Scaling Blockchain via Full Sharding. In SIGSAC 2018, CCS '18, pages 931–948. ACM, 2018.
[29] Y. Zhang et al. IntegriDB: Verifiable SQL for Outsourced Databases. In SIGSAC 2015, CCS '15, pages 1480–1491. ACM, 2015.
[30] Y. Zhang et al. vSQL: Verifying Arbitrary SQL Queries over Dynamic Outsourced Databases. In SP 2017, pages 863–880, May 2017.
