Design and Implementation of a Blockchain Relational Database

Total Page:16

File Type:pdf, Size:1020Kb

Design and Implementation of a Blockchain Relational Database Blockchain Meets Database: Design and Implementation of a Blockchain Relational Database Senthil Nathan1, Chander Govindarajan1, Adarsh Saraf1, Manish Sethi2, and Praveen Jayachandran1 1IBM Research - India, 2IBM Industry Platforms, USA 1(snatara7, chandergovind, adasaraf, praveen.j)@in.ibm.com, [email protected] ABSTRACT In this paper, we design and implement the first-ever de- consensus among a set of untrusted parties. Opinion on the centralized replicated relational database with blockchain technology has varied widely from being hailed as the biggest properties that we term blockchain relational database. We innovation since the Internet to being considered as another highlight several similarities between features provided by database in disguise. For the first time, we undertake the blockchain platforms and a replicated relational database, challenge of explaining blockchain technology from the per- although they are conceptually different, primarily in their spective of concepts well known in databases, and highlight trust model. Motivated by this, we leverage the rich fea- the similarities and differences between the two. Existing tures, decades of research and optimization, and available blockchain platforms [18, 52, 40, 15], in an attempt to build tooling in relational databases to build a blockchain rela- something radically new and transformative, rebuild a lot of tional database. We consider a permissioned blockchain features provided by databases, and treat it as just a store model of known, but mutually distrustful organizations each of information. We take a contrarian view in this paper. operating their own database instance that are replicas of By leveraging the rich features and transactional process- one another. The replicas execute transactions indepen- ing capabilities already built into relational databases over dently and engage in decentralized consensus to determine decades of research and development, we demonstrate that the commit order for transactions. We design two approach- we can build a blockchain relational database with all fea- es, the first where the commit order for transactions is agreed tures provided by existing blockchain platforms and with upon prior to executing them, and the second where trans- better support for complex data types, constraints, triggers, actions are executed without prior knowledge of the commit complex queries, and provenance queries. Furthermore, our order while the ordering happens in parallel. We leverage approach makes building blockchain applications as easy as serializable snapshot isolation (SSI) to guarantee that the building database applications; developers can leverage the replicas across nodes remain consistent and respect the or- rich tooling available for backup, analytics, and integration dering determined by consensus, and devise a new variant with existing enterprise systems. Applications that have of SSI based on block height for the latter approach. We a strong compliance and audit requirements and need for implement our system on PostgreSQL and present detailed rich analytical queries such as in financial services that are performance experiments analyzing both approaches. already built atop relational databases, or ones that are reliant on rich provenance information such as in supply PVLDB Reference Format: chains are likely to most benefit from the blockchain rela- Senthil Nathan, Chander Govindarajan, Adarsh Saraf, Manish tional database proposed in this paper. Sethi, Praveen Jayachandran. Blockchain Meets Database: De- sign and Implementation of a Blockchain Relational Database. With a focus on enterprise blockchain applications, we tar- PVLDB, 12(11): 1539-1552, 2019. get a permissioned setup of known but untrusted organiza- DOI: https://doi.org/10.14778/3342263.3342632 tions each operating their own independent database nodes but connected together to form a blockchain network. These 1. INTRODUCTION database nodes need to be replicas of each other but dis- trust one another. Replication of transaction logs and state Blockchain has gained immense popularity over recent among multiple database nodes is possible using master- years, with its application being actively explored in several slave [17, 26, 46, 51] and master-master [26, 49, 51] proto- industries. At its core, it is an immutable append-only log cols. In these protocols, the master nodes need to be trusted; of cryptographically signed transactions, that is replicated for any transaction, a single node executes it and propagates and managed in a decentralized fashion through distributed the final updates to other nodes either synchronously [39, 21, 45] or asynchronously [29, 43, 47] , which would not work This work is licensed under the Creative Commons Attribution- in a blockchain network. Hence, transactions need to be NonCommercial-NoDerivatives 4.0 International License. To view a copy independently executed on all nodes. Further, such an ex- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For ecution and subsequent commits across multiple untrusted any use beyond those covered by this license, obtain permission by emailing nodes must result in the same serializable order to ensure [email protected]. Copyright is held by the owner/author(s). Publication rights consistency. We need a novel replication protocol with a licensed to the VLDB Endowment. notion of decentralized trust. Although fault-tolerant syn- Proceedings of the VLDB Endowment, Vol. 12, No. 11 chronous commit protocols [30, 50] exist, they cannot be ISSN 2150-8097. DOI: https://doi.org/10.14778/3342263.3342632 used as they either require a centralized trusted controller 1539 or multple rounds of communications per transaction which (2) User authenticity & non-repudiability: Permis- would be costly in a network with globally distributed nodes. sioned blockchain systems employ public key infrastructure In this paper, we address this key challenge of ensuring for user management and ensuring authenticity. Users be- that all the untrusted database nodes execute transactions long to organizations that are permissioned to participate independently and commit them in the same serializable or- in the network. Transactions are digitally signed by the in- der asynchronously. Towards achieving this, we make two voker and recorded on the blockchain, making them non- key design choices. First, we modify the database to sep- repudiable. Relational databases have sophisticated user arate the concerns of concurrent transaction execution and management capabilities including support for users, roles, decentralized ordering of blocks of transactions. We leverage groups, and various user authentication options. However, ordering through consensus only to order blocks of transac- submitted and recorded transactions are not signed by the tions, and not for the serializable ordering of transactions invoker making them repudiable. within a single block. Second, independently at each node, (3) Confidentiality & access control: Some permis- we leverage serializable snapshot isolation (SSI) [24, 44] to sioned blockchains support the notion of confidentiality; sm- execute transactions concurrently and then serially validate art contracts, transactions and data are only accessible by & commit each transaction in a block to obtain a serializable authorized users. In addition, access to invoke functions order that will be identical across all untrusted nodes. and to modify data is restricted to specific users. Relational We make the following contributions in this paper: (1) We databases have comprehensive support for access control on devise two novel approaches of building a blockchain plat- tables, rows, columns and PL/SQL procedures. Some re- form starting from a database, (a) where block ordering is lational databases even provide features like content-based performed before transaction execution, (b) where transac- access control, encryption and isolation of data. Advanced tion execution happens parallelly without prior knowledge confidentiality features where only a subset of the nodes have of block ordering. (2) We devise a new variant of serializable access to a particular data element or smart contract can be snapshot isolation [44] based on block height to permit trans- implemented, although we don’t consider that in this paper. action execution to happen parallelly with ordering. (3) We (4) Immutability of transactions and ledger state: implement the system in PostgreSQL using only 4000 lines The blockchain ledger is an append-only log of blocks con- of C code and present the modified architecture. taining sets of transactions that are chained together by The rest of this paper is organized as follows. We highlight adding the hash of each block to it’s succeeding block. This the properties required in a blockchain platform, motivate makes it difficult for an attacker to tamper with a block and how several of these are either partially or fully available avoid detection. The ledger state can only be modified by in relational databases, and identify shortcomings that need invoking smart contract functions recorded as transactions to be addressed in §2. We discuss our design in §3 and on the blockchain and are immutable otherwise. Although a describe two approaches to transaction processing leveraging database logs all submitted transactions, executed queries, serializable snapshot isolation. We provide an account of our and user information, it isn’t possible to detect changes to prototype implementation on PostgreSQL
Recommended publications
  • Failures in DBMS
    Chapter 11 Database Recovery 1 Failures in DBMS Two common kinds of failures StSystem filfailure (t)(e.g. power outage) ‒ affects all transactions currently in progress but does not physically damage the data (soft crash) Media failures (e.g. Head crash on the disk) ‒ damagg()e to the database (hard crash) ‒ need backup data Recoveryyp scheme responsible for handling failures and restoring database to consistent state 2 Recovery Recovering the database itself Recovery algorithm has two parts ‒ Actions taken during normal operation to ensure system can recover from failure (e.g., backup, log file) ‒ Actions taken after a failure to restore database to consistent state We will discuss (briefly) ‒ Transactions/Transaction recovery ‒ System Recovery 3 Transactions A database is updated by processing transactions that result in changes to one or more records. A user’s program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned with data read/written from/to the database. The DBMS’s abstract view of a user program is a sequence of transactions (reads and writes). To understand database recovery, we must first understand the concept of transaction integrity. 4 Transactions A transaction is considered a logical unit of work ‒ START Statement: BEGIN TRANSACTION ‒ END Statement: COMMIT ‒ Execution errors: ROLLBACK Assume we want to transfer $100 from one bank (A) account to another (B): UPDATE Account_A SET Balance= Balance -100; UPDATE Account_B SET Balance= Balance +100; We want these two operations to appear as a single atomic action 5 Transactions We want these two operations to appear as a single atomic action ‒ To avoid inconsistent states of the database in-between the two updates ‒ And obviously we cannot allow the first UPDATE to be executed and the second not or vice versa.
    [Show full text]
  • Database Systems Administrator FLSA: Exempt
    CLOVIS UNIFIED SCHOOL DISTRICT POSITION DESCRIPTION Position: Database Systems Administrator FLSA: Exempt Department/Site: Technology Services Salary Grade: 127 Reports to/Evaluated by: Chief Technology Officer Salary Schedule: Classified Management SUMMARY Gathers requirements and provisions servers to meet the district’s need for database storage. Installs and configures SQL Server according to the specifications of outside software vendors and/or the development group. Designs and executes a security scheme to assure safety of confidential data. Implements and manages a backup and disaster recovery plan. Monitors the health and performance of database servers and database applications. Troubleshoots database application performance issues. Automates monitoring and maintenance tasks. Maintains service pack deployment and upgrades servers in consultation with the developer group and outside vendors. Deploys and schedules SQL Server Agent tasks. Implements and manages replication topologies. Designs and deploys datamarts under the supervision of the developer group. Deploys and manages cubes for SSAS implementations. DISTINGUISHING CAREER FEATURES The Database Systems Administrator is a senior level analyst position requiring specialized education and training in the field of study typically resulting in a degree. Advancement to this position would require competency in the design and administration of relational databases designated for business, enterprise, and student information. Advancement from this position is limited to supervisory management positions. ESSENTIAL DUTIES AND RESPONSIBILITIES Implements, maintains and administers all district supported versions of Microsoft SQL as well as other DBMS as needed. Responsible for database capacity planning and involvement in server capacity planning. Responsible for verification of backups and restoration of production and test databases. Designs, implements and maintains a disaster recovery plan for critical data resources.
    [Show full text]
  • SQL Server Protection Whitepaper
    SQL Server Protection Whitepaper Contents 1. Introduction ..................................................................................................................................... 2 Documentation .................................................................................................................................................................. 2 Licensing ............................................................................................................................................................................... 2 The benefits of using the SQL Server Add-on ....................................................................................................... 2 Requirements ...................................................................................................................................................................... 2 2. SQL Protection overview ................................................................................................................ 3 User databases ................................................................................................................................................................... 3 System databases .............................................................................................................................................................. 4 Transaction logs ................................................................................................................................................................
    [Show full text]
  • Blockchain Database for a Cyber Security Learning System
    Session ETD 475 Blockchain Database for a Cyber Security Learning System Sophia Armstrong Department of Computer Science, College of Engineering and Technology East Carolina University Te-Shun Chou Department of Technology Systems, College of Engineering and Technology East Carolina University John Jones College of Engineering and Technology East Carolina University Abstract Our cyber security learning system involves an interactive environment for students to practice executing different attack and defense techniques relating to cyber security concepts. We intend to use a blockchain database to secure data from this learning system. The data being secured are students’ scores accumulated by successful attacks or defends from the other students’ implementations. As more professionals are departing from traditional relational databases, the enthusiasm around distributed ledger databases is growing, specifically blockchain. With many available platforms applying blockchain structures, it is important to understand how this emerging technology is being used, with the goal of utilizing this technology for our learning system. In order to successfully secure the data and ensure it is tamper resistant, an investigation of blockchain technology use cases must be conducted. In addition, this paper defined the primary characteristics of the emerging distributed ledgers or blockchain technology, to ensure we effectively harness this technology to secure our data. Moreover, we explored using a blockchain database for our data. 1. Introduction New buzz words are constantly surfacing in the ever evolving field of computer science, so it is critical to distinguish the difference between temporary fads and new evolutionary technology. Blockchain is one of the newest and most developmental technologies currently drawing interest.
    [Show full text]
  • How to Conduct Transaction Log Analysis for Web Searching And
    Search Log Analysis: What is it; what’s been done; how to do it Bernard J. Jansen School of Information Sciences and Technology The Pennsylvania State University 329F IST Building University Park, Pennsylvania 16802 Email: [email protected] Abstract The use of data stored in transaction logs of Web search engines, Intranets, and Web sites can provide valuable insight into understanding the information-searching process of online searchers. This understanding can enlighten information system design, interface development, and devising the information architecture for content collections. This article presents a review and foundation for conducting Web search transaction log analysis. A methodology is outlined consisting of three stages, which are collection, preparation, and analysis. The three stages of the methodology are presented in detail with discussions of goals, metrics, and processes at each stage. Critical terms in transaction log analysis for Web searching are defined. The strengths and limitations of transaction log analysis as a research method are presented. An application to log client-side interactions that supplements transaction logs is reported on, and the application is made available for use by the research community. Suggestions are provided on ways to leverage the strengths of, while addressing the limitations of, transaction log analysis for Web searching research. Finally, a complete flat text transaction log from a commercial search engine is available as supplementary material with this manuscript. Introduction Researchers have used transaction logs for analyzing a variety of Web systems (Croft, Cook, & Wilder, 1995; Jansen, Spink, & Saracevic, 2000; Jones, Cunningham, & McNab, 1998; Wang, 1 of 42 Berry, & Yang, 2003). Web search engine companies use transaction logs (also referred to as search logs) to research searching trends and effects of system improvements (c.f., Google at http://www.google.com/press/zeitgeist.html or Yahoo! at http://buzz.yahoo.com/buzz_log/?fr=fp- buzz-morebuzz).
    [Show full text]
  • The Relational Data Model and Relational Database Constraints
    chapter 33 The Relational Data Model and Relational Database Constraints his chapter opens Part 2 of the book, which covers Trelational databases. The relational data model was first introduced by Ted Codd of IBM Research in 1970 in a classic paper (Codd 1970), and it attracted immediate attention due to its simplicity and mathematical foundation. The model uses the concept of a mathematical relation—which looks somewhat like a table of values—as its basic building block, and has its theoretical basis in set theory and first-order predicate logic. In this chapter we discuss the basic characteristics of the model and its constraints. The first commercial implementations of the relational model became available in the early 1980s, such as the SQL/DS system on the MVS operating system by IBM and the Oracle DBMS. Since then, the model has been implemented in a large num- ber of commercial systems. Current popular relational DBMSs (RDBMSs) include DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from Oracle), Sybase DBMS (from Sybase) and SQLServer and Access (from Microsoft). In addi- tion, several open source systems, such as MySQL and PostgreSQL, are available. Because of the importance of the relational model, all of Part 2 is devoted to this model and some of the languages associated with it. In Chapters 4 and 5, we describe the SQL query language, which is the standard for commercial relational DBMSs. Chapter 6 covers the operations of the relational algebra and introduces the relational calculus—these are two formal languages associated with the relational model.
    [Show full text]
  • An Introduction to Cloud Databases a Guide for Administrators
    Compliments of An Introduction to Cloud Databases A Guide for Administrators Wendy Neu, Vlad Vlasceanu, Andy Oram & Sam Alapati REPORT Break free from old guard databases AWS provides the broadest selection of purpose-built databases allowing you to save, grow, and innovate faster Enterprise scale at 3-5x the performance 14+ database engines 1/10th the cost of vs popular alternatives - more than any other commercial databases provider Learn more: aws.amazon.com/databases An Introduction to Cloud Databases A Guide for Administrators Wendy Neu, Vlad Vlasceanu, Andy Oram, and Sam Alapati Beijing Boston Farnham Sebastopol Tokyo An Introduction to Cloud Databases by Wendy A. Neu, Vlad Vlasceanu, Andy Oram, and Sam Alapati Copyright © 2019 O’Reilly Media Inc. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more infor‐ mation, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Development Editor: Jeff Bleiel Interior Designer: David Futato Acquisitions Editor: Jonathan Hassell Cover Designer: Karen Montgomery Production Editor: Katherine Tozer Illustrator: Rebecca Demarest Copyeditor: Octal Publishing, LLC September 2019: First Edition Revision History for the First Edition 2019-08-19: First Release The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. An Introduction to Cloud Databases, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views.
    [Show full text]
  • Data Definition Language
    1 Structured Query Language SQL, or Structured Query Language is the most popular declarative language used to work with Relational Databases. Originally developed at IBM, it has been subsequently standard- ized by various standards bodies (ANSI, ISO), and extended by various corporations adding their own features (T-SQL, PL/SQL, etc.). There are two primary parts to SQL: The DDL and DML (& DCL). 2 DDL - Data Definition Language DDL is a standard subset of SQL that is used to define tables (database structure), and other metadata related things. The few basic commands include: CREATE DATABASE, CREATE TABLE, DROP TABLE, and ALTER TABLE. There are many other statements, but those are the ones most commonly used. 2.1 CREATE DATABASE Many database servers allow for the presence of many databases1. In order to create a database, a relatively standard command ‘CREATE DATABASE’ is used. The general format of the command is: CREATE DATABASE <database-name> ; The name can be pretty much anything; usually it shouldn’t have spaces (or those spaces have to be properly escaped). Some databases allow hyphens, and/or underscores in the name. The name is usually limited in size (some databases limit the name to 8 characters, others to 32—in other words, it depends on what database you use). 2.2 DROP DATABASE Just like there is a ‘create database’ there is also a ‘drop database’, which simply removes the database. Note that it doesn’t ask you for confirmation, and once you remove a database, it is gone forever2. DROP DATABASE <database-name> ; 2.3 CREATE TABLE Probably the most common DDL statement is ‘CREATE TABLE’.
    [Show full text]
  • SQL Vs Nosql: a Performance Comparison
    SQL vs NoSQL: A Performance Comparison Ruihan Wang Zongyan Yang University of Rochester University of Rochester [email protected] [email protected] Abstract 2. ACID Properties and CAP Theorem We always hear some statements like ‘SQL is outdated’, 2.1. ACID Properties ‘This is the world of NoSQL’, ‘SQL is still used a lot by We need to refer the ACID properties[12]: most of companies.’ Which one is accurate? Has NoSQL completely replace SQL? Or is NoSQL just a hype? SQL Atomicity (Structured Query Language) is a standard query language A transaction is an atomic unit of processing; it should for relational database management system. The most popu- either be performed in its entirety or not performed at lar types of RDBMS(Relational Database Management Sys- all. tems) like Oracle, MySQL, SQL Server, uses SQL as their Consistency preservation standard database query language.[3] NoSQL means Not A transaction should be consistency preserving, meaning Only SQL, which is a collection of non-relational data stor- that if it is completely executed from beginning to end age systems. The important character of NoSQL is that it re- without interference from other transactions, it should laxes one or more of the ACID properties for a better perfor- take the database from one consistent state to another. mance in desired fields. Some of the NOSQL databases most Isolation companies using are Cassandra, CouchDB, Hadoop Hbase, A transaction should appear as though it is being exe- MongoDB. In this paper, we’ll outline the general differences cuted in iso- lation from other transactions, even though between the SQL and NoSQL, discuss if Relational Database many transactions are execut- ing concurrently.
    [Show full text]
  • Applying the ETL Process to Blockchain Data. Prospect and Findings
    information Article Applying the ETL Process to Blockchain Data. Prospect and Findings Roberta Galici 1, Laura Ordile 1, Michele Marchesi 1 , Andrea Pinna 2,* and Roberto Tonelli 1 1 Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy; [email protected] (R.G.); [email protected] (L.O.); [email protected] (M.M.); [email protected] (R.T.) 2 Department of Electrical and Electronic Engineering (DIEE), University of Cagliari, Piazza D’Armi, 09100 Cagliari, Italy * Correspondence: [email protected] Received: 7 March 2020; Accepted: 7 April 2020; Published: 10 April 2020 Abstract: We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, elaborate and make it available for further analysis. The study aims to satisfy the need for increasingly efficient data extraction strategies and effective representation methods for blockchain data. For this reason, we conceived a system to make scalable the process of blockchain data extraction and clustering, and to provide a SQL database which preserves the distinction between transaction and addresses. The proposed system satisfies the need to cluster addresses in entities, and the need to store the extracted data in a conventional database, making possible the data analysis by querying the database. In general, ETL processes allow the automation of the operation of data selection, data collection and data conditioning from a data warehouse, and produce output data in the best format for subsequent processing or for business. We focus on the Bitcoin blockchain transactions, which we organized in a relational database to distinguish between the input section and the output section of each transaction.
    [Show full text]
  • Middleware-Based Database Replication: the Gaps Between Theory and Practice
    Appears in Proceedings of the ACM SIGMOD Conference, Vancouver, Canada (June 2008) Middleware-based Database Replication: The Gaps Between Theory and Practice Emmanuel Cecchet George Candea Anastasia Ailamaki EPFL EPFL & Aster Data Systems EPFL & Carnegie Mellon University Lausanne, Switzerland Lausanne, Switzerland Lausanne, Switzerland [email protected] [email protected] [email protected] ABSTRACT There exist replication “solutions” for every major DBMS, from Oracle RAC™, Streams™ and DataGuard™ to Slony-I for The need for high availability and performance in data Postgres, MySQL replication and cluster, and everything in- management systems has been fueling a long running interest in between. The naïve observer may conclude that such variety of database replication from both academia and industry. However, replication systems indicates a solved problem; the reality, academic groups often attack replication problems in isolation, however, is the exact opposite. Replication still falls short of overlooking the need for completeness in their solutions, while customer expectations, which explains the continued interest in developing new approaches, resulting in a dazzling variety of commercial teams take a holistic approach that often misses offerings. opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. Even the “simple” cases are challenging at large scale. We deployed a replication system for a large travel ticket brokering This paper aims to characterize the gap along three axes: system at a Fortune-500 company faced with a workload where performance, availability, and administration. We build on our 95% of transactions were read-only. Still, the 5% write workload own experience developing and deploying replication systems in resulted in thousands of update requests per second, which commercial and academic settings, as well as on a large body of implied that a system using 2-phase-commit, or any other form of prior related work.
    [Show full text]
  • Concurrency Control
    Concurrency Control Instructor: Matei Zaharia cs245.stanford.edu Outline What makes a schedule serializable? Conflict serializability Precedence graphs Enforcing serializability via 2-phase locking » Shared and exclusive locks » Lock tables and multi-level locking Optimistic concurrency with validation Concurrency control + recovery CS 245 2 Lock Modes Beyond S/X Examples: (1) increment lock (2) update lock CS 245 3 Example 1: Increment Lock Atomic addition action: INi(A) {Read(A); A ¬ A+k; Write(A)} INi(A), INj(A) do not conflict, because addition is commutative! CS 245 4 Compatibility Matrix compat S X I S T F F X F F F I F F T CS 245 5 Update Locks A common deadlock problem with upgrades: T1 T2 l-S1(A) l-S2(A) l-X1(A) l-X2(A) --- Deadlock --- CS 245 6 Solution If Ti wants to read A and knows it may later want to write A, it requests an update lock (not shared lock) CS 245 7 Compatibility Matrix New request compat S X U S T F Lock already X F F held in U CS 245 8 Compatibility Matrix New request compat S X U S T F T Lock already X F F F held in U F F F Note: asymmetric table! CS 245 9 How Is Locking Implemented In Practice? Every system is different (e.g., may not even provide conflict serializable schedules) But here is one (simplified) way ... CS 245 10 Sample Locking System 1. Don’t ask transactions to request/release locks: just get the weakest lock for each action they perform 2.
    [Show full text]