
H-Store: A High-Performance, Distributed Main Memory Transaction Processing System

Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik
Brown University
{rkallman, hkimura, jnatkins, pavlo, alexr, sbz}@cs.brown.edu

Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang
Massachusetts Institute of Technology
{evanj, madden, stonebraker, yang}@csail.mit.edu

John Hugg
Vertica Inc.
[email protected]

Daniel J. Abadi
Yale University
[email protected]

ABSTRACT

Our previous work has shown that architectural and application shifts have resulted in modern OLTP databases increasingly falling short of optimal performance [10]. In particular, the availability of multiple cores, the abundance of main memory, the lack of user stalls, and the dominant use of stored procedures are factors that portend a clean-slate redesign of RDBMSs. This previous work showed that such a redesign has the potential to outperform legacy OLTP databases by a significant factor. These results, however, were obtained using a bare-bones prototype that was developed just to demonstrate the potential of such a system. We have since set out to design a more complete execution platform, and to implement some of the ideas presented in the original paper. Our demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.

1. INTRODUCTION

The use of specialized data engines has been shown to outperform traditional or "one size fits all" database systems [8, 9]. Many of these traditional systems use a myriad of architectural components inherited from the original System R database, regardless of whether the target application domain actually needs such unwieldy techniques [1]. For example, the workloads for on-line transaction processing (OLTP) systems have particular properties, such as repetitive and short-lived transaction executions, that are hindered by the I/O performance of legacy RDBMS platforms. One obvious strategy to mitigate this problem is to scale a system horizontally by partitioning both the data and processing responsibilities across multiple shared-nothing machines. Although some RDBMS platforms now provide support for this paradigm in their execution framework, research shows that building a new OLTP system that is optimized from its inception for a distributed environment is advantageous over retrofitting an existing RDBMS [2].

Using a disk-oriented RDBMS is another key bottleneck in OLTP databases. All but the very largest OLTP applications are able to fit their entire data set into the memory of a modern shared-nothing cluster of server machines. Therefore, disk-oriented storage and indexing structures are unnecessary when the entire database is able to reside strictly in memory. There are some main memory database systems available today, but again many of these systems inherit the architectural baggage of System R [6]. Other distributed main memory databases have also focused on the migration of legacy architectural features to this environment [4].

Our research is focused on developing H-Store, a next-generation OLTP system that operates on a distributed cluster of shared-nothing machines where the data resides entirely in main memory. The system model is based on the coordination of multiple single-threaded engines to provide more efficient execution of OLTP transactions. An earlier prototype of the H-Store system was shown to significantly outperform a traditional, disk-based RDBMS installation using a well-known OLTP benchmark [10, 11]. The results from this previous work demonstrate that our ideas have merit; we expand on this work, and in this paper we present a more full-featured version of the system that is able to execute across multiple machines within a local area cluster.
With this new system, we are investigating interesting aspects of this operating environment, including how to exploit non-trivial properties of transactions. We begin in Section 2 by providing an overview of the internals of the H-Store system; the justification for many of the design decisions for H-Store is found in earlier work [10]. In Section 3 we then expand our previous discussion on the kinds of transactions that are found in OLTP systems and the desirable properties of an H-Store database design. We conclude the paper in Section 4 by describing the demonstration system that we developed to explore these properties further and observe the execution behavior of our system.


Figure 1: H-Store system architecture.

2. SYSTEM OVERVIEW

The H-Store system is a highly distributed, row-store-based relational database that runs on a cluster of shared-nothing, main memory executor nodes.

We define a single H-Store instance as a cluster of two or more computational nodes deployed within the same administrative domain. A node is a single physical computer system that hosts one or more sites. A site is the basic operational entity in the system; it is a single-threaded daemon that an external OLTP application connects to in order to execute a transaction. We assume that the typical H-Store node has multiple processors and that each site is assigned to execute on exactly one processor core on a node. Each site is independent from all other sites, and thus does not share any data structures or memory with collocated sites running on the same machine.

Every table in the database is divided into one or more partitions. A partition is replicated and hosted on multiple sites, forming a replica set.

OLTP applications make calls to the H-Store system to repeatedly execute pre-defined stored procedures. Each procedure is identified by a unique name and consists of structured control code intermixed with parameterized SQL commands. An instance of a stored procedure initiated by an OLTP application is called a transaction.
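To make this procedure model concrete, the following is a minimal sketch of how a pre-defined stored procedure might look, assuming a hypothetical Python client API; the class name, the txn handle, and its execute/abort/commit methods are our illustration, not H-Store's actual interface. Control code is interleaved with parameterized SQL, and every query is known before the cluster is deployed:

    class NewBid:
        """Hypothetical stored procedure: record a bid if it beats the current maximum."""
        # Parameterized SQL commands; at deployment each is compiled into a skeleton plan.
        GET_MAX = "SELECT MAX(amount) FROM bids WHERE item_id = ?"
        INSERT  = "INSERT INTO bids (item_id, buyer_id, amount) VALUES (?, ?, ?)"

        def run(self, txn, item_id, buyer_id, amount):
            # Structured control code decides which queries actually execute.
            rows = txn.execute(self.GET_MAX, item_id)
            current_max = rows[0][0]
            if current_max is not None and amount <= current_max:
                return txn.abort("bid is below the current maximum")
            txn.execute(self.INSERT, item_id, buyer_id, amount)
            return txn.commit()

One invocation of run() constitutes one transaction; the client refers to the procedure only by its unique name and supplies the parameter values at request time.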
Using these definitions, we now describe the deployment and execution schemes of the H-Store system (see Figure 1).

2.1 System Deployment

H-Store provides an administrator with a cluster deployment framework that takes as input a set of stored procedures, a database schema, a sample workload, and a set of available sites in the cluster. The framework outputs a list of unique invocation handlers that are used to reference the procedures at runtime. It is possible to introduce new procedures into the system after the cluster is deployed; this, however, results in poorer performance, as the database's physical design will not be optimized for procedures that are not known at deployment (see Section 3.2).

Internally, the deployment framework generates a set of compiled stored procedures and a physical database layout used to instantiate a new H-Store cluster. We currently assume that the administrator provides the framework with a pre-planned database layout. Future versions of the framework will include an automatic database designer that generates an optimized layout using strategic horizontal partitioning, replication locations, and indexed fields.

A compiled stored procedure in the system is comprised of multiple distributed skeleton plans, one for each query in the procedure. Each plan's sub-operations are not specific to the database's distributed layout, but are optimized using common techniques found in non-distributed databases. Once this first optimization phase is complete, the compiled procedures are transmitted to each site in the local cluster. At runtime, the query planner on each site converts skeleton plans for the queries into multi-site, distributed query plans. This two-phase optimization approach is similar to strategies used in previous systems [3]. Our optimizer uses a preliminary cost model based on the number of network communications that are needed to process the request. This has proven to be sufficient in our initial tests, as transaction throughput is not affected by disk access.

2.2 Runtime Model

At execution time, the client OLTP application requests a new transaction using the invocation handlers generated during the deployment process. If a transaction requires parameter values as input, the client must also pass these to the system along with the request. We assume for now that all sites are trusted and reside within the same administrative domain, and thus we trust any network communication that the system receives.

Any site in an H-Store cluster is able to execute any OLTP application request, regardless of whether the data needed by a particular transaction is located on that site. Unless all of the queries' parameter values in a transaction are known at execution time, we use a lazy-planning strategy for creating optimized distributed execution plans. If all of the parameter values are known in the beginning, then we perform additional optimizations based on certain transaction properties (see Section 3.1).

When a running transaction executes a SQL command, it makes an internal request through an H-Store API. At this point, all of the parameter values for that query are known by the system, and thus the plan is annotated with the locations of the target sites for each query sub-operation. The updated plan is then passed to a transaction manager that is responsible for coordinating access with the other sites. The manager transmits plan fragments to the appropriate sites over sockets, and the intermediate results (if any) are passed back to the initiating site. H-Store uses a generic distributed transaction framework to ensure serializability. We are currently comparing protocols that are based on both optimistic and pessimistic concurrency control [5, 12]. The final results for a query are passed back to the blocked transaction process, which will abort the transaction, execute more queries, or commit the transaction and return results back to the client application.

The data storage backend for H-Store is managed by a single-threaded execution engine that resides underneath the transaction manager. Each individual site executes an autonomous instance of the storage engine with a fixed amount of memory allocated from its host machine. Multi-site nodes do not share any data structures with collocated sites, and thus there is no need to use concurrent data structures.
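As a concrete illustration of the preliminary cost model from Section 2.1, the sketch below ranks candidate distributed plans purely by the number of network communications they would incur. The SubOp representation and both helper functions are our assumptions for illustration, not H-Store internals:

    from collections import namedtuple

    # A sub-operation of a distributed plan, annotated with the site that executes it.
    SubOp = namedtuple("SubOp", ["name", "target_site"])

    def network_cost(plan, initiating_site):
        """Count messages: one request plus one reply per remote sub-operation."""
        return sum(2 for op in plan if op.target_site != initiating_site)

    def choose_plan(candidate_plans, initiating_site):
        # Disk access never factors in, so the cheapest plan is simply
        # the one that exchanges the fewest cross-site messages.
        return min(candidate_plans, key=lambda p: network_cost(p, initiating_site))

    # Example: a two-query plan executed from site 0 costs one round trip.
    plan = [SubOp("scan_orders", 0), SubOp("lookup_customer", 2)]
    assert network_cost(plan, initiating_site=0) == 2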

3. DATABASE PROPERTIES

Just as with disk-oriented databases, the performance of distributed main memory databases is dependent on the data placement strategies and supporting data structures of the physical database design. But because OLTP systems repeatedly execute the same set of transactions, the design can be optimized to execute just these transactions while completely ignoring ad hoc queries. H-Store supports the execution of the latter, but it provides no guarantees that such queries are executed in a timely manner. With this in mind, in the following section we discuss different classes of transactions for our system model; these classes have exploitable properties that enable us to execute them more efficiently than general transactions. We then describe the challenges in designing a database for H-Store that facilitates these classes.

3.1 Transaction Classes

We described in our previous paper a set of special-case transaction classes that are found in constrained tree table schemas for OLTP applications [10]. We now provide a more concise description of two of these classes and briefly discuss how H-Store generates an optimal execution strategy for each. Because a query can have unknown parameter values at deployment, the distributed query planner cannot always determine the transaction class before execution time. We plan to investigate the usefulness of probabilistic optimization techniques that estimate the values of these parameters and infer the target site using the cardinality of the columns referenced in the query.

Single-sited Transactions: A transaction is single-sited if all of the queries contained within it can be executed on just one computational site in the cluster. Although the transaction may be sent to two or more sites for execution (due to replicated data), it does not require intermediate results from any other site. This transaction class is the most desirable, since it does not need to use undo logs, concurrency control mechanisms, or communication with other sites beyond that which is required for replication.

When H-Store receives a single-sited transaction request, the site that receives the request will transfer execution control, along with any compiled plans, over to one of the target sites. This new site begins processing the transaction and executes the queries at that site instead of the site where the request was originally made. The overhead of transferring a request between two sites is offset by the performance gains made from avoiding cross-site network communication.

One-shot Transactions: A transaction is one-shot if it cannot execute on a single site, but each of its individual queries will execute on just one site (not necessarily all the same site). Unlike single-sited transactions, we assume that the outputs of queries in one-shot transactions are not used as input to subsequent queries in the same transaction. H-Store performs a simple optimization if a transaction is declared one-shot: the request is transferred to the most-accessed site for that transaction instance, in the same manner as is done with single-sited transactions.
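When all parameter values are supplied up front, these classes can be tested for before execution. The following sketch shows one plausible classification routine, under two assumptions of ours: the site_of lookup (which maps a query with bound parameters to the single site able to answer it, or None) is derived from the physical layout, and the requirement that one-shot queries do not feed each other was already checked when the procedure was declared:

    def classify(queries, site_of):
        """queries: list of (sql, params) pairs with all parameters bound.
        Returns the transaction class used to pick an execution strategy."""
        sites = [site_of(sql, params) for sql, params in queries]
        if any(site is None for site in sites):
            return "general"        # some query must touch several sites
        if len(set(sites)) == 1:
            return "single-sited"   # ship the whole transaction to that one site
        return "one-shot"           # every query runs at exactly one site

In both special cases the request is then transferred to the chosen site (for one-shot transactions, the most-accessed one), as described above.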
3.2 Physical Database Layout

There are two conflicting goals in any H-Store database design: the ability to execute transactions as quickly as possible versus the need to maintain system availability. To achieve the former, a transaction invocation needs to execute on as few sites as possible, thereby minimizing the amount of cross-site messaging between transaction managers. Obvious strategies include replicating frequently accessed or read-only tables on all sites and collocating tables that are often joined together on the same site. More complex designs can involve horizontal partitioning schemes where related data partitions are collocated together, or that partition tables more than once using multiple attributes. The problem of choosing a layout scheme becomes exceedingly difficult when deciding which of these policies should take precedence and which procedures to optimize over others.

Furthermore, because H-Store does not keep any data in non-volatile storage, it must rely on strategic data distribution schemes in order to maintain high availability. Previous literature introduced the concept of k-safety in distributed database systems, where k is defined as the number of node failures a database is designed to tolerate [7]. If k is set too low for an H-Store deployment, then a small number of node failures will render the entire system inoperable. Conversely, if k is too high, then the nodes may not have enough memory to store their copies of the data, which can reduce overall system throughput.

Because we will need to generate intelligent layouts for arbitrary schemas and workloads, the development of an automatic database designer is an area of current research.

4. DEMONSTRATION

To illustrate how the database properties discussed in the previous section affect the performance of a distributed OLTP system, we will demonstrate the H-Store system executing on a cluster. We use the TPC-C benchmark's set of transactions, database schema, and pre-generated data as our target domain [11]. When the demonstration system is started, the benchmark workload application connects to each of the sites and begins making transaction execution requests to the H-Store system. The frequency of each transaction is defined by the benchmark weights, but the distribution of the requests made to the sites is random. The demonstration executes in two distinct operational modes: (1) a pre-generated workload with fixed parameter values, or (2) a randomly generated workload with randomized values; both modes are sketched at the end of Section 4.1. The pre-generated workloads allow the user to directly compare the performance results of different layout schemes. The random mode records the workload as it is generated on-the-fly for future playback. It also allows the user to modify the workload characteristics, such as the frequency with which a site receives transaction requests.

4.1 Workload Controller

We provide a graphical interface on a separate machine that allows users to dynamically observe the execution and performance details of a running H-Store system. As shown in the mock-up in Figure 2, this interface provides a heads-up display of the internal message passing and transaction execution in the cluster. The user is allowed to throttle the operating speed of the entire H-Store system and to manually execute a specific stored procedure in isolation.

The workload controller also provides a historical timeline of the number of transactions executed per second. This timeline view shows aggregate statistics for each operation performed in the system, such as the number of rows examined, the total number of network messages, and the size of those messages. The user switches the scope of this view based on stored procedure invocation handlers or query operation types in order to view the performance of the system components. This enables a direct comparison of different layout schemes and workload generation parameters.
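As referenced above, here is a sketch of the two operational modes, assuming a hypothetical benchmark driver: client.invoke, the procedure objects with their random_params method, and the JSON trace format are all our illustration. Mode one replays a fixed trace so different layouts see identical input; mode two draws random parameter values and records them for future playback:

    import json
    import random

    def run_fixed(client, trace_path):
        # Replay a pre-generated workload with fixed parameter values.
        with open(trace_path) as trace:
            for line in trace:
                request = json.loads(line)  # e.g. {"proc": "NewOrder", "params": [...]}
                client.invoke(request["proc"], *request["params"])

    def run_random(client, procedures, weights, log_path, num_txns):
        # Generate requests on-the-fly (weighted by the TPC-C mix) and log them.
        with open(log_path, "w") as log:
            for _ in range(num_txns):
                proc = random.choices(procedures, weights=weights)[0]
                params = proc.random_params()
                log.write(json.dumps({"proc": proc.name, "params": params}) + "\n")
                client.invoke(proc.name, *params)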

Figure 2: The graphical interface of the H-Store controller allows users to dynamically change properties of the running sample application and change the physical layout of the database.

4.2 Database Layout Loader

The main focus of our demonstration is to show the impact of differing physical layouts for an H-Store database, and how these layouts affect the execution properties of transactions. Every site in the demonstration cluster is loaded with a set of pre-planned database designs that allow users to observe the implications of various layout schemes. The quality of a particular database layout is based on the number of transactions per second that the cluster can execute.

For each of the three attributes listed below, we generate a separate database design for TPC-C's nine-table schema using a manual designer tool. These three designs represent the optimal layout for their corresponding attributes. We also generate a fourth "control" layout where each table is replicated in a random manner on two sites. Each database design uses only a single B-Tree index for each table, based on that table's primary key, with replicated sets that are sorted on the same key. We ensure that each database layout has a base failure tolerance of one node (k-safety = 1). Although placing tables joined by a foreign key together on the same site improves execution performance, we ignore this optimization strategy in our demonstration.

Table Replication: Every read-only table is replicated on all sites. This design highlights the execution speed gained from avoiding extraneous network communication, thereby shortening the execution time of transactions.

Data Partitioning: All tables are divided horizontally into four distinct partitions that are stored on two separate sites. As opposed to the previous table replication scheme, which promotes shorter transaction execution times, this layout demonstrates how concurrent transactions execute in parallel if the data is sufficiently partitioned.

K-Safety: This layout is generated using a horizontal partitioning scheme with a k-safety factor of two. It is not possible to replicate all of the tables in the database on any one site, since the total size of the database is larger than the available memory on a node. We therefore use a replication factor of two and partition each table into equal parts based on its primary key. Every site is given a unique set of three partitions per table, thus preventing any pair of two sites from holding the only copies of a partition.
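To illustrate the kind of placement constraint involved, the sketch below assigns each partition's copies to distinct sites and then checks the k-safety = 1 variant of the guarantee. The staggered round-robin rule is our own simplification for illustration, not the layout produced by the manual designer tool:

    def place_partitions(num_partitions, sites, copies=2):
        """Return {partition_id: [site, ...]} with every copy on a different site."""
        layout = {}
        for p in range(num_partitions):
            # Stagger the copies so consecutive partitions land on different site pairs.
            layout[p] = [sites[(p + c) % len(sites)] for c in range(copies)]
        return layout

    def survives_single_failure(layout):
        # k-safety = 1: losing any one site must leave at least one copy
        # of every partition reachable on a surviving site.
        return all(len(set(copy_sites)) >= 2 for copy_sites in layout.values())

    # Example: 12 partitions of one table spread over a four-site cluster.
    layout = place_partitions(12, sites=[0, 1, 2, 3])
    assert survives_single_failure(layout)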

At any time during workload execution, the user is able to halt the running system, flush the database, and re-load a different layout scheme. Every reload of the database is considered a new workload session, and the system keeps track of the execution results of each session for later review.

5. REFERENCES
[1] M. M. Astrahan, M. W. Blasgen, D. D. Chamberlin, K. P. Eswaran, J. N. Gray, P. P. Griffiths, W. F. King, R. A. Lorie, P. R. McJones, J. W. Mehl, G. R. Putzolu, I. L. Traiger, B. W. Wade, and V. Watson. System R: Relational approach to database management. ACM Trans. Database Syst., 1(2):97–137, 1976.
[2] D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA: A high performance dataflow database machine. In VLDB '86, pages 228–237, 1986.
[3] W. Hong and M. Stonebraker. Optimization of parallel query execution plans in XPRS. In PDIS '91, pages 218–225, 1991.
[4] H. V. Jagadish, D. F. Lieuwen, R. Rastogi, A. Silberschatz, and S. Sudarshan. Dali: A high performance main memory storage manager. In VLDB '94, pages 48–59, 1994.
[5] I. Lee and H. Y. Yeom. A single phase distributed commit protocol for main memory database systems. In IPDPS '02, page 44, 2002.
[6] A.-P. Liedes and A. Wolski. SIREN: A memory-conserving, snapshot-consistent checkpoint algorithm for in-memory databases. In ICDE '06, page 99, 2006.
[7] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A column-oriented DBMS. In VLDB '05, pages 553–564, 2005.
[8] M. Stonebraker, C. Bear, U. Cetintemel, M. Cherniack, T. Ge, N. Hachem, S. Harizopoulos, J. Lifter, J. Rogers, and S. B. Zdonik. One size fits all? Part 2: Benchmarking studies. In CIDR '07, pages 173–184, 2007.
[9] M. Stonebraker and U. Cetintemel. "One size fits all": An idea whose time has come and gone. In ICDE '05, pages 2–11, 2005.
[10] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (It's time for a complete rewrite). In VLDB '07, pages 1150–1160, 2007.
[11] The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.
[12] A. Whitney, D. Shasha, and S. Apter. High volume transaction processing without concurrency control, two phase commit, C++ or SQL. In Int. Workshop on High Performance Transaction Systems.
