
Low Overhead Concurrency Control for Partitioned Main Memory Databases

Evan P. C. Jones, MIT CSAIL, Cambridge, MA, USA, [email protected]
Daniel J. Abadi, Yale University, New Haven, CT, USA, [email protected]
Samuel Madden, MIT CSAIL, Cambridge, MA, USA, [email protected]

ABSTRACT

Database partitioning is a technique for improving the performance of distributed OLTP databases, since "single partition" transactions that access data on one partition do not need coordination with other partitions. For workloads that are amenable to partitioning, some argue that transactions should be executed serially on each partition without any concurrency at all. This strategy makes sense for a main memory database where there are no disk or user stalls, since the CPU can be fully utilized and the overhead of traditional concurrency control, such as two-phase locking, can be avoided. Unfortunately, many OLTP applications have some transactions which access multiple partitions. This introduces network stalls in order to coordinate distributed transactions, which will limit the performance of a database that does not allow concurrency.

In this paper, we compare two low overhead concurrency control schemes that allow partitions to work on other transactions during network stalls, yet have little cost in the common case when concurrency is not needed. The first is a light-weight locking scheme, and the second is an even lighter-weight type of speculative concurrency control that avoids the overhead of tracking reads and writes, but sometimes performs work that eventually must be undone. We quantify the range of workloads over which each technique is beneficial, showing that speculative concurrency control generally outperforms locking as long as there are few aborts or few distributed transactions that involve multiple rounds of communication. On a modified TPC-C benchmark, speculative concurrency control can improve throughput relative to the other schemes by up to a factor of two.

1. INTRODUCTION

Databases rely on concurrency control to provide the illusion of sequential execution of transactions, while actually executing multiple transactions simultaneously. However, several research papers suggest that for some specialized databases, concurrency control may not be necessary [11, 12, 27, 26]. In particular, if the data fits in main memory and the workload consists of transactions that can be executed without user stalls, then there is no need to execute transactions concurrently to fully utilize a single CPU. Instead, each transaction can be completely executed before servicing the next. Previous work studying the Shore database system [14] measured the overhead imposed by locks, latches, and undo buffer maintenance required by multi-threaded concurrency control to be 42% of the CPU instructions executed on part of the TPC-C benchmark [1]. This suggests that removing concurrency control could lead to a significant performance improvement.

Databases also use concurrency to utilize multiple CPUs, by assigning transactions to different threads. However, recent work by Pandis et al. [20] shows that this approach does not scale to large numbers of processing cores; instead they propose a data-oriented approach, where each thread "owns" a partition of the data and transactions are passed to different threads depending on what data they access. Similarly, H-Store [26] also has one thread per data partition. In these systems, there is only one thread that can access a data item, and traditional concurrency control is not necessary.

Data partitioning is also used for shared-nothing systems. Data is divided across n database servers, and transactions are routed to the partitions that contain the data they need to access.
This approach is often used to improve database performance. Some applications are “perfectly partition- Categories and Subject Descriptors able,” such that every transaction can be executed in its H.2.4 [Database Management]: Systems entirety at a single partition. In such a case, if the data is stored in main memory, each transaction can run without concurrency control, running to completion at the corre- General Terms sponding partition. However, many applications have some Experimentation, Performance transactions that span multiple partitions. For these trans- actions, some form of concurrency control is needed. since Network stalls are necessary to coordinate the execution of these multi-partition transactions, and each processor must Permission to make digital or hard copies of all or part of this work for wait if there is no concurrency control. personal or classroom use is granted without fee provided that copies are This paper focuses on these “imperfectly partitionable” not made or distributed for profit or commercial advantage and that copies applications. For these workloads, there are some multi- bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific partition transactions that impose network stalls, and some permission and/or a fee. single-partition transactions that can run to completion with- SIGMOD’10, June 6–11, 2010, Indianapolis, Indiana, USA. out disk, user, or network stalls. The goal is to use a concur- Copyright 2010 ACM 978-1-4503-0032-2/10/06 ...$10.00. rency control scheme that allows a processor to do something during normal operation. This eliminates disk stalls, which useful during a network stall, while not significantly hurting further reduces the need for concurrent transactions. the performance of single-partition transactions. Traditional databases rely on disk to provide durability. We study two such schemes. The first is a speculative However, mission critical OLTP applications need high avail- approach which works as follows: when a multi-partition ability which means they use replicated systems. H-Store transaction t has completed on one partition, but is wait- takes advantage of replication for durability, as well as high ing for other participating partitions to complete, additional availability. A transaction is committed when it has been speculative transactions are executed. However, they are not received by k > 1 replicas, where k is a tuning parameter. committed until t commits. Speculation does not hold locks or keep track of read/write sets—instead, it assumes that a 2.3 Partitioning speculative transaction conflicts with any transaction t with Without client and disk stalls, H-Store simply executes which it ran concurrently. For this reason, if t aborts, any transactions from beginning to completion in a single thread. speculative transactions must be re-executed. To take advantage of multiple physical machines and mul- The second scheme is based on two-phase locking. When tiple CPUs, the data must be divided into separate parti- there are no active multi-partition transactions, single par- tions. Each partition executes transactions independently. tition transactions are efficiently executed without acquir- The challenge becomes dividing the application’s data so ing locks. However, any active multi-partition transactions that each transaction only accesses one partition. 
For many cause all transactions to lock and unlock data upon access, OLTP applications, partitioning the application manually as with standard two-phase locking. is straightforward. For example, the TPC-C OLTP bench- We compare the strengths and limitations of these two mark can be partitioned by warehouse so an average of 89% concurrency control schemes for main-memory partitioned of the transactions access a single partition [26]. There is databases, as well as a simple blocking scheme, where only evidence that developers already do this to scale their ap- one transaction runs at a time. Our results show that the plications [22, 23, 25], and academic research provides some simple blocking technique only works well if there are very approaches for automatically selecting a good partitioning few multi-partition transactions. Speculative execution per- key [4, 21, 28]. However, unless the partitioning scheme is forms best for workloads composed of single partition trans- 100% effective in making all transactions only access a sin- actions and multi-partition transactions that perform one gle partition, then coordination across multiple partitions unit of work at each participant. In particular, our experi- for multi-partition transactions cause network stalls and ex- mental results show that speculation can double throughput ecuting a transaction to completion without stalls is not pos- on a modified TPC-C workload. However, for abort-heavy sible. In this paper, we focus on what the system should do workloads, speculation performs poorly, because it cascades in this case. aborts of concurrent transactions. Locking performs best on workloads where there are many multi-partition transac- 3. EXECUTING TRANSACTIONS tions, especially if participants must perform multiple rounds In this section, we describe how our prototype executes of work, with network communication between them. How- transactions. We begin by describing the components of our ever, locking performs worse with increasing data conflicts, system and the execution model. We then discuss how single and especially suffers from distributed deadlock. partition and multi-partition transactions are executed. 2. ASSUMPTIONS ON SYSTEM DESIGN 3.1 System Components Our concurrency control schemes are designed for a par- The system is composed of three types of processes, shown titioned main-memory database system similar to H-Store in Figure 1. First, the data is stored in partitions, a single [26]. This section gives an overview of the relevant aspects process that stores a portion of the data in memory, and ex- of the H-Store design. ecutes stored procedures using a single thread. For each par- Traditional concurrency control comprises nearly half the tition, there is a primary process and k−1 backup processes, CPU instructions executed by a database engine [14]. This which ensures that data survives k − 1 failures. Second, a suggests that avoiding concurrency control can improve single process called the central coordinator is used to co- throughput substantially. H-Store was therefore designed ordinate all distributed transactions. This ensures that dis- explicitly to avoid this overhead. tributed transactions are globally ordered, and is described in Section 3.3. Finally, the client processes are end-user ap- 2.1 Transactions as Stored Procedures plications that issue transactions to the system in the form of H-Store only supports executing transactions that have invocations. 
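To make the client-side routing concrete, the sketch below shows how a client library could use the downloaded catalog to send single partition transactions directly to the owning primary and everything else to the central coordinator. This is an illustration only, assuming a simple hash partitioning on a warehouse id; the names and interfaces are not the prototype's actual API.

def partition_for(warehouse_id: int, num_partitions: int) -> int:
    # Map a partitioning-key value to the partition that owns it.
    return hash(warehouse_id) % num_partitions

class ClientLibrary:
    def __init__(self, partition_addresses, coordinator_address):
        self.partitions = partition_addresses    # partition index -> network address
        self.coordinator = coordinator_address   # used for multi-partition transactions

    def route(self, procedure_name, warehouse_ids):
        # Decide where a stored procedure invocation should be sent.
        targets = {partition_for(w, len(self.partitions)) for w in warehouse_ids}
        if len(targets) == 1:
            # Single partition: go straight to that partition's primary.
            return ("primary", self.partitions[targets.pop()], procedure_name)
        # Multi-partition: the central coordinator assigns a global order.
        return ("coordinator", self.coordinator, procedure_name)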
When the client library con- been pre-declared as stored procedures. Each stored proce- nects to the database, it downloads the part of the system dure invocation is a single transaction that must either abort catalog describing the available stored procedures, network or before returning results to the client. Eliminating addresses for the partitions, and how data is distributed. ad-hoc transactions removes client stalls, reducing the need This permits the client library to direct transactions to the for concurrency. appropriate processes. Transactions are written as stored procedures, composed 2.2 No Disk of deterministic code interleaved database operations. The Today, relatively low-end 1U servers can have up to 128 client invokes transactions by sending a stored procedure GB of RAM, which gives a data center rack of 40 servers invocation to the system. The system distributes the work an aggregate RAM capacity of over 5 TB. Thus, a mod- across the partitions. In our prototype, the mapping of work est amount of hardware can hold all but the largest OLTP to partitions is done manually, but we are working on a query databases in memory, eliminating the need for disk access planner to do this automatically. handle more multi-partition transactions, multiple coordi- nators must be used. Previous work has investigated how Clients to globally order transactions with multiple coordinators, Client Library Client Library Client Library for example by using loosely synchronized clocks [2]. We H-Store Multi Partition leave selecting the best alternative to future work, and only Single Fragment evaluate a single coordinator system in this paper. Partition Central Coordinator The central coordinator divides the transaction into frag- Fragment ments and sends them to the partitions. When responses are Node 1 Node 2 received, the coordinator executes application code to de- Data Data Data Data termine how to continue the transaction, which may require Partition 1 Partition 2 Replication Messages Partition 3 Partition 4 Primary Primary Primary Primary sending more fragments. Each partition executes fragments for a given transaction sequentially. Multi-partition transactions are executed using an undo Node 3 Node 4 buffer, and use two-phase commit (2PC) to decide the out- Data Data Data Data come. This allows each partition of the database to fail inde- Partition 1 Partition 4 Partition 3 Partition 2 pendently. If the transaction causes one partition to crash or Backup Backup Backup Backup the network splits during execution, other participants are able to recover and continue processing transactions that Figure 1: System Architecture do not depend on the failed partition. Without undo infor- mation, the system would need to block until the failure is repaired. Each transaction is divided into fragments. A fragment is The coordinator piggybacks the 2PC “prepare” message a unit of work that can be executed at exactly one partition. with the last fragment of a transaction. When the primary It can be some mixture of user code and database operations. receives the final fragment, it sends all the fragments of the A single partition transaction, for example, is composed of transaction to the backups and waits for acknowledgments one fragment containing the entire transaction. A multi- before sending the final results to the coordinator. This is partition transaction is composed of multiple fragments with equivalent to forcing the participant’s 2PC vote to disk. Fi- data dependencies between them. 
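The paper does not give the prototype's data structures, so as a purely illustrative sketch, a transaction and its fragments might be represented as follows; the field names are assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Fragment:
    txn_id: int
    partition: int          # the one partition this unit of work runs on
    operations: List[str]   # deterministic stored-procedure code and database operations
    is_last: bool = False   # lets the 2PC "prepare" be piggybacked on the final fragment

@dataclass
class Transaction:
    txn_id: int
    fragments: List[Fragment] = field(default_factory=list)

    def is_single_partition(self) -> bool:
        # A single partition transaction is one fragment containing the whole transaction.
        return len({f.partition for f in self.fragments}) == 1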
nally, when the coordinator has all the votes from the partic- ipants, it completes the transaction by sending a “commit” 3.2 Single Partition Transactions message to the partitions and returning the final result to When a client determines that a request is a single par- the application. tition transaction, it forwards it to the primary partition When executing multi-partition transactions, network stalls responsible for the data. The primary uses a typical pri- can occur while waiting for data from other partitions. This mary/backup replication protocol to ensure durability. In idle time can introduce a performance bottleneck, even if the failure free case, the primary reads the request from the multi-partition transactions only comprise a small fraction network and sends a copy to the backups. While waiting of the workload. On our experimental systems, described for acknowledgments, the primary executes the transaction. in Section 5, the minimum network round-trip time be- Since it is a single partition transaction, it does not block. tween two machines connected to the same gigabit Ether- When all acknowledgments from the backups are received, net switch was measured using ping to be approximately 40 the result of the transaction is sent to the client. This pro- µs. The average CPU time for a TPC-C transaction in our tocol ensures the transaction is durable, as long as at least system is 26 µs. Thus, while waiting for a network acknowl- one replica survives a failure. edgment, the partition could execute at least two single- No concurrency control is needed to execute single parti- partition transactions. Some form of concurrency control is tion transactions. In most cases, the system executes these needed to permit the engine to do useful work while other- transactions without recording undo information, resulting wise idle. The challenge is to not reduce the efficiency of in very low overhead. This is possible because transactions simple single partition transactions. The next section de- are annotated to indicate if a user abort may occur. For scribes two concurrency control schemes we have developed transactions that have no possibility of a user abort, con- to address this issue. currency control schemes that guarantee that deadlock will not occur (see below) do not keep an undo log. Otherwise, 4. CONCURRENCY CONTROL SCHEMES the system maintains an in-memory undo buffer that is dis- carded when the transaction commits. 4.1 Blocking The simplest scheme for handling multi-partition trans- 3.3 Multi-Partition Transactions actions is to block until they complete. When the partition In general, multi-partition transaction can have arbitrary receives the first fragment of a multi-partition transaction, it data dependencies between transaction fragments. For ex- is executed and the results are returned. All other transac- ample, a transaction may need to read a value stored at tions are queued. When subsequent fragments of the active partition P1, in order to update a value at partition P2. transaction are received, they are processed in order. After To ensure multi-partition transactions execute in a seri- the transaction is committed or aborted, the queued trans- alizable order without deadlocks, we forward them through actions are processed. In effect, this system assumes that the central coordinator, which assigns them a global order. all transactions conflict, and thus can only execute one at Although this is a straightforward approach, the central co- a time. 
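A minimal sketch of this failure-free single partition path is shown below, with placeholder network and engine objects rather than the prototype's real interfaces. The point is that execution overlaps with replication, and the reply is withheld until every backup has acknowledged.

def execute_single_partition(request, backups, engine, client):
    for b in backups:
        b.send(request)               # replicate the logical request to all backups
    result = engine.execute(request)  # execute while the acknowledgments are in flight
    for b in backups:
        b.wait_for_ack()              # the transaction is durable once all backups have it
    client.reply(result)              # only now is the result exposed to the client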
Pseudocode describing this approach is shown in ordinator limits the rate of multi-partition transactions. To Figure 2. Transaction Fragment Arrives sent. When the uncommitted queue is empty, the system if no active transactions: resumes executing transactions non-speculatively (for trans- if single partition: actions that cannot abort, execution can proceed without execute fragment without undo buffer the extra overhead of recording undo information). commit As an example of how this works, consider a two-partition else: database (P1 and P2) with two integer records, x on P1 execute fragment with undo buffer and y on P2. Initially, x = 5 and y = 17. Suppose the else if fragment continues active multi-partition transaction: system executes three transactions, A, B1, and B2, in order. continue transaction with fragment Transaction A is a multi-partition transaction that swaps else: the value of x and y. Transactions B1 and B2 are single queue fragment partition transactions on P1 that increment x and return its value. Commit/Abort Decision Arrives Both partitions begin by executing the first fragments of if abort: transaction A, which read x and y. P1 and P2 execute the undo aborted transaction fragments and return the values to the coordinator. Since A execute queued transactions is not finished, speculation of B1 and B2 cannot begin. If it did, the result for transaction B1 would be x = 6, which is Figure 2: Blocking Pseudocode incorrect. The coordinator sends the final fragments, which write x = 17 on partition P1, and y = 5 on partition P2. Af- ter finishing these fragments, the partitions send their“ready 4.2 Speculative Execution to commit”acknowledgment to the coordinator, and wait for This concurrency control scheme speculatively executes the decision. Because A is finished, speculation can begin. queued transactions when the blocking scheme would oth- Transactions B1 and B2 are executed with undo buffers and erwise be idle. More precisely, when the last fragment of a the results are queued. If transaction A were to abort at this multi-partition transaction has been executed, the partition point, both B2 and B1 would be undone and re-executed. must wait to learn if the transaction commits or aborts. In When P1 is informed that transaction A has committed, it the majority of cases, it will commit. Thus, we can execute sends the results for B1 and B2 to the clients and discards queued transactions while waiting for 2PC to finish. The their undo buffers. results of these speculatively executed transactions cannot This describes purely local speculation, where speculative be sent outside the database, since if the first transaction results are buffered inside a partition and not exposed until aborts, the results of the speculatively executed transac- they are known to be correct. Only the first fragment of tions may be incorrect. Undo information must be recorded a multi-partition transaction can be speculated in this way, for speculative transactions, so they can be rolled back if since the results cannot be exposed in case they must be needed. If the first transaction commits, all speculatively undone. However, we can speculate many multi-partition executed transactions are immediately committed. Thus, transactions if the coordinator is aware of the speculation, speculation hides the latency of 2PC by performing useful as described in the next section. work instead of blocking. 
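As a concrete rendering of the blocking rule in Figure 2, a partition could be structured as in the simplified sketch below; the engine interface, undo handling, and reply path are illustrative assumptions, not the prototype's code.

class BlockingPartition:
    def __init__(self, engine):
        self.engine = engine       # executes fragments deterministically
        self.active_txn = None     # at most one multi-partition transaction at a time
        self.undo = None
        self.pending = []          # fragments queued behind the active transaction

    def on_fragment(self, txn_id, fragment, single_partition):
        if self.active_txn is None:
            if single_partition:
                # No undo buffer needed: the transaction runs to completion and commits.
                return self.engine.execute(fragment, undo=None)
            # First fragment of a multi-partition transaction: record undo info for 2PC.
            self.active_txn, self.undo = txn_id, []
            return self.engine.execute(fragment, undo=self.undo)
        if txn_id == self.active_txn:
            # Subsequent fragments of the active transaction are processed in order.
            return self.engine.execute(fragment, undo=self.undo)
        # Everything else blocks until the active transaction commits or aborts.
        self.pending.append((txn_id, fragment, single_partition))
        return None

    def on_decision(self, commit):
        if not commit:
            self.engine.rollback(self.undo)
        self.active_txn, self.undo = None, None
        queued, self.pending = self.pending, []
        for args in queued:        # drain the queue in arrival order (replies omitted here)
            self.on_fragment(*args)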
Speculation produces serializable execution schedules be- 4.2.2 Speculating Multi-Partition Transactions cause once a transaction t has finished all of its reads and Consider the same example, except now the system exe- writes and is simply waiting to learn if t commits or aborts, cutes A, B , then a new multi-partition transaction C, which we can be guaranteed that any transaction t∗ which makes 1 increments both x and y, then finally B2. The system ex- use of t’s results on a partition p will be serialized after t ecutes transaction A as before, and partition P speculates on p. However, t∗ may have read data that t wrote, such 1 ∗ B1. Partition P2 can speculate its fragment of C, computing that we will have to abort t to avoid exposing t’s dirty y = 6. With local speculation described above, it must wait (rolled-back) data in the event that t aborts. for A to commit before returning this result to the coordina- The simplest form of speculation is when the speculatively tor, since if A aborts, the result will be incorrect. However, executed transactions are single partition transactions, so we because A and C have the same coordinator, partition P2 consider that case first. can return the result for its fragment of C, with an indica- tion that it depends on transaction A. Similarly, partition 4.2.1 Speculating Single Partition Transactions P1 can speculate its fragment of C, computing and returning Each partition maintains a queue of unexecuted transac- x = 17, along with an indication that this depends on A. It tions and a queue of uncommitted transactions. The head of also speculates B2, computing x = 18. However, it cannot the uncommitted transaction queue is always a non-speculat- send this result, as it would go directly to a client that is ive transaction. After a partition has executed the last not aware of the previous transactions, because single parti- fragment of a multi-partition transaction, it executes addi- tion transactions do not go through the central coordinator. tional transactions speculatively from the unexecuted queue, Once the multi-partition transactions have committed and adding them to the end of the uncommitted transaction the uncommitted transaction queue is empty, the partitions queue. An undo buffer is recorded for each transaction. If can resume non-speculative execution. the non-speculative transaction aborts, each transaction is After the coordinator commits A, it examines the results removed from the tail of the uncommitted transaction queue, for C. Since C depends on A, and A committed, the spec- undone, then pushed onto the head of the unexecuted trans- ulative results are correct and C can commit. If A had action queue to be re-executed. If it commits, transactions aborted, the coordinator would send an abort message for are dequeued from the head of the queue and results are A to P1 and P2, then discard the incorrect results it received for C. As before, the abort message would cause the parti- Transaction Fragment Arrives tions to undo the transactions on the uncommitted transac- if no active transaction: tion queues. Transaction A would be aborted, but the other if single partition: transactions would be placed back on the unexecuted queue execute fragment without undo buffer and re-executed in the same order. The partitions would commit then resend results for C. The resent results would not de- else: pend on previous transactions, so the coordinator could han- execute fragment with undo buffer dle it normally. 
The pseudocode for this scheme is shown in else if fragment continues active multi-partition transaction: Figure 3. continue transaction by executing fragment This scheme allows a sequence of multi-partition transac- if transaction is finished locally: tions, each composed of a single fragment at each partition speculate queued transactions to be executed without blocking, assuming they all com- else if tail transaction in uncommitted queue is finished locally: mit. We call such transactions simple multi-partition trans- execute fragment with undo buffer actions. Transactions of this form are quite common. For same coordinator ← false example, if there is a that is mostly used for reads, it if all txns in uncommitted queue have same coordinator: may be beneficial to replicate it across all partitions. Reads same coordinator ← true can then be performed locally, as part of a single partition if transaction is multi-partition and same coordinator: transaction. Occasional updates of this table execute as a record dependency on previous multi-partition transaction simple multi- partition transaction across all partitions. An- send speculative results other example is if a table is partitioned on x, and else: records are accessed based on the value of column y. This queue fragment can be executed by attempting the access on all partitions of the table, which is also a simple multi-partition transaction. Commit/Abort Decision Arrives As a third example, all distributed transactions in TPC-C if abort: are simple multi-partition transactions [26]. Thus, this opti- undo and re-queue all speculative transactions mization extends the types of workloads where speculation undo aborted transaction is beneficial. else: There are a few limitations to speculation. First, since while next speculative transaction is not multi-partition: speculation can only be used after executing the last frag- commit speculative transaction ment of a transaction, it is less effective for transactions that send results require multiple fragments at one partition. execute/speculate queued transactions Second, multi-partition speculation can only be used when the multi-partition transactions come from the same coor- dinator. This is necessary so the coordinator is aware of the Figure 3: Speculation Pseudocode outcome of the earlier transactions and can cascade aborts as required. However, a single coordinator can become a bottleneck, as discussed in Section 3.3. Thus, when us- and speculation schemes. This works because there are no ing multiple coordinators, each coordinator must dispatch active transactions that could cause conflicts, and the trans- transactions in batches to take advantage of this optimiza- action is guaranteed to execute to completion before the par- tion. This requires delaying transactions, and tuning the tition executes any other transaction. Thus, locks are only number of coordinators to match the workload. acquired when there are active multi-partition transactions. Speculation has the advantage that it does not require Our locking scheme follows the strict two phase locking locks or read/write set tracking, and thus requires less over- protocol. Since this is guaranteed to produce a serializable head than traditional concurrency control. A disadvantage transaction order, clients send multi-partition transactions is that it assumes that all transactions conflict, and therefore directly to the partitions, without going through the central occasionally unnecessarily aborts transactions. coordinator. 
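The purely local speculation path of Section 4.2.1 can be sketched as follows. This is a simplified illustration with assumed names that omits the same-coordinator optimization of Section 4.2.2; the head of the uncommitted queue is always the non-speculative multi-partition transaction, and speculative results are held back until its 2PC outcome is known.

from collections import deque

class SpeculativePartition:
    def __init__(self, engine):
        self.engine = engine
        self.unexecuted = deque()       # transactions waiting to run
        self.uncommitted = deque()      # head = the pending multi-partition transaction
        self.finished_locally = False   # has its last local fragment been executed?

    def on_last_fragment(self, txn):
        undo = []
        result = self.engine.execute(txn, undo)
        self.uncommitted.append((txn, undo, result))
        self.finished_locally = True
        self._speculate()
        return result                   # returned to the coordinator with the 2PC vote

    def enqueue(self, txn):
        self.unexecuted.append(txn)
        if self.finished_locally:
            self._speculate()

    def _speculate(self):
        while self.unexecuted:
            txn = self.unexecuted.popleft()
            undo = []                   # speculation always records undo information
            result = self.engine.execute(txn, undo)
            self.uncommitted.append((txn, undo, result))   # result is not sent yet

    def on_decision(self, commit):
        entries = list(self.uncommitted)
        self.uncommitted.clear()
        if commit:
            for txn, _undo, result in entries[1:]:
                txn.reply(result)       # speculated results may now leave the system
        else:
            for _txn, undo, _result in reversed(entries):
                self.engine.rollback(undo)           # undo from the tail backwards
            for txn, _undo, _result in reversed(entries[1:]):
                self.unexecuted.appendleft(txn)      # re-execute in the original order
        self.finished_locally = False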
This is more efficient when there are no lock conflicts, as it reduces network latency and eliminates an 4.3 Locking extra process from the system. However, it introduces the In the locking scheme, transactions acquire read and write possibility of distributed deadlock. Our implementation uses locks on data items while executing, and are suspended if cycle detection to handle local deadlocks, and timeout to they make a conflicting lock request. Transactions must handle distributed deadlock. If a cycle is found, it will prefer record undo information in order to rollback in case of dead- to kill single partition transactions to break the cycle, as that lock. Locking permits a single partition to execute and will result in less wasted work. commit non-conflicting transactions during network stalls Since our system is motivated by systems like H-Store for multi-partition transactions. The locks ensure that the [26] and DORA [20], each partition runs in single-threaded results are equivalent to transaction execution in some serial mode. Therefore, our locking scheme has much lower over- order. The disadvantage is that transactions are executed head than traditional locking schemes that have to latch a with the additional overhead of acquiring locks and detect- centralized lock manager before manipulating the lock data ing deadlock. structures. Our system can simply lock a data item without We avoid this overhead where possible. When our locking having to worry about another thread trying to concurrently system has no active transactions and receives a single par- lock the same item. The only type of concurrency we are tition transaction, the transaction can be executed without trying to enable is logical concurrency where a new transac- locks and undo information, in the same way as the blocking tion can make progress only when the previous transaction is blocked waiting for a network stall — physical concurrency random, then accesses 12 keys on that partition. To create cannot occur. a multi-partition transaction, the keys are divided evenly by When a transaction is finished and ready to commit, the accessing 6 keys on each partition. To fully utilize the CPU fragments of the transaction are sent to the backups. This on both partitions, we use 40 simultaneous clients. Each includes any data received from other partitions, so the back- client issues one request, waits for the response, then issues ups do not participate in distributed transactions. The back- another request. ups execute the transactions in the sequential order received We vary the fraction of multi-partition transactions and from the primary. This will produce the same result, as we measure the transaction throughput. The results are shown assume transactions are deterministic. Locks are not ac- in Figure 4. From the application’s perspective, the multi- quired while executing the fragments at the backups, since partition and single partition transactions do the same they are not needed. Unlike typical statement-based replica- amount of work, so ideally the throughput should stay con- tion, applying transactions sequentially is not a performance stant. However, concurrency control overhead means this is bottleneck because the primary is also single threaded. As not the case. The performance for locking is linear in the with the previous schemes, once the primary has received range between 16% and 100% multi-partition transactions. 
acknowledgments from all backups, it considers the transac- The reason is that none of these transactions conflict, so tion to be durable and can return results to the client. they can all be executed simultaneously. The slight down- ward slope is due to the fact that multi-partition transac- 5. EXPERIMENTAL EVALUATION tions have additional communication overhead, and thus the performance degrades slightly as their fraction of the work- In this section we explore the trade-offs between the above load increases. concurrency control schemes by comparing their performance The most interesting part of the locking results is between on some microbenchmarks and a benchmark based on TPC- 0% and 16% multi-partition transactions. As expected, the C. The microbenchmarks are designed to discover the impor- performance of locking is very close to the other schemes tant differences between the approaches, and are not neces- at 0% multi-partition transactions, due to our optimization sarily presented as being representative of any particular ap- where we do not set locks when there are no multi-partition plication. Our TPC-C benchmark is intended to represent transactions running. The throughput matches speculation the performance of a more complete and realistic OLTP ap- and blocking at 0%, then decreases rapidly until 16% multi- plication. partition transactions, when there is usually at least one Our prototype is written in C++ and runs on Linux. We multi-partition transaction in progress, and therefore nearly used six servers with two Xeon 3.20 GHz CPUs and 2 GB of all transactions are executed with locks. RAM. The machines are connected to a single gigabit Eth- The performance of blocking in this microbenchmark de- ernet switch. The clients run as multiple threads on a single grades steeply, so that it never outperforms locking on this machine. For each test, we use 15 seconds of warm-up, fol- low-contention workload. The reason is that the advantage lowed by 60 seconds of measurement (longer tests did not of executing transactions without acquiring locks is out- yield different results). We measure the number of transac- weighed by the idle time caused by waiting for two-phase tions that are completed by all clients within the measure- commit to complete. Our locking implementation executes ment period. Each measurement is repeated three times. many transactions without locks when there are few multi- We show only the averages, as the confidence intervals are partition transactions, so there is no advantage to blocking. within a few percent and needlessly clutter the figures. If we force locks to always be acquired, blocking does out- For the microbenchmarks, the execution engine is a simple perform locking from 0% to 6% multi-partition transactions. key/value store, where keys and values are arbitrary byte With fewer than 50% multi-partition transactions, the strings. One transaction is supported, which reads a set of throughput of speculation parallels locking, except with ap- values then updates them. We use small 3 byte keys and 4 proximately 10% higher throughput. Since this workload byte values to avoid complications caused by data transfer is composed of single-partition and simple multi-partition time. transactions, the single coordinator can speculate all of them. For the TPC-C benchmarks, we use a custom written ex- This results in concurrent execution, like locking, but with- ecution engine that executes transactions directly on data out the overhead of tracking locks. 
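Returning to the locking scheme of Section 4.3, the sketch below illustrates its two key points: single partition transactions skip locks and undo buffers entirely when no multi-partition transaction is active, and because each partition is single-threaded the lock table is an ordinary map with no latching. Read/write lock modes, lock waiting, and deadlock handling are omitted, and the names are assumptions rather than the prototype's API.

class LockingPartition:
    def __init__(self, engine):
        self.engine = engine
        self.active_multipartition = 0    # number of in-flight multi-partition transactions
        self.lock_table = {}              # key -> transaction id holding the lock

    def run_single_partition(self, txn_id, keys, fragment):
        if self.active_multipartition == 0:
            return self.engine.execute(fragment, undo=None)    # fast path: no locks, no undo
        if not self._lock_all(txn_id, keys):
            return None                   # conflict: the caller queues the transaction
        undo = []
        result = self.engine.execute(fragment, undo)
        self._unlock_all(txn_id)          # strict two-phase locking: release at commit
        return result

    def _lock_all(self, txn_id, keys):
        if any(self.lock_table.get(k, txn_id) != txn_id for k in keys):
            return False                  # some key is held by another transaction
        for k in keys:
            self.lock_table[k] = txn_id
        return True

    def _unlock_all(self, txn_id):
        for k in [k for k, owner in self.lock_table.items() if owner == txn_id]:
            del self.lock_table[k]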
Past 50%, speculation’s in memory. Each table is represented as either a B-Tree, a performance begins to drop. This is the point where the binary tree, or hash table, as appropriate. central coordinator uses 100% of the CPU and cannot han- On each benchmark we compare the three concurrency dle more messages. To scale past this point, we would need control schemes described in Section 4. to implement distributed transaction ordering, as described 5.1 Microbenchmark in Section 4.2.2. For this particular experiment, blocking is always worse Our microbenchmark implements a simple mix of single than speculation and locking. Speculation outperforms lock- partition and multi-partition transactions, in order to under- ing by up to 13%, before the central coordinator becomes a stand the impact of distributed transactions on throughput. bottleneck. With the bottleneck of the central coordinator, We create a database composed of two partitions, each of locking outperforms speculation by 45%. which resides on a separate machine. The partitions each store half the keys. Each client issues a read/write trans- action which reads and writes the value associated with 12 5.2 Conflicts keys. For this test, there is no sharing (i.e., potential conflict The performance of locking depends on conflicts between across clients): each client writes its own set of keys. We ex- transactions. When there are no conflicts, transactions exe- periment with shared data in the next section. To create a cute concurrently. However, when there are conflicts, there single partition transaction, a clients selects a partition at is additional overhead to suspend and resume execution. To 30000 30000 Locking 0% conflict Locking 20% conflict 25000 25000 Locking 60% conflict Locking 100% conflict Speculation Speculation 20000 20000 Blocking

Figure 4: Microbenchmark Without Conflicts. Throughput (transactions/second) versus the percentage of multi-partition transactions, with curves for speculation, blocking, and locking.

Figure 5: Microbenchmark With Conflicts. Throughput (transactions/second) versus the percentage of multi-partition transactions, with locking shown at 0%, 20%, 60%, and 100% conflict, plus speculation and blocking, which are unaffected by the conflict rate.
To examine the performance impact of these multi- round transactions, we changed our microbenchmark to is- 5.3 Aborts sue a multi-partition transaction that requires two rounds of Speculation assumes that transactions will commit. When communication, instead of the simple multi-partition trans- a transaction is aborted, the speculatively executed transac- action in the original benchmark. The first round of each tions must be undone and re-executed, wasting CPU time. transaction performs the reads and returns the results to To understand the effects of re-execution, we select trans- the coordinator, which then issues the writes as a second actions to be aborted at random with probability p. When round. This performs the same amount of work as the orig- a multi-partition transaction is selected, only one partition inal benchmark, but has twice as many messages. will abort locally. The other partition will be aborted dur- The results are shown in Figure 7. The blocking through- ing two-phase commit. Aborted transactions are somewhat put follows the same trend as before, only lower because the cheaper to execute than normal transactions, since the abort two round transactions take nearly twice as much time as 30000 of the transactions access a single partition, and the others Speculation 0% aborts are simple multi-partition transactions. Speculation 3% aborts 25000 Speculation 5% aborts Our implementation tries to be faithful to the specifica- Speculation 10% aborts tion, but there are three differences. First, we reorder the Blocking 10% aborts operations in the new order transaction to avoid needing an 20000 Locking 10% aborts undo buffer to handle user aborts. Second, our clients have no pause time. Instead, they generate another transaction 15000 immediately after receiving the result from the previous one. This permits us to generate a high transaction rate with a 10000 small number of warehouses. Finally, we change how clients Transactions/second generate requests. The TPC-C specification assigns clients 5000 to a specific (warehouse, district) pair. Thus, as you add more warehouses, you add more clients. We use a fixed 0 number of clients while changing the number of warehouses, 0% 20% 40% 60% 80% 100% in order to change only one variable at a time. To accom- Multi-Partition Transactions modate this, our clients generate requests for an assigned warehouse but a random district. We ran TPC-C with the warehouses divided evenly across Figure 6: Microbenchmark With Aborts two partitions. In this workload, the fraction of multi-partition transactions ranges from 5.7% with 20 ware- houses to 10.7% with 2 warehouses. The throughput for 30000 varying numbers of warehouses are shown in Figure 8. With Speculation Blocking this workload, blocking and speculation have relatively con- 25000 Locking stant performance as the number of warehouses is increased. The performance is lowest with 2 partitions because the 20000 probability of a multi-partition transaction is highest (10.7%, versus 7.2% for 4 warehouses, and 5.7% for 20 warehouses), 15000 due to the way TPC-C new order transaction requests are generated. After 4 warehouses, the performance for block- ing and speculation decrease slightly. This is due to the 10000 larger working set size and the corresponding increase in Transactions/second CPU cache and TLB misses. The performance for locking 5000 increases as the number of warehouses is increased because the number of conflicting transactions decreases. 
This is 0 because there are fewer clients per TPC-C warehouse, and 0% 20% 40% 60% 80% 100% nearly every transaction modifies the warehouse and district Multi-Partition Transactions records. This workload also has deadlocks, which leads to overhead due to deadlock detection and distributed deadlock timeouts, decreasing the performance for locking. Specu- Figure 7: General Transaction Microbenchmark lation performs the best of the three schemes because the workload’s fraction of multi-partition transactions is within the region where it is the best choice. With 20 warehouses, the multi-partition transactions in the original benchmark. speculation provides 9.7% higher throughput than blocking, Speculation performs only slightly better, since it can only and 63% higher throughput than locking. speculate the first fragment of the next multi-partition trans- action once the previous one has finished. Locking is rela- 5.6 TPC-C Multi-Partition Scaling tively unaffected by the additional round of network com- In order to examine the impact that multi-partition trans- munication. Even though locking is generally superior for actions have on a more complex workload, we scale the frac- this workload, speculation does still outperform locking as tion of TPC-C transactions that span multiple partitions. long as fewer than 4% of the workload is composed of general We execute a workload that is composed of 100% new order multi-partition transactions. transactions on 6 warehouses. We then adjust the probabil- ity that an item in the order comes from a “remote” ware- 5.5 TPC-C house, which is a multi-partition transaction. With TPC-C’s The TPC-C benchmark models the OLTP workload of default parameters, this probability is 0.01 (1%), which pro- an order processing system. It is comprised of a mix of five duces a multi-partition transaction 9.5% of the time. We transactions with different properties. The data size is scaled adjust this parameter and compute the probability that a by adding warehouses, which adds a set of related records to transaction is a multi-partition transaction. The through- the other tables. We partition the TPC-C database by ware- put with this workload is shown in Figure 9. house, as described by Stonebraker et al. [26]. We replicate The results for blocking and speculation are very similar the items table, which is read-only, to all partitions. We ver- to the results for the microbenchmark in Figure 4. In this ex- tically partition the stock table, and replicate the read-only periment, the performance for locking degrades very rapidly. columns across all partitions, leaving the columns that are At 0% multi-partition transactions, it runs efficiently with- updated in a single partition. This partitioning means 89% out acquiring locks, but with multi-partition transactions it Few Aborts Many Aborts 25000 Few Many Many Few Conflicts Conflicts Conflicts Conflicts Many multi- Locking or 20000 partition Speculation Speculation Locking Few multi- xactions Speculation round xactions Few multi- Blocking or 15000 Speculation Speculation Blocking partition Locking xactions Many multi- 10000 round Locking Locking Locking Locking xactions Transactions/second 5000 Speculation Blocking Table 1: Summary of best concurrency control Locking 0 scheme for different situations. Speculation is pre- 2 4 6 8 10 12 14 16 18 20 ferred when there are few multi-round (general) Warehouses transactions and few aborts.

for multi-partition transactions that require only a single Figure 8: TPC-C Throughput Varying Warehouses round of communication and when a low percentage of trans- actions abort. Our low overhead locking technique is best 35000 Speculation when there are many transactions with multiple rounds of Blocking communication. Table 1 shows which scheme is best, de- 30000 Locking pending on the workload; we imagine that a query executor might record statistics at runtime and use a model like that 25000 presented in Section 6 below to make the best choice. Optimistic concurrency control (OCC) is another “stan- 20000 dard” concurrency control algorithm. It requires tracking 15000 each item that is read and written, and aborts transactions during a validation phase if there were conflicts. Intuitively, 10000 we expect the performance for OCC to be similar to that Transactions/second of locking. This is because, unlike traditional locking im- 5000 plementations that need complex lock managers and careful latching to avoid problems inherent in physical concurrency, 0 our locking scheme can be much lighter-weight, since each 0% 20% 40% 60% 80% 100% partition runs single-threaded (i.e., we only have to worry Multi-Partition Transactions about the logical concurrency). Hence, our locking imple- mentation involves little more than keeping track of the read/write sets of a transaction — which OCC also must Figure 9: TPC-C 100% New Order do. Consequently, OCC’s primary advantage over locking is eliminated. We have run some initial results that verify this hypothesis, and plan to explore the trade-offs between OCC must acquire locks. The locking overhead is higher for TPC- and other concurrency control methods and our speculation C than our microbenchmark for three reasons: more locks schemes as future work. are acquired for each transaction, the lock manager is more complex, and there are many conflicts. In particular, this workload exhibits local and distributed deadlocks, hurting 6. ANALYTICAL MODEL throughput significantly. Again, this shows that conflicts To improve our understanding of the concurrency control make traditional concurrency control more expensive, in- schemes, we analyze the expected performance for the multi- creasing the benefits of simpler schemes. partition scaling experiment from Section 5.1. This model Examining the output of a sampling profiler while run- predicts the performance of the three schemes in terms of ning with a 10% multi-partition probability shows that 34% just a few parameters (which would be useful in a query of the execution time is spent in the lock implementation. planner, for example), and allows us to explore the sensitiv- Approximately 12% of the time is spent managing the lock ity to workload characteristics (such as the CPU cost per table, 14% is spent acquiring locks, and 6% is spent releasing transaction or the network latency). To simplify the analy- locks. While our locking implementation certainly has room sis, we ignore replication. for optimization, this is similar to what was previously mea- Consider a database divided into two partitions, P1 and sured for Shore, where 16% of the CPU instructions could P2. The workload consists of two transactions. The first is a be attributed to locking [14]. single partition transaction that accesses only P1 or P2, cho- sen uniformly at random. The second is a multi-partition 5.7 Summary transaction that accesses both partitions. 
There are no data Our results show that the properties of the workload de- dependencies, and therefore only a single round of commu- termine the best concurrency control mechanism. Specula- nication is required. In other words, the coordinator sim- tion performs substantially better than locking or blocking ply sends two fragments out, one to each partition, waits for the response, then sends the commit or abort decision. Each multi-partition transaction accesses half as much data t time = Nft + (N(1 − f) − 2NfN ) sp in each partition, but the total amount of data accessed is mpL hidden 2 equal to one single partition transaction. To provide the 2 best case performance for locking, none of the transactions throughput = 2ft + ((1 − f) − 2fN )t conflict. We are interested in the throughput as the frac- mpL hidden sp tion of multi-partition transactions, f, in the workload is In our specific scenario and system, tmpN > tmpC , so we increased. can simplify the equations:

6.1 Blocking „ « 1 − f tmp − 2tmpC We begin by analyzing the blocking scheme. Here, a single Nhidden = min , 2f tspS partition transaction executes for tsp seconds on one parti- 2 tion. A multi-partition transaction executes for t on both throughput = mp 2f(t − t ) + ((1 − f) − 2fN )t partitions, including the time to complete the two-phase mp mpC hidden sp commit. If there are N transactions to execute, then there The minimum function produces two different behaviors. 1 1−f tmpI are Nf multi-partition transactions and 2 N(1 − f) single If ≥ , then the idle time can be completely utilized. 2f tspS partition transactions to execute at each partition. The fac- Otherwise, the system does not have enough single partition tor of 1 arises because the single partition transactions are 2 transactions to completely hide the network stall, and the distributed evenly between the two partitions. Therefore t throughput will drop rapidly as f increases past spS . the time it takes to execute the transactions and the system 2tmpI +tspS throughput are given by the following equations: 6.2.1 Speculating Multi-Partition Transactions This model can be extended to include speculating N(1 − f) multi-partition transactions, as described in Section 4.2.2. time = Nftmp + tsp 2 The previous derivation for execution time assumes that one N 2 throughput = = multi-partition transaction completes every tmpL seconds, time 2ftmp + (1 − f)tsp which includes the network stall time. When multi-partition transactions can be speculated, this restriction is removed. Effectively, the time to execute N transactions is a Instead, we must compute the CPU time to execute multi- weighted average between the times for a pure single par- partition transactions, and speculative and non-speculative tition workload and a pure multi-partition workload. As single partition transactions. The previous model computed f increases, the throughput will decrease from 2 to 1 . tsp tmp the number of speculative single partition transactions per Since tmp > tsp, the throughput will decrease rapidly with multi-partition transaction, Nhidden. We can compute the even a small fraction of multi-partition transactions. time for multi-partition transactions and speculative sin- gle partition transactions as tperiod = tmpC + NhiddentspS . 6.2 Speculation This time replaces tmpL in the previous model, and thus the We first consider the local speculation scheme described in throughput becomes: Section 4.2.1. For speculation, we need to know the amount of time that each partition is idle during a multi-partition 2 transaction. If the CPU time consumed by a multi-partition throughput = 2ftperiod + ((1 − f) − 2fNhidden)tsp transaction at one partition is tmpC , then the network stall time is tmpN = tmp − tmpC . Since we can overlap the 6.3 Locking execution of the next multi-partition transaction with the Since the workload is composed of non-conflicting transac- stall, the limiting time when executing a pure multi-partition tions, locking has no stall time. However, we must account transaction workload is tmpL = max (tmpN , tmpC ), and the for the overhead of tracking locks. We define l to be the time that the CPU is idle is tmpI = max (tmpN , tmpC )−tmpC . fraction of additional time that a transaction takes to exe- Assume that the time to speculatively execute a single cute when locking is enabled. Since locking always requires partition transaction is tspS . 
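Collecting the model's formulas in one place, with t_{mpL} = \max(t_{mpN}, t_{mpC}) and t_{mpI} = t_{mpL} - t_{mpC} as defined above:

\[ \text{Blocking:}\quad time = N f\, t_{mp} + \frac{N(1-f)}{2}\, t_{sp}, \qquad throughput = \frac{N}{time} = \frac{2}{2 f\, t_{mp} + (1-f)\, t_{sp}} \]

\[ \text{Speculation:}\quad N_{hidden} = \min\!\left(\frac{1-f}{2f},\ \frac{t_{mpI}}{t_{spS}}\right), \qquad throughput = \frac{2}{2 f\, t_{mpL} + \bigl((1-f) - 2 f N_{hidden}\bigr)\, t_{sp}} \]

\[ \text{Speculation with } t_{mpN} > t_{mpC}:\quad N_{hidden} = \min\!\left(\frac{1-f}{2f},\ \frac{t_{mp} - 2 t_{mpC}}{t_{spS}}\right), \qquad throughput = \frac{2}{2 f\,(t_{mp} - t_{mpC}) + \bigl((1-f) - 2 f N_{hidden}\bigr)\, t_{sp}} \]

\[ \text{Speculated multi-partition transactions:}\quad t_{period} = t_{mpC} + N_{hidden}\, t_{spS}, \qquad throughput = \frac{2}{2 f\, t_{period} + \bigl((1-f) - 2 f N_{hidden}\bigr)\, t_{sp}} \]

\[ \text{Locking:}\quad time = N f\, l\, t_{mpC} + \frac{N(1-f)}{2}\, l\, t_{spS}, \qquad throughput = \frac{2}{2 f\, l\, t_{mpC} + (1-f)\, l\, t_{spS}} \]

As a quick sanity check, at f = 0 the blocking and speculation formulas both reduce to 2/t_{sp}; with the measured t_{sp} = 64 µs from Table 2 this is roughly 31,000 transactions per second across the two partitions, the same order as the measured microbenchmark throughput in Figure 4.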
During a multi-partition trans- undo buffers, we use tspS to account for the single partition action’s idle time, each partition can execute a maximum of execution time. Furthermore, the overhead of two-phase tmpI single partition transactions. When the system ex- tspS commit must be added, so for multi-partition transactions ecutes Nf multi-partition transactions, each partition exe- we use tmpC . The time to execute N transactions, and the N(1−f) cutes 2 single partition transactions. Thus, on average resulting throughput are given by: (1−f) each multi-partition transaction is separated by 2f single partition transactions. Therefore, for each multi-partition N(1 − f) time = Nflt + lt transaction, the number of single partition transactions that mpC 2 spS each partition can speculate is given by: N 2 throughput = = time 2fltmpC + (1 − f)ltspS „ « 1 − f tmpI Nhidden = min , 6.4 Experimental Validation 2f tspS We measured the model parameters for our implementa- Therefore, the time to execute N transactions and the tion. The values are shown in Table 2. Figure 10 shows resulting throughput are: the analytical model using these parameters, along with the 35000 similar to our blocking approach can offer significant per- formance gains on workloads. How- Measured Local Spec. 30000 ever, their work was done in the context of disk-based sys- Model Local Spec. tems that still involved logging, and they did not investigate 25000 Measured Locking speculation and single-threaded locking schemes as we do. Model Spec. Instead, they provide a system for the user to specify which 20000 transactions conflict. The Sprint middleware system parti- tions data across multiple commercial in-memory Locking 15000 instances [7]. It also relies on transactions being classified in advance as single-partition or multi-partition in advance. 10000 Transactions/second Measured Blocking Unlike our scheme, it writes transaction logs to disk and 5000 Model Blocking uses the traditional concurrency control provided by the in- memory databases. 0 One variant of two-phase commit that is similar to our 0% 20% 40% 60% 80% 100% work is OPT [13], where transactions are permitted to “bor- Multi-Partition Transactions ” dirty data while a transaction is in the prepared phase. While their work assumes locks, it is very similar to our “lo- cal speculation” scheme. However, OPT only speculates one Figure 10: Model Throughput transaction, while speculative concurrency control will spec- ulate many, and can overlap the two-phase commit for multi- partition transactions from the same coordinator. Reddy measured throughput for our system without replication. As and Kitsuregawa proposed speculative locking, where a trans- can be observed, the two are a relatively close match. This action processes both the “before” and “after” version for suggests that the model is a reasonable approximation for modified data items [15]. At commit, the correct execution the behavior of the real system. These results also show that is selected and applied by tracking data dependencies be- speculating multi-partition transactions leads to a substan- tween transactions. This approach assumes that there are tial improvement when they comprise a large fraction of the ample computational resources. In our environment, CPU workload. cycles are limited, and thus our speculative concurrency con- trol always acts on the “after” version, and does not track Variable Measured Description dependencies at a fine-granularity. 
7. RELATED WORK

Previous work on measuring overheads of locking, latching, multi-threading, and concurrency control in Shore [14] showed that the overhead of all of these subsystems is significant. This paper extends this previous work by showing that some form of concurrency control can be beneficial in highly contended workloads, and that in some cases even locking and multi-threading can prove to be beneficial.

The first proposal that large main memory capacity could be used to eliminate concurrency control appears to have been made by Garcia-Molina, Lipton and Valdes [11, 12]. Some early main memory databases used this technique [16]. Shasha et al. [27] presented a database execution engine with a similar design as ours. They also observe that a scheme similar to our blocking approach can offer significant performance gains on some workloads. However, their work was done in the context of disk-based systems that still involved logging, and they did not investigate speculation and single-threaded locking schemes as we do. Instead, they provide a system for the user to specify which transactions conflict. The Sprint middleware system partitions data across multiple commercial in-memory database instances [7]. It also relies on transactions being classified in advance as single-partition or multi-partition. Unlike our scheme, it writes transaction logs to disk and uses the traditional concurrency control provided by the in-memory databases.

One variant of two-phase commit that is similar to our work is OPT [13], where transactions are permitted to "borrow" dirty data while a transaction is in the prepared phase. While their work assumes locks, it is very similar to our "local speculation" scheme. However, OPT only speculates one transaction, while speculative concurrency control will speculate many, and can overlap the two-phase commit for multi-partition transactions from the same coordinator. Reddy and Kitsuregawa proposed speculative locking, where a transaction processes both the "before" and "after" version of modified data items [15]. At commit, the correct execution is selected and applied by tracking data dependencies between transactions. This approach assumes that there are ample computational resources. In our environment, CPU cycles are limited, and thus our speculative concurrency control always acts on the "after" version and does not track dependencies at a fine granularity.

Some work has noted that locking does not work well in high-contention situations [3], and has recommended optimistic concurrency control for these cases. Our observation is similar, although our preliminary results for OCC suggest that it does not help in our setting because tracking read and write sets is expensive; instead, in high-contention settings, we find partitioning with speculation to be effective.

Data partitioning is a well-studied problem in database systems (e.g., [18, 24, 8, 17, 10], amongst others). Past work has noted that partitioning can effectively increase the scalability of database systems, by parallelizing I/O [17] or by assigning each partition to separate workers in a cluster [18]. Unlike this past work, our focus is on partitioning for eliminating the need for concurrency control.

H-Store [26] presents a framework for a system that uses data partitioning and single-threaded execution to simplify concurrency control. We extend this work by proposing several schemes for concurrency control in partitioned, main-memory databases, concluding that the combination of blocking and OCC proposed in the H-Store paper is often outperformed by speculation. Most systems (e.g., [9, 6, 19]) use some form of two-phase locking for processing concurrent queries, which our results show is best for low contention workloads. Other schemes, such as timestamp ordering [5], can avoid deadlocks but still allow multiple transactions to execute concurrently, and so they require read/write sets, latching, and must support rollback.
8. CONCLUSIONS

In this paper, we studied the effects of low overhead concurrency control schemes on the performance of main memory partitioned databases. We found that speculative concurrency control, which overlaps the commit phase of an earlier transaction with the execution of later transactions, works well provided that the workload is composed of single partition transactions, the abort rate is low, and only a few transactions involve multiple rounds of communication. Otherwise, lightweight locking schemes which allow greater degrees of concurrency are preferable. On a (slightly-altered) TPC-C benchmark, speculative concurrency control was the clear winner, in some cases increasing throughput by a factor of two relative to locking. Our results are particularly relevant to systems that rely on partitioning to achieve parallelism, because they show that a workload that does not partition perfectly can be executed without the overheads of locking-based concurrency control.

9. ACKNOWLEDGMENTS

This work was sponsored by the NSF under grants IIS-0845643 and IIS-0704424, and by the Natural Sciences and Engineering Research Council of Canada. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).

10. REFERENCES

[1] TPC benchmark C. Technical report, Transaction Processing Performance Council, February 2009. Revision 5.10.1.
[2] A. Adya, R. Gruber, B. Liskov, and U. Maheshwari. Efficient optimistic concurrency control using loosely synchronized clocks. In Proc. ACM SIGMOD, 1995.
[3] R. Agrawal, M. J. Carey, and M. Livny. Concurrency control performance modeling: Alternatives and implications. ACM Trans. Database Syst., 12(4):609-654, 1987.
[4] S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In Proc. ACM SIGMOD, 2004.
[5] P. A. Bernstein and N. Goodman. Timestamp-based algorithms for concurrency control in distributed database systems. In Proc. VLDB, 1980.
[6] H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M. Smith, and P. Valduriez. Prototyping Bubba, a highly parallel database system. IEEE Trans. on Knowl. and Data Eng., 2(1):4-24, 1990.
[7] L. Camargos, F. Pedone, and M. Wieloch. Sprint: a middleware for high-performance transaction processing. SIGOPS Oper. Syst. Rev., 41(3):385-398, June 2007.
[8] S. Ceri, S. Navathe, and G. Wiederhold. Distribution design of logical database schemas. IEEE Trans. Softw. Eng., 9(4):487-504, 1983.
[9] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. I. Hsiao, and R. Rasmussen. The Gamma database machine project. IEEE Trans. on Knowl. and Data Eng., 2(1):44-62, 1990.
[10] G. Eadon, E. I. Chong, S. Shankar, A. Raghavan, J. Srinivasan, and S. Das. Supporting table partitioning by reference in Oracle. In Proc. ACM SIGMOD, 2008.
[11] H. Garcia-Molina, R. J. Lipton, and J. Valdes. A massive memory machine. IEEE Transactions on Computers, C-33(5):391-399, 1984.
[12] H. Garcia-Molina and K. Salem. Main memory database systems: An overview. IEEE Trans. on Knowl. and Data Eng., 4(6):509-516, 1992.
[13] R. Gupta, J. Haritsa, and K. Ramamritham. Revisiting commit processing in distributed database systems. In Proc. ACM SIGMOD, pages 486-497, New York, NY, USA, 1997. ACM.
[14] S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker. OLTP through the looking glass, and what we found there. In Proc. ACM SIGMOD, pages 981-992, 2008.
[15] P. Krishna Reddy and M. Kitsuregawa. Speculative locking protocols to improve performance for distributed database systems. IEEE Transactions on Knowledge and Data Engineering, 16(2):154-169, March 2004.
[16] K. Li and J. F. Naughton. Multiprocessor main memory transaction processing. In Proc. Databases in Parallel and Distributed Systems (DPDS), 1988.
[17] M. Livny, S. Khoshafian, and H. Boral. Multi-disk management algorithms. SIGMETRICS Perform. Eval. Rev., 15(1):69-77, 1987.
[18] M. Mehta and D. J. DeWitt. Data placement in shared-nothing parallel database systems. The VLDB Journal, 6(1):53-72, 1997.
[19] C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system. ACM Trans. Database Syst., 11(4):378-396, 1986.
[20] I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. In PVLDB, 2010.
[21] S. Papadomanolakis and A. Ailamaki. AutoPart: Automating schema design for large scientific databases using data partitioning. In Proc. Scientific and Statistical Database Management, 2004.
[22] D. V. Pattishall. Friendster: Scaling for 1 billion queries per day. In MySQL Users Conference, April 2005.
[23] D. V. Pattishall. Federation at Flickr (doing billions of queries per day). In MySQL Conference, April 2007.
on Knowl. and Data a cluster of processors. ACM Transactions on Eng., 2(1):4–24, 1990. Database Systems, 10(1):29–56, 1985. [7] L. Camargos, F. Pedone, and M. Wieloch. Sprint: a [25] R. Shoup and D. Pritchett. The eBay architecture. In middleware for high-performance transaction SD Forum, November 2006. processing. SIGOPS Oper. Syst. Rev., 41(3):385–398, [26] M. Stonebraker, S. Madden, D. J. Abadi, June 2007. S. Harizopoulos, N. Hachem, and P. Helland. The end [8] S. Ceri, S. Navathe, and G. Wiederhold. Distribution of an architectural era: (it’s time for a complete design of logical database schemas. IEEE Trans. rewrite). In Proc. VLDB, 2007. Softw. Eng., 9(4):487–504, 1983. [27] A. Whitney, D. Shasha, and S. Apter. High volume [9] D. J. Dewitt, S. Ghandeharizadeh, D. A. Schneider, transaction processing without concurrency control, A. Bricker, H. I. Hsiao, and R. Rasmussen. The two phase commit, SQL or C++. In Int. Workshop on gamma database machine project. IEEE Trans. on High Performance Transaction Systems, 1997. Knowl. and Data Eng., 2(1):44–62, 1990. [28] D. C. Zilio. Physical Database Design Decision [10] G. Eadon, E. I. Chong, S. Shankar, A. Raghavan, Algorithms and Concurrent Reorganization for J. Srinivasan, and S. Das. Supporting table Parallel Database Systems. PhD thesis, University of partitioning by reference in oracle. In Proc. ACM Toronto, 1998. SIGMOD, 2008.