
Handling Highly Contended OLTP Workloads Using Fast Dynamic Partitioning

Guna Prasaad (University of Washington), Alvin Cheung (University of California, Berkeley), Dan Suciu (University of Washington)

ABSTRACT
Research on transaction processing has made significant progress towards improving performance of main memory multicore OLTP systems under low contention. However, these systems struggle on workloads with lots of conflicts. Partitioned databases (and variants) perform well on high contention workloads that are statically partitionable, but time-varying workloads often make them impractical. Towards addressing this, we propose Strife, a novel transaction processing scheme that clusters transactions together dynamically and executes most of them without any concurrency control. Strife executes transactions in batches, where each batch is partitioned into disjoint clusters without any cross-cluster conflicts and a small set of residuals. The clusters are then executed in parallel with no concurrency control, followed by residuals separately executed with concurrency control. Strife uses a fast dynamic clustering algorithm that exploits a combination of random sampling and a concurrent union-find data structure to partition the batch online, before executing it. Strife outperforms lock-based and optimistic protocols by up to 2× on high contention workloads. While Strife incurs about 50% overhead relative to partitioned systems in the statically partitionable case, it performs 2× better when such static partitioning is not possible and adapts to dynamically varying workloads.

ACM Reference Format:
Guna Prasaad, Alvin Cheung, Dan Suciu. 2020. Handling Highly Contended OLTP Workloads Using Fast Dynamic Partitioning. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20), June 14-19, 2020, Portland, OR, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3318464.3389764

1 INTRODUCTION
As modern OLTP servers today ship with an increasing number of cores, users expect transaction processing performance to scale accordingly. In practice, however, achieving scalability is challenging, with the bottleneck often being the concurrency control needed to ensure a serializable execution of the transactions. Consequently, a new class of high performance concurrency control protocols [8, 18, 19, 37, 39, 44] have been developed and are shown to scale to even thousands of cores [39, 43].

Yet, the performance of these protocols degrades rapidly on workloads with high contention [1, 43]. One promising approach to execute workloads with high contention is to partition the data, and have each core execute serially the transactions that access data in one partition [1, 15, 24, 29]. The intuition is that, if many transactions access a common data item, then they should be executed serially by the core hosting that data item without any need for synchronization. In these systems, the database is partitioned statically using either user input, application semantics, or workload-driven techniques [6, 25, 26, 28]. Once the data is partitioned, the transactions are routed to the appropriate core where they are executed serially. For workloads where transactions tend to access data from a single partition [34], static data partitioning systems are known to scale up very well [15].

However, cross-partition transactions (i.e., those that access data from two or more partitions) do require synchronization among cores and can significantly degrade the performance of the system. Unfortunately, in some applications, a large fraction of the transactions are cross-partition as no good static partitioning exists, or the hot data items change dynamically over time. As an extreme example, in the 4chan social network, a post lives for only about 20-30 minutes, and its popularity can vary within seconds [2]. The various static partitioning schemes differ in how they synchronize cross-partition transactions: H-Store [13] acquires partition-level locks, Dora [1, 24] migrates transactions between cores, while Orthrus [29] delegates concurrency control to a dedicated set of threads. If the database is statically partitionable, then these approaches can mitigate a small fraction of cross-partition transactions; however, their performance degrades rapidly when this fraction increases, or when the workload is not statically partitionable, as discussed in [1].


In this paper, we propose Strife, a new batch-based OLTP system that partitions transactions dynamically. Strife groups transactions into batches, partitions them based on the data they access, then executes transactions in each partition serially on an exclusive core. Our key insight is to recognize that most OLTP workloads, even those that are not amenable to static partitioning, can be "almost" perfectly partitioned at a batch granularity. More precisely, Strife divides the batch into two parts. The first consists of transactions that can be grouped into disjoint clusters, such that no conflicts exist between transactions in different clusters, and the second consists of residual transactions, which include the remaining ones. Strife executes the batch in two phases: in the first phase it executes each cluster on a dedicated core, without any synchronization, and in the second phase it executes the residual transactions using some fine-grained serializable concurrency protocol.

As a result, Strife does not require a static partition of the database a priori; it discovers the partitions dynamically at runtime. If the workload is statically partitionable, Strife will rediscover these partitions, setting aside the cross-partition transactions as residuals. When the workload is not statically partitionable, Strife will discover a good way to partition the workload at runtime, based on hot data items. It easily adapts to time-varying workloads, since it computes the optimal partition for every batch.

The major challenge in Strife is the clustering algorithm, and this forms the main technical contribution of this paper. Optimal graph clustering is NP-hard in general, which rules out an exact solution. Several polynomial time approximation algorithms exist [9, 16, 21, 23], but their performance is still too low for our needs. To appreciate the challenge, consider a statically partitioned system that happens to execute a perfectly partitionable balanced workload. Its runtime is linear in the total number of data accesses and scales with the number of cores. To be competitive, the clustering algorithm must run in linear time, with a small constant, and must be perfectly parallelizable across cores. Our novel clustering algorithm satisfies both criteria: it performs clustering using only three passes over data items and exploits intra-batch parallelism (Sec. 3). To achieve this, we also extend the union-find data structure to support concurrent execution (Sec. 4).

We implemented a prototype of Strife and conducted an extensive empirical evaluation of the entire algorithm (Sec. 5). We start by evaluating Strife's clustering algorithm and show that, in addition to having high performance, it also finds high quality clusters on three different benchmarks. Next, we compare Strife with general purpose concurrency control techniques, and find that its throughput is at least 2× that of the nearest competitor. A more interesting comparison is with protocols designed specifically for high contention workloads. We find that, in the ideal case when the workload is statically partitionable and there are very few cross-partition transactions, Strife achieves about half of the throughput of a statically partitioned system (due to clustering overhead). However, when the workload is not statically partitioned, or there are many cross-partition transactions, then it can improve throughput by 2× as compared to other systems. We conclude by reporting a number of micro-benchmarks to evaluate the quality of the clustering algorithm, and the impact of its parameters.

While many automated static partitioning techniques exist in the literature (surveyed in Sec. 6), to the best of our knowledge the only system that considers dynamic partitioning is LADS [42]. Like Strife, LADS uses dynamic graph-based partitioning for a batch of transactions. Unlike Strife, the partitioning algorithm in LADS depends on range-based partitioning. In addition, it does not eliminate the need for concurrency control; instead it breaks a transaction into record actions, and migrates different pieces to different cores, to reduce the number of locks required. This creates new challenges, such as cascading aborts, and it is unclear how much overhead it introduces. In contrast, Strife is not limited to range partitions, is free from cascading aborts, and completely eliminates concurrency control when possible.

In summary, we make the following contributions:
• We describe Strife, an OLTP system that relies on dynamic clustering of transactions, aiming to eliminate concurrency control for most transactions in a batch. (Sec. 2).
• We describe a novel clustering algorithm that produces high quality clusters using only three scans over the set of transactions in the batch. (Sec. 3).
• We then present a concurrent implementation of the union-find data structure that enables parallelizing the clustering of transactions. (Sec. 4).
• We conduct an extensive experimental evaluation, comparing our system with both general-purpose concurrency control systems, and with systems designed for high contention, based on a static partition of the database. (Sec. 5).

2 OVERVIEW
Strife executes transactions in batches, where it collects transactions for a fixed duration and executes them together. For each batch, Strife analyzes the conflicts among transactions based on their read-write sets¹ and partitions them with the goal of minimizing concurrency control. Strife executes batches in three phases: analysis, conflict-free, and residual. These phases are executed synchronously on all cores as shown in Fig. 1. During the analysis phase, the batch is partitioned into a number of conflict-free clusters and some residuals.

¹ Read-write sets can be obtained either through static analysis of the transaction code or via a reconnaissance query and conditional execution as in deterministic databases [36].
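For concreteness, the only inputs the analysis phase needs per transaction are an identifier and the sets of data items it reads and writes. The following C++ fragment is a minimal sketch of such a representation together with the pairwise conflict test used throughout this section; the type and function names (TxnAccess, Conflicts) are illustrative assumptions, not Strife's actual interfaces.

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Illustrative only: a transaction as seen by the analysis phase.
struct TxnAccess {
    uint64_t txn_id;
    std::vector<uint64_t> reads;   // data items read (kept sorted)
    std::vector<uint64_t> writes;  // data items written (kept sorted)
};

// Two transactions conflict if they access a common data item and at
// least one of the two accesses is a write.
bool Conflicts(const TxnAccess& a, const TxnAccess& b) {
    auto intersects = [](const std::vector<uint64_t>& x,
                         const std::vector<uint64_t>& y) {
        std::vector<uint64_t> common;
        std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                              std::back_inserter(common));
        return !common.empty();
    };
    return intersects(a.writes, b.writes) ||
           intersects(a.writes, b.reads) ||
           intersects(a.reads, b.writes);
}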


[Fig. 1. Strife execution scheme: each batch passes through the analysis, conflict-free, and residual phases; conflict-free clusters C1-C4 execute in parallel, one batch at a time.]

[Fig. 2. Access Graph of a batch of transactions: transactions T1-T7 (set B) connect to the data items (set D) they access; T1 accesses {w1, a, b}, T2 {w1, c, d}, T3 {w1, a, c, d}, T4 {w2, e, f}, T5 {w2, f, h}, T6 {w2, g, h}, and T7 {w2, c, f, g}.]

Transactions in a conflict-free cluster are executed serially with no concurrency control, with several clusters executed in parallel on different cores in the conflict-free phase. After all conflict-free transactions are executed, residuals are then executed in a separate residual phase across all cores using a conventional concurrency control protocol. Once the residual transactions are executed, Strife repeats the same process for the next batch.

The goal of the analysis phase is to partition the batch into clusters such that transactions from two different clusters are conflict-free. To do so, Strife first represents a batch of transactions using its data access graph. A data access graph A = (B ∪ D, E) is an undirected bipartite graph, where B is the set of transactions, D is the set of data items (e.g., tuples or tables) accessed by transactions in B, and the edges E contain all pairs (T, d) where transaction T accesses data item d. Transactions T, T′ are in conflict if they access a common data item with at least one of them updating it.

Fig. 2 depicts the access graph of an example batch (colors explained later) derived from the TPC-C [5] benchmark. Each transaction updates a warehouse (w1 or w2) and a set of stocks (a-h). Transaction T1, for example, accesses warehouse w1 and stocks a and b. Transactions T1, T2 are in conflict because they both access w1; whereas T1, T5 are not. A batch can be partitioned into many conflict-free clusters when transactions access disjoint sets of data items. Transactions T1-T3 and T4-T6 (Fig. 2) access different data items ({w1, a-d} and {w2, e-h} respectively). Hence, the batch T1-T6 can be partitioned into 2 clusters {T1, T2, T3} and {T4, T5, T6} with no cross-cluster conflicts.

However, real workloads often contain outliers that access data items from multiple clusters (such as the red transaction T7 in Fig. 2). A conflict-free cluster for the batch T1-T7 would thus contain all transactions. To avoid this, we instead remove T7 as a residual and create conflict-free clusters only with respect to the remaining batch. This results in 2 conflict-free clusters {T1, T2, T3}, {T4, T5, T6} and a residual T7.

Formally, a clustering in Strife is an (n + 1)-partition of a batch of transactions B into {C1, C2, ..., Cn, R} such that, for any i ≠ j and transactions T ∈ Ci, T′ ∈ Cj, T and T′ do not conflict. Notice that there is no such requirement on R. Ideally, we want to maximize n, the number of conflict-free clusters, to maximize parallelism, while minimizing the size of R. To elucidate the trade-offs, we describe two naive clusterings. The first is fallback clustering, where all transactions are in R and n = 0; this corresponds to running the entire batch using conventional concurrency control. The second is sequential clustering, where we place all transactions in C1, and set n = 1 and R = ∅; this corresponds to executing all transactions sequentially, on a single core. Obviously, neither of these are good clusterings. Hence, we want n to be at least as large as the number of cores, and constrain R to be no larger than a small fraction, α, of the batch.

In practice, a good clustering exists for most workloads, except in the extreme cases. In one extreme, when all transactions access a single highly contended data item, then no good clustering exists besides fallback and sequential, and Strife simply resorts to fallback clustering. Once data contention decreases, i.e., the number of contentious data items increases, a good clustering would try to place each hot item in a different conflict-free cluster. When contention further decreases such that all transactions access different data items, then any n-clustering of roughly equal sizes is adequate. Thus, we expect good clusterings to exist in all but the most extreme workloads. The challenge, however, is to find a good clustering efficiently.

After partitioning, conflict-free clusters are placed in a shared worklist. Multiple cores then obtain a cluster from the worklist, and execute all transactions in it serially one after another in the conflict-free phase. Once done with a cluster, a core obtains another from the worklist until done. The degree of parallelism in this phase is determined by the number and size of conflict-free clusters. More clusters result in parallel execution, thus reducing the total time to execute them. Once the worklist is exhausted, a core waits for other cores to finish processing their clusters before moving to the residual phase. This is because the residuals may conflict with the conflict-free transactions that are being executed concurrently, and hence require data synchronization. A skew in cluster sizes may reduce concurrency since threads that complete early cannot advance to the next phase.
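The batch lifecycle described above can be summarized in code. The sketch below is a minimal, self-contained C++ outline of the three phases driven over a shared worklist; the names (Txn, Cluster, Worklist, ExecuteBatch) are assumptions, the worklist is a simple mutex-protected queue standing in for Strife's low-overhead MPMC queue, and the residual phase is shown as a serial placeholder rather than the no-wait two-phase locking protocol the system actually uses.

#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative transaction: the real system carries read/write sets,
// parameters, and metadata; here only the executable body matters.
struct Txn { std::function<void()> body; };
using Cluster = std::vector<Txn>;

// Minimal thread-safe worklist shared by all worker threads.
template <typename T>
class Worklist {
 public:
  void Push(T item) {
    std::lock_guard<std::mutex> g(m_);
    q_.push(std::move(item));
  }
  bool TryPop(T& out) {
    std::lock_guard<std::mutex> g(m_);
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }
 private:
  std::mutex m_;
  std::queue<T> q_;
};

// One batch: conflict-free clusters run in parallel with no concurrency
// control; residuals run afterwards (serially here, standing in for 2PL).
void ExecuteBatch(Worklist<Cluster>& worklist,
                  std::vector<Txn>& residuals, int workers) {
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i) {
    pool.emplace_back([&worklist] {
      Cluster c;
      while (worklist.TryPop(c)) {
        for (Txn& t : c) t.body();   // serial execution, no locks
      }
    });
  }
  for (auto& th : pool) th.join();   // barrier before the residual phase

  // Residual phase placeholder: Strife uses no-wait 2PL with retries here.
  for (Txn& t : residuals) t.body();
}

Because all transactions of a data cluster end up in the same queue, executing each queue serially preserves serializability for the conflict-free phase without any per-record locks.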


Algorithm 1: Strife Clustering Algorithm
Input: List⟨Txn⟩ batch
Output: Queue⟨Queue⟨Txn⟩⟩ worklist, Queue⟨Txn⟩ residuals

 1  set special;
 2  int count[k][k] = {0};
    // Spot Step (initially each data node is a cluster)
 3  i = 0;
 4  repeat
 5      Pick a random transaction T from batch;
 6      set C = { Find(r) | r ∈ T.nbrs };
 7      set S = { c | c ∈ C and c is special };
 8      if |S| == 0 then
 9          c = C.first;
10          foreach other ∈ C do
11              Union(c, other);
12          c.id = i;
13          c.count++;
14          c.is_special = true;
15          special.insert(c);
16          i++;
17  until k times;
    // Fuse Step
18  foreach T ∈ batch do
19      set C = { Find(r) | r ∈ T.nbrs };
20      set S = { c | c ∈ C and c is special };
21      if |S| ≤ 1 then
22          c = (|S| == 0) ? C.first : S.first;
23          foreach other ∈ C do
24              Union(c, other);
25          c.count++;
26      else
27          foreach (c1, c2) ∈ S × S do
28              count[c1.id][c2.id]++;
    // Merge Step
29  foreach (c1, c2) ∈ special × special do
30      n1 = count[c1.id][c2.id];
31      n2 = c1.count + c2.count + n1;
32      if n1 ≥ α × n2 then
33          Union(c1, c2);
    // Allocate Step
34  foreach T ∈ batch do
35      C = { Find(r) | r ∈ T.nbrs };
36      if |C| = 1 then
37          c = C.first;
38          if c.queue = ∅ then
39              c.queue = new Queue();
40              worklist.Push(c.queue);
41          c.queue.Push(T);
42      else
43          residuals.Push(T);
44  return (worklist, residuals);

As such, optimal allocation of conflict-free clusters to cores can be modeled as a 1-D bin-packing problem. However, solving this requires estimating the cost of executing a transaction, and the quality of the solution critically depends on how good the estimate is. We instead rely on work-sharing parallelism powered by a low-overhead concurrent queue to balance load across cores dynamically.

After the conflict-free clusters are processed, transactions in the residual cluster are executed in the residual phase. As discussed earlier, concurrency control is needed to execute them concurrently across cores. Strife uses two-phase locking with a no-wait deadlock avoidance policy as it is shown to be highly scalable [43]. Under this policy, a transaction aborts immediately when it fails to acquire a lock. Strife retries transactions aborted this way until successful commit or logical abort. Finally, after the residuals are executed, Strife processes the next batch. We next describe the clustering algorithm in Strife.

3 STRIFE CLUSTERING ALGORITHM
A batch of transactions in Strife, B, is partitioned into conflict-free clusters C1, C2, ..., Cn, and a set of residuals, R, of size at most α|B|. The inputs to the algorithm are the batch of transactions (denoted B) along with their read-write sets, a number k, typically a small factor times the core count, and the residual parameter α, a fraction between 0 and 1.

Our clustering problem can be viewed as partitioning the data access graph A = (B ∪ D, E) (refer to Fig. 2) with dual objectives: maximize the number of conflict-free clusters, and reduce the size of residuals. Graph partitioning in general is NP-hard, and several PTIME approximations have been proposed in the literature [9, 10, 21, 23, 27]. These solutions, however, do not meet the latency budget of our use case. For instance, our preliminary study revealed that partitioning a batch from the TPC-C benchmark using METIS [21], a state-of-the-art graph partitioning library, is an order of magnitude slower than executing it directly with a concurrency protocol. Hence, our key challenge is to devise a clustering algorithm that touches each node in the graph as few times as possible. In other words, we seek an approximation algorithm that runs in linear time, with a very small constant, and can be parallelized to incur the least runtime overhead. We next describe our algorithm (Sec. 3.1), illustrate it using an example (Sec. 3.2), and finally analyze its runtime (Sec. 3.3); the pseudo-code is listed in Alg. 1.


3.1 Description
Partitioning a batch of transactions in Strife is akin to partitioning the data nodes in A into data clusters. Once that is done, we can then allocate each transaction to a conflict-free cluster or the residuals depending on how its connected data nodes are clustered. Hence, our algorithm will cluster the data nodes first, then allocate transactions to an appropriate cluster or the residuals. The algorithm runs in 5 steps: (1) prepare, (2) spot, (3) fuse, (4) merge, and (5) allocate. Initially each data node in the graph is a singleton cluster. The algorithm repeatedly merges clusters connected to a transaction, so that data nodes accessed by that transaction belong to the same data cluster. It uses different heuristics to prevent merging all data nodes into a single large data cluster. We describe each step in detail below.

3.1.1 Prepare Step. First, Strife creates the data access graph from the read-write sets of transactions in the batch. Strife does not explicitly create a graph data structure in memory; instead it uses the read-write sets along with some metadata to avoid overheads. We consider only data nodes that are updated by at least one transaction in the batch, since there is no need to coordinate concurrent access to read-only data items. For instance, the Items table in the TPC-C benchmark is only read and never updated; as a result Items is not part of our access graph. We do this by scanning and labeling records that are updated as active and ignoring all inactive records in subsequent steps (not shown in Alg. 1 for brevity).

3.1.2 Spot Step. The spot step (Lines 3-17) identifies hot data items in the batch so that we can avoid placing several of them in the same data cluster. Initially, all data nodes in the access graph are singleton data clusters. We say a transaction is connected to a data cluster c if it accesses any data node in c. In this step, we create a small number of clusters that are all labeled "special" using a sample-reject scheme. We begin by picking a transaction T from B uniformly at random. We merge the data clusters connected to T and label the resulting cluster "special."

We continue by repeatedly picking another transaction T′ at random; if T′ is connected to a special cluster, then we reject it, otherwise we merge all its data nodes into a new "special" cluster. We continue this for k trials, creating k or fewer special clusters. Recall that k is an input parameter to the algorithm, and is typically chosen to be a small factor times the number of cores. For instance, in our experiments we use k = 100 for 30 cores.

With high probability, our scheme will find the "hot" data items (i.e., items accessed by many transactions) and create a special cluster for each of them. To understand the intuition, suppose d is a hot data item accessed by half of the transactions. Then, with high probability, one of the randomly chosen transactions will access d, and spot will create a special cluster for d; if another sampled transaction also accesses d, it will be rejected. Suppose now that there are n "hot" data items, d1, d2, ..., dn, and each is accessed by approximately |B|/n transactions. In this case, with high probability, spot will create a special cluster for each di, because the probability that none of the k sampled transactions accesses a data item di is (1 − 1/n)^k ≈ 1/e^{k/n}, which is very low when k ≫ n; for example, if k ≳ 3n, then 1/e^{k/n} ≤ 1/e^3 ≈ 0.05. This argument extends to the case of pairs of data items d1, d2 that have direct affinity, meaning that a large number of transactions access both; then, with high probability, some special cluster will include both.
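As a concrete check of this bound, consider the paper's own sample budget of k = 100 and, purely for illustration, n = 30 hot items (matching a 30-warehouse TPC-C configuration); the arithmetic below is an added example, not a figure reported in the paper.

    Pr[item di is missed by all k samples] = (1 − 1/n)^k ≈ 1/e^{k/n}
    with n = 30, k = 100:  1/e^{100/30} = e^{−3.33} ≈ 0.036   (exactly, (1 − 1/30)^100 ≈ 0.034)

Each of the 30 hot items is therefore spotted with probability above 96%, and by linearity of expectation only about one hot item per batch is expected to be missed.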
3.1.3 Fuse Step. The spot step processed only a small number of transactions (up to k); the fuse step (Lines 18-28) processes all remaining transactions in the batch. For each transaction, we examine the data clusters connected to it. If at most one of them is labeled "special," we merge all of them together. To prevent creating a single cluster with all data items, fuse never merges two special clusters: if a transaction accesses two or more special clusters, then fuse skips it, and instead records the count of such transactions for each special cluster pair (Line 28). The number of special clusters remains the same throughout the fuse step, but each special cluster may have grown from its initial size in the spot step, and new non-special clusters may be created as well. The merge of two clusters is "special" if and only if at least one of them was special.

3.1.4 Merge Step. So far the special clusters created in spot have never been merged. In this step, we revisit this invariant and merge some of them to reduce the number of residuals. If two hot data items d1, d2 are accessed by many transactions, then the special clusters they belong to are good candidates for merging; we say that d1, d2 have direct affinity. Special clusters may also have to be merged due to indirect affinity, as illustrated by the following example. Suppose there are two hot items, d1, d2, each in a separate special cluster, and a third item d3, such that a large number of transactions access both d1, d3 and another large number access d2, d3; then we say that d1, d2 have high indirect affinity. Depending on the order of processing, fuse will include d3 either in d1's special cluster or in d2's special cluster. In the first case, all transactions that access d2, d3 are skipped as residuals; in the second case the d1, d3 transactions are skipped. The merge step examines the benefit of merging the two special clusters of d1 and d2, thus allowing all these transactions to be executed in the conflict-free phase.

In such scenarios, we relax the criterion from the fuse step and merge special clusters. We use a new heuristic to prevent merging all data items into a single huge cluster in the merge step (Lines 29-33). If two special clusters ci and cj are merged, transactions connected to ci and cj can now be classified into a conflict-free cluster. We merge them only when the residuals exceed the bound specified by parameter α, i.e., |R| ≥ α|B|.


Recall that α is one of the input parameters of the algorithm: it chooses between executing transactions on more cores with concurrency control (if α is small) versus on fewer cores without concurrency control. Empirically, we found α = 0.2 to give the best results in our experiments.

Let count[ci][cj] denote the number of transactions in B that access data items in ci and cj. Note that the transactions that are accounted for in count[ci][cj] can access data items from clusters other than ci and cj as well. If the two clusters ci and cj are not merged, then all of count[ci][cj] will be residuals. So, we merge a cluster pair ci, cj when:

    count[ci][cj] ≥ α × (|ci| + |cj| + count[ci][cj])

Since |R| ≤ Σ_{i,j} count[ci][cj], this merge scheme results in residuals fewer than α|B| when most transactions access at most two special clusters.

3.1.5 Allocate Step. In the allocate step (Lines 34-43), we examine the data clusters connected to each transaction and allot it to a conflict-free cluster or the residuals. If the number of clusters connected to a transaction T, denoted |C|, is more than one, then T is added to the residuals since it has conflicting data accesses with more than one data cluster. If |C| = 1, all data items accessed by T belong to one data cluster and hence T does not conflict with transactions not connected to that data cluster. So, T can be assigned to a conflict-free cluster. We first check if the data cluster already has a designated queue (Line 38). If so, we add T to that queue. Else, we create a new conflict-free queue and add T to it. This queue is added to the shared worklist. This way all transactions connected to a data cluster are added to the same conflict-free queue, thus ensuring a serializable execution.

3.2 Example
We illustrate in Fig. 3 how Strife clusters an example batch of new-order transactions from the TPC-C benchmark. We consider only stock and warehouse accesses for brevity. As shown in Fig. 3a, each transaction (mid column) updates a warehouse (right column) and a few stock tuples (left column). Each stock item is listed in the catalog of one warehouse, as labeled on the left of Fig. 3a.

At the start, each warehouse and stock item is a singleton data cluster. spot randomly samples one transaction, say the first from the top, and merges its data items into one special cluster: they are shown in orange. If the next transaction sampled accesses any of the orange items, it is ignored, otherwise its data items form a new special cluster. After several trials, Strife creates four special clusters as shown in Fig. 3b: orange, green, red, blue. All black data items continue to be singleton clusters.

Next, fuse processes all transactions, top to bottom. fuse merges each transaction's data items into one single cluster, unless doing so would merge two special clusters, in which case the transaction is skipped. For instance, the transaction accessing both a green and a red data item is skipped. In our example, after the fuse step, all data items will be included in one of the four special clusters as shown in Fig. 3c; in general, this is not the case; instead some new clusters may be formed that are not special, or some data items may be left in their singleton cluster (also non-special). Notice that most of the catalog stock items of w2 and w3 are also accessed by transactions of the other warehouse and, since we process transactions from top to bottom, most of them will be allotted to the green cluster (of w2).

At this stage, most of the w1, w2 and w4 transactions are connected to a single data cluster, while several w3 transactions access data from both the green and red clusters (notice that w3 itself is red). We say that the w2 and w3 warehouse tuples have an indirect affinity via their stock items. These two special clusters are candidates for merging, because many transactions access both red and green items. More precisely, merge checks that the number of transactions accessing both clusters is more than an α fraction of those accessing either of them, and then merges them (refer to Fig. 3d). This reduces the number of transactions in the residual cluster. Note that the clusters of w1 and w2 (orange and green) are not merged, because too few transactions access both (just two).

Finally, allocate iterates over all transactions, and for each, checks if it can be included in a conflict-free cluster or the residuals. As shown in Fig. 3d, only three transactions are residual, two accessing orange+red, the other accessing red+blue; the other transactions are included in one of the three conflict-free clusters: orange, red, or blue.

3.3 Runtime Analysis
We use the union-find [33] data structure to maintain data clusters in Strife. The amortized cost of a union-find operation is O(α(N)) ≈ O(1), where N is the number of data items and α the inverse Ackermann function; thus we assume that every union/find operation takes time O(1). The prepare, fuse and allocate steps iterate over the entire batch of transactions B, but do this in parallel using n cores; thus, their total runtime is O(|B|/n), where the constant under O(···) is small (about 3). spot and merge are executed sequentially, and their runtime is as follows: spot takes time O(k), since it samples using k trials, while merge takes time O(k²), since it examines up to k² pairs of special clusters. The total runtime of the clustering algorithm is O(|B|/n + k²). For comparison, if the database can be perfectly partitioned statically, and there are no cross-partition transactions, then a system using static partitioning can execute the entire batch in time O(|B|/n).
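The analysis above treats Find and Union as constant-time black boxes. For reference, a minimal sequential union-find with path compression and union by size is sketched below in C++; it only illustrates the primitive being assumed and is not the concurrent variant that Strife actually builds (Sec. 4).

#include <cstddef>
#include <utility>
#include <vector>

// Minimal sequential disjoint-set over data-item ids. Path compression
// plus union by size give near-constant amortized cost per operation.
class UnionFind {
 public:
  explicit UnionFind(std::size_t n) : parent_(n), size_(n, 1) {
    for (std::size_t i = 0; i < n; ++i) parent_[i] = i;
  }

  // Returns the representative (cluster id) of x, compressing the path.
  std::size_t Find(std::size_t x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }

  // Merges the clusters containing a and b; the smaller joins the larger.
  void Union(std::size_t a, std::size_t b) {
    a = Find(a);
    b = Find(b);
    if (a == b) return;
    if (size_[a] < size_[b]) std::swap(a, b);
    parent_[b] = a;
    size_[a] += size_[b];
  }

 private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
};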
4 PARALLEL UNION-FIND
Strife heavily exploits the union-find data structure to partition the batch efficiently. The analysis is done in parallel on multiple cores to reduce end-to-end transaction latency. Specifically, the spot and merge steps are executed in a



5 EVALUATION
In this section, we evaluate Strife by comparing it to other systems and on micro-benchmarks. Specifically:
§5.2 First, we analyze how well Strife's clustering algorithm works on OLTP workloads.
§5.3 Then, we compare Strife with several concurrency control (CC) protocols using the shared-everything architecture.
§5.4 We next evaluate Strife against state-of-the-art CCs for handling high contention workloads: MOCC [39], Dora [24], Orthrus [29] and PartCC [13].
§5.5 We then evaluate Strife's adaptability when the hot data items dynamically vary over time.
§5.6 Finally, we study the impact of the residual parameter α, batch size, and contention using microbenchmarks.

5.1 Setup, Implementation and Workloads
Experimental Setup. We run all our experiments on a multi-socket Intel Xeon CPU E7-4890 v2 @ 2.80GHz machine with 2TB memory and two sockets, each having 15 physical cores. We implemented Strife and our baselines in C++. We use a default batch size of 10K transactions.

System Implementation. We have implemented a prototype of Strife that clusters transactions using the algorithm along with the optimizations described in Sec. 3. Strife, at its core, is a multi-threaded transaction manager that interacts with the storage layer of the database using the standard get-put interface. A table is implemented as a collection of in-memory pages that contain sequentially organized records. Some tables are indexed using a hashmap that maps primary keys to records. We implement the index using the libcuckoo [22] library, a thread-safe fast concurrent hashmap. Records contain meta-data for clustering and concurrency control. We chose this design to avoid overheads due to multiple hash look-ups and focus primarily on concurrency control.

The analysis phase produces several conflict-free (CF) queues and a residual queue. All CF queues are stored in a multi-producer multi-consumer (MPMC) concurrent queue, called the worklist. The worklist is shared among all threads. In the conflict-free phase, threads obtain a queue from the worklist and execute the transactions in it serially one after another without any concurrency control. Once a thread is done with its queue, it obtains another from the worklist. When the worklist is empty, threads spin-wait for other threads to complete.

Once all threads are done with conflict-free queues, they enter the residual phase. The residuals are stored in a shared MPMC queue. Threads dequeue and execute them using a 2-phase locking concurrency protocol with a no-wait deadlock avoidance policy (i.e., it immediately aborts the transaction if it fails to grab a lock). This phase is similar to traditional concurrency control based OLTP systems.

Strife moves to the analysis phase of the next batch once the residual phase is complete. Threads could start analyzing the next batch while the residual phase of the previous batch is in progress. However, we did not implement this optimization to simplify the analysis and interpretability of results.
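The no-wait policy used in the residual phase can be realized with a lock word co-located with each record, as described above. The following self-contained C++ sketch shows one possible encoding of such a per-record no-wait reader/writer lock and the retry-until-commit loop; the class and function names are illustrative assumptions, not Strife's actual code.

#include <atomic>
#include <cstdint>
#include <functional>

// Lock word stored in a record's header: 0 = free, WRITER = exclusive,
// any other value = number of concurrent shared readers.
class NoWaitLock {
 public:
  bool TryLockShared() {
    uint32_t v = word_.load(std::memory_order_acquire);
    while (v != WRITER) {
      if (word_.compare_exchange_weak(v, v + 1, std::memory_order_acq_rel))
        return true;             // acquired one shared slot
    }
    return false;                // a writer holds it: abort, never wait
  }
  bool TryLockExclusive() {
    uint32_t expected = 0;
    return word_.compare_exchange_strong(expected, WRITER,
                                         std::memory_order_acq_rel);
  }
  void UnlockShared()    { word_.fetch_sub(1, std::memory_order_release); }
  void UnlockExclusive() { word_.store(0, std::memory_order_release); }

 private:
  static constexpr uint32_t WRITER = 0xFFFFFFFFu;
  std::atomic<uint32_t> word_{0};
};

// Residual execution: a transaction that fails any TryLock releases what
// it already holds, returns false, and is retried until it commits (or
// aborts for a logical reason handled inside try_once).
void RunResidual(const std::function<bool()>& try_once) {
  while (!try_once()) {
    // no-wait abort: retry immediately
  }
}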
Workloads. We use the following benchmarks:
TPC-C [5]: We use a subset of full TPC-C consisting of a 50/50 mixture of new-order and payment transactions, with all payments accessing the customer through primary key lookups. These two transactions make up the vast majority of the benchmark and have been used previously [29] in evaluating high contention workloads to stress concurrency control. Refer to Sec. 5.6.4 for payment transactions that access customer records through last name. In the TPC-C benchmark, 15% of payment transactions are remote and 1% of items are ordered from remote warehouses.
YCSB [4]: For YCSB, each transaction reads/updates 20 records from a single table with 20M keys and a payload size of 128 bytes. We use a workload that is statically partitionable and hence is the best case scenario for such protocols. Transactions are such that intra-partition key access is determined by a Zipfian distribution controlled by the θ parameter. Unless specified otherwise, we use θ = 0.99 to generate a highly contended workload.
HOT: HOT is a microbenchmark on a single table database with 50M records, each having an 8-byte key and twenty 8-byte integer fields. Each transaction updates 10 records from the table. To add contention, we classify some records (default: 100) as hot and the remaining as cold. Every transaction updates one hot record and 9 cold records. We control partitionability of the workload by limiting remote partition accesses to at most 3. By default, hot records are evenly distributed across all partitions.

5.2 Clustering Algorithm
Fig. 5 summarizes the clustering solutions produced by Strife on a batch of 10K transactions, α = 0.2 and different benchmark parameters. Spot Clusters refers to the number of clusters identified in the spot step, CF Clusters denotes the number of conflict-free clusters finally produced, and Residuals denotes the size of the residual cluster.

We present TPC-C results for different numbers of warehouses (denoted w), specifically 4, 15 and 30, in Fig. 5. The spot step correctly identifies 4, 15 and 30 special clusters, one corresponding to each warehouse. TPC-C has around 25% cross-warehouse transactions and hence some of them are classified as residuals.


Fig. 5. Clustering: TPC-C (Varying #Warehouses), YCSB (Varying θ), HOT (Varying #Hot Records)

Description / Workload    TPC-C            YCSB (30 Partitions)           HOT
  Parameter               4    15   30     0.1  0.5  0.8  0.99  1.2       10   50   100  200
(a) Spot Clusters         4    15   30     100  98   78   48    30        10   43   63   84
(b) CF Clusters           4    15   30     100  98   78   30    30        10   43   63   84
(d) Residuals             636  191  94     0    0    298  0     0         350  371  330  250
The same number of CF clusters are produced since put doubles from 2 to 4 cores since conflict-free phase none of them are merged. As we will see in Sec. 5.3, differ- is 2× faster. Beyond 4 cores, improvement in throughput is ent sizes for CF clusters does not impact system throughput only due to increased parallelism in analysis and residual heavily since Strife uses a shared worklist and dynamically phases and conflict-free phase remains the same. With 30 balances load across cores. In summary, the experiments warehouses, Strife increases parallelism in all three phases show that Strife identifies hot records and clusters conflict- to scale almost linearly. Strife throughput on YCSB (30 parti- ing transactions together while distributing hot records to tions) also follows a similar trend. Since YCSB is partitionable as many clusters as possible. without any cross-partition transactions, no residuals are created. Throughput increase from 15 to 25 cores is gradual 5.3 CC protocols in both TPC-C (30 warehouses) and YCSB since the core We now compare Strife against 4 classes of CC protocols: count is insufficient to execute all 30 CF clusters in paral- LockSorted sorts the read-write set and acquires locks in lel. However, with 30 cores Strife exploits all parallelism an ordered fashion before executing the transaction. yielding almost a 2× improvement over 15 cores. DlDetect is a 2-Phase-Locking (2PL) variant which main- A batch in HOT benchmark is clustered into approxi- tains a waits-for graph among transactions and uses it to mately 60 clusters, each containing at least one hot record. detect and eliminate deadlocks by abortion. To avoid scala- conflict-free phase exploits maximum parallelism, thus bility bottlenecks, each thread maintains a local partition of scaling linearly as we increase number of cores since Strife



and releasing locks, so its performance is relatively stable compared to Dora even with partition-induced skews. When the distribution is very skewed, even the CC threads in Orthrus become a bottleneck and performance drops. MOCC is able to detect changing hot items dynamically and is not affected by partition-induced skews since it is oblivious to partitions. Strife spots hot records in a given batch by random sampling and distributes them dynamically across as many CF clusters as possible. This eliminates any partition-induced skews present in other architectures, and enables Strife to outperform all other protocols with stable throughput.

5.6 Microbenchmarks
5.6.1 Clustering Performance. Fig. 9(a) depicts Strife's analysis throughput (transactions analyzed per second) as we vary the number of cores. The Strife algorithm scales almost linearly until 15 cores (within a single NUMA socket). Beyond 15 cores, there is a slight drop in scalability due to cross-socket communication, but throughput still increases. The analysis cost depends on the total number of data accesses in the batch. The depicted performance (YCSB) is for 20 read-write accesses, and the analysis throughput is much better than most main-memory OLTP engines today. This shows that in the Strife execution pipeline, the analysis phase is unlikely to be the bottleneck.

Fig. 9(b) breaks down the time taken by each step during analysis for TPC-C, YCSB-Low (θ = 0.1), YCSB-High (θ = 0.99) and HOT. The prepare step scans through read-write sets, and this is the most expensive step since records are looked up for the first time and brought into cache. In subsequent steps, cached data is reused, and hence they are much faster. The analysis time also depends on the number of distinct data items accessed and cache size. In YCSB-Low, data items are accessed almost uniformly, while YCSB-High accesses fewer distinct data items. So, analysis of a YCSB-Low batch is more expensive than that of YCSB-High. We do not show spot and merge as they are negligible compared to these 3 steps. Our implementation caches the current set of clusters connected to a transaction in a thread-local cache, which further improves fuse and allocate performance.

5.6.2 Residual Parameter. The residual parameter α plays a key role in Strife. It determines (1) the proportion of transactions executed with or without CC and (2) the number of CF clusters. A larger α allows Strife to classify more cross-cluster transactions as residuals, improving the number of CF clusters; but more transactions are executed with CC, reducing the benefits of CC-free execution. A smaller α, on the other hand, may reduce concurrency in the conflict-free phase but fewer transactions are executed with CC.

Fig. 10(a) depicts the number of conflict-free clusters produced for different numbers of warehouses in TPC-C as we vary α. When α is zero, Strife clusters all transactions into a single cluster. However, as we increase α, the number of CF clusters increases and settles at the number of warehouses. Fig. 10(b) depicts the CF clusters produced for various values of θ (which controls contention in the workload) in YCSB with 30 partitions. A smaller θ implies lower contention and hence Strife results in 100 special clusters, which is the maximum value set in our experiments. Note that there is minimal impact of increasing α for the lower θ cases. For larger θ, each CF cluster corresponds to a YCSB partition and hence there is no impact.

5.6.3 Latency and Batch Size. Strife employs a batched execution strategy. Similar to streaming systems [3, 46], batch size poses an important trade-off between throughput and latency. With a larger batch size, the latency is higher, but there is more potential for amortization and hence throughput. When the batch is small, other overheads in the system become more prominent, but with a smaller latency.

We present throughput, end-to-end latency and analysis latency for TPC-C, YCSB and HOT as we vary batch size in Fig. 11 (in log scale). For small batch sizes (1K), synchronization overheads at phase boundaries affect the throughput. For larger batches, throughput is marginally higher due to better amortization and cache locality. Beyond a point, throughput does not increase since the data structures no longer fit in cache. End-to-end latency increases linearly with batch size. Although we can cluster a 1K batch in sub-millisecond latencies, the throughput is sub-optimal. In our experiments we chose a batch size of 10K since it provides a good trade-off between latency (< 5 ms) and throughput.

5.6.4 Cost of Reconnaissance Queries. When read-write sets are not known a priori, Strife uses the OLLP protocol [36]: it issues a read-only, low isolation level query called the reconnaissance query to acquire the entire read/write set as part of the prepare step. Additionally, index lookups are cached in the transaction context for later execution. A particularly common pattern of such transactions are those that perform a secondary index lookup [36]. In full TPC-C, some payment transactions obtain the customer record through the last name, which involves a secondary index lookup. Strife executes these transactions using the OLLP protocol. Fig. 12 depicts the throughput of Strife and other CC protocols on 30 warehouses as we increase the percentage of payment transactions that require a reconnaissance query in a 50/50 mixture of payment and new-order transactions. While there is a marginal decrease in performance for other CC protocols due to secondary index lookups, the decrease is more pronounced in Strife since it performs the lookup twice. At the same time, the additional cost incurred is a single lock-free secondary index lookup, so the impact of the reconnaissance query is minimal (at most 9%) and Strife still outperforms other CC protocols on high contention workloads.
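To make the reconnaissance pattern concrete, the sketch below runs a transaction's logic once in a collect-only mode to learn its read/write set (including keys resolved through secondary-index lookups), and later checks whether that set still matches at execution time; every name here (AccessSet, Reconnoiter, StillValid) is a hypothetical illustration of the OLLP-style flow, not Strife's implementation.

#include <cstdint>
#include <functional>
#include <set>

// Read/write set collected by a reconnaissance (read-only) pass.
struct AccessSet {
  std::set<uint64_t> reads;
  std::set<uint64_t> writes;
};

// Transaction logic parameterized by a recorder: in the reconnaissance
// pass it only declares the keys it would touch; nothing is modified.
using TxnLogic = std::function<void(AccessSet&)>;

AccessSet Reconnoiter(const TxnLogic& logic) {
  AccessSet rw;
  logic(rw);        // low-isolation, read-only evaluation
  return rw;
}

// If the keys touched at execution time differ from the reconnaissance
// result (e.g., a last-name lookup now resolves to a different customer),
// the read/write set must be re-collected before the transaction proceeds.
bool StillValid(const AccessSet& planned, const AccessSet& actual) {
  return planned.reads == actual.reads && planned.writes == actual.writes;
}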



tracking [45] or reusing partial executions of aborted transactions [7] have addressed this partially. MOCC [39], the state-of-the-art for high contention, combines pessimistic locking with optimistic validation-based execution. There are no aborts due to conflicts in Strife.

Partitioned and Deterministic Databases. H-Store [13, 15] and Hyper [17] yield state-of-the-art performance on partitionable high contention workloads but are plagued by cross-partition transactions [43]. Appuswamy et al. [1] advocate for fine-grained partitioning to avoid skew induced by static partitions, which Strife eliminates while dynamically adapting to the changing hot set. Calvin [31, 36] acquires locks in a deterministic order on a single thread, while Strife is free from such concurrency bottlenecks and mostly avoids locks.

Alternate Architectures. Orthrus [29] delegates CC to dedicated threads and uses message passing to synchronize access. Dora [1, 24] adopts thread-to-data assignment on a partitioned database; transactions flow from one thread to another acquiring/releasing locks. Both suffer on update-intensive workloads due to longer lock-hold times, worsened by cross-partition transactions and skews [1]. Strife does not use locks, adopts fine-grained partitioning, and avoids skews. Tang et al. [35] propose adaptive concurrency control (ACC) that clusters data and chooses the optimal CC for each cluster using a machine learning model. Both of these combine existing CC protocols to yield better performance on a mixed workload. This is unlike Strife, which optimizes for and performs better than each of these CC protocols under high contention.

Automatic Database Partitioning. Schism [6] partitions the co-access graph among data items and uses the partition to infer partition predicates statically; Horticulture [25] automatically partitions a database using local neighborhood search on the design space with an analytical cost model. While it seeks to reduce partition-induced skews and cross-partition transactions similar to Strife, its solution space is limited to hash/range partitioning. Sword [28] is an extension of Schism that uses hyper-graph compression to partition a database, also with incremental repartitioning. All these approaches are designed for offline computation and are not efficient enough for dynamic clustering in Strife.

Transaction Decomposition. Callas [41] partitions a transaction into groups that are executed independently under different CC protocols. IC3 [40] tracks run time dependencies, allowing transactions to be chopped into finer pieces than possible statically. LADS [42] breaks a transaction into a set of record actions using their semantics. Overall, decomposing a transaction into smaller pieces can help reduce contention by reducing the effective lock hold time on hot data items. LADS adopts a similar graph-based partitioning strategy on batches of transactions as Strife, but there are several key differences: LADS does not entirely eliminate CC during execution since it synchronizes on logical dependencies between record actions of a transaction. A user-initiated abort will lead to cascading aborts in LADS due to speculative execution of record actions. Further, the graph partitioning algorithm in LADS initially assumes a range-based partition and iteratively removes cross-edges by relocating records. This does not perform well on workloads not suitable for range-based partitioning, such as the HOT benchmark.

7 CONCLUSIONS
We presented Strife, a transaction processing protocol for high-contention workloads. Strife identifies hot records in the database during runtime, clusters transactions that access them together, and executes them without concurrency control. Strife achieves this with an algorithm that exploits random sampling and efficient union-find data structures to cluster thousands of transactions in milliseconds. Our experiments have shown that Strife can achieve substantial performance improvement over conventional CC protocols; while it incurs about 50% overhead over partitioned systems on a statically partitionable workload, Strife can improve performance by up to 2× when such static partitioning is not viable and when the workload changes over time.

ACKNOWLEDGEMENTS
This work is supported in part by the NSF grants IIS-1907997, AITF-1535565, IIS-1546083, IIS-1651489; the Intel-NSF CAPA Center; DOE award DE-SC0016260; and gifts from Adobe and Google. We also thank Xiangyao Yu and the anonymous reviewers for their valuable comments and feedback.


REFERENCES
[1] Raja Appuswamy, Angelos C. Anadiotis, Danica Porobic, Mustafa K. Iman, and Anastasia Ailamaki. 2017. Analyzing the Impact of System Architecture on the Scalability of OLTP Engines for High-contention Workloads. Proc. VLDB Endow. 11, 2 (Oct. 2017), 121–134. https://doi.org/10.14778/3149193.3149194
[2] Michael S Bernstein, Andrés Monroy-Hernández, Drew Harry, Paul André, Katrina Panovich, and Gregory G Vargas. 2011. 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community. In ICWSM.
[3] Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing. 2014. Trill: A High-performance Incremental Query Processor for Diverse Analytics. Proc. VLDB Endow. 8, 4 (Dec. 2014), 401–412. https://doi.org/10.14778/2735496.2735503
[4] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC ’10). ACM, New York, NY, USA, 143–154. https://doi.org/10.1145/1807128.1807152
[5] T. P. P. Council. 2018. TPC-C Standard Specification. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf. [Online; accessed 19-Dec-2017].
[6] Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: A Workload-driven Approach to Database Replication and Partitioning. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 48–57. https://doi.org/10.14778/1920841.1920853
[7] Mohammad Dashti, Sachin Basil John, Amir Shaikhha, and Christoph Koch. 2017. Transaction Repair for Multi-Version Concurrency Control. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD ’17). ACM, New York, NY, USA, 235–250. https://doi.org/10.1145/3035918.3035919
[8] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL Server’s Memory-optimized OLTP Engine. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13). ACM, New York, NY, USA, 1243–1254. https://doi.org/10.1145/2463676.2463710
[9] Michael T. Heath and Padma Raghavan. 1995. A Cartesian Parallel Nested Dissection Algorithm. SIAM J. Matrix Anal. Appl. 16, 1 (Jan. 1995), 235–253. https://doi.org/10.1137/S0895479892238270
[10] Bruce Hendrickson and Robert Leland. 1995. A Multilevel Algorithm for Partitioning Graphs. In Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (Supercomputing ’95). ACM, New York, NY, USA, Article 28. https://doi.org/10.1145/224170.224228
[11] Takashi Horikawa. 2013. Latch-free Data Structures for DBMS: Design, Implementation, and Evaluation. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13). ACM, New York, NY, USA, 409–420. https://doi.org/10.1145/2463676.2463720
[12] Ryan Johnson, Ippokratis Pandis, and Anastasia Ailamaki. 2009. Improving OLTP Scalability Using Speculative Lock Inheritance. Proc. VLDB Endow. 2, 1 (Aug. 2009), 479–489. https://doi.org/10.14778/1687627.1687682
[13] Evan P. C. Jones, Daniel J. Abadi, and Samuel Madden. 2010. Low Overhead Concurrency Control for Partitioned Main Memory Databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD ’10). ACM, New York, NY, USA, 603–614. https://doi.org/10.1145/1807167.1807233
[14] Hyungsoo Jung, Hyuck Han, Alan D. Fekete, Gernot Heiser, and Heon Y. Yeom. 2013. A Scalable Lock Manager for Multicores. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13). ACM, New York, NY, USA, 73–84. https://doi.org/10.1145/2463676.2465271
[15] Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. 2008. H-store: A High-performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1496–1499. https://doi.org/10.14778/1454159.1454211
[16] George Karypis and Vipin Kumar. 1998. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput. 20, 1 (Dec. 1998), 359–392. https://doi.org/10.1137/S1064827595287997
[17] Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering (ICDE ’11). IEEE Computer Society, Washington, DC, USA, 195–206. https://doi.org/10.1109/ICDE.2011.5767867
[18] Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, and Ippokratis Pandis. 2016. ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 1675–1687. https://doi.org/10.1145/2882903.2882905
[19] Hideaki Kimura. 2015. FOEDUS: OLTP Engine for a Thousand Cores and NVRAM. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA, 691–706. https://doi.org/10.1145/2723372.2746480
[20] H. T. Kung and John T. Robinson. 1981. On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 6, 2 (June 1981), 213–226. https://doi.org/10.1145/319566.319567
[21] Dominique LaSalle, Md Mostofa Ali Patwary, Nadathur Satish, Narayanan Sundaram, Pradeep Dubey, and George Karypis. 2015. Improving Graph Partitioning for Modern Graphs and Architectures. In Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (IA3 ’15). ACM, New York, NY, USA, Article 14, 4 pages. https://doi.org/10.1145/2833179.2833188
[22] Xiaozhou Li, David G. Andersen, Michael Kaminsky, and Michael J. Freedman. 2014. Algorithmic Improvements for Fast Concurrent Cuckoo Hashing. In Proceedings of the Ninth European Conference on Computer Systems (EuroSys ’14). ACM, New York, NY, USA, Article 27, 14 pages. https://doi.org/10.1145/2592798.2592820
[23] Gary L. Miller, Shang-Hua Teng, William Thurston, and Stephen A. Vavasis. 1993. Automatic Mesh Partitioning. In Graph Theory and Sparse Matrix Computation, Alan George, John R. Gilbert, and Joseph W. H. Liu (Eds.). Springer New York, New York, NY, 57–84.
[24] Ippokratis Pandis, Ryan Johnson, Nikos Hardavellas, and Anastasia Ailamaki. 2010. Data-oriented Transaction Execution. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 928–939. https://doi.org/10.14778/1920841.1920959
[25] Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. In SIGMOD ’12: Proceedings of the 2012 International Conference on Management of Data. 61–72. http://hstore.cs.brown.edu/papers/hstore-partitioning.pdf
[26] Andrew Pavlo, Evan P. C. Jones, and Stanley Zdonik. 2011. On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems. Proc. VLDB Endow. 5, 2 (Oct. 2011), 85–96. https://doi.org/10.14778/2078324.2078325
[27] Alex Pothen, Horst D. Simon, and Kan-Pu Liou. 1990. Partitioning Sparse Matrices with Eigenvectors of Graphs. SIAM J. Matrix Anal. Appl. 11, 3 (May 1990), 430–452. https://doi.org/10.1137/0611030


[28] Abdul Quamar, K. Ashwin Kumar, and Amol Deshpande. 2013. SWORD: Scalable Workload-aware Data Placement for Transactional Workloads. In Proceedings of the 16th International Conference on Extending Database Technology (EDBT ’13). ACM, New York, NY, USA, 430–441. https://doi.org/10.1145/2452376.2452427
[29] Kun Ren, Jose M. Faleiro, and Daniel J. Abadi. 2016. Design Principles for Scaling Multi-core OLTP Under High Contention. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 1583–1598. https://doi.org/10.1145/2882903.2882958
[30] Kun Ren, Alexander Thomson, and Daniel J. Abadi. 2013. Lightweight Locking for Main Memory Database Systems. In Proceedings of the 39th International Conference on Very Large Data Bases (PVLDB ’13). VLDB Endowment, 145–156. http://dl.acm.org/citation.cfm?id=2448936.2448947
[31] Kun Ren, Alexander Thomson, and Daniel J. Abadi. 2014. An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems. Proc. VLDB Endow. 7, 10 (June 2014), 821–832. https://doi.org/10.14778/2732951.2732955
[32] Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, and Kenneth A. Ross. 2014. Reducing Database Locking Contention Through Multi-version Concurrency. Proc. VLDB Endow. 7, 13 (Aug. 2014), 1331–1342. https://doi.org/10.14778/2733004.2733006
[33] Robert Sedgewick and Kevin Wayne. 2011. Algorithms (4th ed.). Addison-Wesley Professional.
[34] Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE ’05). IEEE Computer Society, Washington, DC, USA, 2–11. https://doi.org/10.1109/ICDE.2005.1
[35] Dixin Tang, Hao Jiang, and Aaron J. Elmore. 2017. Adaptive Concurrency Control: Despite the Looking Glass, One Concurrency Control Does Not Fit All. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings. http://cidrdb.org/cidr2017/papers/p63-tang-cidr17.pdf
[36] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. 2012. Calvin: Fast Distributed Transactions for Partitioned Database Systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/2213836.2213838
[37] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-memory Databases. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). ACM, New York, NY, USA, 18–32. https://doi.org/10.1145/2517349.2522713
[38] Tianzheng Wang, Milind Chabbi, and Hideaki Kimura. 2016. Be My Guest: MCS Lock Now Welcomes Guests. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’16). ACM, New York, NY, USA, Article 21, 12 pages. https://doi.org/10.1145/2851141.2851160
[39] Tianzheng Wang and Hideaki Kimura. 2016. Mostly-optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores. Proc. VLDB Endow. 10, 2 (Oct. 2016), 49–60. https://doi.org/10.14778/3015274.3015276
[40] Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, and Jinyang Li. 2016. Scaling Multicore Databases via Constrained Parallel Execution. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 1643–1658. https://doi.org/10.1145/2882903.2882934
[41] Chao Xie, Chunzhi Su, Cody Littley, Lorenzo Alvisi, Manos Kapritsos, and Yang Wang. 2015. High-performance ACID via Modular Concurrency Control. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15). ACM, New York, NY, USA, 279–294. https://doi.org/10.1145/2815400.2815430
[42] Chang Yao, Divyakant Agrawal, Gang Chen, Qian Lin, Beng Chin Ooi, Weng-Fai Wong, and Meihui Zhang. 2016. Exploiting Single-Threaded Model in Multi-Core In-Memory Systems. IEEE Trans. on Knowl. and Data Eng. 28, 10 (Oct. 2016), 2635–2650. https://doi.org/10.1109/TKDE.2016.2578319
[43] Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael Stonebraker. 2014. Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. Proc. VLDB Endow. 8, 3 (Nov. 2014), 209–220. https://doi.org/10.14778/2735508.2735511
[44] Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, and Srinivas Devadas. 2016. TicToc: Time Traveling Optimistic Concurrency Control. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 1629–1642. https://doi.org/10.1145/2882903.2882935
[45] Yuan Yuan, Kaibo Wang, Rubao Lee, Xiaoning Ding, Jing Xing, Spyros Blanas, and Xiaodong Zhang. 2016. BCC: Reducing False Aborts in Optimistic Concurrency Control with Low Cost for In-memory Databases. Proc. VLDB Endow. 9, 6 (Jan. 2016), 504–515. https://doi.org/10.14778/2904121.2904126
[46] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). ACM, New York, NY, USA, 423–438. https://doi.org/10.1145/2517349.2522737
