Scheduling OLTP Transactions via Learned Abort Prediction

Yangjun Sheng (Carnegie Mellon University, Pittsburgh, PA), Anthony Tomasic (Carnegie Mellon University, Pittsburgh, PA), Tieying Zhang (Alibaba Group), Andrew Pavlo (Carnegie Mellon University, Pittsburgh, PA)

ABSTRACT
Current main memory database system architectures are still challenged by high-contention workloads, and this challenge will continue to grow as the number of cores in processors continues to increase [23]. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load across cores. Scheduling never considers potential conflicts. Performance could be improved if scheduling balanced between concurrency, to maximize throughput, and scheduling transactions linearly, to avoid conflicts. In this paper, we present the design of several intelligent transaction scheduling algorithms that consider both potential transaction conflicts and concurrency. To incorporate reasoning about transaction conflicts, we develop a supervised machine learning model that estimates the probability of conflict. This model is incorporated into several scheduling algorithms. In addition, we integrate an unsupervised machine learning algorithm into an intelligent scheduling algorithm. We then empirically measure the performance impact of different scheduling algorithms on OLTP and social networking workloads. Our results show that, with appropriate settings, intelligent scheduling can increase throughput by 54% and reduce abort rate by 80% on a 20-core machine, relative to random scheduling. In summary, the paper provides preliminary evidence that intelligent scheduling significantly improves DBMS performance.

CCS CONCEPTS
• Information systems → Database transaction processing; • Computing methodologies → Machine learning;

KEYWORDS
machine learning, scheduling

ACM Reference Format:
Yangjun Sheng, Anthony Tomasic, Tieying Zhang, and Andrew Pavlo. 2019. Scheduling OLTP Transactions via Learned Abort Prediction. In International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM'19), July 5, 2019, Amsterdam, Netherlands. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3329859.3329871

1 INTRODUCTION
Transaction aborts are one of the main sources of performance loss in main memory online transaction processing (OLTP) database management systems (DBMS) [23]. Current architectures for OLTP DBMS use random scheduling to assign transactions to threads. Random scheduling achieves uniform load across CPU cores and keeps all cores occupied. For workloads with a high abort rate, however, a large portion of the work done by the CPU is wasted. In contrast, the abort rate drops to zero if all transactions are scheduled sequentially into one thread: no work is wasted through aborts, but throughput drops to the performance of a single hardware thread. Research has shown that statistical scheduling of transactions using a history can achieve low abort rate and high throughput [25] for partitionable workloads.
We propose a more systematic machine learning approach to schedule transactions that achieves low abort rate and high throughput for both partitionable and non-partitionable workloads. The fundamental intuitions of this paper are that (i) the probability that a transaction will abort can be accurately modeled through machine learning, and (ii) given that an abort is predicted, scheduling can avoid the conflict without loss of response time or throughput. Several research works [1, 12, 16, 17] perform exact analyses of aborts of transaction statements that are complex and not easily generalizable over different workloads or DBMS architectures. Our approach instead uses machine learning algorithms to model the probability of aborts. Concretely, we (i) build a machine learning model that helps to group transactions that are likely to abort with each other, (ii) assign each group of transactions to a first-in-first-out (FIFO) queue, and (iii) monitor concurrency across cores and adjust for imbalances.

2 SYSTEM ARCHITECTURE OVERVIEW
The overall environment (Figure 1) consists of three layers: the incoming stream of transaction requests, our scheduler, and a main-memory database system. The API between our scheduler and the database is simple, consisting of four functions: (i) the ability to capture incoming transactions, (ii) the ability to queue a transaction in a specific run queue of the database, (iii) the ability to log the SQL of two events that occur during transaction processing, a transaction abort and a transaction commit, and (iv) the ability to report real-time transaction response times. This environment and API imply that our approach can be "bolted on" to any DBMS with little effort. Our system does, however, require that for a (non-user) aborted transaction, the system can identify at least one transaction that caused the abort. Typically the causing transaction holds a lock on a row, or modified the row, during the aborted transaction's execution.

Figure 1: Functional Architecture – The Transaction component represents an incoming transaction. The Process Transaction component converts the transaction into its feature representation. The Scheduler component chooses the queue in which to place the incoming transaction. The Queues component represents the set of run queues. The DB component is a main-memory database system. The Log component collects transaction execution results for model training purposes. The Process Log component converts the log into its feature representation for training. The Machine Learning component trains on the processed log to produce the Model. The DB component also provides real-time transaction response time information to the scheduler. The dotted lines separate conventional components from our scheduler.

3 SUPERVISED MACHINE LEARNING
Our approach is to learn a classifier that classifies two transactions, TA and TB, as Abort if they will abort each other when run concurrently (with high probability), or as Commit if they do not conflict.

3.1 Training Data
Each transaction indirectly describes information about which tuples, attributes, and tables it will read and write in the database. To access this information, some research uses the read/write set of a transaction, but this set is large and changes dynamically. Instead, the string representation of the SQL statements of a transaction is coded into a feature space. The SQL statements are a lightweight feature representation of the transaction.

3.1.1 Features. As described above, the primary goal of the model is to check whether two transactions will conflict with each other. Therefore, the features should be derived from both transactions, T1 and T2. In particular, we want to define a function

trans(T1, T2) → vector

to transform two transactions into a feature vector and train our model to classify this vector as Abort or Commit.

For example, in TPC-C, a transaction T might want to read rows where the warehouse id is equal to 10. Then T has a reference W_ID=10, and the string W_ID=10 is considered a feature of T. More precisely, any instance of attribute operator value in a WHERE clause of a SELECT, DELETE, or UPDATE statement is a feature. All values of an INSERT are also features. We do not distinguish between reads and writes: if T both reads and writes rows with W_ID=10, it has only one such string. The function Features(string) → F maps a SQL string to a set of features F. Note that any particular transaction produces only a few features.

To produce a compact representation with a fixed size independent of the size of the transaction, we apply feature hashing, a fast and space-efficient way of converting an arbitrary number of features into indices in a fixed-size vector. The function Hash(T) constructs the union of the sets of features of the statements of the transaction and generates a binary vector of fixed length k. Given two transactions T1 and T2, we now have two vectors Hash(Features(T1)) = V1 and Hash(Features(T2)) = V2. Let V3 = V1 & V2 be the binary logical AND of V1 and V2. The final feature vector is the concatenation of these three vectors,

trans(T1, T2) = V1 | V2 | V3

The vector V3 encodes tuples, columns, or tables that will potentially be touched by both transactions T1 and T2. In our experiments our feature vectors are 1k bits in length.
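The following sketch illustrates this feature construction. It is a minimal illustration rather than the paper's implementation: the predicate-extraction regular expression, the choice of hash function, and the vector length are assumptions made for clarity.

import hashlib
import re

import numpy as np

K = 1024  # fixed length of each hashed bit vector (the paper uses 1k bits)

def features(sql):
    # Extract "attribute operator value" strings; this simplified pattern
    # only captures equality predicates such as W_ID = 10.
    return set(re.findall(r"\w+\s*=\s*\w+", sql))

def hash_features(feats):
    # Feature hashing: turn on one bit per feature string.
    v = np.zeros(K, dtype=np.uint8)
    for f in feats:
        idx = int(hashlib.md5(f.encode()).hexdigest(), 16) % K
        v[idx] = 1
    return v

def trans(t1_sql, t2_sql):
    # trans(T1, T2) = V1 | V2 | V3, where V3 = V1 AND V2 and "|" denotes
    # concatenation of the three k-bit vectors.
    v1 = hash_features(features(t1_sql))
    v2 = hash_features(features(t2_sql))
    return np.concatenate([v1, v2, v1 & v2])

x = trans("SELECT W_NAME FROM WAREHOUSE WHERE W_ID = 10",
          "SELECT D_NAME FROM DISTRICT WHERE W_ID = 10 AND D_ID = 4")
print(x.shape)  # (3072,) -- a 3k-bit vector fed to the classifier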
3.1.2 Canonical Features. Attribute names that appear in a schema are arbitrary, independent of the underlying domain concept. For instance, in the TPC-C benchmark, the column W_ID in the WAREHOUSE table and the column D_W_ID in the DISTRICT table both represent warehouses. However, their string representations in SQL are not the same (W_ID=1 and D_W_ID=1). Although they express the same entity, under the hashing function described in the previous section these two strings hash to different indices in the feature vector (barring collision). Research has shown that canonical features are more favorable than literal strings for representing transactions [25]. Therefore, we adopt canonical features, converting the string representations of attributes to a canonical version.

Suppose a TPC-C NewOrder transaction contains the following SQL statements:

SELECT W_NAME, W_STREET_1, W_STREET_2, W_CITY
FROM WAREHOUSE WHERE W_ID = 10
...
SELECT D_NAME, D_STREET_1, D_STREET_2, D_CITY
FROM DISTRICT WHERE D_W_ID = 10 AND D_ID = 4
...

Then the feature representation for these two SQL statements is:

W_ID = 10, ..., W_ID = 10, D_ID = 4, ...

That is, each of these three strings is hashed and the corresponding bit is turned on in the feature vector. This representation discards most of the meaning of the SQL query and concentrates on representing only the read/write footprint of the transaction. The representation is invariant to the ordering of expressions in conjunctions and disjunctions. More complex SQL expressions, such as range expressions, are reserved for future work.

3.1.3 Training. Training data is gathered by observing the log while the system is running. Every time a transaction commits or aborts, the event is logged as one of two possible abstract triples:

• (Aborted Transaction, Conflicting Transaction, label abort)
• (Committed Transaction, Any Concurrent Transaction, label commit)

Sampling the log under random scheduling generates a distribution of transactions and abort/commit labels that is independent of our subsequent scheduling decisions. Concretely, if T1 is aborted by T2, then we log the pair (trans(T1, T2), 1), where 1 indicates an abort. To obtain vectors in the commit set, we use the following approach: when T1 commits, we randomly pick a transaction T2 that is currently running with T1 and add (trans(T1, T2), 0) to the log. The log serves directly as the training and test set for choosing a learning algorithm.

Note that only one conflicting transaction is chosen to be recorded in the log. However, since the choice is random, the log implicitly records a random sample of the distribution of conflicting transactions, exactly in proportion to their occurrence in the workload.
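A minimal sketch of how the logged triples could be turned into labeled training pairs, reusing the trans() helper sketched above; the in-memory log representation and the callback names are illustrative assumptions, not the paper's logging interface.

import random

training_log = []  # list of (feature_vector, label) pairs

def on_abort(t_aborted_sql, t_conflicting_sql):
    # If T1 is aborted by T2, record (trans(T1, T2), 1).
    training_log.append((trans(t_aborted_sql, t_conflicting_sql), 1))

def on_commit(t_committed_sql, concurrent_sqls):
    # When T1 commits, pair it with one randomly chosen concurrently
    # running transaction and record (trans(T1, T2), 0).
    if concurrent_sqls:
        t2_sql = random.choice(concurrent_sqls)
        training_log.append((trans(t_committed_sql, t2_sql), 0))

The resulting list of (vector, label) pairs can then serve directly as the training and test set described in the next subsection.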
3.1.4 Model Evaluation. We assume that the hypothesis space is linear and defer non-linear models to future work. Our learning algorithm needs to be cheap to train and very fast to evaluate on a pair of transactions. Logistic regression is cheap to train and can evaluate an example in O(n) time, where n is the number of 1 bits in the feature vector, using a sparse representation. Example evaluation time is therefore proportional to the sum of the string lengths of the two transactions. In practice n is small (less than 20), and the evaluation consists of multiplying each 1 bit of the feature vector by a scalar and summing to produce a final score. Another advantage is that the score of logistic regression can be interpreted as the probability that the transaction pair will produce an abort. The trained model generates parameters for a function

M(T1, T2) → P(commit/abort)

that predicts abort probabilities for any transaction pair (T1, T2). In [15] we report a set of experiments showing that this model has an accuracy of over 95% on a balanced workload.

3.2 Scheduling Algorithms
Our task in this section is to design an algorithm that uses M to assign a transaction Tnew to a queue in a way that avoids aborts and increases throughput compared to random scheduling.

Consider a transaction Tk already queued and waiting to execute. Suppose M(Tnew, Tk) has a low abort probability. A low abort probability does not make it clear which queue Tnew should be assigned to, since some other concurrent transaction may have a high abort probability. Suppose instead that M(Tnew, Tk) predicts a high abort probability. Then if these two transactions run concurrently, with high probability at least one abort will occur, perhaps two (depending on the details of the concurrency control system). Assigning Tnew to a different queue than Tk may succeed as long as the database does not try to execute them at the same time. Concurrent execution depends in this case on there being the same amount of work in front of each queued transaction, so that they both arrive at the front of their queues at the same time and are processed concurrently. However, predicting the total work in a queue is difficult. The simplest heuristic places Tnew into the same FIFO queue after Tk so that they never execute concurrently.

With this heuristic in mind, the algorithm computes

Enqueue(Tnew, queue(argmax_Ti M(Tnew, Ti)))

where queue(T) is a function that returns the queue of transaction T. That is, for a new transaction Tnew, search for the queued transaction Thigh with the highest abort probability M(Tnew, Thigh), then assign Tnew to the same queue as Thigh (Algorithm 1).

Algorithm 1 Assign(Tnew)
  p = −1
  q = −1
  for each queued transaction T do
    if M(Tnew, T) > p then
      p = M(Tnew, T)
      q = queue(T)
    end if
  end for
  if p > −1 then
    Assign Tnew to FIFO queue q
  else
    Assign Tnew randomly
  end if
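A sketch of Algorithm 1 in Python, assuming the trans() helper above and a scikit-learn logistic regression as M; the list-of-lists queue representation is an illustrative choice, not the paper's implementation.

import random
from sklearn.linear_model import LogisticRegression

def abort_probability(model, t_new_sql, t_queued_sql):
    # M(T_new, T): the logistic regression's probability of the abort class.
    x = trans(t_new_sql, t_queued_sql).reshape(1, -1)
    return model.predict_proba(x)[0, 1]

def assign(model, queues, t_new_sql):
    # Algorithm 1: place T_new behind the queued transaction it is most
    # likely to conflict with; otherwise fall back to random assignment.
    best_p, best_q = -1.0, None
    for q in queues:
        for t_sql in q:
            p = abort_probability(model, t_new_sql, t_sql)
            if p > best_p:
                best_p, best_q = p, q
    if best_q is None:  # no queued transactions yet
        best_q = random.choice(queues)
    best_q.append(t_new_sql)

Here model would be a LogisticRegression instance fit on the logged (trans(T1, T2), label) pairs, and queues a list of FIFO lists of SQL strings.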

3.2.1 Search Scheduling Algorithm. We can search the queues in two different ways: breadth-first search (BFS) and depth-first search (DFS). Extensive analysis and experimentation have shown that DFS performs poorly compared to BFS because DFS spends much more time searching for high-probability transactions. See [15] for details about the implementation. The worst-case computation cost, in terms of model evaluations, for both algorithms is O(n), where n is the number of waiting transactions. This cost is paid by every queued transaction.

The combination of the model evaluation and the heuristic of scheduling aborts into the same queue tends to group high-probability conflicting transactions into the same queue. We can exploit this behavior to reduce the computation cost of finding the best queue for a transaction by collapsing the feature vectors of the transactions in a queue into a centroid that represents the "average" transaction.

3.2.2 Balanced Vector Scheduling Algorithm. The Vector scheduler creates a representative transaction Ri for the history of transactions in each queue qi. Ri represents the transactions in qi by averaging the feature vectors of the transactions enqueued into qi. That is, the features of a transaction are bit vectors, but the centroid representing each queue is a vector of floats. The model M remains unchanged for training purposes; the difference is in instance evaluation. To determine the best queue for a new transaction Tnew, the algorithm computes

Enqueue(Tnew, argmax_i M(Tnew, Ri))

See [15] for details about the implementation. The computational cost in terms of model evaluations is O(n), where n is the number of queues in the system.

3.3 Supervised Scheduler Summary
In summary, the transaction assignment execution time of BFS is O(n), where n is the number of queued transactions, less the early-stop criterion of finding a younger transaction or a high-abort-probability transaction. The transaction assignment execution time of Balanced Vector Assign is O(kd), where k is the number of queues and d is the length of the feature vector (in a dense representation of floats), plus some additional cost for detecting imbalances in response times. The vector centroids are updated with each newly queued transaction (but not on transaction commit) and thus represent a history of transactions.
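A sketch of the Balanced Vector idea, building on the helpers above. The incremental centroid update and the use of an element-wise product as the float analogue of the bitwise AND are assumptions made for illustration, not details taken from the paper.

import numpy as np

class VectorScheduler:
    # One float centroid R_i per queue: the running average of the hashed
    # feature vectors of the transactions enqueued there.
    def __init__(self, model, num_queues, k=1024):
        self.model = model
        self.queues = [[] for _ in range(num_queues)]
        self.centroids = np.zeros((num_queues, k))
        self.counts = np.zeros(num_queues)

    def score(self, v_new, centroid):
        # Evaluate M(T_new, R_i) on the concatenation V_new | R_i | V_new*R_i.
        x = np.concatenate([v_new, centroid, v_new * centroid]).reshape(1, -1)
        return self.model.predict_proba(x)[0, 1]

    def assign(self, t_new_sql):
        v_new = hash_features(features(t_new_sql)).astype(float)
        i = max(range(len(self.queues)),
                key=lambda j: self.score(v_new, self.centroids[j]))
        self.queues[i].append(t_new_sql)
        # Update the centroid incrementally (not updated on commit).
        self.counts[i] += 1
        self.centroids[i] += (v_new - self.centroids[i]) / self.counts[i]
        return i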
4 UNSUPERVISED MACHINE LEARNING
The algorithms in the prior section used supervised learning to construct a model and then applied that model to the scheduling problem in various ways. An alternative is to discover clusters of transactions that are likely to abort with other transactions in the same cluster if run concurrently. Unsupervised clustering algorithms are used to form k clusters, one for each queue. When a new transaction arrives, the algorithm assigns it to the cluster (queue) closest to the new transaction. Each cluster represents a group of similar types of aborts.

4.1 Clustering Model
Each abort instance provides a clustering example, since the abort suggests clustering the two transactions, the aborted transaction and the transaction that caused the abort, into the same cluster. Our model essentially learns a clustering model from these aborts. In this model, committed transactions do not write to the log because commit information is not necessary.

4.1.1 Features. Given two transactions T1 and T2 such that T1 aborted due to T2, the pair (T1, T2) is an instance of abort recorded in the log. F(·) extracts features from (T1, T2) to form a feature vector as follows. Let V1 = Hash(T1) and V2 = Hash(T2). Then F(·) is the logical AND of these two vectors:

F(T1, T2) = V1 & V2

The assumption here is that V1 and V2 encode the salient abort features of T1 and T2, respectively, and V1 & V2 encodes evidence for the features that caused T1 to be aborted. For example, suppose we set the vector size to 8 and we have two transactions:

T1: WAREHOUSE_ID=1 ITEM_ID=123 USER_ID=10
T2: WAREHOUSE_ID=1 ITEM_ID=456 USER_ID=20

Assume that the hash function hashes the string WAREHOUSE_ID=1 to index 2 (starting from 0) without collision, and suppose the results are

Hash(T1) = V1: 00100101
Hash(T2) = V2: 10100010

Then

F(T1, T2) = V1 & V2 = 00100000

which is a vector that represents aborts caused by sharing the attribute WAREHOUSE_ID=1.

4.1.2 Training. Training data is gathered by observing the log while the system is running under random scheduling, as in supervised learning. However, only abort instances are logged. In particular, if a transaction is aborted by another transaction, we log

• (Aborted Transaction, Conflicting Transaction)

4.1.3 Clustering Algorithm. We use the k-means clustering algorithm with a Euclidean distance function D(V, W). Although the decision boundary of k-means is linear, the algorithm converges relatively quickly in practice and, more importantly, can evaluate new instances quickly, on the order of O(k) centroid comparisons. Each centroid, in a sparse representation, is of length f, the number of features in the centroid. If V = (v1, v2, ..., vn) and W = (w1, w2, ..., wn), then

D(V, W) = sqrt( Σ_{i=1}^{n} (vi − wi)^2 )

4.2 Scheduling Algorithm
After running the clustering algorithm, we obtain k centroids, denoted by C1, C2, ..., Ck. The next question is how to design a scheduling algorithm that schedules new transactions. Given a new transaction Tnew, we compute the Euclidean distance between Tnew and each of C1, ..., Ck, pick the "closest" cluster Cc, and assign Tnew to queue c.

The reader may notice that this method of evaluating an instance is an abuse of the normal machine learning framework, because the training data distribution is different from the new-instance data distribution. Each cluster is designed to represent a type of abort produced by a pair of transactions, but the new instance represents only a single transaction, and computing the distance between a transaction and an abort seems meaningless. Intuitively, however, the feature vector of Tnew contains the features associated with an abort, along with some other features. Empirically, if the distance D(Vnew, Ci) is small, then Tnew is more likely to exhibit abort type i. Since each cluster represents a collection of transactions, Tnew is then more likely to abort with transactions in group i.

In addition, as with the balanced vector scheduler above, the balanced clustering scheduler (Algorithm 2) tracks the response time of each queue. The Balanced Cluster Assign scheduling algorithm performs O(k) model instance evaluations for each new transaction that is scheduled.

Algorithm 2 BalancedClusterAssign(Tnew)
  min_dist ← ∞
  idx ← −1
  for i := 0 to n − 1 do
    dist ← D(Vnew, Ci)
    if dist < min_dist then
      min_dist ← dist
      idx ← i
    end if
  end for
  rt ← GetResTimeHistory(q_idx)
  rt_avg ← GetResTimeAvg(rt)
  rt_std ← GetResTimeStd(rt)
  if rt − rt_avg > rt_std then
    min_rt ← ∞
    for i := 0 to n − 1 do
      rt_i ← GetResTimeHistory(qi)
      if rt_i < min_rt then
        min_rt ← rt_i
        idx ← i
      end if
    end for
  end if
  Enqueue(q_idx, Tnew)

4.3 Unsupervised Scheduler Summary
In summary, the transaction assignment execution time of Balanced Cluster Assign is O(kf), where k is the number of queues and f is the number of 1 features, plus some additional cost for detecting imbalances in response times. The centroids are static and thus do not have contention issues with multiple schedulers.
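A sketch of cluster fitting and Balanced Cluster Assign, reusing the hashing helpers above. The scikit-learn k-means call and this reading of the response-time outlier check (comparing the chosen queue's recent average against all queues) are assumptions for illustration.

import numpy as np
from sklearn.cluster import KMeans

def fit_abort_clusters(abort_vectors, k):
    # abort_vectors: list of V1 & V2 vectors extracted from logged aborts.
    return KMeans(n_clusters=k, n_init=10).fit(np.array(abort_vectors))

def balanced_cluster_assign(kmeans, queues, response_times, t_new_sql):
    # Algorithm 2: pick the nearest abort-type centroid, then fall back to
    # the fastest queue when the chosen queue's response time is an outlier.
    v_new = hash_features(features(t_new_sql)).astype(float)
    dists = np.linalg.norm(kmeans.cluster_centers_ - v_new, axis=1)
    idx = int(np.argmin(dists))

    avg_rt = np.array([np.mean(rt) if rt else 0.0 for rt in response_times])
    if avg_rt[idx] - avg_rt.mean() > avg_rt.std():
        idx = int(np.argmin(avg_rt))

    queues[idx].append(t_new_sql)
    return idx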
5 EXPERIMENTAL FRAMEWORK
The experimental framework focuses on isolating external factors from the system to highlight any differences among the policies. Experiments are run on a server that is otherwise idle. The server contains 20 cores (Intel Xeon Silver 4114 CPU @ 2.20GHz) with 125GB of memory. Hyper-threading is not used. One thread is pinned exclusively to work-flow computations. The remaining 19 worker threads both assign and execute transactions. In this framework, each worker thread is both a scheduler and a transaction executor, and each thread is dedicated to one queue of transactions. This arrangement allows us to measure the cost of the scheduling algorithms, but it potentially introduces high contention between schedulers. The experimental design is an open-queue environment in which an arrival rate is used to simulate a real-world scenario where transactions arrive at the DBMS at a certain rate (a sketch of one way to generate such arrivals appears after the execution steps below). Given an arrival rate, each worker thread creates and assigns enough transactions to meet the arrival rate requirements.

All experiments are run on the Peloton system, an open-source, stored-procedure, main-memory database system.

Experimental Execution
1. Initialize the random number generator to a new unique seed.
2. Initialize and start the database for the benchmark.
3. Execute transactions for 20 seconds, scheduling transactions by random assignment, to warm up the system.
4. Collect transaction execution results into a log file during this warm-up phase (Phase I).
5. Learn a model from a sample of the log file.
6. Turn on measurements, then schedule and execute transactions for 300 seconds under a given policy.
7. Wait for all transactions to finish execution.
8. Stop the database.

We repeat the above phases 3 times and report the average of the measurements across the 3 experiment runs.
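For the open-queue arrival process mentioned above, a Poisson process is one plausible way to drive a fixed arrival rate; the paper does not specify the arrival distribution, so this sketch is an assumption.

import random

def arrival_times(rate_per_s, duration_s, seed=0):
    # Generate arrival timestamps with exponentially distributed
    # inter-arrival times, i.e. a Poisson arrival process (an assumption).
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)

For example, arrival_times(1000, 300) yields roughly 300,000 arrivals for a 300-second measured run at 1,000 transactions per second.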
6 RESULTS AND DISCUSSION
We compare four schedulers (Random, BFS, Balanced Vector, and Balanced k-Means) over three benchmarks (TPC-C, TATP, and Epinions). The Random scheduler serves as the baseline for comparison. Experimentally, we step-wise increase the transaction arrival rate to the point where the maximum execution speed of the system is equal to the arrival rate, and we examine transaction throughput, response time, transaction distribution, scheduling cost, and workload balance over threads. We also analyze the underlying factors in the success and failure of the different scheduling algorithms.

We examine the response time and throughput of all four schedulers by gradually increasing the arrival rate until the system reaches or exceeds its maximum execution speed (Figure 2). Throughput becomes worse when the arrival rate is too high because every worker thread is also a scheduler and is forced to assign enough transactions to meet the arrival rate requirement; if the arrival rate is too high, more work is put into transaction assignment instead of execution.

The Balanced k-Means scheduler is the winner among these algorithms. It has a higher maximum throughput, and at a given arrival rate it cuts the response time by about 10%. See [15] for additional results and discussion.

Figure 2 (panels for TPC-C, TATP, and Epinions): Response Time vs. Throughput – Each point in the graph represents an experiment run that generates a (throughput, response time) point. Lines connect the same scheduler as the arrival rate increases. The spikes in the graph correspond to the arrival rate exceeding the processing rate of the system for the given scheduler. The "backwards bend" of a line occurs when the system becomes overloaded as the arrival rate swamps the system.

7 RELATED WORK
Several recent works improve the performance of the locking system in main memory databases [13, 18]. This line of work is complementary to our approach. Overall, our work is most closely related to the general area of self-driving databases [10].

Partitioning Data. One technique to deal with contention partitions the data to reduce aborts [16, 17, 21]. Adopting partitioning as a core assumption, along with several additional design decisions, improves performance. However, performance problems still remain [23], and partitioning is problematic because some database schemas are not easily partitionable [3, 11, 12, 18]. In an informal sense, partitioning is data centric, since the partitioning is generally based on the actual read/write sets of transactions. In contrast, our work is transaction centric, since it focuses on the read/write set predicted from the transaction statements without direct reference to the data.

Another OLTP architectural technique to avoid aborts focuses on careful implementation of ACID transaction properties to achieve improved performance [6, 19, 24]. Systems based on this approach offer improved performance over all database schemas and do not suffer from the constraints of partitioning. However, these systems still suffer under high-contention workloads.

Semantic analysis. A deterministic approach to the analysis of code is followed in [7]. The approach merges multiple database calls in an application code loop back into a single SQL query. It identifies dependencies between database calls and extracts the independent statements to be run in parallel to decrease latency. A similar approach, extended to use optimization, is followed in QURO [22], which reorders statements to reduce transaction conflicts under a two-phase locking protocol. However, reordering requires complex analysis of query dependencies.

Deterministic analysis of transactions to detect aborts [1, 17] has the advantage of deriving deeper, more precise knowledge of transaction structure, at a higher cost of analysis. In contrast, we choose a lightweight, imprecise analysis to detect evidence related to transaction aborts.

Zhang et al. [25] explore scheduling in main memory databases by considering different lightweight analyses of transactions and using a static representation for scheduling. We borrowed the notion of canonical representations of attributes from this work. In contrast, we use machine learning algorithms to learn distributions of aborts. In effect, our results show that a machine learning algorithm can learn these distributions to produce more intelligent and better performing schedulers.

In [4], the performance impact of scheduling based on the size of static HTTP requests to a server is investigated. The paper improves server response time by scheduling the draining of network socket buffers according to a Shortest Remaining Processing Time queuing policy instead of Round Robin or random. Priorities on the queues are set by the number of bytes left to read from a static file. That paper and our work both use scheduling to improve performance based on a feature that characterizes the work remaining to be done for an operation. The main difference is that the feature is a property of the request in that paper, whereas in our work the feature is a signal generated by the transaction processing system.

IC3 [14, 20] constructs a dependency graph and maintains it for running transactions to provide efficient serializability. The main similarity between this work and our approach is that both analyze dependencies between transactions to improve performance. However, our dependency structure is implied by the statistical relationships between references that occur in transactions, as opposed to an explicit dependency model.

Scheduling. One area of application program analysis focuses on improving performance (network latency in particular) through heuristic program analysis [7]. That work is complementary to ours, since it focuses on transforming the application to improve performance, as opposed to our scheduling approach.

Data and functionality partitioning. In H-Store [16], a stored-procedure main memory database system, the database is chopped into partitions and a thread controls a partition at any point in time. Transactions execute serially for a given partition. HyPer [5] also processes transactions using a single thread. It supports both OLTP and OLAP workloads (the latest version of HyPer uses optimistic multi-versioning [8]). In DORA [9], a transaction is split into sub-transactions and each sub-transaction is assigned to a single partition to execute.

Cheung et al. propose a technique for automatic code partitioning using program analysis [2]. By code partitioning they mean partitioning the application program between the application and stored procedures on the DBMS. The approach performs a program analysis of the application and then chops the program into pieces with respect to network latency. Some pieces are executed on the application server and other pieces are executed on the database server.
In effect, stored procedures are automatically defined through program analysis.

8 CONCLUSIONS
In this paper, we conjectured that (i) the patterns of transaction aborts could be learned through the use of machine learning models, and (ii) those models could be used to improve the scheduling of transactions to reduce aborts and increase throughput. The paper systematically explored both supervised and unsupervised machine learning models to create intelligent schedulers. In summary, the paper provides preliminary evidence that improved performance via intelligent scheduling is indeed possible in main memory database systems.

REFERENCES
[1] Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2014. Coordination Avoidance in Database Systems. Proc. VLDB Endow. 8, 3 (Nov. 2014), 185–196. https://doi.org/10.14778/2735508.2735509
[2] Alvin Cheung, Samuel Madden, Owen Arden, and Andrew C. Myers. 2012. Automatic Partitioning of Database Applications. Proc. VLDB Endow. 5, 11 (July 2012), 1471–1482. https://doi.org/10.14778/2350229.2350262
[3] Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: A Workload-driven Approach to Database Replication and Partitioning. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 48–57. https://doi.org/10.14778/1920841.1920853
[4] Mor Harchol-Balter, Bianca Schroeder, Nikhil Bansal, and Mukesh Agrawal. 2003. Size-based Scheduling to Improve Web Performance. ACM Trans. Comput. Syst. 21, 2 (May 2003), 207–233. https://doi.org/10.1145/762483.762486
[5] Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering (ICDE '11). IEEE Computer Society, Washington, DC, USA, 195–206. https://doi.org/10.1109/ICDE.2011.5767867
[6] Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, and Ippokratis Pandis. 2016. ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA.
[7] Amit Manjhi, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, and Anthony Tomasic. 2009. Holistic Query Transformations for Dynamic Web Applications. In Proceedings of the IEEE 25th International Conference on Data Engineering (ICDE '09).
[8] Thomas Neumann, Tobias Muhlbauer, and Alfons Kemper. 2015. Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 677–689. https://doi.org/10.1145/2723372.2749436
[9] Ippokratis Pandis, Ryan Johnson, Nikos Hardavellas, and Anastasia Ailamaki. 2010. Data-oriented Transaction Execution. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 928–939. https://doi.org/10.14778/1920841.1920959
[10] Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, et al. 2017. Self-Driving Database Management Systems. In CIDR.
[11] Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-aware Automatic Database Partitioning in Shared-nothing, Parallel OLTP Systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY, USA, 61–72. https://doi.org/10.1145/2213836.2213844
[12] Guna Prasaad, Alvin Cheung, and Dan Suciu. 2018. Improving High Contention OLTP Performance via Transaction Scheduling. https://arxiv.org/abs/1810.01997v1
[13] Kun Ren, Alexander Thomson, and Daniel J. Abadi. 2012. Lightweight Locking for Main Memory Database Systems. Proc. VLDB Endow. 6 (2012), 145–156.
[14] Dennis Shasha, Francois Llirbat, Eric Simon, and Patrick Valduriez. 1995. Transaction Chopping: Algorithms and Performance Studies. ACM Trans. Database Syst. 20, 3 (Sept. 1995), 325–363. https://doi.org/10.1145/211414.211427
[15] Yangjun Sheng, Tieying Zhang, Andrew Pavlo, and Anthony Tomasic. [n. d.]. Scheduling OLTP Transactions via Machine Learning. http://arxiv.org/abs/1903.02990
[16] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. 2007. The End of an Architectural Era: (It's Time for a Complete Rewrite). In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07). VLDB Endowment, 1150–1160. http://dl.acm.org/citation.cfm?id=1325851.1325981
[17] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. 2012. Calvin: Fast Distributed Transactions for Partitioned Database Systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/2213836.2213838
[18] Boyu Tian, Jiamin Huang, Barzan Mozafari, and Grant Schoenebeck. 2018. Contention-aware Lock Scheduling for Transactional Databases. Proc. VLDB Endow. 11, 5 (2018), 648–662.
[19] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-memory Databases. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 18–32. https://doi.org/10.1145/2517349.2522713
[20] Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, and Jinyang Li. 2016. Scaling Multicore Databases via Constrained Parallel Execution. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 1643–1658. https://doi.org/10.1145/2882903.2882934
[21] Jay-Louise Weldon. 2013. Data Base Administration. Springer Science & Business Media.
[22] Cong Yan and Alvin Cheung. 2016. Leveraging Lock Contention to Improve OLTP Application Performance. Proc. VLDB Endow. 9, 5 (2016), 444–455.
[23] Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael Stonebraker. 2014. Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. Proc. VLDB Endow. 8, 3 (Nov. 2014), 209–220. https://doi.org/10.14778/2735508.2735511
[24] Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, and Srinivas Devadas. 2016. TicToc: Time Traveling Optimistic Concurrency Control. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA.
[25] Tieying Zhang, Anthony Tomasic, Yangjun Sheng, and Andrew Pavlo. 2018. Performance of OLTP via Intelligent Scheduling. In 34th IEEE International Conference on Data Engineering (ICDE '18).