Corobase: Coroutine-Oriented Main-Memory Database Engine

CoroBase: Coroutine-Oriented Main-Memory Database Engine Yongjun He Jiacheng Lu Tianzheng Wang Simon Fraser University Simon Fraser University Simon Fraser University [email protected] [email protected] [email protected] ABSTRACT transaction: transaction: transaction: v1 = get(k1) v1, v2 = v1 = get(k1) Data stalls are a major overhead in main-memory database engines v2 = get(k2) multi_get(k1, k2) v2 = get(k2) due to the use of pointer-rich data structures. Lightweight corou- if v1 == 0: if v1 == 0: if v1 == 0: tines ease the implementation of software prefetching to hide data put(k3, 10) put(k3, 10) put(k3, 10) stalls by overlapping computation and asynchronous data prefetch- T1 T1 T1 T2 . T1 T1 T1 T2 T2 T1 T1 T1 . ing. Prior solutions, however, mainly focused on (1) individual . Time T1 T1 T2 T2 T2 T2 components and operations and (2) intra-transaction batching that (a) Sequential (b) Multi-get (c) CoroBase requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much Figure 1: Data access interfaces and execution under (a) se- end-to-end beneft they bring under various workloads. quential execution (no interleaving), (b) prior approaches This paper presents CoroBase, a main-memory database engine that require multi-key interfaces, (c) CoroBase which hides that tackles these challenges with a new coroutine-to-transaction par- data stalls and maintains backward compatibility. adigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefts of prefetching. We show that bring data from memory to CPU caches. This gave rise to software × on a 48-core server, CoroBase can perform close to 2 better for prefetching techniques [5, 21, 26, 34, 39, 44, 45] that hide memory read-intensive workloads and remain competitive for workloads access latency by overlapping data fetching and computation, allevi- that inherently do not beneft from software prefetching. ating pointer chasing overhead. Most of these techniques, however, require hand-crafting asynchronous/pipelined algorithms or state PVLDB Reference Format: machines to be able to suspend/resume execution as needed. This is Yongjun He, Jiacheng Lu, and Tianzheng Wang. CoroBase: Coroutine-Oriented Main-Memory Database Engine. PVLDB, 14(3): a difcult and error-prone process; the resulted code often deviates 431-444, 2021. a lot from the original code, making it hard to maintain [21]. doi:10.14778/3430915.3430932 1.1 Software Prefetching via Coroutines PVLDB Artifact Availability: With the recent standardization in C++20 [19], coroutines greatly The source code, data, and/or other artifacts have been made available at ease the implementation of software prefetching. Coroutines [38] htps://github.com/sfu-dis/corobase/tree/v1.0. are functions that can suspend voluntarily and be resumed later. Functions that involve pointer chasing can be written as coroutines 1 INTRODUCTION which are executed (interleaved) in batches. Before dereferencing Modern main-memory database engines [11, 22, 24, 25, 30, 33, 56, a pointer in coroutine �1, the thread issues a prefetch followed 59] use memory-optimized data structures [2, 29, 31, 36] to ofer by a suspend to pause �1 and switches to another coroutine �2, high performance on multicore CPUs. Many such data structures overlapping data fetching in �1 and computation in �2. rely on pointer chasing [34] which can stall the CPU upon cache Compared to earlier approaches [5, 26], coroutines only require misses. For example, in Figure 1(a), to execute two SELECT (get) prefetch/suspend be inserted into sequential code, greatly simpli- queries, the engine may traverse a tree, and if a needed tree node fying implementation while delivering high performance, as the is not cache-resident, dereferencing a pointer to it stalls the CPU switching overhead can be cheaper than a last-level cache miss [21]. (dotted box in the fgure) to fetch the node from memory. Compu- However, adopting software prefetching remains challenging. tation (solid box) would not resume until data is in the cache. With First, existing approaches typically use intra-transaction batch- the wide speed gap between CPU and memory, memory accesses ing which mandates multi-key interfaces that can break backward have become a major overhead [4, 35]. The emergence of capacious compatibility. For example, in Figure 1(b) an application1 uses but slower persistent memory [10] is further widening this gap. multi_get to retrieve a batch of records at once in a transac- Modern processors allow multiple outstanding cache misses tion. Cache misses caused by probing k1 (k2) in a tree are hid- and provide prefetch instructions [18] for software to explicitly den behind the computation part of probing k2 (k1). While intra- transaction batching is a natural ft for some operators (e.g., IN- This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit htps://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of predicate queries [44, 45]), it is not always directly applicable. this license. For any use beyond those covered by this license, obtain permission by Changing the application is not always feasible and may not achieve emailing [email protected]. Copyright is held by the owner/author(s). Publication rights the desired improvement as depending requests need to be issued in licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 14, No. 3 ISSN 2150-8097. doi:10.14778/3430915.3430932 1The “application” may be another database system component or an end-user application that uses the record access interfaces provided by the database engine. 431 separate batches, limiting interleaving opportunities. Short (or even necessitating new interfaces, and (2) understand its end-to-end single-record) transactions also cannot beneft much due to the lack benefts. Hand-crafted techniques usually present the performance of interleaving opportunity. It would be desirable to allow batching upper bound; CoroBase strikes a balance between performance, operations across transactions, i.e., inter-transaction batching. programmability and backward compatibility. Second, prior work provided only piece-wise solutions, focusing on optimizing individual database operations (e.g., index traver- 1.3 Contributions and Paper Organization sal [21] and hash join [5, 44]). Despite the signifcant improvement We make four contributions. 1 We highlight the challenges for × (e.g., up to 3 faster for tree probing [21]), it was not clear how much adopting software prefetching in main-memory database engines. overall improvement one can expect when these techniques are 2 We propose a new execution model, coroutine-to-transaction, applied in a full database engine that involves various components. to enable inter-transaction batching and avoid interface changes Overall, these issues lead to two key questions: while retaining the benefts of prefetching. 3 We build CoroBase, a • How should a database engine adopt coroutine-based software main-memory multi-version database engine that uses coroutine-to- prefetching, preferably without requiring application changes? transaction to hide data stalls during index and version chain traver- • How much end-to-end beneft can software prefetching bring to sals. We explore the design tradeofs by describing our experience a database engine under realistic workloads? of transforming an existing engine to use coroutine-to-transaction. 4 We conduct a comprehensive evaluation of CoroBase to quan- 1.2 CoroBase tify the end-to-end efect of prefetching under various workloads. CoroBase is open-source at htps://github.com/sfu-dis/corobase. To answer these questions, we propose and evaluate CoroBase, a Next, we give the necessary background in Section 2. Sections 3– multi-version, main-memory database engine that uses coroutines 4 then present the design principles and details of CoroBase. Sec- to hide data stalls. The crux of CoroBase is a simple but efec- tion 5 quantifes the end-to-end benefts of software prefetching. tive coroutine-to-transaction paradigm that models transactions as We cover related work in Section 6 and conclude in Section 7. coroutines, to enable inter-transaction batching and maintain backward compatibility. Worker threads receive transaction requests 2 BACKGROUND and switch among transactions (rather than requests within a transaction) without requiring intra-transaction batching or multi-key This section gives the necessary background on software prefetch- interfaces. As Figure 1(c) shows, the application remains unchanged ing techniques and coroutines to motivate our work. as batching and interleaving happen at the transaction level. Coroutine-to-transaction can be easily adopted to hide data stalls 2.1 Software Prefetching in diferent database engine components and can even work to- Although modern CPUs use sophisticated hardware prefetching gether with multi-key based approaches. In particular, in multi- mechanisms, they are not efective on reducing pointer-chasing version systems versions of data records are typically chained using overheads, due to the irregular access patterns in pointer-intensive linked lists [62], traversing which constitutes another main source data structures. For instance, when traversing a tree, it is difcult of data stalls, in addition to index traversals. CoroBase transpar- for hardware to predict and prefetch correctly the node which is ently

Corobase: Coroutine-Oriented Main-Memory Database Engine

The Early History of Smalltalk

Object-Oriented Programming in the Beta Programming Language

CPS323 Lecture: Concurrency Materials: 1. Coroutine Projectables

A Low-Cost Implementation of Coroutines for C

Kotlin Coroutines Deep Dive Into Bytecode #Whoami

Csc 520 Principles of Programming Languages Coroutines Coroutines

Practical Perl Tools Parallel Asynchronicity, Part 2

Coroutines and Generators Chapter Three

Deriving Type Systems and Implementations for Coroutines

CS164: Introduction to Programming Languages and Compilers, Spring 2012 Thibaud Hottelier UC Berkeley

Specialising Dynamic Techniques for Implementing the Ruby Programming Language

Language Design and Implementation Using Ruby and the Interpreter Pattern