
CoroBase: Coroutine-Oriented Main-Memory Database Engine

Yongjun He, Simon Fraser University, [email protected]
Jiacheng Lu, Simon Fraser University, [email protected]
Tianzheng Wang, Simon Fraser University, [email protected]

ABSTRACT

Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching to hide data stalls by overlapping computation and asynchronous data prefetching. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching that requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much end-to-end benefit they bring under various workloads.

This paper presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefits of prefetching. We show that on a 48-core server, CoroBase can perform close to 2× better for read-intensive workloads and remain competitive for workloads that inherently do not benefit from software prefetching.

PVLDB Reference Format:
Yongjun He, Jiacheng Lu, and Tianzheng Wang. CoroBase: Coroutine-Oriented Main-Memory Database Engine. PVLDB, 14(3): 431-444, 2021.
doi:10.14778/3430915.3430932

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/sfu-dis/corobase/tree/v1.0.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 3 ISSN 2150-8097.
doi:10.14778/3430915.3430932

1 INTRODUCTION

Modern main-memory database engines [11, 22, 24, 25, 30, 33, 56, 59] use memory-optimized data structures [2, 29, 31, 36] to offer high performance on multicore CPUs. Many such data structures rely on pointer chasing [34] which can stall the CPU upon cache misses. For example, in Figure 1(a), to execute two SELECT (get) queries, the engine may traverse a tree, and if a needed tree node is not cache-resident, dereferencing a pointer to it stalls the CPU (dotted box in the figure) to fetch the node from memory. Computation (solid box) would not resume until data is in the cache. With the wide speed gap between CPU and memory, memory accesses have become a major overhead [4, 35]. The emergence of capacious but slower persistent memory [10] is further widening this gap.

[Figure 1 panels: (a) Sequential, (b) Multi-get, (c) CoroBase. Each panel shows a transaction that reads k1 and k2 and calls put(k3, 10) if v1 == 0, together with the execution timeline of transactions T1 and T2. Panels (a) and (c) use two get calls; panel (b) uses multi_get(k1, k2).]

Figure 1: Data access interfaces and execution under (a) sequential execution (no interleaving), (b) prior approaches that require multi-key interfaces, (c) CoroBase which hides data stalls and maintains backward compatibility.

Modern processors allow multiple outstanding cache misses and provide prefetch instructions [18] for software to explicitly bring data from memory to CPU caches. This gave rise to software prefetching techniques [5, 21, 26, 34, 39, 44, 45] that hide memory access latency by overlapping data fetching and computation, alleviating pointer chasing overhead. Most of these techniques, however, require hand-crafting asynchronous/pipelined algorithms or state machines to be able to suspend/resume execution as needed. This is a difficult and error-prone process; the resulting code often deviates a lot from the original code, making it hard to maintain [21].

1.1 Software Prefetching via Coroutines

With the recent standardization in C++20 [19], coroutines greatly ease the implementation of software prefetching. Coroutines [38] are functions that can suspend voluntarily and be resumed later. Functions that involve pointer chasing can be written as coroutines which are executed (interleaved) in batches. Before dereferencing a pointer in coroutine t1, the thread issues a prefetch followed by a suspend to pause t1 and switches to another coroutine t2, overlapping data fetching in t1 and computation in t2.

Compared to earlier approaches [5, 26], coroutines only require prefetch/suspend to be inserted into sequential code, greatly simplifying implementation while delivering high performance, as the switching overhead can be cheaper than a last-level cache miss [21]. However, adopting software prefetching remains challenging.

First, existing approaches typically use intra-transaction batching which mandates multi-key interfaces that can break backward compatibility. For example, in Figure 1(b) an application¹ uses multi_get to retrieve a batch of records at once in a transaction. Cache misses caused by probing k1 (k2) in a tree are hidden behind the computation part of probing k2 (k1). While intra-transaction batching is a natural fit for some operators (e.g., IN-predicate queries [44, 45]), it is not always directly applicable. Changing the application is not always feasible and may not achieve the desired improvement as dependent requests need to be issued in separate batches, limiting interleaving opportunities. Short (or even single-record) transactions also cannot benefit much due to the lack of interleaving opportunity. It would be desirable to allow batching operations across transactions, i.e., inter-transaction batching.

¹ The “application” may be another database system component or an end-user application that uses the record access interfaces provided by the database engine.

Second, prior work provided only piece-wise solutions, focusing on optimizing individual database operations (e.g., index traversal [21] and hash join [5, 44]). Despite the significant improvement (e.g., up to 3× faster for tree probing [21]), it was not clear how much overall improvement one can expect when these techniques are applied in a full database engine that involves various components.

Overall, these issues lead to two key questions:
• How should a database engine adopt coroutine-based software prefetching, preferably without requiring application changes?
• How much end-to-end benefit can software prefetching bring to a database engine under realistic workloads?

1.2 CoroBase

To answer these questions, we propose and evaluate CoroBase, a multi-version, main-memory database engine that uses coroutines to hide data stalls. The crux of CoroBase is a simple but effective coroutine-to-transaction paradigm that models transactions as coroutines, to enable inter-transaction batching and maintain backward compatibility. Worker threads receive transaction requests and switch among transactions (rather than requests within a transaction) without requiring intra-transaction batching or multi-key interfaces. As Figure 1(c) shows, the application remains unchanged as batching and interleaving happen at the transaction level.

Coroutine-to-transaction can be easily adopted to hide data stalls in different database engine components and can even work together with multi-key based approaches. In particular, in multi-version systems versions of data records are typically chained using linked lists [62], traversing which constitutes another main source of data stalls, in addition to index traversals. CoroBase transparently [...]

[...] necessitating new interfaces, and (2) understand its end-to-end benefits. Hand-crafted techniques usually present the performance upper bound; CoroBase strikes a balance between performance, programmability and backward compatibility.

1.3 Contributions and Paper Organization

We make four contributions. (1) We highlight the challenges for adopting software prefetching in main-memory database engines. (2) We propose a new execution model, coroutine-to-transaction, to enable inter-transaction batching and avoid interface changes while retaining the benefits of prefetching. (3) We build CoroBase, a main-memory multi-version database engine that uses coroutine-to-transaction to hide data stalls during index and version chain traversals. We explore the design tradeoffs by describing our experience of transforming an existing engine to use coroutine-to-transaction. (4) We conduct a comprehensive evaluation of CoroBase to quantify the end-to-end effect of prefetching under various workloads. CoroBase is open-source at https://github.com/sfu-dis/corobase.

Next, we give the necessary background in Section 2. Sections 3–4 then present the design principles and details of CoroBase. Section 5 quantifies the end-to-end benefits of software prefetching. We cover related work in Section 6 and conclude in Section 7.

2 BACKGROUND

This section gives the necessary background on software prefetching techniques and coroutines to motivate our work.

2.1 Software Prefetching

Although modern CPUs use sophisticated hardware prefetching mechanisms, they are not effective at reducing pointer-chasing overheads, due to the irregular access patterns in pointer-intensive data structures. For instance, when traversing a tree, it is difficult for hardware to predict and prefetch correctly the node which is [...]