Nr.: FIN-003-2018 Thread Management for High Performance Database Systems - Design and Implementation

Robert Jendersie, Johannes Wuensche, Johann Wagner, Marten Wallewein-Eising, Marcus Pinnecke, Gunter Saake Arbeitsgruppe Database and Software Engineering

Fakultät für Informatik Otto-von-Guericke-Universität Magdeburg

Nr.: FIN-003-2018

Thread Management for High Performance Database Systems - Design and Implementation

Robert Jendersie, Johannes Wuensche, Johann Wagner, Marten Wallewein-Eising, Marcus Pinnecke, Gunter Saake

Arbeitsgruppe Database and Software Engineering

Technical report (Internet) Elektronische Zeitschriftenreihe der Fakultät für Informatik der Otto-von-Guericke-Universität Magdeburg ISSN 1869-5078

Fakultät für Informatik Otto-von-Guericke-Universität Magdeburg

Imprint (§ 5 TMG)

Publisher: Otto-von-Guericke-Universität Magdeburg, Fakultät für Informatik, The Dean

Responsible for this issue: Otto-von-Guericke-Universität Magdeburg, Fakultät für Informatik, Marcus Pinnecke, Postfach 4120, 39016 Magdeburg, E-Mail: [email protected], http://www.cs.uni-magdeburg.de/Technical_reports.html

Technical report (Internet) ISSN 1869-5078

Editorial deadline: 21.08.2018

Distribution: Otto-von-Guericke-Universität Magdeburg, Fakultät für Informatik, Dekanat (Dean's Office)

Thread Management for High Performance Database Systems - Design and Implementation

Technical Report

Robert Jendersie1, Johannes Wuensche2, Johann Wagner1, Marten Wallewein-Eising2, Marcus Pinnecke1, and Gunter Saake1 Database and Software Engineering Group, Otto-von-Guericke University Magdeburg, Germany firstname.lastname@(1ovgu | 2st.ovgu).de

Abstract—Processing tasks in parallel is used in nearly all applications to keep up with the requirements of modern software systems. However, the current implementation of parallel processing in GECKODB, a graph database system developed in our group, requires spawning many short-lived threads that execute a single task and then terminate. This creates a lot of overhead since the threads are not being reused. To counter this effect, we implemented a thread pool to be used in GECKODB to reduce the additional time required to create and close thousands of threads. In this paper, we show our implementation of a thread pool to process independent tasks in parallel, including the waiting for a set of processed tasks. We also compare the thread pool implementation against the current one. Additionally, we show that the task and thread pool configuration, depending on the use case, has a high impact on the thread pool performance. At the end of our evaluation, we show that the implementation fulfils all given requirements and generally reduces overhead compared to creating and spawning threads individually.

Index Terms—Database Management System, Architecture of Parallel Processing Systems, Thread and Task Management

I. INTRODUCTION

Since the amount of data that is stored and processed by modern database systems is growing fast, sequential data processing as the only option is inconceivable [1]. Hence, applications and systems have to process data in parallel to reach sufficient throughput and to fulfil the respective requirements [2]–[4], which in turn affects the architectural design and query processing of modern database systems [5]–[9].

The growing number of cores of both traditional CPUs and trending co-processors like GPUs requires re-evaluating core design decisions in data-intensive systems. For instance, Wu et al. examined the scalability of multi-version concurrency control (MVCC) for a growing number of threads in 2017 [10]. They concluded that multi-threading in particular and modern hardware in general both promise notable performance gains for high-performance database systems using MVCC. However, they also observed that there is no clear winning combination of concurrency control protocol, version storage, garbage collection, and index management strategy in MVCC.

For this paper, we focus on managing a far lower level: general parallel data processing. In particular, with this paper we provide insights into our thread managing strategy in GECKODB/BOLSTER, its design and its implementation.

Parallel data processing can be achieved by different approaches, like instruction and data parallelism or multi-threading. We focus on multi-threading by implementing a thread pool for the graph database system GECKODB (source code repository: https://github.com/geckodb). The thread pool will be integrated into BOLSTER, a high-performance library for parallel execution of primitives like for-loops or filter operations on large data sets. BOLSTER has similarities to Intel TBB [11], a C++ template library for parallel programming, but BOLSTER is written from scratch to perfectly fit into GECKODB's specialized storage engine and vectorized query engine, both written in C11 [12].

In the current implementation, BOLSTER creates a fixed number of threads for each call of a primitive. This approach is called thread-per-request. Since many primitives are executed at the same time, a couple of drawbacks arise from this implementation. First of all, the creation of threads comes along with overhead like stack initialization and memory allocation. Secondly, creating a huge number of threads simultaneously may lead to large context-switch overhead in the scheduler. Additionally, debugging and profiling applications that create many threads during runtime is time consuming.

To overcome these drawbacks, we integrate an optimized thread pool in BOLSTER. Along with the implementation, we measure the performance of the primitives to determine the thread pool overhead. Additionally, we measure metrics like waiting and busy time of threads to evaluate correct thread pool sizes for the considered use cases.

In this work we make the following contributions:
• Design and Implementation: We describe our design and implementation of the thread pool.
• Waiting Strategies: We evaluate the possibility to wait for a group of tasks in the calling thread.
• Evaluation: We compare our thread pool against the existing implementation in BOLSTER.
• Statistics per Configuration: We measure and evaluate additional statistics for different thread pool configurations.

We organized the rest of the paper as follows. In Section II, we give preliminaries about the considered task configuration and about thread-safe access of memory. In Section III, we show our design and implementation of the thread pool, and we examine our experimental environment in Section IV. In Section V, we describe the results of our performance evaluation. In Section VI, we name related work, and we state our conclusion and future work in Section VII.

II. PRELIMINARIES

In this section, we define our configuration of tasks that are processed by the thread pool and state difficulties of synchronizing thread access to memory.

A. Task Configuration

We define a task as a structure referencing data that has to be processed and an operation that has to be executed on the data. In this work, we define tasks as independent, which means tasks do not have dependencies on other tasks and can be processed independently. Furthermore, we expect that the data passed to two tasks is stored in different memory locations. Consequently, while executing the task operation, threads do not access the same memory locations.

Each task can be enqueued with a priority. The priority zero is the highest, and such a task will be processed by the next free thread. The higher the priority value of a task is, the further back it is placed in the queue. Additionally, we only consider non-preemptable tasks. Once a task is assigned to a thread, the thread will finish the operation of the task before getting a new one.

B. Synchronizing Memory Access from Threads

Parallelism with multi-threading works well as long as each thread works on a separate memory location. During task scheduling, the scheduler has to know the state of each thread. This can be solved with signal handling or by writing the current thread state into main memory. In this work, we implement the second approach to avoid having another thread that only schedules tasks to other threads.

Thread-safe access to memory can be achieved by different approaches. Firstly, a mutex can manage the access for multiple threads. Secondly, atomic operations ensure safe access from different threads to the same memory. We use both approaches for different parts of the thread pool. Since using a mutex can decrease access performance, we decided to execute atomic operations on the memory containing the thread state information. We use a mutex to synchronize the enqueueing of tasks into the priority queue.

III. IMPLEMENTATION

In this section, we describe in detail the design and implementation of our thread pool system. We show how tasks are enqueued regarding their priority and how the waiting for tasks is implemented. Additionally, we state our thread metrics and show how they are integrated in the thread pool design.

A. Architecture of the Thread Pool

Compared to simply creating threads on demand, managing threads in a pool comes along with memory and CPU overhead. The thread pool must know the state of each thread and its assigned task. To measure metrics of threads and tasks, additional memory for threads and tasks is required.

In Figure 1, we show our design of the whole thread pool system, containing the thread pool itself, the task priority queue and the performance monitoring. Since measuring performance metrics leads to memory and CPU overhead, we decided to exclude the performance measurements from the thread pool to make them optional. The performance monitoring can be activated by a boolean parameter of the thread pool create functions. Consequently, in the target database system, the designers can decide for each thread pool instance if performance monitoring should be applied.

The thread pool system includes an array of threads and a priority queue to store the passed tasks. We decided to implement the thread pool using the POSIX thread library to provide the thread pool for multiple operating systems like Linux and Unix. Additionally, we avoid using custom compiler flags to ensure that the thread pool can be compiled with different compilers. The thread pool itself contains a variable number of threads. Since Xu et al. [13] and Ling et al. [14] show the importance of an accurate thread pool size, we add a resizing function to our thread pool, which enables changing the number of threads at runtime.

B. Thread Pool and Task Queue

Threads and tasks are the two major entities in the thread pool system. As we can observe from Figure 1, thread pool and task queue are two data structures used to store the required information related to the threads and the submitted tasks, respectively. Each spawned thread can be either working on a task or waiting for the next task. Consequently, we define the states of a thread as busy and waiting. A busy thread changes to waiting after processing its task if no other task is enqueued. Waiting threads do busy waiting until new tasks are available.

Each assigned task contains a function pointer to a routine and a pointer to data. We implemented the task queue as a generic priority queue utilizing an implicit heap. To ensure thread safety, a mutex is used while operating on the queue. Every time a new task is enqueued to the task queue, two scenarios can occur. If the new task has a priority, it is placed in the right slot of the task queue regarding its priority and the subsequent tasks are relocated. If the new task has no priority, it is appended to the end of the queue. Furthermore, tasks can refer to a handle, which is used by the thread pool to wait for a specific set of tasks.

With an increasing number of threads, interactions with the queue could bottleneck the task throughput. An alternative lock-free priority queue [15] based on a skip list is considered to improve on this.

Fig. 1. Design of the thread pool system
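To make the entities of Figure 1 concrete, the following minimal sketch shows how the task and thread bookkeeping described in Sections II and III could look in C11. The task fields routine and args follow the benchmark code in Figure 6; the state enum, the priority and handle fields, and the polling worker loop are illustrative assumptions, not the actual GECKODB/BOLSTER definitions.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    /* A task references an operation and the data it works on (cf. Section II-A). */
    typedef struct thread_task_t {
        void   (*routine)(void *);   /* operation to execute (field name as in Figure 6) */
        void    *args;               /* data processed by the routine (as in Figure 6) */
        size_t   priority;           /* 0 = highest priority (assumed field) */
        size_t   group_handle;       /* slot-map handle for waiting (assumed field) */
    } thread_task_t;

    /* Threads are either executing a task or busy-waiting for the next one. */
    typedef enum { THREAD_WAITING, THREAD_BUSY } thread_state_e;

    typedef struct pool_thread_t {
        pthread_t    thread;
        _Atomic int  state;          /* updated atomically (cf. Section III-E) */
    } pool_thread_t;

    /* Simplified worker loop: first come first serve, no explicit scheduler thread.
       priority_queue_try_pop() stands in for the mutex-protected heap of the pool. */
    extern int priority_queue_try_pop(thread_task_t *out);

    static void *worker_main(void *arg)
    {
        pool_thread_t *self = arg;
        thread_task_t task;
        for (;;) {                                  /* termination/resize marks omitted */
            if (priority_queue_try_pop(&task)) {
                atomic_store(&self->state, THREAD_BUSY);
                task.routine(task.args);            /* run to completion (non-preemptable) */
                atomic_store(&self->state, THREAD_WAITING);
            }
            /* otherwise: busy-wait and poll the queue again */
        }
        return NULL;
    }

A full implementation would additionally decrement the open-task counter of the task's handle after the routine returns and honour the termination marks used for resizing (Section III-E).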

TABLE I FUNCTIONS PROVIDED BY THE THREAD POOL TO PROCESS TASKS

thread_pool_enqueue_task: Enqueue a task with an optional handle
thread_pool_enqueue_tasks: Enqueue several tasks with an optional handle
thread_pool_wait_for_task: Waits until the tasks referenced by the handle are completed
thread_pool_enqueue_tasks_wait: Enqueue tasks and wait until they are finished; the main thread also participates in task execution
thread_pool_wait_for_all: Wait until all enqueued tasks are finished; the main thread also participates in task execution
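As a usage illustration for the functions in Table I, the following sketch processes a batch of tasks with a reusable pool. The calls thread_pool_create, thread_pool_enqueue_task, thread_pool_wait_for_all and thread_pool_free appear in Figure 6; interpreting the second create parameter as the performance-monitoring flag and the third enqueue parameter as the optional handle are assumptions based on the surrounding text, not verified signatures.

    #include <stddef.h>

    /* Opaque types and functions provided by the thread pool (as used in Figure 6). */
    typedef struct thread_pool_t thread_pool_t;
    typedef struct thread_task_t thread_task_t;

    extern thread_pool_t *thread_pool_create(size_t num_threads, int enable_monitoring);
    extern void thread_pool_enqueue_task(thread_task_t *task, thread_pool_t *pool, void *handle);
    extern void thread_pool_wait_for_all(thread_pool_t *pool);
    extern void thread_pool_free(thread_pool_t *pool);

    static void process_batch(thread_task_t *tasks, size_t n)
    {
        /* One pool instance is meant to be reused for many requests. */
        thread_pool_t *pool = thread_pool_create(8, 1);

        for (size_t i = 0; i < n; ++i) {
            /* NULL handle: the caller does not wait for this task individually. */
            thread_pool_enqueue_task(&tasks[i], pool, NULL);
        }

        /* The calling thread helps executing tasks and returns once all finished. */
        thread_pool_wait_for_all(pool);

        thread_pool_free(pool);
    }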

C. Processing Tasks

Processing tasks in a thread pool is a balancing act between the generalization of tasks and the performance of their execution. With higher generalization, the implementation can be used for more use cases, but the execution performance may decrease. Therefore, we provide a set of functions to process tasks in the thread pool to match the different requirements of BOLSTER. In Table I, we show the functions provided by our thread pool to process tasks. One of the requirements of BOLSTER is waiting until a set of tasks is finished. To achieve at least the same performance as the baseline implementation, we let the main thread first participate in the execution of the tasks and then wait until all tasks with the same handle are finished.

To show the processing of tasks, we consider the function thread_pool_enqueue_tasks_wait. In Figure 2, we show the flow of task enqueueing and execution in the thread pool system using this function. In the first step, tasks are passed as a parameter to the enqueue function provided by the thread pool. The tasks have to be created before passing and are later freed by the calling function after processing. We provide different functions to enqueue a single task or an array of tasks.

In the next step, the tasks are enqueued into the priority queue of the thread pool. Before enqueueing, the function thread_pool_enqueue_tasks_wait creates a handle and links all tasks to it. After the tasks have been enqueued to the priority queue, the waiting threads in the thread pool pop the enqueued tasks from the priority queue and execute the task routine. We decided to let threads look for new tasks instead of explicitly assigning them, to reduce the scheduling overhead. We call this approach first come first serve. This happens asynchronously to the calling function and therefore is not considered as an own step of the process. The calling thread gets one of the passed tasks and processes it. After finishing the task, the calling thread waits until all other tasks referring to the created handle are finished and then returns to the calling function. In the next section, we show the waiting for tasks with handles in detail.

Fig. 2. Workflow of task enqueueing and execution in the thread pool (enqueue_tasks() passes the tasks to the queue, worker threads pop() and execute them, and wait_for_id() returns to the calling function once all passed tasks are finished)

D. Waiting for Tasks

As mentioned before, the thread pool provides the functionality to let the calling function participate in the task execution and wait until all passed tasks are processed. To wait for a set of tasks, it is not sufficient to look first into the priority queue for the tasks and then into the state of the thread pool, since popping tasks out of the queue is not an atomic operation. Consequently, tasks can have a state between being enqueued in the queue and being executed by a thread.

To solve this problem, we implement the waiting for a set of tasks using a slot-map. In Figure 3, we show the concept of our slot-map. During the enqueueing process of tasks, a set of tasks is mapped to a slot in the slot-map. Each slot contains the number of open tasks to be finished and a generation counter. We name the number of open tasks openTaskCount and the generation counter genCount. Every time a set of tasks is enqueued, a slot with openTaskCount = 0 is selected and its genCount increases. Every time a thread finishes a task, the openTaskCount of the referred slot is decreased. As mentioned before, multiple tasks can refer to a single handle. This handle also refers to a slot and additionally has an assigned generation value. The waiting algorithm loops over the following conditions before returning to the calling function: it checks whether the number of open tasks is zero or whether the generation counter of this slot has changed. If one of these conditions is true, the waiting algorithm returns to the calling function.

Fig. 3. Concept of waiting for tasks using a slot-map (tasks and handles refer to slots, each holding the pair openTaskCount, genCount)

To clarify the procedure of the waiting algorithm, we consider the second slot in Figure 3. The tasks 10, 12, and 13 as well as the first handle refer to this slot. Consequently, these tasks were enqueued referring to the first handle. Before any of the three tasks is processed, the slot has the following state: openTaskCount = 3 and genCount = 2. Since the waiting thread does not know when tasks are executed due to the asynchronous behaviour, the waiting algorithm keeps looping as long as the following condition is true: openTaskCount != 0 && genCount == 2. For each finished task, openTaskCount is decreased. It is not sufficient to check only openTaskCount == 0, since the slot can be reused before the waiting algorithm notices that openTaskCount reached zero. Consequently, we check whether the slot stays in the same generation. If the slot is reused, genCount increases and the second part of the condition becomes false.

Since openTaskCount has to be altered from different threads, the required decrements are ensured to be atomic. This is in opposition to the usual goal of slot-maps to improve cache locality. Atomics on x86 usually prevent their whole cache line from being used while they are changed. Thus a sparse distribution of slots should reduce collisions. To quantify this difference, the baseline implementation searches for free slots linearly from the beginning, while an alternative approach tries to distribute the slots over multiple cache lines. In Figure 4, we show implementations for both search approaches.

    size_t ind = 0;
    // linear search, compact slot usage
    for (; ind < pool->task_state_capacity && pool->task_group_states[ind].task_count; ++ind);

    // distributed over different cache lines
    for (; pool->task_group_states[ind].task_count; ind = (ind + 8) % MAX_NUM_TASKS);

Fig. 4. Different implementations for slot searching
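The wait-for-handle logic described above can be sketched as follows. The structure name task_group_state and the field task_count follow the code in Figure 4; the generation field, the handle type and the function names are illustrative assumptions rather than the actual implementation.

    #include <stdatomic.h>
    #include <stddef.h>

    /* One wait-slot: an atomic count of open tasks plus a generation counter. */
    typedef struct task_group_state_t {
        atomic_size_t task_count;   /* openTaskCount */
        atomic_size_t gen;          /* genCount, bumped whenever the slot is reused */
    } task_group_state_t;

    /* A handle remembers which slot it uses and the generation it was issued for. */
    typedef struct task_handle_t {
        size_t slot;
        size_t gen;
    } task_handle_t;

    /* Called by a worker (or the main thread) after finishing one task of the group. */
    static void task_group_finish_one(task_group_state_t *slot)
    {
        atomic_fetch_sub(&slot->task_count, 1);
    }

    /* Waiting loop: return once all tasks are done or the slot was already reused. */
    static void wait_for_handle(task_group_state_t *slots, task_handle_t h)
    {
        while (atomic_load(&slots[h.slot].task_count) != 0 &&
               atomic_load(&slots[h.slot].gen) == h.gen) {
            /* busy-wait; the calling thread may also pop and execute tasks here */
        }
    }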

E. Managing Thread States

As mentioned before, accessing the same memory location from different threads may lead to race conditions, which means the program behavior depends on the accessing order of the threads. For example, while one thread updates data at a specific memory location, another thread writes into the data, which results in inconsistency. In Section II, we presented two approaches to handle multi-threaded access to the same memory location, mutexes and atomic operations, and we stated where those approaches are used in our implementation.

In this subsection, we focus on atomic operations to manage thread states. We consider the following steps for a thread to update its state: at first, the current state has to be checked and, depending on the current state, it has to be updated. If these steps are implemented using an if statement and a separate update function, the complete procedure is not atomic. Consequently, another thread can update the state between the check of the current state and the update. To avoid this, we use the function atomic_compare_exchange_strong [16] of the stdatomic library. This function combines the check and the update in one atomic procedure. We implemented every thread state access with this function to ensure atomic thread state updates.

This approach enables efficient dynamic and lazy resizing of the thread pool. When the number of threads is reduced, the now redundant threads are marked to finish after completing their current task. If later on the number is increased again, still running but marked threads can be enabled again, ensuring that few threads have to be created and that the number of running threads will not temporarily exceed the requested size. After presenting the enqueueing, processing, and waiting for tasks, we show the second component of our thread pool system, the performance monitoring.
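Below is a minimal sketch of such an atomic check-and-update, assuming the thread state is stored in an _Atomic int; the state constants and helper names are invented for illustration, and only the use of atomic_compare_exchange_strong from stdatomic.h follows the approach described above.

    #include <stdatomic.h>
    #include <stdbool.h>

    enum { THREAD_WAITING = 0, THREAD_BUSY = 1, THREAD_WILL_TERMINATE = 2 };

    /* Atomically move a thread from `expected` to `desired`. Returns false if another
       thread changed the state in the meantime (e.g., marked it for termination). */
    static bool thread_state_transition(_Atomic int *state, int expected, int desired)
    {
        /* atomic_compare_exchange_strong combines the check and the update:
           the write only happens if *state still equals `expected`. */
        return atomic_compare_exchange_strong(state, &expected, desired);
    }

    /* Example: a worker claims a task only if it is still in the waiting state. */
    static bool try_become_busy(_Atomic int *state)
    {
        return thread_state_transition(state, THREAD_WAITING, THREAD_BUSY);
    }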
F. Performance Monitoring

The performance monitoring component of the thread pool system is responsible for collecting data about threads, tasks, and the thread pool instance. We will use the information collected in this component for analysis and performance optimization in later stages. As mentioned before, the performance monitoring is an optional component, since evaluating the following statistics can reduce the execution performance of our thread pool. We divide the collected statistics into the sections task statistics, thread statistics, and thread pool statistics.

1) Task Statistics: Collecting statistics about the time tasks spend in the priority queue or during the execution is useful to tune the thread pool configuration for better performance results. In Table II, we show the statistics that are collected for each task.

TABLE II TASK STATISTICS WITH THEIR INTERNAL NAMES AND DESCRIPTIONS

enqueue_time: Time when the task was enqueued
execution_time: Time when the task execution begins
complete_time: Time when the task execution is finished

How long a task waits in the queue before being executed is a good metric to adjust the task priority or the thread pool size. For example, if an enqueued task with priority 0 still has a large difference between enqueue_time and execution_time, the thread pool size may be too small. Otherwise, if a large number of tasks with low priority have small differences between enqueue_time and execution_time, this may indicate that the thread pool size is too large. A better indicator of non-optimized thread pool sizes are statistics per thread.

2) Thread Statistics: As mentioned before, adjusting the thread pool size based on task statistics may not be very precise. Consequently, collecting statistics per thread is necessary for a good evaluation of the relation between tasks and threads. In Table III, we show our statistics collected for each thread in a thread pool instance.

TABLE III THREAD STATISTICS WITH THEIR INTERNAL NAMES AND DESCRIPTIONS

waiting_time: Time the thread spends waiting (ms)
busy_time: Time the thread spends executing tasks (ms)
task_count: Number of tasks the thread has executed

The busy time of a thread strongly depends on the executed tasks and therefore is a metric that gives less information about the thread pool itself. In contrast, the waiting time is a good indicator to reveal less used threads in the thread pool and to determine an optimal number of threads for the current use case. Furthermore, the task count of a thread compared to the number of tasks processed by the thread pool indicates the degree of usage of the thread. To give further possibilities for optimizing the thread pool, we also collect statistics for each thread pool instance.

3) Thread Pool Statistics: To consider the thread pool instance, an important metric is the average waiting time of tasks. In order to optimize the thread pool size, a balanced relation between the waiting time of tasks and the number of threads must be found. Furthermore, at a specific time, the relation between enqueued and completed tasks can be interesting. In Table IV, we show our statistics collected for each thread pool instance. We will use these statistics to adapt our thread pool implementation to different use cases later on.

TABLE IV THREAD POOL STATISTICS WITH THEIR INTERNAL NAMES AND DESCRIPTIONS

working_time: Time the thread pool runs (ms)
task_complete_count: Number of tasks the thread pool has completed
task_enqueued_count: Number of tasks the thread pool has enqueued
avg_complete_time: Average time span from enqueueing until execution of tasks
avg_wait_time: Average time tasks spend in the queue

G. Testing and Memory Check

As the last aspect of our implementation, we show how we test the thread pool and how we ensure to avoid memory leaks. Since GECKODB already uses GTEST [17] for unit testing, we also wrote our tests using this framework. GTEST provides a simple, powerful library to perform unit tests in C++. We wrote unit tests for the priority queue, the thread pool itself, and the performance monitoring. As mentioned before, we use functions from the stdatomic library of C. Using this library in GTEST results in compiler errors with GCC, since the C atomic implementation differs from the C++ one of the standard library and the GCC developers will not fix this issue. Consequently, we wrote a workaround that does not use the C stdatomic library in the GTEST compilation unit.

Since C as a low-level language does not have its own automatic memory management, memory leaks may occur when using heap allocations. To find and remove all memory leaks in our implementation, we use the tool VALGRIND, which offers different functionalities to find race and jump conditions and to count unfreed heap blocks. We found memory leaks in our priority queue tests and also in the deallocation of thread pool instances. In the following sections, we show our experimental setup and the results of our evaluation.

IV. EXPERIMENTAL ENVIRONMENT

In this section, we present our experimental environment, focusing on the design of our benchmarks and the system configuration we used to evaluate our thread pool implementation.

A. Evaluation Setup

In our evaluation setup, we focus primarily on the comparison of our thread pool against the baseline implementation in BOLSTER, which creates new threads at runtime for each request. We simulate calls of BOLSTER primitives with different thread pool and task configurations and compare the results against the baseline.

Additionally, we evaluate the metrics for threads, tasks and the thread pool mentioned in Section III-F regarding the thread pool and task configuration used in the BOLSTER primitives. To evaluate optimal configurations for the thread pool, we consider the relation between waiting and completion time of thread pool instances with different sizes.

B. Experimental Configuration

For testing purposes we used primarily LINUX setups, for one UBUNTU 17.04 (LINUX 4.13) in a virtual machine environment, second ARCH LINUX (LINUX 4.17.3-1) running natively, both updated on the stable branch. Tests have also been conducted on WINDOWS 10 and MACOS HIGH SIERRA machines.

In our setups, SKYLAKE and KABYLAKE as well as RYZEN processors have been used, with memory ranging from 8 to 24 GB of both DDR3-SODIMM and DDR4-SODIMM.
All benchmarks have been repeated three times and the median value of all repetitions is used for evaluation.

V. EVALUATION

In this section, we present the results of our evaluation according to the setup we stated in the previous section. We start with an introduction of the benchmark methods used to compare our thread pool against the baseline implementation and present the results. Furthermore, we show the relation of waiting time and completion time regarding different thread pool configurations.

A. Validity of Results

Before presenting our evaluation results, we give some preliminaries about the validity of our results. At first, all measurements are executed on all machines we listed in our experimental configuration, but in this work we present only the results measured on ARCH LINUX with kernel 4.17.3-1 and 24 GB DDR4-SODIMM. We decided to use this system since it is running a current kernel version and not in a virtual environment. Additionally, we use gnuplot to generate diagrams from our results; consequently, using a Windows machine is not optimal.

Secondly, the measured results represent the usage of our thread pool implementation, but may differ strongly in other applications. Depending on the number of threads used outside the thread pools and the ability to parallelize work, the performance may increase less than shown in our tests.

Thirdly, measuring multi-threading performance highly depends on the current workload of the system and the scheduler. We observe this in our evaluation since tests differ strongly in their results. Although we average the measurements for each benchmark, they represent the performance increase at the current system workload.

B. Comparison against Baseline Implementation

As the most important aspect of our evaluation, we compare our thread pool against an approximated version of the baseline implementation. To justify the performance comparison results, we present the different benchmark implementations in detail before we show the evaluation results.

1) Benchmark Implementation: Since BOLSTER is deeply integrated into GECKODB and the workload and usage of multi-threading depends on the graph data in the system, we cannot directly use the BOLSTER implementation for our evaluation. Thus, we extracted the respective code parts into an own implementation and use this for a direct comparison against one that uses our thread pool.

In Figure 5, we show the approximated version of the BOLSTER multi-threading code that we use as baseline implementation. In our benchmark, the method cmp_baseline is called multiple times with a different numThreads parameter. Within each call, numThreads threads are created and all of these execute the benchmark method work_large, which calculates a huge amount of exponential values in a loop. After the calling thread takes part in the execution, it waits until all created threads are joined. The elapsed processing time is measured from the beginning of the method until the join of all threads has finished.

    void cmp_baseline(int numThreads) {
        struct timespec begin, end;
        clock_gettime(CLOCK_MONOTONIC, &begin);
        double results[numThreads];
        pthread_t threads[numThreads];
        for (int tid = 1; tid < numThreads; tid++) {
            results[tid] = (double)(tid + 1);
            pthread_create(&threads[tid], NULL, &work_large, &results[tid]);
        }
        // Let the calling thread process one task before waiting
        work_large(&results[0]);
        for (int tid = 1; tid < numThreads; tid++) {
            pthread_join(threads[tid], NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        test_results_baseline[numThreads] += __get_time_diff(&begin, &end);
    }

Fig. 5. Baseline multi threading implementation extracted from BOLSTER

To compare against the presented baseline implementation, we developed a method that does the same amount of calls of work_large as the baseline, but uses a thread pool instance instead of creating new threads. In Figure 6, we show this implementation, called cmp_pool, which creates and fills tasks and enqueues these into a thread pool instance. Since we assume that the thread pools are reused for multiple requests, we exclude the thread pool creation and deallocation from the time measurement. Consequently, only creating and processing tasks is included in the considered elapsed time.

    void cmp_pool(int numThreads) {
        thread_pool_t *pool = thread_pool_create(numThreads, 1);
        struct timespec begin, end;
        clock_gettime(CLOCK_MONOTONIC, &begin);
        double results[numThreads];
        thread_task_t tasks[numThreads];
        for (int i = 0; i < numThreads; ++i) {
            results[i] = (double)(i + 1);
            tasks[i].args = &results[i];
            tasks[i].routine = work_large;
            thread_pool_enqueue_task(&tasks[i], pool, NULL);
        }
        thread_pool_wait_for_all(pool);
        clock_gettime(CLOCK_MONOTONIC, &end);
        test_results_pool[numThreads] += __get_time_diff(&begin, &end);
        thread_pool_free(pool);
    }

Fig. 6. Thread pool benchmark to compare against baseline implementation
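Figures 5 and 6 rely on two helpers that are not printed in the paper: the benchmark kernel work_large, described above as computing a large number of exponential values in a loop, and the timing helper __get_time_diff. The following stand-ins are hypothetical reconstructions under those assumptions, provided only to make the benchmark listings self-contained.

    #include <math.h>
    #include <time.h>

    /* Hypothetical benchmark kernel: repeatedly computes exponentials, using the
       passed double as seed and output location (matching how results[] is used). */
    void *work_large(void *arg)
    {
        double *value = arg;
        double acc = 0.0;
        for (int i = 0; i < 1000000; ++i) {
            acc += exp(*value * 1e-6 * i);
        }
        *value = acc;
        return NULL;
    }

    /* Hypothetical timing helper: elapsed time between two CLOCK_MONOTONIC samples
       in nanoseconds, matching the unit reported in Figure 7. */
    static long long __get_time_diff(const struct timespec *begin, const struct timespec *end)
    {
        return (long long)(end->tv_sec - begin->tv_sec) * 1000000000LL
             + (end->tv_nsec - begin->tv_nsec);
    }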

2249043 2.5x106

2x106

2x106

1524748 1.5x106

1.5x106 1222768 1351902

1x106 1x106 770429

543135

500000 430404 500000 424986

135254 56215 23862 25116 25250 23589 0 0 4 8 16 32 4 8 16 32

Fig. 7. Time measures in nanoseconds of thread pool (blue) and baseline (orange) implementation with different numbers of threads
Fig. 8. Relation between waiting time (orange) and completion time (blue) of thread pools with different sizes processing 10000 tasks

2) Evaluation Results: In our performance comparison, both implementations are executed three times with 4, 8, 16, and 32 as numThreads parameter, and the summed results are divided by three. We consider a number between 4 and 32 as a useful amount of threads to reach a sufficient performance increase due to parallelization, but also to keep the possibility to debug and profile them.

In Figure 7, we show the comparison results of both mentioned implementations for the different numThreads parameters. Our thread pool implementation is faster than the baseline for all numThreads parameters. Based on Figure 7, we make the following conclusions. Firstly, compared to strictly processing the workload, creating and later on joining threads consumes much time. Our thread pool implementation reaches up to a multiple of performance increase compared to the baseline. Since the creation and joining of threads is the main difference between both tests, reusing threads leads to this noticeable performance increase.

Secondly, the task creation and enqueueing have only a slight impact on the performance, since the baseline does not include these operations but is still clearly slower than the thread pool implementation.

C. Waiting Time Evaluation

As mentioned before, instead of implementing explicit scheduling to assign tasks to threads, we use the first come first serve approach. Each thread that currently has no task checks the priority queue in a loop to pop a new task. Since our priority queue manages the memory access from different threads using a mutex, the queue is locked while a thread asks for a task. Consequently, in large thread pools, the accessing time of the priority queue may increase along with the waiting time of threads. To examine this behavior with increasing thread pool size, we measure the average waiting time of thread pools with different sizes.

To further test the performance of the task-completion system, in particular the previously mentioned wait-slot acquisition, the main thread periodically has to wait for a specific task to finish and the total utilization is measured.

1) Benchmark Implementation: To examine the waiting time behavior of larger thread pools, we use an implementation similar to the cmp_pool method, which creates 10000 tasks at different thread pool sizes. Again, we consider sizes from 4 to 32 threads as a useful test case and take the average value of three measurements. To measure waiting and completion time, we use the monitoring component of our system and evaluate the thread pool statistics of each instance.

2) Evaluation Results: In Figure 8, we show the relation of waiting and completion time for different thread pool sizes. The waiting time is, compared to the completion time, very small and nearly constant for all considered thread pool sizes. From this observation, we draw the following conclusions for our first come first serve scheduling approach. Firstly, the constant average waiting time indicates only small performance decreases due to priority queue locking. Secondly, even for 32 threads in one pool instance, the waiting time occupies only a small fraction of the completion time, which indicates good thread usage. Thirdly, the execution performance increases along with the thread pool size, which shows that even for 32 threads, parallelization leads to faster task execution.

As mentioned before in this section, the performance results depend on the current workload of the system, as seen by the waiting time shown in Figure 8. Although the priority queue locking should increase the waiting time slightly with bigger thread pool sizes, the last measuring point is lower than the previous points.

The selection strategy of wait-slots has only a small impact on occupation, as seen in Figure 9. The uniform distribution's advantage amounts to 0.1% across different numbers of threads and is far overshadowed by differences in the threads-to-tasks ratio.

Fig. 9. Utilization % with different slot distribution strategies (compact vs. distributed slot selection)

In Figure 10, we show that the lock-free priority queue performs better. With a difference in utilization of 2% when using only two threads, the improvement over our mutex-based implementation is not insignificant. A total occupation of over 99.5% also demonstrates that most overhead in our implementation comes from the priority queue when supplying a balanced stream of tasks. With an increase in available threads, the performance of the mutex solution further decreases, while the lock-free queue variant's utilization remains nearly constant.

Fig. 10. Comparison of Utilization % of the lock-free queue versus mutex based heap

The choice of a different priority queue data structure presents an opportunity for further optimizations, in particular to scale well with large numbers of threads.

D. Summary

As the last step of the evaluation, we summarize our results and compare them to the requirements of GECKODB. As mentioned before, our implementation reaches better performance than the baseline regarding our specific tests. Consequently, we solved the task of implementing and integrating the thread pool into BOLSTER to reach at least the baseline performance. Additionally, using our thread pool implementation solves the problem that debugging and profiling with a huge number of threads is not possible.

Although a small and constant waiting time was not a requirement for integration into GECKODB, we ensure that assigning many tasks works fast for different thread pool sizes. Regarding the performance and also the design requirements of GECKODB and BOLSTER, we fulfil these requirements and have successfully implemented and evaluated our thread pool implementation.

VI. RELATED WORK

Implementing multi-threading can be realized using different approaches, from spawning new threads to reusing them in thread pools. Pyarali et al. show that a thread pool model can significantly improve system performance and reduce response time [18]. They present different approaches to implement thread pools and show the dependency of these implementations on the use cases.

Schmidt et al. present different implementations of multi-threading, including two types of thread pools. They distinguish between the Worker Thread Pool Architecture and the Leader/Follower Thread Pool Architecture [19]. In the first approach, an IO-thread selects a connection, reads the requests and dispatches the tasks to an array of threads. Drawbacks of this system are excessive context switching along with the synchronization required to manage the request queue. In the second approach, one thread of the pool becomes the leader and reads the task into an internal buffer. If the request is valid, the thread itself processes the task and a follower thread becomes the new leader. This approach minimizes the context switching effort compared to the first one.

Along with the design of thread pool systems, the performance strongly depends on the number of threads used in a thread pool. Xu et al. present a performance study of a set of performance metrics combined with a heuristic algorithm for dynamic resizing of thread pools [13]. They show the correlation between average task idle time and the throughput of thread pools and present the effectiveness of dynamically adjusting the number of threads.

Compared to heuristically adjusting the number of threads, Ling et al. formalized the problem of finding the optimal number of threads by establishing a mathematical relationship among optimal thread pool size, system load and the associated costs [14]. They show the applicability of their model and plan to further examine it based on additional metrics.

VII. CONCLUSION AND FUTURE WORK

In this work, we describe our implementation of a thread pool system. We present the implementation in detail and show our evaluation results, containing a comparison against the baseline implementation and the waiting time behavior over different thread pool sizes. Within our implementation, we state the differences of parallel memory access as well as the problem of implementing synchronous waiting for a set of tasks.

As presented in the evaluation, our thread pool implementation reaches better execution performance than the baseline because it avoids the creation of short-lived threads for each request.
We measured the average time threads spend waiting for different thread pool sizes and showed that even for a size of 32 threads, the waiting time stays constantly small. Consequently, we conclude that using first come first serve as scheduling approach leads to small waiting times and high execution performance.

To sum up, our thread pool implementation fulfils all performance and design requirements of GECKODB and BOLSTER and reaches even faster execution times than the existing implementation. In the future, we plan to make the following adaptations to our thread pool system. Firstly, we want to reduce the number of dynamic allocations, since heap allocations are mostly slower than stack allocations. Secondly, we want to examine the usage of existing priority queue implementations for our thread pool system. Thirdly, we plan to integrate a dynamic resizing function based on the collected statistics of the performance monitoring component.

VIII. ACKNOWLEDGEMENTS

This work was partially funded by the DFG (grant no.: SA 465/50-1). Also we would like to thank Sebastian Breß, Thomas Leich, David Broneske, and Gabriel Campero Durand.

REFERENCES

[1] Alonso, G.: Hardware Killed the Software Star. In: Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013). ICDE '13, Washington, DC, USA, IEEE Computer Society (2013) 1–4
[2] Ailamaki, A., Liarou, E., Tözün, P., Porobic, D., Psaroudakis, I.: Databases on Modern Hardware: How to Stop Underutilization and Love Multicores. Synthesis Lectures on Data Management. Morgan & Claypool Publishers (2017)
[3] Sridhar, K.T.: Modern Column Stores for Big Data Processing. In Reddy, P.K., Sureka, A., Chakravarthy, S., Bhalla, S., eds.: Big Data Analytics, Cham, Springer International Publishing (2017) 113–125
[4] Hamdi, M., Xiong, W., Yu, F., Alswedani, S., Hou, W.C.: An Aggressive Concurrency Control Protocol for Main Memory Databases. International Journal of Computer Applications, 7–13
[5] Agrawal, S.R., Idicula, S., Raghavan, A., Vlachos, E., Govindaraju, V., Varadarajan, V., Balkesen, C., Giannikis, G., Roth, C., Agarwal, N., et al.: A Many-Core Architecture for In-Memory Data Processing. In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM (2017) 245–258
[6] Makreshanski, D., Giannikis, G., Alonso, G., Kossmann, D.: Many-Query Join: Efficient Shared Execution of Relational Joins on Modern Hardware. The VLDB Journal (Aug 2017)
[7] Schuh, S., Chen, X., Dittrich, J.: An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In: Proceedings of the 2016 International Conference on Management of Data, ACM (2016) 1961–1976
[8] Psaroudakis, I., Scheuer, T., May, N., Sellami, A., Ailamaki, A.: Adaptive NUMA-aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. Proceedings of the VLDB Endowment (2016) 37–48
[9] Psaroudakis, I., Scheuer, T., May, N., Ailamaki, A.: Task Scheduling for Highly Concurrent Analytical and Transactional Main-Memory Workloads. In: Proceedings of the Fourth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS 2013). Number EPFL-CONF-188280 (2013)
[10] Wu, Y., Arulraj, J., Lin, J., Xian, R., Pavlo, A.: An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. Proceedings of the VLDB Endowment (2017) 781–792
[11] Reinders, J.: Intel Threading Building Blocks - Outfitting C++ for Multi-Core Processor Parallelism. O'Reilly (2007)
[12] Pinnecke, M., Campero Durand, G., Broneske, D., Zoun, R., Saake, G.: GridTables: A One-Size-Fits-Most Data Store Architecture for GPU-Powered HTAP Databases. Submitted for publication (2018)
[13] Xu, D., Bode, B.: Performance Study and Dynamic Optimization Design for Thread Pool Systems. PhD thesis, United States Department of Energy, Office of Science (2004)
[14] Ling, Y., Mullen, T., Lin, X.: Analysis of Optimal Thread Pool Size. ACM SIGOPS Operating Systems Review (2000) 42–55
[15] Lindén, J., Jonsson, B.: A Skiplist-Based Concurrent Priority Queue with Minimal Memory Contention. Technical Report 2018-003, Department of Information Technology, Uppsala University (February 2018). Revised and corrected version of Technical Report 2013-025
[16] cppreference.com: atomic_compare_exchange_strong. https://en.cppreference.com/w/c/atomic/atomic_compare_exchange [Online; accessed 6-July-2018]
[17] Sen, A.: A Quick Introduction to the Google C++ Testing Framework. IBM DeveloperWorks (2010) 20
[18] Pyarali, I., Spivak, M., Cytron, R., Schmidt, D.C.: Evaluating and Optimizing Thread Pool Strategies for Real-Time CORBA. ACM SIGPLAN Notices (2001) 214–222
[19] Schmidt, D.C.: Evaluating Architectures for Multithreaded Object Request Brokers. Communications of the ACM (1998) 54–60