In-Memory Performance for Big Data
Goetz Graefe (HP Labs Palo Alto) [email protected]
Haris Volos (HP Labs Palo Alto) [email protected]
Hideaki Kimura (HP Labs Palo Alto) [email protected]
Harumi Kuno (HP Labs Palo Alto) [email protected]
Joseph Tucek (HP Labs Palo Alto) [email protected]
Mark Lillibridge (HP Labs Palo Alto) [email protected]
Alistair Veitch* (Google) [email protected]

*Work done while at HP Labs.

ABSTRACT

When a working set fits into memory, the overhead imposed by the buffer pool renders traditional databases non-competitive with in-memory designs that sacrifice the benefits of a buffer pool. However, despite the large memory available with modern hardware, data skew, shifting workloads, and complex mixed workloads make it difficult to guarantee that a working set will fit in memory. Hence, some recent work has focused on enabling in-memory databases to protect performance when the working data set almost fits in memory. Contrary to those prior efforts, we enable buffer pool designs to match in-memory performance while supporting the "big data" workloads that continue to require secondary storage, thus providing the best of both worlds. We introduce here a novel buffer pool design that adapts pointer swizzling for references between system objects (as opposed to application objects), and uses it to practically eliminate buffer pool overheads for memory-resident data. Our implementation and experimental evaluation demonstrate that we achieve graceful performance degradation when the working set grows to exceed the buffer pool size, and graceful improvement when the working set shrinks towards and below the memory and buffer pool sizes.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing [email protected]. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii.
Proceedings of the VLDB Endowment, Vol. 8, No. 1.
Copyright 2014 VLDB Endowment 2150-8097/14/09.

1. INTRODUCTION

Database systems that use a buffer pool to hold in-memory copies of the working data enjoy a number of benefits. They can very efficiently manage working sets that far exceed available memory. They offer natural support for write-ahead logging. They are somewhat insulated from cache coherence issues. However, a buffer pool imposes a layer of indirection that incurs a significant performance cost. In particular, page identifiers are required to locate a page regardless of whether it is in memory or on disk. This in turn calls for a mapping between in-memory addresses of pages and their identifiers.

[Figure 1: Index lookup performance with in-memory vs. traditional database. Log-scale plot of throughput (qps) against working set size (GB), 0-20 GB, with series "OS Virtual Memory" and "Traditional Buffer Pool".]

Figure 1 illustrates the impact of the buffer pool's overheads, comparing the performance of an in-memory database using direct pointers between B-tree nodes (marked "Main Memory") and a database system using a traditional buffer pool with disk pages (marked "Traditional Buffer Pool"). The available memory is 10 GB; further details of this experiment are provided in Section 5. In the left half, the entire working set fits into the available physical memory and buffer pool. In the right half, the working set exceeds the available memory and spills to the disk. The in-memory database relies on the OS virtual memory mechanism to support working sets larger than the available physical memory. The in-memory database permits substantially faster index look-ups than the traditional buffer pool when the working set fits in physical memory. In fact, our experiments show that an in-memory database improves throughput by between 41% and 68% compared to one that uses a traditional buffer pool. This correlates well with earlier results showing that the buffer pool takes about 30% of the execution cycles in traditional transaction processing databases [14].

We borrow the technique of pointer swizzling and adapt it to the context of page-oriented data stores, selectively replacing identifiers of pages persisted in storage with direct references to the memory locations of the page images. Our techniques enable a data store that uses a buffer pool to achieve practically the same performance as an in-memory database. The research contributions here thus include:

1. Introduction of pointer swizzling in database buffer pool management.
2. Specification and analysis of swizzling and un-swizzling techniques for the specific case of B-tree indexes.
3. Victim selection (replacement policy) guiding eviction and un-swizzling of B-tree pages from the buffer pool.
4. Experimental proof that these simple techniques enable a database with a buffer pool to achieve in-memory database performance when the working set fits in memory, and furthermore achieve a continuum with graceful degradation as the working set size exceeds the memory size.

In the remainder of this paper, we first discuss prior efforts to improve the performance of in-memory database systems with regard to memory limitations, as well as prior work in pointer swizzling (Section 2). We next describe the problem being solved (Section 3), and then present details of our techniques for swizzling and un-swizzling within a database buffer pool (Section 4). We evaluate the performance of our solution using an implementation (Section 5) and finally present our conclusions from the work to date (Section 6).

2. RELATED WORK

This paper describes our efforts to adapt the known technique of pointer swizzling to the new context of making disk-based systems competitive with in-memory database systems. This section begins by discussing prior work that addresses the problem of how to preserve the performance of in-memory database systems when the working set does not quite fit in the available amount of memory. These prior systems do not consider how to match the performance of a traditional database when disk I/O is the bottleneck. Our design, on the other hand, enables a traditional database system that has been optimized for disk I/O to compete with in-memory systems when the working data set fits in memory; moreover, our approach is entirely automatic, and it adapts to cases of growing and shrinking working sets. Thus, the second half of this section discusses prior work on pointer swizzling, which is the technique we leverage to eliminate buffer pool overheads.

As the raison d'être of in-memory database systems is to provide superior performance for in-memory workloads, most of these do not support data sets that do not fit in memory. For instance, Microsoft's SQL Server Hekaton stores raw memory addresses as pointers, eliminating the need for data pages and indirection by page IDs. However, once the working set size exceeds the available amount of memory, Hekaton simply stops inserting rows [1].

One possible solution is to rely on the VM layer to deal with any spills out of memory. For instance, some databases (such as Tokyo Cabinet) simply mmap their backing store [6]. This leads to several problems. The first is that the OS VM layer is particularly poor at making eviction decisions for transactional workloads (as we show later in our experiments). The second is that the operating system considers itself free to flush dirty pages opportunistically, without notifying the application. This can cause data integrity problems when a page dirtied by an in-flight transaction is written back without the matching log records.

Failure-atomic msync [35] provides a mechanism by which 1) the kernel is prevented from lazily flushing pages, and 2) a modification to the msync call allows users to select a subset of pages to be written out atomically. Although this certainly helps, it is insufficient for a transaction processing system. First, failure-atomic msync does not provide parallelism. From contact with the authors, failure-atomic msync "only gives A and D ... we provide no support for isolation. If you want safe, concurrent, database-style transactions, you need more" [34]. Second, as implemented, failure-atomic msync hijacks the file system journaling mechanism, duplicating each page write. Unlike a traditional buffer-pool-based system, where changing a single row causes a few bytes of log writes and a page-sized in-place update, a system based on failure-atomic msync incurs a minimum of two page-sized writes, a significant performance penalty.

There are a small number of recent efforts that make VM-based in-memory databases performant when the working data set almost fits in memory. These systems address the problem of swapping either by explicitly moving some data records to virtual memory pages that the OS can then swap to disk more intelligently, or by explicitly compressing less-frequently accessed data records so as to reduce the space consumed, at the cost of access speed.

Stoica and Ailamaki profiled the performance of the state-of-the-art in-memory VoltDB database, and found that performance dropped precipitously when the working data set exceeded available memory [37]. Their work demonstrated that by intelligently controlling which tuples are swapped out by the operating system and which are kept in memory, it is possible to make much more effective use of available memory. To achieve this, two regions of memory are created
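The buffer-pool indirection whose cost motivates this work can be sketched in a few lines. This is a deliberately simplified simulation, not the implementation evaluated here: the class and method names (`BufferPool`, `fix`, `read_from_disk`) are illustrative, and real buffer pools add latching, pinning, and eviction. The point it shows is that every traversal step pays a page-table probe, even when the page is already resident.

```python
# Sketch of traditional buffer-pool indirection (illustrative only):
# inter-node references are page IDs, so every access goes through a
# page-ID -> frame mapping, even for memory-resident pages.

class BufferPool:
    """Maps page IDs to in-memory page images ("frames")."""

    def __init__(self):
        self.page_table = {}  # page ID -> frame

    def fix(self, page_id):
        # A hash-table probe on EVERY access; this per-access lookup is
        # the overhead that pointer swizzling removes for resident pages.
        frame = self.page_table.get(page_id)
        if frame is None:
            frame = self.read_from_disk(page_id)  # miss: would do real I/O
            self.page_table[page_id] = frame
        return frame

    def read_from_disk(self, page_id):
        return {"id": page_id, "children": {}}  # stand-in for a disk read

pool = BufferPool()
root = pool.fix(1)
root["children"]["k"] = 42            # child slot holds a page ID, not a pointer
child = pool.fix(root["children"]["k"])  # indirection on every traversal step
```

Even a hit costs a hash computation and probe per node visited, which is consistent with the roughly 30% of execution cycles attributed to the buffer pool above.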
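The swizzling idea introduced above can likewise be sketched as a small simulation. This is not the authors' code: the structures and helper names (`Page`, `fix`, `child`, `unswizzle`) are assumptions made for illustration, and a real system swizzles raw in-memory addresses inside page images rather than Python references. It shows the essential state change: a child slot holds either a page ID (unswizzled) or a direct reference to the in-memory page (swizzled), and eviction requires turning the reference back into a page ID first.

```python
# Minimal simulation of pointer swizzling for inter-page references:
# a child slot contains a page ID until first use, then a direct
# reference that bypasses the page-table lookup on later traversals.

class Page:
    def __init__(self, pid):
        self.pid = pid
        self.children = {}  # key -> page ID (unswizzled) or Page (swizzled)

page_table = {}  # page ID -> resident Page

def fix(pid):
    if pid not in page_table:
        page_table[pid] = Page(pid)  # stand-in for reading from disk
    return page_table[pid]

def child(parent, key):
    slot = parent.children[key]
    if isinstance(slot, Page):       # swizzled: follow the direct reference
        return slot
    page = fix(slot)                 # unswizzled: pay the page-table lookup
    parent.children[key] = page      # swizzle the slot for future traversals
    return page

def unswizzle(parent, key):
    # Before a swizzled child can be evicted, the direct reference in the
    # parent must be replaced by the page ID again.
    slot = parent.children[key]
    if isinstance(slot, Page):
        parent.children[key] = slot.pid

root = fix(0)
root.children["a"] = 7
first = child(root, "a")   # lookup, then swizzle
second = child(root, "a")  # pure pointer traversal, no lookup
unswizzle(root, "a")       # page 7 is now evictable
```

In a B-tree, un-swizzling is cheap to make safe because each node has exactly one parent holding the reference to it, which is what contribution 3 (victim selection guiding eviction and un-swizzling) exploits.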
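For concreteness, the mmap-based alternative discussed in this section looks roughly as follows. This is a generic sketch, not Tokyo Cabinet's actual code; the file name and sizes are arbitrary, and Python's `mmap.flush` corresponds to POSIX `msync`. Note that the OS may write dirty mapped pages back even before the explicit flush, which is exactly the integrity hazard described above.

```python
# Sketch of an mmap-backed store: the backing file is mapped into the
# address space, updated in place, and flushed with msync (mmap.flush).
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.db")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)            # one-page backing store

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)  # shared, writable mapping
    m[0:5] = b"hello"                # in-place update through the mapping
    m.flush()                        # msync: force the dirty page to disk
    m.close()

with open(path, "rb") as f:
    data = f.read(5)
```

The simplicity is appealing, but the application controls neither eviction order nor write-back timing, and plain msync gives no atomicity across pages, which is why the text argues it is insufficient for transaction processing.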