KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures
Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner
Database Technology Group, Technische Universität Dresden, 01062 Dresden, Germany
{firstname.lastname}@tu-dresden.de

ABSTRACT

Growing main memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite the high scan performance of main memory, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates.

In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint.

[Figure 1: KISS-Tree Overview. Level 1 (16 bit): virtual. Level 2 (10 bit): on-demand, 4 KB pages and compact pointers. Level 3 (6 bit): compressed. Underneath: a thread-local memory management subsystem.]

1. INTRODUCTION

Databases heavily leverage indexes to decrease access times for locating single records in large relations. In classic disk-based database systems, the B+-Tree [2] is typically the structure of choice, because it is optimized for block-based disk accesses. However, modern server hardware is equipped with large main memory capacities. Therefore, database architectures are moving from classic disk-based systems towards databases that keep the entire data pool (i.e., relations and indexes) in memory and use secondary memory (e.g., disks) only for persistence. While disk block accesses constituted the bottleneck for disk-based systems, modern in-memory databases shift the memory hierarchy closer to the CPU and face the "memory wall" [13] as the new bottleneck. Thus, they have to care for CPU data caches, TLBs, main memory accesses, and access patterns. This essential change also affects indexes and forces us to design new index structures that are optimized for these design goals. Moreover, the move from disks to main memory dramatically increased data access bandwidth and reduced latency. In combination with the increasing number of cores on modern hardware, the parallel processing of operations on index structures imposes new challenges, because the overhead of latches has become a critical issue. Therefore, building future index structures in a latch-free way is essential for scalability on modern and future systems.

In this paper, we present the KISS-Tree [1], an order-preserving, latch-free in-memory index structure for keys that are currently at most 32 bits wide and for arbitrary values. It is up to 50% faster than the previously reported read performance on the same architecture¹ and exceeds the previously reported update performance by orders of magnitude. The KISS-Tree is based on the Generalized Prefix Tree [3] and advances it by minimizing the number of memory accesses needed for accessing a key's value. We achieve this reduction by taking advantage of the virtual memory management functionality provided by the operating system and the underlying hardware, as well as of compression mechanisms. Figure 1 shows an overview of the KISS-Tree structure. As shown, the entire index structure comprises only three tree levels. Each level leverages different techniques to find a trade-off between performance and memory consumption on the respective level.

¹ Due to the unavailability of source code and binaries, we refer to the figures published in [5].
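To make this three-level decomposition concrete, the following C++ sketch splits a 32bit key into the 16, 10, and 6 bit fragments indicated in Figure 1. The struct, its field names, and the exact bit positions are our own reading of the figure, not code from the paper.

    #include <cstdint>

    // Hypothetical illustration of the key decomposition suggested by
    // Figure 1: a 32bit key yields a 16bit fragment for level 1, a
    // 10bit fragment for level 2, and a 6bit fragment for level 3.
    struct KeyFragments {
        uint32_t level1;  // bits 31..16: index into the virtual level-1 array
        uint32_t level2;  // bits 15..6:  index into an on-demand 4 KB level-2 page
        uint32_t level3;  // bits 5..0:   bucket within a compressed level-3 node
    };

    inline KeyFragments split_key(uint32_t key) {
        return KeyFragments{
            key >> 16,           // most significant 16 bits
            (key >> 6) & 0x3FFu, // next 10 bits
            key & 0x3Fu          // least significant 6 bits
        };
    }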
[Figure 2: Existing In-Memory Index Structures. (a) FAST, with SIMD blocking, cache line blocking, and page blocking. (b) Prefix Tree (k′ = 4). (c) CTrie, with I-nodes for indirection.]

2. RELATED WORK

In-memory indexing has been a well-investigated topic in database research for years. Early research mainly focused on reducing the memory footprint of traditional data structures and on making them cache-conscious. For example, the T-Tree [8] reduces the number of pointers of traditional AVL-Trees [6], while the CSB+-Tree [11] is an almost pointer-free and thus cache-conscious variant of the traditional B+-Tree [2]. These data structures usually offer good read performance. However, they struggle when updates are performed by multiple threads in parallel. All block-based data structures require comprehensive balancing operations when keys are removed or added. This implicates complex latching schemes [7] that cause an operation to latch across multiple index nodes and hence to block other read/write operations, even if they work on different keys. Even if no node splitting or merging is required, B+-Tree structures have to latch coarse-grained at page level, which increases contention. Moreover, it is not possible to apply optimistic latch-free mechanisms, for instance atomic instructions, without the help of hardware transactional memory (HTM). To overcome this critical point, PALM [12] takes a synchronous processing approach on the B+-Tree: instead of processing single operations directly, operations are buffered, and each thread processes the buffered operations for its assigned portion of the tree. This approach eliminates the need for any latches on tree elements, but it still suffers from the high structural maintenance overhead of the B+-Tree.

The FAST approach [5] gears even more in the direction of fast reads and slow updates. FAST uses an implicit data layout, as shown in Figure 2(a), which gets along without any pointers and is optimized for the memory hierarchy of today's CPUs.

The Generalized Prefix Tree [3], in contrast, targets a balanced read/update performance. Each of its nodes comprises a fixed number of pointers; the key is split into fragments of an equal length k′, where each fragment selects one pointer of a certain node. Starting from the most significant fragment, the path through the nodes is then given by following the pointers that are selected by the fragments, i.e., the i-th fragment selects the pointer within the node on level i. Figure 2(b) shows an exemplary prefix tree with k′ = 4: the first tree level differentiates the first 4 bits of the key, the second level the next 4 bits, and so on. Prefix trees do not have to care about cache line sizes, because only one memory access per tree level is necessary. Moreover, the number of memory accesses is bounded by the length of the key; for instance, a 32bit key involves a maximum of 8 memory accesses.
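As an illustration of this lookup scheme, the following sketch descends a fully expanded prefix tree with k′ = 4 over 32bit keys. The node layout and all names are our own simplifying assumptions, not the authors' implementation.

    #include <cstdint>

    // One node per level; with k' = 4, each node holds 2^4 = 16 pointers.
    struct Node {
        void* slot[16];  // child Node* on inner levels, Content* on the last
    };

    // The content node stores the original key besides the value.
    struct Content {
        uint32_t key;
        uint64_t value;
    };

    // Follows one 4bit fragment per level, starting with the most
    // significant fragment; at most 8 memory accesses for a 32bit key.
    Content* lookup(Node* root, uint32_t key) {
        void* current = root;
        for (int level = 0; level < 8 && current != nullptr; ++level) {
            uint32_t fragment = (key >> (28 - 4 * level)) & 0xFu;
            current = static_cast<Node*>(current)->slot[fragment];
            // With dynamic expansion (described below), a slot may already
            // point to a content node before all 8 levels are unrolled; a
            // real implementation would use tagged pointers to detect this.
        }
        Content* c = static_cast<Content*>(current);
        return (c != nullptr && c->key == key) ? c : nullptr;
    }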
In order to reduce the overall tree size and the number of memory accesses, a prefix tree additionally allows dynamic expansion and only unrolls tree levels as far as needed, as shown on the left-hand side of the example. Because of this dynamic expansion, the prefix tree has to store the original key besides the value in the content node at the end of the path. These characteristics give the prefix tree a balanced read/update performance and allow parallel operations, because not much structural maintenance is necessary.

A weak point of the prefix tree is the high memory overhead that originates from sparsely occupied nodes. The CTrie [10] removes this overhead by compressing each node. This is achieved by adding a 32bit bitmap (which limits k′ to 5) to the beginning of each node; the bitmap indicates the occupied buckets in that node. Using this bitmap, the node only has to store the buckets in use. Due to the compression, nodes grow and shrink, so the CTrie incurs additional costs for updates, because growing and shrinking nodes requires data copying. To allow efficient parallel operations, the CTrie was designed latch-free. Synchronization is done via atomic compare-and-swap instructions, as is also possible in the basic prefix tree. However, because nodes and the buckets inside nodes move as a result of the compression, the CTrie uses I-nodes for indirection, as depicted in Figure 2(c). This indirection creates a latch-free index structure.
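The following sketch illustrates the bitmap compression and the I-node indirection under our own simplified layout (fixed-capacity nodes, no memory reclamation); it is not the CTrie authors' code.

    #include <atomic>
    #include <bit>       // std::popcount (C++20)
    #include <cstdint>

    // A compressed node: a 32bit bitmap marks the occupied buckets
    // (limiting k' to 5), and only those buckets are physically stored,
    // compacted to the front of the entry array.
    struct CNode {
        uint32_t bitmap;     // bit i set => logical bucket i is occupied
        void*    entry[32];  // only the first popcount(bitmap) slots are used
    };

    // The I-node is a stable indirection: readers and writers always go
    // through it, so a grown or shrunk copy of the CNode can be installed
    // with a single compare-and-swap.
    struct INode {
        std::atomic<CNode*> node;
    };

    // Map a logical bucket (0..31) to its compacted position by counting
    // the occupied buckets in front of it.
    void* find(const CNode* n, uint32_t bucket) {
        uint32_t bit = 1u << bucket;
        if ((n->bitmap & bit) == 0) return nullptr;  // bucket is empty
        return n->entry[std::popcount(n->bitmap & (bit - 1))];
    }

    // Latch-free update: copy the node, modify the copy, and try to swing
    // the I-node pointer atomically; on contention the caller retries.
    bool insert(INode* in, uint32_t bucket, void* value) {
        CNode* old_node = in->node.load(std::memory_order_acquire);
        CNode* copy = new CNode(*old_node);  // growing/shrinking copies data
        uint32_t bit = 1u << bucket;
        int pos = std::popcount(copy->bitmap & (bit - 1));
        if ((copy->bitmap & bit) == 0) {
            // shift the compacted tail to make room for the new bucket
            int used = std::popcount(copy->bitmap);
            for (int i = used; i > pos; --i) copy->entry[i] = copy->entry[i - 1];
            copy->bitmap |= bit;
        }
        copy->entry[pos] = value;
        if (in->node.compare_exchange_strong(old_node, copy)) return true;
        delete copy;  // lost the race
        return false; // (reclamation of the replaced node is omitted here)
    }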