The Adaptive Radix Tree: Artful Indexing for Main-Memory Databases
Total Page:16
File Type:pdf, Size:1020Kb
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases Viktor Leis, Alfons Kemper, Thomas Neumann Fakultat¨ fur¨ Informatik Technische Universitat¨ Munchen¨ Boltzmannstrae 3, D-85748 Garching <lastname>@in.tum.de Abstract—Main memory capacities have grown up to a point digit 1 where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. A Traditional in-memory data structures like balanced binary digit 2 search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often NR used for main-memory indexes, are fast but only support point queries. digit 3 To overcome these shortcomings, we present ART, an adaptive D YT ET radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while leaf nodes supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem Fig. 1. Adaptively sized nodes in our radix tree. of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted the long pipelines of modern CPUs stall, which causes addi- order, which enables additional operations like range scan and tional latencies after every second comparison (on average). prefix lookup. These problems of traditional search trees were tackled by recent research on data structures specifically designed to be I. INTRODUCTION efficient on modern hardware architectures. The k-ary search After decades of rising main memory capacities, even large tree [6] and the Fast Architecture Sensitive Tree (FAST) [7] transactional databases fit into RAM. When most data is use data level parallelism to perform multiple comparisons cached, traditional database systems are CPU bound because simultaneously with Singe Instruction Multiple Data (SIMD) they spend considerable effort to avoid disk accesses. This instructions. Additionally, FAST uses a data layout which has led to very intense research and commercial activities in avoids cache misses by optimally utilizing cache lines and main-memory database systems like H-Store/VoltDB [1], SAP the Translation Lookaside Buffer (TLB). While these opti- HANA [2], and HyPer [3]. These systems are optimized for mizations improve search performance, both data structures the new hardware landscape and are therefore much faster. Our cannot support incremental updates. For an OLTP database system HyPer, for example, compiles transactions to machine system which necessitates continuous insertions, updates, and code and gets rid of buffer management, locking, and latching deletions, an obvious solution is a differential file (delta) overhead. For OLTP workloads, the resulting execution plans mechanism, which, however, will result in additional costs. are often sequences of index operations. Therefore, index Hash tables are another popular main-memory data struc- efficiency is the decisive performance factor. ture. In contrast to search trees, which have O(log n) access More than 25 years ago, the T-tree [4] was proposed as time, hash tables have expected O(1) access time and are an in-memory indexing structure. Unfortunately, the dramatic therefore much faster in main memory. Nevertheless, hash processor architecture changes have rendered T-trees, like all tables are less commonly used as database indexes. One reason traditional binary search trees, inefficient on modern hardware. is that hash tables scatter the keys randomly, and therefore only The reason is that the ever growing CPU cache sizes and support point queries. Another problem is that most hash tables the diverging main memory speed have made the underlying do not handle growth gracefully, but require expensive reor- assumption of uniform memory access time obsolete. B+-tree ganization upon overflow with O(n) complexity. Therefore, variants like the cache sensitive B+-tree [5] have more cache- current systems face the unfortunate trade-off between fast friendly memory access patterns, but require more expensive hash tables that only allow point queries and fully-featured, update operations. Furthermore, the efficiency of both binary but relatively slow, search trees. and B+-trees suffers from another feature of modern CPUs: A third class of data structures, known as trie, radix tree, Because the result of comparisons cannot be predicted easily, prefix tree, and digital search tree, is illustrated in Figure1. These data structures directly use the digital representation of II. RELATED WORK keys instead of hashing or comparing keys. The underlying + idea is similar to a thumb-index found in many alphabetically In disk-based database systems, the B -tree [8] is ubiq- ordered dictionary books: The first character of a word can uitous [9]. It retrieves large blocks from disk to reduce the directly be used to jump to all words starting with that number of accesses. Red-black trees [10], [11] and T-trees character. In a computer, this process can be repeated with [4] were proposed for main-memory database systems. Rao the next characters until a match is found. As a consequence and Ross [5] showed that T-trees, like all binary search trees, of this process, all operations have O(k) complexity where k is suffer from poor cache behavior and are therefore often slower + the length of the key. In the era of extremely large data sets, than B -trees on modern hardware. As an alternative, they + + when n is growing faster than k, having a time complexity proposed a cache conscious B -tree variant, the CSB -tree + independent of n is very attractive. [12]. Further cache optimizations for B -trees were surveyed In this work, we present the adaptive radix tree (ART) which by Graefe and Larson [13]. is a fast and space-efficient in-memory indexing structure Modern CPUs allow to perform multiple comparisons with specifically tuned for modern hardware. While most radix a single SIMD instruction. Schlegel et al. [6] proposed k- trees require to trade off tree height versus space efficiency ary search which reduces the number of comparisons from by setting a globally valid fanout parameter, ART adapts the log2 n to logK n where K is the number of keys that fit representation of every individual node, as exemplified in into one SIMD vector. In comparison with binary trees, this Figure1. By adapting each inner node locally, it optimizes technique also reduces the number of cache misses, because global space utilization and access efficiency at the same time. K comparisons are performed for each cache line loaded Nodes are represented using a small number of efficient and from main memory. Kim et al. extended this research by compact data structures, chosen dynamically depending on proposing FAST, a methodology for laying out binary search the number of child nodes. Two additional techniques, path trees in an architecture sensitive way [7]. SIMD, cache line, compression and lazy expansion, allow ART to efficiently and page blocking are used to optimally use the available index long keys by collapsing nodes and thereby decreasing cache and memory bandwidth. Additionally, they proposed to the tree height. interleave the stages of multiple queries in order to increase A useful property of radix trees is that the order of the keys the throughput of their search algorithm. FAST trees and is not random as in hash tables; rather, the keys are ordered the k-ary search trees are pointer-free data structures which bitwise lexicographically. We show how typical data types can store all keys in a single array and use offset calculations be reordered efficiently to support all operations that require to traverse the tree. While this representation is efficient and the data to be ordered (e.g., range scan, prefix lookup, top-k, saves space, it also implies that no online updates are possible. minimum, and maximum). Kim et al. also presented a GPU implementation of FAST and This work makes the following contributions: compared its performance to modern CPUs. Their results show • We develop the adaptive radix tree (ART), a fast and that, due to the higher memory bandwidth, GPUs can achieve space-efficient general-purpose indexing structure for higher throughput than CPUs. Nevertheless, the use of GPUs main-memory database systems. as dedicated indexing hardware is not yet practical because • We prove that the space consumption per key is bounded memory capacities of GPUs are limited, communications cost to 52 bytes, even for arbitrarily long keys. We show with main memory is high, and hundreds of parallel queries experimentally, that the space consumption is much lower are needed to achieve high throughput. We, therefore focus on in practice, often as low as 8.1 bytes per key. index structures for CPUs. • We describe how common built-in data types can be The use of tries for indexing character strings has been stored in radix trees while retaining their order. studied extensively. The two earliest variants use lists [14] • We experimentally evaluate ART and other state of the and arrays [15] as internal node representations. Morrison art main-memory index structures, including the most introduced path compression in order to store long strings efficient search tree proposals. efficiently [16]. Knuth [17] analyzes these early trie variants. • By integrating ART into the main-memory database The burst trie is a more recent proposal which uses trie nodes system HyPer and running the TPC-C benchmark, we for the upper levels of the tree, but switches to a linked list prove its superior end-to-end performance in a “real-life” once a subtree has only few elements. The HAT-trie [18] transaction processing application. improves performance by replacing the linked list with a The rest of this paper is organized as follows.