Improving In-Memory Database Index Performance with Intel R

Appears in the 20th International Symposium On High-Performance Computer Architecture, Feb. 15 - Feb. 19, 2014. Improving In-Memory Database Index Performance with Intel R Transactional Synchronization Extensions Tomas Karnagel†?, Roman Dementiev†, Ravi Rajwar†, Konrad Lai†, Thomas Legler‡, Benjamin Schlegel?, Wolfgang Lehner? †Intel Corporation, Munich, Germany and Hillsboro, USA ‡SAP AG, Database Development, Walldorf, Germany ?TU Dresden, Database Technology Group, Dresden, Germany †[email protected], ‡[email protected], [email protected] Abstract scalability and concurrency, and systems today employ a wide The increasing number of cores every generation poses range of sophisticated implementations. Unfortunately, such challenges for high-performance in-memory database systems. implementations end up being hard to verify and developers While these systems use sophisticated high-level algorithms must deal with numerous corner cases and deadlock scenar- to partition a query or run multiple queries in parallel, they ios. As Graefe [10] has noted, “Perhaps the most urgently also utilize low-level synchronization mechanisms to synchro- needed future direction is simplification. Functionality and nize access to internal database data structures. Developers code for concurrency control and recovery are too complex often spend significant development and verification effort to to design, implement, test, debug, tune, explain, and main- improve concurrency in the presence of such synchronization. tain." The development of simple and scalable easy-to-verify The Intel R Transactional Synchronization Extensions implementations remains challenging. TM (Intel R TSX) in the 4th Generation Core Processors enable With the increasing dependence on parallelism to improve hardware to dynamically determine whether threads actually performance, researchers have investigated methods to sim- need to synchronize even in the presence of conservatively plify the development of highly-concurrent multi-threaded used synchronization. This paper evaluates the effectiveness algorithms. Transactional Memory, both hardware [13] and of such hardware support in a commercial database. We focus software [23], aims to simplify lock-free implementations. on two index implementations: a B+Tree Index and the Delta However, integrating it into complex software systems re- Storage Index used in the SAP HANA R database system. We mains a significant challenge [12]. Lock elision [19, 20] pro- demonstrate that such support can improve performance of poses hardware mechanisms to improve the performance of database data structures such as index trees and presents a lock-based algorithms by transactionally executing them in compelling opportunity for the development of simpler, scal- a lock-free manner when possible. In such an execution, the able, and easy-to-verify algorithms. lock is only read; it is not acquired nor written to, thus expos- ing concurrency. The hardware buffers transactional updates 1. Introduction and checks for conflicts with other threads. If the execution is The increasing number of cores every generation poses successful, then the hardware makes all memory operations new challenges for the implementation of modern high- performed within the region to appear to occur instantaneously. performance in-memory database systems. While these The hardware can thus dynamically determine whether threads database systems use sophisticated algorithms to partition need to synchronize and threads perform serialization only if a query or run multiple queries across multiple cores, they required for correct execution. also utilize low-level synchronization mechanisms such as However, it has so far been an open question as to whether latches and locks1to synchronize access from concurrently modern databases can take advantage of such hardware sup- executing threads to internal in-memory data structures, such port in the development of simple yet scalable concurrent as indexes. Database developers often spend significant de- algorithms to manage internal database state [17, 14, 10]. velopment effort and resources to improve concurrency in the This paper addresses that question and evaluates the effec- presence of such synchronization. Numerous implementations tiveness of hardware supported lock elision to enable simple have been proposed over the years with the goal to improve yet scalable database index implementations. For this paper, we use the Intel R Transactional Synchronization Extensions 1We use the terms latches and locks interchangeably to represent low-level R TM synchronization mechanisms. Locks here do not refer to database locks used (Intel TSX) in the Intel 4th Generation Core Processors to provide isolation between high-level database transactions. but our findings are applicable to other implementations. Paper Contributions. This paper presents the first exper- Delta Storage write Column Index read imental analysis of the effectiveness of hardware supported Data Dictionary lock elision to enable the development of simple yet scal- (List of IDs) able index implementations in the context of a commercial Delta Merge + to new Main Storage in-memory database system. at certain point • Highlights the scalability issues that arise from con- Main Storage Column read tended atomic read-modify-write operations in modern high- Data Dictionary (List of IDs) performance systems. • Applies Intel TSX-based lock elision to two index imple- Figure 1: Delta and Main Storage in SAP HANA R mentations, the common B+Tree index and a more complex Delta Storage Index used in the SAP HANA R in-memory a read-only data structure and maintains highly compressed database system and demonstrates how such hardware sup- permanent data optimized for querying. The Delta Storage port can enable simple yet scalable algorithms. supports insert and read operations and tracks deletions using visibility flags for each row. The Delta Storage tracks all • Shows how applying well-known transformations can help changes to a column and is optimized for writes. If the Delta overcome hardware limitations and increase the effective- Storage exceeds a certain size, a merge operation joins the ness of such hardware support. Delta Storage with the actual column. Section 2 analyzes the concurrency behavior of two in- The Main Storage and the Delta Storage both maintain dic- dex implementations: A common B+Tree index and a more tionaries for compression. The database uses the dictionary complex Delta Storage Index used in SAP HANA. The two to replace the stored data with an identifier (ID) that points to represent different examples of access patterns and foot prints the dictionary entries. The Main Storage dictionary is sorted for index traversals. We find that even with a reader-writer to provide optimized search performance and high compres- lock and read-only accesses, the communication overheads sion. The Delta Storage on the other hand is optimized for associated with atomic read-modify-write operations on the efficient insert operations and thus maintains an unsorted dic- reader lock can limit scalability on modern systems. Further, tionary where new values can only be appended. It therefore even a modest fraction of insert operations (5%) can degrade maintains a secondary index tree, the Delta Storage Index, for performance. These results motivate the use of lock elision fast searches on this unsorted dictionary. Figure 2 shows an to eliminate the cache-to-cache transfers of the reader-writer example organization. The column data on the left consists of lock, even with only readers. IDs that point to entries in the dictionary containing unsorted Section 3 describes the Intel Transactional Synchronization distinct values and IDs. The Delta Storage Index on the right Extensions, its implementation and programming interface, contains IDs. The order of the IDs in the index depends on the and provides key guidelines to effectively use Intel TSX. dictionary entries represented. Storing only the IDs of entries Section 4 applies Intel TSX-based lock elision to the two ensures a fixed node size and constant cost for copying data. indexes. For this study, we use a mutual-exclusion spin lock A query always starts at the root node and looks up the for the Intel TSX version and compare against a more complex dictionary for every relevant ID in a node. For example, a reader-writer lock. We find that a simple spin lock when elided query for “Oakland” looks up ID 4 and 7 of the root node. provides better scalability across a range of configurations. ID 4 represents “Fresno” and ID 7 represents “Palo Alto”. In The initial performance gains are significant for the B+Tree alphabetical order “Oakland” lies between the two. The search index but more modest for the Delta Storage Index. We in- continues to the second child node and looks up ID 4 and 8. vestigate this further and identify cases where lock elision “Oakland” lies in the second position (ID 8). This is done for was unsuccessful. Section 5 presents simple transformations every search or insert operation. If an entry being inserted to make the Delta Storage Index more elision friendly. With already exists in the index, the ID is added to the column data, these transformations, a spin lock with lock elision signifi- otherwise the entry is added to the dictionary and an ID is cantly outperforms a reader-writer lock. assigned. This ID is also added to the index and column data While we have focused on complex database data structures list. While the index supports efficient search and insert, it such as index trees, the key concepts outlined in the paper are can also be used to sort the dictionary if needed. Reading all applicable to other similar data structures that are accessed the IDs of all the leaf nodes in order (e.g., 0;5;4;8;7;6;1;3;2) concurrently by multiple threads but where the probability of returns the IDs in alphabetical order of their entries. an actual data conflict is relatively low. 2.1. Experimental Analysis 2. SAP HANA Delta Storage Index We first study the concurrency performance of the B+Tree and The SAP HANA Database System is a read optimized column Delta Storage Index. store system. Each column consists of a Main Storage and 2.1.1.

Load more