Self-Organizing Tuple Reconstruction in Column-Stores

Stratos Idreos, Martin L. Kersten, Stefan Manegold
CWI Amsterdam, The Netherlands

ABSTRACT

Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column, allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates.

In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.

Categories and Subject Descriptors: H.2 [DATABASE MANAGEMENT]: Physical Design - Systems

General Terms: Algorithms, Performance, Design

Keywords: Database Cracking, Self-organization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD'09, June 29–July 2, 2009, Providence, Rhode Island, USA.
Copyright 2009 ACM 978-1-60558-551-2/09/06 ...$5.00.

1. INTRODUCTION

A prime feature of column-stores is to provide improved performance over row-stores in the case that workloads require only a few attributes of wide tables at a time. Each relation R is physically stored as a set of columns, one column for each attribute of R. This way, a query needs to load only the required attributes from each relevant relation.

This happens at the expense of requiring explicit (partial) tuple reconstruction in case multiple attributes are required. Each tuple reconstruction is a join between two columns based on tuple IDs/positions and becomes a significant cost component in column-stores, especially for multi-attribute queries [2, 6, 10]. Whenever possible, position-based join-matching and sequential data access are exploited. For each relation R_i in a query plan q, a column-store needs to perform at least N_i − 1 tuple reconstruction operations for R_i within q, given that N_i attributes of R_i participate in q.

Column-stores perform tuple reconstruction in two ways [2]. With early tuple reconstruction, the required attributes are glued together as early as possible, i.e., while the columns are loaded, leveraging N-ary processing to evaluate the query. On the other hand, late tuple reconstruction exploits the column-store architecture to its maximum. During query processing, "reconstruction" merely refers to getting the attribute values of qualifying tuples from their base columns as late as possible, i.e., only once an attribute is required in the query plan. This approach allows the query engine to exploit CPU- and cache-optimized vector-like operator implementations throughout the whole query evaluation. N-ary tuples are formed only once the final result is delivered.

Like most modern column-stores [12, 4, 15], we focus on late reconstruction. Comparing early and late reconstruction, the educative analysis in [2] observes that the latter incurs the overhead of reconstructing a column more than once, in case it occurs more than once in a query. Furthermore, exploiting sequential access patterns during reconstruction is not always possible since many operators (joins, group by, order by, etc.) are not tuple order-preserving.

The ultimate access pattern is to have multiple copies for each relation R, such that each copy is presorted on another attribute in R. All tuple reconstructions of R attributes initiated by a restriction on an attribute A can be performed using the copy that is sorted on A. This way, the tuple reconstruction not only exploits sequential access, but also benefits from focused access to only a small consecutive area in the base column (as defined by the restriction on A) rather than scattered access to the whole column. However, such a direction requires the ability to predict the workload and the luxury of idle time to prepare the physical design. In addition, to date there is no efficient way to maintain multiple sorted copies under updates in a column-store; thus it requires read-only or infrequently updated environments.

In this paper, we propose a self-organizing direction that achieves performance similar to using presorted data, but comes without the hefty initial price tag of presorting itself. Instead, it handles dynamic unpredictable workloads with frequent updates and with no need for idle time. Our approach exploits database cracking [7, 8, 9] that sets a promising direction towards continuous self-organization of data storage based on selections in incoming queries.

We introduce a novel design, partial sideways cracking, that provides a self-organizing behavior for both selections and tuple reconstructions. It gracefully handles any kind of complex multi-attribute query. It uses auxiliary self-organizing data structures to materialize mappings between pairs of attributes used together in queries for tuple reconstruction. Based on the workload, these cracker maps are continuously kept aligned by being physically reorganized, while processing queries, allowing the DBMS to handle tuple reconstruction using cache-friendly access patterns.

To enhance performance and adaptability, in particular in environments with storage restrictions, cracker maps are implemented as dynamic collections of separate chunks. This enables flexible storage management by adaptively maintaining only those chunks of a map that are required to process incoming queries. Chunks adapt individually to the query workload. Each chunk of a map is separately reorganized, dropped if extra storage space is needed, or recreated (entirely or in parts) if necessary.

We implemented partial sideways cracking on top of an open-source column-oriented DBMS, MonetDB [15]. The paper presents an extensive experimental analysis using both synthetic workloads and the TPC-H benchmark. It clearly shows that partial sideways cracking brings a self-organizing behavior and significant benefits even in the presence of random workloads, storage restrictions and updates.

The remainder of this paper is organized as follows. Section 2 provides the necessary background.

[...] collection of Binary Association Tables (BATs). Each BAT is a set of two columns. For a relation R of k attributes, there exist k BATs, each BAT storing the respective attribute as (key,attr) pairs. The system-generated key identifies the relational tuple that attribute value attr belongs to, i.e., all attribute values of a single tuple are assigned the same key. Key values form a dense ascending sequence representing the position of an attribute value in the column. Thus, for base BATs, the key column typically is a virtual non-materialized column. For each relational tuple t of R, all attributes of t are stored in the same position in their respective column representations. The position is determined by the insertion order of the tuples. This tuple-order alignment across all base columns allows the column-oriented system to perform tuple reconstructions efficiently in the presence of tuple order-preserving operators. Basically, the task boils down to a simple merge-like sequential scan over two columns, resulting in low data access costs through all levels of modern hierarchical memory systems. Let us go through some of the basic operators of MonetDB's two-column physical algebra.

Operator select(A,v1,v2) searches all (key,attr) pairs in base column A for attribute values between v1 and v2. For each qualifying attribute value, the key value (position) is included in the result. Since selections are mostly performed on base columns, the underlying implementation preserves the key-order also in the intermediate results.

Operator join(j1,j2) performs a join between attr1 of j1 and attr2 of j2. The result contains the qualifying (key1,key2) pairs. In general, this operator can maintain the tuple order only for the outer join input.

Similarly, groupby and orderby operators cannot maintain tuple order for any of their inputs.

Operator reconstruct(A,r) returns all (key,attr) pairs of base column A at the positions specified by r. If r is the result of a tuple order-preserving operator, then iterating over r, it uses cache-friendly in-order positional lookups into A. Otherwise, it requires expensive random access patterns.

2.2 Selection-based Cracking
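As an illustration of the two-column operators described for MonetDB's physical algebra above, the following minimal Python sketch models a base column as a plain list whose index plays the role of the virtual key. It is not MonetDB code: the function bodies, the inclusive treatment of "between v1 and v2", the helper is_sequential, and the sample relation R(A, B, C) are all assumptions made for illustration. It shows that a selection keeps keys in order, that a query touching three attributes needs two reconstructions, and that an order-by leaves keys in an order that forces scattered lookups.

```python
from typing import List, Tuple

Column = List[int]  # a base column; the key is the (virtual) list position


def select(col: Column, v1: int, v2: int) -> List[int]:
    """Keys of all values in [v1, v2] (inclusive bounds are an assumption)."""
    return [key for key, attr in enumerate(col) if v1 <= attr <= v2]


def reconstruct(col: Column, keys: List[int]) -> List[Tuple[int, int]]:
    """(key, attr) pairs of `col` at the given positions, in the order given."""
    return [(key, col[key]) for key in keys]


def is_sequential(keys: List[int]) -> bool:
    """Hypothetical helper: True if lookups walk the column front to back."""
    return all(a < b for a, b in zip(keys, keys[1:]))


# Hypothetical relation R(A, B, C); tuple i sits at position i in every column.
A = [12, 3, 7, 20, 5]
B = [100, 200, 300, 400, 500]
C = [9, 8, 7, 6, 5]

# select B, C from R where A between 5 and 12
keys = select(A, 5, 12)        # key order is preserved by the selection
b_vals = reconstruct(B, keys)  # first tuple reconstruction
c_vals = reconstruct(C, keys)  # second: three attributes, so 3 - 1 = 2 in total

# An order-by on A is not tuple order-preserving: the keys come out shuffled,
# so a later reconstruct has to jump around in the base column.
shuffled = sorted(keys, key=lambda k: A[k])
late_b = reconstruct(B, shuffled)
```

Here is_sequential(keys) holds while is_sequential(shuffled) does not, which corresponds to the distinction the text draws between in-order positional lookups and random access.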
