Self-organizing Tuple Reconstruction in Column-stores Stratos Idreos Martin L. Kersten Stefan Manegold CWI Amsterdam CWI Amsterdam CWI Amsterdam The Netherlands The Netherlands The Netherlands
[email protected] [email protected] [email protected] ABSTRACT 1. INTRODUCTION Column-stores gained popularity as a promising physical de- A prime feature of column-stores is to provide improved sign alternative. Each attribute of a relation is physically performance over row-stores in the case that workloads re- stored as a separate column allowing queries to load only quire only a few attributes of wide tables at a time. Each the required attributes. The overhead incurred is on-the-fly relation R is physically stored as a set of columns; one col- tuple reconstruction for multi-attribute queries. Each tu- umn for each attribute of R. This way, a query needs to load ple reconstruction is a join of two columns based on tuple only the required attributes from each relevant relation. IDs, making it a significant cost component. The ultimate This happens at the expense of requiring explicit (partial) physical design is to have multiple presorted copies of each tuple reconstruction in case multiple attributes are required. base table such that tuples are already appropriately orga- Each tuple reconstruction is a join between two columns nized in multiple different orders across the various columns. based on tuple IDs/positions and becomes a significant cost This requires the ability to predict the workload, idle time component in column-stores especially for multi-attribute to prepare, and infrequent updates. queries [2, 6, 10].