Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems 1Jia Yu 2Mohamed Sarwat School of Computing, Informatics, and Decision Systems Engineering Arizona State University, 699 S. Mill Avenue, Tempe, AZ
[email protected],
[email protected] ABSTRACT Table 1: Index overhead and storage dollar cost Classic database indexes (e.g., B+-Tree), though speed up + queries, suffer from two main drawbacks: (1) An index usu- (a) B -Tree overhead ally yields 5% to 15% additional storage overhead which re- TPC-H Index size Initialization time Insertion time sults in non-ignorable dollar cost in big data scenarios espe- 2GB 0.25 GB 30 sec 10 sec cially when deployed on modern storage devices. (2) Main- 20 GB 2.51 GB 500 sec 1180 sec taining an index incurs high latency because the DBMS has 200 GB 25 GB 8000 sec 42000 sec to locate and update those index pages affected by the un- derlying table changes. This paper proposes Hippo a fast, (b) Storage dollar cost yet scalable, database indexing approach. It significantly HDD EnterpriseHDD SSD EnterpriseSSD shrinks the index storage and mitigates maintenance over- 0.04 $/GB 0.1 $/GB 0.5 $/GB 1.4 $/GB head without compromising much on the query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space oc- table. Even though classic database indexes improve the cupied by the index. It maintains simplified histograms that query response time, they face the following challenges: represent the data distribution and adopts a page group- ing technique that groups contiguous pages into page ranges • Indexing Overhead: Adatabaseindexusually based on the similarity of their index key attribute distri- yields 5% to 15% additional storage overhead.