(Fall 2019) :: Tree Indexes II

08 Tree Indexes, Part II
Intro to Database Systems (15-445/15-645)
Andy Pavlo
Computer Science, Carnegie Mellon University
Fall 2019

UPCOMING DATABASE EVENTS

Vertica Talk
→ Monday Sep 23rd @ 4:30pm
→ GHC 8102

TODAY'S AGENDA

More B+Trees
Additional Index Magic
Tries / Radix Trees
Inverted Indexes

B+TREE: DUPLICATE KEYS

Approach #1: Append Record Id
→ Add the tuple's unique record id as part of the key to ensure that all keys are unique.
→ The DBMS can still use partial keys to find tuples.

Approach #2: Overflow Leaf Nodes
→ Allow leaf nodes to spill into overflow nodes that contain the duplicate keys.
→ This is more complex to maintain and modify.

B+TREE: APPEND RECORD ID

[Figure: animation of "Insert 6", stored as <6,(Page,Slot)>, into a B+Tree whose root holds keys 5 and 9 (branches <5, <9, ≥9) over the leaves (1 3), (6 7 8), and (9 13). Leaf entries are <Key,RecordId> pairs. The leaf (6 7 8) is full, so it splits: 7 and 8 move to a new leaf, 7 is added to the root, and the new entry is stored next to the existing 6.]

B+TREE: OVERFLOW LEAF NODES

[Figure: animation of "Insert 6", "Insert 7", "Insert 6" into the same tree. The duplicates 6, 7, and 6 are placed in an overflow node attached to the leaf (6 7 8) instead of splitting it.]

DEMO

B+Tree vs. Hash Indexes
Table Clustering

IMPLICIT INDEXES

Most DBMSs automatically create an index to enforce integrity constraints but not referential constraints (foreign keys).
→ Primary Keys
→ Unique Constraints

    CREATE TABLE foo (
      id SERIAL PRIMARY KEY,
      val1 INT NOT NULL,
      val2 VARCHAR(32) UNIQUE
    );

Indexes created implicitly for the table above:

    CREATE UNIQUE INDEX foo_pkey ON foo (id);
    CREATE UNIQUE INDEX foo_val2_key ON foo (val2);

A second table with a foreign key that references foo (val1):

    CREATE TABLE bar (
      id INT REFERENCES foo (val1),
      val VARCHAR(32)
    );

[On the slide, the following non-unique index on foo (val1) is shown and then crossed out.]

    CREATE INDEX foo_val1_key ON foo (val1);
With val1 declared UNIQUE, the DBMS creates the unique index that the foreign key in bar references:

    CREATE TABLE foo (
      id SERIAL PRIMARY KEY,
      val1 INT NOT NULL UNIQUE,
      val2 VARCHAR(32) UNIQUE
    );

    CREATE TABLE bar (
      id INT REFERENCES foo (val1),
      val VARCHAR(32)
    );

PARTIAL INDEXES

Create an index on a subset of the entire table. This potentially reduces its size and the amount of overhead to maintain it.

One common use case is to partition indexes by date ranges.
→ Create a separate index per month, year.

    CREATE INDEX idx_foo
      ON foo (a, b)
      WHERE c = 'WuTang';

    SELECT b FROM foo
     WHERE a = 123
       AND c = 'WuTang';

COVERING INDEXES

If all the fields needed to process the query are available in an index, then the DBMS does not need to retrieve the tuple.

This reduces contention on the DBMS's buffer pool resources.

    CREATE INDEX idx_foo
      ON foo (a, b);

    SELECT b FROM foo
     WHERE a = 123;
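The partial and covering index examples above can be tried out with SQLite through Python's standard sqlite3 module. This is only a sketch under stated assumptions: SQLite is swapped in here for convenience and is not the DBMS used in the lecture, the sample rows and the index names idx_foo_partial and idx_foo_cover are made up for illustration, and the exact wording of EXPLAIN QUERY PLAN output differs between SQLite versions.

```python
import sqlite3


def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INT, b INT, c TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(123, 1, "WuTang"), (123, 2, "Other"), (456, 3, "WuTang")],  # made-up rows
)

# Partial index: only rows with c = 'WuTang' get an index entry.
conn.execute("CREATE INDEX idx_foo_partial ON foo (a, b) WHERE c = 'WuTang'")
partial_hit = plan(conn, "SELECT b FROM foo WHERE a = 123 AND c = 'WuTang'")
partial_miss = plan(conn, "SELECT b FROM foo WHERE a = 123 AND c = 'Other'")
conn.execute("DROP INDEX idx_foo_partial")

# Covering index: the query needs only a and b, and both are in the index.
conn.execute("CREATE INDEX idx_foo_cover ON foo (a, b)")
covering = plan(conn, "SELECT b FROM foo WHERE a = 123")

print("partial, matching predicate:    ", partial_hit)
print("partial, non-matching predicate:", partial_miss)
print("covering:                       ", covering)
```

The second query cannot use the partial index because its predicate on c does not imply the index's WHERE clause, so rows with c = 'Other' have no entry in it.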
INDEX INCLUDE COLUMNS

Embed additional columns in indexes to support index-only queries.

These extra columns are only stored in the leaf nodes and are not part of the search key.

    CREATE INDEX idx_foo
      ON foo (a, b)
      INCLUDE (c);

    SELECT b FROM foo
     WHERE a = 123
       AND c = 'WuTang';
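The idea of include columns can be sketched in a few lines of Python. This is an illustrative model, not how any particular DBMS implements it: a flat sorted list stands in for the B+Tree leaf level, and the class name IncludeIndex, the method select_b, and the sample rows are all hypothetical. Entries are ordered by the search key (a, b) only, while c is carried as payload so the example query can be answered without touching the table.

```python
import bisect


class IncludeIndex:
    """Toy index on (a, b) that also stores column c in each entry."""

    def __init__(self, rows):
        # Order entries by the search key only; c never affects the ordering.
        ordered = sorted(rows, key=lambda r: (r["a"], r["b"]))
        self.keys = [(r["a"], r["b"]) for r in ordered]
        self.included = [r["c"] for r in ordered]

    def select_b(self, a, c):
        """SELECT b FROM foo WHERE a = ? AND c = ?, using index entries only."""
        # (a,) sorts before every (a, b), so this finds the first entry for a.
        i = bisect.bisect_left(self.keys, (a,))
        result = []
        while i < len(self.keys) and self.keys[i][0] == a:
            if self.included[i] == c:  # filter on the included column
                result.append(self.keys[i][1])
            i += 1
        return result


rows = [  # made-up table contents
    {"a": 123, "b": 7, "c": "WuTang"},
    {"a": 123, "b": 9, "c": "Other"},
    {"a": 123, "b": 2, "c": "WuTang"},
    {"a": 456, "b": 1, "c": "WuTang"},
    {"a": 5, "b": 4, "c": "WuTang"},
]
index = IncludeIndex(rows)
print(index.select_b(123, "WuTang"))
```

Because c is not part of the search key, the lookup can only seek on a and must then filter the matching entries on c one by one.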
FUNCTIONAL/EXPRESSION INDEXES

An index does not need to store keys in the same way that they appear in their base table.

    SELECT * FROM users
     WHERE EXTRACT(dow FROM login) = 2;

[On the slide, the following plain index on login is crossed out for this query.]

    CREATE INDEX idx_user_login
      ON users (login);

You can use expressions when declaring an index.

    CREATE INDEX idx_user_login
      ON users (EXTRACT(dow FROM login));

The slide also shows a partial index as an alternative:

    CREATE INDEX idx_user_login
      ON users (login)
      WHERE EXTRACT(dow FROM login) = 2;

OBSERVATION

The inner node keys in a B+Tree cannot tell you whether a key exists in the index. You must always traverse to the leaf node.

This means that you could have (at least) one buffer pool page miss per level in the tree just to find out a key does not exist.

TRIE INDEX

Use a digital representation of keys to examine prefixes one-by-one instead of comparing the entire key.
→ Also known as Digital Search Tree, Prefix Tree.

Keys: HELLO, HAT, HAVE

[Figure: trie for the three keys. The root H has children A and E. Under A: T → ¤ and V → E → ¤. Under E: L → L → O → ¤. The symbol ¤ marks the end of a key.]
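A trie over the three example keys can be sketched as nested dictionaries. This is a minimal illustration rather than the structure a real DBMS would use: the class name Trie, the lookup return value, and the sentinel _END (playing the role of the ¤ marker in the figure) are assumptions made for the example.

```python
_END = object()  # end-of-key marker; cannot collide with any character


class Trie:
    """Character-at-a-time trie stored as nested dictionaries."""

    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key:
            # Reuse the child for this character, or create it.
            node = node.setdefault(ch, {})
        node[_END] = True

    def lookup(self, key):
        """Return (found, steps), where steps counts characters examined."""
        node = self.root
        steps = 0
        for ch in key:
            steps += 1
            if ch not in node:
                # The prefix diverges here, so the key cannot exist.
                return False, steps
            node = node[ch]
        # Every character matched; the key exists only if it ends here.
        return _END in node, steps


trie = Trie()
for key in ("HELLO", "HAT", "HAVE"):
    trie.insert(key)

for probe in ("HAT", "HA", "HOT", "HELLO"):
    print(probe, trie.lookup(probe))
```

Unlike the B+Tree case in the observation above, a lookup for a missing key such as HOT stops at the first character that has no matching child, without descending to the bottom of the structure.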
