Constraints, Indices, B-Trees
1 Maintaining Integrity of Data • You are creating a search engine for Rhodes' website, called Rhoogle. • You have an SQL query: – "SELECT * FROM pages WHERE name='" + VAR + "';"
3 Maintaining Integrity of Data • Data is dirty. • How does an application ensure that a database modification does not corrupt the tables?
• Two approaches: – Application programs check that database modifications are consistent. – Use the features provided by SQL.
4 Integrity Checking in SQL
• Data type constraints (including NOT NULL). • PRIMARY KEY and UNIQUE constraints. • FOREIGN KEY constraints. • Constraints on attributes and tuples. • Triggers (schema-level constraints).
5 Constraints and Queries
• Often, constraints involve attributes we often perform searches (SQL SELECTs) on.
• To speed up queries, DBs will often create indices automatically for you.
6 Indexes
• Index = data structure used to speed access to tuples of a relation, given values of one or more attributes.
7 Declaring Indexes
• No standard! • Typical syntax: CREATE INDEX MovieIdx ON Movie(MovieId); CREATE INDEX CastsIdx ON Casts(ActorId, MovieId);
8 Types of Indexes
• Primary: index on a key – Used to enforce constraints • Secondary: index on non-key attribute
9 Using Indexes: Equality Searches
• Given a value v, the index takes us to only those tuples that have v in the attribute(s) of the index. • What data structure would be useful here?
10 Using Indexes: Range Searches
• "Find all students with GPA > 3.0" • What data structure(s) work here?
11 Range Searches
• "Find all students with GPA > 3.0" • May be slow, even on sorted file • Solution: Create an index file.
k1 k2 kN Index File
Page 1 Page 2 Page 3 Page N Data File
12 13 B-trees
• Extension of binary search trees to n-way search trees (where n > 2) • Balanced (like red-black trees)
14 Why B-Trees Are So Great for DB Indexes • DBs are usually on disk, not RAM – B-tree structure aligns with disk pages – Hierarchical structure minimizes number of disk reads. • Keeps info in sorted order for equality or range searches. • Balanced tree structure gives fast searches, insertions, deletions.
15 Definition
• B-tree of order d is a (2d+1) tree: – Internal nodes have one more child (pointer) than data elements (keys). Leaf nodes have no children. – Root has between 1 and 2d data elements. – Non-root nodes have between d and 2d elements. – All leaves are at the same depth in the tree. – Has extended search property (binary search tree property extended to multiway tree)
16 Algorithms: Search
• Extrapolated from binary tree search algorithm.
17 Algorithms: Insert • First, find leaf node where data would go. • Insert(data, node): – If data can fit in node, add it to the node. – If causes overflow: • split node at the median value. • Everything less than median becomes new leaf node. • Everything greater than median becomes new leaf node. • Promote median to parent node; call insert(median, parent) [may create new parent node if there is no parent]
18 Algorithms: Delete
• Search for item to delete • If at leaf node, delete the item – Rebalance up from leaf if necessary • If at internal node, swap with largest child in left sub-tree (analogous to BST deletion swap) – Rebalance if necessary
19