<<

Constraints, Indices, B-Trees

1 Maintaining Integrity of Data • You are creating a search engine for Rhodes' website, called Rhoogle. • You have an SQL query: – "SELECT * FROM pages WHERE name='" + VAR + "';"

3 Maintaining Integrity of Data • Data is dirty. • How does an application ensure that a database modification does not corrupt the tables?

• Two approaches: – Application programs check that database modifications are consistent. – Use the features provided by SQL.

4 Integrity Checking in SQL

• Data type constraints (including NOT NULL). • PRIMARY KEY and UNIQUE constraints. • FOREIGN KEY constraints. • Constraints on attributes and . • Triggers (schema-level constraints).

5 Constraints and Queries

• Often, constraints involve attributes we often perform searches (SQL SELECTs) on.

• To speed up queries, DBs will often create indices automatically for you.

6 Indexes

• Index = used to speed access to tuples of a relation, given values of one or more attributes.

7 Declaring Indexes

• No standard! • Typical syntax: CREATE INDEX MovieIdx ON Movie(MovieId); CREATE INDEX CastsIdx ON Casts(ActorId, MovieId);

8 Types of Indexes

• Primary: index on a key – Used to enforce constraints • Secondary: index on non-key attribute

9 Using Indexes: Equality Searches

• Given a value v, the index takes us to only those tuples that have v in the attribute(s) of the index. • What data structure would be useful here?

10 Using Indexes: Range Searches

• "Find all students with GPA > 3.0" • What data structure(s) work here?

11 Range Searches

• "Find all students with GPA > 3.0" • May be slow, even on sorted file • Solution: Create an index file.

k1 k2 kN Index File

Page 1 Page 2 Page 3 Page N Data File

12 13 B-trees

• Extension of binary search trees to n-way search trees (where n > 2) • Balanced (like red-black trees)

14 Why B-Trees Are So Great for DB Indexes • DBs are usually on disk, not RAM – B- structure aligns with disk pages – Hierarchical structure minimizes number of disk reads. • Keeps info in sorted order for equality or range searches. • Balanced gives fast searches, insertions, deletions.

15 Definition

• B-tree of order d is a (2d+1) tree: – Internal nodes have one more child (pointer) than data elements (keys). Leaf nodes have no children. – Root has between 1 and 2d data elements. – Non-root nodes have between d and 2d elements. – All leaves are at the same depth in the tree. – Has extended search property (binary property extended to multiway tree)

16 Algorithms: Search

• Extrapolated from .

17 Algorithms: Insert • First, find leaf node where data would go. • Insert(data, node): – If data can fit in node, add it to the node. – If causes overflow: • split node at the median value. • Everything less than median becomes new leaf node. • Everything greater than median becomes new leaf node. • Promote median to parent node; call insert(median, parent) [may create new parent node if there is no parent]

18 Algorithms: Delete

• Search for item to delete • If at leaf node, delete the item – Rebalance up from leaf if necessary • If at internal node, swap with largest child in left sub-tree (analogous to BST deletion swap) – Rebalance if necessary

19