<<

M-ary Search

B-Trees • Maximum branching factor of M • Complete tree has height = Section 4.7 in Weiss # disk accesses for find :

Runtime of find :

2

Solution: B-Trees B-Trees

• specialized M-ary search trees What makes them disk-friendly?

• Each has (up to) M-1 keys: 1. Many keys stored in a node – subtree between two keys x and y contains leaves with values v such that • All brought to memory/cache in one access! 3 7 12 21 x ≤ v < y 2. Internal nodes contain only keys; • Pick branching factor M Only leaf nodes contain keys and actual data such that each node • The tree can be loaded into memory takes one full irrespective of data object size {page, block } x<3 3≤x<7 7≤x<12 12 ≤x<21 21 ≤x • Data actually resides in disk of memory 3 4

B-Tree: Example B-Tree Properties ‡

B-Tree with M = 4 (# pointers in internal node) and L = 4 (# data items in leaf) – Data is stored at the leaves – All leaves are at the same depth and contain between 10 40 »L/2 ÿ and L data items – Internal nodes store up to M-1 keys 3 15 20 30 50 – Internal nodes have between »M/2 ÿ and M children – Root (special case) has between 2 and M children (or root could be a leaf) 1 2 10 11 12 20 25 26 40 42 AB xG 3 5 6 9 15 17 30 32 33 36 50 60 70

Data objects, that I’ll Note: All leaves at the same depth! ignore in slides 5 ‡These are technically B +-Trees 6

1 Example, Again B-trees vs. AVL trees

B-Tree with M = 4 Suppose we have 100 million items (100,000,000): and L = 4

10 40 • Depth of AVL Tree

3 15 20 30 50 • Depth of B+ Tree with M = 128, L = 64

1 2 10 11 12 20 25 26 40 42 3 5 6 9 15 17 30 32 33 36 50 60 70

(Only showing keys, but leaves also have data!) 7 8

Building a B-Tree M = 3 L = 2 Splitting the Root

Too many keys in a leaf!

1 3 14 14 3 3 14 Insert( 1) And create Insert( 3) Insert( 14 ) 3 14 a new root 1 3 14 The empty 1 3 14 B-Tree M = 3 L = 2 So, split the leaf.

Now, Insert( 1)?

9 10

M = 3 L = 2 M = 3 L = 2 Propagating Splits Overflowing leaves Too many keys in a leaf! 14 59 14 14 59 14 14 Insert( 5) 14 26 59 Add new Insert( 59 ) Insert( 26 ) 1 3 1 3 14 26 59 child 1 3 14 1 3 14 59 1 3 5 14 26 59 Split the leaf, but no space in parent!

So, split the leaf . 14 5 14 59

14 59 5 59 Create a And add new root a new child 1 3 14 26 59 1 3 5 14 26 59 1 3 5 14 26 59

11 12 So, split the node.

2 M = 3 L = 2 Insertion Algorithm After More Routine Inserts

1. Insert the key in its leaf 3. If an internal node ends up 14 2. If the leaf ends up with L+1 with M+1 items, overflow ! items, overflow ! – Split the node into two nodes: Insert( 89 ) » ÿ 5 59 – Split the leaf into two nodes: • original with »»(M+1)/2 ÿÿ items (M+1)/2 ⁄⁄⁄ Insert( 79 ) • original with »»»(L+1)/2 ÿÿÿ items • new one with items • new one with (L+1)/2 ⁄⁄⁄ items – Add the new child to the parent 1 3 5 14 26 59 – Add the new child to the parent – If the parent ends up with M+1 14 – If the parent ends up with M+1 items, overflow ! items, overflow ! 4. Split an overflowed root in two 5 59 89 and hang the new nodes under a new root 1 3 5 14 26 59 79 89 This makes the tree deeper!

13 14

M = 3 L = 2 M = 3 L = 2 Deletion Deletion and Adoption A leaf has too few keys! 1. Delete item from leaf 2. Update keys of ancestors if necessary 14 14 Delete( 5) 5 79 89 ? 79 89 14 14

Delete( 59 ) 1 3 5 14 26 79 89 1 3 14 26 79 89 5 59 89 5 79 89

So, borrow from a sibling 1 3 5 14 26 59 79 89 1 3 5 14 26 79 89 14 What could go wrong? 3 79 89

15 1 3 3 14 26 79 89 16

M = 3 L = 2 Does Adoption Always Work? Deletion and Merging A leaf has too few keys! • What if the sibling doesn’t have enough for you to 14 14 borrow from? Delete( 3) 3 79 89 ? 79 89 e.g. you have »L/2 ÿ-1 and sibling has »L/2 ÿ ? 1 3 14 26 79 89 1 14 26 79 89 And no sibling with surplus!

14

So, delete 79 89 the leaf But now an internal node 17 18 has too few subtrees! 1 14 26 79 89

3 M = 3 L = 2 Deletion with Propagation M = 3 L = 2 A Bit More Adoption (More Adoption)

14 79 79 79

Adopt a Delete( 1) 79 89 14 89 14 89 26 89 neighbor (adopt a sibling) 1 14 26 79 89 1 14 26 79 89 1 14 26 79 89 14 26 79 89

19 20

M = 3 L = 2 M = 3 L = 2 Pulling out the Root A leaf has too few keys! Pulling out the Root (continued) And no sibling with surplus! The root 79 79 has just one subtree! Delete( 26 ) So, delete Simply make 26 89 89 the leaf; the one child the new root! merge 79 89 14 26 79 89 14 79 89

14 79 89 But now the root A node has too few subtrees has just one subtree! and no neighbor with surplus! 79 89 79

Delete 79 89 89 the node 14 79 89 21 22 14 79 89 14 79 89

Deletion Algorithm Deletion Slide Two

1. Remove the key from its leaf 3. If an internal node ends up with fewer than »»»M/2 ÿÿÿ items, underflow ! – Adopt from a neighbor; 2. If the leaf ends up with fewer update the parent than »»»L/2 ÿÿÿ items, underflow ! – If adoption won’t work, – Adopt data from a sibling; merge with neighbor update the parent – If the parent ends up with fewer than – If adopting won’t work, delete »»»M/2 ÿÿÿ items, underflow ! This reduces the node and merge with neighbor height of the tree! – If the parent ends up with 4. If the root ends up with only one fewer than »»»M/2 ÿÿÿ items, child, make the child the new root underflow ! 23 of the tree 24

4 Thinking about B-Trees Tree Names You Might Encounter

• B-Tree insertion can cause (expensive) splitting and FYI: propagation – B-Trees with M = 3 , L = x are called 2-3 trees • Nodes can have 2 or 3 keys • B-Tree deletion can cause (cheap) adoption or – B-Trees with M = 4 , L = x are called 2-3-4 trees (expensive) deletion, merging and propagation • Nodes can have 2, 3, or 4 keys • Propagation is rare if M and L are large (Why?) • If M = L = 128 , then a B-Tree of height 4 will store at least 30,000,000 items

25 26

5