CS 350 Algorithms and Complexity

Fall 2015

Lecture 12: Space & Time Tradeoffs. Part 2: Hashing & B-Trees

Andrew P. Black

Department of Computer Science Portland State University Space-for-time tradeoffs ✦ Two varieties of space-for-time algorithms: ✦ input enhancement: preprocess the input (all or part) to store some info to be used later in solving the problem ✦ counting sorts ✦ string searching algorithms ✦ prestructuring — preprocess the input to make accessing its elements easier ✦ hashing ✦ indexing schemes (e.g., B-trees)

2 Hashing ✦ A very efficient method for implementing a dictionary, i.e., a mapping from keys to values with the operations: ✦ find(k) — also called at(k), lookup(k), [k] ✦ insert(k, v) — also called at(k) put(v), put (k, v), [k] := v ✦ delete(k) ✦ Based on representation-change and space-for- time tradeoff ideas ✦ Important applications: ✦ symbol tables ✦ (values are not relevant) ✦ databases (extendible hashing)

3 Hash tables and hash functions ✦ The idea of hashing is to map n keys into an array of size m, called the hash table, by using a predefined function, called the , h: key → index in the hash table ✦ Example: student records, key = StudentID. ✦ Hash function: h(K) = K mod m where m is some integer (typically, prime) ✦ Generally, a hash function should: ✦ be easy to compute ✦ distribute keys about evenly throughout the hash table

4 Question: ✦ If m = 1000, h(k) = k mod m,

where is record with StudentId = 990159265 stored?

✦ Numeric answer:

5 Collisions

✦ If h(k1) = h(k2), there is a collision ✦ Good hash functions result in fewer collisions than bad ones, but: ✦ some collisions should be expected (birthday paradox) ✦ Two principle hashing schemes handle collisions differently:

✦ Open hashing – each cell is a header of “chain” — a list of all keys hashed to it ✦ Closed hashing ✦ one key per cell ✦ in case of collision, finds another cell in the hash table itself by ✦ : use next free bucket, or ✦ : use second hash function to compute increment

6 What Levitin didn’t tell you ✦ Hash functions usually designed in two parts: 1. the hash Object [0..232) ! 2. the address calculation [0..232) [0..hashTableSize) !

7 A Real Hash Function: function robertJenkins32bit5shiftHash(n) { var answer = n & 0xFFFFFFFF; answer = 0x479AB41D + answer + (answer << 8); answer = answer & 0xFFFFFFFF; answer = (0xE4AA10CE ⊻ answer) ⊻ (answer >> 5); answer = 0x09942F0A6 + answer - (answer << 14); answer = (0x5AEDD67D ⊻ answer) ⊻ (answer >> 3); answer = 0x17BEA992 + answer + (answer << 7); return (answer & 0xFFFFFFFF); }

Returns 32 “scrambled” bits

8 Open hashing (Separate chaining) ✦ Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers. ✦ Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(k) = sum of key’s letters’ positions in the alphabet mod 13 key A FOOL AND HIS MONEY ARE SOON PARTED h(key) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

A AND MONEY FOOL HIS ARE PARTED

SOON

9 Open hashing (Separate chaining) ✦ Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers. ✦ Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(k) = sum of key’s letters’ positions in the alphabet mod 13 key A FOOL AND HIS MONEY ARE SOON PARTED h(key) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

A AND MONEY FOOL HIS ARE PARTED

Search for KID SOON

9 Open hashing (Separate chaining) ✦ Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers. ✦ Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(k) = sum of key’s letters’ positions in the alphabet mod 13 key A FOOL AND HIS MONEY ARE SOON PARTED h(key) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

A AND MONEY FOOL HIS ARE PARTED

Search for KID SOON h(KID) = 11 9 Open hashing (cont.) ✦ The ratio α = n/m is called the load factor ✦ If hash function distributes keys uniformly, average length of will be α ✦ Average number of key comparisons in successful, S, and unsuccessful searches, U: S ≈ 1+α/2, U = α ✦ α is typically kept small (ideally, about 1) ✦ Open hashing still works if n > m – but it’s much less efficient

10 Efficiencies (open hashing)

Successful Unsuccessful α S ≈ 1+α/2 U = α 0.33 1.17 0.33 0.50 1.25 0.50 0.75 1.38 0.75 0.90 1.45 0.90 0.95 1.48 0.95 1.00 1.50 1.00 1.50 1.75 1.50 2.00 2.00 2.00

11 Mnemonic ✦ This method is called open hashing because the hash table is open: ✦ the stored keys and values are not in the table, but “hang” out of it ... ✦ on separate chains (hence the alternative name) 0 1 2 3 4 5 6 7 8 9 10 11 12

A AND MONEY FOOL HIS ARE PARTED

SOON

12 Question: ✦ Suppose that you have 1000 records in memory, and you index them in a hash table with α = 1. How many words of additional memory will be needed if you use an open hash table with separate chaining? Each pointer occupies 1 word. Additional means beyond the table and the records.) ✦ Numeric answer:

13 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (Open addressing) ✦ All keys are stored inside the hash table.

Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE SOON

14 Closed hashing (cont.) ✦ Does not work if n > m ✦ Avoids pointer chains ✦ Deletions are not straightforward ✦ Number of probes to find/insert/delete a key depends on load factor α = n/m and on collision resolution strategy. ✦ For linear probing: S ≈ ½ (1+ 1/(1- α)) and U ≈ ½ (1+ 1/(1- α)²)

15 Efficiencies (closed hashing) ✦ As the table fills (α approaches 1), number of probes in linear probing increases dramatically: Successful Unsuccessful

α S ≈ ½(1+1/(1-α)) U ≈ ½(1+1/(1-α)2) 0.33 1.25 1.625 0.50 1.5 2.5 0.75 2.5 8.5 0.90 5.5 50.5 0.95 10.5 200.5 0.99 50.5 5000.5

16 Question: ✦ Suppose that you have 1000 records in memory, and you index them in a hash table with α = 1/2. How many words of additional memory will be needed if you use a closed hash table? Each pointer occupies 1 word. Additional means beyond the table and the records.) ✦ Numeric answer:

17 Question: ✦ For the input: 30 20 56 75 31 19

and hash function h(k)= k mod 11 1. Construct an open hash table of size 11 0 1 2 3 4 5 6 7 8 9 10

2. What is the largest number of key comparisons in a successful search in this table?

18 Question: ✦ For the input: 30 20 56 75 31 19 30 mod 11 = 8 and hash function h(k)= k mod 11 1. Construct an open hash table of size 11 0 1 2 3 4 5 6 7 8 9 10

2. What is the largest number of key comparisons in a successful search in this table?

18 Question: ✦ For the input: 30 20 56 75 31 19 30 mod 11 = 8 and hash function h(k)= k mod 11 1. Construct an open hash table of size 11 0 1 2 3 4 5 6 7 8 9 10

30 2. What is the largest number of key comparisons in a successful search in this table?

18 Question: ✦ For the input: 30 20 56 75 31 19

and hash function h(k)= k mod 11 1. Construct a closed hash table of size 11 0 1 2 3 4 5 6 7 8 9 10 30

2. What is the largest number of key comparisons in a successful search in this table?

19 Question: ✦ For the input: 30 20 56 75 31 19 30 mod 11 = 8 and hash function h(k)= k mod 11 1. Construct a closed hash table of size 11 0 1 2 3 4 5 6 7 8 9 10 30

2. What is the largest number of key comparisons in a successful search in this table?

19 Question: ✦ Suppose that you have 1000 records in memory, and you have 3000 words of memory to use for indexing them. What’s the best way to use that memory if most searches are expected to be successful? (Assume that each pointer occupies 1 word) A. Open Hash table with separate chaining B. Closed Hash table with open addressing C. Binary search

20 Problem ✦ Fill in the following table with the average-case efficiencies for 5 implementations of Dictionary

Open Hash sorted Binary Closed Hash array w/separate w/linear array chaining probing

search

insertion

deletion

21 2–3 trees ✦ 2–3 trees are perfectly-balanced search trees where each node has either: ✦ 2 children (and 1 key), or ✦ 3 children (and 2 keys). ✦ Details: ✦ Levitin pp. 223–5. ✦ Lyn Turbak on 2–3 Trees ✦ Visualization of 2–3 trees

22 B+ Trees (slides based on those of Miriam Sy) ✦ The B+ Tree index is the most widely-used of several index that maintain their efficiency despite insertion and deletion of . ✦ Balanced tree: every path from the root of the tree to a leaf is of the same length. ✦ All data are in leaves. ✦ Hence: B+ tree

23 B Trees & B+ Trees

B Trees: B+ Trees: • Multi-way trees • Contains features from B • Dynamic growth Trees • Data stored with keys • Dynamic growth throughout the tree • Separates index and data pages pages • Internal nodes contain just keys • All data are at the leaves • leaves can be linked sequentially 24 B Tree Order ✦ The “order” of a B tree is the: A. way the purchasing department buys it B. maximum number of children any node can have C. minimum number of children any node can have D. number of keys in each index node E. height of the tree

25 Fill Factor

k1 k2 k3

26 Fill Factor ✦ B+ Trees use a “fill factor” to control growth: fill factor = fraction of keys filled

k1 k2 k3

26 Fill Factor ✦ B+ Trees use a “fill factor” to control growth: fill factor = fraction of keys filled

k1 k2 k3

✦ What is the minimum fill factor for a B+ Tree? A. 33.3% B. 50% C. 75% D. None of the above

26 B+ Tree Statistics

27 B+ Tree Statistics ✦ Each node* has between ⌈m/2⌉ and m children

27 B+ Tree Statistics * Except the root ✦ Each node* has between ⌈m/2⌉ and m children

27 B+ Tree Statistics * Except the root ✦ Each node* has between ⌈m/2⌉ and m children ✦ 50% fill factor is the minimum for a B+ Tree.

27 B+ Tree Statistics * Except the root ✦ Each node* has between ⌈m/2⌉ and m children ✦ 50% fill factor is the minimum for a B+ Tree. ✦ For tree of order 5, these guidelines must be met:

27 B+ Tree Statistics * Except the root ✦ Each node* has m, max between ⌈m/2⌉ and pointers/page 5

m children n = m−1, max 4 ✦ 50% fill factor is keys/page the minimum for a ⌈m/2⌉, min 3 B+ Tree. pointers/page min keys/page ✦ For tree of order 5, 2 these guidelines minimum fill 50% must be met: factor

27 B+ Tree Statistics * Except the root ✦ Each node* has m, max between ⌈m/2⌉ and pointers/page 5

m children n = m−1, max 4 ✦ 50% fill factor is keys/page the minimum for a ⌈m/2⌉, min 3 B+ Tree. pointers/page min keys/page ✦ For tree of order 5, 2 these guidelines minimum fill 50% must be met: factor

27 B+ Tree Statistics * Except the root ✦ Each node* has m, max between ⌈m/2⌉ and pointers/page 5

m children n = m−1, max 4 ✦ 50% fill factor is keys/page the minimum for a ⌈m/2⌉, min 3 B+ Tree. pointers/page min keys/page ✦ For tree of order 5, 2 these guidelines minimum fill 50% must be met: factor

27 B+ Tree Statistics * Except the root ✦ Each node* has m, max between ⌈m/2⌉ and pointers/page 5

m children n = m−1, max 4 ✦ 50% fill factor is keys/page the minimum for a ⌈m/2⌉, min 3 B+ Tree. pointers/page min keys/page ✦ For tree of order 5, 2 these guidelines minimum fill 50% must be met: factor

27 B+ Tree Statistics * Except the root ✦ Each node* has m, max between ⌈m/2⌉ and pointers/page 5

m children n = m−1, max 4 ✦ 50% fill factor is keys/page the minimum for a ⌈m/2⌉, min 3 B+ Tree. pointers/page min keys/page ✦ For tree of order 5, 2 these guidelines minimum fill 50% must be met: factor

27 B+ Trees ✦ B+ Tree with m=5

✦ Note: B+ Tree in commercial database might have m=2048

28 Inserting Records ✦ The key determines a record’s placement in a B+ Tree ✦ The leaf pages are maintained in sequential order. A doubly-linked list (usually not shown) connects each leaf page with its sibling page(s). ✦ Why?

29 Inserting Records ✦ The key determines a record’s placement in a B+ Tree ✦ The leaf pages are maintained in sequential order. A doubly-linked list (usually not shown) connects each leaf page with its sibling page(s). ✦ Why?

29 Insertion Algorithm

Leaf Page Index Action Full? Page Full? No No Place the record in sorted position in the appropriate leaf page

1. Split the leaf page 2. Copy middle key into the index page in Yes No sorted order. 3. Left leaf page contains records with keys < the middle key. 4. Right leaf page contains records with keys ≥ than the middle key.

30 Insertion Algorithm, cont…

Leaf Page Index Action Full? Page Full?

1. Split the leaf page Yes Yes 2. Records with keys < middle key go to the left leaf page 3. Records with keys ≥ middle key go to the right leaf page. 4. Split the index page 5. Keys < middle key go to the left index page. 6. Keys > middle key go to the right index page. 7. The middle key goes to the next (higher) level. If the next level index page is full, continue splitting the index pages.

31 1st Case: leaf page & index page not full ✦ Original Tree:

✦ Insert 28:

32 1st Case: leaf page & index page not full ✦ Original Tree:

✦ Insert 28:

32 2nd Case: leaf page Full, index page Not Full ✦ Insert a record with a key value of 70.

33 2nd Case: leaf page Full, index page Not Full ✦ Insert a record with a key value of 70. ✦ This record should go in the leaf page with 50, 55, 60, and 65. ✦ This page is full, so we need to split it as old leaf page 50 55 60 65

new left leaf page new right leaf page 50 55 60 65 70

34 Now add the new leaf to the index: ✦ The middle key (60) is copied into the index page between 50 and 75.

35 Now add the new leaf to the index: ✦ The middle key (60) is copied into the index page between 50 and 75.

35 Now add the new leaf to the index: ✦ The middle key (60) is copied into the index page between 50 and 75.

35 Now add the new leaf to the index: ✦ The middle key (60) is copied into the index page between 50 and 75.

new page

35 3rd Case: leaf page & index page both full ✦ To prior example, we need to add 95:

36 3rd Case: leaf page & index page both full ✦ We need to add 95 to leaf with 75 80 85 90 which is full. So we need to split it into two pages: Left leaf page Right leaf page 75 80 85 90 95 ✦ The middle key, 85, rises to the index page.

37 3rd Case: leaf page & index page both full

✦ The index page is also full so we must split it:

New root index page 60

Left index page Right index page 25 50 75 85 38 Before…

and after insertion of 95: Rotation ✦ B+ Trees can incorporate rotation to reduce the number of page splits. ✦ A rotation occurs when a leaf page is full, but one of its siblings pages is not full. Rather than splitting the leaf page, we move a record to its sibling, adjusting indices as necessary. ✦ The left sibling is usually checked first, then the right sibling.

40 Rotation ✦ For example, consider again the B+ Tree before the addition of 70. This record belongs in the leaf node containing 50 55 60 65. Notice how this node is full but its left sibling is not.

✦ Using rotation, we move the record with the lowest key to its sibling. We must also modify the index page:

41 Before rotation:

After rotation:

42 Deletion Algorithm ✦ Like the insertion algorithm, there are three cases to consider when deleting a record:

Leaf Page Below Index Page Below Action Fill Factor Fill Factor

Delete the record from the leaf page. No No Arrange keys in ascending order to fill void. If the key of the deleted record appears in the index page, use the next key to replace it.

43 Leaf Page Index Page Action Below Fill Below Fill Factor ? Factor ? 1. Combine the leaf page and its sibling. Yes No 2. Change the index page to reflect the deletion of the leaf.

1. Combine the leaf page and its sibling. 2. Adjust the index page to reflect the Yes Yes other change. 3. Combine the index page with its sibling. 4. Continue combining index pages until you reach a page with the correct fill factor, or you reach the root.

44 Deletion ✦ To refresh our memories, here’s our B+ Tree right after the insertion of 95:

45 Deletion (1st case) ✦ Delete record with Key 70

46 Deletion (2nd case) ✦ Delete 25 from the B+ Tree ✦ Because 25 appears in the index page, when we delete 25, we must replace it with another key (here, with 28)

47 Deletion (2nd case) ✦ Delete 25 from the B+ Tree ✦ Because 25 appears in the index page, when we delete 25, we must replace it with another key (here, with 28)

47 Deletion (3rd case) ✦ Deleting 60 from the B+ Tree ✦ The leaf page containing 60 (60 65) will be below the fill factor after the deletion. Thus, we must combine leaf pages. ✦ The recombination removes one key from the index page. Hence, it will also fall below the fill factor. Thus, we must combine index pages. ✦ 60 appears as the only key in the root index page. Obviously, it will be removed with the deletion.

48 Deletion (3rd case) ✦ B+ Tree after deletion of 60

49 B+ Trees in Practice ✦ Typical Order: 200 to 1000 ✦ Typical Fill Factor: 67% ✦ Typical Capacities: ✦ Height 4: 1334 = 312,900,700 records ✦ Height 3: 1333 = 2,352,637 records

50 B+ Trees in Practice ✦ Fill factor for newly created tree is a design parameter ✦ Suppose I create a new B+ tree with a fill factor of 100% A. Index occupies minimum number of nodes B. Adding a new entry will always require recursive splitting C. Both of the above D. None of the above

51 B+ Trees in Practice ✦ Fill factor for newly created tree is a design parameter ✦ Suppose I create a new B+ tree with a fill factor of 50% A. Index occupies more nodes B. Adding a new entry is unlikely to require recursive splitting C. Both of the above D. None of the above

52 Example: IBM Red Brick Warehouse 6.3 Each index node is 8172 bytes and corresponds to one 8 kB file system block minus 20 bytes overhead, and key size is the size in bytes of the key itself, plus 6 bytes of address. Example Assume a key size of 10 bytes and a table with 50 000 rows. To calculate the number of 8 kbyte blocks required to store the index for various fill factors, use the formulas in Estimating the size of indexes. The results are: Fill factor keys per node Blocks 10% 81 627 50% 409 124 100% 817 63 If this table is to be loaded once and you anticipate no additions, use a fill factor of 100%.

53 Concluding Remarks ✦ Tree structured indexes are ideal for range- searches, also good for equality searches ✦ B+ Tree is a dynamic structure ✦ Insertions and deletions leave tree height- balanced ✦ High fanout means depth is often just 3 or 4 ✦ Root can be kept in main memory ✦ Almost always better than maintaining a sorted file

54 Problem ✦ Find the minimum order of the B+ tree that guarantees that no more than 3 disk reads will be necessary to access any record in a file of 100 million records. ✦ Assume that the root is in main memory. ✦ Hint: look at inequality 7.7

55 Problem ✦ Find the minimum order of the B+ tree that guarantees that no more than 3 disk reads will be necessary to access any record in a file of 100 million records. ✦ Assume that the root is in main memory. ✦ Hint: look at inequality 7.7

55 Problem ✦ Find the minimum order of the B+ tree that guarantees that no more than 3 disk reads will be necessary to access any record in a file of 100 million records. ✦ Assume that the root is in main memory. ✦ Hint: look at inequality 7.7

Numeric Answer:

55 Problem ✦ + 278 Space and Time Trade-OffsFind the minimum order of the B tree that guarantees that no more than 3 disk reads will be necessaryh 1 to access any n 4 m/2 − 1, record in a file≥ of⌈ 100⌉ million− records. which, in turn, yields✦ Assume the following that the upper root boundis in main on memory. the height h of the B-tree of order m with n nodes:✦ Hint: look at inequality 7.7 n 1 h log m/2 + 1. (7.7) ≤⌊ ⌈ ⌉ 4 ⌋ + Inequality (7.7) immediately implies that searching in a B-tree is a O(log n) Numeric Answer: operation. But it is important to ascertain here not just the efficiency class but the actual number of disk accesses implied by this formula. The55 following table contains the values of the right-hand-side estimates for a file of 100 million records and a few typical values of the tree’s order m:

order m 50 100 250 h’s upper bound 6 5 4

Keep in mind that the table’s entries are upper estimates for the number of disk accesses. In actual applications, this number rarely exceeds 3, with the B-tree’s root and sometimes first-level nodes stored in the fast memory to minimize the number of disk accesses. The operations of insertion and deletion are less straightforward than search- ing, but both can also be done in O(log n) time. Here we outline an insertion algorithm only; a deletion algorithm can be found in the references (e.g., [Aho83], [Cor09]). The most straightforward algorithm for inserting a new record into a B- tree is quite similar to the algorithm for insertion into a 2-3 tree outlined in Section 6.3. First, we apply the search procedure to the new record’s key K to find the appropriate leaf for the new record. If there is room for the record in that leaf, we place it there (in an appropriate position so that the keys remain sorted) and we are done. If there is no room for the record, the leaf is split in half by sending the second half of the records to a new node. After that, the smallest key K′ in the new node and the pointer to it are inserted into the old leaf’s parent (immediately after the key and pointer to the old leaf). This recursive procedure may percolate up to the tree’s root. If the root is already full too, a new root is created with the two halves of the old root’s keys split between two children of the new root. As an example, Figure 7.9 shows the result of inserting 65 into the B-tree in Figure 7.8 under the restriction that the leaves cannot contain more than three items. You should be aware that there are other algorithms for implementing inser- tions into a B-tree. For example, to avoid the possibility of recursive node splits, we can split full nodes encountered in searching for an appropriate leaf for the new record. Another possibility is to avoid some node splits by moving a key to the node’s sibling. For example, inserting 65 into the B-tree in Figure 7.8 can be done by moving 60, the smallest key of the full leaf, to its sibling with keys 51 and 55, and replacing the key value of their parent by 65, the new smallest value in Exercises 7.4

1. Give examples of using an index in real-life applications that do not involve computers. 2. a. Prove the equality

h−1 ! i−1 h−1 h−1 1+ 2⌈m/2⌉ (⌈m/2⌉−1) + 2⌈m/2⌉ =4⌈m/2⌉ − 1 i=1 that was used in the derivation of upper bound (7.7) for the height of a B-tree.

b. Complete the derivation of inequality (7.7). 3. Find the minimum order of the B-tree that guarantees that the number of disk accesses in searching in a file of 100 million records does not exceed Problem3. Assume that the root’s page is stored in main memory. 4. Draw the B-tree obtained after inserting 30 and then 31 in the B-tree in Figure 7.8. Assume that a leaf cannot contain more than three items. 5. Outline an algorithm for finding the largest key in a B-tree. 6. a. A top-down 2-3-4 tree is a B-tree of order 4 with the following modification of the insert operation. Whenever a search for a leaf for a new key encounters a full node (i.e., a node with three keys), the node is split into two nodes by sending its middle key to the node’s parent (or, if the full node happens to be the root, the new root for the middle key is created). Construct a top-down 2-3-4 tree by inserting the following list of keys in the initially empty tree:

10, 6, 15, 31, 20, 27, 50, 44, 18.

b. What is the principal advantage of this insertion procedure compared with the one described for 2-3 trees in Section 6.3? What is its disadvan- tage? 56 7. a. ◃ Write a program implementing a key insertion algorithm in a B-tree.

b. " Write a program for visualization of a key insertion algorithm in a B-tree.

20 3. If the tree’s root is stored in main memory, the number of disk accesses will be equal to the number of the levels minus 1, which is exactly the height of the tree. So, we need to find the smallest value of the order m so that the height of the B-tree with n =108 keys does not exceed 3. Using the upper bound of the B-tree’s height, we obtain the following inequality n +1 n +1 log +1 3 or log 2. ⌊ ⌈m/2⌉ 4 ⌋ ≤ ⌊ ⌈m/2⌉ 4 ⌋≤ By “trial and error,” we can find that the smallest value of m that satisfies this inequality is 585.

4. Since there is enough room for 30 in the leaf for it, the resulting B-tree will look as follows

Solution 20 51

11 15 25 34 40. 60

4, 7,10 11, 14 15, 16, 19 20, 24 25, 28, 30 34, 38 40, 43, 46 51, 55 60, 68, 80

Inserting 31 will require the leaf’s split and then its parent’s split:

20 34 51

11 15 25 30 34 40 60

4, 7,10 11, 14 15, 16, 19 20, 24 25, 28 30, 31 34, 38 40, 43, 46 51, 55 60, 68, 80

5. Starting at the root, follow the chain of the rightmost pointers to the 57 (rightmost) leaf. The largest key is the last key in that leaf.

6. a. Constructing a top-down 2-3-4 tree by inserting the following list of keys in the initially empty tree:

10, 6, 15, 31, 20, 27, 50, 44, 18.

23 Exercises 7.4

1. Give examples of using an index in real-life applications that do not involve computers. 2. a. Prove the equality

h−1 ! i−1 h−1 h−1 1+ 2⌈m/2⌉ (⌈m/2⌉−1) + 2⌈m/2⌉ =4⌈m/2⌉ − 1 i=1 that was used in the derivation of upper bound (7.7) for the height of a B-tree.

b. Complete the derivation of inequality (7.7). 3. Find the minimum order of the B-tree that guarantees that the number of disk accesses in searching in a file of 100 million records does not exceed 3. Assume that the root’s page is stored in main memory. 4. Draw the B-tree obtained after inserting 30 and then 31 in the B-tree in ProblemFigure 7.8. Assume that a leaf cannot contain more than three items. 5. Outline an algorithm for finding the largest key in a B-tree.+ 6. a. A top-down 2-3-4 tree is a B-tree of order 4 with the following modification of the insert operation. Whenever a search for a leaf for a new key encounters a full node (i.e., a node with three keys), the node is split into two nodes by sending its middle key to the node’s parent (or, if the full node happens to be the root, the new root for the middle key is created). Construct a top-down 2-3-4 tree by inserting the following list of keys in the initially empty tree:

10, 6, 15, 31, 20, 27, 50, 44, 18.

b. What is the principal advantage of this insertion procedure compared with the one described for 2-3 trees in Section 6.3? What58 is its disadvan- tage? 7. a. ◃ Write a program implementing a key insertion algorithm in a B-tree.

b. " Write a program for visualization of a key insertion algorithm in a B-tree.

20 Problem ✦ Why are “recursive splits” bad news in a real ?

59 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20, 27, 50, 44, 18

10 10

10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10, 20 10, 20 10,20,31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

20

10 31

6 15, 18 27 44, 50

b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3).

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10, 6, 15, 31, 20,10 27, 50, 44, 1810 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10 10 10, 20 10, 20 10,20,31 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 50 10, 20 10, 20 10,20,31

20 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

10 31

20 6 15, 18 27 44, 50

10 31

6 15, 18 27 44, 50 b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- actionb. of The splits principal because advantage the leaf’s of parent splitting will full always nodes have (4-nodes a room with for 3 an keys) extraon key. a way (If the down parent during is insertionfull before of the a new insertion, key lies it in is the split fact before that the if the leaf isappropriate reached.) leaf This turns is not out the to case be full, for the its insertion split will algorithm never cause employed a chain re- for 2-3action trees of (see splits Section because 6.3). the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the The disadvantageleaf is reached.) of splitting This is not full the nodes case on for the the wayinsertion down algorithm lies in the employed fact thatfor it can 2-3 lead trees to (see a taller Section tree 6.3). than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containingThe disadvantage key 15 and of splitting therefore fulldidn’t nodes require on the a way split down executed lies in by the the fact top-downthat it insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7. n/a top-down insertion.

7. n/a

24

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10 10, 6, 15, 31, 20,10 27, 50, 44, 181010 1010 6, 10 6, 10, 15 6 15, 31 6 6, 10 6, 10, 15 6 15, 31 6 15,15, 20, 20, 31 31

10 10 10, 20 10, 20 10,10, 20 20 10,20,3110,20,31 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 6 15 6 15 27,27, 31 31 6 6 1515 27,27, 31, 31, 50 50 66 1515 2727 44,44, 50 50 10, 20 10, 20 10,20,31

2020 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

1010 3131

20 6 6 15,15, 18 18 2727 44,44, 50 50

10 31

6 15, 18 27 44, 50 b.b. The The principal principal advantage advantage of of splitting splitting full full nodes nodes (4-nodes (4-nodes with with 3 3 keys) keys) onon a a way way down down during during insertion insertion of of a a new new key key lies lies in in the the fact fact that that if if the the appropriateappropriate leaf leaf turns turns out out to to be be full, full, its its split split will will never never cause cause a a chain chain re- re- actionactionb. of of The splits splits principal because because advantage the the leaf’s leaf’s of parent parent splitting will will full always always nodes have have (4-nodes a a room room with for for 3 an an keys) extraextra key.on key. a (If way (If the the down parent parent during is is full insertionfull before before of the the a newinsertion, insertion, key lies it it is in is split the split fact before before that the the if the leafleaf is isappropriate reached.) reached.) This leaf This turns is is not not out the the to case case be for full, for the the its insertion insertion split will algorithm algorithm never cause employed employed a chain re- forfor 2-3 2-3action trees trees (see of (see splits Section Section because 6.3). 6.3). the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the TheThe disadvantage disadvantageleaf is reached.) of of splitting splitting This is notfull full the nodes nodes case on on for the the the way wayinsertion down down algorithm lies lies in in the the employed fact fact thatthat itfor it can can 2-3 lead lead trees to to (see a a taller taller Section tree tree 6.3). than than necessary. necessary. For For the the list list of of part part (a), (a), forfor example, example, the the tree tree before before the the last last one one had had a a room room for for key key 18 18 in in the the leafleaf containing containingThe disadvantage key key 15 15 and and of splitting therefore therefore fulldidn’tdidn’t nodes require require on the a a split way split down executed executed lies inby by the the the fact top-downtop-downthat insertion. it insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7.7. n/a n/a top-down insertion.

7. n/a

2424

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10 10, 6, 15, 31, 20,10 27, 50, 44, 181010 1010 6, 10 6, 10, 15 6 15, 31 6 6, 10 6, 10, 15 6 15, 31 6 15,15, 20, 20, 31 31

10 10 10, 20 10, 20 10,10, 20 20 10,20,3110,20,31 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 6 15 6 15 27,27, 31 31 6 6 1515 27,27, 31, 31, 50 50 66 1515 2727 44,44, 50 50 10, 20 10, 20 10,20,31

2020 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

1010 3131

20 6 6 15,15, 18 18 2727 44,44, 50 50

10 31

6 15, 18 27 44, 50 b.b. The The principal principal advantage advantage of of splitting splitting full full nodes nodes (4-nodes (4-nodes with with 3 3 keys) keys) onon a a way way down down during during insertion insertion of of a a new new key key lies lies in in the the fact fact that that if if the the appropriateappropriate leaf leaf turns turns out out to to be be full, full, its its split split will will never never cause cause a a chain chain re- re- actionactionb. of of The splits splits principal because because advantage the the leaf’s leaf’s of parent parent splitting will will full always always nodes have have (4-nodes a a room room with for for 3 an an keys) extraextra key.on key. a (If way (If the the down parent parent during is is full insertionfull before before of the the a newinsertion, insertion, key lies it it is in is split the split fact before before that the the if the leafleaf is isappropriate reached.) reached.) This leaf This turns is is not not out the the to case case be for full, for the the its insertion insertion split will algorithm algorithm never cause employed employed a chain re- forfor 2-3 2-3action trees trees (see of (see splits Section Section because 6.3). 6.3). the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the TheThe disadvantage disadvantageleaf is reached.) of of splitting splitting This is notfull full the nodes nodes case on on for the the the way wayinsertion down down algorithm lies lies in in the the employed fact fact thatthat itfor it can can 2-3 lead lead trees to to (see a a taller taller Section tree tree 6.3). than than necessary. necessary. For For the the list list of of part part (a), (a), forfor example, example, the the tree tree before before the the last last one one had had a a room room for for key key 18 18 in in the the leafleaf containing containingThe disadvantage key key 15 15 and and of splitting therefore therefore fulldidn’tdidn’t nodes require require on the a a split way split down executed executed lies inby by the the the fact top-downtop-downthat insertion. it insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7.7. n/a n/a top-down insertion.

7. n/a

2424

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10 10 10 10, 6, 15, 31, 20,10 27, 50, 44, 181010 10 6, 10 6, 10, 15 1010 6, 10 6, 10, 15 6 6 15, 15,31 31 6 6 15, 20, 31 6, 10 6, 10, 15 6 15, 31 6 15,15, 20, 20, 31 31

10 10 10,10, 20 20 10, 20 10,20,31 10, 20 10,10, 20 20 10,20,3110,20,31 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 6 6 15 15 27, 31 6 15 27, 31, 50 6 15 27 44, 50 6 15 27,27, 31 31 6 6 1515 27,27, 31, 31, 50 50 66 1515 2727 44,44, 50 50 10, 20 10, 20 10,20,31

202020 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

101010 313131

20 6 6 6 15,15, 15,18 18 18 272727 44,44, 44,50 50 50

10 31

6 15, 18 27 44, 50 b.b.b. The The The principal principal principal advantage advantage advantage of of of splitting splitting splitting full full full nodes nodes nodes (4-nodes (4-nodes (4-nodes with with with 3 3 keys) 3 keys) keys) ononon a a waya way way down down down during during during insertion insertion insertion of of of a a new a new new key key key lies lies lies in in in the the the fact fact fact that that that if if the if the the appropriateappropriateappropriate leaf leaf leaf turns turns turns out out out to to to be be be full, full, full, its its its split split split will will will never never never cause cause cause a a chain a chain chain re- re- re- actionactionactionb. of of of The splits splits splits principal because because because advantage the the the leaf’s leaf’s leaf’s of parent parent splitting parent will will will full always always always nodes have have (4-nodes have a a room a room room with for for for3 an an keys) an extraextraextra key.on key. key. a (If way (If (If the the down the parent parent parent during is is isfull insertionfull full before before before of the the a the newinsertion, insertion, insertion, key lies it it is init is split is the split split fact before before before that the the if the the leafleafleaf is is isappropriatereached.) reached.) reached.) This leaf This This turns is is not is not not out the the the to case case becase for full, for for the the its the insertion insertion split insertion will algorithm algorithm never algorithm cause employed employed employeda chain re- forforfor 2-3 2-3 2-3action trees trees trees (see of (see (see splits Section Section Section because 6.3). 6.3). 6.3). the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the TheTheThe disadvantage disadvantageleaf disadvantage is reached.) of of of splitting splitting This splitting is notfull full full the nodes nodes nodes case on on for on the the the the way wayinsertion way down down down algorithm lies lies lies in in thein the employed the fact fact fact thatthatthat itfor it it can can can2-3 lead lead trees lead to to (see to a a taller a taller Section taller tree tree tree 6.3). than than than necessary. necessary. necessary. For For For the the the list list list of of partof part part (a), (a), (a), forforfor example, example, example, the the the tree tree tree before before before the the the last last last one one one had had had a a room a room room for for for key key key 18 18 18 in in inthe the the leafleafleaf containing containingThe containing disadvantage key key key 15 15 15 and and of and splitting therefore therefore therefore fulldidn’tdidn’tdidn’t nodes require require requireon the a a split way a split split down executed executed executed lies inby by the by the the the fact top-downtop-downtop-downthat insertion. it insertion. insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7.7.7. n/a n/a n/a top-down insertion.

7. n/a

242424

24 Problem ✦ Levitin mentions that recursive splits can be avoided by “splitting on the way down”. ✦ Construct a B tree of order 4 by inserting (in order) into an empty tree 10 10 10, 6, 15, 31, 20, 27, 50, 44, 18 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

10 10 10, 20 10, 20 10,20,31 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 50

10, 20 10, 20 10,20,31

20 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 5060

10 31

6 15, 18 2027 44, 50

10 31

6 15, 18 27 44, 50 b. The principal advantage of splitting full nodes (4-nodes with 3 keys) on a way down during insertion of a new key lies in the fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- actionb. of The splits principal because advantage the leaf’s of splitting parent will full always nodes (4-nodes have a room with for3 keys) an extraon key. a way (If down the parent during is insertion full before of a the new insertion, key lies init is the split fact before that if the the leaf isappropriate reached.) leaf This turns is not out the to becase full, for its the split insertion will never algorithm cause employeda chain re- for 2-3action trees of (see splits Section because 6.3). the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the Theleaf disadvantage is reached.) of This splitting is not full the nodes case for on the the insertion way down algorithm lies in employed the fact thatfor it can2-3 trees lead (see to a Section taller tree 6.3). than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leafThe containing disadvantage key 15 of and splitting therefore fulldidn’t nodes requireon the way a split down executed lies in the by the fact top-downthat it insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7. n/a top-down insertion.

7. n/a

24

24 Problem

✦ Levitin mentions10 that recursive10 splits can be

10 avoided by “splitting on the way down”. 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 ✦ Construct a B tree of order 4 by inserting (in 10, 20 order)10, 20 into an empty tree10,20,31 10 10 6 15 27, 31 6 15 27,10, 31, 50 6, 15,6 31,15 20,27 27,44, 50, 50 44, 18 10 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31

20 10 10 10, 20 10, 20 10,20,31 10 106, 10 6, 10,31 15 6 15, 31 6 15, 20, 31 6 15 27, 31 6 15 44, 50 6 15, 18 27 44, 50 27, 31, 50 6 15 27 10, 20 10, 20 10,20,31

20 6 15 27, 31 44, 50 b. The principal advantage of splitting full nodes6 (4-nodes15 27, 31, with 50 3 keys)6 15 27 60 on a way down during insertion of a new key lies10 in the fact that31 if the appropriate leaf turns out to be full, its split will never cause a chain re- action of splits because the leaf’s parent will always6 15, have 18 a20 room27 for44, an 50 extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion10 algorithm employed31 for 2-3 trees (see Section 6.3). 6 15, 18 27 44, 50 b. The principal advantage of splitting full nodes (4-nodes with 3 keys) The disadvantageon of a splitting way down full during nodes on insertion the way of down a new lies key in the lies fact in the fact that if the that it can leadappropriate to a taller tree leaf than turns necessary. out to be For full, the its list split of part will (a), never cause a chain re- for example, the tree before the last one had a room for key 18 in the actionb. of The splits principal because advantage the leaf’s of splitting parent will full always nodes (4-nodes have a room with for3 keys) an leaf containing keyextra 15 and key. therefore (If the parentdidn’t require is full a before split executed the insertion, by the it is split before the top-down insertion. on a way down during insertion of a new key lies in the fact that if the leaf isappropriate reached.) leaf This turns is not out the to becase full, for its the split insertion will never algorithm cause employeda chain re- for 2-3action trees of (see splits Section because 6.3). the leaf’s parent will always have a room for an 7. n/a extra key. (If the parent is full before the insertion, it is split before the Theleaf disadvantage is reached.) of This splitting is not full the nodes case for on the the insertion way down algorithm lies in employed the fact thatfor it can2-3 trees lead (see to a Section taller tree 6.3). than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leafThe containing disadvantage key 15 of and splitting therefore fulldidn’t nodes requireon the way a split down executed lies in the by the fact top-downthat it insertion. can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the 7. n/a top-down insertion.

7. n/a

24

24

24 Problem ✦ Levitin mentions that recursive splits can10 be 10

avoided by “splitting10 on the way down” 6, 10 6, 10, 15 6 15, 31 6 15, 20, 31 ✦ What are the advantages and disadvantages of 10this variation (called10 10, 20 a top-down10, 20 B tree)? 10,20,31 10 6, 10 6, 10, 15 6 15, 31 6 615, 20,15 31 27, 31 6 15 27, 31, 50 6 15 27 44, 50

10, 20 10, 20 10,20,31 20

10 31 6 15 27, 31 6 15 27, 31, 50 6 15 27 44, 50 6 15, 18 27 44, 50

20 b. The principal advantage of splitting full nodes (4-nodes with 3 keys) 10 31 on a way down during insertion of a new key lies in the61 fact that if the appropriate leaf turns out to be full, its split will never cause a chain re- 6 15, 18 27 44, 50 action of splits because the leaf’s parent will always have a room for an extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3). b. The principal advantage of splitting full nodes (4-nodes withThe 3 disadvantage keys) of splitting full nodes on the way down lies in the fact on a way down during insertion of a new key lies in the factthat that it if can the lead to a taller tree than necessary. For the list of part (a), appropriate leaf turns out to be full, its split will never causefor a chain example, re- the tree before the last one had a room for key 18 in the action of splits because the leaf’s parent will always have a roomleaf containing for an key 15 and therefore didn’t require a split executed by the top-down insertion. extra key. (If the parent is full before the insertion, it is split before the leaf is reached.) This is not the case for the insertion algorithm employed for 2-3 trees (see Section 6.3). 7. n/a

The disadvantage of splitting full nodes on the way down lies in the fact that it can lead to a taller tree than necessary. For the list of part (a), for example, the tree before the last one had a room for key 18 in the leaf containing key 15 and therefore didn’t require a split executed by the top-down insertion.

7. n/a

24

24