What are search trees?

Search Trees Allow efficient searching of ordered data Implement Ordered Dictionary ADT Provide flexible mechanism for storing and retrieving data

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Binary Search Tree Search

TreeSearch(k, node) { Each node stores a key-element pair if ((k < key(node)) && (leftChild(node) != null)) key(leftsubtree) <= key(node) <= key(rightsubtree) return TreeSearch(k, leftchild(node)); Trees in Goodrich/Tamassia store elements only at else if (k > key(node)) && Trees in Goodrich/Tamassia store elements only at (rightchild(node) != null)) internal nodes return TreeSearch(k, rightchild(node)); • I generally store elements at all nodes else return node; }

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Analysis of TreeSearch Inserting into

Call TreeSearch on root to find appropriate parent node At each node, perform O(1) work • Call again using a child if key already exists Maximum nodes visited is h, the height of Parent node will be external tree Insert element as new child of parent Total running time is thus O(h)

Also takes O(h)

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

1 Removing from Binary Tree AVL Trees Call TreeSearch to locate node for removal

If node is external, just remove node Binary search trees which maintain O(logn) If node has one child, replace node with child height If node has two children Maintain height balance property • Find smallest element greater than node (will • Heights of children differ by at most 1 have 0 or 1 children) • Local property to maintain, but guarantees • Replace element with that node global property of overall height Again, O(h) Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Analyzing AVL height Inserting with balanced height n(h) : minimum nodes for AVL tree of height h Base conditions n(0) = 1; n(1) = 2 Insert node into as usual Recurrance relation n(h) = 1 + n(h-1) + n(h-2) > 2*n(h-2) • Increases height of some nodes along path to n(h) > 2*n(h-2) > 4*n(h-4) > 8*n(h-6), etc. root n(h) > 2i*n(h-2i) i to achieve base condition Walk up towards root h-2i=1 → i=(h-1)/2 n(h) > 2(h-1)/2*n(1) = 2(h-1)/2*2 = 2(h-1/2)+1 • If unbalanced height is found, restructure Bounding h unbalanced region with rotation operation • logn(h) > (h-1)/2 + 1 • h < 2(logn(h) - 1) + 1 = 2logn(h) -1 = O(logn) Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Balanced Tree Insert (case 1)

3 4 2 3 2 50 2 3 50 2 30 75 30 75 0 1 0 2 1 0 1 0 15 40 65 15 40 65 0 0 80 0 1 80 0 0 0 0 45 Unbalanced 45 35 60 70 node 35 60 70 0 43

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

2 “Rotate Left” . . . “Rotate Left” . . .

4 4 3 50 2 50 2 3 30 75 2 75 2 0 30 0 0 1 40 1 1 40 15 65 80 0 45 65 80 1 0 0 0 0 0 0 0 0 0 0 0 45 70 15 70 35 60 35 0 60 0 43 43

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

“Rotate Left” - Balanced! Insert (case 2)

3 4 50 2 50 2 2 3 75 30 75 1 40 1 0 1 1 0 2 0 30 45 65 15 40 65 80 0 0 80 0 0 0 0 0 1 0 35 45 15 35 43 60 70 60 70 Unbalanced 0 node 53

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

“Rotate Right”. . . “Rotate Right”. . .

4 4 2 50 2 50 3 3 3 30 30 2 75 0 1 75 0 1 2 0 65 15 40 15 40 1 0 0 0 65 80 0 0 0 1 0 60 80 35 45 35 45 70 60 70 0 0 53 53

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

3 “Rotate Right” - Balanced! Insert (case 3)

3 4 2 50 3 50 2 2 30 30 75 0 1 65 1 0 2 1 1 0 75 15 40 0 0 15 40 65 0 0 60 1 0 80 0 0 70 Unbalanced 45 0 80 45 35 node 35 60 70 53 0 33

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

“Double Rotation Right-Left” - Right . . . “Double Rotation Right-Left” - Right . . .

4 4 3 3 3 50 2 3 50 2 30 75 30 75 0 2 0 0 2 0 1 1 1 15 40 65 15 40 65 1 80 35 80 0 0 0 0 0 0 35 45 60 70 0 60 70 0 33 45 33

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

“Double Rotation Right-Left” - Right . . .done! “Double Rotation Right-Left” - Left . . .

4 4 3 3 50 2 3 50 2 30 75 30 75 0 2 2 1 0 0 1 0 15 35 0 1 65 80 15 35 1 65 80 0 0 0 0 0 33 40 0 70 33 40 70 60 0 60 45 45

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

4 “Double Rotation Right-Left” - Left . . . “Double Rotation Right-Left” - Balanced!

4 3 50 2 2 50 2 3 3 2 75 1 35 1 75 30 35 1 1 0 1 0 0 65 30 40 65 0 40 80 0 0 0 80 15 0 0 0 0 33 0 15 33 45 60 70 15 60 70 45

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Restructure Procedure Analyzing Insert

Consider the first unbalanced node encountered (walking upward) and its two Upward traversal with height recomputation descendants along that path takes O(h) = O(logn) Sort them in increasing order and label as a, Restructure takes O(1) b, and c b, and c The restructure always reduces the height of Place b as the parent of a and c where the the unbalanced node unbalanced node was • So only one restructure is necessary Hook up the (up to) 4 subtrees as the Total time: O(logn) + O(1) = O(logn) appropriate children of a and c Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Remove Algorithm Multi-way Search Trees

Perform removal as with binary search tree Each node may store multiple key-element • May decrease height of some nodes on path to pairs the root the root Node with d children (d-node) stores d-1 key- Walk upwards to the root element pairs • If unbalanced height is found, restructure Children have keys that fall either before unbalanced region with rotation operation smallest parent key, after largest parent key, or between two parent keys Remove is also O(logn) • But multiple restructure operations may be (for this section, let’s use convention of external necessary along the way nodes storing no element, as in book) Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

5 Example Multi-way Search Tree Multi-way Tree Searching

50 Basically same as for binary tree 20 30 60 70 80 1. Start at root 10 15 25 40 42 45 55 64 66 75 85 90 2. Find appropriate child path to go down 3. Traverse to child 22 27 4. Repeat 1-3 until found or reach external External node between each pair of keys and before/after (n-1) + 1 + 1 = n+1 external nodes Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Multi-way Search Analysis (2,4) Trees

Number of nodes traversed is up to h Efficient multi-way search trees Work at each node is function of d • O(logn) height • O(logd) if structure storing keys provides efficient search, otherwise O(d) Maintain two properties: Total worst case time 1. Size property: nodes have at most 4 children • O(hlogd ) or O(hd ) 2. Depth property: All external nodes have same max max depth (i.e. all at the same level) • If dmax is bounded by small constant, just O(h)

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

(2,4) Tree Height Analysis Inserting into (2,4) Tree Lower bound on h 1. Search for position in deepest internal n(h) <= 4h : max children per node is 4 node # external nodes = n+1 2. Insert into position n+1 <= 4h ⇒ h >= log(n+1) / 2 Upper bound on h 3. If # elements > 3, do a split operation At least 2 nodes at depth 1, 4 at depth 2, etc. • Split node into 2 nodes At least 2d nodes at depth d • Push 1 element up to parent At least 2h external nodes 2h <= n+1 ⇒ h <= log(n+1) —Create new root if no parent h = Θ(logn) ⇒ search is O(logn) —If parent overflows, split parent Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

6 Simple Insertion (no overflow) Insertion with Overflow

10 10 10 Insert 11 10

5 12 14 5 12 14 15 5 12 14 15 5 11 12 14 15

Split Insert 15 10 14

5 11 12 15

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Insert with Cascading Split Removing from (2,4) Tree

1. Search for element 6 8 10 Insert 11 6 8 10 2. Remove element 3. If element’s child is internal 5 7 9 12 14 15 57 9 11 12 14 15 • Swap next larger element into hole (so we’ve removed element above an external) 10 Split 4. If node has no elements If an adjacent sibling has > 1 element Split 6 8 14 6 8 10 14 Perform transfer (kind of rotation) Else 57 9 11 12 15 57 9 11 12 15 Perform fusion (can cascade upward)

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Simple Removal Removal with Swap

6 8 10 Remove 14 6 8 10 6 8 10 Remove 10 6 8

5 7 9 12 14 15 5 7 9 12 15 5 7 9 12 14 15 5 7 9 12 14 15

Swap

6 8 12

5 7 9 14 15

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

7 Removal with Transfer Removal with Fusion

Remove 9 6 8 10 6 8 10 6 8 10 Remove 7 6 8 10

5 7 9 12 14 15 5 7 12 14 15 5 7 9 12 14 15 5 9 12 14 15

Transfer Fusion (~rotate) 6 8 12 6 10

5 8 9 12 14 15 5 7 10 14 15 Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Performance External Memory Searching

Memory Hierarchy Insertion Registers • Find is O(logn) Cache • Each split is O(1); only O(logn) splits necessary

Remove RAM • Each transfer/fusion is O(1); only O(logn) necessary External Memory

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

Types of External Memory Primary Motivation

Hard disk External memory access much slower than internal memory access Floppy disk internal memory access • orders of magnitude slower Compact disc • need to minimize I/O Complexity Tape • can afford slightly more work on data in Distributed/networked memory memory in exchange for lower I/O complexity

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

8 Application Areas Disk Blocks

Searching Data is read one block at a time Sorting • pack as much into a block as possible Data Processing • minimize number of block reads necessary Data Mining Data Exploration

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

I/O Efficient Dictionaries (a,b) Trees

Generalization of (2,4) trees Balanced tree structures Size property: internal node has at least a • Typically O(log2n) transfers for query or children and at most b children update • 2 <= a <= (b+1)/2 • Want to reduce height by constant factor as much as possible Depth property: all external nodes have same depth • Can be reduced to O(logBn) = O(log2n/log2B) —B is number of nodes per block Height of (a,b) tree is Ω (logn/logb) and O(logn/loga)

Johns Hopkins Department of Computer Science Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen Course 600.226: Data Structures, Professor: Jonathan Cohen

B-Trees

Choose a and b to be Θ(B)

Height is now O(logBn)

I/O complexity for search is O(logBn)

Johns Hopkins Department of Computer Science Course 600.226: Data Structures, Professor: Jonathan Cohen

9