Fall 2021

Fang Yu

Data Structures
Lecture 11: Search Trees
Binary Search Trees, AVL Trees, and Splay Trees
Software Security Lab., Dept. of Management Information Systems, National Chengchi University

Binary Search Trees

- A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property:
  - Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u) ≤ key(v) ≤ key(w)
- An inorder traversal of a binary search tree visits the keys in increasing order

- External nodes do not store items

[Figure: a binary search tree with root 6; 6 has children 2 and 9; 2 has children 1 and 4; 9 has left child 8.]

Search

Algorithm TreeSearch(k, v)
    if T.isExternal(v)
        return v
    if k < key(v)
        return TreeSearch(k, T.left(v))
    else if k = key(v)
        return v
    else { k > key(v) }
        return TreeSearch(k, T.right(v))

- To search for a key k, we trace a downward path starting at the root
- The next node visited depends on the comparison of k with the key of the current node
- If we reach a leaf, the key is not found
- Example: get(4): call TreeSearch(4, root); the search goes left at 6 (4 < 6), right at 2 (4 > 2), and stops at 4 (4 = 4)
- The algorithms for floorEntry and ceilingEntry are similar

Insertion

- To perform operation put(k, o), we search for key k (using TreeSearch)
- Assume k is not already in the tree, and let w be the leaf reached by the search
- We insert k at node w and expand w into an internal node

- Example: insert 5

[Figure: the search for 5 goes 6, 2, 4 and ends at the leaf w to the right of 4; w is expanded into an internal node storing 5.]

Deletion

- To perform operation remove(k), we search for key k
- Assume key k is in the tree, and let v be the node storing k
- If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent

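- Example: remove 4

[Figure: v stores 4 and has a leaf child w; after removeExternal(w), 5 becomes the right child of 2, leaving 6; 2, 9; 1, 5, 8.]

Deletion (cont.)

- We consider the case where the key k to be removed is stored at a node v whose children are both internal
  - we find the internal node w that follows v in an inorder traversal
  - we copy key(w) into node v
  - we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z)
- Example: remove 3

[Figure: v stores 3 and is the right child of root 1; v has children 2 and 8; 8 has children 6 and 9; w stores 5, the left child of 6, and its left child is the leaf z. After the removal, v stores 5 and w and z are gone.]

Performance

- Consider n ordered items implemented by means of a binary search tree of height h
  - the space used is O(n)
  - methods get, put and remove take O(h) time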

  - the height h is O(n) in the worst case and O(log n) in the best case
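The search, insertion, and deletion procedures above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the textbook's implementation: keys are distinct, external nodes are represented by None rather than explicit leaf objects, get returns the value (or None) instead of the external node, and the names BST and _Node are hypothetical. remove handles the two-internal-children case by copying the inorder successor into v, as on the Deletion (cont.) slide.

```python
from __future__ import annotations

from typing import Any, List, Optional


class _Node:
    """Internal node holding one entry; None plays the role of an external node."""

    def __init__(self, key: Any, value: Any) -> None:
        self.key = key
        self.value = value
        self.left: Optional[_Node] = None
        self.right: Optional[_Node] = None


class BST:
    """Unbalanced binary search tree sketch (hypothetical class, distinct keys)."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def get(self, k: Any) -> Any:
        # TreeSearch: walk one downward path, comparing k with each key.
        v = self.root
        while v is not None:
            if k < v.key:
                v = v.left
            elif k == v.key:
                return v.value
            else:  # k > v.key
                v = v.right
        return None  # reached an external node: k is not in the tree

    def put(self, k: Any, value: Any) -> None:
        # Search for k; if absent, expand the external position w that the
        # search reached into a new internal node holding (k, value).
        parent, v = None, self.root
        while v is not None:
            if k == v.key:
                v.value = value  # key already present: replace its value
                return
            parent, v = v, (v.left if k < v.key else v.right)
        node = _Node(k, value)
        if parent is None:
            self.root = node
        elif k < parent.key:
            parent.left = node
        else:
            parent.right = node

    def remove(self, k: Any) -> None:
        self.root = self._remove(self.root, k)

    def _remove(self, v: Optional[_Node], k: Any) -> Optional[_Node]:
        # Returns the root of the subtree after deleting k (unchanged if absent).
        if v is None:
            return None
        if k < v.key:
            v.left = self._remove(v.left, k)
        elif k > v.key:
            v.right = self._remove(v.right, k)
        elif v.left is None:   # external left child: the right subtree moves up
            return v.right
        elif v.right is None:  # external right child: the left subtree moves up
            return v.left
        else:
            # Both children internal: w = inorder successor (leftmost on the right).
            w = v.right
            while w.left is not None:
                w = w.left
            v.key, v.value = w.key, w.value          # copy w's entry into v
            v.right = self._remove(v.right, w.key)  # w has no left child
        return v

    def inorder(self) -> List[Any]:
        # An inorder traversal lists the keys in increasing order.
        out: List[Any] = []

        def walk(v: Optional[_Node]) -> None:
            if v is not None:
                walk(v.left)
                out.append(v.key)
                walk(v.right)

        walk(self.root)
        return out

    def height(self) -> int:
        # Height counted in internal nodes (an empty tree has height 0).
        def h(v: Optional[_Node]) -> int:
            return 0 if v is None else 1 + max(h(v.left), h(v.right))

        return h(self.root)
```

Inserting 6, 2, 9, 1, 4, 8 and then 5 reproduces the insertion example (5 becomes the right child of 4), and remove(4) then takes the external-child branch. Inserting the keys 1 to 10 in sorted order builds a path of height 10, the O(n) worst case.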

We want a balanced binary tree!

AVL Tree Definition

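- AVL trees are balanced
- An AVL tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1

[Figure: AVL tree with root 44 (height 4); children 17 (height 2) and 78 (height 3); 32 (height 1) below 17; 50 (height 2) and 88 (height 1) below 78; 48 and 62 (height 1) below 50.]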

An example of an AVL tree where the heights are shown next to the nodes.

Height of an AVL Tree

- Fact: The height of an AVL tree storing n keys is O(log n).
- Proof: Let us bound n(h), the minimum number of internal nodes of an AVL tree of height h.
- We easily see that n(1) = 1 and n(2) = 2
- For h > 2, a minimal AVL tree of height h contains the root node, one AVL subtree of height h-1 and another of height h-2.
- That is, $n(h) = 1 + n(h-1) + n(h-2)$
- Knowing $n(h-1) > n(h-2)$, we get $n(h) > 2n(h-2)$. So $n(h) > 2n(h-2)$, $n(h) > 4n(h-4)$, $n(h) > 8n(h-6)$, …, and by induction $n(h) > 2^i\, n(h-2i)$
- Solving the base case (choosing $i = \lceil h/2 \rceil - 1$, so that $h-2i$ is 1 or 2) we get: $n(h) > 2^{h/2-1}$
- Taking logarithms: $h < 2\log n(h) + 2$
- Thus the height of an AVL tree is O(log n)

Insertion

- Insertion is as in a binary search tree
- Always done by expanding an external node
- Example:

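[Figure: before and after inserting 54 into the AVL tree 44; 17, 78; 32, 50, 88; 48, 62. The new node w = 54 becomes the left child of 62, and the nodes 78, 50, 62 are labeled c = z, a = y, b = x.]

After Insertion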

- Nodes along the insertion path may increase their height by 1

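- It may violate the AVL property

[Figure: heights along the path before and after inserting 54: 62 goes from 1 to 2, 50 from 2 to 3, 78 from 3 to 4, and 44 from 4 to 5; node 78 is now unbalanced, since its other child 88 has height 1.]

Search and repair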

- Let z be the first node along the path, going up from the bottom, that violates the AVL property

- Let y be z's child with the greater height (y's height is 2 greater than its sibling's)

- Let x be y's child with the greater height

- We rebalance z by calling the trinode restructuring method

Trinode Restructuring

- Let (a, b, c) be an inorder listing of x, y, z
- Perform the rotations needed to make b the topmost node of the three

[Figure: case 1, single rotation (a left rotation about a), with a = z, b = y, c = x; case 2, double rotation (a right rotation about c, then a left rotation about a), with a = z, b = x, c = y. In both cases b ends up on top with children a and c, and the subtrees T0, T1, T2, T3 keep their inorder order. The other two cases are symmetrical.]
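The inorder view of trinode restructuring translates almost directly into code. Rather than performing the rotations one at a time, the sketch below lists (a, b, c) and the subtrees T0 to T3 in inorder and relinks them with b on top, which yields the same tree as the single or double rotation for each of the four cases. The Node class and restructure function are hypothetical names; the caller is assumed to link the returned b where z used to be and to update any stored heights.

```python
class Node:
    """Hypothetical minimal node: a key and two child links (None = external)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def restructure(x: Node, y: Node, z: Node) -> Node:
    """Trinode restructuring; requires y to be a child of z and x a child of y.

    Returns b, the new root of the three nodes.
    """
    # Record (a, b, c) and T0..T3 in inorder BEFORE changing any links.
    if y is z.right and x is y.right:      # single rotation (left about z)
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    elif y is z.left and x is y.left:      # mirror single rotation (right about z)
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.right and x is y.left:     # double rotation (right about y, left about z)
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    else:                                  # mirror double: y is z.left, x is y.right
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    # b goes on top, a and c become its children, T0..T3 are reattached in order.
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b
```

For the insertion example (z = 78, y = 50, x = 62), the call falls into the mirror double-rotation branch and returns 62 with children 50 (over 48 and 54) and 78 (over 88).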

Restructuring (as Single Rotations)

- Single rotations:

[Figure: left, a = z, b = y, c = x; right (mirror case), c = z, b = y, a = x. After the single rotation, b = y is the parent of a and c, with subtrees T0 to T3 in inorder.]

Restructuring (as Double Rotations)

- Double rotations:

[Figure: left, a = z, c = y, b = x; right (mirror case), c = z, a = y, b = x. After the double rotation, b = x is the parent of a and c, with subtrees T0 to T3 in inorder.]

Insertion Example, continued

[Figure: the unbalanced tree after inserting 54, with z = 78, y = 50, x = 62 and subtrees T0 to T3; after the double rotation the tree is balanced: 44; 17, 62; 32, 50, 78; 48, 54, 88.]

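A compact recursive sketch of AVL insertion in Python, under these assumptions: distinct keys, None for external nodes, and a height stored in every node (counting internal nodes, as on the slides); all names are hypothetical. Instead of locating z, y, x explicitly and calling a separate restructure, each call rebalances its own node on the way back up. The first node found unbalanced is z, and the single or double rotation it performs is the trinode restructuring for that case.

```python
from typing import Optional


class Node:
    """AVL node; an external node (None) has height 0."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.height = 1 + max(height(left), height(right))


def height(v: Optional[Node]) -> int:
    return 0 if v is None else v.height


def update(v: Node) -> None:
    v.height = 1 + max(height(v.left), height(v.right))


def rotate_right(y: Node) -> Node:
    # The left child x of y becomes the subtree root; y becomes x's right child.
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    # The right child y of x becomes the subtree root; x becomes y's left child.
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(z: Node) -> Node:
    """Restore the AVL property at z with a single or double rotation."""
    update(z)
    balance = height(z.left) - height(z.right)
    if balance > 1:                                  # y = z.left is too tall
        if height(z.left.left) < height(z.left.right):
            z.left = rotate_left(z.left)             # x = y.right: double rotation
        return rotate_right(z)
    if balance < -1:                                 # y = z.right is too tall
        if height(z.right.right) < height(z.right.left):
            z.right = rotate_right(z.right)          # x = y.left: double rotation
        return rotate_left(z)
    return z


def insert(v: Optional[Node], key) -> Node:
    """BST insertion below v, then rebalance each node on the way back up."""
    if v is None:
        return Node(key)                             # expand the external node
    if key < v.key:
        v.left = insert(v.left, key)
    elif key > v.key:
        v.right = insert(v.right, key)
    else:
        return v                                     # key already present
    return rebalance(v)
```

Inserting 44, 17, 78, 32, 50, 88, 48, 62 builds the example tree without any rotation; inserting 54 next makes 78 unbalanced, and the double rotation leaves 62 as the right child of 44, with children 50 and 78.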

Removal

- Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance.
- Example:

[Figure: before deletion of 32, the tree is 44; 17, 62; 32, 50, 78; 48, 54, 88, where 32 is the child of 17. After deletion, 17 (node w) has no children and 44 is unbalanced.]

Rebalancing after a Removal

- Let z be the first unbalanced node encountered while travelling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height (if both children of y have the same height, choose the one on the same side as y)
- We perform a trinode restructuring of x, y, z to restore balance at z
- As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached

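[Figure: after deleting 32, a = z = 44, b = y = 62, c = x = 78, and w = 17. A single left rotation about 44 makes 62 the root, with children 44 (over 17 and 50) and 78 (over 88); 50 keeps its children 48 and 54.]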
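Removal can be sketched the same way. In this hypothetical Python version (distinct keys, None for external nodes, stored heights, helpers repeated so the block runs on its own), the recursion rebalances every node on the path back to the root, which implements "continue checking for balance until the root". When both children of y have the same height, the strict comparison keeps x on the same side as y, so the example above is handled by a single left rotation about 44.

```python
from typing import Optional


class Node:
    """AVL node; None is an external node of height 0."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def height(v: Optional[Node]) -> int:
    return 0 if v is None else v.height


def update(v: Node) -> None:
    v.height = 1 + max(height(v.left), height(v.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(z: Node) -> Node:
    update(z)
    if height(z.left) - height(z.right) > 1:
        # Double rotation only if y's inner child is strictly taller (ties: single).
        if height(z.left.left) < height(z.left.right):
            z.left = rotate_left(z.left)
        return rotate_right(z)
    if height(z.right) - height(z.left) > 1:
        if height(z.right.right) < height(z.right.left):
            z.right = rotate_right(z.right)
        return rotate_left(z)
    return z


def remove(v: Optional[Node], key) -> Optional[Node]:
    """Delete key below v, rebalancing every ancestor on the way to the root."""
    if v is None:
        return None                      # key not present
    if key < v.key:
        v.left = remove(v.left, key)
    elif key > v.key:
        v.right = remove(v.right, key)
    elif v.left is None or v.right is None:
        return v.left or v.right         # v has an external child: splice v out
    else:
        w = v.right                      # inorder successor of v
        while w.left is not None:
            w = w.left
        v.key = w.key
        v.right = remove(v.right, w.key)
    return rebalance(v)
```

Building the slide's tree bottom-up with nested Node(...) calls sets every height correctly, and remove(root, 32) then returns 62 as the new root.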

AVL Tree Performance

- A single restructure takes O(1) time
  - using a linked-structure binary tree
- get takes O(log n) time
  - height of tree is O(log n), no restructures needed
- put takes O(log n) time
  - initial find is O(log n)
  - restructuring up the tree, maintaining heights is O(log n)
- remove takes O(log n) time
  - initial find is O(log n)
  - restructuring up the tree, maintaining heights is O(log n)
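The O(log n) height these bounds rely on comes from the recurrence n(h) = 1 + n(h-1) + n(h-2) in the height proof. The short Python script below (min_avl_nodes is a hypothetical name) evaluates the recurrence and checks both inequalities from the proof for small h; it is a numeric sanity check, not a substitute for the induction.

```python
import math


def min_avl_nodes(h: int) -> int:
    """n(h): fewest internal nodes in an AVL tree of height h."""
    if h <= 0:
        return 0
    a, b = 1, 2                      # n(1), n(2)
    if h == 1:
        return a
    for _ in range(3, h + 1):
        a, b = b, 1 + b + a          # n(h) = 1 + n(h-1) + n(h-2)
    return b


# Check both inequalities from the proof for small heights.
for h in range(1, 31):
    n = min_avl_nodes(h)
    assert n > 2 ** (h / 2 - 1)          # n(h) > 2^(h/2 - 1)
    assert h < 2 * math.log2(n) + 2      # h < 2 log n(h) + 2
```

For example, min_avl_nodes(10) returns 143, so any AVL tree of height 10 stores at least 143 keys.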

Splay Tree

- A splay tree is a binary search tree where a node is splayed after it is accessed (for a search or update)

  - the deepest internal node accessed is splayed

  - splay: move the node to the root
- Splaying costs O(h), where h is the height of the tree, which is still O(n) in the worst case
  - O(h) rotations, each of which is O(1)

- Which nodes are splayed after each operation?

Splay node for each method:
- get(k): if the key is found, use that node; if the key is not found, use the parent of the ending external node
- put(k, v): use the new node containing the entry inserted
- remove(k): use the parent of the internal node that was actually removed from the tree (the parent of the node that the removed item was swapped with)

Searching in a Splay Tree: Starts the Same as in a BST

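- Search proceeds down the tree to the found item or to an external node.
- Example: search for the item with key 11.

[Figure: example tree with root (20,Z); its children (10,A) and (35,R); then (7,T), (14,J) under (10,A) and (21,O), (37,P) under (35,R); (1,Q), (8,N) under (7,T); (36,L), (40,X) under (37,P); (1,C), (5,H) under (1,Q); (7,P), (10,U) under (8,N); (2,R), (5,G) under (5,H); (5,I), (6,Y) under (5,G).]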

Example: Searching in a BST, continued

- A search for key 8 ends at an internal node, (8,N).

[Figure: the same tree; the search path is (20,Z), (10,A), (7,T), (8,N).]

Splay Trees do Rotations after Every Operation (Even Search)

- New operation: splay
  - splaying moves a node to the root using rotations
- Right rotation
  - makes the left child x of a node y into y's parent; y becomes the right child of x
- Left rotation
  - makes the right child y of a node x into x's parent; x becomes the left child of y

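[Figure: a right rotation about y, where x has subtrees T1 and T2 and y has right subtree T3; afterward x has left subtree T1 and right child y, and y has subtrees T2 and T3. A left rotation about x is the mirror image. In both cases the structure of the tree above the rotated node is not modified.]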
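A rotation only relinks a few pointers, and the inorder sequence T1, x, T2, y, T3 is the same before and after, which is why rotations preserve the binary search tree property. This Python sketch uses a hypothetical Node class without parent pointers; instead of updating the node above, each function returns the new subtree root for the caller to link in.

```python
class Node:
    """Hypothetical binary tree node; None stands for an empty subtree."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(y: Node) -> Node:
    """Right rotation about y: y's left child x becomes the subtree root.

    T1 = x.left stays with x, T2 = x.right moves to y.left, T3 = y.right
    stays with y. Returns x; the caller links x where y used to be.
    """
    x = y.left
    y.left = x.right      # T2 changes parent from x to y
    x.right = y           # y becomes the right child of x
    return x


def rotate_left(x: Node) -> Node:
    """Left rotation about x: x's right child y becomes the subtree root."""
    y = x.right
    x.right = y.left      # T2 changes parent from y to x
    y.left = x            # x becomes the left child of y
    return y


def inorder(v):
    # Keys in inorder; unchanged by either rotation.
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```

Applying rotate_right and then rotate_left to the returned root restores the original shape, since the two rotations are inverses.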

Splaying

- "x is a left-left grandchild" means x is a left child of its parent, which is itself a left child of its parent
- p is x's parent; g is p's parent

Start with node x and repeat the following checks until x is the root:
- Is x the root? If yes, stop.
- Is x a left-left grandchild? zig-zig: right-rotate about g, then right-rotate about p
- Is x a right-right grandchild? zig-zig: left-rotate about g, then left-rotate about p
- Is x a right-left grandchild? zig-zag: left-rotate about p, then right-rotate about g
- Is x a left-right grandchild? zig-zag: right-rotate about p, then left-rotate about g
- Otherwise x is a child of the root. zig: if x is the left child of the root, right-rotate about the root; otherwise left-rotate about the root

Visualizing the Splaying Cases

[Figure: the zig-zag, zig-zig, and zig cases, each showing the nodes x, y, z (x, y, w for zig) before and after the rotations, with subtrees T1 to T4 kept in inorder.]
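The splaying flowchart translates into a short loop. This Python sketch is a hypothetical bottom-up implementation with parent pointers and distinct keys (the slides' example has duplicate keys, which it does not handle). _rotate(x) moves x above its parent, so "rotate about g, then about p" in the zig-zig case becomes _rotate(p) followed by _rotate(x). get follows the table of splay nodes: it splays the node found or, on a miss, the parent of the external node where the search ended. remove is omitted, and _bst_insert (plain insertion without splaying) exists only so that an example tree can be built.

```python
from typing import Optional


class Node:
    """Splay tree node with a parent pointer (None means no child / no parent)."""

    def __init__(self, key, parent: Optional["Node"] = None):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent = parent


class SplayTree:
    """Bottom-up splay tree sketch (hypothetical class; assumes distinct keys)."""

    def __init__(self):
        self.root: Optional[Node] = None

    def _rotate(self, x: Node) -> None:
        # Move x above its parent p: a right rotation about p if x is p's left
        # child, a left rotation about p otherwise. Nodes above p keep their shape.
        p = x.parent
        g = p.parent
        if x is p.left:
            p.left = x.right
            if x.right is not None:
                x.right.parent = p
            x.right = p
        else:
            p.right = x.left
            if x.left is not None:
                x.left.parent = p
            x.left = p
        p.parent = x
        x.parent = g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def _splay(self, x: Node) -> None:
        while x.parent is not None:
            p = x.parent
            g = p.parent
            if g is None:                          # zig: x is a child of the root
                self._rotate(x)
            elif (x is p.left) == (p is g.left):   # zig-zig: about g, then about p
                self._rotate(p)
                self._rotate(x)
            else:                                  # zig-zag: about p, then about g
                self._rotate(x)
                self._rotate(x)

    def _bst_insert(self, key) -> Node:
        # Ordinary BST insertion with no splaying; returns the node holding key.
        parent, v = None, self.root
        while v is not None:
            if key == v.key:
                return v
            parent = v
            v = v.left if key < v.key else v.right
        node = Node(key, parent)
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        return node

    def put(self, key) -> None:
        # Splay the node containing the inserted (or already present) key.
        self._splay(self._bst_insert(key))

    def get(self, key) -> bool:
        # Splay the node found or, on a miss, the last internal node visited.
        v, last = self.root, None
        while v is not None:
            last = v
            if key == v.key:
                break
            v = v.left if key < v.key else v.right
        if last is not None:
            self._splay(last)
        return v is not None
```

Building the upper part of the example tree with _bst_insert on the keys 20, 10, 35, 7, 14, 21, 37, 1, 8, 36, 40 and then calling get(8) reproduces the splaying example that follows: a zig-zag and then a zig leave 8 at the root with children 7 and 20.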

Splaying Example

- Let x = (8,N)
- x is the right child of its parent p = (7,T), which is the left child of the grandparent g = (10,A)
- Left-rotate around p, then right-rotate around g

[Figure: 1. before rotating; 2. after the first rotation, x = (8,N) is the left child of g, with p = (7,T) as its left child; 3. after the second rotation, x = (8,N) is the left child of the root (20,Z), with children (7,T) and (10,A).]

- x is not yet the root, so we splay again

Splaying Example, Continued

- Now x is the left child of the root
- Right-rotate around the root

[Figure: 1. before applying the rotation; 2. after the rotation, x = (8,N) is the root, with children (7,T) and (20,Z), and (20,Z) has children (10,A) and (35,R).]

- x is the root, so stop

Example Result of Splaying

- The tree might not be more balanced
- E.g., splay (40,X)
  - before, the depth of the shallowest leaf is 3 and the deepest is 7
  - after, the depth of the shallowest leaf is 1 and the deepest is 8

[Figure: the tree before splaying (40,X), after the first splay, and after the second splay.]

Performance of Splay Trees

- The amortized cost of any splay operation is O(log n)

- This implies that splay trees can adapt to perform searches on frequently requested items much faster than O(log n) in some cases

Project Demo on Jan. 7

Beat Google:

- Stage 1: Rank web pages by keywords
- Stage 2: Rank web sites by keywords
- Stage 3: Re-rank Google web sites by keywords
- Stage 4: Derive related keywords from top-ranked web sites
- Stage 5: Webrize your search engine
- Stage 6: Mobilize your search engine

Project Demo

- Location: the MIS 5F PC classroom
- Each team gives an 8-minute PPT presentation focusing on the project interests, key ideas, and achievements, plus a 7-minute system demo
- In the demo, each team needs to run its system to show how it works and how it meets the requirements of each stage. I will also check your source code.
- BONUS: Students who successfully challenge another team's system may get extra points.

Project Hints

- How to call Google?
- How to find the reference links?
- How to encode Chinese?

HW10 (Due on Dec. 16)

Use Google and get the links!

- Get a keyword from the user
- Return the URLs listed in the search result
- Save the results in a data structure that we will discuss in the next lecture
- After this HW, you can move on to the fourth stage of the project
- You can apply the same technique to other search engines

Coming Up

- Recap: Binary Search Trees
  - TB Chapter 10
- Maps and Hash Tables
  - TB Chapters 9 and 10