Algorithms, FOURTH EDITION
ROBERT SEDGEWICK | KEVIN WAYNE

3.2 BINARY SEARCH TREES

‣ BSTs
‣ iteration
‣ ordered operations
‣ deletion (see book or videos)

https://algs4.cs.princeton.edu

Last updated on 3/1/19 3:45 PM

Binary search trees

Definition. A BST is a binary tree in symmetric order.

A binary tree is either:
・Empty.
・Two disjoint binary trees (left and right).

Symmetric order. Each node has a key, and every node’s key is:
・Larger than all keys in its left subtree.
・Smaller than all keys in its right subtree.

[Figure: Anatomy of a binary tree (root, left link, subtree, right child of root, null links) and anatomy of a binary search tree (parent of A and R, left link of E, key, value associated with R, keys smaller than E, keys larger than E).]
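The definition of symmetric order can be turned into a direct check. The following is a minimal sketch, not code from the slides: the class name SymmetricOrderCheck, the int keys, and the isBST helper are all assumptions made for illustration. It passes down the open interval that every key in a subtree must fall in, because comparing a node only with its two children is not enough.

```java
// Hypothetical helper (not from the slides): checks symmetric order.
class SymmetricOrderCheck {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // True if every key in the subtree rooted at x lies strictly between
    // lo and hi (a null bound means unbounded) and the subtree itself
    // is in symmetric order.
    static boolean isBST(Node x, Integer lo, Integer hi) {
        if (x == null) return true;
        if (lo != null && x.key <= lo) return false;
        if (hi != null && x.key >= hi) return false;
        return isBST(x.left, lo, x.key) && isBST(x.right, x.key, hi);
    }

    static boolean isBST(Node root) {
        return isBST(root, null, null);
    }

    public static void main(String[] args) {
        Node ok = new Node(5, new Node(3, null, null), new Node(8, null, null));
        // 6 sits in the left subtree of 5, so symmetric order fails
        // even though 6 is a valid right child of 3 locally.
        Node bad = new Node(5, new Node(3, null, new Node(6, null, null)), null);
        System.out.println(isBST(ok));   // true
        System.out.println(isBST(bad));  // false
    }
}
```

The second tree shows why the definition speaks of all keys in a subtree rather than only the children of a node.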

Differences between heaps and binary search trees

                            Heap                        BST
Supported operations        insert, delete-max          insert, search, delete, ordered operations
What is inserted            keys                        key-value pairs
Underlying data structure   resizing array              linked nodes
Tree shape                  fixed given n               varies; depends on data
                            (complete binary tree)
Ordering of keys            somewhat ordered:           totally ordered:
                            parent > child (max heap)   left child < parent < right child
Duplicate keys allowed?     yes                         no

Binary search trees: quiz 1

Which of the following properties hold?

A. If a binary tree is heap ordered, then it is symmetrically ordered.

B. If a binary tree is symmetrically ordered, then it is heap ordered.

C. Both A and B.

D. Neither A nor B.

[Figure: the keys A C E H R S X arranged as two binary trees, one in heap order and one in symmetric order, with the min key and max key labeled.]

BST representation in Java

A BST contains a reference to a root Node.

A Node is composed of four fields:
・A Key and a Value.
・A reference to the left and right subtree.

   private class Node
   {
      private Key key;
      private Value val;
      private Node left, right;

      public Node(Key key, Value val)
      {
         this.key = key;
         this.val = val;
      }
   }

Key and Value are generic types; Key is Comparable.

[Figure: a BST node holds key and val; its left link refers to a BST with smaller keys, its right link to a BST with larger keys.]

BST implementation (skeleton)

   public class BST<Key extends Comparable<Key>, Value>
   {
      private Node root;                         // root of BST

      private class Node
      {  /* see previous slide */  }

      public void put(Key key, Value val)
      {  /* see next slide */  }

      public Value get(Key key)
      {  /* see next slide */  }

      public Iterable<Key> keys()
      {  /* see slides in next section */  }

      public void delete(Key key)
      {  /* see textbook */  }
   }

BST search (get)

If less, go left; if greater, go right; if equal, search hit; if null node, search miss.

[Figure: in the BST with keys S E X A R C H M, the search for H ends at the node containing H (search hit); the search for G ends at a null link (search miss).]

BST search: Java implementation

Get. Return value corresponding to given key, or null if no such key.

   public Value get(Key key)
   {
      Node x = root;
      while (x != null)
      {
         int cmp = key.compareTo(x.key);
         if      (cmp  < 0) x = x.left;
         else if (cmp  > 0) x = x.right;
         else if (cmp == 0) return x.val;
      }
      return null;
   }

Cost. Number of compares = 1 + depth of node.

BST insert (put)

Associate value with key.

Search for key, then two cases:
・Key in tree ⇒ reset value.
・Key not in tree ⇒ add new node.

[Figure: Insertion into a BST. Inserting L: the search for L ends at a null link; create new node; reset links on the way up.]

BST insert: Java implementation

Put. Associate value with key.

   public void put(Key key, Value val)
   {  root = put(root, key, val);  }

   // Tricky! Each call returns the root of the (possibly new) subtree,
   // which resets the links on the way up.
   private Node put(Node x, Key key, Value val)
   {
      if (x == null) return new Node(key, val);
      int cmp = key.compareTo(x.key);
      if      (cmp  < 0) x.left  = put(x.left,  key, val);
      else if (cmp  > 0) x.right = put(x.right, key, val);
      else if (cmp == 0) x.val = val;
      return x;
   }

Cost. Number of compares = 1 + depth of node.
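To see get and put working together, here is a small self-contained sketch. The class name BstSketch and the demo in main are assumptions made for illustration; the method bodies follow the slides, except that the final `else if (cmp == 0)` is written as a plain `else`.

```java
// Minimal sketch assembling the slide code; BstSketch is a hypothetical name.
class BstSketch<Key extends Comparable<Key>, Value> {
    private Node root;

    private class Node {
        Key key;
        Value val;
        Node left, right;
        Node(Key key, Value val) { this.key = key; this.val = val; }
    }

    public void put(Key key, Value val) { root = put(root, key, val); }

    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val);   // key not in tree: add new node
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;            // key in tree: reset value
        return x;
    }

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;         // search hit
        }
        return null;                                // search miss
    }

    public static void main(String[] args) {
        BstSketch<String, Integer> st = new BstSketch<>();
        String[] keys = { "S", "E", "X", "A", "R", "C", "H", "M" };
        for (int i = 0; i < keys.length; i++) st.put(keys[i], i);
        st.put("L", 8);    // new key
        st.put("E", 99);   // existing key
        System.out.println(st.get("L"));  // 8
        System.out.println(st.get("E"));  // 99
        System.out.println(st.get("G"));  // null
    }
}
```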

Tree shape

・Many BSTs correspond to same set of keys.
・Number of compares for search/insert = 1 + depth of node.

[Figure: BST possibilities for the keys A C E H R S X: best case, typical case, worst case.]

Bottom line. Tree shape depends on order of insertion.
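The dependence on insertion order is easy to check by hand for seven keys. The sketch below is illustrative only: TreeShapeSketch, height, and heightAfterInserting are hypothetical names, and height is taken to be the number of links on the longest path from the root.

```java
// Hypothetical demo: same keys, three insertion orders, three heights.
class TreeShapeSketch {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    // Number of links on the longest root-to-leaf path; -1 for an empty tree.
    static int height(Node x) {
        if (x == null) return -1;
        return 1 + Math.max(height(x.left), height(x.right));
    }

    static int heightAfterInserting(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return height(root);
    }

    public static void main(String[] args) {
        // Best case: perfectly balanced.
        System.out.println(heightAfterInserting("H", "C", "S", "A", "E", "R", "X")); // 2
        // A typical case.
        System.out.println(heightAfterInserting("S", "E", "X", "A", "R", "C", "H")); // 3
        // Worst case: keys arrive in sorted order, height is n - 1.
        System.out.println(heightAfterInserting("A", "C", "E", "H", "R", "S", "X")); // 6
    }
}
```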

BST insertion: random order visualization

Ex. Insert keys in random order.

Binary search trees: quiz 2

Suppose that you insert n keys in random order into a BST. What is the expected height of the resulting BST?

A. ~ lg n
B. ~ ln n
C. ~ 2 lg n
D. ~ 2 ln n
E. ~ 4.31107 ln n

[Figure: a BST with root P, with its height marked.]

Correspondence between BSTs and partitioning

[Figure: BST with root P; levels P / H T / D O S U / A E I Y / C M / L.]

Remark. Correspondence is 1–1 if array has no duplicate keys.

BSTs: mathematical analysis

Proposition. If n distinct keys are inserted into a BST in random order, the expected number of compares for a search/insert is ~ 2 ln n.
Pf. 1–1 correspondence with quicksort partitioning.

Proposition. [Reed, 2003] If n distinct keys are inserted into a BST in random order, the expected height is ~ 4.31107 ln n.
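The first proposition can be checked empirically. The following is a rough simulation sketch, not part of the slides: RandomBstSketch, averageCompares, and the fixed seed are assumptions chosen for illustration. It inserts the keys 0 to n − 1 in shuffled order and averages 1 + depth over all nodes, which is the cost of one search hit per key. Expect rough agreement with 2 ln n rather than equality, since the ~ notation hides lower-order terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical simulation: average search-hit cost in a randomly built BST.
class RandomBstSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative insert; returns the root of the tree.
    static Node put(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node x = root;
        while (true) {
            if (key < x.key) {
                if (x.left == null) { x.left = fresh; return root; }
                x = x.left;
            } else if (key > x.key) {
                if (x.right == null) { x.right = fresh; return root; }
                x = x.right;
            } else {
                return root;   // duplicate key: nothing to add
            }
        }
    }

    // Sum over all nodes of (1 + depth): total compares for one hit per key.
    static long totalCompares(Node x, int depth) {
        if (x == null) return 0;
        return (depth + 1)
             + totalCompares(x.left,  depth + 1)
             + totalCompares(x.right, depth + 1);
    }

    static double averageCompares(int n, long seed) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) keys.add(i);
        Collections.shuffle(keys, new Random(seed));
        Node root = null;
        for (int key : keys) root = put(root, key);
        return (double) totalCompares(root, 0) / n;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.printf("average compares: %.2f%n", averageCompares(n, 42L));
        System.out.printf("2 ln n:           %.2f%n", 2 * Math.log(n));
    }
}
```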

[Figure: first page of Bruce Reed, "How Tall is a Tree?" (STOC 2000), which shows E(H_n) = α log n − β log log n + O(1) with α = 4.31107… and β = 1.95…, and Var(H_n) = O(1). Annotation: the expected height equals the expected depth of the function-call stack in quicksort.]

But… Worst-case height is n – 1.
[ unlike quicksort, worst case matters: client may not insert in random order ]

ST implementations: summary

                                    guarantee           average case         operations
implementation                      search    insert    search hit  insert   on keys
sequential search (unordered list)  n         n         n           n        equals()
binary search (ordered array)       log n     n         log n       n        compareTo()
BST                                 n         n         log n       log n    compareTo()

Why not shuffle to ensure a (probabilistic) guarantee of log n ?

Binary search trees: quiz 3

In which order does traverse(root) print the keys in the BST?

   private void traverse(Node x)
   {
      if (x == null) return;
      traverse(x.left);
      StdOut.println(x.key);
      traverse(x.right);
   }

A. A C E H M R S X
B. S E A C R H M X
C. C A M H R E X S
D. S E X A R C H M

[Figure: BST with root S containing the keys S E X A R C H M.]

Inorder traversal

・Traverse left subtree.
・Enqueue key.
・Traverse right subtree.

   public Iterable<Key> keys()
   {
      Queue<Key> q = new Queue<Key>();
      inorder(root, q);
      return q;
   }

   private void inorder(Node x, Queue<Key> q)
   {
      if (x == null) return;
      inorder(x.left, q);       // smaller keys, in order
      q.enqueue(x.key);         // key
      inorder(x.right, q);      // larger keys, in order
   }                            // all keys, in order

Property. Inorder traversal of a BST yields keys in ascending order.
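A quick way to convince yourself of this property is to build a small tree and print its keys. This sketch is illustrative only: InorderSketch and build are hypothetical names, and java.util.ArrayList stands in for the Queue type used on the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: inorder traversal returns the keys in ascending order.
class InorderSketch {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return root;
    }

    static void inorder(Node x, List<String> out) {
        if (x == null) return;
        inorder(x.left, out);    // smaller keys, in order
        out.add(x.key);          // key
        inorder(x.right, out);   // larger keys, in order
    }

    static List<String> keys(Node root) {
        List<String> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H", "M");
        System.out.println(keys(root));  // [A, C, E, H, M, R, S, X]
    }
}
```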

Running time

Property. Inorder traversal of a BST takes linear time.

LEVEL-ORDER TRAVERSAL

Level-order traversal of a binary tree.
・Process root.
・Process children of root, from left to right.
・Process grandchildren of root, from left to right.
・…

[Figure: binary tree with root S; level-order traversal: S E T A R C H M.]

Q. Given binary tree, how to compute level-order traversal?

   Queue<Node> queue = new Queue<Node>();
   queue.enqueue(root);
   while (!queue.isEmpty())
   {
      Node x = queue.dequeue();
      if (x == null) continue;
      StdOut.println(x.item);
      queue.enqueue(x.left);
      queue.enqueue(x.right);
   }
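The same idea can be sketched with the standard library instead of the course's Queue class. This is an illustrative version, not the slide code: LevelOrderSketch and build are hypothetical names. One design difference is forced by the library: java.util.ArrayDeque rejects null elements, so the sketch tests each child for null before enqueuing it rather than skipping nulls after dequeuing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo: level-order traversal with a FIFO queue.
class LevelOrderSketch {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return root;
    }

    static List<String> levelOrder(Node root) {
        List<String> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node x = queue.remove();                    // oldest node first
            out.add(x.key);
            if (x.left  != null) queue.add(x.left);     // ArrayDeque forbids null
            if (x.right != null) queue.add(x.right);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "T", "A", "R", "C", "H", "M");
        System.out.println(levelOrder(root));  // [S, E, T, A, R, C, H, M]
    }
}
```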

Omitted for midterm. Only Rank discussed in lecture. See book/videos for the rest.

Minimum and maximum

Minimum. Smallest key in BST.
Maximum. Largest key in BST.

[Figure: Examples of BST order queries: min() and max() marked on the BST with keys S E X A R C H M.]

Q. How to find the min / max?
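One answer follows directly from symmetric order: the smallest key is reached by following left links from the root until a null link, and the largest by following right links. A minimal sketch, with MinMaxSketch and build as hypothetical names and null returned for an empty tree as an assumed convention:

```java
// Hypothetical demo: min and max by following left or right links.
class MinMaxSketch {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return root;
    }

    // Smallest key: keep going left until there is no left child.
    static String min(Node x) {
        if (x == null) return null;
        while (x.left != null) x = x.left;
        return x.key;
    }

    // Largest key: keep going right until there is no right child.
    static String max(Node x) {
        if (x == null) return null;
        while (x.right != null) x = x.right;
        return x.key;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H", "M");
        System.out.println(min(root));  // A
        System.out.println(max(root));  // X
    }
}
```

Both loops follow a single root-to-leaf path, so the cost is proportional to the height of the tree.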

Floor and ceiling

Floor. Largest key in BST ≤ query key.
Ceiling. Smallest key in BST ≥ query key.

[Figure: Examples of BST order queries: floor(G), floor(D), ceiling(Q), min(), and max() marked on the BST with keys S E X A R C H M.]

Q. How to find the floor / ceiling?

Computing the floor

Floor. Largest key in BST ≤ query key.

Key idea.
・To compute floor(key), search for key.
・Both floor(key) and ceiling(key) must be on search path. Why?

[Figure: Examples of BST order queries: floor(G) and floor(D) marked on the BST with keys S E X A R C H M.]

   public Key floor(Key key)
   {  return floor(root, key, null);  }

   private Key floor(Node x, Key key, Key best)
   {
      if (x == null) return best;
      int cmp = key.compareTo(x.key);

      // key in node is too large (floor can’t be in right subtree)
      if      (cmp < 0) return floor(x.left, key, best);

      // key in node is a candidate for floor, and a better candidate than best
      // (floor can’t be in left subtree; x must be in right subtree of node containing best)
      else if (cmp > 0) return floor(x.right, key, x.key);

      // exact match; a plain else lets the compiler see that every path returns
      else              return x.key;
   }
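Ceiling is the mirror image of floor: swap left and right, and record the node's key as the best candidate when moving left. The sketch below is an assumption-laden illustration, not the slide code: FloorCeilingSketch and build are hypothetical names, and keys are plain Strings. It reproduces the three queries from the figure.

```java
// Hypothetical demo: floor and its mirror image, ceiling.
class FloorCeilingSketch {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return root;
    }

    // Largest key <= query key, or null if there is none.
    static String floor(Node x, String key, String best) {
        if (x == null) return best;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) return floor(x.left,  key, best);   // node key too large
        else if (cmp > 0) return floor(x.right, key, x.key);  // node key is a candidate
        else              return x.key;
    }

    // Smallest key >= query key, or null if there is none.
    static String ceiling(Node x, String key, String best) {
        if (x == null) return best;
        int cmp = key.compareTo(x.key);
        if      (cmp > 0) return ceiling(x.right, key, best);  // node key too small
        else if (cmp < 0) return ceiling(x.left,  key, x.key); // node key is a candidate
        else              return x.key;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H", "M");
        System.out.println(floor(root, "G", null));    // E
        System.out.println(floor(root, "D", null));    // C
        System.out.println(ceiling(root, "Q", null));  // R
    }
}
```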

Rank and select

Rank. How many keys < key?
Select. Key of rank k.

Q. How to implement rank() and select() efficiently for BSTs?
A. In each node, store the number of nodes in its subtree.

[Figure: BST with subtree counts: S 8, E 6, X 1, A 2, R 3, C 1, H 2, M 1.]
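The slides cover rank in detail and leave select to the book. As a hedged sketch of how subtree counts make select work: compare k with the size t of the left subtree; if t > k the answer is in the left subtree, if t < k it is the key of rank k − t − 1 in the right subtree, and otherwise it is the key in the node. SelectSketch and build are hypothetical names, and returning null for an out-of-range k is an assumed convention.

```java
// Hypothetical demo: select(k) using subtree counts.
class SelectSketch {
    static class Node {
        String key;
        Node left, right;
        int count = 1;               // number of nodes in subtree
        Node(String key) { this.key = key; }
    }

    static int size(Node x) {
        return (x == null) ? 0 : x.count;   // ok to call when x is null
    }

    static Node put(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key);
        else if (cmp > 0) x.right = put(x.right, key);
        x.count = 1 + size(x.left) + size(x.right);
        return x;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String key : keys) root = put(root, key);
        return root;
    }

    // Key of rank k: the key with exactly k smaller keys (k = 0 is the minimum).
    static String select(Node x, int k) {
        if (x == null) return null;
        int t = size(x.left);
        if      (t > k) return select(x.left, k);
        else if (t < k) return select(x.right, k - t - 1);
        else            return x.key;
    }

    public static void main(String[] args) {
        Node root = build("S", "E", "X", "A", "R", "C", "H", "M");
        System.out.println(size(root));       // 8
        System.out.println(select(root, 0));  // A
        System.out.println(select(root, 3));  // H
        System.out.println(select(root, 7));  // X
    }
}
```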

BST implementation: subtree counts

   private class Node
   {
      private Key key;
      private Value val;
      private Node left;
      private Node right;
      private int count;        // number of nodes in subtree
   }

   public int size()
   {  return size(root);  }

   private int size(Node x)
   {
      if (x == null) return 0;  // ok to call when x is null
      return x.count;
   }

   private Node put(Node x, Key key, Value val)
   {
      if (x == null) return new Node(key, val, 1);   // initialize subtree count to 1
      int cmp = key.compareTo(x.key);
      if      (cmp  < 0) x.left  = put(x.left,  key, val);
      else if (cmp  > 0) x.right = put(x.right, key, val);
      else if (cmp == 0) x.val = val;
      x.count = 1 + size(x.left) + size(x.right);
      return x;
   }

Rank

Rank. How many keys < key?

・key < key in node? Recur on left subtree.
・key == key in node? Everything in left subtree.
・key > key in node? Everything in left subtree + 1 + recursive result from right subtree.

   public int rank(Key key)
   {  return rank(key, root);  }

   private int rank(Key key, Node x)
   {
      if (x == null) return 0;
      int cmp = key.compareTo(x.key);
      if      (cmp < 0) return rank(key, x.left);
      else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
      else              return size(x.left);   // plain else so every path returns
   }

Note: use size(x.left) instead of x.left.count to avoid null reference.

BST: ordered symbol table operations summary

                     sequential search   binary search   BST
search               n                   log n           h
insert               n                   n               h
min / max            n                   1               h
floor / ceiling      n                   log n           h
rank                 n                   log n           h
select               n                   1               h
ordered iteration    n log n             n               n

h = height of BST

order of growth of running time of ordered symbol table operations

ST implementations: summary

                                    guarantee           average case         ordered  key
implementation                      search    insert    search hit  insert   ops?     interface
sequential search (unordered list)  n         n         n           n                 equals()
binary search (ordered array)       log n     n         log n       n        ✔        compareTo()
BST                                 n         n         log n       log n    ✔        compareTo()
red-black BST                       log n     log n     log n       log n    ✔        compareTo()

Next week. Guarantee logarithmic performance for all operations.
