<<

Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

3.2 BINARY SEARCH TREES

‣ BSTs ‣ ordered operations Algorithms ‣ deletion FOURTH EDITION

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu 3.2 BINARY SEARCH TREES

‣ BSTs ‣ ordered operations Algorithms ‣ deletion

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Binary search trees

Definition. A BST is a binary in symmetric order. root a left link a subtree A is either: Empty. ・ right child ・Two disjoint binary trees (left and right). of root null links

Anatomy of a binary tree

parent of A and R Symmetric order. Each node has a key, key left link S and every node’s key is: of E E X ・Larger than all keys in its left subtree. A R 9 value H associated ・Smaller than all keys in its right subtree. with R

keys smaller than E keys larger than E

Anatomy of a binary

3 demo

Search. If less, go left; if greater, go right; if equal, search hit.

successful search for H

S

E X

A R

C H

M

4 Binary search tree demo

Insert. If less, go left; if greater, go right; if null, insert.

insert G

S

E X

A R

C H

G M

5 BST representation in Java

Java definition. A BST is a reference to a root Node.

A Node is composed of four fields: ・A Key and a Value. ・A reference to the left and right subtree.

smaller keys larger keys

private class Node { private Key key; BST private Value val; Node key private Node left, right; val public Node(Key key, Value val) { left right this.key = key; this.val = val; } BST with smaller keys BST with larger keys } Binary search tree

Key and Value are generic types; Key is Comparable 6 BST implementation (skeleton)

public class BST, Value> { private Node root; root of BST

private class Node { /* see previous slide */ }

public void put(Key key, Value val) { /* see next slides */ }

public Value get(Key key) { /* see next slides */ }

public void delete(Key key) { /* see next slides */ }

public Iterable iterator() { /* see next slides */ }

}

7 BST search: Java implementation

Get. Return value corresponding to given key, or null if no such key.

public Value get(Key key) { Node x = root; while (x != null) { int cmp = key.compareTo(x.key); if (cmp < 0) x = x.left; else if (cmp > 0) x = x.right; else if (cmp == 0) return x.val; } return null; }

Cost. Number of compares is equal to 1 + depth of node.

8 BST insert

Put. Associate value with key. inserting L S E X A R Search for key, then two cases: C H M ・Key in tree ⇒ reset value. search for L ends P at this null link ・Key not in tree ⇒ add new node. S E X A R C H M create new node L P

S E X A R C H M reset links L P on the way up

Insertion into a BST

9 BST insert: Java implementation

Put. Associate value with key.

concise, but tricky, public void put(Key key, Value val) recursive code; read carefully! { root = put(root, key, val); }

private Node put(Node x, Key key, Value val) { if (x == null) return new Node(key, val); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; return x; }

Cost. Number of compares is equal to 1 + depth of node.

10 best case H C S Tree shape A E R X

・Many BSTs correspond to same of keys. typical case S Number of compares for bestsearch/insert case is equal to 1 + depth of node. ・ H E X C S A R A E R X C H

A best case typical case S worst case H C C S E X E A E R X A R H C H R S X typical case S A worst case E X C BST possibilities A R E C H H R S X Bottom Aline. Treeworst shape case depends on order of insertion. C E BST possibilities H 11 R S X

BST possibilities BST insertion: random order visualization

Ex. Insert keys in random order.

12 Sorting with a binary

Q. What is this ?

0. Shuffle the array of keys. 1. Insert all keys into a BST. 2. Do an inorder traversal of BST.

A. It's not a sorting algorithm (if there are duplicate keys)!

Q. OK, so what if there are no duplicate keys? Q. What are its properties?

13 Correspondence between BSTs and partitioning

P

H T

D O S U

A E I Y

C M

L

Remark. Correspondence is 1–1 if array has no duplicate keys.

14 BSTs: mathematical analysis

Proposition. If N distinct keys are inserted into a BST in random order, the expected number of compares for a search/insert is ~ 2 ln N. Pf. 1–1 correspondence with quicksort partitioning.

Proposition. [Reed, 2003] If N distinct keys are inserted in random order, expected height of tree is ~ 4.311 ln N. How Tall is a Tree? How Tall is a Tree?

Bruce Reed Bruce Reed CNRS, Paris, France CNRS, Paris, France [email protected] [email protected]

ABSTRACT purpose of this note to prove that for /3 -- ½ + ~,3 we ABSTRACT Let H~ be the height of a randompurpose binary of search this note tree toon prove n thathave: for /3 -- ½ + ~,3 we Let H~ be the height of a randomnodes. binary We search show thattree onthere n existshave: constants a = 4.31107... THEOREM 1. E(H~) = ~logn -/31oglogn + O(1) nodes. We show that there existsand/3 constants = 1.95... a = 4.31107... such that E(H~) = c~logn -/31oglogn + and Var(Hn) = O(1) . and/3 = 1.95... such that E(H~)O(1), = c~logn We also -/31oglogn show that +Var(H~) =THEOREM O(1). 1. E(H~) = ~logn -/31oglogn + O(1) and O(1), We also show that Var(H~) = O(1). Var(Hn) = O(1) . Remark By the definition of a, /3 = 3~ Categories and Subject Descriptors 7"g~" The first defi- Remark By the definition of a, nition/3 = 7"g~"given3~ Theis more first suggestive defi- of why this value is correct, Categories and Subject Descriptors as we will see. But… Worst-case height is N. E.2 [Data Structures]: Trees nition given is more suggestive of why this value is correct, E.2 [Data Structures]: Trees as we will see. For more information on random binary search trees, one 1. THE RESULTS For more information on randommay binary consult search [6],[7], trees, [1], one [2], [9], [4], and [8]. may consult [6],[7], [1], [2], [9], [4],Remark and [8]. After I announced these results, Drmota(unpublished) [ exponentially small chance when keys are Ainserted binary search tree inis a binaryrandom tree to each ordernode of which ] 1. THE RESULTS Remark After I announced these developedresults, Drmota(unpublished) an alternative proof of the fact that Var(Hn) = A binary search tree is a binary treewe haveto each associated node of awhich key; these keys axe drawn from some O(1) using completely different techniques. As our two totally ordered set and the key developedat v cannot an be alternative larger than proof of the fact that Var(Hn) = we have associated a key; these keys axe drawn from some O(1) using completely different proofstechniques. illuminate As15 differentour two aspects of the problem, we have totally ordered set and the key atthe v keycannot at its be right larger child than nor smaller than the key at its left decided to submit the journal versions to the same journal child. Given a binary search treeproofs T and illuminate a new key different k, we aspects of the problem, we have the key at its right child nor smaller than the key at its left decided to submit the journal versionsand asked to the that same they journal be published side by side. child. Given a binary search treeinsert T and k intoa new T bykey traversing k, we the tree starting at the root and inserting k into the first emptyand asked position that at they which be publishedwe side by side. insert k into T by traversing the tree starting at the root 2. A MODEL and inserting k into the first emptyarrive. position We traverse at which the treewe by moving to the left child of the If we construct a binary search tree from a permutation arrive. We traverse the tree by movingcurrent to thenode left if kchild is smaller of the than the2. currentA MODEL key and moving of 1, ..., n and i is the first key in the permutation then: current node if k is smaller than theto currentthe right key child and movingotherwise. GivenIf we someconstruct permutation a binary of search tree from a permutation i appears at the root of the tree, the tree rooted at the to the right child otherwise. Givena set someof keys, permutation we construct of a binaryof 1, ...,search n and tree i isfrom the thisfirst key in the permutation then: left child of i contains the keys 1, ..., i - 1 and its shape a set of keys, we construct a binarypermutation search tree by insertingfrom this them iin appears the given at orderthe root into of an the tree, the tree rooted at the depends only on the order in which these keys appear in permutation by inserting them ininitially the given empty order tree. into an left child of i contains the keys 1, ..., i - 1 and its shape the permutation, mad the tree rooted at the right child of i initially empty tree. The height Hn of a random binarydepends search only tree on theT,~ orderon n in which these keys appear in nodes, constructed in this mannerthe startingpermutation, from amad random the tree rootedcontains at the the right keys childi + 1, of ..., i n and its shape depends only on The height Hn of a random binary search tree T,~ on n the order in which these keys appear in the permutation. nodes, constructed in this mannerequiprobable starting from permutation a random of 1,...,contains n, is knownthe keys to i be+ 1,close ..., n and its shape depends only on From this observation, one deduces that Hn is also the num- equiprobable permutation of 1,...,to n,alogn is known where to a be= close4.31107... the is orderthe unique in which solution these onkeys appear in the permutation. ber of levels of recursion required when Vazfilla Quicksort to alogn where a = 4.31107... [2,is ~)the ofunique the equation solution a log((2e)/a)on From = 1this (here observation, and elsewhere, one deduces that Hn is also the num- (i.e. the version of Quicksort in which the first element in [2, ~) of the equation a log((2e)/a)log = is1 the(here natural and elsewhere, logarithm). First,ber ofPittel[10] levels of showed recursion that required when Vazfilla Quicksort the permuation is chosen as the pivot) is applied to a random log is the natural logarithm). First,H,~/log Pittel[10] n --~ 3'showed almost that surely as (i.e.n --+ the c~ versionfor some of positiveQuicksort in which the first element in permutation of 1, ..., n. H,~/log n --~ 3' almost surely as constantn --+ c~ for7. Thissome constantpositive was knownthe permuation not to exceed is chosen c~ [11], as the pivot) is applied to a random Our observation also allows us to construct Tn from the top constant 7. This constant was knownand Devroye[3] not to exceed showed c~ [11], that "7 = permutationa, as a consequence of 1, ..., n.of the down. To ease our exposition, we think of T,~ as a labelling and Devroye[3] showed that "7 = a,fact as that a consequence E(Hn) ~ c~logn.of the Robson[12]Our observation has found also that allows Hn us to construct Tn from the top of a subtree of T~, the complete infinite binary tree. fact that E(Hn) ~ c~logn. Robson[12]does not has vary found much that from Hn experimentdown. Toto easeexperiment, our exposition, and we think of T,~ as a labelling We will expose the key associated with each node t of T~. does not vary much from experimentseems to haveexperiment, a fixed rangeand of widthof a notsubtree depending of T~, uponthe complete n. infinite binary tree. To underscore the relationship with Quicksort, we refer to seems to have a fixed range of widthDevroye not depending and Reed[5] upon proved n. that WeVar(Hn) will expose = O((log the log key n)2), associated with each node t of T~. the key at t as the pivot at t. Suppose then that we have Devroye and Reed[5] proved that Var(Hn)but this does= O((log not quitelog n)2), confirm Robson'sTo underscore findings. the Itrelationship is the with Quicksort, we refer to exposed the pivots for some of the nodes forming a subtree but this does not quite confirm Robson's findings. It is the the key at t as the pivot at t. Suppose then that we have exposed the pivots for some of theof nodesToo, rooted forming at athe subtree root of T~. Suppose further that for of Too, rooted at the root of T~.some Suppose node furthert of T~¢, that all forof the ancestors of t are in T,~ and Permission to make digital or hard copies of all or part of this work for we have chosen their pivots. Then, these choices determine personal or classroom use is granted withoutsome fee node provided t of that T~¢, copies all of the ancestors of t are in T,~ and Permission to make digital or hard copies of all or part of this work for the set of keys Kt which will appear at the (possibly empty) are not made or distributed for profit or wecommercial have chosen advantage their and pivots. that Then, these choices determine personal or classroom use is granted without fee provided that copies copies bear this notice and the full citationthe on set the of first keys page. Kt To which copy will appearsubtree at the of (possiblyT,~ rooted empty) at t, but will have no effect on the order are not made or distributed for profit or commercial advantage and that otherwise, to republish, to post on servers or to redistribute to lists, in which we expect the keys in Kt to appear. Indeed each copies bear this notice and the full citation on the first page. To copy subtree of T,~ rooted at t, but will have no effect on the order requires prior specific permissionand/or in a fee.which we expect the keys in Ktpermutation to appear. ofIndeed Kt is equallyeach likely. Thus, each of the keys otherwise, to republish, to post on servers or to redistribute to lists, in Kt will be equally likely to be the pivot. We let nt be requires prior specific permissionand/or aSTOC fee. 2000 Portland Oregon USApermutation of Kt is equally likely. Thus, each of the keys Copyright ACM 2000 1-58113-184-4/00/5...$5.00 the number of keys in this set and specify the pivot at t by STOC 2000 Portland Oregon USA in Kt will be equally likely to be the pivot. We let nt be Copyright ACM 2000 1-58113-184-4/00/5...$5.00 the number of keys in this set and specify the pivot at t by

479 479 ST implementations: summary

guarantee average case operations implementation on keys search insert search hit insert

sequential search equals() (unordered list) N N ½ N N

binary search compareTo() (ordered array) lg N N lg N ½ N

BST N N 1.39 lg N 1.39 lg N compareTo()

Why not shuffle to ensure a (probabilistic) guarantee of 4.311 ln N?

16 3.2 BINARY SEARCH TREES

‣ BSTs ‣ ordered operations Algorithms ‣ deletion

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Minimum and maximum

Minimum. Smallest key in table. Maximum. Largest key in table.

S max()max min() min E X A R C H M

Examples of BST order queries

Q. How to find the min / max?

18 Floor and ceiling

Floor. Largest key ≤ a given key. Ceiling. Smallest key ≥ a given key.

floor(G) S max() min() E X A R C H ceiling(Q) M

floor(D) Examples of BST order queries

Q. How to find the floor / ceiling?

19 Computing the floor

Case 1. [k equals the key in the node] fnding floor(G) S The floor of k is k. E X A R C H G is less than S so Case 2. [k is less than the key in the node] M floor(G) must be on the left The floor of k is in the left subtree. S E X A R Case 3. [k is greater than the key in the node] C H The floor of is in the right subtree M k G is greater than E so (if there is any key in right subtree); floor(G) could be ≤ k on the right S otherwise it is the key in the node. E X A R C H M floor(G)in left subtree is null S E X A result R C H M 20 Computing the foor function Computing the floor

fnding floor(G) S public Key floor(Key key) E X { A R Node x = floor(root, key); C H G is less than S so if (x == null) return null; M floor(G) must be return x.key; on the left } S private Node floor(Node x, Key key) E X { A R if (x == null) return null; C H int cmp = key.compareTo(x.key); M G is greater than E so floor(G) could be if (cmp == 0) return x; on the right S E X if (cmp < 0) return floor(x.left, key); A R C H Node t = floor(x.right, key); M if (t != null) return t; floor(G)in left else return x; subtree is null S } E X A result R C H M 21 Computing the foor function Rank and select

Q. How to implement rank() and select() efficiently?

A. In each node, we store the number of nodes in the subtree rooted at that node; to implement size(), return the count at the root.

node count 8 6 S 1 2 E 3 X

A 1 2 R C H 1 M

22 BST implementation: subtree counts

private class Node public int size() { { return size(root); } private Key key; private Value val; private int size(Node x) private Node left; { private Node right; if (x == null) return 0; private int count; return x.count; ok to call } } when x is null

number of nodes in subtree

private Node put(Node x, Key key, Value val) initialize subtree { count to 1 if (x == null) return new Node(key, val, 1); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; x.count = 1 + size(x.left) + size(x.right); return x; } 23 Rank

Rank. How many keys < k ? node count 8 6 S 1 Easy recursive algorithm (3 cases!) 2 E 3 X

A 1 2 R C H 1 M

public int rank(Key key) { return rank(key, root); }

private int rank(Key key, Node x) { if (x == null) return 0; int cmp = key.compareTo(x.key); if (cmp < 0) return rank(key, x.left); else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right); else if (cmp == 0) return size(x.left); }

24 Inorder traversal

・Traverse left subtree. ・Enqueue key. ・Traverse right subtree.

public Iterable keys()

{ BST Queue q = new Queue(); inorder(root, q); key val return q; } left right

private void inorder(Node x, Queue q) BST with larger keys { BST with smaller keys if (x == null) return; smaller keys, in order key larger keys, in order inorder(x.left, q); q.enqueue(x.key); all keys, in order inorder(x.right, q); }

Property. Inorder traversal of a BST yields keys in ascending order.

25 BST: ordered symbol table operations summary

sequential binary BST search search

search N lg N h

insert N N h h = height of BST (proportional to log N min / max N 1 h if keys inserted in random order)

floor / ceiling N lg N h

rank N lg N h

select N 1 h

ordered iteration N log N N N

order of growth of running time of ordered symbol table operations

26 3.2 BINARY SEARCH TREES

‣ BSTs ‣ ordered operations Algorithms ‣ deletion

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ST implementations: summary

guarantee average case ordered operations implementation ops? on keys search insert delete search hit insert delete sequential search N N N ½ N N ½ N equals() ()

binary search lg N N N lg N ½ N ½ N ✔ compareTo() (ordered array)

BST N N N 1.39 lg N 1.39 lg N ? ? ? ✔ compareTo()

Next. Deletion in BSTs.

28 BST deletion: lazy approach

To remove a node with a given key: ・Set its value to null. ・Leave key in tree to guide search (but don't consider it equal in search).

E E

A S A S

delete I C I C ☠ tombstone

H R H R

N N

Cost. ~ 2 ln N' per insert, search, and delete (if keys in random order), where N' is the number of key-value pairs ever inserted in the BST.

Unsatisfactory solution. Tombstone (memory) overload.

29 Deleting the minimum

To delete the minimum key:

Go left until finding a node with a null left link. go left until S ・ reaching null Replace that node by its right link. left link E X ・ A R ・Update subtree counts. C H M

return that S node’s right link E X A R C H public void deleteMin() M { root = deleteMin(root); } available for garbage collection private Node deleteMin(Node x) update links and node counts after recursive calls { 7 S if (x.left == null) return x.right; 5 X x.left = deleteMin(x.left); E C R x.count = 1 + size(x.left) + size(x.right); H return x; M } Deleting the minimum in a BST

30 Hibbard deletion

To delete a node with key k: search for node t containing key k.

Case 0. [0 children] Delete t by setting parent link to null.

deleting C update counts after recursive calls 7 S S S 5 E X E X E X 1 A R A R A R H C H H M M replace with M node to delete null link available for garbage collection C

31 Hibbard deletion

To delete a node with key k: search for node t containing key k.

Case 1. [1 child] Delete t by replacing parent link.

deleting R update counts after recursive calls 7 S S 5 S E X E X E X A R A H A H C H C M C M node to delete M replace with child link available for garbage collection R

32 Hibbard deletion deleting E node to delete S To delete a node with key k: search for node tE containingX key k. A R H C search for key E Case 2. [2 children] M x has no left child x t t ・Find successor of . S ・Delete the minimum in t's right subtree. E butX don't garbage collect x A x R still a BST ・Put x in t's spot. C H successor deleting E min(t.right) M go right, then node to delete go left until S reaching null E X left link S A R x E X C H deleteMin(t.right) search for key E t.left H M A R

t C M S E X 7 S A x R 5 H X C H successor min(t.right) A R M go right, then C M update links and go left until node counts after reaching null recursive calls left link S Deletion in a BST x E X 33 deleteMin(t.right) t.left H A R C M

7 5 S H X A R C M update links and node counts after recursive calls

Deletion in a BST Hibbard deletion: Java implementation

public void delete(Key key) { root = delete(root, key); }

private Node delete(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key);

if (cmp < 0) x.left = delete(x.left, key); search for key else if (cmp > 0) x.right = delete(x.right, key); else { if (x.right == null) return x.left; no right child if (x.left == null) return x.right; no left child

Node t = x; x = min(t.right); replace with x.right = deleteMin(t.right); successor x.left = t.left; } update subtree x.count = size(x.left) + size(x.right) + 1; counts return x; }

34 Hibbard deletion: analysis

Unsatisfactory solution. Not symmetric.

Surprising consequence. Trees not random (!) ⇒ √ N per op. Longstanding open problem. Simple and efficient delete for BSTs.

35 ST implementations: summary

guarantee average case ordered operations implementation ops? on keys search insert delete search hit insert delete sequential search N N N ½ N N ½ N equals() (linked list)

binary search lg N N N lg N ½ N ½ N ✔ compareTo() (ordered array)

BST N N N 1.39 lg N 1.39 lg N √ N ✔ compareTo()

other operations also become √N if deletions allowed

Next lecture. Guarantee logarithmic performance for all operations.

36