Tutorial 2: Heapsort, Quicksort, Counting Sort, Radix Sort
Mayank Saksena
September 20, 2006

1 Heapsort

We review Heapsort and prove some loop invariants for it. For further information, see Chapter 6 of Introduction to Algorithms.

HEAPSORT(A)
1  BUILD-MAX-HEAP(A)
2  for i = length(A) downto 2
3      swap A[1] and A[i]
4      heap-size(A) = heap-size(A) - 1
5      MAX-HEAPIFY(A, 1)

BUILD-MAX-HEAP(A)
1  heap-size(A) = length(A)
2  for i = ⌊length(A)/2⌋ downto 1
3      MAX-HEAPIFY(A, i)

MAX-HEAPIFY(A, i)
1  l = LEFT(i)
2  r = RIGHT(i)
3  if l ≤ heap-size(A) and A[l] > A[i] then largest = l
4  else largest = i
5  if r ≤ heap-size(A) and A[r] > A[largest] then largest = r
6  if largest ≠ i
7      swap A[i] and A[largest]
8      MAX-HEAPIFY(A, largest)

Loop invariants

First, assume that MAX-HEAPIFY(A, i) is correct, i.e., that it makes the subtree with root i a max-heap. Under this assumption, we prove that BUILD-MAX-HEAP(A) is correct, i.e., that it makes A a max-heap.

We show: at the start of iteration i of the for-loop of BUILD-MAX-HEAP(A) (line 2), each of the nodes i+1, i+2, ..., n is the root of a max-heap.

Initialization  Initially, i = ⌊n/2⌋. Each node ⌊n/2⌋+1, ⌊n/2⌋+2, ..., n is a leaf, and therefore the root of a max-heap (of size 1).

Maintenance  Just before iteration i, by the loop invariant, the children of node i are both roots of max-heaps; then MAX-HEAPIFY(A, i) makes node i the root of a max-heap. Thus every node j ≥ i is the root of a max-heap, and the loop invariant is established again after i has been decremented.

Termination  When i = 0, all nodes j ≥ 1 are roots of max-heaps, in particular node 1, so A is a max-heap.

Now we argue that Heapsort is correct by proving the following loop invariant: at the start of each iteration of the for-loop (line 2), the subarray A[1..i] is a max-heap containing the i smallest elements of A, and the subarray A[i+1..n] contains the n-i largest elements of A, sorted.

Initialization  Initially, i = n. By line 1, A[1..n] is a max-heap containing the n smallest (i.e., all) elements of A, and the subarray A[n+1..n] is empty.

Maintenance  Just before some iteration, by the loop invariant, A[1..i] is a max-heap containing the i smallest elements of A, and the subarray A[i+1..n] contains the n-i largest elements of A, sorted. Swapping A[1] and A[i] moves the largest element of A[1..i] to position i; since it is no larger than any element of A[i+1..n], the subarray A[i..n] is sorted, and the elements of A[1..i-1] are all smaller than those of A[i..n]. After decrementing heap-size, the call to MAX-HEAPIFY(A, 1) makes A[1..i-1] a max-heap, restoring the loop invariant before the next iteration.

Termination  When i = 1, A[1..1] is a max-heap containing the least element of A, and the subarray A[2..n] contains the n-1 largest elements of A, sorted. In other words, A is sorted.

2 Quicksort

QUICKSORT(A, p, r)
1  if p < r
2      q = PARTITION(A, p, r)
3      QUICKSORT(A, p, q - 1)
4      QUICKSORT(A, q + 1, r)

PARTITION(A, p, r)
1  x = A[r]
2  i = p - 1
3  for j = p to r - 1
4      if A[j] ≤ x
5          i = i + 1
6          swap A[i] and A[j]
7  swap A[i + 1] and A[r]
8  return i + 1

PARTITION(A, p, r) rearranges A and outputs an index q such that A[p..q-1] ≤ A[q] < A[q+1..r]. The following property is a loop invariant (proof left as an exercise) for any index k:

1. If p ≤ k ≤ i, then A[k] ≤ x.
2. If i+1 ≤ k ≤ j-1, then A[k] > x.
3. If k = r, then A[k] = x.
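The pseudocode above maps almost line for line onto executable code. Below is a minimal Python sketch of the heap routines from Section 1 together with Quicksort and the Lomuto PARTITION above; it is an illustration, not part of the tutorial. It uses 0-based indexing (so LEFT(i) = 2i+1 and RIGHT(i) = 2i+2) and passes heap-size explicitly; the function names and the small test values are assumptions made here.

# Python sketch of Sections 1-2 (0-based indexing; heap-size passed explicitly).

def max_heapify(a, i, heap_size):
    """Sink a[i] until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at LEFT(i) and RIGHT(i) are already max-heaps.
    """
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    # Nodes len(a)//2 .. len(a)-1 are leaves, hence already max-heaps of size 1.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final slot
        max_heapify(a, 0, end)        # heap size shrinks by one

def partition(a, p, r):
    """Lomuto partition: returns q with a[p..q-1] <= a[q] < a[q+1..r]."""
    x = a[r]                          # pivot is the last element
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

if __name__ == "__main__":
    data = [5, 2, 9, 1, 7, 3]
    heapsort(data)
    assert data == [1, 2, 3, 5, 7, 9]
    data = [5, 2, 9, 1, 7, 3]
    quicksort(data)
    assert data == [1, 2, 3, 5, 7, 9]

Both routines sort in place; the asserts at the bottom are only a quick sanity check.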
Performance analysis

The choice of pivot element (x in PARTITION(A, p, r)) has a major impact on the performance of Quicksort. We analyze the effect of partitioning informally.

The worst possible partitioning would be if we always chose the largest (or smallest) element as pivot; then we get one recursive call of size n-1 and one of size 0. Since the partitioning is Θ(n), we then get the behaviour T(n) = T(n-1) + Θ(n). Expanding, we get T(n) = Θ(n) + Θ(n-1) + ... + Θ(1) = Θ(n(n+1)/2) = Θ(n^2).

The best partitioning occurs when at each recursive call the subproblems are halved, or: T(n) = 2T(n/2) + Θ(n). The Master theorem (a = 2, b = 2, log_b a = 1) gives T(n) = Θ(n lg n).

Now consider a case when the partitioning always produces a proportional split, e.g. T(n) = T(αn) + T((1-α)n) + Θ(n), for some constant 0.5 < α < 1. We can obtain a bound by e.g. drawing a recursion tree for T(n). Since α > 1-α, the longest path from the root to a leaf is n, αn, α^2 n, ..., α^k n = 1. Hence the height of the tree is log_{1/α} n. For each level there is a Θ(n) term. Additionally, if the tree were complete, it would have 2^{log_{1/α} n} = n^{log_{1/α} 2} leaves, each with constant cost. All in all we get T(n) = O(n lg n). As an exercise, prove this using the substitution method.

For further information about Quicksort, see Chapter 7 of Introduction to Algorithms.

3 Counting sort

Counting sort sorts n integers in [0, k] in time Θ(n + k). If k = O(n), it is therefore Θ(n). Array B is for output, and C[0..k] is for storage.

COUNTING-SORT(A, B, k)
1  for i = 0 to k
2      C[i] = 0
3  for i = 1 to length(A)
4      C[A[i]] = C[A[i]] + 1
5  for i = 1 to k
6      C[i] = C[i] + C[i-1]
7  for i = length(A) downto 1
8      B[C[A[i]]] = A[i]
9      C[A[i]] = C[A[i]] - 1

4 Radix sort

Radix sort assumes that each element of A has d digits, numbered 1 to d, where 1 is the lowest-order digit.

RADIX-SORT(A, d)
1  for i = 1 to d
2      use a stable sort to sort A on digit i

Definition  A sorting algorithm is stable if elements with the same value appear in the same order in the output as in the input.

Counting sort is stable. Using counting sort, radix sort is Θ(d(n + k)) when sorting n d-digit numbers where each digit can take k possible values. (A Python sketch of both sorts appears at the end of these notes.)

5 Recommended problems

Recommended problems from Introduction to Algorithms. I go through some...

• Heapsort time bounds: exercise 6.4-3 on p. 136
• Prove that T(n) = T(αn) + T((1-α)n) + Θ(n) = O(n lg n), where 0.5 < α < 1, using e.g. the substitution method
• Quicksort: exercises 7.1-1, 2, 3 p. 148, 7.2-1, 3, 4, 5 p. 153, 7.4-2 p. 159
• Linear time sorting: exercises 8.2-2, 3, 4 p. 170, 8.3-4 p. 173, problem 8-3 p. 179
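For completeness, here is a minimal Python sketch of counting sort and LSD radix sort from Sections 3 and 4, under the same assumptions as the pseudocode (n integers in [0, k], digits processed from least to most significant). The key parameter, the choice of base 10, and the example values are assumptions made for illustration; unlike the pseudocode, counting_sort returns a fresh output list rather than filling a caller-supplied array B.

# Python sketch of Sections 3-4 (counting sort as the stable pass inside radix sort).

def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort of a by key(x) in {0, ..., k}; runs in Theta(n + k)."""
    count = [0] * (k + 1)
    for x in a:                      # count occurrences of each key value
        count[key(x)] += 1
    for i in range(1, k + 1):        # prefix sums: count[i] = #elements with key <= i
        count[i] += count[i - 1]
    out = [None] * len(a)
    for x in reversed(a):            # backwards scan keeps equal keys in input order
        count[key(x)] -= 1
        out[count[key(x)]] = x
    return out

def radix_sort(a, d, base=10):
    """Sort non-negative integers with at most d base-`base` digits."""
    for i in range(d):               # i = 0 is the lowest-order digit (digit 1 in the notes)
        a = counting_sort(a, base - 1, key=lambda x: (x // base**i) % base)
    return a

if __name__ == "__main__":
    print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
    # [329, 355, 436, 457, 657, 720, 839]

Stability of the counting-sort pass is what makes the digit-by-digit loop correct, which is why the output array is filled by a backwards scan.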