Introduction to Algorithms
CSE 680
Prof. Roger Crawfis

Sorting Review

- Insertion Sort
  - T(n) = Θ(n²)
  - In-place
- Merge Sort
  - T(n) = Θ(n lg n)
  - Not in-place
- (from homework)
  - T(n) = Θ(n²)
  - In-place
- Heap Sort
  - T(n) = Θ(n lg n)
  - In-place

Seems pretty good. Can we do better?

Sorting

- Assumptions:
  1. No knowledge of the keys or numbers we are sorting on.
  2. Each key supports a comparison interface or operator.
  3. Sorting entire records, as opposed to numbers, is an implementation detail.
  4. Each key is unique (just for convenience).

Comparison Sorting

- Given a set of n values, there can be n! permutations of these values.
- So if we look at the behavior of the algorithm over all possible n! inputs, we can determine the worst-case complexity of the algorithm.

Comparison Sorting: Decision Tree

- Decision tree model
  - Full binary tree: a full binary tree (sometimes called a proper binary tree or 2-tree) is a tree in which every node other than the leaves has two children.
  - An internal node represents a comparison.
    - Ignore control, data movement, and all other operations; count only comparisons.
  - Each leaf represents one possible result (a permutation of the elements in sorted order).
  - The height of the tree (i.e., the longest path) is the lower bound.

Decision Tree Model

- Suppose three elements <a1, a2, a3> with instance <6, 8, 5>.
- An internal node i:j indicates a comparison between ai and aj.
- A leaf node <π(1), π(2), π(3)> indicates the ordering aπ(1) ≤ aπ(2) ≤ aπ(3).
- The path of bold lines indicates the sorting path for <6, 8, 5>.
- There are in total 3! = 6 possible permutations (paths).

[Figure: decision tree for three elements, with root comparison 1:2, internal comparisons 2:3 and 1:3, and leaves <1,2,3>, <1,3,2>, <3,1,2>, <2,1,3>, <2,3,1>, <3,2,1>.]
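The figure above can be read directly as code. The sketch below is illustrative only (not from the slides; the method name sortThree is an assumption): it sorts three values using explicit comparisons, and each if/else branch corresponds to one root-to-leaf path of the decision tree.

// Sort three values with explicit comparisons; each branch is one decision-tree path.
static int[] sortThree(int a1, int a2, int a3) {
    if (a1 <= a2) {                                        // node 1:2
        if (a2 <= a3)      return new int[]{a1, a2, a3};   // leaf <1,2,3>
        else if (a1 <= a3) return new int[]{a1, a3, a2};   // node 1:3, leaf <1,3,2>
        else               return new int[]{a3, a1, a2};   // leaf <3,1,2>
    } else {
        if (a1 <= a3)      return new int[]{a2, a1, a3};   // node 1:3, leaf <2,1,3>
        else if (a2 <= a3) return new int[]{a2, a3, a1};   // node 2:3, leaf <2,3,1>
        else               return new int[]{a3, a2, a1};   // leaf <3,2,1>
    }
}

For the instance <6, 8, 5> the comparisons follow the path 1:2, 2:3, 1:3 and end at leaf <3,1,2>, returning {5, 6, 8}, matching the bold path in the figure.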

Decision Tree Model

- The longest path is the worst-case number of comparisons. The length of the longest path is the height of the decision tree.
- Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
- Proof:
  - Suppose the height of a decision tree is h, and the number of paths (i.e., permutations) is n!.
  - Since a binary tree of height h has at most 2^h leaves, n! ≤ 2^h, so h ≥ lg(n!) = Ω(n lg n) (by equation 3.18; a sketch of this last bound follows the QuickSort Design slide below).
- That is to say: any comparison sort in the worst case needs at least n lg n comparisons.

QuickSort Design

- Follows the divide-and-conquer paradigm.
- Divide: partition (separate) the array A[p..r] into two (possibly empty) subarrays A[p..q-1] and A[q+1..r].
  - Each element in A[p..q-1] < A[q].
  - A[q] < each element in A[q+1..r].
  - Index q is computed as part of the partitioning procedure.
- Conquer: sort the two subarrays by recursive calls to quicksort.
- Combine: the subarrays are sorted in place; no work is needed to combine them.
- How do the divide and combine steps of quicksort compare with those of merge sort?
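The step lg(n!) = Ω(n lg n) used in the proof above is a standard bound; one quick way to see it (a sketch, not taken from the slide):

\[
\lg(n!) \;=\; \sum_{k=1}^{n} \lg k \;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \lg k \;\ge\; \frac{n}{2}\,\lg\frac{n}{2} \;=\; \Omega(n \lg n).
\]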

Pseudocode

Quicksort(A, p, r)
  if p < r then
    q := Partition(A, p, r);
    Quicksort(A, p, q - 1);
    Quicksort(A, q + 1, r)

Example

initially: 2 5 8 3 9 4 1 7 10 6    note: pivot (x) = 6
(p and r mark the ends of the subarray being partitioned)

Example (Continued)

next iteration:   2 5 3 8 9 4 1 7 10 6
next iteration:   2 5 3 8 9 4 1 7 10 6
next iteration:   2 5 3 4 9 8 1 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
after final swap: 2 5 3 4 1 6 9 7 10 8

Partitioning

- Select the last element A[r] in the subarray A[p..r] as the pivot, the element around which to partition.

  Partition(A, p, r)
    x, i := A[r], p - 1;
    for j := p to r - 1 do
      if A[j] ≤ x then
        i := i + 1;
        A[i] ↔ A[j]
    A[i + 1] ↔ A[r];
    return i + 1

- As the procedure executes, the array is partitioned into four (possibly empty) regions:
  1. A[p..i]: all entries in this region are ≤ pivot.
  2. A[i+1..j-1]: all entries in this region are > pivot.
  3. A[r] = pivot.
  4. A[j..r-1]: not known how they compare to the pivot.
- The above hold before each iteration of the for loop, and constitute a loop invariant. (Region 4 is not part of the loop invariant.)
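A minimal Java rendering of this Partition pseudocode (a sketch for reference, not the course's own code; the method name partitionLastPivot is an assumption, and it is distinct from the first-element-pivot Partition given in Java later in the deck):

// Last-element-pivot partition; maintains a[p..i] <= pivot, a[i+1..j-1] > pivot, a[r] = pivot.
static int partitionLastPivot(int[] a, int p, int r) {
    int x = a[r];                // pivot = last element
    int i = p - 1;               // a[p..i] is the "<= pivot" region
    for (int j = p; j <= r - 1; j++) {
        if (a[j] <= x) {         // a[j] belongs in the "<= pivot" region
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;       // A[i] <-> A[j]
        }
    }
    int t = a[i + 1]; a[i + 1] = a[r]; a[r] = t;       // A[i+1] <-> A[r]: move pivot into place
    return i + 1;                // final pivot index q
}

On the example subarray {2, 5, 8, 3, 9, 4, 1, 7, 10, 6} with p = 0 and r = 9, this returns 5 and leaves 2 5 3 4 1 6 9 7 10 8, matching the trace above.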

Correctness of Partition

- Use the loop invariant.
- Initialization:
  - Before the first iteration:
    - A[p..i] and A[i+1..j-1] are empty, so conditions 1 and 2 are satisfied (trivially).
    - r is the index of the pivot, so condition 3 is satisfied.
- Maintenance:
  - Case 1: A[j] > x
    - Increment j only.
    - The loop invariant is maintained.

[Figure: Case 1, A[j] > x. Only j advances, so the "> x" region A[i+1..j-1] grows by one; the "≤ x" region and the pivot x at A[r] are unchanged.]

Correctness of Partition

- Case 2: A[j] ≤ x
  - Increment i, then swap A[i] and A[j]: condition 1 is maintained.
  - Increment j: condition 2 is maintained.
  - A[r] is unaltered: condition 3 is maintained.

[Figure: Case 2, A[j] ≤ x. After the increment and swap, the "≤ x" region A[p..i] grows by one; the pivot x stays at A[r].]

- Termination:
  - When the loop terminates, j = r, so all elements in A are partitioned into one of the three cases:
    - A[p..i] ≤ pivot
    - A[i+1..j-1] > pivot
    - A[r] = pivot
- The last two lines swap A[i+1] and A[r].
  - The pivot moves from the end of the array to between the two subarrays.
- Thus, procedure Partition correctly performs the divide step.
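A small illustrative check of these termination conditions (an assumption-only sketch, not part of the slides): given the index q returned by Partition, the three regions can be verified directly.

// Returns true if a[p..q-1] <= a[q] and a[q+1..r] > a[q], i.e. the divide step succeeded.
static boolean isPartitioned(int[] a, int p, int q, int r) {
    for (int k = p; k < q; k++)        // A[p..q-1] <= pivot
        if (a[k] > a[q]) return false;
    for (int k = q + 1; k <= r; k++)   // A[q+1..r] > pivot
        if (a[k] <= a[q]) return false;
    return true;
}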

Complexity of Partition

- PartitionTime(n) is given by the number of iterations in the for loop.
- Θ(n), where n = r - p + 1.

Quicksort Overview

To sort a[left...right]:
1. if left < right:
   1.1. Partition a[left...right] such that:
        all a[left...p-1] are less than a[p], and
        all a[p+1...right] are >= a[p]
   1.2. Quicksort a[left...p-1]
   1.3. Quicksort a[p+1...right]
2. Terminate

Partitioning in Quicksort

- A key step in the Quicksort algorithm is partitioning the array.
- We choose some (any) number p in the array to use as a pivot.
- We partition the array into three parts:

  [ numbers less than p | p | numbers greater than or equal to p ]

Alternative Partitioning

- Choose an array value (say, the first) to use as the pivot.
- Starting from the left end, find the first element that is greater than or equal to the pivot.
- Searching backward from the right end, find the first element that is less than the pivot.
- Interchange (swap) these two elements.
- Repeat, searching from where we left off, until done.

To partition a[left...right]:
1. Set pivot = a[left], l = left + 1, r = right;
2. While l < r, do
   2.1. while l < right & a[l] < pivot, set l = l + 1
   2.2. while r > left & a[r] >= pivot, set r = r - 1
   2.3. if l < r, swap a[l] and a[r]
3. Set a[left] = a[r], a[r] = pivot
4. Terminate

Example of partitioning

choose pivot:    4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
search:          4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
swap:            4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
search:          4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
swap:            4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
search:          4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
swap:            4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
search:          4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
swap with pivot: 1 3 3 1 2 2 3 4 4 9 8 9 6 5 6

Partition Implementation (Java)

static int Partition(int[] a, int left, int right) {
    int p = a[left], l = left + 1, r = right;
    while (l < r) {
        while (l < right && a[l] < p) l++;
        while (r > left && a[r] >= p) r--;
        if (l < r) {
            int temp = a[l]; a[l] = a[r]; a[r] = temp;
        }
    }
    a[left] = a[r];
    a[r] = p;
    return r;
}

Quicksort Implementation (Java)

static void Quicksort(int[] array, int left, int right) {
    if (left < right) {
        int p = Partition(array, left, right);
        Quicksort(array, left, p - 1);
        Quicksort(array, p + 1, right);
    }
}
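A hypothetical driver (not from the slides) to exercise the two methods above on the earlier partitioning example array:

public static void main(String[] args) {
    int[] data = {4, 3, 6, 9, 2, 4, 3, 1, 2, 1, 8, 9, 3, 5, 6};
    Quicksort(data, 0, data.length - 1);
    System.out.println(java.util.Arrays.toString(data));
    // prints: [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 9]
}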

Analysis of quicksort—best case

- Suppose each partition operation divides the array almost exactly in half.

- Then the depth of the recursion is log₂ n.
  - Because that's how many times we can halve n.

Partitioning at various levels

- We note that:
  - Each partition is linear over its subarray.
  - All the partitions at one level cover the array.
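Together, these notes give the best-case bound: as a standard sketch (not spelled out on the slides), the cost is Θ(n) per level times Θ(lg n) levels,

\[
\Theta(n) \cdot \Theta(\lg n) \;=\; \Theta(n \lg n),
\]

which is the same as solving the balanced-case recurrence T(n) = 2T(n/2) + Θ(n).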

Best Case Analysis

- We cut the array size in half each time.
- So the depth of the recursion is log₂ n.
- At each level of the recursion, all the partitions at that level do work that is linear in n.
- O(log₂ n) * O(n) = O(n log₂ n)
- Hence in the best case, quicksort has O(n log₂ n).
- What about the worst case?

Worst case

- In the worst case, partitioning always divides the size-n array into these three parts:
  - a length-one part, containing the pivot itself,
  - a length-zero part, and
  - a length n-1 part, containing everything else.
- We don't recur on the zero-length part.
- Recurring on the length n-1 part requires (in the worst case) recurring to depth n-1.

Worst case partitioning

- In the worst case, recursion may be n levels deep (for an array of size n).
- But the partitioning work done at each level is still n.
- O(n) * O(n) = O(n²)
- So the worst case for Quicksort is O(n²).

Worst case for quicksort

- When does this happen?
  - There are many arrangements that could make this happen.
  - Here are two common cases:
    - When the array is already sorted
    - When the array is inversely sorted (sorted in the opposite order)
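The O(n) * O(n) count above can also be written out as a sum (a standard calculation, not shown on the slide):

\[
T(n) \;=\; T(n-1) + \Theta(n)
\quad\Longrightarrow\quad
T(n) \;=\; \Theta\!\left(\sum_{k=1}^{n} k\right) \;=\; \Theta\!\left(\frac{n(n+1)}{2}\right) \;=\; \Theta(n^2).
\]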

Typical case for quicksort

- If the array is sorted to begin with, Quicksort is terrible: O(n²).
- It is possible to construct other bad cases.
- However, Quicksort is usually O(n log₂ n).
- The constants are so good that Quicksort is generally the faster algorithm.
- Most real-world sorting is done by Quicksort.

Picking a better pivot

- Before, we picked the first element of the subarray to use as a pivot.
  - If the array is already sorted, this results in O(n²) behavior.
  - It's no better if we pick the last element.
- We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half.
  - Such a value is called a median: half of the values in the array are larger, half are smaller.
  - The easiest way to find the median is to sort the array and pick the value in the middle (!)

Median of three

- Obviously, it doesn't make sense to sort the array in order to find the median to use as a pivot.
- Instead, compare just three elements of our (sub)array: the first, the last, and the middle.
- Take the median (middle value) of these three as the pivot.
- It's possible (but not easy) to construct cases which will make this technique O(n²).

Quicksort for Small Arrays

- For very small arrays (N <= 20), quicksort does not perform as well as insertion sort.
- A good cutoff range is N = 10.
- Switching to insertion sort for small arrays can save about 15% in the running time.
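A minimal Java sketch combining these two ideas, median-of-three pivot selection plus an insertion-sort cutoff. Everything here is an illustrative assumption, not the course's code: the names hybridQuicksort, insertionSort, swap and the CUTOFF value are invented, and it reuses the first-element-pivot Partition method from the implementation slide above.

static final int CUTOFF = 10;     // small-subarray threshold (the slide suggests around 10)

static void hybridQuicksort(int[] a, int left, int right) {
    if (right - left + 1 <= CUTOFF) {         // small subarray: fall back to insertion sort
        insertionSort(a, left, right);
        return;
    }
    int mid = left + (right - left) / 2;
    // Order a[left], a[mid], a[right]; the median of the three ends up at a[mid].
    if (a[mid] < a[left])   swap(a, mid, left);
    if (a[right] < a[left]) swap(a, right, left);
    if (a[right] < a[mid])  swap(a, right, mid);
    swap(a, mid, left);                        // move the median to the front, then
    int p = Partition(a, left, right);         // reuse the first-element-pivot Partition
    hybridQuicksort(a, left, p - 1);
    hybridQuicksort(a, p + 1, right);
}

static void insertionSort(int[] a, int left, int right) {
    for (int i = left + 1; i <= right; i++) {
        int key = a[i], j = i - 1;
        while (j >= left && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}

static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

Moving the median of a[left], a[mid], a[right] into the first position lets the existing first-element-pivot Partition be reused unchanged.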

Mergesort vs Quicksort

- Both run in O(n lg n).
- Compared with Quicksort, Mergesort performs fewer comparisons but moves elements more often.
- In Java, an element comparison is expensive but moving elements is cheap. Therefore, Mergesort is used in the standard Java library for generic sorting.
- In C++, copying objects can be expensive while comparing objects is often relatively cheap. Therefore, quicksort is the sorting routine commonly used in C++ libraries.