Algorithms: Quicksort

QuickSort is an algorithm that purports to sort an input array A[lo..hi] of n = hi − lo + 1 elements recursively and in place. Recall that when we wrote MergeSort, we placed all the code other than the recursive calls in a companion procedure, Merge. The most easily understood presentation of QuickSort uses the same approach, with the companion procedure typically named Partition.

Tony Hoare (Sir Charles Antony Richard Hoare) invented QuickSort in 1959 while he was an exchange student at Moscow State University, but he was only able to produce a good implementation once he could program it in ALGOL. QuickSort spread widely through its inclusion in Unix systems. The C standard library provides a sorting function, qsort, whose name pays homage to QuickSort, even though the C standard has never required qsort to be an implementation of QuickSort.

Pseudo-code for Hoare’s algorithm is

algorithm QuickSort(A, lo, hi)
    if lo < hi then
        p ← Partition(A, lo, hi)
        QuickSort(A, lo, p)
        QuickSort(A, p + 1, hi)
    fi
end

algorithm Partition(A, lo, hi)
    pivot ← A[lo]
    i ← lo
    j ← hi
    while (true) do
        while (A[i] < pivot) do
            i ← i + 1
        end while
        while (A[j] > pivot) do
            j ← j − 1
        end while
        if i ≥ j then return j fi
        swap(A[i], A[j])
        i ← i + 1
        j ← j − 1
    end while
end

Hoare’s partitioning method divides the array into two adjacent subarrays, using A[lo] as the pivot. After partitioning, it suffices to know that every element in the front subarray is less than or equal to every element in the back subarray to ensure that the recursive calls complete the sorting.

The correctness of Hoare’s partitioning code is not easy to verify because of some subtle points. One natural concern is whether the main loop might be infinite, so one must argue that the condition of the if statement will be satisfied after a finite number of steps (which returns from the procedure and hence breaks out of the loop). Toward that end, one argues that during every full pass i increases and j decreases. What about the inner loops?
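For readers who want to experiment, here is a minimal Python sketch that transcribes Hoare’s pseudo-code as literally as possible, using inclusive bounds lo..hi. The function names hoare_partition and quicksort are our own illustrative choices. Like the pseudo-code, the inner scans have no explicit bounds checks; whether they need any is exactly the question about the inner loops raised above.

```python
from typing import List


def hoare_partition(a: List[int], lo: int, hi: int) -> int:
    """Hoare's partition of a[lo..hi] (inclusive) around the pivot a[lo].

    Assuming lo < hi, returns j with lo <= j < hi such that every entry
    of a[lo..j] is <= every entry of a[j+1..hi].
    """
    pivot = a[lo]
    i, j = lo, hi
    while True:
        # No bounds checks, matching the pseudo-code: the scans are
        # stopped by entries >= pivot (for i) and <= pivot (for j).
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


def quicksort(a: List[int], lo: int, hi: int) -> None:
    """Sort a[lo..hi] in place using Hoare's partition."""
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort(a, lo, p)        # front part includes index p
        quicksort(a, p + 1, hi)


data = [37, 23, 61, 74, 85, 13, 41, 53, 71, 19]
quicksort(data, 0, len(data) - 1)
print(data)  # [13, 19, 23, 37, 41, 53, 61, 71, 74, 85]
```

Note that the first recursive call covers a[lo..p] rather than a[lo..p − 1]: Hoare’s partition does not necessarily leave the pivot at index p, so p cannot be excluded.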
Do we need to change their conditions to keep i from exceeding hi and potentially creating an infinite loop? Likewise, do we need to keep j from dropping below lo and possibly becoming a runaway loop? The answer is no in both cases, but this requires an argument. On the initial pass we know A[lo] < pivot is false (the two are equal), so i stops at lo. If we come back to this inner loop for a second pass, the pivot value will have been swapped into the old position of j, which is greater than i; since the scan stops at the first entry that is not less than the pivot, i cannot move past that position even after being incremented. The same reasoning applies on later passes, because each swap leaves an entry at least as large as the pivot at the old position of j. Can you supply the argument that j can never drop below lo? To complete the correctness argument, one uses induction on a loop invariant for the outer loop, such as “all entries in the subarray A[lo..i − 1] are less than or equal to all entries in the subarray A[j + 1..hi] at the start of the loop”.

We now give the implementation most often presented in introductory computer science texts, calling it QSort. QSort expects its partitioning method to return an index p at which the pivot sits in its correct final position, and it uses the recursive calls QSort(A, lo, p − 1) and QSort(A, p + 1, hi). The partitioning normally uses A[hi] as the pivot and rearranges the entries so that everything smaller than the pivot comes before it and everything larger comes after it (where do ties go? Do we even care?). Code for this partitioning method is

algorithm Partition(A, lo, hi)
    pivot ← A[hi]
    i ← lo − 1
    for j ← lo to hi − 1 do
        if A[j] < pivot then
            i ← i + 1
            swap(A[i], A[j])
        fi
    end for
    if A[hi] < A[i + 1] then
        swap(A[i + 1], A[hi])
    fi
    return i + 1
end

Most students find it much easier to prove this version correct, since it clearly has no infinite loops. It is tempting to think this version might be faster, but in fact it does about three times as many swaps on average as Hoare’s method of partitioning.
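Below is a minimal Python sketch of QSort with the partitioning method just given, which is commonly known as the Lomuto partition scheme. It keeps the conditional final swap from the pseudo-code; the names qsort and lomuto_partition are illustrative choices, not part of the original text.

```python
from typing import List


def lomuto_partition(a: List[int], lo: int, hi: int) -> int:
    """Partition a[lo..hi] (inclusive) around the pivot a[hi].

    Returns the pivot's final index p; afterwards
    a[lo..p-1] < a[p] <= a[p+1..hi].
    """
    pivot = a[hi]
    i = lo - 1                      # a[lo..i] holds entries known to be < pivot
    for j in range(lo, hi):         # j runs over lo .. hi-1
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Move the pivot into place; if a[i+1] already equals the pivot
    # (including the case i + 1 == hi), no swap is needed.
    if a[hi] < a[i + 1]:
        a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1


def qsort(a: List[int], lo: int, hi: int) -> None:
    """Sort a[lo..hi] in place; index p is final, so it is excluded."""
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        qsort(a, lo, p - 1)
        qsort(a, p + 1, hi)


data = [37, 23, 61, 74, 85, 13, 41, 53, 71, 19]
qsort(data, 0, len(data) - 1)
print(data)  # [13, 19, 23, 37, 41, 53, 61, 71, 74, 85]
```

Because this version excludes p from both recursive calls, each call strictly shrinks the problem. On already sorted input, however, the pivot A[hi] is always the maximum, so the recursion depth grows linearly, and Python’s default recursion limit makes the sketch unsuitable for large inputs of that kind.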
QuickSort has been a very popular sorting algorithm because it is fast on common data sets. Its average-case complexity is O(n log n), but its worst-case complexity is O(n²). So MergeSort and HeapSort both have better asymptotics, since their worst-case running time is also O(n log n). The slow cases for QuickSort arise when partitioning does not produce subarrays of roughly equal length.

People often code hybrid algorithms that start as MergeSort or QuickSort but switch to InsertionSort for short subarrays, where InsertionSort is faster. This can reduce the depth of the recursion stack and the number of recursive calls, saving overhead. Recall that the C standard library function qsort need not be an implementation of QuickSort. The GNU C library does not even promise that qsort will be an in-place algorithm. That flexibility in the standard allows the use of such hybrid algorithms, with the potential for better practical performance. To learn more about hybrid sorting algorithms, you might look up TimSort, a hybrid of MergeSort and InsertionSort written by Tim Peters in 2002 for Python. TimSort is also used in Java, GNU Octave, and on the Android platform.

As with other sorting algorithms, we suggest you trace both implementations on, say, A = [37, 23, 61, 74, 85, 13, 41, 53, 71, 19]. How would they behave if the 74 were another 23; that is, do ties matter? What if the 74 were another 19 or another 37, the initial pivot values?
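The hybrid idea described above can be sketched in Python as QuickSort with Hoare’s partition that hands any subarray shorter than a threshold to InsertionSort. The threshold CUTOFF = 10 and all function names are hypothetical choices for illustration; real libraries tune their thresholds empirically, and this is not how qsort or TimSort is actually implemented.

```python
from typing import List

CUTOFF = 10  # hypothetical threshold; real libraries tune this empirically


def insertion_sort(a: List[int], lo: int, hi: int) -> None:
    """Sort a[lo..hi] (inclusive) in place by straight insertion."""
    for k in range(lo + 1, hi + 1):
        x = a[k]
        m = k - 1
        while m >= lo and a[m] > x:   # shift larger entries one slot right
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = x


def partition(a: List[int], lo: int, hi: int) -> int:
    """Hoare's partition of a[lo..hi] around the pivot a[lo]."""
    pivot = a[lo]
    i, j = lo, hi
    while True:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


def hybrid_sort(a: List[int], lo: int, hi: int) -> None:
    """QuickSort that switches to insertion_sort below CUTOFF elements."""
    if hi - lo + 1 < CUTOFF:
        insertion_sort(a, lo, hi)   # also handles empty and 1-element ranges
        return
    p = partition(a, lo, hi)
    hybrid_sort(a, lo, p)
    hybrid_sort(a, p + 1, hi)


data = [37, 23, 61, 74, 85, 13, 41, 53, 71, 19]
hybrid_sort(data, 0, len(data) - 1)
print(data)  # [13, 19, 23, 37, 41, 53, 61, 71, 74, 85]
```

With ten elements and CUTOFF = 10, the call above partitions once and then sorts both pieces by insertion, so it exercises both paths. Every subarray shorter than CUTOFF is finished without further recursion, which is where the savings in recursive calls come from.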