CS221: Algorithms and Data Structures
Lecture #4: Sorting Things Out
Steve Wolfman, 2014W1

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting


Categorizing Sorting Algorithms
• Computational complexity
  – Average case behaviour: Why do we care?
  – Worst/best case behaviour: Why do we care? How often do we re-sort sorted, reverse sorted, or “almost” sorted (k swaps from sorted where k << n) lists?
• Stability: What happens to elements with identical keys?
• Memory Usage: How much extra memory is used?

Comparing our “PQSort” Algorithms
• Computational complexity
  – Selection Sort: Always makes n passes with a “triangular” shape. Best/worst/average case Θ(n²)
  – Insertion Sort: Always makes n passes, but if we’re lucky and search for the maximum from the right, only constant work is needed on each pass. Best case Θ(n); worst/average case Θ(n²)
  – Heap Sort: Always makes n passes needing O(lg n) on each pass. Best/worst/average case Θ(n lg n).

Note: best cases assume distinct elements. With identical elements, Heap Sort can get Θ(n) performance.

Comparing our “PQSort” Algorithms
• Stability
  – Selection: Easily made stable (when building from the left, prefer the left-most of identical “biggest” keys).
  – Insertion: Easily made stable (when building from the left, find the rightmost slot for a new element).
  – Heap: Unstable
• Memory use: All three are essentially “in-place” algorithms with small O(1) extra space requirements.
• Cache access: Not detailed in 221, but… algorithms that don’t “jump around” tend to perform better in modern memory systems. Which of these “jumps around”?

But note: there’s a trick to make any sort stable.

Insertion Sort Best Case
[Figure: successive passes of insertion sort over the already-sorted array 1–14; on each pass the sorted prefix acts as the “PQ” and the new element lands in place immediately.]
If we search from the right: constant time per pass!

Comparison of growth...
[Figure: growth curves for n lg n, n², and n, with T(n)=100 and n=100 marked.]
Reminder: asymptotic differences matter, but a factor of lg n doesn’t matter as much as a factor of n.

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

MergeSort
Mergesort belongs to a class of algorithms known as “divide and conquer” algorithms (your recursion sense should be tingling here...). The problem space is continually split in half, recursively applying the algorithm to each half until the base case is reached.

MergeSort
1. If the array has 0 or 1 elements, it’s sorted. Else…
2. Split the array into two halves
3. Sort each half recursively (i.e., using mergesort)
4. Merge the sorted halves to produce one sorted result:
   1. Consider the two halves to be queues.
   2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.

MergeSort Performance Analysis
1. If the array has 0 or 1 elements, it’s sorted. Else…
2. Split the array into two halves
3. Sort each half recursively (i.e., using mergesort)
4. Merge the sorted halves to produce one sorted result:
   1. Consider the two halves to be queues.
   2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.

MergeSort Performance Analysis
T(1) = 1
T(n) = 2T(n/2) + n
     = 4T(n/4) + 2(n/2) + n
     = 8T(n/8) + 4(n/4) + 2(n/2) + n
     = 8T(n/8) + n + n + n = 8T(n/8) + 3n
     = 2^i T(n/2^i) + in
Let i = lg n:
T(n) = nT(1) + n lg n = n + n lg n ∈ Θ(n lg n)

We ignored floors/ceilings. To prove performance formally, we’d use this as a guess and prove it with floors/ceilings by induction.

Try it on this array!

3 -4 3 5 9 1 2 6

Split:          3 -4 3 5   |   9 1 2 6
Split again:    3 -4 | 3 5   and   9 1 | 2 6
Sorted pairs:   -4 3 | 3 5   and   1 9 | 2 6
Merged halves:  -4 3 3 5   and   1 2 6 9
Final merge:    -4 1 2 3 3 5 6 9

Where does the red 3 go? (The two 3s start in different positions; whether the left one stays ahead of the right one is exactly the stability question.)

Mergesort (by Jon Bentley):

void msort(int x[], int lo, int hi, int tmp[]) {
  if (lo >= hi) return;
  int mid = (lo+hi)/2;
  msort(x, lo, mid, tmp);
  msort(x, mid+1, hi, tmp);
  merge(x, lo, mid, hi, tmp);
}

void mergesort(int x[], int n) {
  int *tmp = new int[n];
  msort(x, 0, n-1, tmp);
  delete[] tmp;
}

Merge (by Jon Bentley):

void merge(int x[], int lo, int mid, int hi, int tmp[]) {
  int a = lo, b = mid+1;
  for( int k = lo; k <= hi; k++ ) {
    if( a <= mid && (b > hi || x[a] < x[b]) )
      tmp[k] = x[a++];
    else
      tmp[k] = x[b++];
  }
  for( int k = lo; k <= hi; k++ )
    x[k] = tmp[k];
}

Elegant in one sense… but not how I’d write it.

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

QuickSort
In practice, one of the fastest sorting algorithms is Quicksort, developed in 1961 by C.A.R. Hoare.
Comparison-based: examines elements by comparing them to other elements.
Divide-and-conquer: divides into “halves” (that may be very unequal) and recursively sorts.

QuickSort algorithm
• Pick a pivot
• Reorder the list such that all elements < pivot are on the left, while all elements ≥ pivot are on the right
• Recursively sort each side

Are we missing a base case?

Partitioning
• The act of splitting up an array according to the pivot is called partitioning
• Consider the following:

  -4 1 -3 2      3      5 4 7
  left partition | pivot | right partition

QuickSort (by Jon Bentley):

QuickSort Visually void qsort(int x[], int lo, int hi) { int i, p; if (lo >= hi) return; p = lo; P for( i=lo+1; i <= hi; i++ ) if( x[i] < x[lo] ) swap(x[++p], x[i]); swap(x[lo], x[p]); P P P qsort(x, lo, p-1); qsort(x, p+1, hi); } P P P P

void quicksort(int x[], int n) { Sorted! qsort(x, 0, n-1); } 21 22 Elegant in one sense… but not how I’d write it.

QuickSort Example (using intuitive algorithm, not Bentley’s)
(Pick first element as pivot, “scoot” elements to left/right.)

2 -4 6 1 5 -3 3 7

QuickSort: Complexity
• Recall that Quicksort is comparison based
  – Thus, the operations are comparisons
• In our partitioning task, we compared each element to the pivot
  – Thus, each partitioning pass makes about N comparisons
  – As with MergeSort, if one of the partitions is about half (or any constant fraction of) the size of the array, complexity is Θ(n lg n).
• In the worst case, however, we end up with a partition with a 1 and n-1 split

QuickSort Visually: Worst case
[Figure: every partition peels off a single element, leaving one long partition at each level.]

QuickSort: Worst Case
• In the overall worst-case, this happens at every step…
  – Thus we have N comparisons in the first step
  – N-1 comparisons in the second step
  – N-2 comparisons in the third step
  – :
  n + (n-1) + (n-2) + … + 2 + 1 = n(n+1)/2 = n²/2 + n/2
  – …or approximately n²

QuickSort: Average Case (Intuition)
• Clearly pivot choice is important
  – It has a direct impact on the performance of the sort
  – Hence, QuickSort is fragile, or at least “attackable”
• So how do we pick a good pivot?

QuickSort: Average Case (Intuition)
• Let’s assume that pivot choice is random
  – Half the time the pivot will be from the centre half of the array
  – Thus at worst the split will be n/4 and 3n/4

QuickSort: Average Case (Intuition)
• We can apply this to the notion of a good split
  – Every “good” split: 2 partitions of size n/4 and 3n/4
    • Or divides N by 4/3
  – Hence, we make up to log4/3(N) splits
• Expected # of partitions is at most 2 * log4/3(N)
  – O(lg N)
• Given N comparisons at each partitioning step, we have Θ(N lg N)

Quicksort Complexity: How does it compare?

  N        Insertion Sort        Quicksort
  10,000   4.1777 sec            0.05 sec
  20,000   20.52 sec             0.11 sec
  300,000  4666 sec (1.25 hrs)   2.15 sec

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?
Complexity
– Best case: Insert < Quick, Merge, Heap < Select
– Average case: Quick, Merge, Heap < Insert, Select
– Worst case: Merge, Heap < Quick, Insert, Select
– Usually on “real” data: Quick < Merge < Heap < I/S (not asymptotic)
– On very short lists: quadratic sorts may have an advantage (so, some quick/merge implementations “bottom out” to these as base cases)

Some details depend on implementation! (E.g., an initial check whether the last elt of the left sublist is less than first of the right can make merge’s best case linear.)

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?
Stability
– Easily Made Stable: Insert, Select, Merge (prefer the “left” of the two sorted sublists on ties)
– Unstable: Heap
– Challenging to Make Stable: Quick
• Memory use:
– Insert, Select, Heap < Quick < Merge

How much stack space does recursive QuickSort use? In the worst case? Could we make it better?

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Complexity of Sorting Using Comparisons as a Problem
Each comparison is a “choice point” in the algorithm. You can do one thing if the comparison is true and another if false. So, the whole algorithm is like a binary tree…

[Figure: decision tree with root “x < y”, children “a < b” and “a < d”, whose children are either leaves (“sorted!”) or further comparisons like “c < d” and “z < c”, and so on.]

Complexity of Sorting Using Comparisons as a Problem
The algorithm spits out a (possibly different) sorted list at each leaf. What’s the maximum number of leaves?

Complexity of Sorting Using Comparisons as a Problem
There are n! possible permutations of a sorted list (i.e., input orders for a given set of input elements). How deep must the tree be to distinguish those input orderings?

Complexity of Sorting Using Comparisons as a Problem
If the tree is not at least lg(n!) deep, then there’s some pair of orderings I could feed the algorithm which the algorithm does not distinguish. So, it must not successfully sort one of those two orderings.

[Figure: the same decision tree as before.]

Complexity of Sorting Using Comparisons as a Problem
QED: The complexity of sorting using comparisons is Ω(n lg n) in the worst case, regardless of algorithm!

In general, we can lower-bound but not upper-bound the complexity of problems.
(Why not? Because I can give as crappy an algorithm as I please to solve any problem.)

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

To Do

• Read: Epp Section 9.5 and KW Section 10.1, 10.4, and 10.7-10.10

