
Mergesort and Quicksort Lecture 3: Efficient Sorts Two great sorting algorithms. Full scientific understanding of their properties has enabled us to hammer them into practical system sorts. Occupies a prominent place in world's computational infrastructure. Quicksort honored as one of top 10 algorithms for science and engineering of 20th century. Mergesort Quicksort Mergesort. Analysis of Algorithms Java Arrays sort for type Object. Java Collections sort. Perl stable, Python stable. Quicksort. Java Arrays sort for primitive types. C qsort, Unix, g++, Visual C++, Perl, Python. Princeton University • COS 226 • Algorithms and Data Structures • Spring 2004 • Kevin Wayne • http://www.Princeton.EDU/~cos226 2 Sorting Applications Estimating the Running Time Applications. Total running time is sum of cost P frequency for all of the basic ops. Sort a list of names. obvious applications Cost depends on machine, compiler. Organize an MP3 library. Frequency depends on algorithm, input. Display Google PageRank results. Find the median. Cost for sorting. Find the closest pair. A = # function calls. Binary search in a database. problems become easy once B = # exchanges. Identify statistical outliers. items are in sorted order Find duplicates in a mailing list. C = # comparisons. Cost on a typical machine = 35A + 11B + 4C. Data compression. Computer graphics. Frequency of sorting ops. Computational biology. Supply chain management. non-obvious applications N = # elements to sort. Simulate a system of particles. Selection sort: A = 1, B = N-1, C = N(N-1) / 2. Book recommendations on Amazon. Donald Knuth Load balancing on a parallel computer. 3 4 Estimating the Running Time Big Oh Notation An easier alternative. Big Theta, Oh, and Omega notation. (i) Analyze asymptotic growth as a function of input size N. 2 2 2 2 1.5 (ii) For medium N, run and measure time. (N ) means { N , 17N , N + 17N + 3N, . } (iii) For large N, use (i) and (ii) to predict time. – ignore lower order terms and leading coefficients 2 2 2 2 1.5 1.5 Asymptotic growth rates. O(N ) means { N , 17N , N + 17N + 3N, N , 100N, . } 2 Estimate as a function of input size N. – (N ) and smaller – N, N log N, N2, N3, 2N, N! – use for upper bounds Ignore lower order terms and leading coefficients. 3 2 3 2 2 2 2 1.5 3 5 – Ex. 6N + 17N + 56 is asymptotically proportional to N (N ) means { N , 17N , N + 17N + 3N, N , 100N , . } – (N2) and larger – use for lower bounds Never say: insertion sort makes at least O(N2) comparisons. 5 6 Estimating the Running Time Why It Matters Insertion sort is quadratic. 2 Run time in N / 4 - N / 4 comparisons on average. 1.3 N3 10 N2 47 N log N 48 N nanoseconds --> 2 2 (N ). 1000 1.3 seconds 10 msec 0.4 msec 0.048 msec On arizona: 1 second for N = 10,000. Time to 10,000 22 minutes 1 second 6 msec 0.48 msec solve a 100,000 15 days 1.7 minutes 78 msec 4.8 msec How long for N = 100,000? 100 seconds (100 times as long) problem N = 1 million? 2.78 hours (another factor of 100) of size million 41 years 2.8 hours 0.94 seconds 48 msec 6 10 million 41 millennia 1.7 weeks 11 seconds 0.48 seconds N = 1 billion? 317 years (another factor of 10 ) N = 1 trillion? second 920 10,000 1 million 21 million Max size problem minute 3,600 77,000 49 million 1.3 billion solved hour 14,000 600,000 2.4 trillion 76 trillion in one day 41,000 2.9 million 50 trillion 1,800 trillion N multiplied by 10, 1,000 100 10+ 10 time multiplied by Reference: More Programming Pearls by Jon Bentley 7 8 Orders of Magnitude Mergesort Mergesort (divide-and-conquer) Meters Per Imperial Seconds Equivalent Example Second Units Divide array into two halves. 1 1 second 10-10 1.2 in / decade Continental drift Recursively sort each half. 10 10 seconds 10-8 1 ft / year Hair growing 102 1.7 minutes Merge two halves to make sorted whole. 10-6 3.4 in / day Glacier 103 17 minutes 10-4 1.2 ft / hour Gastro-intestinal tract 104 2.8 hours 10-2 2 ft / minute Ant 105 1.1 days 1 2.2 mi / hour Human walk 106 1.6 weeks 102 220 mi / hour Propeller airplane 107 3.8 months 104 370 mi / min Space shuttle 108 3.1 years A L G O R I T H M S 106 620 mi / sec Earth in galactic orbit 109 3.1 decades 108 62,000 mi / sec 1/3 speed of light 1010 3.1 centuries A L G O R I T H M S divide . forever 210 thousand Powers age of 220 million sort 1017 of 2 A G L O R H I M S T universe 230 billion A G H I L M O R S T merge Reference: More Programming Pearls by Jon Bentley 9 12 Mergesort Implementation in Java Mergesort Analysis Stability? Yes, if underlying merge is stable. public static void mergesort(Comparable[] a, int low, int high) { Comparable temp[] = new Comparable[a.length]; for (int i = 0; i < a.length; i++) temp[i]=a[i]; How much memory does array implementation of mergesort require? mergesort(temp, a, low, high); } Original input = N. Auxiliary array for merging = N. private static void mergesort(Comparable[] from, Comparable[] to, int low, int high) { Local variables: constant. if (high <= low) return; Function call stack: log2 N. int mid =(low + high)/2; Total = 2N + O(log N). mergesort(to, from, low, mid); mergesort(to, from, mid+1, high); How much memory do other sorting algorithms require? int p = low, q = mid+1; for(int i = low; i <= high; i++) { N + O(1) for insertion sort, selection sort, bubble sort. if (q > high) to[i]=from[p++]; In-place = N + O(log N). else if (p > mid) to[i]=from[q++]; else if (less(from[q], from[p])) to[i]=from[q++]; else to[i]=from[p++]; } } 13 14 Mergesort Analysis Proof by Picture of Recursion Tree How long does mergesort take? $ 0 if N 1 ' Bottleneck = merging (and copying). T(N ) % 2T (N /2) N otherwise ' – merging two files of size N/2 requires ? N comparisons & sorting both halves merging T(N) = comparisons to mergesort N elements. – assume N is a power of 2 T(N) N – assume merging requires exactly N comparisons $ 0 if N 1 T(N/2) T(N/2) 2(N/2) ' T(N ) % 2T (N /2) N otherwise ' & sorting both halves merging T(N/4) T(N/4) T(N/4) T(N/4) 4(N/4) log2N . including already sorted T(N / 2k) 2k (N / 2k) Claim. T(N) = N log2 N. Note: same number of comparisons for ANY file. T(2) T(2) We'll give several proofs to illustrate standard techniques. T(2) T(2) T(2) T(2) T(2) T(2) N/2 (2) Nlog2N 15 16 Proof by Telescoping Mathematical Induction Claim. If T(N) satisfies this recurrence, then T(N) = N log2 N. Mathematical induction. Powerful and general proof technique in discrete mathematics. $ 0 if N 1 assumes N is a power of 2 To prove a theorem true for all integers k O 0: ' T(N ) % 2T(N /2) N otherwise – base case: prove it to be true for N = 0 ' merging & sorting both halves – induction hypothesis: assuming it is true for arbitrary N – induction step: show it is true for N + 1 T (N ) 2T (N /2) Proof. For N > 1: 1 N N Claim: 0 + 1 + 2 + 3 + . + N = N(N+1) / 2 for all N O 0. T (N /2) 1 Proof: (by mathematical induction) N /2 Base case (N = 0). T (N / 4) 1 1 – 0 = 0(0+1) / 2. N / 4 Induction hypothesis: assume 0 + 1 + 2 + . + N = N(N+1) / 2 T (N /N ) 1 1 Induction step: 0 + 1 + . + N + N + 1 = (0 + 1 + . + N) + N+1 N /N log2 N = N (N+1) /2 + N+1 log2 N = (N+2)(N+1) / 2 17 18 Proof by Induction Proof by Induction Claim. If T(N) satisfies this recurrence, then T(N) = N log2 N. Q. What if N is not a power of 2? Q. What if merging takes at most N comparisons instead of exactly N? $ 0 if N 1 assumes N is a power of 2 ' T(N ) % 2T (N /2) N otherwise A. T(N) satisfies following recurrence. ' & sorting both halves merging $ 0 if N 1 ' T(N ) ? % T !1N /2 T #3N /2 N otherwise ' & solve left half solve right half merging Proof. (by induction on N) Base case: N = 1. Inductive hypothesis: T(N) = N log2 N. Claim. T(N) ? N !log N1. 2 Goal: show that T(2N) = 2N log2 (2N). Proof. Challenge for the bored. T (2N ) 2T (N ) 2N 2N log2 N 2N 2N log2(2N ) 1 2N 2N log2(2N ) 19 20 Mergesort: Practical Improvements Sorting By Different Fields Eliminate recursion. Bottom-up mergesort. Sedgewick Program 8.5 Design challenge: enable sorting students by email or section. Stop if already sorted. // sort by email 1 Anand Dharan adharan Is biggest element in first half ? smallest element in second half? Student.setSortKey(Student.EMAIL); 1 Ashley Evans amevans 1 Alicia Myers amyers Helps for nearly ordered lists. ArraySort.mergesort(students, 0, N-1); 1 Arthur Shum ashum 1 Amy Trangsrud atrangsr // then by precept 1 Bryant Chen bryantc Insertion sort small files.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages11 Page
-
File Size-