Algorithms ROBERT SEDGEWICK | KEVIN WAYNE Two classic sorting algorithms: mergesort and quicksort
Critical components in the world’s computational infrastructure. ・Full scientific understanding of their properties has enabled us to develop them into practical system sorts. ・Quicksort honored as one of top 10 algorithms of 20th century 2.3 QUICKSORT in science and engineering.
‣ quicksort Mergesort. [last lecture] ‣ selection ... Algorithms ‣ duplicate keys FOURTH EDITION ‣ system sorts Quicksort. [this lecture] ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ...
2
Quicksort t-shirt
2.3 QUICKSORT
‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu
3 number). 9.9 X 10 45 is used to represent infinity. Imaginary conunent I and J are output variables, and A is the array (with Quicksort valuesTony of x may Hoarenot be negative and reM values of x may not be subscript bounds M:N) which is operated upon by this procedure. smaller than 1. Partition takes the value X of a random element of the array A, Values of Qd~'(x) may be calculated easily by hypergeometric and rearranges the values of the elements of the array in such a series if x is not too small nor (n - m) too large. Q~m(x) can be way that there exist integers I and J with the following properties : computed from an appropriate set of values of Pnm(X) if X is near M _-< J < I =< NprovidedM < N 1.0 or ix is near 0. Loss of significant digits occurs for x as small as A[R] =< XforM =< R _-< J Basic plan. 1.1 if n is largerInvented than 10. Loss of significant quicksort digits is a major diffi-to translateA[R] = RussianXforJ < R < I into English. culty・ in using finite polynomiM representations also if n is larger A[R] ~ Xfor I =< R ~ N than m. However, QLEG has been tested in regions of x and n The procedure uses an integer procedure random (M,N) which Shuffle the array. both large and[ butsmall; couldn't explain hischooses algorithmequiprobably a random integer or F implementbetween M and N, and it! ] ・ procedure・ QLEG(m, nmax, x, ri, R, Q); value In, nmax, x, ri; also a procedure exchange, which exchanges the values of its two real In, mnax, x, ri; real array R, Q; parameters ; begin real t, i, n, q0, s; begin real X; integer F; j n := 20; ・Partition so that, for some ・Learned Algol 60 (and recursion).F := random (M,N); X := A[F]; if nmax > 13 then I:=M; J:=N; n := nmax +7; up: for I : = I step 1 until N do – entry a[j] is in place if Implementedri = 0then quicksort. if X < A [I] then go to down; ・ begin ifm = 0then I:=N; Q[0] := 0.5 X 10g((x + 1)/(x - 1)) down: forJ := J step --1 until M do else if A[J]
847 Communications * This workOctober was 1978 supported in part by the Fannie and John Hertz Foundation, and of in part byVolume the National21 Science Foundation Grants No. GJ-28074 and MCS75-23738. the ACM 22 Acta Informatica,Number 10Vol. 7 Quicksort partitioning demo Quicksort partitioning demo
Repeat until i and j pointers cross. Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j]. ・Exchange a[i] with a[j].
When pointers cross. ・Exchange a[lo] with a[j].
K R A T E L E P U I M Q C X O S E C A I E K L P U T M Q R X O S
lo i j lo j hi
partitioned! 9 10
The music of quicksort partitioning (by Brad Lyon) Quicksort: Java code for partitioning
private static int partition(Comparable[] a, int lo, int hi) { int i = lo, j = hi+1; while (true) {
while (less(a[++i], a[lo])) find item on left to swap if (i == hi) break;
while (less(a[lo], a[--j])) find item on right to swap if (j == lo) break;
if (i >= j) break; check if pointers cross exch(a, i, j); swap }
exch(a, lo, j); swap with partitioning item before v https://googledrive.com/host/0B2GQktu-wcTicjRaRjV1NmRFN1U/index.html return j; return index of item now known to be in place } lo hi
before v during v Յ v Ն v
lo hi i j
after before v during v Յ v Ն v Յ v v Ն v
lo hi i j lo j hi
during v Յ v Ն v after Յ v v Ն v Quicksort partitioning overview 11 12 i j lo j hi
after Յ v v Ն v Quicksort partitioning overview
lo j hi
Quicksort partitioning overview Quicksort quiz 1 Quicksort: Java implementation
Q. How many compares (in the worst case) to partition an array of length N ?
public class Quick A. ~ ¼ N { private static int partition(Comparable[] a, int lo, int hi) B. ~ ½ N { /* see previous slide */ }
C. ~ N public static void sort(Comparable[] a) { StdRandom.shuffle(a); shuffle needed for D. ~ N lg N performance guarantee sort(a, 0, a.length - 1); (stay tuned) } E. I don't know. private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); } }
13 14
Quicksort trace Quicksort animation
50 random items
lo j hi 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 initial values Q U I C K S O R T E X A M P L E random shufe K R A T E L E P U I M Q C X O S 0 5 15 E C A I E K L P U T M Q R X O S 0 3 4 E C A E I K L P U T M Q R X O S 0 2 2 A C E E I K L P U T M Q R X O S 0 0 1 A C E E I K L P U T M Q R X O S 1 1 A C E E I K L P U T M Q R X O S 4 4 A C E E I K L P U T M Q R X O S 6 6 15 A C E E I K L P U T M Q R X O S no partition 7 9 15 A C E E I K L M O P T Q R X U S for subarrays of size 1 7 7 8 A C E E I K L M O P T Q R X U S 8 8 A C E E I K L M O P T Q R X U S 10 13 15 A C E E I K L M O P S Q R T U X 10 12 12 A C E E I K L M O P R Q S T U X 10 11 11 A C E E I K L M O P Q R S T U X 10 10 A C E E I K L M O P Q R S T U X 14 14 15 A C E E I K L M O P Q R S T U X 15 15 A C E E I K L M O P Q R S T U X algorithm position result A C E E I K L M O P Q R S T U X in order Quicksort trace (array contents after each partition) current subarray not in order http://www.sorting-algorithms.com/quick-sort
15 16 Quicksort: implementation details Quicksort: empirical analysis (1961)
Partitioning in-place. Using an extra array makes partitioning easier Running time estimates: (and stable), but is not worth the cost. ・Algol 60 implementation. ・National-Elliott 405 computer. Terminating the loop. Testing whether the pointers cross is trickier than it might seem.
Equal keys. When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key. stay tuned
Preserving randomness. Shuffling is needed for performance guarantee. Equivalent alternative. Pick a random partitioning item in each subarray.
Elliott 405 magnetic disc sorting N 6-word items with 1-word keys (16K words)
17 18
Quicksort: empirical analysis Quicksort: best-case analysis
Running time estimates: Best case. Number of compares is ~ N lg N. ・Home PC executes 108 compares/second. ・Supercomputer executes 1012 compares/second.
initial values
random shuffle insertion sort (N2) mergesort (N log N) quicksort (N log N) computer thousand million billion thousand million billion thousand million billion
home instant 2.8 hours 317 years instant 1 second 18 min instant 0.6 sec 12 min
super instant 1 second 1 week instant instant instant instant instant instant
Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones.
19 20 Quicksort: worst-case analysis Quicksort: average-case analysis
2 Worst case. Number of compares is ~ ½ N . Proposition. The average number of compares CN to quicksort an array of
N distinct keys is ~ 2N ln N (and the number of exchanges is ~ ⅓ N ln N ).
initial values Pf. CN satisfies the recurrence C0 = C1 = 0 and for N ≥ 2: left right random shuffle partitioning
C0 + CN 1 C1 + CN 2 CN 1 + C0 C =(N + 1) + + + ... + N N N N ・Multiply both sides by N and collect terms: partitioning probability
NCN = N(N + 1) + 2(C0 + C1 + ... + CN 1) ・Subtract from this equation the same equation for N - 1:
NCN (N 1)CN 1 =2N +2CN 1 ・Rearrange terms and divide by N (N + 1):
CN CN 1 2 = + N +1 N N +1
21 22
Quicksort: average-case analysis Quicksort: summary of performance characteristics
・Repeatedly apply previous equation: Quicksort is a (Las Vegas) randomized algorithm. CCNN CCNN 1 1 22 Guaranteed to be correct. = ++ ・ NN+1+1 NN NN+1+1 Running time depends on random shuffle. CN 2 2 2 ・ = + + substitute previous equation N 1 N N +1 CN 3 2 2 2 Average case. Expected number of compares is . = + + + ~ 1.39 N lg N N 2 N 1 N N +1 39% more compares than mergesort. 2 2 2 2 ・ = + + + ... + 3 4 5 N +1 ・Faster than mergesort in practice because of less data movement.
・Approximate sum by an integral: Best case. Number of compares is ~ N lg N. Worst case. Number of compares is 2. 1 1 1 1 ~ ½ N C = 2(N + 1) + + + ... N 3 4 5 N +1 [ but more likely that lightning bolt strikes computer during execution ] ✓ ◆ N+1 1 2(N + 1) dx ⇠ x Z3
・Finally, the desired result: C 2(N + 1) ln N 1.39N lg N N ⇥ 23 24 Quicksort properties Quicksort: practical improvements
Proposition. Quicksort is an in-place sorting algorithm. Insertion sort small subarrays. Pf. ・Even quicksort has too much overhead for tiny subarrays. ・Partitioning: constant extra space. ・Cutoff to insertion sort for ≈ 10 items. ・Depth of recursion: logarithmic extra space (with high probability).
can guarantee logarithmic depth by recurring on smaller subarray before larger subarray (requires using an explicit stack)
Proposition. Quicksort is not stable. private static void sort(Comparable[] a, int lo, int hi) Pf. [ by counterexample ] { if (hi <= lo + CUTOFF - 1) i j 0 1 2 3 {
B1 C1 C2 A1 Insertion.sort(a, lo, hi); return; 1 3 B1 C1 C2 A1 } int j = partition(a, lo, hi); 1 3 B1 A1 C2 C1 sort(a, lo, j-1); sort(a, j+1, hi); 0 1 A1 B1 C2 C1 }
25 26
Quicksort: practical improvements
Median of sample. ・Best choice of pivot item = median. ・Estimate true median by taking median of sample. ・Median-of-3 (random) items. UICKSORT ~ 12/7 N ln N compares (14% fewer) 2.3 Q ~ 12/35 N ln N exchanges (3% more) ‣ quicksort ‣ selection
private static void sort(Comparable[] a, int lo, int hi) ‣ duplicate keys { Algorithms if (hi <= lo) return; ‣ system sorts
int median = medianOf3(a, lo, lo + (hi - lo)/2, hi); ROBERT SEDGEWICK | KEVIN WAYNE swap(a, lo, median); http://algs4.cs.princeton.edu int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); }
27 Selection Quick-select
Goal. Given an array of N items, find the kth smallest item. Partition array so that: Ex. Min (k = 0), max (k = N - 1), median (k = N / 2). ・Entry a[j] is in place. ・No larger entry to the left of j. Applications. ・No smaller entry to the right of j. ・Order statistics. ・Find the "top k." Repeat in one subarray, depending on j; finished when j equals k.
Use theory as a guide. public static Comparable select(Comparable[] a, intbefore k) v { Easy N log N upper bound. How? lo if a[k] is here if a[k] is herehi ・ StdRandom.shuffle(a); set hi to j-1 set lo to j+1 ・Easy N upper bound for k = 1, 2, 3. How? int lo = 0, hi = a.length - 1; during v Յ v Ն v while (hi > lo) ・Easy N lower bound. Why? { i j int j = partition(a, lo, hi); after Յ v v Ն v if (j < k) lo = j + 1; Which is true? else if (j > k) hi = j - 1; lo j hi else return a[k]; N log N lower bound? is selection as hard as sorting? Quicksort partitioning overview ・ } ・N upper bound? is there a linear-time algorithm? return a[k]; }
29 30
Quick-select: mathematical analysis Theoretical context for selection
Proposition. Quick-select takes linear time on average. Proposition. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] Compare-based selection algorithm whose worst-case running time is linear. Pf sketch. Time Bounds for Selection ・Intuitively, each partitioning step splits array approximately in half: bY . N + N / 2 + N / 4 + … + 1 ~ 2N compares. Manuel Blum, Robert W. Floyd, Vaughan Watt, Ronald L. Rive&, and Robert E. Tarjan ・Formal analysis similar to quicksort analysis yields:
CN = 2 N + 2 k ln (N / k) + 2 (N – k) ln (N / (N – k)) Abstract
L The number of comparisons required to select the i-th smallest of
n numbers is shown to be at most a linear function of n by analysis of Ex: (2 + 2 ln 2) N ≈ 3.38 N compares to find median k = N / 2. i ・ a new selection algorithm -- PICK. Specifically, no more than i L 5.4305 n comparisons are ever required. This bound is improved for extreme values of i , and a new lower bound on the requisite number L of comparisons is also proved. Remark. Constants are high ⇒ not used in practice.
Use theory as a guide.
L ・Still worthwhile to seek practical linear-time (worst-case) algorithm. ・Until one is discovered, use quick-select if you don’t need a full sort.
31 32
This work was supported by the National Science Foundation under grants GJ-992 and GJ-33170X.
1 Duplicate keys
Often, purpose of sort is to bring items with equal keys together. ・Sort population by age. ・Remove duplicates from mailing list. ・Sort job applicants by college attended. sorted by time sorted by city (unstable) sorted by city (stable) 2.3 QUICKSORT Chicago 09:00:00 Chicago 09:25:52 Chicago 09:00:00 Typical characteristics of such applications.Phoenix 09:00:03 Chicago 09:03:13 Chicago 09:00:59 Houston 09:00:13 Chicago 09:21:05 Chicago 09:03:13 ‣ quicksort ・Huge array. Chicago 09:00:59 Chicago 09:19:46 Chicago 09:19:32 Houston 09:01:10 Chicago 09:19:32 Chicago 09:19:46 ・Small number of key values. Chicago 09:03:13 Chicago 09:00:00 Chicago 09:21:05 ‣ selection Seattle 09:10:11 Chicago 09:35:21 Chicago 09:25:52 Seattle 09:10:25 Chicago 09:00:59 Chicago 09:35:21 ‣ duplicate keys Phoenix 09:14:25 Houston 09:01:10 Houston 09:00:13 NOT Algorithms Chicago 09:19:32 Houston 09:00:13 Houston 09:01:10 sorted ‣ system sorts Chicago 09:19:46 Phoenix 09:37:44 sorted Phoenix 09:00:03 Chicago 09:21:05 Phoenix 09:00:03 Phoenix 09:14:25 Seattle 09:22:43 Phoenix 09:14:25 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:10:25 Seattle 09:10:11 ROBERT SEDGEWICK | KEVIN WAYNE Chicago 09:25:52 Seattle 09:36:14 Seattle 09:10:25 http://algs4.cs.princeton.edu Chicago 09:35:21 Seattle 09:22:43 Seattle 09:22:43 Seattle 09:36:14 Seattle 09:10:11 Seattle 09:22:54 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:36:14
Stability when sorting on a second key key
34
Duplicate keys: stop on equal keys Quicksort quiz 2
Our partitioning subroutine stops both scans on equal keys. What is the result of partitioning the following array (skip over equal keys)?
scan until > A scan until < A scan until ≥ P scan until ≤ P
A A A A A A A A A A A A A A A A P G E P A Q B P Y C O U P Z S R 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A. A A A A A A A A A A A A A A A A
Q. Why not continue scans on equal keys? B. A A A A A A A A A A A A A A A A scan until > P scan until < P
P G E P A Q B P Y C O U P Z S R C. A A A A A A A A A A A A A A A A
D. I don't know. 35 36 Quicksort quiz 3 Partitioning an array with all equal keys
What is the result of partitioning the following array (stop on equal keys)?
scan until ≥ A scan until ≤ A
A A A A A A A A A A A A A A A A
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A. A A A A A A A A A A A A A A A A
B. A A A A A A A A A A A A A A A A
C. A A A A A A A A A A A A A A A A
D. I don't know. 37 38
Duplicate keys: partitioning strategies DUTCH NATIONAL FLAG PROBLEM
Bad. Don't stop scans on equal keys. [ ~ ½ N 2 compares when all keys equal ] Problem. [Edsger Dijkstra] Given an array of N buckets, each containing a red, white, or blue pebble, sort them by color. B A A B A B B B C C C A A A A A A A A A A A
input
Good. Stop scans on equal keys. sorted [ ~ N lg N compares when all keys equal ] Operations allowed. B A A B A B C C B C B A A A A A A A A A A A ・swap(i, j): swap the pebble in bucket i with the pebble in bucket j. ・color(i): color of pebble in bucket i.
Better. Put all equal keys in place. How? Requirements. [ ~ N compares when all keys equal ] ・Exactly N calls to color(). ・At most N calls to swap(). A A A B B B B B C C C A A A A A A A A A A A ・Constant extra space.
39 40 3-way partitioning Dijkstra 3-way partitioning demo
Goal. Partition array into three parts so that: ・Let v be partitioning item a[lo]. ・Entries between lt and gt equal to the partition item. ・Scan i from left to right. ・No larger entries to left of lt. – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i ・No smaller entries to right of gt. – (a[i] > v): exchange a[gt] with a[i]; decrement gt before v – (a[i] == v): increment i
lo hi before v during
lo hi lt i gt lt i gt during
lt i gt lo lt gt hi P A B X W P P V P D P C Y Z after
3-way partitioning before v invariant Dutch national flag problem. [Edsger Dijkstra] lo hi ・Conventional wisdom until mid 1990s: not worth doing. during
lo lt gt hi
Dijkstra 3-way partitioning demo Dijkstra's 3-way partitioning:3-way trace partitioning
・Let v be partitioning item a[lo]. ・Scan i from left to right. v a[] – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i lt i gt 0 1 2 3 4 5 6 7 8 9 10 11 – (a[i] > v): exchange a[gt] with a[i]; decrement gt 0 0 11 R B W W R W B R R W B R – (a[i] == v): increment i 0 1 11 R B W W R W B R R W B R 1 2 11 B R W W R W B R R W B R 1 2 10 B R R W R W B R R W B W 1 3 10 B R R W R W B R R W B W
lt gt 1 3 9 B R R B R W B R R W W W 2 4 9 B B R R R W B R R W W W 2 5 9 B B R R R W B R R W W W A B C D P P P P P V W Y Z X 2 5 8 B B R R R W B R R W W W 2 5 7 B B R R R R B R W W W W
lo hi 2 6 7 B B R R R R B R W W W W 3 7 7 B B B R R R R R W W W W before v invariant 3 8 7 B B B R R R R R W W W W lo hi 3 8 7 B B B R R R R R W W W W during
lt i gt
after
lo lt gt hi
3-way partitioning 3-way quicksort: Java implementation 3-way quicksort: visual trace
private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int lt = lo, gt = hi; Comparable v = a[lo]; int i = lo; while (i <= gt) { int cmp = a[i].compareTo(v); if (cmp < 0) exch(a, lt++, i++); else if (cmp > 0) exch(a, i, gt--); equal to partitioning element else i++; } before v
sort(a, lo, lt - 1); lo hi
sort(a, gt + 1, hi); during
after
lo lt gt hi
3-way partitioning 45 46
Duplicate keys: lower bound Sorting summary
Sorting lower bound. If there are n distinct keys and the ith one occurs xi times, any compare-based sorting algorithm must use at least inplace? stable? best average worst remarks n N! xi lg x lg N lg N when all distinct; x ! x ! x ! ⇤ i N selection ✔ ½ N 2 ½ N 2 ½ N 2 N exchanges 1 2 n i=1 linear when only a constant number of distinct keys ··· ⇥ ⇤ compares in the worst case. use for small N insertion ✔ ✔ N ¼ N 2 ½ N 2 or partially ordered
tight code; shell ✔ N log N ? c N 3/2 3 subquadratic
N log N guarantee; merge ✔ ½ N lg N N lg N N lg N Proposition. [Sedgewick-Bentley 1997] proportional to lower bound stable Quicksort with 3-way partitioning is entropy-optimal. improves mergesort timsort ✔ N N lg N N lg N when preexisting order Pf. [beyond scope of course] N log N probabilistic guarantee; quick ✔ N lg N 2 N ln N ½ N 2 fastest in practice
improves quicksort 3-way quick ✔ N 2 N ln N ½ N 2 Bottom line. Quicksort with 3-way partitioning reduces running time when duplicate keys from linearithmic to linear in broad class of applications. ? ✔ ✔ N N lg N N lg N holy sorting grail
47 48 Sorting applications
Sorting algorithms are essential in a broad variety of applications: ・Sort a list of names. ・Organize an MP3 library. obvious applications ・Display Google PageRank results. 2.3 QUICKSORT ・List RSS feed in reverse chronological order.
‣ quicksort ・Find the median. Identify statistical outliers. problems become easy once items ‣ selection ・ are in sorted order Binary search in a database. ‣ duplicate keys ・ Algorithms ・Find duplicates in a mailing list. ‣ system sorts ・Data compression. ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ・Computer graphics. non-obvious applications ・Computational biology. ・Load balancing on a parallel computer. . . .
50
War story (system sort in C) War story (system sort in C)
A beautiful bug report. [Allan Wilks and Rick Becker, 1991] Bug. A qsort() call that should have taken seconds was taking minutes.
We found that qsort is unbearably slow on "organ-pipe" inputs like "01233210": Why is qsort() so slow? main (int argc, char**argv) { int n = atoi(argv[1]), i, x[100000]; for (i = 0; i < n; i++) x[i] = i; for ( ; i < 2*n; i++) x[i] = 2*n-i-1; qsort(x, 2*n, sizeof(int), intcmp); } At the time, almost all qsort() implementations based on those in: Here are the timings on our machine: Version 7 Unix (1979): quadratic time to sort organ-pipe arrays. $ time a.out 2000 ・ real 5.85s ・BSD Unix (1983): quadratic time to sort random arrays of 0s and 1s. $ time a.out 4000 real 21.64s $time a.out 8000 real 85.11s
51 52 Engineering a system sort (in 1993) A beautiful mailing list post (Yaroslavskiy, September 2009)
Bentley-McIlroy quicksort. samples 9 items Replacement of quicksort in java.util.Arrays with new dual-pivot quicksort ・Cutoff to insertion sort for small subarrays. ・Partitioning item: median of 3 or Tukey's ninther. Hello All, ・Partitioning scheme: Bentley-McIlroy 3-way partitioning. I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one. similar to Dijkstra 3-way partitioning SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(11), 1249–1265(but fewer (NOVEMBER exchanges 1993) when not many equal keys) ...
The new Dual-Pivot Quicksort uses *two* pivots elements in this manner: Engineering a Sort Function 1. Pick an elements P1, P2, called pivots from the array.
JON L. BENTLEY 2. Assume that P1 <= P2, otherwise swap it. M. DOUGLAS McILROY AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A. 3. Reorder the array into three parts: those less than the smaller pivot, those larger than the larger pivot, and in between are those elements between (or equal to) the two pivots. SUMMARY 4. Recursively sort the sub-arrays. We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design The invariant of the Dual-Pivot Quicksort is: techniques apply in domains beyond sorting.
KEY WORDS Quicksort Sorting algorithms Performance tuning Algorithm design and implementation Testing [ < P1 | P1 <= & <= P2 } > P2 ]
INTRODUCTION C libraries have long included a qsort function to sort an array, usually implemented by ... Very widelyHoare’s used. Quicksort. C,1 BecauseC++, existing Javaqsort 6,s are…. flawed, we built a new one. This paper summarizes its evolution. Compared to existing library sorts, our new qsort is faster—typically about twice as http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-September/002630.html fast—clearer, and more robust under nonrandom inputs. It uses some standard Quicksort 53 54 tricks, abandons others, and introduces some new tricks of its own. Our approach to build- ing a qsort is relevant to engineering other algorithms. The qsort on our home system, based on Scowen’s ‘Quickersort’,2 had served faith- fully since Lee McMahon wrote it almost two decades ago. Shipped with the landmark Sev- enth Edition Unix System,3 it became a model for other qsorts. Yet in the summer of 1991 our colleagues Allan Wilks and Rick Becker found that a qsort run that should have A beautifultaken mailing a few minutes list was chewingpost up (Yaroslavskiy-Bloch-Bentley, hours of CPU time. Had they not interrupted it, it October 2009) Dual-pivot quicksort would have gone on for weeks.4 They found that it took n 2 comparisons to sort an ‘organ- pipe’ array of 2n integers: 123..nn.. 321. Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones.5 In fact, among a dozen different Use two partitioning items p1 and p2 and partition into three subarrays: ReplacementUnix librariesof quicksort we found no inqsort java.util.Arraysthat could not easily bewith driven new to quadratic dual-pivot behavior; all quicksort were derived from the Seventh Edition or from the 1983 Berkeley function. The Seventh ・Keys less than p1. Date: Thu, 29 Oct 2009 11:19:39 +0000 0038-0644/93/111249–17$13.50 Received 21 August 1992 ・Keys between p1 and p2. Subject: Replace 1993 by Johnquicksort Wiley & Sons, in Ltd. java.util.Arrays withRevised dual-pivot 10 May 1993 implementation ・Keys greater than p2. Changeset: b05abb410c52 Author: alanb Date: 2009-10-29 11:18 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/b05abb410c52 < p1 p1 ≥ p1 and ≤ p2 p2 > p2 6880672: Replace quicksort in java.util.Arrays with dual-pivot implementation Reviewed-by: jjb Contributed-by: vladimir.yaroslavskiy at sun.com, joshua.bloch at google.com, lo lt gt hi jbentley at avaya.com
! make/java/java/FILES_java.gmk ! src/share/classes/java/util/Arrays.java + src/share/classes/java/util/DualPivotQuicksort.java Recursively sort three subarrays.
http://mail.openjdk.java.net/pipermail/compiler-dev/2009-October.txt
degenerates to Dijkstra's 3-way partitioning
Note. Skip middle subarray if p1 = p2.
55 56 Dual-pivot partitioning demo Dual-pivot partitioning demo
Initialization. Main loop. Repeat until i and gt pointers cross. ・Choose a[lo] and a[hi] as partitioning items. ・If (a[i] < a[lo]), exchange a[i] with a[lt] and increment lt and i. ・Exchange if necessary to ensure a[lo] ≤ a[hi]. ・Else if (a[i] > a[hi]), exchange a[i] with a[gt] and decrement gt. ・Else, increment i.
< p1 p1 ≥ p1 and ≤ p2 ? p2 > p2
lo lt i gt hi
S E A Y R L F V Z Q T C M K K E A F R L M C Z Q T V Y S
lo hi lo lt i gt hi
exchange a[lo] and a[hi]
Dual-pivot partitioning demo Dual-pivot quicksort
Finalize. Use two partitioning items p1 and p2 and partition into three subarrays: ・Exchange a[lo] with a[--lt]. ・Keys less than p1. ・Exchange a[hi] with a[++gt]. ・Keys between p1 and p2. ・Keys greater than p2.
< p1 p1 ≥ p1 and ≤ p2 p2 > p2 < p1 p1 ≥ p1 and ≤ p2 p2 > p2
lo lt gt hi lo lt gt hi
C E A F K L M R Q S Z V Y T
lo lt gt hi Now widely used. Java 7, Python unstable sort, Android, …
3-way partitioned 60 Three-pivot quicksort Quicksort quiz 4
Use three partitioning items p1, p2, and p3 and partition into four subarrays: Why do 2-pivot (and 3-pivot) quicksort perform better than 1-pivot?
・Keys less than p1. A. Fewer compares. ・Keys between p1 and p2. B. Fewer exchanges. ・Keys between p2 and p3. ・Keys greater than p3. C. Fewer cache misses. D. I don't know. < p1 p1 ≥ p1 and ≤ p2 p2 ≥ p2 and ≤ p3 p3 > p3
lo a1 a2 a3 hi
Multi-Pivot Quicksort: Theory and Experiments
Shrinu Kushagra Alejandro L´opez-Ortiz J. Ian Munro [email protected] [email protected] [email protected] University of Waterloo University of Waterloo University of Waterloo Aurick Qiao [email protected] University of Waterloo November 7, 2013
Abstract In 2009, Vladimir Yaroslavskiy introduced a novel 61 62 The idea of multi-pivot quicksort has recently received dual-pivot partitioning algorithm. When run on a bat- the attention of researchers after Vladimir Yaroslavskiy tery of tests under the JVM, it outperformed the stan- proposed a dual pivot quicksort algorithm that, con- dard quicksort algorithm [10]. In the subsequent re- trary to prior intuition, outperforms standard quicksort lease of Java 7, the internal sorting algorithm was re- by a a significant margin under the Java JVM [10]. More placed by Yaroslavskiy’s variant. Three years later, recently, this algorithm has been analysed in terms of Wild and Nebel [9] published a rigorous average-case Quicksort comparisonsquiz 4 and swaps by Wild and Nebel [9]. Our con- analysis of the algorithm. They stated that the previ- Which sorting algorithm to use? tributions to the topic are as follows. First, we perform ous lower bound relied on assumptions that no longer the previous experiments using a native C implementa- hold in Yaroslavskiy’s implementation. The dual pivot tion thus removing potential extraneous e↵ects of the approach actually uses less comparisons (1.9n ln n vs JVM. Second, we provide analyses on cache behavior 2.0n ln n) on average. However, the di↵erence in run- Why do 2-pivotof these algorithms.(and We3-pivot) then provide strong quicksort evidence time isperform much greater than better the di↵erence inthan number of1-pivot? Many sorting algorithms to choose from: that cache behavior is causing most of the performance comparisons. We address this issue and provide an ex- planation in 5. di↵erences in these algorithms. Additionally, we build § upon prior work in multi-pivot quicksort and propose Aum¨uller and Dietzfelbinger [1] (ICALP2013) have A. Fewera 3-pivot compares. variant that performs very well in theory and recently addressed the following question: If the previ- practice. We show that it makes fewer comparisons and ous lower bound does not hold, what is really the best has better cache behavior than the dual pivot quicksort we can do with two pivots? They prove a 1.8n ln n lower in the expected case. We validate this with experimen-# entriesbound scanned on the number is ofa comparisonsgood proxy for all dual-pivot sorts algorithms B. Fewer exchanges. quicksort algorithms and introduced an algorithm that tal results, showing a 7-8% performance improvementfor cache performance when in our tests. actually achieves that bound. In their experimentation, comparingthe algorithm quicksort is outperformed variants by Yaroslavskiy’s quick- 1 Introduction sort when sorting integer data. However, their algo- C. Fewer cache misses. rithm does perform better with large data (eg. strings) elementary sorts insertion sort, selection sort, bubblesort, shaker sort, ... 1.1 Background Up until about a decade ago it was since comparisons incur high cost. thought that the classic quicksort algorithm [3] using one pivot is superior to any multi-pivot scheme. It was 1.2 The Processor-Memory Performance Gap previously believed that using more pivots introduces Both presently and historically, the performance of CPU quicksort, mergesort, heapsort, shellsort, samplesort, ... too much overhead for the advantages gained. In 2002, subquadratic sorts registers have far outpaced that of main memory. Ad- Sedgewickpartitioning and Bentley [7] recognised andcompares outlined some exchanges entries scanned
Downloaded 02/07/14 to 96.248.80.75. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php ditionally, this performance gap between the processor of the advantages to a dual-pivot quicksort. However, and memory has been increasing since their introduc- the implementation did not perform as well as the classic tion. Every year, the performance of memory improves quicksort algorithm [9] and this path was not explored system sorts dual-pivot quicksort, timsort, introsort, ... by about 10% while that of the processor improves by again until recent years. 1-pivot 2 N ln N 60% [5]. The performance0.333 N di ↵lnerence N grows so quickly 2 N ln N
external sorts Poly-phase mergesort, cascade-merge, psort, .… 47 Copyright © 2014. median-of-3 1.714 N ln N by the Society0.343 for Industrial N ln andN Applied Mathematics.1.714 N ln N
MSD, LSD, 3-way radix quicksort, … 2-pivot 1.9 N ln N 0.6 N ln N 1.6 N ln N radix sorts
3-pivot 1.846 N ln N 0.616 N ln N 1.385 N ln N parallel sorts bitonic sort, odd-even sort, smooth sort, GPUsort, ...
Reference: Analysis of Pivot Sampling in Dual-Pivot Quicksort by Wild-Nebel-Martínez
Bottom line. Caching can have a significant impact on performance.
beyond scope of this course 63 64 Which sorting algorithm to use? System sort in Java 7
Applications have diverse attributes. Arrays.sort(). ・Stable? ・Has method for objects that are Comparable. ・Parallel? ・Has overloaded method for each primitive type. ・In-place? ・Has overloaded method for use with a Comparator. ・Deterministic? ・Has overloaded methods for sorting subarrays. ・Duplicate keys? ・Multiple key types? Algorithms. ・Linked list or arrays? ・Dual-pivot quicksort for primitive types. Large or small items? Timsort for reference types. ・ many more combinations of ・ ・Randomly-ordered array? attributes than algorithms ・Guaranteed performance? Q. Why use different algorithms for primitive and reference types?
Q. Is the system sort good enough? A. Usually.
65 66
Ineffective sorts
http://xkcd.com/1185 67