Algorithms ROBERT SEDGEWICK | KEVIN WAYNE Two classic sorting : mergesort and

Critical components in the world’s computational infrastructure. ・Full scientific understanding of their properties has enabled us to develop them into practical system sorts. ・Quicksort honored as one of top 10 algorithms of 20th century 2.3 QUICKSORT in science and engineering.

‣ quicksort Mergesort. [last lecture] ‣ selection ... Algorithms ‣ duplicate keys FOURTH EDITION ‣ system sorts Quicksort. [this lecture] ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ...

2

Quicksort t-shirt

2.3 QUICKSORT

‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE

http://algs4.cs.princeton.edu

3 number). 9.9 X 10 45 is used to represent infinity. Imaginary conunent I and J are output variables, and A is the array (with Quicksort valuesTony of x may Hoarenot be negative and reM values of x may not be subscript bounds M:N) which is operated upon by this procedure. smaller than 1. Partition takes the value X of a random element of the array A, Values of Qd~'(x) may be calculated easily by hypergeometric and rearranges the values of the elements of the array in such a series if x is not too nor (n - m) too large. Q~m(x) can be way that there exist integers I and J with the following properties : computed from an appropriate set of values of Pnm(X) if X is near M _-< J < I =< NprovidedM < N 1.0 or ix is near 0. Loss of significant digits occurs for x as small as A[R] =< XforM =< R _-< J Basic plan. 1.1 if n is largerInvented than 10. Loss of significant quicksort digits is a major diffi-to translateA[R] = RussianXforJ < R < I into English. culty・ in using finite polynomiM representations also if n is larger A[R] ~ Xfor I =< R ~ N than m. However, QLEG has been tested in regions of x and n The procedure uses an integer procedure random (M,N) which Shuffle the array. both large and[ butsmall; couldn't explain hischooses algorithmequiprobably a random integer or F implementbetween M and N, and it! ] ・ procedure・ QLEG(m, nmax, x, ri, R, Q); value In, nmax, x, ri; also a procedure exchange, which exchanges the values of its two real In, mnax, x, ri; real array R, Q; parameters ; begin real t, i, n, q0, s; begin real X; integer F; j n := 20; ・Partition so that, for some ・Learned Algol 60 (and recursion).F := random (M,N); X := A[F]; if nmax > 13 then I:=M; J:=N; n := nmax +7; up: for I : = I step 1 until N do – entry a[j] is in place if Implementedri = 0then quicksort. if X < A [I] then go to down; ・ begin ifm = 0then I:=N; Q[0] := 0.5 X 10g((x + 1)/(x - 1)) down: forJ := J step --1 until M do else if A[J]

847 Communications * This workOctober was 1978 supported in part by the Fannie and John Hertz Foundation, and of in part byVolume the National21 Science Foundation Grants No. GJ-28074 and MCS75-23738. the ACM 22 Acta Informatica,Number 10Vol. 7 Quicksort partitioning demo Quicksort partitioning demo

Repeat until i and j pointers cross. Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j]. ・Exchange a[i] with a[j].

When pointers cross. ・Exchange a[lo] with a[j].

K R A T E L E P U I M Q C X O S E C A I E K L P U T M Q R X O S

lo i j lo j hi

partitioned! 9 10

The music of quicksort partitioning (by Brad Lyon) Quicksort: Java code for partitioning

private static int partition(Comparable[] a, int lo, int hi) { int i = lo, j = hi+1; while (true) {

while (less(a[++i], a[lo])) find item on left to swap if (i == hi) break;

while (less(a[lo], a[--j])) find item on right to swap if (j == lo) break;

if (i >= j) break; check if pointers cross exch(a, i, j); swap }

exch(a, lo, j); swap with partitioning item before v https://googledrive.com/host/0B2GQktu-wcTicjRaRjV1NmRFN1U/index.html return j; return index of item now known to be in place } lo hi

before v during v Յ v Ն v

lo hi i j

after before v during v Յ v Ն v Յ v v Ն v

lo hi i j lo j hi

during v Յ v Ն v after Յ v v Ն v Quicksort partitioning overview 11 12 i j lo j hi

after Յ v v Ն v Quicksort partitioning overview

lo j hi

Quicksort partitioning overview Quicksort quiz 1 Quicksort: Java implementation

Q. How many compares (in the worst case) to partition an array of length N ?

public class Quick A. ~ ¼ N { private static int partition(Comparable[] a, int lo, int hi) B. ~ ½ N { /* see previous slide */ }

C. ~ N public static void sort(Comparable[] a) { StdRandom.shuffle(a); shuffle needed for D. ~ N lg N performance guarantee sort(a, 0, a.length - 1); (stay tuned) } E. I don't know. private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); } }

13 14

Quicksort trace Quicksort animation

50 random items

lo j hi 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 initial values Q U I C K S O R T E X A M P L E random shufe K R A T E L E P U I M Q C X O S 0 5 15 E C A I E K L P U T M Q R X O S 0 3 4 E C A E I K L P U T M Q R X O S 0 2 2 A C E E I K L P U T M Q R X O S 0 0 1 A C E E I K L P U T M Q R X O S 1 1 A C E E I K L P U T M Q R X O S 4 4 A C E E I K L P U T M Q R X O S 6 6 15 A C E E I K L P U T M Q R X O S no partition 7 9 15 A C E E I K L M O P T Q R X U S for subarrays of size 1 7 7 8 A C E E I K L M O P T Q R X U S 8 8 A C E E I K L M O P T Q R X U S 10 13 15 A C E E I K L M O P S Q R T U X 10 12 12 A C E E I K L M O P R Q S T U X 10 11 11 A C E E I K L M O P Q R S T U X 10 10 A C E E I K L M O P Q R S T U X 14 14 15 A C E E I K L M O P Q R S T U X 15 15 A C E E I K L M O P Q R S T U X algorithm position result A C E E I K L M O P Q R S T U X in order Quicksort trace (array contents after each partition) current subarray not in order http://www.sorting-algorithms.com/quick-sort

15 16 Quicksort: implementation details Quicksort: empirical analysis (1961)

Partitioning in-place. Using an extra array makes partitioning easier Running time estimates: (and stable), but is not worth the cost. ・Algol 60 implementation. ・National-Elliott 405 computer. Terminating the loop. Testing whether the pointers cross is trickier than it might seem.

Equal keys. When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key. stay tuned

Preserving randomness. Shuffling is needed for performance guarantee. Equivalent alternative. Pick a random partitioning item in each subarray.

Elliott 405 magnetic disc sorting N 6-word items with 1-word keys (16K words)

17 18

Quicksort: empirical analysis Quicksort: best-case analysis

Running time estimates: Best case. Number of compares is ~ N lg N. ・Home PC executes 108 compares/second. ・Supercomputer executes 1012 compares/second.

initial values

random shuffle (N2) mergesort (N log N) quicksort (N log N) computer thousand million billion thousand million billion thousand million billion

home instant 2.8 hours 317 years instant 1 second 18 min instant 0.6 sec 12 min

super instant 1 second 1 week instant instant instant instant instant instant

Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones.

19 20 Quicksort: worst-case analysis Quicksort: average-case analysis

2 Worst case. Number of compares is ~ ½ N . Proposition. The average number of compares CN to quicksort an array of

N distinct keys is ~ 2N ln N (and the number of exchanges is ~ ⅓ N ln N ).

initial values Pf. CN satisfies the recurrence C0 = C1 = 0 and for N ≥ 2: left right random shuffle partitioning

C0 + CN 1 C1 + CN 2 CN 1 + C0 C =(N + 1) + + + ... + N N N N ・Multiply both sides by N and collect terms: partitioning probability

NCN = N(N + 1) + 2(C0 + C1 + ... + CN 1) ・Subtract from this equation the same equation for N - 1:

NCN (N 1)CN 1 =2N +2CN 1 ・Rearrange terms and divide by N (N + 1):

CN CN 1 2 = + N +1 N N +1

21 22

Quicksort: average-case analysis Quicksort: summary of performance characteristics

・Repeatedly apply previous equation: Quicksort is a (Las Vegas) randomized algorithm. CCNN CCNN 1 1 22 Guaranteed to be correct. = ++ ・ NN+1+1 NN NN+1+1 Running time depends on random shuffle. CN 2 2 2 ・ = + + substitute previous equation N 1 N N +1 CN 3 2 2 2 Average case. Expected number of compares is . = + + + ~ 1.39 N lg N N 2 N 1 N N +1 39% more compares than mergesort. 2 2 2 2 ・ = + + + ... + 3 4 5 N +1 ・Faster than mergesort in practice because of less data movement.

・Approximate sum by an integral: Best case. Number of compares is ~ N lg N. Worst case. Number of compares is 2. 1 1 1 1 ~ ½ N C = 2(N + 1) + + + ... N 3 4 5 N +1 [ but more likely that lightning bolt strikes computer during execution ] ✓ ◆ N+1 1 2(N + 1) dx ⇠ x Z3

・Finally, the desired result: C 2(N + 1) ln N 1.39N lg N N ⇥ 23 24 Quicksort properties Quicksort: practical improvements

Proposition. Quicksort is an in-place sorting algorithm. Insertion sort small subarrays. Pf. ・Even quicksort has too much overhead for tiny subarrays. ・Partitioning: constant extra space. ・Cutoff to insertion sort for ≈ 10 items. ・Depth of recursion: logarithmic extra space (with high probability).

can guarantee logarithmic depth by recurring on smaller subarray before larger subarray (requires using an explicit stack)

Proposition. Quicksort is not stable. private static void sort(Comparable[] a, int lo, int hi) Pf. [ by counterexample ] { if (hi <= lo + CUTOFF - 1) i j 0 1 2 3 {

B1 C1 C2 A1 Insertion.sort(a, lo, hi); return; 1 3 B1 C1 C2 A1 } int j = partition(a, lo, hi); 1 3 B1 A1 C2 C1 sort(a, lo, j-1); sort(a, j+1, hi); 0 1 A1 B1 C2 C1 }

25 26

Quicksort: practical improvements

Median of sample. ・Best choice of pivot item = median. ・Estimate true median by taking median of sample. ・Median-of-3 (random) items. UICKSORT ~ 12/7 N ln N compares (14% fewer) 2.3 Q ~ 12/35 N ln N exchanges (3% more) ‣ quicksort ‣ selection

private static void sort(Comparable[] a, int lo, int hi) ‣ duplicate keys { Algorithms if (hi <= lo) return; ‣ system sorts

int median = medianOf3(a, lo, lo + (hi - lo)/2, hi); ROBERT SEDGEWICK | KEVIN WAYNE swap(a, lo, median); http://algs4.cs.princeton.edu int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); }

27 Selection Quick-select

Goal. Given an array of N items, find the kth smallest item. Partition array so that: Ex. Min (k = 0), max (k = N - 1), median (k = N / 2). ・Entry a[j] is in place. ・No larger entry to the left of j. Applications. ・No smaller entry to the right of j. ・Order statistics. ・Find the "top k." Repeat in one subarray, depending on j; finished when j equals k.

Use theory as a guide. public static Comparable select(Comparable[] a, intbefore k) v { Easy N log N upper bound. How? lo if a[k] is here if a[k] is herehi ・ StdRandom.shuffle(a); set hi to j-1 set lo to j+1 ・Easy N upper bound for k = 1, 2, 3. How? int lo = 0, hi = a.length - 1; during v Յ v Ն v while (hi > lo) ・Easy N lower bound. Why? { i j int j = partition(a, lo, hi); after Յ v v Ն v if (j < k) lo = j + 1; Which is true? else if (j > k) hi = j - 1; lo j hi else return a[k]; N log N lower bound? is selection as hard as sorting? Quicksort partitioning overview ・ } ・N upper bound? is there a linear-time algorithm? return a[k]; }

29 30

Quick-select: mathematical analysis Theoretical context for selection

Proposition. Quick-select takes linear time on average. Proposition. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] Compare-based selection algorithm whose worst-case running time is linear. Pf sketch. Time Bounds for Selection ・Intuitively, each partitioning step splits array approximately in half: bY . N + N / 2 + N / 4 + … + 1 ~ 2N compares. , Robert W. Floyd, Vaughan Watt, Ronald L. Rive&, and Robert E. Tarjan ・Formal analysis similar to quicksort analysis yields:

CN = 2 N + 2 k ln (N / k) + 2 (N – k) ln (N / (N – k)) Abstract

L The number of comparisons required to select the i-th smallest of

n numbers is shown to be at most a linear function of n by analysis of Ex: (2 + 2 ln 2) N ≈ 3.38 N compares to find median k = N / 2. i ・ a new selection algorithm -- PICK. Specifically, no more than i L 5.4305 n comparisons are ever required. This bound is improved for extreme values of i , and a new lower bound on the requisite number L of comparisons is also proved. Remark. Constants are high ⇒ not used in practice.

Use theory as a guide.

L ・Still worthwhile to seek practical linear-time (worst-case) algorithm. ・Until one is discovered, use quick-select if you don’t need a full sort.

31 32

This work was supported by the National Science Foundation under grants GJ-992 and GJ-33170X.

1 Duplicate keys

Often, purpose of sort is to bring items with equal keys together. ・Sort population by age. ・Remove duplicates from mailing list. ・Sort job applicants by college attended. sorted by time sorted by city (unstable) sorted by city (stable) 2.3 QUICKSORT Chicago 09:00:00 Chicago 09:25:52 Chicago 09:00:00 Typical characteristics of such applications.Phoenix 09:00:03 Chicago 09:03:13 Chicago 09:00:59 Houston 09:00:13 Chicago 09:21:05 Chicago 09:03:13 ‣ quicksort ・Huge array. Chicago 09:00:59 Chicago 09:19:46 Chicago 09:19:32 Houston 09:01:10 Chicago 09:19:32 Chicago 09:19:46 ・Small number of key values. Chicago 09:03:13 Chicago 09:00:00 Chicago 09:21:05 ‣ selection Seattle 09:10:11 Chicago 09:35:21 Chicago 09:25:52 Seattle 09:10:25 Chicago 09:00:59 Chicago 09:35:21 ‣ duplicate keys Phoenix 09:14:25 Houston 09:01:10 Houston 09:00:13 NOT Algorithms Chicago 09:19:32 Houston 09:00:13 Houston 09:01:10 sorted ‣ system sorts Chicago 09:19:46 Phoenix 09:37:44 sorted Phoenix 09:00:03 Chicago 09:21:05 Phoenix 09:00:03 Phoenix 09:14:25 Seattle 09:22:43 Phoenix 09:14:25 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:10:25 Seattle 09:10:11 ROBERT SEDGEWICK | KEVIN WAYNE Chicago 09:25:52 Seattle 09:36:14 Seattle 09:10:25 http://algs4.cs.princeton.edu Chicago 09:35:21 Seattle 09:22:43 Seattle 09:22:43 Seattle 09:36:14 Seattle 09:10:11 Seattle 09:22:54 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:36:14

Stability when sorting on a second key key

34

Duplicate keys: stop on equal keys Quicksort quiz 2

Our partitioning subroutine stops both scans on equal keys. What is the result of partitioning the following array (skip over equal keys)?

scan until > A scan until < A scan until ≥ P scan until ≤ P

A A A A A A A A A A A A A A A A P G E P A Q B P Y C O U P Z S R 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A. A A A A A A A A A A A A A A A A

Q. Why not continue scans on equal keys? B. A A A A A A A A A A A A A A A A scan until > P scan until < P

P G E P A Q B P Y C O U P Z S R C. A A A A A A A A A A A A A A A A

D. I don't know. 35 36 Quicksort quiz 3 Partitioning an array with all equal keys

What is the result of partitioning the following array (stop on equal keys)?

scan until ≥ A scan until ≤ A

A A A A A A A A A A A A A A A A

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A. A A A A A A A A A A A A A A A A

B. A A A A A A A A A A A A A A A A

C. A A A A A A A A A A A A A A A A

D. I don't know. 37 38

Duplicate keys: partitioning strategies DUTCH NATIONAL FLAG PROBLEM

Bad. Don't stop scans on equal keys. [ ~ ½ N 2 compares when all keys equal ] Problem. [Edsger Dijkstra] Given an array of N buckets, each containing a red, white, or blue pebble, sort them by color. B A A B A B B B C C C A A A A A A A A A A A

input

Good. Stop scans on equal keys. sorted [ ~ N lg N compares when all keys equal ] Operations allowed. B A A B A B C C B C B A A A A A A A A A A A ・swap(i, j): swap the pebble in bucket i with the pebble in bucket j. ・color(i): color of pebble in bucket i.

Better. Put all equal keys in place. How? Requirements. [ ~ N compares when all keys equal ] ・Exactly N calls to color(). ・At most N calls to swap(). A A A B B B B B C C C A A A A A A A A A A A ・Constant extra space.

39 40 3-way partitioning Dijkstra 3-way partitioning demo

Goal. Partition array into three parts so that: ・Let v be partitioning item a[lo]. ・Entries between lt and gt equal to the partition item. ・Scan i from left to right. ・No larger entries to left of lt. – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i ・No smaller entries to right of gt. – (a[i] > v): exchange a[gt] with a[i]; decrement gt before v – (a[i] == v): increment i

lo hi before v during v

lo hi lt i gt lt i gt during v after v

lt i gt lo lt gt hi P A B X W P P V P D P C Y Z after v 3-way partitioning lo lt gt hi lo hi

3-way partitioning before v Dutch national flag problem. [Edsger Dijkstra] lo hi ・Conventional wisdom until mid 1990s: not worth doing. during v ・Now incorporated into C library qsort() and Java 6 system sort. lt i gt 41 after v 42

lo lt gt hi

Dijkstra 3-way partitioning demo Dijkstra's 3-way partitioning:3-way trace partitioning

・Let v be partitioning item a[lo]. ・Scan i from left to right. v a[] – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i lt i gt 0 1 2 3 4 5 6 7 8 9 10 11 – (a[i] > v): exchange a[gt] with a[i]; decrement gt 0 0 11 R B W W R W B R R W B R – (a[i] == v): increment i 0 1 11 R B W W R W B R R W B R 1 2 11 B R W W R W B R R W B R 1 2 10 B R R W R W B R R W B W 1 3 10 B R R W R W B R R W B W

lt gt 1 3 9 B R R B R W B R R W W W 2 4 9 B B R R R W B R R W W W 2 5 9 B B R R R W B R R W W W A B C D P P P P P V W Y Z X 2 5 8 B B R R R W B R R W W W 2 5 7 B B R R R R B R W W W W

lo hi 2 6 7 B B R R R R B R W W W W 3 7 7 B B B R R R R R W W W W before v invariant 3 8 7 B B B R R R R R W W W W lo hi 3 8 7 B B B R R R R R W W W W during v 3-way partitioning trace (array contents after each loop iteration)

lt i gt

after v 43 44

lo lt gt hi

3-way partitioning 3-way quicksort: Java implementation 3-way quicksort: visual trace

private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int lt = lo, gt = hi; Comparable v = a[lo]; int i = lo; while (i <= gt) { int cmp = a[i].compareTo(v); if (cmp < 0) exch(a, lt++, i++); else if (cmp > 0) exch(a, i, gt--); equal to partitioning element else i++; } before v

sort(a, lo, lt - 1); lo hi

sort(a, gt + 1, hi); during v } lt i gt

after v Visual trace of quicksort with 3-way partitioning

lo lt gt hi

3-way partitioning 45 46

Duplicate keys: lower bound Sorting summary

Sorting lower bound. If there are n distinct keys and the ith one occurs xi times, any compare-based sorting algorithm must use at least inplace? stable? best average worst remarks n N! xi lg x lg N lg N when all distinct; x ! x ! x ! ⇤ i N selection ✔ ½ N 2 ½ N 2 ½ N 2 N exchanges 1 2 n i=1 linear when only a constant number of distinct keys ··· ⇥ ⇤ compares in the worst case. use for small N insertion ✔ ✔ N ¼ N 2 ½ N 2 or partially ordered

tight code; shell ✔ N log N ? c N 3/2 3 subquadratic

N log N guarantee; merge ✔ ½ N lg N N lg N N lg N Proposition. [Sedgewick-Bentley 1997] proportional to lower bound stable Quicksort with 3-way partitioning is entropy-optimal. improves mergesort ✔ N N lg N N lg N when preexisting order Pf. [beyond scope of course] N log N probabilistic guarantee; quick ✔ N lg N 2 N ln N ½ N 2 fastest in practice

improves quicksort 3-way quick ✔ N 2 N ln N ½ N 2 Bottom line. Quicksort with 3-way partitioning reduces running time when duplicate keys from linearithmic to linear in broad class of applications. ? ✔ ✔ N N lg N N lg N holy sorting grail

47 48 Sorting applications

Sorting algorithms are essential in a broad variety of applications: ・Sort a list of names. ・Organize an MP3 library. obvious applications ・Display Google PageRank results. 2.3 QUICKSORT ・List RSS feed in reverse chronological order.

‣ quicksort ・Find the median. Identify statistical outliers. problems become easy once items ‣ selection ・ are in sorted order Binary search in a database. ‣ duplicate keys ・ Algorithms ・Find duplicates in a mailing list. ‣ system sorts ・Data compression. ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ・Computer graphics. non-obvious applications ・Computational biology. ・Load balancing on a parallel computer. . . .

50

War story (system sort in C) War story (system sort in C)

A beautiful bug report. [Allan Wilks and Rick Becker, 1991] Bug. A qsort() call that should have taken seconds was taking minutes.

We found that qsort is unbearably slow on "organ-pipe" inputs like "01233210": Why is qsort() so slow? main (int argc, char**argv) { int n = atoi(argv[1]), i, x[100000]; for (i = 0; i < n; i++) x[i] = i; for ( ; i < 2*n; i++) x[i] = 2*n-i-1; qsort(x, 2*n, sizeof(int), intcmp); } At the time, almost all qsort() implementations based on those in: Here are the timings on our machine: Version 7 Unix (1979): quadratic time to sort organ-pipe arrays. $ time a.out 2000 ・ real 5.85s ・BSD Unix (1983): quadratic time to sort random arrays of 0s and 1s. $ time a.out 4000 real 21.64s $time a.out 8000 real 85.11s

51 52 Engineering a system sort (in 1993) A beautiful mailing list post (Yaroslavskiy, September 2009)

Bentley-McIlroy quicksort. samples 9 items Replacement of quicksort in java.util.Arrays with new dual-pivot quicksort ・Cutoff to insertion sort for small subarrays. ・Partitioning item: median of 3 or Tukey's ninther. Hello All, ・Partitioning scheme: Bentley-McIlroy 3-way partitioning. I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one. similar to Dijkstra 3-way partitioning SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(11), 1249–1265(but fewer (NOVEMBER exchanges 1993) when not many equal keys) ...

The new Dual-Pivot Quicksort uses *two* pivots elements in this manner: Engineering a Sort Function 1. Pick an elements P1, P2, called pivots from the array.

JON L. BENTLEY 2. Assume that P1 <= P2, otherwise swap it. M. DOUGLAS McILROY AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A. 3. Reorder the array into three parts: those less than the smaller pivot, those larger than the larger pivot, and in between are those elements between (or equal to) the two pivots. SUMMARY 4. Recursively sort the sub-arrays. We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design The invariant of the Dual-Pivot Quicksort is: techniques apply in domains beyond sorting.

KEY WORDS Quicksort Sorting algorithms Performance tuning Algorithm design and implementation Testing [ < P1 | P1 <= & <= P2 } > P2 ]

INTRODUCTION C libraries have long included a qsort function to sort an array, usually implemented by ... Very widelyHoare’s used. Quicksort. C,1 BecauseC++, existing Javaqsort 6,s are…. flawed, we built a new one. This paper summarizes its evolution. Compared to existing library sorts, our new qsort is faster—typically about twice as http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-September/002630.html fast—clearer, and more robust under nonrandom inputs. It uses some standard Quicksort 53 54 tricks, abandons others, and introduces some new tricks of its own. Our approach to build- ing a qsort is relevant to engineering other algorithms. The qsort on our home system, based on Scowen’s ‘Quickersort’,2 had served faith- fully since Lee McMahon wrote it almost two decades ago. Shipped with the landmark Sev- enth Edition Unix System,3 it became a model for other qsorts. Yet in the summer of 1991 our colleagues Allan Wilks and Rick Becker found that a qsort run that should have A beautifultaken mailing a few minutes list was chewingpost up (Yaroslavskiy-Bloch-Bentley, hours of CPU time. Had they not interrupted it, it October 2009) Dual-pivot quicksort would have gone on for weeks.4 They found that it took n 2 comparisons to sort an ‘organ- pipe’ array of 2n integers: 123..nn.. 321. Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones.5 In fact, among a dozen different Use two partitioning items p1 and p2 and partition into three subarrays: ReplacementUnix librariesof quicksort we found no inqsort java.util.Arraysthat could not easily bewith driven new to quadratic dual-pivot behavior; all quicksort were derived from the Seventh Edition or from the 1983 Berkeley function. The Seventh ・Keys less than p1. Date: Thu, 29 Oct 2009 11:19:39 +0000 0038-0644/93/111249–17$13.50 Received 21 August 1992 ・Keys between p1 and p2. Subject: Replace 1993 by Johnquicksort Wiley & Sons, in Ltd. java.util.Arrays withRevised dual-pivot 10 May 1993 implementation ・Keys greater than p2. Changeset: b05abb410c52 Author: alanb Date: 2009-10-29 11:18 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/b05abb410c52 < p1 p1 ≥ p1 and ≤ p2 p2 > p2 6880672: Replace quicksort in java.util.Arrays with dual-pivot implementation Reviewed-by: jjb Contributed-by: vladimir.yaroslavskiy at sun.com, joshua.bloch at google.com, lo lt gt hi jbentley at avaya.com

! make/java/java/FILES_java.gmk ! src/share/classes/java/util/Arrays.java + src/share/classes/java/util/DualPivotQuicksort.java Recursively sort three subarrays.

http://mail.openjdk.java.net/pipermail/compiler-dev/2009-October.txt

degenerates to Dijkstra's 3-way partitioning

Note. Skip middle subarray if p1 = p2.

55 56 Dual-pivot partitioning demo Dual-pivot partitioning demo

Initialization. Main loop. Repeat until i and gt pointers cross. ・Choose a[lo] and a[hi] as partitioning items. ・If (a[i] < a[lo]), exchange a[i] with a[lt] and increment lt and i. ・Exchange if necessary to ensure a[lo] ≤ a[hi]. ・Else if (a[i] > a[hi]), exchange a[i] with a[gt] and decrement gt. ・Else, increment i.

< p1 p1 ≥ p1 and ≤ p2 ? p2 > p2

lo lt i gt hi

S E A Y R L F V Z Q T C M K K E A F R L M C Z Q T V Y S

lo hi lo lt i gt hi

exchange a[lo] and a[hi]

Dual-pivot partitioning demo Dual-pivot quicksort

Finalize. Use two partitioning items p1 and p2 and partition into three subarrays: ・Exchange a[lo] with a[--lt]. ・Keys less than p1. ・Exchange a[hi] with a[++gt]. ・Keys between p1 and p2. ・Keys greater than p2.

< p1 p1 ≥ p1 and ≤ p2 p2 > p2 < p1 p1 ≥ p1 and ≤ p2 p2 > p2

lo lt gt hi lo lt gt hi

C E A F K L M R Q S Z V Y T

lo lt gt hi Now widely used. Java 7, Python unstable sort, Android, …

3-way partitioned 60 Three-pivot quicksort Quicksort quiz 4

Use three partitioning items p1, p2, and p3 and partition into four subarrays: Why do 2-pivot (and 3-pivot) quicksort perform better than 1-pivot?

・Keys less than p1. A. Fewer compares. ・Keys between p1 and p2. B. Fewer exchanges. ・Keys between p2 and p3. ・Keys greater than p3. C. Fewer cache misses. D. I don't know. < p1 p1 ≥ p1 and ≤ p2 p2 ≥ p2 and ≤ p3 p3 > p3

lo a1 a3 hi

Multi-Pivot Quicksort: Theory and Experiments

Shrinu Kushagra Alejandro L´opez-Ortiz J. Ian Munro [email protected] [email protected] [email protected] University of Waterloo University of Waterloo University of Waterloo Aurick Qiao [email protected] University of Waterloo November 7, 2013

Abstract In 2009, Vladimir Yaroslavskiy introduced a novel 61 62 The idea of multi-pivot quicksort has recently received dual-pivot partitioning algorithm. When run on a bat- the attention of researchers after Vladimir Yaroslavskiy tery of tests under the JVM, it outperformed the stan- proposed a dual pivot quicksort algorithm that, con- dard quicksort algorithm [10]. In the subsequent re- trary to prior intuition, outperforms standard quicksort lease of Java 7, the internal sorting algorithm was re- by a a significant margin under the Java JVM [10]. More placed by Yaroslavskiy’s variant. Three years later, recently, this algorithm has been analysed in terms of Wild and Nebel [9] published a rigorous average-case Quicksort comparisonsquiz 4 and swaps by Wild and Nebel [9]. Our con- analysis of the algorithm. They stated that the previ- Which sorting algorithm to use? tributions to the topic are as follows. First, we perform ous lower bound relied on assumptions that no longer the previous experiments using a native C implementa- hold in Yaroslavskiy’s implementation. The dual pivot tion thus removing potential extraneous e↵ects of the approach actually uses less comparisons (1.9n ln n vs JVM. Second, we provide analyses on cache behavior 2.0n ln n) on average. However, the di↵erence in run- Why do 2-pivotof these algorithms.(and We3-pivot) then provide strong quicksort evidence time isperform much greater than better the di↵erence inthan number of1-pivot? Many sorting algorithms to choose from: that cache behavior is causing most of the performance comparisons. We address this issue and provide an ex- planation in 5. di↵erences in these algorithms. Additionally, we build § upon prior work in multi-pivot quicksort and propose Aum¨uller and Dietzfelbinger [1] (ICALP2013) have A. Fewera 3-pivot compares. variant that performs very well in theory and recently addressed the following question: If the previ- practice. We show that it makes fewer comparisons and ous lower bound does not hold, what is really the best has better cache behavior than the dual pivot quicksort we can do with two pivots? They prove a 1.8n ln n lower in the expected case. We validate this with experimen-# entriesbound scanned on the number is ofa comparisonsgood proxy for all dual-pivot sorts algorithms B. Fewer exchanges. quicksort algorithms and introduced an algorithm that tal results, showing a 7-8% performance improvementfor cache performance when in our tests. actually achieves that bound. In their experimentation, comparingthe algorithm quicksort is outperformed variants by Yaroslavskiy’s quick- 1 Introduction sort when sorting integer data. However, their algo- C. Fewer cache misses. rithm does perform better with large data (eg. strings) elementary sorts insertion sort, , bubblesort, shaker sort, ... 1.1 Background Up until about a decade ago it was since comparisons incur high cost. thought that the classic quicksort algorithm [3] using one pivot is superior to any multi-pivot scheme. It was 1.2 The Processor-Memory Performance Gap previously believed that using more pivots introduces Both presently and historically, the performance of CPU quicksort, mergesort, , , , ... too much overhead for the advantages gained. In 2002, subquadratic sorts registers have far outpaced that of main memory. Ad- Sedgewickpartitioning and Bentley [7] recognised andcompares outlined some exchanges entries scanned

Downloaded 02/07/14 to 96.248.80.75. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php ditionally, this performance gap between the processor of the advantages to a dual-pivot quicksort. However, and memory has been increasing since their introduc- the implementation did not perform as well as the classic tion. Every year, the performance of memory improves quicksort algorithm [9] and this path was not explored system sorts dual-pivot quicksort, timsort, , ... by about 10% while that of the processor improves by again until recent years. 1-pivot 2 N ln N 60% [5]. The performance0.333 N di ↵lnerence N grows so quickly 2 N ln N

external sorts Poly-phase mergesort, cascade-merge, psort, .… 47 Copyright © 2014. median-of-3 1.714 N ln N by the Society0.343 for Industrial N ln andN Applied Mathematics.1.714 N ln N

MSD, LSD, 3-way radix quicksort, … 2-pivot 1.9 N ln N 0.6 N ln N 1.6 N ln N radix sorts

3-pivot 1.846 N ln N 0.616 N ln N 1.385 N ln N parallel sorts bitonic sort, odd-even sort, smooth sort, GPUsort, ...

Reference: Analysis of Pivot Sampling in Dual-Pivot Quicksort by Wild-Nebel-Martínez

Bottom line. Caching can have a significant impact on performance.

beyond scope of this course 63 64 Which sorting algorithm to use? System sort in Java 7

Applications have diverse attributes. Arrays.sort(). ・Stable? ・Has method for objects that are Comparable. ・Parallel? ・Has overloaded method for each primitive type. ・In-place? ・Has overloaded method for use with a Comparator. ・Deterministic? ・Has overloaded methods for sorting subarrays. ・Duplicate keys? ・Multiple key types? Algorithms. ・Linked list or arrays? ・Dual-pivot quicksort for primitive types. Large or small items? Timsort for reference types. ・ many more combinations of ・ ・Randomly-ordered array? attributes than algorithms ・Guaranteed performance? Q. Why use different algorithms for primitive and reference types?

Q. Is the system sort good enough? A. Usually.

65 66

Ineffective sorts

http://xkcd.com/1185 67