<<

Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

2.3

‣ quicksort ‣ selection Algorithms ‣ duplicate keys FOURTH EDITION ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Two classic sorting algorithms: mergesort and quicksort

Critical components in the world’s computational infrastructure. ・Full scientific understanding of their properties has enabled us to develop them into practical system sorts. ・Quicksort honored as one of top 10 algorithms of 20th century in science and .

Mergesort. [last lecture]

...

Quicksort. [this lecture]

...

2 Quicksort t-shirt

3 2.3 QUICKSORT

‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Quicksort overview

Step 1. Shuffle the array. Step 2. Partition the array so that, for some j ・Entry a[j] is in place. ・No larger entry to the left of j. ・No smaller entry to the right of j. Step 3. each subarray recursively.

input Q U I K S O R T E X A M P L E shufe K R A T E L E P U I M Q C X O S partitioning item

partition E C A I E K L P U T M Q R X O S not greater not less sort left A C E E I K L P U T M Q R X O S sort right A C E E I K L M O P Q R S T U X result A C E E I K L M O P Q R S T U X Quicksort overview 5 number). 9.9 X 10 45 is used to represent infinity. Imaginary conunent I and J are output variables, and A is the array (with valuesTony of x may Hoarenot be negative and reM values of x may not be subscript bounds M:N) which is operated upon by this procedure. smaller than 1. Partition takes the value X of a random element of the array A, Values of Qd~'(x) may be calculated easily by hypergeometric and rearranges the values of the elements of the array in such a series if x is not too small nor (n - m) too large. Q~m(x) can be way that there exist integers I and J with the following properties : computed from an appropriate set of values of Pnm(X) if X is near M _-< J < I =< NprovidedM < N 1.0 or ix is near 0. Loss of significant digits occurs for x as small as A[R] =< XforM =< R _-< J 1.1 if n is largerInvented than 10. Loss of significant quicksort digits is a major diffi-to translateA[R] = RussianXforJ < R < I into English. culty・ in using finite polynomiM representations also if n is larger A[R] ~ Xfor I =< R ~ N than m. However, QLEG has been tested in regions of x and n The procedure uses an integer procedure random (M,N) which both large and[ butsmall; couldn't explain hischooses algorithmequiprobably a random integer or F implementbetween M and N, and it! ] procedure・ QLEG(m, nmax, x, ri, R, Q); value In, nmax, x, ri; also a procedure exchange, which exchanges the values of its two real In, mnax, x, ri; real array R, Q; parameters ; begin real t, i, n, q0, s; begin real X; integer F; ・n :=Learned 20; Algol 60 (and recursion).F := random (M,N); X := A[F]; if nmax > 13 then I:=M; J:=N; n := nmax +7; up: for I : = I step 1 until N do if Implementedri = 0then quicksort. if X < A [I] then go to down; ・ begin ifm = 0then I:=N; Q[0] := 0.5 X 10g((x + 1)/(x - 1)) down: forJ := J step --1 until M do else if A[J]

Communications of the &CM 319

・Invented quicksort to translate Russian into English. ・ [ but couldn't explain his algorithm or implement it! ] ・Learned Algol 60 (and recursion). ・Implemented quicksort.

Tony Hoare “ There are two ways of constructing a design: One way is 1980 Turing Award to it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. ”

“ I call it my billion-dollar mistake. It was the invention of the null reference in 1965… This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. ”

7 Bob Sedgewick

・Refined and popularized quicksort. ・Analyzed many versions of quicksort.

rithms, and we can learn from that experience to separate Bob Sedgewick good algorithms from bad ones. Third, if the tile fits into the memory of the computer, there is one algorithm, called Quicksort, which has been shown to perform well Programming S. L. Graham, R. L. Rivest in a variety of situations. Not only is this algorithm Techniques Editors simpler than many other sorting algorithms, but empir- ical [2, ll, 13, 21] and analytic [9] studies show that Quicksort can be expected to be up to twice as fast as its Implementing nearest competitors. The method is simple enough to be learned by programmersActa Informatica who have no7, 327--355previous experi-(1977) Quicksort Programs ence with sorting, and9 by those Springer-Verlag who do know 1977 other sorting Robert Sedgewick methods should also find it profitable to learn about Brown University Quicksort. Because of its prominence, it is appropriate to study how Quicksort might be improved.The This Analysis subject has of Quicksort Programs* This paper is a practical study of how to implement received considerable attention (see, for example, [1, 4, the Quicksort sorting algorithm and its best variants on 11, 13, 14, 18, 20]), but few real improvements have beenRobert Sedgewick suggested beyond those described by C.A.R. Hoare, the real computers, including how to apply various code Received January 19, t976 optimization techniques. A detailed implementation inventor of Quicksort, in his original papers [5, 6]. Hoare also showed how to analyze Quicksort and predict its combining the most effective improvements to Summary. The Quicksort sorting algorithm and its best variants are presented Quicksort is given, along with a discussion of how to running time. The analysis has since been extended to the improvements thatand heanalyzed. suggested, Results and used are derivedto indicate which make it possible to obtain exact formulas de- implement it in assembly language. Analytic results scribing the total expected running time of particular implementations on real com- describing the performance of the programs are how they may bestputers be implementedof Quick,sort [9, and 15, an 17]. improvement The called the median-of-three modification. summarized. A variety of special situations are subject of the carefulDetailed implementation analysis of ofthe Quicksort effect of hasan implementation technique called loop unwrapping considered from a practical standpoint to illustrate not been studied asis presented.widely as globalThe paper improvements is intended to not only to present results of direct practical utility, Quicksort's wide applicability as an internal sorting the algorithm, butbut the also savings to illustrate to be therealized intriguing are as mathematics which arises in the complete analysis of this important algorithm. method which requires negligible extra storage. significant. The history of Quicksort is quite complex, Key Words and Phrases: Quicksort, analysis of and [15] contains a full survey of the many variants algorithms, code optimization, sorting which, have been proposed. 1. Introduction CR Categories: 4.0, 4.6, 5.25, 5.31, 5.5 The purpose of this paper is to describe in detail how Quicksort can best beIn implemented t96t-62 C.A.R. to handle Hoare actual presented a new algorithm called Quicksort [7, 8] applications on realwhich computers. is suitable A general for puttingdescription files of into order by computer. This method combines the algorithm is followedelegance by and descriptions efficiency, of andthe mostit remains today the most useful general-purpose effective improvementssorting that method have forbeen computers. proposed The(as practical utility of the algorithm has meant demonstrated in [15]).not only Next, that an it implementationhas been sfibjected of to countless modifications (though few real 8 Quicksort in a typicalimprovements high level language have been is presented, suggested beyond those described by Hoare), but also Introduction and assembly languagethat implementationit has been used issues .in arecountless con- applications, often to sort very large, files. sidered. This discussionConsequently, should easily it is translateimportant to toreal be able to estimate how long an implementation One of the most widely studied practical problems in languages on real machines.of Quicksort Finally, can a benumber expected of special to run, in order to be able to compare variants or is sorting: the use of a computer to put issues are consideredestimate which expenses.may be of Fortunately, importance inas we shall see, this is an algorithm which can be files in order. A person wishing to use a computer to sort particular sorting applications.analyzed. (Hoare recognized this, and gave some analytic results in [8].) It is is faced with the problem of determining which of the This paper is intendedpossible to beto a deriveself-contained exact overviewformulas describing the average performance of real many available algorithms is best suited for his purpose. of the properties ofimplementations Quicksort for use byof thosethe algorithm. who need This task is becoming less difficult than it once was for to actually implement andThe use history the algorithm. of Quicksort A compan- is quite complex, and a full survey of the many variants three reasons. First, sorting is an area in which the ion paper [17] provideswhich the have analytical been proposed results which is given - in [t 7]. In addition, [t 7] gives analytic results mathematical analysis of algorithms has been particu- port much of the discussion presented here. describing many of the improvements which have been suggested for the purpose larly successful: we can predict the performance of many sorting methods and compare them intelligently. Second, of determining which are the most effective. There are many examples in [~ 7] we have a great deal of experience using sorting algo- The Algofithm which illustrate that the simplicity of Quicksort is deceiving. The algorithm has hidden subtleties which can have significant effects on performance. Furthermore, Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct Quicksort is a recursiveas we shall method see, simplefor sorting changes an array to the algorithm or its implementation can radically commercial advantage, the ACM copyright notice and the title of the A[1], A[2] ..... A[N]change by first the "partitioning" analysis. itIn so this that paper,the we shall consider in detail how practical publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy following conditionsimplementations hold: of the best versions of Quicksort may be analyzed. otherwise, or to republish, requires a fee and/or specific permission. In this paper, we will deal with the analysis of: (i) the basic Quicksort algo- (i) Some key v is in its final position in the array. (If it This work was supported in part by the Fannie and John Hertz rithm; (ii) an improvement called the "median-of-three" modification which Foundation and in part by NSF Grants. No. GJ-28074 and MCS75- is thejth smallest, it is in position A[j].) 23738. reduces the average number of comparisons required; and (iii) an implementation (ii) All elements to the left of A[j] are less than or equal Author's address: Division of Applied Mathematics and Computer technique called "loop unwrapping" which reduces the amount of overhead per Science Program, Brown University, Providence, RI 02912. to it. (These elements A [ 1], A [2] ..... A [j - 1] are © 1978 ACM 0001-0782/78/1000-0847 $00.75 called the "leftcomparison. subtile.") These particular methods not only represent the most effective vari-

847 Communications * This workOctober was 1978 supported in part by the Fannie and John Hertz Foundation, and of in part byVolume the National21 Science Foundation Grants No. GJ-28074 and MCS75-23738. the ACM 22 Acta Informatica,Number 10Vol. 7 Quicksort partitioning demo

Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j].

K R A T E L E P U I M Q C X O S

lo i j

9 Quicksort partitioning demo

Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j].

When pointers cross. ・Exchange a[lo] with a[j].

E C A I E K L P U T M Q R X O S

lo j hi

partitioned! 10 The music of quicksort partitioning (by Brad Lyon)

https://googledrive.com/host/0B2GQktu-wcTicjRaRjV1NmRFN1U/index.html

11 Quicksort: Java code for partitioning

private static int partition(Comparable[] a, int lo, int hi) { int i = lo, j = hi+1; while (true) {

while (less(a[++i], a[lo])) find item on left to swap if (i == hi) break;

while (less(a[lo], a[--j])) find item on right to swap if (j == lo) break;

if (i >= j) break; check if pointers cross exch(a, i, j); swap }

exch(a, lo, j); swap with partitioning item before v return j; return index of item now known to be in place } lo hi

before v during v Յ v Ն v

lo hi i j

after before v during v Յ v Ն v Յ v v Ն v

lo hi i j lo j hi during v Յ v Ն v after Յ v v Ն v Quicksort partitioning overview 12 i j lo j hi

after Յ v v Ն v Quicksort partitioning overview

lo j hi

Quicksort partitioning overview Quicksort quiz 1

How many compares to partition an array of length N ?

A. ~ ¼ N

B. ~ ½ N

C. ~ N

D. ~ N lg N

E. I don't know.

scan until ≤ M

scan until ≥ M

M A B C D E V W X Y Z

0 1 2 3 4 5 6 7 8 9 10

N+1 compares in worst case (E and V are compared with M twice)

13 Quicksort: Java implementation

public class Quick { private static int partition(Comparable[] a, int lo, int hi) { /* see previous slide */ }

public static void sort(Comparable[] a) { StdRandom.shuffle(a); shuffle needed for performance guarantee sort(a, 0, a.length - 1); (stay tuned) }

private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); } }

14 Quicksort trace

lo j hi 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 initial values Q U I C K S O R T E X A M P L E random shufe K R A T E L E P U I M Q C X O S 0 5 15 E C A I E K L P U T M Q R X O S 0 3 4 E C A E I K L P U T M Q R X O S 0 2 2 A C E E I K L P U T M Q R X O S 0 0 1 A C E E I K L P U T M Q R X O S 1 1 A C E E I K L P U T M Q R X O S 4 4 A C E E I K L P U T M Q R X O S 6 6 15 A C E E I K L P U T M Q R X O S no partition 7 9 15 A C E E I K L M O P T Q R X U S for subarrays of size 1 7 7 8 A C E E I K L M O P T Q R X U S 8 8 A C E E I K L M O P T Q R X U S 10 13 15 A C E E I K L M O P S Q R T U X 10 12 12 A C E E I K L M O P R Q S T U X 10 11 11 A C E E I K L M O P Q R S T U X 10 10 A C E E I K L M O P Q R S T U X 14 14 15 A C E E I K L M O P Q R S T U X 15 15 A C E E I K L M O P Q R S T U X

result A C E E I K L M O P Q R S T U X

Quicksort trace (array contents after each partition)

15 Quicksort animation

50 random items

algorithm position in order current subarray not in order http://www.sorting-algorithms.com/quick-sort

16 Quicksort: implementation details

Partitioning in-place. Using an extra array makes partitioning easier (and stable), but is not worth the cost.

Terminating the loop. Testing whether the pointers cross is trickier than it might seem.

Equal keys. When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key. stay tuned

Preserving randomness. Shuffling is needed for performance guarantee. Equivalent alternative. Pick a random partitioning item in each subarray.

17 Quicksort: empirical analysis (1961)

Running time estimates: ・Algol 60 implementation. ・National-Elliott 405 computer.

sorting N 6-word items with 1-word keys Elliott 405 magnetic disc (16K words)

18 Quicksort: empirical analysis

Running time estimates: ・Home PC executes 108 compares/second. ・Supercomputer executes 1012 compares/second.

insertion sort (N2) mergesort (N log N) quicksort (N log N) computer thousand million billion thousand million billion thousand million billion

home instant 2.8 hours 317 years instant 1 second 18 min instant 0.6 sec 12 min

super instant 1 second 1 week instant instant instant instant instant instant

Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones.

19 Quicksort: best-case analysis

Best case. Number of compares is ~ N lg N.

initial values

random shuffle

20 Quicksort: worst-case analysis

Worst case. Number of compares is ~ ½ N 2 .

initial values

random shuffle

21 Quicksort: average-case analysis

Proposition. The average number of compares CN to quicksort an array of

N distinct keys is ~ 2N ln N (and the number of exchanges is ~ ⅓ N ln N ).

Pf. CN satisfies the recurrence C0 = C1 = 0 and for N ≥ 2: left right partitioning

C0 + CN 1 C1 + CN 2 CN 1 + C0 C =(N + 1) + + + ... + N N N N ・Multiply both sides by N and collect terms: partitioning probability

NCN = N(N + 1) + 2(C0 + C1 + ... + CN 1) ・Subtract from this equation the same equation for N - 1:

NCN (N 1)CN 1 =2N +2CN 1 ・Rearrange terms and divide by N (N + 1):

CN CN 1 2 = + N +1 N N +1

22 Quicksort: average-case analysis

・Repeatedly apply previous equation: CCNN CCNN 1 1 22 = ++ NN+1+1 NN NN+1+1 CN 2 2 2 = + + substitute previous equation N 1 N N +1 CN 3 2 2 2 = + + + N 2 N 1 N N +1 2 2 2 2 = + + + ... + 3 4 5 N +1

・Approximate sum by an integral: 1 1 1 1 C = 2(N + 1) + + + ... N 3 4 5 N +1 ✓ ◆ N+1 1 2(N + 1) dx ⇠ x Z3

・Finally, the desired result: C 2(N + 1) ln N 1.39N lg N N ⇥ 23 Quicksort: summary of performance characteristics

Quicksort is a (Las Vegas) randomized algorithm. ・Guaranteed to be correct. ・Running time depends on random shuffle.

Average case. Expected number of compares is ~ 1.39 N lg N. ・39% more compares than mergesort. ・Faster than mergesort in practice because of less data movement.

Best case. Number of compares is ~ N lg N. Worst case. Number of compares is ~ ½ N 2. [ but more likely that lightning bolt strikes computer during execution ]

24 Quicksort properties

Proposition. Quicksort is an in-place sorting algorithm. Pf. ・Partitioning: constant extra space. ・Depth of recursion: logarithmic extra space (with high probability).

can guarantee logarithmic depth by recurring on smaller subarray before larger subarray (but requires using an explicit stack)

Proposition. Quicksort is not stable. Pf. [ by counterexample ]

i j 0 1 2 3

B1 C1 C2 A1

1 3 B1 C1 C2 A1

1 3 B1 A1 C2 C1

0 1 A1 B1 C2 C1

25 Quicksort: practical improvements

Insertion sort small subarrays. ・Even quicksort has too much overhead for tiny subarrays. ・Cutoff to insertion sort for ≈ 10 items.

private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo + CUTOFF - 1) { Insertion.sort(a, lo, hi); return; } int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); }

26 Quicksort: practical improvements

Median of sample. ・Best choice of pivot item = median. ・Estimate true median by taking median of sample. ・Median-of-3 (random) items.

~ 12/7 N ln N compares (14% fewer) ~ 12/35 N ln N exchanges (3% more)

private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return;

int median = medianOf3(a, lo, lo + (hi - lo)/2, hi); swap(a, lo, median);

int j = partition(a, lo, hi); sort(a, lo, j-1); sort(a, j+1, hi); }

27 2.3 QUICKSORT

‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Selection

Goal. Given an array of N items, find the kth smallest item.

Ex. Min (k = 0), max (k = N - 1), median (k = N / 2).

Applications. ・Order statistics. ・Find the "top k."

Use theory as a guide. ・Easy N log N upper bound. How? ・Easy N upper bound for k = 1, 2, 3. How? ・Easy N lower bound. Why?

Which is true?

・N log N lower bound? is selection as hard as sorting? ・N upper bound? is there a linear-time algorithm?

29 Quick-select

Partition array so that: ・Entry a[j] is in place. ・No larger entry to the left of j. ・No smaller entry to the right of j.

Repeat in one subarray, depending on j; finished when j equals k.

public static Comparable select(Comparable[] a, intbefore k) v { lo if a[k] is here if a[k] is herehi StdRandom.shuffle(a); set hi to j-1 set lo to j+1 int lo = 0, hi = a.length - 1; during v Յ v Ն v while (hi > lo) { i j int j = partition(a, lo, hi); after Յ v v Ն v if (j < k) lo = j + 1; else if (j > k) hi = j - 1; lo j hi else return a[k]; Quicksort partitioning overview } return a[k]; }

30 Quick-select: mathematical analysis

Proposition. Quick-select takes linear time on average.

Pf sketch. ・Intuitively, each partitioning step splits array approximately in half: N + N / 2 + N / 4 + … + 1 ~ 2N compares. ・Formal analysis similar to quicksort analysis yields:

CN = 2 N + 2 k ln (N / k) + 2 (N – k) ln (N / (N – k)) ≤ (2 + 2 ln 2) N

・Ex: (2 + 2 ln 2) N ≈ 3.38 N compares to find median (k = N / 2).

31 Theoretical context for selection

Proposition. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] Compare-based whose worst-case running time is linear.

Time Bounds for Selection

bY . Manuel Blum, Robert W. Floyd, Vaughan Watt, Ronald L. Rive&, and Robert E. Tarjan

Abstract

L The number of comparisons required to select the i-th smallest of

n numbers is shown to be at most a linear function of n by analysis of i a new selection algorithm -- PICK. Specifically, no more than i L 5.4305 n comparisons are ever required. This bound is improved for extreme values of i , and a new lower bound on the requisite number L of comparisons is also proved. Remark. Constants are high ⇒ not used in practice.

Use theory as a guide.

L ・Still worthwhile to seek practical linear-time (worst-case) algorithm. ・Until one is discovered, use quick-select (if you don’t need a full sort).

32

This work was supported by the National Science Foundation under grants GJ-992 and GJ-33170X.

1 2.3 QUICKSORT

‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Duplicate keys

Often, purpose of sort is to bring items with equal keys together. ・Sort population by age. ・Remove duplicates from mailing list. ・Sort job applicants by college attended. sorted by time sorted by city (unstable) sorted by city (stable) Chicago 09:00:00 Chicago 09:25:52 Chicago 09:00:00 Typical characteristics of such applications.Phoenix 09:00:03 Chicago 09:03:13 Chicago 09:00:59 Houston 09:00:13 Chicago 09:21:05 Chicago 09:03:13 ・Huge array. Chicago 09:00:59 Chicago 09:19:46 Chicago 09:19:32 Houston 09:01:10 Chicago 09:19:32 Chicago 09:19:46 ・Small number of key values. Chicago 09:03:13 Chicago 09:00:00 Chicago 09:21:05 Seattle 09:10:11 Chicago 09:35:21 Chicago 09:25:52 Seattle 09:10:25 Chicago 09:00:59 Chicago 09:35:21 Phoenix 09:14:25 Houston 09:01:10 Houston 09:00:13 NOT Chicago 09:19:32 Houston 09:00:13 Houston 09:01:10 sorted Chicago 09:19:46 Phoenix 09:37:44 sorted Phoenix 09:00:03 Chicago 09:21:05 Phoenix 09:00:03 Phoenix 09:14:25 Seattle 09:22:43 Phoenix 09:14:25 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:10:25 Seattle 09:10:11 Chicago 09:25:52 Seattle 09:36:14 Seattle 09:10:25 Chicago 09:35:21 Seattle 09:22:43 Seattle 09:22:43 Seattle 09:36:14 Seattle 09:10:11 Seattle 09:22:54 Phoenix 09:37:44 Seattle 09:22:54 Seattle 09:36:14

Stability when sorting on a second key key

34 War story (system sort in C)

A beautiful bug report. [Allan Wilks and Rick Becker, 1991]

We found that qsort is unbearably slow on "organ-pipe" inputs like "01233210":

main (int argc, char**argv) { int n = atoi(argv[1]), i, x[100000]; for (i = 0; i < n; i++) x[i] = i; for ( ; i < 2*n; i++) x[i] = 2*n-i-1; qsort(x, 2*n, sizeof(int), intcmp); }

Here are the timings on our machine: $ time a.out 2000 real 5.85s $ time a.out 4000 real 21.64s $time a.out 8000 real 85.11s

35 War story (system sort in C)

Bug. A qsort() call that should have taken seconds was taking minutes.

Why is qsort() so slow?

At the time, almost all qsort() implementations based on those in: ・ (1979): quadratic time to sort organ-pipe arrays. ・BSD Unix (1983): quadratic time to sort random arrays of 0s and 1s.

36 Duplicate keys: stop on equal keys

Our partitioning subroutine stops both scans on equal keys.

scan until ≥ P scan until ≤ P

P G E P A Q B P Y C O U P Z S R

Q. Why not continue scans on equal keys?

scan until > P scan until < P

P G E P A Q B P Y C O U P Z S R

37 Quicksort quiz 2

What is the result of partitioning the following array (skip over equal keys)?

scan until > A scan until < A

A A A A A A A A A A A A A A A A

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A. A A A A A A A A A A A A A A A A

B. A A A A A A A A A A A A A A A A

C. A A A A A A A A A A A A A A A A

D. I don't know. 38 Quicksort quiz 3

What is the result of partitioning the following array (stop on equal keys)?

scan until ≥ A scan until ≤ A

A A A A A A A A A A A A A A A A

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A. A A A A A A A A A A A A A A A A

B. A A A A A A A A A A A A A A A A

C. A A A A A A A A A A A A A A A A

D. I don't know. 39 Partitioning an array with all equal keys

40 Duplicate keys: partitioning strategies

Bad. Don't stop scans on equal keys. [ ~ ½ N 2 compares when all keys equal ]

B A A B A B B B C C C A A A A A A A A A A A

Good. Stop scans on equal keys. [ ~ N lg N compares when all keys equal ]

B A A B A B C C B C B A A A A A A A A A A A

Better. Put all equal keys in place. How? [ ~ N compares when all keys equal ]

A A A B B B B B C C C A A A A A A A A A A A

41 DUTCH NATIONAL FLAG PROBLEM

Problem. [Edsger Dijkstra] Given an array of N buckets, each containing a red, white, or blue pebble, sort them by color.

input

sorted

Operations allowed. ・swap(i, j): swap the pebble in bucket i with the pebble in bucket j. ・color(i): color of pebble in bucket i.

Requirements. ・Exactly N calls to color(). ・At most N calls to swap(). ・Constant extra space.

42 3-way partitioning

Goal. Partition array into three parts so that: ・Entries between lt and gt equal to the partition item. ・No larger entries to left of lt. ・No smaller entries to right of gt. before v

lo hi before v during v

lo hi lt i gt during v after v

lt i gt lo lt gt hi after v 3-way partitioning lo lt gt hi

3-way partitioning Dutch national flag problem. [Edsger Dijkstra] ・Conventional wisdom until mid 1990s: not worth doing. ・Now incorporated into C library qsort() and Java 6 system sort.

43 Dijkstra 3-way partitioning demo

・Let v be partitioning item a[lo]. ・Scan i from left to right. – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i – (a[i] > v): exchange a[gt] with a[i]; decrement gt – (a[i] == v): increment i

lt i gt

P A B X W P P V P D P C Y Z

lo hi

before v invariant lo hi

during v

lt i gt

after v 44

lo lt gt hi

3-way partitioning Dijkstra 3-way partitioning demo

・Let v be partitioning item a[lo]. ・Scan i from left to right. – (a[i] < v): exchange a[lt] with a[i]; increment both lt and i – (a[i] > v): exchange a[gt] with a[i]; decrement gt – (a[i] == v): increment i

lt gt

A B C D P P P P P V W Y Z X

lo hi

before v invariant lo hi

during v

lt i gt

after v 45

lo lt gt hi

3-way partitioning 3-way quicksort: Java implementation

private static void sort(Comparable[] a, int lo, int hi) { if (hi <= lo) return; int lt = lo, gt = hi; Comparable v = a[lo]; int i = lo; while (i <= gt) { int cmp = a[i].compareTo(v); if (cmp < 0) exch(a, lt++, i++); else if (cmp > 0) exch(a, i, gt--); else i++; } before v

sort(a, lo, lt - 1); lo hi

sort(a, gt + 1, hi); during v } lt i gt after v

lo lt gt hi

3-way partitioning 46 3-way quicksort: visual trace

equal to partitioning element

Visual trace of quicksort with 3-way partitioning

47 Duplicate keys: lower bound

Sorting lower bound. If there are n distinct keys and the ith one occurs xi times, then any compare-based sorting algorithm must use at least n N! xi lg x lg N lg N when all distinct; x ! x ! x ! ⇤ i N 1 2 n i=1 linear when only a constant number of distinct keys ··· ⇥ ⇤ compares in the worst case.

Proposition. The expected number of compares to 3-way quicksort an array is entropy optimal (proportional to sorting lower bound).

Pf. [beyond scope of course]

Bottom line. Quicksort with 3-way partitioning reduces running time from linearithmic to linear in broad class of applications.

48 Sorting summary

inplace? stable? best average worst remarks

selection ✔ ½ N 2 ½ N 2 ½ N 2 N exchanges

use for small N insertion ✔ ✔ N ¼ N 2 ½ N 2 or partially ordered

tight code; shell ✔ N log N ? c N 3/2 3 subquadratic

N log N guarantee; merge ✔ ½ N lg N N lg N N lg N stable

improves mergesort timsort ✔ N N lg N N lg N when preexisting order

N log N probabilistic guarantee; quick ✔ N lg N 2 N ln N ½ N 2 fastest in practice

improves quicksort 3-way quick ✔ N 2 N ln N ½ N 2 when duplicate keys

? ✔ ✔ N N lg N N lg N holy sorting grail

49 2.3 QUICKSORT

‣ quicksort ‣ selection Algorithms ‣ duplicate keys ‣ system sorts

ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Sorting applications

Sorting algorithms are essential in a broad variety of applications: ・Sort a list of names. Organize an MP3 library. ・ obvious applications ・Display Google PageRank results. ・List RSS feed in reverse chronological order.

・Find the median. ・Identify statistical outliers. problems become easy once ・Binary search in a database. items are in sorted order ・Find duplicates in a mailing list.

・Data compression. Computer graphics. ・ non-obvious applications ・Computational biology. ・Load balancing on a parallel computer. . . .

51 Engineering a system sort (in 1993)

Bentley-McIlroy quicksort. sample 9 items ・Cutoff to insertion sort for small subarrays. ・Partitioning item: median of 3 or Tukey's ninther. ・Partitioning scheme: Bentley-McIlroy 3-way partitioning.

similar to Dijkstra 3-way partitioning SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(11), 1249–1265(but fewer (NOVEMBER exchanges 1993) when not many equal keys)

Engineering a Sort Function

JON L. BENTLEY M. DOUGLAS McILROY AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A.

SUMMARY We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design techniques apply in domains beyond sorting.

KEY WORDS Quicksort Sorting algorithms Performance tuning Algorithm design and implementation Testing

INTRODUCTION C libraries have long included a qsort function to sort an array, usually implemented by Very widelyHoare’s used. Quicksort. C,1 BecauseC++, existing Javaqsort 6,s are…. flawed, we built a new one. This paper summarizes its evolution. Compared to existing library sorts, our new qsort is faster—typically about twice as fast—clearer, and more robust under nonrandom inputs. It uses some standard Quicksort 52 tricks, abandons others, and introduces some new tricks of its own. Our approach to build- ing a qsort is relevant to engineering other algorithms. The qsort on our home system, based on Scowen’s ‘Quickersort’,2 had served faith- fully since Lee McMahon wrote it almost two decades ago. Shipped with the landmark Sev- enth Edition Unix System,3 it became a model for other qsorts. Yet in the summer of 1991 our colleagues Allan Wilks and Rick Becker found that a qsort run that should have taken a few minutes was chewing up hours of CPU time. Had they not interrupted it, it would have gone on for weeks.4 They found that it took n 2 comparisons to sort an ‘organ- pipe’ array of 2n integers: 123..nn.. 321. Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones.5 In fact, among a dozen different Unix libraries we found no qsort that could not easily be driven to quadratic behavior; all were derived from the Seventh Edition or from the 1983 Berkeley function. The Seventh

0038-0644/93/111249–17$13.50 Received 21 August 1992  1993 by John Wiley & Sons, Ltd. Revised 10 May 1993 A beautiful mailing list post (Yaroslavskiy, September 2009)

Replacement of quicksort in java.util.Arrays with new dual-pivot quicksort

Hello All,

I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one.

...

The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:

1. Pick an elements P1, P2, called pivots from the array. 2. Assume that P1 <= P2, otherwise swap it. 3. Reorder the array into three parts: those less than the smaller pivot, those larger than the larger pivot, and in between are those elements between (or equal to) the two pivots. 4. Recursively sort the sub-arrays.

The invariant of the Dual-Pivot Quicksort is:

[ < P1 | P1 <= & <= P2 } > P2 ]

...

http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-September/002630.html 53 A beautiful mailing list post (Yaroslavskiy-Bloch-Bentley, October 2009)

Replacement of quicksort in java.util.Arrays with new dual-pivot quicksort

Date: Thu, 29 Oct 2009 11:19:39 +0000 Subject: Replace quicksort in java.util.Arrays with dual-pivot implementation

Changeset: b05abb410c52 Author: alanb Date: 2009-10-29 11:18 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/b05abb410c52

6880672: Replace quicksort in java.util.Arrays with dual-pivot implementation Reviewed-by: jjb Contributed-by: vladimir.yaroslavskiy at sun.com, joshua.bloch at google.com, jbentley at avaya.com

! make/java/java/FILES_java.gmk ! src/share/classes/java/util/Arrays.java + src/share/classes/java/util/DualPivotQuicksort.java

http://mail.openjdk.java.net/pipermail/compiler-dev/2009-October.txt

54 Dual-pivot quicksort

Use two partitioning items p1 and p2 and partition into three subarrays:

・Keys less than p1. ・Keys between p1 and p2. ・Keys greater than p2.

< p1 p1 ≥ p1 and ≤ p2 p2 > p2

lo lt gt hi

Recursively sort three subarrays.

degenerates to Dijkstra's 3-way partitioning

Note. Skip middle subarray if p1 = p2.

55 Dual-pivot partitioning demo

Initialization. ・Choose a[lo] and a[hi] as partitioning items. ・Exchange if necessary to ensure a[lo] ≤ a[hi].

S E A Y R L F V Z Q T C M K

lo hi

exchange a[lo] and a[hi] Dual-pivot partitioning demo

Main loop. Repeat until i and gt pointers cross. ・If (a[i] < a[lo]), exchange a[i] with a[lt] and increment lt and i. ・Else if (a[i] > a[hi]), exchange a[i] with a[gt] and decrement gt. ・Else, increment i.

< p1 p1 ≥ p1 and ≤ p2 ? p2 > p2

lo lt i gt hi

K E A F R L M C Z Q T V Y S

lo lt i gt hi Dual-pivot partitioning demo

Finalize. ・Exchange a[lo] with a[--lt]. ・Exchange a[hi] with a[++gt].

< p1 p1 ≥ p1 and ≤ p2 p2 > p2

lo lt gt hi

C E A F K L M R Q S Z V Y T

lo lt gt hi

3-way partitioned Dual-pivot quicksort

Use two partitioning items p1 and p2 and partition into three subarrays:

・Keys less than p1. ・Keys between p1 and p2. ・Keys greater than p2.

< p1 p1 ≥ p1 and ≤ p2 p2 > p2

lo lt gt hi

Now widely used. Java 7, Python unstable sort, Android, …

59 Three-pivot quicksort

Use three partitioning items p1, p2, and p3 and partition into four subarrays:

・Keys less than p1. ・Keys between p1 and p2. ・Keys between p2 and p3. ・Keys greater than p3.

< p1 p1 ≥ p1 and ≤ p2 p2 ≥ p2 and ≤ p3 p3 > p3

lo a1 a2 a3 hi

Multi-Pivot Quicksort: Theory and Experiments

Shrinu Kushagra Alejandro L´opez-Ortiz J. Ian Munro [email protected] [email protected] [email protected] University of Waterloo University of Waterloo University of Waterloo Aurick Qiao [email protected] University of Waterloo November 7, 2013

Abstract In 2009, Vladimir Yaroslavskiy introduced a novel 60 The idea of multi-pivot quicksort has recently received dual-pivot partitioning algorithm. When run on a bat- the attention of researchers after Vladimir Yaroslavskiy tery of tests under the JVM, it outperformed the stan- proposed a dual pivot quicksort algorithm that, con- dard quicksort algorithm [10]. In the subsequent re- trary to prior intuition, outperforms standard quicksort lease of Java 7, the internal sorting algorithm was re- by a a significant margin under the Java JVM [10]. More placed by Yaroslavskiy’s variant. Three years later, recently, this algorithm has been analysed in terms of Wild and Nebel [9] published a rigorous average-case comparisons and swaps by Wild and Nebel [9]. Our con- analysis of the algorithm. They stated that the previ- tributions to the topic are as follows. First, we perform ous lower bound relied on assumptions that no longer the previous experiments using a native C implementa- hold in Yaroslavskiy’s implementation. The dual pivot tion thus removing potential extraneous e↵ects of the approach actually uses less comparisons (1.9n ln n vs JVM. Second, we provide analyses on cache behavior 2.0n ln n) on average. However, the di↵erence in run- of these algorithms. We then provide strong evidence time is much greater than the di↵erence in number of that cache behavior is causing most of the performance comparisons. We address this issue and provide an ex- planation in 5. di↵erences in these algorithms. Additionally, we build § upon prior work in multi-pivot quicksort and propose Aum¨uller and Dietzfelbinger [1] (ICALP2013) have a 3-pivot variant that performs very well in theory and recently addressed the following question: If the previ- practice. We show that it makes fewer comparisons and ous lower bound does not hold, what is really the best has better cache behavior than the dual pivot quicksort we can do with two pivots? They prove a 1.8n ln n lower in the expected case. We validate this with experimen- bound on the number of comparisons for all dual-pivot tal results, showing a 7-8% performance improvement quicksort algorithms and introduced an algorithm that in our tests. actually achieves that bound. In their experimentation, the algorithm is outperformed by Yaroslavskiy’s quick- 1 Introduction sort when sorting integer data. However, their algo- rithm does perform better with large data (eg. strings) 1.1 Background Up until about a decade ago it was since comparisons incur high cost. thought that the classic quicksort algorithm [3] using one pivot is superior to any multi-pivot scheme. It was 1.2 The Processor-Memory Performance Gap previously believed that using more pivots introduces Both presently and historically, the performance of CPU too much overhead for the advantages gained. In 2002, registers have far outpaced that of main memory. Ad- Sedgewick and Bentley [7] recognised and outlined some

Downloaded 02/07/14 to 96.248.80.75. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php ditionally, this performance gap between the processor of the advantages to a dual-pivot quicksort. However, and memory has been increasing since their introduc- the implementation did not perform as well as the classic tion. Every year, the performance of memory improves quicksort algorithm [9] and this path was not explored by about 10% while that of the processor improves by again until recent years. 60% [5]. The performance di↵erence grows so quickly

47 Copyright © 2014. by the Society for Industrial and Applied Mathematics. Quicksort quiz 4

Why do 2-pivot (and 3-pivot) quicksort perform better than 1-pivot?

A. Fewer compares.

B. Fewer exchanges.

C. Fewer cache misses.

D. I don't know.

61 System sort in Java 7

Arrays.sort(). ・Has one method for objects that are Comparable. ・Has an overloaded method for each primitive type. ・Has an overloaded method for use with a Comparator. ・Has overloaded methods for sorting subarrays.

Algorithms. ・Dual-pivot quicksort for primitive types. ・Timsort for reference types.

Q. Why use different algorithms for primitive and reference types?

Bottom line. Use the system sort!

62