ROBERT SEDGEWICK | KEVIN WAYNE

2.3

Algorithms, FOURTH EDITION

‣ quicksort ‣ selection ‣ duplicate keys ‣ system sorts

http://algs4.cs.princeton.edu

Two classic sorting algorithms: mergesort and quicksort

Critical components in the world’s computational infrastructure. ・Full scientific understanding of their properties has enabled us to develop them into practical system sorts. ・Quicksort honored as one of top 10 algorithms of 20th century in science and engineering.

Mergesort. [last lecture]

...

Quicksort. [this lecture]

...

2.3 QUICKSORT

Quicksort

Basic plan.
・Shuffle the array.
・Partition so that, for some j
  – entry a[j] is in place
  – no larger entry to the left of j
  – no smaller entry to the right of j
・Sort each subarray recursively.

input        Q U I C K S O R T E X A M P L E
shuffle      K R A T E L E P U I M Q C X O S
partition    E C A I E K L P U T M Q R X O S    (K is the partitioning item; entries to its left are not greater, entries to its right are not less)
sort left    A C E E I K L P U T M Q R X O S
sort right   A C E E I K L M O P Q R S T U X
result       A C E E I K L M O P Q R S T U X

Quicksort overview

Tony Hoare

・Invented quicksort to translate Russian into English. ・[ but couldn't explain his algorithm or implement it! ] ・Learned Algol 60 (and recursion). ・Implemented quicksort.

Tony Hoare, 1980 Turing Award

[Figure: scanned page of Communications of the ACM (July 1961) showing C. A. R. Hoare's Algorithm 63 (PARTITION), Algorithm 64 (QUICKSORT), and Algorithm 65 (FIND).]

Bob Sedgewick

・Refined and popularized quicksort. ・Analyzed quicksort.

[Figure: first pages of two papers by Robert Sedgewick: "Implementing Quicksort Programs," Communications of the ACM, Volume 21, Number 10, October 1978; and "The Analysis of Quicksort Programs," Acta Informatica 7, 327-355 (1977).]

Quicksort partitioning demo

Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j].

K R A T E L E P U I M Q C X O S

lo i j

Quicksort partitioning demo

Repeat until i and j pointers cross. ・Scan i from left to right so long as (a[i] < a[lo]). ・Scan j from right to left so long as (a[j] > a[lo]). ・Exchange a[i] with a[j].

When pointers cross. ・Exchange a[lo] with a[j].

E C A I E K L P U T M Q R X O S

lo j hi

partitioned!

Quicksort: Java code for partitioning

private static int partition(Comparable[] a, int lo, int hi)
{
   int i = lo, j = hi+1;
   while (true)
   {
      while (less(a[++i], a[lo]))      // find item on left to swap
         if (i == hi) break;

      while (less(a[lo], a[--j]))      // find item on right to swap
         if (j == lo) break;

      if (i >= j) break;               // check if pointers cross
      exch(a, i, j);                   // swap
   }
   exch(a, lo, j);                     // swap with partitioning item
   return j;                           // return index of item now known to be in place
}

Quicksort partitioning overview
before:  v at a[lo]; the rest of a[lo..hi] unexamined
during:  v | ≤ v | unexamined (between i and j) | ≥ v
after:   ≤ v | v (at index j) | ≥ v

Quicksort: Java implementation

public class Quick
{
   private static int partition(Comparable[] a, int lo, int hi)
   {  /* see previous slide */  }

   public static void sort(Comparable[] a)
   {
      StdRandom.shuffle(a);            // shuffle needed for performance guarantee (stay tuned)
      sort(a, 0, a.length - 1);
   }

   private static void sort(Comparable[] a, int lo, int hi)
   {
      if (hi <= lo) return;
      int j = partition(a, lo, hi);
      sort(a, lo, j-1);
      sort(a, j+1, hi);
   }
}
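The code above depends on less(), exch(), and StdRandom.shuffle(), which are not shown on these slides. As a minimal sketch for experimenting, the class below fills them in so the whole thing compiles on its own. The class name QuickSketch, the helper bodies, and the java.util.Random shuffle standing in for StdRandom.shuffle() are assumptions made for illustration, not the authors' library code.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
class QuickSketch {
    private static final java.util.Random RNG = new java.util.Random();

    // Assumed helper: is v strictly less than w?
    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;
    }

    // Assumed helper: exchange a[i] and a[j].
    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Stand-in for StdRandom.shuffle(): uniform (Knuth) shuffle.
    private static void shuffle(Object[] a) {
        for (int i = a.length - 1; i > 0; i--)
            exch(a, i, RNG.nextInt(i + 1));
    }

    // Partitioning loop as on the slide.
    private static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo]))   // scan right until an item >= a[lo]
                if (i == hi) break;
            while (less(a[lo], a[--j]))   // scan left until an item <= a[lo]
                if (j == lo) break;
            if (i >= j) break;            // pointers crossed
            exch(a, i, j);
        }
        exch(a, lo, j);                   // put partitioning item in place
        return j;
    }

    public static void sort(Comparable[] a) {
        shuffle(a);
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    public static void main(String[] args) {
        String[] a = "Q U I C K S O R T E X A M P L E".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));   // A C E E I K L M O P Q R S T U X
    }
}
```

The shuffle changes the order of intermediate steps from run to run, but the sorted result is always the same.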

Quicksort trace

               lo  j hi    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
initial values             Q  U  I  C  K  S  O  R  T  E  X  A  M  P  L  E
random shuffle             K  R  A  T  E  L  E  P  U  I  M  Q  C  X  O  S
                0  5 15    E  C  A  I  E  K  L  P  U  T  M  Q  R  X  O  S
                0  3  4    E  C  A  E  I  K  L  P  U  T  M  Q  R  X  O  S
                0  2  2    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                0  0  1    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                1     1    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                4     4    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                6  6 15    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                7  9 15    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
                7  7  8    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
                8     8    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
               10 13 15    A  C  E  E  I  K  L  M  O  P  S  Q  R  T  U  X
               10 12 12    A  C  E  E  I  K  L  M  O  P  R  Q  S  T  U  X
               10 11 11    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
               10    10    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
               14 14 15    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
               15    15    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
result                     A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X

(Rows with no j: no partition for subarrays of size 1.)

Quicksort trace (array contents after each partition)

Quicksort animation

[Animation: quicksort on 50 random items. Legend: algorithm position; in order; current subarray; not in order.]

http://www.sorting-algorithms.com/quick-sort

Quicksort: implementation details

Partitioning in-place. Using an extra array makes partitioning easier (and stable), but is not worth the cost.

Terminating the loop. Testing whether the pointers cross is trickier than it might seem.

Equal keys. When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key.

Preserving randomness. Shuffling is needed for performance guarantee. Equivalent alternative. Pick a random partitioning item in each subarray.
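The "equivalent alternative" can be sketched as follows: instead of shuffling once up front, exchange a uniformly random entry of a[lo..hi] into position lo just before each partition. This is an illustration only. The class name RandomPivotQuick and the helper bodies are assumptions, and the scanning loop is the one from the partitioning slide.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
class RandomPivotQuick {
    private static final java.util.Random RNG = new java.util.Random();

    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    private static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    private static int partition(Comparable[] a, int lo, int hi) {
        // Assumption for this sketch: choose the partitioning item uniformly
        // at random from a[lo..hi] and move it to a[lo]; no up-front shuffle.
        exch(a, lo, lo + RNG.nextInt(hi - lo + 1));
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    public static void sort(Comparable[] a) { sort(a, 0, a.length - 1); }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    public static void main(String[] args) {
        Integer[] a = new Integer[20];
        for (int i = 0; i < a.length; i++) a[i] = a.length - i;   // reverse-ordered input
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

With this variant the input array is never shuffled as a whole, which may matter when the caller would rather not pay for a separate pass over the data.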

Quicksort: empirical analysis (1961)

Running time estimates: ・Algol 60 implementation. ・National-Elliott 405 computer.

sorting N 6-word items with 1-word keys Elliott 405 magnetic disc (16K words)

Quicksort: empirical analysis

Running time estimates: ・Home PC executes 10⁸ compares/second. ・Supercomputer executes 10¹² compares/second.

            (N²)                               mergesort (N log N)             quicksort (N log N)
computer    thousand  million    billion       thousand  million   billion     thousand  million  billion
home        instant   2.8 hours  317 years     instant   1 second  18 min      instant   0.6 sec  12 min
super       instant   1 second   1 week        instant   instant   instant     instant   instant  instant

Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones.

Quicksort: best-case analysis

Best case. Number of compares is ~ N lg N.

[Figure: best-case trace, with rows labeled "initial values" and "random shuffle".]

Quicksort: worst-case analysis

Worst case. Number of compares is ~ ½ N².

[Figure: worst-case trace, with rows labeled "initial values" and "random shuffle".]
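One way to see the quadratic behavior is to skip the shuffle and feed the sort an already-sorted array of distinct keys: every partition then peels off only the partitioning item. The sketch below counts compares in that situation. The class CompareCount, its counter, and the use of int keys are assumptions for illustration; the partitioning loop is the one from the slides.

```java
class CompareCount {
    static long compares = 0;   // number of calls to less()

    private static boolean less(int v, int w) { compares++; return v < w; }

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    // Quicksort WITHOUT the initial shuffle (deliberately, to expose the worst case).
    static void sortNoShuffle(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sortNoShuffle(a, lo, j - 1);
        sortNoShuffle(a, j + 1, hi);
    }

    // Number of compares used to sort 0, 1, ..., n-1 without shuffling.
    static long comparesOnSorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        compares = 0;
        sortNoShuffle(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        for (int n = 250; n <= 2000; n *= 2)
            System.out.println("N = " + n + "  compares = " + comparesOnSorted(n)
                               + "  N^2/2 = " + (long) n * n / 2);
    }
}
```

The recursion depth is also linear in N in this case, so the sketch keeps N small to stay well within the default Java stack.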

Quicksort: average-case analysis

Proposition. The average number of compares C_N to quicksort an array of N distinct keys is ~ 2N ln N (and the number of exchanges is ~ ⅓ N ln N).

Pf. C_N satisfies the recurrence C_0 = C_1 = 0 and for N ≥ 2:

\[
C_N = (N + 1) + \frac{C_0 + C_{N-1}}{N} + \frac{C_1 + C_{N-2}}{N} + \cdots + \frac{C_{N-1} + C_0}{N}
\]

Here N + 1 is the cost of partitioning, each numerator is the cost of the left and right subarrays, and 1/N is the partitioning probability.

・Multiply both sides by N and collect terms:

\[
N C_N = N(N + 1) + 2(C_0 + C_1 + \cdots + C_{N-1})
\]

・Subtract from this equation the same equation for N - 1:

\[
N C_N - (N - 1) C_{N-1} = 2N + 2 C_{N-1}
\]

・Rearrange terms and divide by N (N + 1):

\[
\frac{C_N}{N + 1} = \frac{C_{N-1}}{N} + \frac{2}{N + 1}
\]

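Quicksort: average-case analysis

・Repeatedly apply above equation:

\[
\begin{aligned}
\frac{C_N}{N + 1} &= \frac{C_{N-1}}{N} + \frac{2}{N + 1} \\
&= \frac{C_{N-2}}{N - 1} + \frac{2}{N} + \frac{2}{N + 1} && \text{(substitute previous equation)} \\
&= \frac{C_{N-3}}{N - 2} + \frac{2}{N - 1} + \frac{2}{N} + \frac{2}{N + 1} \\
&= \frac{2}{3} + \frac{2}{4} + \frac{2}{5} + \cdots + \frac{2}{N + 1}
\end{aligned}
\]

・Approximate sum by an integral:

\[
C_N = 2(N + 1)\left(\frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots + \frac{1}{N + 1}\right) \sim 2(N + 1) \int_3^{N+1} \frac{1}{x}\,dx
\]

・Finally, the desired result:

\[
C_N \sim 2(N + 1) \ln N \approx 1.39\, N \lg N
\]

Quicksort: summary of performance characteristics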

Quicksort is a (Las Vegas) randomized algorithm. ・Guaranteed to be correct. ・Running time depends on random shuffle.

Average case. Expected number of compares is ~ 1.39 N lg N. ・39% more compares than mergesort. ・Faster than mergesort in practice because of less data movement.

Best case. Number of compares is ~ N lg N.
Worst case. Number of compares is ~ ½ N². [ but more likely that lightning bolt strikes computer during execution ]
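The average-case figure can be checked numerically by evaluating the recurrence N C_N = N(N + 1) + 2(C_0 + ... + C_{N-1}) directly and comparing it with 2N ln N. The sketch below does that with a running sum. The class and method names are assumptions for illustration. The ratio approaches 1 only slowly, because the tilde approximation ignores a lower-order term that is linear in N.

```java
class AverageCompares {
    // C_N from the recurrence C_N = (N + 1) + (2 / N) * (C_0 + ... + C_{N-1}),
    // with C_0 = C_1 = 0, computed in one pass with a running sum.
    static double expectedCompares(int n) {
        double sum = 0.0;                 // C_0 + C_1 + ... + C_{k-1}
        double c = 0.0;                   // C_k
        for (int k = 0; k <= n; k++) {
            c = (k < 2) ? 0.0 : (k + 1) + 2.0 * sum / k;
            sum += c;
        }
        return c;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 1000000; n *= 10) {
            double exact = expectedCompares(n);
            double approx = 2.0 * n * Math.log(n);     // 2 N ln N
            System.out.printf("N = %7d  C_N = %.0f  2N ln N = %.0f  ratio = %.3f%n",
                              n, exact, approx, exact / approx);
        }
    }
}
```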

Quicksort properties

Proposition. Quicksort is an in-place sorting algorithm. Pf. ・Partitioning: constant extra space. ・Depth of recursion: logarithmic extra space (with high probability).

can guarantee logarithmic depth by recurring on smaller subarray before larger subarray (requires using an explicit stack)
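The note above mentions an explicit stack. A closely related variant, swapped in here because it is shorter, recurses only on the smaller subarray and handles the larger one in a loop, so each recursive call works on at most half of its caller's range. The sketch below uses int keys and records the deepest recursion level. The class name BoundedDepthQuick and the maxDepth field are assumptions for illustration. It omits the shuffle on purpose: the depth bound holds for any input order, while the running-time guarantee still needs randomness.

```java
class BoundedDepthQuick {
    static int maxDepth = 0;   // deepest recursion level observed (for illustration)

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1, v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    public static void sort(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        while (lo < hi) {
            int j = partition(a, lo, hi);
            if (j - lo < hi - j) {               // left part is the smaller one
                sort(a, lo, j - 1, depth + 1);   // recurse on the smaller part
                lo = j + 1;                      // loop on the larger part
            } else {
                sort(a, j + 1, hi, depth + 1);
                hi = j - 1;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new int[5000];
        for (int i = 0; i < a.length; i++) a[i] = i;   // sorted input, no shuffle
        sort(a);
        System.out.println("max recursion depth = " + maxDepth);
    }
}
```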

Proposition. Quicksort is not stable. Pf. [ by counterexample ]

 i  j    0   1   2   3
         B1  C1  C2  A1
 1  3    B1  C1  C2  A1
 1  3    B1  A1  C2  C1
 0  1    A1  B1  C2  C1

Quicksort: practical improvements

Insertion sort small subarrays. ・Even quicksort has too much overhead for tiny subarrays. ・Cutoff to insertion sort for ≈ 10 items.

private static void sort(Comparable[] a, int lo, int hi)
{
   if (hi <= lo + CUTOFF - 1)
   {
      Insertion.sort(a, lo, hi);
      return;
   }
   int j = partition(a, lo, hi);
   sort(a, lo, j-1);
   sort(a, j+1, hi);
}
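The cutoff code calls Insertion.sort(a, lo, hi), which is not shown on these slides. As a minimal sketch, the class below supplies a range-based insertion sort and wires it into the cutoff version. The class name QuickCutoff, the value CUTOFF = 10, and the helper bodies are assumptions for illustration.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
class QuickCutoff {
    private static final int CUTOFF = 10;   // assumed value, following the "≈ 10 items" guideline
    private static final java.util.Random RNG = new java.util.Random();

    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    private static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    // Insertion sort on a[lo..hi]; stands in for Insertion.sort(a, lo, hi).
    private static void insertionSort(Comparable[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && less(a[j], a[j - 1]); j--)
                exch(a, j, j - 1);
    }

    private static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    public static void sort(Comparable[] a) {
        for (int i = a.length - 1; i > 0; i--)   // uniform shuffle
            exch(a, i, RNG.nextInt(i + 1));
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo + CUTOFF - 1) {             // subarray of at most CUTOFF items
            insertionSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }
}
```

The cutoff test also covers empty and one-item subarrays, since insertionSort does nothing when hi <= lo.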

Quicksort: practical improvements

Median of sample. ・Best choice of pivot item = median. ・Estimate true median by taking median of sample. ・Median-of-3 (random) items.

~ 12/7 N ln N compares (14% less) ~ 12/35 N ln N exchanges (3% more)

private static void sort(Comparable[] a, int lo, int hi)
{
   if (hi <= lo) return;

   int median = medianOf3(a, lo, lo + (hi - lo)/2, hi);
   exch(a, lo, median);                // move the sample median to a[lo]

   int j = partition(a, lo, hi);
   sort(a, lo, j-1);
   sort(a, j+1, hi);
}
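The helper medianOf3() is used above but not defined on the slide. One possible sketch, under the assumption that it returns the index (not the value) of the median of the three sampled entries, is shown below. The class name MedianOf3 and the less() helper are likewise assumptions.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
class MedianOf3 {
    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    // Hypothetical helper: returns whichever of the indices i, j, k
    // holds the median of a[i], a[j], a[k] (at most three compares).
    static int medianOf3(Comparable[] a, int i, int j, int k) {
        return less(a[i], a[j])
             ? (less(a[j], a[k]) ? j : (less(a[i], a[k]) ? k : i))
             : (less(a[k], a[j]) ? j : (less(a[k], a[i]) ? k : i));
    }

    public static void main(String[] args) {
        Integer[] a = {30, 10, 20};
        int m = medianOf3(a, 0, 1, 2);
        System.out.println("median index = " + m + ", value = " + a[m]);
    }
}
```

Returning an index rather than a value lets the caller exchange the chosen item into a[lo] before partitioning, as the sort method above does.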

Selection

Goal. Given an array of N items, find the kth smallest item.

Ex. Min (k = 0), max (k = N - 1), median (k = N / 2).

Applications. ・Order statistics. ・Find the "top k."

Use theory as a guide. ・Easy N log N upper bound. How? ・Easy N upper bound for k = 1, 2, 3. How? ・Easy N lower bound. Why?

Which is true?

・N log N lower bound? is selection as hard as sorting?

・N upper bound? is there a linear-time algorithm?

Quick-select

Partition array so that: ・Entry a[j] is in place. ・No larger entry to the left of j. ・No smaller entry to the right of j.

Repeat in one subarray, depending on j; finished when j equals k.

public static Comparable select(Comparable[] a, int k)
{
   StdRandom.shuffle(a);
   int lo = 0, hi = a.length - 1;
   while (hi > lo)
   {
      int j = partition(a, lo, hi);
      if      (j < k) lo = j + 1;      // a[k] is to the right of j: set lo to j+1
      else if (j > k) hi = j - 1;      // a[k] is to the left of j: set hi to j-1
      else            return a[k];
   }
   return a[k];
}

Quick-select: mathematical analysis

Proposition. Quick-select takes linear time on average.

Pf sketch. ・Intuitively, each partitioning step splits array approximately in half:

N + N / 2 + N / 4 + … + 1 ~ 2N compares. ・Formal analysis similar to quicksort analysis yields:

C_N = 2N + 2k ln(N / k) + 2(N – k) ln(N / (N – k))

・Ex: (2 + 2 ln 2) N ≈ 3.38 N compares to find median.
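The median estimate can be compared against a quick experiment. The sketch below implements quick-select on int keys with a compare counter and reports the average number of compares per item when selecting the median. The class name QuickSelectSketch, the counter, and the java.util.Random shuffle standing in for StdRandom.shuffle() are assumptions for illustration. Results vary from run to run, so no specific output is claimed.

```java
class QuickSelectSketch {
    static long compares = 0;   // number of calls to less()
    private static final java.util.Random RNG = new java.util.Random();

    private static boolean less(int v, int w) { compares++; return v < w; }

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    // Returns the k-th smallest entry of a (k = 0 is the minimum); rearranges a.
    static int select(int[] a, int k) {
        for (int i = a.length - 1; i > 0; i--)   // uniform shuffle
            exch(a, i, RNG.nextInt(i + 1));
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;
            else if (j > k) hi = j - 1;
            else            return a[k];
        }
        return a[k];
    }

    public static void main(String[] args) {
        int n = 100000, trials = 20;
        long total = 0;
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = RNG.nextInt();
            compares = 0;
            select(a, n / 2);
            total += compares;
        }
        System.out.printf("average compares / N = %.2f (formula: about 3.38)%n",
                          (double) total / trials / n);
    }
}
```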

Theoretical context for selection

Proposition. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] Compare-based selection algorithm whose worst-case running time is linear.

Time Bounds for Selection

by Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan

Abstract

The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm -- PICK. Specifically, no more than 5.4305 n comparisons are ever required. This bound is improved for extreme values of i, and a new lower bound on the requisite number of comparisons is also proved.

This work was supported by the National Science Foundation under grants GJ-992 and GJ-33170X.

Remark. Constants are high ⇒ not used in practice.

Use theory as a guide.

・Still worthwhile to seek practical linear-time (worst-case) algorithm. ・Until one is discovered, use quick-select if you don’t need a full sort.

Duplicate keys

Often, purpose of sort is to bring items with equal keys together. ・Sort population by age. ・Remove duplicates from mailing list. ・Sort job applicants by college attended.

Typical characteristics of such applications. ・Huge array. ・Small number of key values.

sorted by time        sorted by city (unstable)    sorted by city (stable)
Chicago 09:00:00      Chicago 09:25:52             Chicago 09:00:00
Phoenix 09:00:03      Chicago 09:03:13             Chicago 09:00:59
Houston 09:00:13      Chicago 09:21:05             Chicago 09:03:13
Chicago 09:00:59      Chicago 09:19:46             Chicago 09:19:32
Houston 09:01:10      Chicago 09:19:32             Chicago 09:19:46
Chicago 09:03:13      Chicago 09:00:00             Chicago 09:21:05
Seattle 09:10:11      Chicago 09:35:21             Chicago 09:25:52
Seattle 09:10:25      Chicago 09:00:59             Chicago 09:35:21
Phoenix 09:14:25      Houston 09:01:10             Houston 09:00:13
Chicago 09:19:32      Houston 09:00:13             Houston 09:01:10
Chicago 09:19:46      Phoenix 09:37:44             Phoenix 09:00:03
Chicago 09:21:05      Phoenix 09:00:03             Phoenix 09:14:25
Seattle 09:22:43      Phoenix 09:14:25             Phoenix 09:37:44
Seattle 09:22:54      Seattle 09:10:25             Seattle 09:10:11
Chicago 09:25:52      Seattle 09:36:14             Seattle 09:10:25
Chicago 09:35:21      Seattle 09:22:43             Seattle 09:22:43
Seattle 09:36:14      Seattle 09:10:11             Seattle 09:22:54
Phoenix 09:37:44      Seattle 09:22:54             Seattle 09:36:14

(Times within each city: NOT sorted in the unstable result; sorted in the stable result.)

Stability when sorting on a second key

Duplicate keys

Quicksort with duplicate keys. Algorithm can go quadratic unless partitioning stops on equal keys!

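S T O P O N E Q U A L K E Y S

[Figure: partitioning of this array, marking each swap and contrasting the result if we don't stop on equal keys with the result if we stop on equal keys.]

Caveat emptor. Some textbook (and commercial) implementations go quadratic when there are many duplicate keys.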

What is the result of partitioning the following array?

A A A A A A A A A A A A A A A A

A. A A A A A A A A A A A A A A A A

B. A A A A A A A A A A A A A A A A

C. A A A A A A A A A A A A A A A A

Partitioning an array with all equal keys

Duplicate keys: the problem

Recommended. Stop scans on items equal to the partitioning item. Consequence. ~ N lg N compares when all keys equal.

B A A B A B C C B C B A A A A A A A A A A A

Mistake. Don't stop scans on items equal to the partitioning item. Consequence. ~ ½ N² compares when all keys equal.

B A A B A B B B C C C A A A A A A A A A A A

Desirable. Put all items equal to the partitioning item in place.

A A A B B B B B C C C A A A A A A A A A A A
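The difference between the two scanning rules can be measured on an array of all-equal keys. The sketch below uses its own partitioning method with a flag that selects the rule, and counts key compares. The class EqualKeysDemo, the flag, and the bounds-checked scan loops are assumptions for illustration and differ in detail from the slide's code, so the constant in the quadratic case need not match the ½ N² figure; only the growth rate is the point.

```java
class EqualKeysDemo {
    static long compares = 0;   // number of key compares

    private static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Partition a[lo..hi] around v = a[lo]. If stopOnEqual is false,
    // both scans skip over keys equal to v (the "mistake" variant).
    private static int partition(int[] a, int lo, int hi, boolean stopOnEqual) {
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (++i <= hi) { compares++; if (stopOnEqual ? a[i] >= v : a[i] > v) break; }
            while (--j > lo)  { compares++; if (stopOnEqual ? a[j] <= v : a[j] < v) break; }
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void sort(int[] a, int lo, int hi, boolean stopOnEqual) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi, stopOnEqual);
        sort(a, lo, j - 1, stopOnEqual);
        sort(a, j + 1, hi, stopOnEqual);
    }

    // Number of compares used to sort n equal keys under the chosen rule.
    static long comparesAllEqual(int n, boolean stopOnEqual) {
        int[] a = new int[n];              // all zeros: every key is equal
        compares = 0;
        sort(a, 0, n - 1, stopOnEqual);
        return compares;
    }

    public static void main(String[] args) {
        int n = 2048;
        System.out.println("stop on equal keys:       " + comparesAllEqual(n, true));
        System.out.println("don't stop on equal keys: " + comparesAllEqual(n, false));
    }
}
```

Stopping on equal keys costs extra exchanges, but it keeps the two subarrays balanced, which is what holds the compare count near N lg N.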

3-way partitioning

Goal. Partition array into three parts so that: ・Entries between lt and gt equal to the partitioning item. ・No larger entries to left of lt. ・No smaller entries to right of gt.

3-way partitioning
before:  v at a[lo]; the rest of a[lo..hi] unexamined
during:  < v | = v (from lt) | unexamined (from i to gt) | > v
after:   < v | = v (from lt to gt) | > v

Dutch national flag problem. [Edsger Dijkstra] ・Conventional wisdom until mid 1990s: not worth doing. ・Now incorporated into C library qsort() and Java 6 system sort.
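A common way to realize this goal is a single left-to-right pass that maintains the three regions with the indices lt, i, and gt. The sketch below follows that well-known scheme. It is offered as an illustration under assumptions: the class name Quick3waySketch, the helper bodies, and the java.util.Random shuffle are not taken from the slides.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
class Quick3waySketch {
    private static final java.util.Random RNG = new java.util.Random();

    private static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void sort(Comparable[] a) {
        for (int i = a.length - 1; i > 0; i--)   // uniform shuffle
            exch(a, i, RNG.nextInt(i + 1));
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        Comparable v = a[lo];                    // partitioning item
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);   // a[i] < v: grow the "less" region
            else if (cmp > 0) exch(a, i, gt--);     // a[i] > v: grow the "greater" region
            else              i++;                  // a[i] = v: leave it in the middle
        }
        // Now a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi].
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    public static void main(String[] args) {
        String[] a = "R B W W R W B R R W B R".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

Items equal to the partitioning item are never touched again after the pass, which is why this scheme helps when there are only a few distinct key values.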

Sorting summary

              inplace?  stable?  best        average     worst       remarks
selection        ✔               ½ N²        ½ N²        ½ N²        N exchanges
insertion        ✔         ✔     N           ¼ N²        ½ N²        use for small N or partially ordered
shell            ✔               N log₃ N    ?           c N^(3/2)   tight code; subquadratic
merge                      ✔     ½ N lg N    N lg N      N lg N      N log N guarantee; stable
                           ✔     N           N lg N      N lg N      improves mergesort when preexisting order
quick            ✔               N lg N      2 N ln N    ½ N²        N log N probabilistic guarantee; fastest in practice
3-way quick      ✔               N           2 N ln N    ½ N²        improves quicksort when duplicate keys
?                ✔         ✔     N           N lg N      N lg N      holy sorting grail