The Number of Bit Comparisons Used by Quicksort: An Average-case Analysis

James Allen Fill†    Svante Janson‡

†Department of Mathematical Sciences, The Johns Hopkins University, 34th and Charles Streets, Baltimore, MD 21218-2682 USA. [email protected]. Research supported by NSF grant DMS-0104167 and by the Acheson J. Duncan Fund for the Advancement of Research in Statistics.
‡Department of Mathematics, Uppsala University, P. O. Box 480, SE-751 06 Uppsala, Sweden. [email protected]

Abstract

The analyses of many algorithms and data structures (such as digital search trees) for searching and sorting are based on the representation of the keys involved as bit strings, and so they count the number of bit comparisons. On the other hand, the standard analyses of many other algorithms (such as Quicksort) are performed in terms of the number of key comparisons. We introduce the prospect of a fair comparison between algorithms of the two types by providing an average-case analysis of the number of bit comparisons required by Quicksort. Counting bit comparisons rather than key comparisons introduces an extra logarithmic factor to the asymptotic average total. We also provide a new algorithm, "BitsQuick", that reduces this factor to constant order by eliminating needless bit comparisons.

1 Introduction and summary

Algorithms for sorting and searching (together with their accompanying analyses) generally fall into one of two categories: either the algorithm is regarded as comparing items pairwise irrespective of their internal structure (and so the analysis focuses on the number of comparisons), or else it is recognized that the items (typically numbers) are represented as bit strings and that the algorithm operates on the individual bits. Typical examples of the two types are Quicksort and digital search trees, respectively; see [7].

In this extended abstract we take a first step towards bridging the gap between the two points of view, in order to facilitate run-time comparisons across the gap, by answering the following question posed many years ago by Bob Sedgewick [personal communication]: What is the bit complexity of Quicksort?

More precisely, we consider Quicksort (see Section 2 for a review) applied to n distinct keys (numbers) from the interval (0, 1). Many authors (Knuth [7], Régnier [10], Rösler [11], Knessl and Szpankowski [6], Fill and Janson [3, 4], Neininger and Rüschendorf [9], and others) have studied K_n, the (random) number of key comparisons performed by the algorithm. This is a natural measure of the cost (run-time) of the algorithm, if each comparison has the same cost. On the other hand, if comparisons are done by scanning the bit representations of the numbers, comparing their bits one by one, then the cost of comparing two keys is determined by the number of bits compared until a difference is found. We call this number the number of bit comparisons for the key comparison, and let B_n denote the total number of bit comparisons when n keys are sorted by Quicksort.

We assume that the keys X_1, . . . , X_n to be sorted are independent random variables with a common continuous distribution F over (0, 1). It is well known that the distribution of the number K_n of key comparisons does not depend on F. This invariance clearly fails to extend to the number B_n of bit comparisons, and so we need to specify F.

For simplicity, we study mainly the case that F is the uniform distribution, and, throughout, the reader should assume this as the default. But we also give a result valid for a general absolutely continuous distribution F over (0, 1) (subject to a mild integrability condition on the density).

In this extended abstract we focus on the mean of B_n. One of our main results is the following Theorem 1.1, the concise version of which is the asymptotic equivalence

    E B_n ∼ n(ln n)(lg n)   as n → ∞.
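To fix ideas, here is a minimal Python sketch of this bit-cost model. It is our illustration, not part of the paper; keys are Python floats, so the binary expansions are truncated at max_bits rather than infinite.

    def b(x, y, max_bits=64):
        """Index of the first bit at which the binary expansions of
        x, y in (0, 1) differ, i.e., the number of bit comparisons
        that one key comparison of x and y costs."""
        for k in range(1, max_bits + 1):
            # the k-th bit of u in (0, 1) is floor(u * 2^k) mod 2
            if (int(x * 2**k) & 1) != (int(y * 2**k) & 1):
                return k
        return max_bits  # expansions agree to max_bits bits

    print(b(0.375, 0.4375))  # 0.0110... vs 0.0111...: first difference at bit 4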
Throughout, we use ln (respectively, lg) to denote natural (resp., binary) logarithms, and use log when the base doesn't matter (for example, in remainder estimates). The symbol ≐ is used to denote approximate equality, and γ ≐ 0.57722 is Euler's constant.

Theorem 1.1. If the keys X_1, . . . , X_n are independent and uniformly distributed on (0, 1), then the number B_n of bit comparisons required to sort these keys using Quicksort has expectation given by the following exact and asymptotic expressions:

(1.1)    E B_n = 2 \sum_{k=2}^{n} (−1)^k \binom{n}{k} \frac{1}{(k−1) k [1 − 2^{−(k−1)}]}

(1.2)         = n(ln n)(lg n) − c_1 n ln n + c_2 n + π_n n + O(log n),

where, with β := 2π/ln 2,

    c_1 := (1/ln 2)(4 − 2γ − ln 2) ≐ 3.105,
    c_2 := (1/ln 2)[(1/6)(6 − ln 2)^2 − (4 − ln 2)γ + γ^2 + π^2/6] ≐ 6.872,

and

    π_n := (2/ln 2) \sum_{k ∈ Z: k ≠ 0} \frac{Γ(−1 − iβk)}{(iβk)(1 + iβk)} n^{iβk}

is periodic in lg n with period 1 and amplitude smaller than 5 × 10^{−9}.

Small periodic fluctuations as in Theorem 1.1 come as a surprise to newcomers to the analysis of algorithms but in fact are quite common in the analysis of digital structures and algorithms; see, for example, Chapter 6 in [8].

For our further results, it is technically convenient to assume that the number of keys is no longer fixed at n, but rather Poisson distributed with mean λ and independent of the values of the keys. (In this extended abstract, we shall not deal with the "de-Poissonization" needed to transfer results back to the fixed-n model; for the results here we can simply compare the fixed-n case to Poisson cases with slightly smaller and larger means, say n ∓ n^{2/3}.) In obvious notation, the Poissonized version of (1.2) is

(1.3)    E B(λ) = λ(ln λ)(lg λ) − c_1 λ ln λ + c_2 λ + π_λ λ + O(log λ),

whose proof is indicated in Section 5 [following (5.2)]. We will also see (Proposition 5.1) that Var B(λ) = O(λ^2), so B(λ) is concentrated about its mean. Since the number K(λ) of key comparisons is likewise concentrated about its mean E K(λ) ∼ 2λ ln λ for large λ (see Lemmas 5.1 and 5.2), it follows that

(1.4)    \frac{2 B(λ)}{(lg λ) K(λ)} → 1   in probability as λ → ∞.

In other words, about (1/2) lg λ bits are compared per key comparison.

For non-uniform distribution F, we have the same leading term for the asymptotic expansion of E B(λ), but the second-order term is larger. (Throughout, ln_+ denotes the positive part of the natural logarithm function. We denote the uniform distribution by unif.)

Theorem 1.2. Let X_1, X_2, . . . be independent with a common distribution F over (0, 1) having density f, and let N_λ be independent and Poisson with mean λ. If ∫_0^1 f (ln_+ f)^4 < ∞, then the expected number of bit comparisons, call it μ_f(λ), required to sort the keys X_1, . . . , X_{N_λ} using Quicksort satisfies

    μ_f(λ) = μ_unif(λ) + 2 H(f) λ ln λ + o(λ log λ)

as λ → ∞, where H(f) := ∫_0^1 f lg f ≥ 0 is the entropy (in bits) of the density f.

In applications, it may be unrealistic to assume that a specific density f is known. Nevertheless, even in such cases, Theorem 1.2 may be useful since it provides a measure of the robustness of the asymptotic estimate in Theorem 1.1.

Bob Sedgewick (on June 23, 2003, and among others who have heard us speak on the present material) has suggested that the number of bit comparisons for Quicksort might be reduced substantially by not comparing bits that have to be equal according to the results of earlier steps in the algorithm. In the final section, we note that this is indeed the case: for a fixed number n of keys, the average number of bit comparisons in the improved algorithm (which we dub "BitsQuick") is asymptotically 2(1 + 3/(2 ln 2)) n ln n, only a constant (≐ 3.2) times the average number of key comparisons [see (2.2)]. A related algorithm is the digital version of Quicksort by Roura [12]; it too requires Θ(n log n) bit comparisons (we do not know the exact constant factor).

We may compare our results to those obtained for radix-based methods, for example radix exchange sorting; see [7, Section 5.2.2]. This method works by bit inspections, that is, comparisons to constant bits, rather than pairwise comparisons. In the case of n uniformly distributed keys, radix exchange sorting uses asymptotically n lg n bit inspections.
Since radix exchange sorting is designed so that the number of bit inspections is minimal, it is not surprising that our results show that Quicksort uses more bit comparisons. More precisely, Theorem 1.1 shows that Quicksort uses about ln n times as many bit comparisons as radix exchange sorting. For BitsQuick, this is reduced to a small constant factor. This gives us a measure of the cost in bit comparisons of using these algorithms; Quicksort is often used because of other advantages, and our results open the possibility to see when they outweigh the increase in bit comparisons.
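The following back-of-the-envelope computation (ours, using only the asymptotic formulas quoted above) illustrates these comparisons: the Quicksort/radix ratio grows like ln n, while the BitsQuick/radix ratio is the constant 2(1 + 3/(2 ln 2)) ln 2 ≐ 4.4.

    from math import log, log2

    # Leading-order bit-comparison counts, per the asymptotics stated in the text.
    for n in (10**3, 10**6, 10**9):
        quicksort_bits = n * log(n) * log2(n)                     # E B_n ~ n (ln n)(lg n)
        radix_bits = n * log2(n)                                  # radix exchange ~ n lg n
        bitsquick_bits = 2 * (1 + 3 / (2 * log(2))) * n * log(n)  # BitsQuick ~ 2(1 + 3/(2 ln 2)) n ln n
        print(f"n = {n:>10}: Quicksort/radix = {quicksort_bits / radix_bits:6.1f} (= ln n), "
              f"BitsQuick/radix = {bitsquick_bits / radix_bits:.2f}")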

In Section 2 we review Quicksort itself and basic facts about the number K_n of key comparisons. In Section 3 we derive the exact formula (1.1) for E B_n, and in Section 4 we derive the asymptotic expansion (1.2) from an alternative exact formula that is somewhat less elementary than (1.1) but much more transparent for asymptotics. In the transitional Section 5 we establish certain basic facts about the moments of K(λ) and B(λ) in the Poisson case with uniformly distributed keys, and in Section 6 we use martingale arguments to establish Theorem 1.2 for the expected number of bit comparisons for Poisson(λ) draws from a general density f. Finally, in Section 7 we study the improved BitsQuick algorithm discussed in the preceding paragraph.

Remark 1.1. The results can be generalized to bases other than 2. For example, base 256 would give corresponding results on the "byte complexity".

Remark 1.2. Cutting off and sorting small subfiles differently would affect the results in Theorems 1.1 and 1.2 by O(n log n) and O(λ log λ) only. In particular, the leading terms would remain the same.

2 Review: number of key comparisons used by Quicksort

In this section we briefly review certain basic known results concerning the number K_n of key comparisons required by Quicksort for a fixed number n of keys uniformly distributed on (0, 1). (See, for example, [4] and the references therein for further details.)

Quicksort, invented by Hoare [5], is the standard sorting procedure in Unix systems, and has been cited [2] as one of the ten algorithms "with the greatest influence on the development and practice of science and engineering in the 20th century." The Quicksort algorithm for sorting an array of n distinct keys is very simple to describe. If n = 0 or n = 1, there is nothing to do. If n ≥ 2, pick a key uniformly at random from the given array and call it the "pivot". Compare the other keys to the pivot to partition the remaining keys into two subarrays. Then recursively invoke Quicksort on each of the two subarrays.
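For readers who want to experiment, here is a small Python simulation (ours, not the paper's) of exactly this randomized partitioning, instrumented to count both key comparisons and — under the bit-cost model of Section 1, with expansions truncated at 53 bits — bit comparisons. The final printed ratio illustrates (1.4).

    import random
    from math import log2

    BITS = 53  # truncate binary expansions at double-precision resolution

    def b(x, y):
        """First bit position at which x, y in (0, 1) differ (truncated model)."""
        ix, iy = int(x * 2**BITS), int(y * 2**BITS)
        return BITS - (ix ^ iy).bit_length() + 1

    def quicksort_costs(keys):
        """Return (sorted keys, #key comparisons, #bit comparisons)."""
        if len(keys) <= 1:
            return keys, 0, 0
        x = random.choice(keys)                    # the pivot
        smaller = [y for y in keys if y < x]
        larger = [y for y in keys if y > x]
        kc = len(keys) - 1                         # one key comparison per other key
        bc = sum(b(x, y) for y in keys if y != x)  # its cost in bit comparisons
        s1, kc1, bc1 = quicksort_costs(smaller)
        s2, kc2, bc2 = quicksort_costs(larger)
        return s1 + [x] + s2, kc + kc1 + kc2, bc + bc1 + bc2

    n = 1000
    _, K, B = quicksort_costs([random.random() for _ in range(n)])
    print(K, B, 2 * B / (K * log2(n)))  # the last ratio is close to 1, cf. (1.4)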
With K_0 := 0 as initial condition, K_n satisfies the distributional recurrence relation

    K_n =_L K_{U_n − 1} + K′_{n − U_n} + n − 1,   n ≥ 1,

where =_L denotes equality in law (i.e., in distribution), and where, on the right, U_n is distributed uniformly over the set {1, . . . , n}, K′_j =_L K_j, and

    U_n;  K_0, . . . , K_{n−1};  K′_0, . . . , K′_{n−1}

are all independent.

Passing to expectations we obtain the "divide-and-conquer" recurrence relation

    E K_n = \frac{2}{n} \sum_{j=0}^{n−1} E K_j + n − 1,

which is easily solved to give

(2.1)    E K_n = 2(n + 1) H_n − 4n
(2.2)         = 2n ln n + (2γ − 4)n + 2 ln n + (2γ + 1) + O(1/n).

Here H_n := \sum_{i=1}^{n} 1/i denotes the n-th harmonic number. It is also routine to use a recurrence to compute explicitly the exact variance of K_n. In particular, the asymptotics are

    Var K_n = σ^2 n^2 − 2n ln n + O(n),

where σ^2 := 7 − (2/3)π^2 ≐ 0.4203. Higher moments can be handled similarly. Further, the normalized sequence

    K̂_n := (K_n − E K_n)/n,   n ≥ 1,

converges in distribution to K̂, where the law of K̂ is characterized as the unique distribution over the real line with vanishing mean that satisfies a certain distributional identity; and the moment generating functions of K̂_n converge to that of K̂.
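As an unofficial numerical check, the recurrence and the solved form (2.1) agree:

    def mean_key_comparisons(n_max):
        """E K_n computed directly from the divide-and-conquer recurrence."""
        ek = [0.0] * (n_max + 1)
        acc = 0.0                          # running value of sum_{j=0}^{n-1} E K_j
        for n in range(1, n_max + 1):
            ek[n] = 2.0 * acc / n + (n - 1)
            acc += ek[n]
        return ek

    n = 1000
    H_n = sum(1.0 / i for i in range(1, n + 1))
    print(mean_key_comparisons(n)[n], 2 * (n + 1) * H_n - 4 * n)  # both ≐ 10985.9, per (2.1)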

3 Exact mean number of bit comparisons

In this section we establish the exact formula (1.1), repeated here as (3.1) for convenience, for the expected number of bit comparisons required by Quicksort for a fixed number n of keys uniformly distributed on (0, 1):

(3.1)    E B_n = 2 \sum_{k=2}^{n} (−1)^k \binom{n}{k} \frac{1}{(k−1) k [1 − 2^{−(k−1)}]}.

Let X_1, . . . , X_n denote the keys, and X_(1) < · · · < X_(n) their order statistics. Consider ranks 1 ≤ i < j ≤ n. Formula (3.1) follows readily from the following three facts, all either obvious or very well known:

• The event C_ij := {keys X_(i) and X_(j) are compared} and the random vector (X_(i), X_(j)) are independent.

• P(C_ij) = 2/(j − i + 1). [Indeed, C_ij equals the event that the first pivot chosen from among X_(i), . . . , X_(j) is either X_(i) or X_(j).]

• The joint density g_{n,i,j} of (X_(i), X_(j)) is given by

    g_{n,i,j}(x, y) = \binom{n}{i−1, 1, j−i−1, 1, n−j} x^{i−1} (y − x)^{j−i−1} (1 − y)^{n−j}.

Let b(x, y) denote the index of the first bit at which the numbers x, y ∈ (0, 1) differ, where for definiteness we take the non-terminating expansion for terminating rationals. Then

(3.2)    E B_n = \sum_{1 ≤ i < j ≤ n} P(C_ij) \int_0^1 \int_x^1 b(x, y) g_{n,i,j}(x, y) dy dx = \int_0^1 \int_x^1 b(x, y) p_n(x, y) dy dx,

where p_n(x, y) := \sum_{1 ≤ i < j ≤ n} P(C_ij) g_{n,i,j}(x, y) satisfies

(3.3)    p_n(x, y) = \sum_{m=0}^{n−2} \frac{2}{m+2} n(n−1) \binom{n−2}{m} (y−x)^m (1−y+x)^{n−2−m} = \frac{2}{(y−x)^2} [n(y−x) − 1 + (1−y+x)^n];

computing the resulting integrals (we omit the details) yields (3.1).

4 Asymptotic mean number of bit comparisons

Formula (1.1), repeated at (3.1), is hardly suitable for numerical calculations or asymptotic treatment, due to excessive cancellations in the alternating sum. Indeed, if (say) n = 100, then the terms (including the factor 2, for definiteness) alternate in sign, with magnitude as large as 10^25, and yet E B_n ≐ 2295. Fortunately, there is a complex-analytic technique designed for precisely our situation (alternating binomial sums), namely, Rice's method. Because of space limitations, we will not review the idea behind the method here, but rather refer the reader to (for example) Section 6.4 of [8].
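Both the cancellation and the value just quoted are easy to witness with exact rational arithmetic. The snippet below is ours (Python 3.8+ for math.comb), not part of the paper:

    from fractions import Fraction
    from math import comb

    def exact_mean_bits(n):
        """E B_n evaluated exactly via the alternating sum (3.1)."""
        total = Fraction(0)
        for k in range(2, n + 1):
            term = Fraction(2 * comb(n, k), (k - 1) * k) / (1 - Fraction(1, 2 ** (k - 1)))
            total += term if k % 2 == 0 else -term
        return total

    n = 100
    largest = max(2 * comb(n, k) / ((k - 1) * k * (1.0 - 2.0 ** (1 - k))) for k in range(2, n + 1))
    print(f"largest term ≈ {largest:.2e}")  # astronomically large, as stated in the text
    print(float(exact_mean_bits(n)))        # yet E B_100 ≐ 2295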

Let

    h(z) := \frac{Γ(−z) n!}{Γ(n + 1 − z)} · \frac{2}{(z − 1) z [1 − 2^{−(z−1)}]},

so that by Rice's method E B_n = −\frac{1}{2πi} \oint h(z) dz, the integral being taken around a contour enclosing the integers 2, . . . , n and no other poles of h. As a warm-up, one can apply the method first to E K_n = 2 \sum_{k=2}^{n} (−1)^k \binom{n}{k} \frac{1}{(k−1)k}, that is, to h(z) with the factor 1 − 2^{−(z−1)} omitted. Then the residue contributions using Rice's method are

    2n(H_n − 2 − 1/n), at the double pole at z = 1;
    2(H_n + 1), at the double pole at z = 0.

Summing the two contributions gives an alternative derivation of (2.1).

5 Poissonized model for uniform draws

As a warm-up for Section 6, we now suppose that the number of keys (throughout this section still assumed to be uniformly distributed) is Poisson distributed with mean λ. We begin with a lemma which provides both the analogue of (2.1)–(2.2) and two other facts we will need in Section 6.

Lemma 5.1. In the setting of Theorem 1.2 with F uniform, the expected number of key comparisons is a strictly convex function of λ given by

    E K(λ) = 2 \int_0^λ (λ − y)(e^{−y} − 1 + y) y^{−2} dy.

Asymptotically, as λ → ∞ we have

    E K(λ) = 2λ ln λ + (2γ − 4)λ + 2 ln λ + 2γ + 2 + O(e^{−λ}),

and as λ → 0 we have E K(λ) = (1/2)λ^2 + O(λ^3).

Proof. To obtain the exact formula, begin with

    E K_n = \int_0^1 \int_x^1 p_n(x, y) dy dx;

cf. (3.2) and recall Remark 4.1. Then multiply both sides by e^{−λ} λ^n/n! and sum, using the middle expression in (3.3); we omit the simple computation. Strict convexity then follows from the calculation \frac{d^2}{dλ^2} E K(λ) = 2(e^{−λ} − 1 + λ)/λ^2 > 0, and the asymptotics as λ → 0 are trivial: E K(λ) = 2 \int_0^λ (λ − y)[\frac{1}{2} + O(y)] dy = \frac{1}{2} λ^2 + O(λ^3). We omit the proof of the result for λ → ∞, but plan to include it in our full-length paper; comparing the fixed-n and Poisson(λ) expansions, note the difference in constant terms and the much smaller error term in the Poisson case.

To handle the number of bit comparisons, we will also need the following bounds on the moments of K(λ). Together with Lemma 5.1, these bounds also establish concentration of K(λ) about its mean when λ is large. For real 1 ≤ p < ∞, we let ‖W‖_p := (E |W|^p)^{1/p} denote L^p-norm and use E(W; A) as shorthand for the expectation of the product of W and the indicator of the event A.

Lemma 5.2. For every p ≥ 1, there exists a constant c_p < ∞ such that

    ‖K(λ) − E K(λ)‖_p ≤ c_p λ   for λ ≥ 1,
    ‖K(λ)‖_p ≤ c_p λ^{2/p}   for λ ≤ 1.

Proof (sketch). The first result is certainly true for λ bounded away from ∞. For λ → ∞ we need only Poissonize standard Quicksort moment calculations; cf. the very end of Section 2 for the fixed-n case. For λ ≤ 1 we use

    E K^p(λ) = E[K^p(λ); N ≥ 2] ≤ 2^{−p} E[N^{2p}; N ≥ 2] = 2^{−p} \sum_{n=2}^{∞} n^{2p} e^{−λ} \frac{λ^n}{n!} ≤ c_p^p λ^2,

where N is Poisson(λ) and c_p is taken to be at least the finite value \frac{1}{2} (\sum_{n=2}^{∞} n^{2p}/n!)^{1/p}.

We now turn our attention from K(λ) to the more interesting random variable B(λ), the total number of bit comparisons. First, let

    I_{k,j} := [(j − 1) 2^{−k}, j 2^{−k})

be the j-th dyadic rational interval of rank k, and let

    B_k(λ) := number of comparisons of (k + 1)-st bits,
    B_{k,j}(λ) := number of comparisons of (k + 1)-st bits between keys in I_{k,j}.

Observe that

(5.1)    B(λ) = \sum_{k=0}^{∞} B_k(λ) = \sum_{k=0}^{∞} \sum_{j=1}^{2^k} B_{k,j}(λ).

A simplification provided by our Poissonization is that, for each fixed k, the variables B_{k,j}(λ) are independent. Further, the marginal distribution of B_{k,j}(λ) is simply that of K(2^{−k} λ). Taking expectations in (5.1), we find

(5.2)    μ_unif(λ) = E B(λ) = \sum_{k=0}^{∞} 2^k E K(2^{−k} λ).

In the full-length paper we (tentatively) plan to include the details of a proof we have written showing how the two asymptotic estimates in Lemma 5.1 can be used in conjunction with (5.2) to establish the asymptotic estimate (1.3) for μ_unif(λ) as λ → ∞. [An alternative approach is to Poissonize (1.2).] Moreover, we are now in position to establish the concentration of B(λ) about μ_unif(λ) promised just prior to (1.4).
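For the skeptical reader, Lemma 5.1 and (5.2) are easy to explore numerically. The rough sketch below is ours (crude midpoint quadrature and a truncated k-sum, not the paper's method); it reproduces the leading behavior claimed in (1.3):

    from math import exp, log, log2

    def phi(y):
        """(e^{-y} - 1 + y) / y^2, evaluated stably (-> 1/2 as y -> 0)."""
        return 0.5 - y / 6.0 + y * y / 24.0 if y < 1e-4 else (exp(-y) - 1.0 + y) / (y * y)

    def mean_K(lam, steps=20000):
        """E K(lambda) = 2 * integral_0^lam (lam - y) phi(y) dy, per Lemma 5.1."""
        h = lam / steps
        return 2.0 * h * sum((lam - (i + 0.5) * h) * phi((i + 0.5) * h) for i in range(steps))

    def mean_B_unif(lam, kmax=60):
        """mu_unif(lambda) via (5.2); the neglected tail k > kmax is negligible."""
        return sum(2 ** k * mean_K(lam / 2 ** k) for k in range(kmax))

    lam, g = 100.0, 0.57722
    print(mean_K(lam), 2 * lam * log(lam) + (2 * g - 4) * lam + 2 * log(lam) + 2 * g + 2)
    print(mean_B_unif(lam), lam * log(lam) * log2(lam) - 3.105 * lam * log(lam) + 6.872 * lam)
    # Each pair agrees up to the small error terms in Lemma 5.1 and (1.3).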

Proposition 5.1. There exists a constant c such that Var B(λ) ≤ c λ^2 for 0 < λ < ∞.

Proof. For 0 < λ < ∞, we have by the triangle inequality for ‖·‖_2, independence and B_{k,j}(λ) =_L K(2^{−k} λ), and Lemma 5.2, with c′ := c_2 \sum_{k=0}^{∞} 2^{−k/2},

    [Var B(λ)]^{1/2} ≤ \sum_{k=0}^{∞} [Var B_k(λ)]^{1/2} ≤ \sum_{k=0}^{∞} [2^k Var K(2^{−k} λ)]^{1/2} ≤ c′ λ,

which gives the proposition with c := (c′)^2.

Remark 5.1. (a) In the full-length paper we will show that Proposition 5.1 can be extended to

    ‖B(λ) − E B(λ)‖_p ≤ c′_p λ

for any real 1 ≤ p < ∞ (and some finite c′_p) and all λ ≥ 1.

(b) It is quite plausible that the variables B_k(λ) are positively correlated [because, for k ≥ 1, the (k + 1)-st bits of two keys can't be compared unless the k-th bits are], in which case it is easy to check that Var B(λ) = Θ(λ^2) for λ ≥ 1, but we do not know a proof. We would then have ‖B(λ) − E B(λ)‖_p = Θ(λ) for each real 2 ≤ p < ∞. Perhaps it is even true that [B(λ) − E B(λ)]/λ has a limiting distribution, but we have no conjecture as to its form.

6 Mean number of bit comparisons for keys drawn from an arbitrary density f

In this section we outline martingale arguments for proving Theorem 1.2 for the expected number of bit comparisons for Poisson(λ) draws from a rather general density f. (For background on martingales, see any standard measure-theoretic probability text, e.g., [1].) In addition to the notation above, we will use the following:

    p_{k,j} := \int_{I_{k,j}} f;
    f_{k,j} := (average value of f over I_{k,j}) = 2^k p_{k,j};

    f_k(x) := f_{k,j} for all x ∈ I_{k,j};
    f^*(·) := sup_k f_k(·).

Note for each k ≥ 0 that \sum_j p_{k,j} = 1 and that f_k : (0, 1) → [0, ∞) is the smoothing of f to the rank-k dyadic rational intervals. From basic martingale theory we have immediately the following simple but key observation.

Lemma 6.1. With f_∞ := f, (f_k)_{0 ≤ k ≤ ∞} is a martingale, and f_k → f almost surely (and in L^1) as k → ∞.

Before we begin the proof of Theorem 1.2 we remark that the asymptotic inequality μ_f(λ) ≥ μ_unif(λ) observed there in fact holds for every 0 < λ < ∞. Indeed,

(6.1)    μ_f(λ) = \sum_{k=0}^{∞} \sum_{j=1}^{2^k} E K(p_{k,j} λ) ≥ \sum_{k=0}^{∞} 2^k E K(2^{−k} λ) = μ_unif(λ),

where the first equality appropriately generalizes (5.2), the inequality follows by the convexity of E K(·) (recall Lemma 5.1), and the second equality follows by (5.2).

Proof (sketch) of Theorem 1.2. Assume λ ≥ 1 and, with m = m(λ) := ⌈lg λ⌉, split the double sum in (6.1) as

(6.2)    μ_f(λ) = \sum_{k=0}^{m} \sum_{j=1}^{2^k} E K(p_{k,j} λ) + R(λ),

with R(λ) a remainder term. Assuming

(6.3)    \int f^* (ln_+ f^*)^3 < ∞

(to be proved in the full-length paper from the assumption on f in the statement of the theorem, using the maximal inequality for nonnegative submartingales) and using the estimates of Lemma 5.1, one can show R(λ) = O(λ). Plugging this and the consequence

    E K(x) = 2x ln x + (2γ − 4)x + O(x^{1/2}),

which holds uniformly in 0 ≤ x < ∞, of Lemma 5.1 into (6.2), we find

    μ_f(λ) = \sum_{k=0}^{m} \sum_{j=1}^{2^k} [2 p_{k,j} λ (ln λ + ln p_{k,j}) + (2γ − 4) p_{k,j} λ + O((p_{k,j} λ)^{1/2})] + O(λ)

         = \sum_{k=0}^{m} [2λ ln λ + 2λ \sum_{j=1}^{2^k} p_{k,j} ln p_{k,j} + (2γ − 4)λ + O(λ^{1/2} 2^{k/2})] + O(λ)

         = μ_unif(λ) + 2λ \sum_{k=0}^{m} \int f_k ln f_k + O(λ),

where we have used the Cauchy–Schwarz inequality at the second equality and comparison with the uniform case (f ≡ 1) at the third.

But, by Lemma 6.1, (6.3), and the dominated convergence theorem,

(6.4)    \int f_k ln f_k → \int f ln f   as k → ∞,

from which follows

    μ_f(λ) = μ_unif(λ) + 2λ (lg λ) \int f ln f + o(λ log λ)
           = μ_unif(λ) + 2λ (ln λ) \int f lg f + o(λ log λ),

as desired.

Remark 6.1. If we make the stronger assumption that f is Hölder(α) continuous on [0, 1] for some α > 0, then we can quantify (6.4) and improve the o(λ log λ) remainder in the statement of Theorem 1.2 to O(λ).

7 An improvement: BitsQuick

Recall the operation of Quicksort described in Section 2. Suppose that the pivot [call it x = 0.x(1) x(2) . . .] has first bit x(1) equal to 0, say. Then the keys smaller than x all have first bit equal to 0 as well, and it wastes time to compare first bits when Quicksort is called recursively on the corresponding subarray.

We call BitsQuick the obvious recursive algorithm that does away with this waste. We give one possible implementation in the boxed pseudocode, where L(y) denotes the result of rotating the register containing key y to the left — i.e., replacing y = .y(1) y(2) . . . y(m) by .y(2) . . . y(m) y(1) [and similarly R(A) for rotation of every element of array A to the right]. The input bit b indicates whether or not the array elements need to be rotated to the right before the routine terminates. The symbol ∥ denotes concatenation (of sorted arrays). (We omit minor implementational details, such as how to do the sorting in place and how to maintain random ordering for the generated subarrays, that are the same as for Quicksort and very well known.) The routine BitsQuick(A, b) receives a (randomly ordered) array A and a single bit b, and returns the sorted version of A. The initial call is to BitsQuick(A_0, 0), where A_0 is the full array to be sorted.

    The routine BitsQuick(A, b)
        If |A| ≤ 1
            Return A
        Else
            Set A_− ← ∅ and A_+ ← ∅
            Choose a random pivot key x = 0.x(1) x(2) . . .
            If x(1) = 0
                For y ∈ A with y ≠ x
                    If y < x
                        Set y ← L(y) and then A_− ← A_− ∪ {y}
                    Else
                        Set A_+ ← A_+ ∪ {y}
                Set A_− ← BitsQuick(A_−, 1) and A_+ ← BitsQuick(A_+, 0)
                Set A ← A_− ∥ {x} ∥ A_+
            Else
                For y ∈ A with y ≠ x
                    If y < x
                        Set A_− ← A_− ∪ {y}
                    Else
                        Set y ← L(y) and then A_+ ← A_+ ∪ {y}
                Set A_− ← BitsQuick(A_−, 0) and A_+ ← BitsQuick(A_+, 1)
                Set A ← A_− ∥ {x} ∥ A_+
            If b = 0
                Return A
            Else
                Return R(A)

A related but somewhat more complicated algorithm has been considered by Roura [12, Section 5]. In the full-length paper we plan to present arguments justifying the number of bit comparisons used by BitsQuick as a fair measure of its overall cost. In the full-length paper we will also prove the following analogue of Theorem 1.1 (and establish further terms in the asymptotic expansion), wherein

    c̃_1 := \frac{7}{ln 2} + \frac{15}{2} − \frac{3γ}{ln 2} − 2γ ≐ 13.9

and

    π̃_n := \frac{1}{ln 2} \sum_{k ∈ Z: k ≠ 0} \frac{3 − iβk}{1 + iβk} Γ(−1 − iβk) n^{iβk}

is periodic in lg n with period 1 and amplitude smaller than 2 × 10^{−7}:

    E Q_n = \sum_{k=2}^{n} (−1)^k \binom{n}{k} \left[ \frac{2(k−2)}{(k−1)[1 − 2^{−(k−1)}]} − \frac{k−4}{k[1 − 2^{−k}]} \right] + 2n H_n − 5n + 2 H_n + 1

          = (2 + \frac{3}{ln 2}) n ln n − c̃_1 n + π̃_n n + O(log^2 n).

We have not yet had the opportunity to consider the variability of Q_n.
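For concreteness, here is a direct (and unofficial) Python transcription of the boxed pseudocode, with each key stored as a BITS-bit integer standing for the fraction 0.b(1) . . . b(BITS). One deliberate deviation: the final rotation is applied in the base case as well, so that singleton subarrays are also restored to the caller's representation.

    import random

    BITS = 53  # register width: the integer y encodes the key 0.y(1)...y(BITS)

    def L(y):  # rotate the register left:  .y(1) y(2)...y(m)  ->  .y(2)...y(m) y(1)
        return ((y << 1) & ((1 << BITS) - 1)) | (y >> (BITS - 1))

    def R(y):  # rotate the register right (the inverse of L)
        return (y >> 1) | ((y & 1) << (BITS - 1))

    def bits_quick(A, b=0):
        """BitsQuick(A, b): sort A; b = 1 means the caller left-rotated
        these keys, so rotate them back to the right before returning."""
        if len(A) <= 1:
            return A if b == 0 else [R(y) for y in A]
        x = random.choice(A)               # the pivot
        lo, hi = [], []
        if x >> (BITS - 1) == 0:           # pivot's first bit is 0
            for y in A:
                if y == x:
                    continue
                if y < x:
                    lo.append(L(y))        # smaller keys share the leading 0: strip it
                else:
                    hi.append(y)
            out = bits_quick(lo, 1) + [x] + bits_quick(hi, 0)
        else:                              # pivot's first bit is 1
            for y in A:
                if y == x:
                    continue
                if y < x:
                    lo.append(y)
                else:
                    hi.append(L(y))        # larger keys share the leading 1: strip it
            out = bits_quick(lo, 0) + [x] + bits_quick(hi, 1)
        return out if b == 0 else [R(y) for y in out]

    keys = random.sample(range(1 << BITS), 100)  # distinct keys, as assumed throughout
    assert bits_quick(keys) == sorted(keys)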

References

[1] Kai Lai Chung. A Course in Probability Theory. Second edition, Probability and Mathematical Statistics, Vol. 21, Academic Press, New York–London, 1974.
[2] J. Dongarra and F. Sullivan. Guest editors' introduction: the top 10 algorithms. Computing in Science & Engineering, 2(1):22–23, 2000.
[3] James Allen Fill and Svante Janson. Smoothness and decay properties of the limiting Quicksort density function. Mathematics and Computer Science (Versailles, 2000), pp. 53–64. Trends Math., Birkhäuser, Basel, 2000.
[4] James Allen Fill and Svante Janson. Quicksort asymptotics. J. Algorithms, 44(1):4–28, 2002.
[5] C. A. R. Hoare. Quicksort. Comput. J., 5:10–15, 1962.
[6] Charles Knessl and Wojciech Szpankowski. Quicksort algorithm again revisited. Discrete Math. Theor. Comput. Sci., 3(2):43–63 (electronic), 1999.
[7] Donald E. Knuth. The Art of Computer Programming. Vol. 3: Sorting and Searching. 2nd ed., Addison-Wesley, Reading, Mass., 1998.
[8] Hosam M. Mahmoud. Evolution of Random Search Trees. John Wiley & Sons, New York, 1992.
[9] Ralph Neininger and Ludger Rüschendorf. Rates of convergence for Quicksort. J. Algorithms, 44(1):51–62, 2002.
[10] Mireille Régnier. A limiting distribution for quicksort. RAIRO Inform. Théor. Appl., 23(3):335–343, 1989.
[11] Uwe Rösler. A limit theorem for "Quicksort". RAIRO Inform. Théor. Appl., 25(1):85–100, 1991.
[12] Salvador Roura. Digital access to comparison-based tree data structures and algorithms. J. Algorithms, 40(1):1–23, 2001.