Integer Sorting in O(n √(log log n)) Expected Time and Linear Space

Yijie Han
School of Interdisciplinary Computing and Engineering
University of Missouri at Kansas City
5100 Rockhill Road
Kansas City, MO 64110
[email protected]
http://welcome.to/yijiehan

Mikkel Thorup
AT&T Labs—Research
Shannon Laboratory
180 Park Avenue
Florham Park, NJ 07932
[email protected]
Abstract

We present a randomized algorithm sorting n integers in O(n √(log log n)) expected time and linear space. This improves the previous O(n log log n) bound by Andersson et al. from STOC'95.

As an immediate consequence, if the integers are bounded by U, we can sort them in O(n √(log log U)) expected time. This is the first improvement over the O(n log log U) bound obtained with van Emde Boas' data structure from FOCS'75.

At the heart of our construction is a technical deterministic lemma of independent interest; namely, that we split n integers into subsets of size at most √n in linear time and space. This also implies improved bounds for deterministic string sorting and integer sorting without multiplication.

1 Introduction

Integer sorting has always been an important task in connection with the digital computer. A classic example is the folklore algorithm radix sort, which according to Knuth [28] is referenced as far back as in 1929 by Comrie in a document describing punched-card equipment [14].

Whereas radix sort works in linear time for O(log n)-bit integers, it was not until 1990 that Fredman and Willard [17] beat the Ω(n log n) comparison-based sorting lower bound for the case of arbitrary single word integers. The word-size W is determined by the processor. We assume W ≥ log n so that we can address the n different integers, but in principle, W can be arbitrarily large compared with n. An equivalent formulation of our assumptions is that we only assume constant time operations on integers polynomial in the sum of the input integers. The assumption that each integer fits in a machine word implies that integers can be operated on with single instructions. A similar assumption is made for comparison based sorting in that an O(n log n) time bound requires constant time comparisons. However, for integer sorting, besides comparisons, we can use all the other instructions available on integers in standard imperative programming languages such as C [26]. It is important that the word-size W is finite, for otherwise, we can sort in linear time by a clever coding of a huge vector-processor dealing with n input integers at a time [30, 27].

Concretely, Fredman and Willard [17] showed that we can sort deterministically in O(n log n / log log n) time and linear space, and randomized in O(n √(log n)) expected time and linear space. The randomized bound can also be achieved deterministically using space unbounded in terms of n (of the form 2^(εW) for constant ε > 0), but here we focus on bounds with space bounded in terms of n.

In 1995, Andersson et al. [5] improved Fredman and Willard's O(n √(log n)) expected time for integer sorting to O(n log log n) expected time. Both of these bounds use linear space. A similar result was found independently by Han and Shen [23].

The above mentioned bounds are the best unrestricted bounds in the sense that no better bounds are known even if, besides the randomization, we have unlimited space and are free to define our own operations on words.

The above results have provided great inspiration for many researchers, trying to improve them in various ways. For example, there has been work on avoiding randomization [4, 21, 22, 31, 33] and there has been work on avoiding multiplication [3, 6, 36]. Also there has been lots of work on more dynamic versions of the problem such as priority queues [13, 18, 31, 33, 34] and searching [3, 4, 8, 9].

The O(n log log n) algorithm of Andersson et al. from 1995 [5] is very simple, and the fact that it has sustained so much interest has led many researchers to think that this could be the complexity for integer sorting, like Θ(n log n) is the complexity of comparison based sorting.

* Supported in part by University of Missouri Faculty Research Grant K211942.
Proceedings of the 43 rd Annual IEEE Symposium on Foundations of Computer Science (FOCS’02) 0272-5428/02 $17.00 © 2002 IEEE
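The folklore radix sort discussed in the introduction can be made concrete. The following is our own illustrative C sketch, not code from the paper: it sorts 32-bit keys with four stable counting-sort passes on 8-bit digits, which is linear time per pass, matching the claim that O(log n)-bit integers sort in linear time. The function name and the choice of 8-bit digits are our own.

```c
#include <stdint.h>
#include <stdlib.h>

/* LSD radix sort of n 32-bit keys: four passes, one per 8-bit digit.
   Each pass is a stable counting sort over 256 buckets. */
void radix_sort_u32(uint32_t *a, size_t n) {
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return;                  /* allocation failure: leave input untouched */
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)        /* histogram of current digit */
            count[(a[i] >> shift) & 0xFFu]++;
        size_t pos = 0;
        for (int d = 0; d < 256; d++) {       /* prefix sums -> bucket start positions */
            size_t c = count[d];
            count[d] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)        /* stable scatter into tmp */
            tmp[count[(a[i] >> shift) & 0xFFu]++] = a[i];
        for (size_t i = 0; i < n; i++)        /* copy back for next pass */
            a[i] = tmp[i];
    }
    free(tmp);
}
```

The stability of each counting-sort pass is what makes the digit-by-digit composition sort correctly from least to most significant digit.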
However, in this paper, we improve the O(n log log n) expected time to O(n √(log log n)) expected time:

Theorem 1 There is a randomized algorithm sorting n integers, each stored in one word, in O(n √(log log n)) expected time and linear space.

We leave open the problems of getting a corresponding deterministic algorithm and avoiding the use of multiplication. We note that the dynamic aspects are already pretty settled, for the complexity of dynamic searching is known to be Θ(√(log n / log log n)) [8, 9], and priority queues have once and for all been reduced to sorting [35].

Since integers of O(log n) bits can be sorted in linear time using radix sort, we get the following immediate consequence of Theorem 1:

Corollary 2 We can sort n integers of size at most U in O(n √(log log U)) expected time.

This is the first improvement over the O(n log log U) bound obtained with van Emde Boas' data structure from 1975 [37]. Indeed, the O(n log log n) sorting algorithm of Andersson et al. [5] combines van Emde Boas' data structure with the packed merging of Albers and Hagerup [2] so as to match the O(n log log U) bound for large values of U.

We note here that the general O(log log U) bound of van Emde Boas [37] has been improved in the context of static search structures. More precisely, Beame and Fich [9] have shown that one can preprocess a set of n integers in polynomial time and space so that given any x, one can search the largest stored integer below x in O(log log U / log log log U) time. However, due to the polynomial construction time, this improvement does not help with sorting. Beame and Fich [9] combine their result with Andersson's exponential search trees from [4], giving a dynamic search structure with an amortized update time of O(log log U · log log n / log log log U). This dynamic search structure could be used for sorting in O(n log log U · log log n / log log log U) time, but this is never better than the best of O(n log n) time and O(n log log U) time. Thus, from the perspective of sorting, our O(n √(log log U)) bound constitutes the first improvement over the O(n log log U) bound derived from van Emde Boas' data structure from 1975 [37].

We will, in fact, prove the following refinement of Corollary 2:

Theorem 3 There is a randomized algorithm sorting n word integers, each of size at most U < 2^W, in O(n √(log(log U / log n))) expected time and linear space.

Theorem 3 improves a corresponding refinement of Kirkpatrick and Reisch [27] of van Emde Boas' bound of O(n log(log U / log n)) expected time.

The following simple lemma states that it suffices for us to prove Theorem 3:

Lemma 4 Theorem 3 implies Theorem 1 and Corollary 2.

Proof: Trivially Theorem 3 implies the weaker Corollary 2. However, Andersson et al. [5] have shown that we can sort in linear expected time if W ≥ (log n)^3, and otherwise, log log U ≤ log W ≤ 3 log log n. In the latter case, √(log(log U / log n)) ≤ √(log log U) = O(√(log log n)), so the bound of Theorem 3 is O(n √(log log n)), and Theorem 1 follows. □

1.1 Other domains

In this paper, we generally assume that our integers to be sorted each fit in one word, where a word is the maximal unit that we can operate on with a single instruction. In this subsection, we briefly discuss implications of this case to many other domains.

First, consider the case of lexicographically ordered strings of words. Andersson and Nilsson [7] have presented an optimal randomized reduction from this case to that of integers fitting in single words. Applying their reduction to Theorem 1, we get

Corollary 5 We can sort n variable-length strings distributed over N words in O(n √(log log n) + N) expected time. In fact, we can get down to O(n √(log log n) + L) expected time, where L = Σ_i ℓ_i and ℓ_i is the length of the distinguishing prefix of string i, that is, the smallest prefix of string i distinguishing it from all the other strings, or the length of string i if it is a duplicate.

We note here that any algorithm sorting the strings will have to read the distinguishing prefixes, and since it takes an instruction to read each word, it follows that an additive O(L) is necessary.

One may instead be interested in variable length multiple-word integers where integers with more words are considered bigger. However, by prefixing each integer with its length, we reduce this case to lexicographic string sorting.

In our presentation, we think of all our integers as unsigned non-negative integers. However, the standard representation of signed integers is such that if we flip the sign-bit, we can sort them as unsigned integers, and then flip the sign-bit back again. Floating point numbers are even easier, for the IEEE 754 floating-point standard [25] is designed so that the ordering of floating point numbers can be deduced by perceiving their representations as multiple word integers. Also, if we are working with fractions where both numerator and denominator are single word integers, we get the right ordering if for each fraction, we make the division in floating point numbers with double precision. Now we get the correct ordering of the original integer fractions by perceiving the corresponding floating point numbers as integers.
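The sign-bit and floating-point tricks described above can be sketched in C. This is our own illustration (the function names are ours, not the paper's), assuming two's-complement integers and IEEE 754 doubles with no NaNs among the keys: each key is mapped to an unsigned integer whose unsigned order matches the original order, after which any unsigned integer sort applies.

```c
#include <stdint.h>
#include <string.h>

/* Two's-complement ints: XOR-ing the sign bit yields an unsigned key
   with the same relative order (negative values land below positives). */
uint32_t int_key(int32_t x) {
    return (uint32_t)x ^ 0x80000000u;
}

/* IEEE 754 doubles (no NaNs): non-negative values already compare
   correctly as unsigned bit patterns; negative values compare in
   reverse, so all their bits are flipped. The result is a monotone
   unsigned key. */
uint64_t double_key(double d) {
    uint64_t b;
    memcpy(&b, &d, sizeof b);                    /* raw bit pattern */
    return (b & 0x8000000000000000ull)
               ? ~b                              /* negative: reverse order */
               : b | 0x8000000000000000ull;      /* non-negative: shift above */
}
```

After sorting by these keys, the inverse mapping (flip the sign bit back, or reinterpret the bits) recovers the original values, as the text describes.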
1.2 Machine model

Recall that our machine model is a normal computer with an instruction set corresponding to what we program in a standard programming language such as C [26] or C++ [32]. We have a processor determined word-size W, limiting how big integers we can operate on in constant time. We assume that each input integer fits in a single word. We note that for generic code, the type of a full word integer, e.g. long long int, should be a macro parameter in C or a template parameter in C++. We adopt the unit-cost time measure where each operation takes constant time.

Interestingly, the traditional theoretical RAM model of Cook and Reckhow [15] allows infinite words. A disturbing consequence of infinite words is that with normal operations such as shifts or multiplication, we can simulate an exponentially big parallel processor solving all problems in NP in polynomial time. Hence such operations have to be banned from the above unit-cost theory RAM, making it even more contrived from a practical view-point.

However, by adopting the real-world limitation of a limited word-size, we both resolve the above theoretical issue, and we get algorithms that can be implemented in the real world. Hagerup [19] has named this model the word RAM. The word RAM has a fairly long tradition within integer sorting, being advocated and used in the 1984 paper [27] by Kirkpatrick and Reisch, and in the seminal 1993 paper of Fredman and Willard [17].

We note that our unit-cost multiplication may be considered somewhat questionable in that multiplication is not in AC0 [10], that is, there is no multiplication circuit of constant depth and of size polynomial in W. We leave it as an open problem to improve Thorup's randomized O(n log log n) expected time and linear space sorting without multiplication [36].

We will now discuss some of the (delightfully) dirty tricks that we can use on the word RAM, and which are not allowed in the comparison based model or on a pointer machine.

A first nice feature of the word RAM over the comparison based model is that we can add and subtract integers. For example, we can use this to code multiple comparisons, as in computer graphics, where a single word operation is used to manipulate the information on several pixels, each represented by one byte of the word.

The word RAM model distinguishes itself both from the comparison based model and from the pointer machine in that we can use integers, and segments of integers, as addresses. This trick goes back to radix sort, where an integer is viewed as a vector of characters, and these characters are used as addresses. Another word RAM trick in this direction is that we can hash integers into smaller ranges. Here radix sort goes back at least to 1929 [14] and hashing goes back at least to 1956 [16], both being developed for efficient problem solving in the real world.

We note that Fredman and Willard [17] further use the RAM for advanced tabulation of complicated functions over small domains. Their tabulation is too complicated to be of practical relevance, but tabulation of functions is in itself commonly used to tune code. As a simple example, Bentley [11, pp. 83–84] suggests that an efficient method for computing the number of set bits in 32-bit integers is to have a preprocessing where we first tabulate the number of set bits in all the 256 different 8-bit integers. Now, given a 32-bit integer x, we view it as the concatenation of four 8-bit integers, and for each of these, we look up the number of set bits in our table. Finally, we just add up these four numbers to get the number of set bits in x.

Summing up, we have argued that the "dirty tricks" facilitated by the word RAM are well established in the practice of writing fast code. Hence, if we disallow these tricks, we are not discussing the time complexity of running imperative programs on real world computers. At the same time, it should be admitted that this is a theory paper. The algorithms presented are too complicated and have too large constants hidden in the O-notation to be of any immediate practical use. This does not preclude that some of the ideas may find use in practice. For example, Nilsson [29] has demonstrated that the O(n log log n) algorithm of Andersson et al. [5] can be implemented so as to be competitive with the best practical sorting algorithms.

1.3 Deterministic splitting and sorting

The heart of our construction is a deterministic splitting result of independent interest.

Definition 6 A splitting of an ordered set X is a partitioning of X into sets A_0 < A_1 < ⋯ < A_k. Here, A < B denotes that a < b for all (a, b) ∈ A × B. A subset A_i is called diverse if it contains at least two distinct elements; if all its elements are equal, it is a duplicate set.
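Bentley's table-lookup method for counting set bits, described above, translates directly into C. The following sketch is ours (the names are hypothetical, not from Bentley or the paper): a one-time preprocessing fills the 256-entry byte table, and each 32-bit query is answered with four lookups and three additions.

```c
#include <stdint.h>

static uint8_t popcount8[256];   /* popcount8[b] = number of set bits in byte b */

/* Preprocessing: tabulate the number of set bits of every 8-bit integer. */
void init_popcount8(void) {
    for (int b = 0; b < 256; b++) {
        int c = 0;
        for (int v = b; v != 0; v >>= 1)
            c += v & 1;
        popcount8[b] = (uint8_t)c;
    }
}

/* Query: view a 32-bit integer as the concatenation of four 8-bit
   integers and sum the four table lookups. */
int popcount32(uint32_t x) {
    return popcount8[x & 0xFFu]
         + popcount8[(x >> 8) & 0xFFu]
         + popcount8[(x >> 16) & 0xFFu]
         + popcount8[x >> 24];
}
```

This is exactly the kind of small-domain tabulation the text refers to: the table costs 256 bytes once, and each query is constant time.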
Theorem 7 We can split a set of n word integers in linear time and space so that each diverse subset has at most √n integers.

Applying Theorem 7 recursively, we immediately get that we can sort deterministically in O(n log log n) time and linear space, but this has already been proved by Han in [22]. However, for deterministic string sorting, we get the following new result, which does not follow from [22] (the general reduction of Andersson and Nilsson [7] from string sorting to word sorting is randomized):

Corollary 8 We can sort n variable-length strings distributed over N words in O(n log log n + N) time and linear space. In fact, we can get down to O(n log log n + L) time, where L is the sum of the lengths of the distinguishing prefixes.

Proof: To get the corollary, we simply apply Theorem 7 recursively, but only to the first unmatched word of each string. That is, our recursive input is a subset of the integers with some common matched prefix. In the root call, the subset is the complete set with nothing matched so far. Integers ending up in a duplicate set match in one more word, and the other integers end up in sets of size reduced to the square root. Since the splitting takes constant time per integer, we pay constant time per word matched and O(log log n) time for reductions into smaller sets. □

We also present variants of the splitting using only standard AC0 operations, that is, AC0 operations available via a standard programming language such as C or C++.

Theorem 9 For any positive ε, using standard AC0 operations only, we can split a set of n (W/(log log n)^(1+ε))-bit integers in linear time so that diverse subsets have size at most √n.

Corollary 10 For any positive ε, using standard AC0 operations only, we can sort n words in O(n (log log n)^(1+ε)) time and linear space.

Proof: We use the same proof as for Corollary 8, but viewing each word as a string of (W/(log log n)^(1+ε))-bit characters, to which we can apply Theorem 9. □

Corollary 10 improves the previous O(n (log log n)^2 / (log log log n)) bound of Han from [20], and it gets very close to the best multiplication based deterministic bound of O(n log log n) [22].

1.4 Contents

The rest of the paper is divided into two sections. Section 2 presents our randomized sorting assuming deterministic splitting, and Section 3 presents our deterministic splitting.

2 Fast randomized sorting

For our fast randomized sorting, we need a slight variant of the signature sorting of Andersson et al. [5] (cf. Appendix A.1):

Lemma 11 With an expected linear-time additive overhead, signature sorting with parameter r reduces the problem of sorting n integers of ℓ bits to

(i) the problem of sorting n reduced integers of 4r log n bits each, and

(ii) the problem of sorting n integer fields of ℓ/r bits each.

Here (i) has to be solved before (ii).

We are now going to present our randomized algorithm for sorting n word integers in linear space and O(n √(log p)) expected time, where p = log U / log n.

Repeated splitting. First we apply our splitting from Theorem 7 to recursively split the diverse sets of integers ⌈√(log p)⌉ times, thus getting a splitting with diverse subsets of size at most n′ = n^(1/2^⌈√(log p)⌉). Each integer is involved in at most ⌈√(log p)⌉ linear time splittings, so the total time is O(n √(log p)).

Repeated signature sort. To each of the diverse sets S_i of size at most n′, we apply the signature sort from Lemma 11 with r = 2^√(log p). Then the reduced integers from (i) have 4r log n′ = O(log n) bits.

We are going to sort these reduced integers from all the subsets S_i together, but prefixing reduced integers from S_i with i. The prefixed reduced integers still have O(log n) bits, so we can radix sort them in linear time. From the resulting sorted list we can trivially extract the sorted sublist of reduced integers for each S_i, thus completing task (i) from Lemma 11.

We have now spent linear expected time on reducing the problem to dealing with the fields from Lemma 11 (ii), and the field length is only a fraction 1/2^√(log p) of the original length.

We repeat this signature sorting ⌈√(log p)⌉ times, at a total expected cost of O(n √(log p)). We started with integers of length log U, and each round reduces the integer length by a factor 2^√(log p), so we end up with integers of length at most

log U / (2^√(log p))^⌈√(log p)⌉ ≤ log U / 2^(log p) = log n.

Since the lengths are now at most log n, we can trivially finish with a linear time bucket sort.

Summing up. The total expected time spent in the above algorithm is O(n √(log p)), and we have only used linear space, as desired. Thus, our only remaining task is to provide the splitting from Theorem 7, that is,

Lemma 12 Theorem 7 implies Theorem 3.

3 Deterministic splitting

To prove Theorem 7, first, in Section 3.1, we reformulate it in terms of splitting over a given set of splitters.

3.1 Splitting over splitters

We will actually prove our splitting result in terms of an equivalent formulation. A splitting of a set X into sets X_0 < {y_1} < X_1 < {y_2} < ⋯ < {y_k} < X_k is a splitting over splitters y_1 < y_2 < ⋯ < y_k.

Lemma 13 The following statements are equivalent:

(a) We can split a set of n integers so that diverse subsets are of size at most n^(1−ε_a) in linear time and space for some positive constant ε_a.

(b) We can split a set of n integers so that diverse subsets are of size at most n^(1−ε_b) in linear time and space for any positive constant ε_b.

(c) We can split a set of n words over n^(ε_c) splitters in linear time and space for any positive constant ε_c.

(d) We can split a set of n words over n^(ε_d) splitters in linear time and space for some positive constant ε_d.

Proof: (a)⇒(b): To get diverse subsets of size n^(1−ε_b), we apply (a) recursively on diverse sets. If (a) is applied i times to subsets containing x, the set containing x ends up non-diverse, or of size at most n^((1−ε_a)^i). Thus x gets involved at most log(1−ε_b)/log(1−ε_a) = O(1) times.

(b)⇒(c): To prove (c) with any given value of ε_c, we apply (b) with ε_b = 2ε_c. Now, the diverse subsets have at most n^(1−2ε_c) integers. Out of these subsets, at most n^(ε_c) contain a splitter. We split each subset with splitters using traditional comparison based sorting. The total number of integers involved in this is at most n^(1−2ε_c) · n^(ε_c) = n^(1−ε_c), so the total time for sorting is O(n^(1−ε_c) log n) = O(n). Having sorted all diverse subsets with splitters over these splitters, we immediately derive the desired splitting of the original set.

(c)⇒(d) is trivial.

(d)⇒(a): If n = O(1), (a) is trivial, so assume n = ω(1). We divide our input integers into batches of size √n. Using (d), we can split such a batch over n^(ε_d/2) splitters in linear time.

We will develop an appropriate set Y of splitters as we go along, starting with Y = ∅. Each time we come with a batch of integers, we split them according to the current splitters y_1 < ⋯ < y_k, adding them to the splitting X_0 < {y_1} < X_1 < ⋯ < {y_k} < X_k done so far. If one of the diverse subsets X_i gets 4n^(1−ε_d/2) or more elements, we split it according to its median z, and make z and z + 1 two new splitters. Obviously, we end with at most 4n^(1−ε_d/2) elements in each diverse set, and since n = ω(1), 4n^(1−ε_d/2) ≤ n^(1−ε_d/4) = n^(1−ε_a) for the positive constant ε_a = ε_d/4.

We will charge each splitting over a median to 2n^(1−ε_d/2) elements. This implies that the median finding and subsequent splitting is done in linear time, and that the total number of splitters is at most 2n/(2n^(1−ε_d/2)) = n^(ε_d/2), as needed for applying (d) with a batch of √n integers.

The charging is simple: every time a diverse subset is started, it has at most 2n^(1−ε_d/2) charged elements. We only split the set if it gets more than 4n^(1−ε_d/2) elements, which means that we have at least 2n^(1−ε_d/2) non-charged elements that we can charge, and we can easily distribute the charging so that each of the resulting diverse sets gets at most 2n^(1−ε_d/2) charged elements. □

The rest of this paper is devoted to proving Lemma 13 (d) with n^(ε_d) = n^(1/3), which by Lemma 13 (b) implies Theorem 7. That is, our remaining task is to split n integers over n^(1/3) splitters in linear time and space.

3.2 Deterministic signature sorting

We shall use a deterministic version of signature sort, essentially done by Han [21] (cf. Appendix A.2).

Lemma 14 With an O(n + s^2 ℓ/r) time additive overhead, deterministic signature sorting with parameter r reduces the problem of splitting n ℓ-bit integers over s splitters to

(i) the problem of splitting n reduced integers of 4r log n bits over s reduced splitters, and

(ii) the problem of splitting n fields of ℓ/r bits over s field splitters.

Here (i) has to be solved before (ii).

Han [21] actually paid an additive O(ℓ s^2). The fact that we only pay O(s^2 ℓ/r) is useful if ℓ is arbitrarily large compared with n, as in Lemma 15 below.

Lemma 15 If W ≥ (log n)^c, c ≥ 5, we can split n word integers over at most n^(1/3) splitters in linear time.
(1) (log n)) O n + n O (n): = ,and .For , the latter c 3 (log n) 6 W n implies , as required for Lemma 17. q Now, the reduced integers from (i) have length We view each integer as consisting of 4 characters, each n)=4 of (log bits. Also, we have plenty of room to pack c 2 2 4r (log n)=4(logn) = O (W=(log n) ) (log q ) q integers in each word. 0 n and that implies that we can sort and hence split them The algorithm works recursively, taking a subset of p i in linear time using the packed sorting of Albers and n integers, all with a common prefix of length .Wethen 2 k = q n) = Hagerup [2]. Similarly, the fields have length (log use Lemma 17 with to sort the integers according to 3 i +1 (log n)=4 (W=(log n) ) O , so they can also be sorted in linear time. character . We note that this character has 0 (log n =2 ) bits, as required of the segment in Lemma 17. Also, we note that this is, in fact, a splitting since all the 5 3 Above, we could let c go down from to if, instead of using the packed sorting of Albers and Hagerup, we used integers agree on the preceding characters. Starting with all n integers, we recurse on diverse subsets until they all have the one by Han and Shen [24] which takes linear time for p size at most n. (W= log n) O -bit integers. However, Han and Shen’s result employs the sorting network of Ajtai et al. [1], and the im- To see that this implies a linear time splitting, let n ;n ;:::;n 1 t provement does not affect the overall results of this paper. 0 be the sizes of the sets that a given integer x n = n is involved in as. That is, we start with 0 .For p 5 (log n) Lemma 16 If W , with a linear time additive i n n