PDM Sorting Algorithms That Take A Small Number Of Passes

Sanguthevar Rajasekaran∗
Dept. of Comp. Sci. and Engg.
University of Connecticut, Storrs, CT
[email protected]

Sandeep Sen†
Dept. of Comp. Science
IIT, New Delhi
[email protected]

Abstract

We live in an era of data explosion that necessitates the discovery of novel out-of-core techniques. The I/O bottleneck has to be dealt with in developing out-of-core methods. The Parallel Disk Model (PDM) has been proposed to alleviate the I/O bottleneck. Sorting is an important problem that has ubiquitous applications. Several asymptotically optimal PDM sorting algorithms are known, and now the focus has shifted to developing algorithms for problem sizes of practical interest. In this paper we present several novel algorithms for sorting on the PDM that take only a small number of passes through the data. We also present a generalization of the zero-one principle for sorting. A shuffling lemma is presented as well. These lemmas should be of independent interest for the average case analysis of sorting algorithms as well as for the analysis of randomized sorting algorithms.

1. Introduction

The Parallel Disk Model (PDM) has been proposed to deal with the problem of developing effective algorithms for processing voluminous data. In a PDM, there is a (sequential or parallel) computer C that has access to D (≥ 1) disks. In one I/O operation, it is assumed that a block of size B can be fetched into the main memory of C. One typically assumes that the main memory of C is of size M, where M is a (small) constant multiple of DB.

Efficient algorithms have been devised for the PDM for numerous fundamental problems. In the analysis of these algorithms, typically, the number of I/O operations needed is optimized. Since local computations take much less time than the time needed for the I/O operations, these analyses are reasonable.

Since sorting is a fundamental and highly ubiquitous problem, a lot of effort has been spent on developing sorting algorithms for the PDM. It has been shown by Aggarwal and Vitter [1] that Ω((N/(DB)) log(N/B)/log(M/B)) I/O operations are needed to sort N keys (residing in D disks) when the block size is B. Here M is the size of the internal memory. Many asymptotically optimal algorithms have been devised as well (see e.g., Arge [2], Nodine and Vitter [21], and Vitter and Hutchinson [29]). The LMM sort of Rajasekaran [23] is optimal when N, B, and M are polynomially related and is a generalization of Batcher's odd-even merge sort [6], Thompson and Kung's s^2-way merge sort [28], and Leighton's columnsort [15].

Recently, many algorithms have been devised for problem sizes of practical interest. For instance, Dementiev and Sanders [11] have developed a sorting algorithm based on multi-way merge that overlaps I/O and computation optimally. Their implementation sorts gigabytes of data and competes with the best practical implementations. Chaudhry, Cormen, and Wisniewski [9] have developed a novel variant of columnsort that sorts M√M keys in three passes over the data (assuming that B = M^{1/3}). Their implementation is competitive with NOW-Sort. (By a pass we mean N/(DB) read I/O operations and the same number of write operations.) A typical assumption made in developing PDM algorithms is that the internal memory size M is a small multiple of DB. In [7], Chaudhry and Cormen introduce some sophisticated engineering tools to speed up the algorithm of [9] in practice. They also report a three pass algorithm in this paper that sorts M√M keys (assuming that B = Θ(M^{1/3})). In [8], Chaudhry, Cormen, and Hamon present an algorithm that sorts M^{5/3}/4^{2/3} keys (when B = Θ(M^{2/5})). They combine columnsort and the Revsort of Schnorr and Shamir [26] in a clever way. This paper also promotes the need for oblivious algorithms and the usefulness of mesh-based techniques in the context of out-of-core sorting. In fact, the LMM sort of Rajasekaran [23] and all the algorithms in this paper (except for the integer sorting algorithm) are oblivious.

∗ This research has been supported in part by the NSF Grants CCR-9912395 and ITR-0326155.
† This research was conducted when this author was visiting the University of Connecticut.
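To make the pass accounting used throughout the paper concrete, here is a small sketch in Python (the function name and the parameter values are illustrative assumptions, not from the paper) of how read I/Os per pass are counted on a PDM:

```python
from math import ceil

def ios_per_pass(n_keys: int, d_disks: int, block_size: int) -> int:
    """Parallel read I/O operations needed for one pass over N keys on a PDM.

    In one parallel I/O, each of the D disks can deliver one block of
    B keys, so one pass over the data needs ceil(N / (D * B)) read
    operations (and the same number of writes).
    """
    return ceil(n_keys / (d_disks * block_size))

# Illustrative parameters: 10^9 keys, 8 disks, blocks of 10^4 keys.
print(ios_per_pass(10**9, 8, 10**4))  # -> 12500 parallel reads per pass
```

A k-pass algorithm therefore performs about k times this many parallel reads (and writes), which is why saving even a single pass matters for large inputs.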
In this paper we focus on developing sorting algorithms that take a small number of passes. We note that for most of the applications of practical interest, N ≤ M^2. For instance, if M = 10^8 (integers), then M^2 is 10^16 (integers), which is around 100,000 terabytes. Thus we focus on input sizes ≤ M^2 in this paper. Another important thrust of this paper is on algorithms that have good expected performance. In particular, we are interested in algorithms that take only a small number of passes on an overwhelming fraction of all possible inputs. As an example, consider an algorithm A that takes two passes on at least a (1 − M^{−α}) fraction of all possible inputs and three passes on at most an M^{−α} fraction of all possible inputs. If M = 10^8 and α = 2, A will take more than two passes only on at most 10^{−14}% of all possible inputs. Thus algorithms of this kind will be of great practical importance.

New Results: There are two main contributions in this paper: 1) We bring out the need for algorithms that have good expected performance in the context of PDM sorting. A saving of even one pass could make a big difference if the input size is large. Especially, algorithms that run in a small number of passes on an overwhelming fraction of all possible inputs will be highly desirable in practice. As a part of this effort we prove two lemmas that should be of independent interest. 2) The second main contribution is in the development of algorithms for input sizes ≤ M^2. This input size seems to cover most of the applications of practical interest. In this paper we also point out that radix sort can be useful in PDM sorting of integer keys. Hence it should be of interest for many practical applications.

The zero-one principle has been extensively used in the past in the design and analysis of sorting algorithms. This principle states that if an oblivious sorting algorithm sorts all possible binary sequences of length n, then it also sorts arbitrary sequences of length n. In this paper we present a generalization of this principle that applies to sorting algorithms that sort most of the binary sequences. The standard 0-1 principle offers great simplicity in analyzing sorting algorithms: it suffices to assume that the input consists of only zeros and ones. The same level of simplicity is offered by the generalization presented in this paper as well. We are not aware of any previous formalization of such a result; however, Chlebus [10] has stated an ad-hoc version without giving any proof of its correctness.

We also present a shuffling lemma that is useful for analyzing the average behavior of sorting random inputs based on generalizations of the odd-even merging technique (see [23]). Even though the generalized zero-one principle can be used in its place, this lemma yields slightly better constants than the generalized zero-one principle. These two lemmas should be of independent interest.

The two thrusts of our interest have yielded several specific algorithms for PDM sorting. All of these algorithms use a block size of √M. Here is a list: 1) We present a simple mesh-based algorithm that sorts M√M keys in three passes assuming that B = √M; 2) We also present another three pass algorithm for sorting M√M keys that is based on LMM sort and that also assumes that B = √M. In contrast, the algorithm of Chaudhry and Cormen [7] uses a block size of M^{1/3} and sorts M√M/√2 keys in three passes. 3) An expected two pass algorithm that sorts nearly M√M keys. In this paper, all the expected algorithms are such that they take the specified number of passes on an overwhelming fraction (i.e., ≥ (1 − M^{−α}) for any fixed α ≥ 1) of all possible inputs; 4) An expected three pass algorithm that sorts nearly M^{1.75} keys; 5) A seven pass algorithm (based on LMM sort) that sorts M^2 keys; 6) An expected six pass algorithm that sorts nearly M^2 keys; and 7) A simple integer sorting algorithm that sorts integers in the range [1, M^c] (for any constant c) in a constant number of passes (for any input size). Unlike the bundle-sorting algorithm of Matias, Segal and Vitter [20], which is efficient only for a single disk model, our algorithm achieves full parallelism under very mild assumptions.

Notation. We say the amount of resource (like time, space, etc.) used by a randomized or probabilistic algorithm is Õ(f(n)) if the amount of resource used is no more than cαf(n) with probability ≥ (1 − n^{−α}) for any n ≥ n_0, where c and n_0 are constants and α is a constant ≥ 1. We could also define the asymptotic functions Θ̃(·), õ(·), etc. in a similar manner.

2. A Lower Bound

The following lower bound result will help us judge the optimality of the algorithms presented in this paper.

Lemma 2.1 At least two passes are needed to sort M√M elements when the block size is √M. At least three passes are needed to sort M^2 elements when the block size is √M. These lower bounds hold on the average as well.

Proof. The bounds stated above follow from the lower bound theorem proven in [4]. In particular, it has been shown in [4] that log(N!) ≤ N log B + I × (B log((M − B)/B) + 3B), where I is the number of I/O operations taken by any algorithm that sorts N keys residing in a single disk. Substituting N = M√M and B = √M, we see that I ≥ 2M(1 − 1.45/log M)/(1 + 6/log M). The RHS is very nearly equal to 2M. In other words, to sort M√M keys, at least two passes are needed. It is easy to see that the same is true for the PDM also. In a similar fashion, one could see that at least three passes are needed to sort M^2 elements. □
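The substitution in the proof of Lemma 2.1 can be spelled out as follows. This is a sketch under the bound from [4], using log(N!) ≥ N log(N/e) and log((M − B)/B) ≤ (1/2) log M when B = √M:

```latex
% Rearranging  \log(N!) \le N\log B + I\,(B\log\tfrac{M-B}{B} + 3B):
I \;\ge\; \frac{\log(N!) - N\log B}{B\left(\log\frac{M-B}{B} + 3\right)}
  \;\ge\; \frac{N\left(\log\frac{N}{e} - \log B\right)}
               {\sqrt{M}\left(\frac{1}{2}\log M + 3\right)}
% With N = M\sqrt{M} and B = \sqrt{M}:
%   \log N - \log e - \log B = \log M - \log e \approx \log M - 1.45,
  \;=\; \frac{M^{3/2}\left(\log M - 1.45\right)}
             {\sqrt{M}\left(\frac{1}{2}\log M + 3\right)}
  \;=\; 2M\cdot\frac{1 - \frac{1.45}{\log M}}{1 + \frac{6}{\log M}}\,.
```

Since a single pass over N = M√M keys with B = √M costs N/B = M I/O operations on one disk, I ≈ 2M corresponds to two passes.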
3. A Generalized Zero-One Principle with application to PDM sorting

3.1. A Mesh-Based Algorithm

In this section we describe a very simple algorithm that can sort M^{3/2} elements using M internal memory. Consider the input arranged as an M × √M array. We will often refer to r × c sub-meshes, where such a submesh contains a subset of r consecutive rows and c consecutive columns beginning from a multiple of r rows and c columns, respectively.

Algorithm ThreePass1

1. Sort all the √M × √M sub-meshes. The sorting order is row major, such that every two consecutive submeshes have their rows sorted in reverse directions.

2. Sort all columns vertically.

3. Sort every consecutive pair of √M/2 × √M submeshes by bringing them one after the other (in a top to down ordering) into the internal memory. After sorting, the smallest M/2 elements are written out and the next submesh is brought in, till all sub-meshes are exhausted.

Let us first prove the correctness before we show that the algorithm makes exactly three passes in the worst case.

Theorem 3.1 Algorithm ThreePass1 sorts the M × √M data items correctly for all inputs.

Proof: Our proof is based on the 0-1 principle, so that we only have to analyze inputs consisting of 0's and 1's. Central to this algorithm, as well as others like [26, 18, 19], is the notion of dirty rows/columns/blocks.

Definition (Dirty) A row/column is dirty if it contains a mixture of 0's and 1's. Similarly, a block is dirty if it contains a mixture of 0's and 1's.

After Step 1, every √M × √M sub-mesh has at most 1 dirty row. After Step 2, there can be at most √M dirty rows, which can be further restricted to √M/2 by the principle of Shearsort [27]. This implies at most two dirty √M/2 × √M sub-meshes. Step 3 cleans these up in a manner similar to [18].

Now we proceed to bound the number of passes. Assume that the initial data is striped row wise, in blocks of size √M. Therefore an entire submesh can be read using one parallel read. After sorting the submeshes, they are written out in column major order (striped across columns). This also achieves full parallelism, as each submesh contains √M blocks, one from each column. Therefore, in the next phase, sorting the columns can be done by reading one column at a time. While writing these back, we do the reverse of Step 1. Step 3 is clearly one pass through the data, reading a √M × √M element sub-mesh at a time. □

3.2. Average Case Analysis

We now modify the algorithm by eliminating the first step and reanalyze it. This version of the algorithm, called ExpThreePass1, takes 2 passes for a large fraction of the input permutations. More precisely,

Theorem 3.2 Algorithm ExpThreePass1 sorts M√(M/(cα log_e M)) data items correctly in expected 2 passes, for some constant c.

We prove and use a generalization of the 0-1 principle for this purpose. The traditional zero-one principle for sorting networks states that "if a network with n input lines sorts all 2^n binary sequences into nondecreasing order, then it will sort any arbitrary sequence of n numbers into nondecreasing order." We prove a generalization of the 0-1 principle to sorting networks that sort almost all possible binary sequences. This generalization will be useful in the average case analysis of sorting algorithms as well as in the analysis of randomized sorting algorithms. It may be noted that the 0-1 principle extends to oblivious sorting algorithms [16] and, although our results are stated in the context of sorting networks, they are applicable to oblivious sorting algorithms also.

Theorem 3.3 (Generalized 0-1 principle) Let S_k denote the set of length n binary strings with exactly k 0's, 0 ≤ k ≤ n. Then, if a sorting circuit with n input lines sorts at least an α fraction of S_k for all k, 0 ≤ k ≤ n, then the circuit sorts at least a (1 − (1 − α)·(n + 1)) fraction of the input permutations of n arbitrary numbers.

Note that the theorem gives non-trivial bounds only when α > 1 − 1/(n + 1). A complete proof is given in the appendix.

The proof of Theorem 3.2 is based on the following simple observation: if we throw about n^{3/2} balls into n bins uniformly at random, then the difference between the maximum and minimum order statistics is O(√(cα n log n)) with probability exceeding 1 − n^{−α}, for some constant c (this follows from Chernoff bounds). This is the size of the dirty band after the column sorting step (either the number of zeros or the number of 1's should exceed M^{3/2}/2), and hence it can be cleaned in one more phase.
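The correctness argument above leans on the Shearsort principle [27]. As a quick illustration, here is a minimal in-memory textbook Shearsort on a small n × n mesh (a sketch of the classical mesh algorithm, not the out-of-core ThreePass1 itself): alternating snake-order row sorts and column sorts converge in ⌈log2 n⌉ + 1 phases, because the number of dirty rows halves every phase.

```python
import math
import random

def shearsort(mesh):
    """Sort an n x n mesh into snake-like row-major order.

    One phase = sort rows in alternating directions, then sort columns
    top-down.  ceil(log2 n) + 1 phases suffice: by the 0-1 argument,
    the dirty band of rows halves in every phase.
    """
    n = len(mesh)
    for _ in range(math.ceil(math.log2(n)) + 1):
        for i, row in enumerate(mesh):
            row.sort(reverse=(i % 2 == 1))     # even rows ->, odd rows <-
        for j in range(n):                     # sort each column
            col = sorted(mesh[i][j] for i in range(n))
            for i in range(n):
                mesh[i][j] = col[i]
    return mesh

def snake(mesh):
    """Read the mesh in snake order (left-to-right, then right-to-left)."""
    out = []
    for i, row in enumerate(mesh):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out

random.seed(7)
vals = random.sample(range(1000), 64)
mesh = [vals[i * 8:(i + 1) * 8] for i in range(8)]
assert snake(shearsort(mesh)) == sorted(vals)
```

The same dirty-row-halving argument that bounds Shearsort's phase count is what limits the dirty band to √M/2 rows after Step 2 of ThreePass1.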
LMM based algorithms and a shuffling a column major (striped across columns). This also achieves lemma full parallelism as each submesh contains √M blocks - one from each column. Therefore in the next phase sort- In this section we present another three-pass algorithm ing columns can be done by reading one column at a time. for sorting on the PDM based on the (l, m)-merge sort While writing these back we do the reverse of Step 1. Step 3 (LMM sort) algorithm of Rajasekaran [23]. The LMM sort partitions the input sequence of N keys into l subsequences, 4.1. A Shuffling Lemma sorts them recursively and merges the l sorted subsequences using the (l, m)-. In this section we prove a lemma (we call the shuf- The (l, m)-merge algorithm takes as input l sorted se- fling lemma) that will be useful in the analysis of expected quences X1, X2, . . . , Xl and merges them as follows. performance of sorting algorithms. Though the generalized Unshuffle each input sequence into m parts. In particu- zero-one principle can be employed in its place, this lemma 1 2 m yields better constants than the generalized zero-one princi- lar, Xi (1 i l) gets unshuffled into Xi , Xi , . . . , Xi . ≤ ≤ 1 1 1 ple. Recursively merge X1 , X2 , . . . , Xl to get L1; Recur- 2 2 2 Consider a set X = 1, 2, . . . , n . Let X , X , . . . , X sively merge X1 , X2 , . . . , Xl to get L2; ; Recur- 1 2 m m m m · · · be a random partition {of X into }equal sized parts. Let sively merge X1 , X2 , . . . , Xl to get Lm. Now shuf- 1 2 q 1 2 q fle L1, L2, . . . , Lm. At this point, it can be shown that each X1 = x1, x1, . . . , x1; X2 = x2, x2, . . . , x2; ; Xm = 1 2 q · · · key is at a distance of lm from its final sorted posi- xm, xm, . . . , xm in sorted order. Here mq = n. tion. Perform local sorting≤ to move each key to its sorted We define the rank of any element y in a sequence of position. keys Y as z Y : z < y + 1. 
Let r be any element of |{ ∈ }| k Columnsort algorithm [15], odd-even merge sort [6], and X and let Xi be the part in which r is found. If r = xi (i.e., the s2-way merge sort algorithms are all special cases of the rank of r in Xi is k) what can we say about the value of LMM sort [23]. k? For the case of B = √M, and N = M√M, LMM sort Probability that r has a rank of k in Xi is given by can be specialized as follows to run in three passes. r 1 n r k−1 q −k P = − − . Algorithm ThreePass2 n1  q−1 −  1. Form √ runs each of length . These runs b l = M M Using the fact that a ae , we get have to be merged using (l, m)-merge. The steps in- b ≤ b   volved are listed next. Let X1, X2, . . . , X be the k 1 q k √M r 1 − n r − sequences to be merged. k−1 q −k − − P   q 1  2. Unshuffle each into √ parts so that each part is ≤ n 1 − Xi M q−1 of length √M. This unshuffling can be combined with  −  the initial runs formation task and hence can be com- Ignoring the 1’s and using the fact that (1 u)1/u pleted in one pass. (1/e), we arrive−at: − ≤

3. In this step, we have √M merges to do, where each k rq/n (q k)[r/n k/q] merge involves √M sequences of length √M each. P e− − − . Observe that there are only M records in each merge ≤  k 

and hence all the mergings can be done in one pass rq through the data. When k = n + (α + 2)q loge n + 1 (for any fixed α), α 2 rq we get, P n− −p/e. Thus the probability that k n + ≤ α 1 ≥ 4. This step involves shuffling and local sorting. The (α + 2)q log n + 1 is n− − /e. 2 e ≤ length of the dirty sequence is (√M) = M. Shuf- p In a similar fashion, we can show that the probabil- fling and local sorting can be combined and finished in rq α 1 ity that k (α + 2)q log n + 1 is n− − /e. ≤ n − e ≤ one pass through the data as showm in [23]. This can be shownpby proving that the number of ele- ments in Xi that are greater than r cannot be higher than We get the following: (n r)q −n + (α + 2)q loge n + 1 with the same probability. Lemma 4.1 LMM sort sorts M√M keys in three passes Thus, thep probability that k is not in the interval through the data when the block size is √M. rq rq (α + 2)q log n + 1, + (α + 2)q log n + 1 n e n e h − p p i α 1 is n− − . ≤ Observation 4.1 Chaudhry and Cormen [7] have shown As a consequence, probability that for any r the corre- α that Leighton’s columnsort algorithm [15] can be adapted sponding k will not be in the above interval is n− . 1.5 ≤ for the PDM to sort M /2 keys in three passes. In con- Now consider shuffling the sequences X1, X2, . . . , Xm trast, the three passpalgorithm of Lemma 4.1 (based on to get the sequence Z. The position of r in Z will be (k LMM sort) sorts M 1.5 keys in three passes. 1)m + i. Thus the position of r in Z will be in the interval:− etc. Finally merge Z2 and Z3; merge Z4 and Z5; and so on. n n n r (α + 2) log n + 1 , r + (α + 2) log n + 1  − √q e − q √q e  Shuffling and the two steps of local sorting can be p p combined and finished in one pass through the data. We get the following Lemma: The idea is to have two successive Zi’s (call these Zi and ) at any time in the main memory. We can sort Lemma 4.2 Let X be a set of n arbitrary keys. 
Partiton Zi+1 X into m = n equal sized parts randomly (or equiva- Zi and Zi+1 and merge them. After this Zi is ready to q be shipped to the disks. will then be brought in, lently if X is a random permutation of n keys, the first Zi+2 sorted, and merged with . At this point will part is the first q elements of X, the second part is the Zi+1 Zi+1 next q elements of X, and so on). Sort each part. Let be shipped out; and so on. X1, X2, . . . , Xm be the sorted parts. Shuffle the Xi’s to get It is easy to check if the output is correct or not the sequence Z. At this time, each key in Z will be at most (by keeping track of the largest key shipped out in the n (α + 2) log n + 1 + n n (α + 2) log n + 2 previous I/O). As soon as a problem is detected (i.e., √q e q ≤ √q e positionsp away from its final sorted position.p  when the smallest key currently being shipped out is smaller than the largest key shipped out in the previ- Observation 4.2 Let Z be a sequence of n keys in which ous I/O), the algorithm is aborted and the algorithm of every key is at a distance of at most d from its sorted posi- Lemma 4.1 is used to sort the keys (in an additional tion. Then one way of sorting Z is as follows: Partition Z three passes). into subsequences Z1, Z2, . . . , Zn/d where Zi = d, 1 i n/d. Sort each Z (1 i n/d). Merg|e Z| with Z≤, ≤ i ≤ ≤ 1 2 Theorem 5.1 The expected number of passes made by Al- merge Z3 with Z4, , merge Zn/d 1 with Zn/d (assuming − gorithm ExpectedTwoPass is very nearly two. The num- that n/d is even; the· ·case· of n/d being odd is handled sim- ber of keys sorted is M M . ilarly); Followed by this merge Z2 with Z3, merge Z4 with (α+2) log M+2 q e Z5, , and merge Zn/d 2 with Zn/d 1. Now it can be seen · · · − − that Z is in sorted order. Proof. Using Lemma 4.2, every key in Z is at a distance of √ Observation 4.3 The above discussion suggests a way of at most N1 M (α + 2) loge M + 2 from its sorted po- sition with≤ probability α. 
We want this distance sorting n given keys. Assuming that the input permutation p (1 M − ≥ − M is random, one could employ Lemma 4.2 to analyze the ex- to be M. This happens when N1 . ≤ ≤ (α+2) loge M+2 pected performance of the algorithm. In fact, the above al- q For this choice of N1, the expected number of passes gorithm is very similar to the LMM sort [23]. α α made by ExpectedTwoPass is 2(1 M − ) + 5M − which is very nearly 2.  − 5. An Expected Two-Pass Algorithm As an example, when M = 108 and α = 2, the expected number of passes is 2+3 10 16. Only on at most 10 14 % In this section we present an algorithm that sorts nearly − − of all possible inputs, ExpectedT× woPass will take more √ keys when the block size is √ in an expected M M M than two passes. Thus this algorithm is of practical impor- two passes. The expectation is over the space of all possi- tance. Please also note that we match the lower bound of ble inputs. In particular, this algorithm takes two passes for Lemma 2.1 closely. a large fraction of all possible inputs. Specifically, this algo- rithm sorts M√M keys, where is a constant to be N = c√log M c fixed in the analysis. This algorithm is similar to the one in Observation 5.1 The columnsort algorithm [15] has eight Section 4.1. Let N1 = N/M. steps. Steps 1, 3, 5, and 7 involve sorting the columns. In Algorithm ExpectedTwoPass steps 2, 4, 6, and 8 some well-defined permutations are ap- plied on the keys. Chaudhry and Cormen [7] show how to 1. Form N1 runs each of length M. Let these runs be combine the steps appropriately, so that only three passes L1, L2, . . . , LN1 . This takes one pass. are needed to sort M M/2 keys on a PDM (with B = 2. In the second pass shuffle these N runs to get the se- 1 Θ(M 1/3)). Here we pointp out that this variant of column- quence Z (of length N). Perform local sorting as de- sort can be modified to run in an expected two passes. The picted in Section 4.1. 
Here are the details: Call the se- idea is to skip steps 1 and 2. Using Lemma 4.2, one can quence of the first M elements of Z as Z1; the next M show that modified columnsort sorts M M elements as Z2; and so on. In other words, Z is parti- 4(α+2) loge M+2 keys in an expected two passes. Contrastqthis number with tioned into Z1, Z2, . . . , ZN1 . Sort each one of the Zi’s. Followed by this merge Z1 and Z2; merge Z3 and Z4; the one given in Theorem 5.1. 6. Increasing the data size a column; and sort the columns of the matrix. At the end of step 3, there could be at most s dirty rows. With the ab- In this section we show how to extend the ideas of sence of the new step, the value of s will be constrained the previous section to increase the number of keys to be by s r/2. At the end of the new step, the number of ≤ sorted. First, we focus on an expected three pass algorithm. dirty rowspis shown to be at most 2√s. This is in turn be- Let N be the total number of keys to be sorted and let cause of the fact there could be at most 2√s dirty blocks. N2 = N (α + 2) loge M + 2/(M√M). The reason for this is that the boundary between the zeros p Algorithm ExpectedThreePass and ones in the matrix is monotonous (see Figure 5 in [7]). The monotonicity is ensured by steps 1 through 3 of column- 1. Using ExpectedTwoPass, form runs of length sort. With the new step in place, the constraint on s is given M by r 4s3/2 and hence a total of M 5/3/42/3 keys can be M (α+2) log M+2 each. This will take an ex- q e sorted.≥ If one attempts to convert subblock columnsort into pected two passes. Now we have N2 runs to be a probabilistic algorithm by skipping steps 1 and 2 (as was merged. Let these runs be L1, L2, . . . , LN . 2 done in Observation 5.1), it won’t work since the mono- 2. This step is similar to Step 2 of ExpectedTwoPass. tonicity is not guaranteed. 
So, converting subblock column- In this step we shuffle the N2 runs formed in Step 1 to sort into an expected three pass algorithm (that sorts close get the sequence Z (of length N). Perform local sort- to M 5/3 keys) is not feasible. In other words, the new step ing as depicted in ExpectedTwoPass. of forming subblocks (and the associated permutation and Shuffling and the two steps of local sorting can be sorting) does not seem to help in expectation. On the other combined and finished in one pass through the data (as M 1.75 hand, ExpectedThreePass sorts Ω log M keys in three described in ExpectedTwoPass).   It is easy to check if the output is correct or not (by passes with high probability. keeping track of the largest key shipped out in the pre- 2 vious I/O). As soon as a problem is detected (i.e., when 6.1. Sorting M elements the smallest key currently being shipped out is smaller than the largest key shipped out in the previous I/O), In this section we show how to adapt LMM sort to sort the algorithm is aborted and another algorithm is used M 2 keys on a PDM with B = √M. This adaptation runs to sort the keys. One choice for this alternate algorithm in seven passes. Let N = M 2 be the total number of keys is the seven pass algorithm presented in the next sec- to be sorted. tion. Algorithm SevenPass Theorem 6.1 The expected number of passes made by Al- 1. Use LMM sort (c.f. Lemma 4.1) to form runs of length gorithm ExpectedThreePass is very nearly three. The M√M each. Now there are √M runs that have to be M 1.75 number of keys sorted is 3/4 . [(α+2) loge M+2] merged. Let these runs be L1, L2, . . . , L√M . Proof. Here again we make use of Lemma 4.2. Use (l, m)-merge to merge these runs, with l = m = √M. The tasks involved are listed below. In this case q = M M . In the sequence (α+2) loge M+2 √ √ q 3/4 2. Unshuffle each Li (1 i M) into M subse- Z, each key will be at a distance of at most N2M [(α + 1 2 √≤M ≤ 1/4 quences Li , Li , . . . , Li . 
2) loge M + 2] from its sorted position (with probabil- ity α ). We want this distance to be . This 3. Merge L1, L1, L1, . . . , L1 ; Let Q be the re- (1 M − ) M 1 2 3 √M 1 ≥ − M 1/4 ≤ 2 2 2 happens when N2 1/4 . sultant sequence; Merge L1, L2, . . . , L√ ; Let ≤ [(α+2) loge M+2] M For this choice of N2, the expected number of passes Q2 be the resultant sequence; ; and merge α α √M √M √M · · · made by ExpectedTwoPass is 3(1 M − ) + 7M − L , L , . . . , L ; Let Q be the resultant se- 1 2 √M √M which is very nearly 3.  − quence. Note that each Qi(1 i √M) is of length M√M. ≤ ≤ Observation 6.1 Chaudhry and Cormen [7] have recently 4. Shuffle Q1, Q2, . . . , Q√M . Let Z be the shuffled se- developed a sophisticated variant of columnsort called sub- quence. block columnsort that can sort M 5/3/42/3 keys in four 5. It can be shown that the length of the dirty sequence in passes (when B = Θ(M 1/3)). This algorithm has been in- Z is at most M. Clean up the dirty sequence as illus- spired by the Revsort of Schnorr and Shamir [26]. Subblock trated in ExpectedTwoPass. columnsort introduces the following step between steps 3 and 4 of columnsort: Partition the r s matrix into sub- Theorem 6.2 Algorithm SevenPass runs in seven passes blocks of size √s √s each; Convert×each subblock into and sorts M 2 keys. × Proof: In accordance with Lemma 4.1, step 1 takes three etc. are such that the key size is no more than 32 bits. The passes. Step 2 can be combined with step 1. Instead of writ- same is true for personal data kept by governments. For ex- ing M keys of a run directly into the disks, do the un- ample, if the key is social security number, then 32 bits are shuffling and write the unshuffled runs. In step 3 there are enough. However, one of the algorithms given in this sec- √M subproblems each one being that of merging √M se- tion applies for keys from an arbitrary range as long as each quences of length M each. These mergings can be done in key fits in one word of the computer. 
The bundle-sorting al- three passes (c.f. Lemma 4.1). Finally, steps 4 and 5 together gorithm of Matias, Segal and Vitter [20] can be applied to need only one pass (c.f. ExpectedTwoPass).  this scenario but it is efficient only for a single disk model. The first case we consider is one where the keys are inte- 6.2. An Expected Six Pass Algorithm gers in the range [1, M/B]. Also assume that each key has a random value in this interval. If the internal memory of a In this section we show how to adapt SevenPass to computer is M, then it is reasonable to assume that the word get an algorithm that sorts nearly M 2 elements in an ex- size of the computer is Θ(log M). Thus each key of inter- pected six passes. Call the new algorithm ExpectedSix- est fits in one word of the computer. M and B are used to Pass. This algorithm is the same as SevenPass except denote the internal memory size and the block size, respec- that in step 1, we use ExpectedTwoPass to form runs of tively, in words. M The idea can be described as follows. We build M/B length M (α+2) log M+2 each. This will take an expected q e runs one for each possible value that the the keys can take. two passes. There are √M such runs. The rest of the steps From every I/O read operation, M keys are brought into j are the same. Of course now the lengths of Li ’s and Qi’s the core memory. From out of all the keys in the memory, will be less. We get the following: blocks are formed. These blocks are written to the disks in Theorem 6.3 ExpectedSixPass runs in an expected six a striped manner. The striping method suggested in [23] is 2 used. Some of the blocks could be nonfull. All the blocks passes and sorts M keys. √(α+2) loge M+2 in the memory are written to the disks using as few paral- Remark We have designed matching mesh-based algo- lel write steps as possible. We assume that M = CDB for rithms. At this point it is not clear if they offer any ada- some constant C. Let R = M/B. More details follow. vantage. 
We will provide details in the full version. Algorithm IntegerSort for i := 1 to N/M do 7. Optimal Integer Sorting 1. In C parallel read operations, bring into the Often, the keys to be sorted are integers in some core memory M = CDB keys. range [1, R]. Numerous sequential and parallel algo- 2. Sort the keys in the internal memory rithms have been devised for sorting integers. Several ef- and form blocks according to the val- ficient out-of-core algorithms have been devised by Arge, ues of keys. Keep a bucket for each possible Ferragina, Grossi, and Vitter [3] for sorting strings. For in- value in the range [1, R]. Let the buck- stance, three of their algorithms have the I/O bounds ets be , , . . . , . If there are N keys B1 B2 BR i of O N log N , O N log N + N , and in i, then, Ni/B blocks will be formed B M/B B F B M/B F B B d e     out of i (for 1 i R). K K N B ≤ ≤ O B logM/B B + B logM/B Σ , respectively. These  | | 3. Send all the blocks to the disks using as few algorithms sort K strings with a total of N charac- parallel write steps as possible. The runs are ters from the alphabet Σ. Here F is a positive integer striped across the disks (in the same man- F F such that F Σ M and Σ N. These algo- ner as in [23]). The number of write steps rithms could|be|emplo≤ yed on the| PDM| ≤to sort integers. For needed is maxi Ni/B . a suitable choice of F , the second algorithm (for exam- {d e} A. Read the keys written to the disks and write ple) is asymptotically optimal. them back so that the keys are placed contigu- In this section we analyze radix sort (see e.g., [13]) in ously across the disks. the context of PDM sorting. This algorithm sorts an arbi- trary number of keys. We assume that each key fits in one Theorem 7.1 Algorithm IntegerSort runs in O(1) passes α word of the computer. 
Theorem 7.1 Algorithm IntegerSort runs in O(1) passes through the data for a large fraction (≥ (1 − N^{−α}) for any fixed α ≥ 1) of all possible inputs, assuming that B = Ω(log N). If step A is not needed, the number of passes is (1 + µ) and if step A is included, then the number of passes is 2(1 + µ) for some fixed µ < 1.

Proof: Call each run of the for loop a phase of the algorithm. The expected number of keys in any bucket is CB. Using Chernoff bounds, the number of keys in any bucket is in the interval [(1 − ε)CB, (1 + ε)CB] with probability ≥ [1 − 2 exp(−ε²CB/3)]. Thus, the number of keys in every bucket is in this interval with probability ≥ 1 − exp(−ε²CB/3 + ln(2R)). This probability will be ≥ (1 − N^{−(α+1)}) as long as B ≥ (3/(Cε²))(α + 1) ln N. This is readily satisfied in practice (since the typical assumption on B is that it is Ω(M^δ) for some fixed δ > 1/3).

As a result, each phase will take at most ⌈(1 + ε)C⌉ write steps with high probability. This is equivalent to ⌈(1 + ε)C⌉/C passes through the data. This number of passes is 1 + µ for some constant µ < 1.

Thus, with probability ≥ (1 − N^{−α}), IntegerSort takes (1 + µ) passes excluding step A and 2(1 + µ) passes including step A. □

As an example, if ε = 1/C, the value of µ is 1/C.

Observation 7.1 The sorting algorithms of [29] have been analyzed using asymptotic analysis. The bounds derived hold only in the limit. In comparison, our analysis is simpler and applies for any N.

We extend the range of the keys using the following algorithm. This algorithm employs forward radix sorting. In each stage of sorting, the keys are sorted with respect to some number of their MSBs. Keys that have the same value with respect to all the bits that have been processed up to some stage are said to form a bucket in that stage. In the following algorithm, δ is any constant < 1.

Algorithm RadixSort
for i := 1 to (1 + δ) log(N/M)/log(M/B) do
  1. Employ IntegerSort to sort the keys with respect to their ith most significant log(M/B) bits.
A. Now the size of each bucket is ≤ M. Read and sort the buckets.

Theorem 7.2 N random integers in the range [1, R] (for any R) can be sorted in an expected (1 + ν) log(N/M)/log(M/B) + 1 passes through the data, where ν is a constant < 1, provided that B = Ω(log N). In fact, this bound holds for a large fraction (≥ 1 − N^{−α} for any fixed α ≥ 1) of all possible inputs.

Proof. In accordance with Theorem 7.1, each run of step 1 takes (1 + µ) passes. Thus RadixSort takes (1 + µ)(1 + δ) log(N/M)/log(M/B) passes. This number is (1 + ν) log(N/M)/log(M/B) for some fixed ν < 1.

It remains to show that after (1 + δ) log(N/M)/log(M/B) runs of step 1, the size of each bucket will be ≤ M. At the end of the first run of step 1, the size of each bucket is expected to be NB/M. Using Chernoff bounds, this size is ≤ (1 + ε)NB/M with high probability, for any fixed ε < 1. After k (for any integer k) runs of step 1, the size of each bucket is ≤ N(1 + ε)^k (B/M)^k with high probability. This size will be ≤ M for k ≥ log(N/M)/log[M/((1 + ε)B)]. The RHS is ≤ (1 + δ) log(N/M)/log(M/B) for any fixed δ < 1.

Step A takes nearly one pass. □

We believe that for applications of practical interest radix sort can be employed to run in no more than 4 passes for most of the inputs. The range of interest in practice seems to be [1, M^c] for some constant c (consider, for example, weather data or market data).

Observation 7.2 As an example, consider the case N = M², B = √M, and C = 4. In this case, RadixSort takes no more than 3.6 passes through the data.

8. Conclusions

In this paper we have presented several novel algorithms for sorting on the PDM. We believe that these algorithms will perform well in practice.

For sorting M√M keys, both ThreePass1 and ThreePass2 seem to have similar performance. They can sort slightly more keys than that of [7]. Both ThreePass1 and ThreePass2 use a block size of √M and the algorithm of [7] uses a block size of M^{1/3}. Lemma 2.1 yields a lower bound of 1.75 passes when B = M^{1/3} and 2 passes when B = √M (when N = M√M).

For sorting more than M√M keys, LMM sort seems to be the best. For instance, SevenPass sorts M² keys whereas our mesh-based algorithm sorts M²/4 keys. We believe that combining mesh-based techniques with those of [23] and [7] will yield even better results.

References

[1] A. Aggarwal and J. S. Vitter, The Input/Output Complexity of Sorting and Related Problems, Communications of the ACM 31(9), 1988, pp. 1116-1127.
[2] L. Arge, The Buffer Tree: A New Technique for Optimal I/O-Algorithms, Proc. 4th International Workshop on Algorithms and Data Structures (WADS), 1995, pp. 334-345.
[3] L. Arge, P. Ferragina, R. Grossi, and J. S. Vitter, On Sorting Strings in External Memory, Proc. ACM Symposium on Theory of Computing, 1995.
[4] L. Arge, M. Knudsen, and K. Larsen, A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms, Proc. Third Workshop on Algorithms and Data Structures (WADS), 1993.
[5] R. Barve, E. F. Grove, and J. S. Vitter, Simple Randomized Mergesort on Parallel Disks, Parallel Computing 23(4-5), 1997, pp. 601-631.
[6] K. Batcher, Sorting Networks and their Applications, Proc. AFIPS Spring Joint Computing Conference 32, 1968, pp. 307-314.
[7] G. Chaudhry and T. H. Cormen, Getting More From Out-of-Core Columnsort, Proc. 4th Workshop on Algorithm Engineering and Experiments (ALENEX), 2002, pp. 143-154.
[8] G. Chaudhry, T. H. Cormen, and E. A. Hamon, Parallel Out-of-Core Sorting: The Third Way, to appear in Cluster Computing.
[9] G. Chaudhry, T. H. Cormen, and L. F. Wisniewski, Columnsort Lives! An Efficient Out-of-Core Sorting Program, Proc. 13th Annual ACM Symposium on Parallel Algorithms and Architectures, 2001, pp. 169-178.
[10] B. S. Chlebus, Mesh Sorting and Selection Optimal on the Average, Computing and Informatics (Special Issue on Parallel and Distributed Computing) 16(2), 1997.
[11] R. Dementiev and P. Sanders, Asynchronous Parallel Disk Sorting, Proc. ACM Symposium on Parallel Algorithms and Architectures, 2003, pp. 138-148.
[12] Q. P. Gu and J. Gu, Algorithms and Average Time Bounds for Sorting on a Mesh-Connected Computer, IEEE Transactions on Parallel and Distributed Systems 5(3), 1994, pp. 308-315.
[13] E. Horowitz, S. Sahni, and S. Rajasekaran, Computer Algorithms, W. H. Freeman Press, 1998.
[14] D. E. Knuth, The Art of Computer Programming, Vol. 3, Sorting and Searching, Addison-Wesley, Reading, MA, 1973.
[15] T. Leighton, Tight Bounds on the Complexity of Parallel Sorting, IEEE Transactions on Computers C-34(4), 1985, pp. 344-354.
[16] T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan-Kaufmann, San Mateo, CA, 1992.
[17] M. D. Rice, Continuous Algorithms, Topology and its Applications 85, 1998, pp. 299-318.
[18] Y. Ma, S. Sen, and I. D. Scherson, The Distance Bound for Sorting on Mesh-Connected Processor Arrays is Tight, Proc. 27th Symposium on Foundations of Computer Science, 1986, pp. 255-263.
[19] J. M. Marberg and E. Gafni, Sorting in Constant Number of Row and Column Phases on a Mesh, Algorithmica 3(4), 1988, pp. 561-572.
[20] Y. Matias, E. Segal, and J. S. Vitter, Efficient Bundle Sorting, Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), 2000, pp. 839-848.
[21] M. H. Nodine and J. S. Vitter, Greed Sort: Optimal Deterministic Sorting on Parallel Disks, Journal of the ACM 42(4), 1995, pp. 919-933.
[22] S. Rajasekaran, Sorting and Selection on Interconnection Networks, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 21, 1995, pp. 275-296.
[23] S. Rajasekaran, A Framework for Simple Sorting Algorithms on Parallel Disk Systems, Theory of Computing Systems 34(2), 2001, pp. 101-114.
[24] S. Rajasekaran and J. H. Reif, Derivation of Randomized Sorting and Selection Algorithms, in Parallel Algorithm Derivation and Program Transformation, Edited by R. Paige, J. H. Reif, and R. Wachter, Kluwer Academic Publishers, 1993, pp. 187-205.
[25] S. Rajasekaran and S. Sen, A Generalization of the Zero-One Principle for Sorting, submitted for publication, 2004.
[26] C. P. Schnorr and A. Shamir, An Optimal Sorting Algorithm for Mesh Connected Computers, Proc. 18th Annual ACM Symposium on Theory of Computing, 1986, pp. 255-263.
[27] I. D. Scherson, S. Sen, and A. Shamir, Shear Sort: A True Two-Dimensional Sorting Technique for VLSI Networks, Proc. International Conference on Parallel Processing, 1986, pp. 903-908.
[28] C. D. Thompson and H. T. Kung, Sorting on a Mesh Connected Parallel Computer, Communications of the ACM 20(4), 1977, pp. 263-271.
[29] J. S. Vitter and D. A. Hutchinson, Distribution Sort with Randomized Cycling, Proc. 12th Annual SIAM/ACM Symposium on Discrete Algorithms, 2001.
[30] J. S. Vitter and E. A. M. Shriver, Algorithms for Parallel Memory I: Two-Level Memories, Algorithmica 12(2-3), 1994, pp. 110-147.

A. Proof of the generalized 0-1 principle

Definition: A string s ∈ {0, 1}^n is a k-string if it has exactly k 0's, 0 ≤ k ≤ n. The set of all k-strings for a fixed k is called a k-set and will be denoted by S_k.

Observation A.1 The S_k's are pairwise disjoint and their union consists of all possible 2^n strings over {0, 1}.

Let A^(n), B^(n) be two totally ordered multisets of n elements each. A bijection f : A^(n) → B^(n) is monotone if for all x, y ∈ A^(n) with x ≤ y, f(x) ≤ f(y).

Observation A.2 If f : A^(n) → B^(n) is monotone then the inverse of f is also monotone.

The correctness of the standard zero-one principle is based on the following elegant result (see Knuth [14] for a proof).

Theorem A.1 For any sorting circuit C and a monotone function f, f(C(a)) = C(f(a)).

Given I_n = {1, 2, . . . , n}, the only monotone function between I_n and strings in S_k is given by f_k(j) = 0 for j ≤ k and 1 otherwise. The extension of f_k to a sequence a = (a_1, a_2, . . .) is given by (f_k(a_1), f_k(a_2), . . .). From our previous observation f_k^{−1} is also monotone. From the previous theorem it follows that

Lemma A.1 If a sorting circuit does not sort some a ∈ S_k then it does not sort (the permutations corresponding to) f_k^{−1}(a). Conversely, if the circuit correctly sorts f_k(σ) for all k, for an input permutation σ, then it correctly sorts σ.

Consider the mapping between permutations of I_n and strings a ∈ S_k such that a_i = f_k(Π(i)) for a permutation Π. This can be represented as a bipartite graph G_k with the n! permutations in one set and S_k on the other (for each k). Note that a single permutation can map to exactly one string in S_k. Using simple symmetry arguments, it follows that

Lemma A.2 Each set in the bipartite graph G_k has vertices with equal degree. In particular, the vertices representing S_k have degrees equal to n!/|S_k|.

If the circuit does not sort some permutation Π of I_n then it does not sort one (or more) strings in {0, 1}^n, say a ∈ S_k for some k. Conversely, if the circuit does not sort some string a ∈ S_k then from Lemma A.1 it does not sort any of the permutations in the inverse map. Consider the graph G_k where we mark all the nodes corresponding to the permutations that are not sorted correctly and likewise we mark the nodes of S_k that are not sorted correctly. For a fixed k, if the circuit does not sort β|S_k| (β < 1) strings, it does not sort β|S_k| · (n!/|S_k|) permutations. Therefore the total fraction of permutations (over all values of k) that may not get sorted correctly is at most β · (n + 1). Setting α = 1 − β completes the proof of Theorem 3.3. □
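For small n, the equivalence underlying this argument can be checked by brute force. The following Python sketch (our own illustration, with all names ours) takes a comparator network and verifies that it sorts every permutation of I_n exactly when it sorts every 0-1 string, both for a correct network (odd-even transposition sort) and for a deliberately truncated one.

```python
from itertools import permutations, product

def apply_network(network, seq):
    """Apply a comparator network (a list of wire-index pairs) to seq."""
    s = list(seq)
    for i, j in network:
        if s[i] > s[j]:
            s[i], s[j] = s[j], s[i]
    return s

def sorts_all_permutations(network, n):
    return all(apply_network(network, p) == sorted(p)
               for p in permutations(range(1, n + 1)))

def sorts_all_01_strings(network, n):
    return all(apply_network(network, s) == sorted(s)
               for s in product((0, 1), repeat=n))

n = 5
# Odd-even transposition sort on n wires: a correct sorting network.
good = [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]
# Dropping the final round breaks it.
bad = good[:-2]

# The two notions of correctness coincide, as the 0-1 principle predicts.
assert sorts_all_permutations(good, n) and sorts_all_01_strings(good, n)
assert not sorts_all_permutations(bad, n) and not sorts_all_01_strings(bad, n)
```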

Corollary: If for some k, the sorting circuit does not sort any string in S_k, then it does not work correctly on any permutation.

Remark: Note that the strings in the sets S_p, 0.49n ≤ p ≤ 0.51n, form an overwhelming fraction of all length-n binary strings. One can design many sorting algorithms that work correctly on such S_p but which sort only a negligible fraction of S_{log n} (for example). The above corollary rules out strengthening the main result in the following obvious way:

If a circuit sorts most binary strings then it sorts most arbitrary inputs.
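The "overwhelming fraction" claim in the remark above is easy to quantify exactly. The short Python sketch below (our own, with an arbitrary choice of n values) computes the fraction of length-n binary strings whose number of 0's lies between 0.49n and 0.51n:

```python
from math import comb

def mid_weight_fraction(n):
    """Exact fraction of length-n binary strings lying in some S_p
    with 0.49n <= p <= 0.51n (p = number of 0's)."""
    lo, hi = (49 * n) // 100, (51 * n) // 100
    return sum(comb(n, p) for p in range(lo, hi + 1)) / 2 ** n

# The binomial distribution concentrates within O(sqrt(n)) of n/2,
# so this fraction tends to 1 as n grows, while |S_{log n}| / 2^n
# is negligible.
for n in (100, 1000, 10000):
    print(n, mid_weight_fraction(n))
```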

Chlebus [10] claimed an ad hoc version of Lemma A.1 but did not follow it up with any of the latter arguments, thus leaving the connection imprecise and incomplete.