Improved Parallel Integer Sorting Without Concurrent Writing

Susanne Albers†‡ and Torben Hagerup†

A preliminary version of this paper was presented at the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) in Orlando, Florida, in January 1992.

† Max-Planck-Institut für Informatik, Saarbrücken, Germany. Supported in part by the Deutsche Forschungsgemeinschaft (SFB, TP B, VLSI Entwurfsmethoden und Parallelität) and in part by the ESPRIT Basic Research Actions Program of the EU under contracts for the projects ALCOM and ALCOM II.

‡ Part of this work was done while the author was a student at the Graduiertenkolleg Informatik, Universität des Saarlandes, Saarbrücken, Germany, and supported by a graduate fellowship of the Deutsche Forschungsgemeinschaft.

Proposed running head: Improved parallel integer sorting

Contact author: Susanne Albers, Max-Planck-Institut für Informatik, Im Stadtwald, Saarbrücken, Germany

Abstract

We show that $n$ integers in the range $1..n$ can be sorted stably on an EREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n \log\log n} + (\log n)^2/t))$ operations, for arbitrary given $t \ge \log n \log\log n$, and on a CREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n} + \log n/2^{t/\log n}))$ operations, for arbitrary given $t \ge \log n$. In addition, we are able to sort $n$ arbitrary integers on a randomized CREW PRAM within the same resource bounds with high probability. In each case our algorithm is a factor of almost $\Theta(\sqrt{\log n})$ closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best previous sequential algorithm. We also show that $n$ integers in the range $1..m$ can be sorted in $O((\log n)^2)$ time with $O(n)$ operations on an EREW PRAM using a nonstandard word length of $O(\log n \log\log n \log m)$ bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard.
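As a quick consistency check on the EREW bound stated above, the two ends of the tradeoff can be worked out by hand. The sketch below assumes the bound reads $O(n(\sqrt{\log n \log\log n} + (\log n)^2/t))$ operations in $O(t)$ time; this reading is reconstructed from the surrounding text and should be treated as an assumption, not as a quotation.

```latex
% Assumed operation count: n * ( sqrt(log n loglog n) + (log n)^2 / t ).
\begin{align*}
t = \log n \log\log n:\quad
  & \frac{(\log n)^2}{t} = \frac{\log n}{\log\log n}
  && \Rightarrow\ O\!\left(\frac{n \log n}{\log\log n}\right) \text{ operations},\\
t = \frac{(\log n)^{3/2}}{\sqrt{\log\log n}}:\quad
  & \frac{(\log n)^2}{t} = \sqrt{\log n \log\log n}
  && \Rightarrow\ O\!\left(n\sqrt{\log n \log\log n}\right) \text{ operations}.
\end{align*}
```

At the fast end the $(\log n)^2/t$ term dominates. At the slow end the two terms balance, so under this reading a running time beyond $(\log n)^{3/2}/\sqrt{\log\log n}$ no longer reduces the operation count.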
Introduction

A parallel algorithm is judged primarily by its speed and its efficiency. Concerning speed, a widely accepted criterion is that it is desirable for a parallel algorithm to have a running time that is polylogarithmic in the size of the input. The efficiency of a parallel algorithm is evaluated by comparing its time-processor product, i.e., the total number of operations executed, with the running time of an optimal sequential algorithm for the problem under consideration, the parallel algorithm having optimal speedup, or simply being optimal, if the two agree to within a constant factor. A point of view often put forward, and forcefully expressed in Kruskal et al. (b), is that the efficiency of a parallel algorithm, at least given present-day technological constraints, is far more important than its raw speed, the reason being that in all probability the algorithm must be slowed down to meet a smaller number of available processors anyway.

The problem of integer sorting on a PRAM has been studied intensively. While previous research has concentrated on algorithms for the CRCW PRAM (Rajasekaran and Reif; Rajasekaran and Sen; Bhatt et al.; Matias and Vishkin a, b; Raman a, b; Hagerup; Gil et al.; Hagerup and Raman; Bast and Hagerup; Goodrich et al.), in this paper we are interested in getting the most out of the weaker EREW and CREW PRAM models.

Consider the problem of sorting $n$ integers in the range $1..m$. In view of sequential radix sorting, which works in linear time if $m = n^{O(1)}$, a parallel algorithm for this most interesting range of $m$ is optimal only if its time-processor product is $O(n)$. Kruskal et al. (a) showed that for $m \ge n$ and $1 \le p \le n/2$, the problem can be solved using $p$ processors in time $O\left(\frac{n}{p} \cdot \frac{\log m}{\log(n/p)}\right)$, i.e., with a time-processor product of $O\left(\frac{n \log m}{\log(n/p)}\right)$. The space requirements of the algorithm are $\Theta(p n^{\epsilon} + n)$, for arbitrary fixed $\epsilon > 0$, which makes the algorithm impractical for values of $p$ close to $n$. Algorithms that are not afflicted by this problem were described by Cole and Vishkin, remark following
Theorem, and by Wagner and Han. For $n/m \le p \le n/\log n$, they use $O(n)$ space and also sort in $O\left(\frac{n}{p} \cdot \frac{\log m}{\log(n/p)}\right)$ time with $p$ processors. All three algorithms are optimal if $m = (n/p)^{O(1)}$. However, let us now focus on the probably most interesting case of $m = n^{O(1)}$ combined with running times of $(\log n)^{O(1)}$. Since the running time is at least $\Theta(n/p)$, it is easy to see that the time-processor product of the algorithms above in this case is bounded only by $O(n \log m/\log\log n)$, i.e., the integer sorting algorithms are more efficient than algorithms for general (comparison-based) sorting, which can be done in $O(\log n)$ time using $O(n \log n)$ operations (Ajtai et al.; Cole), by a factor of at most $\Theta(\log\log n)$, to be compared with the potential maximum gain of $\Theta(\log n)$. This is true even of a more recent EREW PRAM algorithm by Rajasekaran and Sen that sorts $n$ integers in the range $1..n$ stably in $O(\log n \log\log n)$ time using $O(n \log n/\log\log n)$ operations and $O(n)$ space.

We describe an EREW PRAM algorithm for sorting $n$ integers in the range $1..n$ stably that exhibits a tradeoff between speed and efficiency. The minimum running time is $\Theta(\log n \log\log n)$, and for this running time the algorithm essentially coincides with that of Rajasekaran and Sen. Allowing more time, however, we can sort with fewer operations, down to a minimum of $\Theta(n\sqrt{\log n \log\log n})$, reached for a running time of $\Theta((\log n)^{3/2}/\sqrt{\log\log n})$. In general, for any given $t \ge \log n \log\log n$, the algorithm can sort in $O(t)$ time using $O(n(\sqrt{\log n \log\log n} + (\log n)^2/t))$ operations. Run at the slowest point of its tradeoff curve, the algorithm is more efficient than the algorithms discussed above by a factor of $\Theta(\sqrt{\log n}/(\log\log n)^{3/2})$.

On the CREW PRAM we obtain a much steeper tradeoff: For all $t \ge \log n$, our algorithm sorts $n$ integers in the range $1..n$ in $O(t)$ time using $O(n(\sqrt{\log n} + \log n/2^{t/\log n}))$ operations; the minimum number of operations of $\Theta(n\sqrt{\log n})$ is reached for a running time of $\Theta(\log n \log\log n)$. We also consider the problem of sorting integers of arbitrary size on the CREW PRAM and describe a reduction of this
problem to that of sorting $n$ integers in the range $1..n$. The reduction is randomized and uses $O(\log n)$ time and $O(n\sqrt{\log n})$ operations with high probability. It is based on the fusion trees of Fredman and Willard and was discovered independently by Raman (a).

The algorithms above all abide by the standard convention that the word length available to sort $n$ integers in the range $1..m$ is $\Theta(\log(n + m))$ bits, i.e., that unit-time operations on integers of size $(n + m)^{O(1)}$ are provided, but that all integers manipulated must be of size $(n + m)^{O(1)}$. A number of papers have explored the implications of allowing a larger word length. Paul and Simon and Kirkpatrick and Reisch demonstrated that with no restrictions on the word length at all, arbitrary integers can be sorted in linear sequential time. Hagerup and Shen showed that in fact a word length of about $O(n \log n \log m)$ bits suffices to sort $n$ integers in the range $1..m$ in $O(n)$ sequential time or in $O(\log n)$ time on an EREW PRAM with $O(n/\log n)$ processors. The practical value of such results is doubtful because of the unrealistic assumptions: Hardly any computer has a word length comparable to typical input sizes.

We show that $n$ integers in the range $1..m$ can be sorted with a linear time-processor product in $O((\log n)^2)$ time on an EREW PRAM with a word length of $O(\log n \log\log n \log m)$ bits. At the price of a moderate increase in the running time, this greatly improves the known upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Another, perhaps more illuminating, perspective is obtained by noting that providing a PRAM with a large word length amounts to adding more parallelism of a more restrictive kind. A single register of many bits is akin to a whole SIMD machine, but without the important ability to let individual processors not participate in the current step. As suggested by this view, using a word length that allows $k$ integers to be stored in each word potentially could decrease the running
time by a factor of up to $k$. What our algorithm actually achieves is a reduction in the running time by a factor of $\Theta(k/\log k)$ (curiously, it does so by calling a sequential version of an originally parallel algorithm as a subroutine). This can be interpreted as saying that if one happens to sort integers of which several will fit in one word, then advantage can be taken of this fact. Our result also offers evidence that algorithms using a nonstandard word length should not hastily be discarded as unfeasible and beyond practical relevance.

Following the conference publication of the results reported here, our work was shown to have unexpected consequences in a purely sequential setting. In particular, one of our algorithms is an essential ingredient in an algorithm of Andersson et al. that sorts $n$ $w$-bit integers in $O(n \log\log n)$ time on a $w$-bit RAM, for arbitrary $w \ge \log n$. Other papers building directly or indirectly on techniques and concepts introduced here include Andersson and Thorup.

Preliminaries

A PRAM is a synchronous parallel machine consisting of numbered processors and a global memory accessible to all processors. An EREW PRAM disallows concurrent access to a memory cell by more than one processor, a CREW PRAM allows concurrent reading but not concurrent writing, and a CRCW PRAM allows both concurrent reading and concurrent writing. We assume an instruction set that includes addition, subtraction, comparison, unrestricted shift as well as the bitwise Bo
