A Taxonomy of Parallel Sorting
Total Page:16
File Type:pdf, Size:1020Kb
A Taxonomy of Parallel Sorting DINA BITTON Department of Applied Mathematics, We~zmannInstitute, Rehovot, Israel DAVID J. DeWITT Computer Science Department, Unwerstty o[ Wisconsin, Madison, Wisconsin 53706 DAVID K. HSIAO AND JAISHANKAR MENON Computer and Information Science Department, The Ohio State University, Columbus, Ohio 43210 We propose a taxonomy of parallel sorting that encompasses a broad range of array- and file-sorting algorithms. We analyze how research on parallel sorting has evolved, from the earliest sorting networks to shared memory algorithms and VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even and the bitonic merge. We discuss sorting algorithms that evolved from these merging schemes for parallel computers, whose processors communicate through interconnection networks such as the perfect shuffle, the mesh, and a number of other sparse networks. Following our discussion of network sorting algorithms, we describe how faster algorithms have been derived from parallel enumeration sorting schemes, where, with a shared memory model of parallel computation, keys are first ranked and then rearranged according to their rank. Parallel sorting algorithms are evaluated according to several criteria related to both the time complexity of an algorithm and its feasibility from the viewpoint of computer architecture. We show that, in addition to attractive communication schemes, network sorting algorithms have nonadaptive schedules that make them suitable for implementation. In particular, they are easily generalized to block sorting algorithms, which utilize limited parallelism to solve large sorting problems. We also address the problem of sorting large mass-storage files in parallel, using modified disk devices or intelligent bubble memory. We conclude by mentioning VLSI sorting as an active and promising direction for research on parallel sorting. Categories and Subject Descriptors: B.3 [Hardware]: Memory Structures; B.4 [Hardware]: Input/Output and Data Communications; B.7.1 [Integrated Circuits]: Types and Design Styles; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems General Terms: Algorithms Additional Key Words and Phrases: Block sorting, bubble memory, external sorting, hardware sorters, internal sorting, limited parallelism, merging, parallel sorting, sorting networks D. Bitton's present address is Department of Computer Science, Cornell University, Ithaca, New York 14853. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1984 ACM 0360-0300/84/0900-0287 $00.75 Computing Surveys, Vol. 16, No. 3, September 1984 288 • D. Bitton, D. J. De Witt, D. K. Hsiao, and J. Menon CONTENTS efficient serial algorithms are known which can sort n values in at most O(n log n) comparisons, the theoretic lower bound for this problem [Knuth 1973]. In addition to INTRODUCTION their time complexity, various other prop- 1 PARALLELIZING SERIAL SORTING erties of these serial internal sorting algo- ALGORITHMS rithms have also been investigated. In 1 1 The Odd-Even Transposition Sort particular, sorting algorithms have been 1 2 A Parallel Tree-Sort Algorithm 2 NETWORK SORTING ALGORITHMS evaluated with respect to time-memory 2 1 Sorting Networks trade-offs (the amount of additional memory 2 2 Sorting on an SIMD Machine required to run the algorithm in addition 2.3 Summary to the memory storing the initial sequence), 3. SHARED MEMORY PARALLEL SORTING ALGORITHMS stability (the requirement that equal ele- 3.1 A Modified Sorting Network ments retain their original relative order), 3 2 Faster Parallel Merging Algorithms and sensitivity to the initial distribution of 3 3 Bucket Sorting the values (in particular, best case and 3 4 Sorting by Enumeration worst case complexity have been investi- 3 5 Summary 4 BLOCK SORTING ALGORITHMS gated). 41 Two-Way Merge-Spht In the last decade, parallel processing has 4 2 Bltonic Merge-Exchange added a new dimension to research on in- 5. EXTERNAL SORTING ALGORITHMS ternal sorting algorithms. Several models 5.1 Parallel Tape Sorting 5.2 Parallel Disk Sorting of parallel computation have been consid- 5.3 Analysis of the Parallel External ered, each with its own idea of "contiguous" Sorting Algorithm memory locations and definition of the way 6 HARDWARE SORTERS multiple processors access memory. In or- 61 The Rebound Sorter der to state the problem of parallel sorting 6 2 The Up-Down Sorter 6 3 Sorting within Bubble Memory clearly, we must first define what is meant 6 4 Summary and Recent Results by a sorted sequence in a parallel processor. 7 CONCLUSIONS AND OPEN PROBLEMS When processors share a common memory, REFERENCES the idea of contiguous memory locations in a parallel processor is identical to that in a A v serial processor. Thus, as in the serial case, the time complexity of a sorting algorithm INTRODUCTION can be expressed in terms of number of comparisons (performed in parallel by all Sorting in computer terminology is defined or some of the processors) and internal as the process of rearranging a sequence of memory moves. On the other hand, when values in ascending or descending order. processors do not share memory and com- Computer programs such as compilers or municate along the lines of an interconnec- editors often'choose to sort tables and lists tion network, definition of the sorting prob- of symbols stored in memory in order to lem requires a convention to order the enhance the speed and simplicity of algo- processors and thus the union of their rithms used to access them (for the search local memory locations. When parallel pro- or insertion of additional elements, for in- cessors are used, the time complexity of a stance). Because of both their practical im- sorting algorithm is expressed in terms of portance and theoretical interest, algo- parallel comparisons and exchanges be- rithms for sorting values stored in random tween processors that are adjacent in the access memory (internal sorting) have been interconnecting network. the focus of extensive research on algo- Shared memory models of parallel com- rithms. First, serial sorting algorithms were putation have been instrumental in inves- investigated. Then, with the advent of par- tigating the intrinsic parallelism that exists allel processing, parallel sorting algorithms in the sorting problem. Whereas the first became a very active area of research. Many results on parallel sorting were related to Computing Surveys, Voi. 16, No. 3, September 1984 A Taxonomy o/Parallel Sorting • 289 sorting networks [Batcher 1968], faster in research on new external sorting parallel sorting algorithms have been pro- schemes. The reasons for the relatively posed for theoretical models of parallel small amount of research on parallel exter- processors with shared memory [Hirsch- nal sorting [Bitton-Friedland 1982; Even berg 1978; Preparata 1978]. A chain of re- 1974] are most likely related to the neces- sults in shared memory computation has sity of adapting such schemes to mass- led to a number of parallel sorting schemes storage device characteristics. that exhibit a O(log n) time complexity. It may seem that advances in computer Typically, the parallel sorting problem is technology, such as the advent of intelli- expressed as that of sorting n numbers with gent or associative memories, could elimi- n or more processors, all sharing a large nate or reduce the use of sorting as a tool common memory, so that they may access for performing other operations. For ex- with various degrees of contention (e.g., ample, when sorting is used in order to parallel reads and parallel writes with ar- facilitate searching, one may advocate that bitration). Research on paralle ! sorting has associative memories will suppress the need been largely concerned with purely theoret- of sorting. However, associative stores re- ical issues, and it is only recently that fea- main too expensive for widespread use, es- sibility issues such as limited parallelism pecially when large volumes of data are or, in the context of very large scale inte- involved. In the case where sorting is re- gration (VLSI) sorting, trade-offs between quired for the sole purpose of ordering data, hardware complexity (expressed in terms the only way to reduce sorting time is to of chip area) and time complexity are being develop fast parallel sorting schemes, pos- addressed. sibly by integrating sorting capability into In addition to using sorting algorithms mass-storage memory [Chen et al. 1978; to rearrange numbers in memory, sorting Chung et al. 1980]. is often advocated in the context of infor- In this paper, we propose a taxonomy of mation processing. In this context, sorting parallel sorting that includes both internal is used to order a file of data records, stored and external parallel sorting algorithms. on a mass-storage device. The records are We analyze how research on parallel sort- ordered with respect to the value of a key, ing has evolved from the earliest sorting which might be a single field or the con- networks to shared memory model algo- catenation of several fields in the record. rithms and VLSI sorters. We attempt to Files are sorted either to deliver well-orga- classify a broad range of parallel sorting nized output to a user (e.g., a telephone algorithms according to various criteria, in- directory), or as an intermediate step in the cluding time efficiency and the architec- execution of a complex database operation tural requirements upon which they de- [Bitton and DeWitt 1983; Selinger et al. pend. The goal of this study is to provide a 1979]. Because of memory limitations file basic understanding and a unified view of sorting cannot be performed in memory the body of research on parallel sorting.