In-Place Merging Algorithms
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Analysis and Synthesis of a Parallel Sorting Engine Redacted for Privacy Abstract Approv, John M
AN ABSTRACT OF THE THESIS OF Byoungchul Ahn for the degree of Doctor of Philosophy in Electrical and Computer Engineering, presented on May 3. 1989. Title: The Analysis and Synthesis of a Parallel Sorting Engine Redacted for Privacy Abstract approv, John M. Murray / Thisthesisisconcerned withthe development of a unique parallelsort-mergesystemsuitablefor implementationinVLSI. Two new sorting subsystems, a high performance VLSI sorter and a four-waymerger,werealsorealizedduringthedevelopment process. In addition, the analysis of several existing parallel sorting architectures and algorithms was carried out. Algorithmic time complexity, VLSI processor performance, and chiparearequirementsfortheexistingsortingsystemswere evaluated.The rebound sorting algorithm was determined to be the mostefficientamongthoseconsidered. The reboundsorter algorithm was implementedinhardware asasystolicarraywith external expansion capability. The second phase of the research involved analyzing several parallel merge algorithms andtheirbuffer management schemes. The dominant considerations for this phase of the research were the achievement of minimum VLSI chiparea,design complexity, and logicdelay. Itwasdeterminedthattheproposedmerger architecture could be implemented inseveral ways. Selecting the appropriate microarchitecture for the merger, given the constraints of chip area and performance, was the major problem.The tradeoffs associated with this process are outlined. Finally,apipelinedsort-merge system was implementedin VLSI by combining a rebound sorter -
Efficient Algorithms for a Mesh-Connected Computer With
Efficient Algorithms for a Mesh-Connected Computer with Additional Global Bandwidth by Yujie An A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in the University of Michigan 2019 Doctoral Committee: Professor Quentin F. Stout, Chair Associate Professor Kevin J. Compton Professor Seth Pettie Professor Martin J. Strauss Yujie An [email protected] ORCID iD: 0000-0002-2038-8992 ©Yujie An 2019 Acknowledgments My researches are done with the help of many people. First I would like to thank my advisor, Quentin F. Stout, who introduced me to most of the topics discussed here, and helped me throughout my Ph.D. career at University of Michigan. I would also like to thank my thesis committee members, Kevin J. Compton, Seth Pettie and Martin J. Strauss, for their invaluable advice during the dissertation process. To my parents, thank you very much for your long-term support. All my achievements cannot be accomplished without your loves. To all my best friends, thank you very much for keeping me happy during my life. ii TABLE OF CONTENTS Acknowledgments ................................... ii List of Figures ..................................... v List of Abbreviations ................................. vii Abstract ......................................... viii Chapter 1 Introduction ..................................... 1 2 Preliminaries .................................... 3 2.1 Our Model..................................3 2.2 Related Work................................5 3 Basic Operations .................................. 8 3.1 Broadcast, Reduction and Scan.......................8 3.2 Rotation................................... 11 3.3 Sparse Random Access Read/Write..................... 12 3.4 Optimality.................................. 17 4 Sorting and Selecting ................................ 18 4.1 Sort..................................... 18 4.1.1 Sort in the Worst Case....................... 18 4.1.2 Sort when Only k Values are Out of Order............ -
Instructor's Manual
Instructor’s Manual Vol. 2: Presentation Material CCC Mesh/Torus Butterfly !*#? Sea Sick Hypercube Pyramid Behrooz Parhami This instructor’s manual is for Introduction to Parallel Processing: Algorithms and Architectures, by Behrooz Parhami, Plenum Series in Computer Science (ISBN 0-306-45970-1, QA76.58.P3798) 1999 Plenum Press, New York (http://www.plenum.com) All rights reserved for the author. No part of this instructor’s manual may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission. Contact the author at: ECE Dept., Univ. of California, Santa Barbara, CA 93106-9560, USA ([email protected]) Introduction to Parallel Processing: Algorithms and Architectures Instructor’s Manual, Vol. 2 (4/00), Page iv Preface to the Instructor’s Manual This instructor’s manual consists of two volumes. Volume 1 presents solutions to selected problems and includes additional problems (many with solutions) that did not make the cut for inclusion in the text Introduction to Parallel Processing: Algorithms and Architectures (Plenum Press, 1999) or that were designed after the book went to print. It also contains corrections and additions to the text, as well as other teaching aids. The spring 2000 edition of Volume 1 consists of the following parts (the next edition is planned for spring 2001): Vol. 1: Problem Solutions Part I Selected Solutions and Additional Problems Part II Question Bank, Assignments, and Projects Part III Additions, Corrections, and Other Updates Part IV Sample Course Outline, Calendar, and Forms Volume 2 contains enlarged versions of the figures and tables in the text, in a format suitable for use as transparency masters. -
Parallel Sorting Algorithms + Topic Overview
+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in Sorting on Parallel Computers n Sorting Networks n Bubble Sort and its Variants n Quicksort n Bucket and Sample Sort n Other Sorting Algorithms + Sorting: Overview n One of the most commonly used and well-studied kernels. n Sorting can be comparison-based or noncomparison-based. n The fundamental operation of comparison-based sorting is compare-exchange. n The lower bound on any comparison-based sort of n numbers is Θ(nlog n) . n We focus here on comparison-based sorting algorithms. + Sorting: Basics What is a parallel sorted sequence? Where are the input and output lists stored? n We assume that the input and output lists are distributed. n The sorted list is partitioned with the property that each partitioned list is sorted and each element in processor Pi's list is less than that in Pj's list if i < j. + Sorting: Parallel Compare Exchange Operation A parallel compare-exchange operation. Processes Pi and Pj send their elements to each other. Process Pi keeps min{ai,aj}, and Pj keeps max{ai, aj}. + Sorting: Basics What is the parallel counterpart to a sequential comparator? n If each processor has one element, the compare exchange operation stores the smaller element at the processor with smaller id. This can be done in ts + tw time. n If we have more than one element per processor, we call this operation a compare split. Assume each of two processors have n/p elements. n After the compare-split operation, the smaller n/p elements are at processor Pi and the larger n/p elements at Pj, where i < j. -
Quadratic Time Algorithms Appear to Be Optimal for Sorting Evolving Data
Quadratic Time Algorithms Appear to be Optimal for Sorting Evolving Data Juan Jos´eBesa William E. Devanny David Eppstein Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Univ. of California, Irvine Univ. of California, Irvine Univ. of California, Irvine Irvine, CA 92697 USA Irvine, CA 92697 USA Irvine, CA 92697 USA [email protected] [email protected] [email protected] Michael T. Goodrich Timothy Johnson Dept. of Computer Science Dept. of Computer Science Univ. of California, Irvine Univ. of California, Irvine Irvine, CA 92697 USA Irvine, CA 92697 USA [email protected] [email protected] Abstract 1.1 The Evolving Data Model One scenario We empirically study sorting in the evolving data model. where the Knuthian model doesn't apply is for In this model, a sorting algorithm maintains an ap- applications where the input data is changing while an proximation to the sorted order of a list of data items algorithm is processing it, which has given rise to the while simultaneously, with each comparison made by the evolving data model [1]. In this model, input changes algorithm, an adversary randomly swaps the order of are coming so fast that it is hard for an algorithm to adjacent items in the true sorted order. Previous work keep up, much like Lucy in the classic conveyor belt 1 studies only two versions of quicksort, and has a gap scene in the TV show, I Love Lucy. Thus, rather than between the lower bound of Ω(n) and the best upper produce a single output, as in the Knuthian model, bound of O(n log log n). -
View Publication
Patience is a Virtue: Revisiting Merge and Sort on Modern Processors Badrish Chandramouli and Jonathan Goldstein Microsoft Research {badrishc, jongold}@microsoft.com ABSTRACT In particular, the vast quantities of almost sorted log-based data The vast quantities of log-based data appearing in data centers has appearing in data centers has generated this interest. In these generated an interest in sorting almost-sorted datasets. We revisit scenarios, data is collected from many servers, and brought the problem of sorting and merging data in main memory, and show together either immediately, or periodically (e.g. every minute), that a long-forgotten technique called Patience Sort can, with some and stored in a log. The log is then typically sorted, sometimes in key modifications, be made competitive with today’s best multiple ways, according to the types of questions being asked. If comparison-based sorting techniques for both random and almost those questions are temporal in nature [7][17][18], it is required that sorted data. Patience sort consists of two phases: the creation of the log be sorted on time. A widely-used technique for sorting sorted runs, and the merging of these runs. Through a combination almost sorted data is Timsort [8], which works by finding of algorithmic and architectural innovations, we dramatically contiguous runs of increasing or decreasing value in the dataset. improve Patience sort for both random and almost-ordered data. Of Our investigation has resulted in some surprising discoveries about particular interest is a new technique called ping-pong merge for a mostly-ignored 50-year-old sorting technique called Patience merging sorted runs in main memory. -
Fast Sorting for Exact OIT of Complex Scenes
CGI2014 manuscript No. 86 Fast Sorting for Exact OIT of Complex Scenes Pyarelal Knowles · Geoff Leach · Fabio Zambetta June 2014 Abstract Exact order independent transparency for example sorting per-object. This is because com- (OIT) techniques capture all fragments during ras- pletely sorting per-triangle is expensive and difficult, terization. The fragments are then sorted per-pixel requiring splitting triangles in cases of intersection or by depth and composited them in order using alpha cyclical overlapping. Order-independent transparency transparency. The sorting stage is a bottleneck for (OIT) allows geometry to be rasterized in arbitrary or- high depth complexity scenes, taking 70{95% of the der, sorting surfaces as fragments in image space. Our total time for those investigated. In this paper we show focus is exact OIT, for which sorting is central, although that typical shader based sorting speed is impacted we note there are approximate OIT techniques which by local memory latency and occupancy. We present use data reduction and trade accuracy for speed [17, and discuss the use of both registers and an external 13]. merge sort in register-based block sort to better use the memory hierarchy of the GPU for improved OIT rendering performance. This approach builds upon backwards memory allocation, achieving an OIT ren- dering speed up to 1:7× that of the best previous method and 6:3× that of the common straight forward OIT implementation. In some cases the sorting stage is reduced to no longer be the dominant OIT component. Keywords sorting · OIT · transparency · shaders · performance · registers · register-based block sort Figure 1: Power plant model, with false colouring to 1 Introduction show backwards memory allocation intervals. -
From Merge Sort to Timsort Nicolas Auger, Cyril Nicaud, Carine Pivoteau
Merge Strategies: from Merge Sort to TimSort Nicolas Auger, Cyril Nicaud, Carine Pivoteau To cite this version: Nicolas Auger, Cyril Nicaud, Carine Pivoteau. Merge Strategies: from Merge Sort to TimSort. 2015. hal-01212839v2 HAL Id: hal-01212839 https://hal-upec-upem.archives-ouvertes.fr/hal-01212839v2 Preprint submitted on 9 Dec 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Merge Strategies: from Merge Sort to TimSort Nicolas Auger, Cyril Nicaud, and Carine Pivoteau Universit´eParis-Est, LIGM (UMR 8049), F77454 Marne-la-Vall´ee,France fauger,nicaud,[email protected] Abstract. The introduction of TimSort as the standard algorithm for sorting in Java and Python questions the generally accepted idea that merge algorithms are not competitive for sorting in practice. In an at- tempt to better understand TimSort algorithm, we define a framework to study the merging cost of sorting algorithms that relies on merges of monotonic subsequences of the input. We design a simpler yet competi- tive algorithm in the spirit of TimSort based on the same kind of ideas. As a benefit, our framework allows to establish the announced running time of TimSort, that is, O(n log n). -
An Efficient Implementation of Batcher's Odd-Even Merge
254 IEEE TRANSACTIONS ON COMPUTERS, VOL. c-32, NO. 3, MARCH 1983 An Efficient Implementation of Batcher's Odd- Even Merge Algorithm and Its Application in Parallel Sorting Schemes MANOJ KUMAR, MEMBER, IEEE, AND DANIEL S. HIRSCHBERG Abstract-An algorithm is presented to merge two subfiles ofsize mitted from one processor to another only through this inter- n/2 each, stored in the left and the right halves of a linearly connected connection pattern. The processors connected directly by the processor array, in 3n /2 route steps and log n compare-exchange A steps. This algorithm is extended to merge two horizontally adjacent interconnection pattern will be referred as neighbors. pro- subfiles of size m X n/2 each, stored in an m X n mesh-connected cessor can communicate with its neighbor with a route in- processor array in row-major order, in m + 2n route steps and log mn struction which executes in t. time. The processors also have compare-exchange steps. These algorithms are faster than their a compare-exchange instruction which compares the contents counterparts proposed so far. of any two ofeach processor's internal registers and places the Next, an algorithm is presented to merge two vertically aligned This executes subfiles, stored in a mesh-connected processor array in row-major smaller of them in a specified register. instruction order. Finally, a sorting scheme is proposed that requires 11 n route in t, time. steps and 2 log2 n compare-exchange steps to sort n2 elements stored Illiac IV has a similar architecture [1]. The processor at in an n X n mesh-connected processor array. -
Comparison Sorts Name Best Average Worst Memory Stable Method Other Notes Quicksort Is Usually Done in Place with O(Log N) Stack Space
Comparison sorts Name Best Average Worst Memory Stable Method Other notes Quicksort is usually done in place with O(log n) stack space. Most implementations on typical in- are unstable, as stable average, worst place sort in-place partitioning is case is ; is not more complex. Naïve Quicksort Sedgewick stable; Partitioning variants use an O(n) variation is stable space array to store the worst versions partition. Quicksort case exist variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys. Highly parallelizable (up to O(log n) using the Three Hungarian's Algorithmor, more Merge sort worst case Yes Merging practically, Cole's parallel merge sort) for processing large amounts of data. Can be implemented as In-place merge sort — — Yes Merging a stable sort based on stable in-place merging. Heapsort No Selection O(n + d) where d is the Insertion sort Yes Insertion number of inversions. Introsort No Partitioning Used in several STL Comparison sorts Name Best Average Worst Memory Stable Method Other notes & Selection implementations. Stable with O(n) extra Selection sort No Selection space, for example using lists. Makes n comparisons Insertion & Timsort Yes when the data is already Merging sorted or reverse sorted. Makes n comparisons Cubesort Yes Insertion when the data is already sorted or reverse sorted. Small code size, no use Depends on gap of call stack, reasonably sequence; fast, useful where Shell sort or best known is No Insertion memory is at a premium such as embedded and older mainframe applications. Bubble sort Yes Exchanging Tiny code size. -
Sequential Merge Sort 2 Let's Make Mergesort Parallel
CSE341T/CSE549T 09/15/2014 Merge Sort Lecture 6 Today we are going to turn to the classic problem of sorting. Recall that given an array of numbers, a sorting algorithm permutes this array so that they are arranged in an increasing order. You are already familiar with basics of the sorting algorithm we are learning today, but we will see how to make it parallel today. 1 Reminder: Sequential Merge Sort As a reminder, here is the pseudo-code for sequential Merge Sort. MergeSort(A; n) 1 if n = 1 2 then return A 3 Divide A into two Aleft and Aright each of size n=2 0 4 Aleft =MERGESORT(Aleft; n=2) 0 5 Aright =MERGESORT(Aright; n=2) 6 Merge the two halves into A0 7 return A0 The running time of the merge procedure is Θ(n). The overall running time (work) of the entire computation is W (n) = 2W (n=2) + Θ(n) = Θ(n lg n). 2 Let’s Make Mergesort Parallel The most obvious thing to do to make the merge sort algorithm parallel is to make the recursive calls in parallel. 1 MergeSort(A; n) 1 if n = 1 2 then return A 3 Divide A into two Aleft and Aright each of size n=2 0 4 Aleft spawn MERGESORT(Aleft; n=2) 0 5 Aright MERGESORT(Aright; n=2) 6 sync 7 Merge the two halves into A0 8 return A0 The work of the algorithm remains unchanged. What is the span? The recurrence is SMergeSort(n) = SMergeSort(n=2) + Smerge(n) Since we are merging the arrays sequentially, the span of the merge call is Θ(n) and the recurrence solves to SMergeSort(n) = Θ(n). -
An Efficient Methodology to Sort Large Volume of Data
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 03, MARCH 2020 ISSN 2277-8616 An Efficient Methodology to Sort Large Volume of Data S.Bharathiraja, G.Suganya, M.Premalatha, R.Kumar, Sakkaravarthi Ramanathan Abstract—Sorting is a basic data processing technique that is used in all day-day applications. To cope up with technological advancement and extensive increase in data acquisition and storage, Sorting requires improvement to minimize time taken for processing, response time and space required for processing. Various sorting techniques have been proposed by researchers but the applicability of those techniques for large volume of data is not assured. The main focus of this work is to propose a new sorting technique titled Neutral Sort, to reduce the time taken for sorting and decrease the response time for a large volume of data. Neutral Sort is designed as an enhancement to Merge Sort. The advantages and disadvantages of existing techniques in terms of their performance, efficiency and throughput are discussed and the comparative study shows that Neutral sort drastically reduces time taken for sorting and hence reduces the response time. Index Terms—Chunking, Banding, Sorting, Efficiency, Merge sort, Bigdata Sorting, Neutral Sort. —————————— —————————— 1 INTRODUCTION N data science, ordering of elements is very much useful in divided chunks may already be in sorted manner and hence I various applications and data base querying. Different doesn’t need further split, we have proposed a modified approaches have been proposed by researchers for Merge sort algorithm titled “Neutral sort” to reduce time performing the sorting operation effectively based on the type complexity and hence to reduce response time.