Fast Sorting for Exact OIT of Complex Scenes
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Efficient Algorithms for a Mesh-Connected Computer WithEfficient Algorithms for a Mesh-Connected Computer with Additional Global Bandwidth by Yujie An A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in the University of Michigan 2019 Doctoral Committee: Professor Quentin F. Stout, Chair Associate Professor Kevin J. Compton Professor Seth Pettie Professor Martin J. Strauss Yujie An [email protected] ORCID iD: 0000-0002-2038-8992 ©Yujie An 2019 Acknowledgments My researches are done with the help of many people. First I would like to thank my advisor, Quentin F. Stout, who introduced me to most of the topics discussed here, and helped me throughout my Ph.D. career at University of Michigan. I would also like to thank my thesis committee members, Kevin J. Compton, Seth Pettie and Martin J. Strauss, for their invaluable advice during the dissertation process. To my parents, thank you very much for your long-term support. All my achievements cannot be accomplished without your loves. To all my best friends, thank you very much for keeping me happy during my life. ii TABLE OF CONTENTS Acknowledgments ................................... ii List of Figures ..................................... v List of Abbreviations ................................. vii Abstract ......................................... viii Chapter 1 Introduction ..................................... 1 2 Preliminaries .................................... 3 2.1 Our Model..................................3 2.2 Related Work................................5 3 Basic Operations .................................. 8 3.1 Broadcast, Reduction and Scan.......................8 3.2 Rotation................................... 11 3.3 Sparse Random Access Read/Write..................... 12 3.4 Optimality.................................. 17 4 Sorting and Selecting ................................ 18 4.1 Sort..................................... 18 4.1.1 Sort in the Worst Case....................... 18 4.1.2 Sort when Only k Values are Out of Order............
- 
												  Instructor's ManualInstructor’s Manual Vol. 2: Presentation Material CCC Mesh/Torus Butterfly !*#? Sea Sick Hypercube Pyramid Behrooz Parhami This instructor’s manual is for Introduction to Parallel Processing: Algorithms and Architectures, by Behrooz Parhami, Plenum Series in Computer Science (ISBN 0-306-45970-1, QA76.58.P3798) 1999 Plenum Press, New York (http://www.plenum.com) All rights reserved for the author. No part of this instructor’s manual may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission. Contact the author at: ECE Dept., Univ. of California, Santa Barbara, CA 93106-9560, USA ([email protected]) Introduction to Parallel Processing: Algorithms and Architectures Instructor’s Manual, Vol. 2 (4/00), Page iv Preface to the Instructor’s Manual This instructor’s manual consists of two volumes. Volume 1 presents solutions to selected problems and includes additional problems (many with solutions) that did not make the cut for inclusion in the text Introduction to Parallel Processing: Algorithms and Architectures (Plenum Press, 1999) or that were designed after the book went to print. It also contains corrections and additions to the text, as well as other teaching aids. The spring 2000 edition of Volume 1 consists of the following parts (the next edition is planned for spring 2001): Vol. 1: Problem Solutions Part I Selected Solutions and Additional Problems Part II Question Bank, Assignments, and Projects Part III Additions, Corrections, and Other Updates Part IV Sample Course Outline, Calendar, and Forms Volume 2 contains enlarged versions of the figures and tables in the text, in a format suitable for use as transparency masters.
- 
												  Quadratic Time Algorithms Appear to Be Optimal for Sorting Evolving DataQuadratic Time Algorithms Appear to be Optimal for Sorting Evolving Data Juan Jos´eBesa William E. Devanny David Eppstein Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Univ. of California, Irvine Univ. of California, Irvine Univ. of California, Irvine Irvine, CA 92697 USA Irvine, CA 92697 USA Irvine, CA 92697 USA [email protected] [email protected] [email protected] Michael T. Goodrich Timothy Johnson Dept. of Computer Science Dept. of Computer Science Univ. of California, Irvine Univ. of California, Irvine Irvine, CA 92697 USA Irvine, CA 92697 USA [email protected] [email protected] Abstract 1.1 The Evolving Data Model One scenario We empirically study sorting in the evolving data model. where the Knuthian model doesn't apply is for In this model, a sorting algorithm maintains an ap- applications where the input data is changing while an proximation to the sorted order of a list of data items algorithm is processing it, which has given rise to the while simultaneously, with each comparison made by the evolving data model [1]. In this model, input changes algorithm, an adversary randomly swaps the order of are coming so fast that it is hard for an algorithm to adjacent items in the true sorted order. Previous work keep up, much like Lucy in the classic conveyor belt 1 studies only two versions of quicksort, and has a gap scene in the TV show, I Love Lucy. Thus, rather than between the lower bound of Ω(n) and the best upper produce a single output, as in the Knuthian model, bound of O(n log log n).
- 
												  Comparison Sorts Name Best Average Worst Memory Stable Method Other Notes Quicksort Is Usually Done in Place with O(Log N) Stack SpaceComparison sorts Name Best Average Worst Memory Stable Method Other notes Quicksort is usually done in place with O(log n) stack space. Most implementations on typical in- are unstable, as stable average, worst place sort in-place partitioning is case is ; is not more complex. Naïve Quicksort Sedgewick stable; Partitioning variants use an O(n) variation is stable space array to store the worst versions partition. Quicksort case exist variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys. Highly parallelizable (up to O(log n) using the Three Hungarian's Algorithmor, more Merge sort worst case Yes Merging practically, Cole's parallel merge sort) for processing large amounts of data. Can be implemented as In-place merge sort — — Yes Merging a stable sort based on stable in-place merging. Heapsort No Selection O(n + d) where d is the Insertion sort Yes Insertion number of inversions. Introsort No Partitioning Used in several STL Comparison sorts Name Best Average Worst Memory Stable Method Other notes & Selection implementations. Stable with O(n) extra Selection sort No Selection space, for example using lists. Makes n comparisons Insertion & Timsort Yes when the data is already Merging sorted or reverse sorted. Makes n comparisons Cubesort Yes Insertion when the data is already sorted or reverse sorted. Small code size, no use Depends on gap of call stack, reasonably sequence; fast, useful where Shell sort or best known is No Insertion memory is at a premium such as embedded and older mainframe applications. Bubble sort Yes Exchanging Tiny code size.
- 
												  Analysis of Fast Insertion SortAnalysis of Fast Insertion Sort A Technical Report presented to the faculty of the School of Engineering and Applied Science University of Virginia by Eric Thomas April 24, 2021 On my honor as a University student, I have neither given nor received unauthorized aid on this assignment as defined by the Honor Guidelines for Thesis-Related Assignments. Eric Thomas Technical advisor: Lu Feng, Department of Computer Science Analysis of Fast Insertion Sort Efficient Variants of Insertion Sort Eric Thomas University of Virginia [email protected] ABSTRACT complexity of 푂(푛) and the latter suffers from complicated implementation. According to Faro et al. [1], there does not As a widespread problem in programming language and seem to exist an iterative, in-place, and online variant of algorithm design, many sorting algorithms, such as Fast Insertion Sort that manages to have a worst-case time Insertion Sort, have been created to excel at particular use complexity of 표(푛2). Originating from Faro et al. [1], a cases. While Fast Insertion Sort has been demonstrated to family of Insertion Sort variants, referred to as Fast have a lower running time than Hoare's Quicksort in many Insertion Sort, is the result of an endeavor to solve these practical cases, re-implementation is carried out to verify problems. those results again in C++ as well as Python, which involves comparing the running times of Fast Insertion Sort, Merge Sort, Heapsort, and Quicksort on random and 2 Related Work partially sorted arrays of different lengths. Furthermore, the After reviewing research by “Mishra et al. [2], Yang et al.
- 
												  Designing Efficient Sorting Algorithms for Manycore GpusDesigning Efficient Sorting Algorithms for Manycore GPUs Nadathur Satish Mark Harris Michael Garland University of California, Berkeley NVIDIA Corporation Abstract cores, typically augmented with vector units—are now commonplace and there is every indication that the trend We describe the design of high-performance parallel towards increasing parallelism will continue on towards radix sort and merge sort routines for manycore GPUs, tak- “manycore” chips that provide far higher degrees of par- ing advantage of the full programmability offered by CUDA. allelism. GPUs have been at the leading edge of this drive Our radix sort is the fastest GPU sort reported in the liter- towards increased chip-level parallelism for some time and ature, and is up to 4 times faster than the graphics-based are already fundamentally manycore processors. Current GPUSort. It is also highly competitive with CPU imple- NVIDIA GPUs, for example, contain up to 128 scalar pro- mentations, being up to 3.5 times faster than comparable cessing elements per chip [19], and in contrast to earlier routines on an 8-core 2.33 GHz Intel Core2 Xeon system. generations of GPUs, they can be programmed directly in Our merge sort is the fastest published comparison-based C using CUDA [20, 21]. GPU sort and is also competitive with multi-core routines. In this paper, we describe the design of efficient sort- To achieve this performance, we carefully design our al- ing algorithms for such manycore GPUs using CUDA. The gorithms to expose substantial fine-grained parallelism and programming flexibility provided by CUDA and the current decompose the computation into independent tasks that per- generation of GPUs allows us to consider a much broader form minimal global communication.
- 
												  Thesis Submitted for the Degree of Doctor of Philosophy at the Australian National UniversityEfficient Algorithms for Sorting and Synchronization Andrew Tridgell A thesis submitted for the degree of Doctor of Philosophy at The Australian National University February 1999 Except where otherwise indicated, this thesis is my own original work. Andrew Tridgell February 1999 Till min kara¨ fru, Susan Acknowledgments The research that has gone into this thesis has been thoroughly enjoyable. That en- joyment is largely a result of the interaction that I have had with my supervisors, col- leagues and, in the case of rsync, the people who have tested and used the resulting software. I feel very privileged to have worked with my supervisors, Richard Brent, Paul Mackerras and Brendan McKay. To each of them I owe a great debt of gratitude for their patience, inspiration and friendship. Richard, Paul and Brendan have taught me a great deal about the field of Computer Science by sharing with me the joy of discovery and investigation that is the heart of research. I would also like to thank Bruce Millar and Iain Macleod, who supervised my initial research in automatic speech recognition. Through Bruce, Iain and the Com- puter Sciences Laboratory I made the transition from physicist to computer scientist, turning my hobby into a career. The Department of Computer Science and Computer Sciences Laboratory have provided an excellent environment for my research. I spent many enjoyable hours with department members and fellow students chatting about my latest crazy ideas over a cup of coffee. Without this rich environment I doubt that many of my ideas would have come to fruition. The Australian Telecommunications and Electronics Research Board, the Com- monwealth and the Australian National University were very generous in providing me with scholarship assistance.
- 
												  In-Place Merging AlgorithmsIn-Place Merging Algorithms Denham Coates-Evelyn Department of Computer Science Kings College, The Strand London WC2R 2LS, U.K. [email protected] January 2004 Abstract In this report we consider the problem of merging two sorted lists of m and n keys each in-place. We survey known techniques for this problem, focussing on correctness and the attributes of Stability and Practicality. We demonstrate a class of unstable in-place merge algorithms that uses block rearrangement and internal buffering that actually does not merge in the presence of sufficient duplicate keys of a given value. We show four relatively simple block sorting techniques that can be used to correct these algorithms. In addition, we show relatively simple and robust techniques that does stable local block merge followed by stable block sort to create a merge. Our internal merge is base on Kronrod’s method of internal buffering and block partitioning. Using block size of O(√m + n) we achieve complexity of no more than 1.5(m+n)+O(√m + n lg(m+n)) comparisons and 4(m+n)+O(√m + n lg(m+n)) data moves. Using block size of O((m + n)/ lg(m + n)) gives complexity of no more than m + n + o(m + n) comparisons and 5(m + n) + o(m + n) moves. Actual experimental results indicates 4.5(m + n) + o(m + n) moves. Our algorithm is stable except for the case of buffer permutation during the merge, its implementation is much less complex than the unstable algorithm given in [26] which does optimum comparisons and 3(m + n) + o(m + n) moves.
- 
												  Sorting by Generating the Sorting Permutation, and the Effect of Caching on SortingSorting by generating the sorting permutation, and the effect of caching on sorting. Arne Maus, [email protected] Department of Informatics University of Oslo Abstract This paper presents a fast way to generate the permutation p that defines the sorting order of a set of integer keys in an integer array ‘a’- that is: a[p[i]] is the i’th sorted element in ascending order. The motivation for using Psort is given along with its implementation in Java. This distribution based sorting algorithm, Psort, is compared to two comparison based algorithms, Heapsort and Quicksort, and two other distribution based algorithms, Bucket sort and Radix sort. The relative performance of the distribution sorting algorithms for arrays larger than a few thousand elements are assumed to be caused by how well they handle caching (both level 1 and level 2). The effect of caching is investigated, and based on these results, more cache friendly versions of Psort and Radix sort are presented and compared. Introduction Sorting is maybe the single most important algorithm performed by computers, and certainly one of the most investigated topics in algorithmic design. Numerous sorting algorithms have been devised, and the more commonly used are described and analyzed in any standard textbook in algorithms and data structures [Dahl and Belsnes, Weiss, Goodrich and Tamassia] or in standard reference works [Knuth, van Leeuwen]. New sorting algorithms are still being developed, like Flashsort [Neubert] and ‘The fastest sorting algorithm” [Nilsson@. The most influential sorting algorithm introduced since the 60’ies is no doubt the distribution based ‘Bucket’ sort which can be traced back to [Dobosiewicz].
- 
												  Chapter 7 SortingCHAPTER 7 SORTING All the programs in this file are selected from Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed “Fundamentals of Data Structures in C”, Computer Science Press, 1992. 1 Sequential Search Example 44, 55, 12, 42, 94, 18, 06, 67 unsuccessful search – n+1 successful search n n+ 1 ∑ (i)/n= i= 1 2 2 Program:Sequential search int seqsearch( int list[ ], int searchnum, int n ) { /*search an array, list, that has n numbers. Return -1, if search num is not in the list */ int i; for (i=0; i<n && list[i] != searchnum ; i++) ; return (( i<n) ? i : -1); } 3 Program: Binary search int binsearch(element list[ ], int searchnum, int n) { /* search list [0], ..., list[n-1]*/ int left = 0, right = n-1, middle; while (left <= right) { middle = (left+ right)/2; switch (COMPARE(list[middle].key, searchnum)) { case -1: left = middle +1; break; case 0: return middle; case 1:right = middle - 1; } } O(log2n) return -1; } 4 Sorting Problem Definition – given (R0, R1, …, Rn-1), where Ri = key + data find a permutation , such that K(i-1) K(i), 0<i<n-1 sorted – K(i-1) K(i), 0<i<n-1 stable – if i < j and Ki = Kj then Ri precedes Rj in the sorted list internal sort vs. external sort criteria – # of key comparisons – # of data movements 5 Insertion Sort Insertion sort is a simple sorting algorithm that is appropriate for small inputs. Most common sorting technique used by card players. The list is divided into two parts: sorted and unsorted. In each pass, the first element of the unsorted part is picked up, transferred to the sorted sublist, and inserted at the appropriate place.
- 
											Contents Classification Comparison of Algorithms Popular SortingFrom Wikipedia, the free encyclopedia A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human- readable output. More formally, the output must satisfy two conditions: 1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order); 2. The output is a permutation (reordering) of the input. Further, the data is often taken to be in an array, which allows random access, rather than a list, which only allows sequential access, though often algorithms can be applied with suitable modification to either type of data. Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Comparison sorting algorithms have a fundamental requirement of O(n log n) comparisons (some input sequences will require a multiple of n log n comparisons); algorithms not based on comparisons, such as counting sort, can have better performance. Although many consider sorting a solved problem—asymptotically optimal algorithms have been known since the mid-20th century—useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.
- 
												  Sorting AlgorithmsLecture 15: Sorting CSE 373: Data Structures and Algorithms Algorithms CSE 373 WI 19 - KASEY CHAMPION 1 Administrivia Piazza! Homework - HW 5 Part 1 Due Friday 2/22 - HW 5 Part 2 Out Friday, due 3/1 - HW 3 Regrade Option due 3/1 Grades - HW 1, 2 & 3 Grades into Canvas soon - Socrative EC status into Canvas soon - HW 4 Grades published by 3/1 - Midterm Grades PubLished CSE 373 SP 18 - KASEY CHAMPION 2 Midterm Stats CSE 373 19 WI - KASEY CHAMPION 3 Sorting CSE 373 19 WI - KASEY CHAMPION 4 Types of Sorts Comparison Sorts Niche Sorts aka “linear sorts” Compare two elements at a time Leverages specific properties about the General sort, works for most types of elements items in the list to achieve faster Element must form a “consistent, total ordering” runtimes For every element a, b and c in the list the niche sorts typically run O(n) time following must be true: - If a <= b and b <= a then a = b In this class we’ll focus on comparison - If a <= b and b <= c then a <= c sorts - Either a <= b is true or <= a What does this mean? compareTo() works for your elements Comparison sorts run at fastest O(nlog(n)) time CSE 373 SP 18 - KASEY CHAMPION 5 Sort Approaches In Place sort A sorting algorithm is in-place if it requires only O(1) extra space to sort the array Typically modifies the input collection Useful to minimize memory usage Stable sort A sorting algorithm is stable if any equal items remain in the same relative order before and after the sort Why do we care? - Sometimes we want to sort based on some, but not all attributes of an item -