In-Place Parallel Super Scalar Samplesort (Ips4o)∗
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Mergesort and Quicksort ! Merge Two Halves to Make Sorted Whole
Mergesort Basic plan: ! Divide array into two halves. ! Recursively sort each half. Mergesort and Quicksort ! Merge two halves to make sorted whole. • mergesort • mergesort analysis • quicksort • quicksort analysis • animations Reference: Algorithms in Java, Chapters 7 and 8 Copyright © 2007 by Robert Sedgewick and Kevin Wayne. 1 3 Mergesort and Quicksort Mergesort: Example Two great sorting algorithms. ! Full scientific understanding of their properties has enabled us to hammer them into practical system sorts. ! Occupy a prominent place in world's computational infrastructure. ! Quicksort honored as one of top 10 algorithms of 20th century in science and engineering. Mergesort. ! Java sort for objects. ! Perl, Python stable. Quicksort. ! Java sort for primitive types. ! C qsort, Unix, g++, Visual C++, Python. 2 4 Merging Merging. Combine two pre-sorted lists into a sorted whole. How to merge efficiently? Use an auxiliary array. l i m j r aux[] A G L O R H I M S T mergesort k mergesort analysis a[] A G H I L M quicksort quicksort analysis private static void merge(Comparable[] a, Comparable[] aux, int l, int m, int r) animations { copy for (int k = l; k < r; k++) aux[k] = a[k]; int i = l, j = m; for (int k = l; k < r; k++) if (i >= m) a[k] = aux[j++]; merge else if (j >= r) a[k] = aux[i++]; else if (less(aux[j], aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; } 5 7 Mergesort: Java implementation of recursive sort Mergesort analysis: Memory Q. How much memory does mergesort require? A. Too much! public class Merge { ! Original input array = N. -
Paralellizing the Data Cube
PARALELLIZING THE DATA CUBE By Todd Eavis SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY AT DALHOUSIE UNIVERSITY HALIFAX, NOVA SCOTIA JUNE 27, 2003 °c Copyright by Todd Eavis, 2003 DALHOUSIE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE The undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled “Paralellizing the Data Cube” by Todd Eavis in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Dated: June 27, 2003 External Examiner: Virendra Bhavsar Research Supervisor: Andrew Rau-Chaplin Examing Committee: Qigang Gao Evangelos Milios ii DALHOUSIE UNIVERSITY Date: June 27, 2003 Author: Todd Eavis Title: Paralellizing the Data Cube Department: Computer Science Degree: Ph.D. Convocation: October Year: 2003 Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions. Signature of Author THE AUTHOR RESERVES OTHER PUBLICATION RIGHTS, AND NEITHER THE THESIS NOR EXTENSIVE EXTRACTS FROM IT MAY BE PRINTED OR OTHERWISE REPRODUCED WITHOUT THE AUTHOR’S WRITTEN PERMISSION. THE AUTHOR ATTESTS THAT PERMISSION HAS BEEN OBTAINED FOR THE USE OF ANY COPYRIGHTED MATERIAL APPEARING IN THIS THESIS (OTHER THAN BRIEF EXCERPTS REQUIRING ONLY PROPER ACKNOWLEDGEMENT IN SCHOLARLY WRITING) AND THAT ALL SUCH USE IS CLEARLY ACKNOWLEDGED. iii To the two women in my life: Amber and Bailey. iv Table of Contents Table of Contents v List of Tables x List of Figures xi Abstract i Acknowledgements ii 1 Introduction 1 1.1 Overview of Primary Research . -
2.3 Quicksort
2.3 Quicksort ‣ quicksort ‣ selection ‣ duplicate keys ‣ system sorts Algorithms, 4th Edition · Robert Sedgewick and Kevin Wayne · Copyright © 2002–2010 · January 30, 2011 12:46:16 PM Two classic sorting algorithms Critical components in the world’s computational infrastructure. • Full scientific understanding of their properties has enabled us to develop them into practical system sorts. • Quicksort honored as one of top 10 algorithms of 20th century in science and engineering. Mergesort. last lecture • Java sort for objects. • Perl, C++ stable sort, Python stable sort, Firefox JavaScript, ... Quicksort. this lecture • Java sort for primitive types. • C qsort, Unix, Visual C++, Python, Matlab, Chrome JavaScript, ... 2 ‣ quicksort ‣ selection ‣ duplicate keys ‣ system sorts 3 Quicksort Basic plan. • Shuffle the array. • Partition so that, for some j - element a[j] is in place - no larger element to the left of j j - no smaller element to the right of Sir Charles Antony Richard Hoare • Sort each piece recursively. 1980 Turing Award input Q U I C K S O R T E X A M P L E shu!e K R A T E L E P U I M Q C X O S partitioning element partition E C A I E K L P U T M Q R X O S not greater not less sort left A C E E I K L P U T M Q R X O S sort right A C E E I K L M O P Q R S T U X result A C E E I K L M O P Q R S T U X Quicksort overview 4 Quicksort partitioning Basic plan. -
Quick Sort Algorithm Song Qin Dept
Quick Sort Algorithm Song Qin Dept. of Computer Sciences Florida Institute of Technology Melbourne, FL 32901 ABSTRACT each iteration. Repeat this on the rest of the unsorted region Given an array with n elements, we want to rearrange them in without the first element. ascending order. In this paper, we introduce Quick Sort, a Bubble sort works as follows: keep passing through the list, divide-and-conquer algorithm to sort an N element array. We exchanging adjacent element, if the list is out of order; when no evaluate the O(NlogN) time complexity in best case and O(N2) exchanges are required on some pass, the list is sorted. in worst case theoretically. We also introduce a way to approach the best case. Merge sort [4] has a O(NlogN) time complexity. It divides the 1. INTRODUCTION array into two subarrays each with N/2 items. Conquer each Search engine relies on sorting algorithm very much. When you subarray by sorting it. Unless the array is sufficiently small(one search some key word online, the feedback information is element left), use recursion to do this. Combine the solutions to brought to you sorted by the importance of the web page. the subarrays by merging them into single sorted array. 2 Bubble, Selection and Insertion Sort, they all have an O(N2) time In Bubble sort, Selection sort and Insertion sort, the O(N ) time complexity that limits its usefulness to small number of element complexity limits the performance when N gets very big. no more than a few thousand data points. -
Heapsort Vs. Quicksort
Heapsort vs. Quicksort Most groups had sound data and observed: – Random problem instances • Heapsort runs perhaps 2x slower on small instances • It’s even slower on larger instances – Nearly-sorted instances: • Quicksort is worse than Heapsort on large instances. Some groups counted comparisons: • Heapsort uses more comparisons on random data Most groups concluded: – Experiments show that MH2 predictions are correct • At least for random data 1 CSE 202 - Dynamic Programming Sorting Random Data N Time (us) Quicksort Heapsort 10 19 21 100 173 293 1,000 2,238 5,289 10,000 28,736 78,064 100,000 355,949 1,184,493 “HeapSort is definitely growing faster (in running time) than is QuickSort. ... This lends support to the MH2 model.” Does it? What other explanations are there? 2 CSE 202 - Dynamic Programming Sorting Random Data N Number of comparisons Quicksort Heapsort 10 54 56 100 987 1,206 1,000 13,116 18,708 10,000 166,926 249,856 100,000 2,050,479 3,136,104 But wait – the number of comparisons for Heapsort is also going up faster that for Quicksort. This has nothing to do with the MH2 analysis. How can we see if MH2 analysis is relevant? 3 CSE 202 - Dynamic Programming Sorting Random Data N Time (us) Compares Time / compare (ns) Quicksort Heapsort Quicksort Heapsort Quicksort Heapsort 10 19 21 54 56 352 375 100 173 293 987 1,206 175 243 1,000 2,238 5,289 13,116 18,708 171 283 10,000 28,736 78,064 166,926 249,856 172 312 100,000 355,949 1,184,493 2,050,479 3,136,104 174 378 Nice data! – Why does N = 10 take so much longer per comparison? – Why does Heapsort always take longer than Quicksort? – Is Heapsort growth as predicted by MH2 model? • Is N large enough to be interesting?? (Machine is a Sun Ultra 10) 4 CSE 202 - Dynamic Programming .. -
Advanced Topics in Sorting
Advanced Topics in Sorting complexity system sorts duplicate keys comparators 1 complexity system sorts duplicate keys comparators 2 Complexity of sorting Computational complexity. Framework to study efficiency of algorithms for solving a particular problem X. Machine model. Focus on fundamental operations. Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. Optimal algorithm. Algorithm with best cost guarantee for X. lower bound ~ upper bound Example: sorting. • Machine model = # comparisons access information only through compares • Upper bound = N lg N from mergesort. • Lower bound ? 3 Decision Tree a < b yes no code between comparisons (e.g., sequence of exchanges) b < c a < c yes no yes no a b c b a c a < c b < c yes no yes no a c b c a b b c a c b a 4 Comparison-based lower bound for sorting Theorem. Any comparison based sorting algorithm must use more than N lg N - 1.44 N comparisons in the worst-case. Pf. Assume input consists of N distinct values a through a . • 1 N • Worst case dictated by tree height h. N ! different orderings. • • (At least) one leaf corresponds to each ordering. Binary tree with N ! leaves cannot have height less than lg (N!) • h lg N! lg (N / e) N Stirling's formula = N lg N - N lg e N lg N - 1.44 N 5 Complexity of sorting Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. -
Sorting Algorithm 1 Sorting Algorithm
Sorting algorithm 1 Sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions: 1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order); 2. The output is a permutation, or reordering, of the input. Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and lower bounds. Classification Sorting algorithms used in computer science are often classified by: • Computational complexity (worst, average and best behaviour) of element comparisons in terms of the size of the list . For typical sorting algorithms good behavior is and bad behavior is . -
Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation
Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation Michael A. Bender∗x Jonathan Berryy Simon D. Hammondy K. Scott Hemmerty Samuel McCauley∗ Branden Moorey Benjamin Moseleyz Cynthia A. Phillipsy David Resnicky and Arun Rodriguesy ∗Stony Brook University, Stony Brook, NY 11794-4400 USA fbender,[email protected] ySandia National Laboratories, Albuquerque, NM 87185 USA fjberry, sdhammo, kshemme, bjmoor, caphill, drresni, [email protected] zWashington University in St. Louis, St. Louis, MO 63130 USA [email protected] xTokutek, Inc. www.tokutek.com Abstract—A fundamental challenge for supercomputer memory close to the processor, there can be a higher architecture is that processors cannot be fed data from number of connections between the memory and caches, DRAM as fast as CPUs can consume it. Therefore, many enabling higher bandwidth than current technologies. applications are memory-bandwidth bound. As the number 1 of cores per chip increases, and traditional DDR DRAM While the term scratchpad is overloaded within the speeds stagnate, the problem is only getting worse. A computer architecture field, we use it throughout this variety of non-DDR 3D memory technologies (Wide I/O 2, paper to describe a high-bandwidth, local memory that HBM) offer higher bandwidth and lower power by stacking can be used as a temporary storage location. DRAM chips on the processor or nearby on a silicon The scratchpad cannot replace DRAM entirely. Due to interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems the physical constraints of adding the memory directly likely that future systems will require at least two levels of to the chip, the scratchpad cannot be as large as DRAM, main memory: high-bandwidth, low-power memory near although it will be much larger than cache, having the processor and low-bandwidth high-capacity memory gigabytes of storage capacity. -
Minimizing Writes in Parallel External Memory Search
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Minimizing Writes in Parallel External Memory Search Nathan R. Sturtevant and Matthew J. Rutherford University of Denver Denver, CO, USA fsturtevant, [email protected] Abstract speed in Chinese Checkers, and is 3.5 times faster in Rubik’s Cube. Recent research on external-memory search has shown that disks can be effectively used as sec- WMBFS is motivated by several observations: ondary storage when performing large breadth- first searches. We introduce the Write-Minimizing First, most large-scale BFSs have been performed to ver- Breadth-First Search (WMBFS) algorithm which ify the diameter (the number of unique depths at which states is designed to minimize the number of writes per- can be found) or the width (the maximum number of states at formed in an external-memory BFS. WMBFS is any particular depth) of a state space, after which any com- also designed to store the results of the BFS for puted data is discarded. There are, however, many scenarios later use. We present the results of a BFS on a in which the results of a large BFS can be later used for other single-agent version of Chinese Checkers and the computation. In particular, a breadth-first search is the under- Rubik’s Cube edge cubes, state spaces with about lying technique for building pattern databases (PDBs), which 1 trillion states each. In evaluating against a com- are used as heuristics for search. PDBs require that the depth parable approach, WMBFS reduces the I/O for the of each state in the BFS be stored for later usage. -
Super Scalar Sample Sort
Super Scalar Sample Sort Peter Sanders1 and Sebastian Winkel2 1 Max Planck Institut f¨ur Informatik Saarbr¨ucken, Germany, [email protected] 2 Chair for Prog. Lang. and Compiler Construction Saarland University, Saarbr¨ucken, Germany, [email protected] Abstract. Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison based sorting algorithm for distributed memory parallel computers. We show that sample sort is also useful on a single processor. The main algorith- mic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions. This transformation facilitates optimizations like loop unrolling and software pipelining. The final implementation, albeit cache efficient, is limited by a linear num- ber of memory accesses rather than the O(n log n) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL library, which is known as one of the fastest available quicksort implementations. 1 Introduction Counting comparisons is the most common way to compare the complexity of comparison based sorting algorithms. Indeed, algorithms like quicksort with good sampling strategies [9,14] or merge sort perform close to the lower bound of log n! n log n comparisons3 for sorting n elements and are among the best algorithms≈ in practice (e.g., [24, 20, 5]). At least for numerical keys it is a bit astonishing that comparison operations alone should be good predictors for ex- ecution time because comparing numbers is equivalent to subtraction — an operation of negligible cost compared to memory accesses. -
How to Sort out Your Life in O(N) Time
How to sort out your life in O(n) time arel Číže @kaja47K funkcionaklne.cz I said, "Kiss me, you're beautiful - These are truly the last days" Godspeed You! Black Emperor, The Dead Flag Blues Everyone, deep in their hearts, is waiting for the end of the world to come. Haruki Murakami, 1Q84 ... Free lunch 1965 – 2022 Cramming More Components onto Integrated Circuits http://www.cs.utexas.edu/~fussell/courses/cs352h/papers/moore.pdf He pays his staff in junk. William S. Burroughs, Naked Lunch Sorting? quicksort and chill HS 1964 QS 1959 MS 1945 RS 1887 quicksort, mergesort, heapsort, radix sort, multi- way merge sort, samplesort, insertion sort, selection sort, library sort, counting sort, bucketsort, bitonic merge sort, Batcher odd-even sort, odd–even transposition sort, radix quick sort, radix merge sort*, burst sort binary search tree, B-tree, R-tree, VP tree, trie, log-structured merge tree, skip list, YOLO tree* vs. hashing Robin Hood hashing https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf xs.sorted.take(k) (take (sort xs) k) qsort(lotOfIntegers) It may be the wrong decision, but fuck it, it's mine. (Mark Z. Danielewski, House of Leaves) I tell you, my man, this is the American Dream in action! We’d be fools not to ride this strange torpedo all the way out to the end. (HST, FALILV) Linear time sorting? I owe the discovery of Uqbar to the conjunction of a mirror and an Encyclopedia. (Jorge Luis Borges, Tlön, Uqbar, Orbis Tertius) Sorting out graph processing https://github.com/frankmcsherry/blog/blob/master/posts/2015-08-15.md Radix Sort Revisited http://www.codercorner.com/RadixSortRevisited.htm Sketchy radix sort https://github.com/kaja47/sketches (thinking|drinking|WTF)* I know they accuse me of arrogance, and perhaps misanthropy, and perhaps of madness. -
Advanced Topics in Sorting List RSS News Items in Reverse Chronological Order
Sorting applications Sorting algorithms are essential in a broad variety of applications Organize an MP3 library. Display Google PageRank results. Advanced Topics in Sorting List RSS news items in reverse chronological order. Find the median. Find the closest pair. Binary search in a database. [email protected] Identify statistical outliers. Find duplicates in a mailing list. [email protected] Data compression. Computer graphics. Computational biology. Supply chain management. Load balancing on a parallel computer. http://www.4shared.com/file/79096214/fb2ed224/lect01.html . Sorting algorithms Which algorithm to use? Many sorting algorithms to choose from Applications have diverse attributes Stable? Internal sorts Multiple keys? Insertion sort, selection sort, bubblesort, shaker sort. Quicksort, mergesort, heapsort, samplesort, shellsort. Deterministic? Solitaire sort, red-black sort, splaysort, Dobosiewicz sort, psort, ... Keys all distinct? External sorts Multiple key types? Poly-phase mergesort, cascade-merge, oscillating sort. Linked list or arrays? Radix sorts Large or small records? Distribution, MSD, LSD. 3-way radix quicksort. Is your file randomly ordered? Parallel sorts Need guaranteed performance? Bitonic sort, Batcher even-odd sort. Smooth sort, cube sort, column sort. Cannot cover all combinations of attributes. GPUsort. 1 Case study 1 Case study 2 Problem Problem Sort a huge randomly-ordered file of small Sort a huge file that is already almost in records. order. Example Example Process transaction records for a phone Re-sort a huge database after a few company. changes. Which sorting method to use? Which sorting method to use? 1. Quicksort: YES, it's designed for this problem 1. Quicksort: probably no, insertion simpler and faster 2.