Design Implement Analyze Experiment


Felix Putze and Peter Sanders: Algorithmics (design – implement – analyze – experiment). Course Notes, Algorithm Engineering, TU Karlsruhe, October 19, 2009.

Preface

These course notes cover a lecture on algorithm engineering for the basic toolbox that Peter Sanders has been giving at Universität Karlsruhe since 2004. The text is compiled from slides, scientific papers and other manuscripts. Most of this material is in English, so this language was adopted as the main language. However, some parts are in German. The primary sources of our material are given at the beginning of each chapter. Please refer to the original publications for further references. This document is still work in progress. Please report bugs of any type (content, language, layout, ...) to [email protected]. Thank you!

Contents

1 What is Algorithm Engineering?
  1.1 Introduction
  1.2 State of Research and Open Problems
2 Data Structures
  2.1 Arrays & Lists
  2.2 External Lists
  2.3 Stacks, Queues & Variants
3 Sorting
  3.1 Quicksort Basics
  3.2 Refined Quicksort
  3.3 Lessons from Experiments
  3.4 Super Scalar Sample Sort
  3.5 Multiway Merge Sort
  3.6 Sorting with Parallel Disks
  3.7 Internal Work of Multiway Mergesort
  3.8 Experiments
4 Priority Queues
  4.1 Introduction
  4.2 Binary Heaps
  4.3 External Priority Queues
  4.4 Addressable Priority Queues
5 External Memory Algorithms
  5.1 Introduction
  5.2 The External Memory Model and Things We Already Saw
  5.3 The Stxxl Library
  5.4 Time-Forward Processing
  5.5 Cache-Oblivious Algorithms
    5.5.1 Matrix Transposition
    5.5.2 Searching Using Van Emde Boas Layout
    5.5.3 Funnel Sorting
    5.5.4 Is the Model an Oversimplification?
  5.6 External BFS
    5.6.1 Introduction
    5.6.2 Algorithm of Munagala and Ranade
    5.6.3 An Improved BFS Algorithm with Sublinear I/O
    5.6.4 Improvements in the Previous Implementations of MR BFS and MM BFS R
    5.6.5 A Heuristic for Maintaining the Pool
  5.7 Maximal Independent Set
  5.8 Euler Tours
  5.9 List Ranking
6 van Emde Boas Trees
  6.1 From Theory to Practice
  6.2 Implementation
  6.3 Experiments
7 Shortest Path Search
  7.1 Introduction
  7.2 "Classical" and Other Results
  7.3 Highway Hierarchy
    7.3.1 Introduction
    7.3.2 Hierarchies and Contraction
    7.3.3 Query
    7.3.4 Experiments
  7.4 Transit Node Routing
    7.4.1 Computing Transit Nodes
    7.4.2 Experiments
    7.4.3 Complete Description of the Shortest Path
  7.5 Dynamic Shortest Path Computation
    7.5.1 Covering Nodes
    7.5.2 Static Highway-Node Routing
    7.5.3 Construction
    7.5.4 Query
    7.5.5 Analogies To and Differences From Related Techniques
    7.5.6 Dynamic Multi-Level Highway Node Routing
    7.5.7 Experiments
8 Minimum Spanning Trees
  8.1 Definition & Basic Remarks
    8.1.1 Two Important Properties
  8.2 Classic Algorithms
    8.2.1 Excursus: The Union-Find Data Structure
  8.3 QuickKruskal
  8.4 The I-Max-Filter Algorithm
  8.5 External MST
    8.5.1 Semiexternal Algorithm
    8.5.2 External Sweeping Algorithm
    8.5.3 Implementation & Experiments
  8.6 Connected Components
9 String Sorting
  9.1 Introduction
  9.2 Multikey Quicksort
  9.3 Radix Sort
10 Suffix Array Construction
  10.1 Introduction
  10.2 The DC3 Algorithm
  10.3 External Suffix Array Construction
11 Presenting Data from Experiments
  11.1 Introduction
  11.2 The Process
  11.3 Tables
  11.4 Two-dimensional Figures
  11.5 Grids and Ticks
  11.6 Three-dimensional Figures
  11.7 The Caption
  11.8 A Check List
12 Appendix
  12.1 Used Machine Models
  12.2 Amortized Analysis for Unbounded Arrays
  12.3 Analysis of Randomized Quicksort
  12.4 Insertion Sort
  12.5 Lemma on Interval Maxima
  12.6 Random Permutations without Additional I/Os
  12.7 Proof of Discarding Theorem for Suffix Array Construction
  12.8 Pseudocode for the Discarding Algorithm

Chapter 1: What is Algorithm Engineering?

1.1 Introduction

Algorithms (including data structures) are the heart of every computer application and thus of decisive importance for large areas of engineering, business, science, and daily life. Algorithmics is concerned with the systematic development of efficient algorithms and therefore plays a decisive role in the effective development of reliable and resource-efficient technology. We name only a few particularly spectacular examples here. Fast search of the enormous data volumes of the Internet (e.g., with Google) has changed the way we handle knowledge and information. This was made possible by full-text search algorithms that can fish all the hits out of terabytes of data in fractions of a second, and by ranking algorithms that process graphs with billions of nodes to filter relevant answers out of the flood of hits. Less visible, but similarly important, are algorithms for the efficient distribution of very frequently accessed data under massive load fluctuations or even overload attacks (distributed denial of service attacks). The market leader in this field, Akamai, was founded by algorithmicists. One of the most important scientific events of recent years was the publication of the human genome. A decisive factor in its early publication was the design of the sequencing process (whole genome shotgun sequencing) used by the company Celera, a design justified by algorithmic considerations. Algorithmics did not restrict itself here to processing the data produced by natural scientists, but exerted a formative influence on the entire process. The list of areas in which sophisticated algorithms play a key role could be continued almost indefinitely: computer graphics, image processing, geographic information systems, cryptography, and production, logistics, and traffic planning. So how does the transfer of algorithmic innovation into application areas actually work?

[Figure 1.1: Two views of algorithmics. Left: the traditional view. Right: AE = algorithmics as a cycle of design, analysis, implementation, and experimental evaluation of algorithms, driven by falsifiable hypotheses.]
Traditionally, algorithmics has used the methodology of algorithm theory, which stems from mathematics: algorithms are designed for simple and abstract problem and machine models, and the main results are provable performance guarantees for all possible inputs. In many cases this approach leads to elegant, timeless solutions that can be adapted to many applications. The hard performance guarantees reliably yield high efficiency, even for types of inputs that are unknown at implementation time. From the point of view of algorithm theory, taking up and implementing an algorithm is part of application development. By general observation, however, this kind of result transfer is a very slow process. As the demand for innovative algorithms grows, this opens growing gaps between theory and practice: real hardware, with parallelism, pipelining, memory hierarchies, and so on, moves ever further away from simple machine models, while applications become ever more complex. At the same time, algorithm theory develops ever more sophisticated algorithms which contain important ideas but are sometimes barely implementable. Moreover, real inputs often have little to do with the worst-case scenarios of theoretical analysis. In the extreme case, promising algorithmic approaches are neglected because a complete analysis would be mathematically too difficult.

Since the beginning of the 1990s, a broader view of algorithmics has therefore been gaining importance. It is called algorithm engineering (AE), and in it design, analysis, implementation, and experimental evaluation of algorithms stand side by side as equals. Its methodological apparatus, larger than that of algorithm theory, its inclusion of real software, and its closer relation to applications promise more realistic algorithms, the bridging of the gaps that have opened between theory and practice, and a faster transfer of algorithmic know-how into applications. Figure 1.1 shows this view of algorithmics as AE and its division into eight closely interacting activities. The goals and work program of the priority program follow naturally from this: deploy the full power of the AE methodology with the aim of bridging gaps between theory and practice.

1. Study of realistic models for machines and algorithmic problems.
2. Design of algorithms that are simple and also efficient in practice.
3. Analysis of practicable algorithms in order to establish performance guarantees that bring theory and practice closer together.
4. Careful implementations that shrink the gaps between the best theoretical algorithm and the best implemented algorithm.
5. Systematic, reproducible experiments that serve to refute or strengthen meaningful, falsifiable hypotheses.
Recommended publications
  • The Influence of Caches on the Performance of Sorting
    The Influence of Caches on the Performance of Sorting. Anthony LaMarca & Richard E. Ladner, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195. lamarca@parc.xerox.com [email protected]. Abstract: We investigate the effect that caches have on the performance of sorting algorithms both experimentally and analytically. To address the performance problems that high cache miss penalties introduce, we restructure heapsort, mergesort and quicksort in order to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison-based sorting algorithms. 1 Introduction: ... quicksort [12], and radix sort. Heapsort, mergesort, and quicksort are all comparison-based sorting algorithms, while radix sort is not. For each of the four sorting algorithms we choose an implementation variant with potential for good overall performance and then heavily optimize this variant using traditional techniques to minimize the number of instructions executed. These heavily optimized algorithms form the baseline for comparison. For each of the comparison sort baseline algorithms we develop and apply memory optimizations in order to improve cache performance and, hopefully, overall performance. For radix sort we optimize cache performance by varying the radix (see the sketch below). In the process we develop some simple analytic techniques which enable us to predict the memory performance of these algorithms in terms of cache misses.
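LaMarca and Ladner's radix-sort experiments hinge on one knob: the radix. A larger radix means fewer passes over the data, but a bigger counting table that competes with the keys for cache space. Below is a minimal sketch of an LSD radix sort with that knob exposed; the `RADIX_BITS` parameter and 32-bit keys are our illustrative choices, not the paper's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit keys. RADIX_BITS is the tuning knob: more bits
// per pass means fewer passes, but the count table (1 << RADIX_BITS
// entries) competes with the data for cache space.
template <unsigned RADIX_BITS = 8>
void radix_sort(std::vector<std::uint32_t>& a) {
    const std::uint32_t radix = 1u << RADIX_BITS;
    const std::uint32_t mask = radix - 1;
    std::vector<std::uint32_t> buf(a.size());
    for (unsigned shift = 0; shift < 32; shift += RADIX_BITS) {
        std::vector<std::size_t> count(radix + 1, 0);
        for (std::uint32_t x : a) ++count[((x >> shift) & mask) + 1];
        for (std::uint32_t d = 0; d < radix; ++d)
            count[d + 1] += count[d];              // prefix sums = bucket starts
        for (std::uint32_t x : a)
            buf[count[(x >> shift) & mask]++] = x; // stable scatter into buckets
        a.swap(buf);
    }
}
```

With 8 bits per pass the count table has 256 entries and stays cache-resident; pushing it to 16 bits halves the number of passes but grows the table to 65,536 entries, which is the kind of trade-off the paper measures.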
  • Chapter 11: External Sorting
    11 EXTERNAL SORTING. "Good order is the foundation of all things." —Edmund Burke. Sorting a collection of records on some (search) key is a very useful operation. The key can be a single attribute or an ordered list of attributes, of course. Sorting is required in a variety of situations, including the following important ones: users may want answers in some order, for example, by increasing age (Section 5.2); sorting records is the first step in bulk loading a tree index (Section 9.8.2); sorting is useful for eliminating duplicate copies in a collection of records (Chapter 12); and a widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step (Section 12.5.2). Although main memory sizes are increasing, as usage of database systems increases, increasingly larger datasets are becoming common as well. When the data to be sorted is too large to fit into available main memory, we need to use an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses. We introduce the idea of external sorting by considering a very simple algorithm in Section 11.1; using repeated passes over the data, even very large datasets can be sorted with a small amount of memory. This algorithm is generalized to develop a realistic external sorting algorithm in Section 11.2. Three important refinements are discussed. The first, discussed in Section 11.2.1, enables us to reduce the number of passes (the cost accounting is sketched below). The next two refinements, covered in Section 11.3, require us to consider a more detailed model of I/O costs than the number of page I/Os.
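The chapter's refinements are easiest to compare through the standard cost accounting. As a sketch, for a file of N pages and B buffer pages (notation as in the textbook), pass 0 produces ⌈N/B⌉ sorted runs and every later pass does a (B-1)-way merge:

```latex
\#\text{passes} \;=\; 1 + \left\lceil \log_{B-1}\left\lceil N/B \right\rceil \right\rceil,
\qquad
\text{total I/O} \;=\; 2N \cdot \#\text{passes}
```

since every pass reads and writes each page once; reducing the number of passes is therefore the first refinement's whole effect.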
  • Cache-Oblivious Algorithms EXTENDED ABSTRACT Matteo Frigo Charles E
    Cache-Oblivious Algorithms (Extended Abstract). Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139. Abstract: This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L, where Z = Ω(L²), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/L)(1 + log_Z n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/L + mnp/(L√Z)) cache faults. We introduce an "ideal-cache" model to analyze our algorithms [Figure 1: the ideal-cache model, with a CPU, an ideal cache of Z words organized in lines of length L under an optimal replacement strategy, and main memory]. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels. We shall assume that word size is constant; the particular constant does not affect our asymptotic analyses.
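The flavor of these algorithms is easiest to see on the transpose result quoted above: the recursion halves the larger dimension until a tile is small enough to fit in whichever cache level is in play, and no cache parameter appears in the code. A minimal out-of-place sketch (the base-case constant 16 is our arbitrary choice):

```cpp
#include <cstddef>

// Transpose the m-by-n block of A starting at (i0, j0) into B (row-major,
// leading dimensions lda/ldb). Recursing on the larger dimension keeps
// subproblems roughly square, so once a tile fits in some cache level all
// of its accesses hit that level -- with no tuning parameter in sight.
void transpose(const double* A, double* B, std::size_t lda, std::size_t ldb,
               std::size_t i0, std::size_t j0, std::size_t m, std::size_t n) {
    if (m + n <= 16) {                       // small base case
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                B[(j0 + j) * ldb + (i0 + i)] = A[(i0 + i) * lda + (j0 + j)];
    } else if (m >= n) {                     // split the rows
        transpose(A, B, lda, ldb, i0, j0, m / 2, n);
        transpose(A, B, lda, ldb, i0 + m / 2, j0, m - m / 2, n);
    } else {                                 // split the columns
        transpose(A, B, lda, ldb, i0, j0, m, n / 2);
        transpose(A, B, lda, ldb, i0, j0 + n / 2, m, n - n / 2);
    }
}
```

Once a tile's rows fit in some level of cache, all further work on it hits that level, which is where the Θ(1 + mn/L) bound comes from under the tall-cache assumption Z = Ω(L²).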
  • External Sorting Why Sort? Sorting a File in RAM 2-Way Sort of N Pages
    Why Sort? A classic problem in computer science! Data may be requested in sorted order, e.g., find students in increasing gpa order. Sorting is the first step in bulk loading of a B+ tree index. Sorting is useful for eliminating duplicate copies in a collection of records (why?). The sort-merge join algorithm involves sorting. Problem: sort 100 GB of data with 1 GB of RAM (why not virtual memory? the arithmetic is sketched below). Take a look at sortbenchmark.com, and at main-memory sort algorithms at www.sorting-algorithms.com. Sorting a file in RAM takes three steps: read the entire file from disk into RAM; sort the records using a standard sorting procedure, such as Shell sort, heapsort, bubble sort (hundreds of algorithms exist); write the file back to disk. How can we do the above when the data size is 100 or 1000 times that of the available RAM, while keeping I/O to a minimum? The ingredients: effective use of buffers, merging as a way of sorting, and overlapping processing and I/O (e.g., heapsort). 2-Way Sort of N pages, requiring a minimum of 3 buffers. Pass 0: read a page, sort it, write it; only one buffer page is used. How many I/Os does it take? Pass 1, 2, 3, etc.: a minimum of three buffer pages is needed (why?). How many I/Os are needed in each pass, and how many passes are needed? (Why?) [Diagram: two input buffers and one output buffer in main memory, between input disk and output disk.]
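The slide's headline problem, 100 GB of data with 1 GB of RAM, is worth working out under the merge-sort accounting. A sketch, with page size left abstract:

```latex
\left\lceil \tfrac{100\,\mathrm{GB}}{1\,\mathrm{GB}} \right\rceil = 100 \text{ initial sorted runs (pass 0)};
\quad
\text{one } 100\text{-way merge pass finishes, so total traffic} = 2 \text{ passes} \times 2N = 4N = 400\,\mathrm{GB}.
```

The 100 input buffers plus one output buffer fit comfortably in 1 GB for any realistic page size, which is why two passes suffice where naive two-way merging would need 1 + ⌈log₂ 100⌉ = 8.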
  • Engineering Cache-Oblivious Sorting Algorithms
    Engineering Cache-Oblivious Sorting Algorithms. Master's Thesis by Kristoffer Vinther, June 2003. Abstract: Modern computers are far more sophisticated than simple sequential programs can lead one to believe; instructions are not executed sequentially and in constant time. In particular, the memory of a modern computer is structured in a hierarchy of increasingly slower, cheaper, and larger storage. Accessing words in the lower, faster levels of this hierarchy can be done virtually immediately, but accessing the upper levels may cause delays of millions of processor cycles. Consequently, recent developments in algorithm design have focused on algorithms that seek to minimize accesses to the higher levels of the hierarchy. Much experimental work has shown that using these algorithms can lead to higher-performing programs. However, these algorithms are designed and implemented with a very specific level in mind, making it infeasible to adapt them to multiple levels or to use them efficiently on different architectures. To alleviate this, the notion of cache-oblivious algorithms was developed. The goal of a cache-oblivious algorithm is to be optimal in the use of the memory hierarchy, but without using specific knowledge of its structure. This automatically makes the algorithm efficient on all levels of the hierarchy and on all implementations of such hierarchies. The experimental work done with these types of algorithms remains sparse, however. In this thesis, we present a thorough theoretical and experimental analysis of known optimal cache-oblivious sorting algorithms. We develop our own simpler variants and present the first optimal sub-linear working space cache-oblivious sorting algorithm.
  • Efficient External Sorting on Flash Memory Embedded Devices
    International Journal of Database Management Systems (IJDMS), Vol. 5, No. 1, February 2013. EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES. Tyler Cossentine and Ramon Lawrence, Department of Computer Science, University of British Columbia Okanagan, Kelowna, BC, Canada. [email protected] [email protected]. ABSTRACT: Many embedded system applications involve storing and querying large datasets. Existing research in this area has focused on adapting and applying conventional database algorithms to embedded devices. Algorithms designed for processing queries on embedded devices must be able to execute given the small amount of available memory and energy constraints. Most embedded devices use flash memory to store large amounts of data. Flash memory has unique performance characteristics that can be exploited to improve algorithm performance. In this paper, we describe the Flash MinSort external sorting algorithm that uses an index, generated at runtime, to take advantage of fast random reads in flash memory (the core idea is sketched below). This algorithm adapts to the amount of memory available and performs best in applications where sort keys are clustered. Experimental results show that Flash MinSort is two to ten times faster than previous approaches for small memory sizes where external merge sort is not executable. KEYWORDS: sorting, sensor node, flash memory, query processing. 1. INTRODUCTION: Embedded systems are devices that perform a few simple functions. Most embedded systems, such as sensor networks, smart cards and certain hand-held devices, are computationally constrained. These devices typically have a low-power microprocessor, limited amount of memory, and flash-based data storage. In addition, some battery-powered devices, such as sensor networks, must be deployed for months at a time without being replaced.
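As a rough illustration of the idea described in the abstract (an index built at runtime plus cheap random reads), here is a minimal in-memory sketch: one minimum key per fixed-size region, where each round random-reads only the region owning the global minimum. Region size, integer keys, and the in-memory "flash" array are our simplifications; the real algorithm works on flash pages and adapts its index to available RAM.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Core MinSort idea: an in-memory index holds the minimum key of each
// fixed-size region; each round we (random-)read only the region that owns
// the current global minimum. Cheap on flash, ruinous on a spinning disk.
// Assumes all keys are < INT_MAX (used as the "region exhausted" sentinel).
std::vector<int> min_sort(const std::vector<int>& flash, std::size_t region_size) {
    if (flash.empty()) return {};
    std::size_t regions = (flash.size() + region_size - 1) / region_size;
    std::vector<int> min_key(regions);                         // the runtime index
    for (std::size_t r = 0; r < regions; ++r)
        min_key[r] = *std::min_element(
            flash.begin() + r * region_size,
            flash.begin() + std::min(flash.size(), (r + 1) * region_size));

    std::vector<int> out;
    while (true) {
        std::size_t r =
            std::min_element(min_key.begin(), min_key.end()) - min_key.begin();
        int cur = min_key[r];
        if (cur == INT_MAX) break;                             // all regions exhausted
        int next = INT_MAX;
        std::size_t lo = r * region_size;
        std::size_t hi = std::min(flash.size(), lo + region_size);
        for (std::size_t i = lo; i < hi; ++i) {                // one region scan
            if (flash[i] == cur) out.push_back(flash[i]);
            else if (flash[i] > cur && flash[i] < next) next = flash[i];
        }
        min_key[r] = next;                                     // region's next key
    }
    return out;
}
```

The clustering claim in the abstract is visible here: if equal or nearby keys sit in the same region, one region scan emits many output records before the index needs another global minimum.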
  • External Sorting
    External Sorting (R & G, Chapter 13). Brian Cooper, Yahoo! Research. A little bit about Y!: Yahoo! is the most visited website in the world (sorry, Google); 500 million unique visitors per month; 74 percent of U.S. users use Y! per month; 13 percent of U.S. users' online time is on Y!. Why sort? Users usually want data sorted; sorting is the first step in bulk-loading a B+ tree; sorting is useful for eliminating duplicates; the sort-merge join algorithm involves sorting. So? Don't we know how to sort? Quicksort, mergesort, heapsort, selection sort, insertion sort, radix sort, bubble sort, etc. Why don't these work for databases? Key problem in database sorting: 4 GB of RAM costs $300, and 480 GB of disk costs $300. How to sort data that does not fit in memory? [Example slides: merge sort illustrated on a list of fruit names, repeatedly merging sorted runs.] Isn't that good enough? Consider a file with N records: merge sort is O(N lg N) comparisons. We want to minimize disk I/Os, and don't want to go to disk O(N lg N) times! Key insight: sort based on pages, not records (a heap-based merge is sketched below). Read ...
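The page-based insight culminates in the k-way merge: one input buffer per sorted run, a min-heap over the current head of each run, and each page read once per pass. A sketch of the record-level heap logic, with in-memory runs standing in for the on-disk page streams:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merge of sorted runs via a min-heap over the current head of each
// run. In a real external sort each run is fed page-at-a-time from disk;
// here in-memory vectors stand in for those page streams.
std::vector<int> merge_runs(const std::vector<std::vector<int>>& runs) {
    using Head = std::pair<int, std::size_t>;   // (value, index of owning run)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
    std::vector<std::size_t> pos(runs.size(), 0);
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.push({runs[r][0], r});

    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, r] = heap.top();               // smallest remaining head
        heap.pop();
        out.push_back(v);
        if (++pos[r] < runs[r].size())          // refill from the same run
            heap.push({runs[r][pos[r]], r});
    }
    return out;
}
```

Each record costs O(lg k) heap work, but the disk sees only sequential page reads, one per page per pass, which is the whole point.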
  • Algorithms and Data Structures for External Memory Algorithms and Data Structures 2:4 for External Memory Jeffrey Scott Vitter
    Algorithms and Data Structures for External Memory. Jeffrey Scott Vitter, Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907-2107, USA. [email protected]. Boston – Delft: now Publishers Inc. Originally published as Foundations and Trends® in Theoretical Computer Science, Volume 2, Issue 4, ISSN 1551-305X. Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. Algorithms and Data Structures for External Memory surveys the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs (the central sorting bound is sketched below). A variety of EM paradigms are considered for solving batched and online problems efficiently in external memory. The book describes several useful paradigms for the design and implementation of efficient EM algorithms and data structures. The problem domains considered include sorting, permuting, FFT, scientific computing, computational geometry, graphs, databases, geographic information systems, and text and string processing. It is an invaluable reference for anybody interested in, or conducting research in, the design, analysis, and implementation of algorithms and data structures.
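The survey's recurring yardstick is the I/O complexity of sorting. As a sketch in the usual EM notation (N items, internal memory M, block size B):

```latex
\mathrm{Sort}(N) \;=\; \Theta\!\left( \frac{N}{B}\,\log_{M/B}\frac{N}{B} \right) \text{ I/Os}
```

Each pass touches N/B blocks, and an (M/B)-way merge or distribution step cuts the number of runs by a factor of M/B, which is where the logarithm base comes from.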
  • 14.1 Sorting
    14.1 Sorting. Input description: A set of n items. Problem description: Arrange the items in increasing (or decreasing) order. Discussion: Sorting is the most fundamental algorithmic problem in computer science. Learning the different sorting algorithms is like learning scales for a musician. Sorting is the first step in solving a host of other algorithm problems, as shown in Section 4.2 (page 107). Indeed, "when in doubt, sort" is one of the first rules of algorithm design. Sorting also illustrates all the standard paradigms of algorithm design. The result is that most programmers are familiar with many different sorting algorithms, which sows confusion as to which should be used for a given application. The following criteria can help you decide: How many keys will you be sorting? For small amounts of data (say n ≤ 100), it really doesn't matter much which of the quadratic-time algorithms you use. Insertion sort is faster, simpler, and less likely to be buggy than bubblesort. Shellsort is closely related to, but much faster than, insertion sort, but it involves looking up the right insert sequences in Knuth [Knu98]. When you have more than 100 items to sort, it is important to use an O(n lg n)-time algorithm like heapsort, quicksort, or mergesort (a hybrid is sketched below). There are various partisans who favor one of these algorithms over the others, but since it can be hard to tell which is fastest, it usually doesn't matter. Once you get past (say) 5,000,000 items, it is important to start thinking about external-memory sorting algorithms that minimize disk access.
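Skiena's two rules, quadratic methods below about a hundred keys and O(n lg n) methods above, are usually combined into one hybrid: quicksort down to small partitions, then a single insertion-sort pass over the nearly-sorted array. A minimal sketch (the cutoff of 32 is our illustrative choice):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

const int CUTOFF = 32;  // illustrative; tuned empirically in practice

// Quicksort that stops recursing once a partition is small, leaving the
// array "almost sorted": every element ends up within roughly CUTOFF
// positions of its final slot.
void quicksort_coarse(std::vector<int>& a, int lo, int hi) {
    while (hi - lo > CUTOFF) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {                        // Hoare-style partition
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        quicksort_coarse(a, lo, j);             // recurse on one side,
        lo = i;                                 // loop on the other
    }
}

// One final insertion-sort pass finishes the job; on an almost-sorted
// array it runs in O(n * CUTOFF) time.
void hybrid_sort(std::vector<int>& a) {
    if (a.empty()) return;
    quicksort_coarse(a, 0, static_cast<int>(a.size()) - 1);
    for (std::size_t k = 1; k < a.size(); ++k) {
        int x = a[k];
        std::size_t i = k;
        while (i > 0 && a[i - 1] > x) { a[i] = a[i - 1]; --i; }
        a[i] = x;
    }
}
```

The single trailing insertion-sort pass is the classic trick: it touches the array once, sequentially, instead of making one tiny insertion sort call per small partition.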
  • External Sorting and Query Evaluation (R&G Ch
    Carnegie Mellon Univ., Dept. of Computer Science, 15-415 Database Applications. Lecture 11: external sorting and query evaluation (R&G ch. 13 and 14). Why Sort? select ... order by (e.g., find students in increasing gpa order); bulk loading a B+ tree index; duplicate elimination (select distinct); select ... group by; the sort-merge join algorithm involves sorting. Outline: two-way merge sort; external merge sort; fine-tunings; B+ trees for sorting. 2-Way Sort: requires 3 buffers. Pass 0: read a page, sort it, write it; only one buffer page is used. Pass 1, 2, 3, etc.: merge pairs of runs into runs twice as long; three buffer pages are used (two input, one output). Each pass we read and write each page in the file. [Figure: two-way external merge sort of a 7-page input file; pass 0 produces seven 1-page runs, and passes 1–3 merge them into 2-page and 4-page runs and finally one sorted run; the pass count is sketched below.]
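The pass structure answers the slide's "how many passes?" question directly. As a sketch for a file of N pages:

```latex
\#\text{passes} = \lceil \log_2 N \rceil + 1,
\qquad
\text{total cost} = 2N\left(\lceil \log_2 N \rceil + 1\right) \text{ page I/Os}
```

So the lecture's 7-page example file takes ⌈log₂ 7⌉ + 1 = 4 passes (pass 0 plus three merge passes) and 2 · 7 · 4 = 56 page I/Os.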
  • 8 File Processing and External Sorting
    8 File Processing and External Sorting. In earlier chapters we discussed basic data structures and algorithms that operate on data stored in main memory. Sometimes the application at hand requires that large amounts of data be stored and processed, so much data that they cannot all fit into main memory. In that case, the data must reside on disk and be brought into main memory selectively for processing. You probably already realize that main memory access is much faster than access to data stored on disk or other storage devices. In fact, the relative difference in access times is so great that efficient disk-based programs require a different approach to algorithm design than most programmers are used to. As a result, many programmers do a poor job when it comes to file processing applications. This chapter presents the fundamental issues relating to the design of algorithms and data structures for disk-based applications. We begin with a description of the significant differences between primary memory and secondary storage. Section 8.2 discusses the physical aspects of disk drives. Section 8.3 presents basic methods for managing buffer pools (a minimal buffer pool is sketched below). Buffer pools will be used several times in the following chapters. Section 8.4 discusses the C++ model for random access to data stored on disk. Sections 8.5 to 8.8 discuss the basic principles of sorting collections of records too large to fit in main memory. 8.1 Primary versus Secondary Storage. Computer storage devices are typically classified into primary or main memory and secondary or peripheral storage.
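Since buffer pools recur throughout the following chapters, a minimal sketch of the LRU variant may help. The `read_page`/`write_page` stubs are hypothetical stand-ins for a real disk layer, not Shaffer's interface:

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

using Page = std::vector<char>;

// Hypothetical disk layer, stubbed so the sketch is self-contained.
Page read_page(int /*id*/) { return Page(4096, 0); }
void write_page(int /*id*/, const Page& /*p*/) {}

// Least-recently-used buffer pool: a hit moves the frame to the front of
// the list; a miss evicts the back (least recently used) frame, flushing
// it to disk first if it is dirty.
class BufferPool {
    struct Frame { int id; Page data; bool dirty; };
    std::size_t capacity_;
    std::list<Frame> frames_;                                     // MRU at front
    std::unordered_map<int, std::list<Frame>::iterator> index_;
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {}

    Page& fetch(int id, bool will_modify = false) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            frames_.splice(frames_.begin(), frames_, it->second); // hit: move to front
        } else {
            if (frames_.size() == capacity_) {                    // miss: evict LRU
                Frame& victim = frames_.back();
                if (victim.dirty) write_page(victim.id, victim.data);
                index_.erase(victim.id);
                frames_.pop_back();
            }
            frames_.push_front(Frame{id, read_page(id), false});
            index_[id] = frames_.begin();
        }
        frames_.front().dirty |= will_modify;
        return frames_.front().data;
    }
};
```

The dirty bit is what separates a buffer pool from a plain cache: clean victims can simply be dropped, while dirty ones must be written back before their frame is reused.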
  • Cache-Oblivious Algorithms by Harald Prokop
    Cache-Oblivious Algorithms, by Harald Prokop. Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology, June 1999. © Massachusetts Institute of Technology 1999. All rights reserved. Thesis Supervisor: Charles E. Leiserson, Professor of Computer Science and Engineering. Abstract: This thesis presents "cache-oblivious" algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size Z and cache-line length L, where Z = Ω(L²), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/L)(1 + log_Z n)).
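The transpose bound quoted in the abstract falls out of a short divide-and-conquer recurrence on cache misses; a sketch of the standard accounting (α is a suitable constant, and the tall-cache assumption Z = Ω(L²) makes the base case work):

```latex
Q(m,n) \;\le\;
\begin{cases}
O(1 + mn/L), & \max(m,n) \le \alpha\sqrt{Z} \quad (\text{the tile fits in cache}),\\
2\,Q(m/2,\,n) + O(1), & m \ge n,\\
2\,Q(m,\,n/2) + O(1), & \text{otherwise},
\end{cases}
\qquad\Rightarrow\qquad Q(m,n) = \Theta(1 + mn/L).
```

Intuitively, the recursion bottoms out at roughly square α√Z × α√Z tiles; the tall-cache assumption guarantees such a tile occupies only O(Z/L) cache lines, so reading it in costs O(1 + mn/L) misses amortized over its mn entries.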