
Midterm Review
Spring 2003

Overview
• Sorting
• Hashing
• Selections
• Joins

Two-Way External Merge Sort
• Each pass we read + write each page in file.
• N pages in the file => the number of passes = ⌈log2 N⌉ + 1
• So total cost is: 2N(⌈log2 N⌉ + 1)
• Idea: Divide and conquer: sort subfiles and merge.

[Figure: two-way external merge sort of a 7-page file; "|" separates runs]
  Input file:           3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
  PASS 0, 1-page runs:  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
  PASS 1, 2-page runs:  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
  PASS 2, 4-page runs:  2,3 4,4 6,7 8,9 | 1,2 3,5 6
  PASS 3, 8-page runs:  1,2 2,3 3,4 4,5 6,6 7,8 9

General External Merge Sort
• More than 3 buffer pages. How can we utilize them?
• To sort a file with N pages using B buffer pages:
  – Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each.
  – Pass 1, 2, …, etc.: merge B-1 runs.

[Figure: B main memory buffers between two disks: INPUT 1, INPUT 2, …, INPUT B-1 and one OUTPUT buffer]

Cost of External Merge Sort
• Number of passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉
• Cost = 2N * (# of passes)
• E.g., with 5 buffer pages, to sort 108 page file:
  – Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  – Now, do four-way (B-1) merges
  – Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  – Pass 2: 2 sorted runs, 80 pages and 28 pages

Sorting warnings
• Be able to run the general external merge sort!
  – Careful use of buffers in pass 0 vs. pass i, i>0.
  – Draw pictures of runs like the "tree" in the slides for 2-way external merge sort (will look slightly different!)
• Be able to compute # of passes correctly for file of N blocks, B buffers!
  – Watch the number of buffers available in pass 0
  – tournament sort (heapsort) vs. quicksort
  – Be able to count I/Os carefully!
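The pass-count and cost formulas above can be checked with a few lines of Python. This is only a sketch for practice and is not part of the original slides: the function names (ceil_div, run_counts, num_passes, total_io, two_way_passes) are made up for illustration, and the code counts page I/Os only, exactly as the formulas do.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def run_counts(n_pages: int, n_buffers: int) -> list:
    """Number of sorted runs left after pass 0, pass 1, ... (general merge sort)."""
    if n_buffers < 3:
        raise ValueError("need at least 3 buffer pages to merge")
    runs = ceil_div(n_pages, n_buffers)       # pass 0 uses all B buffers
    counts = [runs]
    while runs > 1:
        runs = ceil_div(runs, n_buffers - 1)  # later passes merge B-1 runs at a time
        counts.append(runs)
    return counts


def num_passes(n_pages: int, n_buffers: int) -> int:
    """Equals 1 + ceil(log_{B-1}(ceil(N/B))), computed without logarithms."""
    return len(run_counts(n_pages, n_buffers))


def total_io(n_pages: int, n_buffers: int) -> int:
    """Each pass reads and writes every page: 2N * (# of passes)."""
    return 2 * n_pages * num_passes(n_pages, n_buffers)


def two_way_passes(n_pages: int) -> int:
    """Two-way sort: pass 0 makes 1-page runs, then runs are merged in pairs."""
    runs, passes = n_pages, 1
    while runs > 1:
        runs = ceil_div(runs, 2)
        passes += 1
    return passes


print(run_counts(108, 5))   # [22, 6, 2, 1]
print(num_passes(108, 5))   # 4
print(total_io(108, 5))     # 864
print(two_way_passes(7))    # 4
```

The loop form is used instead of math.log so that exact powers of B-1 do not suffer from floating-point rounding. Note that two_way_passes(7) is not the same as num_passes(7, 3): the two-way algorithm uses only one buffer page in pass 0, which is the "watch the number of buffers available in pass 0" warning.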
Cost of External Merge Sort (cont)
  – Pass 3: Sorted file of 108 pages

More tips
• How to sort any file using 3 memory pages
• How to sort in as few passes as possible given some amount of memory
• I have a file of N blocks and B buffers
  – How big can N be to sort in 2 phases?
  – B-1 >= N/B
  – So, N <= B(B-1), i.e. roughly B^2 (approximately, of course)

Query Processing Overview
• The query optimizer translates SQL to a special internal "language"
  – Query Plans
• The query executor is an interpreter for query plans
• Think of query plans as "box-and-arrow" dataflow diagrams
  – Each box implements a relational operator
  – Edges represent a flow of tuples (columns as specified)
  – For single-table queries, these diagrams are straight-line graphs

[Figure: the Optimizer turns SELECT DISTINCT name, gpa FROM Students into a straight-line plan, HeapScan → Sort → Distinct, with each edge labeled "name, gpa"]

Sort GROUP BY: Naïve Solution
[Figure: Sort feeding Aggregate]
• The Sort iterator (could be external sorting, as explained last week) naturally permutes its input so that all tuples are output in sequence
• The Aggregate iterator keeps running info ("transition values") on agg functions in the SELECT list, per group
  – E.g., for COUNT, it keeps count-so-far
  – For SUM, it keeps sum-so-far
  – For AVERAGE it keeps sum-so-far and count-so-far
• As soon as the Aggregate iterator sees a tuple from a new group:
  1. It produces an output for the old group based on the agg function
     E.g. for AVERAGE it returns (sum-so-far/count-so-far)
  2. It resets its running info.
  3. It updates the running info with the new tuple's info

An Alternative to Sorting: Hashing!
• Idea:
  – Many of the things we use sort for don't exploit the order of the sorted data
  – E.g.: forming groups in GROUP BY
  – E.g.: removing duplicates in DISTINCT
• Often good enough to match all tuples with equal field-values
• Hashing does this!
  – And may be cheaper than sorting! (Hmmm…!)
  – But how to do it for data sets bigger than memory??

Two Phases
[Figure: Partition phase. Original Relation on disk → INPUT buffer → hash function → OUTPUT buffers 1, 2, … → Partitions on disk]

General Idea
• Two phases:
  – Partition: use a hash function hp to split tuples into partitions on disk.
    • We know that all matches live in the same partition.
    • Partitions are "spilled" to disk via output buffers.
  – ReHash: for each partition on disk, read it into memory and build a main-memory hash table based on a hash function hr.
    • Then go through each bucket of this hash table to bring together matching tuples.

[Figure: Two Phases (cont). Partition: hash function hp, B main memory buffers, output partitions 1 … B-1 on disk. Rehash: each partition Ri (k <= B pages) is read from disk into a hash table built with hash function hr, using B main memory buffers, to produce the result.]

Hash GROUP BY: Naïve Solution (similar to the Sort GROUP BY)
[Figure: Hash feeding Aggregate]
• The Hash iterator permutes its input so that all tuples are output in sequence
• The Aggregate iterator keeps running info ("transition values") on agg functions in the SELECT list, per group
  – E.g., for COUNT, it keeps count-so-far
  – For SUM, it keeps sum-so-far
  – For AVERAGE it keeps sum-so-far and count-so-far
• When the Aggregate iterator sees a tuple from a new group:
  1. It produces an output for the old group based on the agg function
     E.g. for AVERAGE it returns (sum-so-far/count-so-far)
  2. It resets its running info.
  3. It updates the running info with the new tuple's info

We Can Do Better!
[Figure: a single HashAgg operator]
• Combine the summarization into the hashing process
  – During the ReHash phase, don't store tuples, store pairs of the form <GroupVals, TransVals>
  – When we want to insert a new tuple into the hash table
    • If we find a matching GroupVals, just update the TransVals appropriately
    • Else insert a new <GroupVals, TransVals> pair
• What's the benefit?
  – Q: How many pairs will we have to handle?
  – A: Number of distinct values of GroupVals columns
    • Not the number of tuples!!
  – Also probably "narrower" than the tuples
• Can we play the same trick during sorting?

Hashing for Grouped Aggregation: Analysis
• How big can a partition be?
• How big of a table can we process?
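The <GroupVals, TransVals> idea from "We Can Do Better!" can be made concrete with a small Python sketch of the ReHash phase for AVERAGE. It is an illustration under stated assumptions, not the course's implementation: the partition is assumed to fit in memory, a Python dict stands in for the hash table built with hr, and the names hash_group_average and finalize_average and the sample rows are made up.

```python
def hash_group_average(rows, group_col, value_col):
    """Build a table of <GroupVals, TransVals> pairs for AVERAGE.

    TransVals is [sum_so_far, count_so_far]; the input tuples themselves
    are never stored, only one pair per distinct group.
    """
    table = {}  # GroupVals -> TransVals (dict plays the role of the hash table)
    for row in rows:
        group_vals = row[group_col]
        trans_vals = table.get(group_vals)
        if trans_vals is None:
            # No matching GroupVals: insert a new pair.
            table[group_vals] = [row[value_col], 1]
        else:
            # Matching GroupVals: just update the TransVals.
            trans_vals[0] += row[value_col]
            trans_vals[1] += 1
    return table


def finalize_average(table):
    """Turn each TransVals into the final answer: sum-so-far / count-so-far."""
    return {group: total / count for group, (total, count) in table.items()}


# Hypothetical (dept, gpa) tuples: 5 tuples but only 2 groups.
rows = [("cs", 3.0), ("ee", 2.0), ("cs", 4.0), ("ee", 3.0), ("cs", 2.0)]
table = hash_group_average(rows, group_col=0, value_col=1)

print(len(table))               # 2
print(finalize_average(table))  # {'cs': 3.0, 'ee': 2.5}
```

The table holds two entries for five input tuples, which is the point of the analysis questions: memory use during rehash grows with the number of distinct groups, not with the number of tuples.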
• A partition can be as big as can fit into the hash table during rehash
  – For grouped aggs, we have one entry per group!
  – So the key is the number of unique groups!
  – A partition's size is limited only by the number of unique groups in the partition
• Similar analysis holds for duplicate elimination
  – Note: can think of dup-elim as a grouped agg
  – All tuples that contribute to the agg are identical
  – So any tuple of a "group" is a "representative"
• Size of table we can process:
  – B-1 "spill partitions" in Phase 1
  – Each is limited by the number of unique tuples per partition that can be accommodated in the hash table (U_H)
• Have a bigger table? Recursive partitioning!
  – In the ReHash phase, if a partition b has more unique tuples than U_H, then recurse:
    • pretend that b is a table we need to hash, run the Partitioning phase on b, and then the ReHash phase on each of its (sub)partitions

Even Better: Hybrid Hashing
• What if the set of <GroupVals, TransVals> pairs fits in memory?
  – It would be a waste to spill it to disk and read it all back!
  – Recall this could be true even if there are tons of tuples!
• Idea: keep a smaller 1st partition in memory during phase 1!
  – Output its stuff at the end of Phase 1.
  – Q: how do we choose the number k?

[Figure: hybrid hashing in Phase 1. Labels: Original Relation (disk), INPUT buffer, hash functions hh and hr, k-buffer hash table, OUTPUT buffers, partitions numbered up to B-k, B main memory buffers, Disk]

Analysis: Hybrid Hashing, GroupAgg
• H buffers in all:
  – In Phase 1: P "spill partitions", H-P buffers for hash table
  – Subsequent phases: H-1 buffers for hash table
• How big of a table can we process?
  – Each of the P partitions is limited by the number of unique tuples per partition that can be accommodated in the hash table (U_H)
  – Note that U_H depends on the phase!
    • In Phase 1, U_H is based on H-P buffers
    • In subsequent phases, U_H is based on H-1 buffers

Using an Index for Selections
• Cost depends on #qualifying tuples, and clustering.

Simple Selections (cont)
• With no index, unsorted:
  – Must essentially scan the whole relation
  – Cost is M (#pages in R). For "reserves" = 1000 I/Os.
• With no index, sorted:
  – Cost of binary search + number of pages containing results.
  – For "reserves" = 10 I/Os + ⌈selectivity * #pages⌉
• With an index on selection attribute:
  – Use index to find qualifying data entries,
  – then retrieve corresponding data records.
  – Cost?
• Using an index, the cost is:
  – finding qualifying data entries (typically small)
  – plus cost of retrieving records (could be large w/o clustering).
• In example "reserves" relation, if 10% of tuples qualify (100 pages, 10000 tuples):
  – With a clustered index, cost is little more than 100 I/Os;
  – If unclustered, could be up to 10000 I/Os!
  – Unless you get fancy…

Projection (DupElim)
  SELECT DISTINCT R.sid, R.bid
  FROM Reserves R
• Issue is removing duplicates.
• Basic approach is to use sorting
  – 1. Scan R, extract only the needed attrs (why do this 1st?)
  – 2. Sort the resulting set
  – 3. …

Simple Nested Loops Join
  foreach tuple r in R do
      foreach tuple s in S do
          if ri == sj then add <r, s> to result
• For each tuple in the outer relation R, we scan the …
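The simple nested loops pseudocode above translates almost line for line into Python. This is a minimal in-memory sketch, not taken from the slides: relations are assumed to be lists of tuples, i and j are the positions of the join columns in r and s, and the function name and sample data (including the second relation) are hypothetical.

```python
def simple_nested_loops_join(R, S, i, j):
    """Join R and S on r[i] == s[j] by scanning all of S once per tuple of R."""
    result = []
    for r in R:                  # outer relation
        for s in S:              # inner relation, rescanned for every r
            if r[i] == s[j]:
                result.append((r, s))
    return result


# Hypothetical sample data: reserves(sid, bid) and a second relation keyed by sid.
reserves = [(22, 101), (58, 103)]
sailors = [(22, "dustin"), (31, "lubber"), (58, "rusty")]

print(simple_nested_loops_join(reserves, sailors, 0, 0))
# [((22, 101), (22, 'dustin')), ((58, 103), (58, 'rusty'))]
```

The inner list is traversed len(R) times, so the comparison count is len(R) * len(S). The sketch ignores pages and I/O entirely and only mirrors the tuple-at-a-time control flow of the pseudocode.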