External Sorting Why Sort? Sorting a File in RAM 2-Way Sort of N Pages
External Sorting
Chapter 13
Database Management Systems, R. Ramakrishnan and J. Gehrke

Why Sort?
A classic problem in computer science!
Data requested in sorted order
  – e.g., find students in increasing gpa order
Sorting is the first step in bulk loading of a B+ tree index.
Sorting is useful for eliminating duplicate copies in a collection of records (Why?)
Sort-merge join algorithm involves sorting.
Problem: sort 100 GB of data with 1 GB of RAM.
  – why not virtual memory?
Take a look at sortbenchmark.com
Take a look at main memory sort algos at www.sorting-algorithms.com

Sorting a file in RAM
Three steps:
  – Read the entire file from disk into RAM
  – Sort the records using a standard sorting procedure, such as Shell sort, heap sort, bubble sort, … (100's of algos)
  – Write the file back to disk
How can we do the above when the data size is 100 or 1000 times that of available RAM size? And keep I/O to a minimum!
  – Effective use of buffers
  – Merge as a way of sorting
  – Overlap processing and I/O (e.g., heapsort)

2-Way Sort of N pages
Requires Minimum of 3 Buffers
Pass 0: Read a page, sort it, write it.
  – only one buffer page is used
  – How many I/O's does it take?
Pass 1, 2, 3, …, etc.:
  – Minimum three buffer pages are needed! (Why?)
  – How many I/O's are needed in each pass? (Why?)
  – How many passes are needed? (Why?)
[Figure: Disk -> main memory buffers (INPUT 1, INPUT 2, OUTPUT) -> Disk]

Two-Way External Merge Sort
Each pass we read + write each page in file.
N pages in the file => the number of passes = $\lceil \log_2 N \rceil + 1$
So total cost is: $2N(\lceil \log_2 N \rceil + 1)$
Idea: Divide and conquer: sort subfiles and merge
Can we improve upon this? How?
[Figure: two-way external merge sort of a 7-page file]
  Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
  PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
  PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
  PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
  PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9

General External Merge Sort
Suppose we can have more than 3 buffers! Can we use them effectively and do better?
To sort a file with N pages using B buffer pages:
  – Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
  – Pass 1, 2, …, etc.: merge B-1 runs.
[Figure: Disk -> B main memory buffers (INPUT 1, INPUT 2, …, INPUT B-1, OUTPUT) -> Disk]

Cost of External Merge Sort
Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
Cost = 2N * (# of passes)
E.g., with 5 buffer pages, to sort 108 page file:
  – Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  – Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  – Pass 2: 2 sorted runs, 80 pages and 28 pages
  – Pass 3: Sorted file of 108 pages
Note that with 3 buffers, the initial runs can be 3 pages long (not 1)
  – Cost is: $2N(\lceil \log_2 \lceil N/3 \rceil \rceil + 1)$

Number of Passes of External Sort
  N               B=3   B=5   B=9   B=17   B=129   B=257
  100               7     4     3      2       1       1
  1,000            10     5     4      3       2       2
  10,000           13     7     5      4       2       2
  100,000          17     9     6      5       3       3
  1,000,000        20    10     7      5       3       3
  10,000,000       23    12     8      6       4       3
  100,000,000      26    14     9      7       4       4
  1,000,000,000    30    15    10      8       5       4

Internal (main memory) Sort Algorithm
Quicksort is a fast way to sort in memory.
  – A divide-and-conquer algorithm
  – Partition initial array into 2 (preferably equal size) with some property for each partition
  – Sort each partition recursively
  – In-place sort algorithm
  – Sorts a fixed-size input to generate a fixed-size output!
An alternative is “tournament sort” (aka “heapsort”)
  – You build a max- or min-heap (binary tree with some property)
      In a max-heap, a node's key >= its children's keys (in a min-heap, <=)
      Can be implemented using an array
  – Can HEAPIFY a HEAP to insert a new value!

Min-Heap
http://www.sorting-algorithms.com/random-initial-order
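The pass-count formula from the "Cost of External Merge Sort" slide can be checked with a few lines of Python. This is only a sketch: the function names `run_counts`, `num_passes` and `io_cost` are made up for illustration, and it assumes Pass 0 fills all B buffer pages and every later pass merges B-1 runs at a time, as stated on the "General External Merge Sort" slide.

```python
from typing import List


def run_counts(n_pages: int, n_buffers: int) -> List[int]:
    """Number of sorted runs left after each pass (index 0 = after Pass 0).

    Assumes Pass 0 produces ceil(N/B) runs and each merge pass combines
    B-1 runs into one, so B must be at least 3.
    """
    if n_pages < 1 or n_buffers < 3:
        raise ValueError("need N >= 1 pages and B >= 3 buffer pages")
    runs = -(-n_pages // n_buffers)        # ceil(N / B) without floats
    counts = [runs]
    while runs > 1:
        runs = -(-runs // (n_buffers - 1))  # ceil(runs / (B - 1))
        counts.append(runs)
    return counts


def num_passes(n_pages: int, n_buffers: int) -> int:
    """Total passes, i.e. 1 + ceil(log_{B-1}(ceil(N/B)))."""
    return len(run_counts(n_pages, n_buffers))


def io_cost(n_pages: int, n_buffers: int) -> int:
    """Every pass reads and writes each page once: 2N * (# of passes)."""
    return 2 * n_pages * num_passes(n_pages, n_buffers)
```

For the 108-page example with 5 buffers, `run_counts(108, 5)` gives `[22, 6, 2, 1]`, which is 4 passes and `io_cost(108, 5) == 864` page I/Os. Counting runs with integer arithmetic avoids rounding problems that a floating-point logarithm could introduce for large N.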
Internal (main memory) Sort Algorithm
Given B buffers, use 1 input, B-2 current set, and 1 output buffer
Use heapsort on the current set and output the smallest to output buffer
Insert new record into current set and output the smallest from current set which is greater than the largest in the output (for ascending sort)
Terminating condition
  – when all values in the current set are smaller than the largest value in the output, start a new run
Instead of a discrete sort, we are doing a continuous sort (snow plow example)

More on Heapsort
Fact: average length of a run in heapsort is 2B
  – The “snowplow” analogy
Worst-Case:
  – What is min length of a run?
  – How does this arise?
Best-Case:
  – What is max length of a run?
  – How does this arise?
Quicksort is faster, but our aim is to reduce the number of initial runs and hence reduce the number of passes!!
  – Hence heapsort is better for disk-based sorting!

I/O for External Merge Sort
… longer runs often means fewer passes!
We are assuming that I/O is done one page at a time
In fact, can read a block of pages sequentially!
  – Much faster/cheaper than reading pages of the block individually
Suggests we should make each buffer (input/output) be a block of pages.
  – But this will reduce fan-out during merge passes! Why?
  – In practice, most files still sorted in 2-3 passes.
Minimizes I/O cost, not the # of I/O's
Also, double buffering. What does this reduce?

I/O for External Merge Sort (2)
Blocked I/O
  – Suppose a block is b pages
  – We need b buffer pages for output (1 block)
  – We can only merge $\lfloor (B-b)/b \rfloor$ runs (instead of B-1 runs when we read 1 page at a time)
  – If we have 10 buffer pages, we can
      either merge nine runs without using blocks, or
      four runs if we assume 2-page blocks
  – This tradeoff between using blocks vs. the number of runs needs to be taken into account for external merge sort!
  – The good news is that with greater memory, both block sizes and #runs can be kept to a decent value

Number of Passes of Optimized Sort
  N               B=1,000   B=5,000   B=10,000
  100                   1         1          1
  1,000                 1         1          1
  10,000                2         2          1
  100,000               3         2          2
  1,000,000             3         2          2
  10,000,000            4         3          3
  100,000,000           5         3          3
  1,000,000,000         5         4          3
Block size = 32, initial pass produces runs of size 2B.

Blocked I/O
Let b be the units of read and write
Given B buffers, # of runs that can be merged is $\lfloor (B-b)/b \rfloor$
If we have 10 buffers, we can
  – Merge 9 runs at a time with 1-page buffers, or
  – Merge 4 runs at a time with 2-page input buffer blocks (one for each run) and a 2-page output buffer block
How does it reduce I/O cost?

Double Buffering
To reduce wait time for I/O request to complete, can prefetch into `shadow block'.
  – Tradeoff between buffers and passes; in practice, most files still sorted in 2-3 passes.
[Figure: Disk -> B main memory buffers, k-way merge (INPUT 1, INPUT 1', INPUT 2, INPUT 2', …, INPUT k, INPUT k', OUTPUT, OUTPUT'; block size b) -> Disk]
Does double buffering reduce the # of I/O's?

Sorting Records! (http://sortbenchmark.org/)
Since 2007, it has been run by the sort benchmark committee
Daytona (stock car): Sort code must be general purpose.
Indy (formula 1): Need only sort 100-byte records with 10-byte keys.
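Returning briefly to run generation in Pass 0: the heap-based scheme on the "Internal (main memory) Sort Algorithm" slide (1 input buffer, B-2 current set, 1 output buffer) can be sketched in Python as below. This is a simplified illustration, not the slides' own code. It counts the current set in records rather than pages, the function name `replacement_selection` and parameter `capacity` are made up, and it uses the standard `heapq` module. Tagging each record with a run number is one common way to implement "start a new run when everything left is smaller than the last output".

```python
import heapq
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking an exhausted input


def replacement_selection(records: Iterable[int], capacity: int) -> Iterator[List[int]]:
    """Yield sorted runs using a min-heap 'current set' of `capacity` records."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    it = iter(records)

    # Fill the current set. Heap entries are (run_number, key), so records
    # deferred to the next run always sort after those of the current run.
    heap = []
    for key in it:
        heap.append((0, key))
        if len(heap) == capacity:
            break
    heapq.heapify(heap)

    current_run, run = 0, []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != current_run:
            # Everything left belongs to the next run: close this one.
            yield run
            current_run, run = run_no, []
        run.append(key)  # "write" the smallest eligible record

        nxt = next(it, _END)
        if nxt is not _END:
            # A new record can join the current run only if it is not
            # smaller than the record just written.
            tag = current_run if nxt >= key else current_run + 1
            heapq.heappush(heap, (tag, nxt))
    if run:
        yield run
```

With the 13 records from the two-way merge sort figure and a current set of 3 records, `list(replacement_selection([3, 4, 6, 2, 9, 4, 8, 7, 5, 6, 3, 1, 2], 3))` yields three runs of lengths 4, 6 and 3. Already-sorted input comes out as a single run, while reverse-sorted input produces runs no longer than the current set, which is one way to think about the best-case and worst-case questions on the "More on Heapsort" slide.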
Gray
2016, 44.8 TB/min
  Tencent Sort
  100 TB in 134 Seconds
  512 nodes x (2 OpenPOWER 10-core POWER8 2.926 GHz, 512 GB memory, 4x Huawei ES3600P V3 1.2TB NVMe SSD, 100Gb Mellanox ConnectX4-EN)
  Jie Jiang, Lixiong Zheng, Junfeng Pu, Xiong Cheng, Chongqing Zhao
  Tencent Corporation
2016, 60.7 TB/min
  Tencent Sort
  100 TB in 98.8 Seconds
  512 nodes x (2 OpenPOWER 10-core POWER8 2.926 GHz, 512 GB memory, 4x Huawei ES3600P V3 1.2TB NVMe SSD, 100Gb Mellanox ConnectX4-EN)
  Jie Jiang, Lixiong Zheng, Junfeng Pu, Xiong Cheng, Chongqing Zhao
  Tencent Corporation

Using B+ Trees for Sorting
Mark R.