Strategies for Stable Merge Sorting
Sam Buss∗   Alexander Knop∗

∗Department of Mathematics, University of California, San Diego, La Jolla, CA, USA

Copyright © 2019 by SIAM. Unauthorized reproduction of this article is prohibited.

Abstract

We introduce new stable natural merge sort algorithms, called 2-merge sort and α-merge sort. We prove upper and lower bounds for several merge sort algorithms, including Timsort, Shivers' sort, α-stack sorts, and our new 2-merge and α-merge sorts. The upper and lower bounds have the forms c · n log m and c · n log n for inputs of length n comprising m runs. For Timsort, we prove a lower bound of (1.5 − o(1)) n log n. For 2-merge sort, we prove optimal upper and lower bounds of approximately (1.089 ± o(1)) n log m. We state similar asymptotically matching upper and lower bounds for α-merge sort, when φ < α < 2, where φ is the golden ratio.

Our bounds are in terms of merge cost; this upper bounds the number of comparisons and accurately models runtime. The merge strategies can be used for any stable merge sort, not just natural merge sorts. The new 2-merge and α-merge sorts have better worst-case merge cost upper bounds and are slightly simpler to implement than the widely-used Timsort; they also perform better in experiments.

1 Introduction

This paper studies stable merge sort algorithms, especially natural merge sorts. We will propose new strategies for the order in which merges are performed, and prove upper and lower bounds on the cost of several merge strategies. The first merge sort algorithm was proposed by von Neumann [16, p. 159]: it works by splitting the input list into sorted sublists, initially possibly lists of length one, and then iteratively merging pairs of sorted lists, until the entire input is sorted. A sorting algorithm is stable if it preserves the relative order of elements which are not distinguished by the sort order.

There are several methods of splitting the input into sorted sublists before starting the merging; a merge sort is called natural if it finds the sorted sublists by detecting consecutive runs of entries in the input which are already in sorted order. Natural merge sorts were first proposed by Knuth [16, p. 160].

Like most sorting algorithms, the merge sort is comparison-based in that it works by comparing the relative order of pairs of entries in the input list. Information-theoretic considerations imply that any comparison-based sorting algorithm must make at least log2(n!) ≈ n log2 n comparisons in the worst case. However, in many practical applications, the input is frequently already partially sorted. There are many adaptive sort algorithms which will detect this and run faster on inputs which are already partially sorted. Natural merge sorts are adaptive in this sense: they detect sorted sublists (called "runs") in the input, and thereby reduce the cost of merging sublists. One very popular stable natural merge sort is the eponymous Timsort of Tim Peters [25]. Timsort is extensively used, as it is included in Python, in the Java standard library, in GNU Octave, and in the Android operating system. Timsort has worst-case runtime O(n log n), but is designed to run substantially faster on inputs which are partially pre-sorted, by using intelligent strategies to determine the order in which merges are performed.

There is extensive literature on adaptive sorts: e.g., for theoretical foundations see [21, 9, 26, 23] and for more applied investigations see [6, 12, 5, 31, 27]. The present paper will consider only stable, natural merge sorts. As exemplified by the wide deployment of Timsort, these are certainly an important class of adaptive sorts. We will consider the Timsort algorithm [25, 7], and related sorts due to Shivers [28] and Auger-Nicaud-Pivoteau [2]. We will also introduce new algorithms, the "2-merge sort" and the "α-merge sort" for φ < α < 2, where φ is the golden ratio.

The main contribution of the present paper is the definition of new stable merge sort algorithms, called 2-merge and α-merge sort. These have better worst-case run times than Timsort, are slightly easier to implement than Timsort, and perform better in our experiments.

We focus on natural merge sorts, since they are so widely used. However, our central contribution is analyzing merge strategies, and our results are applicable to any stable sorting algorithm that generates and merges runs, including patience sorting [5], melsort [29, 20], and split sorting [19].

All the merge sorts we consider will use the following framework. (See Algorithm 1.) The input is a list of n elements, which without loss of generality may be assumed to be integers. The first logical stage of the algorithm (following [25]) identifies maximal length subsequences of consecutive entries which are in sorted order, either ascending or descending. The descending subsequences are reversed, and this partitions the input into "runs" R1, …, Rm of entries sorted in non-decreasing order. The number of runs is m; the number of elements in a run R is |R|. Thus Σi |Ri| = n. It is easy to see that these runs may be formed in linear time and with a linear number of comparisons.

Merge sort algorithms process the runs in left-to-right order starting with R1. This permits runs to be identified on-the-fly, only when needed. This means there is no need to allocate Θ(m) additional memory to store the runs. This also may help reduce cache misses. On the other hand, it means that the value m is not known until the final run is formed; thus, the natural sort algorithms do not use m except as a stopping condition.

The runs Ri are called original runs. The second logical stage of the natural merge sort algorithms repeatedly merges runs in pairs to give longer and longer runs. (As already alluded to, the first and second logical stages are interleaved in practice.) Two runs A and B can be merged in linear time; indeed with only |A| + |B| − 1 many comparisons and |A| + |B| many movements of elements. The merge sort stops when all original runs have been identified and merged into a single run.

Our mathematical model for the run time of a merge sort algorithm is the sum, over all merges of pairs of runs A and B, of |A| + |B|. We call this quantity the merge cost. In most situations, the run time of a natural sort algorithm can be linearly bounded in terms of its merge cost. Our main theorems are lower and upper bounds on the merge cost of several stable natural merge sorts. Note that if the runs are merged in a balanced fashion, using a binary tree of height ⌈log m⌉, then the total merge cost is ≤ n⌈log m⌉. (We use log to denote logarithms base 2.) Using a balanced binary tree of merges gives a good worst-case merge cost, but it does not take into account savings that are available when runs have different lengths. The goal is to find adaptive stable natural merge sorts which can effectively take advantage of different run lengths to reduce the merge cost, but which are guaranteed to never be much worse than the merge cost of the balanced binary tree.

Algorithm 1 shows the framework for all the merge sort algorithms we discuss. This is similar to what Auger et al. [2] call the "generic" algorithm. The input S is a sequence of integers which is partitioned into monotone runs of consecutive members. The decreasing runs are inverted, so S is expressed as a list R of increasing original runs R1, …, Rm called "original runs". The algorithm maintains a stack Q of runs Q1, …, Qℓ, which have been formed from R1, …, Rk. Each time through the loop, it either pushes the next original run, Rk+1, onto the stack Q, or it chooses a pair of adjacent runs Qi and Qi+1 on the stack and merges them. The resulting run replaces Qi and Qi+1 and becomes the new Qi, and the length of the stack decreases by one. The entries of the runs Qi are stored in-place, overwriting the elements of the array which held the input S. Therefore, the stack needs to hold only the positions pi (for i = 0, …, ℓ + 1) in the input array where the runs Qi start, and thereby implicitly the lengths |Qi| of the runs. We have p1 = 0, pointing to the beginning of the input array, and for each i, we have |Qi| = pi+1 − pi. The unprocessed part of S in the input array starts at position pℓ+1 = pℓ + |Qℓ|. If pℓ+1 < n, then it will be the starting position of the next original run pushed onto Q.

Algorithm 1 is called (k1, k2)-aware since its choice of what to do is based on just the lengths of the runs in the top k1 members of the stack Q, and since merges are only applied to runs in the top k2 members of Q. ([2] used the terminology "degree" instead of "aware".) In all our applications, k1 and k2 are small numbers, so it is appropriate to store the runs in a stack. Usually k1 = k2, and we write "k-aware" instead of "(k, k)-aware". Table 1 summarizes the awareness levels of the algorithms considered in this paper.

    Algorithm                    Awareness
    Timsort (original) [25]      3-aware
    Timsort (corrected) [7]      (4,3)-aware
    α-stack sort [2]             2-aware
    Shivers sort [28]            2-aware
    2-merge sort                 3-aware
    α-merge sort (φ < α < 2)     3-aware

Table 1: Awareness levels for merge sort algorithms
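The first logical stage, partitioning the input into maximal monotone runs and reversing the descending ones, can be sketched as follows. This is an illustrative sketch, not the paper's code; following Timsort's convention, only strictly decreasing sequences are treated as descending runs, so that reversing them cannot reorder equal elements and stability is preserved.

```python
def decompose_into_runs(s):
    """Split s into maximal monotone runs; reverse the descending
    runs so that every returned run is non-decreasing."""
    runs = []
    i, n = 0, len(s)
    while i < n:
        j = i + 1
        if j < n and s[j] < s[i]:
            # Strictly descending run: extend it, then reverse it.
            while j < n and s[j] < s[j - 1]:
                j += 1
            runs.append(s[i:j][::-1])
        else:
            # Non-decreasing run: extend while entries do not decrease.
            while j < n and s[j] >= s[j - 1]:
                j += 1
            runs.append(s[i:j])
        i = j
    return runs
```

Each element is inspected a constant number of times, so the decomposition uses linear time and a linear number of comparisons, as stated above.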
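The claim that two runs A and B can be merged with at most |A| + |B| − 1 comparisons can be seen from the standard two-pointer merge, sketched here (names are my own). One comparison is made per appended element while both runs are non-empty, and the loop ends as soon as one run is exhausted; breaking ties in favor of A keeps the merge stable.

```python
def merge(a, b):
    """Stably merge two non-decreasing runs a and b.
    At most len(a) + len(b) - 1 comparisons are performed."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:               # one comparison per appended element
            out.append(b[j])
            j += 1
        else:                         # ties go to a, preserving input order
            out.append(a[i])
            i += 1
    out.extend(a[i:])                 # the leftover tail needs
    out.extend(b[j:])                 # no further comparisons
    return out
```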
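The generic stack-based framework of Algorithm 1 can be sketched as below. This is a simplified model, not the paper's in-place version: runs are stored as lists rather than as start positions pi, and the merge strategy is a callback that, as in the (k1, k2)-aware definition, sees only the run lengths on the stack. The `top_heavy` strategy shown is a toy placeholder in the spirit of the 2-aware α-stack sorts, not the paper's 2-merge or α-merge rule.

```python
import heapq

def generic_merge_sort(runs, should_merge):
    """Process original runs R1..Rm left to right, maintaining a stack Q.
    After each push, `should_merge` inspects the list of run lengths on
    the stack (top of stack last) and decides whether to merge the top
    two runs.  Returns the sorted output together with the merge cost,
    i.e. the sum of |A| + |B| over all merges performed."""
    stack, cost = [], 0
    for r in runs:
        stack.append(list(r))
        while len(stack) >= 2 and should_merge([len(q) for q in stack]):
            b, a = stack.pop(), stack.pop()
            cost += len(a) + len(b)
            stack.append(list(heapq.merge(a, b)))   # stable linear merge
    while len(stack) >= 2:                          # final collapse of Q
        b, a = stack.pop(), stack.pop()
        cost += len(a) + len(b)
        stack.append(list(heapq.merge(a, b)))
    return (stack[0] if stack else []), cost

# Toy 2-aware strategy: merge while the top run is at least as long
# as the run below it (a placeholder, not the paper's merge rules).
def top_heavy(lengths):
    return lengths[-1] >= lengths[-2]
```

Different merge strategies plug into `should_merge` without changing the framework, which is exactly why the paper's bounds, stated purely in terms of merge cost, apply to any algorithm of this shape.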
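The balanced binary tree bound mentioned above can be checked concretely: merging level by level, every element participates in at most ⌈log m⌉ merges, so the total merge cost is at most n⌈log m⌉. A small sketch (my own code, not the paper's) that computes this cost from the run lengths:

```python
import math

def balanced_merge_cost(run_lengths):
    """Merge cost of merging runs pairwise in a balanced binary tree:
    each level merges adjacent pairs at cost |A| + |B| per merge."""
    lens, cost = list(run_lengths), 0
    while len(lens) > 1:
        nxt = []
        for i in range(0, len(lens) - 1, 2):
            cost += lens[i] + lens[i + 1]
            nxt.append(lens[i] + lens[i + 1])
        if len(lens) % 2:          # odd run out, promoted to the next level
            nxt.append(lens[-1])
        lens = nxt
    return cost

# With m equal runs of length n/m the bound n * ceil(log2 m) is met
# exactly; with unequal runs the cost can only be smaller.
n, m = 16, 4
assert balanced_merge_cost([n // m] * m) == n * math.ceil(math.log2(m))
```

The point of the adaptive strategies studied in the paper is to beat this cost when run lengths are uneven, while never exceeding it by more than a constant factor.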