Galloping in Natural Merge Sorts

Galloping in natural merge sorts Vincent Jugé Université Gustave Eiffel, LIGM (UMR 8049), CNRS, ENPC, ESIEE Paris, UPEM Ghazal Khalighinejad Sharif University of Technology, Tehran, Iran Abstract We study the algorithm TimSort and the sub-routine it uses to merge monotonic (non-decreasing) sub-arrays, hereafter called runs. More precisely, we look at the impact on the number of element comparisons performed of using this sub-routine instead of a naive routine. In this article, we introduce a new object for measuring the complexity of arrays. This notion dual to the notion of runs on which TimSort built its success so far, hence we call it dual runs. It induces complexity measures that are dual to those induced by runs. We prove, for this new complexity measure, results that are similar to those already known when considering standard run-induced measures. Although our new results do not lead to any improvement on the number of element moves performed, they may lead to dramatic improvements on the number of element comparisons performed by the algorithm. In order to do so, we introduce new notions of fast- and middle-growth for natural merge sorts, which allow deriving the same upper bounds. After using these notions successfully on TimSort, we prove that they can be applied to a wealth of variants of TimSort and other natural merge sorts. 2012 ACM Subject Classification Theory of computation → Sorting and searching Keywords and phrases Sorting algorithms, Merge sorting algorithms, TimSort, ShiversSort, Analysis of algorithms 1 Introduction In 2002, Tim Peters, a software engineer, created a new sorting algorithm, which was called TimSort [10]. This algorithm immediately demonstrated its efficiency for sorting actual data, and was adopted as the standard sorting algorithm in core libraries of wide-spread programming languages such as Python and Java. Hence, the prominence of such a custom- made algorithm over previously preferred optimal algorithms contributed to the regain of interest in the study of sorting algorithms. Among the best-identified reasons behind the success of TimSort are the fact that this arXiv:2012.03996v1 [cs.DS] 7 Dec 2020 algorithm is well adapted to the architecture of computers (e.g., for dealing with cache issues) and to realistic distributions of data. In particular, the very conception of TimSort makes it particularly well-suited to sorting data whose run decompositions [3, 5] (see Figure 1) are simple. Such decompositions were already used by Knuth NaturalMergeSort [7, Section 5.2.4], which predated TimSort, and adapted the traditional MergeSort algorithm as follows: NaturalMergeSort is based on splitting arrays into monotonic subsequences, also called runs, and on merging these runs together. Thus, all algorithms sharing this feature of NaturalMergeSort are also called natural merge sorts. In addition to being a natural merge sort, TimSort also includes many optimisations, which were carefully engineered, through extensive testing, to offer the best complexity performances. As a result, the general structure of TimSort can be split into three main components: (i) a variant of an insertion sort, which is used to deal with small runs (e.g., 2 Galloping in natural merge sorts S =( 12, 7, 6, 5, 5, 7, 14, 36, 3, 3, 5, 21, 21, 20, 8, 5, 1 ) |first{z run} |second{z run} | third{z run } |fourth{z run} Figure 1 A sequence and its run decomposition computed by a greedy algorithm: for each run, the first two elements determine if it is non-decreasing or decreasing, then it continues with the maximum number of consecutive elements that preserves the monotonicity. runs of length less than 32), (ii) a simple policy for choosing which large runs to merge, (iii) a sub-routine for merging these runs. The second component has been subject to an intense scrutiny these last few years, thereby giving birth to a great variety of TimSort-like algorithms. On the contrary, the first and third components, which seem more complicated and whose effect may be harder to quantify, have often been used as black boxes when studying TimSort or designing variants thereof. Context and related work The success of TimSort has nurtured the interest in the quest for sorting algorithms that would be adapted to arrays with few runs. However, the ad hoc conception of TimSort made its complexity analysis less easy than what one might have hoped, and it is only in 2015, a decade after TimSort had been largely deployed, that Auger et al. proved that TimSort required O(n log(n)) comparisons for sorting arrays of length n [2]. This is optimal in the model of sorting by comparisons, if the input array can be an arbitrary array of length n. However, taking into account the run decompositions of the input array allows using finer-grained complexity classes, as follows. First, one may consider only arrays whose run decomposition consists of ρ monotonic runs. On such arrays, the best worst-case time complexity one may hope for is O(n log(ρ)) [8]. Second, we may consider even more restricted classes of input arrays, and focus only on those arrays that consist of ρ runs of lengths r1,...,rρ. In that case, the best worst-case time complexity is O(n + nH), ρ where H is defined as H = H(r1/n,...,rρ/n) and H(x1,...,xρ)= − Pi=1 xi log2(xi) is the general entropy function [3, 6]. TimSort enjoys such a O(nH) time complexity [1]. In fact, since TimSort was invented, several natural merge sorts have been proposed, all of which were meant to offer easy- to-prove complexity guarantees. Such algorithms include ShiversSort [11], which runs in time O(n log(n)); α-StackSort [2], which, like NaturalMergeSort, runs in time O(n log(ρ)); α-MergeSort [4], PeekSort and PowerSort [9], and the most recent adaptive ShiversSort [6], which, like TimSort, run in time O(n + nH). Except TimSort, these algorithms are, in fact, described only as policies for merging runs, the actual sub-routine used for merging runs being left implicit. In practice, choosing a naive merging sub-routine does not harm the worst-case time complexities considered above. As a consequence, all authors identified the cost of merging two runs of lengths m and n with the sum m + n, and the complexity of the algorithm with the sum of the costs of the merges processed. One notable exception is that of [9], whose authors compare the running times of TimSort and of TimSort’s variant where the merging sub-routine is replaced by a naive routine. While the arrays on which the performance comparisons are performed are chosen to have a low entropy H, the authors did not try to identify which arrays could benefit the most from TimSort’s sub-routine. As a result, they unsurprisingly observed that TimSort’s complex sub- routine seemed less efficient than the naive one, but their work suggests another approach: V. Jugé and G. Khalighinejad 3 finding distributions on arrays for which TimSort’s merging sub-routine will actually be helpful. Contributions We study the time complexity of TimSort and its variants in a context where we refine the family of arrays we consider. This refinement is based on notion of complexity about the input arrays that is dual to the decomposition of arrays into monotonic runs. Consider an array A whose values A[1], A[2],...,A[n] are pairwise distinct integers between 1 and n: we identify A with a permutation of the set {1, 2,...,n}. In the standard literature, we subdivide A into distinct increasing runs, i.e., we partition the set {1, 2,...,n} into intervals R1, R2,...,Rρ such that the function x 7→ A[x] is increasing on each interval Ri. A dual approach would consist in partitioning that set into intervals S1,S2,...,Sσ, which we call dual runs below, such that the function x 7→ A−1[x] is increasing on each interval ∗ Sj . It would then be feasible to sort the array A in time O(n log(σ)), or even O(n + nH ), ∗ where our intervals have lengths s1,s2,...,sσ and H = H(s1/n,...,sσ/n). In this preliminary version, we prove that, thanks to its merging sub-routine, TimSort requires O(n log σ) comparisons. In a subsequent version of this paper, we will further prove that TimSort requires only O(nH∗) comparisons. 2 TimSort and its sub-routine for merging runs In this section, we describe the sorting algorithm TimSort and the components it consists of. As mentioned in Section 1, TimSort is based on decomposing its input array into monotonic (non-decreasing) runs, which are then merged together into larger monotonic runs. Following the literature on this subject, we first focus on the policy used by TimSort to decide which runs to merge. In order to do so, each monotonic run is identified to a pair of pointers to its first and last entries in the array. TimSort discovers its runs on-the-fly, from left to right, makes them non-decreasing, and inserts them into a stack, which is used to decide which runs should be merged. These actions are performed as described in Algorithm 1. This description of TimSort relies on two black-box sub-routines for (i) finding a new run R and making it non-decreasing, and (ii) merging two runs. The first sub-routine works as follows. New runs in the input array are found by using a naive (yet optimal) algorithm, where r comparisons are used to detect a new run R of length r. If r is smaller than a given threshold, the run R is made artificially longer by absorbing elements to its right.

Galloping in Natural Merge Sorts

Sort Algorithms 15-110 - Friday 2/28 Learning Objectives

Overview Parallel Merge Sort

Data Structures & Algorithms

Selected Sorting Algorithms

Chapter 19 Searching, Sorting and Big —Solutions

Lecture 8.Key

Sorting Algorithm 1 Sorting Algorithm

Sorting Algorithm 1 Sorting Algorithm

From Merge Sort to Timsort Nicolas Auger, Cyril Nicaud, Carine Pivoteau

Comparison of Parallel Sorting Algorithms

Mergesort and Quicksort

Comparison Sorts Name Best Average Worst Memory Stable Method Other Notes Quicksort Is Usually Done in Place with O(Log N) Stack Space