https://courses.missouristate.edu/anthonyclark/325

Outline

Topics and Learning Objectives
• Cover Timsort to compare a real-world algorithm with what we have discussed in class

Assessments
• None

Extra Resources

Python Implementation
• https://svn.python.org/projects/python/trunk/Objects/listsort.txt
• https://github.com/python/cpython/blob/master/Objects/listobject.c

Java Implementation
• https://hg.openjdk.java.net/jdk/jdk/file/74094a60d018/src/java.base/share/classes/java/util/TimSort.java

• https://www.youtube.com/watch?v=emeME__917E
• https://www.bigocheatsheet.com/

Tim Peters

In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs "intelligently". Everything else is complication for speed, and some hard-won measure of memory efficiency.
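The main loop described above can be sketched as follows. This is a simplified illustration, not the CPython or Java code: the names `find_run`, `merge`, `merge_top`, and `simple_run_merge_sort` are hypothetical, the merge policy is a stand-in assumption (merge whenever the newest run is at least as long as the one below it), and minimum run lengths and galloping are left out.

```python
def find_run(a, lo):
    """Return the exclusive end index of the run starting at a[lo].

    A strictly descending run is reversed in place, so every run ends up
    non-decreasing. Requiring strictness keeps the reversal stable.
    """
    hi = lo + 1
    if hi == len(a):
        return hi
    if a[hi] < a[lo]:                        # strictly descending run
        while hi < len(a) and a[hi] < a[hi - 1]:
            hi += 1
        a[lo:hi] = a[lo:hi][::-1]
    else:                                    # non-decreasing run
        while hi < len(a) and not (a[hi] < a[hi - 1]):
            hi += 1
    return hi


def merge(a, lo, mid, hi):
    """Stable merge of the adjacent sorted slices a[lo:mid] and a[mid:hi]."""
    left = a[lo:mid]                         # only the left run is copied
    i, j, k = 0, mid, lo
    while i < len(left) and j < hi:
        if a[j] < left[i]:                   # take from the right only if strictly smaller
            a[k] = a[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1
        k += 1
    # Leftover left items go at the end; leftover right items are already in place.
    a[k:k + len(left) - i] = left[i:]


def merge_top(a, runs):
    """Merge the two runs on top of the stack; runs holds (start, length) pairs."""
    (s1, n1), (s2, n2) = runs[-2], runs[-1]
    merge(a, s1, s2, s2 + n2)
    runs[-2:] = [(s1, n1 + n2)]


def simple_run_merge_sort(a):
    runs = []                                # stack of (start, length), indices only
    lo = 0
    while lo < len(a):                       # one left-to-right pass over the array
        hi = find_run(a, lo)
        runs.append((lo, hi - lo))
        while len(runs) > 1 and runs[-2][1] <= runs[-1][1]:
            merge_top(a, runs)               # simplified merge policy (assumption)
        lo = hi
    while len(runs) > 1:                     # final collapse of whatever remains
        merge_top(a, runs)
    return a


print(simple_run_merge_sort([1, 2, 3, 9, 8, 7, 4, 5, 6]))
```

Only adjacent runs are ever merged, which is what keeps the sort stable.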

Now for the more interesting cases. lg(n!) is the information-theoretic limit for the best any comparison-based sorting algorithm can do on average (across all permutations). When a method gets significantly below that, it's either astronomically lucky, or is finding exploitable structure in the data.
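One way to see this limit in practice is to count the comparisons a sort makes and compare the count with lg(n!). The sketch below assumes CPython, whose built-in sort is Timsort-derived; the helper names `lg_factorial` and `count_comparisons` are hypothetical, and the exact counts may differ between Python versions.

```python
import math
import random
from functools import cmp_to_key


def lg_factorial(n):
    """lg(n!) via the log-gamma function, since ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


def count_comparisons(data):
    """Sort a copy of data with the built-in sort and count the comparisons."""
    count = 0

    def cmp(x, y):
        nonlocal count
        count += 1
        return (x > y) - (x < y)

    sorted(data, key=cmp_to_key(cmp))
    return count


n = 10_000
rng = random.Random(0)
shuffled = rng.sample(range(n), n)           # no exploitable structure
ascending = list(range(n))                   # a single run

limit = lg_factorial(n)
random_count = count_comparisons(shuffled)
ascending_count = count_comparisons(ascending)

print(f"lg(n!)    = {limit:.0f}")
print(f"shuffled  : {random_count} comparisons")
print(f"ascending : {ascending_count} comparisons")
```

On the ascending input the count should fall far below lg(n!), which is the "exploitable structure" case.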

Timsort takes advantage of data that includes “runs”

Timsort (2002)

• Fast
  • Best case: Ω(n)
  • Average case: O(n lg n)
  • Worst case: O(n lg n)

• Stable
  • It keeps the relative ordering of items that have the same key

• Uses both merge sort and binary insertion sort

• Used in Python, Java, Swift, Chrome, Octave, and other major language implementations

Timsort Overview

1. Identify runs and reverse descending runs
   • Large runs are unlikely in random data
   • Artificially lengthen short runs with BinaryInsertionSort (faster than InsertionSort)

2. Merge consecutive runs using a merge stack
   • Merging is done in place
   • Merging takes galloping into account

3. If you have fewer than 32 elements, use BinaryInsertionSort
   • 16, 32, 64, and 128 worked about equally well
   • At 256 the data-movement cost in BinaryInsertionSort clearly hurt
   • At 8 the increase in the number of function calls clearly hurt
   • Picking some power of 2 is important so that the merges end up balanced

Runs

A run is the longest non-decreasing or strictly decreasing sequence.

Merge Stack

Example stacks of run lengths (top of stack on the right): A:30 B:20 C:10 and A:500 B:400 C:1000
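The two example stacks can be traced with a small sketch of the collapse rule. It assumes the invariants described in listsort.txt for the three topmost runs A, B, C (A > B + C and B > C) and tracks run lengths only; `collapse_step` and `merge_collapse` are illustrative names here, and newer CPython releases use a different merge policy.

```python
def collapse_step(runs):
    """Do one merge if the invariants are violated.

    runs is a list of run lengths with the top of the stack last.
    Returns the new stack, or None if no merge is needed.
    """
    if len(runs) < 2:
        return None
    n = len(runs) - 2
    if (n > 0 and runs[n - 1] <= runs[n] + runs[n + 1]) or \
       (n > 1 and runs[n - 2] <= runs[n - 1] + runs[n]):
        # A <= B + C: merge B with the smaller of its two neighbours.
        if runs[n - 1] < runs[n + 1]:
            n -= 1
    elif runs[n] > runs[n + 1]:
        return None                          # both invariants hold
    # Only adjacent runs are merged, which preserves stability.
    return runs[:n] + [runs[n] + runs[n + 1]] + runs[n + 2:]


def merge_collapse(runs):
    """Repeat collapse_step until the invariants hold again."""
    while True:
        merged = collapse_step(runs)
        if merged is None:
            return runs
        runs = merged


print(collapse_step([30, 20, 10]))           # B is merged with C
print(collapse_step([500, 400, 1000]))       # A is merged with B
```

A single step turns A:30 B:20 C:10 into A:30 BC:30, and A:500 B:400 C:1000 into AB:900 C:1000; repeating the step continues until both invariants hold.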

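Just indices! Not copies of subarrays

In-Place Merging

temp = min(A, B) (the temporary array is sized to the smaller of the two runs)

Galloping

If we copy from B several times in a row:
• Find the spot of A[0] in B
• Copy the elements of B up to that spot in one chunk
• Copy A[0]

Parameters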

• MAX_MERGE_PENDING (85): Maximum number of entries in MergeState’s pending-runs stack

• MIN_GALLOP (7): When we get into galloping mode, we stay there until both runs win less often than MIN_GALLOP consecutive times

• MERGESTATE_TEMP_SIZE (256): Number of elements in the preallocated temporary array used for merging

• MIN_RUN (32 to 64, inclusive): Minimum run length

Summary

• Implemented in several languages

• Uses BinaryInsertionSort and MergeSort

• Tested on real-world data and found to be better than many theory-only algorithms
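The BinaryInsertionSort building block and the 32 to 64 range for the minimum run length can be sketched as follows. The names `binary_insertion_sort` and `compute_minrun` are illustrative; the minimum-run rule follows the description in listsort.txt (take the six most significant bits of n and add 1 if any of the remaining bits are set), and the rest is a simplified assumption rather than the CPython or Java code.

```python
from bisect import bisect_right


def binary_insertion_sort(a, lo, hi, start):
    """Sort a[lo:hi] in place, assuming a[lo:start] is already sorted."""
    for i in range(max(start, lo + 1), hi):
        pivot = a[i]
        # Binary search for the insertion point; using the rightmost
        # position keeps equal items in their original order (stable).
        pos = bisect_right(a, pivot, lo, i)
        a[pos + 1:i + 1] = a[pos:i]          # shift the tail right by one
        a[pos] = pivot


def compute_minrun(n):
    """Return n if n < 64, otherwise a value in 32..64."""
    r = 0                                    # becomes 1 if any shifted-out bit is set
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


data = [2, 5, 9, 4, 1, 7]
binary_insertion_sort(data, 0, len(data), 3) # the first three items are a run
print(data, compute_minrun(2112))
```

Binary search reduces the number of comparisons per insertion, but the data movement is unchanged, which is consistent with the cost observed at a run length of 256.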