14-Timsort.Pdf

Timsort
https://courses.missouristate.edu/anthonyclark/325

Outline

Topics and Learning Objectives
• Cover the Timsort algorithm to compare a real-world sorting algorithm to what we have discussed in class

Assessments
• None

Extra Resources

Python Implementation
• https://svn.python.org/projects/python/trunk/Objects/listsort.txt
• https://github.com/python/cpython/blob/master/Objects/listobject.c

Java Implementation
• https://hg.openjdk.java.net/jdk/jdk/file/74094a60d018/src/java.base/share/classes/java/util/TimSort.java

• https://www.youtube.com/watch?v=emeME__917E
• https://www.bigocheatsheet.com/

Tim Peters

In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs "intelligently". Everything else is complication for speed, and some hard-won measure of memory efficiency.

Now for the more interesting cases. lg(n!) is the information-theoretic limit for the best any comparison-based sorting algorithm can do on average (across all permutations). When a method gets significantly below that, it's either astronomically lucky, or is finding exploitable structure in the data.

Timsort takes advantage of data that includes “runs”.

Timsort (2002)
• Fast
  • Best case: Ω(n)
  • Average case: O(n lg n)
  • Worst case: O(n lg n)
• Stable
  • It keeps the relative ordering of items that have the same key
• Uses both merge sort and insertion sort
• Used in Python, Java, Swift, Chrome, Octave, and other major language implementations

https://www.bigocheatsheet.com/

Timsort Overview
1. Identify runs and reverse runs (descending runs are reversed in place)
  • Unlikely to have large runs in random data
  • Artificially increase short runs with BinaryInsertionSort (faster than InsertionSort)
2. Merge consecutive runs using a merge stack
  • Merging is done mostly in place, with a temporary buffer the size of the smaller run
  • Merging takes galloping into account
3. If you have fewer than 32 elements, use BinaryInsertionSort
  • 16, 32, 64 and 128 worked about equally well
  • At 256 the data-movement cost in BinaryInsertionSort clearly hurt
  • At 8 the increase in the number of function calls clearly hurt
  • Picking some power of 2 is important so that the merges end up balanced

Runs
A run is the longest non-decreasing or strictly decreasing sequence starting at the current position. Decreasing runs must be strict so that reversing them keeps the sort stable.

Merge Stack
• Example run lengths on the stack: A:30 B:20 C:10
• Example run lengths on the stack: A:500 B:400 C:1000
• Just indices! Not copies of subarrays

In-Place Merging
• temp = min(A, B): the temporary buffer holds a copy of the smaller of the two runs

Galloping
If we copy from B several times in a row:
• Find spot of A[0] in B
• Copy B up to that spot
• Copy A[0]

Parameters
• MAX_MERGE_PENDING (85): Maximum number of entries in MergeState’s pending-runs stack
• MIN_GALLOP (7): When we get into galloping mode, we stay there until both runs win less often than MIN_GALLOP consecutive times
• MERGESTATE_TEMP_SIZE (256): Size of the preallocated temporary array for merging
• MIN_RUN (32 to 64, inclusive): Minimum run length

Summary
• Implemented in several languages
• Uses BinaryInsertionSort and MergeSort
• Tested on real-world data and found to be better than many theory-only algorithms
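The pieces named in the overview (run identification, the minimum run length, BinaryInsertionSort, and the merge-stack decision) can be sketched in Python as below. This is a simplified illustration, not the CPython or Java code: the function names `count_run`, `compute_minrun`, `binary_insertion_sort`, and `next_merge` are hypothetical, the minimum-run rule follows the classic description in listsort.txt, and `next_merge` only inspects the top three stack entries, whereas production implementations perform additional checks.

```python
import bisect


def count_run(a, lo):
    """Return (length, descending) for the run that starts at a[lo].

    A run is either non-decreasing or strictly decreasing.
    Assumes 0 <= lo < len(a).
    """
    n = len(a)
    hi = lo + 1
    if hi == n:
        return 1, False
    if a[hi] < a[lo]:
        # Strictly decreasing run; the caller would reverse it in place.
        while hi + 1 < n and a[hi + 1] < a[hi]:
            hi += 1
        return hi - lo + 1, True
    # Non-decreasing run (equal neighbours are allowed).
    while hi + 1 < n and a[hi + 1] >= a[hi]:
        hi += 1
    return hi - lo + 1, False


def compute_minrun(n):
    """Classic minimum-run rule: n itself if n < 64, else a value in 32..64.

    Takes the six most significant bits of n and adds 1 if any of the
    remaining bits are set, so n / minrun is close to a power of 2.
    """
    r = 0
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


def binary_insertion_sort(a, lo, hi, start):
    """Sort a[lo:hi] in place, assuming a[lo:start] is already sorted."""
    for i in range(start, hi):
        x = a[i]
        # bisect_right keeps equal keys in their original order (stable).
        pos = bisect.bisect_right(a, x, lo, i)
        a[pos + 1:i + 1] = a[pos:i]  # shift the tail right by one slot
        a[pos] = x


def next_merge(lengths):
    """Pick the next merge from pending run lengths (bottom to top).

    Returns index i meaning "merge runs i and i + 1", or None if the
    simplified invariants A > B + C and B > C already hold.
    """
    n = len(lengths)
    if n >= 3 and lengths[-3] <= lengths[-2] + lengths[-1]:
        # Merge B with the smaller of A and C; ties favour C.
        if lengths[-3] < lengths[-1]:
            return n - 3
        return n - 2
    if n >= 2 and lengths[-2] <= lengths[-1]:
        return n - 2
    return None
```

With the two stack examples from the slides, `next_merge([30, 20, 10])` returns 1 (merge B with C) and `next_merge([500, 400, 1000])` returns 0 (merge A with B). Only run lengths are passed here for brevity; a real merge stack also stores each run's start index.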