Sorting presorted data

Vincent Jugé

LIGM – Université Gustave Eiffel, ESIEE, ENPC & CNRS

14/06/2021

Joint work with N. Auger, C. Nicaud, C. Pivoteau & G. Khalighinejad

Université Gustave Eiffel · Sharif University of Technology

V. Jugé – Sorting presorted data

Sorting data

0 2 2 3 4 0 1 5 4 1 2 3
0 0 1 1 2 2 2 3 3 4 4 5

MergeSort has a worst-case time complexity of O(n log n). Can we do better? No!

Proof: There are n! possible reorderings, and each element comparison yields 1 bit of information. Thus log2(n!) ∼ n log2(n) comparisons are required. END OF TALK!
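The counting argument above is easy to check experimentally. Below is a minimal top-down merge sort (a sketch for illustration, not Python's actual `list.sort`) that tallies element comparisons, so the count can be compared against log2(n!):

```python
def merge_sort(a, counter):
    """Top-down merge sort; counter[0] tallies element comparisons."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid], counter)
    right = merge_sort(a[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # each comparison yields 1 bit of information
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

counter = [0]
data = [0, 2, 2, 3, 4, 0, 1, 5, 4, 1, 2, 3]
sorted_data = merge_sort(data, counter)
comparisons = counter[0]
# for n = 12, any comparison sort needs about log2(12!) ≈ 28.8 comparisons in the worst case
```

The comparison count never exceeds n·⌈log2 n⌉, matching the O(n log n) bound.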

Can't we ever do better?

In some cases, we should...

0 1 2 3 4 5 6 7 8 9 10 11   (already sorted)
0 1 2 3 4 5 6 7 8 9 10 11

0 1 1 0 2 1 0 2 0 2 0 1   (only 3 distinct values: 5 × 0, 4 × 1, 3 × 2)
0 0 0 0 0 1 1 1 1 2 2 2

Let us do better!

0 2 2 3 4 0 1 5 4 1 2 3   ≡   4 runs of lengths 5, 3, 1 and 3

1 Chunk your data into non-decreasing runs
2 New parameters: the number of runs (ρ) and their lengths (r1, ..., rρ)
  Run-length entropy: H = Σ_{i=1..ρ} (ri/n) log2(n/ri), with H ≤ log2(ρ) ≤ log2(n)

Theorem [1,2,4,7,11]: Some merge sorts – in particular TimSort – have a worst-case time complexity of O(n + nH)

We cannot do better than Ω(n + nH)! [4]
Reading the whole input requires time Ω(n), and there are X possible reorderings, with X = (n choose r1, ..., rρ) ≥ 2^(1−ρ) · 2^(nH/2)
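The run decomposition and the run-length entropy H defined above are straightforward to compute; here is a small Python sketch (helper names are my own) applied to the slide's example array:

```python
import math

def runs(a):
    """Greedy chunking into maximal non-decreasing runs."""
    lengths, start = [], 0
    for i in range(1, len(a) + 1):
        if i == len(a) or a[i] < a[i - 1]:  # run ends where the values decrease
            lengths.append(i - start)
            start = i
    return lengths

def run_length_entropy(lengths):
    """H = sum over runs of (r/n) * log2(n/r)."""
    n = sum(lengths)
    return sum((r / n) * math.log2(n / r) for r in lengths)

a = [0, 2, 2, 3, 4, 0, 1, 5, 4, 1, 2, 3]
r = runs(a)                  # [5, 3, 1, 3] — the 4 runs from the slide
h = run_length_entropy(r)    # ≈ 1.825, and indeed ≤ log2(ρ) = 2
```

On an already-sorted array there is a single run, so H = 0 and the O(n + nH) bound collapses to O(n).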

A brief history of TimSort

2001 '02 '03 '04 '05 '06 '07 '08 '09 '10 '11 '12 '13 '14 '15 '16 '17 '18 '19 '20 '21

1 Invented by Tim Peters [3]
2 Standard algorithm in Python; standard algorithm for non-primitive arrays in Android, Java and Octave
3 First worst-case complexity analysis [6] – TimSort works in time O(n log n)
4 Refined worst-case analysis [7] – TimSort works in time O(n + nH); bugs uncovered in the Python & Java implementations [5,7]

The principles of TimSort and its variants (1/2)

Algorithm based on merging adjacent runs
Stable algorithm (good for composite types)

0 2 2 3 4 0 1 5   ≡   two runs of lengths k = 5 and ℓ = 3
0 0 1 2 2 3 4 5   ≡   8

1 Run merging algorithm: standard + many optimizations
  time O(k + ℓ), memory O(min(k, ℓ)); merge cost: k + ℓ
2 Policy for choosing runs to merge: depends on run lengths only
3 Complexity analysis: evaluate the total merge cost
  (forget the array values and work only with run lengths)
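The merging routine in step 1 can be sketched as a plain stable two-way merge, reporting the merge cost k + ℓ. This omits TimSort's actual optimizations (galloping, copying only the shorter run), so it is an illustration, not the real routine:

```python
def merge_runs(left, right):
    """Stable merge of two adjacent non-decreasing runs.
    Returns the merged run and its merge cost k + ℓ."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # '<=' keeps equal elements in their original order → stability
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out, len(left) + len(right)

merged, cost = merge_runs([0, 2, 2, 3, 4], [0, 1, 5])
# merged == [0, 0, 1, 2, 2, 3, 4, 5], cost == 8, as in the slide
```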

The principles of TimSort and its variants (2/2)

Run merge policy of α-merge sort [9] for α = φ = (1 + √5)/2 ≈ 1.618:
Find the least index k such that rk ≤ α·rk+1 or rk ≤ rk+2, then merge the runs Rk and Rk+1

Merge tree:
0 2 2 3 4 0 1 5 4 1 2 3   ≡   5 3 1 3 ∞
0 2 2 3 4 0 1 4 5 1 2 3   ≡   5 4 3 ∞
0 0 1 2 2 3 4 4 5 1 2 3   ≡   9 3 ∞
0 0 1 1 2 2 2 3 3 4 4 5   ≡   12 ∞

Merge cost = Σ (run length × depth in the merge tree) = 5×2 + 3×3 + 1×3 + 3×1 = 25

α ≥ φ ⇒ k_new ≥ k_old − 1 after each merge ⇒ one can use stack-based implementations of α-merge sort
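Working only with run lengths, the merge policy above can be simulated to compute the total merge cost. This is a naive offline sketch (the real α-merge sort maintains a stack of runs, as the last line of the slide notes), using the ∞ sentinel from the slide:

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # α = φ ≈ 1.618

def alpha_merge_cost(run_lengths, alpha=PHI):
    """Simulate the α-merge-sort policy on run lengths only and
    return the total merge cost (sum of k + ℓ over all merges)."""
    r = list(run_lengths) + [math.inf]  # ∞ sentinel, as on the slide
    cost = 0
    while len(r) > 2:  # stop when a single run (plus the sentinel) remains
        for k in range(len(r) - 2):
            if r[k] <= alpha * r[k + 1] or r[k] <= r[k + 2]:
                cost += r[k] + r[k + 1]          # merge cost of this step
                r[k:k + 2] = [r[k] + r[k + 1]]   # merge R_k and R_{k+1}
                break
    return cost

# the slide's example: merges cost 4, then 9, then 12 — total 25
assert alpha_merge_cost([5, 3, 1, 3]) == 25
```

The sentinel guarantees that some index k always satisfies rk ≤ rk+2, so the loop terminates.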

Fast growth in merge trees (1/2)

Theorem [11]: In merge trees induced by α-merge sort with α ≥ φ, each node is at least (α + 1)/α times larger than its great-grandchildren

Proof sketch (consecutive runs a, b, c, in the two configurations "... a b c" and "a b c ..."): the ancestor reached two merges up has size ≥ a + max{b, c} ≥ a + c ≥ 2c, and size ≥ (α + 1)a/α

Corollary: Each run R of length r lies at depth O(1 + log(n/r)), hence α-merge sort has merge cost O(n + nH)

Fast growth in merge trees (2/2)

Fast-growth property: A merge algorithm A has the fast-growth property if there exist an integer k ≥ 1 and a real number ε > 1 such that, in each merge tree induced by A, going up k times multiplies the node size by ε or more

Theorem (continued): TimSort [3], α-merge sort (when α ≥ φ) [9], adaptive ShiversSort [10], Peeksort and Powersort [8] have the fast-growth property

Corollary: These algorithms work in time O(n + nH)

Few runs vs few values vs few dual runs:

What about 0 1 1 0 2 1 0 2 0 2 0 1?
(5 × 0, 4 × 1, 3 × 2)
0 0 0 0 0 1 1 1 1 2 2 2

Let us do better, dually!

0 1 1 0 2 1 0 2 0 2 0 1   ≡   3 dual runs of lengths 5, 4 and 3

1 Chunk your data into non-decreasing, non-overlapping dual runs
2 New parameters: the number of dual runs (ρ*) and their lengths (r1*, ..., rρ*)
  Dual-run entropy: H* = Σ_{i=1..ρ*} (ri*/n) log2(n/ri*), with H* ≤ log2(ρ*) ≤ log2(n)

Theorem [11]: Every fast-growth merge sort requires O(n + nH*) comparisons if it uses TimSort's optimized run-merging routine

And we still cannot do better than Ω(n + nH*)
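One way to compute dual runs is via the inverse permutation: stable-sort the positions by value, then chunk wherever the input positions decrease. A Python sketch (function names are my own) on the slide's example:

```python
import math

def dual_runs(a):
    """Lengths of the dual runs of a, computed as the runs of the
    inverse of the stable sorting permutation."""
    order = sorted(range(len(a)), key=lambda i: a[i])  # stable argsort
    lengths, start = [], 0
    for i in range(1, len(order) + 1):
        # a dual run ends where the original positions decrease
        if i == len(order) or order[i] < order[i - 1]:
            lengths.append(i - start)
            start = i
    return lengths

def entropy(lengths):
    """Same entropy formula as for runs, applied to dual-run lengths."""
    n = sum(lengths)
    return sum((r / n) * math.log2(n / r) for r in lengths)

a = [0, 1, 1, 0, 2, 1, 0, 2, 0, 2, 0, 1]
dual = dual_runs(a)      # [5, 4, 3] — the 3 dual runs from the slide
h_star = entropy(dual)   # ≈ 1.555, and indeed ≤ log2(3)
```

With only σ distinct values there are at most σ dual runs, so H* ≤ log2(σ): few values imply a small H* even when there are many ordinary runs.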

Conclusion

TimSort is good in practice and in theory:
O(n + nH) merge cost and O(n + nH*) comparisons
Both its merging policy and its merging routine are great!

Some references:
[1] Optimal computer search trees and variable-length alphabetical codes, Hu & Tucker (1971)
[2] A new algorithm for minimum cost binary trees, Garsia & Wachs (1973)
[3] Tim Peters' description of TimSort, svn.python.org/projects/python/trunk/Objects/listsort.txt (2001)
[4] On compressing permutations and adaptive sorting, Barbay & Navarro (2013)
[5] OpenJDK's java.utils.Collection.sort() is broken, de Gouw et al. (2015)
[6] Merge strategies: from merge sort to TimSort, Auger et al. (2015)
[7] On the worst-case complexity of TimSort, Auger et al. (2018)
[8] Nearly-optimal mergesorts, Munro & Wild (2018)
[9] Strategies for stable merge sorting, Buss & Knop (2019)
[10] Adaptive ShiversSort: an alternative sorting algorithm, Jugé (2020)
[11] Galloping in natural merge sorts, Jugé & Khalighinejad (2021+)