Sorting

15-121 Fall 2020 Margaret Reid-Miller

• Margaret will have office hours today 4-5pm

Today
• Quadratic sorts
• O(n log n) sorts

Next time
• Bucket and Radix sorts
• Sorting properties

Quadratic Sorts Review

• Let A be an array of n elements that we wish to sort in non-decreasing order.
• Which is selection sort and which is insertion sort?
• Selection sort "selects" the next minimum and swaps it into place.
• Insertion sort "inserts" the next element into the sorted part.
• These sorting algorithms work in place, meaning they use the array's own storage to perform the sort.

Selection Sort: repeatedly select the minimum and add it to the sorted part

[Figure: array A split at index i into a SORTED part holding the i smallest elements and an UNSORTED part; j scans the unsorted part for the minimum, which is then swapped into position i.]

Loop invariant: A[0..i-1] are the i smallest elements sorted in non-decreasing order and are in their final position

    public static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            // swap the minimum of the unsorted part into position i
            int minIndex = indexOfMin(a, i);
            int temp = a[minIndex];
            a[minIndex] = a[i];
            a[i] = temp;
        }
    }

    // returns the index of the minimum element in a[start..a.length-1]
    public static int indexOfMin(int[] a, int start) {
        int minIndex = start;
        for (int j = start + 1; j < a.length; j++) {
            if (a[j] < a[minIndex]) minIndex = j;
        }
        return minIndex;
    }

Selection Sort Example

66 44 99 55 11 88 22 77 33
11 44 99 55 66 88 22 77 33
11 22 99 55 66 88 44 77 33
11 22 33 55 66 88 44 77 99
11 22 33 44 66 88 55 77 99
11 22 33 44 55 88 66 77 99
11 22 33 44 55 66 88 77 99
11 22 33 44 55 66 77 88 99
11 22 33 44 55 66 77 88 99

Selection Sort: Runtime analysis

Worst Case:
  Search for 1st min: n-1 comparisons
  Search for 2nd min: n-2 comparisons
  ...
  Search for 2nd-to-last min: 1 comparison
  Total comparisons: (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n^2)
Average Case: O(n^2)
Best Case: O(n^2)
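As a quick check of the sum above, the following sketch counts the comparisons selection sort makes. The class name SelectionSortCount and the comparison counter are illustrative additions, not part of the lecture code.

```java
class SelectionSortCount {
    // Sorts a in place and returns the number of element comparisons made.
    static long sortAndCount(int[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int minIndex = i;
            for (int j = i + 1; j < a.length; j++) {
                comparisons++;                      // one comparison per inner-loop step
                if (a[j] < a[minIndex]) minIndex = j;
            }
            int temp = a[minIndex];
            a[minIndex] = a[i];
            a[i] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] sorted = {11, 22, 33, 44, 55, 66, 77, 88, 99};
        int[] shuffled = {66, 44, 99, 55, 11, 88, 22, 77, 33};
        // For n = 9, both inputs cost n(n-1)/2 = 36 comparisons.
        System.out.println(sortAndCount(sorted));    // prints 36
        System.out.println(sortAndCount(shuffled));  // prints 36
    }
}
```

The count does not depend on the input order, which is why the best, average, and worst cases are all quadratic.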

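Insertion Sort: repeatedly insert the next element into the sorted part

[Figure: array split at index i into a SORTED part and an UNSORTED part; the element at i is inserted at position k within the sorted part, after which the sorted part is one element longer.]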

Loop invariant: A[0..i-1] are sorted in non-decreasing order.

    public static void insertionSort(int[] a) {
        // insert a[i] into sorted part
        for (int i = 0; i < a.length; i++) {

            int toInsert = a[i];  // creates hole
            int hole = i;

            // move values right into the hole until
            // we find the insertion point
            while (hole > 0 && toInsert < a[hole-1]) {
                a[hole] = a[hole-1];
                hole--;
            }
            a[hole] = toInsert;
        }
    }

Insertion Sort Example

66 44 99 55 11 88 22 77 33
44 66 99 55 11 88 22 77 33
44 66 99 55 11 88 22 77 33
44 55 66 99 11 88 22 77 33
11 44 55 66 99 88 22 77 33
11 44 55 66 88 99 22 77 33
11 22 44 55 66 88 99 77 33
11 22 44 55 66 77 88 99 33
11 22 33 44 55 66 77 88 99

Insertion Sort: Runtime analysis

Worst Case (when does this occur?):
  Insert 2nd element: 1 comparison
  Insert 3rd element: 2 comparisons
  ...
  Insert last element: n-1 comparisons
  Total comparisons: 1 + 2 + ... + (n-1) = O(n^2)
Average Case: O(n^2)
Best Case: O(n) When?

Insertion sort is adaptive: its runtime depends on the input data.
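To see the adaptivity concretely, here is a minimal sketch that counts the comparisons insertion sort makes on an already sorted array and on a reverse-sorted one. The class name InsertionSortCount and the counter are assumptions added for illustration; the loop is otherwise the same idea as insertionSort.

```java
class InsertionSortCount {
    // Sorts a in place and returns the number of element comparisons made.
    static long sortAndCount(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int toInsert = a[i];
            int hole = i;
            while (hole > 0) {
                comparisons++;                       // compare toInsert with a[hole-1]
                if (toInsert >= a[hole - 1]) break;  // insertion point found
                a[hole] = a[hole - 1];               // shift the larger value right
                hole--;
            }
            a[hole] = toInsert;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] sorted = {11, 22, 33, 44, 55, 66, 77, 88, 99};
        int[] reversed = {99, 88, 77, 66, 55, 44, 33, 22, 11};
        System.out.println(sortAndCount(sorted));    // prints 8  (n - 1)
        System.out.println(sortAndCount(reversed));  // prints 36 (n(n-1)/2)
    }
}
```

With n = 9, the sorted input needs one comparison per inserted element, while the reversed input compares each element against the entire sorted part.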

Quadratic Sorts

• Quadratic sorts have a worst-case order of complexity of O(n^2).
• Selection sort always performs poorly, even on a sequence of sorted elements!
• Insertion sort is (nearly) linear if the elements are (nearly) sorted.

Tree Sort

1. Build a binary search tree out of the elements.
2. Traverse the tree using an inorder traversal to get the elements in increasing order.

[Figure: binary search tree with levels 84 / 41 96 / 24 50 98 / 13 37]

Runtime to traverse? O(n)
What is the runtime to build the search tree?

                build         total
Worst case      O(n^2)        O(n^2)
Average case    O(n log n)    O(n log n)
Best case       O(n log n)    O(n log n)
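The two steps can be sketched as follows, assuming a plain unbalanced binary search tree. The names TreeSortSketch, Node, insert, and inorder are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TreeSortSketch {
    private static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Inserts value into the unbalanced BST rooted at root and returns the root.
    private static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);  // duplicates go right
        return root;
    }

    // Appends the values of the tree to out in increasing order.
    private static void inorder(Node root, List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);
        out.add(root.value);
        inorder(root.right, out);
    }

    static List<Integer> treeSort(int[] a) {
        Node root = null;
        for (int x : a) root = insert(root, x);  // step 1: build the tree
        List<Integer> out = new ArrayList<>();
        inorder(root, out);                      // step 2: inorder traversal
        return out;
    }

    public static void main(String[] args) {
        // Inserting in this order produces the tree shown on the slide.
        int[] a = {84, 41, 96, 24, 50, 98, 13, 37};
        System.out.println(treeSort(a));  // prints [13, 24, 37, 41, 50, 84, 96, 98]
    }
}
```

Because the tree is not rebalanced, already sorted input degenerates into a chain of n nodes, which is the O(n^2) worst case in the table.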

Divide and Conquer

• In computation:
  1. Divide the problem into simpler versions of itself.
  2. Conquer each problem using the same process (usually recursively).
  3. Combine the results of the simpler versions to form your final solution.

• Examples: Towers of Hanoi, fractals, Binary Search, Merge Sort, Quicksort, and many, many more

Merge Sort

Original: 84 27 49 91 32 53 63 17

Divide: 84 27 49 91 / 32 53 63 17

Conquer (sort recursively): 27 49 84 91 / 17 32 53 63

Combine (merge): 17 27 32 49 53 63 84 91

Merge Sort

• Split the array into two “halves”.
• Sort each of the halves recursively using merge sort.
• Merge the two sorted halves into a new sorted array.
• Merge sort does not sort in place.

• Example:
  66 33 77 55 / 11 99 22 88 44
  sort the halves recursively...
  33 55 66 77 / 11 22 44 88 99

Merge Sort (cont’d)

Then merge the two sorted halves into a new array. At each step, the smaller of the two front elements is copied into the new array.

Sorted halves: 33 55 66 77 / 11 22 44 88 99

New array after each step:
(empty)
11
11 22
11 22 33
11 22 33 44
11 22 33 44 55
11 22 33 44 55 66
11 22 33 44 55 66 77

Once one of the halves has been merged into the new array, copy the remaining element(s) of the other half into the new array:

11 22 33 44 55 66 77 88 99
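The split, recursive sort, and merge steps can be sketched as below. This is one possible formulation that allocates new arrays at every level, matching the note that merge sort does not sort in place; the names MergeSortSketch, mergeSort, and merge are assumptions for illustration.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Returns a new sorted array; the input array is not modified.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a.clone();       // base case: already sorted
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Merges two sorted arrays into a new sorted array.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // copy the smaller front element; take from left on ties
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) out[k++] = left[i++];    // copy what remains
        while (j < right.length) out[k++] = right[j++];  // of either half
        return out;
    }

    public static void main(String[] args) {
        int[] a = {66, 33, 77, 55, 11, 99, 22, 88, 44};
        System.out.println(Arrays.toString(mergeSort(a)));
        // prints [11, 22, 33, 44, 55, 66, 77, 88, 99]
    }
}
```

For the nine-element example, mid is 4, so the halves are 66 33 77 55 and 11 99 22 88 44, the same split used in the walkthrough.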

Analysis of Merge Sort: Divide

n
n/2 n/2
n/4 n/4 n/4 n/4
n/8 n/8 n/8 n/8 n/8 n/8 n/8 n/8
...
1 1 1 1 ... 1 1 1 1

Height of the tree: log n

Merge in Merge Sort: merge sort always runs in O(n log n)

n                                   1 * n = n
n/2 n/2                             2 * n/2 = n
n/4 n/4 n/4 n/4                     4 * n/4 = n
n/8 n/8 n/8 n/8 n/8 n/8 n/8 n/8     8 * n/8 = n
...
1 1 1 1 ... 1 1 1 1                 n * 1 = n

Number of levels: log n

Comparing Big O Functions

[Figure: Number of Operations plotted against n (amount of data) for, from fastest-growing to slowest-growing: O(2^n), O(n^2), O(n log n), O(n), O(log n), O(1)]

Quicksort

• Choose a pivot element of the array.
• Partition the array so that
  - the pivot element is in its final correct position
  - all the elements to the left of the pivot are less than or equal to the pivot
  - all the elements to the right of the pivot are greater than or equal to the pivot
• Sort each partition recursively using quicksort.

Partition: move l right until an element >= p; move g left until an element ≤ p

[Figure: the pivot p sits at the left end of the array, followed by a "≤ p" region, an unexamined "?" region between l and g, and a "≥ p" region. When both searches stop, the elements at l and g are swapped and the searches resume.]

Partition: stop when l and g meet or cross and put pivot between partitions

[Figure: g has ended up to the left of l. The pivot is swapped with the last element of the "≤ p" region, giving the layout ≤ p, p, ≥ p.]

p is in its final position

Partitioning the array

Arbitrarily choose the first element as the pivot.
66 44 99 55 11 88 22 77 33

Search from the left end for the first element that is greater than (or equal to) the pivot. (This finds 99.)
66 44 99 55 11 88 22 77 33

Search from the right end for the first element that is less than (or equal to) the pivot. (This finds 33.)
66 44 99 55 11 88 22 77 33

Now swap these two elements.
66 44 33 55 11 88 22 77 99

Partitioning the array (cont’d)

66 44 33 55 11 88 22 77 99

From the two elements just swapped, search again from the left and right ends for the next elements that are greater than (or equal to) and less than (or equal to) the pivot, respectively. (This finds 88 and 22.)
66 44 33 55 11 88 22 77 99

Swap these as well.
66 44 33 55 11 22 88 77 99

Continue this process until our searches from each end meet or cross.

Partitioning the array (cont’d)

At this point, the array has been partitioned into two subarrays, one with elements less than (or equal to) the pivot, and the other with elements greater than (or equal to) the pivot.
66 44 33 55 11 22 88 77 99

Finally, swap the pivot with the last element in the first subarray (the elements that are less than or equal to the pivot).
22 44 33 55 11 66 88 77 99

The pivot is now in its final position. Now sort the two subarrays on either side of the pivot recursively using quicksort.
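The partitioning walkthrough above can be sketched in code as follows. This is one way to write the two-ended search with the first element as the pivot, not necessarily the course's reference implementation; the names QuicksortSketch, partition, l, and g are chosen to mirror the slides.

```java
import java.util.Arrays;

class QuicksortSketch {
    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;            // 0 or 1 element: nothing to do
        int p = partition(a, lo, hi);    // pivot lands at index p
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    // Partitions a[lo..hi] around the pivot a[lo]; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int l = lo + 1;
        int g = hi;
        while (true) {
            while (l <= g && a[l] < pivot) l++;  // stop at an element >= pivot
            while (l <= g && a[g] > pivot) g--;  // stop at an element <= pivot
            if (l >= g) break;                   // searches met or crossed
            swap(a, l, g);
            l++;
            g--;
        }
        swap(a, lo, g);  // a[lo+1..g] are all <= pivot, so the pivot belongs at g
        return g;
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {66, 44, 99, 55, 11, 88, 22, 77, 33};
        int p = partition(a, 0, a.length - 1);
        System.out.println(p + " " + Arrays.toString(a));
        // prints 5 [22, 44, 33, 55, 11, 66, 88, 77, 99]
        quickSort(a);
        System.out.println(Arrays.toString(a));
        // prints [11, 22, 33, 44, 55, 66, 77, 88, 99]
    }
}
```

Stopping both searches on elements equal to the pivot means equal elements may end up on either side, which is why both subarrays are described with "or equal to".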

Quicksort

• Invariant: After the ith partition, the ith pivot is in its final position (i.e., all values to the left are less than or equal to the pivot and all values to the right are greater than or equal to the pivot).
• Thus, after completing the divide and conquer phases, the data is completely sorted (every pivot is in its final position) and the combine phase is trivial.
• Compare with Merge Sort, where the divide phase is trivial and the conquer and combine phases do all the work.

Run-Time Analysis

• What is the run time for partition? O(n)
• Assume the pivot ends up in the center position of the array every time (recursively too).
• Then, quicksort runs in O(n log n) time (best case) just like merge sort.
• BUT quicksort in the worst case is O(n^2). When might that be?
• In practice, though, quicksort is usually O(n log n) and faster (better constants) than merge sort (and quicksort is in place).
• Merge sort is better when you need to stream data from disk.

Some Improvements to Quicksort

• Choose three values from the array, and use the median of the three as the pivot.

  66 44 99 55 11 88 22 77 33
  Of the first, middle, and last elements (66, 11, 33), use the median, 33, as the pivot.

• Quicksort is called recursively, and many recursive calls are "not cheap".
• Stop the recursion when the subarrays are of “small size”. Now the array is almost sorted.
• Apply insertion sort to the whole array, which takes O(n) time because the array is almost sorted.
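Both improvements can be combined in a short sketch. The cutoff value 4, the class name TunedQuicksortSketch, and the choice of first, middle, and last elements as the three candidates are assumptions for illustration; a real cutoff would be tuned by measurement.

```java
class TunedQuicksortSketch {
    static final int CUTOFF = 4;  // hypothetical "small size" threshold

    static void sort(int[] a) {
        quickSortCoarse(a, 0, a.length - 1);  // leaves small subarrays unsorted
        insertionSort(a);                     // finishes the almost-sorted array
    }

    private static void quickSortCoarse(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) return;    // stop the recursion early
        int m = medianOfThree(a, lo, lo + (hi - lo) / 2, hi);
        swap(a, lo, m);                       // move the chosen pivot to the front
        int p = partition(a, lo, hi);
        quickSortCoarse(a, lo, p - 1);
        quickSortCoarse(a, p + 1, hi);
    }

    // Returns whichever of the indices i, j, k holds the median of the three values.
    static int medianOfThree(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return (a[i] < a[k]) ? k : i;
        } else {
            if (a[i] < a[k]) return i;
            return (a[j] < a[k]) ? k : j;
        }
    }

    // Partitions a[lo..hi] around the pivot a[lo]; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int l = lo + 1;
        int g = hi;
        while (true) {
            while (l <= g && a[l] < pivot) l++;
            while (l <= g && a[g] > pivot) g--;
            if (l >= g) break;
            swap(a, l, g);
            l++;
            g--;
        }
        swap(a, lo, g);
        return g;
    }

    private static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int toInsert = a[i];
            int hole = i;
            while (hole > 0 && toInsert < a[hole - 1]) {
                a[hole] = a[hole - 1];
                hole--;
            }
            a[hole] = toInsert;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {66, 44, 99, 55, 11, 88, 22, 77, 33};
        System.out.println(medianOfThree(a, 0, 4, 8));  // prints 8 (the index of 33)
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
        // prints [11, 22, 33, 44, 55, 66, 77, 88, 99]
    }
}
```

After the coarse pass, every element sits in a block of at most CUTOFF elements bounded by pivots in their final positions, so the final insertion sort moves each element only a few places.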

Randomized quicksort is fast

• Fact: Quicksort has expected runtime of O(n log n) averaged over all n! input orderings.
• Randomized quicksort: For every partition, pick a pivot at random from the partition.
• Fact: Randomized quicksort has expected runtime of O(n log n) for any input ordering.
• Although it is possible for randomized quicksort to have O(n^2) runtime (bad random pivots), it is highly unlikely.
• If you run it again on the same data, the expected runtime will still be O(n log n).
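A minimal sketch of randomized quicksort follows, assuming the same two-ended partition with the pivot at the front. The only change is that a randomly chosen element is swapped to the front before partitioning; the class name RandomizedQuicksortSketch and the use of java.util.Random are illustrative choices.

```java
import java.util.Random;

class RandomizedQuicksortSketch {
    private static final Random RNG = new Random();

    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        // Pick a pivot uniformly at random from a[lo..hi] and move it to the front.
        int r = lo + RNG.nextInt(hi - lo + 1);
        swap(a, lo, r);
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    // Partitions a[lo..hi] around the pivot a[lo]; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int l = lo + 1;
        int g = hi;
        while (true) {
            while (l <= g && a[l] < pivot) l++;
            while (l <= g && a[g] > pivot) g--;
            if (l >= g) break;
            swap(a, l, g);
            l++;
            g--;
        }
        swap(a, lo, g);
        return g;
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Already sorted input is a bad case when the first element is always the pivot.
        int[] a = new int[100_000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        quickSort(a);
        System.out.println(isSorted(a));  // prints true
    }
}
```

With a random pivot, no particular input ordering is consistently bad; only an unlucky sequence of random choices is, and that sequence changes from run to run.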
