CS116 - Module 8 - Efficiency: Searching and Sorting

Cameron Morland

Winter 2020

Reminder: if you have not already, ensure you:

- Read the Wikipedia article on Binary search.
- Read the Wikipedia articles on Selection sort, Insertion sort, and Merge sort.

Efficiency: Searching and Sorting

Why are we doing this? Can’t I just use [].sort() and sorted()?

Yes! Use available sorting functions whenever possible! But these algorithms are beautiful to analyse for efficiency.

Searching a List

Is 42 in the following list? [97,24,6,87,0,78,90,77,4,24,50,30,68,44,62,46,93,47,1,81,30,48,26,45,99]

With a list in no particular order, we can do no better than linear search: check each value in the list. If you find a value which is the target, return True. If you arrive at the end of the list and have not found the target, return False.

Exercise: Write a function search(mylist, target) which implements linear search.
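One possible solution, as a minimal sketch. It follows the description above directly; the variable name values is only for illustration.

```python
def search(mylist, target):
    """Return True if target is in mylist, and False otherwise."""
    for value in mylist:
        if value == target:
            return True    # found the target; stop early
    return False           # reached the end without finding it


values = [97, 24, 6, 87, 0, 78, 90, 77, 4, 24, 50, 30, 68,
          44, 62, 46, 93, 47, 1, 81, 30, 48, 26, 45, 99]
print(search(values, 42))  # False
print(search(values, 50))  # True
```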

Running time of linear search

Best case: mylist[0] == target, and we return immediately; O(1).
Worst case: either mylist[-1] == target or target is not in mylist. We need to check every value; O(n).
At this point we always consider the worst case: O(n).

I need a volunteer...

Pick a number from 0 to 1000. I divide the region in two at each step; we need only $\log_2 1000 \approx 10$ guesses. I could go to 1 000 000 with only 20 guesses, or 1 000 000 000 with only 30 guesses.
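The guess counts above can be checked with a small sketch. The helper name guesses_needed is hypothetical; it simply counts how many halvings (rounding up) shrink a range to a single candidate.

```python
def guesses_needed(n):
    """Count how many times a range of n candidates must be halved
    (rounding up) before only one candidate remains."""
    steps = 0
    while n > 1:
        n = (n + 1) // 2   # keep the larger half, to be pessimistic
        steps += 1
    return steps


for size in [1000, 1_000_000, 1_000_000_000]:
    print(size, guesses_needed(size))
# 1000 10
# 1000000 20
# 1000000000 30
```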

Recursive Binary Search

Is 42 in the following sorted list?
[0,1,4,6,24,24,26,30,30,44,45,46,47,48,50,62,68,77,78,81,87,90,93,97,99]

    def bs(L, t):
        '''
        Return True if t is in L, and False otherwise.

        bs: (listof Int) Int -> Bool
        '''
        if L == []:
            return False
        elif len(L) == 1:
            return L[0] == t
        elif t < L[len(L)//2]:  # first half
            return bs(L[:len(L)//2], t)
        else:  # second half
            return bs(L[len(L)//2:], t)

It’s kind of like building a binary search tree, without the tree. What’s the running time? Every step we do O(n/2) = O(n) slicing, so our recurrence is T(n) = O(n) + T(n/2), which gives us only O(n) running time, the same as linear search! To do better, don’t slice the list.
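To see why this recurrence stays linear, one can unroll it. The constant $c$ below is an assumption for illustration: it bounds the cost of the slicing at each level by $cn$.

```latex
\begin{align*}
T(n) &= cn + T(n/2) \\
     &= cn + \frac{cn}{2} + T(n/4) \\
     &= cn + \frac{cn}{2} + \frac{cn}{4} + \cdots + T(1) \\
     &\le cn \sum_{k=0}^{\infty} \frac{1}{2^k} + T(1) \\
     &= 2cn + O(1) = O(n).
\end{align*}
```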

Iterative Binary Search

We can do it recursively. How can we do it iteratively? Use indices to keep track of the left and right ends of the region of interest: left indicates the lowest position the target might be in, right indicates the highest position it might be in.

Exercise: Using a while loop, create a function binary_search(L, target) that returns True if target is in L and False otherwise.

    def binary_search(L, target):
        if L == [] or target > L[-1] or target < L[0]:
            return False

        left = 0
        right = len(L)

        while right - left > 1:
            middle = (right + left) // 2
            if L[middle] == target:
                return True
            if L[middle] > target:  # keep left half
                right = middle
            else:  # L[middle] <= target; keep right half
                left = middle

        return L[left] == target or L[right] == target

Using left and right we could also do this recursively.

Testing Binary Search

Probably this code is terribly buggy. We should test at least:
- empty list
- list of length 1, with and without target
- small lists of odd and even length
- longer list:
  - The first and last values in the list.
  - Values less than the first and greater than the last.
  - Values near the middle of the list.
  - Values which would fit in the list, but are not present.
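The checklist above might be turned into assertions along these lines. This is only a sketch: binary_search is restated from the slide so that the block runs on its own, and the helper run_tests and the test values are chosen here for illustration.

```python
def binary_search(L, target):
    # Restated from the slide so that this block is self-contained.
    if L == [] or target > L[-1] or target < L[0]:
        return False
    left = 0
    right = len(L)
    while right - left > 1:
        middle = (right + left) // 2
        if L[middle] == target:
            return True
        if L[middle] > target:
            right = middle
        else:
            left = middle
    return L[left] == target or L[right] == target


def run_tests():
    # empty list
    assert not binary_search([], 5)
    # list of length 1, with and without target
    assert binary_search([5], 5)
    assert not binary_search([5], 4)
    # small lists of even and odd length
    assert binary_search([1, 3], 1) and binary_search([1, 3], 3)
    assert not binary_search([1, 3], 2)
    assert binary_search([1, 3, 5], 1) and binary_search([1, 3, 5], 5)
    assert not binary_search([1, 3, 5], 4)
    # longer list: the even numbers 0, 2, ..., 98
    longer = list(range(0, 100, 2))
    assert binary_search(longer, 0) and binary_search(longer, 98)
    assert not binary_search(longer, -1)
    assert not binary_search(longer, 100)
    assert binary_search(longer, 48) and binary_search(longer, 50)
    assert not binary_search(longer, 49)
    return "all tests passed"


print(run_tests())
```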

Efficiency

What is the runtime of each iteration? How many iterations do we need? Each iteration does a constant amount of work and halves the region of interest, so the worst case running time is O(log n). If n = 1000, we need at most 10 or 11 iterations since $2^{10} = 1024$. Doubling the input size adds just 1 iteration. n = 4 000 000 000 takes only 32 iterations.

To make binary_search more useful, we can make it return the location where target was found, or -1 if it is not found. Although the recursive solution we started with was O(n), we could build it recursively in O(log n) using indices instead of slicing.
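A sketch of both ideas together: a recursive search that passes indices instead of slicing and returns the location of target, or -1. The name binary_search_index and the default arguments are assumptions made for this example.

```python
def binary_search_index(L, target, left=0, right=None):
    """Return an index i with L[i] == target, or -1 if target is not in L.
    Requires: L is sorted in increasing order.
    The region of interest is L[left], ..., L[right-1]; nothing is sliced."""
    if right is None:
        right = len(L)
    if left >= right:              # empty region: target is not present
        return -1
    middle = (left + right) // 2
    if L[middle] == target:
        return middle
    elif L[middle] > target:       # keep left half
        return binary_search_index(L, target, left, middle)
    else:                          # keep right half
        return binary_search_index(L, target, middle + 1, right)


sorted_values = [0, 1, 4, 6, 24, 24, 26, 30, 30, 44, 45, 46, 47, 48, 50,
                 62, 68, 77, 78, 81, 87, 90, 93, 97, 99]
print(binary_search_index(sorted_values, 42))  # -1
print(binary_search_index(sorted_values, 50))  # 14
```

Each call does O(1) work and halves the region, so the recurrence becomes T(n) = O(1) + T(n/2), which is O(log n).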

Sorting

How can we sort a list into increasing order? Many ways. Here we consider only selection sort, insertion sort, and mergesort, as these can be easily analysed without statistics. (Many libraries use quicksort. Proper analysis of quicksort requires statistics, and is beyond the scope of this course.)

Selection Sort

Place the smallest item into L[0].
Place the second smallest item into L[1].
Place the third smallest item into L[2].
...
After n − 1 steps, the list is sorted.

Exercise: Get a random ordering of the first 10 natural numbers, 0 ... 9:

    import random
    L = list(range(10))
    random.shuffle(L)
    print(L)

By hand, sort L using selection sort. Count the number of comparisons.

For 10 cards, 9 comparisons. For 9 cards, 8 comparisons. For 8 cards, 7 comparisons.
So we get $9 + 8 + 7 + \cdots + 2 + 1 = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = 45$ comparisons. This is $O(n^2)$.

Insertion Sort

Enjoy Sorting out Sorting to 4:10.

The first item of the list is sorted. Insert the first unsorted item of the list into the sorted part, keeping the sorted part sorted. Continue to expand the sorted part until it is the entire list.

Running time? n times through, and the i-th time takes up to i moves. As before, $O(n^2)$ comparisons and moves. We can reduce the number of comparisons to O(n log n) by using binary search, but we still need $O(n^2)$ moves. For large n it will be roughly twice as fast, but the same order.

Disassembling Selection Sort

Selection sort can be implemented as follows:

1. Begin with an empty list D and an unsorted list S.
2. Find the smallest item in S,
3. ... remove it from S,
4. ... and add it at the end of D.
5. Repeat steps 2 – 4 until S is empty.
6. Move everything from D to S.

Exercise: Implement selection sort using this framework. (It is possible to do this in 7 lines of code, maybe fewer.)
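One way the six steps might look in code, as a sketch rather than the official solution. It leans on the built-in min and list.remove, each of which scans the list, so the whole sort is still quadratic.

```python
def selection_sort(S):
    """Mutate S so that it is sorted in increasing order."""
    D = []                      # step 1: D starts empty
    while S != []:              # step 5: repeat until S is empty
        smallest = min(S)       # step 2: find the smallest item in S
        S.remove(smallest)      # step 3: remove it from S
        D.append(smallest)      # step 4: add it at the end of D
    S.extend(D)                 # step 6: move everything from D to S


values = [3, 1, 4, 1, 5, 9, 2, 6]
selection_sort(values)
print(values)  # [1, 1, 2, 3, 4, 5, 6, 9]
```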

In Place Algorithms

The notes contain the following in place selection sort:

    def selection_sort(L):
        n = len(L)
        positions = list(range(n-1))
        for i in positions:
            min_pos = i
            for j in range(i,n):
                if L[j] < L[min_pos]:
                    min_pos = j
            temp = L[i]
            L[i] = L[min_pos]
            L[min_pos] = temp
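To check the n(n−1)/2 comparison count from the card exercise, here is a hedged variant that also returns the number of comparisons. It is not the notes' code verbatim: the name selection_sort_count is hypothetical, and the inner loop starts at i + 1 (an assumption made here so that an item is never compared with itself).

```python
import random


def selection_sort_count(L):
    """Sort L in place; return the number of item comparisons made."""
    n = len(L)
    comparisons = 0
    for i in range(n - 1):
        min_pos = i
        for j in range(i + 1, n):           # items after position i
            comparisons += 1
            if L[j] < L[min_pos]:
                min_pos = j
        L[i], L[min_pos] = L[min_pos], L[i]  # swap smallest into place
    return comparisons


L = list(range(10))
random.shuffle(L)
print(selection_sort_count(L))  # 45, whatever the initial order
print(L)                        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```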

Mergesort

Mergesort is a Divide and Conquer algorithm. It works in the following manner:

1. Divide the list into two halves, using any method.
2. Sort the first half.
3. Sort the second half.
4. Merge the two sorted lists together to form a new sorted list.

Exercise: Write a function merge(A, B) which consumes A and B, each of which is a sorted list, and returns the list containing all items from A and B, still in sorted order.

What is the running time of merge(A, B)? O(len(A) + len(B)) = O(n) if the lengths are approximately equal.
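A possible merge, sketched with two index variables so that each item is looked at once. The names i, j, and result are chosen here for illustration.

```python
def merge(A, B):
    """Return a new sorted list containing all items of sorted lists A and B."""
    result = []
    i = 0   # next unused position in A
    j = 0   # next unused position in B
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            result.append(A[i])
            i += 1
        else:
            result.append(B[j])
            j += 1
    # At most one of these two slices is non-empty.
    result.extend(A[i:])
    result.extend(B[j:])
    return result


print(merge([1, 4, 9], [2, 3, 10, 11]))  # [1, 2, 3, 4, 9, 10, 11]
```

Every pass through the loop appends one item, and the two extend calls copy what is left, which matches the O(len(A) + len(B)) bound.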

Exercise: Implement mergesort using this framework and your merge function. Note: a list of length 1 is already sorted; use this as the base case for recursion.

Mergesort analysis


1. Divide the list into two halves, using any method. O(n)
2. Sort the first half. T(n/2)
3. Sort the second half. T(n/2)
4. Merge the two sorted lists together to form a new sorted list. O(n)

T(n) = O(n) + 2T(n/2) → O(n log n)
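The four steps analysed above can be sketched as follows. To keep the block self-contained, the standard library's heapq.merge is used as a stand-in for your own merge(A, B); the base case also accepts an empty list, which is an addition to the length 1 base case suggested in the exercise.

```python
from heapq import merge as heap_merge   # stand-in for your own merge(A, B)


def mergesort(L):
    """Return a new list with the items of L in increasing order."""
    if len(L) <= 1:                   # base case: already sorted
        return L[:]
    middle = len(L) // 2              # step 1: divide into two halves
    first = mergesort(L[:middle])     # step 2: sort the first half
    second = mergesort(L[middle:])    # step 3: sort the second half
    return list(heap_merge(first, second))   # step 4: merge


print(mergesort([5, 2, 9, 1, 5, 6, 0]))  # [0, 1, 2, 5, 5, 6, 9]
```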

Draw the tree!

Disassembling Insertion Sort

Insertion sort depends on insertion. We need to insert the new item in the list at the right place so it stays sorted.

Exercise: Write a function insert_keep_sorted(D, item) which consumes a sorted list D and an integer item, and mutates D so it contains item while remaining sorted.

For example, if D = [1, 2, 17], then after insert_keep_sorted(D, 4), D is [1, 2, 4, 17].
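A minimal sketch of one possible solution, using a linear scan to find the insertion point. The variable name pos is chosen here for illustration.

```python
def insert_keep_sorted(D, item):
    """Mutate sorted list D so that it also contains item, still sorted."""
    pos = 0
    while pos < len(D) and D[pos] < item:   # skip items smaller than item
        pos += 1
    D.insert(pos, item)   # shifts later items one place right: O(n) moves


D = [1, 2, 17]
insert_keep_sorted(D, 4)
print(D)  # [1, 2, 4, 17]
```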

Insertion Sort

Insertion sort can be implemented as follows:

1. Begin with an empty list D and an unsorted list S.
2. Find any item in S,
3. ... remove it from S,
4. ... and insert it into D, keeping D sorted.
5. Repeat steps 2 – 4 until S is empty.
6. Move everything from D to S.

Exercise: Implement insertion sort using this framework and your insert_keep_sorted function.
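The framework might be coded as below. To keep the block self-contained, the standard library's bisect.insort is used as a stand-in for your own insert_keep_sorted. It finds the position by binary search, so it makes O(log n) comparisons per insertion but still O(n) moves.

```python
from bisect import insort   # stand-in for your own insert_keep_sorted


def insertion_sort(S):
    """Mutate S so that it is sorted in increasing order."""
    D = []                 # step 1: D starts empty
    while S != []:         # step 5: repeat until S is empty
        item = S.pop()     # steps 2-3: take any item (here, the last) out of S
        insort(D, item)    # step 4: insert it into D, keeping D sorted
    S.extend(D)            # step 6: move everything from D to S


values = [8, 3, 5, 1, 9, 2]
insertion_sort(values)
print(values)  # [1, 2, 3, 5, 8, 9]
```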

In place insertion sort code is included in the notes. Appreciate how it works.

Built in sorted and sort

Don’t implement your own for real work! Use the built-in functions.

sorted

sorted: (listof Any) -> (listof Any)
Returns a sorted copy of the list.

    L = [2, 4, 6, 0, 1]
    M = sorted(L)
    # M is now [0, 1, 2, 4, 6]. L is unchanged.

list.sort

list.sort: None -> None
Mutates the list to be sorted.

    L = [2, 4, 6, 0, 1]
    M = L.sort()
    # M is now None. L is [0, 1, 2, 4, 6].

Powerful extra arguments

The optional argument key allows us to change how the sorting works. This argument must be a function that maps an item in the list to an item that is sortable.

    M = [[1,2,3], [4], [5,7], [9,8,5,2]]

    ## sort by first value
    sorted(M) => [[1, 2, 3], [4], [5, 7], [9, 8, 5, 2]]

    ## sort by list length
    sorted(M, key=len) => [[4], [5, 7], [1, 2, 3], [9, 8, 5, 2]]

    ## sort by smallest value
    sorted(M, key=min) => [[1, 2, 3], [9, 8, 5, 2], [4], [5, 7]]


    L = ['Able', 'was', 'I', 'ere', 'I', 'saw', 'Elba']

    ## alphabetic sort:
    sorted(L) => ['Able', 'Elba', 'I', 'I', 'ere', 'saw', 'was']

    ## sort by string length
    sorted(L, key=len) => ['I', 'I', 'was', 'ere', 'saw', 'Able', 'Elba']

    ## sort by last character
    sorted(L, key=lambda x: x[-1]) => ['I', 'I', 'Elba', 'Able', 'ere', 'was', 'saw']

    ## sort considering lowercase letters
    sorted(L, key=lambda x: x.lower()) => ['Able', 'Elba', 'ere', 'I', 'I', 'saw', 'was']

reverse argument

By setting the optional argument reverse to True, we sort as usual, but in reverse order:

    N = [2, 4, 6, 0, 1]
    sorted(N) => [0, 1, 2, 4, 6]
    sorted(N, reverse=True) => [6, 4, 2, 1, 0]
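The two optional arguments can be combined, and list.sort accepts them too. A short sketch, reusing the word list from the key examples; the name by_length_desc is chosen here for illustration.

```python
L = ['Able', 'was', 'I', 'ere', 'I', 'saw', 'Elba']

# Longest strings first; items with equal keys keep their original order.
by_length_desc = sorted(L, key=len, reverse=True)
print(by_length_desc)
# ['Able', 'Elba', 'was', 'ere', 'saw', 'I', 'I']

# list.sort takes the same optional arguments and mutates the list.
L.sort(key=lambda s: s.lower(), reverse=True)
print(L)
# ['was', 'saw', 'I', 'I', 'ere', 'Elba', 'Able']
```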

Summary of Common Running Times

Order        Common Name
O(1)         Constant
O(log n)     Logarithmic
O(n)         Linear
O(n log n)   Log Linear
O(n^2)       Quadratic
O(2^n)       Exponential

Goals of Module 8

- Understand how linear and binary search work.
- Be able to compare running times of algorithms for searching and sorting.
- Understand how insertion sort, selection sort, and merge sort work.

Before we begin the next module, please

Read Think Python, chapters 11, 15, 16, 17.
