CompSci 107 Sorting Textbook section 5.3

1 One of the most common activities on computers

• Any collection of items can be sorted

• Being sorted helps us work with the information

• e.g. reference books, catalogues, filing systems

• We need a comparison operator

• Numbers are common

• Unicode or ASCII values can be used to sort words

• is ‘a’ (0x00061) less than or greater than ‘A’ (0x00041)?

• what about spaces? (0x00020)
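These questions can be checked directly in Python, where `ord` gives a character's code point and strings are compared code point by code point. The following is a minimal sketch; the word list is made up for illustration.

```python
# Code points of the three characters mentioned above.
print(hex(ord(' ')), hex(ord('A')), hex(ord('a')))  # 0x20 0x41 0x61

# So a space sorts before 'A', and every upper-case letter sorts before 'a'.
words = ["banana", "apple", "Cherry", "apple pie"]  # illustrative data
by_code_point = sorted(words)
print(by_code_point)   # ['Cherry', 'apple', 'apple pie', 'banana']

# Passing key=str.lower is one way to ignore case when comparing.
ignoring_case = sorted(words, key=str.lower)
print(ignoring_case)   # ['apple', 'apple pie', 'banana', 'Cherry']
```

Whether "Cherry" belongs before or after "apple" depends on the application, which is why the comparison operator has to be chosen deliberately.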

2 Why bother?

• There is an excellent built-in sort function in Python

• sorted

• takes any iterable and returns a sorted list of the data

• also sort methods for many types

• sorted(a) returns a new list - a is unchanged

• a.sort() makes a sorted
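The difference between the two calls is easy to see in a short session. This is a small sketch with made-up data; the variable names `b`, `result` and `letters` are only for illustration.

```python
a = [17, 32, 95, 4, 18, 72]   # illustrative data

b = sorted(a)       # builds a new list; a is left alone
print(a)            # [17, 32, 95, 4, 18, 72]
print(b)            # [4, 17, 18, 32, 72, 95]

result = a.sort()   # sorts a in place and returns None
print(a)            # [4, 17, 18, 32, 72, 95]
print(result)       # None

# sorted accepts any iterable, for example a string.
letters = sorted("python")
print(letters)      # ['h', 'n', 'o', 'p', 't', 'y']
```

A common slip is writing `a = a.sort()`, which replaces the list with `None`.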

We bother because it gives us a greater understanding of how our programs work, in particular an idea of the amount of processing going on when we call such a function. It provides good examples of Big O values. And it is good for us :) It builds character. Also, as Wikipedia says: “useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.” Timsort is used in Python.

4 It builds character

5 How do we sort?

• Sort the towers of blocks, taking care to think about how you are doing it.

6 Simple but slow sorts

• Sometimes we sort from smallest to largest, sometimes the other way (it is trivial to change the algorithms to work the other way)

7 Bubble Sort

• We generally regard the Bubble sort as the worst of all sorts - but there are much worse ones e.g. http://en.wikipedia.org/wiki/Bogosort

• It is simple to understand and implement.

• Go through the list of n values comparing adjacent values

• swap them if the left hand one is greater

• this way the largest value bubbles to the top (right)

• Repeat on the remaining n-1 values (the largest is now in its final place)

• eventually n is 1 and all elements are sorted

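9 Bubble Sort Code

def bubble_sort(a_list):
    # Each pass leaves the largest remaining value at index pass_num.
    for pass_num in range(len(a_list) - 1, 0, -1):
        for i in range(pass_num):
            # Swap adjacent values that are out of order.
            if a_list[i] > a_list[i + 1]:
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]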

N.B. The swap uses multiple assignment, so no temporary variable is needed.

10 Big O

• What order is bubble sort?

• We go through the list n-1 times (for a list of size n)

• comparisons

• first n-1, then n-2, … then 1

• so 1 + 2 + … + (n-2) + (n-1) = ½(n² - n)

• i.e. O(n²)

• swaps are similar but only happen half the time on average
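The comparison count can be checked empirically. The sketch below instruments the bubble sort with a counter; `count_bubble_comparisons` is a hypothetical helper written for this check, and it sorts a copy so the caller's list is untouched.

```python
def count_bubble_comparisons(a_list):
    """Bubble sort a copy of a_list and return the number of comparisons made."""
    items = list(a_list)
    comparisons = 0
    for pass_num in range(len(items) - 1, 0, -1):
        for i in range(pass_num):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return comparisons


for n in (1, 2, 10, 100):
    data = list(range(n, 0, -1))   # reversed input; the count does not depend on order
    expected = (n * n - n) // 2    # the closed form from the sum above
    print(n, count_bubble_comparisons(data), expected)
```

The two printed counts agree for every n, whatever the initial order of the data, because the plain bubble sort never stops early.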

11 Speeding things up

• Because each pass swaps only pairs that are out of order, a pass with no swaps means the list is now sorted, so we can stop early (hopefully before all of the passes have been carried out)

• So we get the short bubble sort

12 Short Bubble Sort Code

def short_bubble_sort(a_list):
    exchanges = True
    pass_num = len(a_list) - 1
    # Stop as soon as a whole pass makes no exchanges.
    while pass_num > 0 and exchanges:
        exchanges = False
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                exchanges = True
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
        pass_num = pass_num - 1

# see bubble.py
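To see how much the early exit saves, the passes can be counted. This is a sketch only: `short_bubble_passes` is a hypothetical variant of the short bubble sort that works on a copy and also reports how many passes it made.

```python
def short_bubble_passes(a_list):
    """Short bubble sort a copy of a_list; return (sorted copy, passes made)."""
    items = list(a_list)
    passes = 0
    exchanges = True
    pass_num = len(items) - 1
    while pass_num > 0 and exchanges:
        exchanges = False
        passes += 1
        for i in range(pass_num):
            if items[i] > items[i + 1]:
                exchanges = True
                items[i], items[i + 1] = items[i + 1], items[i]
        pass_num -= 1
    return items, passes


print(short_bubble_passes([1, 2, 3, 4, 5]))   # ([1, 2, 3, 4, 5], 1)  already sorted
print(short_bubble_passes([2, 1, 3, 4, 5]))   # ([1, 2, 3, 4, 5], 2)  one pair out of order
print(short_bubble_passes([5, 4, 3, 2, 1]))   # ([1, 2, 3, 4, 5], 4)  reversed, no saving
```

An already sorted list costs a single pass of n-1 comparisons, while a reversed list still needs all n-1 passes.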

13 Don’t do the swaps

• Even though the Big O analysis is going to be the same (because the number of comparisons is the same) we can speed up the sort by not swapping elements unnecessarily.

• This gives us the selection sort

• We still go through the list n-1 times

• each time we look for the largest remaining value

• we swap that value into its proper location at the end of the reduced list

15 Selection Sort Code

def selection_sort(a_list):
    for fill_slot in range(len(a_list) - 1, 0, -1):
        # Find the position of the largest value in a_list[0..fill_slot].
        pos_of_max = 0
        for location in range(1, fill_slot + 1):
            if a_list[location] > a_list[pos_of_max]:
                pos_of_max = location
        # One swap per pass puts that value in its final place.
        a_list[fill_slot], a_list[pos_of_max] = a_list[pos_of_max], a_list[fill_slot]
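The saving in swaps can be measured by counting them. The sketch below uses two hypothetical instrumented helpers, `bubble_swaps` and `selection_swaps`, which sort a copy and return the swap count alongside the result.

```python
def bubble_swaps(a_list):
    """Bubble sort a copy of a_list; return (sorted copy, swaps made)."""
    items = list(a_list)
    swaps = 0
    for pass_num in range(len(items) - 1, 0, -1):
        for i in range(pass_num):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
    return items, swaps


def selection_swaps(a_list):
    """Selection sort a copy of a_list; return (sorted copy, swaps made)."""
    items = list(a_list)
    swaps = 0
    for fill_slot in range(len(items) - 1, 0, -1):
        pos_of_max = 0
        for location in range(1, fill_slot + 1):
            if items[location] > items[pos_of_max]:
                pos_of_max = location
        items[fill_slot], items[pos_of_max] = items[pos_of_max], items[fill_slot]
        swaps += 1   # counted even when a value is swapped with itself
    return items, swaps


data = [5, 4, 3, 2, 1]          # worst case for bubble sort
print(bubble_swaps(data))       # ([1, 2, 3, 4, 5], 10)
print(selection_swaps(data))    # ([1, 2, 3, 4, 5], 4)
```

Selection sort never makes more than n-1 swaps, whereas bubble sort can make as many swaps as comparisons.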

16 Do fewer comparisons

• If we change the sorted part of the list to be a sorted subset rather than the sorted greatest values we can reduce the number of comparisons we make.

Sorted greatest values at the end:
Original  17 32 95  4 18 72
pass 1    17 32 72  4 18 95
pass 3    17  4 18 32 72 95

• We insert into the sorted sublist, stopping in the right place.

Sorted subset at the front:
Original  17 32 95  4 18 72
pass 1    17 32 95  4 18 72
pass 3     4 17 32 95 18 72
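The pass 1 and pass 3 rows for the list 17 32 95 4 18 72 can be reproduced by recording the list after every pass. This is a sketch under the assumption that the first set of rows comes from selection sort and the second from insertion sort; `selection_passes` and `insertion_passes` are hypothetical tracing helpers.

```python
def selection_passes(a_list):
    """Return the list state after each pass of selection sort."""
    items = list(a_list)
    states = []
    for fill_slot in range(len(items) - 1, 0, -1):
        pos_of_max = 0
        for location in range(1, fill_slot + 1):
            if items[location] > items[pos_of_max]:
                pos_of_max = location
        items[fill_slot], items[pos_of_max] = items[pos_of_max], items[fill_slot]
        states.append(list(items))
    return states


def insertion_passes(a_list):
    """Return the list state after each pass of insertion sort."""
    items = list(a_list)
    states = []
    for index in range(1, len(items)):
        current_value = items[index]
        position = index
        while position > 0 and items[position - 1] > current_value:
            items[position] = items[position - 1]
            position -= 1
        items[position] = current_value
        states.append(list(items))
    return states


original = [17, 32, 95, 4, 18, 72]
sel = selection_passes(original)
ins = insertion_passes(original)
print(sel[0], sel[2])   # [17, 32, 72, 4, 18, 95] [17, 4, 18, 32, 72, 95]
print(ins[0], ins[2])   # [17, 32, 95, 4, 18, 72] [4, 17, 32, 95, 18, 72]
```

In the first trace the sorted part is the largest values at the right; in the second it is a sorted subset at the left that is not yet in its final position.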

17 Insertion Sort

• This gives us the insertion sort

• The first element is a sorted list of one.

• We add one element at a time into its place in the growing sorted list.

19 Insertion Sort Code

def insertion_sort(a_list):
    for index in range(1, len(a_list)):
        current_value = a_list[index]
        position = index
        # Shift larger values one place right to open a gap.
        while position > 0 and a_list[position - 1] > current_value:
            a_list[position] = a_list[position - 1]
            position -= 1
        # Drop the value into the gap.
        a_list[position] = current_value

20 Big O

• On average, insertion sort does about half as many comparisons as selection sort.

• But it does more moves.

• Still O(n²)

• Better with almost sorted lists.

• Selection is better if writes are more costly than reads.
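These claims about comparisons can be checked by counting. The sketch below is illustrative only: `insertion_comparisons` is a hypothetical instrumented insertion sort, the selection sort count is taken from the ½(n² - n) formula, and the seed and list size are arbitrary choices.

```python
import random


def insertion_comparisons(a_list):
    """Insertion sort a copy of a_list; return the number of value comparisons."""
    items = list(a_list)
    comparisons = 0
    for index in range(1, len(items)):
        current_value = items[index]
        position = index
        while position > 0:
            comparisons += 1
            if items[position - 1] <= current_value:
                break                       # found the right place, stop early
            items[position] = items[position - 1]
            position -= 1
        items[position] = current_value
    return comparisons


def selection_comparisons(n):
    """Selection sort always makes (n*n - n) / 2 comparisons for n items."""
    return (n * n - n) // 2


n = 1000
rng = random.Random(107)                    # fixed seed, arbitrary choice
shuffled = rng.sample(range(n), n)
already_sorted = list(range(n))

print("selection, any order:   ", selection_comparisons(n))
print("insertion, shuffled:    ", insertion_comparisons(shuffled))
print("insertion, sorted input:", insertion_comparisons(already_sorted))
```

On shuffled data the insertion count should come out near half the selection count, and on sorted input it drops to n-1.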

21 Variations on Insertion

• For small lists insertion and selection are very fast because they are so simple. They are no good for large lists.

n            Bubble (ms)   Insertion (ms)
10,000       160           14
100,000      16,000        1,000
200,000      65,000        3,900
1,000,000    -             98,000
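Timings like these can be gathered with a small harness. The following is a sketch, not the program that produced the table: `time_sort` is a hypothetical helper, the list sizes are kept small so it finishes quickly, and the numbers will differ from machine to machine.

```python
import random
import timeit


def bubble_sort(a_list):
    for pass_num in range(len(a_list) - 1, 0, -1):
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]


def insertion_sort(a_list):
    for index in range(1, len(a_list)):
        current_value = a_list[index]
        position = index
        while position > 0 and a_list[position - 1] > current_value:
            a_list[position] = a_list[position - 1]
            position -= 1
        a_list[position] = current_value


def time_sort(sort_function, n, seed=107):
    """Return the milliseconds sort_function takes on n shuffled integers."""
    data = random.Random(seed).sample(range(n), n)
    start = timeit.default_timer()
    sort_function(data)
    elapsed = timeit.default_timer() - start
    assert data == sorted(data)             # sanity check on the result
    return elapsed * 1000


print("n     bubble(ms)  insertion(ms)")
for n in (500, 1000, 2000):
    print(n, round(time_sort(bubble_sort, n)), round(time_sort(insertion_sort, n)))
```

Doubling n should roughly quadruple both times, which is the O(n²) behaviour the table shows.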
