Sorting Textbook Section 5.3
CompSci 107 Sorting Textbook section 5.3

One of the most common activities on computers
• Any collection of items can be sorted
• Being sorted helps us work with the information
    • reference books, catalogues, filing systems
    • e.g. binary search algorithm
• We need a comparison operator
    • Numbers are common
    • Unicode or ASCII values can be used to sort words
    • is ‘a’ (0x0061) less than or greater than ‘A’ (0x0041)?
    • what about spaces? (0x0020)

Why bother?
• There is an excellent built-in sort function in Python
    • sorted
    • takes any iterable and returns a sorted list of the data
• also sort methods for many types
    • sorted(a) returns a new list; a is unchanged
    • a.sort() sorts a in place

We bother because it gives us a greater understanding of how our programs work, in particular an idea of the amount of processing going on when we call such a function. It provides good examples of Big O values. And it is good for us :) it builds character.

Also as Wikipedia says: “useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.” Timsort is used in Python.

How do we sort?
• Sort the towers of blocks, taking care to think about how you are doing it.

Simple but slow sorts
• Bubble sort
• Selection sort
• Insertion sort
• Sometimes we sort from smallest to largest, sometimes the other way (it is trivial to change the algorithms to work the other way)

Bubble Sort
• We generally regard the Bubble sort as the worst of all sorts, but there are much worse ones, e.g. bogosort http://en.wikipedia.org/wiki/Bogosort
• It is simple to understand and implement.
• Go through the list of n values comparing adjacent values
    • swap them if the left-hand one is greater
    • this way the largest value bubbles to the top (right)
• Repeat on the first n-1 values (the largest is now in its final position)
    • eventually n is 1 and all elements are sorted

Bubble Sort Code

    def bubble_sort(a_list):
        # Each pass moves the largest remaining value to index pass_num.
        for pass_num in range(len(a_list) - 1, 0, -1):
            for i in range(pass_num):
                if a_list[i] > a_list[i + 1]:
                    # Swap the out-of-order neighbours.
                    a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]

N.B. The use of multiple assignment

Big O
• What order is bubble sort?
• We go through the list n-1 times (for a list of size n)
• comparisons
    • first n-1, then n-2, … then 1
    • so 1 + 2 + … + (n-2) + (n-1) = (n^2 - n)/2
    • i.e. O(n^2)
• swaps are similar but only happen half the time on average

Speeding things up
• Because we repeatedly go through the list, only swapping values when pairs are out of order, we can easily detect when the list is already sorted: a pass with no swaps means we can stop (hopefully before all of the iterations have been carried out)
• So we get the short bubble sort

Short Bubble Sort Code

    def short_bubble_sort(a_list):
        exchanges = True
        pass_num = len(a_list) - 1
        # Stop early as soon as a whole pass makes no exchanges.
        while pass_num > 0 and exchanges:
            exchanges = False
            for i in range(pass_num):
                if a_list[i] > a_list[i + 1]:
                    exchanges = True
                    a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
            pass_num = pass_num - 1

    # see bubble.py

Don’t do the swaps
• Even though the Big O analysis is going to be the same (because the number of comparisons is the same) we can speed up the sort by not swapping elements unnecessarily.
• This gives us the selection sort
• We still go through the list n-1 times
    • each time we look for the next largest value
    • we place that value at the end of the reduced list, which is its proper location

Selection Sort Code

    def selection_sort(a_list):
        # fill_slot is the position that receives the largest remaining value.
        for fill_slot in range(len(a_list) - 1, 0, -1):
            pos_of_max = 0
            for location in range(1, fill_slot + 1):
                if a_list[location] > a_list[pos_of_max]:
                    pos_of_max = location
            # One swap per pass, after the comparisons are done.
            a_list[fill_slot], a_list[pos_of_max] = a_list[pos_of_max], a_list[fill_slot]

Do fewer comparisons
• If we change the sorted part of the list to be a sorted subset rather than the sorted greatest values we can reduce the number of comparisons we make.
• We insert into the sorted sublist, stopping in the right place.

Sorted greatest values at the end (selection sort):

    Original   17 32 95  4 18 72
    pass 1     17 32 72  4 18 95
    pass 3     17  4 18 32 72 95

Sorted subset at the front (insertion sort):

    Original   17 32 95  4 18 72
    pass 1     17 32 95  4 18 72
    pass 3      4 17 32 95 18 72

Insertion Sort
• This gives us the insertion sort
• The first element is a sorted list of one.
• We add one element at a time into its place in the growing sorted list.

Insertion Sort Code

    def insertion_sort(a_list):
        for index in range(1, len(a_list)):
            current_value = a_list[index]
            position = index
            # Shift larger values right until the slot for current_value is found.
            while position > 0 and a_list[position - 1] > current_value:
                a_list[position] = a_list[position - 1]
                position -= 1
            a_list[position] = current_value

Big O
• Insertion sort does about half as many comparisons on average as selection sort.
• But it does more moves.
• Still O(n^2)
• Better with almost sorted lists.
• Selection is better if writes are more costly than reads.

Variations on Insertion
• For small lists insertion and selection are very fast because they are so simple. They are no good for large lists.

    n            Bubble (ms)   Insertion (ms)
    10,000               160               14
    100,000           16,000            1,000
    200,000           65,000            3,900
    1,000,000                          98,000
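
The comparison claims in the Big O slides can be checked by counting comparisons directly. The following is a minimal sketch, not code from the slides: the `count_bubble`, `count_selection` and `count_insertion` names, and the choice to return a `(sorted_list, comparisons)` pair, are assumptions made for illustration. Each function follows the corresponding slide code but works on a copy of its input.

```python
# Hypothetical helpers (not part of the slides): each one sorts a copy of
# the input and returns (sorted_list, number_of_element_comparisons).

def count_bubble(data):
    a = list(data)
    comparisons = 0
    for pass_num in range(len(a) - 1, 0, -1):
        for i in range(pass_num):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a, comparisons


def count_selection(data):
    a = list(data)
    comparisons = 0
    for fill_slot in range(len(a) - 1, 0, -1):
        pos_of_max = 0
        for location in range(1, fill_slot + 1):
            comparisons += 1
            if a[location] > a[pos_of_max]:
                pos_of_max = location
        a[fill_slot], a[pos_of_max] = a[pos_of_max], a[fill_slot]
    return a, comparisons


def count_insertion(data):
    a = list(data)
    comparisons = 0
    for index in range(1, len(a)):
        current_value = a[index]
        position = index
        while position > 0:
            comparisons += 1  # compare with the next value in the sorted part
            if a[position - 1] <= current_value:
                break  # found the right place, stop early
            a[position] = a[position - 1]
            position -= 1
        a[position] = current_value
    return a, comparisons


values = [17, 32, 95, 4, 18, 72]  # the six values used in the slides
for name, sorter in [("bubble", count_bubble),
                     ("selection", count_selection),
                     ("insertion", count_insertion)]:
    result, comparisons = sorter(values)
    print(name, result, comparisons)
```

On these six values bubble and selection sort each make 15 comparisons, which is (n^2 - n)/2 for n = 6, while insertion sort makes 10 because it stops as soon as it finds the right place. On a list that is already sorted, insertion sort makes only n-1 comparisons, which is why it does better with almost sorted lists.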