
CSCI S-111 Section Notes

Unit 7, Section 4

1. Quicksort

Quicksort uses a recursive "divide-and-conquer" approach to achieve a much better average case than the O(n²) algorithms we've seen thus far. The elements to be sorted are partitioned into two subarrays such that all the elements in the "left" subarray are less than or equal to all the elements in the "right" subarray. The subarrays themselves are then recursively partitioned, until we get down to subarrays containing just a single element (which can't be partitioned further). Once all the recursive invocations reach the base case of a single-element subarray, the entire array is sorted.
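To make the overall structure concrete, here is a minimal sketch of quicksort in Java. It uses a Hoare-style partition around the middle element; the names and details are ours and may differ slightly from the lecture version.

    // Sort arr[first..last] by partitioning around the middle element,
    // then recursively sorting the two subarrays.
    public static void quickSort(int[] arr, int first, int last) {
        if (first >= last) {
            return;                          // base case: 0 or 1 elements
        }
        int pivot = arr[(first + last) / 2]; // middle element as the pivot
        int i = first - 1;
        int j = last + 1;
        while (true) {
            do { i++; } while (arr[i] < pivot);  // scan right for a value >= pivot
            do { j--; } while (arr[j] > pivot);  // scan left for a value <= pivot
            if (i >= j) {
                break;                       // the two scans have met or crossed
            }
            int temp = arr[i];               // swap the out-of-place pair
            arr[i] = arr[j];
            arr[j] = temp;
        }
        quickSort(arr, first, j);            // left subarray: values <= pivot
        quickSort(arr, j + 1, last);         // right subarray: values >= pivot
    }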

Recall that partitioning is accomplished by choosing a pivot value, and repeatedly swapping elements such that the left subarray contains only values that are <= the pivot, and the right subarray contains only values that are >= the pivot. Let's trace through quicksort on the following array, assuming that we're using the middle element as the pivot value:

[Worksheet diagram: a blank trace of quicksort on the array | 7 | 39 | 20 | 11 | 16 | 5 |, with spaces to record the pivot and swaps at each step, and the left and right subarrays produced by each recursive partition, down to the single-element base cases.]


What is the time complexity of quicksort in the best case? In the worst case? In the average case?

How would you characterize the performance of quicksort in the example we just stepped through? Was it an example of best-case, worst-case, or average-case performance? Why?

Optional: How many calls to the partition() method does quicksort perform as a function of the input size n? Note that this will not be the same as the overall time complexity for the algorithm, which is based on the number of comparisons and moves rather than partition() calls.


2. Mergesort

Mergesort uses the same overall recursive "divide-and-conquer" strategy as quicksort. But whereas quicksort does all the work of sorting the array in the process of dividing (i.e., partitioning) it, mergesort performs no sorting during the division phase of the algorithm; instead, it does all the work of sorting the array in the process of recombining (i.e., merging) the subarrays formed during the division phase.
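Here is a minimal sketch of this strategy in Java (again, the names and details are ours, not necessarily the lecture version). Note the temp array: the merge step needs scratch space outside the array being sorted.

    // Wrapper: allocate the scratch array once, then sort the whole array.
    public static void mergeSort(int[] arr) {
        mergeSort(arr, new int[arr.length], 0, arr.length - 1);
    }

    // Recursively sort arr[first..last], using temp for the merge step.
    private static void mergeSort(int[] arr, int[] temp, int first, int last) {
        if (first >= last) {
            return;                              // base case: 0 or 1 elements
        }
        int mid = (first + last) / 2;
        mergeSort(arr, temp, first, mid);        // sort the left half
        mergeSort(arr, temp, mid + 1, last);     // sort the right half

        // Merge the two sorted halves into temp, then copy back into arr.
        int i = first, j = mid + 1, k = first;
        while (i <= mid && j <= last) {
            temp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
        }
        while (i <= mid)  { temp[k++] = arr[i++]; }
        while (j <= last) { temp[k++] = arr[j++]; }
        for (k = first; k <= last; k++) {
            arr[k] = temp[k];
        }
    }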

Let's trace through mergesort on the following array, paying particular attention to the order in which the recursive calls happen (they do not happen in parallel, despite what this diagram implies):

[Worksheet diagram: a blank trace of mergesort on the array | 7 | 39 | 20 | 11 | 16 | 5 | 9 | 28 |, showing the repeated splits down to single-element subarrays, followed by the merges that recombine them into a fully sorted array.]

What major advantage does mergesort have over quicksort with respect to time complexity?

What major disadvantage does mergesort have compared to quicksort with respect to space efficiency?


3. Deriving an expression for runtime from experimental data

In Problem 2, we will ask you to take some experimental data and infer the big-O time complexity from it. Here is an example of this process.

The main method of the SortCount class we've given you runs the various algorithms on either a random or an almost-sorted array, and reports the number of comparisons and moves involved in each algorithm.

Here is sample output from SortCount for one of the sorting algorithms on random arrays:

      n     comparisons from three runs     moves from three runs
    --------------------------------------------------------------
     100    4950, 4950, 4950                297, 297, 297
     200    19900, 19900, 19900             597, 597, 597
     800    319600, 319600, 319600          2397, 2397, 2397
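One quick way to start is to compare how the counts grow as n grows:

    n doubles    (100 -> 200):  comparisons 4950  -> 19900    (~ x4)
                                moves       297   -> 597      (~ x2)
    n quadruples (200 -> 800):  comparisons 19900 -> 319600   (~ x16)
                                moves       597   -> 2397     (~ x4)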

What can we say about the number of comparisons and moves for different inputs?

What can we infer about the comparisons and moves needed by this algorithm?

Given these observations, which sorting algorithm do we think this is?

Does it make sense that the number of comparisons and moves is the same regardless of the sortedness of the input?


However, this is not the case for all of our sorting algorithms. What are some examples of sorting algorithms whose comparisons vary with the input?

Here is some sample output from InsertionSort:

      n     comparisons
    -------------------------------------------
     100    2926, 2483, 2753      (avg = 2721)
     200    10815, 10098, 10239   (avg = 10384)

What do we notice about the number of comparisons?
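(As a rough check using the averages above: 10384 / 2721 ≈ 3.8, so doubling n roughly quadruples the average number of comparisons.)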


4. Radix sort

Radix sort is a stable, distributive sorting algorithm that works by processing the individual digits of each element according to their significance. The version we worked with in lecture begins by evaluating the least-significant digit of each element in the array (i.e., the rightmost digit of each element), and moves from left to right over the array of elements at each stage, depositing each value into a 'bucket' according to the value of its least-significant digit. It then repeats this process for each successive digit, stopping once it has evaluated each element according to the most-significant position of the largest element in the array.

We can break down radix sort into the following procedure:

− Start at the beginning of our array and iterate over each element.
− For each element, place it into a "bucket" according to the value of its least-significant digit, but otherwise maintain the order of the elements (this is achieved by moving from left to right over the array).
− When you reach the end of the array, repeat the process for the next most-significant digit.
− Stop when you've evaluated all the elements according to the most-significant position of the largest element in the array.
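Here is a minimal sketch of this procedure in Java for non-negative integers, using an ArrayList for each bucket (the lecture version may be organized differently):

    import java.util.ArrayList;

    public static void radixSort(int[] arr) {
        // Find the largest element so we know how many passes to make.
        int max = 0;
        for (int val : arr) {
            max = Math.max(max, val);
        }

        // One bucket for each possible digit value (k = 10).
        ArrayList<ArrayList<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            buckets.add(new ArrayList<Integer>());
        }

        // One pass per digit position, least-significant first.
        for (int place = 1; max / place > 0; place *= 10) {
            // Distribute: go left to right so that elements with equal
            // digits keep their relative order (this keeps the sort stable).
            for (int val : arr) {
                buckets.get((val / place) % 10).add(val);
            }
            // Collect: read the buckets back in digit order.
            int i = 0;
            for (ArrayList<Integer> bucket : buckets) {
                for (int val : bucket) {
                    arr[i++] = val;
                }
                bucket.clear();
            }
        }
    }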

An example:

Original Unsorted Array, n = 12:

41 326 18 1 117 56 86 7 14 221 19 30

1st Pass: 'Buckets' for the 1's digit:

    Sig. Digit |  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
    Elements   |     |     |     |     |     |     |     |     |     |     |

2nd Pass: 'Buckets' for the 10's digit:

    Sig. Digit |  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
    Elements   |     |     |     |     |     |     |     |     |     |     |

3rd Pass: 'Buckets' for the 100's digit:

    Sig. Digit |  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
    Elements   |     |     |     |     |     |     |     |     |     |     |


Questions on radix sort:

Thinking about the example above and keeping in mind that radix sort processes its data as a sequence of m quantities with k possible values, what do m and k represent in the example above?

How many operations did our example above require? How many operations would the example above have required if the elements were already in sorted order? If they were in reverse order?

Which sorting method would have been more efficient for sorting the above array: radix sort, or one of the comparison-based algorithms we've seen?


5. The removeDups() problem from Problem Set 2

This problem asks you to remove duplicates from an already sorted array using an algorithm that requires only O(n) steps. To achieve this, each element can be moved at most once.

For example, the problem set says that if we are given the array:

2 5 5 5 10 12 12

we need to change it to look like this:

2 5 10 12 0 0 0

To get an O(n) algorithm, the 10 and 12 should be moved only once.

This problem is somewhat like insertion sort, in that you want to consider the elements from left to right and potentially "insert" each element arr[i] somewhere in the subarray that goes from arr[0] to arr[i].

However, the problem is different from insertion sort in that you don't need to perform a backwards pass in which you figure out where an element should go while shifting other elements to the right.

Instead, you should be able to use an index to keep track of where the next "insertion" (if any) should occur. Also, the "insertions" are really just moves, in which an element arr[i] is copied into a position originally occupied by another element, without sliding other elements over.

Let's consider how we would process the example from the problem set:

2 5 5 5 10 12 12

We consider the elements of the array from left to right, beginning with element 1 (the first 5).

- element 1 (the first 5): does it need to move? [no, because there are no duplicates to its left]
- element 2 (the second 5): does it need to move? [no, because it's a duplicate]
- element 3 (the third 5): does it need to move? [no, because it's a dup]
- element 4 (the 10): does it need to move? [yes, because it's not a dup, and there are dups to its left]
- element 5 (the first 12): does it need to move? [yes, because it's not a dup, and there are dups to its left]


- element 6 (the second 12): does it need to move? [no, because it's a dup]
- we've reached the end of the array
- we conclude by filling the elements that are now unused with 0s

Here's another example:

5 8 10 15 15 18 23 30 30 35

- element 1 (the 8): does it need to move?
- element 2 (the 10): does it need to move?
- element 3 (the first 15): does it need to move?
- element 4 (the second 15): does it need to move?
- element 5 (the 18): does it need to move?
- element 6 (the 23): does it need to move?
- element 7 (the first 30): does it need to move?
- element 8 (the second 30): does it need to move?
- element 9 (the 35): does it need to move?
- we've reached the end of the array
- we conclude by filling the elements that are now unused with 0s

Given these examples, here are some questions to ask yourself as you design your implementation of this method:

1) What tests do you need to perform to determine whether an element should be moved?
2) How will you keep track of the position where the next element to be moved should go?
3) When should this position be updated?
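Once you've thought through those questions, here is one possible shape for the logic, sketched in Java. This is our own sketch under the assumptions above (a sorted int array, with unused slots filled with 0s), not necessarily the intended solution.

    // Remove duplicates from an already sorted array in one left-to-right
    // pass, filling the now-unused positions at the end with 0s.
    public static void removeDups(int[] arr) {
        if (arr.length == 0) {
            return;
        }
        int next = 1;                      // where the next non-dup belongs
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] != arr[i - 1]) {    // not a duplicate
                arr[next] = arr[i];        // a single move, no shifting
                next++;
            }
        }
        while (next < arr.length) {
            arr[next++] = 0;               // zero out the unused slots
        }
    }

Note that each element is examined once and copied at most once, which is what makes the algorithm O(n).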
