Sorting

Chapter 8


Overview

• Sorting is the process of ordering a set of items
• Like data structures, sorting algorithms are evaluated based on
  – Their speed
  – The amount of memory overhead they require
  – The programming effort required to implement them
• Classic sorting algorithms include the Binary Tree sort, Bubble sort, Heap sort, Merge sort, and Quicksort
• Each of these sorts has its merits

Ordering Items

• Most common orderings are ascending and descending order
• In the context of data structures
  – Nodes are sorted based on the contents of a field
    • To produce sorted output listings
    • To improve the speed of their data structure
      – E.g., a fetch from a sorted array is O(log2n)
  – But sorting takes time, therefore we need
    • Fast algorithms, O(nlog2n)
    • Good strategies
      – Store sorted (mostly fetch operations)
      – Sort before output (many inserts, deletes, and updates)

Speed

• All sorting algorithms repeatedly
  – Compare two items
  – Swap them if they are out of order
• Therefore the speed of a sorting algorithm depends on the
  – Number of comparisons it makes
  – Number of data swaps it makes
• When sorting nodes, shallow swaps save time

Number of Comparisons

• Sort effort, SE, is used as a metric
  – Defined as: the number of comparisons performed to sort n items
  – There is a theoretical minimum: SEmin = nlog2n
  – SE for some algorithms approaches SEmin
• Contribution to algorithm speed is SE * (f * tf + tc)
  – f: fetches per comparison; tf: time to fetch an item; tc: time per comparison
  – To determine SE, the algorithm is analyzed
• Even if SE = SEmin, sorting large data sets is time consuming

Determination of Sort Effort

• Look for the lines comparing the items to be sorted
• Determine how many times they execute

• For example (n items to be sorted):

1. for(int i = 0; i < n; i++)
2. {  for(int j = 0; j < n; j++)
3.    {  if( items[j] > items[i] )
4.       {  temp = items[j];
5.          items[j] = items[i];
6.          items[i] = temp;
7.       } // end of if statement
8.    } // end of inner loop
9. } // end of outer loop

  – Line 3 compares items
  – It executes n times as part of the inner loop
  – The inner loop executes n times as part of the outer loop
  – Therefore, Line 3 executes n * n times
  – Sort Effort = n^2

Time to Sort Large Data Sets

[Figure: minimum time to sort n items, in minutes, versus the number of items to be sorted, n (in millions, 0 to 400). Assumes no swaps, 1 fetch per comparison, SE = SEmin, and 41 nanoseconds for a data fetch and comparison.]

• Even with no swaps and SE = SEmin, sorting 300 million social security records takes 5.5 minutes
• For the assumed fetch and comparison time of 41 nanoseconds, an n^2 algorithm would take 11.4 hours
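As a rough check of these figures (assuming, since the slide does not say, that the n^2 figure refers to sorting about one million items):

  SEmin = nlog2n = 3×10^8 * log2(3×10^8) ≈ 3×10^8 * 28.2 ≈ 8.5×10^9 comparisons
  8.5×10^9 * 41×10^-9 s ≈ 350 s ≈ 6 minutes, the order of the 5.5-minute figure above
  n^2 = (10^6)^2 = 10^12 comparisons; 10^12 * 41×10^-9 s ≈ 41,000 s ≈ 11.4 hours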

Determining the Number of Swaps

• Look for the lines that swap the items to be sorted
• Determine how many times they execute

• For example (the same code as before, sorting n items):

1. for(int i = 0; i < n; i++)
2. {  for(int j = 0; j < n; j++)
3.    {  if( items[j] > items[i] )
4.       {  temp = items[j];
5.          items[j] = items[i];
6.          items[i] = temp;
7.       } // end of if statement
8.    } // end of inner loop
9. } // end of outer loop

  – Lines 4-6 swap items
  – If ½ of the comparisons result in swaps, they execute n/2 times as part of the inner loop
  – The inner loop executes n times as part of the outer loop
  – Therefore, Lines 4-6 execute ½ n * n times
  – Number of swaps = ½ n^2

Shallow Swaps

• Two ways to swap nodes
  – Deep swap: deep copies the nodes
       temp = items[j].deepCopy();     // copy all node fields
       items[j] = items[i].deepCopy();
       items[i] = temp;
  – Shallow swap: shallow copies the nodes
       temp = items[j];                // copy 1 reference variable
       items[j] = items[i];
       items[i] = temp;
• Shallow swap is faster because a deep copy copies all node fields

Graphical Depiction of Shallow and Deep Swaps

[Figure: two arrays of references, items[0..n-1] and node[0..n-1], each element pointing to a node object (object A, object C, object B, …, object X). A deep swap copies the node objects themselves; a shallow swap copies only the reference variables.]

Time consuming deep swap:              Rapid shallow swap:
   temp = items[1].deepCopy();            temp = items[1];
   items[1] = items[2].deepCopy();        items[1] = items[2];
   items[2] = temp;                       items[2] = temp;

Sorting Algorithm Overhead

• Can be minimal: 4 bytes
  – The variable temp used in the swap

• Can be significant: 8n bytes
  – Two reference variables per item sorted

• The amount required depends on the sorting algorithm

Binary Tree Sort

• Items to be sorted are inserted into a Binary Search Tree
  – An LNR traversal visits the items in ascending order
  – An RNL traversal visits the items in descending order
• Its algorithm is basically the Binary Search Tree's Insert algorithm
• Its sort effort can approach the minimum or be O(n^2)
• It is one of the few algorithms not to perform swaps
• Its overhead is high

Binary Tree Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                   | memory overhead | bytes | comments                                                               |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead. |

Binary Tree Sort Algorithm

1. The first item becomes the root node.
2. For any subsequent item, consider the root node to be the root of a subtree, and start at the root of this subtree.
3. Compare the item to the root of the subtree.
   3.1 If the new item is less than the subtree's root, then the new subtree is the root's left subtree,
   3.2 Else the new subtree is the root's right subtree.
4. Repeat step 3 until the new subtree is empty. Position the new item as the root of this empty subtree.
5. Repeat steps 2, 3, and 4 for each item to be sorted.

An LNR traversal outputs the items in ascending order.
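The algorithm maps naturally onto a small amount of Java. Below is a minimal sketch (class and method names are mine, not the textbook's) that sorts the example data set used on the next slide:

// A minimal sketch of the Binary Tree sort: insert every item into a
// binary search tree, then output the items with an LNR traversal.
class TreeSortSketch {
    static class Node {
        int item;
        Node left, right;          // the two references: 8 bytes of overhead per item
        Node(int item) { this.item = item; }
    }

    static Node insert(Node root, int item) {
        if (root == null) return new Node(item);   // empty subtree: item becomes its root
        if (item < root.item)
            root.left = insert(root.left, item);   // less: descend into the left subtree
        else
            root.right = insert(root.right, item); // otherwise: the right subtree
        return root;
    }

    static void lnr(Node root) {                   // LNR (in-order) traversal: ascending order
        if (root == null) return;
        lnr(root.left);
        System.out.print(root.item + " ");
        lnr(root.right);
    }

    public static void main(String[] args) {
        int[] items = {50, 40, 47, 63, 55, 43, 70, 80, 35, 68};  // the slides' example data
        Node root = null;
        for (int item : items) root = insert(root, item);
        lnr(root);   // prints 35 40 43 47 50 55 63 68 70 80
    }
}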

Binary Tree Sort Algorithm, Graphical Representation

[Figure: the binary search tree grown step by step as the items 50, 40, 47, 63, 55, 43, 70, 80, 35, and 68 are inserted. In the finished tree, 50 is the root; 40 and 63 are its children; 35 and 47 sit under 40, with 43 under 47; 55 and 70 sit under 63, with 68 and 80 under 70.]

Sort Effort of the Binary Tree Sort

• Low when the items to be sorted are in random order (or close to randomized)
  – The tree is balanced
    [e.g., 5 at the root; 3 and 7 as its children; 2, 4, 6, and 9 as leaves]
  – Zero comparisons for the first item; < log2(n+1) comparisons per item for all subsequent items; O(nlog2n)
• High when the items to be sorted are already sorted (or close to sorted)
  – The tree is highly skewed
    [e.g., 2 at the root, 4 below it, then 7, then 9]
  – Zero comparisons for the first item, 1 for the second, 2 for the third, … n-1 for the nth; O(n^2)

Overhead of the Binary Tree Sort

• Two reference variables per item (4 bytes each) are required to point to the left and right child
  Overhead = 2 variables * 4 bytes * n items = 8n bytes
• The array-based representation of a binary tree is not much better
  – If the items were randomized: 4n bytes
  – But a sorted set would require approximately 2^n bytes

Bubble Sort

• Its algorithm is very simple to code
• For most data sets
  – Its sort effort is high
  – It performs a high number of swaps: O(n^2)
• For some data sets (almost sorted)
  – It is the best performing algorithm
• Its overhead is very low
  – It only requires two variables of overhead
    • temp for the swaps
    • A Boolean variable to detect an early completion

Bubble Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |

Bubble Sort Algorithm

• The algorithm can make up to n-1 passes
  – Each pass places at least one item in its sorted position
    • The smallest item on pass 1, the 2nd smallest item on pass 2, the 3rd smallest item on pass 3, …
    • Small items bubble up to the top of the array
  – Each pass compares each item to the item above it
    • When not in their proper relative position, they are swapped (flipped)
  – The algorithm can end early if a flip is not made during a pass

Bubble Sort Algorithm, Graphical Representation

[Figure: the array's contents initially and at the end of passes 1-4, showing the items placed in their correct sorted position at the end of each pass.]

[Figure: the items compared, and the swaps made, during each of passes 1-4.]

Bubble Sort Algorithm Pseudocode (to sort n items)

1. itemsSorted = 0;
2. do
3. {  flip = false;                  // begin a pass
4.    for (b = n - 1, t = n - 2; t >= itemsSorted; b--, t--)
5.       if (items[b] < items[t])    // two adjacent elements are not in sorted order
6.       {  temp = items[b];         // switch the two elements
             items[b] = items[t];
             items[t] = temp;
7.          flip = true;
8.       }
9.    itemsSorted++;                 // one more item is in its final spot
10. } while (flip == true && itemsSorted != n - 1);
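Rendered as a runnable Java method (a sketch: variable declarations added, otherwise following the pseudocode line by line):

// A direct Java rendering of the bubble sort pseudocode above.
public static void bubbleSort(int[] items) {
    int n = items.length;
    int itemsSorted = 0;       // items already bubbled into their final spots
    boolean flip;              // did this pass make at least one swap?
    do {
        flip = false;          // begin a pass
        for (int b = n - 1, t = n - 2; t >= itemsSorted; b--, t--) {
            if (items[b] < items[t]) {   // adjacent pair out of order: flip it
                int temp = items[b];
                items[b] = items[t];
                items[t] = temp;
                flip = true;
            }
        }
        itemsSorted++;         // one more item is in its final spot
    } while (flip && itemsSorted != n - 1);
}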

Sort Effort of the Bubble Sort

Comparisons per pass when sorting 7 items:

| pass number | number of comparisons |
| 1           | 6 (= n-1)             |
| 2           | 5 (= n-2)             |
| 3           | 4 (= n-3)             |
| 4           | 3 (= n-4)             |
| 5           | 2 (= n-5)             |
| 6           | 1 (= n-6)             |

• When the items are already sorted, one pass is made
  SE = (n-1); O(n)
• When items are random, n-1 passes are made
  SE = (n-1) + (n-2) + (n-3) + … + 1
     = (n-1) * (n-1+1) / 2
     = (n-1) * n/2; O(n^2)

Swaps Performed by the Bubble Sort

• On average, half the comparisons will require swaps
  – Therefore, number of swaps NS = SE/2
• When the items are already sorted, one pass is made
  SE = (n-1); NS = (n-1)/2; O(n)
• When items are random, n-1 passes are made
  SE = (n-1) * n/2; NS = ((n-1) * n/2)/2; O(n^2)

Heap Sort

• A heap is used in this sort, thus the name Heap Sort
• The heap sort algorithm is simple to code once we understand a heap
• Its overall performance is very good for all data sets
  – Its sort effort approaches the minimum for all data sets: O(nlog2n)
  – It performs fewer swaps than the bubble sort: O(nlog2n)
  – Its overhead is very low, requiring only a temp variable for its swaps

Heap

• A binary tree in which (the value of) each parent is greater than both of its children

[Figure: several example binary trees used to illustrate which trees satisfy the heap condition (every parent greater than both of its children) and which do not.]

Properties of Heaps

[Figure: a heap with 80 at the root; 19 and 77 as its children; 17, 3, 25, and 10 below them; and 2, 7, and 1 as leaves.]

• From the definition of a heap we can conclude
  – The largest node in the heap is the root node
    • Because the root node of the heap is larger than its children, who are larger than their children, etc.
  – All subtrees in a heap are themselves heaps
    • Because the root node of any subtree is larger than its children, who are larger than their children, etc.

Heap Sort Algorithm

• This is a three step algorithm
  – Step 1 is simple
  – Steps 2 and 3 use a sub-algorithm called reheap downward

1. Place all the items to be sorted in a left balanced binary tree.
2. Build the initial heap (i.e., re-position the items in the tree to make it a heap using the reheap downward algorithm).
3. Repeatedly swap and rebuild:
   3.1 Swap the root item into its sorted position.
   3.2 Rebuild the remaining items into a heap using the reheap downward algorithm.

Step 1 of the Heap Sort Algorithm: Place the Items to Be Sorted in a Left Balanced Binary Tree

• Do nothing…
  – Simply "view" the array to be sorted using the 2i+1, 2i+2 rule during the remainder of the sort
  – For a fully populated array, this always yields a left balanced binary tree

[Figure: the array items[0..8] = 25, 17, 36, 2, 3, 100, 1, 19, 7 viewed as a binary tree. items[0] (25) becomes the root node; the "2i+1, 2i+2" rule locates the other elements in the tree: 17 and 36 under 25; 2, 3, 100, and 1 on the next level; 19 and 7 as leaves.]
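In code, the rule is nothing more than index arithmetic; a minimal sketch (the helper names are mine, not the textbook's):

// The "2i+1, 2i+2" rule: an array viewed as a left balanced binary tree.
static int leftChild(int i)  { return 2 * i + 1; }
static int rightChild(int i) { return 2 * i + 2; }
static int parent(int i)     { return (i - 1) / 2; }
// e.g., for items[1] = 17: leftChild(1) = 3 (the item 2), rightChild(1) = 4 (the item 3)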

Step 2 of the Heap Sort: Build the Initial Heap

• For every parent, beginning with the highest-level-right-most parent
  – Continually swap it with its largest child, until it is larger than both of its children or it becomes a leaf
    [Figure: the tree before (25; 17, 36; 2, 3, 100, 1; 19, 7) and after the first parent, 2, is swapped with its largest child, 19]
• To locate the highest-level-right-most parent's index, ip, in an n element array:
  ip = floor(n/2 - 1), e.g., for n = 9, ip = floor(9/2 - 1) = 3

Building a Heap

[Figure: the sequence of trees as the initial heap is built from 25; 17, 36; 2, 3, 100, 1; 19, 7. Working from the highest-level-right-most parent back to the root: 2 is swapped with 19; 36 is swapped with 100; 17 is swapped with 19; and finally 25 is swapped down past 100 and 36, yielding the initial heap 100; 19, 36; 17, 3, 25, 1; 2, 7.]

[Table: the array contents after each comparison and swap during the heap build, ending with the initial heap 100, 19, 36, 17, 3, 25, 1, 2, 7 in elements [0] through [8].]

Reheap Downward Algorithm

• Builds a heap from a tree, or subtree, in which the only item preventing it from being a heap is the root item.

1. P = the root node of a tree whose subtrees are already heaps.
2. if (P has no children) return;
3. if (P > both children) return;
4. Swap P with its greatest child;
5. Repeat steps 2, 3, and 4 for the subtree that P is now the root of.

• Applied to each parent to build the initial heap (a sketch of the full sort follows below)
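Putting the three steps together, a sketch of the complete heap sort in Java (method names are mine; the reheapDown helper follows steps 2-4 above):

// A sketch of the heap sort, with the array viewed as a left balanced
// binary tree via the 2i+1, 2i+2 rule.
public static void heapSort(int[] items) {
    int n = items.length;
    // Step 2: build the initial heap, starting at the
    // highest-level-right-most parent, ip = floor(n/2 - 1)
    for (int p = n / 2 - 1; p >= 0; p--)
        reheapDown(items, p, n);
    // Step 3: repeatedly swap the root into its sorted position at the
    // "end" of the array, then rebuild the remaining items into a heap
    for (int end = n - 1; end > 0; end--) {
        int temp = items[0];          // the root is the largest remaining item
        items[0] = items[end];
        items[end] = temp;
        reheapDown(items, 0, end);    // rebuild the heap of the remaining items
    }
}

// Reheap downward: only the root at index p may violate the heap condition.
private static void reheapDown(int[] items, int p, int size) {
    while (true) {
        int left = 2 * p + 1, right = 2 * p + 2;
        if (left >= size) return;                  // P has no children
        int biggest = left;                        // locate P's greatest child
        if (right < size && items[right] > items[left]) biggest = right;
        if (items[p] >= items[biggest]) return;    // P > both children
        int temp = items[p];                       // swap P with its greatest child
        items[p] = items[biggest];
        items[biggest] = temp;
        p = biggest;                               // repeat for P's new subtree
    }
}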

Step 3 of the Heap Sort: Repeatedly Swap and Rebuild

• Swap the root item into its sorted position
  – From the properties of a heap, the root is the largest item
  – Its "proper position" therefore is at the "end" of the array
  – The end of the array is just above those items not yet repositioned
• Rebuild the "remaining items" into a heap
  – The "remaining items" are those items not yet in their "proper position"
  – We can use the reheap downward algorithm to rebuild the heap
    • Considering only remaining items, the root is the only item preventing the tree from being a heap

[Figure: the heap 100; 19, 36; 17, 3, 25, 1; 2, 7. The root, 100, is swapped with the last item, 7, placing 100 in its sorted position; reheap downward then rebuilds the remaining items into the heap 36; 19, 25; 17, 3, 7, 1; 2.]

Repeat

[Figure: trees (a) through (i) showing the first two passes of step 3. Pass 1: the root 100 is swapped with 7 and the remaining items are rebuilt into the heap 36; 19, 25; 17, 3, 7, 1; 2. Pass 2: the root 36 is swapped with 2 and the remaining items are rebuilt into the heap 25; 19, 7; 17, 3, 2, 1.]

[Table: the array contents after each comparison and swap during passes 1-5, showing 100, 36, 25, 19, and 17 settling into elements [8] down to [4].]

Sort Effort of the Heap Sort

• Step 1, build the left balanced binary tree
  – Nothing is done; SE1 = 0
• Step 2, build the initial heap
  – < n/2 parent items descend 2/3 of the way down < floor(log2n) levels, making two comparisons per level
  – SE2 < n/2 * (2/3) * floor(log2n) * 2 = (2/3)n floor(log2n)
• Step 3, repeatedly swap and rebuild
  – < n items descend < floor(log2n) levels, making two comparisons per level
  – SE3 < n * floor(log2n) * 2 = 2n floor(log2n)
• SE < 0 + (2/3)n floor(log2n) + 2n floor(log2n): O(nlog2n)

Heap Sort Swaps

• When building or rebuilding the heap, two-thirds of the time a parent is less than one of its children
• Therefore, two-thirds of the comparison pairs result in a swap
• There are < ½ * 3nlog2n comparison pairs
• Swaps < 2/3 * ½ * 3nlog2n = nlog2n

Heap Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                   | lowest          | 4     | Fast for all data sets. Low overhead.                                                                |

Merge Sort

• This sort uses a merge process
  – Items from two sorted sub-arrays containing n/2 items are merged (integrated) into one sorted array of n items
• The merge sort algorithm recursively applies the merge process to generate the sorted sub-arrays
  – Its implementation includes a merge process method
• Its sort effort approaches the minimum: O(nlog2n)
• However, it performs a high number of swaps, and its overhead is not low

The Merge Process

1. PA and PB are set pointing to the first items of the two sub-arrays, A and B.
2. The smaller item is copied into another array T.
3. The pointer of the array the item was in is incremented.
4. Steps 2 and 3 are repeated until one sub-array is empty.
5. The remaining items from the other sub-array are copied into T.

Array A: 3 21 50 93        Array B: 30 85 99 107
Array T: 3 21 30 50 85 93 99 107
Order of transfer into T: A supplies items 1, 2, 4, and 6; B supplies items 3, 5, 7, and 8.

Merge Sort Algorithm, Graphical Representation

The sub-arrays always initially contain 1 item each.

[Figure: the merge sort's merge tree for the original data set 93, 3, 50, 21, 85, 107, 30, 99. Single items are merged into sorted pairs (3 93 | 21 50 | 85 107 | 30 99), pairs into sorted quadruples (3 21 50 93 | 30 85 99 107), and quadruples into the sorted array 3 21 30 50 85 93 99 107. The circled numbers indicate the order the items are moved.]

Merge Sort Recursive Implementation

• Base Case
  – One item to be sorted? Do nothing
• Reduced problem
  – Sort half of the array
• General Solution
  – Sort the left half of the array
  – Sort the right half of the array
  – Merge the two halves

Merge Sort Recursive Pseudocode

mergeSort(items, temp, leftIndex, rightIndex)

items is the array to be sorted; leftIndex and rightIndex are the bounds of the sub-array, initially 0 and n-1

1. nItems = rightIndex - leftIndex + 1;
2. if (nItems == 1)                                      // base case, 1 item to be sorted
3.    return;
4. middleIndex = (leftIndex + rightIndex) / 2;
5. mergeSort(items, temp, leftIndex, middleIndex);       // sort the left sublist
6. mergeSort(items, temp, middleIndex + 1, rightIndex);  // sort the right sublist
7. merge(items, temp, leftIndex, middleIndex + 1, rightIndex); // merge the sublists
8. return;

merge is a method that performs the merge process on two sub-arrays of the array items using the array temp

Merge Process Method

public static void merge(int items[], int temp[], int leftIndex, int midIndex,
                         int rightIndex)
{  // midIndex is the first index of the right sub-array
   int leftEnd = midIndex - 1;
   int tempsIndex = leftIndex;
   int nItems = rightIndex - leftIndex + 1;
   while ((leftIndex <= leftEnd) && (midIndex <= rightIndex))
   {  // move the smaller item into temp from the left or right sub-array,
      // incrementing leftIndex or midIndex as an item is moved
      if (items[leftIndex] <= items[midIndex])
         temp[tempsIndex++] = items[leftIndex++];
      else
         temp[tempsIndex++] = items[midIndex++];
   }
   while (leftIndex <= leftEnd)     // move remaining items from the left sub-array into temp
      temp[tempsIndex++] = items[leftIndex++];
   while (midIndex <= rightIndex)   // move remaining items from the right sub-array into temp
      temp[tempsIndex++] = items[midIndex++];
   for (int i = rightIndex - nItems + 1; i <= rightIndex; i++)  // copy nItems from temp back into items
      items[i] = temp[i];
}
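And the recursive pseudocode of the previous slide as a runnable Java method (a sketch; the textbook's listing may differ in detail):

// A Java rendering of the mergeSort pseudocode above.
public static void mergeSort(int[] items, int[] temp, int leftIndex, int rightIndex) {
    int nItems = rightIndex - leftIndex + 1;
    if (nItems <= 1)                                       // base case, 1 item to be sorted
        return;
    int middleIndex = (leftIndex + rightIndex) / 2;
    mergeSort(items, temp, leftIndex, middleIndex);        // sort the left sublist
    mergeSort(items, temp, middleIndex + 1, rightIndex);   // sort the right sublist
    merge(items, temp, leftIndex, middleIndex + 1, rightIndex); // merge the sublists
}

// Typical invocation:
//   int[] items = {93, 3, 50, 21, 85, 107, 30, 99};
//   mergeSort(items, new int[items.length], 0, items.length - 1);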

Sort Effort of the Merge Sort

• Sublists grow by a factor of 2 each pass
  – Number of passes = log2n
• Number of comparisons per pass (two cases)
  1. All items in one sub-array are less than the minimum item in the other sub-array: n/2 comparisons
  2. Items moved into temp alternate between the two sub-arrays: n-1 comparisons
  – Average: (n/2 + (n-1)) / 2 ≈ 3n/4 = 0.75n
• SE = comparisons per pass * passes ≈ 0.75nlog2n

Merge Sort Swaps

• On every pass, all n items are swapped twice
  – Into the array temp from the array items
  – Back into the array items from the array temp
  – Therefore, 2n swaps per pass
• The algorithm makes log2n passes to sort n items
• Therefore, swaps = 2nlog2n ≈ 2.67 * SE

Overhead of the Merge Sort

• Requires an n element array, temp, to move the sub-lists into

• At 4 bytes per element, temp represents 4n bytes of overhead

Performance of the Merge Sort

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                   | lowest          | 4     | Fast for all data sets. Low overhead.                                                                |
| Merge       | fast, O(nlog2n)             | SE ≈ 0.75nlog2n; number of swaps ≈ 2.67 * SE           | moderate        | 4n    | Fast for all data sets. Moderate overhead.                                                           |

Quicksort

• Like the Merge sort
  – It divides the array into sub-arrays (here called partitions, separated by a "pivot value")
  – It is normally represented recursively, and its implementation is simple (15 Java statements)
  – Its sort effort is normally fast: O(nlog2n)
• In addition
  – It normally performs a low number of swaps
  – Its overhead is normally low
• Some unusual data sets degrade its performance
  – They are detectable and can be treated as a special case

Quicksort Partitions

• A partitioning process generates two partitions. It:
  – Places one item, a pivot value, in its proper sorted spot
  – Positions all items < the pivot value below it in the array
  – Positions all items > the pivot value above it in the array

Array before partitioning process: 93 3 85 99 30 50 21 107 40
Array after partitioning process:  21 3 [30] 99 85 50 93 107 40
(the pivot value, 30, is in its proper sorted spot; the items below it are smaller, the items above it are larger)

The Quicksort Partitioning Process Algorithm

1. Select the middle item in the array to be a "pivot value".
2. Position pointers i and j at the left and right ends of the array.
3. Until i and j cross:
   3.1 Move i right until it is at an item not less than the pivot.
   3.2 Move j left until it is at an item not greater than the pivot.
   3.3 If (i < j), swap the items at i and j.

Original array after pointers have been moved:   93 3 85 99 30 50 21 107 40  (i at 93, j at 21)
After first swap and pointers have been moved:   21 3 85 99 30 50 93 107 40  (i at 85, j at 30)
After second swap and pointers crossed:          21 3 30 99 85 50 93 107 40  (j at 30, i at 99)

Quicksort Recursive Representation

• Base Case
  – One item to be sorted? Do nothing
• Reduced problem
  – Sort a partition
• General Solution
  – Generate two partitions using the partitioning process
  – Sort the left partition
  – Sort the right partition

Pseudocode of the Quicksort Algorithm

rightIndex and leftIndex are the bounds of the array, items, to be sorted

1.  partitionSize = rightIndex - leftIndex + 1;
2.  if (partitionSize <= 1)           // base case, one item to be sorted
3.     return;
4.  pivotValue = items[(leftIndex + rightIndex) / 2];
5.  i = leftIndex;                    // initialize the two partition indices
6.  j = rightIndex;
7.  do                                // general solution: first generate the two partitions
8.  {  while (items[i] < pivotValue)  // left partition item is in the correct partition
9.        i++;
10.    while (items[j] > pivotValue)  // right partition item is in the correct partition
11.       j--;
12.    if (i <= j)                    // pointers have not crossed; switch items in the wrong partitions
13.    {  temp = items[i]; items[i] = items[j]; items[j] = temp;
14.       i++; j--;
15.    }
16. } while (i <= j);                 // the pointers have not crossed
17. quickSort(items, leftIndex, j);   // reduced problems: sort the left partition,
18. quickSort(items, i, rightIndex);  // sort the right partition

Implementation of the Quicksort (To Sort an Array of Integers)

• Implemented as a static method
  public static void quickSort(int [] items, int leftIndex, int rightIndex)
• Its code is the Java equivalent of the algorithm pseudocode (a full sketch follows below)
• To sort the array items, the invocation would be
  quickSort(items, 0, items.length - 1)
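The pseudocode above rendered as a Java method (a sketch of the "Java equivalent" the slide refers to; details may differ from the textbook's listing):

// The quicksort pseudocode as a runnable Java method.
public static void quickSort(int[] items, int leftIndex, int rightIndex) {
    int partitionSize = rightIndex - leftIndex + 1;
    if (partitionSize <= 1)                   // base case, one item to be sorted
        return;
    int pivotValue = items[(leftIndex + rightIndex) / 2];
    int i = leftIndex;                        // initialize the two partition indices
    int j = rightIndex;
    do {                                      // generate the two partitions
        while (items[i] < pivotValue)         // item already in the left partition
            i++;
        while (items[j] > pivotValue)         // item already in the right partition
            j--;
        if (i <= j) {                         // switch items in the wrong partitions
            int temp = items[i];
            items[i] = items[j];
            items[j] = temp;
            i++;
            j--;
        }
    } while (i <= j);                         // until the pointers cross
    quickSort(items, leftIndex, j);           // sort the left partition
    quickSort(items, i, rightIndex);          // sort the right partition
}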

Sort Effort of the Quicksort

• n + 2 comparisons are made per pass
  – e.g., one pass over the 9-item array above: 4 comparisons moving i + 7 comparisons moving j
• Number of passes depends on the data set
  – log2(n-1) passes when pivot values are always placed in the middle of the partitions
  – n passes when pivot values are always placed at the left or right end of the partition (rare case)
  – Empirical studies show the average is 1.45log2(n-1)
• Average SE = (n + 2) * 1.45log2(n-1)

Swaps Performed by the Quicksort

• It is reasonable to assume that only half of the n/2 items in a partition will require a swap each pass
• The average number of passes is 1.45log2(n-1)
• Swaps = (n/2)/2 swaps per pass * 1.45log2(n-1)
        = n/4 * 1.45log2(n-1) ≈ 0.25 SE

Overhead of the Quicksort

• Two integer pointers, i and j, keep track of the ends of the partitions
• For an average number of passes of 1.45log2(n-1), there are log2(n-1) levels of recursion, resulting in log2(n-1) partitions
• Total overhead = 2 initial pointers + 2 per level of recursion = 2 + 2log2(n-1) variables

A Data Set that Degrades the Performance of the Quicksort

• Data sets that result in the pivot value always being placed at the left or right end of the partition
• In this case:
  SE = n * (n + 2); O(n^2)
  Swaps = n * n/2; O(n^2)
• These cases
  – Are very rare, as indicated by empirical studies
  – Can be detected and treated as special cases

Quicksort Performance Summary

| algorithm   | speed range                 | effort and swaps                                                                | memory overhead | bytes          | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps                              | high            | 8n             | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2                          | lowest          | 4              | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                                            | lowest          | 4              | Fast for all data sets. Low overhead.                                                                |
| Merge       | fast, O(nlog2n)             | SE ≈ 0.75nlog2n; number of swaps ≈ 2.67 * SE                                    | moderate        | 4n             | Fast for all data sets. Moderate overhead.                                                           |
| Quick       | fast/slow                   | (n+2)log2(n-1) <= SE <= ~n^2; average ≈ 1.45nlog2n; number of swaps ≈ 0.25 * SE | low             | 8 + 8log2(n-1) | Fastest for most data sets, but could be O(n^2) for certain data sets. Easy to code. Low overhead.   |

Relative Merits of Sorting Algorithms

| algorithm   | speed range    | memory overhead | bytes          | comments                                                                 |
| Binary Tree | fast/slow      | high            | 8n             | Fast for random data, slow for already sorted data. Highest overhead.    |
| Bubble      | very fast/slow | lowest          | 4              | Fast for data almost sorted in ascending order. Slow for most data sets. |
| Heap        | fast           | lowest          | 4              | Fast for all data sets.                                                  |
| Merge Sort  | fast           | moderate        | 4n             | Fast for all data sets. Moderate overhead.                               |
| Quick Sort  | fastest        | low             | 8 + 8log2(n-1) | Fastest for most data sets, but could be O(n^2) for certain data sets.   |

The End
