Sorting

Chapter 8


Overview

• Sorting is the process of ordering a set of items
• Like data structures, sorting algorithms are evaluated based on
  – Their speed
  – The amount of memory overhead they require
  – The programming effort required to implement them
• Classic sorting algorithms include the Binary Tree sort, Bubble sort, Heap sort, Merge sort, and Quicksort
• Each of these sorts has its merits

Ordering Items

• Most common orderings are ascending and descending order
• In the context of data structures
  – Nodes are sorted based on the contents of a field
    • To produce sorted output listings
    • To improve the speed of their data structure
      – E.g., a fetch from a sorted array is O(log2n)
  – But sorting takes time, therefore we need
    • Fast algorithms, O(nlog2n)
    • Good strategies
      – Store sorted (mostly fetch operations)
      – Sort before output (many inserts, deletes, and updates)

Speed

• All sorting algorithms repeatedly
  – Compare two items
  – Swap them if they are out of order
• Therefore the speed of a sorting algorithm depends on the
  – Number of comparisons it makes
  – Number of data swaps it makes
• When sorting nodes, shallow swaps save time

Number of Comparisons

• Sort effort, SE, is used as a metric
  – Defined as: the number of comparisons performed to sort n items
  – There is a theoretical minimum: SEmin = nlog2n
  – SE for some algorithms approaches SEmin
• Contribution to algorithm speed is SE * (f * tf + tc)
  – f: fetches per comparison; tf: time to fetch an item; tc: time per comparison
  – To determine SE, the algorithm is analyzed
• Even if SE = SEmin, sorting large data sets is time consuming

Determination of Sort Effort

• Look for the lines comparing the items to be sorted
• Determine how many times they execute

• For example (n items to be sorted):

1. for(int i = 0; i < n; i++)
2. {  for(int j = 0; j < n; j++)
3.    {  if( items[j] > items[i] )
4.       {  temp = items[j];
5.          items[j] = items[i];
6.          items[i] = temp;
7.       } // end of if statement
8.    } // end of inner loop
9. } // end of outer loop

  – Line 3 compares items
  – It executes n times as part of the inner loop
  – The inner loop executes n times as part of the outer loop
  – Therefore, Line 3 executes n * n times
  – Sort Effort = n^2

Time to Sort Large Data Sets

[Figure: minimum time to sort n items, in minutes, versus the number of items to be sorted, n (in millions, 0 to 400). Assumes no swaps, 1 fetch per comparison, SE = SEmin, and 41 nanoseconds for a data fetch and comparison.]

• Even with no swaps and SE = SEmin, sorting 300 million social security records takes 5.5 minutes
• For the assumed fetch and comparison time of 41 nanoseconds, an n^2 algorithm would take 11.4 hours
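As a rough check of these figures (assuming, since the slide does not say, that the n^2 figure refers to sorting about one million items):

  SEmin = nlog2n = 3×10^8 * log2(3×10^8) ≈ 3×10^8 * 28.2 ≈ 8.5×10^9 comparisons
  8.5×10^9 * 41×10^-9 s ≈ 350 s ≈ 6 minutes, the order of the 5.5-minute figure above
  n^2 = (10^6)^2 = 10^12 comparisons; 10^12 * 41×10^-9 s ≈ 41,000 s ≈ 11.4 hours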

Determining the Number of Swaps

• Look for the lines that swap the items to be sorted
• Determine how many times they execute

• For example (the same code as before, sorting n items):

1. for(int i = 0; i < n; i++)
2. {  for(int j = 0; j < n; j++)
3.    {  if( items[j] > items[i] )
4.       {  temp = items[j];
5.          items[j] = items[i];
6.          items[i] = temp;
7.       } // end of if statement
8.    } // end of inner loop
9. } // end of outer loop

  – Lines 4-6 swap items
  – If ½ of the comparisons result in swaps, they execute n/2 times as part of the inner loop
  – The inner loop executes n times as part of the outer loop
  – Therefore, Lines 4-6 execute ½ n * n times
  – Number of swaps = ½ n^2

Shallow Swaps

• Two ways to swap nodes
  – Deep swap: deep copies the nodes
       temp = items[j].deepCopy();     // copy all node fields
       items[j] = items[i].deepCopy();
       items[i] = temp;
  – Shallow swap: shallow copies the nodes
       temp = items[j];                // copy 1 reference variable
       items[j] = items[i];
       items[i] = temp;
• Shallow swap is faster because a deep copy copies all node fields

Graphical Depiction of Shallow and Deep Swaps

[Figure: two arrays of references, items[0..n-1] and node[0..n-1], each element pointing to a node object (object A, object C, object B, …, object X). A deep swap copies the node objects themselves; a shallow swap copies only the reference variables.]

Time consuming deep swap:              Rapid shallow swap:
   temp = items[1].deepCopy();            temp = items[1];
   items[1] = items[2].deepCopy();        items[1] = items[2];
   items[2] = temp;                       items[2] = temp;

Sorting Algorithm Overhead

• Can be minimal: 4 bytes
  – The variable temp used in the swap

• Can be significant: 8n bytes
  – Two reference variables per item sorted

• The amount required depends on the sorting algorithm

Binary Tree Sort

• Items to be sorted are inserted into a Binary Search Tree
  – An LNR traversal visits the items in ascending order
  – An RNL traversal visits the items in descending order
• Its algorithm is basically the Binary Search Tree's Insert algorithm
• Its sort effort can approach the minimum or be O(n^2)
• It is one of the few algorithms not to perform swaps
• Its overhead is high

Binary Tree Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                   | memory overhead | bytes | comments                                                               |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead. |

Binary Tree Sort Algorithm

1. The first item becomes the root node.
2. For any subsequent item, consider the root node to be the root of a subtree, and start at the root of this subtree.
3. Compare the item to the root of the subtree.
   3.1 If the new item is less than the subtree's root, then the new subtree is the root's left subtree,
   3.2 Else the new subtree is the root's right subtree.
4. Repeat step 3 until the new subtree is empty. Position the new item as the root of this empty subtree.
5. Repeat steps 2, 3, and 4 for each item to be sorted.

An LNR traversal outputs the items in ascending order.
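The algorithm maps naturally onto a small amount of Java. Below is a minimal sketch (class and method names are mine, not the textbook's) that sorts the example data set used on the next slide:

// A minimal sketch of the Binary Tree sort: insert every item into a
// binary search tree, then output the items with an LNR traversal.
class TreeSortSketch {
    static class Node {
        int item;
        Node left, right;          // the two references: 8 bytes of overhead per item
        Node(int item) { this.item = item; }
    }

    static Node insert(Node root, int item) {
        if (root == null) return new Node(item);   // empty subtree: item becomes its root
        if (item < root.item)
            root.left = insert(root.left, item);   // less: descend into the left subtree
        else
            root.right = insert(root.right, item); // otherwise: the right subtree
        return root;
    }

    static void lnr(Node root) {                   // LNR (in-order) traversal: ascending order
        if (root == null) return;
        lnr(root.left);
        System.out.print(root.item + " ");
        lnr(root.right);
    }

    public static void main(String[] args) {
        int[] items = {50, 40, 47, 63, 55, 43, 70, 80, 35, 68};  // the slides' example data
        Node root = null;
        for (int item : items) root = insert(root, item);
        lnr(root);   // prints 35 40 43 47 50 55 63 68 70 80
    }
}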

Binary Tree Sort Algorithm, Graphical Representation

[Figure: the binary search tree grown step by step as the items 50, 40, 47, 63, 55, 43, 70, 80, 35, and 68 are inserted. In the finished tree, 50 is the root; 40 and 63 are its children; 35 and 47 sit under 40, with 43 under 47; 55 and 70 sit under 63, with 68 and 80 under 70.]

Sort Effort of the Binary Tree Sort

• Low when the items to be sorted are in random order (or close to randomized)
  – The tree is balanced
    [e.g., 5 at the root; 3 and 7 as its children; 2, 4, 6, and 9 as leaves]
  – Zero comparisons for the first item; < log2(n+1) comparisons per item for all subsequent items; O(nlog2n)
• High when the items to be sorted are already sorted (or close to sorted)
  – The tree is highly skewed
    [e.g., 2 at the root, 4 below it, then 7, then 9]
  – Zero comparisons for the first item, 1 for the second, 2 for the third, … n-1 for the nth; O(n^2)

Overhead of the Binary Tree Sort

• Two reference variables per item (4 bytes each) are required to point to the left and right child
  Overhead = 2 variables * 4 bytes * n items = 8n bytes
• The array-based representation of a binary tree is not much better
  – If the items were randomized: 4n bytes
  – But a sorted set would require approximately 2^n bytes

Bubble Sort

• Its algorithm is very simple to code
• For most data sets
  – Its sort effort is high
  – It performs a high number of swaps: O(n^2)
• For some data sets (almost sorted)
  – It is the best performing algorithm
• Its overhead is very low
  – It only requires two variables of overhead
    • temp for the swaps
    • A Boolean variable to detect an early completion

Bubble Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |

Bubble Sort Algorithm

• The algorithm can make up to n-1 passes
  – Each pass places at least one item in its sorted position
    • The smallest item on pass 1, the 2nd smallest item on pass 2, the 3rd smallest item on pass 3, …
    • Small items bubble up to the top of the array
  – Each pass compares each item to the item above it
    • When not in their proper relative position, they are swapped (flipped)
  – The algorithm can end early if a flip is not made during a pass

Bubble Sort Algorithm, Graphical Representation

[Figure: the array's contents initially and at the end of passes 1-4, showing the items placed in their correct sorted position at the end of each pass.]

[Figure: the items compared, and the swaps made, during each of passes 1-4.]

Bubble Sort Algorithm Pseudocode (to sort n items)

1. itemsSorted = 0;
2. do
3. {  flip = false;                  // begin a pass
4.    for (b = n - 1, t = n - 2; t >= itemsSorted; b--, t--)
5.       if (items[b] < items[t])    // two adjacent elements are not in sorted order
6.       {  temp = items[b];         // switch the two elements
             items[b] = items[t];
             items[t] = temp;
7.          flip = true;
8.       }
9.    itemsSorted++;                 // one more item is in its final spot
10. } while (flip == true && itemsSorted != n - 1);
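Rendered as a runnable Java method (a sketch: variable declarations added, otherwise following the pseudocode line by line):

// A direct Java rendering of the bubble sort pseudocode above.
public static void bubbleSort(int[] items) {
    int n = items.length;
    int itemsSorted = 0;       // items already bubbled into their final spots
    boolean flip;              // did this pass make at least one swap?
    do {
        flip = false;          // begin a pass
        for (int b = n - 1, t = n - 2; t >= itemsSorted; b--, t--) {
            if (items[b] < items[t]) {   // adjacent pair out of order: flip it
                int temp = items[b];
                items[b] = items[t];
                items[t] = temp;
                flip = true;
            }
        }
        itemsSorted++;         // one more item is in its final spot
    } while (flip && itemsSorted != n - 1);
}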

Sort Effort of the Bubble Sort

Comparisons per pass when sorting 7 items:

| pass number | number of comparisons |
| 1           | 6 (= n-1)             |
| 2           | 5 (= n-2)             |
| 3           | 4 (= n-3)             |
| 4           | 3 (= n-4)             |
| 5           | 2 (= n-5)             |
| 6           | 1 (= n-6)             |

• When the items are already sorted, one pass is made
  SE = (n-1); O(n)
• When items are random, n-1 passes are made
  SE = (n-1) + (n-2) + (n-3) + … + 1
     = (n-1) * (n-1+1) / 2
     = (n-1) * n/2; O(n^2)

Swaps Performed by the Bubble Sort

• On average, half the comparisons will require swaps
  – Therefore, number of swaps NS = SE/2
• When the items are already sorted, one pass is made
  SE = (n-1); NS = (n-1)/2; O(n)
• When items are random, n-1 passes are made
  SE = (n-1) * n/2; NS = ((n-1) * n/2)/2; O(n^2)

Heap Sort

• A heap is used in this sort, thus the name Heap Sort
• The heap sort algorithm is simple to code once we understand a heap
• Its overall performance is very good for all data sets
  – Its sort effort approaches the minimum for all data sets: O(nlog2n)
  – It performs fewer swaps than the bubble sort: O(nlog2n)
  – Its overhead is very low, requiring only a temp variable for its swaps

Heap

• A binary tree in which (the value of) each parent is greater than both of its children

[Figure: several example binary trees used to illustrate which trees satisfy the heap condition (every parent greater than both of its children) and which do not.]

Properties of Heaps

[Figure: a heap with 80 at the root; 19 and 77 as its children; 17, 3, 25, and 10 below them; and 2, 7, and 1 as leaves.]

• From the definition of a heap we can conclude
  – The largest node in the heap is the root node
    • Because the root node of the heap is larger than its children, who are larger than their children, etc.
  – All subtrees in a heap are themselves heaps
    • Because the root node of any subtree is larger than its children, who are larger than their children, etc.

Heap Sort Algorithm

• This is a three step algorithm
  – Step 1 is simple
  – Steps 2 and 3 use a sub-algorithm called reheap downward

1. Place all the items to be sorted in a left balanced binary tree.
2. Build the initial heap (i.e., re-position the items in the tree to make it a heap using the reheap downward algorithm).
3. Repeatedly swap and rebuild:
   3.1 Swap the root item into its sorted position.
   3.2 Rebuild the remaining items into a heap using the reheap downward algorithm.

Step 1 of the Heap Sort Algorithm: Place the Items to Be Sorted in a Left Balanced Binary Tree

• Do nothing…
  – Simply "view" the array to be sorted using the 2i+1, 2i+2 rule during the remainder of the sort
  – For a fully populated array, this always yields a left balanced binary tree

[Figure: the array items[0..8] = 25, 17, 36, 2, 3, 100, 1, 19, 7 viewed as a binary tree. items[0] (25) becomes the root node; the "2i+1, 2i+2" rule locates the other elements in the tree: 17 and 36 under 25; 2, 3, 100, and 1 on the next level; 19 and 7 as leaves.]
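In code, the rule is nothing more than index arithmetic; a minimal sketch (the helper names are mine, not the textbook's):

// The "2i+1, 2i+2" rule: an array viewed as a left balanced binary tree.
static int leftChild(int i)  { return 2 * i + 1; }
static int rightChild(int i) { return 2 * i + 2; }
static int parent(int i)     { return (i - 1) / 2; }
// e.g., for items[1] = 17: leftChild(1) = 3 (the item 2), rightChild(1) = 4 (the item 3)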

Step 2 of the Heap Sort: Build the Initial Heap

• For every parent, beginning with the highest-level-right-most parent
  – Continually swap it with its largest child, until it is larger than both of its children or it becomes a leaf
    [Figure: the tree before (25; 17, 36; 2, 3, 100, 1; 19, 7) and after the first parent, 2, is swapped with its largest child, 19]
• To locate the highest-level-right-most parent's index, ip, in an n element array:
  ip = floor(n/2 - 1), e.g., for n = 9, ip = floor(9/2 - 1) = 3

Building a Heap

[Figure: the sequence of trees as the initial heap is built from 25; 17, 36; 2, 3, 100, 1; 19, 7. Working from the highest-level-right-most parent back to the root: 2 is swapped with 19; 36 is swapped with 100; 17 is swapped with 19; and finally 25 is swapped down past 100 and 36, yielding the initial heap 100; 19, 36; 17, 3, 25, 1; 2, 7.]

[Table: the array contents after each comparison and swap during the heap build, ending with the initial heap 100, 19, 36, 17, 3, 25, 1, 2, 7 in elements [0] through [8].]

Reheap Downward Algorithm

• Builds a heap from a tree, or subtree, in which the only item preventing it from being a heap is the root item.

1. P = the root node of a tree whose subtrees are already heaps.
2. if (P has no children) return;
3. if (P > both children) return;
4. Swap P with its greatest child;
5. Repeat steps 2, 3, and 4 for the subtree that P is now the root of.

• Applied to each parent to build the initial heap (a sketch of the full sort follows below)
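Putting the three steps together, a sketch of the complete heap sort in Java (method names are mine; the reheapDown helper follows steps 2-4 above):

// A sketch of the heap sort, with the array viewed as a left balanced
// binary tree via the 2i+1, 2i+2 rule.
public static void heapSort(int[] items) {
    int n = items.length;
    // Step 2: build the initial heap, starting at the
    // highest-level-right-most parent, ip = floor(n/2 - 1)
    for (int p = n / 2 - 1; p >= 0; p--)
        reheapDown(items, p, n);
    // Step 3: repeatedly swap the root into its sorted position at the
    // "end" of the array, then rebuild the remaining items into a heap
    for (int end = n - 1; end > 0; end--) {
        int temp = items[0];          // the root is the largest remaining item
        items[0] = items[end];
        items[end] = temp;
        reheapDown(items, 0, end);    // rebuild the heap of the remaining items
    }
}

// Reheap downward: only the root at index p may violate the heap condition.
private static void reheapDown(int[] items, int p, int size) {
    while (true) {
        int left = 2 * p + 1, right = 2 * p + 2;
        if (left >= size) return;                  // P has no children
        int biggest = left;                        // locate P's greatest child
        if (right < size && items[right] > items[left]) biggest = right;
        if (items[p] >= items[biggest]) return;    // P > both children
        int temp = items[p];                       // swap P with its greatest child
        items[p] = items[biggest];
        items[biggest] = temp;
        p = biggest;                               // repeat for P's new subtree
    }
}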

Step 3 of the Heap Sort: Repeatedly Swap and Rebuild

• Swap the root item into its sorted position
  – From the properties of a heap, the root is the largest item
  – Its "proper position" therefore is at the "end" of the array
  – The end of the array is just above those items not yet repositioned
• Rebuild the "remaining items" into a heap
  – The "remaining items" are those items not yet in their "proper position"
  – We can use the reheap downward algorithm to rebuild the heap
    • Considering only remaining items, the root is the only item preventing the tree from being a heap

[Figure: the heap 100; 19, 36; 17, 3, 25, 1; 2, 7. The root, 100, is swapped with the last item, 7, placing 100 in its sorted position; reheap downward then rebuilds the remaining items into the heap 36; 19, 25; 17, 3, 7, 1; 2.]

Repeat

[Figure: trees (a) through (i) showing the first two passes of step 3. Pass 1: the root 100 is swapped with 7 and the remaining items are rebuilt into the heap 36; 19, 25; 17, 3, 7, 1; 2. Pass 2: the root 36 is swapped with 2 and the remaining items are rebuilt into the heap 25; 19, 7; 17, 3, 2, 1.]

[Table: the array contents after each comparison and swap during passes 1-5, showing 100, 36, 25, 19, and 17 settling into elements [8] down to [4].]

Sort Effort of the Heap Sort

• Step 1, build the left balanced binary tree
  – Nothing is done; SE1 = 0
• Step 2, build the initial heap
  – < n/2 parent items descend 2/3 of the way down < floor(log2n) levels, making two comparisons per level
  – SE2 < n/2 * (2/3) * floor(log2n) * 2 = (2/3)n floor(log2n)
• Step 3, repeatedly swap and rebuild
  – < n items descend < floor(log2n) levels, making two comparisons per level
  – SE3 < n * floor(log2n) * 2 = 2n floor(log2n)
• SE < 0 + (2/3)n floor(log2n) + 2n floor(log2n): O(nlog2n)

Heap Sort Swaps

• When building or rebuilding the heap, two-thirds of the time a parent is less than one of its children
• Therefore, two-thirds of the comparison pairs result in a swap
• There are < ½ * 3nlog2n comparison pairs
• Swaps < 2/3 * ½ * 3nlog2n = nlog2n

Heap Sort Performance Summary

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                   | lowest          | 4     | Fast for all data sets. Low overhead.                                                                |

Merge Sort

• This sort uses a merge process
  – Items from two sorted sub-arrays containing n/2 items are merged (integrated) into one sorted array of n items
• The merge sort algorithm recursively applies the merge process to generate the sorted sub-arrays
  – Its implementation includes a merge process method
• Its sort effort approaches the minimum: O(nlog2n)
• However, it performs a high number of swaps, and its overhead is not low

The Merge Process

1. PA and PB are set pointing to the first items of the two sub-arrays, A and B.
2. The smaller item is copied into another array T.
3. The pointer of the array the item was in is incremented.
4. Steps 2 and 3 are repeated until one sub-array is empty.
5. The remaining items from the other sub-array are copied into T.

Array A: 3 21 50 93        Array B: 30 85 99 107
Array T: 3 21 30 50 85 93 99 107
Order of transfer into T: A supplies items 1, 2, 4, and 6; B supplies items 3, 5, 7, and 8.

Merge Sort Algorithm, Graphical Representation

The sub-arrays always initially contain 1 item each.

[Figure: the merge sort's merge tree for the original data set 93, 3, 50, 21, 85, 107, 30, 99. Single items are merged into sorted pairs (3 93 | 21 50 | 85 107 | 30 99), pairs into sorted quadruples (3 21 50 93 | 30 85 99 107), and quadruples into the sorted array 3 21 30 50 85 93 99 107. The circled numbers indicate the order the items are moved.]

Merge Sort Recursive Implementation

• Base Case
  – One item to be sorted? Do nothing
• Reduced problem
  – Sort half of the array
• General Solution
  – Sort the left half of the array
  – Sort the right half of the array
  – Merge the two halves

Merge Sort Recursive Pseudocode

mergeSort(items, temp, leftIndex, rightIndex)

items is the array to be sorted; leftIndex and rightIndex are the bounds of the sub-array, initially 0 and n-1

1. nItems = rightIndex - leftIndex + 1;
2. if (nItems == 1)                                      // base case, 1 item to be sorted
3.    return;
4. middleIndex = (leftIndex + rightIndex) / 2;
5. mergeSort(items, temp, leftIndex, middleIndex);       // sort the left sublist
6. mergeSort(items, temp, middleIndex + 1, rightIndex);  // sort the right sublist
7. merge(items, temp, leftIndex, middleIndex + 1, rightIndex); // merge the sublists
8. return;

merge is a method that performs the merge process on two sub-arrays of the array items using the array temp

Merge Process Method

public static void merge(int items[], int temp[], int leftIndex, int midIndex,
                         int rightIndex)
{  // midIndex is the first index of the right sub-array
   int leftEnd = midIndex - 1;
   int tempsIndex = leftIndex;
   int nItems = rightIndex - leftIndex + 1;
   while ((leftIndex <= leftEnd) && (midIndex <= rightIndex))
   {  // move the smaller item into temp from the left or right sub-array,
      // incrementing leftIndex or midIndex as an item is moved
      if (items[leftIndex] <= items[midIndex])
         temp[tempsIndex++] = items[leftIndex++];
      else
         temp[tempsIndex++] = items[midIndex++];
   }
   while (leftIndex <= leftEnd)     // move remaining items from the left sub-array into temp
      temp[tempsIndex++] = items[leftIndex++];
   while (midIndex <= rightIndex)   // move remaining items from the right sub-array into temp
      temp[tempsIndex++] = items[midIndex++];
   for (int i = rightIndex - nItems + 1; i <= rightIndex; i++)  // copy nItems from temp back into items
      items[i] = temp[i];
}
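And the recursive pseudocode of the previous slide as a runnable Java method (a sketch; the textbook's listing may differ in detail):

// A Java rendering of the mergeSort pseudocode above.
public static void mergeSort(int[] items, int[] temp, int leftIndex, int rightIndex) {
    int nItems = rightIndex - leftIndex + 1;
    if (nItems <= 1)                                       // base case, 1 item to be sorted
        return;
    int middleIndex = (leftIndex + rightIndex) / 2;
    mergeSort(items, temp, leftIndex, middleIndex);        // sort the left sublist
    mergeSort(items, temp, middleIndex + 1, rightIndex);   // sort the right sublist
    merge(items, temp, leftIndex, middleIndex + 1, rightIndex); // merge the sublists
}

// Typical invocation:
//   int[] items = {93, 3, 50, 21, 85, 107, 30, 99};
//   mergeSort(items, new int[items.length], 0, items.length - 1);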

Sort Effort of the Merge Sort

• Sublists grow by a factor of 2 each pass
  – Number of passes = log2n
• Number of comparisons per pass (two cases)
  1. All items in one sub-array are less than the minimum item in the other sub-array: n/2 comparisons
  2. Items moved into temp alternate between the two sub-arrays: n-1 comparisons
  – Average: (n/2 + (n-1)) / 2 ≈ 3n/4 = 0.75n
• SE = comparisons per pass * passes ≈ 0.75nlog2n

Merge Sort Swaps

• On every pass, all n items are swapped twice
  – Into the array temp from the array items
  – Back into the array items from the array temp
  – Therefore, 2n swaps per pass
• The algorithm makes log2n passes to sort n items
• Therefore, swaps = 2nlog2n ≈ 2.67 * SE

Overhead of the Merge Sort

• Requires an n element array, temp, to move the sub-lists into

• At 4 bytes per element, temp represents 4n bytes of overhead

Performance of the Merge Sort

| algorithm   | speed range                 | effort and swaps                                       | memory overhead | bytes | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps     | high            | 8n    | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2 | lowest          | 4     | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                   | lowest          | 4     | Fast for all data sets. Low overhead.                                                                |
| Merge       | fast, O(nlog2n)             | SE ≈ 0.75nlog2n; number of swaps ≈ 2.67 * SE           | moderate        | 4n    | Fast for all data sets. Moderate overhead.                                                           |

Quicksort

• Like the Merge sort
  – It divides the array into sub-arrays (here called partitions, separated by a "pivot value")
  – It is normally represented recursively, and its implementation is simple (15 Java statements)
  – Its sort effort is normally fast: O(nlog2n)
• In addition
  – It normally performs a low number of swaps
  – Its overhead is normally low
• Some unusual data sets degrade its performance
  – They are detectable and can be treated as a special case

Quicksort Partitions

• A partitioning process generates two partitions. It:
  – Places one item, a pivot value, in its proper sorted spot
  – Positions all items < the pivot value below it in the array
  – Positions all items > the pivot value above it in the array

Array before partitioning process: 93 3 85 99 30 50 21 107 40
Array after partitioning process:  21 3 [30] 99 85 50 93 107 40
(the pivot value, 30, is in its proper sorted spot; the items below it are smaller, the items above it are larger)

The Quicksort Partitioning Process Algorithm

1. Select the middle item in the array to be a "pivot value".
2. Position pointers i and j at the left and right ends of the array.
3. Until i and j cross:
   3.1 Move i right until it is at an item not less than the pivot.
   3.2 Move j left until it is at an item not greater than the pivot.
   3.3 If (i < j), swap the items at i and j.

Original array after pointers have been moved:   93 3 85 99 30 50 21 107 40  (i at 93, j at 21)
After first swap and pointers have been moved:   21 3 85 99 30 50 93 107 40  (i at 85, j at 30)
After second swap and pointers crossed:          21 3 30 99 85 50 93 107 40  (j at 30, i at 99)

Quicksort Recursive Representation

• Base Case
  – One item to be sorted? Do nothing
• Reduced problem
  – Sort a partition
• General Solution
  – Generate two partitions using the partitioning process
  – Sort the left partition
  – Sort the right partition

Pseudocode of the Quicksort Algorithm

rightIndex and leftIndex are the bounds of the array, items, to be sorted

1.  partitionSize = rightIndex - leftIndex + 1;
2.  if (partitionSize <= 1)           // base case, one item to be sorted
3.     return;
4.  pivotValue = items[(leftIndex + rightIndex) / 2];
5.  i = leftIndex;                    // initialize the two partition indices
6.  j = rightIndex;
7.  do                                // general solution: first generate the two partitions
8.  {  while (items[i] < pivotValue)  // left partition item is in the correct partition
9.        i++;
10.    while (items[j] > pivotValue)  // right partition item is in the correct partition
11.       j--;
12.    if (i <= j)                    // pointers have not crossed; switch items in the wrong partitions
13.    {  temp = items[i]; items[i] = items[j]; items[j] = temp;
14.       i++; j--;
15.    }
16. } while (i <= j);                 // the pointers have not crossed
17. quickSort(items, leftIndex, j);   // reduced problems: sort the left partition,
18. quickSort(items, i, rightIndex);  // sort the right partition

Implementation of the Quicksort (To Sort an Array of Integers)

• Implemented as a static method
  public static void quickSort(int [] items, int leftIndex, int rightIndex)
• Its code is the Java equivalent of the algorithm pseudocode (a full sketch follows below)
• To sort the array items, the invocation would be
  quickSort(items, 0, items.length - 1)
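The pseudocode above rendered as a Java method (a sketch of the "Java equivalent" the slide refers to; details may differ from the textbook's listing):

// The quicksort pseudocode as a runnable Java method.
public static void quickSort(int[] items, int leftIndex, int rightIndex) {
    int partitionSize = rightIndex - leftIndex + 1;
    if (partitionSize <= 1)                   // base case, one item to be sorted
        return;
    int pivotValue = items[(leftIndex + rightIndex) / 2];
    int i = leftIndex;                        // initialize the two partition indices
    int j = rightIndex;
    do {                                      // generate the two partitions
        while (items[i] < pivotValue)         // item already in the left partition
            i++;
        while (items[j] > pivotValue)         // item already in the right partition
            j--;
        if (i <= j) {                         // switch items in the wrong partitions
            int temp = items[i];
            items[i] = items[j];
            items[j] = temp;
            i++;
            j--;
        }
    } while (i <= j);                         // until the pointers cross
    quickSort(items, leftIndex, j);           // sort the left partition
    quickSort(items, i, rightIndex);          // sort the right partition
}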

Sort Effort of the Quicksort

• n + 2 comparisons are made per pass
  – e.g., one pass over the 9-item array above: 4 comparisons moving i + 7 comparisons moving j
• Number of passes depends on the data set
  – log2(n-1) passes when pivot values are always placed in the middle of the partitions
  – n passes when pivot values are always placed at the left or right end of the partition (rare case)
  – Empirical studies show the average is 1.45log2(n-1)
• Average SE = (n + 2) * 1.45log2(n-1)

Swaps Performed by the Quicksort

• It is reasonable to assume that only half of the n/2 items in a partition will require a swap each pass
• The average number of passes is 1.45log2(n-1)
• Swaps = (n/2)/2 swaps per pass * 1.45log2(n-1)
        = n/4 * 1.45log2(n-1) ≈ 0.25 SE

Overhead of the Quicksort

• Two integer pointers, i and j, keep track of the ends of the partitions
• For an average number of passes of 1.45log2(n-1), there are log2(n-1) levels of recursion, resulting in log2(n-1) partitions
• Total overhead = 2 initial pointers + 2 per level of recursion = 2 + 2log2(n-1) variables

A Data Set that Degrades the Performance of the Quicksort

• Data sets that result in the pivot value always being placed at the left or right end of the partition
• In this case:
  SE = n * (n + 2); O(n^2)
  Swaps = n * n/2; O(n^2)
• These cases
  – Are very rare, as indicated by empirical studies
  – Can be detected and treated as special cases

Quicksort Performance Summary

| algorithm   | speed range                 | effort and swaps                                                                | memory overhead | bytes          | comments                                                                                             |
| Binary Tree | fast/slow, O(nlog2n)/O(n^2) | (n+1)[log2(n+1) - 2] < SE <= n^2/2 - n/2; no swaps                              | high            | 8n             | Fast for random data, slow for already sorted data. Highest overhead.                                |
| Bubble      | very fast/slow, O(n)/O(n^2) | n-1 <= SE <= n^2/2 - n/2; 0 <= number of swaps <= SE/2                          | lowest          | 4              | Fast for data almost sorted in ascending order. Slow for most data sets. Low overhead. Easy to code. |
| Heap        | fast, O(nlog2n)             | SE < 3nlog2n; number of swaps ≈ SE/3                                            | lowest          | 4              | Fast for all data sets. Low overhead.                                                                |
| Merge       | fast, O(nlog2n)             | SE ≈ 0.75nlog2n; number of swaps ≈ 2.67 * SE                                    | moderate        | 4n             | Fast for all data sets. Moderate overhead.                                                           |
| Quick       | fast/slow                   | (n+2)log2(n-1) <= SE <= ~n^2; average ≈ 1.45nlog2n; number of swaps ≈ 0.25 * SE | low             | 8 + 8log2(n-1) | Fastest for most data sets, but could be O(n^2) for certain data sets. Easy to code. Low overhead.   |

Relative Merits of Sorting Algorithms

| algorithm   | speed range    | memory overhead | bytes          | comments                                                                 |
| Binary Tree | fast/slow      | high            | 8n             | Fast for random data, slow for already sorted data. Highest overhead.    |
| Bubble      | very fast/slow | lowest          | 4              | Fast for data almost sorted in ascending order. Slow for most data sets. |
| Heap        | fast           | lowest          | 4              | Fast for all data sets.                                                  |
| Merge Sort  | fast           | moderate        | 4n             | Fast for all data sets. Moderate overhead.                               |
| Quick Sort  | fastest        | low             | 8 + 8log2(n-1) | Fastest for most data sets, but could be O(n^2) for certain data sets.   |

The End
