CSE 373: Master Method, Finishing Sorts, Intro to Graphs
Michael Lee
Monday, Feb 12, 2018

The tree method: precise analysis

Problem: we need a rigorous way of getting a closed form. We want to answer a few core questions.

How much work does each recursive level do?
1. How many nodes are there on level i? (i = 0 is the "root" level)
2. At some level i, how much work does a single node do? (Ignoring subtrees.)
3. How many recursive levels are there?

How much work does the leaf level (the base cases) do?
1. How much work does a single leaf node do?
2. How many leaf nodes are there?

For the merge sort recurrence T(n) = 2T(n/2) + n, the recursion tree looks like this:

Level 0:    1 node,     n work per node
Level 1:    2 nodes,    n/2 work per node
Level 2:    4 nodes,    n/4 work per node
...
Level i:    2^i nodes,  n/2^i work per node
...
Leaf level: 2^h nodes,  1 work per node

So far:
1. numNodes(i) = 2^i
2. workPerNode(n, i) = n / 2^i
3. numLevels(n) = ?
4. workPerLeafNode(n) = 1
5. numLeafNodes(n) = ?

How many levels are there, exactly? Is it log2(n)? Let's try an example. Suppose we have T(4). What happens? The tree has a level for n = 4, a level for n = 2, and a leaf level for n = 1, so the height is log2(4) = 2. For this recursive function, the number of recursive levels is the same as the height.

Important: the total number of levels, counting the base case, is height + 1.

Important: for other recursive functions, where the base case doesn't happen at n <= 1, the number of recursive levels might be different from the height.

We discovered:
1. numNodes(i) = 2^i
2. workPerNode(n, i) = n / 2^i
3. numLevels(n) = log2(n)
4. workPerLeafNode(n) = 1
5. numLeafNodes(n) = 2^numLevels(n) = 2^log2(n) = n

Our formulas:

    recursiveWork = Σ_{i=0}^{numLevels(n)-1} numNodes(i) · workPerNode(n, i)
    baseCaseWork  = numLeafNodes(n) · workPerLeafNode(n)
    totalWork     = recursiveWork + baseCaseWork

Solve for the recursive case:

    recursiveWork = Σ_{i=0}^{log2(n)-1} 2^i · (n / 2^i)
                  = Σ_{i=0}^{log2(n)-1} n
                  = n log2(n)

Solve for the base case:

    baseCaseWork = numLeafNodes(n) · workPerLeafNode(n) = n · 1 = n

So the exact closed form is n log2(n) + n.
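This closed form can be sanity-checked numerically against the recurrence itself. A minimal Python sketch (the function name T and the base case T(1) = 1 are assumptions for illustration; the check only makes sense for powers of two, where log2(n) is an integer):

```python
import math

def T(n):
    """The recurrence T(n) = 2*T(n/2) + n, with an assumed base case
    of T(1) = 1, evaluated for n a power of two."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# The tree method predicts the exact closed form n*log2(n) + n.
for n in [1, 2, 4, 8, 16, 1024]:
    assert T(n) == n * math.log2(n) + n
print("closed form n*log2(n) + n matches for all tested powers of two")
```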
The tree method: practice

Practice: let's go back to our old recurrence:

    S(n) = 2                  if n <= 1
    S(n) = 2S(n/3) + n^2      otherwise

The recursion tree:

Level 0:    1 node,     n^2 work per node
Level 1:    2 nodes,    n^2/9 work per node
Level 2:    4 nodes,    n^2/81 work per node
...
Level i:    2^i nodes,  n^2/9^i work per node
...
Leaf level: 2^h nodes,  2 work per node

This gives us:
1. numNodes(i) = 2^i
2. workPerNode(n, i) = n^2 / 9^i
3. numLevels(n) = log3(n)
4. workPerLeafNode(n) = 2
5. numLeafNodes(n) = 2^numLevels(n) = 2^log3(n) = n^log3(2)

Combine into a single expression representing the total runtime:

    totalWork = Σ_{i=0}^{log3(n)-1} 2^i · (n^2 / 9^i) + 2n^log3(2)
              = n^2 · Σ_{i=0}^{log3(n)-1} (2^i / 9^i) + 2n^log3(2)
              = n^2 · Σ_{i=0}^{log3(n)-1} (2/9)^i + 2n^log3(2)

The finite geometric series

The finite geometric series identity:

    Σ_{i=0}^{m-1} r^i = (1 - r^m) / (1 - r)

Plug and chug, with r = 2/9 and m = log3(n):

    totalWork = n^2 · (1 - (2/9)^log3(n)) / (1 - 2/9) + 2n^log3(2)

Applying the finite geometric series

With a bunch of effort...

    totalWork = n^2 · (1 - (2/9)^log3(n)) / (1 - 2/9) + 2n^log3(2)
              = (9/7) n^2 · (1 - (2/9)^log3(n)) + 2n^log3(2)
              = (9/7) n^2 - (9/7) n^2 · (2/9)^log3(n) + 2n^log3(2)
              = (9/7) n^2 - (9/7) n^2 · n^log3(2/9) + 2n^log3(2)
              = (9/7) n^2 - (9/7) n^2 · n^(log3(2) - 2) + 2n^log3(2)
              = (9/7) n^2 - (9/7) n^log3(2) + 2n^log3(2)
              = (9/7) n^2 + (5/7) n^log3(2)

The master theorem

Is there an easier way?

If we want to find an exact closed form, no: we must use either the unfolding technique or the tree technique.

If we want to find a big-Θ bound, yes.
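The closed form for S(n) can also be checked numerically. Note that the recursive levels are i = 0 .. log3(n) - 1 (level log3(n) is the leaf level, which contributes 2 per leaf), giving the constant 5/7. A minimal sketch, assuming n is a power of three (so n^log3(2) = 2^k is exact) and using rational arithmetic to avoid floating-point error:

```python
from fractions import Fraction

def S(n):
    """The practice recurrence: S(n) = 2 if n <= 1, else 2*S(n/3) + n^2."""
    if n <= 1:
        return 2
    return 2 * S(n // 3) + n * n

# Check the closed form (9/7)*n^2 + (5/7)*n^(log3(2)) for n = 3^k,
# where n^(log3(2)) = 2^k exactly.
for k in range(8):
    n = 3 ** k
    assert S(n) == Fraction(9, 7) * n * n + Fraction(5, 7) * 2 ** k
print("closed form matches for n = 3^0 through 3^7")
```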
The master theorem

Suppose we have a recurrence of the following form:

    T(n) = d                  if n = 1
    T(n) = aT(n/b) + n^c      otherwise

Then...

- If log_b(a) < c, then T(n) ∈ Θ(n^c)
- If log_b(a) = c, then T(n) ∈ Θ(n^c log(n))
- If log_b(a) > c, then T(n) ∈ Θ(n^log_b(a))

Sanity check: try checking merge sort. We have a = 2, b = 2, and c = 1. We know log_b(a) = log2(2) = 1 = c, therefore merge sort is Θ(n log(n)).

Sanity check: try checking S(n) = 2S(n/3) + n^2. We have a = 2, b = 3, and c = 2. We know log3(2) <= 1 < 2 = c, therefore S(n) ∈ Θ(n^2).

The master theorem: intuition

Intuition for the log_b(a) > c case:
1. We divide more rapidly than we do work.
2. So, most of the work happens near the "bottom", which means the work done in the leaves dominates.
3. Note: the work in the leaves is about d · a^height = d · a^log_b(n) = d · n^log_b(a).

Intuition for the log_b(a) < c case:
1. We do work more rapidly than we divide.
2. So, most of the work happens near the "top", which means that the n^c term dominates.

Intuition for the log_b(a) = c case:
1. Work is done roughly equally throughout the tree.
2. Each level does about the same amount of work, so we approximate the total by multiplying the work done on the first level by the height: n^c log_b(n).

Parting thoughts on sorting

A few final thoughts about sorting...

Hybrid sorts

Problem: quick sort, in the best case, is pretty fast – often a constant factor faster than merge sort. But in the worst case, it's O(n^2)!

Idea: if things start looking bad, stop using quicksort! Switch to a different sorting algorithm.

Hybrid sort example: introsort

Example: Introsort

Core idea: combine quick sort with heap sort. (Why heap sort? It's also O(n log(n)), and can be implemented in-place.)

1. Run quicksort, but keep track of how many recursive calls we've made.
2. If we pass some threshold (usually 2⌊lg(n)⌋):
   2.1 Assume we've hit our worst case, and switch to heapsort.
   2.2 Otherwise, continue using quick sort.

Hybrid sort: a sorting algorithm which combines two or more other sorting algorithms.

Punchline: the worst-case runtime is now O(n log(n)), not O(n^2). Introsort is used by various C++ standard library implementations and by Microsoft's .NET framework.

Adaptive sorts

Observation: most real-world data contains "mostly-sorted runs" – chunks of data that are already sorted.

Idea: modify our strategy based on what the input data actually looks like!

Adaptive sort: a sorting algorithm that adapts to patterns and pre-existing order in the input. Most adaptive sorts take advantage of how real-world data is often partially sorted.

Adaptive sorts: Timsort

Example: Timsort

Core idea: combine merge sort with insertion sort. (Who's Tim? A Python core developer.)

1. Works by looking for "sorted runs".
2. Uses insertion sort to merge two small runs, and merge sort to merge two large runs.
3. The implementation is very complex – lots of nuances, lots of clever tricks (e.g. detecting runs that are in reverse sorted order, sometimes skipping elements).

Punchline: the worst-case runtime is still O(n log(n)), but the best-case runtime is O(n). Timsort is used by Python and Java.

Linear sorts (aka "niche" sorts)

Basic idea: can we do better than O(n log(n)) if we assume more things about the input list?

Counting sort

Counting sort:
- Assumption: the input is a list of ints where every item is between 0 and k.
- Worst-case runtime: O(n + k).

How would you implement this? Hint: start by creating a temp array of length k. Take inspiration from how hash tables work.

The algorithm:
1. Create a temp array named arr of length k.
2. Loop through the input array. For each number num, run arr[num] += 1.
3. The temp array will now contain how many times we saw each number in the input list.
4. Iterate through it to create the output list.
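The four counting-sort steps above can be sketched directly in Python. A minimal sketch: the name counting_sort is ours, and we read "between 0 and k" as the values 0 .. k-1, so a temp array of length k suffices:

```python
def counting_sort(nums, k):
    """Counting sort, following the four steps above. Assumes every
    item of nums is an int in the range 0 .. k-1. Runs in O(n + k)."""
    counts = [0] * k                  # step 1: temp array of length k
    for num in nums:                  # step 2: tally each number
        counts[num] += 1
    out = []                          # steps 3-4: counts[v] now says how
    for value, count in enumerate(counts):  # many times we saw v; rebuild
        out.extend([value] * count)         # the output in sorted order
    return out

print(counting_sort([3, 1, 4, 1, 5, 1, 2], k=6))  # → [1, 1, 1, 2, 3, 4, 5]
```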
Other linear sorts

Other interesting linear sorts:

- Radix sort
  - Assumes the items in the list are some kind of sequence-like thing, such as strings or ints (an int is a sequence of digits).
  - Assumes each "digit" is also sortable.
  - Sorts all the digits in the 1s place, then the 2s place..

Graphs

Introduction to graphs
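The digit-by-digit idea can be sketched as a least-significant-digit radix sort. This is a simplified illustration (the name radix_sort is ours, and per-digit bucket lists stand in for the stable counting-sort pass a real implementation would use):

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative ints: one stable pass per digit
    position, starting from the least significant digit."""
    if not nums:
        return []
    place = 1                            # 1s digit, then base^1, base^2, ...
    while place <= max(nums):
        buckets = [[] for _ in range(base)]
        for num in nums:
            digit = (num // place) % base
            buckets[digit].append(num)   # stable: earlier order preserved
        nums = [num for bucket in buckets for num in bucket]
        place *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# → [2, 24, 45, 66, 75, 90, 170, 802]
```

Because each pass is stable, after the final pass the most significant digit dominates and the list is fully sorted.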