Amortized Analysis
DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2018)
Juha Kärkkäinen
Based on slides by Veli Mäkinen
Master's Programme in Computer Science, 06/09/2018

ANALYSIS OF RECURRENCES & AMORTIZED ANALYSIS

ANALYSIS OF RECURRENCES
• Analysing recursive, divide-and-conquer algorithms
  • Step 1: Divide the problem into subproblems
  • Step 2: Solve the subproblems recursively
  • Step 3: Combine the subproblem results
• Three methods
  • Substitution method (Section 4.3 in the book)
  • Recursion-tree method (Section 4.4)
  • Master method (Section 4.5)
• Quicksort (Chapter 7)
• We will continue this topic in Week II with advanced recursive algorithms

QUICKSORT
[Figure: partitioning around the pivot 4 turns the array 4 7 8 1 3 6 5 2 9 into 1 3 2 | 4 | 7 8 6 5 9; recursing on both sides eventually yields the sorted array.]
A bad pivot causes the recursion tree to be skewed: O(n^2) worst case. We learn next week how to select a perfect pivot (the median) in linear time!

QUICKSORT WITH PERFECT PIVOT
[Figure: a balanced recursion tree with log n levels and O(n) work on each level, giving O(n log n) time.]
This is called the recursion-tree method.

QUICKSORT WITH PERFECT PIVOT
• The running time can also be stated as a recurrence (a recursively defined equation):
  • T(n) = 2T(n/2) + O(n)   (recursive calls + divide and combine)
  • T(1) = O(1)   (base case)
• Assumes n = 2^k for some integer k > 0 (why is this fine to assume?).
• Substitution method:
  1. Guess a solution (with unknown constants)
  2. Prove the solution by induction
     a. Assume the solution holds for inputs smaller than n
     b. Substitute according to the induction assumption
     c. Check that the solution holds (with appropriate constants)
     d. Check (and adjust if necessary) the base case

SUBSTITUTION METHOD EXAMPLE
• Observation: big-O notation is not compatible with the substitution method, as we need more exact claims for the induction to work. Hence we solve T(n) = 2T(n/2) + an and T(1) = a for some constant a > 0.
  1. Guess: T(n) ≤ cn log n for some c > 0 when n ≥ n_0
  2. Prove by induction
     a. Induction assumption: T(n/2) ≤ (cn/2) log(n/2) = (cn/2) log n − cn/2
     b. Substitute: T(n) = 2T(n/2) + an ≤ cn log n − cn + an
     c. Check: T(n) ≤ cn log n for any c ≥ a
     d. Base case: T(1) = a > c·1·log 1 = 0, but T(2) = 4a ≤ c·2·log 2 = 2c when c ≥ 2a. Thus we can choose e.g. c = 2a and n_0 = 2.
• Here the induction base case (n = 2) is different from the recurrence base case (n = 1).

MASTER METHOD
• The Master Theorem characterizes many recurrences of the form T(n) = aT(n/b) + f(n).
• Depending on the relationship between a, b, and f(n), three different outcomes for T(n) follow.
• Let α = log_b a. The cases are:
  • If f(n) = O(n^(α−ε)) for some constant ε > 0, then T(n) = Θ(n^α).
  • If f(n) = Θ(n^α), then T(n) = Θ(n^α log n).
  • If f(n) = Ω(n^(α+ε)) for some constant ε > 0, and af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
• Example: T(n) = 2T(n/2) + Θ(n).
  • α = log_2 2 = 1 and f(n) = Θ(n^α), thus T(n) = Θ(n^α log n) = Θ(n log n).
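Where the theorem gives Θ(n log n), a quick numerical check can build intuition. The following minimal Python sketch (illustrative only, not part of the course material) evaluates T(n) = 2T(n/2) + n with T(1) = 1 for powers of two and compares it against n log n:

    import math

    def T(n):
        # Evaluate T(n) = 2*T(n/2) + n with T(1) = 1; n must be a power of two.
        if n == 1:
            return 1
        return 2 * T(n // 2) + n

    for k in range(1, 21):
        n = 2 ** k
        # The exact solution here is n*log2(n) + n, so this ratio tends to 1,
        # matching case 2 of the Master Theorem.
        print(n, T(n) / (n * math.log2(n)))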
AMORTIZED ANALYSIS
• Consider algorithms whose running time can be expressed as
  (time of a step) × (number of steps) = t_step × #steps = t_total
• E.g. a linked list: O(1) append × n items added = O(n)
• Sometimes a single step can take a long time, but the total time is much smaller than what this simple analysis gives.
• Work done on heavy steps can be charged to the light steps.
• Amortized cost of a step = t_total / #steps
• Examples:
  • Cartesian tree construction (separate pdf)
  • Dynamic array (Section 17.4.2)

CARTESIAN TREE
Cartesian tree CT(A) on an array A:
• root = the smallest element
• left subtree = Cartesian tree of the subarray to the left of the root
• right subtree = Cartesian tree of the subarray to the right of the root
Example: A = 7 9 1 5 8 3 4 2 3.5
Naive construction needs Θ(n^2) time in the worst case. Incremental left-to-right construction runs in linear time.

CARTESIAN TREE CONSTRUCTION
[Figure sequence: the tree after inserting 7; then 9 (as the right child of 7); then 1 (which becomes the new root, with the old tree as its left subtree).]
General step: Compare the new element to the elements on the rightmost path, starting from the bottom, and insert it in the appropriate place.
[Figure: inserting 3 into the tree of 7 9 1 5 8; 3 by-passes 8 and 5 and becomes the right child of 1, with 5 as its left child.]

CARTESIAN TREE
• Comparing a new item to all items on the rightmost path may take O(n) time.
• But after comparing against an old item, you either insert the new item above it, or you never compare against that old item again (a by-pass).
• The total running time is proportional to #by-passes + #insertions, which are both O(n).
• Hence, the amortized cost of modifying CT(A[1..n−1]) into CT(A[1..n]) is O(1).
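The incremental construction keeps the rightmost path on a stack. Below is a minimal Python sketch of the idea (the Node class and function name are illustrative, not from the course material); note that every pop from the stack is exactly one by-pass in the analysis above, so the total work is O(n).

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None

    def cartesian_tree(A):
        # Build CT(A) incrementally; the stack holds the rightmost path,
        # with the bottom of the path on top of the stack.
        stack = []
        for x in A:
            node = Node(x)
            last_popped = None
            # Walk up the rightmost path: each pop is a by-pass, and the
            # popped node is never compared against again.
            while stack and stack[-1].value > x:
                last_popped = stack.pop()
            # Insert x here: the by-passed nodes become its left subtree.
            node.left = last_popped
            if stack:
                stack[-1].right = node
            stack.append(node)
        return stack[0] if stack else None

For A = 7 9 1 5 8 3 4 2 3.5 this returns the node 1 as the root, matching the tree above.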
DYNAMIC ARRAY / TABLE
A bad idea:
• Insert (to a full array) → double the array size
• Delete (leaving the array half full) → halve the array size
• Insert → double the array size, …
Worst case: alternating inserts and deletes at the boundary, so each insert and delete needs O(n) time for doubling/halving.

DYNAMIC ARRAY / TABLE
A better idea:
• Insert (to a full array) → double the array size
• Delete → halve the array size only after n/4 deletions (when the array is a quarter full)
Each doubling/halving of an array of size n is followed by Ω(n) inserts/deletes before the next doubling/halving → constant amortized time for insert/delete.

STRATEGIES FOR AMORTIZED ANALYSIS
• Aggregate method (Section 17.1)
  • Show that each step grows some quantity that is bounded. The bound on the quantity can then be used to show that the total time used by all steps is proportional to that same bound.
  • In Cartesian tree construction, each step added one to #by-passes or #insertions. Both are bounded by n, and hence the total number of steps is at most 2n.
• Accounting method (Section 17.2)
  • Pay for the expensive operations in advance by charging the cheap operations extra. Then show that any sequence of operations deposits at least as much credit into the bank account as the expensive operations withdraw.
  • In the dynamic array, we pay for 2 copy operations at each insertion or deletion. Consider any sequence of operations from one halving/doubling to size n until the next:
    ‒ Halving: n/4 deletions have gathered n/2 credits, which is sufficient to copy n/4 elements to a new location.
    ‒ Doubling: n/2 insertions have gathered n credits, which is sufficient to copy n elements to a new location.

STRATEGIES FOR AMORTIZED ANALYSIS
• Potential method (Section 17.3)
  • Let p(t) ≥ 0 be a potential of the data structure after t operations, with p(0) = 0.
  • Let at(t) = c(t) + p(t) − p(t−1) be the amortized time of operation t, where c(t) is the actual cost of that operation.
  • By telescoping cancellation, the sum of the amortized times of n operations is
    at(1) + at(2) + … + at(n) = c(1) + c(2) + … + c(n) + p(n),
    and since p(n) ≥ 0 this is an upper bound for the actual running time.
  • To show e.g. that the total running time is linear, it is sufficient to show that the amortized time of each type of operation is constant!
  • This kind of analysis requires a good guess for p(t).
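To make the doubling/halving policy and the potential method concrete, here is a minimal Python sketch (the class and the specific potential function, taken from CLRS Section 17.4.2, are illustrative choices rather than something fixed by the slides):

    class DynamicArray:
        def __init__(self):
            self.size = 1            # allocated capacity
            self.num = 0             # number of stored elements
            self.data = [None]

        def _resize(self, new_size):
            new_data = [None] * new_size
            new_data[:self.num] = self.data[:self.num]   # self.num copy operations
            self.data, self.size = new_data, new_size

        def potential(self):
            # p(t) >= 0 with p(0) = 0: the potential is large just before a
            # resize, so its drop pays for the Θ(n) copying when it happens.
            if 2 * self.num >= self.size:
                return 2 * self.num - self.size
            return self.size // 2 - self.num

        def insert(self, x):
            if self.num == self.size:                    # full: double first
                self._resize(2 * self.size)
            self.data[self.num] = x
            self.num += 1

        def delete_last(self):
            self.num -= 1
            self.data[self.num] = None
            if self.size > 1 and 4 * self.num <= self.size:   # quarter full: halve
                self._resize(self.size // 2)

With this choice of p, the amortized time c(t) + p(t) − p(t−1) of every insert and delete works out to a constant, even for the operations that trigger a Θ(n) resize.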