DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2018)

Juha Kärkkäinen

Based on slides by Veli Mäkinen

ANALYSIS OF RECURRENCES & AMORTIZED ANALYSIS

ANALYSIS OF RECURRENCES

• Analysing recursive, divide-and-conquer algorithms
  • Step 1: Divide the problem into subproblems
  • Step 2: Solve the subproblems recursively
  • Step 3: Combine the subproblem results
• Three methods
  • Substitution method (Section 4.3 in the book)
  • Recursion-tree method (Section 4.4)
  • Master method (Section 4.5)
• Quicksort (Chapter 7)
• We will continue with this topic in Week II, with advanced recursive algorithms

QUICKSORT

[Figure: quicksort example. A = 4 7 8 1 3 6 5 2 9 with pivot 4; after partitioning: 1 3 2 | 4 | 7 8 6 5 9; the two sides are then partitioned recursively until the array is sorted.]

A bad pivot causes the recursion tree to be skewed → O(n²) worst-case time. Next week we will learn how to select a perfect pivot (the median) in linear time!
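To make the example concrete, here is a minimal Python sketch of quicksort (a simplified list-based version, not the in-place PARTITION of Chapter 7). It picks the first element as the pivot, matching the figure above, so an already sorted input exhibits exactly the skewed O(n²) case:

```python
def quicksort(a):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(a) <= 1:
        return a
    pivot = a[0]  # first element as pivot, as in the example above
    smaller = [x for x in a[1:] if x < pivot]   # divide: O(n) partition work
    larger = [x for x in a[1:] if x >= pivot]
    # Conquer the two sides recursively, then combine.
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([4, 7, 8, 1, 3, 6, 5, 2, 9]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```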

QUICKSORT WITH PERFECT PIVOT

[Figure: balanced recursion tree of quicksort with a perfect pivot: log n levels.]

O(n) work on each level → O(n log n) total time. This is called the recursion-tree method.


• The running time can also be stated as a recurrence (a recursively defined equation):
  • T(n) = 2T(n/2) + O(n) (recursive calls + divide and combine)
  • T(1) = O(1) (base case)
  • Assumes n = 2^k for some integer k > 0 (why is this fine to assume?).
• Substitution method:
  1. Guess a solution (with unknown constants)
  2. Prove the solution by induction
     a. Assume the solution holds for inputs smaller than n
     b. Substitute according to the induction assumption
     c. Check that the solution holds (with appropriate constants)
     d. Check (and adjust if necessary) the base case

SUBSTITUTION METHOD EXAMPLE

• Observation: big-O notation is not compatible with the substitution method, as we need more exact claims for the induction to work. Hence we solve T(n) = 2T(n/2) + an and T(1) = a for some constant a > 0.

1. Guess: T(n) ≤ cn log n for some c > 0 when n ≥ n₀.
2. Prove by induction:
   a. Induction assumption: T(n/2) ≤ c(n/2) log(n/2) = c(n/2) log n - c(n/2).
   b. Substitute: T(n) = 2T(n/2) + an ≤ cn log n - cn + an.
   c. Check: T(n) ≤ cn log n for any c ≥ a.
   d. Base case: T(1) = a > c·1·log 1 = 0, but T(2) = 4a ≤ c·2 log 2 = 2c when c ≥ 2a.

Thus we can choose e.g. c = 2a and n₀ = 2.
• Here the induction base case (n = 2) is different from the recurrence base case (n = 1).
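As a quick sanity check (not a proof), we can evaluate the recurrence directly and compare it against the proven bound with c = 2a and n₀ = 2; a small hypothetical helper in Python:

```python
import math

def T(n, a=1.0):
    # The recurrence T(n) = 2 T(n/2) + a*n with T(1) = a, for n a power of 2.
    return a if n == 1 else 2 * T(n // 2, a) + a * n

a, c = 1.0, 2.0  # c = 2a, as chosen in the proof
for k in range(1, 16):
    n = 2 ** k   # n0 = 2, so start from k = 1
    assert T(n, a) <= c * n * math.log2(n)
print("T(n) <= 2a n log n holds for n = 2, 4, ..., 2^15")
```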

MASTER METHOD

• The Master Theorem characterizes many cases of recurrences of the type T(n) = aT(n/b) + f(n).
• Depending on the relationship between a, b, and f(n), one of three outcomes for T(n) follows.

• Let α = log_b a. The cases are:
  • If f(n) = O(n^(α-ε)) for some constant ε > 0, then T(n) = Θ(n^α).
  • If f(n) = Θ(n^α), then T(n) = Θ(n^α log n).
  • If f(n) = Ω(n^(α+ε)) for some constant ε > 0, and if af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
• Example: T(n) = 2T(n/2) + Θ(n).
  • α = log₂ 2 = 1 and f(n) = Θ(n) = Θ(n^α), thus T(n) = Θ(n^α log n) = Θ(n log n).
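For the common special case f(n) = Θ(n^d), the three cases reduce to comparing a with b^d (equivalently, d with α = log_b a), and the regularity condition of case 3 holds automatically. A small hypothetical helper illustrating this, assuming integer exponents:

```python
import math

def master_polynomial(a, b, d):
    # T(n) = a T(n/b) + Theta(n^d): compare a with b^d, i.e. d with log_b(a).
    if a > b ** d:          # case 1: the leaves dominate
        return f"Theta(n^{math.log(a, b):.3g})"
    if a == b ** d:         # case 2: balanced levels
        return f"Theta(n^{d} log n)"
    return f"Theta(n^{d})"  # case 3: the root dominates

print(master_polynomial(2, 2, 1))  # perfect-pivot quicksort: Theta(n^1 log n)
print(master_polynomial(4, 2, 1))  # Theta(n^2)
print(master_polynomial(1, 2, 2))  # Theta(n^2)
```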


AMORTIZED ANALYSIS

• Consider algorithms whose running time can be expressed as (time per step) × (number of steps): t_step × #steps = t_total.
  • E.g. a linked list: O(1) per append × n items added = O(n).
• Sometimes a single step can take a long time, but the total time is much smaller than what this simple analysis gives.
  • The work done on heavy steps can be charged to the light steps.

• Amortized cost of a step = t_total / #steps.
• Examples:
  • Cartesian tree construction (separate pdf)
  • Dynamic array/table (Section 17.4.2)

CARTESIAN TREE

Cartesian tree CT(A) on array A:
• root = the smallest element
• left subtree = Cartesian tree of the subarray to the left of the root
• right subtree = Cartesian tree of the subarray to the right of the root

A = 7 9 1 5 8 3 4 2 3.5

Naive construction needs Θ(n²) time in the worst case.

Incremental left-to-right construction runs in linear time.
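A naive construction that follows the definition directly might look like this sketch (a hypothetical helper returning nested (root index, left, right) tuples); the repeated minimum scans give the Θ(n²) worst case, e.g. on sorted input:

```python
def naive_cartesian_tree(A, lo=0, hi=None):
    # Follow the definition: root = position of the minimum,
    # then build the left and right subtrees from the two subarrays.
    if hi is None:
        hi = len(A)
    if lo >= hi:
        return None
    r = min(range(lo, hi), key=lambda i: A[i])  # Theta(hi - lo) scan per node
    return (r, naive_cartesian_tree(A, lo, r),
               naive_cartesian_tree(A, r + 1, hi))

tree = naive_cartesian_tree([7, 9, 1, 5, 8, 3, 4, 2, 3.5])
print(tree[0])  # 2: the index of the smallest element, 1
```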

CARTESIAN TREE CONSTRUCTION

[Figures: step-by-step incremental construction, adding the elements 7, 9, 1, 5, 8, 3, … of A one at a time from left to right.]

General step: Compare the new element to the elements on the rightmost path, starting from the bottom, and insert it in the appropriate place.


Comparing a new item to all items on the rightmost path may take O(n) time.

But after comparing the new item with an old item, either the new item is inserted, or that old item is never compared again (it is bypassed).

The total running time is proportional to #bypasses + #insertions, both of which are O(n). Hence, the amortized cost of modifying CT(A[1..n-1]) into CT(A[1..n]) is O(1).
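A compact Python sketch of the incremental construction (a hypothetical implementation; the version in the separate pdf may differ). A stack holds the rightmost path; each index is pushed once and popped (bypassed) at most once, so the total time is O(n):

```python
def cartesian_tree(A):
    n = len(A)
    parent, left, right = [-1] * n, [-1] * n, [-1] * n
    stack = []  # rightmost path: stack[0] is the root, stack[-1] the bottom
    for i in range(n):
        last = -1
        # Bypass rightmost-path nodes larger than the new element.
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        if last != -1:       # the bypassed part becomes i's left subtree
            left[i], parent[last] = last, i
        if stack:            # insert i below the remaining bottom node
            right[stack[-1]], parent[i] = i, stack[-1]
        stack.append(i)
    return stack[0], parent, left, right  # root index and link arrays

root, parent, left, right = cartesian_tree([7, 9, 1, 5, 8, 3, 4, 2, 3.5])
print(root)  # 2: the position of the minimum element 1
```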

DYNAMIC ARRAY / TABLE

Bad idea: halve the array as soon as it becomes half full.

Insert (into a full array) → double the array size
Delete → halve the array size
Insert → double the array size
…

Worst case: each insert and delete needs O(n) time for doubling/halving.


Better idea: halve the array only after it becomes a quarter full.

Insert (into a full array) → double the array size
Delete, more deletes, … → halve the array size after n/4 deletions

Each doubling/halving of an array of size n is followed by Ω(n) inserts/deletes before another doubling/halving → constant amortized time for insert/delete.
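A sketch of this scheme in Python (a hypothetical class; real implementations add error handling and a minimum capacity). Doubling when full and halving only at one quarter full guarantees many cheap operations between consecutive O(n) resizes:

```python
class DynamicArray:
    def __init__(self):
        self.cap, self.n = 1, 0
        self.data = [None] * self.cap

    def _resize(self, new_cap):
        new_data = [None] * new_cap          # O(n) copying: the expensive step
        new_data[:self.n] = self.data[:self.n]
        self.data, self.cap = new_data, new_cap

    def insert(self, x):
        if self.n == self.cap:               # full: double the size
            self._resize(2 * self.cap)
        self.data[self.n] = x
        self.n += 1

    def delete_last(self):
        self.n -= 1
        x, self.data[self.n] = self.data[self.n], None
        if self.cap > 1 and self.n <= self.cap // 4:  # quarter full: halve
            self._resize(self.cap // 2)
        return x
```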

STRATEGIES FOR AMORTIZED ANALYSIS

• Aggregate method (Section 17.1)
  • Show that each step grows some quantity that is bounded. The bound on the quantity can then be used to show that the total time used for all steps is proportional to that same bound.
  • In Cartesian tree construction, each step adds one to #bypasses or #insertions. Both are bounded by n, and hence the total number of steps is at most 2n.
• Accounting method (Section 17.2)
  • Pay for the expensive operations in advance by charging extra credits on the cheap operations. Then show that the accumulated credit always suffices to pay for the expensive operations (the bank account never goes negative).
  • In the dynamic array we pay for 2 extra copy operations at each insertion or deletion. Consider any sequence of operations after a halving/doubling to size n until the next one:
    ‒ Halving: n/4 deletions have gathered n/2 credits, which is sufficient to copy n/4 elements to a new location.
    ‒ Doubling: n/2 insertions have gathered n credits, which is sufficient to copy n elements to a new location.


• Potential method (Section 17.3)
  • Let p(t) ≥ 0 be a potential after t operations, with p(0) = 0.
  • Let at(t) = c(t) + p(t) - p(t-1) be the amortized time of operation t, where c(t) is the actual cost of that operation.
  • By telescoping cancellation, the sum of the amortized times of n operations is at(1) + at(2) + … + at(n) = c(1) + c(2) + … + c(n) + p(n), and thus an upper bound on the actual running time.
  • To show e.g. that the total running time is linear, it is sufficient to show that the amortized time of each type of operation is constant!
  • This kind of analysis requires a good guess for p(t).
  • Consider the dynamic array with insertions only. Let p(t) = 2m - n, where n is the size of the array and m is the number of elements:
    ‒ For insertions not causing doubling: at(t) = 1 + 2m - n - (2(m-1) - n) = 3.
    ‒ For insertions causing doubling: at(t) = n/2 + 1 + 2(n/2 + 1) - n - (2·(n/2) - n/2) = 3.
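A quick simulation (insertions only, hypothetical helper) confirming the calculation: with potential p = 2m - n, the amortized cost of every insertion is exactly 3. Starting from an empty array of capacity 1 gives p(0) = -1 rather than 0, a constant offset that does not affect the per-operation amortized cost:

```python
def amortized_insert_costs(num_inserts):
    cap, m = 1, 0                 # array size n = cap, element count m
    p_prev = 2 * m - cap          # potential p = 2m - n
    costs = []
    for _ in range(num_inserts):
        if m == cap:              # full: copy m elements, then double
            cost, cap = m + 1, 2 * cap
        else:
            cost = 1              # just write the new element
        m += 1
        p = 2 * m - cap
        costs.append(cost + p - p_prev)   # at(t) = c(t) + p(t) - p(t-1)
        p_prev = p
    return costs

print(set(amortized_insert_costs(1000)))  # {3}
```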


AMORTIZED ANALYSIS VS COMPLEXITY

• Amortized analysis is a technique for analysing the (worst case) complexity of an algorithm.
  • E.g. Cartesian tree construction takes linear worst-case time.
• Amortized complexity refers to operations on data structures:

• If any series of n operations takes t_total time, then one operation takes amortized t_total / n time. One can talk about the amortized complexity or amortized cost of an operation.
  ‒ Some subset of the supported operations might even have good worst-case bounds.
• E.g. insert/delete on dynamic arrays has amortized complexity O(1).
  ‒ Any series of n intermixed insertions/deletions takes O(n) worst-case time.
