
Analyzing Code with Θ, O and Ω

Owen Kaser November 2, 2017

Originally written Fall 2012, heavily revised Fall 2016, last revised November 2, 2017. These notes, prepared for CS2383, try to show how to use Θ and friends to analyze the time complexity of algorithms and code fragments. For an alternative explanation, see the Wiki textbook sections https://en.wikibooks.org/wiki/Data_Structures/Asymptotic_Notation and https://en.wikibooks.org/wiki/Algorithms/Mathematical_Background.

1 Definitions

The Sedgewick book used in 2016 prefers to fix a cost model for an algorithm (frequently the number of array accesses it makes) and then use a “tilde (∼) notation” to give a fairly accurate estimate of the cost. It briefly mentions a related approach on pages 206 and 207, then explains why they have avoided using it. Nevertheless, this related approach is very widely used, so I think you need to learn it. We expand on the related approach in this document.

In the related approach, our cost model is implicitly the number of primitive operations executed at runtime, which is closely related to the running time: we assume a unit-cost model where each primitive operation takes the same number of nanoseconds to execute. Because we (thankfully!) write programs in high-level languages rather than by specifying only primitive operations, we have to make a few guesses about the basic operations that arise from our high-level programs. Fortunately, the details will end up not mattering. As with the tilde notation, our focus is on the growth rate of the running time as the amount of input becomes very large.

To describe growth rates, we use “big-Oh” notation to compare the running-time growth rate of an algorithm against the growth rate of a function (e.g., n²) that we intuitively understand. In what follows, n is a nonnegative integer and represents the “size” of the input being processed. Usually, it will be the number of data items in an array or set being processed. However, it could represent the value of a single input integer, or it could even represent the total number of bits required to hold the input. If you are not told what n is measuring, it is probably supposed to be clear from the context.

Functions f and g are functions that make sense for algorithm running times. Given an input size n, f(n) is the running time, and is thus some nonnegative real number. Since we usually care only about growth rates of functions, the units (e.g., nanoseconds) are usually not relevant. In the math examples, we sometimes play with functions that can give negative values for small n. Pretend that they don't . . .

Big-Oh: We write that f(n) ∈ O(g(n)) if there exists a positive integer n0 and a positive constant c such that

f(n) ≤ c·g(n), for all n ≥ n0.

“Eventually, f(n) is less than some scaled version of g(n).” “f's growth rate is no larger than g's growth rate.”

Examples:

• n − 1 ∈ O(n) because n − 1 ≤ 1 · n for every n. (So we could take n0 to be 1.)

• n + 1 ∈ O(n) because n + 1 ≤ 2n when n is large enough. (I could just choose n0 = 10, or if I wanted to choose n0 as small as possible, I could start by solving n + 1 = 2n for n. But this is not required.)

• 2n − 1 ∈ O(n). I can choose c = 3 and n0 = 10.

• n² + 1 ∈ O(n²). I can choose c = 2 and n0 = 10.

• n + 1 ∈ O(n²) since n + 1 ≤ 1 · n² for n ≥ 2.

• n² ∉ O(n). Even if I choose c = 1000, it is not true that n² ≤ 1000n when n is big: any value of n > 1000 is problematic. (An argument that no choice of constants can work is spelled out below.)

A conventional abuse of the notation is that we write f(n) = O(g(n)) or say “f(n) is O(g(n))” instead of f(n) ∈ O(g(n)). Note that O(g(n)) is supposed to be an (infinite) set of functions. For instance, O(n²) = {n², n² − 1, n² + 1, ½n², n, n + 1, n^1.5, . . .}.

Typical uses in CS:

• “The merge-sort algorithm's average-case running time is O(n log n).” We are not told exactly how well the merge-sort algorithm will scale up to handle large amounts of input data, but we are told the growth rate is at worst “linearithmic”, which is a desirable growth rate.

• “The insertion-sort algorithm's average-case running time is O(n²).” Again, we are not told exactly how well the insertion sort will scale up to handle large amounts of input data, but we are told the growth rate is at worst “quadratic”. It might, or might not, be better than that (it could be linearithmic, for instance). But the statement allows that it could actually have a quadratic growth rate, which is pretty bad.

The person making the statement above could legitimately also have said “The insertion-sort algorithm's average-case running time is O(n³)” without contradicting the first statement, since O(n²) ⊆ O(n³). But if they knew it is O(n²), they are depriving you of useful information by merely telling you it is O(n³).
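Returning to the earlier claim that n² ∉ O(n), here is the argument spelled out. Suppose, for contradiction, that there were constants c and n0 with n² ≤ c·n for all n ≥ n0. Dividing both sides by n (legitimate for n ≥ 1) gives n ≤ c for all n ≥ n0. But no constant c is at least as large as every sufficiently big n: the inequality already fails for any integer n that exceeds both n0 and c. So no choice of c and n0 can satisfy the definition, and n² ∉ O(n).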

Big-Omega: In a sense, big-Oh allows us to state upper bounds on the growth rate of a function. If we want to state a lower bound on a growth rate, we use big-Omega notation. The definition is almost the same as the big-Oh definition, except that the direction of the inequality has been reversed. We write that f(n) ∈ Ω(g(n)) if there exists a positive integer n0 and a positive constant c such that

f(n) ≥ cg(n), for all n ≥ n0

“Eventually, f(n) is at least some scaled version of g(n).” “f's growth rate is no less than g's growth rate.”

“The insertion-sort algorithm's average-case running time is Ω(n²).” This is a (true) statement that insertion sort is bad (given that it is possible to sort in O(n log n) time with other algorithms). We're not saying exactly how bad, but it is at least quadratic in its slowness. We can provide true, but less useful, results by replacing Ω(n²) by Ω(n^1.5), since Ω(n²) ⊆ Ω(n^1.5). This would say that insertion sort is at least somewhat bad.
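As a concrete worked example of establishing an Ω bound from the definition (the function n(n − 1)/2 is chosen here because it is the worst-case comparison count of insertion sort): n(n − 1)/2 ∈ Ω(n²). Choose c = 1/4 and n0 = 2. For n ≥ 2 we have n − 1 ≥ n/2, so

   n(n − 1)/2 ≥ n · (n/2) / 2 = n²/4 = c·n²,

as the definition requires.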

Big-Theta: Big-Theta notation allows us to state both an upper and a lower bound for the growth rate of a function. We write that f(n) ∈ Θ(g(n)) if there exists a positive integer n0 and positive constants c1 and c2 such that

c1g(n) ≤ f(n) ≤ c2g(n), for all n ≥ n0

If f(n) ∈ Θ(g(n)), then the two functions have the same growth behaviour. For instance, n² + n + 5 ∈ Θ(n²).

Big-Oh, Big-Theta and Big-Omega are more formally referred to as “asymptotic notations”: they describe the behaviours of the functions when the input n is approaching infinity. However, we hope to use them to describe the behaviours of algorithms on reasonably large inputs. It is helpful to view big-Oh as letting you state a ≤ relationship between two growth rates, while big-Omega lets you state a ≥ relationship. Big-Theta lets you state that two functions have the same asymptotic growth rate.
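To verify the claim above that n² + n + 5 ∈ Θ(n²) directly from the definition, explicit constants can be exhibited: take c1 = 1, c2 = 3 and n0 = 3. For n ≥ 3 we have n ≤ n² and 5 ≤ n², so

   n² ≤ n² + n + 5 ≤ n² + n² + n² = 3n²,

i.e., c1·n² ≤ n² + n + 5 ≤ c2·n² for all n ≥ n0.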

2 Rarer asymptotic notations

There are other notations that are sometimes used to state < and > relationships between growth rates. They are usually defined using limits, though the Wiki textbooks (and others) have an alternate definition that works out the same in many, but perhaps not all, cases.

little-oh: If lim_{n→∞} f(n)/g(n) = 0, then we write f(n) ∈ o(g(n)).

little-omega: If lim_{n→∞} f(n)/g(n) diverges toward +∞, then we write f(n) ∈ ω(g(n)).

soft-Oh: If we can find a positive constant k such that f(n) ∈ O(log^k(n) · g(n)), then we write f(n) ∈ Õ(g(n)).

Comments:

1. Little-oh lets us say that a function definitely grows more slowly than another.

2. Little-omega lets us say that a function definitely grows more quickly than another.

3. If lim_{n→∞} f(n)/g(n) = c, for some positive constant c, then f(n) ∼ c · g(n) and also f(n) ∈ Θ(g(n)). However, there are cases where one can find two functions f′ and g′ where f′(n) ∈ Θ(g′(n)) but the limit is undefined. For instance, take g(n) = n and consider

   f(n) = 2n if n is even, and f(n) = 3n if n is odd.

Here, f(n) ∈ Θ(n) but the limit of f(n)/g(n) does not exist. Fortunately, many interesting algorithms' runtime functions are not as weird as f.

4. In many practical situations, logarithmic factors might as well be constants, and soft-Oh gives you a way to ignore them.
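As a standard example of these limit-based definitions: n log n ∈ o(n²), because lim_{n→∞} (n log n)/n² = lim_{n→∞} (log n)/n = 0. Reading the same ratio the other way up, n²/(n log n) = n/(log n) diverges toward +∞, so n² ∈ ω(n log n).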

3 Handling Multi-variable Functions

Sometimes, the “input size” is not best expressed by a single number and we want running time functions that might depend on several input parameters. Consider a sorting algorithm whose speed depends both on the number of values in the array and also on the magnitude of the largest value in the array. This can be handled in different ways. The solution given in a widely used textbook by Cormen [?] is to define:

f(n, m) ∈ O(g(n, m)) if there are positive constants c, n0, m0 such that f(n, m) ≤ c·g(n, m) for all n ≥ n0 or m ≥ m0.

4 Properties of Big-Oh, Theta & Omega

There are a variety of properties that are sometimes useful. See [?, ?].

• f(n) ∈ O(g(n)) ∧ f(n) ∈ Ω(g(n)) ⇐⇒ f(n) ∈ Θ(g(n)). This gives an alternative way to show a big-Theta relationship.

• (Transitivity) If f(n) ∈ Θ(g(n)) and g(n) ∈ Θ(h(n)) then f(n) ∈ Θ(h(n)). Similarly for big-Oh and big-Omega.

• (Symmetry of Theta) f(n) ∈ Θ(g(n)) ⇐⇒ g(n) ∈ Θ(f(n)).

• (Transpose Symmetry for big-Oh and big-Omega) f(n) ∈ O(g(n)) ⇐⇒ g(n) ∈ Ω(f(n)).

• If f(n) ∼ g(n) then f(n) ∈ Θ(g(n)). This lets you adapt results from Sedgewick's book.

• If p(n) is a polynomial of degree k, then p(n) ∈ Θ(n^k).

• For any positive constant c, we have c · f(n) ∈ Θ(f(n)).

• For any constant base c > 1, we have log_c(n) ∈ Θ(log2(n)). (A quick check of this appears just after this list.)

• For any constant k, log^k(n) ∈ O(n). Actually, it is in O(n^(1/a)) (the a-th root of n) for any a > 0.
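The base-change property in the first bullet can be checked directly from the definition, as this short calculation sketches (assuming a constant base c > 1, so that log2(c) is positive): log_c(n) = log2(n)/log2(c), so choosing c1 = c2 = 1/log2(c) and n0 = 2 in the definition of Θ gives log_c(n) ∈ Θ(log2(n)). Changing the base only multiplies the function by a constant, which big-Theta ignores.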

5 Simplifications

(Each of the following simplifications could be proved, but most of the proofs have been omitted.)

In what follows, c is a positive constant and f and g are functions from integers to non-negative real numbers. The formal definition of Θ, O and Ω in some textbooks allows functions that produce negative numbers. This could make some simplifications invalid. Running times cannot be negative, so if our functions describe the relationship between input sizes and running times, we can ban negative outputs.

Recall that knowing f(n) ∈ Θ(g(n)) means that f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)), so you can replace “Θ(g(n))” by the (less informative) “O(g(n))” if it is helpful. Besides the common simplifications listed below, you can often use the definitions of Θ, O and Ω to justify other simplifications when necessary.

1. Θ(c·f(n)) should always be replaced by Θ(f(n)). There should be no unnecessary constants c.

2. Θ(log_c f(n)) should be written without the c; the base of the logarithm has at most a constant effect and thus does not matter.

3. Θ(f(n)) + Θ(g(n)) ⇒ Θ(max(f(n), g(n))). (We consider only which function is bigger when n is very large.) E.g., replace Θ(n log n) + Θ(n²) by Θ(n²).

4. Θ(f(n)) + Θ(g(m)) ⇒ Θ(f(n) + g(m)). This rule is better than nothing. E.g., replace Θ(x²y) + Θ(xy²) by Θ(x²y + xy²).

5. Θ(f(n)) ∗ Θ(g(n)) ⇒ Θ(f(n) ∗ g(n)).

6. Σ_{i=0}^{n} Θ(g(i)) ⇒ Θ(Σ_{i=0}^{n} g(i)).

Proof sketch: Σ_{i=0}^{n} Θ(g(i)) means Σ_{i=0}^{n} f(i), for some function f with c1·g(i) ≤ f(i) ≤ c2·g(i) for i ≥ n0. So

   Σ_{i=0}^{n} f(i) = Σ_{i=0}^{n0−1} f(i) + Σ_{i=n0}^{n} f(i) = c3 + Σ_{i=n0}^{n} f(i),

where c3 is the constant contributed by the first n0 terms. Now

   c3 + Σ_{i=n0}^{n} f(i) ≤ c3 + Σ_{i=n0}^{n} c2·g(i) ≤ c3 + Σ_{i=0}^{n} c2·g(i) ≤ Σ_{i=0}^{n} (c2 + c3)·g(i),

if g(i) ≥ 1. A similar but simpler argument gives the matching lower bound, so Σ_{i=0}^{n} f(i) ∈ Θ(Σ_{i=0}^{n} g(i)). This justifies replacing Σ_{i=0}^{n} Θ(g(i)) by Θ(Σ_{i=0}^{n} g(i)) (providing that g(i) ≥ 1).

For a function that depends on i and n, a sum over i can be similarly simplified. E.g.,

   Σ_{i=2}^{n} Θ(n·i²) ⇒ Θ(Σ_{i=2}^{n} n·i²) ⇒ Θ(n · Σ_{i=2}^{n} i²) ⇒ Θ(n · Θ(n³)) ⇒ Θ(n⁴).

7. Σ_{i∈I} f(i) ∈ O(Σ_{i∈I∪I′} f(i)). This rule is not valid for Θ or Ω and basically says that you can add in extra items that “aren't really there.” E.g.,

   Σ_{i=0}^{n/2} i² ∈ O(Σ_{i=0}^{n} i²).

8. Σ_{i∈I} f(i) ∈ Ω(Σ_{i∈I′} f(i)), where I′ ⊆ I. This rule is not valid for O or Θ and says you can ignore troublesome terms in a sum. E.g.,

   Σ_{i=0}^{n} i ∈ Ω(Σ_{i=n/2}^{n} i).

We can then attack Ω(Σ_{i=n/2}^{n} i) because each term is at least n/2, and there are about n/2 terms. So Σ_{i=n/2}^{n} i ≥ n²/4. Since n²/4 ∈ Ω(n²), we have an easy argument that Σ_{i=0}^{n} i ∈ Ω(n²), even if you have forgotten the exact formula for Σ_{i=0}^{n} i.
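Rules 7 and 8 together give a quick way to pin down a sum without knowing its exact formula. For example, here is a short derivation that Σ_{i=0}^{n} i² ∈ Θ(n³):

   Upper bound: each of the n + 1 terms is at most n², so Σ_{i=0}^{n} i² ≤ (n + 1)·n² ∈ O(n³).
   Lower bound: keep only the last n/2 terms (rule 8); each is at least (n/2)² = n²/4, so Σ_{i=0}^{n} i² ≥ (n/2)·(n²/4) = n³/8 ∈ Ω(n³).

Together, the sum is Θ(n³), which is the fact used in the example for rule 6.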

6 Direction of Attack

Recursive code requires the creation and solving of recurrences. Since this is a little more complicated, we ignore recursion in this section. So assume that we have structured and non-recursive code — the control flow is given by a combination of if, while and for statements, as well as blocks of code that are sequentially composed. It is easy to deal with method/function calls, because if we don't have recursion, we can always substitute the called code in place of the method call.

Nested statements should be attacked from the inside out. Work on getting a Θ expression for the innermost statements first. Then, once you have determined that the innermost statement is Θ(g(n)) (or whatever), use the rules in the following section to determine the Θ expression for the statement it is nested within, and so forth.

7 Rules for Structures

Simple statements are Θ(1), presuming they reflect activities that can be done in a constant number of steps. E.g., i = A[j/2] + 5 has a cost of Θ(1).

Blocks of sequential statements: Add the costs of the statements. The simplification rule for Θ(f(n)) + Θ(g(n)) comes in handy here. Essentially, you can analyze the cost of an entire block by taking the most expensive statement in the block.
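For example, a block consisting of a simple statement costing Θ(1), a loop costing Θ(log n), and a loop costing Θ(n) has total cost

   Θ(1) + Θ(log n) + Θ(n) ⇒ Θ(n),

since n has the largest growth rate of the three.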

Loops — we presume that the loop test is cheap enough that we can ignore it. The cost of the loop is obtained by adding up the individual costs of all the iterations. E.g.

for j=1 to n
    i = A[j/2] + 5

The cost of this is Σ_{j=1}^{n} Θ(1), which we can simplify to Θ(Σ_{j=1}^{n} 1), which further simplifies to Θ(n). When the cost of each iteration is not affected by the value of the index variable (j in this case), we have a simpler rule: for a loop that iterates Θ(f(n)) times and has a cost (that does not vary from iteration to iteration) of Θ(g(n)) per iteration, the total cost is Θ(f(n)) ∗ Θ(g(n)).

For O, there is a particularly simple rule. Determine the cost of the most expensive iteration of the loop. Multiply it by the number of iterations. This rule is only for O. Reanalyzing the loop

for j=1 to n
    i = A[j/2] + 5

we compute the cost by noting the most expensive iteration costs O(1) and the loop iterates O(n) times. The total cost is thus O(n ∗ 1) = O(n). A more interesting case is

for j=1 to n
    for k=1 to j
        i = A[j/2] + k

The innermost statement (the line with A) has cost Θ(1). We work on the “for k” statement next. Once this loop is reached, it iterates j times, costing us Θ(1) per iteration. Its total cost (when activated once) is thus Θ(j ∗ 1).

Now it gets harder, because the outermost loop needs to be analyzed. It iterates n times, but the trick is that its first iteration (when j = 1) does not make the “for k” loop do much. The last iteration of the outermost loop, when j = n, makes the “for k” loop work hard.

A big-Oh analysis can just say that the “for k” loop does O(n) work. (Mathematically, if we know the loop does O(j) work and we also know j ≤ n, a simplification rule not listed above says we can replace j by n and deduce the “for k” loop does O(n) work every time it is activated.) The “for k” loop is the body of the outermost loop, which runs O(n) times. Hence, the total cost for the outermost loop is O(n) ∗ O(n).

A better, big-Theta analysis can be done as follows: the total cost of the outermost loop is

   Σ_{j=1}^{n} Θ(j) = Θ(Σ_{j=1}^{n} j) = Θ(n(n + 1)/2) = Θ(n²).

It seems like more work, and we just got the same “n²” answer as with big-Oh. However, it is both an upper and a lower bound.

Another way to get a Θ(n²) answer would have been to do the quick and sloppy O(n²) analysis. If we can do a quick and sloppy Ω(n²) analysis, then the two sloppy bounds together imply the desirable Θ(n²) answer. So let's do an Ω analysis of the code. First, the innermost statement costs Ω(1). This statement is repeated Ω(j) times, giving us a cost of Ω(j ∗ 1) for the “for k” loop. The trick of replacing j by n is not valid in an Ω analysis, where we could only replace j by something that will make the result smaller. The corresponding idea to the big-Oh approach of identifying the maximum cost of any iteration is to identify the minimum cost of any iteration, and then multiply it by the number of iterations. Unfortunately, the minimum cost of the “for k” loop is very small (consider when j is 1). Taking this minimum cost of Ω(1) and multiplying it by the number of iterations, we have Ω(1) ∗ Ω(n), which does not get the n² bound we want.

One solution is inspired by the example in the last simplification rule: consider just the work done by the last n/2 iterations. For these iterations, we have j ≥ n/2 and so the cost per iteration of the outermost loop is Ω(n). Since

n/2 ∈ Ω(n), we can multiply the number of (considered) iterations by the minimum cost per (considered) iteration. There are Ω(n) considered iterations and each costs Ω(n). Thus the bound is Ω(n²). Combined with the O(n²) analysis, we have shown the running time is in Θ(n²).
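If you want to convince yourself of the Θ(n²) answer empirically, a small counting experiment helps. The following Java sketch (the method name countInnermost is invented for this illustration) counts how many times the innermost statement of the nested loop executes:

    // Count executions of the innermost statement of
    //   for j=1 to n { for k=1 to j { i = A[j/2] + k } }
    static long countInnermost(int n) {
        long count = 0;
        for (int j = 1; j <= n; j++)
            for (int k = 1; k <= j; k++)
                count++;          // stands in for the Θ(1) innermost statement
        return count;             // equals n(n+1)/2
    }

For instance, countInnermost(1000) returns 500500 = 1000 · 1001/2, consistent with the Θ(n²) bound derived above.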

If-Then-Else statements are tricky. We shall assume the condition tested is cheap and can be ignored. First, note that a missing “else” can be modelled by an else statement that does nothing significant and costs Θ(1). The problem is that we may have to use human cleverness (that cannot be written into a formula) to deal properly with the fact that sometimes we execute the “then” part, and other times we execute the “else” part.

For big-Oh, there is a sloppy but correct solution: we take the maximum (or the sum) of the two parts. Whatever actually happens, it can be no worse than our pessimistic bound. The difficulty is with code like

    if (<condition that is almost never true>)
        <something expensive>
    else
        <something cheap>

where the pessimistic bound can be far from what actually happens.

For big-Omega, the similar sloppy solution is to take the smaller of the two costs. This will give a bad bound for cases like

    if (<condition that is almost never true>)
        <something cheap>
    else
        <something expensive>

For big-Theta, we generally have to use human cleverness. Consider

    for i=0 to n
        if i is even
            for j=1 to n*n
                k=k+A[j]
        else
            for j=1 to n
                k=k+A[j]

Working from the inside out, we discover the “then” part would cost Θ(n²), but the “else” part would cost only Θ(n). Using human cleverness, we might observe that every iteration (of the outermost loop) with i being even is followed by an iteration where i is odd. So there are about n/2 pairs of iterations, each doing Θ(n²) + Θ(n) ⇒ Θ(n²) work. Since n/2 is Θ(n), we get a final cost of Θ(n³).
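The same conclusion can be reached by writing the total cost as a sum over iterations of the outermost loop and splitting it by the parity of i:

   Σ_{even i} Θ(n²) + Σ_{odd i} Θ(n) ⇒ (about n/2)·Θ(n²) + (about n/2)·Θ(n) ⇒ Θ(n³) + Θ(n²) ⇒ Θ(n³).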

Method calls can be handled by first analyzing the cost of the method being called. This cost should be a function of its parameters (e.g., how much data is being passed in to it).

Once you have determined the cost function of the method, when the method is called you can plug in the input costs. E.g., suppose you have a method InsertionSort. As its parameter, it takes an array with m elements. It will do Θ(m²) work, worst case, when invoked. So let T_ISort^worst(m) = Θ(m²). Consider code that repeatedly calls the method:

// assume array A with n items
for i = 1 to log(n)
    create a new array B with n items
    // copy A into B
    for j = 1 to n do
        B[j] = A[j]
    call InsertionSort(B)

Analyzing it: the innermost statement is an assignment and takes Θ(1) time. It's nested inside a loop that runs n times, so the “for j” loop as a whole takes Θ(n) time. Let's assume it takes Θ(1) to create array B. Then, the body of the “for i” loop is a block of three consecutive statements: the creation of B, the “for j” loop, and the “call InsertionSort” statement. The last of these three things costs, when asking it to process n items, T_ISort^worst(n). And we have been told already that T_ISort^worst(m) = Θ(m²). So the last of these three things costs Θ(n²). Our block has a total cost of Θ(1) + Θ(n) + Θ(n²), which simplifies to Θ(n²).

Continuing to work from the inside out, we are now ready to analyze the whole “for i” loop. We've just determined that its body costs Θ(n²), and this cost does not depend on i. The body is repeated Θ(log n) times, so the total cost is Θ(n² log n). This is a worst case, and it can arise if A is sorted backwards.

This illustrates a subtle issue: what if the worst-case behaviour of the called method cannot happen, according to the way that the method is actually used? Now consider what happens if we just repeatedly sort the same array, over and over:

// assume array A with n items
for i = 1 to log(n)
    create a new array B with n items
    // copy A into B
    for j = 1 to n do
        B[j] = A[j]
    call InsertionSort(A)   // only change is B --> A

The first call to InsertionSort(A) can process a worst-case input and thus take Θ(n²) time. However, that puts A into sorted order, which is a best-case scenario for Insertion Sort, where the sorting is done in Θ(n) time. The proper analysis would then be that the iteration with i = 1 costs Θ(n²) and then we have Θ(log n) iterations that each cost Θ(n), for a total cost of Θ(n²) + Θ(n log n) ⇒ Θ(n²).

In a sense, big-Oh is a safer analysis here, because it is more-or-less correct to say that T_ISort(m) = O(m²) for any possible use of InsertionSort, worst case or not. So a quick big-Oh analysis could just say that the body of the “for i” loop costs O(n²), and since the loop repeats log n times, the total cost of the code fragment is O(n² log n). This is an example where the big-Oh answer is quickly obtained and mathematically correct, but still the bound is not tight: the better answer is Θ(n²) — but getting that answer required more work and some human insight.
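For concreteness, here is one possible Java rendering of the first code fragment above; it is only a sketch, with an ordinary insertion sort written out so the example is self-contained (the method names are our own):

    // Worst-case Theta(m^2) insertion sort on an array of length m.
    static void insertionSort(int[] b) {
        for (int i = 1; i < b.length; i++) {
            int key = b[i];
            int j = i - 1;
            while (j >= 0 && b[j] > key) {   // shift larger elements right
                b[j + 1] = b[j];
                j--;
            }
            b[j + 1] = key;
        }
    }

    // The fragment analyzed above: about log2(n) rounds, each copying A
    // into a fresh array B and then sorting B.
    static void repeatedSort(int[] a) {
        int n = a.length;
        int rounds = (int) (Math.log(n) / Math.log(2));
        for (int i = 1; i <= rounds; i++) {
            int[] b = new int[n];        // the notes assume creation costs Theta(1)
            for (int j = 0; j < n; j++)  // Theta(n) copy
                b[j] = a[j];
            insertionSort(b);            // Theta(n^2) in the worst case
        }
    }

If A is sorted backwards, every round re-sorts a backwards copy, so the worst-case Θ(n² log n) bound derived above is actually achieved.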

8 Recursion

The analysis techniques so far do not suffice to analyze recursive algorithms. Later in the course, we will need to handle recursion. There are two major approaches:

1. Reasoning based on tracing the pattern in which the recursion unfolds, typically using a “recursion trace”.

2. Forming a mathematical “recurrence relation” that describes the running time. The recurrence relation is a mathematical object that is recursively defined in terms of itself. Fortunately, mathematicians have developed various methods for solving recurrence relations.

8.1 Recursion Trace

We can make a diagram that shows how the recursion unfolds. (The Sedgewick textbook does this on page 274.) At the top (“root”) of the diagram, we have a circle (“node”) that represents the first call to the recursive method. Usually, inside the circle we depict the parameter values for that first call. Beneath the root node, we have circles/nodes that represent the recursive calls made (directly) by the root node. Their left-to-right order is based on the time when the call was made (the earliest is leftmost). Inside each node, we have the parameter values passed to the recursive call. A line segment (called an edge) connects these nodes to the root. Since each node that does not represent a base-case situation will itself make some recursive calls, these second-level nodes will themselves be joined to some third-level nodes, and so forth. For an example, see Figure 2, which arises from the code in Figure 1, with the top-level call to foo(4).

It may be possible to use the recursion trace to reason about the cost of the recursive call. As a simple example, for this code, the recursion trace for foo(N) has N levels. Each level has no more than twice the number of nodes of the level above it. So the total number of nodes in the recursion trace is O(2^N), which is also the total number of recursive calls made. For each non-base-case recursive call we do O(n²) looping work, where n is the parameter value for that call. But since we can reason that n ≤ N, we can say that each of the O(2^N) nodes is associated with O(N²) work.

int foo(int n) {
    if (n <= 1) return 67;            // base case
    int z = 0;
    for (int i = 0; i < n*n; i++)     // Theta(n^2) of looping work
        z++;
    int temp  = foo(n-1);
    int temp2 = foo(n/2);
    return temp + temp2 + z;
}

Figure 1: Recursive method foo.

            4
          /   \
         3     2
        / \   / \
       2   1 1   1
      / \
     1   1

Figure 2: Recursion trace for foo(4).

In total, we can conclude that the running time of foo(N) is O(N²·2^N). (This result is true, but it relies on some pessimistic oversimplifications.)

The structure generated by the recursion trace is a “tree” structure (think about family trees), and the terminology of nodes, edges and levels is commonly used in CS. The structure is sometimes also called a “call tree”.
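One way to get a feel for the size of the recursion trace is to count its nodes with a helper method; the following Java sketch (countCalls is our own name, not part of the notes' code) mirrors foo's recursive structure but returns the number of calls instead of foo's answer:

    // Number of nodes in the recursion trace of foo(n),
    // i.e., the total number of calls made, including foo(n) itself.
    static long countCalls(int n) {
        if (n <= 1) return 1;                         // a base-case node
        return 1 + countCalls(n - 1) + countCalls(n / 2);
    }

For example, countCalls(4) returns 9, matching the nine nodes drawn in Figure 2, and the counts grow much more slowly than the pessimistic 2^N node bound used above.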

8.2 Recurrences

A recurrence is a way of describing a mathematical function in terms of itself. Mathematicians often use them to describe the later elements in a sequence in terms of elements that came earlier in the sequence. A famous example would be the Fibonacci sequence, where the first two numbers in the sequence are 0 and 1 (although others use 1 and 1), and every later number in the sequence is the sum of the two numbers before it. So the sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, . . . . We can consider the function F that takes the input n and then outputs the nth number in the Fibonacci sequence. We can then describe the function as

   F(n) = 0, if n = 1
   F(n) = 1, if n = 2
   F(n) = F(n − 1) + F(n − 2), if n > 2
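The recurrence translates directly into a (very slow) recursive Java method; this is just a sketch to make the recursive description concrete, using the indexing above where F(1) = 0 and F(2) = 1:

    // Direct transliteration of the Fibonacci recurrence.
    static long fib(int n) {
        if (n == 1) return 0;
        if (n == 2) return 1;
        return fib(n - 1) + fib(n - 2);   // the recursive case
    }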

This recurrence describes the behaviour of some function F , but the recursive

nature of it is unsatisfying. In software engineering terms, it is a specification of a function, and this specification may be met by no function, one function, or many functions.¹ In ordinary algebra, the property 3x + 2 = 2x is satisfied by one value of x. However, the property x = x + 1 is not satisfied by any value of x, and the property x = x is satisfied by infinitely many values. For specifications like 3x + 2 = 2x, ordinary algebra helps you solve x to a numerical value. When a function is specified recursively, we would like our solution to be (hopefully one) function that is described non-recursively and without other pesky things like sums, often said to be a “closed-form” solution. Depending on your previous math courses, you may know that mathematicians have found a closed-form solution:

   F(n) = (1/√5) · ( ((1 + √5)/2)^n − ((1 − √5)/2)^n ).

For the example recursive method foo examined earlier, let the (unknown) function measuring the number of operations executed be called T(n), where the mathematical variable n represents the value of the programming parameter that shares the same name. (The name T is very common, because the number of operations is our model of Time.) Although we do not know a closed-form solution for T(n) yet, we can still analyze the code in method foo.

First, note the behaviour of the method when n ≤ 1. We see that Θ(1) operations will occur. There will be a comparison and a return, and possibly a few other minor operations. So we know to write T(n) = Θ(1) if n ≤ 1.

If n is larger, then we do a loop that takes Θ(n²) operations. We also do an initialization and a return. The total cost of this is Θ(n²) + Θ(1), which simplifies to Θ(n²). But we also make two recursive calls, and we have to account for all the operations that this will lead to. Our first recursive call is with parameter n − 1. So, I need to write down an expression that represents the number of operations executed when foo is called with a value of n − 1. Hmmm. Wait! I have a name already for “the number of operations executed when foo is called with some parameter value x” — it's T(x). So the expression I need for the cost of the first recursive call is T(n − 1). Similarly, my second recursive call is with the code parameter n/2. Since Java's integer division rounds down, and in the world of math we don't get this behaviour from “/”, the mathematical expression representing the value of the parameter is ⌊n/2⌋. Thus the cost (i.e., number of operations) from the second recursive call is T(⌊n/2⌋). Therefore, when n > 1 we have T(n) = Θ(n²) + T(n − 1) + T(⌊n/2⌋).

Putting it all together, our recurrence describing T is

   T(n) = Θ(1), if n ≤ 1
   T(n) = Θ(n²) + T(n − 1) + T(⌊n/2⌋), if n > 1

Now we need to solve the recurrence, so that we can get a closed-form expression for T.

¹ More advanced math courses involving differential or difference equations also end up solving for unknown functions, and it is not surprising that the techniques we need can overlap with those used in differential equation courses.

8.3 Solving Recurrences

There are a variety of methods for solving recurrences:

1. the characteristic equation method, which is sometimes taught in CS3913 and would be comfortable to mathematicians;

2. a calculus-based approach based on “generating functions”, sometimes taught to graduate students, which would also be comfortable to mathematicians;

3. the Master Theorem, which lets you read out an asymptotic answer for a very specific class of recurrences;

4. the Plug-and-Crunch (a.k.a. Repeated Substitution) method, tedious but very general; and

5. the Recursion Tree method, a visual form of repeated substitution;

6. the “ask Maple or Wolfram” method — some packages for doing symbolic math are able to solve many recurrences.

We first look at the Master Theorem and then look at the Plug-and-Crunch approach.

Master Theorem: The Master Theorem approach is used to solve the kind of recurrences that typically arise from “divide and conquer” algorithms. Many, perhaps most, of the best-known recursive algorithms fit into this category, which will be studied in CS3913. In CS2383, Merge Sort is a classic example of a divide-and-conquer algorithm.

A divide-and-conquer algorithm takes its input (of size n). If n is small enough (less than some constant), the algorithm does O(1) work in its base case. Otherwise, it divides its input into some number of subproblems. (Each subproblem asks us to solve the same kind of problem as the overall problem, just on less data.) Let's suppose it divides its input into A subproblems, each of size n/B, for some constants A and B with A ≥ 2 and B ≥ 2. Each subproblem is then solved recursively. Finally, the algorithm combines the A subproblem solutions into an overall solution. Suppose the amount of work done to divide the problem into subproblems and then reassemble the solutions is Θ(n^k) for some constant k. This leads to the recurrence

   T(n) = O(1), if n < c
   T(n) = A · T(n/B) + Θ(n^k), if n ≥ c

where A and B are constants with A, B ≥ 2, c ≥ 1 is a constant, and k is a constant. The Master Theorem applies to recurrences of this form. To use the theorem, identify the values of A, B and k. Then compare A against B^k and read off the answer as below:

   T(n) = O(n^k log n), if A = B^k
   T(n) = O(n^k), if A < B^k
   T(n) = O(n^(log_B A)), if A > B^k

For example, the Merge Sort recurrence is

   T(n) = O(1), if n < 2
   T(n) = 2T(n/2) + Θ(n), if n ≥ 2

It fits the pattern, with A = 2, B = 2, k = 1. The first case applies, so we know the running time of Merge Sort is O(n log n). Consider our first recurrence example,

   T(n) = Θ(1), if n ≤ 1
   T(n) = Θ(n²) + T(n − 1) + T(⌊n/2⌋), if n > 1

Unfortunately, our Master Theorem approach does not work, because the two subproblems are not the same size, and also the first subproblem is only 1 smaller than the original problem (it needs to be some fraction of the original problem). In truth, after the original Master Theorem was discovered and promoted in the 1970s, there have been increasingly complicated versions discovered that can handle more cases. For instance, what if A is almost, but not quite, a constant? What if the algorithm creates some subproblems of size n/2 and some of size n/3? We sometimes study these more powerful versions of the Master Theorem in CS3913.
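For recurrences that do fit the pattern, the case analysis is mechanical enough to write as code. The following Java sketch (the method name and output format are our own) just compares A against B^k and reports which bound the theorem gives:

    // Read off the Master Theorem bound for T(n) = A*T(n/B) + Theta(n^k),
    // with constants A, B >= 2 as in the statement above.
    static String masterTheorem(int A, int B, double k) {
        double bk = Math.pow(B, k);
        if (A == bk) return "O(n^" + k + " log n)";          // first case
        if (A < bk)  return "O(n^" + k + ")";                // second case
        return "O(n^" + (Math.log(A) / Math.log(B)) + ")";   // third case: n^(log_B A)
    }

For Merge Sort, masterTheorem(2, 2, 1) falls into the first case, the linearithmic bound O(n^k log n) with k = 1. For a hypothetical algorithm with four half-sized subproblems and linear combining work, masterTheorem(4, 2, 1) falls into the third case, giving an O(n²) bound since log_2 4 = 2.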

Plug and Crunch: In the plug-and-crunch method, one repeatedly substitutes the recurrence into itself, simplifies, and looks for patterns that emerge. It is somewhat inelegant, and textbook authors sometimes modify the basic idea with clever simplifications. The analysis on pages 273 and 274 of Sedgewick's textbook has been cleaned up in this fashion.

For Plug and Crunch, we tackle an exact recurrence without internal big-Oh or big-Theta notation. For instance, Sedgewick has an exact recurrence when determining C(N), which is related to the number of comparisons done when Merge Sort processes N items. He assumes that N is a power of two, say N = 2^n, and has

   C(N) = 0, if N = 1
   C(N) = C(⌊N/2⌋) + C(⌈N/2⌉) + N, if N > 1

Because N = 2^n, repeatedly dividing by 2 never creates any fractions, so the floors and ceilings are not needed:

   C(N) = 0, if N = 1
   C(N) = 2C(N/2) + N, if N > 1

A simple-minded plug and crunch for large N then writes

C(N) = 2C(N/2) + N

and then we start substituting for occurrences of C on the right-hand side. Since the recurrence is generally true for all values greater than one, we can replace C(N/2) with 2C((N/2)/2) + (N/2), because it essentially says C(∗) = 2C(∗/2) + ∗ and we are filling in ∗ with N/2. After the substitution (plug), we have

C(N) = 2[2C((N/2)/2) + (N/2)] + N

where the part in brackets is the result of the substitution. We simplify (crunch) this to get

C(N) = 4C(N/4) + N + N

And then we can plug into the C(N/4), filling in ∗ with N/4:

C(N) = 4[2C((N/4)/2) + (N/4)] + N + N

then crunch it to

C(N) = 8C(N/8) + N + N + N

Another plug (∗ is N/8) and crunch gives

C(N) = 16C(N/16) + N + N + N + N

At this point, a pattern should be obvious: after k plug-and-crunch steps, we will have

C(N) = 2^k C(N/2^k) + kN

Now, eventually, we will have done so many substitutions that we will hit our base case of C(1) = 0. This happens when N/2^k = 1; i.e., N = 2^k. At this point, we would have

C(N) = 2^k C(1) + kN = 2^k · 0 + kN = kN

Since N = 2^k, we know k = log2 N, from which we conclude

C(N) = log2(N) · N.

The approach used by Sedgewick is a less obvious, cleaned-up version of the work shown above.
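If you would like a numerical sanity check of the answer C(N) = log2(N) · N (this check is our own addition, not part of Sedgewick's presentation), the exact recurrence is easy to evaluate directly in Java:

    // Evaluate the exact recurrence C(N) = 2C(N/2) + N with C(1) = 0,
    // and compare against the closed form N * log2(N), for N a power of two.
    static long C(long N) {
        if (N == 1) return 0;
        return 2 * C(N / 2) + N;
    }

    public static void main(String[] args) {
        for (long N = 1; N <= 1024; N *= 2) {
            long log2N = 63 - Long.numberOfLeadingZeros(N);   // log2(N) when N is a power of two
            System.out.println(N + ": recurrence = " + C(N) + ", closed form = " + (N * log2N));
        }
    }

The two columns agree for every power of two tried, as the “Checking a Guess” argument below proves they must.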

Getting an Exact Recurrence: If you have a running-time recurrence, it is likely to have internal big-Oh or big-Theta expressions. Usually, we solve a “related” exact relation. If there is an internal Θ(n^k), one would normally replace it by c·n^k, after looking carefully around to make sure no real mathematicians are in sight. If you are absolutely certain there are no mathematicians in the area, you might even replace Θ(n^k) by n^k. After getting an exact solution, if you are especially bold you might slap a Theta around it and simplify. There's some chance that you will get the right answer. . . but to do this correctly, you would need to justify each step, and I suspect that sometimes these moves cannot be justified.

Checking a Guess: The plug-and-crunch approach is not always safe, as you need to guess a pattern, and you might guess wrong. However, if you have a guessed solution (no matter how obtained), you can check its correctness easily: a guessed “solution” is correct if it satisfies the specification (the recurrence). Otherwise, it's wrong. . .

First, let's check that the solution obtained above is correct: C(1) is supposed to be 0. Plug in our guessed C of N log2 N, getting C(1) = 1 · log2 1 = 1 · 0 = 0. So far, so good. The second part of the specification says that

C(N) = 2C(N/2) + N when N is a larger power of 2.

Plugging in our guessed C, we get

N log2 N = 2[(N/2) log2(N/2)] + N

and after simplifying the right-hand side:

= N log2(N/2) + N

= N(log2 N − log2 2) + N

= N(log2 N − 1) + N

= N log2 N − N + N

= N log2 N

So the left- and right-hand sides are indeed equal, and we have satisfied the second part of the specification. Our guessed solution works.

On the other hand, if you guess incorrectly, the specification won't be met. For example, let's suppose I guessed C(N) = N − 1 as a solution. For the first part of the specification, we need C(1) = 0. Since C(1) = 1 − 1 = 0, the first part of the specification is met.

However, for the second part of the specification, we need

C(N) = 2C(N/2) + N when N is a larger power of 2.

Let’s try it:

N − 1 = 2(N/2 − 1) + N

= N − 2 + N = 2N − 2.

However, it is not the case that N − 1 = 2N − 2 whenever N is a larger power of 2. For instance, if N = 4, N − 1 = 3 but 2N − 2 = 6. Since the second part of the specification is not met, we know my “solution” is wrong.
