Chapter 4

Algorithm Analysis

In this chapter we look at how to analyze the cost of algorithms in a way that is useful for comparing algorithms to determine which is likely to be better, at least for large input sizes, but is abstract enough that we don’t have to look at minute details of compilers and machine architectures. There are several parts to such an analysis. Firstly, we need to abstract the cost from details of the compiler or machine. Secondly, we have to decide on a concrete model that allows us to formally define the cost of an algorithm. Since we are interested in parallel algorithms, the model needs to consider parallelism. Thirdly, we need to understand how to analyze costs in this model. These are the topics of this chapter.

4.1 Abstracting Costs

When we analyze the cost of an algorithm formally, we need to be reasonably precise about the model we are performing the analysis in.

Question 4.1. How precise should this model be? For example, would it help to know the exact running time for each instruction?

The model can be arbitrarily precise, but it is often helpful to abstract away from some details, such as the exact running time of each (kind of) instruction. For example, the model can posit that each instruction takes a single step (unit of time) whether it is an addition, a division, or a memory access operation. Some more advanced models, which we will not consider in this class, distinguish between different classes of instructions; for example, a model may require analyzing calculation (e.g., an addition or a multiplication) and communication (e.g., a memory read) separately.

Question 4.2. Why would abstracting away from exact running times help?


There are two ways in which such abstraction helps: simplicity and portability. Abstraction simplifies analysis, and it makes the analysis applicable to the many different machines that are consistent with the model, rather than just one or a few. A model that is too precise can hurt portability, because machines are all different.

Example 4.3. Consider the following function:

    fun add (x:int, y:int) = x + y

This ML function, when invoked with two integers, performs 1 unit of work in a model where all operations take unit time. Note that the cost is independent of the input.

Question 4.4. What happens when cost depends on the input but we don’t know the input?

Cost can depend on the input. Usually, however, all we need is the input size, which we can use for our analysis. Input size is usually written n, m, etc.

Example 4.5. Consider the following function:

    datatype tree = Leaf of int | Node of tree * tree

    fun addTree (t: tree) =
      case t of
        Leaf x ⇒ x
      | Node(l,r) ⇒ addTree (l) + addTree (r)

This ML function, when invoked with a tree of n internal nodes and m leaves, performs 4n + m work in a model where the case statement, function calls, and the add operation all take unit time. If we also count returning from a function call as unit time, the work is 5n + 2m. Since in a (full) binary tree m = n + 1, the work can be written just in terms of n as 7n + 2.
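For a concrete instance (an example of our own, not from the text): the tree Node (Leaf 1, Leaf 2) has n = 1 internal node and m = 2 leaves. Evaluating addTree on it performs one case and one add at the internal node, two recursive calls, and one case at each leaf, for a total of 4n + m = 6 units of work, or 5n + 2m = 9 if returns are also counted.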

Question 4.6. How do we know whether we need to count the “return” or not?

At the level of abstraction at which we write our algorithms in this course, certain “hidden” instructions, such as the return instruction, may be executed. The existence of such instructions can be difficult to predict without using a lower level of abstraction.

Fortunately, for our purposes, asymptotic analysis will be sufficient. Instead of performing an exact analysis, we will analyze asymptotic costs, e.g., using big-O notation.

Example 4.7. Consider the following function

    fun add (x:int, y:int) = x + y

This ML function performs Θ(1) work.

Remark 4.8. We assume you have seen big-O, big-Theta (Θ), and big-Omega (Ω) in a previous class. If not, there is a quick review in the next section.

Example 4.9. Consider again the function in Example 4.5. When invoked with a tree of n internal nodes and m leaves, it performs Θ(n + m) work in a model where the case statement, function calls, and the add operation all take unit time. Since in a binary tree m = n + 1, the work is Θ(n).

Asymptotic analysis thus adds another level of abstraction, making our analysis further removed from the actual machine. The advantages of abstraction continue to hold, and asymptotic analysis further enables comparing algorithms in terms of how they scale to large inputs.

Example 4.10. Some sorting algorithms have Θ(n log n) work and others Θ(n²). Clearly the Θ(n log n) algorithm scales better. The Θ(n²) algorithm, however, can be more efficient on small inputs.
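To make this concrete (the constants here are our own, chosen purely for illustration): if the Θ(n²) algorithm performs about n² operations and the Θ(n log n) algorithm about 10 n log₂ n operations, then the quadratic algorithm is cheaper whenever n² < 10 n log₂ n, i.e., n < 10 log₂ n, which holds up to roughly n = 58.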

In this class we are concerned with how algorithms scale, and therefore asymptotic analysis is indeed what we want. Because we are using asymptotic analysis, the exact constants in the model do not matter; what matters is that the asymptotic costs are well defined.

4.2 Asymptotic Complexity

What is O(f(n))? Precisely, O(f(n)) is the set of all functions that are asymptotically dominated by the function f(n). Intuitively this means that these are the functions that grow at the same or slower rate than f(n) as n goes to infinity. We write g(n) ∈ O(f(n)) to refer to a function g(n) that is in the set O(f(n)). We often use the abusive notation g(n) = O(f(n)) as well. In an expression such as 4W(n/2) + O(n), the O(n) refers to some function g(n) ∈ O(n).

Question 4.11. Do you remember the definition of O(·)?

For a function g(n), we say that g(n) ∈ O(f(n)) if there exist positive constants c and n0 such that for all n ≥ n0, we have g(n) ≤ c · f(n). If g(n) is a finite function (g(n) is finite for all n), then it follows that there exist constants k1 and k2 such that for all n ≥ 1,

g(n) ≤ k1 · f(n) + k2,

where, for example, we can take k1 = c and k2 = Σ_{i=1}^{n0} |g(i)|.

Remark 4.12. Make sure to become very comfortable with asymptotic analysis, and also with its different versions such as Θ(·) and Ω(·).
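For a worked instance of the definition (our own example): g(n) = 3n + 10 is in O(n), since choosing c = 4 and n0 = 10 gives 3n + 10 ≤ 4n for all n ≥ 10. On the other hand, n² is not in O(n): no matter which constant c we pick, n² > c · n once n > c.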

Exercise 4.13. Can you illustrate graphically when g(n) ∈ O(f(n))? Show different examples to hone your understanding.

4.3 Defining a Cost Model

Although we have decided to use asymptotic analysis so we can ignore details about the implementation, we still need to be reasonably precise about the definition of the cost model. This is because differences in the model can lead to asymptotic differences in performance. There are two standard ways to define cost models: machine based and language based. In a machine-based model, cost is defined by counting the number of instructions taken by a particular machine. In a language-based model, cost is defined by composing costs across the various programming language constructs. Both types can be applied to analyzing either sequential or parallel algorithms.

Question 4.14. What are the advantages of using a machine-based and a language-based model?

When using a machine model, we have to reason about how a program (the algorithm) compiles and runs on that machine. For sequential programs this can be straightforward when using low-level languages such as C, since there is an almost one-to-one mapping of statements in the language to machine instructions. For higher-level languages it becomes somewhat trickier—there might be uncertainties, for example, about the cost of automatic memory management, or the cost of dispatching in an object-oriented language. For parallel programs it becomes even trickier since it can require reasoning about how tasks are scheduled on the processors. When analyzing algorithms in a language-based model we don’t need to care about how the language compiles or runs on the machine. Costs are defined directly in the language. On the other hand we probably do want to have a sense that the costs in the model have some relevance when we run the code on a real machine. We get back to this issue in Section 4.4. We note that

in the same way that a machine model does not need to be a specific machine such as an Intel Nehalem or AMD Phenom, the language doesn’t need to be a specific language such as C++ or Java. Instead it can be an abstract language such as the lambda calculus. The lambda calculus is effectively what we use in this book, with some added syntactic sugar, but you don’t need to know much about the lambda calculus to follow the model. Traditionally, with sequential algorithms, machine models have been preferred since the mapping from language constructs to machine cost (time or number of instructions) has been mostly obvious. In this course, however, we will use a language-based cost model, since the mapping is not so obvious and because it allows us to use abstract costs, work and span, which have no direct meaning on a physical machine. We first review the traditional sequential machine model.

4.3.1 The RAM Model for Sequential Computation

Traditionally, algorithms have been analyzed in the Random Access Machine (RAM)1 model. This model assumes a single processor accessing unbounded memory indexed by the non- negative integers. The processor interprets sequences of machine instructions (code) that are stored in the memory. Instructions include basic arithmetic and logical operations (e.g. +, -, *, and, or, not), reads from and writes to arbitrary memory locations, and conditional and unconditional jumps to other locations in the code. The cost of a computation is measured in terms of the number of instructions executed by the machine, and is referred to as time. This model is quite adequate for analyzing the asymptotic runtime of sequential algorithms; most work on sequential algorithms to date has used this model. It is therefore important to understand the model, or at least know what it is. One reason for the RAM’s success is that it is relatively easy to reason about the cost of algorithms because algorithmic pseudo code and sequential languages such as C and C++ can easily be mapped to the model. The model, however, should only be used for deriving asymptotic bounds (i.e., using big-O, big-Theta and big-Omega) and not for trying to predict exact runtimes. One reason for this is that on a real machine not all instructions take the same time, and furthermore not all machines have the same instructions. We note that one problem with the RAM model is that it assumes that accessing all memory locations has the same cost. On real machines this is not the case. In fact, there can be a factor of over 100 between the time for accessing a word of memory from the first level cache and accessing it from main memory. Various extensions to the RAM model have been developed to account for this cost. For example one variant is to assume that the cost for accessing the ith memory location is f(i) for some function f, e.g. f(i) = log(i). Fortunately, however, most of the algorithms that turn out to be good in these more detailed models are also good in the RAM. Therefore analyzing algorithms in the simpler RAM model is often a reasonable approximation to analyzing in the more refined models. Hence the RAM has served quite well despite not fully accounting for the variance in memory costs. The model we use in this course also does not account for the variance in memory costs, but as with the RAM the costs can be refined.

1Not to be confused with Random Access Memory (RAM)

4.3.2 The Parallel RAM Model

For our purposes, the more serious problem with the RAM model is that it is sequential. One way to extend the RAM to allow parallelism is simply to use multiple processors which share the same memory. This is referred to as the Parallel Random Access Machine (PRAM). In this model, all p processors run the same instruction on each step, although typically on different data. For example, if we had an array of length p, each processor could add one to its own element, allowing us to increment all elements of the array in constant time. We will not be using the PRAM model since it is awkward to work with, both because it is overly synchronous and because it requires the user to map computation to processors. For simple parallel loops over n elements we could imagine dividing up the elements evenly among the processors—about n/p each, although there is some annoying rounding required since n is typically not a multiple of p. If the cost of each iteration of the loop is different, then we would further have to add some load balancing. In particular, simply giving n/p iterations to each processor might be the wrong choice—one processor could get stuck with all the expensive iterations. For computations with nested parallelism, such as divide-and-conquer algorithms, the mapping is much more complicated, especially given the highly synchronous nature of the model. Even though we don’t use the PRAM model, most of the ideas presented in this course also work with the PRAM, and many of them were originally developed in the context of the PRAM.

4.3.3 The Work-Span Model

Instead of using a machine model, this book will use a model that is more directly tied to programming constructs. We believe that this model makes it much easier to separate the high-level concepts of parallelism from low-level machine-specific details. It is therefore better suited to getting you to “think parallel”, one of the goals of this course. As it turns out, there is a way to map the costs we derive onto costs for specific machines, which is discussed in Section 4.4.

Work and Span. As mentioned in the introduction, in this book we will measure complexity in terms of two costs: work and span. Roughly speaking the work corresponds to the total number of operations we perform, and span to the longest chain of dependencies. We define work and span based on simple compositional rules over expressions in the language. For an expression e let W (e) indicate the work needed to evaluate that expression and S(e) indicate the span.

Example 4.15.

W(7 + 3) = Work for adding 7 and 3
S(fib(11)) = Span for calculating the 11th Fibonacci number
W(mySort(S)) = Work for mySort applied to the sequence S

Note that in the third example the sequence S is not defined within the expression. Therefore we cannot say in general what the work is as a fixed value. However, we might be able to use asymptotic analysis to write the cost in terms of the length of S, and in particular if mySort is a good sorting algorithm we would have:

W(mySort(S)) = O(|S| log |S|).

Often, instead of writing |S| to indicate the size of the input, we use n or m as shorthand. Also, if the cost is for a particular algorithm, we use a subscript to indicate the algorithm. This leads to the following notation:

W_mySort(n) = O(n log n),

where n is the size of the input of mySort. When obvious from the context (e.g. when in a section on analyzing mySort) we sometimes drop the subscript, giving W(n) = O(n log n).

Now we turn to composing costs across expressions. For example, we would like to determine W(e1 + e2) given W(e1) and W(e2), where e1 and e2 are arbitrary expressions. We can do such composition of costs with some simple rules that correspond to each of the different types of expression we have in our language. As discussed in the Introduction, these rules roughly say that when composing sequentially we add the work and add the span, but when composing in parallel we add the work and take the maximum of the span.

Question 4.16. When are expressions composed in parallel and when sequentially?

Since in this book we are assuming purely functional programs, it is always safe to run things in parallel if there is no explicit sequencing.

Example 4.17. In the expression e1 + e2, where e1 and e2 are themselves other expressions (e.g. function calls), we could run the two expressions in parallel, giving the rule S(e1 + e2) = 1 + max(S(e1), S(e2)).

This rule says that the two subexpressions e1 and e2 run in parallel, and therefore the span of the combination is the maximum of their individual spans. The addition operation, however, has to wait for both subexpressions to be done. It therefore has to be performed sequentially after the two parallel subexpressions, hence the plus 1 in the expression 1 + max(S(e1), S(e2)).
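For instance (numbers of our own choosing): if S(e1) = 10 and S(e2) = 3, then evaluating the two subexpressions in parallel gives span 1 + max(10, 3) = 11, whereas evaluating them one after the other would give span 1 + 10 + 3 = 14.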

In this book, however, to make it clearer whether expressions are evaluated sequentially or in parallel, we will assume that expressions are composed in parallel only when using explicit parallel forms. In particular we will use the notation (e1 || e2) to mean that the two expressions run in parallel; the result is a pair of values containing the two results. In addition to the || construct, we assume that the set-like notation we use in our pseudocode, {f(x) : x ∈ A}, also runs in parallel, i.e., all calls to f(x) run in parallel.

W(c) = 1
W(e1 op e2) = W((e1, e2)) = W(e1 || e2) = 1 + W(e1) + W(e2)
W(let val x = e1 in e2 end) = 1 + W(e1) + W(e2[Eval(e1)/x])
W(if e1 then e2 else e3) = 1 + W(e1) + (W(e2) if Eval(e1) = True, else W(e3))
W({f(x) : x ∈ A}) = 1 + Σ_{x∈A} W(f(x))

S(c) = 1
S(e1 op e2) = S((e1, e2)) = 1 + S(e1) + S(e2)
S((e1 || e2)) = 1 + max(S(e1), S(e2))
S(let val x = e1 in e2 end) = 1 + S(e1) + S(e2[Eval(e1)/x])
S(if e1 then e2 else e3) = 1 + S(e1) + (S(e2) if Eval(e1) = True, else S(e3))
S({f(x) : x ∈ A}) = 1 + max_{x∈A} S(f(x))

Figure 4.1: Composing work and span across expressions. In the first rule c is any constant value (e.g. 3). In the second rule op is any primitive operator such as + or ×. The notation Eval(e) evaluates the expression e and returns the result, and the notation e[v/x] indicates that all free (unbound) occurrences of the variable x in the expression e are replaced with the value v. These rules are representative of all the rules of the language.

Given these conventions, the rules for composing work and span are outlined in Figure 4.1. For the let expression we need to first evaluate e1 and assign it to x before we can evaluate e2. Hence the span is composed sequentially, i.e., by adding the spans. Note that the rules are the same for work and span except for the two parallel constructs: || and {f(x) : x ∈ A}.

Example 4.18. The expression (fib(6) || fib(7)) runs the two calls to fib in parallel and returns the pair (8, 13). It does work

1 + W(fib(6)) + W(fib(7))

and span 1 + max(S(fib(6)), S(fib(7))). If we know that the span of fib grows with the input size, then the span can be simplified to 1 + S(fib(7)).

Example 4.19. Let expressions compose sequentially.

W(let a = f(x) in g(a) end) = 1 + W(f(x)) + W(g(a))
S(let a = f(x) in g(a) end) = 1 + S(f(x)) + S(g(a))

Remark 4.20. As there is no || construct in ML, in your assignments you will need to specify in comments when two calls run in parallel. We will also supply an ML function par(f1, f2) with type (unit -> α) × (unit -> β) -> α × β. This function executes the two functions that are passed in as arguments in parallel and returns their results as a pair. For example, par(fn () => fib(6), fn () => fib(7)) returns the pair (8, 13). We need to wrap the expressions in functions in ML so that the actual implementation can run them in parallel. If they were not wrapped, both arguments would be evaluated sequentially before they are passed to par. Also, in ML code you do not have the set notation {f(x) : x ∈ A}, but as mentioned before, it is basically equivalent to a map. Therefore, for ML code you can use the rules:

W(map f ⟨s_0, ..., s_{n−1}⟩) = 1 + Σ_{i=0}^{n−1} W(f(s_i))

S(map f ⟨s_0, ..., s_{n−1}⟩) = 1 + max_{i=0}^{n−1} S(f(s_i))
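To make Remark 4.20 concrete, here is a minimal sketch in ML. The real par supplied with the assignments runs its two arguments in parallel; the stand-in below has the same type but simply evaluates them one after the other, so that the code runs in plain Standard ML. The definitions here are our own, not the course library.

    (* Sequential stand-in for par: same type as the library function,
       (unit -> 'a) * (unit -> 'b) -> 'a * 'b, but evaluates f, then g. *)
    fun par (f, g) = (f (), g ())

    (* Fibonacci with its two recursive calls composed via par.  Under the
       cost rules above, the work of the two calls adds, while the span is
       the maximum of the two plus a constant. *)
    fun fib 0 = 0
      | fib 1 = 1
      | fib n =
          let val (a, b) = par (fn () => fib (n - 1), fn () => fib (n - 2))
          in a + b end

    (* For example, this evaluates to the pair (8, 13). *)
    val example = par (fn () => fib 6, fn () => fib 7)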

Parallelism: An additional notion of cost that is important in comparing algorithms is the parallelism of an algorithm. Parallelism, sometimes called average parallelism, is simply defined as the work over the span:

P = W/S

Parallelism informs us approximately how many processors we can use efficiently.

Example 4.21. For a mergesort with work Θ(n log n) and span Θ(log² n), the parallelism would be Θ(n/log n). Suppose n = 10,000. If W(n) = Θ(n³) ≈ 10¹² and S(n) = Θ(n log n) ≈ 10⁵, then P(n) ≈ 10⁷, which is a lot of parallelism. But if W(n) = Θ(n²) ≈ 10⁸, then P(n) ≈ 10³, which is much less parallelism. The decrease in parallelism is not because the span was large, but because the work was reduced.

Question 4.22. What are ways in which we can increase parallelism?

We can increase parallelism by decreasing span and/or increasing work. Increasing work, however, is not desirable because it leads to an inefficient algorithm.

Definition 4.23 (Work efficiency). We say that a parallel algorithm is work efficient if it performs asymptotically the same work as the best known sequential algorithm for that problem.

Example 4.24. A (comparison-based) parallel sorting algorithm with Θ(n log n) work is work efficient; one with Θ(n²) work is not, because we can sort sequentially with Θ(n log n) work.

Designing parallel algorithms. In parallel-algorithm design, we aim to keep parallelism as high as possible but without increasing work. In general the goals in designing efficient algorithms are

1. first priority: to keep work as low as possible, and

2. second priority: to keep parallelism as high as possible (and hence the span as low as possible).

In this course we will mostly cover work-efficient algorithms, where the work is the same as or close to the best sequential time. Indeed this will be our goal throughout the course. Among the algorithms that have the same work as the best sequential time, we will then try to achieve the greatest parallelism.

4.4 Scheduling

An important advantage of the work-span model is that it allows us to design parallel algorithms without having to worry about the details of how they are executed on an actual parallel machine. In other words, we never have to worry about mapping the parallel computation to processors, i.e., scheduling.

Question 4.25. Is scheduling a challenging task? Why?

Scheduling can be challenging because a parallel algorithm generates tasks on the fly as it runs, and it can generate a massive number of them, typically many more than the number of processors available when it runs.

Example 4.26. A parallel algorithm with Θ(n/log n) parallelism can, for large n, easily generate millions of parallel subcomputations or tasks at the same time, even when running on a multicore computer with, for example, 10 cores.

Scheduler. Mapping parallel tasks to available processors so that each processor remains busy as much as possible is the task of a scheduler. The scheduler works by taking all parallel tasks, which are generated dynamically as the algorithm evaluates, and assigning them to processors. If only one processor is available, for example, then all tasks will run on that one processor. If two processors are available, the tasks will be divided between the two.

Question 4.27. Can you think of a scheduling algorithm?

Greedy scheduling. We say that a scheduler is greedy if, whenever there is a processor available and a task ready to execute, it assigns the task to the processor and starts running it immediately. Greedy schedulers have a very nice property that is summarized by the following:

Definition 4.28. The greedy scheduling principle says that if a computation is run on p processors using a greedy scheduler, then the total time (clock cycles) for running the computation is bounded by

T_p < W/p + S          (4.1)

where W is the work of the computation, and S is the span of the computation (both measured in units of clock cycles).

This is actually a very powerful statement. The time to execute the computation cannot be any better than W/p clock cycles, since we have a total of W clock cycles of work to do and the best we can possibly do is divide it evenly among the processors. Also note that the time to execute the computation cannot be any better than S clock cycles, since S represents the longest chain of sequential dependencies. Therefore the very best we could do is:

T_p ≥ max(W/p, S)

We therefore see that a greedy scheduler does reasonably close to the best possible. In particular, W/p + S is never more than twice max(W/p, S), and when W/p ≫ S the difference between the two is very small. Indeed we can rewrite equation 4.1 above in terms of the parallelism P = W/S as follows:

T_p < W/p + S
    = W/p + W/P
    = (W/p)(1 + p/P)

Therefore as long as P ≫ p (the parallelism is much greater than the number of processors) we get near perfect speedup. (Speedup is W/T_p and perfect speedup would be p.)
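To see what the bound means in practice, consider some numbers of our own choosing: suppose W = 10⁹ and S = 10³, so P = W/S = 10⁶. On p = 100 processors the greedy bound gives T_p < 10⁷ + 10³ ≈ 10⁷, so the speedup W/T_p is nearly 100, essentially perfect, since P ≫ p. On p = 10⁷ processors, however, the time can never drop below S = 10³, so the speedup can never exceed W/S = 10⁶, only a tenth of p; processors beyond the available parallelism are of no help.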

Remark 4.29. No real schedulers are fully greedy. This is because there is overhead in scheduling the job, so there will surely be some delay from when a job becomes ready until when it starts up. In practice, therefore, the efficiency of the scheduler itself is quite important to achieving good efficiency. Also, the bounds we give do not account for memory effects. By moving a job we might have to move data along with it. Because of these effects, the greedy scheduling principle should only be viewed as a rough estimate, in much the same way that the RAM model or any other computational model should be viewed as just an estimate of real time.

4.5 Analysis of Shortest-Superstring Algorithms

As examples of how to use our cost model we will analyze a couple of the algorithms we described for the shortest superstring problem: the brute force algorithm and the greedy algorithm.

4.5.1 The Brute Force Shortest Superstring Algorithm

Recall that the idea of the brute force algorithm for the SS problem is to try all permutations of the input strings and, for each permutation, to determine the maximal overlap between each pair of adjacent strings and remove the overlaps. We then pick whichever remaining string is shortest; if there is a tie, we pick any one of the shortest. We can calculate the overlap between all pairs of strings in a preprocessing phase. Let n be the size of the input S and m be the total number of characters across all strings in S, i.e.,

m = Σ_{s∈S} |s|.

Note that n ≤ m. The preprocessing step can be done in O(m²) work and O(log n) span (see the analysis below). This is a low-order term compared to the other work, as we will see, so we can ignore it.

Now, to calculate the length of a given permutation of the strings with overlaps removed, we can look at adjacent pairs and look up their overlap in the precomputed table. Since there are n strings and each lookup takes constant work, this requires O(n) work. Since all lookups can be done in parallel, it requires only O(1) span. Finally we have to sum up the overlaps and subtract the sum from m. The summing can be done with a reduce in O(n) work and O(log n) span. Therefore the total cost per permutation is O(n) work and O(log n) span.

As we discussed in the last lecture, the total number of permutations is n!, each of which we have to check for its length. Therefore the total work is O(n · n!) = O((n + 1)!). What about the span? We can run all the tests in parallel, but we first have to generate the permutations. One simple way is to start by picking, in parallel, each string as the first string, and then for each of these picking, in parallel, another string as the second, and so forth. The pseudocode looks something like this:

    func permutations S =
      if |S| = 1 then {S}
      else {append([s], p) : s in S, p in permutations(S/s)}

What is the span of this code? 52 CHAPTER 4. ALGORITHM ANALYSIS
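For reference, here is a sequential rendering of the permutations pseudocode in ML (a sketch of our own: the helper removeNth and the use of lists in place of the book's sets are our choices, and this version does not express the parallelism).

    (* Remove the element at position i from a list. *)
    fun removeNth (i, ys) = List.take (ys, i) @ List.drop (ys, i + 1)

    (* All permutations of a list: pick each element as the head in turn
       and prepend it to every permutation of the remaining elements. *)
    fun permutations [] = [[]]
      | permutations xs =
          List.concat
            (List.tabulate (length xs, fn i =>
               List.map (fn p => List.nth (xs, i) :: p)
                        (permutations (removeNth (i, xs)))))

For example, permutations [1, 2, 3] evaluates to the 3! = 6 orderings of [1, 2, 3].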

4.5.2 The Greedy Shortest Superstring Algorithm

We’ll consider a straightforward implementation, although the analysis is a little tricky since the strings can vary in length. First we note that calculating overlap(s1, s2) and join(s1, s2) can be done in O(|s1| |s2|) work and O(log(|s1| + |s2|)) span. This is simply by trying all overlap positions between the two strings, seeing which ones match, and picking the largest. The logarithmic span is needed for picking the largest matching overlap using a reduce.
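As an illustration of this step, here is a simple sequential sketch of overlap in ML (our own code, not the course implementation; it checks each candidate overlap length directly rather than using a parallel reduce).

    (* overlap (s1, s2): the largest k such that the last k characters of s1
       equal the first k characters of s2.  Each candidate k costs O(k) to
       check, so the total work is O(|s1| * |s2|) in the worst case. *)
    fun overlap (s1, s2) =
      let
        val maxk = Int.min (String.size s1, String.size s2)
        fun suffix k = String.extract (s1, String.size s1 - k, NONE)
        fun loop k =
          if k = 0 then 0
          else if String.isPrefix (suffix k) s2 then k
          else loop (k - 1)
      in
        loop maxk
      end

For example, overlap ("catag", "tagga") evaluates to 3, since "tag" is both a suffix of the first string and a prefix of the second.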

Let W_ov and S_ov be the work and span for calculating all pairs of overlaps (the line {(overlap(s_i, s_j), s_i, s_j) : s_i ∈ S, s_j ∈ S, s_i ≠ s_j}), and for our set of input snippets S recall that m = Σ_{x∈S} |x|. We have

W_ov ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} W(overlap(s_i, s_j))
     = Σ_{i=1}^{n} Σ_{j=1}^{n} O(|s_i| |s_j|)
     ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} (k1 + k2 |s_i| |s_j|)
     = k1 n² + k2 Σ_{i=1}^{n} Σ_{j=1}^{n} |s_i| |s_j|
     = k1 n² + k2 Σ_{i=1}^{n} (|s_i| Σ_{j=1}^{n} |s_j|)
     = k1 n² + k2 Σ_{i=1}^{n} (|s_i| m)
     = k1 n² + k2 m Σ_{i=1}^{n} |s_i|
     = k1 n² + k2 m²
     ∈ O(m²)     since m ≥ n,

and since all pairs can be done in parallel,

S_ov ≤ max_{i=1}^{n} max_{j=1}^{n} S(overlap(s_i, s_j)) ∈ O(log m).

The arg max for finding the maximum overlap can be computed in O(m²) work and O(log m) span using a simple reduce. The other steps have less work and span. Therefore, not including the recursive call, each call to greedyApproxSS costs O(m²) work and O(log m) span.

Figure 4.2: Abstraction is a powerful technique. One reason why is that it enables us to use our intelligence more effectively, allowing us not to worry about all the details of reality. Paul Cezanne noticed that all reality, as we call it, is constructed by our intellect. Thus he thought: I can paint in different ways, in ways that don’t necessarily mimic vision, and the viewer can still create a reality. This allowed him to construct more interesting realities. He used abstract, geometric forms to architect reality. Can you see them in his self-portrait? Do you think that his self-portrait creates a reality that is much more three-dimensional, with more volume, more tactile presence, than a 2D painting that would mimic vision? Cubists such as Picasso and Braque took his ideas on abstraction to the next level.

Finally, we observe that each call to greedyApproxSS creates S′ with one fewer element than S, so there are at most n calls to greedyApproxSS. These calls are inherently sequential, because one call must complete before the next can take place. Hence, the total cost for the algorithm is O(nm²) work and O(n log m) span, which is highly parallel.
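To put a rough number on “highly parallel” (figures of our own choosing): the parallelism is W/S = O(nm²) / O(n log m) = O(m²/log m), so for, say, m = 10⁴ total characters it is on the order of 10⁸/13 ≈ 10⁷.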

Exercise 4.30. Come up with a more efficient way of implementing the greedy method.

4.6 Solving Recurrences

The best way to analyze most divide-and-conquer algorithms is to write down a so-called “recurrence” for the cost of the algorithm, and then to solve that recurrence. The recurrence follows the recursive structure of the algorithm, but is a function of the input size instead of the actual input values. For example, you might have previously seen that the mergeSort algorithm leads to a recurrence of the form W(n) = 2W(n/2) + O(n). This corresponds to the fact that for an input of size n, mergeSort makes two recursive calls of size n/2, and also does O(n) other work; in particular, the merge itself requires O(n) work. Similarly, for the span we can write a recurrence of the form S(n) = max(S(n/2), S(n/2)) + O(log n) = S(n/2) + O(log n). This is because the merge has span O(log n), and since the two recursive calls are made in parallel we take the maximum of their spans instead of summing them. Here we discuss methods for solving such recurrences.

4.6.1 The Tree Method

Using the recursion W (n) = 2W (n/2) + O(n), we will review the tree method, which you have seen in 15-122 and 15-251. Our goal is to derive a closed form solution to this recursion. By the definition of asymptotic complexity, we can establish that

W(n) ≤ 2W(n/2) + k1 · n + k2,

where k1 and k2 are constants. The idea of the tree method is to consider the recursion tree of the recurrence to derive an expression for what’s happening at each level of the tree. In this particular example, we can see that each node in the tree has 2 children, whose input is half the size of that of the parent node. Moreover, if the input has size n, the recurrence shows that the cost, excluding that of the recursive calls, is at most k1 · n + k2. Therefore, our recursion tree annotated with costs looks like this:

[Recursion tree: the root costs k1 · n + k2, for a level total of k1 · n + k2; its two children each cost k1 · (n/2) + k2, for a level total of k1 · n + 2 k2; the four nodes at the next level each cost k1 · (n/4) + k2, for a level total of k1 · n + 4 k2; and so on.]

To apply the tree method, there are some key questions we should ask ourselves to aid drawing out the recursion tree and to understand the cost associated with the nodes:

• How many levels are there in the tree?
• What is the problem size at level i?
• What is the cost of each node in level i?
• How many nodes are there at level i?
• What is the total cost across level i?

Our answers to these questions lead to the following analysis: We know that level i (where the root is level i = 0) contains 2^i nodes, each costing at most k1(n/2^i) + k2. Thus, the total cost in level i is at most

2^i · (k1 · (n/2^i) + k2) = k1 · n + 2^i · k2.

Since we keep halving the input size, the number of levels is bounded by 1 + log n. Hence, we have

W(n) ≤ Σ_{i=0}^{log n} (k1 · n + 2^i · k2)
     = k1 n (1 + log n) + k2 (n + n/2 + n/4 + ··· + 1)
     ≤ k1 n (1 + log n) + 2 k2 n
     ∈ O(n log n),

where in the second to last step we apply the fact that for a > 1,

1 + a + ··· + a^n = (a^{n+1} − 1)/(a − 1) ≤ a^{n+1}.
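The same method applies to the span recurrence S(n) = S(n/2) + O(log n) from the start of this section (a worked instance of our own). Here the recursion tree is a path: level i consists of a single node with cost at most k · log(n/2^i) = k (log n − i), there are about log n levels, and summing over the levels gives S(n) ≤ k Σ_{i=0}^{log n} (log n − i) ∈ O(log² n).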

4.6.2 The Brick Method, a Variant of the Tree Method

The tree method involves determining the depth of the tree, computing the cost at each level, and summing the cost across the levels. Usually we can figure out the depth of the tree and the cost of each level relatively easily—but then the hard part is taming the sum to get to the final answer. It turns out that there is a special case in which the analysis becomes simpler: when the costs at each level either grow geometrically, shrink geometrically, or stay approximately equal. By recognizing whether the recurrence conforms to one of these cases, we can almost immediately determine the asymptotic complexity of that recurrence.

The vital piece of information is the ratio of the cost between adjacent levels. Let cost_i denote the total cost at level i when we draw the recursion tree. This puts recurrences into three broad categories:

Leaves dominated. Each level is larger than the level before it by at least a constant factor: there is a constant ρ > 1 such that for all levels i, cost_{i+1} ≥ ρ · cost_i. Implication: O(cost_d), where d is the depth. (The house is stable, with a strong foundation.)

Balanced. All levels have approximately the same cost. Implication: O(d · max_i cost_i). (The house is sort of stable, but don’t build too high.)

Root dominated. Each level is smaller than the level before it by at least a constant factor: there is a constant ρ < 1 such that for all levels i, cost_{i+1} ≤ ρ · cost_i. Implication: O(cost_0). (The house will tip over.)
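As a quick illustration (examples of our own, worked using the categories above): for W(n) = 2W(n/2) + O(n²) the total cost at level i is about n²/2^i, which shrinks geometrically, so the recurrence is root dominated and W(n) ∈ O(n²). For W(n) = 2W(n/2) + O(n) every level costs about n, so it is balanced and W(n) ∈ O(n log n). For W(n) = 2W(n/2) + O(√n) the cost at level i is about √n · (√2)^i, which grows geometrically, so it is leaves dominated and W(n) ∈ O(cost_d) = O(n), since the bottom level has n leaves of constant cost.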

You might have seen the “master method” for solving recurrences in previous classes. We do not like to use it since it only works for special cases and does not give an intuition of what is going on. However, we will note that the three cases of the master method correspond to special cases of leaves dominated, balanced, and root dominated.

4.6.3 The Substitution Method

Using the definition of big-O, we know that

W(n) ≤ 2W(n/2) + k1 · n + k2,

where k1 and k2 are constants. Besides using the recursion tree method, we can also arrive at the same answer by mathematical induction. If you want to go via this route (and you don’t know the answer a priori), you’ll need to guess the answer first and then check it. This is often called the “substitution method.” Since this technique relies on guessing an answer, you can sometimes fool yourself by giving a false proof. The following are some tips:

1. Spell out the constants. Do not use big-O—we need to be precise about constants, so big-O makes it super easy to fool ourselves.

2. Be careful that the induction goes in the right direction.

3. Add additional lower-order terms, if necessary, to make the induction go through.

Let’s now redo the recurrences above using this method. Specifically, we’ll prove the following theorem using (strong) induction on n.

Theorem 4.31. Let a constant k > 0 be given. If W(n) ≤ 2W(n/2) + k · n for n > 1 and W(n) ≤ k for n ≤ 1, then we can find constants κ1 and κ2 such that

W(n) ≤ κ1 · n log n + κ2.

Proof. Let κ1 = 2k and κ2 = k. For the base case (n = 1), we check that W(1) ≤ k = κ2. For the inductive step (n > 1), we assume that

W(n/2) ≤ κ1 · (n/2) log(n/2) + κ2,

and we’ll show that W(n) ≤ κ1 · n log n + κ2. To show this, we substitute an upper bound for W(n/2) from our assumption into the recurrence, yielding

W(n) ≤ 2W(n/2) + k · n
     ≤ 2(κ1 · (n/2) log(n/2) + κ2) + k · n
     = κ1 n (log n − 1) + 2κ2 + k · n
     = κ1 n log n + κ2 + (k · n + κ2 − κ1 · n)
     ≤ κ1 n log n + κ2,

where the final step follows because k · n + κ2 − κ1 · n ≤ 0 as long as n > 1.