Algorithm Analysis
Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
© 2015 Goodrich and Tamassia

Scalability
- Scientists often have to deal with differences in scale, from the microscopically small to the astronomically large.
- Computer scientists must also deal with scale, but they deal with it primarily in terms of data volume rather than physical object size.
- Scalability refers to the ability of a system to gracefully accommodate growing sizes of inputs or amounts of workload.

Application: Job Interviews
- High-technology companies tend to ask questions about algorithms and data structures during job interviews.
- Algorithms questions can be short but often require critical thinking, creative insights, and subject knowledge.
  - All the "Applications" exercises in Chapter 1 of the Goodrich-Tamassia textbook are taken from reports of actual job interview questions.
- xkcd "Labyrinth Puzzle." http://xkcd.com/246/ Used with permission under Creative Commons 2.5 License.

Algorithms and Data Structures
- An algorithm is a step-by-step procedure for performing some task in a finite amount of time.
  - Typically, an algorithm takes input data and produces output based upon it.
    [Input → Algorithm → Output]
- A data structure is a systematic way of organizing and accessing data.

Running Time
- Most algorithms transform input objects into output objects.
- The running time of an algorithm typically grows with the input size.
- Average-case time is often difficult to determine.
- We focus primarily on the worst-case running time.
  [Figure: best-case, average-case, and worst-case running times versus input size.]
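The best-case versus worst-case distinction can be made concrete with a small example. This is a hypothetical illustration, not from the slides: a sequential search whose comparison count depends on where (or whether) the target appears.

```python
def sequential_search(data, target):
    """Return the index of target in data, or -1 if absent."""
    for i, value in enumerate(data):
        if value == target:
            return i   # best case: target is first, one comparison
    return -1          # worst case: target absent, n comparisons

data = list(range(100))
print(sequential_search(data, 0))    # → 0   (best case)
print(sequential_search(data, 999))  # → -1  (worst case: scans all 100)
```

The average case depends on the distribution of inputs, which is exactly why it is often difficult to determine and why the worst case is the standard measure.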
  - Easier to analyze
  - Crucial to applications such as games, finance, and robotics

Experimental Studies
- Write a program implementing the algorithm.
- Run the program with inputs of varying size and composition, noting the time needed.
- Plot the results.
  [Figure: measured time in ms versus input size.]

Limitations of Experiments
- It is necessary to implement the algorithm, which may be difficult.
- Results may not be indicative of the running time on other inputs not included in the experiment.
- In order to compare two algorithms, the same hardware and software environments must be used.

Theoretical Analysis
- Uses a high-level description of the algorithm instead of an implementation.
- Characterizes running time as a function of the input size, n.
- Takes into account all possible inputs.
- Allows us to evaluate the speed of an algorithm independent of the hardware/software environment.

Pseudocode
- High-level description of an algorithm
- More structured than English prose
- Less detailed than a program
- Preferred notation for describing algorithms
- Hides program design issues

Pseudocode Details
- Control flow
  - if … then … [else …]
  - while … do …
  - repeat … until …
  - for … do …
  - Indentation replaces braces
- Method declaration
    Algorithm method(arg [, arg…])
      Input …
      Output …
- Method call: method(arg [, arg…])
- Return value: return expression
- Expressions
  - ← assignment
  - = equality testing
  - n² superscripts and other mathematical formatting allowed

The Random Access Machine (RAM) Model
A RAM consists of:
- A CPU
- A potentially unbounded bank of memory cells, each of which can
hold an arbitrary number or character
- Memory cells are numbered, and accessing any cell in memory takes unit time.

Seven Important Functions
- Seven functions that often appear in algorithm analysis:
  - Constant ≈ 1
  - Logarithmic ≈ log n
  - Linear ≈ n
  - N-Log-N ≈ n log n
  - Quadratic ≈ n²
  - Cubic ≈ n³
  - Exponential ≈ 2ⁿ
- In a log-log chart, the slope of the line corresponds to the growth rate.
  [Figure: log-log plot of T(n) for the seven functions.]

Functions Graphed Using "Normal" Scale (slide by Matt Stallmann, included with permission)
- g(n) = 1, g(n) = lg n, g(n) = n, g(n) = n lg n, g(n) = n², g(n) = n³, g(n) = 2ⁿ

Primitive Operations
- Basic computations performed by an algorithm
- Identifiable in pseudocode
- Largely independent from the programming language
- Exact definition not important (we will see why later)
- Assumed to take a constant amount of time in the RAM model
- Examples:
  - Evaluating an expression
  - Assigning a value to a variable
  - Indexing into an array
  - Calling a method
  - Returning from a method

Counting Primitive Operations
- Example: By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

Estimating Running Time
- Algorithm arrayMax executes 7n − 2 primitive operations in the worst case, 5n in the best case. Define:
    a = time taken by the fastest primitive operation
    b = time taken by the slowest primitive operation
- Let T(n) be the worst-case time of arrayMax.
  Then a(5n) ≤ T(n) ≤ b(7n − 2).
- Hence, the running time T(n) is bounded by two linear functions.

Growth Rate of Running Time
- Changing the hardware/software environment
  - affects T(n) by a constant factor, but
  - does not alter the growth rate of T(n).
- The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax.

Why Growth Rate Matters (slide by Matt Stallmann, included with permission)

  if runtime is...   time for n + 1      time for 2n         time for 4n
  c lg n             c lg(n + 1)         c (lg n + 1)        c (lg n + 2)
  c n                c (n + 1)           2c n                4c n
  c n lg n           ~ c n lg n + c n    2c n lg n + 2c n    4c n lg n + 8c n
  c n²               ~ c n² + 2c n       4c n²               16c n²
  c n³               ~ c n³ + 3c n²      8c n³               64c n³
  c 2ⁿ               c 2ⁿ⁺¹              c 2²ⁿ               c 2⁴ⁿ

  (For c n², the runtime quadruples when the problem size doubles.)

Analyzing Recursive Algorithms
- Use a function, T(n), to derive a recurrence relation that characterizes the running time of the algorithm in terms of smaller values of n.
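As a sketch of the recurrence idea (a hypothetical example, not one of the slides' algorithms), consider computing a maximum recursively. Each call does a constant amount of work c plus one call on a problem of size n − 1, giving T(n) = T(n − 1) + c with T(1) = c, which unrolls to T(n) = c·n, i.e., O(n).

```python
def recursive_max(data, n):
    """Maximum of data[0..n-1], computed recursively.

    Running time satisfies T(n) = T(n - 1) + c, T(1) = c,
    which unrolls to T(n) = c * n, i.e., O(n).
    """
    if n == 1:               # base case: T(1) = c
        return data[0]
    m = recursive_max(data, n - 1)            # T(n - 1)
    return m if m >= data[n - 1] else data[n - 1]   # + c

print(recursive_max([3, 1, 4, 1, 5, 9, 2, 6], 8))  # → 9
```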
Constant Factors
- The growth rate is minimally affected by
  - constant factors or
  - lower-order terms.
- Examples:
  - 10² n + 10⁵ is a linear function.
  - 10⁵ n² + 10⁸ n is a quadratic function.
  [Figure: log-log plot comparing the linear and quadratic examples.]

Big-Oh Notation
- Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 such that f(n) ≤ c g(n) for n ≥ n0.
- Example: 2n + 10 is O(n)
  - 2n + 10 ≤ cn
  - (c − 2) n ≥ 10
  - n ≥ 10/(c − 2)
  - Pick c = 3 and n0 = 10.
  [Figure: log-log plot of 3n, 2n + 10, and n.]

Big-Oh Example
- Example: the function n² is not O(n)
  - n² ≤ cn
  - n ≤ c
  - The above inequality cannot be satisfied, since c must be a constant.

More Big-Oh Examples
- 7n − 2 is O(n): need c > 0 and n0 ≥ 1 such that 7n − 2 ≤ cn for n ≥ n0; this is true for c = 7 and n0 = 1.
- 3n³ + 20n² + 5 is O(n³): need c > 0 and n0 ≥ 1 such that 3n³ + 20n² + 5 ≤ c n³ for n ≥ n0; this is true for c = 4 and n0 = 21.
- 3 log n + 5 is O(log n): need c > 0 and n0 ≥ 1 such that 3 log n + 5 ≤ c log n for n ≥ n0; this is true for c = 8 and n0 = 2.

Big-Oh and Growth Rate
- The big-Oh notation gives an upper bound on the growth rate of a function.
- The statement "f(n) is O(g(n))" means that the growth rate of f(n) is no more than the growth rate of g(n).
- We can use the big-Oh notation to rank functions according to their growth rate:

                      f(n) is O(g(n))    g(n) is O(f(n))
    g(n) grows more   Yes                No
    f(n) grows more   No                 Yes
    Same growth       Yes                Yes

Big-Oh Rules
- If f(n) is a polynomial of degree d, then f(n) is O(nᵈ); i.e., lower-order terms and constant factors can be dropped.
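The witnesses c and n0 in a big-Oh claim can be checked numerically. A minimal sketch, using the slides' own example 3n³ + 20n² + 5 with c = 4 and n0 = 21 (the `f` helper and the checked range are my additions for illustration):

```python
def f(n):
    """The slides' example polynomial: 3n^3 + 20n^2 + 5."""
    return 3 * n**3 + 20 * n**2 + 5

c, n0 = 4, 21  # witnesses claimed on the slide for f(n) = O(n^3)

# The bound f(n) <= c * n^3 holds for every n >= n0 (spot-checked here).
assert all(f(n) <= c * n**3 for n in range(n0, 1000))

# The choice n0 = 21 is tight: just below it the bound fails.
assert f(20) > c * 20**3   # 32005 > 32000
```

A numeric check like this cannot prove the bound for all n, but it quickly catches a wrong constant; the proof itself is the algebra on the slide.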