Compsci 201, Mathematical & Empirical Analysis
Owen Astrachan Jeff Forbes September 27, 2017
Compsci 201, Fall 2017, 9/22/17 1 Analysis+Markov

I is for …
• Invariant
  • Reasoning about your code
• Interface
  • MarkovModel implements MarkovInterface

Plan for the Week
• Empirical & mathematical analysis of algorithms
  • Big-Oh basics
  • Calculations from code
• Code in https://coursework.cs.duke.edu/201fall17/classwork
• Towards Test #1

Computer Science
• Scientific Method
  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.
• Principles
  • Experiments we design must be reproducible; hypotheses must be falsifiable.
• In CompSci 201:
  • Empirical & mathematical analysis

Scientific Method
• Analysis of algorithms. Framework for comparing algorithms and predicting performance.
• Scientific method.
  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.
• Principles. Experiments we design must be reproducible; hypotheses must be falsifiable.
• Computer Science
  • Empirical & mathematical analysis

Dropping Glass Balls
• Tower with 100 floors
• Given 2 glass balls
• Want to determine the lowest floor from which a ball can be dropped and will break
• How?
• Is your algorithm the most efficient one?
• Generalize to n floors
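
One way to explore the question is to simulate a candidate strategy and count drops. The sketch below is only an illustration, not the course's answer and not claimed to be optimal: it assumes a "jump, then scan" strategy in which the first ball is dropped every `step` floors and the second ball then scans the gap one floor at a time. The class and method names (`GlassBalls`, `drops`, `worstCase`) are made up for this example.

```java
class GlassBalls {
    // Drops used by the assumed "jump, then scan" strategy on an n-floor tower
    // when balls break from floor breakFloor upward (n + 1 means "never breaks").
    static int drops(int n, int step, int breakFloor) {
        int drops = 0;
        int lastSafe = 0;
        // First ball: jump up by `step` floors until it breaks.
        for (int floor = step; floor <= n; floor += step) {
            drops++;
            if (floor >= breakFloor) {
                // Second ball: scan the floors between the last safe jump and this one.
                for (int f = lastSafe + 1; f < floor; f++) {
                    drops++;
                    if (f >= breakFloor) return drops;
                }
                return drops;
            }
            lastSafe = floor;
        }
        // No jump broke the ball: scan any leftover floors at the top one by one.
        for (int f = lastSafe + 1; f <= n; f++) {
            drops++;
            if (f >= breakFloor) return drops;
        }
        return drops;
    }

    // Worst case over every possible breaking floor, including "never breaks".
    static int worstCase(int n, int step) {
        int worst = 0;
        for (int b = 1; b <= n + 1; b++) {
            worst = Math.max(worst, drops(n, step, b));
        }
        return worst;
    }

    public static void main(String[] args) {
        for (int step : new int[] {1, 5, 10, 20, 50}) {
            System.out.println("step " + step + ": worst case " + worstCase(100, step));
        }
    }
}
```

With 100 floors, `worstCase(100, 1)` is 100 and `worstCase(100, 10)` is 19, which hints at why the step size matters when generalizing to n floors.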

Glass balls continued
http://bit.ly/CS201-f17-0927-0
• Assume the number of floors is 100
• In the worst case, how many balls will I have to drop?
• In the best case, how many balls will I have to drop to determine the lowest floor where a ball will break?
• If there are n floors, how many balls will you have to drop? (roughly)

What is big-Oh about? (preview)
• Intuition: avoid details when they don’t matter, and they don’t matter when input size (N) is big enough
• For polynomials, use only leading term, ignore coefficients

$y = 3x$, $y = 6x - 2$, $y = 15x + 44$
$y = x^2$, $y = x^2 - 6x + 9$, $y = 3x^2 + 4x$

• The first family is $O(n)$, the second is $O(n^2)$
• Intuition: family of curves, generally the same shape
• More formally: $O(f(n))$ is an upper bound; when $n$ is large enough the expression $c f(n)$ is larger
• Intuition: linear function: double input, double time; quadratic function: double input, quadruple the time

More on O-notation, big-Oh
• Big-Oh hides/obscures some empirical analysis, but is good for general description of an algorithm
• Allows us to compare algorithms in the limit
• $20N$ hours vs $N^2$ microseconds: which is better?
• O-notation is an upper bound; this means that $N$ is $O(N)$, but it is also $O(N^2)$; we try to provide tight bounds. Formally:
• $g(N) \in O(f(N))$ iff there exist constants $c$ and $n_0$ such that for all $N > n_0$, $g(N) < c f(N)$
[Figure: plot of $c f(N)$ and $g(N)$, with $x = n_0$ marked]

Rank orders of growth
• $n^4$ grows faster than $n^2$
  • $n^4 \notin O(n^2)$
• $0.001n^4$ is in the same growth class as $10^6 n^4$
  • $0.001n^4,\ 10^6 n^4 \in O(n^4)$
http://bit.ly/201-f17-0927-1

Reasoning about growth
• Consider a 3-tower
1. How tall is a 5-tower?
2. How tall is a 10-tower?
3. How many blocks in a 5-tower?
4. Which best captures the height of an n-tower?
http://bit.ly/201-f17-0927-2
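
The formal big-Oh definition given earlier (constants c and n0 such that g(N) < c f(N) for all N > n0) can be spot-checked numerically. The sketch below uses g(N) = 3N^2 + 4N, taken from the quadratic family above, and f(N) = N^2. The witnesses c = 4 and n0 = 4 are choices made for this example, and the class name `BigOhCheck` is hypothetical. A finite check like this can refute a bad pair of constants but cannot prove the bound; the proof here is the algebra 3N^2 + 4N < 4N^2 whenever N > 4.

```java
class BigOhCheck {
    // g(N) = 3N^2 + 4N, taken from the quadratic family on the earlier slide.
    static long g(long n) {
        return 3 * n * n + 4 * n;
    }

    // f(N) = N^2, the candidate growth class.
    static long f(long n) {
        return n * n;
    }

    // Returns true if g(N) < c * f(N) for every N with n0 < N <= limit.
    // This is a finite spot check, not a proof.
    static boolean holds(long c, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            if (g(n) >= c * f(n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("c=4, n0=4: " + holds(4, 4, 100_000));
        System.out.println("c=4, n0=3: " + holds(4, 3, 100_000));
        System.out.println("c=3, n0=0: " + holds(3, 0, 100_000));
    }
}
```

Only the first pair passes: with n0 = 3 the check fails at N = 4, where the two sides are equal, and c = 3 can never work because the 4N term always pushes g(N) above 3N^2.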

Three-Sum
• Given N integers, find triples that sum to 0.
• Deeply related to problems in computational geometry.
public class ThreeSum {
    // return number of distinct triples (i, j, k)
    // such that (a[i] + a[j] + a[k] == 0)
    public static int count(int[] a) {
        int N = a.length;
        int cnt = 0;
        for (int i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
                for (int k = j+1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0) cnt++;
        return cnt;
    }
}

Empirical Analysis
• Empirical analysis. Run the program for various input sizes.

N       time †
512     0.03
1024    0.26
2048    2.16
4096    17.18
8192    136.76

† Running Linux on Sun-Fire-X4100 with 16GB RAM
• How much time for N = 4096 on my machine?
• How much could I do in a minute, hour, day?
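
To answer the "on my machine" question, one option is a small timing harness around the three-sum count. The sketch below is a minimal version under stated assumptions: the input is random integers in [-1,000,000, 1,000,000] from a seeded `java.util.Random`, timing uses `System.nanoTime()`, and the names `ThreeSumTimer` and `timeSeconds` are invented for this example. A single run is noisy (JIT warm-up, other processes), so treat the numbers as rough.

```java
import java.util.Random;

class ThreeSumTimer {
    // Same triple loop as the ThreeSum.count method shown earlier.
    static int count(int[] a) {
        int n = a.length;
        int cnt = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0) cnt++;
        return cnt;
    }

    // Time one call of count on n random values (assumed input distribution).
    static double timeSeconds(int n, long seed) {
        Random rng = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rng.nextInt(2_000_001) - 1_000_000;
        }
        long start = System.nanoTime();
        count(a);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Double N each round, as in the table; raise the limit for longer runs.
        for (int n = 256; n <= 1024; n *= 2) {
            System.out.printf("N = %d  time = %.3f s%n", n, timeSeconds(n, 201L));
        }
    }
}
```

Comparing consecutive rows of the output shows how the time scales when N doubles, which can then be extrapolated to a minute, an hour, or a day.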

Empirical Analysis
• Data analysis. Plot running time vs. input size N.
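
If the running time follows a power law T(N) ≈ a·N^b, then a plot of log T against log N is a straight line with slope b. The sketch below fits that slope by ordinary least squares to the five measurements in the table above. The power-law model and the class name `LogLogFit` are assumptions of this example; the data values are copied from the table.

```java
class LogLogFit {
    // Input sizes and times copied from the empirical table above.
    static final double[] SIZES = {512, 1024, 2048, 4096, 8192};
    static final double[] TIMES = {0.03, 0.26, 2.16, 17.18, 136.76};

    // Least-squares slope of log(t) against log(n); estimates b in T(N) ~ a * N^b.
    static double slope(double[] n, double[] t) {
        int m = n.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < m; i++) {
            meanX += Math.log(n[i]) / m;
            meanY += Math.log(t[i]) / m;
        }
        double sxy = 0, sxx = 0;
        for (int i = 0; i < m; i++) {
            double dx = Math.log(n[i]) - meanX;
            sxy += dx * (Math.log(t[i]) - meanY);
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        System.out.printf("estimated exponent b = %.2f%n", slope(SIZES, TIMES));
    }
}
```

For this data the fitted exponent comes out close to 3, matching a cubic model for the triple loop.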

Mathematical Analysis
• Count up frequency of execution of each instruction and weight by its execution time.

How many times is each instruction executed?

int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0) count++;

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        if (a[i] + a[j] == 0) count++;

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0) count++;

Three Sum Analysis
• Mathematical analysis.
• The running time is proportional to $N^3$.
• Focus on instructions in "inner loop."
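
The inner-loop counts can be checked directly by replacing the `if` test with a counter. The sketch below counts how often the innermost statement runs for the pair loop (j starting at i+1) and for the three-sum triple loop, and compares each with its closed form, N(N-1)/2 and N(N-1)(N-2)/6. The class and method names are made up for this illustration.

```java
class InnerLoopCount {
    // Executions of the innermost statement in the pair loop with j = i+1.
    static long pairLoop(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;
    }

    // Executions of the innermost statement in the three-sum triple loop.
    static long tripleLoop(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    count++;
        return count;
    }

    // Closed forms: number of ways to choose 2 or 3 distinct indices from n.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    static long triples(long n) {
        return n * (n - 1) * (n - 2) / 6;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 160; n *= 2) {
            System.out.println(n + ": pairs " + pairLoop(n) + " = " + pairs(n)
                    + ", triples " + tripleLoop(n) + " = " + triples(n));
        }
    }
}
```

The leading term of N(N-1)(N-2)/6 is N^3/6, which is why the running time is proportional to N^3.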

Order of Growth Classifications
• Observation. A small subset of mathematical functions suffices to describe the running time of many fundamental algorithms.

$\log_2 N$:
    while (N > 1) {
        N = N / 2;
        ...
    }

$N$:
    for (int i = 0; i < N; i++)
        ...

$N \log_2 N$:
    public void g(int N) {
        if (N == 0) return;
        g(N/2);
        g(N/2);
        for (int i = 0; i < N; i++)
            ...
    }

$N^2$:
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            ...

$2^N$:
    public void f(int N) {
        if (N == 0) return;
        f(N-1);
        f(N-1);
        ...
    }

Big-Oh calculations from code
• Search for element in an array:
• What is complexity of code (using O-notation)?
• What if array doubles, what happens to time?

for (int k = 0; k < a.length; k++)
    if (a[k].equals(target)) return true;
return false;
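
To see how the cost of the search loop above depends on where (or whether) the target occurs, the loop can be instrumented to count calls to `equals`. This is a sketch only: the `String[]` element type and the names `LinearSearch` and `comparisons` are assumptions of this example.

```java
class LinearSearch {
    // Number of equals() calls the search loop makes before it returns.
    static int comparisons(String[] a, String target) {
        int calls = 0;
        for (int k = 0; k < a.length; k++) {
            calls++;
            if (a[k].equals(target)) return calls; // found: stop early
        }
        return calls; // not found: every element was examined
    }

    public static void main(String[] args) {
        String[] a = {"ant", "bee", "cat", "dog"};
        System.out.println("first element: " + comparisons(a, "ant"));
        System.out.println("last element:  " + comparisons(a, "dog"));
        System.out.println("absent:        " + comparisons(a, "emu"));
    }
}
```

A target in the first slot costs one comparison regardless of array length, while an absent target costs `a.length` comparisons, so doubling the array doubles that cost.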
• Complexity if we call N times on M-element vector?
• What about best case? Average case? Worst case?

IsomorphicWords
• Consider code from the solution to IsomorphicWords:

int total = 0;
for (int j = 0; j < words.length; j++) {
    for (int k = j+1; k < words.length; k++) {
        if (isomorphic(words[j], words[k])) {
            total += 1;
        }
    }
}
return total;

• What is the input size? What does the runtime depend on?
• What’s the big-Oh for the run-time?
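
The loop above calls `isomorphic` once per pair of words, so the run-time depends on both the number of words and the cost of each call. The sketch below wraps the same loop in a class and counts the calls. The body of `isomorphic` is not shown on the slide; the version here is an assumed implementation (two words are isomorphic when a one-to-one letter substitution turns one into the other) whose cost grows with word length. The names `IsomorphicPairs` and `calls` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IsomorphicPairs {
    static long calls = 0; // how many times isomorphic() has been invoked

    // Assumed definition: a consistent one-to-one letter mapping exists.
    static boolean isomorphic(String a, String b) {
        calls++;
        if (a.length() != b.length()) return false;
        Map<Character, Character> forward = new HashMap<>();
        Map<Character, Character> backward = new HashMap<>();
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            Character mappedX = forward.putIfAbsent(x, y);
            Character mappedY = backward.putIfAbsent(y, x);
            if (mappedX != null && mappedX != y) return false;
            if (mappedY != null && mappedY != x) return false;
        }
        return true;
    }

    // The double loop from the slide.
    static int countPairs(String[] words) {
        int total = 0;
        for (int j = 0; j < words.length; j++) {
            for (int k = j + 1; k < words.length; k++) {
                if (isomorphic(words[j], words[k])) {
                    total += 1;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words = {"abca", "zbxz", "opqr", "abab"};
        System.out.println("pairs: " + countPairs(words) + ", calls: " + calls);
    }
}
```

For n words the loop makes n(n-1)/2 calls; with this assumed `isomorphic`, each call does work proportional to the word length L, giving O(n^2 · L) overall.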

Array vs. ArrayList
• Run the code ArrayVsArrayList.java
  • https://coursework.cs.duke.edu/201fall17/classwork/blob/master/src/ArrayVsArrayList.java
• Change the value of argument
• Submit your data here: http://bit.ly/201-f17-0927-3
• Submit as many times as you want

Amortization: Expanding ArrayLists
• Expand capacity of list when add() called
• Calling add N times, doubling capacity as needed

Item #                 Resizing cost   Cumulative cost   Resizing cost per item   Capacity after add
1                      0               0                 0                        1
2                      2               2                 1                        2
3-4                    4               6                 1.5                      4
5-8                    8               14                1.75                     8
...
$2^m+1$ - $2^{m+1}$    $2^{m+1}$       $2^{m+2}-2$       around 2                 $2^{m+1}$

• Big-Oh of adding n elements?
• What if we grow size by one each time?

Some helpful mathematics
• $1 + 2 + 3 + 4 + \dots + N$
  • $N(N+1)/2$, exactly $= N^2/2 + N/2$, which is $O(N^2)$. Why?
• $N + N + N + \dots + N$ (total of N times)
  • $N \cdot N = N^2$, which is $O(N^2)$
• $N + N + N + \dots + N + \dots + N + \dots + N$ (total of 3N times)
  • $3N \cdot N = 3N^2$, which is $O(N^2)$
• $1 + 2 + 4 + \dots + 2^N$
  • $2^{N+1} - 1 = 2 \times 2^N - 1$, which is $O(2^N)$
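
The arithmetic and geometric sums above are what separate growing a list by one slot at a time from doubling it. The sketch below simulates both growth policies and counts elements copied during resizes. It is a simplified model, not `java.util.ArrayList` itself: capacity starts at 1, a resize happens only when the array is full, and the cost charged is the number of existing elements copied (a slightly different accounting from the resizing-cost table, which charges the new capacity). The names `ResizeCost`, `copiesDoubling`, and `copiesGrowByOne` are made up for this example.

```java
class ResizeCost {
    // Elements copied while adding n items when capacity doubles on overflow.
    static long copiesDoubling(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) { // full: copy everything into a bigger array
                copies += size;
                capacity *= 2;
            }
        }
        return copies;
    }

    // Elements copied while adding n items when capacity grows by one on overflow.
    static long copiesGrowByOne(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += 1;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.println(n + " adds: doubling copies " + copiesDoubling(n)
                    + ", grow-by-one copies " + copiesGrowByOne(n));
        }
    }
}
```

Doubling copies 1 + 2 + 4 + … elements, a geometric sum that stays below 2n, while growing by one copies 1 + 2 + 3 + … + (n-1) elements, the arithmetic sum n(n-1)/2.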
• Impact of last statement on adding $2^N + 1$ elements to a vector
  • $1 + 2 + \dots + 2^N + 2^{N+1} = 2^{N+2} - 1 = 4 \times 2^N - 1$, which is $O(2^N)$
  • resizing + copy = total (let $x = 2^N$)

Running times @ $10^9$ instructions/sec
N               O(log N)     O(N)      O(N log N)   $O(N^2)$
10              3E-9         1E-8      3.3E-8       0.0000001
100             7E-9         1E-7      6.64E-7      0.0001
1,000           1E-8         1E-6      0.00001      0.001
10,000          1.3E-8       0.00001   0.0001329    0.102
100,000         1.7E-8       0.0001    0.001661     10.008
1,000,000       0.00000002   0.001     0.0199       16.7 min
1,000,000,000   0.00000003   1.002     65.8         3.18 centuries

Analysis: Empirical vs. Mathematical
• Empirical analysis.
  • Measure running times, plot, and fit curve.
  • Easy to perform experiments.
  • Model useful for predicting, but not for explaining.
• Mathematical analysis.
  • Analyze algorithm to estimate # ops as a function of input size.
  • May require advanced mathematics.
  • Model useful for predicting and explaining.
• Critical difference. Mathematical analysis is independent of a particular machine or compiler; applies to machines not yet built.