Compsci 201, Mathematical & Empirical Analysis

Owen Astrachan Jeff Forbes September 27, 2017

Compsci 201, Fall 2017, 9/22/17 1 Analysis+Markov I is for …

• Invariant
  • Reasoning about your
• Interface
  • MarkovModel implements MarkovInterface
• Inheritance
  • EfficientMarkov extends MarkovModel
• Identity
  • You’ a .

Plan for the Week

• Empirical & mathematical analysis of algorithms
• Big-Oh basics
• Big-Oh calculations from code

• Code in https://coursework.cs.duke.edu/201fall17/classwork

• Towards Test #1

Computer Science

• Scientific Method
  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.
• Principles
  • Experiments we design must be reproducible; hypothesis must be falsifiable.
• In CompSci 201:
  • Empirical & mathematical analysis

Scientific Method

• Framework for comparing algorithms and predicting performance.
• Scientific method.
  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.
• Principles. Experiments we design must be reproducible; hypothesis must be falsifiable.
• Empirical & mathematical analysis

Dropping Glass Balls

• Tower with 100 Floors • Given 2 glass balls • Want to determine the lowest floor from which a ball can be dropped and will break • How?

• Is your solution the most efficient one?
• Generalize to n floors
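One well-known strategy for the two-ball puzzle, offered here as a sketch rather than as the intended classroom answer: drop the first ball at gaps that shrink by one each time (floor k, then k + (k-1) floors higher, and so on), then scan upward one floor at a time with the second ball. Under that strategy the worst case is the smallest k with k(k+1)/2 ≥ n, which grows roughly like the square root of 2n. The class and method names below are hypothetical.

```java
public class GlassBalls {
    // Smallest k such that 1 + 2 + ... + k >= n.
    // With two balls, k drops suffice in the worst case for n floors
    // under the shrinking-gap strategy described above.
    public static int worstCaseDrops(int n) {
        int k = 0;
        int covered = 0;
        while (covered < n) {
            k++;
            covered += k;
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseDrops(100));   // 100 floors
        System.out.println(worstCaseDrops(10000)); // 10,000 floors
    }
}
```

For 100 floors this gives 14 drops, compared with up to 100 drops when only one ball is available.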

Glass balls continued http://bit.ly/CS201-f17-0927-0

• Assume the number of floors is 100
• In the best case, how many balls will I have to drop to determine the lowest floor where a ball will break?
• In the worst case, how many balls will I have to drop?

If there are n floors, how many balls will you have to drop? (roughly)

What is big-Oh about? (preview)

• Intuition: avoid details when they don’t matter, and they don’t matter when input size (N) is big enough
  • For polynomials, use only the leading term and ignore coefficients

y = 3x        y = 6x - 2          y = 15x + 44
y = x^2       y = x^2 - 6x + 9    y = 3x^2 + 4x

• The first family is O(n), the second is O(n^2)
  • Intuition: family of curves, generally the same shape
  • More formally: O(f(n)) is an upper bound; when n is large enough, the expression cf(n) is larger
  • Intuition: linear function: double input, double time; quadratic function: double input, quadruple the time

More on O-notation, big-Oh

• Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm
  • Allows us to compare algorithms in the limit
  • 20N hours vs N^2 microseconds: which is better?
• O-notation is an upper bound; this means that N is O(N), but it is also O(N^2); we try to provide tight bounds. Formally:
  • g(N) ∈ O(f(N)) iff there exist constants c and n0 such that for all N > n0, g(N) < cf(N)

Rank orders of growth

• n^4 grows faster than n^2
  • n^4 ∉ O(n^2)
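A short argument for why n^4 is not O(n^2), using the usual definition of O-notation (g(n) is O(f(n)) when g(n) ≤ c·f(n) for some constant c and all sufficiently large n). The symbols c and n_0 are the constants from that definition:

```latex
\text{Suppose } n^4 \le c\,n^2 \text{ for all } n > n_0.
\text{ Dividing by } n^2 \text{ gives } n^2 \le c,
\text{ which fails once } n > \sqrt{c}.
\text{ So no constant } c \text{ works, and } n^4 \notin O(n^2).
```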

• 0.001n^4 is in the same growth class as 1E6n^4
  • 0.001n^4, 1E6n^4 ∈ O(n^4)

http://bit.ly/201-f17-0927-1

Reasoning about growth

• Consider a 3-tower

1. How tall is a 5-tower?
2. How tall is a 10-tower?
3. How many blocks in a 5-tower?
4. Which best captures the height of an n-tower?

http://bit.ly/201-f17-0927-2

Three-Sum

• Given N integers, find triples that sum to 0.
• Deeply related to problems in computational geometry.

public class ThreeSum {

    // return number of distinct triples (i, j, k)
    // such that (a[i] + a[j] + a[k] == 0)
    public static int count(int[] a) {
        int N = a.length;
        int cnt = 0;
        for (int i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
                for (int k = j+1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0) cnt++;
        return cnt;
    }
}

Empirical Analysis

• Empirical analysis. Run the program for various input sizes.

N       time †
512     0.03
1024    0.26
2048    2.16
4096    17.18
8192    136.76

† Running Linux on Sun-Fire-X4100 with 16GB RAM

• How much time for N = 4096 on my machine?
• How much could I do in a minute, hour, day?
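A minimal sketch of how such timings might be collected with a doubling experiment on random input. The class name DoublingTest, the value range, and the fixed seed are illustrative assumptions, not part of the course code, and the times printed will differ from machine to machine.

```java
import java.util.Random;

public class DoublingTest {
    // Same brute-force triple count as ThreeSum.count, repeated here
    // so the sketch is self-contained.
    public static int count(int[] a) {
        int n = a.length;
        int cnt = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0) cnt++;
        return cnt;
    }

    // Time one run of count on n random integers; returns seconds.
    public static double timeTrial(int n, Random rng) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rng.nextInt(2_000_001) - 1_000_000; // values in [-1e6, 1e6]
        }
        long start = System.nanoTime();
        count(a);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        Random rng = new Random(201); // fixed seed so the inputs are reproducible
        for (int n = 128; n <= 2048; n *= 2) {
            System.out.printf("%6d %8.3f%n", n, timeTrial(n, rng));
        }
    }
}
```

Doubling N each round keeps the number of runs small while still covering a wide range of input sizes.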

Empirical Analysis

• Data analysis. Plot running time vs. input size N.
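One way to analyze such data without a plot is to compare successive doublings: if time grows like a·N^b, doubling N multiplies the time by 2^b, so log2 of the ratio estimates b. The sketch below applies this to the Three-Sum measurements listed earlier; the class and method names are hypothetical.

```java
public class DoublingRatio {
    // Estimate the exponent b in time ~ a * N^b from two timings
    // taken at input sizes N and 2N.
    public static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    public static void main(String[] args) {
        int[] sizes = {512, 1024, 2048, 4096, 8192};
        double[] times = {0.03, 0.26, 2.16, 17.18, 136.76}; // measured values from the slides
        for (int i = 1; i < sizes.length; i++) {
            System.out.printf("%5d -> %5d  ratio %.2f  exponent %.2f%n",
                sizes[i - 1], sizes[i],
                times[i] / times[i - 1],
                exponent(times[i - 1], times[i]));
        }
    }
}
```

The estimated exponents settle near 3 as N grows, which is consistent with cubic growth.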

Mathematical Analysis

• Count up frequency of execution of each instruction and weight by its execution time.

How many times is each instruction executed?

int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0) count++;

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        if (a[i] + a[j] == 0) count++;

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0) count++;

Three Sum Analysis

• Mathematical analysis.
  • The running time is proportional to N^3.

• Focus on instructions in "inner loop."
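As a worked count (a sketch using standard summation identities), the number of times the innermost test runs in a double loop with j starting at i+1, and in the ThreeSum triple loop, is:

```latex
\sum_{i=0}^{N-1} \sum_{j=i+1}^{N-1} 1 = \binom{N}{2} = \frac{N(N-1)}{2} \sim \frac{N^2}{2}
\qquad
\sum_{i=0}^{N-1} \sum_{j=i+1}^{N-1} \sum_{k=j+1}^{N-1} 1 = \binom{N}{3} = \frac{N(N-1)(N-2)}{6} \sim \frac{N^3}{6}
```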

Order of Growth Classifications

• Observation. A small set of mathematical functions suffices to describe the running time of many fundamental algorithms.

log2 N
    while (N > 1) {
        N = N / 2;
        ...
    }

N
    for (int i = 0; i < N; i++)
        ...

N log2 N
    public void g(int N) {
        if (N == 0) return;
        g(N/2);
        g(N/2);
        for (int i = 0; i < N; i++)
            ...
    }

N^2
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            ...

2^N
    public void f(int N) {
        if (N == 0) return;
        f(N-1);
        f(N-1);
        ...
    }

Big-Oh calculations from code

• Search for an element in an array:
  • What is the complexity of the code (using O-notation)?
  • What if array doubles, what happens to time?

for (int k = 0; k < a.length; k++)
    if (a[k].equals(target)) return true;
return false;
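A small sketch of the search loop with a comparison counter added, to make the cost concrete. The class and method names are hypothetical, and the counter exists only for illustration.

```java
public class LinearSearch {
    // Returns the number of equals() calls made while searching
    // for target in a; stops at the first match.
    public static int comparisons(String[] a, String target) {
        int count = 0;
        for (int k = 0; k < a.length; k++) {
            count++;
            if (a[k].equals(target)) return count;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] a = {"ant", "bee", "cat", "dog"};
        System.out.println(comparisons(a, "ant")); // target is first
        System.out.println(comparisons(a, "dog")); // target is last
        System.out.println(comparisons(a, "eel")); // target is absent
    }
}
```

A match in the first slot costs one comparison, while a missing target costs a.length comparisons, so doubling the array doubles the work in that case.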

• Complexity if we call N times on M-element vector?
  • What about best case? Average case? Worst case?

IsomorphicWords

• Consider code from the solution to IsomorphicWords:

int total = 0;
for (int j = 0; j < words.length; j++) {
    for (int k = j+1; k < words.length; k++) {
        if (isomorphic(words[j], words[k])) {
            total += 1;
        }
    }
}
return total;

• What is the input size? What does the runtime depend on?
• What’s the big-Oh for the run-time?
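To make the cost of the nested loop concrete, here is a self-contained sketch. The isomorphic helper is an assumption (one plausible implementation that checks for a one-to-one letter mapping between equal-length words), not the course solution. With n words of length L each, the loops make n(n-1)/2 calls, and each call in this sketch does O(L) work.

```java
import java.util.HashMap;
import java.util.Map;

public class IsomorphicPairs {
    // Hypothetical helper: true if the letters of a map one-to-one
    // onto the letters of b, position by position.
    public static boolean isomorphic(String a, String b) {
        if (a.length() != b.length()) return false;
        Map<Character, Character> forward = new HashMap<>();
        Map<Character, Character> backward = new HashMap<>();
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            Character fx = forward.putIfAbsent(x, y);  // earlier mapping of x, if any
            Character by = backward.putIfAbsent(y, x); // earlier mapping of y, if any
            if ((fx != null && fx != y) || (by != null && by != x)) return false;
        }
        return true;
    }

    // Same nested loop as in the slide: every unordered pair is tested once.
    public static int countPairs(String[] words) {
        int total = 0;
        for (int j = 0; j < words.length; j++) {
            for (int k = j + 1; k < words.length; k++) {
                if (isomorphic(words[j], words[k])) total += 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words = {"abca", "zbxz", "opqr"};
        System.out.println(countPairs(words));
    }
}
```

The run-time therefore depends on both the number of words and their length, not on the number of words alone.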

Array vs. ArrayList

• Run the code ArrayVsArrayList.java
  • https://coursework.cs.duke.edu/201fall17/classwork/blob/master/src/ArrayVsArrayList.java

• Change the value of argument

• Submit your data here: http://bit.ly/201-f17-0927-3
  • Submit as many times as you want

Amortization: Expanding ArrayLists

• Expand capacity of list when add() called
• Calling add N times, doubling capacity as needed

Item #             Resizing cost   Cumulative cost   Resizing cost per item   Capacity after add
1                  0               0                 0                        1
2                  2               2                 1                        2
3-4                4               6                 1.5                      4
5-8                8               14                1.75                     8
...
2^m+1 - 2^(m+1)    2^(m+1)         2^(m+2)-2         around 2                 2^(m+1)

• Big-Oh of adding n elements?
• What if we grow size by one each time?

Some helpful mathematics
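Returning to the two questions about add(): a rough experiment (a sketch, not how java.util.ArrayList is actually implemented) counts how many element copies a growable array performs while adding n items, once with capacity doubling and once with capacity growing by one. It counts only the elements copied, a slightly different accounting from the resizing-cost table, but the growth rates are the same. The class and method names are hypothetical.

```java
public class GrowthCost {
    // Count element copies made while adding n items to an array-backed
    // list that starts with capacity 1. If doubling is true the capacity
    // doubles when full; otherwise it grows by one slot.
    public static long copies(int n, boolean doubling) {
        long copied = 0;
        int capacity = 1;
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copied += size; // move every existing element to the new array
                capacity = doubling ? capacity * 2 : capacity + 1;
            }
            size++;
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.printf("n=%d doubling=%d growByOne=%d%n",
                n, copies(n, true), copies(n, false));
        }
    }
}
```

Doubling keeps the copy count below 2n, while growing by one gives n(n-1)/2 copies.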

• 1 + 2 + 3 + 4 + … + N
  • N(N+1)/2, exactly = N^2/2 + N/2 which is O(N^2) why?
• N + N + N + … + N (total of N times)
  • N*N = N^2 which is O(N^2)
• N + N + N + … + N + … + N + … + N (total of 3N times)
  • 3N*N = 3N^2 which is O(N^2)
• 1 + 2 + 4 + … + 2^N
  • 2^(N+1) - 1 = 2 x 2^N - 1 which is O(2^N)

• Impact of last statement on adding 2^N + 1 elements to a vector
  • 1 + 2 + … + 2^N + 2^(N+1) = 2^(N+2) - 1 = 4 x 2^N - 1 which is O(2^N)
  • resizing + copy = total (let x = 2^N)

Running times @ 10^9 instructions/sec

N               O(log N)     O(N)      O(N log N)   O(N^2)
10              3E-9         1E-8      3.3E-8       0.0000001
100             7E-9         1E-7      6.64E-7      0.0001
1,000           1E-8         1E-6      0.00001      0.001
10,000          1.3E-8       0.00001   0.0001329    0.102
100,000         1.7E-8       0.0001    0.001661     10.008
1,000,000       0.00000002   0.001     0.0199       16.7 min
1,000,000,000   0.00000003   1.002     65.8         3.18 centuries

Analysis: Empirical vs. Mathematical

• Empirical analysis.
  • Measure running times, plot, and fit curve.
  • Easy to perform experiments.
  • Model useful for predicting, but not for explaining.

• Mathematical analysis.
  • Analyze algorithm to estimate # ops as a function of input size.
  • May require advanced mathematics.
  • Model useful for predicting and explaining.

• Critical difference. Mathematical analysis is independent of a particular machine or compiler; applies to machines not yet built.
