Compsci 201, Mathematical & Empirical Analysis

Owen Astrachan, Jeff Forbes
September 27, 2017
Compsci 201, Fall 2017, 9/22/17, Analysis+Markov

I is for …
• Invariant
  • Reasoning about your code
• Interface
  • MarkovModel implements MarkovInterface<String>
• Inheritance
  • EfficientMarkov extends MarkovModel
• Identity
  • You’re a computer scientist.

Plan for the Week
• Empirical & mathematical analysis of algorithms
  • Big-Oh basics
  • Calculations from code
  • Code in https://coursework.cs.duke.edu/201fall17/classwork
• Towards Test #1

Computer Science
• Scientific Method
  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.
• Principles
  • Experiments we design must be reproducible; hypothesis must be falsifiable.
• In CompSci 201:
  • Empirical & mathematical analysis

Scientific Method
• Analysis of algorithms. Framework for comparing algorithms and predicting performance.
• Computer Science
  • Empirical & mathematical analysis

Dropping Glass Balls
• Tower with 100 Floors
• Given 2 glass balls
• Want to determine the lowest floor from which a ball can be dropped and will break
• How?
• Is your algorithm the most efficient one?
• Generalize to n floors

Glass balls continued
http://bit.ly/CS201-f17-0927-0
• Assume the number of floors is 100
  • In the best case, how many balls will I have to drop to determine the lowest floor where a ball will break?
  • In the worst case, how many balls will I have to drop?
• If there are n floors, how many balls will you have to drop? (roughly)

What is big-Oh about? (preview)
• Intuition: avoid details when they don’t matter, and they don’t matter when input size (N) is big enough
• For polynomials, use only leading term, ignore coefficients
    y = 3x      y = 6x - 2          y = 15x + 44
    y = x^2     y = x^2 - 6x + 9    y = 3x^2 + 4x
• The first family is O(n), the second is O(n^2)
• Intuition: family of curves, generally the same shape
• More formally: O(f(n)) is an upper bound; when n is large enough the expression c*f(n) is larger
• Intuition: linear function: double input, double time; quadratic function: double input, quadruple the time

More on O-notation, big-Oh
• Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm
• Allows us to compare algorithms in the limit
  • 20N hours vs N^2 microseconds: which is better?
• O-notation is an upper bound; this means that N is O(N), but it is also O(N^2); we try to provide tight bounds. Formally:
  • g(N) ∈ O(f(N)) iff there exist constants c and n_0 such that g(N) < c*f(N) for all N > n_0
[Figure: curves c*f(N) and g(N), with the crossover marked at x = n_0]

Rank orders of growth
• n^4 grows faster than n^2
  • n^4 ∉ O(n^2)
• 0.001n^4 is in the same growth class as 1E6n^4
  • 0.001n^4, 1E6n^4 ∈ O(n^4)
http://bit.ly/201-f17-0927-1

Reasoning about growth
• Consider a 3-tower
  1. How tall is a 5-tower?
  2. How tall is a 10-tower?
  3. How many blocks in a 5-tower?
  4. Which best captures the height of an n-tower?
http://bit.ly/201-f17-0927-2

Three-Sum
• Given N integers, find triples that sum to 0.
• Deeply related to problems in computational geometry.
    public class ThreeSum {
        // return number of distinct triples (i, j, k)
        // such that (a[i] + a[j] + a[k] == 0)
        public static int count(int[] a) {
            int N = a.length;
            int cnt = 0;
            // i < j < k, so each triple of positions is examined exactly once
            for (int i = 0; i < N; i++)
                for (int j = i+1; j < N; j++)
                    for (int k = j+1; k < N; k++)
                        if (a[i] + a[j] + a[k] == 0) cnt++;
            return cnt;
        }
    }

Empirical Analysis
• Empirical analysis. Run the program for various input sizes.

    N       time †
    512     0.03
    1024    0.26
    2048    2.16
    4096    17.18
    8192    136.76
    † Running Linux on Sun-Fire-X4100 with 16GB RAM

• How much time for N = 4096 on my machine?
• How much could I do in a minute, hour, day?

Empirical Analysis
• Data analysis. Plot running time vs. input size N.

Mathematical Analysis
• Count up frequency of execution of each instruction and weight by its execution time.
• How many times is each instruction executed?

    int count = 0;
    for (int i = 0; i < N; i++)
        if (a[i] == 0) count++;

    int count = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (a[i] + a[j] == 0) count++;

    int count = 0;
    for (int i = 0; i < N; i++)
        for (int j = i+1; j < N; j++)
            if (a[i] + a[j] == 0) count++;

Three Sum Analysis
• Mathematical analysis.
  • The running time is proportional to N^3.
  • Focus on instructions in "inner loop."

Order of Growth Classifications
• Observation. A small subset of mathematical functions suffice to describe running time of many fundamental algorithms.

    Order of growth    Representative code
    log2 N             while (N > 1) { N = N / 2; ... }
    N                  for (int i = 0; i < N; i++) ...
    N log2 N           public void g(int N) { if (N == 0) return; g(N/2); g(N/2); for (int i = 0; i < N; i++) ... }
    N^2                for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) ...
    2^N                public void f(int N) { if (N == 0) return; f(N-1); f(N-1); ... }

Big-Oh calculations from code
• Search for element in an array:
  • What is complexity of code (using O-notation)?
  • What if array doubles, what happens to time?
    for (int k = 0; k < a.length; k++)
        if (a[k].equals(target)) return true;
    return false;

• Complexity if we call N times on M-element vector?
• What about best case? Average case? Worst case?

IsomorphicWords
• Consider code from the solution to IsomorphicWords:

    int total = 0;
    for (int j = 0; j < words.length; j++) {
        for (int k = j+1; k < words.length; k++) {
            if (isomorphic(words[j], words[k])) {
                total += 1;
            }
        }
    }
    return total;

• What is the input size? What does the runtime depend on?
• What’s the big-Oh for the run-time?

Array vs. ArrayList
• Run the code ArrayVsArrayList.java
  • https://coursework.cs.duke.edu/201fall17/classwork/blob/master/src/ArrayVsArrayList.java
• Change the value of argument
• Submit your data here: http://bit.ly/201-f17-0927-3
  • Submit as many times as you want

Amortization: Expanding ArrayLists
• Expand capacity of list when add() called
• Calling add N times, doubling capacity as needed

    Item #             Resizing cost   Cumulative cost   Resizing cost per item   Capacity after add
    1                  0               0                 0                        1
    2                  2               2                 1                        2
    3-4                4               6                 1.5                      4
    5-8                8               14                1.75                     8
    ...
    2^m+1 - 2^(m+1)    2^(m+1)         2^(m+2) - 2       around 2                 2^(m+1)

• Big-Oh of adding n elements?
• What if we grow size by one each time?

Some helpful mathematics
• 1 + 2 + 3 + 4 + … + N
  • N(N+1)/2, exactly = N^2/2 + N/2 which is O(N^2) why?
• N + N + N + … + N (total of N times)
  • N*N = N^2 which is O(N^2)
• N + N + N + … + N + … + N + … + N (total of 3N times)
  • 3N*N = 3N^2 which is O(N^2)
• 1 + 2 + 4 + … + 2^N
  • 2^(N+1) - 1 = 2 x 2^N - 1 which is O(2^N)
• Impact of last statement on adding 2^N + 1 elements to a vector
  • 1 + 2 + … + 2^N + 2^(N+1) = 2^(N+2) - 1 = 4 x 2^N - 1 which is O(2^N)
  • resizing + copy = total (let x = 2^N)

Running times @ 10^9 instructions/sec

    N               O(log N)      O(N)      O(N log N)   O(N^2)
    10              3E-9          1E-8      3.3E-8       0.0000001
    100             7E-9          1E-7      6.64E-7      0.0001
    1,000           1E-8          1E-6      0.00001      0.001
    10,000          1.3E-8        0.00001   0.0001329    0.102
    100,000         1.7E-8        0.0001    0.001661     10.008
    1,000,000       0.00000002    0.001     0.0199       16.7 min
    1,000,000,000   0.00000003    1.002     65.8         3.18 centuries

Analysis: Empirical vs. Mathematical
• Empirical analysis.
  • Measure running times, plot, and fit curve.
  • Easy to perform experiments.
  • Model useful for predicting, but not for explaining.
• Mathematical analysis.
  • Analyze algorithm to estimate # ops as a function of input size.
  • May require advanced mathematics.
  • Model useful for predicting and explaining.
• Critical difference. Mathematical analysis is independent of a particular machine or compiler; applies to machines not yet built.
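
The link between the empirical and the mathematical view can be checked with a few lines of Java. This is a minimal sketch, not code from the course: the class name DoublingRatio and the helper exponent are hypothetical, and the timings are the ones listed in the Empirical Analysis table for ThreeSum. The assumption is that running time follows a*N^b, so doubling N multiplies the time by 2^b and log2 of the ratio of successive times estimates b.

```java
// Hypothetical helper class (not part of the course code): estimates the
// exponent b in time ~ a * N^b from timings taken at doubling input sizes.
class DoublingRatio {
    // Data copied from the Empirical Analysis table for ThreeSum.
    static final int[] SIZES = {512, 1024, 2048, 4096, 8192};
    static final double[] TIMES = {0.03, 0.26, 2.16, 17.18, 136.76};

    // If N doubles between the two measurements, t2 / t1 is about 2^b,
    // so b is about log2(t2 / t1).
    static double exponent(double t1, double t2) {
        return Math.log(t2 / t1) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int i = 1; i < SIZES.length; i++) {
            double ratio = TIMES[i] / TIMES[i - 1];
            double b = exponent(TIMES[i - 1], TIMES[i]);
            System.out.printf("N=%d ratio=%.2f estimated exponent=%.2f%n",
                    SIZES[i], ratio, b);
        }
    }
}
```

On this data the ratios come out close to 8 and the estimated exponent close to 3, which agrees with the mathematical claim that the ThreeSum running time is proportional to N^3.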
