
ECS289: Scalable Machine Learning

Big-O Notations


Cho-Jui Hsieh, UC Davis

Oct 20, 2015

Outline

Time complexity and Big-O notations
Time complexity for basic linear algebra operations

Time Complexity


From Wikipedia: The time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input. The time complexity of an algorithm is commonly expressed using big-O notation, which excludes coefficients and lower-order terms.

Although time complexity is a good indication of efficiency, in practical numerical computation the constants are sometimes important. For example, the time for running 1 billion operations:

“exp”: 30.19 secs
“*”: 1.84 secs
“/”: 7.31 secs
“+”: 1.77 secs

In this course we will ignore these constants.
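The differing per-operation constants can be observed with a small timing sketch (illustrative only, not the lecture's benchmark; it uses 1 million operations instead of 1 billion so it finishes quickly, and the numbers depend on hardware and interpreter overhead):

```python
import math
import time

def time_op(op, n=1_000_000):
    """Time n applications of op on a fixed float operand."""
    x = 1.0001
    start = time.perf_counter()
    for _ in range(n):
        op(x)
    return time.perf_counter() - start

t_add = time_op(lambda v: v + v)   # "+"
t_mul = time_op(lambda v: v * v)   # "*"
t_exp = time_op(math.exp)          # "exp"
print(f"+: {t_add:.3f}s  *: {t_mul:.3f}s  exp: {t_exp:.3f}s")
```

All three are O(1) per operation, yet their measured costs differ by constant factors, which is exactly what big-O notation ignores.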

Big-O

Definition of O(·): let f and g be two functions. We write

f(x) = O(g(x)) as x → ∞

if and only if there exist a positive constant M and a threshold x0 such that

|f(x)| ≤ M|g(x)| for all x ≥ x0.

In short, f(x) = O(g(x)) means f is upper bounded by g up to a constant factor.

How to show that the time complexity is O(g(x))? Show that there exists an implementation of the algorithm that requires at most Cg(x) operations for some constant C.
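As a concrete check of the definition (an illustrative example, not from the slides): take f(x) = 3x^2 + 5x and g(x) = x^2. For all x ≥ 5 we have 5x ≤ x · x, so

3x^2 + 5x ≤ 3x^2 + x^2 = 4x^2,

and therefore f(x) = O(x^2) with M = 4 and x0 = 5.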

Big-Omega

Definition of Ω(·): let f and g be two functions. We write

f(x) = Ω(g(x)) as x → ∞

if and only if there exist a positive constant m and a threshold x0 such that

|f(x)| ≥ m|g(x)| for all x ≥ x0.

In short, f(x) = Ω(g(x)) means f is lower bounded by g up to a constant factor.

How to show that the time complexity is Ω(g(x))? Prove that any implementation requires at least Cg(x) operations for some constant C.

Big-Theta

Definition of Θ(·): let f and g be two functions. We write

f(x) = Θ(g(x)) as x → ∞

if and only if there exist positive constants m, M and a threshold x0 such that

M|g(x)| ≥ |f(x)| ≥ m|g(x)| for all x ≥ x0.

In short, f(x) = Θ(g(x)) means f has the same order as g up to constant factors.

How to show that the time complexity is Θ(g(x))? Show both the Big-O and the Big-Omega bounds.

Count number of operations

Count the total number of operations (+, −, ∗, /, exp, log, if, ...). We only need to count the “order” of the number of operations, and then use big-O notation.

Dense vector and sparse vector

If x, y ∈ R^m are both dense: x + y, x − y, x^T y take O(m) operations.
If x, y ∈ R^m, x is dense and y is sparse: x + y, x − y, x^T y take O(nnz(y)) operations.
If x, y ∈ R^m are both sparse: x + y, x − y, x^T y take O(nnz(x) + nnz(y)) operations.

Dense Matrix vs Sparse Matrix
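The O(nnz(y)) bound for a dense-sparse inner product can be sketched as follows (a minimal illustration, not the lecture's code, storing the sparse vector as index → value pairs):

```python
# Inner product of a dense vector with a sparse vector stored as
# {index: value} pairs; only the nnz(y) stored entries are touched,
# so the cost is O(nnz(y)) regardless of the dimension m.
def sparse_dot(x_dense, y_sparse):
    return sum(x_dense[i] * v for i, v in y_sparse.items())

x = [1.0, 2.0, 0.0, 4.0]   # dense, m = 4
y = {0: 3.0, 3: 0.5}       # sparse, nnz(y) = 2
print(sparse_dot(x, y))    # 1.0*3.0 + 4.0*0.5 = 5.0
```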

Any matrix X ∈ R^{m×n} can be dense or sparse.
Dense matrix: most entries in X are nonzero (mn space).
Sparse matrix: only a few entries in X are nonzero (O(nnz) space).

Dense Matrix Operations

Let A, B ∈ R^{m×n} and s ∈ R: A + B, sA, and A^T take O(mn) operations.
Let A ∈ R^{m×n} and b ∈ R^{n×1}: Ab takes O(mn) operations.

Matrix-matrix multiplication: let A ∈ R^{m×k} and B ∈ R^{k×n}. What is the time complexity of computing AB?

Assume A, B ∈ R^{n×n}. What is the time complexity of computing AB?
Naive implementation: O(n^3).
Theoretical best: O(n^{2.xxx}) (but slower than the naive implementation in practice).
Best way in practice: use BLAS (Basic Linear Algebra Subprograms).
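The naive O(n^3) algorithm is three nested loops, one multiply-add per (i, j, p) triple (a minimal sketch, not the lecture's code):

```python
# Naive dense matrix product: O(n*m*k) multiply-adds, i.e. O(n^3)
# for square n x n matrices.
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19.0, 22.0], [43.0, 50.0]]
```

In practice one would call a BLAS-backed routine (e.g. `A @ B` on NumPy arrays) rather than loop in pure Python: the asymptotic count is the same O(mnk), but the constants differ enormously.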

BLAS matrix product: O(mnk) operations for computing AB, where A ∈ R^{m×k} and B ∈ R^{k×n}.
BLAS computes the matrix product block by block to minimize the cache miss rate.
BLAS can be called from C and Fortran, and can be used from MATLAB, R, Python, ...

Sparse Matrix Operations

Widely-used formats: Compressed Sparse Column (CSC), Compressed Sparse Row (CSR), ...
CSR uses three arrays to store an m × n matrix with nnz nonzeros:
1. val (nnz real numbers): the values of the nonzero elements
2. col_ind (nnz integers): the column indices corresponding to the values
3. row_ptr (m + 1 integers): the indices into val where each row starts
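The three CSR arrays, and the O(nnz) matrix-vector product they enable, can be sketched as follows (a minimal illustration with plain Python lists; a library such as scipy.sparse would normally be used):

```python
# CSR storage of the 2x3 matrix [[10, 0, 20],
#                                [ 0, 30, 0]]
val     = [10.0, 20.0, 30.0]   # nonzero values, row by row
col_ind = [0, 2, 1]            # column index of each stored value
row_ptr = [0, 2, 3]            # row i occupies val[row_ptr[i]:row_ptr[i+1]]

def csr_matvec(val, col_ind, row_ptr, x):
    """Compute y = Ax for a CSR matrix A; touches each nonzero once, O(nnz)."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[p] * x[col_ind[p]]
    return y

print(csr_matvec(val, col_ind, row_ptr, [1.0, 1.0, 1.0]))  # [30.0, 30.0]
```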

If A ∈ R^{m×n} (sparse), B ∈ R^{m×n} (sparse or dense), s ∈ R: A + B, sA, and A^T take O(nnz) operations.
If A ∈ R^{m×n} (sparse), b ∈ R^{n×1}: Ab takes O(nnz) operations.
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (dense): AB takes O(nnz(A) · n) operations (use sparse BLAS).
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (sparse): AB takes O(nnz(A) · nnz(B)/k) operations on average, and O(nnz(A) · n) in the worst case. The resulting matrix will be much denser.