
ECS289: Scalable Machine Learning

Big-O Notations


Cho-Jui Hsieh, UC Davis

Oct 20, 2015

Outline

Time complexity and Big-O notations
Time complexity for basic linear algebra operations

Time Complexity


From Wikipedia: The time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input. The time complexity of an algorithm is commonly expressed using big-O notation, which excludes coefficients and lower-order terms.

Although time complexity is a good indication of efficiency, in practical numerical computation the constants are sometimes important. For example, the time for running 1 billion operations:

“exp”: 30.19 secs
“*”: 1.84 secs
“/”: 7.31 secs
“+”: 1.77 secs

In this course we will ignore these constants.
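The differing per-operation constants can be observed with a small timing sketch (illustrative only, not the lecture's benchmark; it uses 1 million operations instead of 1 billion so it finishes quickly, and the numbers depend on hardware and interpreter overhead):

```python
import math
import time

def time_op(op, n=1_000_000):
    """Time n applications of op on a fixed float operand."""
    x = 1.0001
    start = time.perf_counter()
    for _ in range(n):
        op(x)
    return time.perf_counter() - start

t_add = time_op(lambda v: v + v)   # "+"
t_mul = time_op(lambda v: v * v)   # "*"
t_exp = time_op(math.exp)          # "exp"
print(f"+: {t_add:.3f}s  *: {t_mul:.3f}s  exp: {t_exp:.3f}s")
```

All three are O(1) per operation, yet their measured costs differ by constant factors, which is exactly what big-O notation ignores.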

Big-O

Definition of O(·): let f and g be two functions. We write

f(x) = O(g(x)) as x → ∞

if and only if there exist a positive constant M and a threshold x0 such that

|f(x)| ≤ M|g(x)| for all x ≥ x0.

In short, f(x) = O(g(x)) means f is upper bounded by g up to a constant factor.

How to show that the time complexity is O(g(x))? Show that there exists an implementation of the algorithm that requires at most Cg(x) operations for some constant C.
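As a concrete check of the definition (an illustrative example, not from the slides): take f(x) = 3x^2 + 5x and g(x) = x^2. For all x ≥ 5 we have 5x ≤ x · x, so

3x^2 + 5x ≤ 3x^2 + x^2 = 4x^2,

and therefore f(x) = O(x^2) with M = 4 and x0 = 5.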

Big-Omega

Definition of Ω(·): let f and g be two functions. We write

f(x) = Ω(g(x)) as x → ∞

if and only if there exist a positive constant m and a threshold x0 such that

|f(x)| ≥ m|g(x)| for all x ≥ x0.

In short, f(x) = Ω(g(x)) means f is lower bounded by g up to a constant factor.

How to show that the time complexity is Ω(g(x))? Prove that any implementation requires at least Cg(x) operations for some constant C.

Big-Theta

Definition of Θ(·): let f and g be two functions. We write

f(x) = Θ(g(x)) as x → ∞

if and only if there exist positive constants m, M and a threshold x0 such that

M|g(x)| ≥ |f(x)| ≥ m|g(x)| for all x ≥ x0.

In short, f(x) = Θ(g(x)) means f has the same order as g up to constant factors.

How to show that the time complexity is Θ(g(x))? Show both the Big-O and the Big-Omega bounds.

Count number of operations

Count the total number of operations (+, −, ∗, /, exp, log, if, ...). We only need to count the “order” of the number of operations, and then use big-O notation.

Dense vector and sparse vector

If x, y ∈ R^m are both dense: x + y, x − y, x^T y take O(m) operations.
If x, y ∈ R^m, x is dense and y is sparse: x + y, x − y, x^T y take O(nnz(y)) operations.
If x, y ∈ R^m are both sparse: x + y, x − y, x^T y take O(nnz(x) + nnz(y)) operations.

Dense Matrix vs Sparse Matrix
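The O(nnz(y)) bound for a dense-sparse inner product can be sketched as follows (a minimal illustration, not the lecture's code, storing the sparse vector as index → value pairs):

```python
# Inner product of a dense vector with a sparse vector stored as
# {index: value} pairs; only the nnz(y) stored entries are touched,
# so the cost is O(nnz(y)) regardless of the dimension m.
def sparse_dot(x_dense, y_sparse):
    return sum(x_dense[i] * v for i, v in y_sparse.items())

x = [1.0, 2.0, 0.0, 4.0]   # dense, m = 4
y = {0: 3.0, 3: 0.5}       # sparse, nnz(y) = 2
print(sparse_dot(x, y))    # 1.0*3.0 + 4.0*0.5 = 5.0
```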

Any matrix X ∈ R^{m×n} can be dense or sparse.
Dense matrix: most entries in X are nonzero (mn space).
Sparse matrix: only a few entries in X are nonzero (O(nnz) space).

Dense Matrix Operations

Let A, B ∈ R^{m×n} and s ∈ R: A + B, sA, and A^T take O(mn) operations.
Let A ∈ R^{m×n} and b ∈ R^{n×1}: Ab takes O(mn) operations.

Matrix-matrix multiplication: let A ∈ R^{m×k} and B ∈ R^{k×n}. What is the time complexity of computing AB?

Assume A, B ∈ R^{n×n}. What is the time complexity of computing AB?
Naive implementation: O(n^3).
Theoretical best: O(n^{2.xxx}) (but slower than the naive implementation in practice).
Best way in practice: use BLAS (Basic Linear Algebra Subprograms).
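The naive O(n^3) algorithm is three nested loops, one multiply-add per (i, j, p) triple (a minimal sketch, not the lecture's code):

```python
# Naive dense matrix product: O(n*m*k) multiply-adds, i.e. O(n^3)
# for square n x n matrices.
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19.0, 22.0], [43.0, 50.0]]
```

In practice one would call a BLAS-backed routine (e.g. `A @ B` on NumPy arrays) rather than loop in pure Python: the asymptotic count is the same O(mnk), but the constants differ enormously.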

BLAS matrix product: O(mnk) operations for computing AB, where A ∈ R^{m×k} and B ∈ R^{k×n}.
BLAS computes the matrix product block by block to minimize the cache miss rate.
BLAS can be called from C and Fortran, and can be used from MATLAB, R, Python, ...

Sparse Matrix Operations

Widely-used formats: Compressed Sparse Column (CSC), Compressed Sparse Row (CSR), ...
CSR uses three arrays to store an m × n matrix with nnz nonzeros:
1. val (nnz real numbers): the values of the nonzero elements
2. col_ind (nnz integers): the column indices corresponding to the values
3. row_ptr (m + 1 integers): the indices into val where each row starts
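The three CSR arrays, and the O(nnz) matrix-vector product they enable, can be sketched as follows (a minimal illustration with plain Python lists; a library such as scipy.sparse would normally be used):

```python
# CSR storage of the 2x3 matrix [[10, 0, 20],
#                                [ 0, 30, 0]]
val     = [10.0, 20.0, 30.0]   # nonzero values, row by row
col_ind = [0, 2, 1]            # column index of each stored value
row_ptr = [0, 2, 3]            # row i occupies val[row_ptr[i]:row_ptr[i+1]]

def csr_matvec(val, col_ind, row_ptr, x):
    """Compute y = Ax for a CSR matrix A; touches each nonzero once, O(nnz)."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[p] * x[col_ind[p]]
    return y

print(csr_matvec(val, col_ind, row_ptr, [1.0, 1.0, 1.0]))  # [30.0, 30.0]
```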

If A ∈ R^{m×n} (sparse), B ∈ R^{m×n} (sparse or dense), s ∈ R: A + B, sA, and A^T take O(nnz) operations.
If A ∈ R^{m×n} (sparse), b ∈ R^{n×1}: Ab takes O(nnz) operations.
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (dense): AB takes O(nnz(A) · n) operations (use sparse BLAS).
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (sparse): AB takes O(nnz(A) · nnz(B)/k) operations on average, and O(nnz(A) · n) in the worst case. The resulting matrix will be much denser.