San José State University Math 253: Mathematical Methods for Data Visualization
Matrix norm and low-rank approximation
Dr. Guangliang Chen

Outline
• Matrix norm
• Low-rank matrix approximation
• Applications
Introduction
Recall that a vector space is a collection of objects, called “vectors”, which are endowed with two kinds of operations, vector addition and scalar multiplication, subject to requirements such as associativity, commutativity, and distributivity.
Below are some examples of vector spaces:
• Euclidean spaces ($\mathbb{R}^n$)
• The collection of all matrices of a fixed size ($\mathbb{R}^{m\times n}$)
• The collection of all functions from $\mathbb{R}$ to $\mathbb{R}$
• The collection of all polynomials
• The collection of all infinite sequences
Vector norm
A norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the following three conditions:
• $\|v\| \ge 0$ for all $v \in V$, and $\|v\| = 0$ if and only if $v = 0$
• $\|kv\| = |k|\,\|v\|$ for any scalar $k \in \mathbb{R}$ and vector $v \in V$
• $\|v + w\| \le \|v\| + \|w\|$ for any two vectors $v, w \in V$
The norm of a vector can be thought of as the length or magnitude of the vector.
Example 0.1. Below are three different norms on the Euclidean space $\mathbb{R}^d$:
• 2-norm (or Euclidean norm):
$\|x\|_2 = \sqrt{\sum x_i^2} = \sqrt{x^T x}$
• 1-norm (taxicab norm or Manhattan norm):
$\|x\|_1 = \sum |x_i|$
• ∞-norm (maximum norm):
$\|x\|_\infty = \max_i |x_i|$
When the norm is left unspecified, it is understood to be the Euclidean 2-norm.
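These three norms can be checked numerically; below is a minimal MATLAB sketch (the vector x is an arbitrary illustration):

    x = [3; -4; 0];
    norm(x, 2)      % 2-norm: sqrt(9 + 16) = 5
    norm(x, 1)      % 1-norm: |3| + |-4| + |0| = 7
    norm(x, Inf)    % inf-norm: max(|3|, |-4|, |0|) = 4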
Remark. More generally, for any fixed $p \ge 1$, the $\ell_p$ norm on $\mathbb{R}^d$ is defined as
$\|x\|_p = \left(\sum |x_i|^p\right)^{1/p}, \quad \text{for all } x \in \mathbb{R}^d$
Remark. Any norm on $\mathbb{R}^d$ can be used as a metric to measure the distance between two vectors:
$\operatorname{dist}(x, y) = \|x - y\|, \quad \text{for all } x, y \in \mathbb{R}^d$
For example, the Euclidean distance between $x, y \in \mathbb{R}^d$ (corresponding to the Euclidean norm) is
$\operatorname{dist}_E(x, y) = \|x - y\|_2 = \sqrt{\sum (x_i - y_i)^2}$
Matrix norm
A matrix norm is a norm on $\mathbb{R}^{m\times n}$ as a vector space (consisting of all matrices of the fixed size).
More specifically, a matrix norm is a function $\|\cdot\| : \mathbb{R}^{m\times n} \to \mathbb{R}$ that satisfies the following three conditions:
• $\|A\| \ge 0$ for all $A \in \mathbb{R}^{m\times n}$, and $\|A\| = 0$ if and only if $A = O$
• $\|kA\| = |k| \cdot \|A\|$ for any scalar $k \in \mathbb{R}$ and matrix $A \in \mathbb{R}^{m\times n}$
• $\|A + B\| \le \|A\| + \|B\|$ for any two matrices $A, B \in \mathbb{R}^{m\times n}$
The Frobenius norm
Def 0.1. The Frobenius norm of a matrix $A \in \mathbb{R}^{n\times d}$ is defined as
$\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$
Remark. The Frobenius norm on $\mathbb{R}^{n\times d}$ is equivalent to the Euclidean 2-norm on the space of vectorized matrices (i.e., $\mathbb{R}^{nd}$):
$\|A\|_F = \|A(:)\|_2$
Example 0.2. Let
$X = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}.$
By direct calculation, $\|X\|_F = \sqrt{1 + 1 + 0 + 1 + 1 + 0} = 2$.
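A quick MATLAB check of this example, along with the vectorization remark above (a minimal sketch):

    X = [1 -1; 0 1; 1 0];
    norm(X, 'fro')     % Frobenius norm: 2
    norm(X(:), 2)      % same value: 2-norm of the vectorized matrix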
Proposition 0.1. For any matrix $A \in \mathbb{R}^{n\times d}$,
$\|A\|_F^2 = \operatorname{trace}(AA^T) = \operatorname{trace}(A^TA)$
Proof. We demonstrate the first identity for $2\times 2$ matrices $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$:
$AA^T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} a^2 + b^2 & * \\ * & c^2 + d^2 \end{pmatrix}$
so that $\operatorname{trace}(AA^T) = a^2 + b^2 + c^2 + d^2 = \|A\|_F^2$. (The full proof is a direct generalization of this case.)
Theorem 0.2. For any matrix $A \in \mathbb{R}^{n\times d}$ (with singular values $\{\sigma_i\}$),
$\|A\|_F = \sqrt{\sum \sigma_i^2}$
Proof. Let the full SVD of $A$ be $A = U\Sigma V^T$. Then
$AA^T = U\Sigma V^T \cdot V\Sigma^T U^T = U(\Sigma\Sigma^T)U^T.$
Applying the formula $\|A\|_F^2 = \operatorname{trace}(AA^T)$ gives that
$\|A\|_F^2 = \operatorname{trace}(U\Sigma\Sigma^T U^T) = \operatorname{trace}(\Sigma\Sigma^T U^T U) = \operatorname{trace}(\Sigma\Sigma^T) = \sum \sigma_i^2.$
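This identity, together with Proposition 0.1, is easy to verify numerically (a sketch; A is an arbitrary test matrix):

    A = randn(5, 3);
    s = svd(A);            % vector of singular values
    norm(A, 'fro')         % Frobenius norm
    sqrt(sum(s.^2))        % same value, via the singular values (Theorem 0.2)
    sqrt(trace(A'*A))      % same value, via the trace identity (Proposition 0.1)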
The operator norm
A second matrix norm is the operator norm, which is induced by a vector norm on Euclidean spaces.
Theorem 0.3. For any norm $\|\cdot\|$ on Euclidean spaces, the following is a norm on $\mathbb{R}^{m\times n}$:
$\|A\| \stackrel{\text{def}}{=} \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{u \in \mathbb{R}^n :\ \|u\| = 1} \|Au\|$
Proof. We need to verify the three conditions of a norm.
First, it is obvious that $\|A\| \ge 0$ for any $A \in \mathbb{R}^{m\times n}$. Suppose $\|A\| = 0$. Then for any $x \neq 0$, $\|Ax\| = 0$, or equivalently, $Ax = 0$. This implies that $A = O$. (The other direction is trivial.)
Second, for any $k \in \mathbb{R}$,
$\|kA\| = \max_{u \in \mathbb{R}^n :\ \|u\|=1} \|(kA)u\| = |k| \cdot \max_{u \in \mathbb{R}^n :\ \|u\|=1} \|Au\| = |k| \cdot \|A\|.$
Lastly, for any two matrices $A, B \in \mathbb{R}^{m\times n}$,
$\|A + B\| = \max_{\|u\|=1} \|(A + B)u\| = \max_{\|u\|=1} \|Au + Bu\| \le \max_{\|u\|=1} \left(\|Au\| + \|Bu\|\right) \le \max_{\|u\|=1} \|Au\| + \max_{\|u\|=1} \|Bu\| = \|A\| + \|B\|.$
Theorem 0.4. For any norm on Euclidean spaces and its induced matrix operator norm, we have
$\|Ax\| \le \|A\| \cdot \|x\| \quad \text{for all } A \in \mathbb{R}^{m\times n},\ x \in \mathbb{R}^n$
Proof. For all $x \neq 0$ in $\mathbb{R}^n$, by definition,
$\frac{\|Ax\|}{\|x\|} \le \|A\|$
This implies that $\|Ax\| \le \|A\| \cdot \|x\|$ (which also holds trivially when $x = 0$).
Remark. More generally, the matrix operator norm satisfies
$\|AB\| \le \|A\| \cdot \|B\|, \quad \text{for all } A \in \mathbb{R}^{m\times n},\ B \in \mathbb{R}^{n\times p}$
Matrix norms with this property are called sub-multiplicative matrix norms.
The proof of this result is left to you in the homework (you will need to apply the theorem on the preceding slide).
When the Euclidean norm (i.e., 2-norm) is used, the induced matrix operator norm is called the spectral norm.
Def 0.2. The spectral norm of a matrix $A \in \mathbb{R}^{m\times n}$ is defined as
$\|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{u \in \mathbb{R}^n :\ \|u\|_2 = 1} \|Au\|_2$
Theorem 0.5. The spectral norm of any matrix coincides with its largest singular value:
$\|A\|_2 = \sigma_1(A).$
Proof.
$\|A\|_2^2 = \max_{x \neq 0} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \max_{x \neq 0} \frac{x^T A^T A x}{x^T x} = \lambda_1(A^TA) = \sigma_1(A)^2.$
The maximizer is the largest right singular vector of $A$, i.e., $v_1$.
Example 0.3. For the matrix
$X = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix},$
we have $\|X\|_2 = \sqrt{3}$.
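Again this is easy to confirm in MATLAB (a sketch):

    X = [1 -1; 0 1; 1 0];
    norm(X, 2)       % spectral norm: sqrt(3), approximately 1.7321
    max(svd(X))      % same value: the largest singular value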
We note that the Frobenius and spectral norms of a matrix correspond to the 2- and ∞-norms of the vector of singular values $\sigma = (\sigma_1, \ldots, \sigma_r)$:
$\|A\|_F = \|\sigma\|_2, \qquad \|A\|_2 = \|\sigma\|_\infty$
The 1-norm of the singular value vector is called the nuclear norm of $A$, which is very useful in convex programming.
Def 0.3. The nuclear norm of a matrix $A \in \mathbb{R}^{n\times d}$ is defined as
$\|A\|_* = \|\sigma\|_1 = \sum \sigma_i.$
Example 0.4. In the last example, $\|X\|_* = \sqrt{3} + 1$.
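MATLAB's norm function has no built-in option for the nuclear norm, but it is one line via the SVD (a sketch, for the same X):

    X = [1 -1; 0 1; 1 0];
    sum(svd(X))      % nuclear norm: sqrt(3) + 1, approximately 2.7321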
MATLAB function for matrix/vector norm
norm – Matrix or vector norm.
norm(X,2) returns the 2-norm of X.
norm(X) is the same as norm(X,2).
norm(X,'fro') returns the Frobenius norm of X.
In addition, for vectors...
norm(V,P) returns the p-norm of V defined as SUM(ABS(V).^P)^(1/P).
norm(V,Inf) returns the largest element of ABS(V).
Condition number of a matrix
Briefly speaking, the condition number of a square, invertible matrix is a measure of its near-singularity.
Def 0.4. Let $A$ be any square, invertible matrix. For a given matrix operator norm $\|\cdot\|$, the condition number of $A$ is defined as
$\kappa(A) = \|A\|\,\|A^{-1}\|$
Remark. $\kappa(A) = \|A\|\,\|A^{-1}\| \ge \|AA^{-1}\| = \|I\| = \max_{x \neq 0} \frac{\|Ix\|}{\|x\|} = 1.$
Remark. Under the matrix spectral norm,
$\kappa(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_{\max}(A) \cdot \frac{1}{\sigma_{\min}(A)} = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$
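MATLAB's cond function computes exactly this ratio by default (a minimal sketch; the diagonal matrix below is an arbitrary illustration):

    A = [2 0; 0 0.01];
    cond(A)                   % spectral condition number: 2 / 0.01 = 200
    s = svd(A);
    s(1) / s(end)             % same value: sigma_max / sigma_min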
1st perspective (for understanding the matrix condition number): Let
$M = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \|A\|$
$m = \min_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \min_{y \neq 0} \frac{\|y\|}{\|A^{-1}y\|} = \frac{1}{\max_{y \neq 0} \frac{\|A^{-1}y\|}{\|y\|}} = \frac{1}{\|A^{-1}\|}$
which respectively represent how much the matrix $A$ can stretch or shrink vectors. Then
$\kappa(A) = \frac{M}{m}$
If $A$ is singular, then there exists a nonzero $x$ such that $Ax = 0$. Thus, $m = 0$ and the condition number is infinite.
In general, a finite but large condition number means that the matrix is close to being singular. In this case, we say that the matrix $A$ is ill-conditioned (for inversion).
2nd perspective: Consider solving a system of linear equations subject to measurement error:
$Ax = \underbrace{b + e}_{\text{error}}$ (where $A$ is assumed to be exact and $e$ is the error term)
We would like to see how the error term $e$ (along with $b$) affects the solution $x$ (via the matrix $A$).
Since $A$ is invertible, the linear system has the following solution:
$x = A^{-1}(b + e) = \underbrace{A^{-1}b}_{\text{true solution}} + \underbrace{A^{-1}e}_{\text{error in solution}}$
The ratio of the relative error in the solution to the relative error in $b$ is
$\frac{\|A^{-1}e\| \,/\, \|A^{-1}b\|}{\|e\| \,/\, \|b\|} = \frac{\|A^{-1}e\|}{\|e\|} \cdot \frac{\|b\|}{\|A^{-1}b\|}$
where $\|\cdot\|$ represents a given vector norm.
The maximum possible value of the above ratio (for nonzero $b, e$) is then
$\max_{b \neq 0,\ e \neq 0} \frac{\|A^{-1}e\|}{\|e\|} \cdot \frac{\|b\|}{\|A^{-1}b\|} = \max_{e \neq 0} \frac{\|A^{-1}e\|}{\|e\|} \cdot \max_{b \neq 0} \frac{\|b\|}{\|A^{-1}b\|} = \|A^{-1}\| \cdot \max_{y \neq 0} \frac{\|Ay\|}{\|y\|} = \|A^{-1}\| \cdot \|A\| = \kappa(A).$
Remark.
• If the condition number of $A$ is large (i.e., $A$ is ill-conditioned), then the relative error in the solution $x$ can be much larger than the relative error in $b$, and thus even a small error in $b$ may cause a large error in $x$.
• On the other hand, if this number is small, then the relative error in $x$ will not be much larger than the relative error in $b$.
• In the special case when the condition number is exactly one, the solution has the same relative error as the data $b$.
See a MATLAB demonstration by C. Moler.1
1https://blogs.mathworks.com/cleve/2017/07/17/ what-is-the-condition-number-of-a-matrix/
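A small experiment in the spirit of that demonstration, using the notoriously ill-conditioned Hilbert matrix (a sketch; the perturbation size is an arbitrary choice):

    A = hilb(8);                   % 8 x 8 Hilbert matrix; cond(A) is about 1.5e10
    x_true = ones(8, 1);
    b = A * x_true;
    e = 1e-10 * randn(8, 1);       % a tiny error in b
    x = A \ (b + e);
    norm(e) / norm(b)              % relative error in b: around 1e-10
    norm(x - x_true) / norm(x_true)  % relative error in x: typically many orders larger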
Low-rank approximation of matrices
Problem. For any matrix $A \in \mathbb{R}^{n\times d}$ and integer $k \ge 1$, find the rank-$k$ matrix $B$ that is closest to $A$ (under a given norm such as the Frobenius or spectral norm):
$\min_{B \in \mathbb{R}^{n\times d} :\ \operatorname{rank}(B) = k} \|A - B\|$
Remark. This problem arises in a number of tasks, e.g.,
• Orthogonal least squares fitting
• Data compression (and noise reduction)
• Recommender systems
Theorem 0.6 (Eckart–Young–Mirsky). Given $A \in \mathbb{R}^{n\times d}$ and $1 \le k \le \operatorname{rank}(A)$, let $A_k$ be the truncated SVD of $A$ with the largest $k$ terms: $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. Then $A_k$ is the best rank-$k$ approximation to $A$ in terms of both the Frobenius and spectral norms:²
$\min_{B :\ \operatorname{rank}(B) = k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2}$
$\min_{B :\ \operatorname{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$
Remark. The theorem still holds true if the equality constraint $\operatorname{rank}(B) = k$ is relaxed to $\operatorname{rank}(B) \le k$ (which will also include all the lower-rank matrices).
2Proof available at https://en.wikipedia.org/wiki/Low-rank_approximation
Example 0.5. For the matrix
$X = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix},$
the best rank-1 approximation is
$X_1 = \sigma_1 u_1 v_1^T = \sqrt{3} \begin{pmatrix} \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{pmatrix}.$
In this problem, the approximation error under either norm (spectral or Frobenius) is the same: $\|X - X_1\| = \sigma_2 = 1$.
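The truncated SVD is straightforward to form in MATLAB (a sketch, reproducing this example):

    X = [1 -1; 0 1; 1 0];
    [U, S, V] = svd(X);
    k = 1;
    Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % best rank-1 approximation X_1
    norm(X - Xk, 2)        % spectral error: sigma_2 = 1
    norm(X - Xk, 'fro')    % Frobenius error: also 1, since only sigma_2 remains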
Applications of low-rank approximation
• Orthogonal least-squares fitting
• Image compression
Orthogonal Best-Fit Subspace
Problem: Given data $x_1, \ldots, x_n \in \mathbb{R}^d$ and an integer $0 < k < d$, find the $k$-dimensional orthogonal "best-fit" plane by solving
$\min_S \sum_{i=1}^n \|x_i - P_S(x_i)\|_2^2$
(Figure: data points $x_i$ and their orthogonal projections $P_S(x_i)$ onto a plane $S$.)
Remark. This problem is different from ordinary linear regression:
• No predictor-response distinction
• Orthogonal (not vertical) fitting errors
Theorem 0.7. An orthogonal best-fit $k$-dimensional plane to the data $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n\times d}$ is given by
$x = \bar{x} + V_k \cdot \alpha$
where $\bar{x}$ is the center of the data set,
$\bar{x} = \frac{1}{n} \sum x_i$
and $V_k = [v_1 \cdots v_k]$ is a $d \times k$ matrix whose columns are the top $k$ right singular vectors of the centered data matrix
$\widetilde{X} = [x_1 - \bar{x}, \ldots, x_n - \bar{x}]^T = X - \mathbf{1}\bar{x}^T.$
Proof. Suppose an arbitrary $k$-dimensional plane $S$ is used to fit the data, with a fixed point $m \in \mathbb{R}^d$ and an orthonormal basis $B = [b_1, \ldots, b_k] \in \mathbb{R}^{d\times k}$. That is,
$B^TB = I_k, \qquad BB^T :\ \text{orthogonal projection onto } S$
The projection of each data point $x_i$ onto the candidate plane is
$P_S(x_i) = m + BB^T(x_i - m).$
Accordingly, we may rewrite the original problem as
$\min_{m \in \mathbb{R}^d,\ B \in \mathbb{R}^{d\times k} :\ B^TB = I_k} \sum_{i=1}^n \|x_i - m - BB^T(x_i - m)\|^2$
Using multivariable calculus, we can show that for any fixed $B$ an optimal $m$ is
$m^* = \frac{1}{n} \sum x_i \stackrel{\text{def}}{=} \bar{x}.$
Plugging in $\bar{x}$ for $m$ and letting $\tilde{x}_i = x_i - \bar{x}$ gives that
$\min_B \sum \|\tilde{x}_i - BB^T\tilde{x}_i\|^2.$
In matrix notation, this becomes
$\min_B \|\widetilde{X} - \widetilde{X}BB^T\|_F^2, \quad \text{where } \widetilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_n]^T \in \mathbb{R}^{n\times d}.$
Let the full SVD of the centered data matrix $\widetilde{X}$ be
$\widetilde{X} = U\Sigma V^T$
Denote by $\widetilde{X}_k$ the best rank-$k$ approximation of $\widetilde{X}$:
$\widetilde{X}_k = U_k\Sigma_kV_k^T.$
Then the minimum is attained when
$\widetilde{X}BB^T = \widetilde{X}_k,$
and a minimizer is the matrix consisting of the top $k$ right singular vectors of $\widetilde{X}$, i.e.,
$B = V_k \equiv V(:, 1{:}k).$
Verify: If $B = V_k$, then
$\widetilde{X}BB^T = \widetilde{X}V_kV_k^T = \widetilde{X}[v_1, \ldots, v_k]V_k^T = [\sigma_1 u_1, \ldots, \sigma_k u_k]V_k^T = [u_1, \ldots, u_k]\operatorname{diag}(\sigma_1, \ldots, \sigma_k)V_k^T = U_k\Sigma_kV_k^T = \widetilde{X}_k.$
Proof of $m^* = \bar{x}$:
First, rewrite the above objective function as
$g(m) = \sum_{i=1}^n \|x_i - m - BB^T(x_i - m)\|^2 = \sum_{i=1}^n \|(I - BB^T)(x_i - m)\|^2$
and apply the formula
$\frac{\partial}{\partial x}\|Ax\|^2 = 2A^TAx$
to find its gradient:
$\nabla g(m) = -\sum 2(I - BB^T)^T(I - BB^T)(x_i - m)$
Note that $I - BB^T$ is also an orthogonal projection matrix (onto the orthogonal complement). Thus,
$(I - BB^T)^T(I - BB^T) = (I - BB^T)^2 = I - BB^T.$
It follows that
$\nabla g(m) = -\sum 2(I - BB^T)(x_i - m) = -2(I - BB^T)\left(\sum x_i - nm\right)$
Any minimizer $m$ must satisfy
$2(I - BB^T)\left(\sum x_i - nm\right) = 0$
This equation has infinitely many solutions, but the simplest one is
$\sum x_i - nm = 0 \ \longrightarrow\ m = \frac{1}{n}\sum x_i.$
Example 0.6. Find the orthogonal best-fit line for a data set of three points $(1, 1)$, $(2, 3)$, $(3, 2)$.
Solution. First, the center of the data is $\bar{x} = \frac{1}{3}(1 + 2 + 3,\ 1 + 3 + 2) = (2, 2)$. Thus, the centered data matrix is
$\widetilde{X} = \begin{pmatrix} -1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \ \longrightarrow\ v_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}.$
Therefore, the orthogonal best-fit line is $x(t) = (2, 2) + t\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$, and the projections of the original data onto the best-fit line are
$\mathbf{1}\bar{x}^T + \widetilde{X}v_1v_1^T = \begin{pmatrix} 2 & 2 \\ 2 & 2 \\ 2 & 2 \end{pmatrix} + \begin{pmatrix} -1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \frac{5}{2} & \frac{5}{2} \\ \frac{5}{2} & \frac{5}{2} \end{pmatrix}$
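The same computation in MATLAB (a sketch for the three points above; the broadcasting with a row vector requires implicit expansion, i.e., R2016b or later):

    X = [1 1; 2 3; 3 2];               % rows are the data points
    xbar = mean(X, 1);                 % center: (2, 2)
    Xt = X - xbar;                     % centered data matrix
    [~, ~, V] = svd(Xt);
    v1 = V(:, 1);                      % direction of the best-fit line (up to sign)
    P = xbar + (Xt * v1) * v1'         % projections: (1,1), (5/2,5/2), (5/2,5/2)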
Demonstration on another data set
Application to image compression
Digital images are stored as matrices, so we can apply SVD to obtain their low-rank approximations (and display them as images):
$A_{m\times n} \approx U_k\Sigma_kV_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T.$
By storing $U_k, \Sigma_k, V_k$ instead of $A$, we can reduce the storage requirement from $mn$ to
$\underbrace{mk}_{\text{cost of } U_k} + \underbrace{k}_{\text{cost of } \Sigma_k} + \underbrace{nk}_{\text{cost of } V_k} = k(m + n + 1).$
This is an order of magnitude smaller when $k \ll \min(m, n)$.
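A compression sketch along these lines (the file name is a placeholder; 'cameraman.tif' ships with the Image Processing Toolbox, but any grayscale image file will do):

    A = double(imread('cameraman.tif'));        % grayscale image as a matrix
    [U, S, V] = svd(A);
    k = 20;
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % best rank-20 approximation
    imagesc(Ak); colormap gray; axis image      % display the compressed image
    numel(A)                                    % original storage: m*n
    k * (size(A,1) + size(A,2) + 1)             % compressed storage: k(m+n+1)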
Some practice problems
1. Show that for any $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times p}$,
$\|AB\|_F \le \|A\|_F\|B\|_F$
Solution. Using the Cauchy–Schwarz inequality
$(a^Tb)^2 = \left(\sum a_ib_i\right)^2 \le \left(\sum a_i^2\right)\left(\sum b_i^2\right) = \|a\|^2\|b\|^2$
we have
$\|AB\|_F^2 = \sum_i \sum_j \left(A(i,:)B(:,j)\right)^2 \le \sum_i \sum_j \|A(i,:)\|^2\|B(:,j)\|^2 = \left(\sum_i \|A(i,:)\|^2\right)\left(\sum_j \|B(:,j)\|^2\right) = \|A\|_F^2\|B\|_F^2.$
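A quick numerical sanity check of this inequality (a sketch with random matrices):

    A = randn(4, 6); B = randn(6, 3);
    norm(A*B, 'fro')                     % left-hand side
    norm(A, 'fro') * norm(B, 'fro')      % right-hand side: always at least as large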
2. Suppose $A = xy^T$ where $x, y$ are two vectors (in column form). Show that
$\|A\|_2 = \|x\|_2\|y\|_2$
Proof. First, the identity holds trivially if $y = 0$ (in which case $A = O$). Suppose $y \neq 0$. Treating $x, y^T$ as matrices, we have
$\|A\|_2 = \|xy^T\|_2 \le \|x\|_2\|y^T\|_2 = \|x\|_2\|y\|_2$
On the other hand, for the unit vector $v = \frac{y}{\|y\|_2}$:
$\|A\|_2 \ge \|Av\|_2 = \left\|xy^T\frac{y}{\|y\|_2}\right\|_2 = \frac{1}{\|y\|_2}\|x(y^Ty)\|_2 = \frac{\|y\|_2^2}{\|y\|_2}\|x\|_2 = \|x\|_2\|y\|_2$
Consequently, we must have $\|A\|_2 = \|x\|_2\|y\|_2$.
3. Let $A$ be a square, symmetric matrix, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Show that
$\|A\|_2 = \max(|\lambda_1|, |\lambda_n|) \quad \text{and} \quad \operatorname{Cond}(A) = \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}$
Proof. It suffices to show that the singular values of $A$ are given by $|\lambda_1|, \ldots, |\lambda_n|$. To see this, consider the orthogonal diagonalization $A = Q\Lambda Q^T$. It follows that
$AA^T = A^2 = Q\Lambda^2Q^T$
Therefore, the squared singular values of $A$ (which are eigenvalues of $AA^T$) coincide with the eigenvalues of $A^2$, i.e., $\sigma_1^2 = \lambda_1^2, \ldots, \sigma_n^2 = \lambda_n^2$ (not necessarily sorted). This gives that $\sigma_1 = |\lambda_1|, \ldots, \sigma_n = |\lambda_n|$ (still not sorted, but the largest singular value must be the larger of $|\lambda_1|$ and $|\lambda_n|$).
4. Let $u, v \in \mathbb{R}^n$ be two unit vectors with angle $\theta$. Show that
$\|uu^T - vv^T\|_F^2 = 2\sin^2\theta$
Solution. Using $\|A\|_F^2 = \operatorname{trace}(AA^T)$ and observing that $uu^T, vv^T$ are both projection matrices (thus symmetric), we get
$\|uu^T - vv^T\|_F^2 = \operatorname{trace}\left((uu^T - vv^T)(uu^T - vv^T)\right)$
$= \operatorname{trace}\left(uu^Tuu^T - uu^Tvv^T - vv^Tuu^T + vv^Tvv^T\right)$
$= \operatorname{trace}(uu^T) - \operatorname{trace}(uu^Tvv^T) - \operatorname{trace}(vv^Tuu^T) + \operatorname{trace}(vv^T)$
$= \operatorname{trace}(u^Tu) - \operatorname{trace}(u^Tvv^Tu) - \operatorname{trace}(v^Tuu^Tv) + \operatorname{trace}(v^Tv)$
$= 1 - (u^Tv)^2 - (v^Tu)^2 + 1 = 2 - 2(u^Tv)^2 = 2 - 2\cos^2\theta = 2\sin^2\theta.$
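Numerically (a sketch; the angle below is arbitrary):

    theta = 0.7;
    u = [1; 0];
    v = [cos(theta); sin(theta)];     % unit vectors with angle theta between them
    norm(u*u' - v*v', 'fro')^2        % left-hand side
    2 * sin(theta)^2                  % right-hand side: same value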
5. Let $A \in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ (and corresponding eigenvectors $v_1, v_2, \ldots, v_n$). Solve the following problem:
$\max_{x \in \mathbb{R}^n :\ x \neq 0,\ v_1^Tx = 0} \frac{x^TAx}{x^Tx}$
Solution. Write
$A = \sum_{i=1}^n \lambda_i v_i v_i^T$
and let
$A_2 = A - \lambda_1 v_1 v_1^T = \sum_{i=2}^n \lambda_i v_i v_i^T$
For any $x \neq 0$ in $\mathbb{R}^n$ satisfying $v_1^Tx = 0$, we have
$x^TAx = x^T(A - \lambda_1 v_1 v_1^T)x = x^TA_2x$
Accordingly, we can rewrite the original problem as
$\max_{x \in \operatorname{span}(v_1)^\perp :\ x \neq 0} \frac{x^TA_2x}{x^Tx}$
This is a regular Rayleigh quotient problem with a reduced domain (the orthogonal complement of $\operatorname{span}(v_1)$ in $\mathbb{R}^n$), and the maximum is the largest eigenvalue of $A_2$ over that domain, which is $\lambda_2$, attained at $x = v_2$.
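A numerical check of the attained maximum (a sketch; A is a random symmetric matrix, and the eigenvalues are sorted to match the convention above):

    M = randn(5); A = (M + M') / 2;         % random symmetric matrix
    [V, D] = eig(A);
    [lam, idx] = sort(diag(D), 'descend');  % lambda_1 >= ... >= lambda_n
    V = V(:, idx);
    v2 = V(:, 2);                           % satisfies v1' * v2 = 0
    (v2' * A * v2) / (v2' * v2)             % equals lam(2), the constrained maximum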