4. Singular Value Decomposition
L. Vandenberghe ECE133B (Spring 2020)

Outline

• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

Singular value decomposition 4.1

Singular value decomposition (SVD)

every m × n matrix A can be factored as

    A = UΣV^T

• U is m × m and orthogonal
• V is n × n and orthogonal
• Σ is m × n and "diagonal": diagonal with diagonal elements σ1, ..., σn if m = n,

\[
\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & \sigma_n\\
0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0
\end{bmatrix} \text{ if } m > n, \qquad
\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & \sigma_m & 0 & \cdots & 0
\end{bmatrix} \text{ if } m < n
\]

• the diagonal entries of Σ are nonnegative and sorted:

    σ1 ≥ σ2 ≥ ··· ≥ σmin{m,n} ≥ 0

Singular value decomposition 4.2

Singular values and singular vectors

    A = UΣV^T

• the numbers σ1, ..., σmin{m,n} are the singular values of A
• the m columns ui of U are the left singular vectors
• the n columns vi of V are the right singular vectors

if we write the factorization A = UΣV^T as

    AV = UΣ,    A^T U = VΣ^T

and compare the ith columns on the left- and right-hand sides, we see that

    Avi = σi ui   and   A^T ui = σi vi   for i = 1, ..., min{m,n}

• if m > n, the additional m − n vectors ui satisfy A^T ui = 0 for i = n+1, ..., m
• if n > m, the additional n − m vectors vi satisfy Avi = 0 for i = m+1, ..., n

Singular value decomposition 4.3

Reduced SVD

often m ≫ n or n ≫ m, which makes one of the orthogonal matrices very large

Tall matrix: if m > n, the last m − n columns of U can be omitted to define

    A = UΣV^T

• U is m × n with orthonormal columns
• V is n × n and orthogonal
• Σ is n × n and diagonal with diagonal entries σ1 ≥ σ2 ≥ ··· ≥ σn ≥ 0

Wide matrix: if m < n, the last n − m columns of V can be omitted to define

    A = UΣV^T

• U is m × m and orthogonal
• V is n × m with orthonormal columns
• Σ is m × m and diagonal with diagonal entries σ1 ≥ σ2 ≥ ··· ≥ σm ≥ 0

we refer to these as reduced SVDs (and to the factorization on page 4.2 as a full SVD)

Singular value decomposition 4.4
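A quick numerical check of the shapes above (an added illustration; the slides themselves contain no code): NumPy's numpy.linalg.svd returns a full SVD by default and a reduced one with full_matrices=False.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3                        # tall matrix, m > n
    A = rng.standard_normal((m, n))

    # full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values
    U, s, Vt = np.linalg.svd(A)
    Sigma = np.zeros((m, n))           # embed the singular values in an m x n "diagonal"
    Sigma[:n, :n] = np.diag(s)
    assert np.allclose(A, U @ Sigma @ Vt)

    # reduced SVD: the last m - n columns of U are omitted
    Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
    assert Ur.shape == (m, n)
    assert np.allclose(A, Ur @ np.diag(sr) @ Vtr)

    # singular values are nonnegative and sorted in non-increasing order
    assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])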
Related eigendecompositions

Eigendecomposition of Gram matrix

suppose A is an m × n matrix with full SVD A = UΣV^T; the SVD is related to the eigendecomposition of the Gram matrix A^T A:

    A^T A = VΣ^TΣV^T

• V is an orthogonal n × n matrix
• Σ^TΣ is a diagonal n × n matrix with (non-increasing) diagonal elements

\[
\sigma_1^2,\ \sigma_2^2,\ \ldots,\ \sigma_{\min\{m,n\}}^2,\ \underbrace{0,\ 0,\ \ldots,\ 0}_{n-\min\{m,n\}\ \text{times}}
\]

• the n diagonal elements of Σ^TΣ are the eigenvalues of A^T A
• the right singular vectors (columns of V) are corresponding eigenvectors

Singular value decomposition 4.5

Gram matrix of transpose

the SVD also gives the eigendecomposition of AA^T:

    AA^T = UΣΣ^TU^T

• U is an orthogonal m × m matrix
• ΣΣ^T is a diagonal m × m matrix with (non-increasing) diagonal elements

\[
\sigma_1^2,\ \sigma_2^2,\ \ldots,\ \sigma_{\min\{m,n\}}^2,\ \underbrace{0,\ 0,\ \ldots,\ 0}_{m-\min\{m,n\}\ \text{times}}
\]

• the m diagonal elements of ΣΣ^T are the eigenvalues of AA^T
• the left singular vectors (columns of U) are corresponding eigenvectors

in particular, the first min{m,n} eigenvalues of A^T A and AA^T are the same:

    σ1², σ2², ..., σmin{m,n}²

Singular value decomposition 4.6

Example

scatter plot shows m = 500 points from the normal distribution on page 3.34:

\[
\mu = \begin{bmatrix} 5\\ 4 \end{bmatrix}, \qquad
\Sigma_{\mathrm{ex}} = \frac{1}{4}\begin{bmatrix} 7 & \sqrt{3}\\ \sqrt{3} & 5 \end{bmatrix}
\]

• we define an m × 2 data matrix X with the m vectors as its rows
• the centered data matrix is Xc = X − (1/m)11^T X

sample estimate of the mean is

\[
\hat{\mu} = \frac{1}{m} X^T \mathbf{1} = \begin{bmatrix} 5.01\\ 3.93 \end{bmatrix}
\]

sample estimate of the covariance is

\[
\hat{\Sigma} = \frac{1}{m} X_c^T X_c = \begin{bmatrix} 1.67 & 0.48\\ 0.48 & 1.35 \end{bmatrix}
\]

(figure: scatter plot of the points in the (x1, x2)-plane)

Singular value decomposition 4.7

Example

    A = (1/√m) Xc

• eigenvectors of Σ̂ are right singular vectors v1, v2 of A (and of Xc)
• eigenvalues of Σ̂ are squares of the singular values of A

(figure: the same scatter plot with the directions v1 and v2 indicated)

Singular value decomposition 4.8

Existence of singular value decomposition

the Gram matrix connection gives a proof that every matrix has an SVD

• assume A is m × n with m ≥ n and rank r
• the n × n matrix A^T A has rank r (page 2.5) and an eigendecomposition

    A^T A = VΛV^T    (1)

  where Λ is diagonal with diagonal elements λ1 ≥ ··· ≥ λr > 0 = λr+1 = ··· = λn
• define σi = √λi for i = 1, ..., n, and an m × n matrix

\[
U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}
  = \begin{bmatrix} \dfrac{1}{\sigma_1}Av_1 & \dfrac{1}{\sigma_2}Av_2 & \cdots & \dfrac{1}{\sigma_r}Av_r & u_{r+1} & \cdots & u_n \end{bmatrix}
\]

  where ur+1, ..., un are any orthonormal vectors in null(A^T)
• (1) and the choice of ur+1, ..., un imply that U has orthonormal columns
• (1) also implies that Avi = 0 for i = r+1, ..., n
• together with the definition of u1, ..., ur, this shows that AV = UΣ with Σ = diag(σ1, ..., σn), a reduced SVD of A; a full SVD follows by extending U to an m × m orthogonal matrix

Singular value decomposition 4.9

Non-uniqueness of singular value decomposition

the derivation from the eigendecomposition A^T A = VΛV^T shows that the singular value decomposition of A is almost unique

Singular values

• the singular values of A are uniquely defined
• we have also shown that A and A^T have the same singular values

Singular vectors (assuming m ≥ n): see the discussion on page 3.14

• right singular vectors vi with the same positive singular value span a subspace
• in this subspace, any other orthonormal basis can be chosen
• the first r = rank(A) left singular vectors then follow from σi ui = Avi
• the remaining vectors vr+1, ..., vn can be any orthonormal basis for null(A)
• the remaining vectors ur+1, ..., um can be any orthonormal basis for null(A^T)

Singular value decomposition 4.10
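To make the Gram-matrix connection and the example concrete, here is a small sketch (an added illustration; the random draw only approximates the sample statistics quoted above): the eigendecomposition of Σ̂ = (1/m) Xc^T Xc recovers the right singular vectors and the squared singular values of A = (1/√m) Xc.

    import numpy as np

    rng = np.random.default_rng(1)
    m = 500
    mu = np.array([5.0, 4.0])
    Sigma_ex = np.array([[7.0, np.sqrt(3.0)], [np.sqrt(3.0), 5.0]]) / 4

    X = rng.multivariate_normal(mu, Sigma_ex, size=m)  # m x 2 data matrix
    Xc = X - X.mean(axis=0)                            # centered data matrix
    A = Xc / np.sqrt(m)

    # eigendecomposition of the sample covariance; eigh sorts eigenvalues in
    # increasing order, so flip to the non-increasing order used for the SVD
    Sigma_hat = Xc.T @ Xc / m
    lam, V_eig = np.linalg.eigh(Sigma_hat)
    lam, V_eig = lam[::-1], V_eig[:, ::-1]

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    assert np.allclose(lam, s**2)                      # eigenvalues = squared singular values
    assert np.allclose(np.abs(Vt), np.abs(V_eig.T))    # eigenvectors match up to sign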
Exercise

suppose A is an m × n matrix with m ≥ n, and define

\[
B = \begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix}
\]

1. suppose A = UΣV^T is a full SVD of A; verify that

\[
B = \begin{bmatrix} U & 0\\ 0 & V \end{bmatrix}
    \begin{bmatrix} 0 & \Sigma\\ \Sigma^T & 0 \end{bmatrix}
    \begin{bmatrix} U & 0\\ 0 & V \end{bmatrix}^T
\]

2. derive from this an eigendecomposition of B

   Hint: if Σ1 is square, then

\[
\begin{bmatrix} 0 & \Sigma_1\\ \Sigma_1 & 0 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} I & I\\ I & -I \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0\\ 0 & -\Sigma_1 \end{bmatrix}
\begin{bmatrix} I & I\\ I & -I \end{bmatrix}
\]

3. what are the m + n eigenvalues of B?

Singular value decomposition 4.11

Matrix properties from singular value decomposition

Rank

the number of positive singular values is the rank of a matrix

• suppose there are r positive singular values:

    σ1 ≥ ··· ≥ σr > 0 = σr+1 = ··· = σmin{m,n}

• partition the matrices in a full SVD of A as

\[
A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
    \begin{bmatrix} \Sigma_1 & 0\\ 0 & 0 \end{bmatrix}
    \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T
  = U_1 \Sigma_1 V_1^T \qquad (2)
\]

  where Σ1 is r × r with the positive singular values σ1, ..., σr on the diagonal
• since U1 and V1 have orthonormal columns, the factorization (2) proves that rank(A) = r (see page 1.12)

Singular value decomposition 4.12

Inverse and pseudo-inverse

we use the same notation as on the previous page:

\[
A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
    \begin{bmatrix} \Sigma_1 & 0\\ 0 & 0 \end{bmatrix}
    \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T
  = U_1 \Sigma_1 V_1^T
\]

the diagonal entries of Σ1 are the positive singular values of A

• the pseudo-inverse follows from page 1.36:

\[
A^\dagger = V_1 \Sigma_1^{-1} U_1^T
 = \begin{bmatrix} V_1 & V_2 \end{bmatrix}
   \begin{bmatrix} \Sigma_1^{-1} & 0\\ 0 & 0 \end{bmatrix}
   \begin{bmatrix} U_1 & U_2 \end{bmatrix}^T
 = V \Sigma^\dagger U^T
\]

• if A is square and nonsingular, this reduces to the inverse

    A^{-1} = (UΣV^T)^{-1} = VΣ^{-1}U^T

Singular value decomposition 4.13

Four subspaces

we continue with the same notation for the SVD of an m × n matrix A with rank r:

\[
A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
    \begin{bmatrix} \Sigma_1 & 0\\ 0 & 0 \end{bmatrix}
    \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T
\]

the diagonal entries of Σ1 are the positive singular values of A

the SVD provides orthonormal bases for the four subspaces associated with A:

• the columns of the m × r matrix U1 are a basis of range(A)
• the columns of the m × (m − r) matrix U2 are a basis of range(A)^⊥ = null(A^T)
• the columns of the n × r matrix V1 are a basis of range(A^T)
• the columns of the n × (n − r) matrix V2 are a basis of null(A)

Singular value decomposition 4.14

Image of unit ball

define Ey as the image of the unit ball under the linear mapping y = Ax:

    Ey = {Ax | ‖x‖ ≤ 1} = {UΣV^T x | ‖x‖ ≤ 1}

(figure: the mapping shown in three steps; x̃ = V^T x rotates the unit ball onto itself, ỹ = Σx̃ scales it to an ellipsoid with semi-axes σ1e1 and σ2e2, and y = Uỹ rotates that ellipsoid to Ey, whose semi-axes are σ1u1 and σ2u2)

Singular value decomposition 4.15

Control interpretation

system A maps input x to output y = Ax

• if ‖x‖² represents input energy, the set of outputs realizable with unit energy is

    Ey = {Ax | ‖x‖ ≤ 1}

• assume rank(A) = m: every desired y can be realized by at least one input
• the most energy-efficient input that generates output y is

\[
x_{\mathrm{eff}} = A^\dagger y = A^T (AA^T)^{-1} y, \qquad
\|x_{\mathrm{eff}}\|^2 = y^T (AA^T)^{-1} y = \sum_{i=1}^{m} \frac{(u_i^T y)^2}{\sigma_i^2}
\]

• (if rank(A) = m) the set Ey is an ellipsoid: Ey = {y | y^T (AA^T)^{-1} y ≤ 1}

Singular value decomposition 4.16

Inverse image of unit ball

define Ex as the inverse image of the unit ball under the linear mapping y = Ax:

    Ex = {x | ‖Ax‖ ≤ 1} = {x | ‖UΣV^T x‖ ≤ 1}

(figure: the same three steps run forward from Ex; Ex is an ellipsoid with semi-axes σ1^{-1}v1 and σ2^{-1}v2, which x̃ = V^T x, ỹ = Σx̃, and y = Uỹ map onto the unit ball)

Singular value decomposition 4.17

Estimation interpretation

measurement A maps an unknown quantity xtrue to the observation

    yobs = A(xtrue + v)

where v is unknown but bounded by ‖Av‖ ≤ 1

• if rank(A) = n, there is a unique estimate x̂ that satisfies Ax̂ = yobs
• uncertainty in y causes uncertainty in the estimate: the true value xtrue must satisfy

    ‖A(xtrue − x̂)‖ ≤ 1

• the set Ex = {x | ‖A(x − x̂)‖ ≤ 1} is the uncertainty region around the estimate x̂

Singular value decomposition 4.18
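A short sketch tying the pseudo-inverse and the control interpretation together (an added illustration, not part of the slides): for a wide full-row-rank A, the pseudo-inverse assembled from a reduced SVD agrees with numpy.linalg.pinv, and xeff = A†y realizes y with the stated energy.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 3, 6                         # wide matrix; almost surely rank(A) = m
    A = rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # here r = m, so U1 = U, V1 = Vt.T
    A_pinv = Vt.T @ np.diag(1 / s) @ U.T              # pseudo-inverse V1 Sigma1^{-1} U1^T
    assert np.allclose(A_pinv, np.linalg.pinv(A))

    y = rng.standard_normal(m)
    x_eff = A_pinv @ y
    # the most energy-efficient input equals A^T (A A^T)^{-1} y ...
    assert np.allclose(x_eff, A.T @ np.linalg.solve(A @ A.T, y))
    # ... it generates y exactly, with energy sum_i (u_i^T y)^2 / sigma_i^2
    assert np.allclose(A @ x_eff, y)
    assert np.allclose(x_eff @ x_eff, np.sum((U.T @ y)**2 / s**2))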
Frobenius norm and 2-norm

for an m × n matrix A with singular values σi:

\[
\|A\|_F = \Bigl(\sum_{i=1}^{\min\{m,n\}} \sigma_i^2\Bigr)^{1/2}, \qquad \|A\|_2 = \sigma_1
\]

this readily follows from the unitary invariance of the two norms:

\[
\|A\|_F = \|U\Sigma V^T\|_F = \|\Sigma\|_F = \Bigl(\sum_{i=1}^{\min\{m,n\}} \sigma_i^2\Bigr)^{1/2}
\]

and

    ‖A‖2 = ‖UΣV^T‖2 = ‖Σ‖2 = σ1

Singular value decomposition 4.19

Exercise

Exercise 1: express ‖A†‖2 and ‖A†‖F in terms of the singular values of A

Exercise 2: the condition number of a square nonsingular matrix A is defined as ‖A‖2 ‖A^{-1}‖2; express the condition number in terms of the singular values of A
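A final numerical check of the two norm identities above (again an added illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 7))

    s = np.linalg.svd(A, compute_uv=False)  # singular values only

    # Frobenius norm: square root of the sum of squared singular values
    assert np.isclose(np.linalg.norm(A, "fro"), np.sqrt(np.sum(s**2)))
    # 2-norm (spectral norm): the largest singular value
    assert np.isclose(np.linalg.norm(A, 2), s[0])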