Matrix Methods for Computational Modeling and Data Analytics
Virginia Tech, Spring 2019 · Mark Embree · [email protected]

Chapter 6: The Singular Value Decomposition
Ax=b version of 11 April 2019

The singular value decomposition (SVD) is among the most important and widely applicable matrix factorizations. It provides a natural way to untangle a matrix into its four fundamental subspaces, and reveals the relative importance of each direction within those subspaces. Thus the singular value decomposition is a vital tool for analyzing data, and it provides a slick way to understand (and prove) many fundamental results in matrix theory. It is the perfect tool for solving least squares problems, and provides the best way to approximate a matrix with one of lower rank. These notes construct the SVD in various forms, then describe a few of its most compelling applications.

6.1 Eigenvalues and eigenvectors of symmetric matrices

To derive the singular value decomposition of a general (rectangular) matrix $A \in \mathbb{R}^{m \times n}$, we shall rely on several special properties of the square, symmetric matrix $A^T A$. While this course assumes you are well acquainted with eigenvalues and eigenvectors, we will recall some fundamental concepts, especially pertaining to symmetric matrices.

6.1.1 A passing nod to complex numbers

Recall that even if a matrix has real number entries, it could have eigenvalues that are complex numbers; the corresponding eigenvectors will also have complex entries. Consider, for example, the matrix
$$S = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
To find the eigenvalues of $S$, form the characteristic polynomial
$$\det(\lambda I - S) = \det\left(\begin{bmatrix} \lambda & 1 \\ -1 & \lambda \end{bmatrix}\right) = \lambda^2 + 1.$$
Factor this polynomial (e.g., using the quadratic formula) to get
$$\det(\lambda I - S) = \lambda^2 + 1 = (\lambda - i)(\lambda + i),$$
where $i = \sqrt{-1}$. Thus we conclude that $S$ (a matrix with real entries) has the complex eigenvalues
$$\lambda_1 = i, \qquad \lambda_2 = -i,$$
with corresponding eigenvectors
$$v_1 = \begin{bmatrix} i \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -i \\ 1 \end{bmatrix}.$$

(To find the eigenvector associated with $\lambda_1$, we need to find some nonzero $v \in N(\lambda_1 I - S)$. To do so, solve the consistent but underdetermined system
$$\begin{bmatrix} i & 1 \\ -1 & i \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
The first row requires $i v_1 + v_2 = 0$, while the second row requires $-v_1 + i v_2 = 0$. Multiply that last equation by $-i$ and you obtain the first equation: so if you satisfy the second equation ($v_1 = i v_2$), you satisfy them both. Thus let
$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} i v_2 \\ v_2 \end{bmatrix};$$
the specific eigenvector $v_1$ given above follows from picking $v_2 = 1$.)

Suppose we want to compute the norm of the eigenvector $v_1$. Using our usual method, we would have
$$\|v_1\|^2 = v_1^T v_1 = \begin{bmatrix} i & 1 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = i^2 + 1 = -1 + 1 = 0.$$
This result seems strange, no? How could the norm of a nonzero vector be zero?

This example reveals a crucial shortcoming in our definition of the norm, when applied to complex vectors. Instead of
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{x^T x},$$
we want
$$\|x\| = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}.$$
(If $z = a + ib \in \mathbb{C}$ with $a, b \in \mathbb{R}$, then $|z| = \sqrt{a^2 + b^2}$. We call $|z|$ the magnitude of the complex number $z$.)
For real vectors $x \in \mathbb{R}^n$, both definitions are the same. For complex vectors $x \in \mathbb{C}^n$ they can be very different. Just as the norm of a real vector has the compact notation $\|v\| = \sqrt{v^T v}$, so too does the norm of a complex vector:
$$\|v\| = \sqrt{\overline{v}^T v},$$
where $\overline{v}$ denotes the complex conjugate of $v$. (The complex conjugate of $z = a + ib$ is $\overline{z} = a - ib$, allowing us to write
$$|z|^2 = \overline{z} z = (a - ib)(a + ib) = a^2 - iab + iab + b^2 = a^2 + b^2.)$$
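A minimal sketch in Python/NumPy (the code and names here are purely illustrative) makes this pathology concrete: `numpy.linalg.eig` recovers the complex eigenvalues of the real matrix $S$, the naive formula $v^T v$ assigns the nonzero eigenvector a "norm" of zero, and the conjugated formula (via `numpy.vdot`) anticipates the hand computation below.

```python
import numpy as np

# The real matrix S from the text, with complex eigenvalues +i and -i.
S = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.linalg.eig(S)[0])     # eigenvalues: i and -i

# The eigenvector v1 = [i, 1] associated with lambda_1 = i.
v = np.array([1j, 1.0])

# Naive real-vector formula: v^T v = i^2 + 1 = 0,
# a nonzero vector with "norm" zero.
print(v @ v)                   # 0j

# Conjugated formula: conj(v)^T v = -i^2 + 1 = 2
# (np.vdot conjugates its first argument).
print(np.vdot(v, v))           # (2+0j), so ||v1|| = sqrt(2)
```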
Now apply this definition of the norm to $v_1$:
$$\|v_1\|^2 = \overline{v}_1^T v_1 = \begin{bmatrix} -i & 1 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = -i^2 + 1 = 1 + 1 = 2,$$
a much more reasonable answer than we had before. We will wrap up this complex interlude by proving that the real symmetric matrices that will be our focus in this course can never have complex eigenvalues.

6.1.2 The spectral theorem for symmetric matrices

Theorem 8 All eigenvalues of a real symmetric matrix are real.

Proof. Let $S$ denote a symmetric matrix with real entries, so $S^T = S$ (since $S$ is symmetric) and $\overline{S} = S$ (since $S$ is real). Let $(\lambda, v)$ be an arbitrary eigenpair of $S$, so that $Sv = \lambda v$. Without loss of generality, we can assume that $v$ is scaled so that $\|v\| = 1$, i.e., $\overline{v}^T v = \|v\|^2 = 1$. Thus
$$\lambda = \lambda \|v\|^2 = \lambda(\overline{v}^T v) = \overline{v}^T(\lambda v) = \overline{v}^T(Sv).$$
(Since we do not yet know that $v$ is real-valued, we must use the norm definition for complex vectors discussed in the previous subsection.)
Since $S$ is real and symmetric, $S = \overline{S}^T$, and so
$$\overline{v}^T(Sv) = \overline{v}^T \overline{S}^T v = \overline{(Sv)}^T v = \overline{(\lambda v)}^T v = \overline{\lambda}\, \overline{v}^T v = \overline{\lambda} \|v\|^2 = \overline{\lambda}.$$
We have shown that $\lambda = \overline{\lambda}$, which is only possible if $\lambda$ is real. (If $z = a + ib$ and $\overline{z} = z$, then $a + ib = a - ib$, i.e., $b = -b$, which is only possible if $b = 0$.)

It immediately follows that if $\lambda$ is an eigenvalue of the real symmetric matrix $S$, then we can always find a real-valued eigenvector $v$ of $S$ corresponding to $\lambda$, simply by finding a real-valued vector in the null space $N(\lambda I - S)$, since $\lambda I - S$ is a real-valued matrix.

Crucially, eigenvectors of a real symmetric matrix $S$ associated with distinct eigenvalues must be orthogonal.

Theorem 9 Eigenvectors of a real symmetric matrix associated with distinct eigenvalues are orthogonal.

Proof. Suppose $\lambda$ and $\gamma$ are distinct eigenvalues of a real symmetric matrix $S$ associated with eigenvectors $v \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$:
$$Sv = \lambda v, \qquad Sw = \gamma w, \qquad \text{with } \lambda \ne \gamma.$$
Now consider
$$\lambda w^T v = w^T(\lambda v) = w^T(Sv) = w^T S^T v,$$
where we have used the fact that $S = S^T$. Now
$$w^T S^T v = (Sw)^T v = (\gamma w)^T v = \gamma w^T v.$$
We have thus shown that
$$\lambda w^T v = \gamma w^T v.$$
Since $\lambda \ne \gamma$, this statement can only be true if $w^T v = 0$, i.e., if $v$ and $w$ are orthogonal.

(What if the eigenvalues are not distinct? Consider the simple $2 \times 2$ identity matrix $I$. Any nonzero $x \in \mathbb{R}^2$ is an eigenvector of $I$ associated with the eigenvalue $\lambda = 1$, since $Ix = 1x$. Thus we have many eigenvectors that are not orthogonal. However, we can always find vectors, like
$$v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
that are orthogonal.)

We are ready to collect relevant facts in the Spectral Theorem.

Theorem 10 (Spectral Theorem) Suppose $S \in \mathbb{R}^{n \times n}$ is symmetric, $S^T = S$. Then there exist $n$ (not necessarily distinct) eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding unit-length eigenvectors $v_1, \ldots, v_n$ such that $Sv_j = \lambda_j v_j$. The eigenvectors form an orthonormal basis for $\mathbb{R}^n$:
$$\mathbb{R}^n = \operatorname{span}\{v_1, \ldots, v_n\},$$
with $v_j^T v_k = 0$ when $j \ne k$, and $v_j^T v_j = \|v_j\|^2 = 1$.

(For example, when
$$S = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix},$$
we have $\lambda_1 = 4$ and $\lambda_2 = 2$, with
$$v_1 = \begin{bmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}.$$
Note that these eigenvectors are unit vectors, and they are orthogonal.)

As a consequence of the Spectral Theorem, we can write any symmetric matrix $S \in \mathbb{R}^{n \times n}$ in the form
$$S = \sum_{j=1}^{n} \lambda_j v_j v_j^T. \qquad (6.1)$$
This equation expresses $S$ as the sum of the special rank-1 matrices $\lambda_j v_j v_j^T$. (For the example above, we can write
$$S = \lambda_1 v_1 v_1^T + \lambda_2 v_2 v_2^T = 4 \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix} + 2 \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.)$$
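The example above is easy to confirm on a computer. Here is a minimal NumPy sketch (again illustrative; `numpy.linalg.eigh` is NumPy's eigensolver for symmetric matrices): it recovers the eigenvalues 4 and 2 with orthonormal eigenvectors, and rebuilds $S$ from the rank-1 pieces in (6.1).

```python
import numpy as np

# The symmetric example from the text.
S = np.array([[ 3.0, -1.0],
              [-1.0,  3.0]])

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of V.
lam, V = np.linalg.eigh(S)
print(lam)                               # [2. 4.]
print(np.allclose(V.T @ V, np.eye(2)))   # True: orthonormal basis

# Rebuild S from the rank-1 pieces lambda_j * v_j v_j^T of (6.1).
S_rebuilt = sum(lam[j] * np.outer(V[:, j], V[:, j]) for j in range(2))
print(np.allclose(S_rebuilt, S))         # True
```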
The singular value decomposition will provide a similar way to tease apart a rectangular matrix.

Definition 21 A symmetric matrix $S \in \mathbb{R}^{n \times n}$ is positive definite provided $x^T S x > 0$ for all nonzero $x \in \mathbb{R}^n$; if $x^T S x \ge 0$ for all $x \in \mathbb{R}^n$, we say $S$ is positive semidefinite.

(For the example above,
$$x^T S x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 3x_1^2 - 2x_1 x_2 + 3x_2^2 = 2(x_1 - x_2)^2 + (x_1 + x_2)^2.$$
This last expression, a sum of squares, is clearly positive for all nonzero $x$, so $S$ is positive definite.)

Theorem 11 All eigenvalues of a symmetric positive definite matrix are positive; all eigenvalues of a symmetric positive semidefinite matrix are nonnegative.

Proof. Let $(\lambda_j, v_j)$ denote an eigenpair of the symmetric positive definite matrix $S \in \mathbb{R}^{n \times n}$ with $\|v_j\|^2 = v_j^T v_j = 1$. Since $S$ is symmetric, $\lambda_j$ must be real. We conclude that
$$\lambda_j = \lambda_j v_j^T v_j = v_j^T(\lambda_j v_j) = v_j^T S v_j,$$
which must be positive since $S$ is positive definite and $v_j \ne 0$. The proof for positive semidefinite matrices is the same, except we can only conclude that $\lambda_j = v_j^T S v_j \ge 0$.

(Can you prove the converse of this theorem? That is: a symmetric matrix with positive eigenvalues is positive definite. Hint: use the Spectral Theorem. With this result, we can check whether $S$ is positive definite by just looking at its eigenvalues, rather than working out a formula for $x^T S x$, as done above.)

6.2 Derivation of the singular value decomposition: Full rank case

We seek to derive the singular value decomposition of a general rectangular matrix.
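As a numerical preview of that derivation (a minimal NumPy sketch; the random matrix `A` is just a stand-in for a general rectangular matrix), the snippet below checks that $A^T A$ is symmetric positive semidefinite, so that Theorem 11 applies, and that the singular values NumPy reports for $A$ are the square roots of the eigenvalues of $A^T A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # a generic tall rectangular matrix

# A^T A is square and symmetric ...
G = A.T @ A
print(np.allclose(G, G.T))        # True

# ... and positive semidefinite, since x^T (A^T A) x = ||Ax||^2 >= 0.
# By Theorem 11 its eigenvalues must therefore be nonnegative.
lam = np.linalg.eigvalsh(G)       # ascending order
print(lam.min() >= -1e-12)        # True (up to roundoff)

# Preview: the singular values of A are the square roots of these
# eigenvalues, as this section's derivation will show.
sigma = np.linalg.svd(A, compute_uv=False)   # descending order
print(np.allclose(sigma, np.sqrt(np.maximum(lam[::-1], 0.0))))  # True
```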