Innotec Lectures, Masaryk University Brno
The Singular Value Decomposition
Prof. Walter Gander, ETH Zurich
December 1, 2008

1 The Singular Value Decomposition

The singular value decomposition (SVD) of a matrix $A$ is very useful in the context of least squares problems. It is also very helpful for analyzing properties of a matrix. With the SVD one x-rays a matrix!

Theorem 1.1 (The Singular Value Decomposition, SVD). Let $A$ be an $(m \times n)$ matrix with $m \ge n$. Then there exist orthogonal matrices $U$ $(m \times m)$ and $V$ $(n \times n)$ and a diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_n)$ $(m \times n)$ with $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$, such that $A = U\Sigma V^T$ holds. If $\sigma_r > 0$ is the smallest singular value greater than zero, then the matrix $A$ has rank $r$.

Definition 1.1 The column vectors of $U = [u_1,\dots,u_m]$ are called the left singular vectors and similarly $V = [v_1,\dots,v_n]$ are the right singular vectors. The values $\sigma_i$ are called the singular values of $A$.

Proof. The 2-norm of $A$ is defined by $\|A\|_2 = \max_{\|x\|=1} \|Ax\|$. Thus there exists a vector $x$ with $\|x\| = 1$ such that $z = Ax$, $\|z\| = \|A\|_2 =: \sigma$. Let $y := z/\|z\|$. We have obtained $Ax = \sigma y$ with $\|x\| = \|y\| = 1$. Now we construct vectors orthogonal to $x$ and similarly to $y$ and form the matrices $V = [x, V_1]$ and $U = [y, U_1]$. This could e.g. be done in Matlab by the command

> V = orth([x, rand(n, n-1)]), U = orth([y, rand(m, m-1)])

Now
$$
A_1 = U^T A V = \begin{bmatrix} y^T \\ U_1^T \end{bmatrix} A\, [x, V_1]
    = \begin{bmatrix} y^T A x & y^T A V_1 \\ U_1^T A x & U_1^T A V_1 \end{bmatrix}
    = \begin{bmatrix} \sigma & w^T \\ 0 & B \end{bmatrix}
$$
because $y^T A x = y^T \sigma y = \sigma y^T y = \sigma$ and $U_1^T A x = \sigma U_1^T y = 0$ since $U_1 \perp y$.

We claim that $w^T := y^T A V_1 = 0$. In order to prove that we compute
$$
A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma^2 + \|w\|^2 \\ Bw \end{pmatrix}
$$
and conclude from that equation that
$$
\left\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2 = (\sigma^2 + \|w\|^2)^2 + \|Bw\|^2 \ge (\sigma^2 + \|w\|^2)^2.
$$
Now since $V$ and $U$ are orthogonal, $\|A_1\|_2 = \|U^T A V\|_2 = \|A\|_2 = \sigma$ holds and
$$
\sigma^2 = \|A_1\|^2 = \max_{\|x\| \ne 0} \frac{\|A_1 x\|^2}{\|x\|^2}
         \ge \frac{\left\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2}{\left\| \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2}
         \ge \frac{(\sigma^2 + \|w\|^2)^2}{\sigma^2 + \|w\|^2} = \sigma^2 + \|w\|^2.
$$
The last inequality reads $\sigma^2 \ge \sigma^2 + \|w\|^2$, and we conclude that $w = 0$. Thus we have obtained
$$
A_1 = U^T A V = \begin{bmatrix} \sigma & 0 \\ 0 & B \end{bmatrix}.
$$
We can now apply the same construction to the sub-matrix $B$ and thus finally end up with a diagonal matrix.

Though this proof is constructive, the singular value decomposition is not computed in this way. An effective algorithm was designed by Golub and Reinsch [6]. They first transform the matrix by orthogonal Householder transformations to bidiagonal form. Then the bidiagonal matrix is further diagonalized in an iterative process.

Let $r = \mathrm{rank}(A)$. Then $\sigma_{r+1} = \dots = \sigma_n = 0$. Partition $U = [U_1, U_2]$ and $V = [V_1, V_2]$, where $U_1 = [u_1,\dots,u_r]$ and $V_1 = [v_1,\dots,v_r]$ have $r$ columns. Then with $\Sigma_r := \mathrm{diag}(\sigma_1,\dots,\sigma_r)$:
$$
A = [U_1, U_2] \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} [V_1, V_2]^T \qquad (1)
$$
$$
  = U_1 \Sigma_r V_1^T \qquad (2)
$$
$$
  = \sum_{i=1}^{r} u_i v_i^T \sigma_i \qquad (3)
$$
Equation (1) is the full decomposition with square matrices $U$ and $V$. When making use of the zeros we obtain the "economy" or "reduced" version of the SVD given in Equations (2) and (3). In Matlab there are two variants to compute the SVD:

> [U, S, V] = svd(A)    % gives the full decomposition
> [U, S, V] = svd(A,0)  % gives an m by n matrix U

The call svd(A,0) computes a version between full and economic with a non-square matrix $U$.

Example 1.1 The matrix $A$ has rank one and its economic SVD is given by
$$
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
  = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}
    \,(2\sqrt{3})\,
    \begin{bmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}.
$$
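As a quick numerical check (a small sketch added here, not part of the original notes), Example 1.1 can be reproduced in Matlab. The computed singular vectors may differ from the ones above by a common sign, since singular vectors are only determined up to sign:

> A = ones(4,3);                         % the rank-one matrix of Example 1.1
> [U, S, V] = svd(A);                    % full SVD: U is 4x4, S is 4x3, V is 3x3
> [U0, S0, V0] = svd(A,0);               % reduced SVD: U0 is 4x3, S0 and V0 are 3x3
> S0(1,1)                                % largest singular value, 2*sqrt(3) = 3.4641...
> norm(A - U0(:,1)*S0(1,1)*V0(:,1)')     % rank-one reconstruction error, zero up to roundoff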
2 Applications of the SVD

In Equation (3) we have decomposed the matrix $A$ as a sum of matrices $u_i v_i^T$ of rank one. Since
$$
\|u_i v_i^T\|_2^2 = \max_{\|x\|=1} \|u_i v_i^T x\|^2 = \max_{\|x\|=1} x^T v_i u_i^T u_i v_i^T x = \max_{\|x\|=1} |v_i^T x|^2 = (v_i^T v_i)^2 = 1,
$$
we see from Equation (3) that the matrix $A$ is a weighted sum of matrices which all have the same norm. The singular values are the weights. The main contribution in the sum is given by the terms with the largest singular values. We see that we may approximate $A$ by a lower rank matrix by dropping the smallest singular values, i.e., changing their values to zero. In fact, let $\mathcal{M}$ denote the set of $m \times n$ matrices with rank $p$. The solution of
$$
\min_{X \in \mathcal{M}} \|A - X\|_F
$$
is given by
$$
A_p = \sum_{i=1}^{p} u_i v_i^T \sigma_i. \qquad (4)
$$
A proof of this fact is given e.g. in [2].

Theorem 2.1 Let $A = U\Sigma V^T$. Then the column vectors of $V$ are the eigenvectors of the matrix $A^T A$ corresponding to the eigenvalues $\sigma_i^2$, $i = 1,\dots,n$. The column vectors of $U$ are the eigenvectors of the matrix $AA^T$.

Proof.
$$
A^T A = (U\Sigma V^T)^T U\Sigma V^T = V D V^T, \quad D = \Sigma^T \Sigma = \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2). \qquad (5)
$$
Thus $A^T A\, V = V D$ and $\sigma_i^2$ is an eigenvalue of $A^T A$. Similarly
$$
AA^T = U\Sigma V^T (U\Sigma V^T)^T = U\, \Sigma\Sigma^T\, U^T, \qquad (6)
$$
where $\Sigma\Sigma^T = \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2, 0,\dots,0)$.

Theorem 2.2 Let $A = U\Sigma V^T$. Then
$$
\|A\|_2 = \sigma_1 \quad \text{and} \quad \|A\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}.
$$

Proof. Since $U$ and $V$ are orthogonal we have $\|A\|_2 = \|U\Sigma V^T\|_2 = \|\Sigma\|_2$. Now
$$
\|\Sigma\|_2^2 = \max_{\|x\|=1} (\sigma_1^2 x_1^2 + \dots + \sigma_n^2 x_n^2) \le \sigma_1^2 (x_1^2 + \dots + x_n^2) = \sigma_1^2,
$$
and since the maximum is attained for $x = e_1$ it follows that $\|A\|_2 = \sigma_1$. For the Frobenius norm we have
$$
\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\mathrm{trace}(A^T A)} = \sqrt{\sum_{i=1}^{n} \sigma_i^2}.
$$

Theorem 2.3 Let $A = U\Sigma V^T$. Then the problem
$$
\|Ax\|_2 = \min, \quad \text{subject to} \quad \|x\|_2 = 1 \qquad (7)
$$
has the solution $x = v_n$ and the value of the minimum is $\min_{\|x\|_2=1} \|Ax\|_2 = \sigma_n$.

Proof. We make use of the fact that for orthogonal $V$ and $y = V^T x$ we have $\|x\| = \|V V^T x\| = \|V y\| = \|y\|$:
$$
\min_{\|x\|_2=1} \|Ax\|_2 = \min_{\|x\|_2=1} \|U\Sigma V^T x\|_2 = \min_{\|V V^T x\|_2=1} \|U\Sigma (V^T x)\|_2 = \min_{\|y\|_2=1} \|\Sigma y\|_2
$$
and
$$
\min_{\|y\|_2=1} \|\Sigma y\|_2^2 = \min_{\|y\|_2=1} (\sigma_1^2 y_1^2 + \dots + \sigma_n^2 y_n^2) \ge \sigma_n^2.
$$
The minimum is attained for $y = e_n$, thus for $x = V y = v_n$.

2.1 The Condition of the Linear Least Squares Problem

The principle of Wilkinson states that the result of a numerical computation is the result of an exact computation for a slightly perturbed problem. This principle allows us to estimate the influence of finite precision arithmetic. A problem is said to be well conditioned if the results do not differ too much when solving a perturbed problem. For an ill-conditioned problem the solution of a perturbed problem may be very different.

Consider a system of linear equations $Ax = b$ with $A$ an $n \times n$ nonsingular matrix, and a perturbed system $(A + \varepsilon E)x(\varepsilon) = b$, where $\varepsilon$ is small, e.g. of the order of the machine precision. How do the solutions $x(\varepsilon)$ and $x = x(0)$ differ? We want to expand $x(\varepsilon) \approx x(0) + \varepsilon \dot{x}(0)$. The derivative $\dot{x}(0)$ is obtained by differentiating:
$$
(A + \varepsilon E)\, x(\varepsilon) = b,
$$
$$
E\, x(\varepsilon) + (A + \varepsilon E)\, \dot{x}(\varepsilon) = 0 \;\Rightarrow\; \dot{x}(0) = -A^{-1} E\, x(0)
\;\Rightarrow\; x(\varepsilon) \approx x(0) - \varepsilon A^{-1} E\, x(0),
$$
$$
\|x(\varepsilon) - x(0)\| \lessapprox \|A^{-1}\|\, \|\varepsilon E\|\, \|x(0)\|.
$$
From the last equation we conclude for the relative error
$$
\frac{\|x(\varepsilon) - x\|}{\|x\|} \lessapprox \underbrace{\|A^{-1}\|\,\|A\|}_{\kappa}\; \frac{\|\varepsilon E\|}{\|A\|}, \qquad (8)
$$
where $\kappa$ is called the condition number. If we use the 2-norm as matrix norm, then
$$
\|A\| := \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_{\max}(A), \qquad \|A^{-1}\| = \frac{1}{\sigma_{\min}(A)},
$$
and the condition number is computed by
$$
\kappa = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} = \texttt{cond(A)} \text{ in Matlab}.
$$
Thus according to the principle of Wilkinson we have to expect that the numerical solution may deviate by about $\kappa$ units in the last digit from the exact solution.
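A small Matlab experiment (not from the original notes; the Hilbert matrix merely serves as a convenient ill-conditioned example) illustrates estimate (8). The observed relative error and the first order estimate typically agree in order of magnitude:

> n = 8; A = hilb(n);                   % ill-conditioned test matrix, cond(A) is about 1.5e10
> xe = ones(n,1); b = A*xe;             % right-hand side with known exact solution xe
> E = randn(n); E = E/norm(E);          % normalized perturbation matrix
> ep = 1e-12;                           % small perturbation parameter epsilon
> x = (A + ep*E)\b;                     % solution of the perturbed system
> norm(x - xe)/norm(xe)                 % observed relative error
> cond(A)*ep*norm(E)/norm(A)            % first order estimate (8)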
A similar but more difficult computation (see [1]) shows for $\|b - Ax\|_2 = \min$ and $\|b - (A + \varepsilon E)x(\varepsilon)\|_2 = \min$ the estimate
$$
\frac{\|x(\varepsilon) - x\|}{\|x\|} \lessapprox \left( 2\kappa + \kappa^2 \frac{\|r\|}{\|A\|\,\|x\|} \right) \frac{\|\varepsilon E\|}{\|A\|} \quad \text{(Golub-Pereyra)}, \qquad (9)
$$
where
$$
\kappa := \|A\|\,\|A^{+}\| = \frac{\sigma_1(A)}{\sigma_r(A)},
$$
with $A^{+}$ the pseudoinverse, see the next section.

Equation (9) tells us again what accuracy we can expect from the numerical solution. We have to distinguish between good and bad models. For good models, i.e. if the residual $\|r\|$ is small, the situation is similar as for linear equations (8): the error in the solution may deviate by about $\kappa$ units in the last digit from the exact solution. However, when the model is bad, i.e. when $\|r\|$ is large, then the condition is also worse and we must expect a larger error in the computed solution.

2.2 Normal Equations and Condition

Using the normal equations we have to expect worse numerical results than predicted by Equation (9), even for good models. Forming $A^T A$ leads to a matrix with a squared condition number compared to the original matrix $A$.
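The effect can be seen in a short Matlab experiment (an illustration added here, not from the original text; the Vandermonde-type matrix is just an arbitrary, moderately ill-conditioned example). Since cond(A'*A) = cond(A)^2, the normal equations lose roughly twice as many digits as a solution via an orthogonal factorization, even though the residual of this model is zero:

> m = 50; n = 8; t = linspace(0,1,m)';
> A = zeros(m,n); for j = 1:n, A(:,j) = t.^(j-1); end   % Vandermonde-type least squares matrix
> [cond(A), cond(A'*A)]                 % the second value is the square of the first
> xe = ones(n,1); b = A*xe;             % consistent right-hand side, exact solution xe, residual r = 0
> x_qr = A\b;                           % least squares via orthogonal factorization (backslash)
> x_ne = (A'*A)\(A'*b);                 % least squares via the normal equations
> [norm(x_qr - xe)/norm(xe), norm(x_ne - xe)/norm(xe)]  % the normal equations give a much larger error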
