The Singular Value Decomposition
Prof. Walter Gander
ETH Zurich
December 12, 2008

Contents

1 The Singular Value Decomposition
2 Applications of the SVD
2.1 Condition Numbers
2.2 Normal Equations and Condition
2.3 The Pseudoinverse
2.4 Fundamental Subspaces
2.5 General Solution of the Linear Least Squares Problem
2.6 Fitting Lines
2.7 Fitting Ellipses
3 Fitting Hyperplanes - Collinearity Test
4 Total Least Squares
5 Bibliography

1 The Singular Value Decomposition

The singular value decomposition (SVD) of a matrix $A$ is very useful in the context of least squares problems. It is also very helpful for analyzing properties of a matrix. With the SVD one x-rays a matrix!

Theorem 1.1 (The Singular Value Decomposition, SVD). Let $A$ be an $(m \times n)$ matrix with $m \ge n$. Then there exist orthogonal matrices $U$ ($m \times m$) and $V$ ($n \times n$) and a diagonal matrix $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ ($m \times n$) with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ such that
\[ A = U \Sigma V^T \]
holds. If $\sigma_r > 0$ is the smallest singular value greater than zero, then the matrix $A$ has rank $r$.

Definition 1.1 The column vectors of $U = [u_1, \ldots, u_m]$ are called the left singular vectors and similarly the columns of $V = [v_1, \ldots, v_n]$ are the right singular vectors. The values $\sigma_i$ are called the singular values of $A$.

Proof. The 2-norm of $A$ is defined by $\|A\|_2 = \max_{\|x\|=1} \|Ax\|$. Thus there exists a vector $x$ with $\|x\| = 1$ such that
\[ z = Ax, \qquad \|z\| = \|A\|_2 =: \sigma. \]
Let $y := z/\|z\|$. We have obtained $Ax = \sigma y$ with $\|x\| = \|y\| = 1$. Now we construct vectors orthogonal to $x$ and similarly to $y$ and form the matrices $V = [x, V_1]$ and $U = [y, U_1]$. This could e.g. be done in Matlab by the command

> V = orth([x, rand(n, n-1)]), U = orth([y, rand(m, m-1)])

Now
\[ A_1 = U^T A V = \begin{pmatrix} y^T \\ U_1^T \end{pmatrix} A \, [x, V_1]
   = \begin{pmatrix} y^T A x & y^T A V_1 \\ U_1^T A x & U_1^T A V_1 \end{pmatrix}
   = \begin{pmatrix} \sigma & w^T \\ 0 & B \end{pmatrix} \]
because $y^T A x = y^T \sigma y = \sigma y^T y = \sigma$ and $U_1^T A x = \sigma U_1^T y = 0$ since $U_1 \perp y$.

We claim that $w^T := y^T A V_1 = 0$. In order to prove that we compute
\[ A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma^2 + \|w\|^2 \\ B w \end{pmatrix} \]
and conclude from that equation that
\[ \left\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2 = (\sigma^2 + \|w\|^2)^2 + \|Bw\|^2 \ge (\sigma^2 + \|w\|^2)^2. \]
Now since $V$ and $U$ are orthogonal, $\|A_1\|_2 = \|U^T A V\|_2 = \|A\|_2 = \sigma$ holds, and
\[ \sigma^2 = \|A_1\|_2^2 = \max_{x \ne 0} \frac{\|A_1 x\|^2}{\|x\|^2}
   \ge \frac{\left\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2}{\left\| \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|^2}
   \ge \frac{(\sigma^2 + \|w\|^2)^2}{\sigma^2 + \|w\|^2}. \]
The last inequality reads $\sigma^2 \ge \sigma^2 + \|w\|^2$, and we conclude that $w = 0$. Thus we have obtained
\[ A_1 = U^T A V = \begin{pmatrix} \sigma & 0 \\ 0 & B \end{pmatrix}. \]
We can now apply the same construction to the sub-matrix $B$ and thus finally end up with a diagonal matrix.

Though this proof is constructive, the singular value decomposition is not computed in this way. An effective algorithm was designed by Golub and Reinsch [6]. They first transform the matrix by orthogonal Householder transformations to bidiagonal form. Then the bidiagonal matrix is further diagonalized in an iterative process.

Let $r = \operatorname{rank}(A)$. Then $\sigma_{r+1} = \cdots = \sigma_n = 0$. Partition $U = [U_1, U_2]$ and $V = [V_1, V_2]$, where $U_1 = [u_1, \ldots, u_r]$ and $V_1 = [v_1, \ldots, v_r]$ have $r$ columns. Then with $\Sigma_r := \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$:
\[ A = [U_1, U_2] \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} [V_1, V_2]^T \qquad (1) \]
\[ \phantom{A} = U_1 \Sigma_r V_1^T \qquad (2) \]
\[ \phantom{A} = \sum_{i=1}^r u_i v_i^T \sigma_i \qquad (3) \]

Equation (1) is the full decomposition with square matrices $U$ and $V$. When making use of the zeros we obtain the "economy" and the "reduced" version (Equation (2)) of the SVD. In Matlab there are two variants to compute the SVD:

> [U S V] = svd(A)     % gives the full decomposition
> [U S V] = svd(A,0)   % gives the economy version with an m-by-n matrix U

The economy version computed by Matlab is Equation (2) with $r = n$. To compute the reduced version (Equation (3)) we need to make a rank decision, that is, define the variable $r$.
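One possible way to make this rank decision in practice is to compare the computed singular values with a small tolerance and keep only those above it. The following Matlab sketch is merely an illustration and not part of the original text; the tolerance tol is an assumed choice (similar in spirit to the default used by Matlab's rank function), and A stands for any given m-by-n matrix.

% Illustrative sketch of a rank decision (the tolerance is an assumed choice).
[U, S, V] = svd(A);                  % full SVD of some given matrix A
s = diag(S);                         % singular values, sorted decreasingly
tol = max(size(A)) * eps(s(1));      % assumed threshold for "numerically zero"
r = sum(s > tol);                    % rank decision
U1 = U(:, 1:r); Sr = diag(s(1:r)); V1 = V(:, 1:r);
Ar = U1 * Sr * V1';                  % reduced SVD, reproduces A up to roundoff

With A = ones(4,3) as in Example 1.1 below, this sketch selects r = 1 and reproduces the rank-one factorization.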
Example 1.1 The matrix $A$ has rank one and its reduced SVD is given by
\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
     = \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{pmatrix}
       \, (2\sqrt{3}) \,
       \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}. \]

A = ones(4,3)
[U,S,V] = svd(A)
U =
  -5.0000e-01   8.6603e-01  -4.1633e-17            0
  -5.0000e-01  -2.8868e-01  -5.7735e-01  -5.7735e-01
  -5.0000e-01  -2.8868e-01   7.8868e-01  -2.1132e-01
  -5.0000e-01  -2.8868e-01  -2.1132e-01   7.8868e-01
S =
   3.4641e+00            0            0
            0   3.1402e-16            0
            0            0            0
            0            0            0
V =
  -5.7735e-01   8.1650e-01            0
  -5.7735e-01  -4.0825e-01  -7.0711e-01
  -5.7735e-01  -4.0825e-01   7.0711e-01

[U,S,V] = svd(A,0)   %%% economy SVD
U =
  -5.0000e-01   8.6603e-01  -4.1633e-17
  -5.0000e-01  -2.8868e-01  -5.7735e-01
  -5.0000e-01  -2.8868e-01   7.8868e-01
  -5.0000e-01  -2.8868e-01  -2.1132e-01
S =
   3.4641e+00            0            0
            0   3.1402e-16            0
            0            0            0
V =
  -5.7735e-01   8.1650e-01            0
  -5.7735e-01  -4.0825e-01  -7.0711e-01
  -5.7735e-01  -4.0825e-01   7.0711e-01

S(1)*U(:,1)*V(:,1)'   %%% reduced SVD
ans =
   1.0000e-00   1.0000e-00   1.0000e-00
   1.0000e+00   1.0000e+00   1.0000e-00
   1.0000e+00   1.0000e+00   1.0000e-00
   1.0000e+00   1.0000e+00   1.0000e-00

> ans - A
ans =
  -2.2204e-16  -2.2204e-16  -3.3307e-16
            0            0  -1.1102e-16
            0            0  -1.1102e-16
            0            0  -1.1102e-16

2 Applications of the SVD

In Equation (3) we have decomposed the matrix $A$ as a sum of matrices $u_i v_i^T$ of rank one. Since
\[ \|u_i v_i^T\|_2^2 = \max_{\|x\|=1} \|u_i (v_i^T x)\|^2 = \max_{\|x\|=1} (v_i^T x)^2 = (v_i^T v_i)^2 = 1, \]
we see from Equation (3) that the matrix $A$ is a weighted sum of matrices which all have the same norm. The singular values are the weights. The main contribution in the sum is given by the terms with the largest singular values. We see that we may approximate $A$ by a lower rank matrix by dropping the smallest singular values, i.e., changing their values to zero. In fact, let $\mathcal{M}$ denote the set of $m \times n$ matrices with rank $p$. The solution of
\[ \min_{X \in \mathcal{M}} \|A - X\|_F \]
is given by
\[ A_p = \sum_{i=1}^p u_i v_i^T \sigma_i. \qquad (4) \]
A proof of this fact is given e.g. in [2].

Theorem 2.1 Let $A = U \Sigma V^T$. Then the column vectors of $V$ are the eigenvectors of the matrix $A^T A$ with eigenvalues $\sigma_i^2$, $i = 1, \ldots, n$. The column vectors of $U$ are the eigenvectors of the matrix $A A^T$.

Proof.
\[ A^T A = (U \Sigma V^T)^T U \Sigma V^T = V D V^T, \qquad D = \Sigma^T \Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2). \qquad (5) \]
Thus $A^T A V = V D$ and $\sigma_i^2$ is an eigenvalue of $A^T A$. Similarly
\[ A A^T = U \Sigma V^T (U \Sigma V^T)^T = U \Sigma \Sigma^T U^T, \qquad (6) \]
where $\Sigma \Sigma^T = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2, 0, \ldots, 0)$.

Theorem 2.2 Let $A = U \Sigma V^T$. Then
\[ \|A\|_2 = \sigma_1 \qquad \text{and} \qquad \|A\|_F = \sqrt{\sum_{i=1}^n \sigma_i^2}. \]

Proof. Since $U$ and $V$ are orthogonal, we have $\|A\|_2 = \|U \Sigma V^T\|_2 = \|\Sigma\|_2$. Now
\[ \|\Sigma\|_2^2 = \max_{\|x\|=1} (\sigma_1^2 x_1^2 + \cdots + \sigma_n^2 x_n^2) \le \sigma_1^2 (x_1^2 + \cdots + x_n^2) = \sigma_1^2, \]
and since the maximum is attained for $x = e_1$, it follows that $\|A\|_2 = \sigma_1$. For the Frobenius norm we have
\[ \|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\operatorname{trace}(A^T A)} = \sqrt{\sum_{i=1}^n \sigma_i^2}. \]

Theorem 2.3 Let $A = U \Sigma V^T$. Then the problem
\[ \|Ax\|_2 = \min, \qquad \text{subject to } \|x\|_2 = 1 \qquad (7) \]
has the solution $x = v_n$, and the value of the minimum is $\min_{\|x\|_2=1} \|Ax\|_2 = \sigma_n$.

Proof. We make use of the fact that for orthogonal $V$ and $y = V^T x$ we have
\[ \|x\| = \|V V^T x\| = \|V y\| = \|y\|. \]
Thus
\[ \min_{\|x\|_2=1} \|Ax\|_2^2 = \min_{\|x\|_2=1} \|U \Sigma V^T x\|_2^2 = \min_{\|V V^T x\|_2=1} \|U \Sigma (V^T x)\|_2^2
   = \min_{\|y\|_2=1} \|\Sigma y\|_2^2 = \min_{\|y\|_2=1} (\sigma_1^2 y_1^2 + \cdots + \sigma_n^2 y_n^2) \ge \sigma_n^2. \]
The minimum is attained for $y = e_n$, thus for $x = V y = v_n$.
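Both theorems are easy to confirm numerically. The following short check is only an illustration and not part of the original text; the test matrix built from magic(5) is an arbitrary choice.

% Numerical check of Theorems 2.2 and 2.3 (illustrative; the test matrix is arbitrary).
A = magic(5); A = A(:, 1:3);         % some 5-by-3 test matrix
[U, S, V] = svd(A);
s = diag(S);
norm(A, 2) - s(1)                    % ~0: the 2-norm is the largest singular value
norm(A, 'fro') - sqrt(sum(s.^2))     % ~0: the Frobenius norm from the singular values
vn = V(:, end);                      % last right singular vector
norm(A * vn) - s(end)                % ~0: the minimum in (7) is sigma_n, attained at x = v_n

Each of the three displayed differences should be of the order of the machine precision.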
2.1 Condition Numbers

The principle of Wilkinson states that the result of a numerical computation is the result of an exact computation for a slightly perturbed problem. This result allows us to estimate the influence of finite arithmetic.

A problem is said to be well conditioned if the results do not differ too much when solving a perturbed problem. For an ill-conditioned problem the solution of a perturbed problem may be very different. Consider a system of linear equations $Ax = b$ with $A$ an $n \times n$ nonsingular matrix, and a perturbed system $(A + \varepsilon E)x(\varepsilon) = b$, where $\varepsilon$ is small, e.g.