Regularized Matrix Computations

Andrew E. Yagle
Department of EECS, The University of Michigan, Ann Arbor, MI 48109-2122

Abstract: We review the basic results on: (1) the singular value decomposition (SVD); (2) sensitivity and conditioning of solutions of linear systems of equations; (3) regularization; and (4) iterative solution of linear systems of equations. These are applied to the specific problem of computing a matrix null vector.

I. SINGULAR VALUE DECOMPOSITION

A. Basics

The singular value decomposition (SVD) of any $(M \times N)$ matrix $A$ is

\[
A = U S V^H, \qquad
S = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & \sigma_N \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \tag{1}
\]

where $U^H U = I$ ($U$ is $M \times M$), $V^H V = I$ ($V$ is $N \times N$), $S$ is $M \times N$, and $\sigma_1 \ge \cdots \ge \sigma_N$.

• $S$ is depicted in (1) for $M > N$ ("tall" matrix $A$);
• If $M < N$ use the transpose of the $S$ in (1) (for a "reclining" matrix $A$);
• Important: $S$ has the same dimensions as $A$;
• This also specifies the sizes of $U$ and $V$.

Geometrically, the SVD shows that any linear operator can be regarded as the following:

1. An orthonormal rotation $V^H$, followed by
2. Scaling the rotated axes by $\sigma_i$, where
3. Singular values $\sigma_i \ge 0$, followed by
4. An orthonormal rotation $U$.

An example of a singular value decomposition:

\[
\begin{bmatrix} 1.704 & 0.128 \\ -0.928 & 1.104 \end{bmatrix}
=
\begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0.96 & 0.28 \\ -0.28 & 0.96 \end{bmatrix}^T
\]

B. Computation

The SVD can be computed by solving the following two eigenvalue problems:

\[
A A^H = (U S V^H)(V S^H U^H) = U (S S^H) U^H, \qquad
A^H A = (V S^H U^H)(U S V^H) = V (S^H S) V^H
\]
\[
\rightarrow \quad (A A^H) u_i = \sigma_i^2 u_i \quad \text{and} \quad (A^H A) v_i = \sigma_i^2 v_i \tag{2}
\]

• $U$ is the matrix of eigenvectors for $A A^H$;
• $V$ is the matrix of eigenvectors for $A^H A$;
• This is why $A = U S V^H$ instead of $A = U S V$;
• $A A^H$ and $A^H A$ have the same positive eigenvalues $\sigma_i^2$;
• If $M \ne N$, there are some extra zero eigenvalues.

Although this is not how the SVD is actually computed, it will do here.

Using Matlab [U,S,V]=svd(A) on the example above changes the signs of the last column of $U$ and the last row of $V^H$ (the last column of $V$). You should be able to see why this makes no difference.

The null vector of overdetermined $A$ is the last column of $V$ if $M > N$ and the singular values are in decreasing order (usually, but check first!).

For convenience, this paper will omit pathological cases such as repeated eigenvalues, defective (non-diagonalizable) matrices, and the like.

II. APPLICATIONS OF THE SVD

Inserting the SVD $A = U S V^H$ into the linear system $Ax = b$ results in

\[
Ax = (U S V^H) x = b \quad \rightarrow \quad S (V^H x) = (U^H b) \tag{3}
\]

This gives the solutions to the following 4 problems.

A. Over-Determined Systems

If $M > N$, the system $Ax = b$ is overdetermined (more equations than unknowns). Then there is almost surely no solution to the system $Ax = b$. However, the $x$ that minimizes (recall $U$ is unitary)

\[
\|Ax - b\| = \|U^H (U S V^H) x - U^H b\| = \|S (V^H x) - U^H b\| \tag{4}
\]

is easily seen to be the solution to the first $N$ rows of (3), since we can solve the first $N$ rows exactly, while the last $M - N$ rows will always have zero on the left side, so the best we can do is replace the right side with zero as well.

To obtain a closed-form solution, premultiply (3) by $V S^H$. This multiplies the last $M - N$ rows by zero, yielding

\[
S (V^H x) = (U^H b) \quad \rightarrow \quad (V (S^H S) V^H) x = (V S^H U^H) b \quad \rightarrow \quad (A^H A) x = A^H b \tag{5}
\]

whose solution is given by the pseudo-inverse or Penrose inverse of $A$.

A special case is finding the null vector of an overdetermined system. We solve $(A^H A) x = 0$, i.e., find the eigenvector of $A^H A$ associated with its zero eigenvalue. From the definition of the SVD above, $x$ is the column $V_N$ of $V$ (NOT $V^H$) associated with minimum singular value $\sigma_N = 0$. Indeed, the rank of $A$ can be found by counting the number of non-zero $\sigma_i$.

B. Under-Determined Systems

If $M = N$ and $\sigma_N > 0$, then the system is nonsingular and the solution is

\[
x = V \, \mathrm{diag}[1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_N] \, U^H b \tag{6}
\]

"If real life were only like this!" - Woody Allen line in Annie Hall (Oscar winner for Best Picture of 1977, beating out Star Wars).

If $M < N$, the system $Ax = b$ is underdetermined (more unknowns than equations). Then there is an infinite number of solutions. However, the $x$ that minimizes $\|x\|$ has the final $(N - M)$ values of $V^H x$ equal to zero, since these values get multiplied by zero in (3) anyway, and having them be nonzero would only increase $\|x\|$.

To obtain a closed-form solution, premultiply (3) by $V S^H (U^H U)(S S^H)^{-1}$, yielding

\[
S (V^H x) = (U^H b) \quad \rightarrow \quad V [S^H (S S^H)^{-1} S] V^H x = [V S^H U^H][U (S S^H)^{-1} U^H] b \tag{7}
\]

which, with the artful placement of parentheses above, simplifies to

\[
V \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} V^H x = x = A^H (A A^H)^{-1} b \tag{8}
\]

since $S$ is diagonal and the final $(N - M)$ values of $V^H x$ are constrained to be zero. I haven't seen that formula since my graduate school days. Implement:

\[
(A A^H) z = b; \qquad x = A^H z \quad \rightarrow \quad V^H x = S^H U^H z \tag{9}
\]

so the final $(N - M)$ values of $V^H x$ will indeed be zero.

III. SENSITIVITY AND CONDITIONING

A. What's the Problem?

Consider the solution to the linear system

\[
\frac{1}{60}
\begin{bmatrix} 60 & 30 & 20 \\ 30 & 20 & 15 \\ 20 & 15 & 12 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} b \\ c \\ d \end{bmatrix} \tag{10}
\]

The matrix is only $(3 \times 3)$, is symmetric, and (apart from the factor 1/60) has integer elements. But note that while

\[
\begin{bmatrix} b \\ c \\ d \end{bmatrix} = \begin{bmatrix} 110 \\ 65 \\ 47 \end{bmatrix}
\quad \rightarrow \quad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 60 \\ 60 \\ 60 \end{bmatrix} \tag{11}
\]

a slight change in the right side yields

\[
\begin{bmatrix} b \\ c \\ d \end{bmatrix} = \begin{bmatrix} 111 \\ 65 \\ 47 \end{bmatrix}
\quad \rightarrow \quad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 69 \\ 24 \\ 90 \end{bmatrix} \tag{12}
\]

Slightly different right sides yield very different solutions! This means:

• The solution to this problem will be sensitive to noise in the data [b c d]';
• Worse, how do we know whether the true data is b=110 or b=111?
• The very concept of a "right" answer is in question!

In fact, even though the matrix is nonsingular, for practical purposes it may as well be singular, since there are any number of solutions that could be the "right" one.

Another example, which illustrates some ideas:

\[
\begin{bmatrix} 1 & 1000 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 1.00 \\ 0.00 \end{bmatrix}
\quad \rightarrow \quad
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{13}
\]

But changing the right side "data" slightly gives

\[
\begin{bmatrix} 1 & 1000 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 1.00 \\ 0.01 \end{bmatrix}
\quad \rightarrow \quad
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -9 \\ 0.01 \end{bmatrix} \tag{14}
\]

Is data so good that an error of 0.01 is impossible?

B. Symptoms of the Problem

That 1000 suggests a problem, but in fact that by itself isn't it. How can we determine when a problem will be super-sensitive? These look like symptoms, but aren't:

• Large elements like 1000? No: plenty of matrices have large elements without being super-sensitive;
• The determinant? No: the determinant=1;
• The eigenvalues? No: both eigenvalues are one (just like the identity matrix).

These don't look like symptoms, but are:

• Although the matrix has determinant=1, changing the lower left value from 0 to 0.001 makes the determinant=0 and the eigenvalues 0 and 2;
• The singular values are 0.001 and 1000, and their ratio is 1 million.

C. Condition Number

We have the following two results on the sensitivity of the solution to a linear system of equations to perturbations in either the right-hand side or elements of the matrix. Define the condition number

\[
\kappa(A) = \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\| = \sigma_1 / \sigma_N \tag{15}
\]

where $\|A\| = \|A\|_2 = \sigma_1$ = maximum singular value of $A$. For the above matrix, $\kappa(A) = 1000/0.001 = 10^6$. Now:

• Consider the linear system $Ax = b$;
• Perturb $A$ to $A + \Delta A$, or
• Perturb $b$ to $b + \delta b$, resulting in
• a perturbation of $x$ to $x + \delta x$, so that
• $(A + \Delta A)(x + \delta x) = b$ or $A(x + \delta x) = (b + \delta b)$.
• Then the normalized = relative = percentage error in the computed $x$ is [1, p. 194-5]:

A. Truncated SVD

Without loss of generality, scale the problem so $\sigma_1 \gg 1 \gg \sigma_N$. Going right to left, rewrite (6):

\[
x = \sum_{i=1}^{N} v_i \, (u_i^H b / \sigma_i) \tag{17}
\]

which shows that the component of $x$ in the direction $v_N$ is most sensitive to noise in the data, since its magnification factor $1/\sigma_N$ is largest. However, what is magnified is the component of the data in direction $u_N$, not $v_N$.

Truncated SVD simply truncates the sum in (17) at $i = (N - K)$, where the smallest $K$ singular values $\sigma_{N-K+1}, \ldots, \sigma_N < \epsilon$ for some threshold $\epsilon$. Any noise in the data is magnified so much that these terms $v_i (u_i^H b / \sigma_i)$ are meaningless anyway, so we may as well eliminate them and accept the loss.
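As a rough illustration of the truncated sum in (17), the following Python sketch applies it to the 2 × 2 example of (13) and (14). It is only a sketch under stated assumptions: the function names `svd_2x2` and `tsvd_solve` and the threshold `eps = 0.01` are hypothetical choices made here, not taken from the paper, and the SVD is obtained from a closed-form rotation that works only for real, nonsingular 2 × 2 matrices (a general-purpose SVD routine would be used in practice).

```python
import math

def svd_2x2(a):
    """Closed-form SVD of a real, nonsingular 2x2 matrix (illustrative helper).

    Returns (us, sigmas, vs) with sigmas in decreasing order, such that
    a = sum_i sigmas[i] * outer(us[i], vs[i]).
    """
    (a11, a12), (a21, a22) = a
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    r = a12 * a12 + a22 * a22
    # Rotation angle that diagonalizes A^T A; (cos, sin) belongs to the
    # larger eigenvalue, so the singular values come out in decreasing order.
    theta = 0.5 * math.atan2(2.0 * q, p - r)
    c, s = math.cos(theta), math.sin(theta)
    vs = [(c, s), (-s, c)]
    us, sigmas = [], []
    for v in vs:
        # A v_i = sigma_i u_i, so sigma_i is the length of A v_i.
        av = (a11 * v[0] + a12 * v[1], a21 * v[0] + a22 * v[1])
        sigma = math.hypot(av[0], av[1])
        sigmas.append(sigma)
        us.append((av[0] / sigma, av[1] / sigma))
    return us, sigmas, vs

def tsvd_solve(a, b, eps):
    """Evaluate the sum in (17), skipping terms whose singular value is below eps."""
    us, sigmas, vs = svd_2x2(a)
    x = [0.0, 0.0]
    for u, sigma, v in zip(us, sigmas, vs):
        if sigma < eps:
            continue  # truncated term: noise here would be magnified by 1/sigma
        coeff = (u[0] * b[0] + u[1] * b[1]) / sigma
        x[0] += coeff * v[0]
        x[1] += coeff * v[1]
    return x

A = [[1.0, 1000.0], [0.0, 1.0]]
b_clean = [1.00, 0.00]   # right side of (13)
b_noisy = [1.00, 0.01]   # right side of (14)

# eps = 0 keeps every term, i.e. the ordinary solution (6).
full_clean = tsvd_solve(A, b_clean, eps=0.0)
full_noisy = tsvd_solve(A, b_noisy, eps=0.0)
# eps = 0.01 (assumed threshold) drops the term with sigma of about 0.001.
trunc_clean = tsvd_solve(A, b_clean, eps=0.01)
trunc_noisy = tsvd_solve(A, b_noisy, eps=0.01)

print("full:     ", full_clean, full_noisy)
print("truncated:", trunc_clean, trunc_noisy)
```

With all terms kept, the two right sides reproduce the widely different solutions of (13) and (14). With the small singular value truncated, the two solutions nearly coincide, at the price of no longer matching either untruncated solution; this is the loss the text says we accept.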
