The Eigenvalue Problem: Perturbation Theory
Jim Lambers
MAT 610 Summer Session 2009-10
Lecture 13 Notes

These notes correspond to Sections 7.2 and 8.1 in the text.

The Unsymmetric Eigenvalue Problem

Just as the problem of solving a system of linear equations $Ax = b$ can be sensitive to perturbations in the data, the problem of computing the eigenvalues of a matrix can also be sensitive to perturbations in the matrix. We will now obtain some results concerning the extent of this sensitivity.

Suppose that $A$ is obtained by perturbing a diagonal matrix $D$ by a matrix $F$ whose diagonal entries are zero; that is, $A = D + F$. If $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector $x$, then we have

    $(D - \lambda I)x + Fx = 0.$

If $\lambda$ is not equal to any of the diagonal entries of $A$, then $D - \lambda I$ is nonsingular and we have

    $x = -(D - \lambda I)^{-1} F x.$

Taking $\infty$-norms of both sides, we obtain

    $\|x\|_\infty = \|(D - \lambda I)^{-1} F x\|_\infty \le \|(D - \lambda I)^{-1} F\|_\infty \|x\|_\infty,$

which yields

    $\|(D - \lambda I)^{-1} F\|_\infty = \max_{1 \le i \le n} \sum_{j=1, j \ne i}^n \frac{|f_{ij}|}{|d_{ii} - \lambda|} \ge 1.$

It follows that for some $i$, $1 \le i \le n$, $\lambda$ satisfies

    $|d_{ii} - \lambda| \le \sum_{j=1, j \ne i}^n |f_{ij}|.$

That is, $\lambda$ lies within one of the Gerschgorin circles in the complex plane, which has center $a_{ii}$ and radius

    $r_i = \sum_{j=1, j \ne i}^n |a_{ij}|.$

This result is known as the Gerschgorin Circle Theorem.

Example The eigenvalues of the matrix

    $A = \begin{bmatrix} -5 & -1 & 1 \\ -2 & 2 & -1 \\ 1 & -3 & 7 \end{bmatrix}$

are

    $\lambda(A) = \{6.4971, 2.7930, -5.2902\}.$

The Gerschgorin disks are

    $D_1 = \{z \in \mathbb{C} : |z - 7| \le 4\}, \quad D_2 = \{z \in \mathbb{C} : |z - 2| \le 3\}, \quad D_3 = \{z \in \mathbb{C} : |z + 5| \le 2\}.$

We see that each disk contains one eigenvalue. $\Box$

It is important to note that while there are $n$ eigenvalues and $n$ Gerschgorin disks, it is not necessarily true that each disk contains an eigenvalue. The Gerschgorin Circle Theorem only states that all of the eigenvalues are contained within the union of the disks.
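The containment statement is easy to verify numerically. The following sketch, using the $3 \times 3$ example matrix above, checks that every computed eigenvalue lies in the union of the Gerschgorin disks:

```python
import numpy as np

# The 3x3 example matrix from the notes.
A = np.array([[-5.0, -1.0,  1.0],
              [-2.0,  2.0, -1.0],
              [ 1.0, -3.0,  7.0]])

eigvals = np.linalg.eigvals(A)

# Gerschgorin disk i: center a_ii, radius r_i = sum of |off-diagonal|
# entries in row i.
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)

# Every eigenvalue must lie in the union of the disks.
for lam in eigvals:
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
```

Note that the check is against the union of the disks, not against a particular disk, in keeping with the statement of the theorem.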
Another useful sensitivity result that applies to diagonalizable matrices is the Bauer-Fike Theorem, which states that if

    $X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$

and $\mu$ is an eigenvalue of a perturbed matrix $A + E$, then

    $\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \kappa_p(X) \|E\|_p,$

where $\kappa_p(X) = \|X\|_p \|X^{-1}\|_p$. That is, $\mu$ is within $\kappa_p(X) \|E\|_p$ of an eigenvalue of $A$. It follows that if $A$ is "nearly non-diagonalizable", which can be the case if its eigenvectors are nearly linearly dependent, then a small perturbation in $A$ could still cause a large change in the eigenvalues.

It would be desirable to have a concrete measure of the sensitivity of an eigenvalue, just as we have the condition number that measures the sensitivity of a system of linear equations. To that end, we assume that $\lambda$ is a simple eigenvalue of an $n \times n$ matrix $A$ that has Jordan canonical form $J = X^{-1} A X$. Then $\lambda = J_{ii}$ for some $i$, and $x_i$, the $i$th column of $X$, is a corresponding right eigenvector.

If we define $Y = X^{-H} = (X^{-1})^H$, then $y_i$, the $i$th column of $Y$, is a left eigenvector of $A$ corresponding to $\lambda$. From $Y^H X = I$, it follows that $y^H x = 1$. We now let $A$, $\lambda$, and $x$ be functions of a parameter $\epsilon$ that satisfy

    $A(\epsilon) x(\epsilon) = \lambda(\epsilon) x(\epsilon), \quad A(\epsilon) = A + \epsilon F, \quad \|F\|_2 = 1.$

Differentiating with respect to $\epsilon$, and evaluating at $\epsilon = 0$, yields

    $Fx + A x'(0) = \lambda x'(0) + \lambda'(0) x.$

Taking the inner product of both sides with $y$ yields

    $y^H F x + y^H A x'(0) = \lambda y^H x'(0) + \lambda'(0) y^H x.$

Because $y$ is a left eigenvector corresponding to $\lambda$, and $y^H x = 1$, we have

    $y^H F x + \lambda y^H x'(0) = \lambda y^H x'(0) + \lambda'(0).$

We conclude that

    $|\lambda'(0)| = |y^H F x| \le \|y\|_2 \|F\|_2 \|x\|_2 = \|y\|_2 \|x\|_2.$

However, because $\theta$, the angle between $x$ and $y$, satisfies

    $\cos\theta = \frac{y^H x}{\|y\|_2 \|x\|_2} = \frac{1}{\|y\|_2 \|x\|_2},$

it follows that

    $|\lambda'(0)| \le \frac{1}{|\cos\theta|}.$

We define $1/|\cos\theta|$ to be the condition number of the simple eigenvalue $\lambda$. We require $\lambda$ to be simple because otherwise, the angle between the left and right eigenvectors is not unique, because the eigenvectors themselves are not unique.
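The quantity $1/|\cos\theta| = \|x\|_2 \|y\|_2$ (with $y^H x = 1$) can be computed directly. A minimal sketch, assuming the eigenvalues are simple so that $X$ is invertible (the helper name `eig_condition_numbers` and the $2 \times 2$ test matrix are made up for illustration):

```python
import numpy as np

def eig_condition_numbers(A):
    """Condition numbers 1/|cos(theta)| of the eigenvalues of A (assumed
    simple), where theta is the angle between matching left and right
    eigenvectors."""
    lam, X = np.linalg.eig(A)        # columns of X are right eigenvectors
    Y = np.linalg.inv(X).conj().T    # columns of Y are left eigenvectors; Y^H X = I
    conds = []
    for i in range(len(lam)):
        x, y = X[:, i], Y[:, i]
        # With y^H x = 1, the condition number 1/|cos theta| is ||x|| ||y||.
        conds.append(np.linalg.norm(x) * np.linalg.norm(y) / abs(y.conj() @ x))
    return lam, np.array(conds)

# A symmetric matrix is normal: left and right eigenvectors coincide,
# so every eigenvalue is perfectly conditioned (condition number 1).
lam, conds = eig_condition_numbers(np.array([[2.0, 1.0],
                                             [1.0, 2.0]]))
```

For a symmetric matrix the condition numbers all equal 1, in agreement with the perturbation results for the symmetric eigenvalue problem later in these notes.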
It should be noted that the condition number is also defined by $1/|y^H x|$, where $x$ and $y$ are normalized so that $\|x\|_2 = \|y\|_2 = 1$; either way, the condition number is equal to $1/|\cos\theta|$. The interpretation of the condition number is that an $O(\epsilon)$ perturbation in $A$ can cause an $O(\epsilon/|\cos\theta|)$ perturbation in the eigenvalue $\lambda$. Therefore, if $x$ and $y$ are nearly orthogonal, a large change in the eigenvalue can occur. Furthermore, if the condition number is large, then $A$ is close to a matrix with a multiple eigenvalue.

Example The matrix

    $A = \begin{bmatrix} 3.1482 & -0.2017 & -0.5363 \\ 0.4196 & 0.5171 & 1.0888 \\ 0.3658 & -1.7169 & 3.3361 \end{bmatrix}$

has a simple eigenvalue $\lambda = 1.9833$ with left and right eigenvectors

    $x = \begin{bmatrix} 0.4150 & 0.6160 & 0.6696 \end{bmatrix}^T, \quad y = \begin{bmatrix} -7.9435 & 83.0701 & -70.0066 \end{bmatrix}^T$

such that $y^H x = 1$. It follows that the condition number of this eigenvalue is $\|x\|_2 \|y\|_2 = 108.925$. In fact, the nearby matrix

    $B = \begin{bmatrix} 3.1477 & -0.2023 & -0.5366 \\ 0.4187 & 0.5169 & 1.0883 \\ 0.3654 & -1.7176 & 3.3354 \end{bmatrix}$

has a double eigenvalue that is equal to 2. $\Box$

We now consider the sensitivity of repeated eigenvalues. First, it is important to note that while the eigenvalues of a matrix $A$ are continuous functions of the entries of $A$, they are not necessarily differentiable functions of the entries. To see this, we consider the matrix

    $A = \begin{bmatrix} 1 & a \\ \epsilon & 1 \end{bmatrix},$

where $a > 0$. Computing its characteristic polynomial

    $\det(A - \lambda I) = \lambda^2 - 2\lambda + 1 - a\epsilon$

and computing its roots yields the eigenvalues $\lambda = 1 \pm \sqrt{a\epsilon}$. Differentiating these eigenvalues with respect to $\epsilon$ yields

    $\frac{d\lambda}{d\epsilon} = \pm\frac{1}{2}\sqrt{\frac{a}{\epsilon}},$

which is undefined at $\epsilon = 0$. In general, an $O(\epsilon)$ perturbation in $A$ causes an $O(\epsilon^{1/p})$ perturbation in an eigenvalue associated with a $p \times p$ Jordan block, meaning that the "more defective" an eigenvalue is, the more sensitive it is.

We now consider the sensitivity of eigenvectors, or, more generally, invariant subspaces of a matrix $A$, such as a subspace spanned by the first $k$ Schur vectors, which are the first $k$ columns in a matrix $Q$ such that $Q^H A Q$ is upper triangular.
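The $O(\epsilon^{1/p})$ behavior is easy to observe numerically. The following sketch (with arbitrarily chosen values of $a$ and $\epsilon$) perturbs the $2 \times 2$ matrix from the example above, whose unperturbed form is a single Jordan block with $p = 2$:

```python
import numpy as np

# Perturbing [[1, a], [0, 1]] (one 2x2 Jordan block, p = 2) in its (2,1)
# entry by eps splits the double eigenvalue 1 into 1 +/- sqrt(a*eps):
# an O(sqrt(eps)) change caused by an O(eps) perturbation.
a = 1.0
for eps in [1e-4, 1e-6, 1e-8]:
    A = np.array([[1.0, a],
                  [eps, 1.0]])
    lam = np.linalg.eigvals(A)
    spread = abs(lam[0] - lam[1])    # approximately 2*sqrt(a*eps)
    print(f"eps = {eps:.0e}, eigenvalue spread = {spread:.3e}")
```

Each hundredfold decrease in $\epsilon$ shrinks the eigenvalue spread by only a factor of ten, matching the square-root dependence.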
Suppose that an $n \times n$ matrix $A$ has the Schur decomposition

    $A = Q T Q^H, \quad Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \quad T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix},$

where $Q_1$ is $n \times r$ and $T_{11}$ is $r \times r$. We define the separation between the matrices $T_{11}$ and $T_{22}$ by

    $\mathrm{sep}(T_{11}, T_{22}) = \min_{X \ne 0} \frac{\|T_{11} X - X T_{22}\|_F}{\|X\|_F}.$

It can be shown that an $O(\epsilon)$ perturbation in $A$ causes an $O(\epsilon/\mathrm{sep}(T_{11}, T_{22}))$ perturbation in the invariant subspace spanned by the columns of $Q_1$.

We now consider the case where $r = 1$, meaning that $Q_1$ is actually a vector $q_1$, which is also an eigenvector, and $T_{11}$ is the corresponding eigenvalue, $\lambda$. Then, writing the $1 \times (n-1)$ matrix $X$ as $y^H$, we have

    $\mathrm{sep}(\lambda, T_{22}) = \min_{X \ne 0} \frac{\|\lambda X - X T_{22}\|_F}{\|X\|_F} = \min_{\|y\|_2 = 1} \|y^H (T_{22} - \lambda I)\|_2 = \min_{\|y\|_2 = 1} \|(T_{22} - \lambda I)^H y\|_2 = \sigma_{\min}((T_{22} - \lambda I)^H) = \sigma_{\min}(T_{22} - \lambda I),$

since the Frobenius norm of a vector is equivalent to the vector 2-norm. Because the smallest singular value indicates the distance to a singular matrix, $\mathrm{sep}(\lambda, T_{22})$ provides a measure of the separation of $\lambda$ from the other eigenvalues of $A$. It follows that eigenvectors are more sensitive to perturbation if the corresponding eigenvalues are clustered near one another. That is, eigenvectors associated with nearby eigenvalues are "wobbly".

It should be emphasized that there is no direct relationship between the sensitivity of an eigenvalue and the sensitivity of its corresponding invariant subspace. The sensitivity of a simple eigenvalue depends on the angle between its left and right eigenvectors, while the sensitivity of an invariant subspace depends on the clustering of the eigenvalues. Therefore, a sensitive eigenvalue, one that is nearly defective, can be associated with an insensitive invariant subspace if it is distant from other eigenvalues, while an insensitive eigenvalue can have a sensitive invariant subspace if it is very close to other eigenvalues.

The Symmetric Eigenvalue Problem

In the symmetric case, the Gerschgorin circles become Gerschgorin intervals, because the eigenvalues of a symmetric matrix are real.
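For $r = 1$ the derivation above reduces sep to a smallest singular value, which is cheap to compute. A minimal sketch (the helper name `sep_scalar` and the triangular block `T22` are made-up examples, not from the notes):

```python
import numpy as np

def sep_scalar(lam, T22):
    """sep(lam, T22) = sigma_min(T22 - lam*I) for the r = 1 case."""
    n = T22.shape[0]
    # Singular values are returned in descending order; take the smallest.
    return np.linalg.svd(T22 - lam * np.eye(n), compute_uv=False)[-1]

# T22 holds the "other" eigenvalues {1.1, 5.0} of a Schur form.
T22 = np.array([[1.1, 0.5],
                [0.0, 5.0]])

# lambda = 1.0 is close to the eigenvalue 1.1: sep is small, so the
# corresponding eigenvector is sensitive to perturbations in A.
print(sep_scalar(1.0, T22))
# lambda = -4.0 is far from both eigenvalues: sep is much larger.
print(sep_scalar(-4.0, T22))
```

The small sep for $\lambda = 1.0$ reflects the clustering with $1.1$; note also that sep can be even smaller than the eigenvalue gap itself when $T_{22}$ is non-normal.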
Example The eigenvalues of the $3 \times 3$ symmetric matrix

    $A = \begin{bmatrix} -10 & -3 & 2 \\ -3 & 4 & -2 \\ 2 & -2 & 14 \end{bmatrix}$

are

    $\lambda(A) = \{14.6515, 4.0638, -10.7153\}.$

The Gerschgorin intervals are

    $D_1 = \{x \in \mathbb{R} : |x - 14| \le 4\}, \quad D_2 = \{x \in \mathbb{R} : |x - 4| \le 5\}, \quad D_3 = \{x \in \mathbb{R} : |x + 10| \le 5\}.$

We see that each interval contains one eigenvalue. $\Box$

The characterization of the eigenvalues of a symmetric matrix as constrained maxima of the Rayleigh quotient leads to the following results about the eigenvalues of a perturbed symmetric matrix. As the eigenvalues are real, and therefore can be ordered, we denote by $\lambda_i(A)$ the $i$th largest eigenvalue of $A$.

Theorem (Wielandt-Hoffman) If $A$ and $A + E$ are $n \times n$ symmetric matrices, then

    $\sum_{i=1}^n (\lambda_i(A + E) - \lambda_i(A))^2 \le \|E\|_F^2.$

It is also possible to bound the distance between individual eigenvalues of $A$ and $A + E$.

Theorem If $A$ and $A + E$ are $n \times n$ symmetric matrices, then

    $\lambda_n(E) \le \lambda_k(A + E) - \lambda_k(A) \le \lambda_1(E).$

Furthermore,

    $|\lambda_k(A + E) - \lambda_k(A)| \le \|E\|_2.$

The second inequality in the above theorem follows directly from the first, as the 2-norm of the symmetric matrix $E$, being equal to its spectral radius, must be equal to the larger of the absolute values of $\lambda_1(E)$ and $\lambda_n(E)$.
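Both bounds can be checked numerically. The following sketch uses a random symmetric $A$ and a small random symmetric perturbation $E$ (the sizes and scaling are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric A and small symmetric perturbation E.
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
N = 0.01 * rng.standard_normal((5, 5))
E = (N + N.T) / 2

# eigvalsh returns ascending eigenvalues; reverse so lambda_1 is largest.
lam_A  = np.linalg.eigvalsh(A)[::-1]
lam_AE = np.linalg.eigvalsh(A + E)[::-1]

# Wielandt-Hoffman: sum of squared eigenvalue shifts is at most ||E||_F^2.
assert np.sum((lam_AE - lam_A) ** 2) <= np.linalg.norm(E, 'fro') ** 2 + 1e-12

# Individual bound: each shift is at most ||E||_2 in absolute value.
assert np.max(np.abs(lam_AE - lam_A)) <= np.linalg.norm(E, 2) + 1e-12
```

Matching the $k$th largest eigenvalue of $A$ with the $k$th largest of $A + E$ is essential here; the bounds do not hold for arbitrary pairings.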