1 Inverse Iteration and the QR Method 2 Shifting Gears

Bindel, Fall 2019 Matrix Computation 2019-10-30 1 Inverse iteration and the QR method When we discussed the power method, we found that we could improve convergence by a spectral transformation that mapped the eigenvalue we wanted to something with large magnitude (preferably much larger than the other eigenvalues). This was the shift-invert strategy. We already know there is a connection leading from the power method to orthogonal iteration to the QR method, which we can summarize with a small number of formulas. Let us see if we can follow the same path to uncover a connection from inverse iteration (the power method with A−1, a special case of shift-invert in which the shift is zero) to QR. If we call the orthogonal factors in orthogonal iteration Q(k) (Q(0) = I) and the iterates in QR iteration A(k), we have (1) Ak = Q(k)R(k) (2) A(k) = (Q(k))∗A(Q(k)): In particular, note that because R(k) are upper triangular, k (k) (k) A e1 = (Q e1)r11 ; that is, the first column of Q(k) corresponds to the kth step of power iteration starting at e1. What happens when we consider negative powers of A? Inverting (1), we find A−k = (R(k))−1(Q(k))∗ The matrix R~(k) = (R(k))−1 is again upper triangular; and if we look carefully, we can see in this fact another power iteration: ∗ −k ∗ ~(k) (k) ∗ (k) (k) ∗ enA = enR (Q ) =r ~nn (Q en) : That is, the last column of Q(k) corresponds to a power iteration converging to a row eigenvector of A−1. 2 Shifting gears The connection from inverse iteration to orthogonal iteration (and thus to QR iteration) gives us a way to incorporate the shift-invert strategy into QR Bindel, Fall 2019 Matrix Computation iteration: simply run QR on the matrix A − σI, and the (n; n) entry of the iterates (which corresponds to a Rayleigh quotient with an increasingly-good approximate row eigenvector) should start to converge to λ − σ, where λ is the eigenvalue nearest σ. Put differently, we can run the iteration: Q(k)R(k) = A(k−1) − σI A(k) = R(k)Q(k) + σI: If we choose a good shift, then the lower right corner entry of A(k) should converge to the eigenvalue closest to σ in fairly short order, and the rest of the elements in the last row should converge to zero. The shift-invert power iteration converges fastest when we choose a shift that is close to the eigenvalue that we want. We can do even better if we choose a shift adaptively, which was the basis for running Rayleigh quotient iteration. The same idea is the basis for the shifted QR iteration: (k) (k) (k−1) (3) Q R = A − σkI (k) (k) (k) (4) A = R Q + σkI: This iteration is equivalent to computing Yn (k) (k) Q R = (A − σjI) j=1 A(k) = (Q(k))∗A(Q(k)) Q(k) = Q(k)Q(k−1) :::Q(1): What should we use for the shift parameters σk? A natural choice is ∗ (k−1) (k) ∗ (k) to use σk = enA en, which is the same as σk = (Q en) A(Q en), the Rayleigh quotient based on the last column of Q(k). This simple shifted QR iteration is equivalent to running Rayleigh iteration starting from an initial vector of en, which we noted before is locally quadratically convergent. 3 Double trouble The simple shift strategy we described in the previous section gives local quadratic convergence, but it is not globally convergent. As a particularly pesky example, consider what happens if we want to compute a complex Bindel, Fall 2019 Matrix Computation conjugate pair of eigenvalues of a real matrix. With our simple shifting strategy, the QR iteration never produce a complex iterate, a complex shift, or a complex eigenvalue. The best we can hope for is that our initial shift is closer to both eigenvalues in the conjugate pair than it is to anything else in the spectrum; in this case, we will most likely find that the last two columns of Q(k) are converging to a basis for an invariant row subspace of A, and the corresponding eigenvalues are the eigenvalues of the trailing 2-by-2 sub-block. Fortunately, we know how to compute the eigenvalues of a 2-by-2 matrix! This suggests the following shift strategy: let σk be one of the eigenvalues of A(k)(n−1 : n; n−1 : n). Because this 2-by-2 problem can have complex roots even when the matrix is real, this shift strategy allows the possibility that we could converge to complex eigenvalues. On the other hand, if our original matrix is real, perhaps we would like to consider the real Schur form, in which U is a real matrix and T is block diagonal with 1-by-1 and 2-by-2 diagonal blocks that correspond, respectively, to real and complex eigenvalues. If we shift with both roots of A(k)(n − 1 : n; n − 1 : n), equivalent to computing (k) (k) (k−1) (k−1) Q R = (A − σk+I)(A − σk−) A(k) = (Q(k))∗A(k−1)Q(k): There is one catch here: even if we started with A(0) in Hessenberg form, it is unclear how to do this double-shift step in O(n2) time! The following fact will prove our salvation: if we Q and V are both orthogonal matrices and QT AQ and V T AV are both (unreduced) Hessenberg1) and the first column of Q is the same as the first column of V , then all suc- cessive columns of Q are unit scalar multiples of the corresponding columns of V . This is the implicit Q theorem. Practically, it means that we can do any sort of shifted QR step we would like in the following way: 1. Apply as a similarity any transformations in the QR decomposition that affect the leading submatrix (1-by-1 or 2-by-2). 2. Restore the resulting matrix to Hessenberg form without further transformations to the leading submatrix. In the first step, we effectively compute the first columnof Q; in the second step, we effectively compute the remaining columns. Certainly we compute 1 An unreduced Hessenberg matrix has no zeros on the first subdiagonal. Bindel, Fall 2019 Matrix Computation some transformation with the right leading column; and the implicit Q theorem tells us that any such transformation is basically the one we would have computed with an ordinary QR step. Last time, we discussed the Wilkinson strategy of choosing as a shift one of the roots of the trailing 2-by-2 submatrix of A(k) (the one closest to the final entry). We also noted that if we want to convert to real Schur form, the Wilkinson shift has the distinct disadvantage that it might launch us into the complex plane. The Francis shift strategy is to simultaneously apply a complex conjugate pair of shifts, essentially computing two steps together: (k) (k) (k−1) (k−1) Q R = (A − σkI)(A − σ¯kI) (k−1) 2 (k−1) 2 = (A ) − 2<(σk)A + jσkj I A(k) = (Q(k))∗A(k−1)(Q(k)): When the Wilkinson shift is real, we let σk be the same as the Wilkinson shift; when the Wilkinson strategy leads to a conjugate pair of possible shifts, we use both, maintaining efficiency by doing the steps implicitly. Let’s now make this implicit magic a little more explicit by building code for an implicit double-shift QR step. Our first step will be to construct the polynomial associated withthe Francis double-shift. In the case where the trailing 2-by-2 submatrix (or 2- by-2 block Rayleigh quotient, if one prefers) has a complex pair of eigenvalues, we just use its characteristic polynomial. Otherwise, we use the polynomial associated with two steps with a Wilkinson shift. 1 % [b,c] = francis_poly(H) 2 % 3 % Compute b, c s.t. z^2 + b*z + c = (z-sigma)(z-conj(sigma)) 4 % where sigma is the Francis double shift for H. 5 % 6 function [b,c] = francis_poly(H) 7 8 % Get shifts via trailing submatrix 9 HH = H(end-1:end,end-1:end); 10 trHH = HH(1,1)+HH(2,2); 11 detHH = HH(1,1)*HH(2,2)-HH(1,2)*HH(2,1); 12 13 if trHH^2 > 4*detHH % Real eigenvalues 14 15 % Use the one closer to H(n,n) 16 lHH(1) = (trHH + sqrt(trHH^2-4*detHH))/2; Bindel, Fall 2019 Matrix Computation 17 lHH(2) = (trHH - sqrt(trHH^2-4*detHH))/2; 18 if abs(lHH(1)-H(end,end)) < abs(lHH(2)-H(end,end)) 19 lHH(2) = lHH(1); 20 else 21 lHH(1) = lHH(2); 22 end 23 24 % z^2 + bz + c = (z-sigma_1)(z-sigma_2) 25 b = -lHH(1)-lHH(2); 26 c = lHH(1)*lHH(2); 27 28 else 29 30 % In the complex case, we want the char poly for HH 31 b = -trHH; 32 c = detHH; 33 34 end The code francis_poly gives us coefficients bk and ck for a quadratic 2 function sk(z) = z + bkz + ck. We now want to compute (k) (k) (k−1) (k−1) 2 (k−1) Q R = sk(A ) = (A ) + bkA + ckI A(k) = (Q(k))∗A(k−1)(Q(k)): The trick is to realize that all the iterates A(k) are Hessenberg, and the Hessenberg form for a matrix is usually unique (up to signs).

Load more