Section 5.3: Other Krylov Subspace Methods
Jim Lambers
March 15, 2021

• So far, we have only seen Krylov subspace methods for symmetric positive definite matrices. We now generalize these methods to arbitrary square invertible matrices.

Minimum Residual Methods

• In the derivation of the Conjugate Gradient Method, we made use of the Krylov subspace

$$\mathcal{K}(r_1, A, k) = \operatorname{span}\{r_1, Ar_1, A^2 r_1, \ldots, A^{k-1} r_1\}$$

to obtain a solution of $Ax = b$, where $A$ was a symmetric positive definite matrix and $r_1 = b - Ax^{(0)}$ was the residual associated with the initial guess $x^{(0)}$.

• Specifically, the Conjugate Gradient Method generated a sequence of approximate solutions $x^{(k)}$, $k = 1, 2, \ldots$, where each iterate had the form

$$x^{(k)} = x^{(0)} + Q_k y_k = x^{(0)} + \sum_{j=1}^{k} q_j y_{jk}, \quad k = 1, 2, \ldots, \tag{1}$$

and the columns $q_1, q_2, \ldots, q_k$ of $Q_k$ formed an orthonormal basis of the Krylov subspace $\mathcal{K}(b, A, k)$.

• In this section, we examine whether a similar approach can be used for solving $Ax = b$ even when the matrix $A$ is not symmetric positive definite.

• That is, we seek a solution $x^{(k)}$ of the form of equation (1), where again the columns of $Q_k$ form an orthonormal basis of the Krylov subspace $\mathcal{K}(b, A, k)$.

• To generate this basis, we cannot simply generate a sequence of orthogonal polynomials using a three-term recurrence relation, as before, because the bilinear form

$$\langle f, g \rangle = r_1^T f(A) g(A) r_1$$

does not satisfy the requirements for a valid inner product when $A$ is not symmetric positive definite.

• Therefore, we must use a different approach. To that end, we make use of Gram-Schmidt Orthogonalization.

• Suppose that we have already generated a sequence of orthonormal vectors $q_1, q_2, \ldots, q_k$ that span $\mathcal{K}(b, A, k)$.

• To obtain a vector $q_{k+1}$ such that $q_1, q_2, \ldots, q_{k+1}$ is an orthonormal basis for $\mathcal{K}(b, A, k+1)$, we first compute $Aq_k$, and then orthogonalize this vector against $q_1, q_2, \ldots, q_k$. This is accomplished as follows:

$$v_k = Aq_k - \sum_{j=1}^{k} q_j (q_j^T A q_k), \qquad q_{k+1} = \frac{v_k}{\|v_k\|_2}.$$

• It can be verified directly that for $i = 1, 2, \ldots, k$,

$$q_i^T v_k = q_i^T A q_k - \sum_{j=1}^{k} q_i^T q_j (q_j^T A q_k) = q_i^T A q_k - q_i^T q_i (q_i^T A q_k) = 0,$$

so the vectors $q_1, q_2, \ldots, q_{k+1}$ do indeed form an orthonormal set.

• To begin the sequence, we define $v_0 = r_1 = b - Ax^{(0)}$ and let $q_1 = v_0 / \|v_0\|_2$.

• If we define

$$h_{ij} = q_i^T A q_j, \quad i = 1, 2, \ldots, j, \qquad h_{j+1,j} = \|v_j\|_2,$$

it follows from the orthogonalization process that

$$\begin{aligned}
v_j &= Aq_j - \sum_{i=1}^{j} q_i (q_i^T A q_j) \\
v_j &= Aq_j - \sum_{i=1}^{j} q_i h_{ij} \\
v_j + \sum_{i=1}^{j} q_i h_{ij} &= Aq_j \\
\|v_j\|_2 q_{j+1} + \sum_{i=1}^{j} q_i h_{ij} &= Aq_j \\
Aq_j &= \sum_{i=1}^{j} h_{ij} q_i + h_{j+1,j} q_{j+1}, \quad j = 1, 2, \ldots, k,
\end{aligned}$$

or, in matrix form, from $AQ_k = \begin{bmatrix} Aq_1 & Aq_2 & \cdots & Aq_k \end{bmatrix}$,

$$AQ_k = Q_k H_k + h_{k+1,k} q_{k+1} e_k^T = Q_{k+1} \tilde{H}_k, \tag{2}$$

where $H_k$, as well as $\tilde{H}_k$, is an upper Hessenberg matrix:

$$H_k = \begin{bmatrix}
h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\
h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\
0 & h_{32} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{k,k-1} & h_{kk}
\end{bmatrix}, \qquad
\tilde{H}_k = \begin{bmatrix} H_k \\ h_{k+1,k} e_k^T \end{bmatrix}.$$

That is, a matrix is upper Hessenberg if all entries below the subdiagonal are zero.

• ALWAYS remember this: a linear combination of column vectors can be viewed as a matrix-vector product! Also, a "horizontal stacking" of such linear combinations can be viewed as a matrix-matrix product.

• Example: for $j = 2, 3$,

$$\sum_{i=1}^{2} q_i h_{i2} = \begin{bmatrix} q_1 & q_2 \end{bmatrix} \begin{bmatrix} h_{12} \\ h_{22} \end{bmatrix}, \qquad
\sum_{i=1}^{3} q_i h_{i3} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} h_{13} \\ h_{23} \\ h_{33} \end{bmatrix}.$$

Assembling these side by side:

$$\begin{bmatrix} \sum_{i=1}^{2} q_i h_{i2} & \sum_{i=1}^{3} q_i h_{i3} \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} h_{12} & h_{13} \\ h_{22} & h_{23} \\ 0 & h_{33} \end{bmatrix}.$$
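The orthogonalization process above translates directly into code. The following is a minimal NumPy sketch, with illustrative names of my choosing; note that the inner loop uses the modified Gram-Schmidt variant, which is equivalent to the formulas above in exact arithmetic but better behaved in floating point.

```python
import numpy as np

def arnoldi(A, b, k):
    """Build an orthonormal basis q_1, ..., q_{k+1} of K(b, A, k+1) and the
    (k+1) x k upper Hessenberg matrix H~ satisfying A Q_k = Q_{k+1} H~."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)        # q_1 = v_0 / ||v_0||_2
    for j in range(k):
        v = A @ Q[:, j]                    # start from A q_j
        for i in range(j + 1):             # orthogonalize against q_1, ..., q_j
            H[i, j] = Q[:, i] @ v          # h_{ij} = q_i^T A q_j (modified G-S)
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)    # h_{j+1,j} = ||v_j||_2
        if H[j + 1, j] == 0.0:             # breakdown: the Krylov subspace is invariant
            return Q[:, :j + 1], H[:j + 1, :j]
        Q[:, j + 1] = v / H[j + 1, j]      # q_{j+1} = v_j / ||v_j||_2
    return Q, H
```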
• To find the vector $y_k$ such that $x^{(k)} = x^{(0)} + Q_k y_k$ is an approximate solution to $Ax = b$, we aim to minimize the norm of the residual:

$$\begin{aligned}
\|r_{k+1}\|_2 &= \|b - Ax^{(k)}\|_2 \\
&= \|b - Ax^{(0)} - AQ_k y_k\|_2 \\
&= \|v_0 - AQ_k y_k\|_2 \\
&= \left\| \|v_0\|_2 q_1 - Q_{k+1} \tilde{H}_k y_k \right\|_2 \\
&= \left\| Q_{k+1} \left( \|v_0\|_2 e_1 - \tilde{H}_k y_k \right) \right\|_2 \\
&= \left\| h_{10} e_1 - \tilde{H}_k y_k \right\|_2,
\end{aligned}$$

where $h_{10} = \|v_0\|_2$; the last step uses the fact that the columns of $Q_{k+1}$ are orthonormal, so that $\|Q_{k+1} z\|_2 = \|z\|_2$ for any vector $z$.

• This is an example of a full-rank least-squares problem, which can be solved using various approaches that are discussed in Chapter 4.

• We now present the algorithm for solving $Ax = b$ by minimizing the norm of the residual $r_{k+1} = b - Ax^{(k)}$. This method is the Generalized Minimum Residual (GMRES) Method; a code sketch is given at the end of this subsection.

Algorithm. (Generalized Minimum Residual Method) Let $A \in \mathbb{R}^{n \times n}$ be nonsingular, and let $b \in \mathbb{R}^n$. The following algorithm computes the solution $x$ of the system of linear equations $Ax = b$.

    Choose an initial guess $x^{(0)} \in \mathbb{R}^n$
    $v_0 = b - Ax^{(0)}$
    $h_{10} = \|v_0\|_2$
    $j = 0$
    while $h_{j+1,j} > TOL$
        $q_{j+1} = v_j / h_{j+1,j}$
        $j = j + 1$
        $v_j = Aq_j$
        for $i = 1, 2, \ldots, j$
            $h_{ij} = q_i^T v_j$
            $v_j = v_j - h_{ij} q_i$
        end for
        $h_{j+1,j} = \|v_j\|_2$
        $x^{(j)} = x^{(0)} + Q_j y_j$, where $\|h_{10} e_1 - \tilde{H}_j y_j\|_2$ = minimum
    end while

• The iteration that produces the orthonormal basis $q_1, q_2, \ldots$ for the Krylov subspace $\mathcal{K}(b, A, j)$ is known as Arnoldi Iteration.

• Now, suppose that $A$ is symmetric. From Arnoldi Iteration, we have

$$AQ_k = Q_k H_k + \beta_k q_{k+1} e_k^T, \qquad \beta_k = h_{k+1,k}.$$

• It follows from the orthonormality of the columns of $Q_k$ that $H_k = Q_k^T A Q_k$, and therefore $H_k^T = Q_k^T A^T Q_k$.

• Therefore, if $A$ is symmetric, so is $H_k$; but since $H_k$ is also upper Hessenberg, this means that $H_k$ is a tridiagonal matrix, which we refer to as $T_k$.

• In fact, $T_k$ is the very tridiagonal matrix obtained by applying Lanczos Iteration to $A$ with initial vector $b$. This can be seen from the fact that both algorithms generate an orthonormal basis for the Krylov subspace $\mathcal{K}(v_0, A, k)$.

• Therefore, to solve $Ax = b$ when $A$ is symmetric, we can perform Lanczos Iteration to compute $T_k$, for $k = 1, 2, \ldots$, and at each iteration solve the least squares problem

$$\|b - Ax^{(k)}\|_2 = \|\beta_0 e_1 - \tilde{T}_k y_k\|_2 = \text{minimum},$$

where $\beta_0 = \|b - Ax^{(0)}\|_2$ and $\tilde{T}_k$ is defined in the same way as $\tilde{H}_k$.

• This algorithm is known as the MINRES (Minimum Residual) Method.

• One key difference between MINRES and GMRES is that in GMRES, the computation of $q_{k+1}$ depends on $q_1, q_2, \ldots, q_k$, whereas in MINRES, it depends only on $q_{k-1}$ and $q_k$, due to the three-term recurrence relation satisfied by the Lanczos vectors.

• It follows that the solution of the least squares problem during each iteration becomes substantially more expensive in GMRES, whereas this is not the case for MINRES, in which the approximate solution $x^{(k)}$ can be updated cheaply from one iteration to the next.

• The consequence of this added expense is that GMRES must occasionally be restarted in order to contain the cost of solving these least squares problems.
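For concreteness, here is a minimal sketch of GMRES built on the arnoldi function above. The function name, the fixed iteration count, and the use of numpy.linalg.lstsq for the small least squares problem are my choices; as noted above, a practical solver would instead update a factorization of $\tilde{H}_j$ incrementally and monitor the residual at each step.

```python
def gmres(A, b, x0, k):
    """GMRES sketch (reuses arnoldi from above): take k Arnoldi steps on the
    initial residual, then minimize ||h10 e1 - H~_k y||_2 over y and form
    x^(k) = x^(0) + Q_k y."""
    v0 = b - A @ x0                              # r_1 = b - A x^(0)
    h10 = np.linalg.norm(v0)
    Q, H = arnoldi(A, v0, k)                     # Q_{k+1} and H~_k
    rhs = np.zeros(H.shape[0])
    rhs[0] = h10                                 # right-hand side h10 * e1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)  # full-rank least squares
    return x0 + Q[:, :H.shape[1]] @ y

# Illustrative use on a shifted random matrix (shift keeps it far from singular):
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 10 * np.eye(50)
b = rng.standard_normal(50)
x = gmres(A, b, np.zeros(50), k=30)
print(np.linalg.norm(b - A @ x))                 # residual norm shrinks as k grows
```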
The Biconjugate Gradient Method

• As we have seen, our first attempt to generalize the Conjugate Gradient Method to general matrices, GMRES, has the drawback that the computational expense of each iteration increases.

• To work around this problem, we consider whether it is possible to generalize Lanczos Iteration to unsymmetric matrices in such a way as to generate a Krylov subspace basis using a short recurrence relation, such as the three-term recurrence relation that is used in Lanczos Iteration.

• This is possible, if we give up orthogonality.

• First, we let $r_1 = b - Ax^{(0)}$ and $\hat{r}_1 = b - A^T x^{(0)}$, where $x^{(0)}$ is our initial guess.

• Then, we build bases for two Krylov subspaces,

$$\mathcal{K}(r_1, A, k) = \operatorname{span}\{r_1, Ar_1, A^2 r_1, \ldots, A^{k-1} r_1\},$$
$$\mathcal{K}(\hat{r}_1, A^T, k) = \operatorname{span}\{\hat{r}_1, A^T \hat{r}_1, (A^T)^2 \hat{r}_1, \ldots, (A^T)^{k-1} \hat{r}_1\},$$

that are biorthogonal, as opposed to a single basis being orthogonal.

• That is, we construct a sequence $q_1, q_2, \ldots$ that spans a Krylov subspace generated by $A$, and a sequence $\hat{q}_1, \hat{q}_2, \ldots$ that spans a Krylov subspace generated by $A^T$, such that $\hat{q}_i^T q_j = \delta_{ij}$.

• In other words, if we define

$$Q_k = \begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}, \qquad \hat{Q}_k = \begin{bmatrix} \hat{q}_1 & \hat{q}_2 & \cdots & \hat{q}_k \end{bmatrix},$$

then $\hat{Q}_k^T Q_k = I$.

• Since these bases span Krylov subspaces, it follows that their vectors have the form

$$q_j = p_{j-1}(A) r_1, \qquad \hat{q}_j = \hat{p}_{j-1}(A^T) \hat{r}_1,$$

where $p_i$ and $\hat{p}_i$ are polynomials of degree $i$.

• Then, using the framework of Gram-Schmidt Orthogonalization, we have

$$\beta_j q_{j+1} = Aq_j - \sum_{i=1}^{j} h_{ij} q_i, \qquad h_{ij} = \hat{q}_i^T A q_j.$$

• To ensure biorthogonality, we require that $p_i$ be formally orthogonal to all polynomials of lesser degree.

• That is, $\langle p_i, g \rangle = 0$ for $g \in P_{i-1}$, where $P_n$ is the space of polynomials of degree at most $n$, and

$$\langle f, g \rangle = \hat{r}_1^T f(A) g(A) r_1 \tag{3}$$

is our bilinear form with which we define formal orthogonality.

• We say formal orthogonality because $\langle \cdot, \cdot \rangle$ does not actually define a valid inner product, due to $A$ not being symmetric positive definite.

• Suppose that $i < j - 1$. Then

$$h_{ij} = \hat{q}_i^T A q_j = \hat{r}_1^T [\hat{p}_{i-1}(A^T)]^T A p_{j-1}(A) r_1 = \langle \lambda \hat{p}_{i-1}(\lambda), p_{j-1}(\lambda) \rangle = 0,$$

since $\lambda \hat{p}_{i-1}(\lambda)$ has degree $i < j - 1$, and $p_{j-1}$ is formally orthogonal to all polynomials of lesser degree.

• It follows that the $q_j$ and $\hat{q}_j$ can be obtained using three-term recurrence relations, as in Lanczos Iteration; a sketch of such a two-sided recurrence is given below.
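The notes assert that three-term recurrences suffice but stop short of stating them, so the following NumPy sketch should be read as one common formulation of two-sided (unsymmetric) Lanczos biorthogonalization rather than as this section's own algorithm. The scaling of $q_{j+1}$ and $\hat{q}_{j+1}$ is a convention that varies between references, and all names are illustrative.

```python
def lanczos_biortho(A, r1, r1_hat, k):
    """Two-sided Lanczos sketch: builds Q = [q_1 ... q_k] spanning
    K(r1, A, k) and Qhat = [q^_1 ... q^_k] spanning K(r1_hat, A^T, k),
    with Qhat^T Q = I, using only three-term recurrences."""
    n = len(r1)
    Q = np.zeros((n, k))
    Qhat = np.zeros((n, k))
    s = r1_hat @ r1                          # must be nonzero to start
    Q[:, 0] = r1 / np.sqrt(abs(s))           # scale the pair so q^_1^T q_1 = 1
    Qhat[:, 0] = np.sign(s) * r1_hat / np.sqrt(abs(s))
    beta = delta = 0.0                       # coefficients carried from the previous step
    for j in range(k - 1):
        alpha = Qhat[:, j] @ (A @ Q[:, j])
        # Three-term recurrences: only the two most recent vectors are needed.
        v = A @ Q[:, j] - alpha * Q[:, j] - (beta * Q[:, j - 1] if j > 0 else 0.0)
        w = A.T @ Qhat[:, j] - alpha * Qhat[:, j] - (delta * Qhat[:, j - 1] if j > 0 else 0.0)
        s = w @ v
        if s == 0:                           # "serious breakdown": the pairing fails
            return Q[:, :j + 1], Qhat[:, :j + 1]
        delta = np.sqrt(abs(s))              # scalings chosen so that delta * beta = w^T v
        beta = s / delta
        Q[:, j + 1] = v / delta              # q_{j+1}
        Qhat[:, j + 1] = w / beta            # q^_{j+1}
    return Q, Qhat

# Biorthogonality check on a random unsymmetric matrix (illustrative):
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
Q, Qhat = lanczos_biortho(A, rng.standard_normal(6), rng.standard_normal(6), 4)
print(np.round(Qhat.T @ Q, 8))               # approximately the identity matrix
```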