Section 5.3: Other Krylov Subspace Methods

Jim Lambers

March 15, 2021

• So far, we have only seen Krylov subspace methods for symmetric positive definite matrices.

• We now generalize these methods to arbitrary square invertible matrices.

Minimum Residual Methods

• In the derivation of the Conjugate Gradient Method, we made use of the Krylov subspace

$$\mathcal{K}(r_1, A, k) = \operatorname{span}\{r_1, Ar_1, A^2 r_1, \ldots, A^{k-1} r_1\}$$

to obtain a solution of $Ax = b$, where $A$ was symmetric positive definite and $r_1 = b - Ax^{(0)}$ was the residual associated with the initial guess $x^{(0)}$.

• Specifically, the Conjugate Gradient Method generated a sequence of approximate solutions x(k), k = 1, 2,..., where each iterate had the form

$$x^{(k)} = x^{(0)} + Q_k y_k = x^{(0)} + \sum_{j=1}^{k} q_j y_{jk}, \qquad k = 1, 2, \ldots, \tag{1}$$

and the columns q1, q2,..., qk of Qk formed an orthonormal basis of the Krylov subspace K(b, A, k).

• In this section, we examine whether a similar approach can be used for solving Ax = b even when the matrix A is not symmetric positive definite.

• That is, we seek a solution $x^{(k)}$ of the form of equation (1), where again the columns of $Q_k$ form an orthonormal basis of the Krylov subspace $\mathcal{K}(b, A, k)$.

• To generate this basis, we cannot simply generate a sequence of orthogonal polynomials using a three-term recurrence relation, as before, because the bilinear form

$$\langle f, g \rangle = r_1^T f(A) g(A) r_1$$

does not satisfy the requirements for a valid inner product when $A$ is not symmetric positive definite.

• Therefore, we must use a different approach.

• To that end, we make use of Gram-Schmidt Orthogonalization.

• Suppose that we have already generated a sequence of orthonormal vectors q1, q2,..., qk that span K(b, A, k).

• To obtain a vector $q_{k+1}$ such that $q_1, q_2, \ldots, q_{k+1}$ is an orthonormal basis for $\mathcal{K}(b, A, k+1)$, we first compute $Aq_k$, and then orthogonalize this vector against $q_1, q_2, \ldots, q_k$.

• This is accomplished as follows:
$$v_k = Aq_k - \sum_{j=1}^{k} q_j (q_j^T A q_k), \qquad q_{k+1} = \frac{v_k}{\|v_k\|_2}.$$

• It can be verified directly that for $i = 1, 2, \ldots, k$,
$$q_i^T v_k = q_i^T A q_k - \sum_{j=1}^{k} q_i^T q_j (q_j^T A q_k) = q_i^T A q_k - q_i^T q_i (q_i^T A q_k) = 0,$$

so the vectors $q_1, q_2, \ldots, q_{k+1}$ do indeed form an orthonormal set.

• To begin the sequence, we define $v_0 = r_1 = b - Ax^{(0)}$ and let $q_1 = v_0 / \|v_0\|_2$.

• If we define
$$h_{ij} = q_i^T A q_j, \quad i = 1, 2, \ldots, j, \qquad h_{j+1,j} = \|v_j\|_2,$$
it follows from the orthogonalization process that
$$
\begin{aligned}
v_j &= Aq_j - \sum_{i=1}^{j} q_i (q_i^T A q_j) \\
v_j &= Aq_j - \sum_{i=1}^{j} q_i h_{ij} \\
v_j + \sum_{i=1}^{j} q_i h_{ij} &= Aq_j \\
\|v_j\|_2\, q_{j+1} + \sum_{i=1}^{j} q_i h_{ij} &= Aq_j \\
Aq_j &= \sum_{i=1}^{j} h_{ij} q_i + h_{j+1,j}\, q_{j+1}, \qquad j = 1, 2, \ldots, k,
\end{aligned}
$$
or, in matrix form, from $AQ_k = \begin{bmatrix} Aq_1 & Aq_2 & \cdots & Aq_k \end{bmatrix}$,
$$AQ_k = Q_k H_k + h_{k+1,k}\, q_{k+1} e_k^T = Q_{k+1} \tilde{H}_k, \tag{2}$$

where $H_k$, as well as $\tilde{H}_k$, is an upper Hessenberg matrix:
$$H_k = \begin{bmatrix}
h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\
h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\
0 & h_{32} & \ddots & & \vdots \\
\vdots & 0 & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{k,k-1} & h_{kk}
\end{bmatrix}, \qquad
\tilde{H}_k = \begin{bmatrix} H_k \\ h_{k+1,k} e_k^T \end{bmatrix}.$$
That is, a matrix is upper Hessenberg if all entries below the subdiagonal are zero.
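As a concrete check of relation (2), here is a minimal numpy sketch (not from the notes; the function name `arnoldi`, the random matrix, and the starting vector are illustrative choices) that carries out the orthogonalization above and verifies $AQ_k = Q_{k+1}\tilde{H}_k$ numerically.

```python
import numpy as np

def arnoldi(A, b, k):
    """Build an orthonormal basis Q_{k+1} of K(b, A, k+1) and the
    (k+1)-by-k upper Hessenberg matrix H_tilde via Gram-Schmidt."""
    n = b.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v        # h_{ij} = q_i^T A q_j
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)  # h_{j+1,j} = ||v_j||_2
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
b = rng.standard_normal(8)
Q, H = arnoldi(A, b, 5)
print(np.allclose(A @ Q[:, :5], Q @ H))  # A Q_k == Q_{k+1} H_tilde_k
```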

• ALWAYS remember this: a linear combination of column vectors can be viewed as a matrix-vector product! Also, a “horizontal stacking” of such linear combinations can be viewed as a matrix-matrix product.

• Example: for $j = 2, 3$,
$$\sum_{i=1}^{2} q_i h_{i2} = \begin{bmatrix} q_1 & q_2 \end{bmatrix} \begin{bmatrix} h_{12} \\ h_{22} \end{bmatrix}, \qquad
\sum_{i=1}^{3} q_i h_{i3} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} h_{13} \\ h_{23} \\ h_{33} \end{bmatrix},$$
and assembling these columns,
$$\begin{bmatrix} \sum_{i=1}^{2} q_i h_{i2} & \sum_{i=1}^{3} q_i h_{i3} \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} h_{12} & h_{13} \\ h_{22} & h_{23} \\ 0 & h_{33} \end{bmatrix}.$$
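A tiny numpy illustration of this viewpoint (the matrices here are arbitrary, chosen only for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
Q3 = rng.standard_normal((6, 3))   # columns q_1, q_2, q_3
H = np.array([[0.5, 2.0],
              [1.5, -1.0],
              [0.0, 3.0]])         # coefficient columns

# a linear combination of columns is a matrix-vector product
col = H[0, 0] * Q3[:, 0] + H[1, 0] * Q3[:, 1]
assert np.allclose(col, Q3 @ H[:, 0])

# stacking such combinations side by side is a matrix-matrix product
stacked = np.column_stack([Q3 @ H[:, 0], Q3 @ H[:, 1]])
assert np.allclose(stacked, Q3 @ H)
```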

• To find the vector $y_k$ such that $x^{(k)} = x^{(0)} + Q_k y_k$ is an approximate solution to $Ax = b$, we aim to minimize the norm of the residual,

$$
\begin{aligned}
\|r_{k+1}\|_2 &= \|b - Ax^{(k)}\|_2 \\
&= \|b - Ax^{(0)} - AQ_k y_k\|_2 \\
&= \|v_0 - AQ_k y_k\|_2 \\
&= \bigl\| \|v_0\|_2\, q_1 - Q_{k+1} \tilde{H}_k y_k \bigr\|_2 \\
&= \bigl\| Q_{k+1} \bigl( \|v_0\|_2\, e_1 - \tilde{H}_k y_k \bigr) \bigr\|_2 \\
&= \| h_{10} e_1 - \tilde{H}_k y_k \|_2,
\end{aligned}
$$

where $h_{10} = \|v_0\|_2$, and the last step uses the fact that the columns of $Q_{k+1}$ are orthonormal.

• This is an example of a full-rank least-squares problem, which can be solved using various approaches that are discussed in Chapter 4.
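As a quick sketch of one such approach (numpy's dense least-squares solver; the function name is illustrative, and efficient GMRES codes instead update a QR factorization of $\tilde{H}_k$ with Givens rotations):

```python
import numpy as np

def solve_gmres_lsq(H_tilde, h10):
    """Solve min_y || h10*e1 - H_tilde y ||_2 for the (k+1)-by-k
    upper Hessenberg matrix H_tilde."""
    rhs = np.zeros(H_tilde.shape[0])
    rhs[0] = h10
    y, *_ = np.linalg.lstsq(H_tilde, rhs, rcond=None)
    return y
```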

• We now present the algorithm for solving $Ax = b$ by minimizing the norm of the residual $r_{k+1} = b - Ax^{(k)}$.

• This method is the Generalized Minimum Residual (GMRES) Method.

Algorithm. (Generalized Minimum Residual Method) Let $A \in \mathbb{R}^{n \times n}$ be nonsingular, and let $b \in \mathbb{R}^n$. The following algorithm computes the solution $x$ of the system of linear equations $Ax = b$.

    Choose an initial guess x^(0) ∈ R^n
    v_0 = b − Ax^(0)
    h_{10} = ||v_0||_2
    j = 0
    while h_{j+1,j} > TOL
        q_{j+1} = v_j / h_{j+1,j}
        j = j + 1
        v_j = Aq_j
        for i = 1, 2, ..., j
            h_{ij} = q_i^T v_j
            v_j = v_j − h_{ij} q_i
        end for
        h_{j+1,j} = ||v_j||_2
        x^(j) = x^(0) + Q_j y_j, where ||h_{10} e_1 − H̃_j y_j||_2 = minimum
    end while
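The following is a minimal Python/numpy transcription of this algorithm, written as a sketch rather than a production implementation: it stores the full basis $Q_j$ and re-solves the small least-squares problem from scratch at every step, and the names (`gmres_simple`, `tol`, `maxiter`) are my own.

```python
import numpy as np

def gmres_simple(A, b, x0=None, tol=1e-10, maxiter=100):
    """Minimal GMRES sketch: Arnoldi iteration plus a dense least-squares
    solve of || h10*e1 - H_tilde y ||_2 at every step."""
    n = b.shape[0]
    x0 = np.zeros(n) if x0 is None else x0
    v = b - A @ x0                        # v_0 = b - A x^(0)
    h10 = np.linalg.norm(v)               # h_{10} = ||v_0||_2
    if h10 == 0:
        return x0
    Q = np.zeros((n, maxiter + 1))
    H = np.zeros((maxiter + 1, maxiter))  # H_tilde, upper Hessenberg
    Q[:, 0] = v / h10
    x = x0
    for j in range(maxiter):
        v = A @ Q[:, j]
        for i in range(j + 1):            # orthogonalize against q_1, ..., q_j
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        # solve the small least-squares problem for y_j
        rhs = np.zeros(j + 2)
        rhs[0] = h10
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)
        x = x0 + Q[:, :j + 1] @ y
        if H[j + 1, j] <= tol:            # convergence / breakdown test
            break
        Q[:, j + 1] = v / H[j + 1, j]
    return x
```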

• The iteration that produces the orthonormal basis $q_1, q_2, \ldots$ for the Krylov subspace $\mathcal{K}(b, A, j)$ is known as the Arnoldi Iteration.

• Now, suppose that A is symmetric. From Arnoldi Iteration, we have

$$AQ_k = Q_k H_k + \beta_k q_{k+1} e_k^T, \qquad \beta_k = h_{k+1,k}.$$

• It follows from the orthonormality of the columns of $Q_k$ that $H_k = Q_k^T A Q_k$, and therefore $H_k^T = Q_k^T A^T Q_k$.

• Therefore, if A is symmetric, so is Hk, but since Hk is also upper Hessenberg, this means that Hk is a tridiagonal matrix, which we refer to as Tk.

• In fact, Tk is the very tridiagonal matrix obtained by applying the Lanczos Iteration to A with initial vector b.

• This can be seen from the fact that both algorithms generate an orthonormal basis for the Krylov subspace K(A, v0, k).

• Therefore, to solve $Ax = b$ when $A$ is symmetric, we can perform Lanczos Iteration to compute $T_k$, for $k = 1, 2, \ldots$, and at each iteration solve the least squares problem of minimizing $\|b - Ax^{(k)}\|_2 = \|\beta_0 e_1 - \tilde{T}_k y_k\|_2$, where $\beta_0 = \|b - Ax^{(0)}\|_2$ and $\tilde{T}_k$ is defined in the same way as $\tilde{H}_k$.

• This algorithm is known as the MINRES (Minimum Residual) method.
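In practice one rarely codes MINRES from scratch; as a hedged usage example, SciPy provides an implementation (the random symmetric, generally indefinite matrix below is an arbitrary test case, not from the notes):

```python
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 100))
A = B + B.T                       # symmetric, but generally indefinite
b = rng.standard_normal(100)

x, info = minres(A, b)            # info == 0 signals successful convergence
print(info, np.linalg.norm(b - A @ x))
```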

• One key difference between MINRES and GMRES is that in GMRES, the computation of qk+1 depends on q1, q2,..., qk, whereas in MINRES, it only depends on qk−1 and qk, due to the three-term recurrence relation satisfied by the Lanczos vectors.

• It follows that the solution of the least squares problem during each iteration becomes substantially more expensive in GMRES, whereas this is not the case for MINRES; the approximate solution $x^{(k)}$ can be updated from one iteration to the next.

• The consequence of this added expense for GMRES is that it must occasionally be restarted in order to contain the expense of solving these least squares problems.

The Biconjugate Gradient Method

• As we have seen, our first attempt to generalize the Conjugate Gradient Method to general matrices, GMRES, has the drawback that the computational expense of each iteration increases.

• To work around this problem, we consider whether it is possible to generalize Lanczos Iteration to unsymmetric matrices in such a way as to generate a Krylov subspace basis using a short recurrence relation, such as the three-term recurrence relation that is used in Lanczos Iteration.

• This is possible, if we give up orthogonality.

• First, we let $r_1 = b - Ax^{(0)}$ and $\hat{r}_1 = b - A^T x^{(0)}$, where $x^{(0)}$ is our initial guess.

• Then, we build bases for two Krylov subspaces,

$$
\begin{aligned}
\mathcal{K}(r_1, A, k) &= \operatorname{span}\{r_1, Ar_1, A^2 r_1, \ldots, A^{k-1} r_1\}, \\
\mathcal{K}(\hat{r}_1, A^T, k) &= \operatorname{span}\{\hat{r}_1, A^T \hat{r}_1, (A^T)^2 \hat{r}_1, \ldots, (A^T)^{k-1} \hat{r}_1\},
\end{aligned}
$$

that are biorthogonal, as opposed to a single basis being orthogonal.

• That is, we construct a sequence $q_1, q_2, \ldots$ that spans a Krylov subspace generated by $A$, and a sequence $\hat{q}_1, \hat{q}_2, \ldots$ that spans a Krylov subspace generated by $A^T$, such that $\hat{q}_i^T q_j = \delta_{ij}$.

• In other words, if we define
$$Q_k = \begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}, \qquad \hat{Q}_k = \begin{bmatrix} \hat{q}_1 & \hat{q}_2 & \cdots & \hat{q}_k \end{bmatrix},$$

then $\hat{Q}_k^T Q_k = I$.

• Since these bases span Krylov subspaces, it follows that their vectors have the form

$$q_j = p_{j-1}(A)\, r_1, \qquad \hat{q}_j = \hat{p}_{j-1}(A^T)\, \hat{r}_1,$$

where $p_i$ and $\hat{p}_i$ are polynomials of degree $i$.

• Then, using the framework of Gram-Schmidt Orthogonalization, we have

$$\beta_j q_{j+1} = Aq_j - \sum_{i=1}^{j} h_{ij} q_i, \qquad h_{ij} = \hat{q}_i^T A q_j.$$

• To ensure biorthogonality, we require that pi be formally orthogonal to all polynomials of lesser degree.

• That is, $\langle p_i, g \rangle = 0$ for $g \in \mathbb{P}_{i-1}$, where $\mathbb{P}_n$ is the space of polynomials of degree at most $n$, and
$$\langle f, g \rangle = \hat{r}_1^T f(A) g(A) r_1 \tag{3}$$
is our bilinear form with which we define formal orthogonality.

• We say formal orthogonality because $\langle \cdot, \cdot \rangle$ does not actually define a valid inner product, due to $A$ not being symmetric positive definite.

• Suppose that i < j − 1. Then

$$h_{ij} = \hat{q}_i^T A q_j = \hat{r}_1^T [\hat{p}_{i-1}(A^T)]^T A\, p_{j-1}(A)\, r_1 = \langle \lambda \hat{p}_{i-1}(\lambda), p_{j-1}(\lambda) \rangle = 0.$$

• It follows that the qj and qˆj can be obtained using three-term recurrence relations, as in Lanczos Iteration.

• These recurrence relations are

$$
\begin{aligned}
\beta_j q_{j+1} &= v_j = (A - \alpha_j I) q_j - \gamma_{j-1} q_{j-1}, \\
\gamma_j \hat{q}_{j+1} &= \hat{v}_j = (A^T - \alpha_j I) \hat{q}_j - \beta_{j-1} \hat{q}_{j-1},
\end{aligned}
\qquad j = 1, 2, \ldots.
$$

• From these recurrence relations, we obtain
$$AQ_k = Q_k T_k + \beta_k q_{k+1} e_k^T, \qquad A^T \hat{Q}_k = \hat{Q}_k T_k^T + \gamma_k \hat{q}_{k+1} e_k^T,$$
where
$$T_k = \hat{Q}_k^T A Q_k = \begin{bmatrix}
\alpha_1 & \gamma_1 & & & \\
\beta_1 & \alpha_2 & \gamma_2 & & \\
& \beta_2 & \ddots & \ddots & \\
& & \ddots & \ddots & \gamma_{k-1} \\
& & & \beta_{k-1} & \alpha_k
\end{bmatrix}.$$

ˆ • This process of generating the bases Qk and Qk, along with the tridiagonal matrix Tk, is known as the Unsymmetric Lanczos Iteration.

• The full algorithm is as follows:

    v_0 = r_1
    v̂_0 = r̂_1
    q_0 = 0
    q̂_0 = 0
    γ_0 β_0 = v̂_0^T v_0
    k = 0
    while v̂_k^T v_k ≠ 0 do
        q_{k+1} = v_k / β_k
        q̂_{k+1} = v̂_k / γ_k
        k = k + 1
        α_k = q̂_k^T A q_k
        v_k = (A − α_k I) q_k − γ_{k−1} q_{k−1}
        v̂_k = (A^T − α_k I) q̂_k − β_{k−1} q̂_{k−1}
        γ_k β_k = v̂_k^T v_k
    end while
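A direct Python transcription of this iteration might look as follows; it is a sketch with no safeguards against breakdown, the names are my own, and the scaling of $\beta_k$ and $\gamma_k$ follows the "typical approach" described below.

```python
import numpy as np

def unsymmetric_lanczos(A, r1, r1_hat, kmax):
    """Sketch of the Unsymmetric Lanczos Iteration: builds biorthogonal bases
    Q, Q_hat and the entries alpha_k, beta_k, gamma_k of the tridiagonal T_k."""
    v, v_hat = r1.astype(float), r1_hat.astype(float)
    q = np.zeros_like(v)                     # q_0 = 0
    q_hat = np.zeros_like(v)                 # q_hat_0 = 0
    Q, Q_hat, alpha, beta, gamma = [], [], [], [], []
    w = v_hat @ v                            # gamma_0 * beta_0
    for _ in range(kmax):
        if w == 0:
            break                            # serious breakdown
        b_k = np.sqrt(abs(w))                # beta_k = sqrt(|v_hat^T v|)
        g_k = w / b_k                        # gamma_k = v_hat^T v / beta_k
        q_new, q_hat_new = v / b_k, v_hat / g_k
        a_k = q_hat_new @ (A @ q_new)        # alpha = q_hat^T A q
        v = A @ q_new - a_k * q_new - g_k * q
        v_hat = A.T @ q_hat_new - a_k * q_hat_new - b_k * q_hat
        q, q_hat = q_new, q_hat_new
        Q.append(q); Q_hat.append(q_hat)
        alpha.append(a_k); beta.append(b_k); gamma.append(g_k)
        w = v_hat @ v
    return np.column_stack(Q), np.column_stack(Q_hat), alpha, beta, gamma
```

Under this indexing, the returned lists give $T_k$ its diagonal (`alpha`), subdiagonal (`beta[1:]`), and superdiagonal (`gamma[1:]`), since `beta[0]` and `gamma[0]` are the scalings $\beta_0$, $\gamma_0$ used to start the sequence.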

• It should be noted that this algorithm gives flexibility in how to compute the off-diagonal entries γk and βk of Tk.

• A typical approach is to compute
$$\beta_k = \sqrt{|\hat{v}_k^T v_k|}, \qquad \gamma_k = \hat{v}_k^T v_k / \beta_k.$$

• If $v_k, \hat{v}_k \neq 0$ but $\hat{v}_k^T v_k = 0$, then the iteration suffers what is known as serious breakdown.

• The iteration cannot continue due to division by zero, and no viable approximate solution is obtained.

• If we then seek a solution to $Ax = b$ of the form $x^{(k)} = x^{(0)} + Q_k y_k$, then we can ensure that for $k \geq 1$, the residual $r_{k+1} = b - Ax^{(k)}$ is orthogonal to all columns of $\hat{Q}_k$ as follows:
$$0 = \hat{Q}_k^T r_{k+1} = \hat{Q}_k^T (b - Ax^{(k)}) = \hat{Q}_k^T (\beta_0 Q_k e_1 - A Q_k y_k) = \beta_0 e_1 - T_k y_k,$$

where for convenience we define $\beta_0 = \|r_1\|_2$.

• Therefore, during each iteration we can solve the tridiagonal system of equations Tkyk = β0e1, as in Lanczos Iteration for solving Ax = b in the case where A is symmetric positive definite.
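Because $T_k$ is tridiagonal, this system can be solved in $O(k)$ work, for instance with a banded solver; the sketch below is illustrative (the argument names are assumptions, with the sub- and superdiagonals passed as length $k-1$ arrays).

```python
import numpy as np
from scipy.linalg import solve_banded

def solve_lanczos_system(alpha, beta, gamma, beta0):
    """Solve T_k y = beta0 * e_1 for tridiagonal T_k with diagonal alpha
    (length k), subdiagonal beta and superdiagonal gamma (length k-1 each)."""
    k = len(alpha)
    ab = np.zeros((3, k))
    ab[0, 1:] = gamma        # superdiagonal
    ab[1, :] = alpha         # diagonal
    ab[2, :-1] = beta        # subdiagonal
    rhs = np.zeros(k)
    rhs[0] = beta0
    return solve_banded((1, 1), ab, rhs)
```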

• However, as before, it would be preferable if we could obtain each approximate solution x(k) through a simple update of x(k−1) using an appropriate search direction.

• Using similar manipulations to those used to derive the Conjugate Gradient Method from the (symmetric) Lanczos Iteration, we can obtain the Biconjugate Gradient Method (BiCG):

Algorithm. (Biconjugate Gradient Method) Let $A \in \mathbb{R}^{n \times n}$ be nonsingular, and let $b \in \mathbb{R}^n$. The following algorithm computes the solution $x$ of the system of linear equations $Ax = b$.

    Choose an initial guess x^(0) ∈ R^n
    r_1 = b − Ax^(0)
    r̂_1 = b − A^T x^(0)
    for k = 1, 2, ... do until convergence
        if k > 1 then
            μ_k = r̂_k^T r_k / r̂_{k−1}^T r_{k−1}
            p_k = r_k + μ_k p_{k−1}
            p̂_k = r̂_k + μ_k p̂_{k−1}
        else
            p_1 = r_1
            p̂_1 = r̂_1
        end if
        v_k = Ap_k
        ν_k = r̂_k^T r_k / p̂_k^T v_k
        x^(k) = x^(k−1) + ν_k p_k
        r_{k+1} = r_k − ν_k v_k
        r̂_{k+1} = r̂_k − ν_k A^T p̂_k
    end for
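A direct Python transcription of this algorithm, as a sketch only: there is no preconditioning or breakdown safeguard, and the stopping test on the residual norm (along with the names `bicg`, `tol`, `maxiter`) is my own addition.

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, maxiter=1000):
    """Sketch of the Biconjugate Gradient Method as listed above."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                       # r_1
    r_hat = b - A.T @ x                 # r_hat_1
    p, p_hat = r.copy(), r_hat.copy()   # p_1, p_hat_1
    rho = r_hat @ r                     # r_hat_k^T r_k
    for k in range(maxiter):
        v = A @ p
        nu = rho / (p_hat @ v)          # nu_k
        x = x + nu * p
        r = r - nu * v
        r_hat = r_hat - nu * (A.T @ p_hat)
        if np.linalg.norm(r) <= tol:
            break
        rho_new = r_hat @ r
        mu = rho_new / rho              # mu_{k+1}
        p = r + mu * p
        p_hat = r_hat + mu * p_hat
        rho = rho_new
    return x
```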

• Unfortunately, the biconjugate gradient method, as described above, is numerically unstable, even when used in conjunction with preconditioning.

• A variation of BiCG, known as BiCGSTAB, overcomes this instability.

• An excellent online resource for testing iterative methods is the Matrix Market, which, as of this writing, can be found at http://math.nist.gov/MatrixMarket.

• It features approximately 500 sparse matrices from various applications, along with code written in C, FORTRAN and Matlab for reading and writing matrices using its storage format.
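For example, matrices stored in the Matrix Market exchange format can be read in Python with SciPy (the file name below is only a placeholder):

```python
from scipy.io import mmread

A = mmread("example_matrix.mtx")   # returns a sparse matrix in COO format
print(A.shape, A.nnz)
```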
