Course Notes: Week 3, Math 270C: Applied Numerical Methods

1 Lecture 6: Conjugate Gradients (Lanczos Version) (4/11/11)

The Arnoldi iteration is called the Lanczos iteration in the case of a symmetric $A$. We will go a step further in the next few lectures and assume that $A$ is symmetric positive definite: $A = A^T$, $x^TAx > 0 \ \forall x \neq 0$. In this case, the Krylov method for solving $Ax = b$ is called Conjugate Gradients (CG). This method is much simpler than the GMRES algorithm and does not require the excessive storage of all previous Arnoldi (here Lanczos) vectors $q_i$. Also, the method does not require the increasing number of inner products necessary for constructing $H^k$. We can derive this method using what we have covered thus far for GMRES.

The first thing to note is that if we define $T^k = Q^{kT}AQ^k$, where $Q^k \in \mathbb{R}^{n \times k}$ holds the Lanczos (formerly Arnoldi) vectors, then $T^k = T^{kT}$ since $A$ is symmetric. And since $T^k$ is essentially the upper Hessenberg matrix $H^k$ from the GMRES discussion, we can see that $T^k$ is symmetric tridiagonal:
$$T^k = \begin{pmatrix} \alpha_1 & \beta_1 & 0 & \cdots & 0 \\ \beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \beta_{k-2} & \alpha_{k-1} & \beta_{k-1} \\ 0 & \cdots & 0 & \beta_{k-1} & \alpha_k \end{pmatrix}$$
The zeros above the first superdiagonal, which result from the symmetric upper Hessenberg structure, save us the inner product calculations needed in the Gram-Schmidt stage of the GMRES process. However, we will see that we can go even further and avoid storage of $Q^k$ altogether. Again, this is noteworthy because the increasingly excessive storage (and the excessive cost of the inner products) that arises when the number of iterations is large is the primary drawback of the GMRES algorithm.
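As a concrete illustration, here is a minimal numpy sketch of the Lanczos three-term recurrence (the function name lanczos and the test matrix are my own, not from the notes), together with a check that $Q^{kT}AQ^k$ is indeed symmetric tridiagonal:

    import numpy as np

    def lanczos(A, b, k):
        # k steps of the Lanczos iteration for symmetric A.
        # Returns Q (n x k) and the tridiagonal entries alpha, beta of T^k.
        n = len(b)
        Q = np.zeros((n, k))
        alpha = np.zeros(k)
        beta = np.zeros(k - 1)
        q_prev = np.zeros(n)
        beta_prev = 0.0
        v = b / np.linalg.norm(b)
        for i in range(k):
            Q[:, i] = v
            w = A @ v - beta_prev * q_prev   # three-term recurrence
            alpha[i] = v @ w
            w -= alpha[i] * v
            if i < k - 1:
                beta[i] = np.linalg.norm(w)
                beta_prev, q_prev, v = beta[i], v, w / beta[i]
        return Q, alpha, beta

    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    A = M @ M.T + 8 * np.eye(8)              # symmetric positive definite
    b = rng.standard_normal(8)
    Q, alpha, beta = lanczos(A, b, 5)
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    print(np.allclose(Q.T @ A @ Q, T))       # True (up to roundoff)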

In general for the Krylov methods with Lanczos/Arnoldi bases we write $x^k = Q^k\lambda^k$. It would be nice if we could write this as $x^k = x^{k-1} + \lambda_k q_k$ and then only require the generation/storage of the newest Lanczos vector. Unfortunately, for both CG and GMRES, this will not be possible. However, for CG we will be able to write $x^k = x^{k-1} + \alpha_k p_k$ in terms of a different basis for the Krylov space. This modification to the algorithm is what allows us to avoid the storage of the increasingly costly $Q^k$. We can derive this new basis by modifying the functional that we minimize at each iteration (e.g. for GMRES we minimized the $L^2$ norm of the residual over the kth Krylov space) to yield a symmetric positive definite tridiagonal solve at each iteration, which we can show leads to the basis vectors $p_k$. I will call this the Lanczos derivation of CG. We will also do a steepest descent inspired derivation in the following lectures.

1.1 Minimizing the A Norm of the Error

When $A$ is symmetric positive definite, it defines a norm:
$$\|x\|_A = \sqrt{x^TAx}.$$

Most importantly for us, it will be easy to minimize the $A$-norm of the error. Specifically, we will minimize the $A$-norm of the error over the Krylov space at each iteration. This will be more useful than simply minimizing the $L^2$ norm of the residual (which is equivalent to minimizing the $A^TA$ norm of the error).

The error at step $k$ is $e^k = x - x^k$. The residual is related to the error as $r^k = Ae^k$. We can see the useful equality:

$$\|e^k\|_A^2 = e^{kT}Ae^k = x^TAx - 2x^{kT}Ax + x^{kT}Ax^k = x^Tb - 2x^{kT}b + x^{kT}Ax^k = x^Tb + 2\phi(x^k)$$

where $\phi(x) = -x^Tb + \frac{1}{2}x^TAx$ and we have used $Ax = b$. So technically, in order to know the $A$-norm of the error we would need to know $x^Tb$, which we cannot know since $x$ is the solution we are after. However, the above expression shows that the $A$-norm of the error has the same minimizer as $\phi$. Therefore, we can choose our kth iterate to minimize $\phi$ rather than the $L^2$ norm of the residual. We will see that this leads to more favorable properties of the iterative scheme (namely, that we do not have to store $Q^k$).
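A quick numerical sanity check of this identity (my own example, assuming numpy): for any candidate iterate $y$, the quantities $\|x - y\|_A^2$ and $x^Tb + 2\phi(y)$ agree, so minimizing $\phi$ minimizes the $A$-norm of the error:

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 6))
    A = M @ M.T + 6 * np.eye(6)               # symmetric positive definite
    b = rng.standard_normal(6)
    x = np.linalg.solve(A, b)                 # the exact solution

    phi = lambda y: -y @ b + 0.5 * y @ A @ y  # phi(x) = -x^T b + (1/2) x^T A x

    y = rng.standard_normal(6)                # an arbitrary candidate iterate
    err_A_sq = (x - y) @ A @ (x - y)          # ||e||_A^2
    print(np.isclose(err_A_sq, x @ b + 2 * phi(y)))   # True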

In order to find $x^k$ as the minimizer of $\phi$ over $K^k$, we can use the Lanczos basis and write $x^k = Q^k\lambda^k$, where $\lambda^k$ solves $\nabla_\lambda f^k(\lambda) = 0$ with $f^k(\lambda) = \phi(Q^k\lambda)$. Therefore, $\lambda^k$ is the solution of
$$\nabla_\lambda f^k(\lambda^k) = Q^{kT}AQ^k\lambda^k - Q^{kT}b = T^k\lambda^k - \hat{b}^k = 0,$$
where $\hat{b}^k = Q^{kT}b = \|b\|_2 e_1$ since $q_1 = b/\|b\|_2$ and the later Lanczos vectors are orthogonal to $q_1$.

This is particularly useful because it is relatively easy to solve a symmetric positive definite tridiagonal system (via $LDL^T$ factorization). Notably, it is very difficult to solve this system (via factorization) if the matrix is symmetric but indefinite (i.e. if $A$ has both negative and positive eigenvalues). In other words, this approach is really only going to be possible in the symmetric positive definite case (even though $T^k$ is tridiagonal in the symmetric indefinite case as well).
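To see the difficulty concretely, here is a toy example of mine: a perfectly well-conditioned symmetric indefinite tridiagonal matrix whose first pivot is zero, so the unpivoted $LDL^T$ recurrence of the next lecture breaks down immediately:

    import numpy as np

    # Symmetric indefinite tridiagonal: alpha_1 = alpha_2 = 0, beta_1 = 1.
    T = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    print(np.linalg.eigvalsh(T))   # [-1.  1.] -> indefinite
    d1 = T[0, 0]                   # first pivot d_1 = alpha_1 = 0
    # mu_1 = beta_1 / d_1 would divide by zero: LDL^T without pivoting
    # fails here even though T itself is well conditioned.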

2 Lecture 7: Tridiagonal Factorization Properties (4/13/11)

2.1 Tridiagonal Symmetric Positive Definite LDL^T

The $LDL^T$ factorization of $T^k$ can be determined as
$$L^k = \begin{pmatrix} 1 & & & \\ \mu_1 & 1 & & \\ & \ddots & \ddots & \\ & & \mu_{k-1} & 1 \end{pmatrix}, \qquad D^k = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_k \end{pmatrix}, \qquad T^k = L^kD^kL^{kT}.$$

If you just multiply these out, you can see that the $\mu_i$ and $d_i$ can be determined as

    d_1 ← α_1
    μ_1 ← β_1 / d_1
    for i = 2 to k do
        d_i ← α_i − μ_{i-1} β_{i-1}
        μ_i ← β_i / d_i
    end for

In general, the sensitivity of this procedure to roundoff errors will be proportional to the condition number of $A$. Also, this process can fail catastrophically if $A$ is symmetric indefinite. Of course, $\beta_i$ and $\alpha_i$ do not change when $k > i$, so we will only need to compute $d_k$ and $\mu_{k-1}$ at the kth iteration.
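A direct numpy transcription of this recurrence (the function name is mine), with a check that $L^kD^kL^{kT}$ reproduces $T^k$ on a small SPD example:

    import numpy as np

    def tridiag_ldlt(alpha, beta):
        # LDL^T of a symmetric tridiagonal SPD matrix with diagonal alpha
        # and off-diagonal beta. Returns d (diagonal of D) and mu
        # (subdiagonal of L).
        k = len(alpha)
        d = np.zeros(k)
        mu = np.zeros(k - 1)
        d[0] = alpha[0]
        for i in range(1, k):
            mu[i - 1] = beta[i - 1] / d[i - 1]
            d[i] = alpha[i] - mu[i - 1] * beta[i - 1]
        return d, mu

    alpha = np.array([4.0, 4.0, 4.0])
    beta = np.array([1.0, 1.0])
    d, mu = tridiag_ldlt(alpha, beta)
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    L = np.eye(3) + np.diag(mu, -1)
    print(np.allclose(L @ np.diag(d) @ L.T, T))   # True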

2.2 Forward and Backward Substitution

The $LDL^T$ factorization is useful because it facilitates a forward and backward substitution based solution for $\lambda^k$. To do the forward substitution first, we define $p^k = L^{kT}\lambda^k$ and solve $L^kD^kp^k = \hat{b}^k$:

    s ← ||b||_2
    p^k_1 ← s / d_1
    for i = 2 to k do
        s ← s μ_{i-1}
        p^k_i ← (−1)^{i+1} s / d_i
    end for

Of course, at iteration $k$, the $p^k_i$ for $i = 1, \ldots, k-1$ are the same as those from the previous iteration, so the only new entry is $p^k_k$.
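Here is a short numpy sketch of this forward substitution (again my own transcription), checked against a dense solve of $L^kD^kp^k = \|b\|_2e_1$:

    import numpy as np

    def forward_sub_p(d, mu, bnorm):
        # Solve L D p = ||b||_2 e_1 with the sign-tracking recurrence.
        # Indices are 0-based here, so (-1)^{i+1} from the 1-based notes
        # becomes (-1)^i.
        k = len(d)
        p = np.zeros(k)
        s = bnorm
        p[0] = s / d[0]
        for i in range(1, k):
            s *= mu[i - 1]
            p[i] = (-1) ** i * s / d[i]
        return p

    alpha = np.array([4.0, 4.0, 4.0])
    beta = np.array([1.0, 1.0])
    k = len(alpha)
    d = np.zeros(k); mu = np.zeros(k - 1)    # the LDL^T recurrence from above
    d[0] = alpha[0]
    for i in range(1, k):
        mu[i - 1] = beta[i - 1] / d[i - 1]
        d[i] = alpha[i] - mu[i - 1] * beta[i - 1]

    bnorm = 2.0
    p = forward_sub_p(d, mu, bnorm)
    L = np.eye(k) + np.diag(mu, -1)
    print(np.allclose(L @ np.diag(d) @ p, bnorm * np.eye(k)[0]))   # True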

So to sum up, at the kth iteration you need to calculate
$$d_k = \alpha_k - \mu_{k-1}\beta_{k-1}, \qquad \mu_k = \frac{\beta_k}{d_k}, \qquad s \leftarrow s\,\mu_{k-1}, \qquad p^k_k = (-1)^{k+1}\frac{s}{d_k}.$$
Unfortunately, although $p^k$ only changes in its last component at each iteration, $\lambda^k$ will be completely different at each iteration. We can see this from the expression

$$L^{kT}\lambda^k = \begin{pmatrix} 1 & \mu_1 & & & \\ & 1 & \mu_2 & & \\ & & \ddots & \ddots & \\ & & & 1 & \mu_{k-1} \\ & & & & 1 \end{pmatrix}\begin{pmatrix} \lambda^k_1 \\ \lambda^k_2 \\ \vdots \\ \lambda^k_{k-1} \\ \lambda^k_k \end{pmatrix} = \begin{pmatrix} p^k_1 \\ p^k_2 \\ \vdots \\ p^k_{k-1} \\ p^k_k \end{pmatrix}$$

that although $p^k_k$ is the only new piece of information in the right-hand side of the equation, it will unfortunately affect each $\lambda^k_i$ in the back substitution process. Unlike the forward substitution for the $p^k_i$, where information propagates from small $i$ to large, the back substitution process for the $\lambda^k_i$ propagates information from large $i$ to small. And since the change in the right-hand side is at $p^k_k$, every $\lambda^k_i$ will unfortunately be affected. This means that expressing the kth iterate in terms of the Lanczos vectors ($x^k = Q^k\lambda^k$) requires that we store all previous vectors, which we will see is suboptimal for symmetric positive definite matrices.
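This is easy to see numerically (a small example of mine): appending one new entry to $p$ changes every component of $\lambda = L^{-T}p$ in the back substitution:

    import numpy as np

    mu = np.array([0.5, 0.4, 0.3])            # subdiagonal of a 4x4 L
    L = np.eye(4) + np.diag(mu, -1)
    p_old = np.array([1.0, -0.2, 0.05])       # p^{k-1}
    p_new = np.append(p_old, -0.01)           # p^k: only the last entry is new
    lam_old = np.linalg.solve(L[:3, :3].T, p_old)   # lambda^{k-1}
    lam_new = np.linalg.solve(L.T, p_new)           # lambda^k
    print(lam_old)        # [ 1.11  -0.22   0.05 ]
    print(lam_new[:3])    # every component differs from lambda^{k-1}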

3 Lecture 8: Memory Efficient Basis for the Krylov Space (4/15/11)

The insight needed to yield an algorithm that does not require the storage of all prior search directions is the derivation of a new basis for the Krylov space $K^k$. We will not see this completely until we go through the steepest descent derivation of CG, but we will derive the basis vectors now and prove their properties later. The hope for avoiding the storage of the search directions is hinted at by the tridiagonal structure of the conjugation of the matrix with the Lanczos basis for $K^k$. Unfortunately, the behavior of the $\lambda^k$ that we just discussed suggests that we need some other basis for representing $x^k$ over $K^k$. These basis vectors ($C^k = (c^k_1, c^k_2, \ldots, c^k_k) \in \mathbb{R}^{n \times k}$) are given by
$$C^k = Q^k(L^k)^{-T}.$$
We can then see that $x^k = Q^k\lambda^k = C^kp^k = (c^k_1, c^k_2, \ldots, c^k_{k-1})\,p^{k-1} + p^k_k c^k_k$. We have already shown that $p^k$ only changes in its last component (i.e. that its leading $k-1$ entries are $p^{k-1}$ from the previous iteration). Therefore, if $(c^k_1, c^k_2, \ldots, c^k_{k-1}) = C^{k-1}$, then we can see that $x^k = x^{k-1} + p^k_k c^k_k$. We will show this here.

Later we will show that $\mathrm{span}\{c_1, \ldots, c_k\} = K^k$, and so we can think of the vectors $c_1, \ldots, c_k$ as a more useful basis for the Krylov space in that they allow us to avoid excessive storage.

The following equality suggests that only the last column of $C^k$ changes at the kth iteration:

$$C^k(L^k)^T = \left[c^k_1,\; \mu_1c^k_1 + c^k_2,\; \mu_2c^k_2 + c^k_3,\; \ldots,\; \mu_{k-1}c^k_{k-1} + c^k_k\right] = Q^k$$

In other words,
$$c^k_1 = q_1, \qquad c^k_i = q_i - \mu_{i-1}c^k_{i-1} \quad \text{for } i = 2, \ldots, k.$$
Therefore, at the kth iteration, the $c^k_i$ for $i = 1, \ldots, k-1$ are all the same as in the previous iteration. Thus we can drop the superscript $k$ and just call these $c_i$, and most importantly we can say that

$$x^k = x^{k-1} + p^k_k c_k$$
and that $x^k$ is the minimizer of $\phi$ over $K^k$, without requiring storage of all previous Lanczos vectors.
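A small numpy check (my own) that the recurrence $c_i = q_i - \mu_{i-1}c_{i-1}$ really does produce the columns of $C^k = Q^k(L^k)^{-T}$:

    import numpy as np

    rng = np.random.default_rng(2)
    k = 5
    mu = rng.uniform(0.1, 0.9, k - 1)                  # subdiagonal of L^k
    Q = np.linalg.qr(rng.standard_normal((8, k)))[0]   # stand-in basis vectors
    L = np.eye(k) + np.diag(mu, -1)

    C_dense = Q @ np.linalg.inv(L.T)                   # C^k = Q^k (L^k)^{-T}
    C_rec = np.zeros_like(Q)                           # columns via recurrence
    C_rec[:, 0] = Q[:, 0]                              # c_1 = q_1
    for i in range(1, k):
        C_rec[:, i] = Q[:, i] - mu[i - 1] * C_rec[:, i - 1]
    print(np.allclose(C_dense, C_rec))                 # True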

3.1 Wrapping Up: Lanczos CG Pseudocode

Here I will outline a version of the CG algorithm suggested by the results of the previous three lectures.

    v ← b; β_{k-1} ← 0; β_k ← ||v||_2; q_k ← 0
    for k = 1 to kmax do
        q_{k-1} ← q_k
        q_k ← (1/β_k) v
        β_{k-1} ← β_k
        v ← A q_k
        α_k ← q_k^T v
        v ← v − α_k q_k − β_{k-1} q_{k-1}
        β_k ← ||v||_2
        if k == 1 then
            d_k ← α_k
            c_k ← q_k
            ρ_k ← β_{k-1} / α_k
            x ← ρ_k q_k
        else
            d_{k-1} ← d_k; ρ_{k-1} ← ρ_k; c_{k-1} ← c_k
            μ_{k-1} ← β_{k-1} / d_{k-1}
            d_k ← α_k − β_{k-1} μ_{k-1}
            c_k ← q_k − μ_{k-1} c_{k-1}
            ρ_k ← −μ_{k-1} d_{k-1} ρ_{k-1} / d_k
            x ← x + ρ_k c_k
        end if
    end for
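For completeness, here is a runnable numpy transcription of this pseudocode (my own translation; rho plays the role of $p^k_k$, and the test problem is an arbitrary SPD example), checked against a dense solve:

    import numpy as np

    def lanczos_cg(A, b, kmax):
        # CG via the Lanczos/LDL^T derivation of the last three lectures.
        n = len(b)
        x = np.zeros(n)
        q = np.zeros(n)
        c = np.zeros(n)
        v = b.astype(float)
        beta = np.linalg.norm(v)        # normalizer for the next q
        d = rho = 0.0
        for k in range(1, kmax + 1):
            q_prev = q
            q = v / beta
            beta_prev = beta            # beta_{k-1}: subdiagonal entry of T^k
            v = A @ q
            alpha = q @ v               # alpha_k
            v = v - alpha * q - beta_prev * q_prev
            beta = np.linalg.norm(v)
            if k == 1:
                d = alpha               # d_1 = alpha_1
                c = q                   # c_1 = q_1
                rho = beta_prev / alpha # p_1 = ||b|| / d_1
                x = rho * c
            else:
                mu = beta_prev / d      # mu_{k-1} = beta_{k-1} / d_{k-1}
                d_prev, rho_prev = d, rho
                d = alpha - beta_prev * mu         # d_k
                c = q - mu * c                     # c_k = q_k - mu_{k-1} c_{k-1}
                rho = -mu * d_prev * rho_prev / d  # p^k_k recurrence
                x = x + rho * c
        return x

    rng = np.random.default_rng(3)
    M = rng.standard_normal((20, 20))
    A = M @ M.T + 20 * np.eye(20)       # symmetric positive definite
    b = rng.standard_normal(20)
    print(np.linalg.norm(lanczos_cg(A, b, 20) - np.linalg.solve(A, b)))  # small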
