Iterative Methods (part 2)

Stan Tomov

Innovative Computing Laboratory, Computer Science Department, The University of Tennessee

Wednesday April 23, 2008

Outline

Part I Krylov iterative solvers

Part II Convergence and preconditioning

Part III Iterative eigen-solvers

Part I

Krylov iterative solvers

Krylov iterative solvers

Building blocks for Krylov iterative solvers covered so far

- Projection/minimization in a subspace
- Petrov-Galerkin conditions
- Least squares minimization, etc.

Orthogonalization
- CGS and MGS
- Cholesky or Householder based QR

Krylov iterative solvers

We also covered abstract formulations for iterative solvers and eigen-solvers

What are the goals of this lecture?
- Give specific examples of Krylov solvers
- Show how the examples relate to the abstract formulation
- Show how the examples relate to the building blocks covered so far, specifically to projection and orthogonalization
- But we are not going into the details!

Krylov iterative solvers

How are these techniques related to Krylov iterative solvers?

Remember projection slides 26 & 27, Lecture 7 (left)

- Projection in a subspace is the basis for an iterative method
- Here the projection is in V
- In Krylov methods V is the Krylov subspace

Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m−1} r0}

where r0 ≡ b − Ax0 and x0 is an initial guess.

- Often V or W are orthonormalized
- The projection is 'easier' to find when we work with an orthonormal basis (e.g. problem 4 from homework 5: projection in a general vs. an orthonormal basis; see the sketch below)
- The orthonormalization can be CGS, MGS, Cholesky or Householder based, etc.
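As a small illustration of the point above (this NumPy sketch is mine, not from the lecture; the 8-dimensional test vector and 3-dimensional subspace are arbitrary): with a general basis the projection requires solving the normal equations, while with an orthonormal basis it is a simple matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal(8)

# A general (non-orthonormal) basis for a 3-dimensional subspace of R^8
V = rng.standard_normal((8, 3))

# Projection of b onto span(V) with a general basis:
# solve the normal equations (V^T V) c = V^T b
c = np.linalg.solve(V.T @ V, V.T @ b)
proj_general = V @ c

# Same subspace, but with an orthonormal basis Q (from QR):
# the projection is simply Q (Q^T b) -- no linear system to solve
Q, _ = np.linalg.qr(V)
proj_orthonormal = Q @ (Q.T @ b)

print(np.allclose(proj_general, proj_orthonormal))  # True
```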

Krylov Iterative Methods

To summarize, Krylov iterative methods in general

expand the Krylov subspace by a matrix-vector product, and do a projection in it.

Various methods result by specific choices of the expansion and projection.

Krylov Iterative Methods

A specific example with the

Conjugate Gradient Method (CG)

Conjugate Gradient Method

The method is for SPD matrices

Both V and W are the Krylov subspaces, i.e. at iteration i

V ≡ W ≡ Ki(A, r0) ≡ span{r0, A r0, ..., A^{i−1} r0}

The projection xi ∈ Ki (A, r0) satisfies the Petrov-Galerkin conditions

(A xi, φ) = (b, φ)  for all φ ∈ Ki(A, r0)

Conjugate Gradient Method (continued)

At every iteration there is a way (to be shown later) to construct a new search direction pi such that

span{p0, p1, ..., pi} ≡ Ki+1(A, r0)  and  (A pi, pj) = 0 for i ≠ j.

Note: A is SPD ⇒ (A pi, pj) ≡ (pi, pj)A can be used as an inner product, i.e. p0, ..., pi is an (·, ·)A orthogonal basis for Ki+1(A, r0)

⇒ we can easily find xi+1 ≈ x as

xi+1 = x0 + α0 p0 + ··· + αi pi   s.t.

(A xi+1, pj) = (b, pj)  for j = 0, ..., i

Namely, because of the (·, ·)A orthogonality of p0, ..., pi, at iteration i+1 we have to find only αi:

(A xi+1, pi) = (A(xi + αi pi), pi) = (b, pi)  ⇒  αi = (ri, pi) / (A pi, pi)

Note: xi above can actually be replaced by any x0 + v, v ∈ Ki(A, r0). (Why?)

Conjugate Gradient Method (continued)

Conjugate Gradient Method

1: Compute r0 = b − A x0 for some initial guess x0
2: for i = 0 to ... do
3:   ρi = ri^T ri
4:   if i = 0 then
5:     p0 = r0
6:   else
7:     pi = ri + (ρi / ρi−1) pi−1
8:   end if
9:   qi = A pi
10:  αi = ρi / (pi^T qi)
11:  xi+1 = xi + αi pi
12:  ri+1 = ri − αi qi
13:  check convergence; continue if necessary
14: end for

Notes:
- One matrix-vector product per iteration (at line 9)
- Two inner products per iteration (lines 3 and 10)
- In exact arithmetic ri+1 = b − A xi+1 (apply A to both sides of line 11 and subtract from b to get line 12)
- The update for xi+1 is as pointed out before, i.e.
  αi = (ri, ri) / (A pi, pi) = (ri, pi) / (A pi, pi)
  since (ri, pi−1) = 0 (exercise)
- Other relations to be proved (exercise): the pi's span the Krylov space; the pi's are (·, ·)A orthogonal, etc.
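A minimal NumPy sketch of the algorithm above, assuming a dense SPD matrix; the function name `cg` and the random test problem are my own additions.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxiter=1000):
    """Plain CG for SPD A, restructured from the algorithm on this slide."""
    x = x0.copy()
    r = b - A @ x                    # line 1: r0 = b - A x0
    p = r.copy()                     # line 5: p0 = r0
    rho = r @ r                      # line 3: rho_i = r_i^T r_i
    for _ in range(maxiter):
        q = A @ p                    # line 9: the one matrix-vector product
        alpha = rho / (p @ q)        # line 10: alpha_i = rho_i / p_i^T q_i
        x = x + alpha * p            # line 11
        r = r - alpha * q            # line 12
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:   # line 13: convergence check on ||r||
            break
        p = r + (rho_new / rho) * p  # line 7: next A-orthogonal direction
        rho = rho_new
    return x

# Example on a small SPD system
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)        # SPD by construction
b = np.ones(50)
x = cg(A, b, np.zeros(50))
print(np.linalg.norm(b - A @ x))
```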

Conjugate Gradient Method (continued)

To sum it up: in exact arithmetic we get the exact solution in at most n steps, i.e.

x = x0 + α0p0 + ··· + αi pi + αi+1pi+1 + ··· + αn−1pn−1

At every iteration one more term αj pj is added to the current approximation

xi = x0 + α0p0 + ··· + αi−1pi−1

xi+1 = x0 + α0 p0 + ··· + αi−1 pi−1 + αi pi ≡ xi + αi pi

Note: we do not have to solve a linear system at every iteration, because of the A-orthogonal basis that we manage to maintain and expand at every iteration

It can be proved that the error ei = x − xi satisfies

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A

Building orthogonal basis for a Krylov subspace

We have seen the importance of such a basis in
- defining projections (not just for linear solvers)
- the abstract linear solver and eigen-solver formulations
- a specific example in CG, where the basis for the Krylov subspaces is A-orthogonal (A is SPD)

We have seen how to build it: CGS, MGS, Cholesky or Householder based, etc.

These techniques can be used in a method specifically designed for Krylov subspaces (for a general non-Hermitian matrix), namely in Arnoldi’s Method

Arnoldi’s Method

Arnoldi’s method: build an orthogonal basis for Km(A, r0); A can be general, non-Hermitian

1: v1 = r0 / ||r0||2
2: for j = 1 to m do
3:   hij = (A vj, vi) for i = 1, ..., j
4:   wj = A vj − h1j v1 − ... − hjj vj
5:   hj+1,j = ||wj||2
6:   if hj+1,j = 0 Stop
7:   vj+1 = wj / hj+1,j
8: end for

Notes:
- The orthogonalization (line 4) is based on CGS: wj = A vj − (A vj, v1) v1 − ... − (A vj, vj) vj
- Up to iteration j the vectors v1, ..., vj are orthogonal
- The space spanned by this orthogonal basis grows by taking the next vector to be A vj
- If we do not exit at step 6 we will have Km(A, r0) = span{v1, v2, ..., vm} (exercise)
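A minimal NumPy sketch of this CGS-based Arnoldi process; the helper name `arnoldi` and the random test matrix are my own, and the final prints numerically check the matrix relations given on the next slide.

```python
import numpy as np

def arnoldi(A, r0, m):
    """CGS-based Arnoldi: orthonormal basis V of K_m(A, r0) and Hessenberg H."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)          # v1 = r0 / ||r0||
    for j in range(m):
        w = A @ V[:, j]                         # expand the subspace with A vj
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]               # h_ij = (A vj, vi)
        w = w - V[:, : j + 1] @ H[: j + 1, j]   # wj = A vj - sum_i h_ij vi  (CGS)
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                  # lucky breakdown: invariant subspace found
            return V[:, : j + 1], H[: j + 2, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Numerical check: A Vm equals V_{m+1} times the (m+1) x m Hessenberg, and Vm^T A Vm = Hm
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30))
r0 = rng.standard_normal(30)
V, H = arnoldi(A, r0, 10)
print(np.allclose(A @ V[:, :10], V @ H))
print(np.allclose(V[:, :10].T @ A @ V[:, :10], H[:10, :]))
```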

Arnoldi’s Method (continued)

Arnoldi’s method in matrix notation. Denote

Vm ≡ [v1, ..., vm],   Hm+1 = {hij} (an (m+1) × m matrix)

and by Hm the matrix Hm+1 without the last row.

Note that Hm is upper Hessenberg (zeros below the first subdiagonal) and

A Vm = Vm Hm + wm em^T
Vm^T A Vm = Hm

(exercise)

Arnoldi’s Method (continued)

Variations:
- Explained using CGS
- Can be implemented with MGS, Householder, etc.

How to use it in linear solvers? Example with the Full Orthogonalization Method (FOM)

FOM

Look for solution in the form

xm = x0 + ym(1)v1 + ··· + ym(m)vm

≡ x0 + Vm ym

FOM
1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{−1} e1
4: xm = x0 + Vm ym

The Petrov-Galerkin conditions will be

Vm^T A xm = Vm^T b
⇒ Vm^T A (x0 + Vm ym) = Vm^T b
⇒ Vm^T A Vm ym = Vm^T r0
⇒ Hm ym = Vm^T r0 = β e1

which is given by steps 3 and 4 of the algorithm
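A minimal, self-contained NumPy sketch of FOM as summarized above, with the Arnoldi loop inlined; the function name `fom` and the test problem are my own, and breakdown (hj+1,j = 0) is not handled.

```python
import numpy as np

def fom(A, b, x0, m):
    """FOM: m Arnoldi steps, then solve Hm ym = beta*e1 (no breakdown handling)."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                     # Arnoldi (CGS variant), as before
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
        w = w - V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = beta                           # beta * e1 = Vm^T r0
    y = np.linalg.solve(H[:m, :], e1)      # step 3: ym = beta * Hm^{-1} e1
    return x0 + V[:, :m] @ y               # step 4: xm = x0 + Vm ym

# Example on a small, well-conditioned random system
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 40)) + 40 * np.eye(40)
b = rng.standard_normal(40)
x = fom(A, b, np.zeros(40), 30)
print(np.linalg.norm(b - A @ x))
```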

Restarted FOM

What happens when m increases?
- computation grows as at least O(m^2 n)
- memory is O(mn)

A remedy is to restart the algorithm, leading to restarted FOM

FOM(m)

1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{−1} e1
4: xm = x0 + Vm ym. Stop if residual is small enough.
5: Set x0 := xm and go to 1

GMRES

Generalized Minimum Residual Method (GMRES)
- Similar to FOM
- Again look for the solution

xm = x0 + Vmym

where Vm is from the Arnoldi process (i.e. from Km(A, r0)). The test conditions Wm from the abstract formulation (slide 27, Lecture 7),

Wm^T A Vm ym = Wm^T r0

are Wm = A Vm. The difference shows up in step 3 of FOM, namely

ym = β Hm^{−1} e1

being replaced by

ym = argmin_y ||β e1 − Hm+1 y||2

GMRES

Similarly to FOM, GMRES can be defined with
- various orthogonalization variants in the Arnoldi process
- restart

Note: Solving the least squares (LS) problem

argmin_y ||β e1 − Hm+1 y||2

can be done with QR factorization as discussed in Lecture 7, Slide 25
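A small NumPy illustration of this least-squares step on synthetic data (the Hessenberg matrix below is random, not from an actual Arnoldi run); in practice the QR factorization is updated incrementally with Givens rotations, but a dense QR shows the idea.

```python
import numpy as np

# Synthetic (m+1) x m upper Hessenberg matrix and right-hand side beta*e1
m, beta = 5, 2.0
H = np.triu(np.random.default_rng(0).standard_normal((m + 1, m)), k=-1)
rhs = np.zeros(m + 1)
rhs[0] = beta

# Solve min_y ||beta*e1 - H y||_2 via a (thin) QR factorization of H
Q, R = np.linalg.qr(H)                   # Q: (m+1) x m, R: m x m
y = np.linalg.solve(R, Q.T @ rhs)

# Same answer from a generic least-squares solver
y_ls, *_ = np.linalg.lstsq(H, rhs, rcond=None)
print(np.allclose(y, y_ls))              # True
```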


Can we improve on Arnoldi if A is symmetric?

Yes! Hm becomes symmetric, so it will be just tridiagonal
- the simplification of Arnoldi in this case leads to the Lanczos Algorithm
- Lanczos can be used in deriving CG

The Lanczos Algorithm

1: v1 = r0 / ||r0||2, β1 = 0, v0 = 0
2: for j = 1 to m do
3:   wj = A vj − βj vj−1
4:   αj = (wj, vj)
5:   wj = wj − αj vj
6:   βj+1 = ||wj||2. If βj+1 = 0 then Stop
7:   vj+1 = wj / βj+1
8: end for

Notes:
- The matrix Hm here is tridiagonal, with diagonal hii = αi and off-diagonal hi,i+1 = βi+1
- In exact arithmetic the vi's are orthogonal, but in practice orthogonality gets lost rapidly
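A minimal NumPy sketch of the Lanczos recurrence above, together with a numerical check that Vm^T A Vm equals the tridiagonal matrix; the function name `lanczos` and the random symmetric test matrix are my own, and no reorthogonalization is done.

```python
import numpy as np

def lanczos(A, r0, m):
    """Three-term Lanczos recurrence for symmetric A (no reorthogonalization)."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                 # beta[0] plays the role of beta_1 = 0
    V[:, 0] = r0 / np.linalg.norm(r0)      # line 1
    for j in range(m):
        w = A @ V[:, j]                    # line 3: w_j = A v_j - beta_j v_{j-1}
        if j > 0:
            w = w - beta[j] * V[:, j - 1]
        alpha[j] = w @ V[:, j]             # line 4
        w = w - alpha[j] * V[:, j]         # line 5
        beta[j + 1] = np.linalg.norm(w)    # line 6
        if beta[j + 1] == 0.0:
            break
        V[:, j + 1] = w / beta[j + 1]      # line 7
    # Tridiagonal matrix: alpha on the diagonal, beta_2..beta_m off the diagonals
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return V[:, :m], T

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B + B.T                                # symmetric test matrix
V, T = lanczos(A, rng.standard_normal(50), 10)
print(np.allclose(V.T @ A @ V, T, atol=1e-8))   # holds while orthogonality lasts
```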

Choice of basis for the Krylov subspace

We saw that the choice of basis for the Krylov subspace is characteristic for the various methods, e.g.
- GMRES uses an orthogonal basis
- CG uses an A-orthogonal basis

This is true for other methods as well
- Conjugate Residual (CR; for symmetric problems) uses an A^T A-orthogonal basis (i.e. the A pi's are orthogonal)
- An A^T A-orthogonal basis can be generalized to the non-symmetric case as well, e.g. in the Generalized Conjugate Residual (GCR) method

Other Krylov methods

We considered various methods that construct a basis for the Krylov subspaces

Another big class of methods is based on biorthogonalization (an algorithm due to Lanczos)
- For non-symmetric matrices, build a pair of bi-orthogonal bases for the two subspaces

Km(A, v1) = span{v1, A v1, ..., A^{m−1} v1}
Km(A^T, w1) = span{w1, A^T w1, ..., (A^T)^{m−1} w1}

- Examples here are BCG and QMR (not to be discussed)
- These methods are more difficult to analyze

Part II

Convergence and preconditioning

Convergence

Convergence can be analyzed by exploiting the optimality properties (of the projection) when such properties exist
- A useful tool is Chebyshev polynomials
- Bounds depend on the condition number of the matrix, e.g. in CG

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A

Preconditioning

Convergence can be slow or even stagnate for ill-conditioned matrices (with large condition number)

But can be improved with preconditioning

xi+1 = xi + P(b − Axi )

Think of P as a preconditioner: an operator/matrix P ≈ A^{−1}
- for P = A^{−1} it takes 1 iteration
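A minimal NumPy sketch of this basic preconditioned iteration; P is chosen here (my choice, for illustration) as the Jacobi preconditioner diag(A)^{−1}, and the diagonally dominant test matrix is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)) * 0.1 + np.diag(20.0 + np.arange(n))  # diagonally dominant
b = rng.standard_normal(n)

# Jacobi preconditioner: P = diag(A)^{-1} ~ A^{-1}; P = A^{-1} would converge in 1 step
P = np.diag(1.0 / np.diag(A))

x = np.zeros(n)
for i in range(100):
    r = b - A @ x              # current residual
    if np.linalg.norm(r) < 1e-10:
        break
    x = x + P @ r              # x_{i+1} = x_i + P (b - A x_i)

print(i, np.linalg.norm(b - A @ x))
```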

Preconditioning

Properties desired in a preconditioner:

- Should approximate A^{−1}
- Should be easy to compute, apply to a vector, and store

Iterative solvers can be extended to support preconditioning (How?)

Preconditioning

Extending iterative solvers to support preconditioning: the same solver can be used, but on a modified problem, e.g.

Problem Ax = b is transformed into

PAx = Pb

known as left preconditioning

Problem Ax = b is transformed into

APu = b,  x = Pu

known as right preconditioning

Convergence of the modified problem would depend on κ(PA) (e.g. with left preconditioning)
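A small NumPy sketch contrasting the two transformations, again with a Jacobi P (my choice for illustration); the transformed systems are solved directly here just to show the algebra, whereas in practice a Krylov solver would be applied to them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n)) + np.diag(10.0 + np.arange(n))
b = rng.standard_normal(n)
P = np.diag(1.0 / np.diag(A))            # Jacobi preconditioner, P ~ A^{-1}

x_left = np.linalg.solve(P @ A, P @ b)   # left preconditioning:  PA x = Pb
u = np.linalg.solve(A @ P, b)            # right preconditioning: AP u = b
x_right = P @ u                          #                        x = P u

x_direct = np.linalg.solve(A, b)
print(np.allclose(x_left, x_direct), np.allclose(x_right, x_direct))
```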

Preconditioning

Examples:
- Incomplete LU factorization (e.g. ILU(0))
- Jacobi (inverse of the diagonal)
- Other stationary iterative solvers (GS, SOR, SSOR)
- Block preconditioners and domain decomposition
  - Additive Schwarz (think of Block-Jacobi)
  - Multiplicative Schwarz (think of Block-GS)

Preconditioning

Examples so far: algebraic preconditioners, i.e. exclusively based on the matrix

Often, for problems coming from PDEs, PDE and discretization information can be used in designing a preconditioner, e.g.
- FFTs can be used to approximate differential operators on regular grids (as in Fourier space the operators are diagonal matrices)
- Grid and problem information can be used to define multigrid preconditioners
- Indefinite problems are often composed of sub-blocks that are definite: this is used in defining specific preconditioners, and even in modifying solvers for these needs, etc.

Part III

Iterative eigen-solvers

Iterative Eigen-Solvers

How are iterative eigen-solvers related to Krylov subspaces?

Remember projection slides 29 & 30, Lecture 7 (left)

Again, as in linear solvers,
- Projection in a subspace is the basis for an iterative eigen-solver
- V and W are often based on Krylov subspaces

Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m−1} r0}

where r0 ≡ b − Ax0 and x0 is an initial guess.

- Often parts of V or W are orthogonalized, for stability
- The orthogonalization can be CGS, MGS, Cholesky or Householder based, etc.
- The smaller Rayleigh-Ritz problems are usually solved with LAPACK routines (see the sketch below)
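A minimal NumPy sketch of a Rayleigh-Ritz step for a symmetric matrix: project A onto an orthonormal basis of a Krylov subspace and solve the small projected eigenproblem (numpy.linalg.eigh stands in here for the corresponding LAPACK routine). For brevity the basis is built naively by QR of the power basis; in practice Arnoldi/Lanczos would be used for stability. The names and the test matrix are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 10
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # symmetric test matrix

# Orthonormal basis of K_m(A, r0) (naive power basis + QR, for illustration only)
r0 = rng.standard_normal(n)
K = np.zeros((n, m))
K[:, 0] = r0
for k in range(1, m):
    K[:, k] = A @ K[:, k - 1]
V, _ = np.linalg.qr(K)

# Rayleigh-Ritz: project, then solve the small m x m eigenproblem
T = V.T @ A @ V
theta, S = np.linalg.eigh(T)            # Ritz values / vectors of the projection
ritz_vectors = V @ S                    # Ritz vectors lifted back to R^n

# The extreme Ritz values approximate the extreme eigenvalues of A
print(theta[-1], np.linalg.eigvalsh(A)[-1])
```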

Learning Goals

- A brief introduction to Krylov iterative solvers and eigen-solvers
- Links to building blocks that we have already covered
  - Abstract formulation
  - Projection, and
  - Orthogonalization
- Specific examples and issues (preconditioning, parallelization, etc.)
