Iterative Methods in Linear Algebra (part 2)
Stan Tomov
Innovative Computing Laboratory Computer Science Department The University of Tennessee
Wednesday April 23, 2008
Outline
Part I Krylov iterative solvers
Part II Convergence and preconditioning
Part III Iterative eigen-solvers
Part I
Krylov iterative solvers
Krylov iterative solvers
Building blocks for Krylov iterative solvers covered so far
- Projection/minimization in a subspace
  - Petrov-Galerkin conditions
  - Least squares minimization, etc.
- Orthogonalization
  - CGS and MGS
  - Cholesky or Householder based QR
Krylov iterative solvers
We also covered abstract formulations for iterative solvers and eigen-solvers
What are the goals of this lecture?
- Give specific examples of Krylov solvers
- Show how the examples relate to the abstract formulation
- Show how the examples relate to the building blocks covered so far, specifically to projection and orthogonalization
But we are not going into the details!
Krylov iterative solvers

How are these techniques related to Krylov iterative solvers?
Remember projection slides 26 & 27, Lecture 7 (left)
- Projection in a subspace is the basis for an iterative method
- Here the projection is in V
- In Krylov methods V is the Krylov subspace

Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m-1} r0}
where r0 ≡ b − Ax0 and x0 is an initial guess.
- Often V or W are orthonormalized
  - The projection is 'easier' to find when we work with an orthonormal basis (e.g. problem 4 from homework 5: projection in a general vs. an orthonormal basis)
  - The orthonormalization can be CGS, MGS, Cholesky or Householder based, etc.
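As an illustration (not from the slides), here is a minimal NumPy sketch of building an orthonormal basis for Km(A, r0) with MGS; the function name krylov_basis_mgs, the test matrix, and the sizes are arbitrary choices for the example.

```python
import numpy as np

def krylov_basis_mgs(A, r0, m):
    """Orthonormal basis for span{r0, A r0, ..., A^{m-1} r0} built with MGS."""
    n = r0.size
    V = np.zeros((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(1, m):
        w = A @ V[:, j - 1]              # expand the subspace by a matrix-vector product
        for i in range(j):               # MGS: orthogonalize against previous vectors
            w -= (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)  # (a real code would guard against breakdown here)
    return V

# Illustrative use on a small SPD matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)
V = krylov_basis_mgs(A, rng.standard_normal(6), 4)
print(np.allclose(V.T @ V, np.eye(4)))   # columns are orthonormal
```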
Krylov Iterative Methods
To summarize, Krylov iterative methods in general
expand the Krylov subspace by a matrix-vector product, and do a projection in it.
Various methods result by specific choices of the expansion and projection.
Krylov Iterative Methods
A specific example with the
Conjugate Gradient Method (CG)
Conjugate Gradient Method
The method is for SPD matrices
Both V and W are the Krylov subspaces, i.e. at iteration i
V ≡ W ≡ Ki(A, r0) ≡ span{r0, A r0, ..., A^{i-1} r0}
The projection xi ∈ Ki (A, r0) satisfies the Petrov-Galerkin conditions
(Axi, φ) = (b, φ) for all φ ∈ Ki(A, r0)
Conjugate Gradient Method (continued)
At every iteration there is a way (to be shown later) to construct a new search direction pi such that
span{p0, p1, ..., pi} ≡ Ki+1(A, r0) and (Api, pj) = 0 for i ≠ j.
Note: A is SPD ⇒ (Api , pj ) ≡ (pi , pj )A can be used as an inner product, i.e. p0,..., pi is an (·, ·)A orthogonal basis for Ki+1(A, r0)
⇒ we can easily find xi+1 ≈ x as
xi+1 = x0 + α0p0 + ··· + αi pi s.t.
(Axi+1, pj ) = (b, pj ) for j = 0,..., i
Namely, because of the (·, ·)A orthogonality of p0,..., pi at iteration i + 1 we have to find only αi
(Axi+1, pi) = (A(xi + αi pi), pi) = (b, pi)  ⇒  αi = (ri, pi) / (Api, pi)
Note: xi above can actually be replaced by any x0 + v, v ∈ Ki(A, r0). (Why?)
Conjugate Gradient Method (continued)
Conjugate Gradient Method (algorithm):

 1: Compute r0 = b − Ax0 for some initial guess x0
 2: for i = 0, 1, ... do
 3:   ρi = ri^T ri
 4:   if i = 0 then
 5:     p0 = r0
 6:   else
 7:     pi = ri + (ρi / ρi−1) pi−1
 8:   end if
 9:   qi = A pi
10:   αi = ρi / (pi^T qi)
11:   xi+1 = xi + αi pi
12:   ri+1 = ri − αi qi
13:   check convergence; continue if necessary
14: end for

Notes:
- One matrix-vector product per iteration (at line 9)
- Two inner products per iteration (lines 3 and 10)
- In exact arithmetic ri+1 = b − Axi+1 (apply A to both sides of line 11 and subtract from b to get line 12)
- The update for xi+1 is as pointed out before, i.e.
    αi = (ri, ri) / (Api, pi) = (ri, pi) / (Api, pi)
  since (ri, pi−1) = 0 (exercise)
- Other relations to be proved (exercise): the pi's span the Krylov space, the pi's are (·, ·)A orthogonal, etc.
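As an illustration (not part of the original slides), a minimal NumPy sketch of CG as listed above; it assumes A is SPD, and the tolerance, iteration limit, and test matrix are arbitrary choices for the example.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    """Conjugate Gradient sketch; variable names loosely follow the pseudocode above."""
    x = x0.copy()
    r = b - A @ x                      # line 1: r0 = b - A x0
    p = r.copy()
    rho = r @ r                        # line 3: rho_i = r_i^T r_i
    for i in range(maxit):
        q = A @ p                      # line 9: the one matrix-vector product per iteration
        alpha = rho / (p @ q)          # line 10
        x = x + alpha * p              # line 11
        r = r - alpha * q              # line 12
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:     # line 13: convergence check on ||r||
            break
        p = r + (rho_new / rho) * p    # line 7: next A-orthogonal search direction
        rho = rho_new
    return x

# Small SPD test problem (illustrative only)
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = cg(A, b, np.zeros(50))
print(np.linalg.norm(A @ x - b))       # residual norm should be near the tolerance
```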
Conjugate Gradient Method (continued)

To sum it up: in exact arithmetic we get the exact solution in at most n steps, i.e.
x = x0 + α0p0 + ··· + αi pi + αi+1pi+1 + ··· + αn−1pn−1
At every iteration one more term αj pj is added to the current approximation
xi = x0 + α0p0 + ··· + αi−1pi−1
xi+1 = x0 + α0p0 + ··· + αi−1pi−1 + αi pi ≡ xi + αi pi

Note: we do not have to solve a linear system at every iteration, because of the A-orthogonal basis that we manage to maintain and expand at every iteration.

It can be proved that the error ei = x − xi satisfies

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A
Building orthogonal basis for a Krylov subspace
We have seen the importance in
- Defining projections, not just for linear solvers
- Abstract linear solver and eigen-solver formulations
- A specific example in CG, where the basis for the Krylov subspaces is A-orthogonal (A is SPD)
We have seen how to build it
- CGS, MGS, Cholesky or Householder based, etc.
These techniques can be used in a method specifically designed for Krylov subspaces (general non-Hermitian matrix), namely in Arnoldi’s Method
Arnoldi’s Method
Arnoldi’s method: build an orthogonal basis for Km(A, r0); A can be general, non-Hermitian.

 1: v1 = r0 / ||r0||2
 2: for j = 1 to m do
 3:   hij = (Avj, vi) for i = 1, ..., j
 4:   wj = Avj − h1j v1 − ... − hjj vj
 5:   hj+1,j = ||wj||2
 6:   if hj+1,j = 0 then Stop
 7:   vj+1 = wj / hj+1,j
 8: end for

Notes:
- The orthogonalization (line 4) is based on CGS: wj = Avj − (Avj, v1)v1 − ... − (Avj, vj)vj
- Up to iteration j the vectors v1, ..., vj are orthogonal
- The space spanned by this orthogonal basis grows by taking the next vector to be Avj
- If we do not exit at step 6 we will have Km(A, r0) = span{v1, v2, ..., vm} (exercise)
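A minimal NumPy sketch of the algorithm above (again, not from the slides); the breakdown tolerance and return conventions are illustrative choices.

```python
import numpy as np

def arnoldi(A, r0, m):
    """CGS-based Arnoldi: returns V (n x (m+1), orthonormal columns) and H ((m+1) x m)."""
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)             # v1 = r0 / ||r0||_2
    for j in range(m):
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w       # line 3: h_ij = (A v_j, v_i)
        w = w - V[:, : j + 1] @ H[: j + 1, j]     # line 4: CGS orthogonalization
        H[j + 1, j] = np.linalg.norm(w)           # line 5
        if H[j + 1, j] < 1e-14:                   # line 6: (near) breakdown -> stop
            break
        V[:, j + 1] = w / H[j + 1, j]             # line 7
    return V, H
```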
Arnoldi’s Method (continued)
Arnoldi’s method in matrix notation Denote
Vm ≡ [v1, ..., vm],   Hm+1 = {hij} (an (m+1)×m matrix)
and by Hm the matrix Hm+1 without the last row.
Note that Hm is upper Hessenberg (zero entries below the first sub-diagonal) and
AVm = Vm Hm + wm em^T
Vm^T A Vm = Hm
(exercise)
Arnoldi’s Method (continued)
Variations:
- Explained using CGS
- Can be implemented with MGS, Householder, etc.
How to use it in linear solvers? Example with the Full Orthogonalization Method (FOM)
FOM
Look for solution in the form
xm = x0 + ym(1)v1 + ··· + ym(m)vm
≡ x0 + Vm ym

The Petrov-Galerkin conditions will be

Vm^T A xm = Vm^T b
⇒ Vm^T A (x0 + Vm ym) = Vm^T b
⇒ Vm^T A Vm ym = Vm^T r0
⇒ Hm ym = Vm^T r0 = β e1

FOM:
1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{-1} e1
4: xm = x0 + Vm ym
which is given by steps 3 and 4 of the algorithm
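A compact sketch of one FOM cycle (steps 1-4 above), with the Arnoldi loop inlined as in the earlier sketch; the function name fom and the breakdown tolerance are illustrative, and Hm is assumed nonsingular.

```python
import numpy as np

def fom(A, b, x0, m):
    """One FOM cycle: x_m = x_0 + V_m y_m with H_m y_m = beta e_1."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)                 # step 1
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                        # step 2: Arnoldi (CGS), as before
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            m = j + 1                         # lucky breakdown: shrink the subspace
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(m)
    rhs[0] = beta
    y = np.linalg.solve(H[:m, :m], rhs)       # step 3: y_m = beta H_m^{-1} e_1
    return x0 + V[:, :m] @ y                  # step 4
```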
Restarted FOM
What happens when m increases?
- computation grows at least as O(m^2 n)
- memory is O(mn)
A remedy is to restart the algorithm, leading to restarted FOM
FOM(m)
1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{-1} e1
4: xm = x0 + Vm ym. Stop if residual is small enough.
5: Set x0 := xm and go to 1
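A sketch of the restart loop; it reuses the fom() sketch from the previous slide, and the tolerance and restart limit are illustrative.

```python
import numpy as np

def fom_restarted(A, b, x0, m, tol=1e-8, max_restarts=100):
    x = x0.copy()
    for _ in range(max_restarts):
        x = fom(A, b, x, m)                    # steps 1-4 (one FOM cycle, sketch above)
        if np.linalg.norm(b - A @ x) < tol:    # stop if the residual is small enough
            break                              # otherwise set x0 := x_m and repeat
    return x
```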
GMRES

Generalized Minimum Residual Method (GMRES)
- Similar to FOM
- Again look for a solution of the form
xm = x0 + Vmym
where Vm is from the Arnoldi process (i.e. for Km(A, r0)). The test conditions Wm from the abstract formulation (slide 27, Lecture 7)

Wm^T A Vm ym = Wm^T r0
are Wm = AVm. The difference results in step 3 from FOM, namely
ym = β Hm^{-1} e1

being replaced by
ym = argminy ||βe1 − Hm+1y||2
GMRES
Similarly to FOM, GMRES can be defined with
- Various orthogonalizations in the Arnoldi process
- Restart
Note: Solving the least squares (LS) problem
argminy ||βe1 − Hm+1y||2
can be done with QR factorization as discussed in Lecture 7, Slide 25
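A minimal sketch of one GMRES cycle (no restart): the same Arnoldi basis as FOM, but ym solves the small least-squares problem. For brevity the LS problem below is solved with numpy.linalg.lstsq; practical implementations instead update a QR factorization of Hm+1 incrementally, as discussed in Lecture 7.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES cycle: x_m = x_0 + V_m y_m, y_m = argmin ||beta e1 - H_{m+1} y||_2."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                              # Arnoldi (CGS)
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(m + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)     # least-squares instead of H_m^{-1}
    return x0 + V[:, :m] @ y
```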
Lanczos Algorithm
Can we improve on Arnoldi if A is symmetric?
Yes!
- Hm becomes symmetric, so it is just tridiagonal
- The simplification of Arnoldi in this case leads to the Lanczos Algorithm
- Lanczos can be used in deriving CG
The Lanczos Algorithm
Note: the matrix Hm here is tridiagonal, with diagonal hii = αi and off-diagonal hi,i+1 = βi+1.

 1: v1 = r0 / ||r0||2, β1 = 0, v0 = 0
 2: for j = 1 to m do
 3:   wj = Avj − βj vj−1
 4:   αj = (wj, vj)
 5:   wj = wj − αj vj
 6:   βj+1 = ||wj||2. If βj+1 = 0 then Stop
 7:   vj+1 = wj / βj+1
 8: end for

In exact arithmetic the vi's are orthogonal, but in practice orthogonality is lost rapidly.
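A minimal sketch of the algorithm above (A assumed symmetric); it returns the basis and the tridiagonal matrix built from alpha and beta, and the breakdown tolerance is an illustrative choice.

```python
import numpy as np

def lanczos(A, r0, m):
    """Lanczos sketch: V has (ideally) orthonormal columns, T is tridiagonal."""
    n = r0.size
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                          # beta[0] plays the role of beta_1 = 0
    V[:, 0] = r0 / np.linalg.norm(r0)               # line 1
    for j in range(m):
        w = A @ V[:, j] - beta[j] * (V[:, j - 1] if j > 0 else 0.0)   # line 3
        alpha[j] = w @ V[:, j]                      # line 4
        w = w - alpha[j] * V[:, j]                  # line 5
        beta[j + 1] = np.linalg.norm(w)             # line 6
        if beta[j + 1] < 1e-14:
            break
        V[:, j + 1] = w / beta[j + 1]               # line 7
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return V[:, :m], T
```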
Choice of basis for the Krylov subspace
We saw that the choice of basis for the Krylov subspace is characteristic of the various methods, e.g.
- GMRES uses an orthogonal basis
- CG uses an A-orthogonal basis
This is true for other methods as well
- Conjugate Residual (CR; for symmetric problems) uses an A^T A-orthogonal basis (i.e. the Api's are orthogonal)
- An A^T A-orthogonal basis can be generalized to the non-symmetric case as well, e.g. in the Generalized Conjugate Residual (GCR) method
Other Krylov methods
We considered various methods that construct a basis for the Krylov subspaces
Another big class of methods is based on biorthogonalization (an algorithm due to Lanczos)
- For non-symmetric matrices, build a pair of bi-orthogonal bases for the two subspaces
Km(A, v1) = span{v1, A v1, ..., A^{m−1} v1}
Km(A^T, w1) = span{w1, A^T w1, ..., (A^T)^{m−1} w1}
Examples here are BCG and QMR (not to be discussed). These methods are more difficult to analyze.
Part II
Convergence and preconditioning
Convergence
Convergence can be analyzed by exploiting the optimality properties (of the projection) when such properties exist
- A useful tool is Chebyshev polynomials
- The bounds depend on the condition number of the matrix, e.g. in CG

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A
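For intuition, a small sketch (illustrative numbers only) evaluating this bound for a few condition numbers κ(A) and iteration counts:

```python
import numpy as np

def cg_bound(kappa, i):
    """Upper bound 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))**i on ||e_i||_A / ||e_0||_A."""
    q = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return 2.0 * q**i

for kappa in (10, 100, 1000):
    print(kappa, [f"{cg_bound(kappa, i):.1e}" for i in (10, 50, 100)])
```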
Preconditioning
Convergence can be slow, or can even stagnate, for ill-conditioned matrices (with a large condition number)
But can be improved with preconditioning
xi+1 = xi + P(b − Axi )
Think of P as a preconditioner: an operator/matrix P ≈ A^{-1}
- for P = A^{-1} it takes 1 iteration
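A minimal sketch of the iteration xi+1 = xi + P(b − Axi); the choice P = diag(A)^{-1} (Jacobi) and the test matrix are assumptions made for the example, not specified on the slide.

```python
import numpy as np

def preconditioned_richardson(A, b, x0, apply_P, tol=1e-8, maxit=200):
    x = x0.copy()
    for _ in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + apply_P(r)            # x_{i+1} = x_i + P (b - A x_i)
    return x

# Illustrative diagonally dominant test matrix; P = diag(A)^{-1} (Jacobi)
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(5.0, 10.0, 20)) + 0.05 * rng.standard_normal((20, 20))
b = rng.standard_normal(20)
d = np.diag(A)
x = preconditioned_richardson(A, b, np.zeros(20), lambda r: r / d)
print(np.linalg.norm(A @ x - b))
```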
Preconditioning
Properties desired in a preconditioner:
- Should approximate A^{-1}
- Should be easy to compute, apply to a vector, and store
Iterative solvers can be extended to support preconditioning (How?)
Preconditioning
Extending iterative solvers to support preconditioning: the same solver can be used but on a modified problem, e.g.

Problem Ax = b is transformed into
PAx = Pb
known as left preconditioning.

Problem Ax = b is transformed into
APu = b,  x = Pu
known as right preconditioning.

Convergence of the modified problem will depend on κ(PA) (e.g. with left preconditioning)
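An illustrative check of this (not from the slides): for a badly scaled matrix, even the simple Jacobi choice P = diag(A)^{-1} can reduce κ(PA) dramatically compared to κ(A); the matrix below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = np.diag(rng.uniform(1.0, 1000.0, n)) + 0.01 * rng.standard_normal((n, n))  # badly scaled
P = np.diag(1.0 / np.diag(A))            # Jacobi preconditioner: P ~ A^{-1} on the diagonal
print("kappa(A)  =", np.linalg.cond(A))
print("kappa(PA) =", np.linalg.cond(P @ A))
```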
Preconditioning
Examples:
- Incomplete LU factorization (e.g. ILU(0))
- Jacobi (inverse of the diagonal)
- Other stationary iterative solvers (GS, SOR, SSOR)
- Block preconditioners and domain decomposition
  - Additive Schwarz (think of Block-Jacobi)
  - Multiplicative Schwarz (think of Block-GS)
Preconditioning
Examples so far: algebraic preconditioners, i.e. based exclusively on the matrix
Often, for problems coming from PDEs, PDE and discretization information can be used in designing a preconditioner, e.g.
- FFTs can be used to approximate differential operators on regular grids, as in Fourier space the operators are diagonal matrices (see the sketch after this list)
- Grid and problem information can be used to define multigrid preconditioners
- Indefinite problems are often composed of sub-blocks that are definite: this is used in defining specific preconditioners, and even in modifying solvers for these needs, etc.
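A hypothetical sketch of the FFT remark above: for a 1D shifted Laplacian with periodic boundary conditions (an illustrative operator chosen here, not one from the lecture), the matrix is circulant, hence diagonal in Fourier space, and both the operator and its exact inverse (usable as a preconditioner) can be applied with FFTs.

```python
import numpy as np

n = 64
k = np.arange(n)
# Eigenvalues of A = I - Laplacian_h (periodic stencil [-1, 3, -1]) in Fourier space
lam = 3.0 - 2.0 * np.cos(2.0 * np.pi * k / n)

def apply_A(x):
    """Apply the circulant operator A via FFT."""
    return np.real(np.fft.ifft(lam * np.fft.fft(x)))

def apply_P(r):
    """'Preconditioner': exact solve with A, a diagonal division in Fourier space."""
    return np.real(np.fft.ifft(np.fft.fft(r) / lam))

b = np.random.default_rng(0).standard_normal(n)
x = apply_P(b)                               # here the Fourier-space solve is exact
print(np.linalg.norm(apply_A(x) - b))        # ~ machine precision
```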
Part III
Iterative eigen-solvers
Iterative Eigen-Solvers

How are iterative eigen-solvers related to Krylov subspaces?
Remember projection slides 29 & 30, Lecture 7 (left)
Again, as in linear solvers,
- Projection in a subspace is the basis for an iterative eigen-solver
- V and W are often based on Krylov subspaces
Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m-1} r0}
where r0 ≡ b − Ax0 and x0 is an initial guess.
- Often parts of V or W are orthogonalized for stability
- The orthogonalization can be CGS, MGS, Cholesky or Householder based, etc.
- The smaller Rayleigh-Ritz problems are usually solved with LAPACK routines
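A minimal sketch of how these pieces can fit together (an assembled example, not code from the lecture): build Vm and Hm with Arnoldi and take the eigenvalues of the small matrix Hm (the Ritz values) as approximate eigenvalues of A; the small eigenproblem is handled by numpy.linalg.eigvals, which calls LAPACK.

```python
import numpy as np

def ritz_values(A, v0, m):
    """Ritz values of A from the Krylov subspace K_m(A, v0) built with Arnoldi (CGS)."""
    n = v0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):                          # Arnoldi, as in Part I
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:m, :m])         # Ritz values (Rayleigh-Ritz on K_m)

# Illustrative use: approximate the dominant eigenvalues of a random matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
theta = ritz_values(A, rng.standard_normal(200), 30)
print(sorted(theta, key=abs, reverse=True)[:3])
```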
Learning Goals
- A brief introduction to Krylov iterative solvers and eigen-solvers
- Links to building blocks that we have already covered
  - Abstract formulation
  - Projection and Orthogonalization
- Specific examples and issues (preconditioning, parallelization, etc.)