Iterative Methods in Linear Algebra (part 2)
Stan Tomov
Innovative Computing Laboratory Computer Science Department The University of Tennessee
Wednesday April 23, 2008
Outline
Part I Krylov iterative solvers
Part II Convergence and preconditioning
Part III Iterative eigen-solvers
Part I
Krylov iterative solvers
Krylov iterative solvers
Building blocks for Krylov iterative solvers covered so far
- Projection/minimization in a subspace
  - Petrov-Galerkin conditions
  - Least squares minimization, etc.
- Orthogonalization
  - CGS and MGS
  - Cholesky or Householder based QR
Krylov iterative solvers
We also covered abstract formulations for iterative solvers and eigen-solvers
What are the goals of this lecture?
- Give specific examples of Krylov solvers
- Show how the examples relate to the abstract formulation
- Show how the examples relate to the building blocks covered so far, specifically to projection and orthogonalization
But we are not going into the details!
Krylov iterative solvers

How are these techniques related to Krylov iterative solvers?
Remember projection slides 26 & 27, Lecture 7 (left)
- Projection in a subspace is the basis for an iterative method
- Here the projection is in V
- In Krylov methods V is the Krylov subspace

Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m-1} r0}
where r0 ≡ b − Ax0 and x0 is an initial guess.
- Often V or W are orthonormalized
  - The projection is 'easier' to find when we work with an orthonormal basis (e.g. problem 4 from homework 5: projection in a general vs. an orthonormal basis)
  - The orthonormalization can be CGS, MGS, Cholesky or Householder based, etc.
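As an illustration (not from the slides), here is a minimal NumPy sketch of building an orthonormal basis for Km(A, r0) with MGS; the function name krylov_basis_mgs, the test matrix, and the sizes are arbitrary choices for the example.

```python
import numpy as np

def krylov_basis_mgs(A, r0, m):
    """Orthonormal basis for span{r0, A r0, ..., A^{m-1} r0} built with MGS."""
    n = r0.size
    V = np.zeros((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(1, m):
        w = A @ V[:, j - 1]              # expand the subspace by a matrix-vector product
        for i in range(j):               # MGS: orthogonalize against previous vectors
            w -= (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)  # (a real code would guard against breakdown here)
    return V

# Illustrative use on a small SPD matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)
V = krylov_basis_mgs(A, rng.standard_normal(6), 4)
print(np.allclose(V.T @ V, np.eye(4)))   # columns are orthonormal
```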
Krylov Iterative Methods
To summarize, Krylov iterative methods in general
expand the Krylov subspace by a matrix-vector product, and do a projection in it.
Various methods result by specific choices of the expansion and projection.
Krylov Iterative Methods
A specific example with the
Conjugate Gradient Method (CG)
Conjugate Gradient Method
The method is for SPD matrices
Both V and W are the Krylov subspaces, i.e. at iteration i
V ≡ W ≡ Ki(A, r0) ≡ span{r0, A r0, ..., A^{i-1} r0}
The projection xi ∈ Ki (A, r0) satisfies the Petrov-Galerkin conditions
(Axi, φ) = (b, φ) for all φ ∈ Ki(A, r0)
Conjugate Gradient Method (continued)
At every iteration there is a way (to be shown later) to construct a new search direction pi such that
span{p0, p1, ..., pi} ≡ Ki+1(A, r0) and (Api, pj) = 0 for i ≠ j.
Note: A is SPD ⇒ (Api , pj ) ≡ (pi , pj )A can be used as an inner product, i.e. p0,..., pi is an (·, ·)A orthogonal basis for Ki+1(A, r0)
⇒ we can easily find xi+1 ≈ x as
xi+1 = x0 + α0p0 + ··· + αi pi s.t.
(Axi+1, pj ) = (b, pj ) for j = 0,..., i
Namely, because of the (·, ·)A orthogonality of p0,..., pi at iteration i + 1 we have to find only αi
(Axi+1, pi) = (A(xi + αi pi), pi) = (b, pi)  ⇒  αi = (ri, pi) / (Api, pi)
Note: xi above can actually be replaced by any x0 + v, v ∈ Ki(A, r0). (Why?)
Conjugate Gradient Method (continued)
Conjugate Gradient Method (algorithm):

 1: Compute r0 = b − Ax0 for some initial guess x0
 2: for i = 0, 1, ... do
 3:   ρi = ri^T ri
 4:   if i = 0 then
 5:     p0 = r0
 6:   else
 7:     pi = ri + (ρi / ρi−1) pi−1
 8:   end if
 9:   qi = A pi
10:   αi = ρi / (pi^T qi)
11:   xi+1 = xi + αi pi
12:   ri+1 = ri − αi qi
13:   check convergence; continue if necessary
14: end for

Notes:
- One matrix-vector product per iteration (at line 9)
- Two inner products per iteration (lines 3 and 10)
- In exact arithmetic ri+1 = b − Axi+1 (apply A to both sides of line 11 and subtract from b to get line 12)
- The update for xi+1 is as pointed out before, i.e.
    αi = (ri, ri) / (Api, pi) = (ri, pi) / (Api, pi)
  since (ri, pi−1) = 0 (exercise)
- Other relations to be proved (exercise): the pi's span the Krylov space, the pi's are (·, ·)A orthogonal, etc.
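As an illustration (not part of the original slides), a minimal NumPy sketch of CG as listed above; it assumes A is SPD, and the tolerance, iteration limit, and test matrix are arbitrary choices for the example.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    """Conjugate Gradient sketch; variable names loosely follow the pseudocode above."""
    x = x0.copy()
    r = b - A @ x                      # line 1: r0 = b - A x0
    p = r.copy()
    rho = r @ r                        # line 3: rho_i = r_i^T r_i
    for i in range(maxit):
        q = A @ p                      # line 9: the one matrix-vector product per iteration
        alpha = rho / (p @ q)          # line 10
        x = x + alpha * p              # line 11
        r = r - alpha * q              # line 12
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:     # line 13: convergence check on ||r||
            break
        p = r + (rho_new / rho) * p    # line 7: next A-orthogonal search direction
        rho = rho_new
    return x

# Small SPD test problem (illustrative only)
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = cg(A, b, np.zeros(50))
print(np.linalg.norm(A @ x - b))       # residual norm should be near the tolerance
```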
Conjugate Gradient Method (continued)

To sum it up: in exact arithmetic we get the exact solution in at most n steps, i.e.
x = x0 + α0p0 + ··· + αi pi + αi+1pi+1 + ··· + αn−1pn−1
At every iteration one more term αj pj is added to the current approximation
xi = x0 + α0p0 + ··· + αi−1pi−1
xi+1 = x0 + α0p0 + ··· + αi−1pi−1 + αi pi ≡ xi + αi pi

Note: we do not have to solve a linear system at every iteration, because of the A-orthogonal basis that we manage to maintain and expand at every iteration.

It can be proved that the error ei = x − xi satisfies

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A
Building orthogonal basis for a Krylov subspace
We have seen the importance in
- Defining projections, not just for linear solvers
- Abstract linear solver and eigen-solver formulations
- A specific example in CG, where the basis for the Krylov subspaces is A-orthogonal (A is SPD)
We have seen how to build it
- CGS, MGS, Cholesky or Householder based, etc.
These techniques can be used in a method specifically designed for Krylov subspaces (general non-Hermitian matrix), namely in Arnoldi’s Method
Arnoldi’s Method
Arnoldi’s method: build an orthogonal basis for Km(A, r0); A can be general, non-Hermitian.

 1: v1 = r0 / ||r0||2
 2: for j = 1 to m do
 3:   hij = (Avj, vi) for i = 1, ..., j
 4:   wj = Avj − h1j v1 − ... − hjj vj
 5:   hj+1,j = ||wj||2
 6:   if hj+1,j = 0 then Stop
 7:   vj+1 = wj / hj+1,j
 8: end for

Notes:
- The orthogonalization (line 4) is based on CGS: wj = Avj − (Avj, v1)v1 − ... − (Avj, vj)vj
- Up to iteration j the vectors v1, ..., vj are orthogonal
- The space spanned by this orthogonal basis grows by taking the next vector to be Avj
- If we do not exit at step 6 we will have Km(A, r0) = span{v1, v2, ..., vm} (exercise)
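A minimal NumPy sketch of the algorithm above (again, not from the slides); the breakdown tolerance and return conventions are illustrative choices.

```python
import numpy as np

def arnoldi(A, r0, m):
    """CGS-based Arnoldi: returns V (n x (m+1), orthonormal columns) and H ((m+1) x m)."""
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)             # v1 = r0 / ||r0||_2
    for j in range(m):
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w       # line 3: h_ij = (A v_j, v_i)
        w = w - V[:, : j + 1] @ H[: j + 1, j]     # line 4: CGS orthogonalization
        H[j + 1, j] = np.linalg.norm(w)           # line 5
        if H[j + 1, j] < 1e-14:                   # line 6: (near) breakdown -> stop
            break
        V[:, j + 1] = w / H[j + 1, j]             # line 7
    return V, H
```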
Arnoldi’s Method (continued)
Arnoldi’s method in matrix notation Denote
Vm ≡ [v1, ..., vm],   Hm+1 = {hij} (an (m+1)×m matrix)
and by Hm the matrix Hm+1 without the last row.
Note that Hm is upper Hessenberg (zero entries below the first sub-diagonal) and
AVm = Vm Hm + wm em^T
Vm^T A Vm = Hm
(exercise)
Arnoldi’s Method (continued)
Variations:
- Explained using CGS
- Can be implemented with MGS, Householder, etc.
How to use it in linear solvers? Example with the Full Orthogonalization Method (FOM)
FOM
Look for solution in the form
xm = x0 + ym(1)v1 + ··· + ym(m)vm
≡ x0 + Vm ym

The Petrov-Galerkin conditions will be

Vm^T A xm = Vm^T b
⇒ Vm^T A (x0 + Vm ym) = Vm^T b
⇒ Vm^T A Vm ym = Vm^T r0
⇒ Hm ym = Vm^T r0 = β e1

FOM:
1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{-1} e1
4: xm = x0 + Vm ym
which is given by steps 3 and 4 of the algorithm
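A compact sketch of one FOM cycle (steps 1-4 above), with the Arnoldi loop inlined as in the earlier sketch; the function name fom and the breakdown tolerance are illustrative, and Hm is assumed nonsingular.

```python
import numpy as np

def fom(A, b, x0, m):
    """One FOM cycle: x_m = x_0 + V_m y_m with H_m y_m = beta e_1."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)                 # step 1
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                        # step 2: Arnoldi (CGS), as before
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            m = j + 1                         # lucky breakdown: shrink the subspace
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(m)
    rhs[0] = beta
    y = np.linalg.solve(H[:m, :m], rhs)       # step 3: y_m = beta H_m^{-1} e_1
    return x0 + V[:, :m] @ y                  # step 4
```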
Restarted FOM
What happens when m increases?
- computation grows at least as O(m^2 n)
- memory is O(mn)
A remedy is to restart the algorithm, leading to restarted FOM
FOM(m)
1: β = ||r0||2
2: Compute v1, ..., vm with Arnoldi
3: ym = β Hm^{-1} e1
4: xm = x0 + Vm ym. Stop if residual is small enough.
5: Set x0 := xm and go to 1
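A sketch of the restart loop; it reuses the fom() sketch from the previous slide, and the tolerance and restart limit are illustrative.

```python
import numpy as np

def fom_restarted(A, b, x0, m, tol=1e-8, max_restarts=100):
    x = x0.copy()
    for _ in range(max_restarts):
        x = fom(A, b, x, m)                    # steps 1-4 (one FOM cycle, sketch above)
        if np.linalg.norm(b - A @ x) < tol:    # stop if the residual is small enough
            break                              # otherwise set x0 := x_m and repeat
    return x
```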
GMRES

Generalized Minimum Residual Method (GMRES)
- Similar to FOM
- Again look for a solution of the form
xm = x0 + Vmym
where Vm is from the Arnoldi process (i.e. for Km(A, r0)). The test conditions Wm from the abstract formulation (slide 27, Lecture 7)

Wm^T A Vm ym = Wm^T r0
are Wm = AVm. The difference results in step 3 from FOM, namely
ym = β Hm^{-1} e1

being replaced by
ym = argminy ||βe1 − Hm+1y||2
GMRES
Similarly to FOM, GMRES can be defined with
- Various orthogonalizations in the Arnoldi process
- Restart
Note: Solving the least squares (LS) problem
argminy ||βe1 − Hm+1y||2
can be done with QR factorization as discussed in Lecture 7, Slide 25
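A minimal sketch of one GMRES cycle (no restart): the same Arnoldi basis as FOM, but ym solves the small least-squares problem. For brevity the LS problem below is solved with numpy.linalg.lstsq; practical implementations instead update a QR factorization of Hm+1 incrementally, as discussed in Lecture 7.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES cycle: x_m = x_0 + V_m y_m, y_m = argmin ||beta e1 - H_{m+1} y||_2."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                              # Arnoldi (CGS)
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(m + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)     # least-squares instead of H_m^{-1}
    return x0 + V[:, :m] @ y
```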
Lanczos Algorithm
Can we improve on Arnoldi if A is symmetric?
Yes!
- Hm becomes symmetric, so it is just tridiagonal
- The simplification of Arnoldi in this case leads to the Lanczos Algorithm
- Lanczos can be used in deriving CG
The Lanczos Algorithm
Note: the matrix Hm here is tridiagonal, with diagonal hii = αi and off-diagonal hi,i+1 = βi+1.

 1: v1 = r0 / ||r0||2, β1 = 0, v0 = 0
 2: for j = 1 to m do
 3:   wj = Avj − βj vj−1
 4:   αj = (wj, vj)
 5:   wj = wj − αj vj
 6:   βj+1 = ||wj||2. If βj+1 = 0 then Stop
 7:   vj+1 = wj / βj+1
 8: end for

In exact arithmetic the vi's are orthogonal, but in practice orthogonality is lost rapidly.
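A minimal sketch of the algorithm above (A assumed symmetric); it returns the basis and the tridiagonal matrix built from alpha and beta, and the breakdown tolerance is an illustrative choice.

```python
import numpy as np

def lanczos(A, r0, m):
    """Lanczos sketch: V has (ideally) orthonormal columns, T is tridiagonal."""
    n = r0.size
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                          # beta[0] plays the role of beta_1 = 0
    V[:, 0] = r0 / np.linalg.norm(r0)               # line 1
    for j in range(m):
        w = A @ V[:, j] - beta[j] * (V[:, j - 1] if j > 0 else 0.0)   # line 3
        alpha[j] = w @ V[:, j]                      # line 4
        w = w - alpha[j] * V[:, j]                  # line 5
        beta[j + 1] = np.linalg.norm(w)             # line 6
        if beta[j + 1] < 1e-14:
            break
        V[:, j + 1] = w / beta[j + 1]               # line 7
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return V[:, :m], T
```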
Choice of basis for the Krylov subspace
We saw that the choice of basis for the Krylov subspace is characteristic of the various methods, e.g.
- GMRES uses an orthogonal basis
- CG uses an A-orthogonal basis
This is true for other methods as well
- Conjugate Residual (CR; for symmetric problems) uses an A^T A-orthogonal basis (i.e. the Api's are orthogonal)
- An A^T A-orthogonal basis can be generalized to the non-symmetric case as well, e.g. in the Generalized Conjugate Residual (GCR) method
Other Krylov methods
We considered various methods that construct a basis for the Krylov subspaces
Another big class of methods is based on biorthogonalization (an algorithm due to Lanczos)
- For non-symmetric matrices, build a pair of bi-orthogonal bases for the two subspaces
Km(A, v1) = span{v1, A v1, ..., A^{m−1} v1}
Km(A^T, w1) = span{w1, A^T w1, ..., (A^T)^{m−1} w1}
Examples here are BCG and QMR (not to be discussed). These methods are more difficult to analyze.
Part II
Convergence and preconditioning
Convergence
Convergence can be analyzed by exploiting the optimality properties (of the projection) when such properties exist
- A useful tool is Chebyshev polynomials
- The bounds depend on the condition number of the matrix, e.g. in CG

||ei||A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^i ||e0||A
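For intuition, a small sketch (illustrative numbers only) evaluating this bound for a few condition numbers κ(A) and iteration counts:

```python
import numpy as np

def cg_bound(kappa, i):
    """Upper bound 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))**i on ||e_i||_A / ||e_0||_A."""
    q = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return 2.0 * q**i

for kappa in (10, 100, 1000):
    print(kappa, [f"{cg_bound(kappa, i):.1e}" for i in (10, 50, 100)])
```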
Preconditioning
Convergence can be slow, or can even stagnate, for ill-conditioned matrices (with a large condition number)
But can be improved with preconditioning
xi+1 = xi + P(b − Axi )
Think of P as a preconditioner: an operator/matrix P ≈ A^{-1}
- for P = A^{-1} it takes 1 iteration
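A minimal sketch of the iteration xi+1 = xi + P(b − Axi); the choice P = diag(A)^{-1} (Jacobi) and the test matrix are assumptions made for the example, not specified on the slide.

```python
import numpy as np

def preconditioned_richardson(A, b, x0, apply_P, tol=1e-8, maxit=200):
    x = x0.copy()
    for _ in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + apply_P(r)            # x_{i+1} = x_i + P (b - A x_i)
    return x

# Illustrative diagonally dominant test matrix; P = diag(A)^{-1} (Jacobi)
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(5.0, 10.0, 20)) + 0.05 * rng.standard_normal((20, 20))
b = rng.standard_normal(20)
d = np.diag(A)
x = preconditioned_richardson(A, b, np.zeros(20), lambda r: r / d)
print(np.linalg.norm(A @ x - b))
```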
Preconditioning
Properties desired in a preconditioner:
- Should approximate A^{-1}
- Should be easy to compute, apply to a vector, and store
Iterative solvers can be extended to support preconditioning (How?)
Preconditioning
Extending iterative solvers to support preconditioning: the same solver can be used but on a modified problem, e.g.

Problem Ax = b is transformed into
PAx = Pb
known as left preconditioning.

Problem Ax = b is transformed into
APu = b,  x = Pu
known as right preconditioning.

Convergence of the modified problem will depend on κ(PA) (e.g. with left preconditioning)
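An illustrative check of this (not from the slides): for a badly scaled matrix, even the simple Jacobi choice P = diag(A)^{-1} can reduce κ(PA) dramatically compared to κ(A); the matrix below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = np.diag(rng.uniform(1.0, 1000.0, n)) + 0.01 * rng.standard_normal((n, n))  # badly scaled
P = np.diag(1.0 / np.diag(A))            # Jacobi preconditioner: P ~ A^{-1} on the diagonal
print("kappa(A)  =", np.linalg.cond(A))
print("kappa(PA) =", np.linalg.cond(P @ A))
```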
Preconditioning
Examples:
- Incomplete LU factorization (e.g. ILU(0))
- Jacobi (inverse of the diagonal)
- Other stationary iterative solvers (GS, SOR, SSOR)
- Block preconditioners and domain decomposition
  - Additive Schwarz (think of Block-Jacobi)
  - Multiplicative Schwarz (think of Block-GS)
Preconditioning
Examples so far: algebraic preconditioners, i.e. based exclusively on the matrix
Often, for problems coming from PDEs, PDE and discretization information can be used in designing a preconditioner, e.g.
- FFTs can be used to approximate differential operators on regular grids, as in Fourier space the operators are diagonal matrices (see the sketch after this list)
- Grid and problem information can be used to define multigrid preconditioners
- Indefinite problems are often composed of sub-blocks that are definite: this is used in defining specific preconditioners, and even in modifying solvers for these needs, etc.
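A hypothetical sketch of the FFT remark above: for a 1D shifted Laplacian with periodic boundary conditions (an illustrative operator chosen here, not one from the lecture), the matrix is circulant, hence diagonal in Fourier space, and both the operator and its exact inverse (usable as a preconditioner) can be applied with FFTs.

```python
import numpy as np

n = 64
k = np.arange(n)
# Eigenvalues of A = I - Laplacian_h (periodic stencil [-1, 3, -1]) in Fourier space
lam = 3.0 - 2.0 * np.cos(2.0 * np.pi * k / n)

def apply_A(x):
    """Apply the circulant operator A via FFT."""
    return np.real(np.fft.ifft(lam * np.fft.fft(x)))

def apply_P(r):
    """'Preconditioner': exact solve with A, a diagonal division in Fourier space."""
    return np.real(np.fft.ifft(np.fft.fft(r) / lam))

b = np.random.default_rng(0).standard_normal(n)
x = apply_P(b)                               # here the Fourier-space solve is exact
print(np.linalg.norm(apply_A(x) - b))        # ~ machine precision
```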
Part III
Iterative eigen-solvers
Iterative Eigen-Solvers

How are iterative eigen-solvers related to Krylov subspaces?
Remember projection slides 29 & 30, Lecture 7 (left)
Again, as in linear solvers,
- Projection in a subspace is the basis for an iterative eigen-solver
- V and W are often based on Krylov subspaces
Km(A, r0) = span{r0, A r0, A^2 r0, ..., A^{m-1} r0}
where r0 ≡ b − Ax0 and x0 is an initial guess.
- Often parts of V or W are orthogonalized for stability
- The orthogonalization can be CGS, MGS, Cholesky or Householder based, etc.
- The smaller Rayleigh-Ritz problems are usually solved with LAPACK routines
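A minimal sketch of how these pieces can fit together (an assembled example, not code from the lecture): build Vm and Hm with Arnoldi and take the eigenvalues of the small matrix Hm (the Ritz values) as approximate eigenvalues of A; the small eigenproblem is handled by numpy.linalg.eigvals, which calls LAPACK.

```python
import numpy as np

def ritz_values(A, v0, m):
    """Ritz values of A from the Krylov subspace K_m(A, v0) built with Arnoldi (CGS)."""
    n = v0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):                          # Arnoldi, as in Part I
        w = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ w
        w -= V[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:m, :m])         # Ritz values (Rayleigh-Ritz on K_m)

# Illustrative use: approximate the dominant eigenvalues of a random matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
theta = ritz_values(A, rng.standard_normal(200), 30)
print(sorted(theta, key=abs, reverse=True)[:3])
```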
Learning Goals
- A brief introduction to Krylov iterative solvers and eigen-solvers
- Links to building blocks that we have already covered
  - Abstract formulation
  - Projection and Orthogonalization
- Specific examples and issues (preconditioning, parallelization, etc.)