Iterative Methods for Linear Systems

JASS 2009

Student: Rishi Patil
Advisor: Prof. Thomas Huckle

Outline

● Basics: matrices and their properties; eigenvalues, condition number
● Iterative methods: direct and indirect methods
● Methods:
  ● Ritz-Galerkin: CG
  ● Minimum residual approach: GMRES/MINRES
  ● Petrov-Galerkin method: BiCG, QMR, CGS

Basics

● Linear system of equations: Ax = b
● A Hermitian (or self-adjoint) matrix is a square matrix with complex entries that is equal to its own conjugate transpose; that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j. Example:
  A = [ 3     2+i ]
      [ 2−i   1   ]

• Symmetric if a_ij = a_ji
• Positive definite if, for every nonzero vector x, xᵀAx > 0
• Quadratic form: f(x) = (1/2) xᵀAx − bᵀx + c
• Gradient of the quadratic form: f'(x) = ( ∂f/∂x_1, …, ∂f/∂x_n )ᵀ = (1/2) Aᵀx + (1/2) Ax − b, which reduces to Ax − b when A is symmetric
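To make the link between the quadratic form and the linear system concrete, here is a minimal sketch (not part of the original slides; the matrix and vector are illustrative values) checking numerically that for a symmetric positive-definite A the gradient of f is Ax − b, so it vanishes exactly at the solution of Ax = b.

# Minimal sketch: gradient of the quadratic form for a symmetric positive-definite A.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])          # illustrative symmetric positive-definite matrix
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    return A @ x - b                # valid because A is symmetric

x_star = np.linalg.solve(A, b)      # direct solution of A x = b
print(grad_f(x_star))               # ~[0, 0]: the gradient vanishes at the solution
print(f(x_star) < f(x_star + np.array([0.1, -0.2])))   # True: x_star is the minimizer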

Various quadratic forms

[Figure: the quadratic form for a positive-definite matrix, a negative-definite matrix, a singular positive-indefinite matrix, and an indefinite matrix]

Various quadratic forms

Eigenvalues and Eigenvectors

For any n×n matrix A, a scalar λ and a nonzero vector v that satisfy the equation Av = λv are called an eigenvalue and an eigenvector of A.
● If the matrix is symmetric, then the following properties hold:
  (a) the eigenvalues of A are real
  (b) eigenvectors associated with distinct eigenvalues are orthogonal
● The matrix A is positive definite (or positive semidefinite) if and only if all eigenvalues of A are positive (or nonnegative).

Eigenvalues and Eigenvectors

Why should we care about the eigenvalues? Iterative methods often depend on applying A to a vector over and over again:

(a) If |λ| < 1, then A^i v = λ^i v vanishes as i approaches infinity.

(b) If |λ| > 1, then A^i v = λ^i v grows without bound.

Some more terms

Spectral radius of a matrix: ρ(A) = max_i |λ_i|

Condition number: κ = λ_max / λ_min  (for a symmetric positive-definite matrix)
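A minimal sketch (illustrative matrix, not from the slides) of how these quantities can be computed with NumPy:

# Eigenvalues, spectral radius and condition number of a small symmetric matrix.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

eigvals = np.linalg.eigvalsh(A)            # real eigenvalues of a symmetric matrix
rho = np.max(np.abs(eigvals))              # spectral radius: max |lambda_i|
kappa = np.max(eigvals) / np.min(eigvals)  # condition number of an SPD matrix

print(eigvals, rho, kappa)
print(np.isclose(kappa, np.linalg.cond(A)))  # matches the 2-norm condition number for SPD A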

Error: e = x_exact − x_approx

Residual: r = b − A·x_approx

Preconditioning

Preconditioning is a technique for improving the condition number of a matrix. Suppose that M is a symmetric, positive-definite matrix that approximates A, but is easier to invert. We can solve Ax = b indirectly by solving M⁻¹Ax = M⁻¹b.

Types of preconditioners:
● Perfect preconditioner M = A: condition number 1, solution in one iteration, but Mx = b is then as hard as the original system, so it is not a useful preconditioner
● Diagonal preconditioner: trivial to invert, but mediocre
● Incomplete Cholesky: A ≈ LLᵀ, not always stable
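A minimal sketch of diagonal preconditioning (the matrix and right-hand side are made-up illustrative values, not from the slides): M⁻¹A has the same solution as Ax = b but a far smaller eigenvalue ratio when A is badly scaled.

# Diagonal (Jacobi) preconditioning of a badly scaled SPD system.
import numpy as np

A = np.array([[1000.0, 1.0],
              [   1.0, 0.01]])        # SPD but badly scaled (illustrative)
b = np.array([1.0, 2.0])

M_inv = np.diag(1.0 / np.diag(A))     # trivial to invert: reciprocals of the diagonal

def eig_ratio(B):
    ev = np.linalg.eigvals(B).real    # eigenvalues are real and positive here
    return ev.max() / ev.min()

print(eig_ratio(A))                   # ~1.1e5: A is ill conditioned
print(eig_ratio(M_inv @ A))           # ~1.9: the preconditioned operator is well conditioned

x = np.linalg.solve(M_inv @ A, M_inv @ b)
print(np.allclose(A @ x, b))          # True: same solution as the original system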

Stationary and nonstationary methods

Stationary methods for Ax = b: x^(k+1) = R x^(k) + c, where neither R nor c depends on the iteration counter k.

● Splitting of A: A = M − K with nonsingular M
  Ax = Mx − Kx = b  ⇒  x = M⁻¹Kx + M⁻¹b = Rx + c
Examples:
● Jacobi method
● Gauss-Seidel
● Successive over-relaxation (SOR)
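The splitting iteration x = M⁻¹Kx + M⁻¹b above can be sketched numerically (the matrix is an illustrative diagonally dominant example, not from the slides); it converges whenever the spectral radius of R = M⁻¹K is below 1.

# Generic stationary iteration x <- M^-1 (K x + b) for a splitting A = M - K.
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])

M = np.diag(np.diag(A))               # Jacobi-type choice of M
K = M - A                             # so that A = M - K
R = np.linalg.solve(M, K)
print(np.max(np.abs(np.linalg.eigvals(R))))   # spectral radius < 1, so the sweep converges

x = np.zeros_like(b)
for _ in range(50):
    x = np.linalg.solve(M, K @ x + b)         # one stationary sweep
print(np.allclose(x, np.linalg.solve(A, b)))  # True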

Jacobi Method

● Splitting for the Jacobi method: M = D and K = L + U
  x^(k+1) = D⁻¹((L + U) x^(k) + b)
● Solve for x_i from equation i, assuming the other entries are fixed; for the 2D model problem on a grid this sweep reads
  for i = 1 to n
    for j = 1 to n
      u^(k+1)_{i,j} = ( u^(k)_{i-1,j} + u^(k)_{i+1,j} + u^(k)_{i,j-1} + u^(k)_{i,j+1} ) / 4
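A minimal runnable version of this sweep (grid size and boundary values are made up for illustration; the update is the Jacobi sweep for Laplace's equation on a square grid):

# Jacobi sweep for the 2D Laplace model problem with Dirichlet boundary values.
import numpy as np

n = 20
u = np.zeros((n + 2, n + 2))   # n x n interior unknowns plus a boundary layer
u[0, :] = 1.0                  # an arbitrary boundary condition on one edge

for sweep in range(500):
    u_new = u.copy()
    # every interior value becomes the average of its four *old* neighbours
    u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
    u = u_new

print(u[n // 2, n // 2])       # approximate value at the centre of the grid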

Gauss-Seidel Method and SOR (Successive Over-Relaxation)

Splitting for the Gauss-Seidel method: M = D − L and K = U
  x^(k+1) = (D − L)⁻¹(U x^(k) + b)
While looping over the equations, use the most recent values x_i:
  for i = 1 to n
    for j = 1 to n
      u^(k+1)_{i,j} = ( u^(k+1)_{i-1,j} + u^(k)_{i+1,j} + u^(k+1)_{i,j-1} + u^(k)_{i,j+1} ) / 4

Splitting for SOR:
  x^(k+1)_i = ω x̂^(k+1)_i + (1 − ω) x^(k)_i,   where x̂^(k+1)_i is the Gauss-Seidel update
or, in matrix form:
  x^(k+1) = (D − ωL)⁻¹ (ωU + (1 − ω)D) x^(k) + ω (D − ωL)⁻¹ b
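A minimal runnable sketch of the Gauss-Seidel/SOR sweep for the same illustrative grid problem as above (ω is an assumed value; ω = 1 gives plain Gauss-Seidel):

# Gauss-Seidel / SOR sweep: updated values are used as soon as they are available.
import numpy as np

n = 20
omega = 1.5                     # SOR relaxation parameter (illustrative choice)
u = np.zeros((n + 2, n + 2))
u[0, :] = 1.0                   # same arbitrary boundary condition as before

for sweep in range(200):
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            gs = 0.25 * (u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1])
            u[i, j] = omega * gs + (1.0 - omega) * u[i, j]   # blend of new and old value

print(u[n // 2, n // 2])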

Stationary and nonstationary methods

● Nonstationary methods:
  ● The constants are computed by taking inner products of residuals or other vectors arising from the iterative method
  ● Examples:
    ● Conjugate Gradient (CG)
    ● Minimum Residual (MINRES)
    ● Generalized Minimal Residual (GMRES)
    ● BiConjugate Gradient (BiCG)
    ● Quasi-Minimal Residual (QMR)
    ● Conjugate Gradient Squared (CGS)

Descent Algorithms

Fundamental underlying structure of almost all descent algorithms:
● Start with an initial point
● Determine, according to a fixed rule, a direction of movement
● Move in that direction to a relative minimum of the objective function
● At the new point, determine a new direction and repeat the process
The algorithms differ in the rule by which successive directions of movement are selected.

The Method of Steepest Descent

• In the method of steepest descent, one starts with an arbitrary point x^(0) and takes a series of steps x^(1), x^(2), … until we are satisfied that we are close enough to the solution.

• When taking a step, one chooses the direction in which f decreases most quickly, i.e. −f'(x^(i)) = b − A x^(i).

• Definitions:
  error vector: e^(i) = x^(i) − x
  residual: r^(i) = b − A x^(i)

• From Ax = b it follows that r^(i) = −A e^(i) = −f'(x^(i)):
  the residual is the direction of steepest descent.

The Method of Steepest Descent

[Figure: starting at (2,2), take steps in the direction of steepest descent of f; find the point on the intersection of these surfaces that minimizes f; the parabola is the intersection of surfaces; the gradient at the bottommost point is orthogonal to the gradient of the previous step]

The Method of Steepest Descent

The Method of Steepest Descent

• The algorithm:

  r^(i) = b − A x^(i)
  α^(i) = (r^(i)ᵀ r^(i)) / (r^(i)ᵀ A r^(i))        Two matrix-vector multiplications are required.

  x^(i+1) = x^(i) + α^(i) r^(i)   ⇒   e^(i+1) = e^(i) + α^(i) r^(i)

• To avoid one matrix-vector multiplication, one uses the recurrence

  r^(i+1) = r^(i) − α^(i) A r^(i)

The disadvantage of using this recurrence is that the residual sequence is determined without any feedback from the value of x^(i), so roundoff errors may cause x^(i) to converge to some point near, but not exactly at, x.
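A minimal runnable sketch of steepest descent with this recurrence (the SPD system is an illustrative example, not from the slides):

# Method of steepest descent with the residual recurrence r <- r - alpha * A r.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

x = np.zeros(2)
r = b - A @ x
for i in range(100):
    Ar = A @ r                        # the only other matrix-vector product
    alpha = (r @ r) / (r @ Ar)
    x = x + alpha * r
    r = r - alpha * Ar                # recurrence: no need to recompute b - A x
    if np.linalg.norm(r) < 1e-10:
        break

print(x, np.linalg.solve(A, b))       # the iterate converges to the true solution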

Steepest Descent Problem

● The gradient at the minimum of a line search is orthogonal to the direction of that search ⇒ the steepest descent algorithm tends to make right-angle turns, taking many steps down a long, narrow potential well. Too many steps are needed to reach even a simple minimum.

The Method of Conjugate Directions

Basic idea:
• Pick a set of orthogonal search directions d^(0), d^(1), …, d^(n−1)
• Take exactly one step in each search direction to line up with x
• The solution will be reached in n steps

Mathematical formulation:
1. For each step we choose a point x^(i+1) = x^(i) + α^(i) d^(i)
2. To find α^(i), we use the fact that e^(i+1) is orthogonal to d^(i)

The Method of Conjugate Directions

• To solve the problem of not knowing e^(i), one makes the search directions A-orthogonal rather than orthogonal to each other, i.e.:
  d^(i)ᵀ A d^(j) = 0   for i ≠ j

The Method of Conjugate Directions

• The new requirement is that e^(i+1) be A-orthogonal to d^(i):
  d/dα f(x^(i+1)) = f'(x^(i+1))ᵀ (d/dα) x^(i+1) = 0
  r^(i+1)ᵀ d^(i) = 0
  d^(i)ᵀ A e^(i+1) = 0
  d^(i)ᵀ A (e^(i) + α^(i) d^(i)) = 0
  α^(i) = d^(i)ᵀ r^(i) / (d^(i)ᵀ A d^(i))

If the search vectors were the residuals, this formula would be identical to the method of steepest descent.

The Method of Conjugate Directions

• Calculation of the A-orthogonal search directions by a conjugate Gram-Schmidt process:

1. Take a set of linearly independent vectors u_0, u_1, …, u_{n−1}

2. Set d^(0) = u_0

3. For i > 0, take u_i and subtract from it all components that are not A-orthogonal to the previous search directions:
  d^(i) = u_i + Σ_{j=0}^{i−1} β_ij d^(j),   β_ij = − u_iᵀ A d^(j) / (d^(j)ᵀ A d^(j))

The Method of Conjugate Directions

• The method of Conjugate Gradients is simply the method of conjugate directions where the search directions are constructed by conjugation of the residuals, i.e. u_i = r^(i)
• This allows us to simplify the calculation of the new search direction, because
  β_ij = (1/α^(i−1)) · r^(i)ᵀ r^(i) / (d^(i−1)ᵀ A d^(i−1)) = r^(i)ᵀ r^(i) / (r^(i−1)ᵀ r^(i−1))   for i = j + 1,   and β_ij = 0 for i > j + 1
• The new search direction is determined as a linear combination of the previous search direction and the new residual:

  d^(i+1) = r^(i+1) + β^(i+1) d^(i)

The Method of Conjugate Directions

x_0 = 0,  r_0 = b,  d_0 = r_0
for k = 1, 2, 3, …
    α_k = (r_{k−1}ᵀ r_{k−1}) / (d_{k−1}ᵀ A d_{k−1})      step length
    x_k = x_{k−1} + α_k d_{k−1}                            approximate solution
    r_k = r_{k−1} − α_k A d_{k−1}                          residual
    β_k = (r_kᵀ r_k) / (r_{k−1}ᵀ r_{k−1})                  improvement
    d_k = r_k + β_k d_{k−1}                                search direction

• One matrix-vector multiplication per iteration
• Two vector dot products per iteration
• Four n-vectors of working storage
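A minimal runnable version of this iteration (the SPD test matrix is illustrative; a practical code would also stop once the residual norm is small enough):

# Conjugate gradient iteration exactly as written above.
import numpy as np

def conjugate_gradient(A, b, n_iter):
    x = np.zeros_like(b)
    r = b.copy()                     # r_0 = b - A x_0 with x_0 = 0
    d = r.copy()                     # d_0 = r_0
    for k in range(n_iter):
        Ad = A @ d                   # the single matrix-vector product per iteration
        alpha = (r @ r) / (d @ Ad)
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, n_iter=2)          # exact (up to roundoff) after n = 2 steps
print(np.allclose(x, np.linalg.solve(A, b)))    # True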

Krylov subspace

The Krylov subspace K_j consists of the linear combinations of b, Ab, …, A^{j−1}b.
The Krylov matrix is K_j = [ b  Ab  A²b  …  A^{j−1}b ].

Methods to construct a basis for K_j: Arnoldi's method and the Lanczos method (see the sketch after this slide for why the raw Krylov vectors are a poor basis).

Approaches to choosing a good x_j in K_j:

● Ritz-Galerkin approach: r_j = b − A x_j is orthogonal to K_j (Conjugate Gradient)

● Minimum residual approach: r_j has minimum norm for x_j in K_j (GMRES and MINRES)

● Petrov-Galerkin approach: r_j is orthogonal to a different space K_j(Aᵀ) (BiConjugate Gradient)
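The sketch below (illustrative random SPD matrix, not from the slides) shows why one does not work with the columns of the Krylov matrix directly: they become nearly linearly dependent, so an orthonormal basis of the same subspace is built instead.

# The raw Krylov matrix is extremely ill conditioned; an orthonormal basis is not.
import numpy as np

rng = np.random.default_rng(0)
n, j = 50, 10
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)               # an SPD test matrix
b = rng.standard_normal(n)

cols = [b / np.linalg.norm(b)]
for _ in range(j - 1):
    v = A @ cols[-1]
    cols.append(v / np.linalg.norm(v))    # normalized Krylov vector A^i b / ||A^i b||
K = np.column_stack(cols)

print(np.linalg.cond(K))                  # very large: the columns are nearly dependent
Q, _ = np.linalg.qr(K)                    # orthonormal basis of the same subspace
print(np.linalg.cond(Q))                  # ~1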

Arnoldi's Method

The best basis q_1, …, q_j for the Krylov subspace K_j is orthonormal. Each new q_{j+1} comes from orthogonalizing t = A q_j against the basis vectors q_1, …, q_j that are already chosen. The iteration to compute these orthonormal q's is Arnoldi's method.

q_1 = b / ||b||                  % normalize b to ||q_1|| = 1
for j = 1, …, n−1                % start computation of q_{j+1}
    t = A q_j                    % one matrix-vector multiplication
    for i = 1, …, j              % t is in the space K_{j+1}
        h_ij = q_iᵀ t            % h_ij q_i = projection of t on q_i
        t = t − h_ij q_i         % subtract that projection
    end                          % t is orthogonal to q_1, …, q_j
    h_{j+1,j} = ||t||            % compute the length of t
    q_{j+1} = t / h_{j+1,j}      % normalize t to ||q_{j+1}|| = 1
end                              % q_1, …, q_n are orthonormal

A Q_{n−1} = Q_n H_{n,n−1}        H_{n,n−1} is upper Hessenberg
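A minimal NumPy version of the iteration above (the test matrix and vector are illustrative), checking the Arnoldi relation and the orthonormality of the basis:

# Arnoldi iteration: builds Q with orthonormal columns and an upper Hessenberg H
# such that A Q[:, :m] = Q H.
import numpy as np

def arnoldi(A, b, m):
    n = len(b)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        t = A @ Q[:, j]                    # candidate vector, one mat-vec
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ t          # projection coefficient
            t = t - H[i, j] * Q[:, i]      # subtract that projection
        H[j + 1, j] = np.linalg.norm(t)
        Q[:, j + 1] = t / H[j + 1, j]      # next orthonormal basis vector
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))          # nonsymmetric test matrix
b = rng.standard_normal(30)
Q, H = arnoldi(A, b, m=8)

print(np.allclose(A @ Q[:, :8], Q @ H))    # the Arnoldi relation holds
print(np.allclose(Q.T @ Q, np.eye(9)))     # the q's are orthonormal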

Lanczos Method

The Lanczos method is the Arnoldi iteration specialized to symmetric (real) A:
  H_{n−1,n−1} = Q_{n−1}ᵀ A Q_{n−1}

H_{n−1,n−1} is tridiagonal, which means that in the process each new vector has to be orthogonalized only against the previous two vectors, since the other inner products vanish.

β_0 = 0,  q_0 = 0,  b arbitrary,  q_1 = b / ||b||
for i = 1, …, n−1
    v = A q_i
    α_i = q_iᵀ v
    v = v − β_{i−1} q_{i−1} − α_i q_i
    β_i = ||v||
    q_{i+1} = v / β_i
end
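A minimal NumPy sketch of this three-term recurrence (illustrative symmetric test matrix), verifying that Qᵀ A Q comes out tridiagonal:

# Lanczos iteration for a symmetric matrix: only the previous two basis
# vectors are needed at each step.
import numpy as np

def lanczos(A, b, m):
    n = len(b)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q_prev = np.zeros(n)
    q = b / np.linalg.norm(b)
    for i in range(m):
        Q[:, i] = q
        v = A @ q
        alpha[i] = q @ v
        v = v - alpha[i] * q
        if i > 0:
            v = v - beta[i - 1] * q_prev   # only the previous vector enters
        beta[i] = np.linalg.norm(v)
        q_prev, q = q, v / beta[i]
    return Q, alpha, beta

rng = np.random.default_rng(2)
B = rng.standard_normal((40, 40))
A = B + B.T                                # symmetric test matrix
b = rng.standard_normal(40)
Q, alpha, beta = lanczos(A, b, m=6)

T = Q.T @ A @ Q
tridiagonal_part = np.triu(np.tril(T, 1), -1)
print(np.allclose(T, tridiagonal_part, atol=1e-10))   # True: T is tridiagonal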

Minimum Residual Methods

Problem: if A is not symmetric positive definite, CG is not guaranteed to solve Ax = b.

Solution: minimum residual methods. Choose x_j in the Krylov subspace K_j so that ||b − A x_j|| is minimal.

The first j orthonormal vectors q_1, …, q_j go in the columns of Q_j, so Q_jᵀ Q_j = I.
Setting x_j = Q_j y:
  ||r_j|| = ||b − A x_j|| = ||b − A Q_j y|| = ||b − Q_{j+1} H_{j+1,j} y||

using the first j columns of Arnoldi's formula AQ = QH, i.e. A Q_j = Q_{j+1} H_{j+1,j}.

Minimum Residual Methods

The problem becomes: choose y to minimize ||r_j|| = ||Q_{j+1}ᵀ b − H_{j+1,j} y||.
This is a least-squares problem. The zeros in H and the structure of Q_{j+1}ᵀ b are exploited to find a fast algorithm that computes y.

GMRES (Generalized Minimal Residual): A is not symmetric and the upper triangular part of H can be full. All previously computed basis vectors have to be stored.

MINRES (Minimal Residual): A is symmetric (possibly indefinite) and H is tridiagonal. This avoids storage of all basis vectors for the Krylov subspace.

Aim: clear out the nonzero diagonal below the main diagonal of H. This is done with Givens rotations.

GMRES

Algorithm: GMRES
q_1 = b / ||b||
for j = 1, 2, 3, …
    perform step j of the Arnoldi iteration
    find y that minimizes ||r_j|| = ||Q_{j+1}ᵀ b − H_{j+1,j} y||
    x_j = Q_j y

Full GMRES: the upper triangle in H can be full, so step j becomes expensive, and possibly inaccurate, as j increases.

GMRES(m): restarts the GMRES algorithm every m steps. However, it is tricky to choose m.
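A minimal sketch of restarted GMRES through SciPy (the nonsymmetric test system and the restart length are illustrative choices, not from the slides):

# Restarted GMRES(20) via scipy.sparse.linalg.gmres.
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(3)
n = 200
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # mildly nonsymmetric, well conditioned
b = rng.standard_normal(n)

x, info = gmres(A, b, restart=20)      # restart the Arnoldi process every 20 steps
print(info)                            # 0 means the solver reported convergence
print(np.linalg.norm(b - A @ x))       # small residual norm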

Petrov-Galerkin approach

● r_j is orthogonal to a different space K_j(Aᵀ)

● BiCG (BiConjugate Gradient)
● QMR (Quasi-Minimal Residual)
● CGS (Conjugate Gradient Squared)

Lanczos Bi-Orthogonalization Procedure

● Extension of the symmetric Lanczos algorithm

● Builds a pair of bi-orthogonal bases for the two subspaces K_m(A, v_1) and K_m(Aᵀ, w_1)

BiConjugate Gradient (BiCG)

Quasi-Minimal Residual (QMR)

● QMR uses the unsymmetric Lanczos algorithm to generate a basis for the Krylov subspaces
● The look-ahead technique avoids breakdowns during the Lanczos process and makes QMR robust

Conjugate Gradient Squared (CGS)

Summary

● Stationary iterative solvers:
  ● Jacobi, Gauss-Seidel, SOR
● Nonstationary solvers (Krylov subspace methods):
  ● Conjugate Gradient: symmetric positive definite systems
  ● GMRES and MINRES: expensive, but general (MINRES requires symmetric A)
  ● BiCG: nonsymmetric, two matrix-vector products per iteration
  ● QMR: nonsymmetric, avoids the irregular convergence of BiCG
  ● CGS: nonsymmetric, faster than BiCG, does not require the transpose
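For reference, a hedged sketch (not from the slides) of how the Krylov solvers summarised above are exposed in scipy.sparse.linalg; the test matrix is an illustrative well-conditioned SPD tridiagonal matrix, so every method applies.

# Calling several Krylov solvers on the same small test system.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, minres, gmres, bicg, qmr, cgs

n = 100
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n)).tocsr()   # SPD, diagonally dominant
b = np.ones(n)

for name, solver in [("CG", cg), ("MINRES", minres), ("GMRES", gmres),
                     ("BiCG", bicg), ("QMR", qmr), ("CGS", cgs)]:
    x, info = solver(A, b)
    print(name, info, np.linalg.norm(b - A @ x))   # info == 0 signals convergence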

References

● Jonathan Richard Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain", August 4, 1994
● Gene H. Golub and Henk A. van der Vorst, "Closer to the Solution: Iterative Linear Solvers"
● Gilbert Strang, "Krylov Subspaces and Conjugate Gradients"

Thank You!
