Chapter 8

KRYLOV SUBSPACE METHODS

A more readable reference is the book by Lloyd N. Trefethen and David Bau.

8.1 Krylov Subspaces
8.2 Arnoldi
8.3 Generalized Minimal Residual Method
8.4 Conjugate Gradient Method
8.5 Biconjugate Gradient Method
8.6 Biconjugate Gradient Stabilized Method

8.1 Krylov Subspaces

Saad, Sections 6.1, 6.2, omitting Proposition 6.3.

In a Krylov subspace method

$$x_i - x_0 \in \mathcal{K}_i(A, r_0) = \mathrm{span}\{r_0, Ar_0, \ldots, A^{i-1}r_0\}.$$

We call $\mathcal{K}_i(A, r_0)$ a Krylov subspace. Equivalently,

$$x_i \in x_0 + \mathrm{span}\{r_0, Ar_0, \ldots, A^{i-1}r_0\},$$
which we call a linear manifold. The exact solution is $A^{-1}b = x_0 + A^{-1}r_0$. The minimal polynomial of $A$ is the polynomial
$$p(x) = x^m + c_{m-1}x^{m-1} + \cdots + c_1 x + c_0$$
of lowest degree $m$ such that $p(A) = 0$. If $A$ is diagonalizable, $m$ is the number of distinct eigenvalues. To see this, let $A$ have distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, and define

$$p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_m) = \lambda^m + c_{m-1}\lambda^{m-1} + \cdots + c_1\lambda + c_0.$$

Writing $A = X\Lambda X^{-1}$, we have

$$p(A) = X(\Lambda - \lambda_1 I)(\Lambda - \lambda_2 I)\cdots(\Lambda - \lambda_m I)X^{-1} = 0.$$

If $A$ has no eigenvalue equal to zero, $A^{-1}$ is a linear combination of $I, A, \ldots, A^{m-1}$,

 −1     1 1 1 3 1 e.g.,  1  =  1  −  1  . 2 2 2 1 2

To see this, write
$$A^{m-1} + c_{m-1}A^{m-2} + \cdots + c_1 I + c_0 A^{-1} = 0.$$
Hence $A^{-1}r_0 \in \mathrm{span}\{r_0, Ar_0, \ldots, A^{m-1}r_0\}$, $A^{-1}b \in x_0 + \mathrm{span}\{r_0, Ar_0, \ldots, A^{m-1}r_0\}$, and $x_m = A^{-1}b$. This is the finite termination property. (In practice we do not go this far.)
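As a concrete check of the diagonal example above, the following sketch (not part of the text; the class name and plain Java arrays are introduced here just for illustration) verifies numerically that $A^{-1} = \frac{3}{2}I - \frac{1}{2}A$ for $A = \mathrm{diag}(1,1,2)$, whose minimal polynomial is $(\lambda-1)(\lambda-2) = \lambda^2 - 3\lambda + 2$.

// Illustrative sketch: verify A^{-1} = (3/2)I - (1/2)A for A = diag(1,1,2),
// whose minimal polynomial is (lambda - 1)(lambda - 2) = lambda^2 - 3 lambda + 2.
public class MinimalPolynomialDemo {
    public static void main(String[] args) {
        double[][] a = {{1, 0, 0}, {0, 1, 0}, {0, 0, 2}};
        int n = a.length;
        // Form B = (3/2)I - (1/2)A and check that A*B is the identity.
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                b[i][j] = 1.5 * (i == j ? 1.0 : 0.0) - 0.5 * a[i][j];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < n; k++) s += a[i][k] * b[k][j];
                System.out.printf("%6.2f", s);   // should print the identity matrix
            }
            System.out.println();
        }
    }
}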

Review questions

1. What is a Krylov subspace?

2. Is $x_0 + \mathrm{span}\{r_0, Ar_0, \ldots, A^{i-1}r_0\}$ a Krylov subspace?

3. Which of the following methods choose their search directions from a Krylov subspace: cyclic coordinate descent? steepest descent? For each that does, what is its Krylov subspace?

4. What is the minimal polynomial of a square matrix $A$?

5. What is the finite termination property?

Exercise

1. Use the characteristic polynomial of a nonsingular square matrix $A$ to show that $A^{-1}$ can be expressed as a polynomial of degree at most $n-1$ in $A$.

8.2 Arnoldi Orthogonalization

Saad, Section 6.3.

For numerical stability we incrementally construct orthonormal bases $\{q_1, q_2, \ldots, q_k\}$ for the Krylov subspaces. However, rather than applying the Gram-Schmidt process to the sequence $r_0, Ar_0, \ldots, A^{k-1}r_0$, we use what is known as the Arnoldi process. It is based on the fact that each Krylov subspace can be obtained from the orthonormal basis of the Krylov subspace of one dimension less using the spanning set $q_1, q_2, \ldots, q_k, Aq_k$. In other words, a new direction in the expanded Krylov subspace can be created by multiplying the most recent basis vector $q_k$ by $A$ rather than by multiplying $A^{k-1}r_0$ by $A$. We remove from this new direction $Aq_k$ its orthogonal projection onto $q_1, q_2, \ldots, q_k$, obtaining the direction

$$v_{k+1} = Aq_k - q_1 h_{1k} - q_2 h_{2k} - \cdots - q_k h_{kk},$$
where the coefficients $h_{ik}$ are determined by the orthogonality conditions. This computation should be performed using the modified Gram-Schmidt iteration:

    t = A q_k;
    for j = 1, 2, ..., k do {
        /* t = (I - q_j q_j^T) t */
        h_{jk} = q_j^T t;
        t = t - q_j h_{jk};
    }

Normalization produces $q_{k+1} = v_{k+1}/h_{k+1,k}$, where $h_{k+1,k} = \|v_{k+1}\|_2$.

The coefficients $h_{ij}$ have been labeled so that we can write

$$AQ_k = Q_{k+1}\bar{H}_k \qquad (8.1)$$

where $Q_k := [q_1, q_2, \ldots, q_k]$ and $\bar{H}_k$ is a $k+1$ by $k$ upper Hessenberg matrix whose $(i,j)$th element, for $j \ge i-1$, is $h_{ij}$. Given the basis $q_1, q_2, \ldots, q_k$, we can express any element of the $k$-dimensional Krylov subspace as $Q_k y$ for some $k$-vector $y$.
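The Arnoldi loop above maps almost line for line onto code. The sketch below is illustrative only, not an implementation from the text; the helper routines dot, norm, and matVec are names introduced here, the arrays q and h are assumed preallocated with at least $k+1$ rows, and breakdown ($h_{k+1,k} = 0$) is not handled.

// Illustrative sketch of the Arnoldi process with modified Gram-Schmidt.
// Fills q[0..k] (orthonormal basis vectors of length n) and the (k+1)-by-k
// upper Hessenberg array h so that A*Q_k = Q_{k+1}*Hbar_k.
static void arnoldi(double[][] a, double[] r0, int k, double[][] q, double[][] h) {
    int n = r0.length;
    double beta = norm(r0);
    q[0] = new double[n];
    for (int i = 0; i < n; i++) q[0][i] = r0[i] / beta;        // q_1 = r_0 / ||r_0||
    for (int j = 0; j < k; j++) {
        double[] t = matVec(a, q[j]);                           // t = A q_{j+1}
        for (int i = 0; i <= j; i++) {                          // modified Gram-Schmidt
            h[i][j] = dot(q[i], t);
            for (int m = 0; m < n; m++) t[m] -= h[i][j] * q[i][m];
        }
        h[j + 1][j] = norm(t);                                  // zero would mean breakdown (not handled)
        q[j + 1] = new double[n];
        for (int m = 0; m < n; m++) q[j + 1][m] = t[m] / h[j + 1][j];
    }
}

static double dot(double[] x, double[] y) {
    double s = 0.0;
    for (int i = 0; i < x.length; i++) s += x[i] * y[i];
    return s;
}

static double norm(double[] x) { return Math.sqrt(dot(x, x)); }

static double[] matVec(double[][] a, double[] x) {
    double[] y = new double[a.length];
    for (int i = 0; i < a.length; i++) y[i] = dot(a[i], x);
    return y;
}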

Review questions

1. In the Arnoldi process for orthogonalizing a Krylov subspace $\mathcal{K}_i(A, r_0)$, how is each new basis vector $q_{k+1}$ produced?

2. The relationship among the first $k+1$ vectors produced by the Arnoldi process for $\mathcal{K}_i(A, r_0)$ can be summarized as

$$AQ_k = Q_{k+1}\bar{H}_k$$

where $Q_k$ is composed of the first $k$ vectors. What are the dimensions of $\bar{H}_k$, and what other property does it have? What is $Q_k^T A Q_k$ in terms of $\bar{H}_k$?

3. Let $Q_k$ be composed of the first $k$ vectors of the Arnoldi process for $\mathcal{K}_i(A, r_0)$. What is $r_0$ in terms of $Q_k$ and $\|r_0\|$?

8.3 Generalized Minimal Residual Method

Saad, Sections 6.5.1, 6.5.3–6.5.5.

GMRES, "generalized minimal residuals," is a popular iterative method for nonsymmetric matrices $A$. It is based on the principle of minimizing the norm of the residual $\|b - Ax\|_2$, since the energy norm is available only for an s.p.d. matrix. It is, however, not a gradient method; rather, it chooses for the correction $x_k - x_0$ that element of the Krylov subspace $\mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$ which minimizes the 2-norm of the residual. For numerical stability we construct orthonormal bases $\{q_1, q_2, \ldots, q_k\}$ for the Krylov subspaces using the Arnoldi process. We can express any element of the $k$-dimensional Krylov subspace as $Q_k y$ for some $k$-vector $y$. Thus, the minimization problem becomes

$$\min_y \|b - A(x_0 + Q_k y)\|_2.$$

This is a linear least squares problem involving $k$ unknowns and $n$ equations. The number of equations can be reduced to $k+1$ by using eq. (8.1) to get

$$\begin{aligned}
\|b - A(x_0 + Q_k y)\|_2 &= \|r_0 - AQ_k y\|_2 \\
 &= \|\rho q_1 - AQ_k y\|_2 \qquad \text{where } \rho = \|r_0\| \\
 &= \|Q_{k+1}(\rho e_1 - \bar{H}_k y)\|_2 \\
 &= \|\rho e_1 - \bar{H}_k y\|_2.
\end{aligned}$$
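One way to solve the reduced problem $\min_y \|\rho e_1 - \bar{H}_k y\|_2$ is a QR factorization of $\bar{H}_k$ by Givens rotations, which exploits the Hessenberg structure; practical GMRES codes apply the rotations incrementally as each Arnoldi column is produced. The sketch below is illustrative only, not an implementation from the text; it assumes $\bar{H}_k$ is stored as a $(k+1)$-by-$k$ array and that no breakdown occurs.

// Illustrative sketch: solve min_y || rho*e1 - Hbar*y ||_2 for a (k+1)-by-k
// upper Hessenberg Hbar by Givens rotations followed by back substitution.
static double[] gmresLeastSquares(double[][] hbar, double rho, int k) {
    double[][] h = new double[k + 1][];
    for (int i = 0; i <= k; i++) h[i] = hbar[i].clone();        // work on a copy
    double[] g = new double[k + 1];
    g[0] = rho;                                                 // right-hand side rho*e_1
    for (int j = 0; j < k; j++) {
        // Givens rotation chosen to zero the subdiagonal entry h[j+1][j]
        double denom = Math.hypot(h[j][j], h[j + 1][j]);
        double c = h[j][j] / denom, s = h[j + 1][j] / denom;
        for (int m = j; m < k; m++) {
            double hjm = h[j][m], hj1m = h[j + 1][m];
            h[j][m] = c * hjm + s * hj1m;
            h[j + 1][m] = -s * hjm + c * hj1m;
        }
        double gj = g[j], gj1 = g[j + 1];
        g[j] = c * gj + s * gj1;
        g[j + 1] = -s * gj + c * gj1;        // |g[k]| ends up equal to the residual norm
    }
    double[] y = new double[k];              // back substitution on the triangular part
    for (int i = k - 1; i >= 0; i--) {
        double s = g[i];
        for (int m = i + 1; m < k; m++) s -= h[i][m] * y[m];
        y[i] = s / h[i][i];
    }
    return y;                                // then x_k = x_0 + Q_k * y
}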

Review questions

1. For GMRES applied to $Ax = b$, where $A$ is $n$ by $n$, from what subset of $\mathbb{R}^n$ does one choose the approximation $x_k$ given an initial guess $x_0$?

2. How many matrix–vector multiplications are required for each iteration of GMRES?

3. What is the optimality property of GMRES?

4. For the GMRES solution from a k-dimensional subspace, one solves a least squares problem of reduced dimension. What are the dimensions of the coefficient matrix in this problem, and what is its special property?

8.4 Conjugate Gradient Method

Saad, Sections 6.7.1, 6.11.3.

Considered here is the case where $A$ is symmetric. Let $H_k$ be all but the last row of $\bar{H}_k$. Then $Q_k^T A Q_k = H_k$, which is square and upper Hessenberg. Since $A$ is symmetric, $H_k$ is symmetric and hence tridiagonal, and we write $\bar{H}_k = \bar{T}_k$, $H_k = T_k$.

Clearly, it is unnecessary to compute elements of $T_k$ known to be zero, thus reducing the cost from $O(k^2)$ to $O(k)$. Assuming $A$ is also positive definite, we choose $x_k$ to be that element of $x_0 + \mathcal{K}_k(A, r_0)$ which is closest to $A^{-1}b$ in energy norm. Hence, each iterate $x_k$ has the following optimality property:
$$|||x_k - A^{-1}b||| = \min\{\,|||x - A^{-1}b|||\ :\ x \in x_0 + \mathcal{K}_k(A, r_0)\,\}.$$
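For reference, the energy norm appearing here is, presumably as defined in Chapter 7, the norm induced by the s.p.d. matrix $A$:
$$|||v||| = (v^T A v)^{1/2}.$$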

In Section 7.3 this is shown to be equivalent to making the residual $b - Ax_k$ orthogonal to $\mathcal{K}_k(A, r_0) = \mathcal{R}(Q_k)$. Writing $x_k = x_0 + Q_k y$, this becomes

$$\begin{aligned}
0 &= Q_k^T(b - Ax_k) \\
 &= Q_k^T(r_0 - AQ_k y) \\
 &= Q_k^T(\rho q_1 - AQ_k y) \\
 &= \rho e_1 - T_k y,
\end{aligned}$$
a tridiagonal system to solve for $y$.

Although the conjugate gradient method can be derived from the Lanczos orthogonalization described above, it is more usual to start from the method of steepest descent. The conjugate gradient method constructs directions $p_i$ from the gradients $r_0, r_1, r_2, \ldots$. These directions are conjugate,
$$p_i^T A p_j = 0 \quad \text{if } i \ne j,$$
or orthogonal in the energy inner product. We skip the details and simply state the result:

    x_0 = initial guess;  r_0 = b - A x_0;  p_0 = r_0;
    for i = 0, 1, 2, ... do {
        alpha_i = (r_i^T r_i) / (p_i^T A p_i);
        x_{i+1} = x_i + alpha_i p_i;
        r_{i+1} = r_i - alpha_i A p_i;
        p_{i+1} = r_{i+1} + ((r_{i+1}^T r_{i+1}) / (r_i^T r_i)) p_i;
    }
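The iteration above translates directly into code. The sketch below is illustrative only, not an implementation from the text; it reuses the dot and matVec helpers introduced in the Arnoldi sketch and adds a simple stopping test on $\|r_i\|_2$.

// Illustrative sketch of the conjugate gradient iteration above;
// A is assumed symmetric positive definite.
static double[] conjugateGradient(double[][] a, double[] b, double[] x0,
                                  int maxIter, double tol) {
    int n = b.length;
    double[] x = x0.clone();
    double[] ax = matVec(a, x);
    double[] r = new double[n];
    for (int i = 0; i < n; i++) r[i] = b[i] - ax[i];            // r_0 = b - A x_0
    double[] p = r.clone();                                     // p_0 = r_0
    double rtr = dot(r, r);
    for (int iter = 0; iter < maxIter && Math.sqrt(rtr) > tol; iter++) {
        double[] ap = matVec(a, p);                             // the one matrix-vector product
        double alpha = rtr / dot(p, ap);
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        double rtrNew = dot(r, r);
        double beta = rtrNew / rtr;
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rtr = rtrNew;
    }
    return x;
}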

The cost per iteration is

    1 matrix · vector product:  $Ap_i$
    2 vector · vector products:  $r_{i+1}^T r_{i+1}$, $p_i^T(Ap_i)$
    3 vector + scalar · vector operations

for a total cost of 1 matrix–vector product and $5n$ multiplications. For
$$\begin{bmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{bmatrix}$$
the cost of matrix·vector is $2.5n$ "multiplications." We note that $r_i \in \mathcal{K}_{i+1}(A, r_0)$ and $p_i \in \mathcal{K}_{i+1}(A, r_0)$. Moreover, it can be shown that the gradients $\{r_0, r_1, \ldots, r_{i-1}\}$ constitute an orthogonal basis for the Krylov subspace.

Convergence rate:

$$|||x_k - A^{-1}b||| \le \left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^{k} |||x_0 - A^{-1}b|||.$$
Since $\bigl((\sqrt{\kappa_2(A)}-1)/(\sqrt{\kappa_2(A)}+1)\bigr)^k \approx e^{-2k/\sqrt{\kappa_2(A)}}$ for large $\kappa_2(A)$, to reduce the energy norm of the error by a factor of $\varepsilon$ requires approximately $\frac{1}{2}\sqrt{\kappa_2(A)}\,\log\frac{1}{\varepsilon}$ iterations, e.g., $\frac{n}{\pi}\log\frac{1}{\varepsilon}$ if $\sqrt{\kappa_2(A)} \approx 2n/\pi$.

Review questions

1. What does it mean for the directions of the conjugate gradient method to be conjugate?

2. What is the optimality property of the conjugate gradient method?

3. How many matrix–vector multiplications are required for each iteration of CG?

4. On what property of the matrix does the rate of convergence of conjugate gradient depend?

Exercises

1. Given $x_i$, $r_i$, $p_i$, one conjugate gradient iteration is given by

$$\begin{aligned}
\alpha_i &= r_i^T r_i/(p_i^T A p_i), & r_{i+1} &= r_i - \alpha_i A p_i, \\
x_{i+1} &= x_i + \alpha_i p_i, & p_{i+1} &= r_{i+1} + \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i}\, p_i,
\end{aligned}$$
where $A$ is assumed to be symmetric positive definite. Suppose you are given

    static void ax(float[] x, float[] y) {
        int n = x.length;
        // Given x this method returns y where
        // y[i] = a[i][0]*x[0] + ... + a[i][n-1]*x[n-1]
        ...
    }

You should complete the following method, which performs one conjugate gradient iteration, overwriting the old values of $x_i$, $r_i$, $r_i^T r_i$, and $p_i$ with the new values. Your method must use no temporary arrays except for one float array of dimension $n$, and it must do a minimum amount of floating-point arithmetic.

static void cg(float[] x, float[] r, float rTr, float[] p)

2. Solve the following system using the conjugate gradient method with $x_1 = 0$.
$$\begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} x = \begin{bmatrix} 2 \\ 6 \\ 2 \end{bmatrix}$$

8.5 Biconjugate Gradient Method

Saad, Section 7.3.1.

Recall that for a symmetric matrix $A$ one can find, by means of a direct calculation, an orthogonal similarity transformation such that $Q^T A Q = T$ is tridiagonal, and that for a general matrix one can get an upper Hessenberg matrix $Q^T A Q = H$. In fact, for a general matrix it is possible to find, by means of a direct calculation, a similarity transformation such that $V^{-1} A V = T$ is tridiagonal if one requires only that $V$ be nonsingular. Letting $W^T = V^{-1}$, we can write this as two conditions:
$$W^T A V = T, \qquad W^T V = I.$$

The columns of $V = [v_1\, v_2 \cdots v_n]$ and $W = [w_1\, w_2 \cdots w_n]$ are said to form a biorthogonal system in that
$$v_i^T w_j = \delta_{ij}.$$
Lanczos orthogonalization incrementally builds up the relation $Q_k^T A Q_k = T_k$, which if carried to completion would yield $Q^T A Q = T$. Similarly, Lanczos biorthogonalization incrementally constructs $V_k$, $W_k$, $T_k$ such that
$$W_k^T A V_k = T_k, \qquad W_k^T V_k = I.$$
These relations can be exploited to obtain an approximate solution:
$$\tilde{x} = x_0 + V_k y \quad \text{such that} \quad W_k^T (b - A\tilde{x}) = 0,$$
which simplifies to the tridiagonal system
$$T_k y = W_k^T r_0.$$
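Spelling out the simplification, using $W_k^T A V_k = T_k$:
$$0 = W_k^T\bigl(b - A(x_0 + V_k y)\bigr) = W_k^T r_0 - W_k^T A V_k\, y = W_k^T r_0 - T_k y.$$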

The columns of $V_k$ are a basis for $\mathcal{K}_k(A, v_1)$, and the columns of $W_k$ are a basis for $\mathcal{K}_k(A^T, w_1)$. The method just described can be shown to be equivalent to the biconjugate gradient method, which is given below without a derivation:

    x_0 = initial guess;
    r_0 = b - A x_0;  choose r'_0 so that r_0^T r'_0 != 0;
    p_0 = r_0;  p'_0 = r'_0;
    for i = 0, 1, 2, ... do {
        alpha_i = (r_i^T r'_i) / ((A p_i)^T p'_i);
        x_{i+1} = x_i + alpha_i p_i;
        r_{i+1} = r_i - alpha_i A p_i;
        r'_{i+1} = r'_i - alpha_i A^T p'_i;
        beta_i = (r_{i+1}^T r'_{i+1}) / (r_i^T r'_i);
        p_{i+1} = r_{i+1} + beta_i p_i;
        p'_{i+1} = r'_{i+1} + beta_i p'_i;
    }

Convergence of BiCG is slower and more erratic than that of GMRES. Also, there is the possibility of breakdown when a denominator becomes zero.
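As with CG, the loop above can be transcribed directly. The sketch below is illustrative only, not an implementation from the text; it forms $A^T$ explicitly for simplicity, chooses $r'_0 = r_0$, reuses the dot, norm, and matVec helpers from the earlier sketches, and does not guard against the breakdowns mentioned above.

// Illustrative sketch of the biconjugate gradient iteration above.
// Requires products with both A and A^T.
static double[] bicg(double[][] a, double[] b, double[] x0, int maxIter, double tol) {
    int n = b.length;
    double[][] at = new double[n][n];                           // explicit transpose, for simplicity
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) at[i][j] = a[j][i];
    double[] x = x0.clone();
    double[] ax = matVec(a, x);
    double[] r = new double[n];
    for (int i = 0; i < n; i++) r[i] = b[i] - ax[i];            // r_0 = b - A x_0
    double[] rs = r.clone();                                    // shadow residual r'_0 (here r'_0 = r_0)
    double[] p = r.clone(), ps = rs.clone();
    double rho = dot(r, rs);                                    // r_i^T r'_i
    for (int iter = 0; iter < maxIter && norm(r) > tol; iter++) {
        double[] ap = matVec(a, p);
        double[] atps = matVec(at, ps);
        double alpha = rho / dot(ap, ps);                       // breakdown if this denominator is zero
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
            rs[i] -= alpha * atps[i];
        }
        double rhoNew = dot(r, rs);                             // breakdown if this becomes zero
        double beta = rhoNew / rho;
        for (int i = 0; i < n; i++) {
            p[i] = r[i] + beta * p[i];
            ps[i] = rs[i] + beta * ps[i];
        }
        rho = rhoNew;
    }
    return x;
}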

Review questions

1. How close can one come to diagonalizing a general matrix by a similarity transformation with a finite number of arithmetic operations?

2. What does it mean for the columns of $V = [v_1\, v_2 \cdots v_n]$ and $W = [w_1\, w_2 \cdots w_n]$ to form a biorthogonal system?

3. Lanczos biorthogonalization applied to an $n$ by $n$ matrix $A$ incrementally constructs matrices $V_k$, $W_k$ of dimension $n$ by $k$ and $T_k$ of dimension $k$ by $k$. What relations are satisfied by these matrices, and what is special about $T_k$?

4. How many matrix–vector multiplications are required for each iteration of BiCG applied to a general system $Ax = b$, and with which matrices?

5. How does the convergence of BiCG compare to that of GMRES?

8.6 Biconjugate Gradient Stabilized Method

Saad, Section 7.4.2.

BiCGSTAB has significantly smoother convergence than BiCG.

Review question

1. How does BiCGSTAB compare to BiCG? Be precise.
