AMSC 600 / CMSC 760, Advanced Linear Numerical Analysis, Fall 2007
Arnoldi Methods
Dianne P. O'Leary, © 2006, 2007

Algorithms that use the Arnoldi Basis

Reference: Chapter 6 of Saad

- The Arnoldi Basis
- How to use the Arnoldi Basis
- FOM (usually called the Arnoldi iteration)
- GMRES
- Givens Rotations
- Relation between FOM and GMRES iterates
- Practicalities
- Convergence Results for GMRES

The Arnoldi Basis

Recall:

- The Arnoldi basis $v_1, \dots, v_n$ is obtained column by column from the relation $AV = VH$, where $H$ is upper Hessenberg and $V^T V = I$.
- In particular, $h_{ij} = (A v_j, v_i)$.
- The vectors $v_j$ can be formed by a (modified) Gram-Schmidt process.
- For three algorithms for computing the recurrence, see Sec 6.3.1 and 6.3.2.

Facts about the Arnoldi Basis

Let's convince ourselves of the following facts:

- The first $m$ columns of $V$ form an orthonormal basis for $\mathcal{K}_m(A, v_1)$.
- Let $V_m$ denote the first $m$ columns of $V$, and $H_m$ denote the leading $m \times m$ block of $H$. Then
  $$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T.$$
- The algorithm breaks down if $h_{m+1,m} = 0$. In this case, we cannot get a new vector $v_{m+1}$.
- This breakdown means that $\mathcal{K}_{m+1}(A, v_1) = \mathcal{K}_m(A, v_1)$. In other words, $\mathcal{K}_m(A, v_1)$ is an invariant subspace of $A$ of dimension $m$.

How to use the Arnoldi Basis

We want to solve $Ax = b$. We choose $x_0 = 0$ and $v_1 = b/\beta$, where $\beta = \|b\|$. Let's choose $x_m \in \mathcal{K}_m(A, v_1)$. Two choices:

- Projection: Make the residual orthogonal to $\mathcal{K}_m(A, v_1)$. (Section 6.4.1) This is the Full Orthogonalization Method (FOM).
- Minimization: Minimize the norm of the residual over all choices of $x$ in $\mathcal{K}_m(A, v_1)$. (Section 6.5) This is the Generalized Minimum Residual Method (GMRES).

FOM (usually called the Arnoldi iteration)

The Arnoldi relation:
$$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T.$$

If $x_m \in \mathcal{K}_m(A, v_1)$, then we can express it as $x_m = V_m y_m$ for some vector $y_m$. The residual is
$$r_m = b - A x_m = \beta v_1 - A V_m y_m = \beta v_1 - (V_m H_m + h_{m+1,m} v_{m+1} e_m^T) y_m.$$

To enforce orthogonality, we want
$$0 = V_m^T r_m = \beta V_m^T v_1 - (V_m^T V_m H_m + h_{m+1,m} V_m^T v_{m+1} e_m^T) y_m = \beta e_1 - H_m y_m.$$

Therefore, given the Arnoldi relation at step $m$, we solve the linear system $H_m y_m = \beta e_1$ and set $x_m = V_m y_m$. Then
$$r_m = b - A x_m = b - A V_m y_m = \beta V_m e_1 - (V_m H_m + h_{m+1,m} v_{m+1} e_m^T) y_m = V_m(\beta e_1 - H_m y_m) - h_{m+1,m} v_{m+1} e_m^T y_m = -h_{m+1,m} (y_m)_m v_{m+1}.$$

Therefore, the norm of the residual is just $|h_{m+1,m} (y_m)_m|$, which can be computed without forming $x_m$.

GMRES

The Arnoldi relation:
$$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T.$$

The iterate: $x_m = V_m y_m$ for some vector $y_m$. Define $\bar{H}_m$ to be the leading $(m+1) \times m$ block of $H$. The residual:
$$r_m = \beta v_1 - (V_m H_m + h_{m+1,m} v_{m+1} e_m^T) y_m = \beta v_1 - V_{m+1} \bar{H}_m y_m = V_{m+1}(\beta e_1 - \bar{H}_m y_m).$$

The norm of the residual:
$$\|r_m\| = \|V_{m+1}(\beta e_1 - \bar{H}_m y_m)\| = \|\beta e_1 - \bar{H}_m y_m\|.$$

We minimize this quantity with respect to $y_m$ to get the GMRES iterate.

Givens Rotations

The linear system involving $H_m$ is solved using $m-1$ Givens rotations. The least squares problem involving $\bar{H}_m$ is solved using one additional Givens rotation.

Relation between FOM and GMRES iterates

FOM: $H_m y_m^F = \beta e_1$, or equivalently
$$H_m^T H_m y_m^F = \beta H_m^T e_1.$$

GMRES: minimize $\|\beta e_1 - \bar{H}_m y_m\|$ by solving the normal equations
$$\bar{H}_m^T \bar{H}_m y_m^G = \beta \bar{H}_m^T e_1.$$

Note that
$$\bar{H}_m = \begin{bmatrix} H_m \\ h_{m+1,m} e_m^T \end{bmatrix},$$
so $H_m^T H_m$ and $\bar{H}_m^T \bar{H}_m$ differ by the rank-one matrix $h_{m+1,m}^2 e_m e_m^T$. Therefore (Sec 6.5.7 and Sec 6.5.8),

- If we compute the FOM iterate, we can easily get the GMRES iterate, and vice versa.
- It can be shown that the smallest residual computed by FOM in iterations $1, \dots, m$ is within a factor of $\sqrt{m}$ of the smallest for GMRES.
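To make the FOM and GMRES recipes above concrete, here is a minimal NumPy sketch (mine, not from the notes): the Arnoldi process with modified Gram-Schmidt, followed by the FOM solve with $H_m$ and the GMRES least squares solve with $\bar{H}_m$. The function name, test matrix, and sizes are illustrative assumptions, and dense `solve`/`lstsq` calls stand in for the Givens-rotation updates described above.

```python
import numpy as np

def arnoldi(A, b, m):
    """Arnoldi with modified Gram-Schmidt (a sketch; no breakdown handling).

    Returns V with m+1 orthonormal columns spanning K_{m+1}(A, b) and the
    (m+1) x m upper Hessenberg matrix Hbar with A @ V[:, :m] = V @ Hbar.
    """
    n = b.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]     # breaks down if H[j+1, j] == 0
    return V, H

# FOM: solve H_m y = beta*e1.  GMRES: minimize ||beta*e1 - Hbar y||.
rng = np.random.default_rng(0)
n, m = 50, 10
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # arbitrary test matrix
b = rng.standard_normal(n)
beta = np.linalg.norm(b)
V, Hbar = arnoldi(A, b, m)

e1 = np.zeros(m + 1)
e1[0] = beta
y_fom = np.linalg.solve(Hbar[:m, :], e1[:m])        # FOM iterate
y_gmres = np.linalg.lstsq(Hbar, e1, rcond=None)[0]  # GMRES iterate
x_fom, x_gmres = V[:, :m] @ y_fom, V[:, :m] @ y_gmres

# FOM residual norm without forming x_m: |h_{m+1,m} (y_m)_m|
print(abs(Hbar[m, m - 1] * y_fom[-1]), np.linalg.norm(b - A @ x_fom))
print(np.linalg.norm(b - A @ x_gmres))
```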
- There is a recursion for computing the GMRES iterates given the FOM iterates:
  $$x_m^G = s_m^2 x_{m-1}^G + c_m^2 x_m^F,$$
  where $s_m^2 + c_m^2 = 1$ and $c_m$ is the cosine used in the last step of the reduction of $\bar{H}_m$ to upper triangular form.

Practicalities

Practicality 1: GMRES and FOM can stagnate.

It is possible (but unlikely) that GMRES and FOM make no progress at all until the last iteration. Example: Let
$$A = \begin{bmatrix} e^T & 1 \\ I & 0 \end{bmatrix}, \quad x_{\text{true}} = e_n, \quad b = e_1,$$
where $e$ is the vector with every entry equal to 1. Then $\mathcal{K}_m(A, b) = \mathrm{span}\{e_1, \dots, e_m\}$, so the solution vector is orthogonal to $\mathcal{K}_m(A, b)$ for $m < n$. This is called complete stagnation. Partial stagnation can also occur. The usual remedy is preconditioning (discussed later).

Practicality 2: $m$ must be kept relatively small.

As $m$ gets big,

- The work per iteration becomes overwhelming, since the upper Hessenberg part of $H_m$ is generally dense. In fact, the work for the orthogonalization grows as $O(m^2 n)$.
- The orthogonality relations are increasingly contaminated by round-off error, and this can cause the computation to produce bad approximations to the true GMRES/FOM iterates.

Two remedies:

- restarting (Sec 6.4.1 and Sec 6.5.5)
- truncating the orthogonalization (Sec 6.4.2 and Sec 6.5.6)

Restarting

In the restarting algorithm, we run the Arnoldi iteration for a fixed number of steps, perhaps $m = 100$. If the resulting solution is not good enough, we repeat, setting $v_1$ to be the normalized residual. This limits the work per iteration, but removes the finite-termination property. In fact, due to stagnation, it is possible (but unlikely) that the iteration never makes any progress at all!

Truncating the orthogonalization

In this algorithm, we modify the Arnoldi iteration so that we orthogonalize only against the latest $k$ vectors, rather than against all of the old vectors. The matrix $\bar{H}_m^{\text{approx}}$ then has only $k - 1$ nonzeros above the main diagonal in each column, and the work for the orthogonalization is reduced from $O(m^2 n)$ to $O(kmn)$. We are still working with the same Krylov subspace, but

- The basis vectors $v_1, \dots, v_m$ are not exactly orthogonal.
- We are approximating the matrix $\bar{H}_m$, so we don't minimize the residual (exactly) or make the residual (exactly) orthogonal to the Krylov subspace.

All is not lost, though. In Section 6.5.6, Saad works out the relation between the GMRES residual norm and the one produced by the truncated method (quasi-GMRES). Note:

- Quasi-GMRES is not often used.
- Restarted GMRES is the standard algorithm.

Practicality 3: Breakdown of the Arnoldi iteration is a good thing.

If the Arnoldi iteration breaks down, with $h_{m+1,m} = 0$, so that there is no new vector $v_{m+1}$, then

- $\bar{H}_m$ is just $H_m$ with a zero row appended, so both GMRES and FOM compute the same iterate.
- More importantly, Arnoldi breaks down if and only if $x_m^G = x_m^F = x_{\text{true}}$. This follows from the fact that the norm of the FOM residual is $|h_{m+1,m} (y_m)_m|$.

Practicality 4: These algorithms also work in complex arithmetic.

If $A$ or $b$ is complex rather than real, the algorithms still work, as long as we remember to

- use conjugate transpose instead of transpose,
- use complex Givens rotations (Section 6.5.9), as sketched in code below:
  $$\begin{bmatrix} \bar{c} & \bar{s} \\ -s & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \sqrt{|x|^2 + y^2} \\ 0 \end{bmatrix}, \quad \text{with } s = \frac{y}{\sqrt{|x|^2 + y^2}}, \quad c = \frac{x}{\sqrt{|x|^2 + y^2}}.$$
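The complex rotation above is easy to sanity-check numerically. Here is a minimal sketch (the function name is an illustrative assumption) that builds $(c, s)$ from a complex $x$ and a real nonnegative $y$, as in the Arnoldi process where $y = h_{m+1,m}$ is a norm, and verifies that the rotation zeroes the second entry.

```python
import numpy as np

def complex_givens(x, y):
    """Rotation eliminating the real, nonnegative subdiagonal entry y.

    Returns (c, s) with |c|^2 + |s|^2 = 1 such that
    [[conj(c), conj(s)], [-s, c]] @ [x, y] = [sqrt(|x|^2 + y^2), 0].
    """
    d = np.sqrt(abs(x) ** 2 + y ** 2)
    return x / d, y / d

# Example: a complex x above a real y, as produced by Arnoldi.
x, y = 1.0 + 2.0j, 3.0
c, s = complex_givens(x, y)
G = np.array([[np.conj(c), np.conj(s)], [-s, c]])
print(G @ np.array([x, y]))   # ~ [sqrt(|x|^2 + y^2), 0]
```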
Convergence Results for GMRES (Sec 6.11.4)

Two definitions:

- We say that a matrix $A$ is positive definite if $x^T A x > 0$ for all $x \neq 0$.
  - If $A$ is symmetric, this means that all of its eigenvalues are positive.
  - For a nonsymmetric matrix, this means that the symmetric part of $A$, which is $\frac{1}{2}(A + A^T)$, is positive definite.
- The algorithm that restarts GMRES every $m$ iterations is called GMRES(m).

GMRES(m) converges for positive definite matrices.

Proof: The iteration can't stagnate, because it makes progress at the first iteration. In particular, if we seek $x = \alpha v_1$ to minimize
$$\|r\|^2 = \|b - \alpha A v_1\|^2 = b^T b - 2 \alpha b^T A v_1 + \alpha^2 v_1^T A^T A v_1,$$
where $b = \beta v_1$ and $\beta > 0$, then
$$\alpha = \frac{b^T A v_1}{v_1^T A^T A v_1} > 0$$
(positive since $b^T A v_1 = \beta\, v_1^T A v_1 > 0$ by positive definiteness), so the residual norm is reduced. By working a little harder, we can show that it is reduced enough to guarantee convergence. []

- GMRES(m) is an optimal polynomial method. The iterates take the form
  $$x_m = x_0 + q_{m-1}(A) r_0,$$
  where $q_{m-1}$ is a polynomial of degree at most $m - 1$.
- Saying that $x$ is in $\mathcal{K}_m(A, r_0)$ is equivalent to saying $x = q_{m-1}(A) r_0$ for some polynomial of degree at most $m - 1$.
- The resulting residual is
  $$r_m = r_0 - q_{m-1}(A) A r_0 = (I - q_{m-1}(A) A) r_0 = p_m(A) r_0,$$
  where $p_m$ is a polynomial of degree $m$ satisfying $p_m(0) = 1$.
- Since GMRES(m) minimizes the norm of the residual, it must pick the optimal polynomial $p_m$.
- Let $A = X \Lambda X^{-1}$, where $\Lambda$ is diagonal and contains the eigenvalues of $A$. (In other words, we are assuming that $A$ has a complete set of linearly independent eigenvectors, the columns of $X$.) Then
  $$\|r_m\| = \min_{p_m(0)=1} \|p_m(A) r_0\| = \min_{p_m(0)=1} \|X p_m(\Lambda) X^{-1} r_0\| \le \|X\|\,\|X^{-1}\| \min_{p_m(0)=1} \|p_m(\Lambda)\|\,\|r_0\| \le \|X\|\,\|X^{-1}\| \min_{p_m(0)=1} \max_{j=1,\dots,n} |p_m(\lambda_j)|\,\|r_0\|.$$

Therefore, any polynomial that is small on the eigenvalues can give us a bound on the GMRES(m) residual norm, since GMRES(m) picks the optimal polynomial.
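As an illustration of this bound, here is a small numerical check, reusing the `arnoldi()` sketch from earlier (so the setup is an assumption, not part of the notes): we build a diagonalizable $A$ with eigenvalues in $[1, 2]$, run $m$ GMRES steps, and compare the residual norm against the bound obtained from the particular admissible polynomial $p_m(t) = (1 - t/c)^m$ with $c = 1.5$. Any polynomial with $p_m(0) = 1$ upper-bounds the minimum, so the printed residual should not exceed the printed bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 40, 5
# Diagonalizable A = X @ Lam @ inv(X) with eigenvalues in [1, 2].
X = np.eye(n) + 0.2 * rng.standard_normal((n, n))
lam = rng.uniform(1.0, 2.0, n)
A = X @ np.diag(lam) @ np.linalg.inv(X)
b = rng.standard_normal(n)

# GMRES residual norm after m steps (x_0 = 0, so r_0 = b).
V, Hbar = arnoldi(A, b, m)
e1 = np.zeros(m + 1)
e1[0] = np.linalg.norm(b)
y = np.linalg.lstsq(Hbar, e1, rcond=None)[0]
r_m = np.linalg.norm(b - A @ (V[:, :m] @ y))

# Bound from one admissible polynomial p_m(t) = (1 - t/c)^m, p_m(0) = 1.
c = 1.5
poly_max = np.max(np.abs((1 - lam / c) ** m))
kappa_X = np.linalg.cond(X)               # ||X|| * ||X^{-1}||
print(r_m, kappa_X * poly_max * np.linalg.norm(b))   # r_m <= bound
```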