Chapter 6

Linear Iterative Methods

6.1 Motivation

In Chapter 3 we learned that, in general, solving the linear system of equations $Ax = b$ with $A \in \mathbb{C}^{n\times n}$ and $b \in \mathbb{C}^n$ requires $\mathcal{O}(n^3)$ operations. This is too expensive in practice. The high cost begs the following questions: Are there lower cost options? Is an approximation of $x$ good enough? How would such an approximation be generated? Oftentimes we can find schemes that compute an approximate solution to $x$ at much lower cost.

As an alternative to the direct methods that we studied in the previous chapters, in the present chapter we will describe so-called iterative methods for constructing sequences $\{x_k\}_{k=1}^{\infty} \subset \mathbb{C}^n$, with the desire that $x_k \to x := A^{-1}b$ as $k \to \infty$. The idea is that, given some $\varepsilon > 0$, we look for a $k \in \mathbb{N}$ such that
$$\|x - x_k\| \le \varepsilon$$
with respect to some norm. In this context, $\varepsilon$ is called the stopping tolerance. In other words, we want to make certain the error is small in norm.

But a word of caution. Usually, we do not have a direct way of approximating the error. The residual is more readily available. Suppose that $x_k$ is an approximation of $x = A^{-1}b$. The error is $e_k = x - x_k$ and the residual is $r_k = b - Ax_k = Ae_k$. Recall that
$$\frac{\|e_k\|}{\|x\|} \le \kappa(A)\frac{\|r_k\|}{\|b\|}.$$
Thus, when $\kappa(A)$ is large, $\frac{\|r_k\|}{\|b\|}$, which is easily computable, may not be a good indicator of the size of the relative error $\frac{\|e_k\|}{\|x\|}$, which is not directly computable. One must be careful measuring the error.
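To see why a small residual can be deceptive, consider the following short numerical illustration in Python. The matrix (a Hilbert matrix) and all names are our own choices for illustration; they do not come from the text.

```python
import numpy as np

# Illustration (our own example, not from the text): for an ill-conditioned
# matrix, a tiny relative residual need not mean a tiny relative error.
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_exact = np.ones(n)
b = A @ x_exact

# Perturb the exact solution along the "worst" direction: the right singular
# vector belonging to the smallest singular value of A.
v_min = np.linalg.svd(A)[2][-1]
x_approx = x_exact + 1e-3 * v_min

r = b - A @ x_approx                                   # residual r_k = b - A x_k
rel_residual = np.linalg.norm(r) / np.linalg.norm(b)
rel_error = np.linalg.norm(x_exact - x_approx) / np.linalg.norm(x_exact)
kappa = np.linalg.cond(A)

# rel_residual is many orders of magnitude smaller than rel_error: the
# residual alone badly underestimates the error when kappa is large.
```

Here the relative residual is tiny while the relative error is not, precisely because $\kappa(A)$ is enormous.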


6.2 Linear Iterative Schemes

Definition 6.2.1 (iterative scheme). Let $A \in \mathbb{C}^{n\times n}$ with $\det(A) \neq 0$ and $b \in \mathbb{C}^n$. An iterative scheme to find an approximate solution to $Ax = b$ is a process to generate a sequence of approximations $\{x_k\}_{k=1}^{\infty}$ via an iteration of the form

$$x_k = \varphi(A; b; x_{k-1}; \dots; x_{k-r}),$$

given the starting values $x_0, \dots, x_{r-1} \in \mathbb{C}^n$, where
$$\varphi(\cdot\,; \cdot\,; \cdot\,; \dots; \cdot) : \mathbb{C}^{n\times n} \times \mathbb{C}^n \times \mathbb{C}^n \times \dots \times \mathbb{C}^n \to \mathbb{C}^n$$
is called the iteration function. If $r = 1$, we say that the process is a two-layer scheme; otherwise we say it is a multilayer scheme.

Definition 6.2.2. Let $A \in \mathbb{C}^{n\times n}$ with $\det(A) \neq 0$ and $b \in \mathbb{C}^n$. Set $x = A^{-1}b$. The two-layer iterative scheme

$$x_k = \varphi(A; b; x_{k-1})$$

is said to be consistent iff $x = \varphi(A, b, x)$, i.e., $x = A^{-1}b$ is a fixed point of $\varphi(A, b, \cdot)$. The scheme is linear iff

$$\varphi(A; \alpha b_1 + \beta b_2; \alpha x_1 + \beta x_2) = \alpha\varphi(A; b_1; x_1) + \beta\varphi(A; b_2; x_2),$$

for all $\alpha, \beta \in \mathbb{C}$ and $x_1, x_2 \in \mathbb{C}^n$.

Proposition 6.2.3. Let $A \in \mathbb{C}^{n\times n}$ with $\det(A) \neq 0$ and $b \in \mathbb{C}^n$. Any two-layer, linear, and consistent scheme can be written in the form

$$x_{k+1} = x_k + Cr(x_k) = Cb + (I_n - CA)x_k, \qquad (6.2.1)$$

for some matrix $C \in \mathbb{C}^{n\times n}$, where $r(z) := b - Az$ is the residual vector.

Proof. A two-layer scheme is defined by an iteration function
$$\varphi(\cdot\,; \cdot\,; \cdot) : \mathbb{C}^{n\times n} \times \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}^n.$$
Given $\varphi$, define the operator

$$Cz := \varphi(A, z, 0).$$

This is a linear operator, due to the assumed linearity of the iteration function. Consequently, $C$ can be identified with a square matrix. It follows from this definition, using the consistency and linearity of $\varphi$, that
$$(I_n - CA)w = w - \varphi(A, Aw, 0) = \varphi(A, Aw, w) - \varphi(A, Aw, 0) = \varphi(A, 0, w).$$
Furthermore, by linearity, we can write

$$x_{k+1} = \varphi(A, b + 0, 0 + x_k) = \varphi(A, b, 0) + \varphi(A, 0, x_k) = Cb + (I_n - CA)x_k = x_k + C(b - Ax_k),$$

as we intended to show. □

If $C$ is invertible, we can, if we like, write
$$C^{-1}(x_{k+1} - x_k) + Ax_k = b.$$
Motivated by this form, we have the following generalization.

Definition 6.2.4. Let $A \in \mathbb{C}^{n\times n}$ with $\det(A) \neq 0$ and $b \in \mathbb{C}^n$. A scheme of the form
$$B_{k+1}(x_{k+1} - x_k) + Ax_k = b,$$
where $B_{k+1} \in \mathbb{C}^{n\times n}$ is invertible, is called an adaptive two-layer scheme. If $B_{k+1} = B$, where $B$ is invertible and independent of $k$, then the scheme is called a stationary two-layer scheme. If $B_{k+1} = \frac{1}{\alpha_{k+1}}I_n$, where $\alpha_{k+1} \in \mathbb{C}_\star$, then we say that the adaptive two-layer scheme is explicit.

Remark 6.2.5. Consider a stationary two-layer method and assume that $B$ is invertible. Then
$$x_{k+1} = x_k + B^{-1}(b - Ax_k), \qquad (6.2.2)$$
and from this it follows that, if $\{x_k\}$ converges, then it must converge to $x = A^{-1}b$. Of course, this form is equivalent to (6.2.1) with $C = B^{-1}$. The matrix $B$ in the stationary two-layer method is called the iterator.

Remark 6.2.6. Let us now consider an extreme case, namely $B = A$. In this case we obtain
$$x_{k+1} = x_k + B^{-1}(b - Ax_k) = x_k + A^{-1}(b - Ax_k) = A^{-1}b,$$
that is, we get the exact solution after one step.
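As a sketch of how a stationary two-layer scheme might be organized in code, here is a minimal Python implementation of the update $x_{k+1} = x_k + B^{-1}(b - Ax_k)$, together with the extreme case $B = A$ of Remark 6.2.6. The function name, tolerance, and test matrix are our own illustrative choices.

```python
import numpy as np

# Minimal sketch (our own names) of the stationary two-layer scheme (6.2.2):
#     x_{k+1} = x_k + B^{-1}(b - A x_k).
def stationary_iteration(A, B, b, x0, tol=1e-12, max_iter=10_000):
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                        # residual r(x_k) = b - A x_k
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + np.linalg.solve(B, r)        # apply B^{-1} to the residual
    return x

# Extreme case B = A of Remark 6.2.6: the exact solution in one step.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = stationary_iteration(A, A.copy(), b, np.zeros(2))
```

In practice one never forms $B^{-1}$; a linear solve with $B$ (which, by design, should be cheap) is applied at each step.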

The choice of an iterator comes with two conflicting requirements:

1. $B$ should be easy/cheap to invert.

2. $B$ should approximate the matrix $A$ well.

In essence, the art of iterative schemes is concerned with finding good iterators.

Definition 6.2.7. Let $A \in \mathbb{C}^{n\times n}$ be invertible and $b \in \mathbb{C}^n$. Suppose that $x = A^{-1}b$ and consider the stationary two-layer scheme (6.2.2) defined by the invertible matrix $B \in \mathbb{C}^{n\times n}$. The matrix $T := I_n - B^{-1}A$ is called the error transfer matrix and satisfies

$$e_{k+1} = Te_k,$$

where $e_k := x - x_k$ is the error at step $k$.

We need to find conditions on $T$ to guarantee that $\{e_k\}$ converges to zero.

6.3 Contraction Mapping Principle (Optional)

Let us first give a very general convergence theory based on the contraction mapping theorem. This section requires some familiarity with Cauchy sequences and Banach spaces.

Definition 6.3.1 (contraction). Let $X$ be a normed space. The map $T : X \to X$ is called a contraction iff there is a real number $q \in (0, 1)$ such that
$$\|Tx - Ty\|_X \le q\|x - y\|_X \quad \forall x, y \in X.$$

Definition 6.3.2 (fixed point). Let $T : X \to X$. The element $x \in X$ is called a fixed point of $T$ iff $Tx = x$.

Remark 6.3.3. Notice that if $\tilde{x}$ is a fixed point of a stationary two-layer scheme, then
$$\tilde{x} = \tilde{x} + B^{-1}(b - A\tilde{x}) \iff A\tilde{x} = b,$$
i.e., $\tilde{x}$ is the solution to the model problem.

Theorem 6.3.4 (contraction mapping principle). Let $X$ be a complete normed linear space (a Banach space) and $T : X \to X$ a contraction. Then it has a unique fixed point $\bar{x}$, which can be found as the limit of the iteration

$$x_{k+1} = Tx_k.$$

Moreover, we have the following error estimates:
$$\frac{1}{1+q}\|x_{k+1} - x_k\|_X \le \|x_k - \bar{x}\|_X \le \frac{q}{1-q}\|x_k - x_{k-1}\|_X.$$

Proof. We use the iterative formula given in the statement and note that

$$\|x_{k+1} - x_k\|_X = \|Tx_k - Tx_{k-1}\|_X \le q\|x_k - x_{k-1}\|_X \le \cdots \le q^k\|x_1 - x_0\|_X.$$

This implies that, for $m \ge 2$,
$$\|x_{k+m} - x_k\|_X \le q^k\left(q^{m-1} + \cdots + q^2 + q + 1\right)\|x_1 - x_0\|_X.$$
In addition, since $q \in (0, 1)$ we have that
$$q^{m-1} + \cdots + q + 1 = \frac{1 - q^m}{1 - q} \le \frac{1}{1-q},$$
therefore
$$\|x_{k+m} - x_k\|_X \le \frac{q^k}{1-q}\|x_1 - x_0\|_X \to 0$$
as $k \to \infty$. This shows that the sequence of iterates is Cauchy, and so it must converge, since the space $X$ is complete. Since $T$ is continuous, the limit must be a fixed point.

Let us now show uniqueness. Assuming that $x$ and $y$ are two distinct fixed points, we get
$$\|x - y\|_X = \|Tx - Ty\|_X \le q\|x - y\|_X < \|x - y\|_X,$$
which is a contradiction.

Finally, the error estimates are obtained by recalling that, starting the telescoping sum at $x_k$,
$$\|x_{k+m} - x_k\|_X \le \left(q + q^2 + \cdots + q^m\right)\|x_k - x_{k-1}\|_X \le \frac{q}{1-q}\|x_k - x_{k-1}\|_X.$$
Passing to the limit $m \to \infty$ this yields
$$\|\bar{x} - x_k\|_X \le \frac{q}{1-q}\|x_k - x_{k-1}\|_X.$$
Notice also that
$$\|x_k - x_{k-1}\|_X \le \|x_k - \bar{x}\|_X + \|\bar{x} - x_{k-1}\|_X \le (1 + q)\|\bar{x} - x_{k-1}\|_X,$$
or
$$\frac{1}{1+q}\|x_k - x_{k-1}\|_X \le \|\bar{x} - x_{k-1}\|_X. \qquad \Box$$
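The fixed-point iteration and the a posteriori error bounds of Theorem 6.3.4 can be checked on a simple scalar contraction. The map $T(x) = x/2 + 1$ (our own example, with $q = 1/2$ and fixed point $\bar{x} = 2$) serves as a minimal sketch:

```python
# Scalar illustration (our own example) of the contraction mapping principle:
# T(x) = x/2 + 1 is a contraction on R with q = 1/2 and fixed point 2.
q = 0.5
T = lambda x: q * x + 1.0
xbar = 2.0                                   # T(2) = 2

x_prev, x = 10.0, T(10.0)                    # x_0 = 10, x_1 = T(x_0)
bounds_hold = True
for _ in range(50):
    x_next = T(x)
    err = abs(x - xbar)                      # true error |x_k - xbar|
    lower = abs(x_next - x) / (1 + q)        # a posteriori lower bound
    upper = q / (1 - q) * abs(x - x_prev)    # a posteriori upper bound
    bounds_hold = bounds_hold and (lower <= err + 1e-15) and (err <= upper + 1e-15)
    x_prev, x = x, x_next
```

The two computable quantities $\|x_{k+1} - x_k\|$ and $\|x_k - x_{k-1}\|$ bracket the unknown error at every step, which is exactly what makes the estimates useful as stopping criteria.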

An immediate corollary of this is the following.

Corollary 6.3.5 (convergence). If $T$ is the error transfer matrix of a linear, stationary, two-layer iterative scheme and $T$ is a contraction with respect to some matrix norm, then the iterative scheme is convergent. In particular, if $\|T\| < 1$ with respect to some induced matrix norm $\|\cdot\| : \mathbb{C}^{n\times n} \to \mathbb{R}$, then $T$ is a contraction.

Proof. The first statement is simple. For the second, notice that, if $\|\cdot\| : \mathbb{C}^{n\times n} \to \mathbb{R}$ is an induced matrix norm with the property that $\|T\| < 1$, then, by consistency of induced norms,

$$\|Tx - Ty\| = \|T(x - y)\| \le \|T\|\,\|x - y\| = q\|x - y\|,$$

where $q := \|T\| < 1$. □

What is the moral of the story? We need to guarantee that $\|T\| < 1$! But in what norm?

6.4 Spectral Convergence Theory

In the current section, we come to the same conclusion as in the previous one, but we go a bit deeper. The theory in this section is based on the spectrum $\sigma(T)$ of the error transfer matrix. We start with a very technical result.

Theorem 6.4.1. For every matrix $A \in \mathbb{C}^{n\times n}$ and any $\varepsilon > 0$, there is a vector norm $\|\cdot\|_{A,\varepsilon} : \mathbb{C}^n \to \mathbb{R}$ such that
$$\|A\|_{A,\varepsilon} \le \rho(A) + \varepsilon,$$
where, recall, for any $M \in \mathbb{C}^{n\times n}$,

$$\|M\|_{A,\varepsilon} = \sup_{x \in \mathbb{C}^n_\star}\frac{\|Mx\|_{A,\varepsilon}}{\|x\|_{A,\varepsilon}} = \sup_{\|x\|_{A,\varepsilon}=1}\|Mx\|_{A,\varepsilon}$$

is the induced matrix norm with respect to $\|\cdot\|_{A,\varepsilon}$.

Proof. Appealing to the Schur factorization, Lemma 1.4.11, there is a unitary matrix $P \in \mathbb{C}^{n\times n}$ and an upper triangular matrix $B \in \mathbb{C}^{n\times n}$ such that
$$A = P^H BP.$$

The diagonal elements of $B$ are the eigenvalues of $A$. Let us write

$$B = \Lambda + U,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, with
$$\sigma(A) = \{\lambda_1, \dots, \lambda_n\} = \sigma(B),$$
and $U = [u_{i,j}] \in \mathbb{C}^{n\times n}$ is strictly upper triangular. Let $\delta > 0$ be arbitrary, and define
$$D = \mathrm{diag}\left(1, \frac{1}{\delta}, \dots, \frac{1}{\delta^{n-1}}\right).$$
Next, define
$$C := DBD^{-1} = \Lambda + E,$$
where
$$E = DUD^{-1}.$$
Now, observe that $E$, like $U$, must be strictly upper triangular, and the elements of $E = [e_{i,j}]$ must satisfy

$$e_{i,j} = \begin{cases} 0, & j \le i, \\ \delta^{j-i}u_{i,j}, & j > i. \end{cases}$$

Consequently, the elements of $E$ can be made arbitrarily small in modulus, depending on our choice of $\delta$. Now, notice that
$$A = P^{-1}D^{-1}CDP,$$
and, since $DP$ is non-singular, the following defines a vector norm and an induced matrix norm: for any $x \in \mathbb{C}^n$,
$$\|x\|_\star := \|DPx\|_2,$$
and, for any $M \in \mathbb{C}^{n\times n}$,

$$\|M\|_\star = \sup_{\|x\|_\star=1}\|Mx\|_\star.$$

Observe that
$$\|Ay\|_\star = \|DPAy\|_2 = \|CDPy\|_2.$$
Define $z := DPy$. Then

$$\|Ay\|_\star = \|Cz\|_2 = \sqrt{z^H C^H Cz}.$$

But
$$C^H C = \left(\Lambda^H + E^H\right)(\Lambda + E) = \Lambda^H\Lambda + M(\delta),$$
where
$$M(\delta) := E^H\Lambda + \Lambda^H E + E^H E.$$
As an exercise, the reader should prove that, for a given matrix $A$, there is a constant $K_1 > 0$ such that

$$\|M(\delta)\|_2 \le K_1\delta$$

for all $0 < \delta \le 1$. Thus, using the definition of the spectral radius, the Cauchy–Schwarz inequality, and induced norm consistency,

$$z^H C^H Cz = z^H\Lambda^H\Lambda z + z^H M(\delta)z \le \max_{1\le\ell\le n}|\lambda_\ell|^2\, z^H z + \|z\|_2\|M(\delta)z\|_2 \le \left(\rho(A)^2 + \|M(\delta)\|_2\right)\|z\|_2^2 \le \left(\rho(A)^2 + K_1\delta\right)\|z\|_2^2.$$

To finish up, note that

$$\|y\|_\star = \left\|(DP)^{-1}z\right\|_\star = \|z\|_2.$$

Hence, $\|y\|_\star = 1$ iff $\|z\|_2 = 1$, and
$$\left\{\|Ay\|_\star \mid \|y\|_\star = 1\right\} = \left\{\|Cz\|_2 \mid \|z\|_2 = 1\right\}.$$
Consequently,

$$\|A\|_\star = \sup_{\|y\|_\star=1}\|Ay\|_\star = \sup_{\|z\|_2=1}\|Cz\|_2 \le \sup_{\|z\|_2=1}\sqrt{\rho(A)^2 + K_1\delta}\,\|z\|_2 = \sqrt{\rho(A)^2 + K_1\delta} \le \rho(A) + K_2\sqrt{\delta},$$

for some $K_2 > 0$, for all $0 < \delta \le 1$. The result follows on choosing
$$\delta \le \min\left(1, (\varepsilon/K_2)^2\right). \qquad \Box$$

Theorem 6.4.2. Suppose that $A \in \mathbb{C}^{n\times n}$ is diagonalizable. Then there exists a vector norm $\|\cdot\|_\star : \mathbb{C}^n \to \mathbb{R}$ such that
$$\|A\|_\star = \rho(A).$$

Proof. Exercise. □

Theorem 6.4.3. Given any two vector norms $\|\cdot\|_a, \|\cdot\|_b : \mathbb{C}^n \to \mathbb{R}$, there exist constants $0 < c_1 \le c_2$ such that
$$c_1\|x\|_b \le \|x\|_a \le c_2\|x\|_b, \quad \forall x \in \mathbb{C}^n.$$

In fact, it is not hard to prove a more general result. But with the more general argument, one may not be able to readily quantify the constants of equivalence.

Theorem 6.4.4. Any two matrix norms $\|\cdot\|_\alpha, \|\cdot\|_\beta : \mathbb{C}^{m\times n} \to \mathbb{R}$, whether induced or not, are equivalent.

Proof. This is an immediate consequence of Theorem 4.1.3. □

Exercise 6.4.5. For any $A = [a_{i,j}] \in \mathbb{C}^{n\times n}$, show that

$$\|A\|_{\max} \le \|A\|_\infty \le n\|A\|_{\max}, \qquad (6.4.1)$$

where $\|\cdot\|_\infty$ is the induced matrix $\infty$-norm and, recall,

$$\|A\|_{\max} = \max_{1\le i,j\le n}|a_{i,j}|$$

is the matrix max-norm.

Definition 6.4.6. Suppose that $\{B_k\}_{k=1}^{\infty} \subset \mathbb{C}^{m\times n}$ is a sequence of matrices. We say that $\{B_k\}_{k=1}^{\infty}$ converges iff there is a matrix $B \in \mathbb{C}^{m\times n}$ such that, for every $\varepsilon > 0$, there exists an integer $K > 0$ such that, if $k$ is any integer satisfying $k \ge K$, then it follows that
$$\|B_k - B\|_{\max} < \varepsilon.$$

We write $B_k \to B$. We say that the square matrix $A \in \mathbb{C}^{n\times n}$ is convergent to zero iff $A^k \to O \in \mathbb{C}^{n\times n}$.

Remark 6.4.7. The norm in the last definition does not matter. Because of the equivalence of norms, we can use any matrix norm we like, induced or not, in the definition of convergence.

Theorem 6.4.8. Let $A \in \mathbb{C}^{n\times n}$. The following are equivalent.

1. $A$ is convergent to zero.

2. For some induced matrix norm,
$$\lim_{k\to\infty}\left\|A^k\right\| = 0.$$

3. For all induced matrix norms,
$$\lim_{k\to\infty}\left\|A^k\right\| = 0.$$

4. $\rho(A) < 1$.

5. For all $x \in \mathbb{C}^n$,
$$\lim_{k\to\infty}A^k x = 0.$$

Proof. ((1) $\implies$ (2)): Since $A$ is convergent to zero, using (6.4.1),
$$\frac{1}{n}\left\|A^k\right\|_\infty \le \left\|A^k\right\|_{\max} \to 0.$$
Thus $\|A^k\|_\infty \to 0$.

((2) $\implies$ (1)): Suppose that $\lim_{k\to\infty}\|A^k\| = 0$ for some induced matrix norm. Since all matrix norms are equivalent,
$$\lim_{k\to\infty}\left\|A^k\right\|_\infty = 0.$$

But, using (6.4.1) again,

$$\left\|A^k\right\|_{\max} \le \left\|A^k\right\|_\infty \to 0.$$

Therefore, $A$ is convergent to zero.

((2) $\implies$ (4)): We recall two facts. First, $\sigma(A^k) = \sigma^k(A)$, which follows from the Schur factorization: if $A = UTU^H$, where $T$ is upper triangular and $U$ is unitary, then $A^k = UT^k U^H$. Second, $\rho(A) \le \|A\|$ for any induced matrix norm. Therefore,
$$\rho^k(A) = \rho(A^k) \le \left\|A^k\right\|.$$
Thus, if $\|A^k\| \to 0$, it follows that
$$\rho^k(A) \to 0.$$
This implies $\rho(A) < 1$.

((4) $\implies$ (2)): By Theorem 6.4.1, there is an induced matrix norm $\|\cdot\|_\star$ such that
$$\|A\|_\star \le \rho(A) + \varepsilon,$$
for any $\varepsilon > 0$. Recall that the choice of $\|\cdot\|_\star$ depends upon $A$ and $\varepsilon > 0$. Since, by assumption, $\rho(A) < 1$, there is an $\varepsilon > 0$ and, therefore, an induced norm $\|\cdot\|_\star$ such that
$$\|A\|_\star \le \rho(A) + \varepsilon =: \theta < 1.$$
Then, using sub-multiplicativity,

$$\left\|A^k\right\|_\star \le \|A\|_\star^k \le \theta^k \to 0.$$

Consequently,
$$\lim_{k\to\infty}\left\|A^k\right\|_\star = 0.$$

((2) $\iff$ (3)): This follows from the equivalence of matrix norms, Theorem 6.4.4. If convergence is observed in one induced norm, it is observed in all induced norms.

((3) $\implies$ (5)): Suppose that $\lim_{k\to\infty}\|A^k\| = 0$ for all induced matrix norms. Let $x \in \mathbb{C}^n$ be arbitrary. Then
$$\left\|A^k x\right\|_\infty \le \left\|A^k\right\|_\infty\|x\|_\infty \to 0,$$
since $\|A^k\|_\infty \to 0$. Hence $\|A^k x\|_\infty \to 0$. This implies
$$\lim_{k\to\infty}A^k x = 0.$$

((5) $\implies$ (1)): Suppose that, for any $x \in \mathbb{C}^n$,
$$\lim_{k\to\infty}A^k x = 0.$$
Then it follows that, for all $x, y \in \mathbb{C}^n$,
$$y^H A^k x \to 0.$$

Now suppose $y = e_i$ and $x = e_j$. Then, since

$$y^H A^k x = e_i^H A^k e_j = \left[A^k\right]_{i,j},$$

it follows that
$$\lim_{k\to\infty}\left[A^k\right]_{i,j} = 0.$$
This implies that
$$\lim_{k\to\infty}\left\|A^k\right\|_{\max} = 0.$$
Hence, $A$ is convergent to zero. □
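Theorem 6.4.8 can be probed numerically: a matrix may have an induced norm larger than one and still be convergent to zero, as long as its spectral radius is below one. The following Python check uses an upper triangular example of our own choosing:

```python
import numpy as np

# Numerical check (our own example) of Theorem 6.4.8: the powers of M vanish
# because rho(M) < 1, even though the induced infinity-norm of M exceeds one.
M = np.array([[0.5, 10.0],
              [0.0, 0.5]])
rho = max(abs(np.linalg.eigvals(M)))          # spectral radius = 0.5
norm_inf = np.linalg.norm(M, np.inf)          # induced inf-norm = 10.5

Mk = np.linalg.matrix_power(M, 60)
power_norm = np.linalg.norm(Mk, np.inf)       # should be tiny
```

This also illustrates why Theorem 6.4.1 matters: no fixed, familiar norm certifies convergence here, but some cleverly chosen induced norm does.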

Corollary 6.4.9. The matrix $M \in \mathbb{C}^{n\times n}$ is convergent to zero if, for some induced matrix norm $\|\cdot\| : \mathbb{C}^{n\times n} \to \mathbb{R}$,
$$\|M\| < 1.$$

Proof. Recall that, for each and every induced matrix norm $\|\cdot\| : \mathbb{C}^{n\times n} \to \mathbb{R}$,
$$\rho(M) \le \|M\|,$$
where $\rho(M)$ is the spectral radius of $M$. The result then follows from Theorem 6.4.8. □

Theorem 6.4.10. Suppose that $A, B \in \mathbb{C}^{n\times n}$ are invertible, $b, x_0 \in \mathbb{C}^n$ are given, and $x = A^{-1}b$. (a) The sequence $\{x_k\}_{k=1}^{\infty}$ defined by the linear, two-layer, stationary iterative scheme (6.2.2) converges to $x$ for any starting point $x_0$ iff $\rho(T) < 1$, where $T$ is the error transfer matrix $T = I_n - B^{-1}A$. (b) A sufficient condition for the convergence of $\{x_k\}_{k=1}^{\infty}$ for any starting point $x_0$ is the condition that $\|T\| < 1$ for some induced matrix norm.

Proof. (a) Before we begin the proof, observe that

$$e_k = Te_{k-1} = T^2 e_{k-2} = \cdots = T^k e_0.$$

Also, observe that $x_k \to x = A^{-1}b$, as $k \to \infty$, iff $e_k \to 0$, as $k \to \infty$.

($\implies$): Suppose that $x_k \to x = A^{-1}b$, as $k \to \infty$, for any $x_0$. Then $e_k \to 0$, as $k \to \infty$, for any $e_0$. Set $e_0 = w$, where $(\lambda, w)$ is any eigen-pair of $T$, with $\|w\|_\infty = 1$. Then
$$e_k = \lambda^k e_0,$$
and
$$\|e_k\|_\infty = |\lambda|^k\|w\|_\infty = |\lambda|^k \to 0.$$
It follows that $|\lambda| < 1$. Since $\lambda$ was arbitrary, $\rho(T) < 1$.

($\impliedby$): If $\rho(T) < 1$, appealing to Theorem 6.4.8,
$$\lim_{k\to\infty}e_k = \lim_{k\to\infty}T^k e_0 = 0,$$
for any $e_0$. Hence, $x_k \to x = A^{-1}b$, as $k \to \infty$, for any $x_0$.

(b) Suppose that $\|T\| < 1$ for some induced matrix norm. Since, for any induced matrix norm,
$$\rho(T) \le \|T\|,$$
it follows that $\rho(T) < 1$. Again, by Theorem 6.4.8,

$$\lim_{k\to\infty}e_k = \lim_{k\to\infty}T^k e_0 = 0,$$

for any $e_0$. □

Theorem 6.4.11. Let $A \in \mathbb{C}^{n\times n}$ be invertible and $b \in \mathbb{C}^n$. Consider solving $Ax = b$ using the stationary, two-layer scheme

$$x_{k+1} = \left(I_n - B^{-1}A\right)x_k + B^{-1}b,$$

where $B \in \mathbb{C}^{n\times n}$ is invertible. Let $T = I_n - B^{-1}A$ denote the error transfer matrix, and suppose that $\|T\| < 1$ for some induced matrix norm.

1. The method converges for any starting value $x_0 \in \mathbb{C}^n$.

2. The following estimates hold:

$$\|x - x_k\| \le \|T\|^k\|x - x_0\|, \qquad \|x - x_k\| \le \frac{\|T\|^k}{1 - \|T\|}\|x_1 - x_0\|.$$

Proof. (1) This is essentially Theorem 6.4.10(b): the method converges, meaning $e_k \to 0$ as $k \to \infty$, iff $\rho(T) < 1$. Since for every induced matrix norm $\rho(T) \le \|T\|$, it follows that $\rho(T) < 1$ and, hence, the method converges.

(2) It follows that $e_k = T^k e_0$. By using the consistency and sub-multiplicativity of the induced matrix norm, we find

$$\|e_k\| = \left\|T^k e_0\right\| \le \|T\|^k\|e_0\|,$$

which proves the first part. To see the second, observe that $e_k = T^{k-1}e_1$, and thus $Te_k = T^k e_1$. Subtracting the last expression from $e_k = T^k e_0$, we find
$$(I_n - T)e_k = T^k(x_1 - x_0).$$
Since $\|T\| < 1$, Theorem 4.2.6 guarantees that $I_n - T$ is invertible, and

$$\left\|(I_n - T)^{-1}\right\| \le \frac{1}{1 - \|T\|}.$$

Hence,
$$e_k = (I_n - T)^{-1}T^k(x_1 - x_0),$$
and, using the consistency and sub-multiplicativity of the norm, we get

$$\|e_k\| \le \left\|(I_n - T)^{-1}\right\|\|T\|^k\|x_1 - x_0\| \le \frac{\|T\|^k}{1 - \|T\|}\|x_1 - x_0\|.$$

The result is proven. □

6.5 Matrix Splitting Methods

6.5.1 Jacobi Method

Let $A = [a_{i,j}] \in \mathbb{C}^{n\times n}$ have non-zero diagonal elements, and consider the following splitting of $A$:
$$A = L + D + U,$$
where $D = \mathrm{diag}(a_{1,1}, \dots, a_{n,n})$ is the diagonal part of $A$, $L$ is the strictly lower triangular part of $A$, and $U$ is its strictly upper triangular part. We choose as iterator $B = D$. This yields the so-called Jacobi method. In this case, the error transfer matrix is

$$T = T_J = I_n - D^{-1}A, \qquad (6.5.1)$$

so that

$$T_J = \begin{pmatrix}
0 & -\frac{a_{1,2}}{a_{1,1}} & \cdots & -\frac{a_{1,n}}{a_{1,1}} \\
-\frac{a_{2,1}}{a_{2,2}} & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & -\frac{a_{n-1,n}}{a_{n-1,n-1}} \\
-\frac{a_{n,1}}{a_{n,n}} & \cdots & -\frac{a_{n,n-1}}{a_{n,n}} & 0
\end{pmatrix}.$$

Alternatively, we may write the Jacobi method in component form via

$$x_{i,k+1} = [T_J x_k]_i + \left[D^{-1}b\right]_i = -\frac{1}{a_{i,i}}\sum_{\substack{j=1 \\ j\neq i}}^n a_{i,j}x_{j,k} + \frac{1}{a_{i,i}}b_i,$$

where $x_{\ell,k} := [x_k]_\ell$. Equivalently,

$$a_{i,i}x_{i,k+1} = -\sum_{\substack{j=1 \\ j\neq i}}^n a_{i,j}x_{j,k} + b_i.$$
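The componentwise update above translates directly into code. The following is a hedged sketch of Jacobi's method in Python; the function name, iteration count, and test matrix are our own illustrative choices:

```python
import numpy as np

# Sketch (our own names and test data) of the Jacobi update in vector form:
#     x_{k+1} = D^{-1}(b - (L + U) x_k).
def jacobi(A, b, x0, num_iters=100):
    d = np.diag(A)                 # diagonal entries a_ii
    R = A - np.diag(d)             # off-diagonal part L + U
    x = x0.astype(float)
    for _ in range(num_iters):
        x = (b - R @ x) / d
    return x

# A strictly diagonally dominant matrix, so Theorem 6.5.1 below applies.
A = np.array([[10.0, 2.0, 1.0],
              [1.0, 8.0, 2.0],
              [2.0, 1.0, 9.0]])
b = np.array([13.0, 11.0, 12.0])   # exact solution is (1, 1, 1)
x = jacobi(A, b, np.zeros(3))
```

Note that all components of $x_{k+1}$ are computed from $x_k$ alone, so the update is trivially parallelizable.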

Theorem 6.5.1. Let $A = [a_{i,j}] \in \mathbb{C}^{n\times n}$ be invertible and $b \in \mathbb{C}^n$. Then the classical Jacobi iteration method for approximating the solution to $Ax = b$ is convergent, for any starting value $x_0$, if $A$ is strictly diagonally dominant (SDD) of magnitude $\delta > 0$, i.e.,

$$|a_{i,i}| - \sum_{\substack{j=1 \\ j\neq i}}^n |a_{i,j}| \ge \delta, \quad \forall i = 1, \dots, n.$$

Proof. Suppose that $A$ is SDD. Let $T_J$ denote the error transfer matrix for Jacobi's method, that is,

$$T_J = I_n - D^{-1}A,$$

where $D$ is the diagonal part of $A$. Since $A$ is SDD of magnitude $\delta > 0$, it follows that $D$ is invertible and, therefore, $T_J$ is well-defined. Then

$$\|T_J\|_\infty = \max_{1\le i\le n}\sum_{j=1}^n\left|[T_J]_{i,j}\right| = \max_{1\le i\le n}\sum_{j=1}^n\left|\delta_{i,j} - \frac{a_{i,j}}{a_{i,i}}\right| = \max_{1\le i\le n}\sum_{\substack{j=1 \\ j\neq i}}^n\left|\frac{a_{i,j}}{a_{i,i}}\right| = \max_{1\le i\le n}\frac{1}{|a_{i,i}|}\sum_{\substack{j=1 \\ j\neq i}}^n|a_{i,j}| \le \max_{1\le i\le n}\frac{|a_{i,i}| - \delta}{|a_{i,i}|} = 1 - \delta\min_{1\le i\le n}\frac{1}{|a_{i,i}|}.$$

Hence $\|T_J\|_\infty < 1$. By Theorem 6.4.10(b) the method converges. □

Theorem 6.5.2. Suppose that $A \in \mathbb{C}^{n\times n}$ is column-wise strictly diagonally dominant (SDD) of magnitude $\delta > 0$, i.e.,

$$|a_{j,j}| - \sum_{\substack{i=1 \\ i\neq j}}^n|a_{i,j}| \ge \delta, \quad j = 1, \dots, n,$$

and $b \in \mathbb{C}^n$ is given. Then the sequence $\{x_k\}_{k=1}^{\infty}$ generated by the Jacobi iteration method converges, for any starting value $x_0$, to the vector $x = A^{-1}b$.

Proof. It will suffice to prove that

$$\rho(T_J) < 1,$$

where $T_J = I_n - D^{-1}A$. Since $A \in \mathbb{C}^{n\times n}$ is column-wise strictly diagonally dominant of magnitude $\delta > 0$, $A^H$ is row-wise SDD of magnitude $\delta > 0$. Therefore, by the last theorem,

$$\rho\left(I_n - D^{-H}A^H\right) \le \left\|I_n - D^{-H}A^H\right\|_\infty < 1.$$

Define $\tilde{T} := I_n - D^{-H}A^H$. Then
$$\tilde{T}^H = I_n - AD^{-1},$$
and
$$D^{-1}\tilde{T}^H D = I_n - D^{-1}A = T_J.$$
Therefore
$$\sigma(T_J) = \sigma\left(\tilde{T}^H\right) = \overline{\sigma(\tilde{T})},$$
and
$$1 > \rho(\tilde{T}) = \rho(T_J). \qquad \Box$$

6.5.2 Gauß–Seidel Method

Recall that the Jacobi method can be written in the form
$$\sum_{j=1}^{i-1}a_{i,j}x_{j,k} + a_{i,i}x_{i,k+1} + \sum_{j=i+1}^n a_{i,j}x_{j,k} = b_i.$$
However, at this stage, we have already computed new approximations for the components $x_s$, $s = 1, \dots, i-1$, which we are not using. The Gauß–Seidel method uses these newly computed approximations to obtain the scheme

$$\sum_{j=1}^i a_{i,j}x_{j,k+1} + \sum_{j=i+1}^n a_{i,j}x_{j,k} = b_i.$$

As before, we require $a_{i,i} \neq 0$, for all $1 \le i \le n$, so that the method is well-defined. Let us write this scheme in standard form. Recall the splitting $A = L + D + U$; then the Gauß–Seidel method is the linear iteration process defined via

$$(L + D)x_{k+1} + Ux_k = b.$$

Therefore
$$x_{k+1} = -(L + D)^{-1}Ux_k + (L + D)^{-1}b,$$
so that the error transfer matrix is

$$T_{GS} = -(L + D)^{-1}U = -(A - U)^{-1}U. \qquad (6.5.2)$$

From this we obtain the following.
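In code, the Gauß–Seidel sweep differs from Jacobi only in that each component update immediately uses the freshly computed entries. A minimal Python sketch follows; the names and test data are our own illustrative choices:

```python
import numpy as np

# Sketch (our own names and test data) of a Gauss-Seidel sweep: each component
# update immediately uses the already-updated entries x_{j,k+1}, j < i.
def gauss_seidel(A, b, x0, num_iters=100):
    n = len(b)
    x = x0.astype(float)
    for _ in range(num_iters):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

# A strictly diagonally dominant matrix, so Theorem 6.5.3 below applies.
A = np.array([[10.0, 2.0, 1.0],
              [1.0, 8.0, 2.0],
              [2.0, 1.0, 9.0]])
b = np.array([13.0, 11.0, 12.0])   # exact solution is (1, 1, 1)
x = gauss_seidel(A, b, np.zeros(3))
```

Because updated entries are reused within the sweep, Gauß–Seidel needs only one copy of the iterate in memory, at the price of an inherently sequential inner loop.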

Theorem 6.5.3 (convergence of Gauß–Seidel). Suppose that $A = [a_{i,j}] \in \mathbb{C}^{n\times n}$ is strictly diagonally dominant (SDD) and $b \in \mathbb{C}^n$. Then, for any starting value $x_0$, the sequence generated by the Gauß–Seidel method converges to $x = A^{-1}b$.

Proof. Let us define
$$\gamma = \max_{i=1,\dots,n}\left\{\frac{\sum_{j=i+1}^n|a_{i,j}|}{|a_{i,i}| - \sum_{j=1}^{i-1}|a_{i,j}|}\right\}.$$
Owing to the fact that $A$ is SDD,

$$|a_{i,i}| > \sum_{\substack{j=1 \\ j\neq i}}^n|a_{i,j}| = \sum_{j=1}^{i-1}|a_{i,j}| + \sum_{j=i+1}^n|a_{i,j}|,$$

which implies that
$$|a_{i,i}| - \sum_{j=1}^{i-1}|a_{i,j}| > \sum_{j=i+1}^n|a_{i,j}|,$$
and thus $\gamma < 1$. We will show convergence of the Gauß–Seidel method by proving that
$$\|T_{GS}\|_\infty \le \gamma.$$
Let $A = L + D + U$ be the usual decomposition into strictly lower triangular, diagonal, and strictly upper triangular parts, respectively, and set $y = T_{GS}x$, i.e., $(L + D)y = -Ux$. We have
$$\sum_{j=1}^{i-1}a_{i,j}y_j + a_{i,i}y_i = -\sum_{j=i+1}^n a_{i,j}x_j, \quad 1 \le i \le n.$$
Let $i$ be such that $|y_i| = \|y\|_\infty$. By the triangle inequality,
$$\left|\sum_{j=1}^{i-1}a_{i,j}y_j + a_{i,i}y_i\right| \ge |a_{i,i}||y_i| - \sum_{j=1}^{i-1}|a_{i,j}||y_j| \ge \left(|a_{i,i}| - \sum_{j=1}^{i-1}|a_{i,j}|\right)\|y\|_\infty.$$
Also, we have
$$\left|\sum_{j=i+1}^n a_{i,j}x_j\right| \le \sum_{j=i+1}^n|a_{i,j}|\,\|x\|_\infty.$$
Consequently,
$$\|T_{GS}x\|_\infty = \|y\|_\infty \le \frac{\sum_{j=i+1}^n|a_{i,j}|}{|a_{i,i}| - \sum_{j=1}^{i-1}|a_{i,j}|}\|x\|_\infty \le \gamma\|x\|_\infty.$$
This implies that, for all $x \in \mathbb{C}^n_\star$,
$$\frac{\|T_{GS}x\|_\infty}{\|x\|_\infty} \le \gamma,$$
which shows that $\|T_{GS}\|_\infty \le \gamma < 1$. □

In fact, we can prove a little bit more than the last result suggests.

Theorem 6.5.4. Suppose that $A = [a_{i,j}] \in \mathbb{C}^{n\times n}$ is strictly diagonally dominant (SDD) of magnitude $\delta > 0$ and $b \in \mathbb{C}^n$. Then
$$\|T_{GS}\|_\infty \le \|T_J\|_\infty < 1,$$
where $T_{GS}$ and $T_J$ denote, respectively, the error transfer matrices of the Gauß–Seidel and Jacobi methods. In particular, for any starting value $x_0$, the sequences generated by the Jacobi and the Gauß–Seidel methods both converge to $x = A^{-1}b$.

Proof. Set
$$\eta_{GS} := \|T_{GS}\|_\infty, \qquad \eta_J := \|T_J\|_\infty,$$
and $\mathbf{1} = [1, \dots, 1]^T$. Our argument will simplify significantly with some new notation, which, incidentally, may not be widely used outside of this proof. Define, for any matrix $C = [c_{i,j}] \in \mathbb{C}^{n\times n}$, its absolute value
$$|C| = \left[|c_{i,j}|\right]_{i,j=1}^n \in \mathbb{R}^{n\times n}.$$
It follows that
$$|C|\mathbf{1} = \begin{bmatrix}\sum_{j=1}^n|c_{1,j}| \\ \vdots \\ \sum_{j=1}^n|c_{n,j}|\end{bmatrix} \quad \text{and} \quad \left\||C|\right\|_\infty = \|C\|_\infty.$$
Similarly, for any vector $c = [c_i] \in \mathbb{C}^n$, we set $|c| = [|c_i|]_{i=1}^n \in \mathbb{R}^n$. For $a = [a_i], b = [b_i] \in \mathbb{R}^n$, we write $a \le b$ iff $a_i \le b_i$, for all $i = 1, \dots, n$. Similarly, for matrices $A = [a_{i,j}], B = [b_{i,j}] \in \mathbb{R}^{n\times n}$, we write $A \le B$ iff $a_{i,j} \le b_{i,j}$, for all $1 \le i, j \le n$.

Under the assumption that $\eta_J = \|T_J\|_\infty < 1$, which holds by Theorem 6.5.1 since $A$ is SDD, it follows that
$$|T_J|\mathbf{1} \le \eta_J\mathbf{1} < \mathbf{1}.$$
Let $A = L + D + U$ be the usual decomposition into strictly lower triangular, diagonal, and strictly upper triangular parts, respectively. Then

$$T_J = I_n - D^{-1}A = -D^{-1}(L + U) = -\tilde{L} - \tilde{U},$$

where $\tilde{L} := D^{-1}L$ is strictly lower triangular and $\tilde{U} := D^{-1}U$ is strictly upper triangular. Hence
$$|T_J| = |\tilde{L}| + |\tilde{U}|.$$
From this it follows that
$$|\tilde{U}|\mathbf{1} = |T_J|\mathbf{1} - |\tilde{L}|\mathbf{1} \le \eta_J\mathbf{1} - |\tilde{L}|\mathbf{1} = \left(\eta_J I_n - |\tilde{L}|\right)\mathbf{1}.$$
Since $\tilde{L}$ and $|\tilde{L}|$ are strictly lower triangular, it follows that

$$\tilde{L}^n = |\tilde{L}|^n = O \in \mathbb{C}^{n\times n}.$$

Furthermore, $I_n + \tilde{L}$ is invertible, and

$$\left(I_n + \tilde{L}\right)^{-1} = I_n - \tilde{L} + \tilde{L}^2 - \tilde{L}^3 + \cdots + (-1)^{n-1}\tilde{L}^{n-1}.$$

Then it is left as an exercise to show that

$$O \le \left|\left(I_n + \tilde{L}\right)^{-1}\right| = \left|I_n - \tilde{L} + \tilde{L}^2 - \cdots + (-1)^{n-1}\tilde{L}^{n-1}\right| \le I_n + |\tilde{L}| + |\tilde{L}|^2 + \cdots + |\tilde{L}|^{n-1} = \left(I_n - |\tilde{L}|\right)^{-1}.$$

Now, recall and observe that

$$T_{GS} = I_n - (L + D)^{-1}A = -(L + D)^{-1}U = -(L + D)^{-1}DD^{-1}U = -\left(I_n + \tilde{L}\right)^{-1}\tilde{U}.$$

Thus
$$|\tilde{U}|\mathbf{1} \le \left(\eta_J I_n - |\tilde{L}|\right)\mathbf{1}$$
and
$$\left(I_n - |\tilde{L}|\right)^{-1}|\tilde{U}|\mathbf{1} \le \left(I_n - |\tilde{L}|\right)^{-1}\left(\eta_J I_n - |\tilde{L}|\right)\mathbf{1},$$
since
$$O \le \left(I_n - |\tilde{L}|\right)^{-1}.$$

Finally,

$$|T_{GS}|\mathbf{1} \le \left|\left(I_n + \tilde{L}\right)^{-1}\right||\tilde{U}|\mathbf{1} \le \left(I_n - |\tilde{L}|\right)^{-1}|\tilde{U}|\mathbf{1} \le \left(I_n - |\tilde{L}|\right)^{-1}\left(\eta_J I_n - |\tilde{L}|\right)\mathbf{1} = \left(I_n - |\tilde{L}|\right)^{-1}\left\{I_n - |\tilde{L}| + (\eta_J - 1)I_n\right\}\mathbf{1} = \left\{I_n + (\eta_J - 1)\left(I_n - |\tilde{L}|\right)^{-1}\right\}\mathbf{1}.$$

Now,
$$I_n \le \left(I_n - |\tilde{L}|\right)^{-1} = I_n + |\tilde{L}| + |\tilde{L}|^2 + \cdots + |\tilde{L}|^{n-1},$$
and $\eta_J < 1$ from Theorem 6.5.1, so that
$$|T_{GS}|\mathbf{1} \le \eta_J\mathbf{1},$$
which implies that $\eta_{GS} = \|T_{GS}\|_\infty \le \eta_J$. □

Let us conclude this discussion by linking, for a special kind of matrix, the convergence of the Jacobi method to that of the Gauß–Seidel method.

Theorem 6.5.5 (tridiagonal matrices). Let $A \in \mathbb{C}^{n\times n}$ be tridiagonal with non-zero diagonal elements. Denote by $T_J$ and $T_{GS}$ the error transfer matrices of the Jacobi and Gauß–Seidel methods, respectively. Then we have

$$\rho(T_{GS}) = \rho(T_J)^2.$$

In particular, one method converges iff the other method converges.

Proof. We begin with a preliminary statement about tridiagonal matrices. Suppose that
$$A = \begin{pmatrix}
b_1 & c_2 & 0 & \cdots & 0 \\
a_2 & b_2 & \ddots & & \vdots \\
0 & \ddots & \ddots & c_{n-1} & 0 \\
\vdots & & a_{n-1} & b_{n-1} & c_n \\
0 & \cdots & 0 & a_n & b_n
\end{pmatrix},$$
where $b_i \neq 0$, for all $1 \le i \le n$. Let $0 \neq \mu \in \mathbb{C}$ and define
$$M(\mu) := DAD^{-1},$$
where $D = \mathrm{diag}(\mu, \mu^2, \dots, \mu^n)$. Then
$$M(\mu) = \begin{pmatrix}
b_1 & \mu^{-1}c_2 & 0 & \cdots & 0 \\
\mu a_2 & b_2 & \ddots & & \vdots \\
0 & \ddots & \ddots & \mu^{-1}c_{n-1} & 0 \\
\vdots & & \mu a_{n-1} & b_{n-1} & \mu^{-1}c_n \\
0 & \cdots & 0 & \mu a_n & b_n
\end{pmatrix}.$$
It is left as an exercise for the reader to show that
$$\det(M(\mu)) = \det(M(1)) = \det(A).$$
Let now $A = L + D + U$ where, as usual, $D$ is diagonal, and $L$ and $U$ are, respectively, strictly lower and strictly upper triangular. From (6.5.1) we have that $T_J = I_n - D^{-1}A$ and, therefore, the eigenvalues of $T_J$ are the zeros of the characteristic polynomial
$$p_J(\lambda) = \det(T_J - \lambda I_n) = \det\left(-D^{-1}(L + U) - \lambda I_n\right) = \det\left(-D^{-1}(L + \lambda D + U)\right) = \det\left(-D^{-1}\right)q_J(\lambda),$$
where we defined the polynomial
$$q_J(\lambda) := \det(L + \lambda D + U).$$
On the other hand, from (6.5.2) we have that $T_{GS} = -(L + D)^{-1}U$, and so its eigenvalues are the zeros of
$$p_{GS}(\lambda) = \det(T_{GS} - \lambda I_n) = \det\left(-(L + D)^{-1}\right)q_{GS}(\lambda),$$
where
$$q_{GS}(\lambda) := \det(\lambda L + \lambda D + U).$$
Notice now that $q_{GS}(0) = \det(U) = 0$. In addition, both matrices involved in the definitions of $q_J$ and $q_{GS}$, respectively, are tridiagonal. By the previous statement about determinants of tridiagonal matrices we then have that, if $\lambda \neq 0$,
$$q_{GS}(\lambda^2) = \det\left(\lambda^2 L + \lambda^2 D + U\right) = \det\left(\lambda L + \lambda^2 D + \lambda U\right) = \lambda^n\det(L + \lambda D + U) = \lambda^n q_J(\lambda),$$
and so this holds for all $\lambda \in \mathbb{C}$. The previous relation shows that
$$\lambda \in \sigma(T_{GS}) \implies \pm\lambda^{1/2} \in \sigma(T_J),$$
and
$$\lambda \in \sigma(T_J)\;\left(\iff -\lambda \in \sigma(T_J)\right) \implies \lambda^2 \in \sigma(T_{GS}),$$
as we needed to show. □
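The identity $\rho(T_{GS}) = \rho(T_J)^2$ is easy to verify numerically on a randomly generated tridiagonal matrix; the construction below is our own illustration:

```python
import numpy as np

# Numerical check (our own random example) of Theorem 6.5.5: for a tridiagonal
# matrix with non-zero diagonal, rho(T_GS) = rho(T_J)^2.
rng = np.random.default_rng(0)
n = 6
A = (np.diag(rng.uniform(2.0, 3.0, n))
     + np.diag(rng.uniform(-1.0, 1.0, n - 1), -1)
     + np.diag(rng.uniform(-1.0, 1.0, n - 1), 1))

D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

T_J = np.eye(n) - np.linalg.solve(D, A)        # I_n - D^{-1} A
T_GS = -np.linalg.solve(L + D, U)              # -(L + D)^{-1} U

rho_J = max(abs(np.linalg.eigvals(T_J)))
rho_GS = max(abs(np.linalg.eigvals(T_GS)))
```

In the tridiagonal case, then, one Gauß–Seidel step is asymptotically worth two Jacobi steps.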

6.6 Richardson’s Method

Let $A \in \mathbb{C}^{n\times n}$ be invertible and $b \in \mathbb{C}^n$ be given. Let us consider now what is known as Richardson's method:

$$x_{k+1} = (I_n - \alpha A)x_k + \alpha b.$$

This is a stationary two-layer method that results from choosing $B = \frac{1}{\alpha}I_n$, where $\alpha \in \mathbb{C}_\star$. In this case, $T_R = I_n - \alpha A$.

Theorem 6.6.1 (convergence of Richardson's method). Let $A \in \mathbb{C}^{n\times n}$ be HPD, with $\sigma(A) = \{\lambda_1, \dots, \lambda_n\}$, where $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Richardson's method converges iff $\alpha \in (0, 2/\lambda_n)$. If $\alpha \in (0, 2/\lambda_n)$, we have the estimate
$$\|e_k\|_2 \le \rho^k\|e_0\|_2, \qquad \rho = \max\{|1 - \alpha\lambda_n|, |1 - \alpha\lambda_1|\}.$$
From this it also follows that setting
$$\alpha_{\mathrm{opt}} = \frac{2}{\lambda_1 + \lambda_n}$$
optimizes (i.e., minimizes) the value of $\rho$, and that

$$\rho_{\mathrm{opt}} = \rho(\alpha_{\mathrm{opt}}) = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$$

Proof. Since $A$ is HPD, we know that the eigenvalues of $A$ are positive real numbers and $\lambda_n = \|A\|_2$. Notice also that $T_R = I_n - \alpha A = T_R^H$, which implies that the eigenvalues of $T_R$ are real. Observe that $(\lambda_i, w)$ is an eigen-pair of $A$ iff $(\nu_i := 1 - \alpha\lambda_i, w)$ is an eigen-pair of $T_R$. Assume that $0 < \alpha < 2/\lambda_n$. Then

$$0 < \lambda_i\alpha < 2\frac{\lambda_i}{\lambda_n} \le 2, \quad i = 1, \dots, n,$$

which implies

$$1 > 1 - \lambda_i\alpha > 1 - 2\frac{\lambda_i}{\lambda_n} \ge -1, \quad i = 1, \dots, n.$$

It follows that

$$1 > \nu_1 \ge \cdots \ge \nu_n > -1, \qquad \nu_i = 1 - \alpha\lambda_i.$$

This guarantees that $\|T_R\|_2 = \rho(T_R) < 1$, which implies convergence. Conversely, if $\alpha \notin (0, 2/\lambda_n)$, then $\rho(T_R) \ge 1$, and the method does not converge unconditionally. By consistency,

$$\|e_k\|_2 = \left\|T_R^k e_0\right\|_2 \le \rho^k\|e_0\|_2.$$

Of course, it is easy to see that

$$\rho = \rho(T_R) = \max\{|\nu_1|, |\nu_n|\} = \max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}.$$

Finally, showing optimality amounts to minimizing $\rho$; see Figure 6.1. From this we see that the minimum of $\rho$ is attained when

$$|1 - \alpha\lambda_1| = |1 - \alpha\lambda_n|,$$

i.e.,
$$1 - \alpha\lambda_1 = \alpha\lambda_n - 1,$$
which implies
$$\alpha_{\mathrm{opt}} = \frac{2}{\lambda_1 + \lambda_n}.$$
Therefore,

$$\rho_{\mathrm{opt}} = 1 - \alpha_{\mathrm{opt}}\lambda_1 = \frac{\lambda_1 + \lambda_n - 2\lambda_1}{\lambda_1 + \lambda_n} = \frac{\lambda_n - \lambda_1}{\lambda_1 + \lambda_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}. \qquad \Box$$

Remark 6.6.2 (optimal $\alpha$). Notice that the rate of convergence for the optimal choice $\alpha_{\mathrm{opt}}$ degrades as $\kappa_2(A)$ gets larger, since $\rho_{\mathrm{opt}} \to 1$. We will see that this is a major problem for classical iterative methods.

Figure 6.1: Minimizing the spectral radius $\rho$ of the error transfer matrix for Richardson's method.

6.7 Relaxation Methods

Let us consider one last classical method, namely the relaxation method. Recall that Gauß–Seidel reads

$$Lx_{k+1} + Dx_{k+1} + Ux_k = b,$$

where $A = L + D + U$ with the usual assumptions and notation. We will weight the contribution of the diagonal $D$ by introducing the parameter $\omega > 0$:

$$Lx_{k+1} + \omega^{-1}Dx_{k+1} + (1 - \omega^{-1})Dx_k + Ux_k = b.$$

In doing so we obtain the scheme

$$\left(L + \omega^{-1}D\right)x_{k+1} = \left[(\omega^{-1} - 1)D - U\right]x_k + b.$$

If we choose $\omega > 1$ the scheme is termed an over-relaxation scheme; if $0 < \omega < 1$ the scheme is called an under-relaxation scheme. From the previous identity we also find that the error transfer matrix is

$$T_\omega = \left(L + \omega^{-1}D\right)^{-1}\left[(\omega^{-1} - 1)D - U\right]$$

and the iterator is
$$B_\omega = L + \omega^{-1}D.$$

Theorem 6.7.1. Let $A \in \mathbb{C}^{n\times n}$ have non-zero diagonal entries. A necessary condition for unconditional convergence of the relaxation method is that $\omega \in (0, 2)$.
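A minimal Python sketch of the relaxation scheme follows; the function name, the HPD test matrix, and the two sample values of $\omega$ are our own illustrative choices (Theorem 6.7.2 below guarantees convergence for HPD $A$ and $\omega \in (0, 2)$):

```python
import numpy as np

# Sketch (our own names and test data) of the relaxation scheme
#     (L + D/omega) x_{k+1} = ((1/omega - 1) D - U) x_k + b.
def relaxation(A, b, omega, x0, num_iters=200):
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    B = L + D / omega                      # the iterator B_omega
    x = x0.astype(float)
    for _ in range(num_iters):
        x = np.linalg.solve(B, ((1.0 / omega - 1.0) * D - U) @ x + b)
    return x

# An HPD matrix; Theorem 6.7.2 below guarantees convergence for omega in (0, 2).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([5.0, 6.0, 5.0])              # exact solution is (1, 1, 1)
x_over = relaxation(A, b, 1.5, np.zeros(3))    # over-relaxation
x_under = relaxation(A, b, 0.5, np.zeros(3))   # under-relaxation
```

Since $B_\omega$ is lower triangular, each application of $B_\omega^{-1}$ can be carried out by forward substitution; the generic dense solve above is used only for brevity.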

Proof. Since $A \in \mathbb{C}^{n\times n}$ has non-zero diagonal entries, the relaxation method is well defined. We know that a necessary and sufficient condition for convergence is $\rho(T_\omega) < 1$. Since the eigenvalues are roots of the characteristic polynomial,

$$\chi_{T_\omega}(\lambda) = \det(T_\omega - \lambda I_n) = (-1)^n\prod_{i=1}^n(\lambda - \lambda_i),$$

then setting $\lambda = 0$ we obtain $\chi_{T_\omega}(0) = \det(T_\omega) = \prod_{i=1}^n\lambda_i$. Therefore, if $|\det(T_\omega)| \ge 1$, there must be at least one eigenvalue that satisfies $|\lambda_i| \ge 1$, and the method cannot converge unconditionally. Consequently, a necessary condition for unconditional convergence (convergence for any starting point) is that

$$|\det(T_\omega)| < 1.$$

In the case of the relaxation method we have
$$\det(T_\omega) = \frac{\det\left((\omega^{-1} - 1)D - U\right)}{\det\left(L + \omega^{-1}D\right)} = \frac{\prod_{i=1}^n(\omega^{-1} - 1)d_{i,i}}{\prod_{i=1}^n\omega^{-1}d_{i,i}} = \frac{(\omega^{-1} - 1)^n}{\omega^{-n}} = \omega^n\left(\omega^{-1} - 1\right)^n = (1 - \omega)^n.$$
If $\omega \notin (0, 2)$, then $|\det(T_\omega)| \ge 1$. In other words, if $\omega \notin (0, 2)$ the method cannot converge unconditionally, meaning $\omega \in (0, 2)$ is a necessary condition for unconditional convergence. □

Let us show that, in a particular case, this condition is also sufficient.

Theorem 6.7.2 (convergence of the relaxation method). Let $A \in \mathbb{C}^{n\times n}$ be HPD and $\omega \in (0, 2)$. Then the relaxation method converges. In particular, the Gauß–Seidel method converges, since $T_{\omega=1} = T_{GS}$.

Proof. Denote $B_\omega = L + \omega^{-1}D$, so that

$$T_\omega = B_\omega^{-1}\left[(\omega^{-1} - 1)D - U\right]$$

and

$$I_n - T_\omega = I_n - B_\omega^{-1}\left[(\omega^{-1} - 1)D - U\right] = B_\omega^{-1}\left[B_\omega - (\omega^{-1} - 1)D + U\right] = B_\omega^{-1}A.$$

Now suppose that $(\lambda, w)$ is an eigen-pair of $T_\omega$, and set $y := (I_n - T_\omega)w = (1 - \lambda)w$. The previous computation implies that $B_\omega y = Aw$. For this reason,

$$(B_\omega - A)y = (B_\omega - A)B_\omega^{-1}Aw = \left(B_\omega B_\omega^{-1}A - AB_\omega^{-1}A\right)w = \left(A - AB_\omega^{-1}A\right)w = A\left(I_n - B_\omega^{-1}A\right)w = AT_\omega w = \lambda Aw.$$

Taking the inner product with $y$, and using the fact that $A$ is Hermitian, we find
$$(B_\omega y, y)_2 = (Aw, y)_2 = (1 - \bar{\lambda})(w, Aw)_2,$$
which implies, using the explicit form of $B_\omega$, that

$$(Ly, y)_2 + \omega^{-1}(Dy, y)_2 = (1 - \bar{\lambda})(w, Aw)_2. \qquad (6.7.1)$$

Similarly,

$$(y, (B_\omega - A)y)_2 = \bar{\lambda}(y, Aw)_2 = \bar{\lambda}(1 - \lambda)(w, Aw)_2,$$

which implies

$$(\omega^{-1} - 1)(Dy, y)_2 - (y, Uy)_2 = \bar{\lambda}(1 - \lambda)(w, Aw)_2. \qquad (6.7.2)$$

Adding (6.7.1) and (6.7.2), and observing that, since $A$ is Hermitian, $(Ly, y)_2 = (y, Uy)_2$, we obtain

$$(2\omega^{-1} - 1)(Dy, y)_2 = \left(1 - |\lambda|^2\right)(w, Aw)_2.$$

Recall that, since $A$ is HPD, so is its diagonal $D$. Note also that $y \neq 0$: otherwise $\lambda = 1$ and $Aw = B_\omega y = 0$, contradicting the invertibility of $A$. The expression on the left is therefore positive, provided $\omega \in (0, 2)$. This means that
$$1 - |\lambda|^2 = \frac{(2\omega^{-1} - 1)(Dy, y)_2}{(w, Aw)_2} > 0,$$
which implies $|\lambda| < 1$. It follows that $\rho(T_\omega) < 1$. □

6.8 Convergence in Energy Norm

In this section, we provide a powerful alternative method for proving convergence of some methods. This technique is called the energy method. Recall that, if $A$ is HPD, we can define the so-called energy norm of the matrix by

$$\|x\|_A^2 = (x, x)_A := (Ax, x)_2.$$

We will give sufficient conditions for convergence in this norm. Before we do that, though, we need a technical result.

Lemma 6.8.1. Suppose that $Q \in \mathbb{C}^{n\times n}$ is positive definite in the sense that
$$\Re\left((Qy, y)_2\right) > 0, \quad \text{for all } y \in \mathbb{C}^n_\star,$$
but $Q$ is not necessarily Hermitian. Then

$$\|w\|_Q := \sqrt{\Re\left((Qw, w)_2\right)}, \quad \text{for all } w \in \mathbb{C}^n,$$

defines a norm.

Proof. Suppose that $Q$ is not Hermitian, to avoid the simple case. Then
\[
Q = Q_H + Q_A, \quad \text{where} \quad Q_H := \frac{1}{2} \left( Q + Q^H \right) \quad \text{and} \quad Q_A := \frac{1}{2} \left( Q - Q^H \right)
\]
are the Hermitian and anti-Hermitian parts, respectively. Observe that $Q_H^H = Q_H$ and $Q_A^H = -Q_A$. It follows that $(Q_A y, y)_2$ is purely imaginary for any $y \in \mathbb{C}^n$, because
\[
\overline{(Q_A y, y)_2} = \overline{y^H Q_A y} = y^H Q_A^H y = -y^H Q_A y = -(Q_A y, y)_2.
\]
Therefore, for all $y \in \mathbb{C}^n$, $y \neq 0$,
\[
0 < \Re \left( (Q y, y)_2 \right) = \Re \left( (Q_H y, y)_2 \right) + \Re \left( (Q_A y, y)_2 \right) = (Q_H y, y)_2,
\]
since $(Q_H y, y)_2$ is real. Therefore, $Q_H$ is HPD. Since
\[
\|w\|_{Q_H} := \sqrt{(Q_H w, w)_2}, \quad \text{for all } w \in \mathbb{C}^n,
\]
defines a norm, and $\|w\|_Q = \|w\|_{Q_H}$, the result follows. $\square$

Theorem 6.8.2 (convergence in energy). Let $A$ be HPD. Suppose that $B \in \mathbb{C}^{n \times n}$ is the invertible iterator describing a two-layer stationary linear iteration method. If $Q := B - \frac{1}{2} A$ is positive definite in the sense that
\[
\Re \left( (Q y, y)_2 \right) > 0, \quad \text{for all } y \in \mathbb{C}^n, \ y \neq 0,
\]
but is not necessarily Hermitian, then scheme (6.2.2) converges.

Proof. Define the error $e_k := x - x_k$, as usual, and notice that for any stationary two-layer linear iteration method we have
\[
B (e_{k+1} - e_k) + A e_k = 0,
\]
which is equivalent to $B q_{k+1} + A e_k = 0$, where $q_{k+1} := e_{k+1} - e_k$. Taking the inner product of this identity with $q_{k+1}$, we obtain
\[
0 = (B q_{k+1}, q_{k+1})_2 + (A e_k, q_{k+1})_2.
\]
Using the fact that
\[
e_k = \frac{1}{2} (e_{k+1} + e_k) - \frac{1}{2} (e_{k+1} - e_k),
\]
and taking real parts, since $A$ is Hermitian, the cross terms cancel and we find
\[
\Re (Q q_{k+1}, q_{k+1})_2 + \frac{1}{2} \|e_{k+1}\|_A^2 = \frac{1}{2} \|e_k\|_A^2,
\]
or, in other words,
\[
\frac{1}{2} \|e_{k+1}\|_A^2 + \|q_{k+1}\|_Q^2 = \frac{1}{2} \|e_k\|_A^2.
\]
This allows us to conclude that the sequence $\|e_k\|_A$ is non-increasing. Since it is bounded below, by the monotone convergence theorem, it must have a limit, $\lim_{k \to \infty} \|e_k\|_A = \alpha$, say. Passing to the limit in this identity then tells us that $\|q_{k+1}\|_Q \to 0$, as $k \to \infty$. But, since $A$ is invertible,
\[
e_k = -A^{-1} B q_{k+1},
\]
and this implies $e_k \to 0$. Therefore, the method converges. $\square$

Let us apply this result to obtain, by different means, convergence of two methods that we have previously studied.

Example 6.8.3 (convergence of Richardson). For Richardson's method we have $Q = \frac{1}{\alpha} I_n - \frac{1}{2} A$, $\alpha > 0$, and
\[
(Q x, x)_2 = \frac{1}{\alpha} \|x\|_2^2 - \frac{1}{2} (A x, x)_2 \geq \left( \frac{1}{\alpha} - \frac{\|A\|_2}{2} \right) \|x\|_2^2.
\]
A sufficient condition for $Q$ to be HPD, and, therefore, for the method to converge, is that
\[
0 < \alpha < \frac{2}{\|A\|_2}.
\]
But, since $A$ is HPD, $\|A\|_2 = \lambda_n$, the largest eigenvalue. If $0 < \alpha \lambda_n < 2$, Richardson's method converges by Theorem 6.8.2.

Example 6.8.4 (convergence of the relaxation method). Suppose $A \in \mathbb{C}^{n \times n}$ is HPD. In the relaxation scheme, the iterator is
\[
B_\omega = L + \frac{1}{\omega} D,
\]
where $A = L + D + U$ is the standard matrix splitting and $\omega > 0$. Therefore,
\[
Q_\omega = B_\omega - \frac{1}{2} A
= \frac{1}{\omega} D + L - \frac{1}{2} \left( L + D + L^H \right)
= \left( \frac{1}{\omega} - \frac{1}{2} \right) D + \frac{1}{2} \left( L - L^H \right),
\]
and
\[
\Re \left( (Q_\omega x, x)_2 \right)
= \left( \frac{1}{\omega} - \frac{1}{2} \right) (D x, x)_2 + \frac{1}{2} \Re \left( \left( (L - L^H) x, x \right)_2 \right)
= \left( \frac{1}{\omega} - \frac{1}{2} \right) (D x, x)_2,
\]
where we used the fact that $\frac{1}{2} (L - L^H)$ is anti-Hermitian. If $\frac{1}{\omega} - \frac{1}{2} > 0$ or, equivalently, $\omega < 2$, then $Q_\omega$ is positive definite (though not Hermitian) and the relaxation method converges.
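The conclusion of this example can be checked numerically. The following sketch (an illustration with an arbitrarily chosen HPD test matrix; it is not part of the text) builds the error transfer matrix $T_\omega = I_n - B_\omega^{-1} A$ for several $\omega \in (0, 2)$ and confirms that its spectral radius lies below one:

```python
import numpy as np

def relaxation_transfer(A, omega):
    """Error transfer matrix T_omega = I - B_omega^{-1} A, where
    B_omega = L + D/omega and A = L + D + U is the standard splitting."""
    L = np.tril(A, k=-1)
    D = np.diag(np.diag(A))
    B = L + D / omega
    return np.eye(A.shape[0]) - np.linalg.solve(B, A)

def spectral_radius(T):
    return float(max(abs(np.linalg.eigvals(T))))

# An HPD test matrix: random symmetric part plus a diagonal shift.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8.0 * np.eye(8)

radii = [spectral_radius(relaxation_transfer(A, w)) for w in (0.5, 1.0, 1.5, 1.9)]
```

The case $\omega = 1$ recovers Gauß-Seidel; all four radii come out strictly below one, as the theory guarantees.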

We can state a theorem, similar to the last one, that can also be proven using energy methods.

n n n n Theorem 6.8.5. Suppose that A C ⇥ is Hermitian and B C ⇥ is non- 2 2 singular. Assume that Q = B + BH A is HPD. Then the two-layer stationary linear iteration scheme with error trans- fer matrix T = I B 1A converges unconditionally i↵ A is HPD. n

Proof. Let us prove one direction and save the other for an exercise.

($\Leftarrow$): Suppose that $A$ is HPD. Recall that $\|\cdot\|_A$ defines a norm on $\mathbb{C}^n$. The error equation is precisely
\[
e_{k+1} = \left( I_n - B^{-1} A \right) e_k.
\]

Then
\[
\begin{aligned}
\|e_{k+1}\|_A^2
&= \left[ \left( I_n - B^{-1} A \right) e_k \right]^H A \left( I_n - B^{-1} A \right) e_k \\
&= e_k^H \left( I_n - A B^{-H} \right) A \left( I_n - B^{-1} A \right) e_k \\
&= e_k^H A e_k - e_k^H A B^{-H} A e_k - e_k^H A B^{-1} A e_k + e_k^H A B^{-H} A B^{-1} A e_k \\
&= \|e_k\|_A^2 - e_k^H A \left( B^{-H} + B^{-1} - B^{-H} A B^{-1} \right) A e_k \\
&= \|e_k\|_A^2 - e_k^H A B^{-H} \left( B + B^H - A \right) B^{-1} A e_k \\
&= \|e_k\|_A^2 - \left( B^{-1} A e_k \right)^H \left( B + B^H - A \right) \left( B^{-1} A e_k \right).
\end{aligned}
\]
Since $B + B^H - A$ is HPD and, in general, $B^{-1} A e_k \neq 0$, it follows that
\[
\|e_{k+1}\|_A^2 + \|B^{-1} A e_k\|_Q^2 = \|e_k\|_A^2,
\]
and $\|e_k\|_A$ is a non-increasing sequence. Therefore, by the Monotone Convergence Theorem, $\|e_k\|_A$ converges; that is, there is some $\alpha \in [0, \infty)$ such that
\[
\lim_{k \to \infty} \|e_k\|_A = \alpha = \lim_{k \to \infty} \|e_{k+1}\|_A.
\]
This implies
\[
\lim_{k \to \infty} \|B^{-1} A e_k\|_Q = 0,
\]
which, in turn, implies that $e_k \to 0$ as $k \to \infty$. $\square$

Example 6.8.6 (convergence of the relaxation method, yet again). Suppose $A \in \mathbb{C}^{n \times n}$ is HPD. We once again recall that the iterator matrix for the relaxation method is
\[
B_\omega = L + \frac{1}{\omega} D,
\]
where $A = L + D + U$ is the standard matrix splitting and $\omega > 0$. Therefore,
\[
Q_\omega = B_\omega + B_\omega^H - A
= \frac{2}{\omega} D + L + L^H - \left( L + D + L^H \right)
= \left( \frac{2}{\omega} - 1 \right) D.
\]
If $0 < \omega < 2$, then $Q_\omega$ is HPD, and Theorem 6.8.5 shows, once again, that the relaxation method converges.

6.9 A Special Matrix

We end this chapter on classical linear iterative methods by studying the properties of a particular matrix that appears very often in applications. Consider the matrix $A \in \mathbb{R}^{(n-1) \times (n-1)}$ defined by
\[
A = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 & -1 \\
0 & \cdots & 0 & -1 & 2
\end{bmatrix}. \tag{6.9.1}
\]
The matrix $A$ is called the stiffness matrix because it appears in applications describing elastic media.

Theorem 6.9.1 (properties of A). The matrix $A$, defined in (6.9.1), is SPD. In addition, if we define $h := 1/n$, then the set
\[
S = \{ w_k \}_{k=1}^{n-1} \subset \mathbb{R}^{n-1},
\]
defined by
\[
[w_k]_i = w_{i,k} = \sin(k \pi i h), \quad i = 1, \ldots, n-1,
\]
is an orthogonal set of eigenvectors of $A$.

Proof. A direct computation reveals that, for $k = 1, \ldots, n-1$,
\[
\begin{aligned}
[A w_k]_i &= -\sin(k \pi (i-1) h) + 2 \sin(k \pi i h) - \sin(k \pi (i+1) h) \\
&= 2 \sin(k \pi i h) - 2 \cos(k \pi h) \sin(k \pi i h) \\
&= \left( 2 - 2 \cos(k \pi h) \right) \sin(k \pi i h) \\
&= 2 \left( 1 - \cos(k \pi h) \right) w_{i,k}.
\end{aligned}
\]
Hence the distinct eigenvalues are $\lambda_k = 2 - 2 \cos(k \pi h)$. To see that these are strictly positive for $k = 1, \ldots, n-1$, observe that $|\cos(k \pi h)| < 1$. This is equivalent to
\[
-2 < 2 \cos(k \pi h) < 2,
\]
which is equivalent to
\[
0 < 2 - 2 \cos(k \pi h) < 4.
\]
As a consequence, $A$ is positive definite. Finally, since $A$ is symmetric, the eigenvectors associated to distinct eigenvalues are orthogonal. $\square$
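The eigenpairs claimed in Theorem 6.9.1 are easy to verify numerically. The sketch below (an illustration, not part of the text) checks $A w_k = \lambda_k w_k$ and the mutual orthogonality of the $w_k$ for a modest $n$:

```python
import numpy as np

n = 16
h = 1.0 / n
m = n - 1
# Stiffness matrix (6.9.1)
A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

i = np.arange(1, n)   # grid indices 1, ..., n-1
W = np.array([np.sin(k * np.pi * i * h) for k in range(1, n)])       # rows are w_k
lams = np.array([2.0 - 2.0 * np.cos(k * np.pi * h) for k in range(1, n)])

# Residuals ||A w_k - lambda_k w_k|| and the Gram matrix of inner products
residuals = [np.linalg.norm(A @ W[k] - lams[k] * W[k]) for k in range(m)]
gram = W @ W.T   # should be (nearly) diagonal by orthogonality
```

The Gram matrix $W W^{\mathsf{T}}$ comes out diagonal; in fact it equals $\frac{n}{2} I$, by the discrete orthogonality of the sine vectors.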

Theorem 6.9.2 (convergence). Let $A$ be the stiffness matrix defined in (6.9.1). Denote by $T_J, T_{GS} \in \mathbb{R}^{(n-1) \times (n-1)}$ the error transfer matrices of the Jacobi and Gauß-Seidel methods, respectively. Then $\rho(T_J) < 1$ and $\rho(T_{GS}) < 1$.

Proof. From the proof of Theorem 6.9.1 we observe that the distinct eigenvalues of $A$ are $\lambda_k = 2 - 2 \cos(k \pi h)$, where $h = 1/n$, for $k = 1, \ldots, n-1$. This implies
\[
0 < \lambda_1 < \cdots < \lambda_{n-1} < 4.
\]
In addition, we recall that the $k$-th eigenvector of $A$, associated with $\lambda_k$, can be defined as $w_{i,k} = \sin(k \pi i h)$. The error transfer matrix of the Jacobi method is
\[
T_J = D^{-1} (D - A) = \frac{1}{2} \left( 2 I_{n-1} - A \right) = \frac{1}{2} \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & 1 & 0 & 1 \\
0 & \cdots & 0 & 1 & 0
\end{bmatrix}.
\]
Then
\[
T_J w_k = \left( 1 - \frac{\lambda_k}{2} \right) w_k.
\]
In other words, the eigenvalues of $T_J$ are $\mu_k = \cos(k \pi h)$, $k = 1, \ldots, n-1$. Hence
\[
\rho(T_J) = \mu_1 = -\mu_{n-1} = \cos(\pi h) < 1.
\]
By Theorem 6.5.5, since $A$ is SPD tridiagonal,
\[
\rho(T_{GS}) = \rho^2(T_J) = \cos^2(\pi h). \qquad \square
\]

Theorem 6.9.3. Suppose that $A \in \mathbb{C}^{n \times n}$ is HPD and tridiagonal. Then
\[
\rho(T_{GS}) = \rho^2(T_J) < 1,
\]
and the optimal choice for $\omega$ in the relaxation method is
\[
\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(T_{GS})}}.
\]
With this choice,
\[
\omega_{\mathrm{opt}} - 1 = \rho(T_{\omega_{\mathrm{opt}}}) = \min_{\omega \in (0,2)} \rho(T_\omega).
\]

Proof. Exercise.

Example 6.9.4. The last theorem shows that, using the optimal choice of parameter in the relaxation scheme, this method can converge much faster than the Gauß-Seidel method. Suppose that $A$ is the stiffness matrix defined in (6.9.1). Then we proved that
\[
\rho(T_{GS}) = \rho^2(T_J) = \cos^2(\pi h).
\]
Suppose that $n = 32$, so that $h = 1/32$. Then
\[
\rho(T_{GS}) = \cos^2(\pi/32) \approx (0.99518472)^2 \approx 0.99039264.
\]
But
\[
\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \cos^2(\pi/32)}} \approx 1.82146519.
\]
Therefore,
\[
\rho(T_{\omega_{\mathrm{opt}}}) = \omega_{\mathrm{opt}} - 1 \approx 0.82146519.
\]
This might not seem like a big deal, but it is. The convergence rate of the relaxation method with the optimal parameter $\omega_{\mathrm{opt}}$ is much better than that for Gauß-Seidel. Incidentally, the relaxation method with the optimal $\omega_{\mathrm{opt}}$ is called the method of successive over-relaxation, or just SOR.
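To see concretely how large the gap is, the sketch below (an illustration, not part of the text) reproduces the numbers of this example and converts the two spectral radii into the approximate number of iterations needed to reduce the error by a factor of $10^6$, namely the smallest $k$ with $\rho^k \le 10^{-6}$:

```python
import math

n = 32
h = 1.0 / n
rho_GS = math.cos(math.pi * h) ** 2              # rho(T_GS) = cos^2(pi*h)
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_GS))
rho_SOR = omega_opt - 1.0                        # rho(T_{omega_opt})

def iters_needed(rho, factor=1e-6):
    """Smallest k with rho**k <= factor."""
    return math.ceil(math.log(factor) / math.log(rho))

k_GS, k_SOR = iters_needed(rho_GS), iters_needed(rho_SOR)
```

With these values, Gauß-Seidel needs on the order of 1400 iterations, while SOR needs about 70: a reduction of the work by a factor of roughly twenty.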

6.10 Non-Stationary Two-Layer Methods

We now consider a class of non-stationary methods that are usually obtained from a minimization condition.

6.10.1 Čebyšev iterations

Let us consider a non-stationary explicit scheme
\[
\frac{x_{k+1} - x_k}{\alpha_{k+1}} + A x_k = b,
\]
where $\alpha_k > 0$. This is a non-stationary method defined by setting $B_k = \frac{1}{\alpha_k} I_n$. It is like Richardson's method, except that we will choose the parameter $\alpha_k$ adaptively, in such a way that the quantity $\|e_k\|_2 = \|x - x_k\|_2$ is minimized at each step.

Theorem 6.10.1 (convergence of Čebyšev iterations). Let $A \in \mathbb{C}^{n \times n}$ be HPD with spectrum $\sigma(A) = \{\lambda_i\}_{i=1}^n$, $0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$. For a given $m \in \mathbb{N}$, the quantity $\|e_m\|_2$ is minimized if
\[
\alpha_k = \frac{\alpha_0}{1 + \rho_0 t_k}, \quad k = 1, \ldots, m,
\]
where
\[
\alpha_0 = \frac{2}{\lambda_1 + \lambda_n}, \quad
\rho_0 = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}, \quad
t_k = \cos \left( \frac{(2k-1)\pi}{2m} \right).
\]
In this case we have
\[
\|e_m\|_2 \leq \frac{2 \rho_1^m}{1 + \rho_1^{2m}} \|e_0\|_2, \quad
\rho_1 = \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}.
\]

Proof. Let us merely sketch the proof. The error is governed by

\[
\frac{e_{k+1} - e_k}{\alpha_{k+1}} + A e_k = 0,
\]
so that
\[
e_{k+1} = (I_n - \alpha_{k+1} A) e_k = \cdots = (I_n - \alpha_{k+1} A) \cdots (I_n - \alpha_1 A) e_0.
\]
This implies that
\[
e_m = T_m e_0, \quad T_m := (I_n - \alpha_m A)(I_n - \alpha_{m-1} A) \cdots (I_n - \alpha_1 A).
\]
Since $A$ is Hermitian, so is $T_m$ and, therefore, $\|T_m\|_2 = \rho(T_m)$. It follows that
\[
\|e_m\|_2 \leq \rho(T_m) \|e_0\|_2,
\]
and it suffices to minimize $\rho(T_m)$.

Notice that $\nu \in \sigma(T_m)$ if and only if $\nu = \prod_{i=1}^m (1 - \alpha_i \lambda)$, for some $\lambda \in \sigma(A)$. Now, since $A$ is HPD,
\[
\rho(T_m) = \max_{1 \leq i \leq n} \left| \prod_{j=1}^m (1 - \alpha_j \lambda_i) \right| \leq \max_{\zeta \in [\lambda_1, \lambda_n]} |P(\zeta)|,
\]
with
\[
P(\zeta) = \prod_{j=1}^m (1 - \alpha_j \zeta).
\]
Solving this problem is beyond the scope of our present discussion. For now, let us just say this is possible to do. The solution is given by shifted and rescaled roots of the Čebyšev polynomials $T_m$, which will imply all the formulas given above. See Section 7.8 for a brief discussion. $\square$
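A small experiment makes the theorem tangible. The sketch below (an illustration, not part of the text; the stiffness matrix of Section 6.9 supplies the HPD test problem, and the exact eigenvalues are computed numerically to play the role of the spectral bounds) forms the parameters $\alpha_k$ of Theorem 6.10.1, performs the $m$ sweeps, and compares the final relative error with the stated bound:

```python
import numpy as np

def chebyshev_alphas(lam_min, lam_max, m):
    """Parameters alpha_k = alpha_0/(1 + rho_0*t_k) from Theorem 6.10.1."""
    kappa = lam_max / lam_min
    alpha0 = 2.0 / (lam_min + lam_max)
    rho0 = (kappa - 1.0) / (kappa + 1.0)
    return [alpha0 / (1.0 + rho0 * np.cos((2 * k - 1) * np.pi / (2 * m)))
            for k in range(1, m + 1)]

m_size = 15                                    # A is 15 x 15 (n = 16)
A = 2.0 * np.eye(m_size) - np.eye(m_size, k=1) - np.eye(m_size, k=-1)
lams = np.linalg.eigvalsh(A)
x_exact = np.linspace(1.0, 2.0, m_size)        # arbitrary exact solution
b = A @ x_exact

m = 20                                         # number of Chebyshev sweeps
x = np.zeros(m_size)                           # so e_0 = x_exact
for a in chebyshev_alphas(lams[0], lams[-1], m):
    x = x + a * (b - A @ x)

kappa = lams[-1] / lams[0]
rho1 = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
bound = 2.0 * rho1**m / (1.0 + rho1**(2 * m))  # factor in the error estimate
err_ratio = np.linalg.norm(x_exact - x) / np.linalg.norm(x_exact)
```

One practical caveat: applying the $\alpha_k$ in this naive order is known to be numerically unstable for large $m$; special orderings of the parameters exist for that case, but $m = 20$ is harmless.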

The Čebyšev iterations show that, if we have bounds for the spectrum of $A$, it is possible to make the error as small as desired after a fixed number of iterations. What can we do if such bounds are not known? This question gives rise to the following methods.

6.10.2 Method of Minimal Residuals

Given an arbitrary guess $x_k$, the residual is defined by $r_k = b - A x_k$. Notice that
\[
A e_k = A (x - x_k) = b - A x_k = r_k.
\]
Let us again consider the non-stationary scheme
\[
x_{k+1} = x_k + \alpha_{k+1} (b - A x_k) = x_k + \alpha_{k+1} r_k,
\]
where we will choose the iteration parameter to minimize $\|r_{k+1}\|_2$. Let us apply $A$ to the scheme to get
\[
A x_{k+1} = A x_k + \alpha_{k+1} A r_k,
\]
which is equivalent to
\[
r_{k+1} = b - A x_{k+1} = b - A x_k - \alpha_{k+1} A r_k.
\]
To sum up,
\[
r_{k+1} = T_{k+1} r_k, \quad T_{k+1} = I_n - \alpha_{k+1} A.
\]

Notice that $T_{k+1}$ has a familiar form. Computing the 2-norm of the residual, we find
\[
\|r_{k+1}\|_2^2 = (r_k - \alpha_{k+1} A r_k, r_k - \alpha_{k+1} A r_k)_2
= \|r_k\|_2^2 + \alpha_{k+1}^2 \|A r_k\|_2^2 - 2 \alpha_{k+1} (A r_k, r_k)_2,
\]
which, being a positive quadratic in $\alpha_{k+1}$, is clearly minimized by setting
\[
\alpha_{k+1} = \frac{(A r_k, r_k)_2}{\|A r_k\|_2^2}.
\]

Thus we arrive at the following algorithm: Let $x_0$ be arbitrary, then

1. For $k \geq 0$:
   - $r_k = b - A x_k$.
   - $\alpha_{k+1} = \dfrac{(A r_k, r_k)_2}{\|A r_k\|_2^2}$.
   - $x_{k+1} = x_k + \alpha_{k+1} (b - A x_k)$.
2. EndFor
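The loop above translates directly into code. The sketch below (an illustration under the assumption that $A$ is HPD, as in Theorem 6.10.2; the matrix and right-hand side are arbitrary test data, not from the text) implements the method and records the residual norms, which can never increase, since $\alpha_{k+1}$ is the minimizer:

```python
import numpy as np

def minimal_residuals(A, b, x0, iters):
    """Method of minimal residuals:
    alpha_{k+1} = (A r_k, r_k) / ||A r_k||^2 minimizes ||r_{k+1}||_2."""
    x = x0.copy()
    res_norms = [np.linalg.norm(b - A @ x)]
    for _ in range(iters):
        r = b - A @ x
        Ar = A @ r
        denom = Ar @ Ar
        if denom == 0.0:              # exact solution reached
            break
        alpha = (Ar @ r) / denom
        x = x + alpha * r
        res_norms.append(np.linalg.norm(b - A @ x))
    return x, res_norms

# HPD test problem (an assumption made for this demo)
rng = np.random.default_rng(1)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10.0 * np.eye(10)
b = rng.standard_normal(10)
x, res_norms = minimal_residuals(A, b, np.zeros(10), 200)
```

Note that the only work per iteration is two matrix-vector products (one of which can be reused), so the cost per step is $\mathcal{O}(n^2)$ for a dense matrix.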

This method converges at least as fast as Richardson's method with the optimal parameter, as we show in the following:

Theorem 6.10.2 (convergence of minimal residuals). Let $A$ be HPD with spectrum $\sigma(A) = \{\lambda_i\}_{i=1}^n$, $0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$. Then
\[
\|A e_k\|_2 \leq \rho_\star^k \|A e_0\|_2, \quad \rho_\star = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.
\]

Proof. Since $r_{k+1} = T_{k+1} r_k$ and the 2-norm is minimized by the given choice of $\alpha_{k+1}$, we must have that
\[
\|r_{k+1}\|_2 = \|T_{k+1} r_k\|_2 \leq \|T r_k\|_2,
\]
where $T = I_n - \alpha A$, for any choice of $\alpha \in \mathbb{C}$. In particular, we can set $\alpha = \alpha_\star = \frac{2}{\lambda_1 + \lambda_n}$ to get the error transfer matrix of Richardson's method with its optimal choice of parameter. In this case, $\|T\|_2 = \rho_\star < 1$. Since $A e_k = r_k$, the result follows. $\square$

6.10.3 Method of Minimal Corrections

Let us now consider a non-stationary, two-layer scheme of the form
\[
\frac{1}{\alpha_{k+1}} S (x_{k+1} - x_k) + A x_k = b,
\]
where $\alpha_{k+1} > 0$ and $S \in \mathbb{C}^{n \times n}$ is invertible. This conforms to our standard non-stationary iteration framework by setting $B_{k+1} = \frac{1}{\alpha_{k+1}} S$. We define the correction to be $w_k = S^{-1} r_k$ and notice that
\[
x_{k+1} = x_k + \alpha_{k+1} S^{-1} (b - A x_k) = x_k + \alpha_{k+1} S^{-1} r_k = x_k + \alpha_{k+1} w_k.
\]

Exercise 6.10.3. Show that the correction satisfies
\[
\frac{1}{\alpha_{k+1}} S (w_{k+1} - w_k) + A w_k = 0.
\]

Let us now assume that $S$ is HPD so that it has an associated energy norm. The method of minimal corrections then chooses $\alpha_{k+1}$ so as to minimize $\|w_{k+1}\|_S^2$.

Theorem 6.10.4 (convergence of minimal corrections). Let $A, S \in \mathbb{C}^{n \times n}$ be HPD. The eigenvalues $\mu$ satisfying

\[
S^{-1} A \xi = \mu \xi, \quad \text{for some } \xi \in \mathbb{C}^n, \ \xi \neq 0,
\]
are all positive real numbers. Let $\mu_1$ and $\mu_n$ be the minimal and maximal generalized eigenvalues. Then
\[
\|A e_k\|_{S^{-1}} \leq \rho^k \|A e_0\|_{S^{-1}}, \quad
\rho = \frac{\kappa(S^{-1} A) - 1}{\kappa(S^{-1} A) + 1}, \quad
\kappa(S^{-1} A) := \frac{\mu_n}{\mu_1}.
\]

Proof. Since $S$ is HPD, the object $(x, y)_S = (x, S y)_2$ defines an inner product, and $\|x\|_S = \sqrt{(x, x)_S}$ defines a norm. Since the matrix $S^{-1} A$ is self-adjoint and positive definite with respect to the inner product $(\cdot, \cdot)_S$, we leave it as an exercise to the reader to show that the eigenvalues of $S^{-1} A$ are all real and positive.

Furthermore, we can define the square root $S^{1/2}$. In particular, since $S$ is HPD, there exist a unitary matrix $U \in \mathbb{C}^{n \times n}$ and a diagonal matrix $D = \operatorname{diag}(\nu_1, \ldots, \nu_n)$, with positive real diagonal entries, such that
\[
S = U D U^H.
\]
Consequently, we define $D^{1/2} = \operatorname{diag}(\sqrt{\nu_1}, \ldots, \sqrt{\nu_n})$, and set
\[
S^{1/2} = U D^{1/2} U^H.
\]
Clearly $S^{1/2}$ is HPD and $S^{1/2} S^{1/2} = S$. Let us then introduce the change of variables $v_k = S^{1/2} w_k$ and notice that, from Exercise 6.10.3, it follows that
\[
\frac{1}{\alpha_{k+1}} (v_{k+1} - v_k) + C v_k = 0, \tag{6.10.1}
\]
with $C = S^{-1/2} A S^{-1/2}$. Finally, note that we must choose $\alpha_{k+1}$ in order to minimize
\[
\|w_{k+1}\|_S^2 = (S w_{k+1}, w_{k+1})_2 = (S^{1/2} w_{k+1}, S^{1/2} w_{k+1})_2 = \|v_{k+1}\|_2^2.
\]
We can repeat some steps from the proof of Theorem 6.10.2 to obtain
\[
\alpha_{k+1} = \frac{(C v_k, v_k)_2}{\|C v_k\|_2^2}.
\]
But observe the following identities:
\[
(C v_k, v_k)_2 = (S^{-1/2} A w_k, S^{1/2} w_k)_2 = (A w_k, w_k)_2,
\]
\[
\|C v_k\|_2^2 = (S^{-1/2} A w_k, S^{-1/2} A w_k)_2 = \|A w_k\|_{S^{-1}}^2,
\]
\[
\|v_{k+1}\|_2^2 = (S w_{k+1}, w_{k+1})_2 = \|A e_{k+1}\|_{S^{-1}}^2,
\]
from which we obtain the desired error estimate. $\square$
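In computational terms, the method is a preconditioned variant of minimal residuals. The sketch below (an illustration, not part of the text; it uses the stiffness matrix together with the Jacobi-type choice $S = D$, which is HPD, purely as test data) implements the iteration and tracks the quantity $\|A e_k\|_{S^{-1}} = \sqrt{(S^{-1} r_k, r_k)_2}$ controlled by Theorem 6.10.4:

```python
import numpy as np

def minimal_corrections(A, S, b, x0, iters):
    """Method of minimal corrections: w_k = S^{-1} r_k and
    alpha_{k+1} = (A w_k, w_k) / ||A w_k||_{S^{-1}}^2."""
    x = x0.copy()
    errs = []                                  # values of ||A e_k||_{S^{-1}}
    for _ in range(iters):
        r = b - A @ x
        w = np.linalg.solve(S, r)              # correction w_k = S^{-1} r_k
        errs.append(np.sqrt(w @ r))            # (S^{-1} r_k, r_k)^{1/2}
        Aw = A @ w
        denom = np.linalg.solve(S, Aw) @ Aw    # ||A w_k||_{S^{-1}}^2
        if denom == 0.0:                       # exact solution reached
            break
        x = x + ((Aw @ w) / denom) * w
    return x, errs

m = 7                                          # stiffness matrix with n = 8
A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
S = np.diag(np.diag(A))                        # S = D, an HPD choice
b = np.ones(m)
x, errs = minimal_corrections(A, S, b, np.zeros(m), 200)
```

For a practical method one would, of course, choose $S$ so that systems $S w = r$ are cheap to solve and $\kappa(S^{-1} A) \ll \kappa_2(A)$; taking $S = I_n$ recovers the method of minimal residuals.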

6.11 Problems

1. Let $A \in \mathbb{C}^{n \times n}$. Define
\[
S_k := I_n + A + \cdots + A^k.
\]

   (a) Prove that the sequence $\{S_k\}_{k=0}^\infty$ converges if and only if $A$ is convergent to zero.

   (b) Prove that if $A$ is convergent to zero, then $I - A$ is non-singular and
\[
\lim_{k \to \infty} S_k = (I - A)^{-1}.
\]

2. Show that if $\|A\| < 1$, for some induced matrix norm, then $I - A$ is non-singular and
\[
\frac{1}{1 + \|A\|} \leq \left\| (I - A)^{-1} \right\| \leq \frac{1}{1 - \|A\|}.
\]

3. Let $A = [a_{i,j}] \in \mathbb{C}^{n \times n}$ and define
\[
\|A\|_{\max} := \max_{\substack{1 \leq i \leq n \\ 1 \leq j \leq n}} |a_{i,j}|.
\]
Prove

   (a) $\|A\|_{\max} > 0$ for all $A \neq O$, and $\|A\|_{\max} = 0$ if and only if $A = O$;

   (b) $\|\alpha A\|_{\max} = |\alpha| \|A\|_{\max}$ for all $\alpha \in \mathbb{C}$ and all $A \in \mathbb{C}^{n \times n}$;

   (c) $\|A + B\|_{\max} \leq \|A\|_{\max} + \|B\|_{\max}$ for all $A, B \in \mathbb{C}^{n \times n}$;

   (d) and, finally, $\|A\|_{\max} \leq \|A\|_\infty \leq n \|A\|_{\max}$ for all $A \in \mathbb{C}^{n \times n}$.

4. Suppose $A \in \mathbb{C}^{n \times n}$ is diagonalizable. Prove that there exists an induced matrix norm $\|\cdot\|_\star$, the choice of which depends upon $A$, such that $\|A\|_\star = \rho(A)$.

5. Let $A \in \mathbb{C}^{n \times n}$. Prove that $A$ is convergent to zero if and only if $\lim_{k \to \infty} A^k x = 0$ for any vector $x \in \mathbb{C}^n$.

6. Let $A, B \in \mathbb{C}^{m \times m}$, with $A$ and $A - B A^{-1} B$ being invertible. Consider solving the linear system
\[
A x + B y = b_1, \quad B x + A y = b_2,
\]

for the unknowns x and y.

(a) Use the Schur complement to show that this system has a unique solution.

   (b) Show that $\rho(A^{-1} B) < 1$ is a necessary and sufficient condition for convergence of the iteration scheme:
\[
A x_{k+1} = b_1 - B y_k, \quad A y_{k+1} = b_2 - B x_k,
\]
with an arbitrary initial guess.

   (c) Consider the following slightly modified iteration scheme:
\[
A x_{k+1} = b_1 - B y_k, \quad A y_{k+1} = b_2 - B x_{k+1}.
\]
Does the conclusion of part (b) still hold? Why or why not?

7. Define $A \in \mathbb{R}^{n \times n}$ via
\[
A = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 & -1 \\
0 & \cdots & 0 & -1 & 2
\end{bmatrix}.
\]

   (a) Find the eigenvalues and eigenvectors of $A$.

   (b) Suppose $A = D + U + U^{\mathsf{T}}$, where $D$ is the diagonal part of $A$ and $U$ is the strict upper triangular part. Find the eigenvalues and eigenvectors of $T_J = I_n - D^{-1} A$, the error transfer matrix for the Jacobi method.

   (c) The damped Jacobi method has the error transfer matrix
\[
T(\omega) = -\omega D^{-1} (U + U^{\mathsf{T}}) + (1 - \omega) I_n, \quad 0 < \omega \leq 1,
\]
so that $T(1) = T_J$. Write out the corresponding linear, stationary, two-layer scheme.

   (d) Find the eigenvalues and eigenvectors of $T(\omega) \in \mathbb{C}^{n \times n}$.

   (e) Let $\mu_k(\omega)$ be the $k$-th eigenvalue of $T(\omega)$, with the ordering
\[
\mu_1 \geq \mu_2 \geq \cdots \geq \mu_n.
\]
(You will observe in this problem that we may extend the definition of the eigenvalues in a natural way, so that $\mu_0 = 1$ and $\mu_{n+1} = 1 - 2\omega$.) Assume that $n + 1$ is even. Prove that the quantity
\[
S(\omega) := \max_{\frac{n+1}{2} \leq k \leq n+1} |\mu_k(\omega)|
\]
is minimized by $\omega = \omega_0 = \frac{2}{3}$, and
\[
|\mu_k(\omega_0)| \leq \frac{1}{3},
\]
for all $\frac{n+1}{2} \leq k \leq n+1$.

8. Let $A \in \mathbb{C}^{n \times n}$ be HPD and $b \in \mathbb{C}^n$. Consider solving $A x = b$ using the linear iterative method given by
\[
x_{k+1} = x_k + B^{-1} (b - A x_k),
\]
where $B \in \mathbb{C}^{n \times n}$ is invertible. Suppose that $Q := B + B^H - A$ is positive definite. Let $e_k := x - x_k$ be the error of the $k$-th iteration. Show that each step of this method reduces the $A$-norm of $e_k$, whenever $e_k \neq 0$. Recall, the $A$-norm of any $y \in \mathbb{R}^n$ is defined via
\[
\|y\|_A := \sqrt{y^{\mathsf{T}} A y}.
\]
Prove that this linear iteration method converges.

9. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian and $b \in \mathbb{C}^n$. Consider solving $A x = b$ using the linear iterative method
\[
x_{k+1} = x_k + B^{-1} (b - A x_k),
\]
where $B$ is invertible, and $x_0 \in \mathbb{C}^n$ is arbitrary.

   (a) If $A$ and $B + B^H - A$ are HPD, prove that the method is convergent.

   (b) Conversely, suppose that $B + B^H - A$ is HPD and the linear iterative method above is convergent. Prove that $A$ is HPD.

   (c) Prove that the Gauss-Seidel method converges when $A$ is HPD.

   (d) Prove that the Jacobi method converges when $A$ and $2D - A$ are HPD, where $D$ is the diagonal part of $A$.

10. Suppose that $A \in \mathbb{C}^{n \times n}$ is an upper triangular, nonsingular matrix and $b \in \mathbb{C}^n$. Show that both the Jacobi and Gauss-Seidel iteration methods always converge when used to solve $A x = b$ and, moreover, they will converge in finitely many steps.

11. Let $X$ be a finite dimensional vector space. Assume that the mapping $f \colon X \to X$ is such that there is an $n \in \mathbb{N}$ for which $f^n$ is a contraction. Show that $f$ has a unique fixed point. Notice that $f^n(x) = f(f(\cdots f(x)))$, with $n$ compositions of $f$.

12. Let $X$ be a finite dimensional vector space and $f_1, f_2 \colon X \to X$ two mappings that commute, that is, $f_1(f_2(x)) = f_2(f_1(x))$ for all $x \in X$. Show that if $f_2$ has a unique fixed point $x_0$, then $x_0$ is a fixed point of $f_1$.

13. Let $A \in \mathbb{C}^{n \times n}$ have the property that there is $q \in (0, 1)$ for which
\[
q |a_{i,i}| \geq \sum_{\substack{j=1 \\ j \neq i}}^n |a_{i,j}|, \quad 1 \leq i \leq n.
\]
Show that the Gauß–Seidel method converges and that we have the error estimate
\[
\|e_k\|_\infty \leq q^k \|e_0\|_\infty.
\]

14. Let $A \in \mathbb{C}^{n \times n}$. Prove that
\[
\rho(A) = \inf \{ \|A\| \},
\]
where the infimum is taken over all induced matrix norms.

15. Which norm axioms are satisfied by the spectral radius? Which ones are not? Provide examples.

16. Given the matrix $A \in \mathbb{R}^{100 \times 100}$, assume that its eigenvalues are known via the formula
\[
\lambda_k = k^2, \quad k = 1, \ldots, 100.
\]
The system $A x = b$ is to be solved by the following non-stationary Richardson method
\[
x_{k+1} = (I - \tau_k A) x_k + \tau_k b,
\]
for some positive parameters $\tau_k$. Find a particular set of iteration parameters $\{\tau_0, \ldots, \tau_{99}\}$ such that $x_{100} = x$, the exact solution of $A x = b$.

Hint: Verify that $e_{k+1} = (I_n - \tau_k A) e_k$, then expand the initial error
\[
e_0 = \alpha_1 u_1 + \cdots + \alpha_{100} u_{100},
\]
where the $u_i$ are the eigenvectors of $A$. Choose the iteration parameters to eliminate exactly one term of the expansion of $e_0$.

17. Show that if $A \in \mathbb{C}^{n \times n}$ is HPD, then the relaxation method converges for $0 < \omega < 2$.

18. Let $X$ be a finite dimensional vector space, and consider an iterative scheme for solving $A x = b$ that generates its iterates via a mapping $\mathcal{J}$, so that $x_{k+1} = \mathcal{J}(x_k, b)$.

   - The map $\mathcal{J}$ is said to be consistent (with $A x = b$) if and only if the solution $x$ is a fixed point, i.e., $x = \mathcal{J}(x, b)$.
   - The map $\mathcal{J}$ is linear if and only if $\mathcal{J}(\alpha x + y, \alpha f + g) = \alpha \mathcal{J}(x, f) + \mathcal{J}(y, g)$.

Show that any linear and consistent iterative scheme can be written in the form
\[
x_{k+1} = x_k + B (b - A x_k),
\]
where $B \colon X \to X$ is a linear operator. Hint: Define $B f = \mathcal{J}(0, f)$ and $(I - B A) x = \mathcal{J}(x, 0)$.

19. Suppose that $A \in \mathbb{C}^{n \times n}$ is HPD and $b \in \mathbb{C}^n$. Consider Richardson's method with an optimal choice of iterative parameter. How many iterations are needed to reduce the error by a predetermined factor? In other words, given $\delta > 0$, how many iterations are needed to guarantee that
\[
\|e_k\|_2 \leq \delta \|e_0\|_2?
\]

20. Suppose
\[
A_1 = \begin{bmatrix} 1 & \frac{1}{2} \\[2pt] \frac{1}{2} & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & \frac{3}{4} \\[2pt] \frac{1}{12} & 1 \end{bmatrix},
\]
and let $T_1$ and $T_2$ be the associated error transfer matrices for Jacobi's method for the respective coefficient matrices. Show that $\rho(T_1) > \rho(T_2)$, thereby refuting the claim that greater diagonal dominance implies faster convergence.

21. This problem deals with an acceleration procedure called extrapolation. Consider a linear stationary scheme
\[
x_{k+1} = T x_k + c, \tag{6.11.1}
\]
which can be embedded in a one-parameter family of methods
\[
x_{k+1} = \gamma (T x_k + c) + (1 - \gamma) x_k = T_\gamma x_k + \gamma c,
\]
with $T_\gamma = \gamma T + (1 - \gamma) I$ and $\gamma \neq 0$.

   (a) Show that any fixed point of $T$ is a fixed point of $T_\gamma$.

   (b) Assume that we know that $\sigma(T) \subset [a, b]$. Show that $\sigma(T_\gamma) \subset [\gamma a + 1 - \gamma, \gamma b + 1 - \gamma]$.

   (c) Show that, from the previous item, it follows that
\[
\rho(T_\gamma) \leq \max_{a \leq \lambda \leq b} |\gamma \lambda + 1 - \gamma|.
\]

   (d) Show that if $1 \notin [a, b]$, then the choice $\gamma_\star = 2/(2 - a - b)$ minimizes $\rho(T_\gamma)$ and that, in this case,
\[
\rho(T_{\gamma_\star}) = 1 - |\gamma_\star| d,
\]
where $d$ is the distance between $1$ and $[a, b]$. This shows that, using extrapolation, we can turn a non-convergent scheme into a convergent one.

   (e) Suppose $A$ is HPD. Consider Richardson's method with $\alpha = 1$. If $\sigma(A) \subset [m, M]$, show that choosing $\gamma = 2/(m + M)$ we obtain
\[
\rho(T_\gamma) = \frac{M - m}{M + m};
\]
thus Richardson's method with extrapolation converges.

22. This problem deals with Čebyšev acceleration. Consider, again, the linear stationary scheme (6.11.1) and assume that we have computed $x_0, \ldots, x_k$. We will compute a linear combination of them,
\[
u_k = \sum_{i=0}^k a_{k,i} x_i,
\]
in the hopes that $u_k$ is a better approximation to $x$.

   (a) Show that if $x_0 = x$ then $x_k = x$. This tells us that we want $\sum_{i=0}^k a_{k,i} = 1$. Why?

   (b) Show that
\[
u_k - x = p(T) e_0,
\]
where $p(z) = \sum_{i=0}^k a_{k,i} z^i$. Consequently, $\|u_k - x\| \leq \|p(T)\| \|e_0\|$.

   (c) Show that, if $\sigma(T) \subset [a, b]$,
\[
\rho(p(T)) \leq \max_{z \in [a,b]} |p(z)|.
\]

(d) Thus, we need to find a polynomial of degree $k$ such that $p(1) = 1$ (why?) and that minimizes
\[
\max_{z \in [a,b]} |p(z)|.
\]
It can be shown that shifted Čebyšev polynomials solve this problem and that, moreover,
\[
u_1 = \gamma [T u_0 + c] + (1 - \gamma) u_0, \quad \gamma = \frac{2}{2 - b - a},
\]
and
\[
u_k = \rho_k \left[ \gamma (T u_{k-1} + c) + (1 - \gamma) u_{k-1} \right] + (1 - \rho_k) u_{k-2},
\]
with $\rho_1 = 2$, $\rho_k = (1 - \alpha \rho_{k-1})^{-1}$, and $\alpha = \left( \dfrac{2 (2 - b - a)}{b - a} \right)^{-2}$.

23. Show that, for the method of minimal corrections,
\[
\alpha_{k+1} = \frac{(A w_k, w_k)_2}{(S^{-1} A w_k, A w_k)_2}.
\]