Linear Systems - Iterative Methods

Numerical Analysis, Profs. Gianluigi Rozza and Luca Heltai

2019, SISSA mathLab, Trieste

(Sec. 5.9 of the book) Solving the linear system Ax = b with an iterative method consists in building a sequence of vectors x^(k), k ≥ 0, in R^n that converges to the exact solution x, i.e.

lim_{k→∞} x^(k) = x

for any initial vector x^(0) ∈ R^n. We can consider the following recurrence relation:

x^(k+1) = B x^(k) + g,   k ≥ 0,     (1)

where B is a suitably chosen matrix (depending on A) and g is a vector (depending on A and b) satisfying the consistency relation

x = Bx + g. (2)

Since x = A^{-1}b, we get g = (I - B)A^{-1}b; the iterative method is therefore completely defined by the matrix B, known as the iteration matrix. Defining the error at step k as

e^(k) = x - x^(k),

we obtain the following recurrence relation:

e^(k+1) = B e^(k)   and thus   e^(k+1) = B^{k+1} e^(0),   k = 0, 1, ....

We can show that lim_{k→∞} e^(k) = 0 for every e^(0) (and thus for every x^(0)) if and only if ρ(B) < 1, where ρ(B) is the spectral radius of the matrix B, defined by

ρ(B) = max_i |λ_i(B)|,

where λ_i(B) are the eigenvalues of the matrix B. The smaller the value of ρ(B), the fewer iterations are needed to reduce the initial error by a given factor.

Construction of an iterative method

A general way of setting up an iterative method is based on a splitting of the matrix A:

A = P - (P - A),

where P is a suitable non-singular matrix called a preconditioner of A. Hence,

Ax = b  ⇔  Px = (P - A)x + b,

which is of the form (2) with

B = P^{-1}(P - A) = I - P^{-1}A   and   g = P^{-1}b.

We can define the corresponding iterative method

P (x^(k+1) - x^(k)) = r^(k),   k ≥ 0,

where r^(k) denotes the residual at iteration k: r^(k) = b - Ax^(k). We can generalise this method as follows:

P (x^(k+1) - x^(k)) = α_k r^(k),   k ≥ 0,     (3)

where α_k ≠ 0 is a parameter that improves the convergence of the sequence x^(k). The method (3) is called Richardson's method. The matrix P has to be chosen so that the cost of solving (3) is small enough; for example, a diagonal or triangular P complies with this criterion.
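As an illustration, a minimal Matlab/Octave sketch of iteration (3) with a fixed parameter α could look as follows (the function name richardson_it, the argument list and the stopping test on the relative residual are our own choices, not part of the slides; the function would live in its own file richardson_it.m):

function [x, k] = richardson_it(A, b, P, alpha, x, tol, maxit)
% Sketch of the preconditioned Richardson iteration (3):
% P*(x^(k+1) - x^(k)) = alpha * r^(k), with a fixed parameter alpha.
r = b - A*x;                      % initial residual
for k = 1:maxit
  z = P \ r;                      % solve the preconditioning system P*z = r
  x = x + alpha*z;                % update the iterate
  r = b - A*x;                    % new residual
  if norm(r) / norm(b) < tol      % stop on the relative residual
    return
  end
end
end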

Jacobi method

If the elements of the diagonal of A are non-zero, we can write

P = D = diag(a_11, a_22, ..., a_nn),

i.e. D is the diagonal part of A:

D_ij = a_ij if i = j,   D_ij = 0 if i ≠ j.

The Jacobi method corresponds to this choice with α_k = 1 for all k. We deduce:

D x^(k+1) = b - (A - D) x^(k),   k ≥ 0.

By components:

x_i^(k+1) = (1 / a_ii) ( b_i - Σ_{j=1, j≠i}^{n} a_ij x_j^(k) ),   i = 1, ..., n.     (4)

The Jacobi method can be written in the general form

x^(k+1) = B x^(k) + g,   with   B = B_J = D^{-1}(D - A) = I - D^{-1}A,   g = g_J = D^{-1}b.
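A minimal Matlab/Octave sketch of the Jacobi iteration in matrix form is given below (the function name jacobi_it and the stopping test are assumptions of ours; the update x + D\r is algebraically equivalent to D x^(k+1) = b - (A - D) x^(k)):

function [x, k] = jacobi_it(A, b, x, tol, maxit)
% Sketch of the Jacobi method; requires nonzero diagonal entries of A.
D = diag(diag(A));                % P = D, the diagonal part of A
r = b - A*x;
for k = 1:maxit
  x = x + D \ r;                  % x^(k+1) = x^(k) + D^(-1) r^(k)
  r = b - A*x;
  if norm(r) / norm(b) < tol      % stop on the relative residual
    return
  end
end
end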

Gauss-Seidel method

This method is defined as follows:

x_i^(k+1) = (1 / a_ii) ( b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k) ),   i = 1, ..., n.

This method corresponds to (3) with P = D - E and α_k = 1 (∀ k ≥ 0), where E is the lower triangular matrix with entries

E_ij = -a_ij if i > j,   E_ij = 0 if i ≤ j

(lower triangular part of A without the diagonal and with its elements’ sign inverted).

We can write this method in the general form x^(k+1) = B_GS x^(k) + g_GS, with the iteration matrix B_GS given by

B_GS = (D - E)^{-1}(D - E - A)

and

g_GS = (D - E)^{-1} b.
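Similarly, a hedged Matlab/Octave sketch of the Gauss-Seidel iteration uses a forward solve with the lower triangular matrix D - E = tril(A) (the function name gauss_seidel_it and the stopping test are our own):

function [x, k] = gauss_seidel_it(A, b, x, tol, maxit)
% Sketch of the Gauss-Seidel method: (D - E) x^(k+1) = b - (A - (D - E)) x^(k).
DE = tril(A);                     % D - E: lower triangular part of A, diagonal included
for k = 1:maxit
  x = DE \ (b - (A - DE)*x);      % one Gauss-Seidel sweep (forward substitution)
  if norm(b - A*x) / norm(b) < tol
    return
  end
end
end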

Example 1. Given the matrix

A = [  1   2   3   4
       5   6   7   8
       9  10  11  12
      13  14  15  16 ].

We then have

D = [ 1  0   0   0
      0  6   0   0
      0  0  11   0
      0  0   0  16 ].

Thus, the iteration matrix for the Jacobi method is

B_J = D^{-1}(D - A) = I - D^{-1}A = [     0       -2       -3       -4
                                       -5/6        0     -7/6     -4/3
                                      -9/11   -10/11        0   -12/11
                                     -13/16   -14/16   -15/16        0 ].

To define the matrix A and extract its diagonal D and its lower triangular part E (without the diagonal and with the sign inverted) in Matlab/Octave, we use the commands

>> A = [1,2,3,4;5,6,7,8;9,10,11,12;13,14,15,16];
>> D = diag(diag(A));
>> E = -tril(A,-1);

These allow us, for example, to compute the iteration matrix B_GS for the Gauss-Seidel method in the following way:

>> B_GS = (D-E)\(D-E-A);

We find

B_GS = [ 0.0000   -2.0000   -3.0000   -4.0000
         0.0000    1.6667    1.3333    2.0000
         0.0000    0.1212    1.2424    0.3636
         0.0000    0.0530    0.1061    1.1591 ].

Convergence

We have the following convergence results:
• (Prop 5.3) If the matrix A is strictly diagonally dominant by row, i.e.,

|a_ii| > Σ_{j=1, j≠i}^{n} |a_ij|,   i = 1, ..., n,

then the Jacobi and the Gauss-Seidel methods converge.
• If A is symmetric positive definite, then the Gauss-Seidel method converges (Jacobi may not).
• (Prop 5.4) Let A be a tridiagonal non-singular matrix whose diagonal elements are all non-zero. Then the Jacobi and the Gauss-Seidel methods are either both divergent or both convergent. In the latter case, ρ(B_GS) = ρ(B_J)^2.
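The first criterion is easy to check numerically; a small helper of our own (not part of the slides) could be:

function flag = is_diag_dominant(A)
% True if A is strictly diagonally dominant by row:
% |a_ii| > sum over j ~= i of |a_ij|, for every i.
d = abs(diag(A));                 % |a_ii|
s = sum(abs(A), 2) - d;           % row sums of |a_ij| without the diagonal
flag = all(d > s);
end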

Richardson method

(Sec. 5.10) Let us consider the following iterative method:

P (x^(k+1) - x^(k)) = α_k r^(k),   k ≥ 0.     (5)

If α_k = α (a constant), this method is called the stationary preconditioned Richardson method; if α_k varies during the iterations, it is called the dynamic preconditioned Richardson method. The matrix P is called a preconditioner of A.

If A and P are symmetric positive definite, then there are two optimal criteria to choose α_k:

1. Stationary case:

α_k = α_opt = 2 / (λ_min + λ_max),   k ≥ 0,

where λ_min and λ_max are the smallest and the largest eigenvalues of the matrix P^{-1}A.

2. Dynamic case:

α_k = ((z^(k))^T r^(k)) / ((z^(k))^T A z^(k)),   k ≥ 0,

where z^(k) = P^{-1} r^(k) is the preconditioned residual. This method is also called the preconditioned gradient method.

If P = I and A is symmetric positive definite, we get the following methods:
• the stationary Richardson method if we choose:

α_k = α_opt = 2 / (λ_min(A) + λ_max(A)).     (6)

• the gradient method if:

α_k = ((r^(k))^T r^(k)) / ((r^(k))^T A r^(k)),   k ≥ 0.     (7)

The (preconditioned) gradient method can be written as follows: let x^(0) be given, set r^(0) = b - Ax^(0); then, for k ≥ 0,

P z^(k) = r^(k)
α_k = ((z^(k))^T r^(k)) / ((z^(k))^T A z^(k))
x^(k+1) = x^(k) + α_k z^(k)
r^(k+1) = r^(k) - α_k A z^(k).

At each iteration we have to apply A once and invert P once. P should therefore be chosen so that solving the associated system is easy (i.e. it requires a reasonable computational cost). For example, we can choose a diagonal P (as in the gradient or the stationary Richardson case) or a triangular one.
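A minimal Matlab/Octave sketch of the preconditioned gradient loop above (the function name precond_gradient and the stopping test on the relative residual are our own assumptions):

function [x, k] = precond_gradient(A, b, P, x, tol, maxit)
% Sketch of the preconditioned gradient (dynamic Richardson) method.
r = b - A*x;
for k = 1:maxit
  z  = P \ r;                     % P z^(k) = r^(k)
  Az = A*z;
  alpha = (z'*r) / (z'*Az);       % optimal dynamic parameter alpha_k
  x = x + alpha*z;                % x^(k+1) = x^(k) + alpha_k z^(k)
  r = r - alpha*Az;               % r^(k+1) = r^(k) - alpha_k A z^(k)
  if norm(r) / norm(b) < tol
    return
  end
end
end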

Convergence of the Richardson method

When A and P are s.p.d. and with either of the two optimal choices for α, we can show that the preconditioned Richardson method converges to x as k → ∞, and that

||x^(k) - x||_A ≤ ( (K(P^{-1}A) - 1) / (K(P^{-1}A) + 1) )^k ||x^(0) - x||_A,   k ≥ 0,     (8)

where ||v||_A = √(v^T A v) and K(P^{-1}A) is the condition number of P^{-1}A.

Remark. If A and P are s.p.d., we have that

K(P^{-1}A) = λ_max / λ_min.

Proof. The iteration matrix of the method is R_α = I - α P^{-1}A, whose eigenvalues are of the form 1 - αλ_i. The method is convergent if and only if |1 - αλ_i| < 1 for i = 1, ..., n, i.e. -1 < 1 - αλ_i < 1 for i = 1, ..., n. Since α > 0, this is equivalent to -1 < 1 - αλ_max, whence the necessary and sufficient condition for convergence is α < 2/λ_max. Consequently, ρ(R_α) is minimal when 1 - αλ_min = αλ_max - 1, i.e. for α_opt = 2/(λ_min + λ_max). By substitution, we obtain

ρ_opt = ρ(R_{α_opt}) = 1 - α_opt λ_min = 1 - 2λ_min/(λ_min + λ_max) = (λ_max - λ_min)/(λ_min + λ_max),

which completes the proof. □

In the dynamic case, we get a result that allows us to optimally choose the iteration parameter at each step, provided the matrix A is symmetric positive definite:

Theorem 1 (Dynamic case). If A is symmetric positive definite, the optimal choice for α_k is given by

α_k = (r^(k), z^(k)) / (A z^(k), z^(k)),   k ≥ 0,     (9)

where

z^(k) = P^{-1} r^(k).     (10)

Proof. On the one hand we have

r^(k) = b - Ax^(k) = -A(x^(k) - x) = -A e^(k),     (11)

and thus, using (10),

-P^{-1}A e^(k) = z^(k),     (12)

where e^(k) = x^(k) - x denotes the error at step k. On the other hand

e^(k+1) = e^(k+1)(α) = (I - αP^{-1}A) e^(k) = R_α e^(k).

We notice that, in order to update the residual, we have the relation

r^(k+1) = r^(k) - αA z^(k) = r^(k) - αAP^{-1} r^(k).

Thus, denoting by ||·||_A the vector norm associated with the scalar product (x, y)_A = (Ax, y), that is ||x||_A = (Ax, x)^{1/2}, we can write

||e^(k+1)||_A^2 = (A e^(k+1), e^(k+1)) = -(r^(k+1), e^(k+1))
               = -(r^(k) - αAP^{-1}r^(k), e^(k) - αP^{-1}A e^(k))
               = -(r^(k), e^(k)) + α[(r^(k), P^{-1}A e^(k)) + (A z^(k), e^(k))] - α^2 (A z^(k), P^{-1}A e^(k)).

Now we choose α as the value α_k that minimises ||e^(k+1)(α)||_A:

(d/dα) ||e^(k+1)(α)||_A  evaluated at α = α_k is zero.

We then obtain

α_k = (1/2) [ (r^(k), P^{-1}A e^(k)) + (A z^(k), e^(k)) ] / (A z^(k), P^{-1}A e^(k)) = -(1/2) [ -(r^(k), z^(k)) + (A z^(k), e^(k)) ] / (A z^(k), z^(k)),

and using the equality (A z^(k), e^(k)) = (z^(k), A e^(k)), which holds since A is symmetric positive definite, and noting that A e^(k) = -r^(k), we find

α_k = (r^(k), z^(k)) / (A z^(k), z^(k)). □



For both the stationary and the dynamic case we can prove that, if A and P are symmetric positive definite, the sequence {x^(k)} given by the Richardson method (stationary or dynamic) converges to x as k → ∞, and

||x^(k) - x||_A ≤ ( (K(P^{-1}A) - 1) / (K(P^{-1}A) + 1) )^k ||x^(0) - x||_A,   k ≥ 0,     (13)

where ||v||_A = √(v^T A v) and K(P^{-1}A) is the condition number of the matrix P^{-1}A.

Remark. In the case of the gradient method or the stationary Richardson method, the error estimate becomes

||x^(k) - x||_A ≤ ( (K(A) - 1) / (K(A) + 1) )^k ||x^(0) - x||_A,   k ≥ 0.     (14)

Remark. If A and P are symmetric positive definite, we have

K(P^{-1}A) = λ_max(P^{-1}A) / λ_min(P^{-1}A).

The conjugate gradient method

(Sec. 5.11) When A and P are s.p.d., there exists a very efficient and effective method to solve the system iteratively: the conjugate gradient method. Let x^(0) be given; we compute r^(0) = b - Ax^(0), z^(0) = P^{-1}r^(0), p^(0) = z^(0); then, for k ≥ 0,

α_k = ((p^(k))^T r^(k)) / ((p^(k))^T A p^(k))
x^(k+1) = x^(k) + α_k p^(k)
r^(k+1) = r^(k) - α_k A p^(k)
P z^(k+1) = r^(k+1)
β_k = ((A p^(k))^T z^(k+1)) / ((A p^(k))^T p^(k))
p^(k+1) = z^(k+1) - β_k p^(k).
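A minimal Matlab/Octave sketch of this recursion follows (the function name pcg_sketch is ours; in practice one could also call the built-in pcg of MATLAB/Octave):

function [x, k] = pcg_sketch(A, b, P, x, tol, maxit)
% Sketch of the preconditioned conjugate gradient recursion of this slide.
r = b - A*x;  z = P \ r;  p = z;
for k = 1:maxit
  Ap = A*p;
  alpha = (p'*r) / (p'*Ap);       % alpha_k
  x = x + alpha*p;
  r = r - alpha*Ap;
  if norm(r) / norm(b) < tol
    return
  end
  z = P \ r;                      % P z^(k+1) = r^(k+1)
  beta = (Ap'*z) / (Ap'*p);       % beta_k
  p = z - beta*p;
end
end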

Linear Systems – Iterative Methods – p. 26/62 The error estimate is given by

||x^(k) - x||_A ≤ ( 2c^k / (1 + c^{2k}) ) ||x^(0) - x||_A,   k ≥ 0,   where   c = ( √(K_2(P^{-1}A)) - 1 ) / ( √(K_2(P^{-1}A)) + 1 ).     (15)

[Figure: comparison of the gradient (G) and conjugate gradient (GC) methods.]

Example 2. Let us consider the following linear system:

2x_1 + x_2 = 1
x_1 + 3x_2 = 0     (16)

whose matrix A = [2 1; 1 3] is s.p.d. The solution to this system is x_1 = 3/5 = 0.6 and x_2 = -1/5 = -0.2.

Preliminary convergence studies

• A is strictly diagonally dominant by row, hence the Jacobi and Gauss-Seidel methods converge.
• A is non-singular, tridiagonal, with non-zero diagonal elements, hence ρ(B_GS) = ρ(B_J)^2. Therefore we expect a quicker convergence of Gauss-Seidel w.r.t. Jacobi.
• A is s.p.d., hence the gradient and conjugate gradient methods converge. Moreover (see the error estimates), CG converges faster.

We want to approximate the solution with an iterative method starting with

x^(0) = (x_1^(0), x_2^(0))^T = (1, 1/2)^T.

We can see that

r^(0) = b - Ax^(0) = (-3/2, -5/2)^T

and ||r^(0)||_2 = √((r^(0))^T r^(0)) = √34 / 2 ≈ 2.9155.

Jacobi method

x^(k+1) = B_J x^(k) + g_J,   k ≥ 0,   where B_J = I - D^{-1}A and g_J = D^{-1}b. We have

B_J = I - D^{-1}A = [1 0; 0 1] - [1/2 0; 0 1/3][2 1; 1 3] = [0 -1/2; -1/3 0],

g_J = D^{-1}b = [1/2 0; 0 1/3][1; 0] = [1/2; 0],

and ρ(B_J) = max|λ_i(B_J)| = max(abs(eig(BJ))) = 0.4082.

For k = 0 (first iteration) we find:

x^(1) = B_J x^(0) + g_J = [0 -1/2; -1/3 0][1; 1/2] + [1/2; 0] = [1/4; -1/3] ≈ [0.25; -0.3333].

Notice that

r^(1) = b - Ax^(1) = [0.8333; 0.75]   and   ||r^(1)||_2 = 1.1211.

Gauss-Seidel method

x^(k+1) = B_GS x^(k) + g_GS,   k ≥ 0,   where B_GS = (D - E)^{-1}(D - E - A)

and g_GS = (D - E)^{-1} b. We have

B_GS = (D - E)^{-1}(D - E - A) = [2 0; 1 3]^{-1}[0 -1; 0 0] = [1/2 0; -1/6 1/3][0 -1; 0 0] = [0 -1/2; 0 1/6],

g_GS = [1/2 0; -1/6 1/3][1; 0] = [1/2; -1/6].

In this case ρ(B_GS) = max|λ_i(B_GS)| = max(abs(eig(BGS))) = 0.1667. We can verify that ρ(B_GS) = ρ(B_J)^2.
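These values can be checked in Matlab/Octave with a few lines (our own verification script, in the same spirit as the commands of Example 1):

A   = [2 1; 1 3];
D   = diag(diag(A));  E = -tril(A,-1);
BJ  = eye(2) - D\A;               % Jacobi iteration matrix
BGS = (D-E) \ (D-E-A);            % Gauss-Seidel iteration matrix
rhoJ  = max(abs(eig(BJ)))         % 0.4082
rhoGS = max(abs(eig(BGS)))        % 0.1667, equal to rhoJ^2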

For k = 0 (first iteration) we find:

x^(1) = B_GS x^(0) + g_GS = [0 -1/2; 0 1/6][1; 1/2] + [1/2; -1/6] = [1/4; -1/12] ≈ [0.25; -0.0833].

We have

r^(1) = b - Ax^(1) = [0.5833; 0]   and   ||r^(1)||_2 = 0.5833.

Preconditioned gradient method with P = D

We set

r^(0) = b - Ax^(0) = [1; 0] - [2 1; 1 3][1; 1/2] = [-3/2; -5/2].

For k = 0, we have:

P z^(0) = r^(0)  ⇔  z^(0) = P^{-1} r^(0) = [-3/4; -5/6],

α_0 = ((z^(0))^T r^(0)) / ((z^(0))^T A z^(0)) = 77/107,

x^(1) = x^(0) + α_0 z^(0) = [0.4603; -0.0997],

r^(1) = r^(0) - α_0 A z^(0) = [0.1791; -0.1612]   and   ||r^(1)||_2 = 0.2410.

Preconditioned conjugate gradient method with P = D

We set r^(0) = b - Ax^(0), z^(0) = P^{-1}r^(0) and p^(0) = z^(0). For k = 0, we have:

α_0 = ((p^(0))^T r^(0)) / ((p^(0))^T A p^(0)) = ((z^(0))^T r^(0)) / ((z^(0))^T A z^(0))
x^(1) = x^(0) + α_0 p^(0) = x^(0) + α_0 z^(0)
r^(1) = r^(0) - α_0 A p^(0) = r^(0) - α_0 A z^(0).

We see that the first iterate x^(1) coincides with the one obtained by the preconditioned gradient method.

We then complete the first iteration of the preconditioned conjugate gradient method:

P z^(1) = r^(1)  ⇔  z^(1) = P^{-1} r^(1) = [0.0896; -0.0537],

β_0 = ((A p^(0))^T z^(1)) / ((A p^(0))^T p^(0)) = ((A z^(0))^T z^(1)) / ((A z^(0))^T z^(0)) = -0.0077,

p^(1) = z^(1) - β_0 p^(0) = z^(1) - β_0 z^(0) = [0.0838; -0.0602].

Linear Systems – Iterative Methods – p. 37/62 At the second iteration, with the four different methods, we have: (2) (2) (2) Method x r r 2 0.6667 0.2500 Jacobi − 0.4859 0.0833 0.4167 − −

0.5417 0.0972 Gauss-Seidel 0.0972 0.1806 0 −

0.6070 0.0263 PG − 0.0511 0.1877 0.0438 − −

0.60000 0.2220 PCG − 10−15 4.4755 10−16 0.2000 0.3886 − −

Behavior of the relative error applied to the system (16):

[Figure: relative error ||x^(k) - x|| / ||x^(0) - x|| versus iteration k (0 to 20) for the Jacobi, Gauss-Seidel, gradient and conjugate gradient methods.]

Example 3. Let us now consider another example:

2x_1 + x_2 = 1
-x_1 + 3x_2 = 0     (17)

whose solution is x_1 = 3/7, x_2 = 1/7.

Preliminary convergence studies

The associated matrix is A = [2 1; -1 3].
• A is strictly diagonally dominant by row, hence the Jacobi and Gauss-Seidel methods converge.
• A is non-singular, tridiagonal, with non-zero diagonal elements, hence ρ(B_GS) = ρ(B_J)^2. Therefore we expect a quicker convergence of Gauss-Seidel w.r.t. Jacobi.
• A is not s.p.d., therefore we have no guarantee that the gradient or the conjugate gradient methods converge.

We approximate the solution with an iterative method starting from

x^(0) = (x_1^(0), x_2^(0))^T = (1, 1/2)^T.

The following figure shows the value of ||x^(k) - x|| / ||x^(0) - x|| for the Jacobi, Gauss-Seidel, stationary Richardson (preconditioned with α = 0.5 and P = D = [2 0; 0 3]) and the preconditioned (with P = D) conjugate gradient methods. Note that this time the preconditioned conjugate gradient method does not converge.

Behavior of the relative error applied to the system (17):

[Figure: relative error ||x^(k) - x|| / ||x^(0) - x|| versus iteration k (0 to 20) for the Jacobi, Gauss-Seidel, stationary Richardson and conjugate gradient methods.]

Convergence Criteria

(Sec. 5.12) We have the following error bound: if A is s.p.d., then

||x^(k) - x|| / ||x|| ≤ K(A) ||r^(k)|| / ||b||.     (18)

The relative error at iteration k is bounded by the condition number of A times the residual scaled by the right-hand side. We can also use another relation in the case of a preconditioned system:

||x^(k) - x|| / ||x|| ≤ K(P^{-1}A) ||P^{-1}r^(k)|| / ||P^{-1}b||.
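In practice this bound suggests a stopping test on the (preconditioned) relative residual; a possible sketch of such a test, with names of our own choosing, is:

function stop = small_residual(A, b, x, P, tol)
% Stopping test based on the bound above: if ||P^(-1)r|| / ||P^(-1)b|| <= tol,
% the relative error is bounded (roughly) by K(P^(-1)A) * tol.
z    = P \ (b - A*x);             % preconditioned residual P^(-1) r
stop = norm(z) <= tol * norm(P \ b);
end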

We now give some examples of convergence of iterative methods applied to linear systems of the form Ax = b.

Example 4. Let us start with the matrix

A = [5 7; 7 10].     (19)

The condition number of this matrix is K(A) ≈ 223. We consider the Gauss-Seidel method and the stationary preconditioned Richardson method with α = 0.5 and P = diag(A). We have ρ(B_GS) = 0.98 and ρ(B_Rich) = 0.98, where B_Rich = I - αP^{-1}A is the iteration matrix of the stationary Richardson method.

The following figure shows the relative error behavior for the Gauss-Seidel method, the stationary preconditioned Richardson method and the conjugate gradient method preconditioned with the same matrix.

[Figure: relative error ||x^(k) - x|| / ||x|| versus iteration k (0 to 100) for Gauss-Seidel, stationary Richardson (α = 0.5) and the preconditioned conjugate gradient.]

Comparison between error and residual (Gauss-Seidel) for system (19). Recall that K(A) ≈ 223.

[Figure: relative error ||x^(k) - x|| / ||x|| and relative residual ||r^(k)|| / ||b|| versus k (Gauss-Seidel, ill-conditioned matrix, K(A) = 223); the ratio between the two curves is approximately 150.]

We can compare the previous figure with the same curves for a well-conditioned 2 × 2 matrix (K(A) ≈ 2.6):

[Figure: comparison between the relative error ||x^(k) - x|| / ||x|| and the relative residual ||r^(k)|| / ||b|| (Gauss-Seidel) for a well-conditioned 2 × 2 matrix, K(A) = 2.6.]

Example 5. We take the Hilbert matrix and solve a linear system for different sizes n. We set P = diag(A) and use the preconditioned gradient method. The stopping tolerance is set to 10^-6 on the relative residual. We take x^(0) = 0. We evaluate the relative error:

for j=2:7
  n=2*j; i=j-1; nn(i)=n;
  A=hilb(n); x_ex=ones(n,1); b=A*x_ex;
  Acond(i)=cond(A);
  tol=1.e-6; maxit=10000;
  R=diag(diag(A));
  x0=zeros(n,1);
  [x,iter_gr(i)]=gradient(A,b,x0,maxit,tol,R);
  error_gr(i)=norm(x-x_ex)/norm(x_ex);
end
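Here gradient is the course's own routine (not the Matlab built-in of the same name, which computes numerical gradients of arrays); from its call it returns the computed solution and the iteration count, and it behaves like the preconditioned gradient algorithm described earlier, with preconditioner R. A possible way (our own post-processing sketch) to print the quantities collected by the loop is:

% Report, for each size n, the condition number, the relative error and
% the iteration count gathered by the loop above.
for i = 1:length(nn)
  fprintf('n = %2d   K(A) = %9.2e   error = %8.2e   iterations = %d\n', ...
          nn(i), Acond(i), error_gr(i), iter_gr(i));
end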

Linear Systems – Iterative Methods – p. 49/62 For the iterative method, we set a tolerance of 10−6 on the relative residual. Because the matrix is ill conditioned, it is to be expected to get a relative error in the solution greater than 10−6 (see inequality (18)). Gradient method n K(A) Error Iterations Residual 4 1.55e+04 8.72e-03 995 1.00e-06 6 1.50e+07 3.60e-03 1813 9.99e-07 8 1.53e+10 6.30e-03 1089 9.96e-07 10 1.60e+13 7.99e-03 875 9.99e-07 12 1.67e+16 5.09e-03 1355 9.99e-07 14 2.04e+17 3.91e-03 1379 9.98e-07

Some observations

Memory and computational costs

Computational cost (flops) and memory usage (bytes): we consider the Cholesky and the conjugate gradient methods for sparse matrices of size n (arising from the finite element approximation of the Poisson equation) with m non-zero elements.

              Cholesky               Conjugate gradient
  n   m/n^2   flops       Memory     flops       Memory    flops(Chol.)/flops(CG)   Mem(Chol.)/Mem(CG)
 47   0.12    8.05e+03      464      1.26e+04      228       0.64                     2.04
 83   0.07    3.96e+04     1406      3.03e+04      533       1.31                     2.64
150   0.04    2.01e+05     4235      8.86e+04     1245       2.26                     3.4
225   0.03    6.39e+05     9260      1.95e+05     2073       3.27                     4.47
329   0.02    1.74e+06    17974      3.39e+05     3330       5.15                     5.39
424   0.02    3.78e+06    30815      5.49e+05     4513       6.88                     6.83
530   0.01    8.31e+06    50785      8.61e+05     5981       9.65                     8.49
661   0.01    1.19e+07    68468      1.11e+06     7421      10.66                     9.23

Iterative methods: Jacobi, Gauss-Seidel

Behaviour of the error for a well conditioned matrix (K = 20)

[Figure: residual versus iterations (0 to 100) for the Jacobi and Gauss-Seidel methods.]

Iterative methods: Gauss-Seidel, gradient, conjugate gradient

Number of iterations as a function of the size n of a stiffness matrix, tolerance 10^-6

[Figure: number of iterations k versus size n (5 to 50) for the gradient, Gauss-Seidel and conjugate gradient methods; the annotated iteration counts range from 13 and 25 up to 940, 1896, 3615 and 7334.]

Preconditioning

Convergence of the conjugate gradient and the preconditioned conjugate gradient methods with a diagonal preconditioner (K = 4·10^8)

[Figure: relative residual versus iterations (1 to 5) for CG and for PCG with a diagonal preconditioner.]

Preconditioning

Role of preconditioning on the condition number of a matrix (finite element approximation of the Laplacian operator)

[Figure: residual versus PCG iterations (0 to 60) for the matrix A and for the preconditioned matrices P_1^{-1}A, P_2^{-1}A and P_3^{-1}A.]

Preconditioning

Convergence of the conjugate gradient method and of the preconditioned conjugate gradient method with a preconditioner based on an incomplete Cholesky decomposition, for a matrix arising from the finite element method (K = 1.5·10^3)

[Figure: relative residual versus iterations (0 to 140) for CG and for PCG with the incomplete Cholesky preconditioner (IC PCG).]

Preconditioning

Comparison between the non-zero elements of the sparse matrix A from the previous example, its Cholesky factor R and the matrix R̃ obtained by the incomplete Cholesky decomposition:

[Figure: sparsity patterns of A, R and R̃.]

Non-linear systems

Example 6. Let us consider the following system of non-linear equations:

x_1^2 - 2x_1x_2 = 2
x_1 + x_2^2 = -1.     (20)

This system can be written in the form

f(x) = 0,   i.e.   f_1(x_1, x_2) = 0,  f_2(x_1, x_2) = 0,

where f = (f_1, f_2), f_1(x_1, x_2) = x_1^2 - 2x_1x_2 - 2 and f_2(x_1, x_2) = x_1 + x_2^2 + 1.

Linear Systems – Iterative Methods – p. 59/62 4

3 f (x ,x ) = 0 2 1 2 f (x ,x ) = 0 2 1 1 2

1 α 2

x 0 β −1

−2

−3

−4 −6 −5 −4 −3 −2 −1 0 1 2 x 1

Curves f1 = 0 and f2 = 0 in the square 6 x1 2, 4 x2 4 − ≤ ≤ − ≤ ≤

We want to generalise Newton's method to the case of non-linear systems. To do this, we define the Jacobian matrix of the vector function f:

J_f(x = (x_1, x_2)) = [ ∂f_1/∂x_1  ∂f_1/∂x_2 ; ∂f_2/∂x_1  ∂f_2/∂x_2 ] = [ 2x_1 - 2x_2   -2x_1 ; 1   2x_2 ].

If J_f(x^(k)) is invertible, Newton's method for non-linear systems reads: given x^(0) = (x_1^(0), x_2^(0)), compute, for k = 0, 1, 2, ...,

x^(k+1) = x^(k) - [J_f(x^(k))]^{-1} f(x^(k)),   k = 0, 1, 2, ....     (21)

We can write (21) as

[J_f(x^(k))] (x^(k+1) - x^(k)) = -f(x^(k)),   k = 0, 1, 2, ....     (22)

Description of the first step of the Newton method: given the vector x^(0) = [1, 1]^T, we calculate

J_f(x^(0)) = [ 0  -2 ; 1  2 ].

We determine x^(1) as the solution of the equation

[J_f(x^(0))] (x^(1) - x^(0)) = -f(x^(0)),

and we then continue with (22).
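A minimal Matlab/Octave sketch of Newton's method (22) applied to system (20), starting from x^(0) = [1, 1]^T, could be written as follows (the script, tolerance and iteration cap are our own choices; convergence from this particular initial guess is not guaranteed):

f  = @(x) [x(1)^2 - 2*x(1)*x(2) - 2;  x(1) + x(2)^2 + 1];   % f(x) = 0 is system (20)
Jf = @(x) [2*x(1) - 2*x(2), -2*x(1);  1, 2*x(2)];           % Jacobian matrix of f
x = [1; 1];                                % x^(0)
for k = 1:20
  dx = -Jf(x) \ f(x);                      % solve J_f(x^(k)) dx = -f(x^(k)), cf. (22)
  x  = x + dx;                             % x^(k+1) = x^(k) + dx; the first step gives [1; -0.5]
  if norm(dx) < 1e-10
    break                                  % stop on the size of the Newton increment
  end
end
x                                          % last iterate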
