<<

Chapter 3

Methods for the Solution of Linear Systems Deriving from Elliptic PDEs

A finite volume discretisation of the elliptic PDEs (Partial Differential Equations) used to describe a fluid mechanics problem generates a large sparse set of linear equations. Typically a CFD algorithm will involve the repetitive solution of a Poisson pressure equation along with scalar transport equations for momentum, enthalpy, concentration, and any other fields of interest. Normally the program will spend most of its execution time in solving these linearised equations, and so the efficiency of the linear solvers underpins the efficiency of the solution method as a whole. Therefore a crucial aspect of any efficient solution of a fluid mechanics problem is the speed that these linear(ised) equations can be solved.

In this chapter a number of different algorithms for the solution of linear equations are discussed and their resource use is compared, both in terms of speed and memory use. Whilst there are many papers comparing two or three linear solvers, comparisons of several classes of linear solver are rare in the literature. Ferziger and Peric’´ s book[43] compares a number of methods, but the comparisons are made in terms of number of iterations to converge instead of time to converge, a rather meaningless measure due to the variation in computation effort per iteration amongst the solvers. Botta et al[14] compare a number of methods, however the solvers were written by different groups and so there is the possibility that some of the variation in performance is due to different coding standards. In both cases the number of methods covered is less than in the current study.

This chapter is divided into two parts; the first describes the linear solvers, whilst the second discusses their suitability for the solution of the equations that result from a finite volume discretisation of elliptic PDEs.

3.1 A Description of the Linear Solvers

Methods for solving linear equations can be divided into two classes, direct methods (i.e., those which execute in a predetermined number of operations) and iterative methods (i.e., those which attempt to converge to the desired answer in an unknown number of repeated steps). Direct methods are often used for small dense problems, but for the large sparse problems which are typically encountered in the solution of PDEs, iterative methods are usually more efficient.

The direct methods which are discussed here are Gauss Jordan elimination, LU (Lower Upper) fac- torisation, Cholesky factorisation, LDL (Lower Diagonal Lower ) decomposition, Tridiagonal and Block Tridiagonal methods.

52 CHAPTER 3. LINEAR SOLVERS 53

Four classes of iterative solver are also discussed;

¡ simple iterative methods, such as Jacobi iteration, SOR (Successive Over Relaxation), Red- Black SOR, and SSOR (Symmetric Successive Over Relaxation).

¡ incomplete factorisation schemes, such as Incomplete Cholesky (IC), ILU (Incomplete LU fac- torisation), SIP (Strongly Implicit Procedure, also known as Stone’s method), and MSI (Modified Strongly Implicit procedure).

¡ Krylov space methods, such as the CG (Conjugate Gradient), CGS (Conjugate Gradient Squared), GMRES (General Minimalised Residual), QMR (Quasi-Minimalised Residual), BiCG (Bi-Conjugate Gradient) and BiCGSTAB (Bi-Conjugate Gradient Stabilised) methods.

¡ multigrid schemes.

The distinction between the classes of iterative solvers blurs somewhat since multigrid methods can be considered as an acceleration technique to improve the performance of the simple iterative and incom- plete factorisation methods, and Krylov-space methods can use the simple iterative and incomplete factorisation methods as preconditioners. A Krylov space method using an incomplete factorisation smoothed multigrid preconditioner is a solver that combines three of the classes of iterative solver.

3.1.1 Linear Equations Resulting from a Finite Volume Discretisation of a PDE

The general form of a set of linear equations can be written,

¢¤£¦¥¨§ © (3.1)

or considering the individual  equation



 ¥!

  (3.2)

 #"

For a set of equations resulting from a finite volume discretisation of a three dimensional PDE on a structured mesh, the coefficient ¢ will typically take on a hepta-diagonal structure, with the non-zero components occupying only seven diagonals of the matrix. For a two dimensional PDE there will be only five diagonals which are non zero, and for a one dimensional PDE there are three non- zero diagonals. This regular structure enables a considerable reduction in memory use and the number of operations performed since only these seven diagonals need to be stored and operated upon. Using the compass notation of the equations discussed previously in Section 2.1 the above linear equation

becomes

¥- .©

 

 (3.3)

$ $&%'( ()%*,+ +

for a discretisation of a one dimensional PDE,

¥! 0©

 

   

 (3.4)

$ $ ( ( + +

 %* %' %* %' / /

for a discretisation of a two dimensional PDE, and

¥- 0©

 

     

 (3.5)

$ $ ( ( + +

 %' %' %* %' / /1%* %*2 2

for a three dimensional PDE. The subscript 3 refers to the point at which the equation is centred,

©657©8¦©69:©<; = and the 4 and subscripts refer to the neighbouring East, West, North, South, Top and CHAPTER 3. LINEAR SOLVERS 54

Bottom points respectively (see Figure 2.2). As a notational simplification equations (3.3) to (3.5) can

be rewritten as



¥- 0©

  >@?

>@? (3.6)

$ $

 % 

>@?

©C57©C8¦©C9D©<;

4 = where AB represents the neighbouring points and .

For some systems the equations are symmetric,

¢E¥¤¢ (3.7) " This is commonly the case in diffusion problems, or in the solution of a pressure correction equation. Some of the linear solvers can take advantage of this symmetry at the cost of being only applicable to symmetric systems. In the more general case the equations are non-symmetric, which is the case with a transport equation with advection.

3.1.2 Direct Methods

Direct solvers are not commonly used in solving finite volume equations for the very good reason that they scale poorly with problem size, both in terms of memory and number of operations. A notable exception is the Thomas tridiagonal algorithm which is unfortunately only applicable to one- dimensional PDEs. However a multidimensional block tridiagonal method can be developed.

LU (Lower Upper) Factorisation

For solving dense systems of equations LU factorisation is commonly used. It has a similar operation count to Gaussian elimination but allows the efficient solution of many right hand sides. The following description of such a solver follows that of Press et al[133]. A more thorough description of the method can be found in Golub and Van Loan[51].

A matrix ¢ can be factored into lower and upper triangular components such that, ¥-¢

FHG (3.8) "

This decomposition can be used to solve the equation,

¢¤£&¥JI £&¥ I £ ¥¤§M©

F G K FHGLK (3.9)

by first solving for the vector N such that

¥¤§M© F

N (3.10)

and then solving

£)¥ G

N (3.11) " The solution of the upper and lower triangular set of equations is trivial, since equation (3.10) can be

solved by forward substitution,



¥ ©

O



P

 

RS







P © ¥¨Z,©[\© 8¦© ¥ Q

V

O O

 @XY P

(3.12)

UT W  "]"^" 

  CHAPTER 3. LINEAR SOLVERS 55

where N is the number of equations, followed by the solution of equation,(3.11) by backward substi-

tution,

O



¥ ©





_

 

 RS





Q

¥ © ¥-8 ©8 Z

O _

 ,@XY

Q Q

(3.13)

_

 UT    T T "]"^" "

 

^` ¢

The only problem remaining is factoring into FaG , a process that can be performed using Crout’s

¥ ©6Z © 8 Q

algorithm. For each column of the matrix, b ,

"]"]"

¥ ©

_

   

  





¥ © ¥-Z ©C[ ©

V

_ _

c c

  

Q

 T  "^"]" T

   

c



¥ ©

_

 

Q (3.14)



 



V

P ¥ P ©

_

c c

   gf

Q0dHe

  T  

c



 



V

P ¥ P © ¥ © Z 8 P

_

f c c

    

e Q

b b

 T % % "^"]" "

    

c

 ¢

The algorithm for factoring the array into the FaG arrays is given in Figure 3.1, with the solution

I £h¥i§

of the system FHGLK being given in Figure 3.2.

¥ ©6Z 8 Q

for b

¥

"^"]"

_

   

¥-Z,©[



 

Q b

for



¥

"]"]" T

_ _

c c

  

c

W

V

¥

 T'j 

   

_

 

Q

 



P ¥ P

_ _

c c

     gm

c



Q.dlk

V

j

  T   

¥ © Z © 8

Q

b b

for

 

% % "]"]"

P ¥¨P P

_

c c

    nm

c



k

V

T'j 

     ¢

Figure 3.1: Factoring into FaG .

P ¥¨

O

   

d

¥!Z ©C[ 8 

for



"^"]"

P P ¥

O O

 om



d k

V

j

    UT

¥

_ O

   



d

¥-8 ©C8 Z



Q Q

for



T T "]"^"

¥

O _ _

 g,nm

 

k d

j

 UT   

^`

I £h¥i§ Figure 3.2: Solving the system FHGpK .

For a system of 8 equations (which would correspond to the N points on a finite volume mesh) the

8&q 8 I8&s K

storage requirement for an LU factorisation is , whilst the number of operations is of r %

CHAPTER 3. LINEAR SOLVERS 56

I8&q K

for factorisation, and r for solution. For a sparse system of equations such a scheme is rather ¢ inefficient since 8 is likely to be large, and most of is zero. A more efficient band diagonal version where the array is stored as a band only wide enough to store the farthest off-band diagonal is

implemented in Press et al[133].

For symmetric matrices where ¢t¥¤¢ a more efficient factorisation which takes advantage of the

¢u¥

symmetry is the Cholesky factorisation FHF (sometimes referred to as the square root of the

¢E¥ matrix), and the LDL (Lower Diagonal Lower ) factorisation FHv-F [51]. Both give a threefold reduction in the number of operations compared to LU factorisation, but the LDL method is to be preferred since the Cholesky factorisation requires 8 square root operations, which can be a slow operation on many computers.

For a general system of equations pivoting, where the equations of the system are reordered, is nec- essary to ensure numerical stability. However, for the equations resulting from a discretised elliptic PDE, in particular the equations resulting from the pressure equation given in Equation (4.23), the system is diagonally dominant and no pivoting is necessary[51, 133].

Thomas Tridiagonal Solver

A finite volume discretisation of a one dimensional PDE results in a system of equations with only the diagonal and one sub- and super-diagonal having non-zero values, commonly referred to as a tridi- agonal system of equations. These can be solved extremely easily and efficiently using the Thomas algorithm[169].

A tridiagonal system of equations ¢¤£&¥¤§ can be written as,

Rw Rw Rw



 yn  

w w w

w w w



y

X{z X{z X{z

w w w

q q q q q

z z z

w w w



¥

z z z

S S

Spx . . . . .

z z

...... z (3.15)

z z z

x

  Y  Y   Y}|

     

y

V V V  V V

   





x

~  € For the finite volume discretisation of a PDE givx en in (3.3) the vectors , and correspond to ,

and respectively. ,+

$ ( To solve such a set of equations requires only a forward and a backward substitution, the algorithm is

given in Figure 3.3. ‚

‚

¥



‚

¥-

O

 

d

‚

¥-Z,©[ 8

x

for ‚

¥

"]"]"

ƒ



y

d

¥

 V

ƒ

¥JI

K

UT„ 

O O



d

¥

T7

   V

O

  

x

¥¨8 ©8 Z

Q Q

for

¥

T T "^"]"

O ƒ

 

 

T

  ]` ^`

Figure 3.3: The Thomas Tridiagonal Algorithm.

8 I8 K

The Thomas tridiagonal solver requires words of memory and takes r operations, and so is about as efficient a solver as can be hoped for. Unfortunately it can only be implemented for one- dimensional PDEs. A one-dimensional PDE with cyclic boundary conditions is not truly tridiagonal, CHAPTER 3. LINEAR SOLVERS 57

and cannot be solved with the generic Thomas Tridiagonal algorithm. However an implementation for cyclic boundary conditions is given in Press et al[133].

Block Tridiagonal Solvers

A finite volume discretisation of a multidimensional PDE results in equations that do not have the tridiagonal structure that allows the use of the efficient Thomas algorithm. However, they do have a tridiagonal block structure that allows a similar scheme, with dense submatricies being used instead of scalars.

For a finite volume discretisation of a two dimensional PDE as described by equation (3.4), the matrix

structure can be viewed as,

Rw Rw Rw

£ §

v

 ‡†ˆ 

w w w

w w w

£ ¢ §

v

†

X{z X{z X{z

w w w

q q q q q

z z z

w w w

¥

z z z

S S

. . . . S .

z z

...... z . . (3.16)

z z z

¢ £ §

v

   Y  Y  Y

    

†

¢ £ §

v

V V V V V

   



¢ § v

where the submatricies , , † and have the structure,

   

w

R

w

w



/Š‰Œ‹ 

z

X

S

¢ ¥

z



/.Ž<‹ 

. z (3.17) Y

 ..



/..‹ 



w

R

w

w

$ (

‰Œ‹  ‰Œ‹ 

z

X

S

¥

v

z

,+ $ (

Ž<‹  Ž<‹  Ž#‹ 

. . . z (3.18) Y

 ......

,+ $

‹  ‹ 



w

R



w

w



‰‘‹ 



z

X

S

¥

z

†



Ž<‹ 

. z (3.19) Y

 ..





‹ 



Rw



w

w

 

X{z

q

S

§ ¥

z

 

. z (3.20) Y}|

 .

  In a manner similar to the Thomas Tridiagonal algorithm these equations can be solved by a forward and a backwards block substitution. A similar block structure can be used to solve three dimensional

PDEs.

 “

The matrix inverse V used in Figure 3.4, which contains the block tridiagonal algorithm, is purely

for notational purposes. Instead of calculating the inverse and performing a as



¥¤•–¢ ¢ ¥-• ¢

F G

” ”

in V , the system can be solved as by factoring into its and components,

I £&¥¨§ •

FHGLK — and then solving for each column of ” and . Unfortunately the sub-matrices and

“ are dense and so cannot be solved by a tridiagonal solver but instead must be solved with a dense solver such as LU decomposition. For symmetric systems a Cholesky or LDL decomposition can be used instead of LU factorisation, reducing the number of operations.

CHAPTER 3. LINEAR SOLVERS 58

¥

v



“



¥ §

 

N “

V

¥-Z,©[ 8

for



¥

"]"]"



†

— “

V

¥ ¢

v

 V

“ —



¥JI§ ¢

K

UT  



N N “

V

£ ¥

T

   V

 

N

¥¨8 ©8 Z

Q Q

for

£ ¥ £

T T "^"]"

 

N —

T

  ^` ^`

Figure 3.4: The Block Tridiagonal Algorithm.

s6˜Cq q

8 I8 K

The two dimensional version of the solver uses words of storage and takes r operations

(assuming a square mesh), whilst the three dimensional version uses 8&™6˜Cs words of storage and

Iš8&›6˜s K

r operations. Whilst this is better than LU factorisation, it is not as good as the one dimen- sional tridiagonal solver, and is less efficient than most iterative schemes. However for cases where one dimension of the problem is much greater than the others the dense submatricies can be made smaller and the efficiency of the method (both in terms of storage and number of operations) can be greatly improved1.

3.1.3 Iterative Methods

An iterative linear solver works by repeatedly apply a series of operations to an approximate solution to the linear system, with the error in the approximate solution being reduced by each application of the operations. Semantically, each application of the operations is an iteration, and the schemes iterate

towards a solution. The basic form of all iterative methods is given in Figure 3.5. £

set £Uœ to an initial estimate of

ž¥

Q  Ÿ¡Š¢D£¤

while and error ¥ tolerance

c c



£ ¥¨¦:I£ K

perform some operations V

§¥!

Q %

Figure 3.5: A generic iterative method.

The three classes of iterative methods discussed in this chapter are

¡ simple iterative methods, such as Jacobi iteration,

¡ incomplete factorisation schemes, such as SIP (Strongly Implicit Procedure, also known as Stone’s method),

¡ Krylov space methods, such as the Conjugate Gradient method, and

¡ multigrid schemes.

1

±}°M¨®² ³ ´]±µ¨·¶o¸

For a ¨h©«ª¬¨®­6¯<¨7°M± three dimensional mesh, the storage required is , and the operation count of order . ½¾º<¿ By changing from a cubic mesh to one where ±7¹»º<¼6¨ , the storage requirement is reduced by a factor of , and the

operation count is reduced by a factor of ÀoÁ@Âoà . CHAPTER 3. LINEAR SOLVERS 59

Convergence of an Iterative Scheme

Some method must be made to rank the fitness of the approximate solutions so that the decision to

c £

terminate the solvers iterative loop can be made. If the solution at iteration  is and the exact 

solution to the equations is £ , then the error at iteration is given by

c c

¥¤£ £

Ä (3.21)

T " Of course the exact solution £ is unknown and thus so is the error. However we can easily calculate

the residual at any step and the residual is proportional to the error, so if the residual decreases by a Q¾Å

factor of VÆ so does the error. The residual of the system (3.1) is defined by

c

c

¥¨§ ¢¨£

Ç (3.22)

T " Most iterative linear solvers include a calculation of the residual (or a close approximation) as part of the solution algorithm, and so with these methods the convergence of the solver can be monitored at no extra computational cost.

A scalar measure of the residual vector Ç ’s length is given by its norm. A family of norms is given by

É

É



˜



DÊ Ë Ê

È È6É

¥

Ç

f

8 e

j 

 (3.23) "

The one, two and infinity norms are,



DÊ Ë Ê

È È

¥ ©

Ç



8

j 

 (3.24)



q

ÍË

È È

¥»Ì ©

Ç

q

8

j 

 (3.25)

and |Ò|Ò|

È ÈgÎ

¥ ϞÐ.Ñ

Ç

Ê Ë Ê 

W (3.26)

 "  For the tests described in the latter sections of this chapter the infinity norm was used as a measure of convergence.

The Residual Form of the Equations

c c



¥»£ £

Ó Ó

If is the difference between successive solutions by an iterative method, ` , then the system T ¢¤£¦¥¨§ © (3.27)

can be rewritten as

c

¢JI£ ¥Ô§ ©

K

Ó

c

%

¢¨£ ¢ ¥Ô§ ©

Ó (3.28)

c

%

¢ ¥Ô§ ¢¨£

Ó

" T 

The right hand side of the last equation in (3.28) is the residual of the  iteration, and thus we can

rewrite equation (3.27) in its residual form,

c

c

¥Ô§ ¢¤£ ©

Ç

c

T

¢ ¥ © Ç

Ó (3.29)

c c



¥Ô£ £

Ó

`

% " CHAPTER 3. LINEAR SOLVERS 60

This residual form of the system gives a greater accuracy than the original system when using finite

precision arithmetic, since the differences between the Ó values are likely to be of the same magnitude £

as the Ó values themselves, in contrast to the terms where differences can be several orders of magnitude smaller than the values of £ . Thus the rounding errors of the residual formulation will be less than for the original form of the equations.

3.1.4 Simple Iterative Methods

The simple iterative methods are characterised by the basic operation

c



c

£ ¥-•'£ © €

V (3.30) % which is applied iteratively to the system to be solved. Jacobi’s method, Gauss–Seidel, Successive Over Relaxation (SOR), Symmetric Successive Over Relaxation (SSOR) and Red-Black Successive Over Relaxation (RBSOR) are all linear solvers of this form. These methods are the simplest to implement, but are typically the slowest to converge to a solution. However they can be effectively used as preconditioners for Krylov space methods, or smoothers for multigrid schemes.

Jacobi’s Method

In Jacobi’s method each equation is solved in isolation, so the equation (3.6)



¥! 0©

  >@?

>@? (3.31)

$ $

 %  >@?

is modified into



¥

Q

f

 

>@? >@?

e (3.32)

$

T  "

>@? $

This suggests an iterative method defined by,



c

c 

¥ Q ©

f

 

>@?

e >@?

V (3.33)

T 

$

>@?

$  where the  terms on the right hand side of the equation are all from the previous iteration. This algorithm is given in Figure 3.6.

set £Uœ

ž¥

Q

 ÕiŠ¢D£¤ ¢D£¤

Ä Ä

while and ¥

c



£¦¥¨§ ¢¤£

Ó

V

c c



£ ¥¨£ £

T

Ó

V

Î

È È

¥ £

%×Ö

Ä

Ó

ˆ¥-

Q %

Figure 3.6: The Jacobi method.

The Jacobi method requires 8 words of storage over that required for the storage of the linear system and its solution. Note that there is no intrinsic order in which to update each equation, so the method is trivial to parallelise. CHAPTER 3. LINEAR SOLVERS 61

Successive Over Relaxation (SOR)

If the equations in (3.32) are solved in sequence so that the updated values of  are used as soon as

they are available, then we obtain the Gauss–Seidel algorithm,

 

c



c c

© ¥

Q

  

>@? >@?

>@?

>@?JÛÜ

V (3.34)

T  T 



 

$

>@? >@?

$ÙØÚ



( +

   / 2 assuming a case where the equations are arranged in increasing order from west to east, south to north, bottom to top. If the iterative update is under- or over-relaxed then we arrive at the Successive Over

Relaxation scheme,

 

c c

 

c c

I ¥

K

   

>@? >@?

>@?

Ü Q

>@?JÛ

Ö V

V (3.35)

T  T  T T)Ö "



 

$

$

>@? >@?

Ú

$ÝØ



+ (

 / 2   c The equations now contain a data dependency, with the new value of  depending on the previous equations updated values. This limits the ability to parallelise or vectorise$ the algorithm. The depen- dency can be removed by changing the order in which the equations are solved (such as is done with RBSOR) but this in turn will affect the rate of convergence, typically for the worse.

set £Uœ

ž¥

Q

 Õ¡Š¢D£¤ ¢D£¤

Ä Ä

while and ¥

¥ ©6Z 8 Q

for





c





c

"]"^"

¥

m U  





 

k

Ó

Þ

V

V

j j

  T  T

c



c



«‹ 

¥

  

Ó

V

È ÈnÎ

¥ £

%×Ö



Ä





Ó

§¥!

Q %

Figure 3.7: The SOR algorithm.

The choice of the value of is crucial to the convergence of the SOR method. A theorem due to

Kahan[74] shows that SOR Ö fails to converge if is outside the interval (0,2). If the relaxation term

¥ Ö

Q then SOR reduces to the Gauss-Seidel method. Ö

Symmetric SOR and Red-Black SOR utilise the same algorithm but with changes made to the order

¥ ©6Z 8 Q

of the operations. With Symmetric SOR every sweep through the equations in the order

¥ß8&©8 ©

"^"]"

Q Q

is followed by a sweep in the reverse order, . With Red-Black SOR the operations

¥ ©[ 8

T "^"]"

Q Q

are done in two passes, the first pass operating on the odd elements of the array, ,

¥-Z ©à 8

"]"^" T

followed by the second pass which operates on the even elements, . "^"]" The advantages of Red-Black SOR is that dependencies between adjacent array values are removed, enabling the method to be vectorised or parallelised. However, as is seen in Figures (3.22) to (3.25), this is at the cost of a greatly reduced rate of convergence.

The SOR methods require no storage above that of the original system.

3.1.5 Incomplete Factorisation Methods

A second class of iterative methods is developed by using an approximate decomposition of the ¢

array into it’s FaG factors. Instead of doing a complete factorisation only a restricted number of off- G diagonal elements of the F and matrices are allowed to take on a non-zero value, typically only CHAPTER 3. LINEAR SOLVERS 62

the diagonals that are non-zero in the ¢ matrix and one or two further bands. For each iteration a forward and backward substitution process using this incomplete factorisation is applied to the residual formulation of the system of equations.

These methods are efficient in their own right, but also have value as preconditioners for the Krylov space solvers, and as smoothers for the multigrid schemes. The most commonly encountered im- plementations are Incomplete Cholesky factorisation (commonly used as a preconditioner for the Conjugate Gradient method) and the Strongly Implicit method of Stone[163].

Incomplete Cholesky Factorisation (IC)

The simplest incomplete factorisation scheme is the Incomplete Cholesky method, where a symmetric

¢ ¥

FHF žá

matrix is factored into a system, with infill only in the locations where Å . 

set £Uœ

ž¥

Q

 Õ¡Š¢D£¤ ¢D£¤

Ä Ä

while and ¥

c



c



¥¤§ ¢¤£

Ç

V

V

c



È ÈoÎ

¥

T

Ç

Ä

V

P⥠©CZ

for Q

¥ ©6Z

"^"]"<ãä Q

for b

¥ ©CZ

"^"]"<ãå Q

for

¥¨è I è

O O

"]"]"<ãæ

      

Ë

è

T},+

  ç   ç   ç V   ç V   ç

‹ é ‹ ê

O

   

è

K

T}

/  V  ç  V  ç

«‹ é ‹ ê

O

   

P⥠© ©

T}2   ç«V   ç«V

«‹ é ‹ ê Q

for Q

¥ © ©

ã ã T "]"^"

ä ä

Q Q

for b

¥ © ©

ãå ãå6T "^"]"

Q Q

for

¥¨è I è I

O O O

ãæ ãænT "]"^"

     

(

  ç   ç   çÍT   ç  ]`   ç

«‹ é ‹ ê

O



 

K

%·  `  ç

‹ é ‹ ê

O

 

c c



£ ¥¤£

%·   ç{`

‹ 隋 ê

N

V

§¥!

%

Q %

Figure 3.8: The Incomplete Cholesky method.

P⥠©6Z

for Q

¥ ©CZ

"]"^"ã

ä Q

for b

¥ ©6Z

"]"^"ãå Q

for

"^"]"<ãæ

I è q

K

 

è q I

K

è ¥

$ +

 T  V   ç

«‹ é ‹ ê ‹ 隋 ê

 



Q0dìëî í

è q I

K

í

T 

 V  ç /

  ç

‹ é ‹ ê

 

í

T 

  ç«V 2

«‹ é ‹ ê

Figure 3.9: The Incomplete Cholesky and Incomplete LU factorisations.

If the original is factored into diagonal, lower and upper triangular parts as

© ¢u¥

F F

v (3.36)

% %

then the incomplete factorisation ï can be written as

¥ßIšð Išð ©

FMK F K

ï (3.37)

% % CHAPTER 3. LINEAR SOLVERS 63

where the ð array only has non-zero components on the diagonal, the values being

è ¥ ©

Q ñ

 (3.38)

I è

K

q

 

  



V

j

 UT   whilst the off-diagonal values of the F matrix have the same values as the corresponding elements in the ¢ array.

The system of equations is then solved iteratively by the following process; for each iteration  the

residual is calculated,

c



c



¥¨§ ¢¨£ ©

Ç V

V (3.39) T

then an update is calculated by solving

c



Ið ¥ ©

FMK

Ç

N

V

Ið ¥ ©

F K ò

% (3.40)

N %

with the update then being added to the previous iterations solution,

c c



£ ¥¨£ ò

V (3.41)

% "

The solution algorithm is given in Figure 3.8, with the factorisation being given in Figure 3.9.

set £Uœ

ž¥

Q

 Õ¡

¢D£¤ ¢D£¤

Ä Ä

while and ¥

c



c



¥¤§ ¢¤£

Ç

V

V

c



È ÈoÎ

¥

T

Ç

Ä

V

P⥠©CZ

for Q

¥ ©6Z

"^"]"<ãä Q

for b

¥ ©CZ

å

"^"]"<ã Q

for

¥¤è I

O æ O

"]"]"<ã

    

Ë

+

  ç   ç   ç T} V   ç

‹ é ‹ ê

O

 

K

T} /  V  ç

«‹ é ‹ ê

O

 

P⥠© ©

T}2   ç«V

«‹ é ‹ ê Q

for Q

¥ © ©

ãä ãäT "]"^"

Q Q

for b

¥ © ©

å å

ã ã T "^"]"

Q Q

for

¥ è I

O O O

æ æ

ã ã T "]"^"

    

T (

  ç   ç   ç ]`   ç

‹ 隋 ê

O



 

K

%·

 `  ç

«‹ é ‹ ê

O

 

c c



£ ¥¤£

%·

  ç^`

«‹ é ‹ ê

N

V

§¥!

%

Q %

Figure 3.10: The Incomplete LU method.

The method is not as fast as the SIP and MSI formulations described below, and is not often used as a solver. However it has frequently been used as a preconditioner for the Conjugate Gradient method, the combination being referred to as the Incomplete Cholesky–Conjugate Gradient method or ICCG. The solver can be modified to remove the need to take square roots in the factorisation of the equations (the Incomplete LDL method), and to be applied to non-symmetric systems (the Incomplete LU method (ILU)). The factorisation and solution algorithms for the ILU solver are given in Figures 3.10 and 3.11.

The factorisation step of the IC and ILU solvers only requires the storage of the diagonal of the ð matrix, whilst the solvers themselves require the storage of the residual Ç . Therefore the memory usage of the method is only Z8 words.

CHAPTER 3. LINEAR SOLVERS 64

P⥠©6Z

for Q

¥ ©CZ

"]"^"ãä Q

for b

¥ ©6Z

å

"]"^"ã Q

for

è

æ

"^"]"<ã

 

è

è ¥

( + $

V   ç  T} 

«ó‰Œ‹ é ‹ ê «‹ é ‹ ê ‹ 隋 ê



 



Ü

Û Q0d

è

T}/   V  ç

  ç

«‹ é ‹ ê ‹ é#󁉑‹ ê

 

Ú

Ø

T} 

2   ç«V

‹ 隋 ê ‹ é ‹ ê^ó‰

Figure 3.11: The Incomplete LU factorisation.

Strongly Implicit Procedure (SIP)

The Strongly Implicit Procedure (SIP) of Stone[163] is a more advanced version of the Incomplete LU scheme, but unlike the IC and ILU methods it is only applicable to the equations resulting from a

finite volume discretisation of a PDE. The approximate LU decomposition of ¢ that is used is

¥ ¥¤¢ © FôG

ï (3.42) %*õ

where is the error between the exact and approximate factorisations.

õ

¢ G

If F and are constrained to be only non-zero in the locations that is non-zero then the matrix

¢ ¢

ï will have more non-zero diagonals than ; two extra non-zero diagonals if is a five- resulting from a two dimensional PDE, or six extra diagonals if it is a seven-diagonal matrix

from a three dimensional PDE. ¢

To make ï a good approximation of , the array is set such that

õ £)ö

Å (3.43)

õ " This is done by recognising that the system being solved is from a finite volume approximation of

a PDE. Thus the values of the £ field in the extra diagonals of can be approximated by a second £ order extrapolation of the values of . By putting the terms forõ the extrapolation into the elements

of and cancelling with the values of £ in the extra diagonals of then the system can be made to õ approximateõ equation (3.43). Finally, to make the LU factorisation unique the diagonal elements of G are set to 1.

The system of equations is then solved iteratively in a similar manner to the solution of the Incomplete

Cholesky system; for each iteration  the residual is calculated,

c



c



¥¨§ ¢¨£ ©

Ç V

V (3.44) T

then an update is calculated by solving

c



¥ ©

F

Ç

N

V

¥ ©

G-ò (3.45) N

with the update then being added to the previous iterations solution,

c c



£ ¥¨£ ò

V (3.46)

% "

As is seen in Figures (3.22) to (3.25), the SIP solver is much faster than the simpler ILU and IC

schemes, and so is suitable as a solver in its own right, as well as a smoother with other iterative

8 8 ø methods. It requires ÷ words of storage for the solution of two dimensional PDEs, and words for three dimensional PDEs. CHAPTER 3. LINEAR SOLVERS 65

set £Uœ

§¥

Q

¢D£¤  Õ¡Š¢D£<¤

Ä Ä

while and ¥

c



c



¥¨§ ¢¨£

Ç

V

V

c



È ÈoÎ

¥

T

Ç

Ä

V

P⥠©CZ

for Q

¥ ©CZ

"]"^"ãä Q

for b

¥ ©6Z

å

"]"]"<ã Q

for

è ¥!¦ I

O O

æ

"]"^"ã

     

Ë

T

  ç   ç V   ç   ç   ç

O

  

K

T

  ç  V  ç

O

y   

P⥠© ©

T

  ç   ç«V

x Q

for Q

¥ © ©

ã ã T "^"]"

ä ä

Q Q

for b

¥ © ©

ãå ãå6T "]"]"

Q Q

for

¥

O O O

ãæ ãægT "^"]"

    

  ç   çTúù¾  ç ^`   ç

_ O

  

T   ç  `  ç

O

  

c c



£ ¥¨£

Tüû  ç   ç{`

N

V

ž¥-

%

Q %

Figure 3.12: The Strongly Implicit Procedure (SIP) of Stone.

P¥ ©CZ

for Q

¥ ©CZ

"]"]"<ãä Q

for b

¥ ©6Z

å

"^"]"<ã Q

for

I I ¥

K

æ

"]"^"ã

_

    y 

d Q

‚

I I ¥

K

% %'ý ù 

  çV   ç«V   ç 2

«‹ é ‹ ê

_

    

d Q

è ¥ I I

K

%*ý ù % 

 V  ç  V  ç   ç /

‹ é ‹ ê

_

    

d Q

¥

,+ %'ý ù %

  ç V   ç V   ç

«‹ é ‹ ê

   y   

x

‚

¥ è

% ù ù

  ç  V  ç   ç   ç«V

ƒ

     

¥¤è

û % ù

  ç  V  ç   ç V   ç

þ _

     

y

x

‚

I Iè

K

  ç V   ç\%   çŠù¾  ç«V

ƒ þ

  

x

¦ ¥



Q0dˆÿ

$ %·ý % % T ù

  ç V   ç

«‹ é ‹ ê

_

   y   

  ç

¥-¦ I

K

% % û

  ç  V  ç   ç   çV

 

¥-¦ I

K

ù ( T„ý

  ç   ç

«‹ é ‹ ê

ƒ _



 

x

I ¥-¦

K

T7ý 

  ç   ç

‹ é ‹ ê

þ

 

T7ý  û

 ç   ç 

«‹ é ‹ ê

Figure 3.13: The incomplete LU factorisation used in SIP. CHAPTER 3. LINEAR SOLVERS 66

Modified Strongly Implicit procedure (MSI)

The Strongly Implicit Procedure was further developed by Schneider and Zedan[151, 185] into the G Modified Strongly Implicit procedure (MSI). In this method the F and arrays chosen in the de- composition in equation (3.42) are allowed to have more non-zero elements than the equation matrix ¢ .

set £Uœ

ž¥

Q

 ÕiŠ¢D£<¤ ¢D£¤

Ä Ä

while and ¥

c



c



¥¤§ ¢¤£

Ç

V

V

c



È ÈoÎ

¥

T

Ç

Ä

V

P¥ ©CZ

for Q

¥ ©CZ

"]"]"<ã

ä Q

for b

¥ ©6Z

"^"]"<ãå Q

for

¥£¢ I

O O O

"]"^"ãæ

       y   

Ë

è ¦

T T

  ç ]`   ç«V   ç   ç   ç   ç   ç«V

O O

      

K

T T

  ç  `  ç«V   ç  V  ç

O O

      

x

P¥ © ©

T¥¤Š  ç ]`  V  çT§¦\  ç V   ç Q

for Q

¥ © ©

ã ã T "^"]"

ä ä

Q Q

for b

¥ © ©

ãå ãå@T "]"]"

Q Q

for

¥

O O O O

æ æ

ã ã T "^"]"

        

  ç   çT©¨g  ç ^`   ç Túù¾  ç V  `  ç

O _ O

      

T #  ç  `  ç T   ç  V  ç{`

O O

      

c c



£ ¥¤£

Tüû  ç V   ç{` T ü  ç   ç{`

N

V

ž¥!

%

Q %

Figure 3.14: The Modified Strongly Implicit procedure (MSI) of Schneider and Zedan.

The code for the MSI method is more complicated than for the SIP scheme and it requires more mem-

8 àµ8 Q ory, using ø and words of storage for two and three dimensional PDEs respectively. However it converges much faster than the other incomplete factorisation methods. It is infrequently reported in the literature and seems to be less frequently used, presumably due the increased complexity of the implementation and the extra storage used.

CHAPTER 3. LINEAR SOLVERS 67

P⥠©CZ

for Q

¥ ©CZ

"]"]"<ãä Q

for b

¥ ©CZ

å

"^"]"<ã Q

for

¥ I I I

K

æ

"]"^"ã

          

y

d Q

I I

K K

  ç 2 %·ý ù¾  ç«V T£¨g  çV ¨g]`   ç«V % #^`   çV

‹ 隋 ê

            

¥

T #  ç«V T×ùn]`   ç«V ¨g  çV ù¾ `  ç«V % # `  ç«V

   

y

è ¥

  ç T   ç¨n  ç«V

       

y

¦ ¥–I

  ç T   ç #  çV T   çŠù¾]`   ç«V

_

   

y

x

I I Z

K

  ç   ç  / T   ç«V

‹ é ‹ ê

_ _

           

x

I

K

%·ý   ç ]`   ç ¨n^`  V  ç % ^`  V  ç\% ü]`  V  ç

_

    

y

I IšZ

+

T V   ç  T   ç0û  ç«V

«‹ é ‹ ê

_ _

       

x

d Q

I Z

K

%*ý  V  ç %×û V  çUT V   çù¾ V  ç

_

          

¥ ¦

T©¨g V _  ç ¨n^`  V  ç% ^`  V  ç,% ü]`  V  ç

       

¥–I ¦

¤Š  ç T   ç ^`   çV T   ç¨g V  ç

 y      

I è

KK

¦  + T û T ù

  ç   ç   ç«V   ç  V  ç

«‹ é ‹ ê

   y       

x

I I Z© Z

KK

û T}ý ù % ù %

  ç  V  ç   ç   ç«V   ç  `  çV

_

     

d Q



¥

%*ý ù % % û

V   ç V   ç V   ç

    

x



¥

¨

  ç ^`   ç«V

y   

q



è ¥

ù

  ç   ç«V

       

x

s



¥-è

% ¨

  ç V   çV   ç  `  ç«V

  



¥-è

  ç.ù¾  ç«V

   

™ x



¥

  ç # `  ç«V

   



¥

Æ ¤Š  ç¨g]`  V  ç

  

›



¥!¦

¦\  çù¾V   ç

_

  



¥

  ç  V  ç

_

   



¥-¦

¤Š  ç ]`  V  ç

_

      

œ



¥

  ç¾û V  ç %¦\  ç V   ç

C    



¥

¤Š  ç ü]`  V  ç

   

q

¥¢ I

¦ û

  ç V   ç

         

    

I Z,I

K K

(

¨n  ç   ç  T   ç ü^`   ç«V T¤Š  ç #]`  V  ç

‹ 隋 ê

 

s

¥ ¢ Iè

K

T}ý % Æ % % %

        

x

¥£¢ I è

ù¾  ç T   ç   ç ü `  ç«V %¦\  ç #V   ç



     

    

I Z,I 

KK

 T û

  ç   ç   ç  `  ç«V

«‹ é ‹ ê

q s › ™

¥ ¢ Iš¦

K

T}ý % % % %

_ _

        

¥ ¢

T %¤

  ç   ç   ç  V  ç   ç ^`  V  ç

    

    

¥ ¢ I I

KK

û T ¦

  ç   ç   ç V   ç

   C 

œ q

ü  ç   ç  T7ý % % % %

«‹ é ‹ ê

Figure 3.15: The incomplete LU factorisation used in the MSI method. CHAPTER 3. LINEAR SOLVERS 68

3.1.6 Krylov Space Methods

In Krylov space methods the solution of the system of equations is viewed as an optimisation problem, with the goal being the minimisation of the residual of the system. The Conjugate Gradient method is the oldest of the Krylov space solvers, the classical description of the method being given in the joint paper by Hestenes and Stiefel[60]. It was originally developed as a direct method, but Reid[138] popularised the solver as an iterative scheme for problems.

Since the development of the Conjugate Gradient scheme a number of other Krylov space schemes have been devised, with good summaries of the methods being given in the books by Barrett et al[7] and Golub and Van Loan[51]. A more intuitive introduction to the methods is given in the paper by Shewchuk[154].

In this chapter the Conjugate Gradient (CG), the Bi-Conjugate Gradient Stabilised (BiCGSTAB), and the General Minimalised Residual (GMRES) methods are discussed. Other methods which are commonly encountered in the literature include the Steepest Descent (SD), the Bi-Conjugate Gradient (BiCG), the Conjugate Gradient Squared (CGS), and the Quasi-Minimal Residual (QMR) methods. These additional methods were also implemented for this study but offered no advantages over the three methods discussed below. They are briefly discussed at the end of the section, but for a fuller description the reader is referred to Barrett et al[7] and Golub and Van Loan[51].

Conjugate Gradient Method (CG)

The Conjugate Gradient method is the oldest of the Krylov space methods. It was independently developed by Hestenes and Stiefel[60] as a direct method, but its popularity dates from the discovery by Reid[138] that it could be effectively used as an iterative method.

Following the presentation given in Barrett et al[7], for each iteration the residual is minimised along a

path orthogonal to the previous search. Thus the solver steps along the residual surface in the solution c

space to find the minimum residual. At each iteration the iterate £ is updated by a multiple of the c

search direction vector  ,

c c c



£ ¥i£

c 

V (3.47)

%*ý "

c

c

¥¤§ ¢¤£

The residual Ç is updated as,

T

c

c c c c



¥-¢ ¥

Ç Ç

 c

 |

V where (3.48)

" T„ý

| "

c

c

¥   ¢  

Ç Ç

c

! !

The choice  minimises for all choices of .





ý ý ‚

The search directions are updated using the residual,

c c c



¥ ©

Ç

c



 

V (3.49)

‚

% V

where the choice

c c

Ç Ç

¥ ©

c

"

c c 

 (3.50)

Ç Ç

V V

"

c



c c c



¢

Ç Ç

  V

ensures that and V , or and are orthogonal.

The pseudo code for the preconditioned conjugate gradient method is given in Figure 3.16, the pre-

c



c



¥

ò

Ç

ï ï # V

conditioning being the “solve V ” operation. If is set to the then for

¥ ò each iteration Ç and the algorithm simplifies to its unpreconditioned form. The preconditioned form of the solver requires àµ8 words of storage (not including that required by the preconditioner), whilst the unpreconditioned form requires [µ8 words. CHAPTER 3. LINEAR SOLVERS 69

set £Uœ

œ

œ®¥¤§ ¢¨£

Ç

ž¥

T

Q

 ÕiŠ¢D£¤ ¢D£¤

Ä Ä

while and ¥

c



c



¥

ò

Ç

ï

V

solve V

c



¥

ò

þ Ç$

c



V&% V

ž¥

V " Å

if ‚



¥ œ

ò 

else ‚

¥('



c



ó‰

'



c c c



V

¥

ò

óŠŽ

c



 

V

c

c

¥-¢

% V



| )



'

¥



c

ó‰

!



 

c c c



ý

£ ¥¨£

c



V

c c c



¥

%*ý

Ç Ç



c

V

c

Î

È È

¥

T7ý

Ç

Ä

ˆ¥-

Q %

Figure 3.16: The preconditioned Conjugate Gradient algorithm.

The Conjugate Gradient method works only for symmetric positive definite systems. Thus it cannot be used to solve the general transport equation, but can be used for diffusion equations or the pressure correction equation of the SIMPLE coupling schemes described in Section 4.2.1.

Since it requires a symmetric system, any preconditioner used with the solver must preserve this symmetry. Thus Jacobi’s method, Red Black SOR and Symmetric-SOR can be used as precondition- ers, but not SOR. The incomplete LU solvers make excellent preconditioners, with the incomplete Cholesky method commonly being used in early implementations of the solver. This combination of solver and preconditioner is referred to as the ICCG (Incomplete Cholesky-Conjugate Gradient) method.

Generalised Minimal Residual (GMRES)

The Generalised Minimal Residual method is designed to solve non-symmetric linear system, using a sequence of orthogonal vectors. Thus unlike the conjugate gradient method it can be applied to the transport equation, but it does so at the cost of requiring the storage and use of the sequence of solution vectors.

The GMRES iterates are constructed as the series

c c



œ

£ ¥¨£ ©

O O

c

+* *

 (3.51)

% %¨"]"^".%

where the O coefficients have been chosen to minimise the residual norm. The number of operations

c £ in the calculation of the iterate thus increases linearly with the number of iterations, as does the

storage used. To place an upper limit on the storage required by the scheme, the solver is commonly

-,/.10/2 ,/2 8 -,/.10/2 ,/2gI8 £

implemented with a restart after £ iterations, limiting the memory usage to

,/.1032 ,32

 £ à

K %

words of storage. % % CHAPTER 3. LINEAR SOLVERS 70

The algorithm for the restarted GMRES solver is given in Figure 3.17. It is taken from the method suggested by Saad and Schultz[145].

set £Uœ

ž¥

Q

 Õi

¢D£¤ ¢D£<¤

Ä Ä

while and ¥

¥¨§ ¢¤£

Ç



È È

ž¥ ¥

T

Ç

Ä q

if ( Q )



¥

Ç *

solve ï

 

È È

ž¥ ¥

Ä

*

q

Q d

if ( ) 4

¥

5

Å



È È

¥

 *

q

  

È È

¥

ù

* * *

q

d

,/.1032 ,32

¥ ©CZ  £ Ÿ ¢D£<¤

Ä Ä Q

for while

¥-¢

"]"]"

Ç

*





¥

Ç

* ï

solve ^`

¥ ©6Z

Q

for b

 

¥

"]"^"

 * *

]`

  

¥

¦   "

* *  *

]` ]`



È È

¥

T¦  

 *

q

]`

 

¥

¦\]`  

 * *

d

‚

^` ^`

¥ ©6Z

¦ ]`  

Q Q

for b

¥

"]"^" T

_

    

‚

¥

ý ¦   % ¦ `  

_

    

¥

T ¦ % ¦

  `  



¥

¦ ý

 

 

ñ

¦

`  

¥

q q



d

¦ ¦ %¦

  

ñ

  ^`  

¥

q q

_





d

¦ %6¦ ¦

 ]`  

ñ

  ^`  

¥

q q



¦\  ¦ %¦

¥

  ^`  



Å

¥

¦\]`  

_



¥

ùn]` T  ù¾

¥

ùn #‘ùn

Ä



Ê Ê

4

ˆ¥-

ù¾^`

Q

¥ ©

%

Q Q

for b

¥

T "^"]"

O

   

d

P⥠© Z

ù ¦



Q Q b

for b

¥

T T "^"]"

O

 

¥ ©

ù ù T ¦

ç ç ç«

Q Q

for b



£¦¥¨£

T "^"]"

O

*

 %

Figure 3.17: The Preconditioned restarted GMRES Method. CHAPTER 3. LINEAR SOLVERS 71

Bi-Conjugate Gradient Stabilised (BiCGSTAB)

The Bi-Conjugate Gradient Stabilised method was developed by Van der Vorst[174] from the CGS, BiCG and GMRES methods, in order to solve non symmetric systems of equations whilst avoiding the often highly irregular convergence patterns of CGS and BiCG, and the large storage requirements

of GMRES. œ

set £

œ

œ®¥¤§ ¢¤£

Ç

œ

7 ¥

T

Ç Ç

§¥

Q

 Õ¡Š¢D£<¤ ¢D£¤

Ä Ä

while and ¥

c



¥87

þ Ç Ç

c



V

¥ ¥

V "

þ

c c

  Å

if Å or method fails

ˆ¥

Ö

V V Q

if ‚



œ

¥

Ç 

else ‚

'

¥

 

c



ó‰:9 ó‰

'

 

c c c c

>=  

V

¥

óŠŽ; ó‰

Ç

c c

©<  *

 

V V V

¥

% V T)Ö V

@? 

solve ï

c

¥-¢

| B

*

 ?

'

¥



c

A



ó‰





c c



¥

ý

5 Ç

c*

V

c

È ÈnÎ

¥

T„ý

5

Ä

¢D£¤

Ä Ä

if ¥

c c



£ ¥¤£

c

 ? V

stop %*ý

¥

5 5

ïC? G

solve |

¥-¢

D

5

|

?

¥FE



c

E E



c c c



£ ¥¨£

Ö

5

c c

 ?

V

c

¥

D

%'ý %7Ö

Ç 5

c

c

È ÈoÎ

¥

T)Ö

Ç

Ä

ž¥-

Q %

Figure 3.18: The preconditioned Bi-Conjugate Gradient Stabilised algorithm.

The method can be applied to non-symmetric systems, and is quite robust and efficient. Like the con- 8

jugate gradient method it only stores a limited number of vectors at any iteration, requiring ø words 8

of memory ( ÷ in its unpreconditioned form), and so unlike the GMRES solver doesn’t increase its per-iteration memory use and operation count with increasing number of iterations.

Other Krylov Space Methods

Other Krylov space solvers implemented and tested in this study were the Steepest Descent (SD), the Bi-Conjugate Gradient (BiCG), the Conjugate Gradient Squared (CGS), and the Quasi-Minimal Residual (QMR) methods.

The Steepest Descent scheme (SD) is an optimisation method—by minimising the residuals of the linear equations it arrives at the equations solution[154]. For the current iteration the solver corrects the solution in the direction of the steepest downward gradient. It is simple but inefficient. CHAPTER 3. LINEAR SOLVERS 72

The Bi-Conjugate Gradient (BiCG)[44] and Conjugate Gradient Squared (CGS)[159] methods were developed to free the CG solver from its limitation of only being applicable to symmetric systems.

For the BiCG method the orthogonal sequence used in the CG method is replaced by two mutually

¢ orthogonal sequences, one based on the system ¢ and the other on its transpose . The CGS solver

is a modification of the BiCG solver that applies the updating operations for the ¢ sequence and the ¢ sequence to both vectors. Ideally this would double the convergence rate, but in practice convergence is very irregular.

The Quasi-Minimal Residual (QMR)[46] method applies a least squares solve and update to the BiCG residuals, smoothing out the convergence of the method and preventing the breakdowns that can occur with BiCG.

None of these methods was found to be superior to the methods discussed above, and so for the sake of brevity detailed descriptions of the methods and data of their behaviour have been omitted. For further information on these methods the reader is directed to Barrett et al[7] and Golub and Van Loan[51].

Preconditioners

The rate of convergence of the Krylov space methods is often improved by the use of a preconditioner.

This is expressed in the algorithms above as a step of the form,

¥ ©

ò Ç

ï (3.52)

ò Ç

where is a current residual field, is the preconditioned residual, and ï is a matrix having similar

¢ ¢

properties to . If ï is identical to , then preconditioning entails solving the set of equations, which

is what we were using the Krylov space method to do in the first place. However if ï only resembles ¢ , then the convergence of the Krylov space method will still be improved by the preconditioning.

Convergance of MSI, CG and CG-MSI Solvers

1 Unpreconditioned CG MSI MSI Preconditioned CG

0.1

0.01

0.001 Residual

0.0001

1e-05

1e-06 0 2 4 6 8 10 12 Time (Seconds)

Figure 3.19: Comparison of the convergence of the MSI, the unpreconditioned CG, and the MSI preconditioned CG solvers.

The convergence of an unpreconditioned CG solver, and a MSI solver together with a MSI precondi- tioned CG solver are shown in Figure 3.19. The conjugate gradient solver converges slowly at first, with the convergence rate increasing after 10 seconds of solution time. In contrast, the incomplete CHAPTER 3. LINEAR SOLVERS 73

factorisation method has an initially fast rate of convergence that slows in the later iterations. Com- bining the two methods by using the MSI as a preconditioner for the CG scheme results in a method that is faster than either the CG or MSI solvers.

Among the easiest preconditioning methods to implement are the Jacobi and Symmetric SOR algo- rithms. More complex methods such as incomplete Cholesky decomposition, Incomplete LU meth- ods, and the multigrid schemes give faster convergence, at the expense of greater memory usage and a more complex implementation.

3.1.7 Multigrid Methods

Whilst the simple iterative and incomplete factorisation solvers are fast for small mesh sizes, their rate of convergence decreases with an increasing number of equations. This is partly due to the increased number of operations with the increased number of mesh points, but mostly due to the fact that these methods are much faster at smoothing the small wavelength components than the long wavelength components of the error. One way to improve their efficiency is the multigrid technique, whereby a PDE is solved on a series of meshes with a varying number of mesh points. The seminal paper for the method is that of Brandt[16], whilst Briggs[18] gives an excellent tutorial on the technique.

For the multigrid method discussed here, a set of equations is given for a PDE discretised on a fine mesh. The multigrid scheme then transforms the equations onto a series of progressively coarser meshes, solving the equations fully on the coarsest mesh. The solution is then solved on the series of successively finer meshes, using the solution from the previous (coarser) mesh as an initial estimate of the solution, finally solving on the finest mesh. By transferring from a fine to a coarse mesh the medium wavelength errors in the fine mesh solution are transformed into short wavelength errors on the coarse mesh which are much easier to smooth. The computation cost for solving on the coarse meshes is low, and the cost for solving on the finer meshes is reduced by using the coarse mesh solution as an initial estimate.

The three basic operators for the multigrid technique are the smoother, which improves the current estimate of the solution on a given mesh, and the restriction and prolongation operators, which map a set of equations and a solution between a fine and a coarse mesh.

Given two meshes, with the fine mesh having a mesh spacing of , and the coarse mesh having a Z spacing of . The restriction operator maps the fine mesh solution¦ onto the coarse mesh, and is

written ¦

q

£ ¥IHi£ ©   (3.53) whilst the prolongation operator performs the inverse operation, interpolating a coarse mesh solution

onto a fine mesh,

q

£ ¥IJ„£ 

 (3.54) "

The basic algorithm for a multigrid scheme is given in Figure 3.20 For each iteration the current solution is smoothed using some smoothing operator, typically a few iterations of one of the simple iterative or incomplete factorisation methods. The residual for the current solution is calculated and then restricted onto the coarse mesh and solved. The solution from the coarse mesh system is then interpolated onto the finer mesh using the prolongation operator, and is added to the previous iterations solution.

For all but the coarsest mesh the multigrid solver is recursively applied to solve the restricted system of equations. On the coarsest mesh the restricted system is solved either by an iterative method, solving the equations to full convergence, or by the use of a direct method.

On the finest mesh the infinity norm of the residual is taken and compared to a supplied tolerance. If the norm of the residual is less than the tolerance then the solver is taken to have converged and the

CHAPTER 3. LINEAR SOLVERS 74

§¥ Q

while  Õ¡Š¢D£<¤

c ¥¨§

smooth ¢¨£

c c

¥¨§ ¢¨£ Ç

if on finestT mesh

c c

È ÈoÎ

¥

Ç

Ä

c

¢D£¤

Ä Ä

if ¥ exit

c

q ¥IH

Ç Ç 

if on coarsest mesh

¢aq q ¥ q

Ç

N

  solve 

else

¢aq q ¥ q

Ç

N

 

apply multigrid to 

q

¥IJ

N N 

c c



£ ¥¨£

N

`

ž¥-

%

Q %

Figure 3.20: The multigrid algorithm.

process terminates. Otherwise the solver is applied repeatedly to the system. c

For the algorithm given in Figure 3.20 the superscript refers to the iteration number, whilst the q

superscript  specifies that the variable applies to the coarse mesh.

Prolongation and Restriction Operators

For the solution of the equations resulting from a finite volume discretisation of a PDE two types of prolongation and restriction schemes can be used – either the equations can be rederived on each mesh, or some form of operation can be used to generate the coarse mesh equations from the fine mesh system.

For the first type of prolongation and restriction scheme, where the equations are rederived and the boundary conditions re-applied on each mesh, the solver is necessarily closely tied to the discretisation of the PDE. The resulting code is not very general, and for complicated differencing schemes the calculation of the equations can be slow2. For such a scheme the value of the solution at each point can be transfered to the corresponding point on the coarser/finer mesh, an operation called straight injection. For a finite volume solver this carries some extra overhead in that the cell centres of two meshes don’t align, and instead of a simple injection process the solution must be averaged over the cells.

For a black box solver that is not directly coupled to the discretisation process the prolongation and restriction operators must be derived from the fine mesh equations rather than re-discretised from

the underlying equations. A method that’ applies to the solution of PDEs is developed below. The

Z [ ’

method’ is applied to meshes that have nodes along each axis including boundary nodes (ie:

Z Z [

% Q

internal points on each axis). However it can be used for systems with K nodes on

% % Q each axis, where K is the number of points along the axis on the coarsest mesh. The method can be used for the equations% arising from both finite volume and finite difference differencing, and straight injection can be used for transferring fields between different meshes without the problems of averaging solutions as can the case with the finite volume schemes. In addition the boundary 2Moreover, for some equations such as the pressure correction equation in SIMPLE coupling schemes, the variables are defined only upon the mesh they were created on. CHAPTER 3. LINEAR SOLVERS 75

conditions are automatically applied in the equations restriction operation.

For a simple system resulting from the finite volume or finite difference discretisation of a one dimen-

sional PDE, the equations are (using the natural ordering),

Rw Rw Rw



 

w w w

w w w

$ (

‰ ‰



X{z X{z X{z

w w w

q q

z z z

S S S

¥

+ $ (

  

Ž Ž Ž



z z z s

s (3.55)

z z z

 

+ L $ML (NL

  

"



Y Y Y

+ O $PO (QO

  



™ ™

+©R $MR

 

Í  For the above system the and ™ nodes are interior boundary points (ie: the first row of points on the interior of a solution domain), and the boundary conditions have been applied in a form that removes the need for the points physically located on the boundary (see Section 2.3).

For an elliptic PDE the solution must be reasonably smooth, and so the solution values at the even numbered points in the mesh can be estimated from a second order centred interpolation from the odd

numbered points,



¥ I ©

K

 U 

q s

q



%



I ¥

K (3.56)

  

s ™

q

% " By substituting these equations into the initial system, a system with only half the equations of the

original system is generated,

R R R

S S S

Z Z

Í 

Z Z

¥ ©

( ( $

 %* 

‰ ‰ ‰



X X X

Y Y Y s

s (3.57)

Z Z

L L L L L

,+ $ %* + %'( (



™ ™

R R R

,+ $ %*,+

which can be rewritten

R R R

S S S

q q

@q

Í



 



q q q q

¥

©

 

$ (

‰ ‰



X X X

Y Y Y

s

s

  

 (3.58)

q q @q

  

+ L $ML (NL



™

™

  

 

R R

$ +

where

q ¥ Z ©



q © ¥

 $ %'( %* +

Žšó‰ Ž «ó‰ Žšó‰

$





q © ¥

(

  Ž «ó‰

( (3.59)





oq ¥ Zô

+

 

Žšó‰

+





q



"

V  For these equations, restricting the field from the one mesh to the next coarser mesh can be accom-

plished using simple injection,

q

¥

   q

 (3.60)

V " 

The corresponding prolongation operator is

q

¥ © ¥ ©C[,©

 

 Q

(3.61)

 ‰

 "]"^"

` 

Ž Ž

for the odd numbered points, whilst the even numbered points are found by linear interpolation

¥ÝQ I © ¥!Z ©à

K

  

 Z

(3.62)

 

%  "]"^"

`

Ž Ž

For a two dimensional system the equation restriction operator becomes

¥ Z q ©





q

¥ ©

$ ( +

  %* %* %* %' /

Žš󁉑‹ ŽšéŒó‰ Žš󁉑‹ ŽšéŒó‰ Žš󁉑‹ ŽšéŒó‰ Ž «ó‰Œ‹ Žé#ó‰ Ž «ó‰Œ‹ Žé#ó‰

$

«‹ é



¥ q ©

(

 

Ž «ó‰Œ‹ ŽšéŒó‰

(

«‹ é



¥ © q

+

 

Žš󁉑‹ ŽšéŒó‰

+ é

«‹ (3.63)







q ¥ ©

 

Žš󁉑‹ ŽšéŒó‰

‹ é



q

¥ Zô ©

  /

Ž «ó‰Œ‹ Žé#ó‰

/

«‹ é

  



q q



V  V  CHAPTER 3. LINEAR SOLVERS 76

the field restriction is

q

¥ ©

    



q q

 (3.64)

V  V 

and the prolongation operator becomes

q

© ¥ ©C[,© © ¥ ©C[,© ¥

 





Q Q b

(3.65)

é

‰ ‰ 

"^"]" "]"^"



`  `

Ž Ž Ž Ž with the remaining points being found by bilinear interpolation.

Similarly in three dimensions the equation restriction operators are

q ¥ Z



  



+ ( $

%ß %J  

Ž «ó‰Œ‹ Žé#󁉑‹ Ž ó‰ Ž «ó‰Œ‹ ŽéŒó‰Œ‹ Ž ó‰ Ž «ó‰Œ‹ Žé#󁉑‹ Ž ó‰

$

«‹ é ‹



 

©

%J %ß /

Ž «ó‰Œ‹ ŽšéŒó‰Œ‹ Ž ó‰ Ž «ó‰Œ‹ ŽéŒó‰Œ‹ Ž ó‰

 

© q ¥

%J %ß2

Ž «ó‰Œ‹ ŽéŒó‰Œ‹ Ž ó‰ Ž «ó‰Œ‹ ŽšéŒó‰Œ‹ Ž ó‰







© q ¥

(

 

Žš󁉑‹ ŽšéŒó‰‘‹ Ž ó‰

(

«‹ é ‹







q ¥ ©

+

 

Ž «ó‰Œ‹ ŽéŒó‰Œ‹ Ž ó‰

+

«‹ é ‹

 (3.66)









q ¥ ©

 

Ž «ó‰Œ‹ Žé#󁉑‹ Ž ó‰

‹ é ‹







q ¥ ©

  /

Ž «ó‰Œ‹ ŽšéŒó‰Œ‹ Ž ó‰

/

«‹ é ‹







q ¥ ©

 

Žš󁉑‹ ŽšéŒó‰‘‹ Ž ó‰

«‹ é ‹







q

¥ Z ©

 2

Žš󁉑‹ ŽšéŒó‰Œ‹ Ž ó‰

2

«‹ é ‹

c

   

c q q q





V  V  V

 

the field restriction is

q

© ¥

  c

   

c



q q q

 (3.67)

V  V  V

 

and the prolongation becomes

q

¥ © ¥ ©[\© © ¥ ©[\© ©6ˆ¥ ©C[,©

c

  



Q Q Q

b

 (3.68)

é

‰ ‰ ‰ 

  "^"]" "^"]" "]"^"

`  `  `

Ž Ž Ž Ž Ž Ž with the remaining points found by trilinear interpolation.

Smoothing and Solving

The smoother for the multigrid method is typically a simple iterative scheme such as SOR, Gauss Seidel or Jacobi iteration, or an incomplete factorisation method such as SIP or MSI. At each recursive call of the multigrid solver the solution is smoothed by applying several iterations of the smoother to the solution. The smoother doesn’t have to be applied to convergence since the aim is not to solve the equations but rather to smooth the solution on the current mesh.

The solve operation in Figure 3.20 varies depending on what level of the mesh hierarchy the solver is

on–for all but the coarsest mesh the solution to

q q q

¢ £ ¥¤§ ©

   (3.69) is found by recursing and applying the multigrid solver to the system. At the coarsest mesh however the system is solved either by an iterative technique applied to convergence, or by using a direct method. Since this system is for the coarsest mesh the computational cost of its solution is minimal.

The test for overall convergence of the scheme is performed on the finest mesh, where the norm of the residual is calculated and compared with the user supplied solution tolerance. Once the norm reduces below the specified error bound the solver is assumed to have converged and the process is terminated. CHAPTER 3. LINEAR SOLVERS 77

3.2 A Comparison of the Solvers

The linear solvers were compared in terms of their speed, and memory usage. When comparing the speeds of the solvers several factors come into play. For direct solvers the number of operations is fixed for a given number of equations, but for iterative methods the time taken to converge to a solution depends not only on the number of equations but also on the properties of the equations themselves (such as the boundary conditions and diagonal dominance), and the convergence criteria and tolerance chosen.

The number of equations to be solved, the layout of the data in memory, and implementation details such as the syntax used to perform a matrix-vector operation also have a big effect on speed. These effects are discussed in the following chapter, but it is important to note that comparisons of different codes should be made for a range of array sizes and with a consistent coding style to reduce variability due to these factors.

In the following sections the test case used to compare the solvers is described, and then the solvers are compared on the basis of their convergence characteristics and scaling.

3.2.1 The Solver Test Case

To compare the speeds of the linear solvers they were used to solve two finite volume problems, one with Dirichlet and the other with Neumann boundary conditions, which simulate the equations encountered in a finite volume CFD code. The test cases were run for both two and three dimensional problems, and were solved to full convergence.

The test case was a finite volume discretisation of the Laplace equation applied to a unit square or cubic domain, with a sinusoidally varying source term. For the three dimensional case the equations

were, S

qUT

VW IšZ-X VW IšZ-X&Z © ¥IVW IYX

K K K

O



A A A (3.70)

with Dirichlet boundary conditions, and

S

T

q

V:W IšZ-X V:W IšZ-X&Z © ¥[VW I ZX

K K K

O



A A A (3.71)

with Neumann boundary conditions. For Neumann boundaries a zero normal gradient was applied to

T

¥ ¥ ¥

 

Q Q

all boundaries, and for the Dirichlet problem the Å and boundaries were set to and

T T

¥ ¥

T Å Q respectively, with all other boundaries set to . With the Neumann problem, the solution

is not unique, and a further condition of

T ¥

Å (3.72) was imposed at the centre of the solution domain to ensure uniqueness. For the two dimensional test cases the Z forcing component of the source term was dropped. The solutions to the two dimensional forms of the test functions are shown in Figure 3.213.

For each equation two runs were made. The first compared the convergence of the iterative methods

ZP\-]µq s _^ at one mesh size–a mesh in two dimensions, and ÷ mesh in three. The other run was made

for a range of mesh sizes, with results obtained for the time to reduce the maximum residual by Q¾Å

a factor of Æ , the solution then being considered fully converged. The direct methods were also

T ¥

timed for the same range of array sizes. All runs were made from an initial field of Å , and ¥

the relaxation parameters for the simple iterative and incomplete LU solvers were set to Q and

¥ ] Ö

Å respectively.

ý " The runs were made on a DEC Alpha 500au workstation running Digital Unix 4.0E, using Fortran 90 code compiled with the Digital Fortran compiler and using double precision storage for the floating 3The Neumann boundary problem is an approximation of the pressure correction equation from a cavity flow problem, whilst the Dirichlet boundary problem approximates a scalar field from such a flow CHAPTER 3. LINEAR SOLVERS 78

point data. Timings were made using the C getrusage and gettimeofday functions which  provide accuracy to Q0d QnŊŵŠof a second, with multiple runs being made at the small array sizes to ensure an accurate resolution of the runtime.

One failing of the test problem (and one which the Author has not had time to remedy) is the square or cubic dimensions of the solution domains, and the unit aspect ratio of the mesh cells. Iterative methods can stall when the mesh is distorted from such a regular topology[43], an analysis of the mechanisms behind the failing being given by Brandt[17]. Unfortunately, timer prevented a thorough investigation of this phenomena, although Ferziger and Peric[43´ ] claim it’s effect is less pronounced on convection–diffusion problems. CHAPTER 3. LINEAR SOLVERS 79

1

0.5

0

-0.5

-1

1.5

1

0.5

0

-0.5

-1

-1.5

Figure 3.21: The 2D solutions to equation (3.70) (top) and (3.71) (bottom). CHAPTER 3. LINEAR SOLVERS 80

3.2.2 Convergence of the Solvers

ZP\-]µq s _^ To compare the convergence rates of the iterative solvers, runs were made using and ÷ meshes for both the Neumann and Dirichlet boundary condition problems, with the residual at each iteration being printed out for plotting, the abscissa being plotted as solution time. Other studies of iterative solver have either studied only one or two classes of solver (such as Briggs[18], Yu[183]), or have made comparisons in terms of number of iterations (Ferziger and Peric[43´ ], Iserles[68], Barrett et al[7], and Kershaw[77]). The speed of each iteration can vary widely between different solvers, and so comparisons in terms of iterations alone is rather meaningless. One study by Botta et al[14] makes a comparison of a number of different methods in terms of computation speed, but the rate of convergence is not given, and the coding of different solvers by different groups allows for a wide range of coding styles which may affect the relative solver speeds.

In general the iterative methods took longer to solve the problems with Neumann boundary conditions than those with Dirichlet boundaries. The system from the Neumann boundary problem is singular unless the solution is specified at some point in the solution domain. Most methods performed more efficiently if the system was left singular, with the exception of the direct Cholesky factorisation method which would fail with the singular system.

Figures 3.22 and 3.23 show the convergence of the solvers for the 2D test cases with Dirichlet and Neumann boundary conditions respectively. Figures 3.24 and 3.25 show the convergence for the 3D Dirichlet and Neumann test runs.

The Simple Iterative Methods

The Jacobi and RBSOR methods were typically the slowest to converge for the Dirichlet boundary problems (Figures 3.22 and 3.24), convergence patterns that are typical in the literature[43]. For large system SOR and SSOR had similar speeds to each other and were the fastest of the simple iterative schemes. All solvers exhibited a smooth monotonic reduction in the residual. It should be noted that for two dimensional problems the Block Tridiagonal direct method is faster to solve than these simple iterative schemes.

The Neumann problems were slower to solve than those with Dirichlet boundary conditions. For the Dirichlet boundary conditions the convergence exhibits an initial sharp drop in the maximum residual, followed by a much longer period where the relative residual drops by a constant ratio with each iteration. For the Neumann boundary conditions this initial sharp drop was not observed, and it was thought to be due to the rapid smoothing of the short wavelength structures adjacent to the boundaries in the Dirichlet problem.

Incomplete Factorisation Methods

The more complicated incomplete factorisation solvers tended to converge faster than their simpler counterparts, with MSI being the fastest to converge, and the incomplete Cholesky and incomplete LU methods being slowest. The ILU solver gave a similar convergence rate to the fastest of the simple iterative methods, whilst the MSI and SIP solvers give a much improved rate of convergence. All solvers exhibited a smooth monotonic reduction in residual.

Multigrid Schemes

For the Dirichlet problem, the multigrid solvers were not as fast as the simple and incomplete fac- torisation methods at reducing the residual for the first two of orders of magnitude. However they CHAPTER 3. LINEAR SOLVERS 81

exhibited vastly superior performance in generating a fully converged solution, being an order of magnitude faster than the best of the incomplete factorisation methods, and two orders of magni- tude better than the simple iterative schemes. They exhibited a linear reduction in the residual. This behaviour is generally agreed upon in the literature; see [16], [55] and [49] for example.

Typically the SIP smoothed multigrid was fastest, being faster than the MSI smoother version, but in many 2D cases the SSOR smoothed multigrid had similar speed to the SIP smoothed scheme.

2D/Dirichlet: Convergance of Simple, Incomplete LU and Multigrid Solvers 2D/Dirichlet: Convergance of Simple, Incomplete LU and Multigrid Solvers 1 1 Jacobi Jacobi SOR SOR SSOR SSOR 0.1 RBSOR 0.1 RBSOR ILU ILU SIP SIP MSI MSI 0.01 MG-Jacobi 0.01 MG-Jacobi MG-SSOR MG-SSOR MG-ILU MG-ILU MG-SIP MG-SIP 0.001 MG-MSI 0.001 MG-MSI Residual Residual

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 5 10 15 0 50 100 150 Time Time

2D/Dirichlet: Convergance of CG Solver 2D/Dirichlet: Convergance of BiCGSTAB Solver 1 1 CG BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi CG-SSOR BiCGSTAB-SSOR 0.1 CG-ILU 0.1 BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP CG-MSI BiCGSTAB-MSI CG-MG-Jacobi BiCGSTAB-MG-Jacobi 0.01 CG-MG-ILU 0.01 BiCGSTAB-MG-SSOR CG-MG-SIP BiCGSTAB-MG-ILU

BiCGSTAB-MG-SIP ` ` 0.001 0.001 Residual Residual

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 5 10 15 0 5 10 15 Time Time

2D/Dirichlet: Convergance of GMRES Solver 2D/Dirichlet: Convergance of Iterative Solvers 1 1 GMRES Jacobi GMRES-Jacobi MSI GMRES-SSOR MG-Jacobi 0.1 GMRES-ILU 0.1 MG-SIP GMRES-SIP CG GMRES-MSI CG-Jacobi GMRES-MG-Jacobi CG-MSI 0.01 GMRES-MG-SSOR 0.01 CG-MG-SIP GMRES-MG-ILU BiCGSTAB GMRES-MG-SIP BiCGSTAB-Jacobi BiCGSTAB-MSI 0.001 0.001 BiCGSTAB-MG-SIP

Residual Residual GMRES-MSI GMRES-MG-SIP

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 15 30 45 0 5 10 15

Time Time

P\-]µq Figure 3.22: Convergence with time of 2D solvers on a Z mesh with Dirichlet boundary conditions. Note the scale of the x axis varies from graph to graph.

Krylov Space Methods

The Krylov space methods all exhibited an irregular convergence pattern for both test problems, with a non-monotonic reduction in the residual.

¡ CG: The unpreconditioned CG solver showed an initial period of approximately linear conver- gence, after which the rate of convergence increased. CHAPTER 3. LINEAR SOLVERS 82

2D/Neumann: Convergance of Simple, Incomplete LU and Multigrid Solvers 2D/Neumann: Convergance of Simple, Incomplete LU and Multigrid Solvers 1 1

0.1 0.1

0.01 0.01 Jacobi SOR SSOR 0.001 0.001 Jacobi RBSOR Residual SOR Residual ILU SSOR SIP MSI 0.0001 RBSOR 0.0001 ILU MG-Jacobi SIP MG-SSOR MSI MG-ILU 1e-05 MG-Jacobi 1e-05 MG-SIP MG-SSOR MG-ILU MG-SIP 1e-06 1e-06 0 5 10 15 20 25 30 0 50 100 150 200 250 Time Time

2D/Neumann: Convergance of CG Solvers 2D/Neumann: Convergance of BiCGSTAB Solvers 100 100 CG BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi 10 CG-SSOR 10 BiCGSTAB-SSOR CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP 1 CG-MSI 1 BiCGSTAB-MSI CG-MG-Jacobi BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-SSOR 0.1 CG-MG-SIP 0.1 BiCGSTAB-MG-ILU BiCGSTAB-MG-SIP

0.01 0.01 Residual Residual

0.001 0.001

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Time Time

2D/Neumann: Convergance of GMRES Solvers 2D/Neumann: Convergance of Iterative Solvers 1 100 Jacobi MSI 10 MG-Jacobi 0.1 MG-SIP CG 1 CG-Jacobi CG-MSI 0.01 CG-MG-SIP 0.1 BiCGSTAB BiCGSTAB-Jacobi

BiCGSTAB-MSI ` ` 0.001 0.01 BiCGSTAB-MG-SIP

Residual GMRES Residual GMRES-MSI GMRES-MG-SIP GMRES-Jacobi 0.001 0.0001 GMRES-SSOR GMRES-ILU GMRES-SIP 0.0001 GMRES-MSI 1e-05 GMRES-MG-Jacobi GMRES-MG-SSOR 1e-05 GMRES-MG-ILU GMRES-MG-SIP 1e-06 1e-06 0 50 100 150 200 250 0 5 10 15 20 25 30

Time Time

q P\P] Figure 3.23: Convergence with time of 2D solvers on a Z mesh with Neumann boundary condi- tions. Note the scale of the x axis varies from graph to graph.

Preconditioning improved the smoothness of the convergence of the CG solver, with the incom- plete factorisation smoothed multigrid preconditioners forcing a smooth linear reduction in the residual. The Jacobi smoothed multigrid preconditioner however gave a very poor performance for the Neumann boundary problem. The fastest solvers in all cases were the multigrid-SIP and multigrid-ILU preconditioned meth- ods, both giving very similar rates of convergence. For all cases the Jacobi preconditioner reduced the number of iterations required to converge, but increased the overall run time.

¡ BiCGSTAB: The BiCGSTAB solvers typically had a more irregular rate of convergence than their CG counterparts. However experience of their use in a CFD code has proved them to be more robust than CG, not breaking down in cases that break down for a CG scheme. Otherwise they show the same overall convergence behaviour as the CG solvers, with the exception that the Jacobi smoothed multigrid preconditioner converged with a similar speed to the multigrid- incomplete factorisation schemes for the Neumann boundaries, unlike the case for the CG solver CHAPTER 3. LINEAR SOLVERS 83

3D/Dirichlet: Convergance of Simple, Incomplete LU and Multigrid Solvers 3D/Dirichlet: Convergance of Simple, Incomplete LU and Multigrid Solvers 10 10 Jacobi Jacobi SOR SOR SSOR SSOR 1 1 RBSOR RBSOR ILU ILU SIP SIP 0.1 MSI 0.1 MSI MG-Jacobi MG-Jacobi MG-SSOR MG-SSOR 0.01 MG-ILU 0.01 MG-ILU MG-SIP MG-SIP

Residual 0.001 Residual 0.001

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 5 10 15 20 25 30 35 40 0 50 100 150 200 Time Time

3D/Dirichlet: Convergance of CG Solvers 3D/Dirichlet: Convergance of BiCGSTAB Solvers 10 10 CG BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi CG-SSOR BiCGSTAB-SSOR 1 1 CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP CG-MSI BiCGSTAB-MSI 0.1 CG-MG-Jacobi 0.1 BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-SSOR CG-MG-SIP BiCGSTAB-MG-ILU 0.01 0.01 BiCGSTAB-MG-SIP

Residual 0.001 Residual 0.001

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 35 40 Time Time

3D/Dirichlet: Convergance of GMRES Solvers 3D/Dirichlet: Convergance of Iterative Solvers 10 10 GMRES Jacobi GMRES-Jacobi MSI GMRES-SSOR MG-Jacobi 1 1 GMRES-ILU MG-SIP GMRES-SIP CG GMRES-MSI CG-Jacobi 0.1 GMRES-MG-Jacobi 0.1 CG-MSI GMRES-MG-SSOR CG-MG-SIP GMRES-MG-ILU BiCGSTAB 0.01 GMRES-MG-SIP 0.01 BiCGSTAB-Jacobi

BiCGSTAB-MSI ` ` BiCGSTAB-MG-SIP

Residual 0.001 Residual 0.001 GMRES-MSI GMRES-MG-SIP

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 50 100 150 200 0 5 10 15 20 25 30 35 40

Time Time

s _^ Figure 3.24: Convergence of 3D solvers on a ÷ mesh with Dirichlet boundary conditions.

where it gave a much slower rate of convergence.

¡ GMRES: The GMRES solver typically displayed a monotonic convergence. The heavily pre- conditioned versions with multigrid or incomplete factorisation preconditioners converged with a similar rate to the CG or BiCGSTAB solvers, but the unpreconditioned solver had a very slow convergence rate, comparable to the simple iterative schemes.

Summary

The incomplete LU and simple iterative schemes all exhibited a smooth monotonic reduction in the residual, with the MSI and SIP schemes being up to an order of magnitude faster than the other meth- ods. The multigrid schemes also exhibited a smooth reduction in residual. Whilst the incomplete LU smoothed multigrid schemes were the fastest methods to converge, the performance of the multigrid solver when using one of the simple iterative schemes as a smoother was still faster than the use of the best of the incomplete LU methods without multigrid acceleration. However, for the initial smoothing CHAPTER 3. LINEAR SOLVERS 84

3D/Neumann: Convergance of Simple, Incomplete LU and Multigrid Solvers 3D/Neumann: Convergance of Simple, Incomplete LU and Multigrid Solvers 1 1 Jacobi SOR SSOR 0.1 0.1 RBSOR ILU SIP MSI 0.01 0.01 MG-Jacobi MG-SSOR MG-ILU MG-SIP 0.001 0.001 Jacobi Residual SOR Residual SSOR 0.0001 RBSOR 0.0001 ILU SIP MSI 1e-05 MG-Jacobi 1e-05 MG-SSOR MG-ILU MG-SIP 1e-06 1e-06 0 10 20 30 40 50 60 70 0 100 200 300 400 500 600 700 Time Time

3D/Neumann: Convergance of CG Solvers 3D/Neumann: Convergance of BiCGSTAB Solvers 100 100 CG BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi 10 CG-SSOR 10 BiCGSTAB-SSOR CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP 1 CG-MSI 1 BiCGSTAB-MSI CG-MG-Jacobi BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-SSOR 0.1 CG-MG-SIP 0.1 BiCGSTAB-MG-ILU BiCGSTAB-MG-SIP

0.01 0.01 Residual Residual

0.001 0.001

0.0001 0.0001

1e-05 1e-05

1e-06 1e-06 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 Time Time

3D/Neumann: Convergance of GMRES Solvers 3D/Neumann: Convergance of Iterative Solvers 1 100 GMRES Jacobi GMRES-Jacobi MSI GMRES-SSOR 10 MG-Jacobi 0.1 GMRES-ILU MG-SIP GMRES-SIP CG GMRES-MSI 1 CG-Jacobi GMRES-MG-Jacobi CG-MSI 0.01 GMRES-MG-SSOR CG-MG-SIP GMRES-MG-ILU 0.1 BiCGSTAB GMRES-MG-SIP BiCGSTAB-Jacobi

BiCGSTAB-MSI ` ` 0.001 0.01 BiCGSTAB-MG-SIP

Residual Residual GMRES-MSI GMRES-MG-SIP 0.001 0.0001

0.0001

1e-05 1e-05

1e-06 1e-06 0 50 100 150 200 250 300 350 400 0 10 20 30 40 50 60 70

Time Time

s a^ Figure 3.25: Convergence of 3D solvers on a ÷ mesh with Neumann boundary conditions. of the Dirichlet boundary problems the incomplete factorisation methods give a faster initial reduction in the residual, behaviour that was thought to be due to their rapid smoothing of the short wavelength components in the residual at the boundary.

The Krylov space methods typically exhibited an irregular convergence, but when coupled with a multigrid preconditioner these solvers gave a smooth regular reduction in the residual. Generally the use of a preconditioner smoothed and speeded the convergence, with the exception of the Jacobi preconditioner, which whilst decreasing the number of iterations taken to converge to the solution, actually increased the overall solution time.

The multigrid-SIP and Krylov-multigrid-incomplete factorisation methods give the best performance, both exhibiting a smooth monotonic reduction in the residual, whilst being faster than the other meth- ods. CHAPTER 3. LINEAR SOLVERS 85

3.2.3 Scaling of the Solvers

The previous section compared the convergence of the iterative solvers for a single mesh size. For a more general comparison of the solvers, the scaling of the speed of the solvers with mesh size is

useful in predicting the performance of the methods for very large and very small problems. QnÅ

The solvers were run for a range of mesh sizes, with a reduction of Æ in the infinity norm of the resid-

Î

È È ual ( Ç ) being used as a convergence tolerance. The solvers were run for the equations resulting from the Dirichlet and Neumann boundary condition problems (see Equations (3.70) and (3.71) and Figure 3.21), with both the 2D and 3D versions of the solvers being tested. The time for solution of the test codes is plotted as a function of mesh size in Figures 3.26 to 3.29, with Figures 3.26 and 3.27 showing the solution time for 2D problems with Dirichlet and Neumann boundary conditions, whilst Figures 3.28 and 3.29 show the solution times for the 3D problems.

Direct Solvers

These solvers run for the same length of time regardless of the boundary conditions – no pivoting was

done so there was a set number and order of operations regardless of the equations being solved. With

I8 K

large systems of equations the time that the dense LU and LDL solvers took to solve had an r

scaling, where 8 was the number of equations. However, from an operation count one expects an

Iš8&s K

r scaling, and no explanation is available for the scaling discrepancy.

I8&q K

For two dimensional problems the banded LU and Cholesky solvers were both r in run time. A

8 ¥ 8 ¥ß[ ŵŊŠjump in the solver runtime is clearly visible at QnŵŊŠand for the 2D banded LU and

Cholesky solvers respectively. This occurs when the solver data exceeded the size of the cache of the Iš8&q 4 K test machine . The block tridiagonal solvers also exhibit r scaling, but run 15 and 4 times faster than the banded LU and Cholesky solvers respectively.

The three dimensional solvers exhibit slightly different behaviour to those solving two dimensional

problems. As might be expected the dense LU and LDL solvers scale identically to the two dimen-

Iš8&s K

sional versions. However the tridiagonal solvers scale like r , as does the banded LU solver.

The direct methods were rarely faster than the iterative schemes. However, for the two dimensional

test problems the block-tridiagonal solver was faster than the simple iterative schemes, and for large

8 Z ŊŊŵŠtwo dimensional systems ( ¥ ) with Neumann boundaries it was faster than the incomplete factorisation schemes.

The Simple Iterative Methods |

For the two dimensional problem the time taken by the simple iterative schemes to solve a system of



› I8&q I8

K K r equations scaled like r , but for the three dimensional test cases they scaled like . For all but the smallest systems the simple iterative methods were the slowest of the iterative methods.

Incomplete Factorisation Methods

For the two dimensional problems the solution time for the incomplete factorisation| solvers scales like

Iš8 K

r for small numbers of equations, but as the numbers of equations increases they started to exhibit



Iš8&q Iš8 ›

K K r

r behaviour. For the three dimensional problem the solvers exhibited behaviour.

8‡Ÿ 8 Ÿ Q¾ÅŠÅŠÅµÅ For small to moderate sized problems ( Q¾ÅŠÅŠÅ for Neumann problems, and for Dirichlet problems) the MSI solver was typically the fastest solver of all the methods tested. 4The effects of cache size are discussed in Chapter 6 CHAPTER 3. LINEAR SOLVERS 86

2D/Dirichlet: Solution Time for Direct Solvers 2D/Dirichlet: Solution Time for Simple, Incomplete LU and Multigrid Solvers 100 100

10 10

1 1

0.1 0.1 Jacobi SOR 0.01 0.01 SSOR Solution Time Solution Time RBSOR ILU 0.001 0.001 SIP LU Decomposition MSI LDL Decomposition MG-Jacobi Banded LU Decomposition MG-SSOR 0.0001 0.0001 Banded Cholesky Decomposition MG-ILU Block Tridiagonal MG-SIP Block Tridiagonal/LDL MG-MSI 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

2D/Dirichlet: Solution Time for CG Solvers 2D/Dirichlet: Solution Time for BiCGSTAB Solvers 100 100

10 10

1 1

0.1 0.1

0.01 0.01 Solution Time CG Solution Time BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi 0.001 CG-SSOR 0.001 BiCGSTAB-SSOR CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP CG-MSI BiCGSTAB-MSI 0.0001 0.0001 CG-MG-Jacobi BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-ILU CG-MG-SIP BiCGSTAB-MG-SIP 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

2D/Dirichlet: Solution Time for GMRES Solvers 2D/Dirichlet: Solution Time for Solvers 100 100

10 10

1 1

0.1 0.1 LDL Decomposition Block Tridiagonal/LDL 0.01 0.01 Jacobi Solution Time GMRES Solution Time MSI GMRES-Jacobi MG-Jacobi 0.001 GMRES-SSOR 0.001 MG-SIP GMRES-ILU CG-MSI GMRES-SIP CG-MG-SIP GMRES-MSI BiCGSTAB-MSI 0.0001 0.0001 GMRES-MG-Jacobi BiCGSTAB-MG-SIP GMRES-MG-ILU GMRES-MSI GMRES-MG-SIP GMRES-MG-SIP 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

Figure 3.26: The time taken to solve a two dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. CHAPTER 3. LINEAR SOLVERS 87

2D/Neumann: Solution Time for Direct Solvers 2D/Neumann: Solution Time for Simple, Incomplete LU and Multigrid Solvers 100 100

10 10

1 1

0.1 0.1

Jacobi 0.01 0.01 SOR Solution Time Solution Time SSOR RBSOR 0.001 0.001 ILU LU Decomposition SIP LDL Decomposition MSI Banded LU Decomposition MG-Jacobi 0.0001 0.0001 Banded Cholesky Decomposition MG-SSOR Block Tridiagonal MG-ILU Block Tridiagonal/LDL MG-SIP 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

2D/Neumann: Solution Time for CG Solvers 2D/Neumann: Solution Time for BiCGSTAB Solvers 100 100

10 10

1 1

0.1 0.1

0.01 0.01 Solution Time CG Solution Time BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi 0.001 CG-SSOR 0.001 BiCGSTAB-SSOR CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP CG-MSI BiCGSTAB-MSI 0.0001 0.0001 CG-MG-Jacobi BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-ILU CG-MG-SIP BiCGSTAB-MG-SIP 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

2D/Neumann: Solution Time for GMRES Solvers 2D/Neumann: Solution Time for Solvers 100 100

10 10

1 1

0.1 0.1 LDL Decomposition Block Tridiagonal/LDL 0.01 0.01 Jacobi Solution Time GMRES Solution Time MSI GMRES-Jacobi MG-Jacobi 0.001 GMRES-SSOR 0.001 MG-SIP GMRES-ILU CG-MSI GMRES-SIP CG-MG-SIP GMRES-MSI BiCGSTAB-MSI 0.0001 0.0001 GMRES-MG-Jacobi BiCGSTAB-MG-SIP GMRES-MG-ILU GMRES-MSI GMRES-MG-SIP GMRES-MG-SIP 1e-05 1e-05 10 100 1000 10000 100000 1e+06 10 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

Figure 3.27: The time taken to solve a two dimensional discretisation of the Laplace equation with Neumann boundary conditions. CHAPTER 3. LINEAR SOLVERS 88

3D/Dirichlet: Solution Time for Direct Solvers 3D/Dirichlet: Solution Time for Simple, Incomplete LU and Multigrid Solvers 1000 100

100

10 10

1 1

0.1 0.1 Jacobi SOR

Time to Solution 0.01 Time to Solution SSOR RBSOR 0.01 ILU 0.001 SIP LU Decomposition MSI LDL Decomposition 0.001 MG-Jacobi 0.0001 Banded LU Decomposition MG-SSOR Block Tridiagonal MG-ILU Block Tridiagonal/LDL MG-SIP 1e-05 0.0001 10 100 1000 10000 100000 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

3D/Dirichlet: Solution Time for CG Solvers 3D/Dirichlet: Solution Time for BiCGSTAB Solvers

100 100

10 10

1 1

0.1 0.1

Time to Solution CG Time to Solution BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi 0.01 0.01 CG-SSOR BiCGSTAB-SSOR CG-ILU BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP 0.001 CG-MSI 0.001 BiCGSTAB-MSI CG-MG-Jacobi BiCGSTAB-MG-Jacobi CG-MG-ILU BiCGSTAB-MG-ILU CG-MG-SIP BiCGSTAB-MG-SIP 0.0001 0.0001 100 1000 10000 100000 1e+06 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

3D/Dirichlet: Solution Time for GMRES Solvers 3D/Dirichlet: Solution Time for Solvers

100 100

10 10

1 1

LDL Decomposition 0.1 0.1 Block Tridiagonal/LDL Jacobi

Time to Solution GMRES Time to Solution MSI GMRES-Jacobi MG-Jacobi 0.01 0.01 GMRES-SSOR MG-SIP GMRES-ILU CG-MSI GMRES-SIP CG-MG-SIP 0.001 GMRES-MSI 0.001 BiCGSTAB-MSI GMRES-MG-Jacobi BiCGSTAB-MG-SIP GMRES-MG-ILU GMRES-MSI GMRES-MG-SIP GMRES-MG-SIP 0.0001 0.0001 100 1000 10000 100000 1e+06 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

Figure 3.28: The time taken to solve a three dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. CHAPTER 3. LINEAR SOLVERS 89

3D/Neumann: Solution Time for Direct Solvers 3D/Neumann: Solution Time for Simple, Incomplete LU and Multigrid Solvers 1000

100 100

10 10

1 1

0.1 Jacobi 0.1 SOR Time to Solution 0.01 Time to Solution SSOR RBSOR ILU 0.001 0.01 SIP LU Decomposition MSI LDL Decomposition MG-Jacobi 0.0001 Banded LU Decomposition MG-SSOR Block Tridiagonal 0.001 MG-ILU Block Tridiagonal/LDL MG-SIP 1e-05 10 100 1000 10000 100000 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

3D/Neumann: Solution Time for CG Solvers 3D/Neumann: Solution Time for BiCGSTAB Solvers

100 100

10 10

1 1

0.1 0.1

Time to Solution CG Time to Solution BiCGSTAB CG-Jacobi BiCGSTAB-Jacobi CG-SSOR BiCGSTAB-SSOR 0.01 CG-ILU 0.01 BiCGSTAB-ILU CG-SIP BiCGSTAB-SIP CG-MSI BiCGSTAB-MSI CG-MG-Jacobi BiCGSTAB-MG-Jacobi 0.001 CG-MG-ILU 0.001 BiCGSTAB-MG-ILU CG-MG-SIP BiCGSTAB-MG-SIP

100 1000 10000 100000 1e+06 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

3D/Neumann: Solution Time for GMRES Solvers 3D/Neumann: Solution Time for Solvers

100 100

10 10

1 1 LDL Decomposition Block Tridiagonal/LDL 0.1 0.1 Jacobi

Time to Solution GMRES Time to Solution MSI GMRES-Jacobi MG-Jacobi GMRES-SSOR MG-SIP 0.01 GMRES-ILU 0.01 CG-MSI GMRES-SIP CG-MG-SIP GMRES-MSI BiCGSTAB-MSI GMRES-MG-Jacobi BiCGSTAB-MG-SIP 0.001 GMRES-MG-ILU 0.001 GMRES-MSI GMRES-MG-SIP GMRES-MG-SIP

100 1000 10000 100000 1e+06 100 1000 10000 100000 1e+06 Number of Equations Number of Equations

Figure 3.29: The time taken to solve a three dimensional discretisation of the Laplace equation with Neumann boundary conditions. CHAPTER 3. LINEAR SOLVERS 90

Multigrid Schemes

| |

For the two dimensional test cases the time taken by the multigrid methods to solve the system| of

  

Iš8 ™ I8 s

K K r

equations scaled like r for the Dirichlet boundary condition problem, and for the

 

I8 ™ K

problem with Neumann boundaries. With the three dimensional test case the scaling was r

for both boundary conditions.

8 Ÿ

For small problems the multigrid scheme are typically the slowest, for QnŵŠbeing slower than

the slowest of the direct methods. However their superior scaling means that for large problem sizes

8 Z ŊŵŊŠ( ¥ ) the multigrid solvers are the fastest of the methods studied.

Krylov Space Methods

The Krylov space solvers are the fastest of the methods for moderate sized problems, with either the

8 ö

preconditioned conjugate gradient or BiCGSTAB solvers being fastest for QnŵŊŵŠwhen coupled with the MSI or SIP smoothed multigrid preconditioners. For larger problems the slightly superior scaling of the multigrid solvers ensures that they become faster. The multigrid-preconditioned Krylov space methods scaled better than methods that used other preconditioners, due to the better scaling of

multigrid schemes as a whole.

| | ¡

CG: For the two dimensional test case| the time taken by the conjugate gradient method to solve

 

I8 s Iš8

K K

r r

| |

the system of equations scaled between and Æ , with the multigrid precondi-



I8 s

K r

tioned | solvers scaling like and the other preconditioners and the unpreconditioned

 

I8 I8 ™

K K

r r

solver scaling Æ . For the Neumann boundaries the scaling was of order and



| |

I8

K r

Æ for the multigrid-preconditioned and other solvers respectively.



 

q

Iš8 I8

K K r

For the three dimensional test cases the scaling was r and for the general and |

multigrid preconditioned versions of the solver respectively. |



 

I8 I8

K K

¡

r r

| |

BiCGSTAB: For the two dimensional tests the scaling was Æ and for the gen-

eral and multigrid-preconditioned codes, and with the three dimensional problems the scaling

 

q

I8 Iš8

K K r was of r and .

¡ GMRES This solver exhibited similar scaling to its CG and BiCGSTAB brethren. Generally it was slower than those two solvers.

Summary \

For small problems (where the number of equations were less than ŵŊŠ) the MSI solver, an incom- plete factorisation method, was typically the fastest method to solve the equations. For moderate sized

problems (where the number of equations was of the order of Q¾ÅŠÅŠÅµÅ ) the CG and BiCGSTAB Krylov space method were fastest when coupled with either the MSI or SIP preconditioners or the multigrid

preconditioners that use MSI or SIP smoothing. For large systems (where the number of equations

[ |

exceeded ŊŊŵŠ) the multigrid solvers were the fastest.

|

 

Iš8 ™ K

The solution times for the multigrid methods scaled like r for large sets of equations, unlike



™ I8 I8&q

K K r the other iterative schemes for which the solution time generally scaled between r to . For preconditioned Krylov space methods the scaling was somewhere between the scaling of the Krylov method and that of it’s preconditioner.

The direct methods were typically the slowest of the methods tested. However it is noteworthy that Z

for two dimensional systems with Neumann boundary conditions and more than ŵŊŵŠequations the block tridiagonal solver was faster than the incomplete factorisation schemes such as MSI. The

CHAPTER 3. LINEAR SOLVERS 91



I8 K

solution time for the direct LU solver exhibited r scaling of the solution time, whilst the block

q s

Iš8 Iš8

K K r tridiagonal solver had r and scaling for the 2D and 3D cases respectively.

3.2.4 Memory Usage

The number of words of memory used by the solvers is shown in Figure 3.30. The storage shown

¢ £ § 8

includes the storage of the equations, solution, and right hand side ( , and ) which are ^ and µ8 ] for the 2D and 3D finite volume discretisations used.

Memory Usage for 2D Linear Solvers Memory Usage for 3D Linear Solvers 1e+09 1e+09

1e+08 1e+08

1e+07 1e+07

1e+06

1e+06 b b 100000

100000 10000 LU Decomposition LU Decomposition

Memory Usage (Words) Block Tridiagonal Memory Usage (Words) Block Tridiagonal Jacobi 10000 Jacobi 1000 MSI MSI Multigrid-Jacobi Multigrid-Jacobi Multigrid-SIP Multigrid-SIP 1000 100 CG CG BiCGSTAB BiCGSTAB GMRES GMRES 10 100 1 10 100 1000 10000 100000 1e+06 1e+07 10 100 1000 10000 100000 1e+06 1e+07 Number of Equations Number of Equations

Figure 3.30: The memory usage in words of the 2D (left) and 3D (right) linear solvers.

The simple iterative schemes such as SOR used the least amount of memory, with SOR, RBSOR and SSOR not requiring any memory above the storage requirements of the equations, and Jacobi requiring only 8 words of memory. The simplest of the incomplete factorisation methods also are efficient in memory use, with the Incomplete Cholesky and ILU methods both only requiring Z8

words of memory above the storage of the equations. However the SIP and MSI solvers require more

8 8 8 àµ8

Q

ø ø memory, using ÷ and words of memory for the 2D versions, and and in 3D.

The Krylov space methods similarly vary in their storage requirements. The unpreconditioned CG

solver is the most efficient requiring [µ8 words of memory above the storage of the equations, with 8

the unpreconditioned BiCGSTAB solver requiring ÷ words of memory. The GMRES solver is

8 8&q I 8 8 K

comparatively greedy in its memory usage, requiring Q r r words of memory, where r % is the recurence length of the solver. The recurence length% should be as large as possible, but the fact

that memory is finite will set an upper limit available. For the data in Figure 3.30 a recurence length \ of Å was used. If a preconditioner is used the above figures must be increased to account for any storage required by the preconditioner.

The memory usage of the multigrid solvers depends both on the number of equations and also upon

the number of grid levels used. With being the number of equations on a particular grid level, the

[ \ ]

ã Q

2D multigrid solver requires , Q and words of memory for that level for the Jacobi/ILU, SIP

Z[

ã ã ã

QµQ Q

and MSI versions of the solver respectivly. For the 3D solver the requirements are , ^ and .

Zq

ã ã For each restriction from a fine to a coarse mesh the number of equations decreases byã a factor of

for the 2D solver and Zs for the 3D solver, so the number of points on all meshes is given by the series

   

I I 8 8

K K

  

 Q

Q for the 2D solver and for the 3D solver. Truncating both of these

% % % % "]"^" "^"]" Æ

series at threeÆ terms gives an approximate estimate of the overall memory requirement for the solvers

8 8 ] 8

QŠQ Q Q

^ ^

as ø , and for the Jacobi/ILU, SIP and MSI versions of the 2D multigrid solver, and

Z \8 ] àµ8 Z Z8

" "

Q Q

, and ÷ for the Jacobi/ILU, SIP and MSI versions of the 3D solver.

" " " Finally the direct methods have a much larger memory footprint than the iterative schemes. The na¨ıve LU decomposition scheme uses 8&q words of memory, whilst the memory usage of the block tridiagonal methods depends on the relative axis lengths of the mesh upon which the finite volume equations have been discretised. For the worst case, a square or cubic mesh, the 2D block tridiagonal

CHAPTER 3. LINEAR SOLVERS 92 8&™6˜s solver uses 8&s6˜Cq words of memory, whilst the 3D solver uses words. If the meshes are much shorter along one axis however the memory usage is reduced.

3.3 Conclusions

A number of linear solvers suitable for the solution of elliptic PDEs have been described, and tested on the equations arising from a two and three dimensional finite volume discretisation of the Laplace equation. Comparisons of the solvers have been made in terms of the speed to solve the systems, and

the memory used. \

For small problems (where the number of equations is less than ŊŊŠ) the MSI (Modified Strongly Implicit) solver, an incomplete factorisation method, is typically the fastest method to solve the equa-

tions. For moderate sized problems (where the number of equations is of the order of Q¾ÅŠÅŠÅµÅ ) the CG (Conjugate Gradient) and BiCGSTAB (Bi-Conjugate Gradient Stabilised) Krylov space methods become fastest when coupled with either the MSI or SIP (Strongly Implicit Proceedure) precondition-

ers or the multigrid preconditioners that use MSI or SIP smoothing.

|

Z ŊŊŵÅ

For large systems (where the number of equations exceeds ) the multigrid| solvers become the

 

Iš8 ™ K

fastest. The solution times for the multigrid methods scale like r for large sets of equations,



I8 ™ Iš8&q

K K r unlike the other iterative schemes for which the solution time scales like r to .

The direct methods were typically the slowest methods tested. However in one case (two dimensional Z

systems with more than ŊŊŵŠequations and Neumann boundary conditions) the block tridiagonal solver was faster than the simple iterative schemes such as SOR (Successive Over Relaxation) and the incomplete factorisation schemes such as MSI.

In terms of memory usage the most efficient solvers are the simple iterative methods such as SOR family of solvers. However none of the iterative methods are especially greedy in terms of memory

use, with the notable exception of the GMRES (General Minimalised Residual) Krylov space solver

Iš8 K

which required the storage of sucessive iterates of the solution, and all required approximatly r

words of memory. The direct methods however use more memory than the iterative methods, with

Iš8&q Iš8&sC˜Cq Iš8&™C˜Cs

K K K

r r their memory requirement scaling as r for LU decomposition and up to or for the 2D and 3D block tridiagonal methods.