TOWARDS A COST-EFFECTIVE ILU PRECONDITIONER WITH
HIGHER LEVEL FILLS
z
y z
E. F. D'AZEVEDO , P. A. FORSYTH AND WEI-PAI TANG
Abstract. A recently prop osed Minimum Discarded Fill (MDF ) ordering (or pivoting) technique
is e ective in nding high quality ILU (`) preconditioners, esp ecially for problems arising from unstruc-
tured nite element grids. This algorithm can identify anisotropy in complicated physical structures
and orders the unknowns in a \preferred" direction. However, the MDF ordering is costly, when `
increases.
In this pap er, several less exp ensive variants of the MDF technique are explored to pro duce cost-
e ective ILU preconditioners. The Incomplete MDF and Threshold MDF orderings combine MDF
ideas with drop tolerance techniques to identify the sparsity pattern in the ILU preconditioners. These
techniques pro duce orderings that encourage fast decay of the entries in the ILU factorization. The
Minimum Up date Matrix (MUM ) ordering technique is a simpli cation of the MDF ordering and is
an analogue of the minimum degree algorithm. The MUM ordering is esp ecially e ective for large
matrices arising from Navier-Stokes problems.
Key Words. minimum discarded ll(MDF ), incomplete MDF , threshold MDF , minimum up-
dating matrix(MUM ), incomplete factorization, matrix ordering, preconditioned conjugate gradient,
high-order ILU factorization.
AMS(MOS) sub ject classi cation. 65F10, 76S05
1. Intro duction. The use of Preconditioned Conjugate Gradient (PCG ) metho ds
has proven to b e a robust and comp etitive solution technique for large sparse matrix
problems [1, 22, 28]. A vital step for the successful application of PCG metho ds is the
computation of a high quality preconditioner. Many previous studies have explored this
topic and their various approaches can b e summarized as follows:
When the incomplete LU (ILU ) preconditioner was rst prop osed it was ob-
served that as the ll level ` in an ILU decomp osition increases, the quality
of the ILU (`) preconditioner improves [1, 19, 22, 25, 29]. Unfortunately, the
resulting reduction in the numb er of iterations cannot comp ensate for the in-
creased cost of the factorization, and forward and backward solve in each it-
eration. The most cost-e ective ll level is ` = 1; 2 in most cases. For some
problems, the high-level ll entries in the ILU decomp osition are numerically
very small and contribute little to the quality of the preconditioner. This ob-
servation leads naturally to the improved technique describ ed b elow.
Based on this observation with the high-level- ll approach, a drop tolerance
technique was investigated by Munksgaard [30] and Zlatev [37]. The strategy
This work was supp orted by the Natural Sciences and Engineering Research Council of Canada,
by the Information Technology Research Centre, which is funded by the Province of Ontario, and by
the Applied Mathematical Sciences subprogram of the Oce of Energy Research, U.S. Department of
Energy under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc., through an
app ointment to the U.S. Department of Energy Postgraduate Research Program administered by Oak
Ridge Asso ciated Universities.
y
Mathematical Sciences Section, Oak Ridge National Lab oratory, Oak Ridge, Tennessee 37831.
z
Department of Computer Science, University of Waterlo o, Waterlo o, Ontario, Canada N2L 3G1. 1
for deciding the sparsity pattern of a preconditioner was to discard all \small"
ll entries that are less than a given tolerance. This approach works well for
the mo del Laplace problem. However, it was so on noticed that if the original
ordering was \p o or", then the decay of the ll entries in the decomp osition
was very slow, sometimes leading to an unacceptably dense decomp osition.
Consider the following problem,
! !
@ @P @ @P
(1) K + K = q
x y
@ x @ x @ y @ y
with a 5-p oint discretization on the regular grid, where K , K > 0. When
x y
K K , the drop tolerance approach will pro duce a very dense precondi-
x y
tioner, if a natural row ordering is used. In contrast, a cost-e ective precon-
ditioner is yielded using the same approach, when K K [8]. Interesting
y x
results concerning the e ects of ordering on the p erformance of PCG were
rep orted recently in [8, 14, 12, 13, 15, 31].
Greenbaum and Ro drigue [18] addressed the problem of computing an optimal
preconditioner for a given sparsity pattern. The values of the nonzero elements
in the preconditioner were determined by numerical optimization techniques
for a given sparsity pattern. Kolotilina and Yeremin investigated some least
squares approximations for blo ck incomplete factorization [24]. Their results
provide insights of a theoretical nature, but have limited practical application.
Numerous sp ecial techniques that pro duce go o d preconditioners for elliptic
problems and domain decomp osition metho ds have also b een rep orted [3, 4, 20,
23]. These approaches are very successful for applying PCG to the solutions
of elliptic PDE's. However, they are dicult to generalize to Jacobians arising
from systems of PDE's and to general sparse matrix problems.
In summary, the following factors are observed to have a signi cant impact on the
quality of a preconditioner:
1. The ordering of the unknowns in the original matrix A.
2. The sparsity pattern of the preconditioner M .
3. How closely the sp ectrum of M resembles that of A.
Motivated by the signi cant e ect of ordering on the preconditioner, we have pro-
p osed the Minimum Discarded Fill (MDF (`)) ordering (or pivoting strategy) [8] for
general sparse matrices. This ordering technique is e ective in nding high quality
ILU (`) preconditioners, esp ecially for problems arising from unstructured nite ele-
ment grids. This algorithm can identify anisotropy in complicated physical structures
and orders the unknowns in a \preferred" direction. Numerical testing of the MDF (`)
metho d shows that it yields b etter convergence p erformance than some other orderings.
The MDF (`) ordering is successful b ecause it takes into account the numerical values of
matrix entries during the factorization pro cess, and not just the top ology of the mesh.
However, the MDF (`) ordering may b e costly to pro duce. A rough estimate of the
3
cost of MDF (`) ordering is N d , where d is the average numb er of nonzero elements in 2
t 1
each row of the ll matrix, L + L , and N is the size of the original matrix . Thus, if
the average numb er of nonzero in each row is large, the MDF (`) ordering is practical
only when similar matrix problems need to b e solved several times, such as solving the
Jacobian matrices in a Newton iteration [5, 7].
In this pap er, several variants of the MDF ordering algorithm for use with a drop
tolerance incomplete factorization are investigated. Section 2 contains a detailed de-
scription of these ordering techniques. Test problems are describ ed in Section 3 and the
numerical results and discussion are provided in Section 4, with our concluding remarks
in the last section.
Results from applying the MDF algorithm and its variants to unsymmetric matrices
derived from a system of PDE's will b e discussed in another pap er [5].
2. Algorithms .
2.1. ILU (`) factorization. Let us de ne the nonzero sparsity pattern of a matrix
C as the the set
(2) NZ [C ] = f(i; j ) j c 6= 0g :
ij
~
Given a sparsity pattern P , denote C := C [P ] to b e the matrix extracted from C with
sparsity pattern de ned by P ,
(
c if (i; j ) 2 P ;
ij
(3) c~ =
ij
0 otherwise.
Apply an Incomplete LU (ILU ) factorization to a sparse matrix problem Ax = b.
2
After k steps of the factorization, we have the following (incomplete LD U ) decomp osition ,
" #" #" #
t
L 0 D 0 U Q
k k k
k
(4) A ! ;
P I 0 A 0 I
k n k k n k
where L (U ) is k k lower (upp er) unit triangular, D is k k diagonal matrix, P and
k k k k
Q are (n k ) k , I is the (n k ) (n k ) identity, and A is the (n k ) (n k )
k n k k
submatrix remaining to b e factored. Let A = A and G = (V ; E ), k = 0; 1; ; n 1
0 k k k
b e the graphs [32] of A , i.e.
k
o n
(k )
(5) 6= 0; i; j > k ; : V = fv ; : : : ; v g ; E = (v ; v ) j a
k k +1 n k i j
ij
Some new ll entries in A may b e created in the factorization pro cess, yet many lls
k
are also discarded. A common criterion for discarding a new ll is by its ll level. The
notion of \ ll-level" can b e de ned through reachable sets and the shortest path length
in the graph G [17]. Let S : fv ; : : : ; v g b e the set of the eliminated no des at step
0 k 1 k
k , S V , and cho ose no des u; v 62 S . No de u is said to b e reachable from vertex v
k 0 k
through S if there exists a path (v ; u ; : : : ; u ; u) in graph G , such that each u 2 S ,
k i i 0 i k
m
1
j
1
To simplify notation, a case of symmetric nonzero structure is presented here.
2
we assume the nal elimination sequence is v ; : : : ; v .
1 n 3
1 j m. Note that m can b e zero, so that any pair of adjacent no des u; v 62 S is
k
reachable through S . The reachable set of v through S is denoted by
k k
(6) Reach (v ; S ) := fu j u is reachable from v through S g :
k k
Let v , v 62 S and v 2 Reach (v ; S ) with the shortest path (v ; u ; : : : ; u ; v ).
i j k j i k i i i j
m
1
Then the shortest path length from v to v through S is de ned m. For convenience,
i j k
we de ne the path length b etween v and v through S to b e 1, if v 62 Reach (v ; S ).
i j k j i k
(k )
Then a ll level level for ordered no de pair (v ; v ) in the incomplete eliminated graph
i j
ij
3
G can b e de ned as :
k
(
m; if a is an accepted ll,
(k )
ij
= (7) : level
ij
1; otherwise
where m is the shortest path length from v to v through S and i, j > k . It is clear
i j k
that
(
0; if a 6= 0,
(k )
ij
(8) = level
ij
1; otherwise
since S = ;.
0
As the elimination pro ceeds, there may b e a shorter path b etween v ; v through
i j
the new set of eliminated no des. Hence the ll-levels can b e up dated :
(k )
k 1 k 1 k 1
(9) level = min Level + Level + 1; Level : :
ij
ik k j ij
In an ILU (`) factorization, only ll entries with ll-level less than or equal to ` are
kept. More precisely, consider again the k -th step in an ILU (`) factorization of A,
" # " #" #" #
(k 1) (k 1) (k 1)
t t
1 0
a a 0 1 =a
k k k k k k
A = = (10)
k 1
(k 1)
=a I B 0 C 0 I
n k k 1 k n k
k k
where
(k 1)
t
(11) C = B =a :
k k 1
k k
(k )
Fill-level level for the no de pair (v ; v ), i, j > k can b e deduced from (9). Let D
i j k
ij
b e the sparsity pattern that contains all nonzero elements in C having ll-level higher
k
than `, the given level for the ILU decomp osition,
o n
(k )
(12) > `; i; j > k : D = (i; j ) j level
k
ij
3
If the order of the unknowns is predetermined b efore the incomplete factorization, a ll level
which is indep endent to k can b e intro duced. However, in this particular study, the pivoting order is
determined dynamically during the computations. Therefore, it is not advantageous to de ne such a
level. 4
The nonzero elements corresp onding to D will b e discarded after this elimination step
k
and
o n
(k )
(13) `; where i; j > k NZ [C ] D = (i; j ) j level
k k
ij
is the accepted sparsity pattern in the ILU (`) decomp osition. The ILU (`) decomp osi-
tion pro cess at the next step will op erate on a truncated matrix of
(k 1)
t
(14) A = B =a F = C F ; F = C [D ] ;
k k 1 k k k k k k
k k
where F = C [D ] is the discarded ll matrix at step k .
k k k
The following theorem characterizes the elimination graph G pro duced by an
k
ILU (`) factorization.
Theorem 2.1.
o n
(k )
` (v ; v ) j v 2 Reach (v ; fv ; : : : ; v g) and level E =
i j j i 1 k k
ij
where v ; v ; : : : ; v are previously eliminated nodes.
1 2 k
Proof. If ` = 0, no new ll is intro duced in the ILU (0) decomp osition, hence all
entries have ll-level zero. Therefore G is the subgraph of G induced by fv ; : : : ; v g.
k 0 k +1 n
The theorem holds trivially in this case.
For ` 1, an induction on k can b e used to prove this theorem.
In Section 1, we intro duced a problem which demonstrates very di erent conver-
gence b ehaviors if di erent anisotropy is used. Here we intro duce two decay estimates
for the entries of the ILU decomp ostion presented in [9]. These estimates explain the
causes of the di erent convergence b ehaviors and demonstrate the need for developing
some ordering (or pivoting) strategies which will b e able to encourage a fast decay in
the ILU decomp ostion.
(k )
First, a lower b ound for the entries a of A in ILU (`) decomp osition of an M -
k
ij
matrix A exists.
(k )
are the l l entries in A and ` Theorem 2.2. Let A be an M -matrix, a
k
ij
(k )
level 1, and if
ij
(v ; v ; : : : ; v ; v ); v ; : : : ; v (15) 2 S : fv ; : : : ; v g
i i i j i i k 1 k
m m
1 1
is a path from v to v through S , then
i j k
ja a a j
ii i i i j
(k )
m
1 1 2
a ; where d = a :
i ii
ij
d d : : : d
i i i
m
1 2
If ` = 1, then this theorem also provides a lower b ound for LU decomp osition.
For the problem (1), the resulting matrix is an M -matrix. If K K , all new
x y
higher level ll entries of the factorization will have a path for which all edges lie in the
x-direction but at most one lies in the y direction. From Theorem 2.2, and since the
(j 1) (j 1)
, entries l will have a very slow =a entries in L = fl g of (4) are given by l = a
ij ij ij
j j ij
i 1
decay rate with the ll level Level . Thus, nearly all ll entries in the decomp osition
ij 5
Table 1
Comparison of two problems
Problem Nonzeros in L Iterations Fact. time Sol. time
K = 100, K = 1 10330 17 0.15 0.20
x y
K = 1, K = 100 2705 13 0.04 0.11
x y
will b e kept, resulting in an unacceptably dense factorization, when a drop tolerance
technique is used.
In contrast, if K K in problem (1), we have the following estimate of the
y x
entries l in the LU decomp osition for that problem,
ij
Theorem 2.3.
!
1
(16) ; j = 0; 1; ; n 1 (i 1) mo d n; jl j = O
i;i n+j
j
(K =K )
y x
and i > n;
!
1
jl j = O (17) ; j = 0; 1; ; (i 1) mo d n;
i;i j
j
(K =K )
y x
T
where l are the entries of the LD L factorization of the matrix A in problem (1).
ij
i 1 i 1
Notice that j = Level , in (16) and j + 1 = Level in (17), resp ectively. It is
i;i n+j i;i j
obvious that l decay rapidly with the ll level. A much sparser incomplete factorization
ij
will b e resulted, if the same drop tolerance is used. Our numerical results can provide a
more clear picture of these estimate. We use a 30 30 regular grid in the computations.
A new ll entry is dropp ed if
(k )
j 0:001 min(ja j; ja j) : ja
ii j j
ij
Table 1 summarizes the e ects of di erent anisotropy on the quality of the precon-