Domain decomposition methods

1 Introduction

Domain decomposition techniques are efficient tools for solving large problems in analysis. Besides their usefulness in dividing large computational tasks within traditional computer architectures, these methods are particularly attractive for parallel computation. These techniques are based on splitting the analysis domain $\Omega$ into a number $NSD$ of subdomains $\Omega^s$ (figure 1) such that:

\[ \Omega = \bigcup_{s=1}^{NSD} \Omega^s \tag{1} \]

Figure 1: Non-overlapping domain decomposition

There are several ways of performing domain decompositions. A first distinction arises according to whether the analysis domain is decomposed into overlapping subdomains (Schwarz methods) (Figure 2) or into non-overlapping subdomains (Figure 1). The latter has been widely used in structural mechanics.

Figure 2: Overlapping domain decomposition (subdomains $\Omega^1$ and $\Omega^2$ share an overlapping region)

A second distinction is related to the discretization technique. If the finite difference method is employed, a number of nodal points are defined in the domain. In the finite element method, a number of nodal points are defined, but also a number of finite elements connecting those nodes. Thus in finite difference methods a partition of the domain like that of figure 3, where each node is allocated to a given subdomain, may be desirable. This means that there are no nodes on the subdomain boundaries, and it is referred to as a node-based partition. In finite element methods, a partition where each element is fully contained in a given subdomain is sought: an element-based partition (Figure 4). This means that some nodes, the interface nodes, will be located on the subdomain interfaces, and that no elements will be shared between two subdomains.

Figure 3: Node-based partition of a finite difference grid

Figure 4: Element-based partition of a finite element mesh

2 The Schur Complement

In this section the finite element method will be supposed to be used for discretizing the continuum problem. Reference will be made to linear elasticity problems where the primal discretized variables are displacements (compatible finite element models). Let $\Omega^s$ denote each subdomain and $\Gamma_i^s$ ($i = 1, 3$) their boundaries (figure 1). $\Gamma_1$ stands for boundaries with kinematical (Dirichlet) conditions, $\Gamma_2$ for those with mechanical (Neumann) conditions, and $\Gamma_3$ for frontiers with other subdomains. Applying the usual displacement-based finite element method to solve a static mechanical problem, the following system is obtained

\[ K u = f \tag{3} \]

where $u$ is a vector containing the discretized unknowns, in this case nodal displacements; $f$ is a vector with the discretized dual variables (nodal forces); and $K$ is the stiffness matrix. At each subdomain (comprising $\Omega^s$ and $\Gamma_3^s$) the subdomain stiffness matrix $K^s$, displacement vector $u^s$ and force vector $f^s$ may be constructed. These matrices may be partitioned into a group of internal d.o.f. $\mathring{u}^s$ and a group of interface d.o.f. $\bar{u}^s$. The subdomain stiffness matrix may be written

\[ K^s = \begin{bmatrix} \mathring{K}^s & \tilde{K}^s \\ \tilde{K}^{s,T} & \bar{K}^s \end{bmatrix} \tag{4} \]

and the displacement and force vectors

\[ u^s = \begin{bmatrix} \mathring{u}^s \\ \bar{u}^s \end{bmatrix} \qquad \text{and} \qquad f^s = \begin{bmatrix} \mathring{f}^s \\ \bar{f}^s \end{bmatrix} \tag{5} \]

The symbol $\mathring{(\cdot)}$ denotes internal d.o.f., $\bar{(\cdot)}$ denotes interface d.o.f., and $\tilde{(\cdot)}$ refers to the interaction between internal and interface d.o.f. On the other hand, if we collect the contribution of all subdomains to the interface d.o.f., we can write

\[ K_I = \operatorname*{A}_{s=1}^{NSD} \bar{K}^s \tag{6} \]

as the assembled stiffness matrix for the whole interface problem ($\operatorname{A}$ denoting the finite element assembly operator), $u_I$ being the displacement vector associated to it. Matrices with subscript $I$ have the size of the whole interface problem. Recalling the whole problem, the stiffness matrix may be written

\[ K = \begin{bmatrix} \mathring{K}^1 & 0 & \cdots & 0 & \cdots & \tilde{K}_I^1 \\ 0 & \mathring{K}^2 & \cdots & 0 & \cdots & \tilde{K}_I^2 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \mathring{K}^s & \cdots & \tilde{K}_I^s \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ \tilde{K}_I^{1,T} & \tilde{K}_I^{2,T} & \cdots & \tilde{K}_I^{s,T} & \cdots & K_I \end{bmatrix} \tag{7} \]

and the vectors

\[ u = \begin{bmatrix} \mathring{u}^1 \\ \mathring{u}^2 \\ \vdots \\ \mathring{u}^s \\ \vdots \\ u_I \end{bmatrix} \qquad \text{and} \qquad f = \begin{bmatrix} \mathring{f}^1 \\ \mathring{f}^2 \\ \vdots \\ \mathring{f}^s \\ \vdots \\ f_I \end{bmatrix} \tag{8} \]

The equilibrium equation (3) may be partitioned into the following systems

\[ \begin{cases} \mathring{K}^s \mathring{u}^s + \tilde{K}_I^s u_I = \mathring{f}^s, & s = 1, NSD \\[4pt] \displaystyle\sum_{s=1}^{NSD} \tilde{K}_I^{s,T} \mathring{u}^s + K_I u_I = f_I \end{cases} \tag{9} \]

Performing Gaussian elimination by blocks:

\[ \begin{cases} \mathring{K}^s \mathring{u}^s = \mathring{f}^s - \tilde{K}_I^s u_I, & s = 1, NSD \\[4pt] \displaystyle\Bigl[ K_I - \sum_{s=1}^{NSD} \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \tilde{K}_I^s \Bigr] u_I = f_I - \sum_{s=1}^{NSD} \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \mathring{f}^s \end{cases} \tag{10} \]

The coefficient matrix of the second equation in (10):

\[ S = K_I - \sum_{s=1}^{NSD} \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \tilde{K}_I^s \tag{11} \]

is known as the Schur complement matrix or capacitance matrix. The first group of equations in (10) represents the systems of equations for the internal unknowns $\mathring{u}^s$ at each subdomain, which decouple as a result of the non-penetrating (non-overlapping) character of the decomposition. This part of the solution is perfectly parallelizable. The size of these problems is given by the granularity of the domain decomposition. The second part of (10) represents the interface problem. The size of the Schur complement $S$ is usually much smaller than that of the global matrix $K$, but it is also denser than $K$. The condition number of $S$ is usually much smaller than that of $K$. On the other hand, the matrix $S$ need not be explicitly assembled, and the solution may be performed in a subdomain-wise fashion. This part of the solution is coupled for the whole problem. The efficiency of the global solution is tightly tied to the efficiency of the solution of the interface problem.

3 Direct Solution Method

A way of solving problem (10) is by means of direct methods. The interface problem (second equation in (10)) is solved in the first place. The Schur complement (11) is computed: it requires that for each subdomain the matrix for internal unknowns $\mathring{K}^s$ be factorized. Then a substitution is performed using as rhs vector each column of the matrix $\tilde{K}_I^s$. The matrix so obtained is pre-multiplied by $\tilde{K}_I^{s,T}$ and summed over the number of subdomains. This completes the second term of eq. (11). In a similar fashion the second term of the rhs of the second equation in (10) is formed: solving the internal problem with $\mathring{f}^s$ as rhs, pre-multiplying the vector obtained by $\tilde{K}_I^{s,T}$ and summing over $NSD$.

Once the interface problem (10-b) is solved, $u_I$ is used to solve the internal problem at each subdomain (eq. 10-a). The algorithm to solve the problem by a direct method is given in Table I.

Table I: Solution by direct methods

1. Factorize $\mathring{K}^s$, for $s = 1, NSD$

2. By substitution compute:
\[ A^s = (\mathring{K}^s)^{-1} \tilde{K}_I^s \qquad B^s = (\mathring{K}^s)^{-1} \mathring{f}^s \]

3. Compute:
\[ g_I = f_I - \sum_{s=1}^{NSD} \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \mathring{f}^s \]

4. Compute:
\[ S = K_I - \sum_{s=1}^{NSD} \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \tilde{K}_I^s \]

5. Solve
\[ S \, u_I = g_I \]

6. For each $s = 1, NSD$ compute
\[ \mathring{u}^s = B^s - A^s u_I \]
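As a concrete illustration, the following is a minimal dense NumPy sketch of Table I. The function name and the data layout (Python lists holding one internal block $\mathring{K}^s$ and one coupling block $\tilde{K}_I^s$ per subdomain, with the coupling blocks expressed in global interface numbering) are assumptions made for this example, not part of the method itself.

```python
import numpy as np

def solve_by_schur(K_int, K_cpl, K_ifc, f_int, f_ifc):
    """Direct solution following Table I.

    K_int[s]: internal stiffness block (K̊^s), n_s x n_s
    K_cpl[s]: coupling block (K̃_I^s),         n_s x n_I
    K_ifc:    assembled interface stiffness K_I, n_I x n_I
    f_int[s]: internal force vector f̊^s; f_ifc: interface forces f_I
    """
    NSD = len(K_int)
    S = K_ifc.copy()              # start from K_I
    g = f_ifc.copy()              # start from f_I
    A, B = [], []
    for s in range(NSD):
        # steps 1-2: factorize K̊^s and substitute (dense solve here)
        As = np.linalg.solve(K_int[s], K_cpl[s])   # (K̊^s)^-1 K̃_I^s
        Bs = np.linalg.solve(K_int[s], f_int[s])   # (K̊^s)^-1 f̊^s
        A.append(As); B.append(Bs)
        # steps 3-4: accumulate Schur complement and condensed rhs
        S -= K_cpl[s].T @ As
        g -= K_cpl[s].T @ Bs
    # step 5: interface problem
    u_I = np.linalg.solve(S, g)
    # step 6: recover internal unknowns, subdomain by subdomain
    u_int = [B[s] - A[s] @ u_I for s in range(NSD)]
    return u_int, u_I
```

In a production code the factorization of step 1 would be stored and reused; `np.linalg.solve` refactorizes here only for brevity.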

If the problem is well posed, the global matrix $K$ is nonsingular. Each of the $\mathring{K}^s$, as well as the Schur complement $S$, is also nonsingular. Furthermore, if the matrix $K$ is symmetric and positive definite, as happens in linear elastic problems, so is $S$. The explicit construction of the Schur complement is not practical for large problems.

4 Iterative Schur Complement Methods

The problem (10) may be seen as being solved in two steps: an interface problem and an internal one. A popular strategy is to solve the internal problem (10-a) by direct methods and the interface problem (10-b) by iterative ones. Direct methods are preferred for (10-a) since they lead to exact solutions (up to round-off) and no error propagation is introduced into the interface problem. Direct methods, however, are not suitable for the interface problem due to their large storage requirements. The matrix $S$ is usually full and expensive to construct. Some applications of direct methods are reported in the literature, but they are mainly concerned with special cases (e.g. slender structures) where the interface problem size is limited (Farhat and Crivelli, 1989; Lesoinne et al., 1991). Iterative methods perform well for the interface problem, in particular conjugate gradient techniques with preconditioning. As already said, the condition number of the Schur complement is smaller than that of the whole stiffness matrix: in the case of a Laplace problem, $\mathrm{cond}(K) = O(\frac{1}{h^2})$ while $\mathrm{cond}(S) = O(\frac{1}{h})$. Domain decomposition can be seen as a way of preconditioning the whole problem.

5 Solution of the interface problem

5.1 The Preconditioned Conjugate Gradient Method

We will focus on the solution of the interface problem (10-b). The interface d.o.f. matrix $K_I$ may be written

\[ K_I = \sum_{s=1}^{NSD} K_I^s \tag{12} \]

and the Schur complement

\[ S = \sum_{s=1}^{NSD} S^s \tag{13} \]

with

\[ S^s = K_I^s - \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \tilde{K}_I^s \tag{14} \]

Eq. (13) shows that the contributions of each subdomain to the matrix $S$ may be computed independently. Equation (10-b) is rewritten

\[ S u_I = g_I \tag{15} \]

We will now discuss the solution of (15) via the Preconditioned Conjugate Gradient method. The algorithm is shown in Table II for a generic linear system

\[ A x = b \tag{16} \]

The most time consuming parts of the process are the matrix-vector products at steps A.2 and B.2, and the solution of the linear equation systems (LES) at A.3 and B.7 (preconditioning). The rest of the operations on vectors of the size of the global interface problem are dot products and SAXPY operations (Sum of Alpha X Plus Y). Preconditioning is of paramount importance regarding the performance of a Conjugate Gradient algorithm. Among the popular preconditioners we can mention:

1. Jacobi or diagonal scaling. The preconditioning matrix is $P = \mathrm{diag}(A)$. It is the simplest preconditioner, not requiring the solution of any LES. It is effective if $A$ is diagonally dominant.

2. Block Jacobi. In this case $P$ is block diagonal. Solving the LES requires the inversion (or factorization) of each block matrix. It is tractable if the blocks are small.

3. Incomplete Cholesky factorization. In linear elastic problems the stiffness matrix is symmetric, positive definite, and sparse within a bandwidth. In such a case the factorization of the matrix is usually made by the Cholesky method, leading to
\[ A = C^T C \]
$C$ being an upper triangular matrix. A preconditioner constructed from an incomplete factorization is of the form
\[ P = \tilde{C}^T \tilde{C} \]
where $\tilde{C}$ is the restriction of $C$ to the sparsity of $A$; that is, no fill-in is performed outside the skyline of $A$. This preconditioning may lead to zero pivots. Modifications such as the Shifted Incomplete Cholesky factorization have been introduced to eliminate this drawback (Manteuffel, 1979).

4. Element-by-Element (EBE) (Winget and Hughes, 1985). By making use of two decompositions of the matrix (additive and multiplicative), this method constructs a preconditioner from the elemental matrices.

Table II: Preconditioned Conjugate Gradient Algorithm

A. Initialization
A.1 $x$ initial guess
A.2 $r = b - A x$ (matrix-vector prod. + vector sum)
A.3 solve $P z = r$ (system solution)
A.4 $\rho = (r, z)$ (dot product)
A.5 $\rho_0 = \rho$
A.6 $p = z$
A.7 $k = 1$

B. Iterations: while $k < K_{max}$ do
B.1 Convergence test: if $\rho < Tol\,\rho_0$ end iterations
B.2 $a = A p$ (matrix-vector prod.)
B.3 $m = (p, a)$ (dot product)
B.4 $\alpha = \rho / m$
B.5 $x = x + \alpha p$ (SAXPY)
B.6 $r = r - \alpha a$ (SAXPY)
B.7 solve $P z = r$ (system solution)
B.8 $\rho_{old} = \rho$
B.9 $\rho = (r, z)$ (dot product)
B.10 $\gamma = \rho / \rho_{old}$
B.11 $p = z + \gamma p$ (SAXPY)
B.12 $k = k + 1$, go to B.1
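A minimal Python/NumPy transcription of Table II may look as follows; the callable-based interface for $A$ and $P$ and the relative tolerance convention are assumptions of this sketch.

```python
import numpy as np

def pcg(A, b, solve_P, x0=None, tol=1e-8, kmax=500):
    """Preconditioned Conjugate Gradient following Table II.

    A:       callable p -> A @ p (matrix-vector product)
    solve_P: callable r -> z solving P z = r (preconditioning)
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A(x)                      # A.2
    z = solve_P(r)                    # A.3
    rho = rho0 = r @ z                # A.4-A.5
    p = z.copy()                      # A.6
    for k in range(kmax):             # B
        if rho < tol * rho0:          # B.1 convergence test
            break
        a = A(p)                      # B.2
        alpha = rho / (p @ a)         # B.3-B.4
        x += alpha * p                # B.5 SAXPY
        r -= alpha * a                # B.6 SAXPY
        z = solve_P(r)                # B.7
        rho_old, rho = rho, r @ z     # B.8-B.9
        p = z + (rho / rho_old) * p   # B.10-B.11
    return x

# Jacobi (diagonal scaling) preconditioning, item 1 above, for a dense
# SPD matrix M:
#   d = np.diag(M)
#   x = pcg(lambda v: M @ v, b, lambda r: r / d)
```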

Domain decomposition is in itself a good preconditioner for a global Conjugate Gradient solution. As limiting cases, it performs as a direct global method if the number of subdomains tends to one (i.e. the whole structure as a single subdomain), or as a global Conjugate Gradient solution with EBE preconditioner when the number of subdomains tends to the number of elements. Domain decomposition or substructure-by-substructure preconditioners are reported to be more efficient than the EBE preconditioner (Nour-Omid, 1993). Some methods for the efficient solution of the interface problem within a domain decomposition framework are described in the subsequent sections.

5.2 The Dirichlet-Neumann Method (Bjørstad and Widlund, 1986)

The most time consuming parts of the algorithm of Table II are the matrix-vector products (A.2 and B.2) and the system solutions (A.3 and B.7). We will consider these operations in the frame of a domain decomposition. The matrix-vector product in B.2 (also in A.2) may be written

\[ a = S p \tag{17} \]

$S$ being the Schur complement and, owing to (13),

\[ a = \sum_{s=1}^{NSD} a^s = \sum_{s=1}^{NSD} S^s p \tag{18} \]

that is, the contribution to $a$ is computed separately at each subdomain. At subdomain $s$ we have (see (14))

\[ S^s p = \bigl[ K_I^s - \tilde{K}_I^{s,T} (\mathring{K}^s)^{-1} \tilde{K}_I^s \bigr] p \tag{19} \]

and considering the subdomain problem:

\[ \begin{bmatrix} \mathring{K}^s & \tilde{K}^s \\ \tilde{K}^{s,T} & \bar{K}_I^s \end{bmatrix} \begin{bmatrix} v^s \\ p \end{bmatrix} = \begin{bmatrix} 0 \\ a^s \end{bmatrix} \tag{20} \]

The contribution of subdomain $s$ to the vector in (18) is

\[ a^s = \tilde{K}^{s,T} v^s + \bar{K}_I^s p \tag{21} \]

where $v^s$ is the solution of

\[ \mathring{K}^s v^s = -\tilde{K}^s p \tag{22} \]

Equations (20) to (22) show that to compute the matrix-vector product (18) it suffices to solve at each subdomain a Dirichlet problem where prescribed values $p$ are imposed on the interface $\Gamma_3^s$, and the associated force vector $a^s$ is obtained. Finally the contributions $a^s$ of each subdomain are added together.
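A minimal sketch of this matrix-vector product follows, under the simplifying assumption that all interface blocks are expressed in global interface numbering; a real code would first restrict $p$ to the subdomain interface d.o.f.

```python
import numpy as np

def schur_matvec(K_int, K_cpl, K_bnd, p):
    """a = S p, computed subdomain by subdomain (eqs. 19-22).

    K_int[s]: internal block K̊^s
    K_cpl[s]: coupling block K̃^s
    K_bnd[s]: interface block K̄_I^s
    """
    a = np.zeros_like(p)
    for Ki, Kc, Kb in zip(K_int, K_cpl, K_bnd):
        v = np.linalg.solve(Ki, -Kc @ p)   # Dirichlet problem (22)
        a += Kc.T @ v + Kb @ p             # subdomain contribution (21)
    return a
```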

For the preconditioning phase (B.7 and A.3 in Table II) the following procedure is followed (De Roeck and Le Tallec, 1991). At each subdomain a matrix $D_I^s$ is defined such that

\[ \sum_{s=1}^{NSD} D_I^s = I_I \tag{23} \]

That is to say, by assembling the matrices $D_I^s$ of all subdomains the identity matrix is obtained on the global interface space. The simplest choice for $D_I^s$ is a diagonal matrix whose entries are the reciprocal of the number of subdomains that share the current d.o.f. Given the residual force computed at each conjugate gradient iteration, the projection onto each subdomain is performed:

\[ r^s = D^{s,T} r \tag{24} \]

At each subdomain the following system is solved:

\[ S^s z^s = r^s \tag{25} \]

Finally subdomain contributions to the kinematical residual are averaged:

\[ z = \sum_{s=1}^{NSD} D^s z^s \tag{26} \]

This is equivalent to using a preconditioner of the form

\[ P^{-1} = \sum_{s=1}^{NSD} D^s (S^s)^{-1} D^{s,T} \tag{27} \]

The solution of (25) is in turn equivalent to solving a Neumann problem on subdomain $s$, where the solution vector $z^s$ contains the displacements of the interface d.o.f. It can be performed without explicitly forming the Schur complement matrix $S$, by writing for each subdomain:

\[ \begin{bmatrix} \mathring{K}^s & \tilde{K}^s \\ \tilde{K}^{s,T} & \bar{K}_I^s \end{bmatrix} \begin{bmatrix} v^s \\ z^s \end{bmatrix} = \begin{bmatrix} 0 \\ r^s \end{bmatrix} \tag{28} \]

The solution of the Neumann problem (28) on subdomain $s$ is

\[ v^s = -(\mathring{K}^s)^{-1} \tilde{K}^s z^s \tag{29} \]

\[ z^s = \bigl( \bar{K}_I^s - \tilde{K}^{s,T} (\mathring{K}^s)^{-1} \tilde{K}^s \bigr)^{-1} r^s = (S^s)^{-1} r^s \tag{30} \]

When applying the conjugate gradient iteration to the interface problem, the matrix-vector products (A.2 and B.2) and the system solutions (A.3 and B.7) are replaced by alternately solving subdomain problems with Dirichlet and Neumann boundary conditions, respectively. These subdomain solvers are perfectly parallelizable. For Laplace problems it has been reported that, while the condition number for the global matrix is $O(\frac{1}{h^2})$ ($h$ being the element size), that for the Schur complement is $O(\frac{1}{h})$, and by using the preconditioner (27) it reduces to $O(1)$. This result has not been proven for other problems, but numerical evidence shows similar behavior for structural mechanics problems. In general cases equation (25) leads to singular matrices $S^s$, unless enough rigid body motions are prevented by proper fixations. Several techniques have been proposed to handle this drawback. De Roeck and Le Tallec (1991) modified the solution algorithm by eliminating the d.o.f. associated with zero pivots. If the rigid body motions are set arbitrarily, the efficiency of the preconditioner deteriorates with the increase in the number of subdomains; a practical limit was found to be 16 subdomains (Le Tallec et al., 1990). This is due to the local character of the correction in the residual forces implied in this algorithm. In fact, Widlund (1988) has shown that the condition number of a domain decomposition method without a mechanism for the global transport of information grows at least as $\frac{1}{H^2}$, $H$ being the largest subdomain diameter. Some attempts to propagate the corrections between the subdomains have been made by solving, at each iteration, a coarse problem with few d.o.f. on each subdomain (Bramble et al., 1986; Mandel, 1993; Smith, 1991). One such technique is analyzed in the next section. Other efficient techniques have been developed, such as FETI (Finite Element Tearing and Interconnecting) (Farhat and Roux, 1991), which is based on a dual Schur complement method.
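Summarizing steps (24)-(26), a minimal sketch of one application of this preconditioner might read as follows; forming $S^s$ explicitly and using a least-squares solve as a stand-in for the pseudo-inverse needed on floating subdomains are simplifications of this example.

```python
import numpy as np

def neumann_precond(S_sub, D_sub, r):
    """z = sum_s D^s (S^s)^+ D^{s,T} r, following eqs. (24)-(27).

    S_sub[s]: subdomain Schur complement S^s (dense here; in practice
              the Neumann problem (28) is solved instead, without
              forming S^s)
    D_sub[s]: diagonal weights (1 / number of subdomains sharing each
              interface d.o.f.), stored as 1-D arrays
    """
    z = np.zeros_like(r)
    for Ss, Ds in zip(S_sub, D_sub):
        rs = Ds * r                                   # (24)
        zs = np.linalg.lstsq(Ss, rs, rcond=None)[0]   # (25) Neumann solve
        z += Ds * zs                                  # (26)
    return z
```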

5.3 Balancing Domain Decomposition (J. Mandel, 1993)

J. Mandel (1993) proposed the Balancing Domain Decomposition, where matrices $Z^s$ are defined containing a basis for the rigid body motions of each subdomain $s$. The singularity is removed by balancing the rhs vector $r$ on each subdomain, i.e.

\[ \hat{r} = r - S w \tag{31} \]

where $w$ is an assembled interface vector resulting from subdomain contributions which express rigid body motions of each subdomain:

\[ w = \sum_{s=1}^{NSD} D^s w^s \tag{32} \]

with $w^s \in \operatorname{range} Z^s$. Thus

\[ Z^{s,T} D^{s,T} \hat{r} = 0 \tag{33} \]

is verified at each subdomain. Once a suitable basis for the rigid body motions ($Z^s$) is constructed for each subdomain, the solution procedure implies two phases:

1. A local solution for the subdomain whose rigid body d.o.f. have been removed;

2. Computation of the rigid body d.o.f.

This last phase implies the solution, for the whole domain, of a problem with a number of d.o.f. at most 6 times the number of subdomains (6 being the number of rigid body d.o.f. for 3D bodies). This problem is stated, in turn, such that the residual at the next step be minimum, i.e.

\[ \text{minimize } \; \| r - S z \| \tag{34} \]

The solution of the global coarse problem, in the last phase, may be accomplished by other techniques such as the reduced base hybrid method (Farhat and Géradin, 1990), making use of the dual form of the Schur complement.
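A minimal sketch of one balancing step under these definitions; the matrix $W$ collecting the weighted subdomain rigid modes $D^s Z^s$ and the dense least-squares coarse solve are assumptions made for this example.

```python
import numpy as np

def balance(S_apply, W, r):
    """One balancing step (eqs. 31-34): return r̂ = r - S w with
    w = W @ alpha chosen so that ||r - S w|| is minimal.

    S_apply: callable v -> S v (matrix-free Schur complement)
    W:       n_I x m matrix whose columns are the weighted subdomain
             rigid body modes D^s Z^s (at most 6 per subdomain)
    """
    SW = np.column_stack([S_apply(w) for w in W.T])  # S applied per mode
    alpha, *_ = np.linalg.lstsq(SW, r, rcond=None)   # coarse problem
    return r - SW @ alpha                            # balanced residual
```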

6 Dual Schur Complement Method (Farhat and Roux, 1991)

Farhat and Roux (1991) presented the Finite Element Tearing and Interconnecting (FETI) method, in which the interface problem (eq. 9-b) is solved in the conjugate or dual variables, which, for this displacement-based finite element method, are traction forces.

6.1 Saddle point variational principle

Solving a static linear elasticity problem may be stated as: find the displacement function u that makes stationary the potential energy functional:

\[ J(v) = \frac{1}{2} a(v, v) - (v, f) - (v, h)_\Gamma \tag{31} \]

where $f$ and $h$ are given functions (volume and surface forces), and

\[ a(v, w) = \int_\Omega v_{(i,j)} \, c_{ijkl} \, w_{(k,l)} \, d\Omega \]
\[ (v, f) = \int_\Omega v_i f_i \, d\Omega \]
\[ (v, h)_\Gamma = \int_{\Gamma_h} v_i h_i \, d\Gamma \]

$i, j, k, l$ take values 1, 2 and 3

$v_{i,j}$: partial derivative of the $i$ component of $v$ with respect to $x_j$
$v_{(i,j)} = \frac{1}{2}(v_{i,j} + v_{j,i})$: Cauchy infinitesimal strain tensor
$c_{ijkl}$: elastic constant tensor
$\Omega$: volume
$\Gamma$: boundary of $\Omega$
$\Gamma_h$: part of the boundary with imposed tractions ($h_i$)

If the whole domain $\Omega$ is subdivided into $NSD$ regions $\Omega^s$ ($s = 1, NSD$), solving the elastic problem is equivalent to finding the $u^s$ ($s = 1, NSD$) which are stationary points of the energy functionals

\[ J^s(v^s) = \frac{1}{2} a(v^s, v^s)_{\Omega^s} - (v^s, f)_{\Omega^s} - (v^s, h)_{\Gamma^s} \tag{32} \]

being

\[ a(v^s, w^s)_{\Omega^s} = \int_{\Omega^s} v^s_{(i,j)} \, c_{ijkl} \, w^s_{(k,l)} \, d\Omega \]
\[ (v^s, f)_{\Omega^s} = \int_{\Omega^s} v^s_i f_i \, d\Omega \]
\[ (v^s, h)_{\Gamma^s} = \int_{\Gamma_h^s} v^s_i h_i \, d\Gamma \]

for $s = 1, NSD$, and satisfying the continuity conditions at the interface:

\[ u^s = u^r = \cdots = u^t \quad \text{at } \; \Gamma_I = \bigcup_{s=1}^{NSD} \Gamma_3^s \tag{33} \]

for subdomains $r, s, \ldots, t$ sharing a given interface node. Solving the $NSD$ variational problems (32) with conditions (33) is equivalent to finding the saddle point of the Lagrangian:

\[ J^*(v^1, v^2, \ldots, v^{NSD}, \mu^1, \mu^2, \ldots, \mu^t) = \sum_{s=1}^{NSD} J^s(v^s) + \sum_{s=1}^{NSD} \sum_{r=1}^{NSD} (v^s - v^r, \mu^{rs})_{\Gamma_I} \tag{34} \]

where

\[ (v^s - v^r, \mu^{rs})_{\Gamma_I} = \int_{\Gamma_I} \mu^{rs} (v^s - v^r) \, d\Gamma \]

That is: find the displacement fields $u^s$ ($s = 1, NSD$) and the Lagrange multipliers $\lambda^{sr}$ satisfying:

\[ J^*(u^1, \ldots, u^{NSD}, \mu^1, \ldots, \mu^t) \le J^*(u^1, \ldots, u^{NSD}, \lambda^1, \ldots, \lambda^t) \le J^*(v^1, \ldots, v^{NSD}, \lambda^1, \ldots, \lambda^t) \tag{35} \]

for any admissible $v^s$ ($s = 1, NSD$) and $\mu^k$ ($k = 1, t$). The left inequality states the continuity conditions at the interface. The right inequality states that, among all admissible sets $v^s$ ($s = 1, NSD$) satisfying the continuity conditions (33), the set $u^s$ ($s = 1, NSD$) minimizes the sum of the energy functionals $J^s$ defined respectively over $\Omega^s$ ($s = 1, NSD$). Therefore the $u^s$ ($s = 1, NSD$) are the restrictions to $\Omega^s$ ($s = 1, NSD$) of the solution $u$ of the whole problem. Equations (34) and (35) correspond to a hybrid variational problem, where the continuity conditions among subdomains (33) are relaxed by the introduction of Lagrange multipliers.

6.2 Application to finite elements

The displacement field $u^s$ ($s = 1, NSD$) is expressed, in the finite element method, as

\[ u^s = N u^s \qquad (s = 1, NSD) \]

If the subdivision into subdomains is conforming and the continuity conditions (33) are stated at each d.o.f. on $\Gamma_I$, the standard Galerkin procedure transforms the hybrid variational principle (34)-(35) into a system of algebraic equations:

\[ \begin{cases} K^s u^s = f^s + B^{s,T} \lambda, & s = 1, NSD \\[4pt] \displaystyle\sum_{s=1}^{NSD} B^s u^s = 0 \end{cases} \tag{36} \]

$K^s$, $u^s$ and $f^s$: stiffness matrix, displacement vector and force vector for subdomain $\Omega^s$.
$B^s$: boolean matrix (whose entries are 0, 1 or -1) indicating which d.o.f. of $\Omega^s$ are on $\Gamma_I$. The number of rows of $B^s$ is the number of global interface unknowns, and the number of columns is the number of subdomain unknowns. When operating on a vector or a matrix, $B^s$ does not imply flops; it just picks out (and eventually changes the sign of) some elements of that matrix or vector.
$\lambda$: vector of Lagrange multipliers. It represents the interaction forces between subdomains along the interface $\Gamma_I$. The length of the vector $\lambda$ is $n_I$, the number of d.o.f. on $\Gamma_I$.

If all subdomains are statically restrained (no rigid body motions are allowed), the matrices $K^s$ ($s = 1, NSD$) are nonsingular and system (36) may be written:

\[ \begin{cases} \displaystyle F_I \lambda = \sum_{s=1}^{NSD} B^s (K^s)^{-1} f^s \\[4pt] u^s = (K^s)^{-1} (f^s + B^{s,T} \lambda), & s = 1, NSD \end{cases} \tag{37} \]

where

\[ F_I = \sum_{s=1}^{NSD} B^s (K^s)^{-1} B^{s,T} \]

For large problems the matrix $F_I$ is not assembled; iterative methods are chosen to solve the interface problem. Farhat and Géradin (1992) proposed using a reduced number of Lagrange multipliers, which reduces the size of the interface problem. This amounts to placing a number of connectors different from (less than) the number of interface d.o.f.
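Since $F_I$ is applied rather than assembled, its action on a vector reduces to one solve per subdomain. A minimal sketch, assuming no floating subdomains (see section 6.3 otherwise) and dense storage for the $B^s$:

```python
import numpy as np

def feti_matvec(K_sub, B_sub, lam):
    """y = F_I lam = sum_s B^s (K^s)^{-1} B^{s,T} lam, matrix-free.

    B_sub[s]: signed boolean matrix B^s (n_I x n_s), stored dense for
              brevity; in practice it is an index/sign map costing no
              floating point operations.
    K_sub[s]: subdomain stiffness K^s, assumed nonsingular here.
    """
    y = np.zeros_like(lam)
    for Ks, Bs in zip(K_sub, B_sub):
        fs = Bs.T @ lam                  # gather interface tractions
        us = np.linalg.solve(Ks, fs)     # one subdomain solve
        y += Bs @ us                     # scatter back to the interface
    return y
```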

10 6.3 Eliminating local singularities

If a subdomain is not at least isostatically restrained, the matrix $K^s$ is singular and equation (37-b) may not be written. Denoting by $\Omega^f$ a floating subdomain, the equation for the subdomain unknowns is:

\[ K^f u^f = f^f + B^{f,T} \lambda \tag{38} \]

$K^f$ being singular, the solution $u^f$ may be obtained only if the load term is orthogonal to the null space of $K^f$:

\[ (f^f + B^{f,T} \lambda) \perp \ker K^f \tag{39} \]

and $u^f$ may be written:

\[ u^f = K^{(+)f} (f^f + B^{f,T} \lambda) + R^f \alpha^f \tag{40} \]

where

$K^{(+)f}$: a generalized inverse of $K^f$ (not explicitly computed)
$R^f$: matrix with the rigid modes of $\Omega^f$ (null space of $K^f$)
$\alpha^f$: coefficients of a linear combination of the modes in $R^f$

Let $n_r$ be the number of redundant rows in $K^f$; the matrix $R^f$ has $n_r$ columns. $n_r$ is at most 6 in 3D problems (3 in 2D). The pseudo-inverse $K^{(+)f}$ is defined such that:

\[ K^f K^{(+)f} K^f = K^f \]

Computation of rigid modes

Let

\[ K^f = \begin{bmatrix} K^f_{pp} & K^f_{pr} \\ K^f_{rp} & K^f_{rr} \end{bmatrix} \]

where the subindex $p$ denotes principal d.o.f. and $r$ redundant ones; $n_r$ is the number of redundant rows, and $K^f_{pp}$ has rank $n - n_r$. The same partition is made on the matrix $R^f$:

\[ R^f = \begin{bmatrix} R^f_p \\ R^f_r \end{bmatrix} \]

$K^{f,T} R^f = 0$ if $R^f$ is a basis for the null space of $K^f$ ($K^f$ being symmetric), that is:

\[ \begin{cases} K^f_{pp} R^f_p + K^f_{pr} R^f_r = 0 \\[2pt] K^f_{rp} R^f_p + K^f_{rr} R^f_r = 0 \end{cases} \]

Let $R^f_r = I_{n_r}$, i.e. an identity matrix of size $n_r$; then from the first equation above:

\[ R^f_p = -(K^f_{pp})^{-1} K^f_{pr} \]

while from the second equation:

\[ K^f_{rr} = K^{f,T}_{pr} (K^f_{pp})^{-1} K^f_{pr} \]

then:

\[ R^f = \begin{bmatrix} -(K^f_{pp})^{-1} K^f_{pr} \\ I_{n_r} \end{bmatrix} \]

$R^f$ has rank $n_r$. The $n_r$ columns of $R^f$ form a basis for the null space of $K^f$.

Computing $K^{(+)f} f^f$:

From the above partitioning:

\[ K^f_{rr} = K^{f,T}_{pr} (K^f_{pp})^{-1} K^f_{pr} \]

The matrix:

\[ K^{(+)f} = \begin{bmatrix} (K^f_{pp})^{-1} & 0 \\ 0 & 0 \end{bmatrix} \]

is a pseudo-inverse of $K^f$. Then a solution of the form $K^{(+)f} f^f$ may also be written:

\[ u^f = K^{(+)f} f^f = \begin{bmatrix} (K^f_{pp})^{-1} f^f_p \\ 0 \end{bmatrix} \]

In practice the matrix $K^f$ is not partitioned into $p$ and $r$. Rather, when factorizing, if a null pivot is found it means that the row involved is redundant. The pivot is then set to 1, the remaining column below it is copied to the r.h.s. as an extra vector, and the coefficients of that equation are set to zero. At the end of the factorization, the equations where the pivot was not null correspond to the nonsingular matrix $K^f_{pp}$. The substitution process is modified in order to add the extra terms in the r.h.s., recovering:

\[ u_p = -(K_{pp})^{-1} K_{pr} u_r \]
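A minimal sketch of this null-pivot detection, run here on a dense matrix rather than inside a production factorization; the tolerance-based pivot test is an assumption of the example.

```python
import numpy as np

def rigid_modes(Kf, tol=1e-10):
    """Build R^f = [-(K_pp)^{-1} K_pr ; I] by detecting null pivots.

    A symmetric Gaussian elimination is run only to flag the redundant
    rows; the modes are then formed from the p/r partition of the
    original matrix, as described in the text.
    """
    n = Kf.shape[0]
    K = Kf.astype(float).copy()
    scale = np.abs(np.diag(Kf)).max()
    r = []                                   # redundant (rigid) rows
    for i in range(n):
        if abs(K[i, i]) < tol * scale:       # null pivot found
            r.append(i)
            K[i, :] = 0.0                    # deactivate the equation
            K[:, i] = 0.0
            continue
        K[i+1:, i+1:] -= np.outer(K[i+1:, i], K[i, i+1:]) / K[i, i]
        K[i+1:, i] = 0.0
        K[i, i+1:] = 0.0
    p = [i for i in range(n) if i not in r]
    R = np.zeros((n, len(r)))
    R[np.ix_(p, range(len(r)))] = -np.linalg.solve(Kf[np.ix_(p, p)],
                                                   Kf[np.ix_(p, r)])
    R[r, :] = np.eye(len(r))
    return R                                 # columns span ker(K^f)
```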

Rigid modes do not produce energy. This may be expressed (condition (39)):

\[ R^{f,T} K^f u^f = R^{f,T} (f^f + B^{f,T} \lambda) = 0 \tag{41} \]

Replacing (40) in (36) for each of the floating subdomains, and adding (41), after some algebraic manipulation the interface problem may be stated as:

\[ \begin{bmatrix} F_I & G_I \\ G_I^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ \alpha \end{bmatrix} = \begin{bmatrix} d \\ -e \end{bmatrix} \tag{42} \]

where

\[ F_I = \sum_{s=1}^{NSD} B^s K^{*s} B^{s,T} \]
\[ G_I = \begin{bmatrix} R_I^1 & R_I^2 & \ldots & R_I^{N_f} \end{bmatrix}, \qquad R_I^s = B^s R^s \quad (s = 1, N_f) \]
\[ d = \sum_{s=1}^{NSD} B^s K^{*s} f^s, \qquad e^s = R^{s,T} f^s \quad (s = 1, N_f) \]
\[ e = \begin{bmatrix} e^1 \\ e^2 \\ \vdots \\ e^{N_f} \end{bmatrix}, \qquad \alpha = \begin{bmatrix} \alpha^1 \\ \alpha^2 \\ \vdots \\ \alpha^{N_f} \end{bmatrix} \]
\[ K^{*s} = \begin{cases} (K^s)^{-1} & \text{if } \Omega^s \text{ is not floating} \\ K^{(+)s} & \text{if } \Omega^s \text{ is floating} \end{cases} \]

The dual Schur complement matrix $F_I$ is a square matrix with dimension equal to the number of global interface d.o.f. This is also the dimension of the Lagrange multiplier vector $\lambda$. The vector $\alpha$ has the dimension of the total number of rigid modes, that is, at most 6 times the number of subdomains. The number of rows of the matrix $G_I$ is the number of global interface d.o.f., and the number of columns is the total number of rigid modes. The product $G_I \alpha$ represents the jump in rigid body motions.

6.4 Duality of FETI with the Schur Complement Method

Let the stiffness matrix be partitioned as:

\[ K^s = \begin{bmatrix} \mathring{K}^s & \tilde{K}^s \\ \tilde{K}^{s,T} & \bar{K}^s \end{bmatrix} \]

$F_I$ in eq. (37) is:

\[ F_I = \sum_{s=1}^{NSD} B^s (K^s)^{-1} B^{s,T} \]

which is, in general, symmetric positive semi-definite (Farhat and Mandel, 1998). We can see that

\[ B^s (K^s)^{-1} B^{s,T} = \bigl( \bar{K}^s - \tilde{K}^{s,T} (\mathring{K}^s)^{-1} \tilde{K}^s \bigr)^{-1} \]

The FETI operator:

\[ F_I = \sum_{s=1}^{NSD} B^s (K^s)^{-1} B^{s,T} = \sum_{s=1}^{NSD} (S^s)^{-1} \tag{43} \]

and the Schur complement operator:

\[ S = \sum_{s=1}^{NSD} \bigl( \bar{K}^s - \tilde{K}^{s,T} (\mathring{K}^s)^{-1} \tilde{K}^s \bigr) = \sum_{s=1}^{NSD} S^s \tag{44} \]

are dual operators.

The condition number of the product $F_I S$ is independent of the mesh size $h$ (Bjørstad and Widlund, 1986); it is therefore expected that both matrices have similar condition numbers. Nevertheless, numerical experiments reported by Farhat et al. show faster convergence with FETI than with the Schur complement method, attributable to the spectral properties of the matrices (see Farhat, Mandel and Roux, 1994).

6.5 Solving the interface problem by a Projected Conjugate Gradient Method

Recall problem (42) over the interface d.o.f. Solving it is equivalent to minimizing the following functional with constraints:

\[ \min_{\lambda} \; \Phi(\lambda) = \frac{1}{2} \lambda^T F_I \lambda - \lambda^T d \qquad \text{subject to} \qquad G_I^T \lambda = e \tag{45} \]

The minimization problem is solved by conjugate gradients, and at each iteration the constraint is enforced. This is obtained by projecting the search direction $p$ onto the null space of $G_I^T$. The algorithm for the projected conjugate gradient method is given in Table III.

The initial value for $\lambda$ must satisfy $G_I^T \lambda^{(0)} = e$ and the subsequent increments $G_I^T \delta\lambda = 0$, to guarantee that the solution satisfies the constraint (45-b). This may be obtained if the residual is projected with the matrix:

\[ T = I - G_I (G_I^T G_I)^{-1} G_I^T \]

Table III: Projected Conjugate Gradient Algorithm

A. Initialization
A.1 $\lambda = G_I (G_I^T G_I)^{-1} e$
A.2 $r = d - F_I \lambda$ (matrix-vector prod. + vector sum)
A.3 $z = T r$ (matrix-vector prod.)
A.4 $\rho = (r, z)$ (dot product)
A.5 $\rho_0 = \rho$
A.6 $p = z$
A.7 $k = 1$

B. Iterations: while $k < K_{max}$ do
B.1 Convergence test: if $\rho < Tol\,\rho_0$ end iterations
B.2 $a = F_I p$ (matrix-vector prod.)
B.3 $m = (p, a)$ (dot product)
B.4 $\alpha = \rho / m$
B.5 $\lambda = \lambda + \alpha p$ (SAXPY)
B.6 $r = r - \alpha a$ (SAXPY)
B.7 $z = T r$ (matrix-vector prod.)
B.8 $\rho_{old} = \rho$
B.9 $\rho = (r, z)$ (dot product)
B.10 $\gamma = \rho / \rho_{old}$
B.11 $p = z + \gamma p$ (SAXPY)
B.12 $k = k + 1$, go to B.1

$z = T r$, since

\[ G_I^T z = G_I^T (T r) = G_I^T [I - G_I (G_I^T G_I)^{-1} G_I^T] r = G_I^T r - G_I^T G_I (G_I^T G_I)^{-1} G_I^T r = 0 \]

From the conjugate gradient algorithm, $G_I^T p = 0$ and $G_I^T \lambda^{(0)} = e$, as required. The projection then requires performing the following operations:

1. $v_1 = G_I^T r$; this operation may be performed independently at each subdomain;

2. solving $(G_I^T G_I) v_2 = v_1$; this means solving a problem with size equal to the total number of rigid body modes. This is a coarse global problem;

3. $v_3 = G_I v_2$; this operation requires that subdomains interchange information through the interface (see the sketch below).
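A minimal sketch of this projection, assuming $G_I$ has full column rank and building the coarse matrix once:

```python
import numpy as np

def make_projector(G):
    """Return a callable applying T = I - G (G^T G)^{-1} G^T.

    G is G_I (n_I x total rigid modes). The three steps match the
    enumeration above: a per-subdomain product, a coarse global solve,
    and an interface exchange.
    """
    GtG = G.T @ G                        # coarse matrix, built once
    def project(r):
        v1 = G.T @ r                     # step 1
        v2 = np.linalg.solve(GtG, v1)    # step 2: coarse problem
        return r - G @ v2                # step 3
    return project
```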

The projected conjugate gradient method, shown in Table III, will be effective if it is preconditioned. In fact, only the matrix $F_I$ needs to be preconditioned.

$F_I$ may be written:

\[ F_I = \begin{bmatrix} B^1 & B^2 & \ldots & B^{NSD} \end{bmatrix} \begin{bmatrix} K^{*1} & 0 & \cdots & 0 \\ 0 & K^{*2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & K^{*NSD} \end{bmatrix} \begin{bmatrix} B^{1,T} \\ B^{2,T} \\ \vdots \\ B^{NSD,T} \end{bmatrix} \tag{46} \]

This suggests an approximation to the inverse of $F_I$:

Table IV: Projected Preconditioned Conjugate Gradient Algorithm

A. Initialization
A.1 $\lambda = G_I (G_I^T G_I)^{-1} e$
A.2 $r = d - F_I \lambda$ (matrix-vector prod. + vector sum)
A.3 $z = T r$ (matrix-vector prod.)
A.4 $\rho = (r, z)$ (dot product)
A.5 $\rho_0 = \rho$
A.6 $p = z$
A.7 $w = z$
A.8 $k = 1$

B. Iterations: while $k < K_{max}$ do
B.1 Convergence test: if $\rho < Tol\,\rho_0$ end iterations
B.2 $a = F_I p$ (matrix-vector prod.)
B.3 $m = (p, a)$ (dot product)
B.4 $\delta = (p, w)$ (dot product)
B.5 $\alpha = \delta / m$
B.6 $\lambda = \lambda + \alpha p$ (SAXPY)
B.7 $r = r - \alpha a$ (SAXPY)
B.8 $w = T r$ (matrix-vector prod.)
B.9 $\bar{z} = P^{-1} w$ (matrix-vector prod.)
B.10 $z = T \bar{z}$ (matrix-vector prod.)
B.11 $\rho_{old} = \rho$
B.12 $\rho = (w, z)$ (dot product)
B.13 $\gamma = \rho / \rho_{old}$
B.14 $p = z + \gamma p$ (SAXPY)
B.15 $k = k + 1$, go to B.1

\[ P^{-1} = \begin{bmatrix} B^1 & B^2 & \ldots & B^{NSD} \end{bmatrix} \begin{bmatrix} K^1 & 0 & \cdots & 0 \\ 0 & K^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & K^{NSD} \end{bmatrix} \begin{bmatrix} B^{1,T} \\ B^{2,T} \\ \vdots \\ B^{NSD,T} \end{bmatrix} = \sum_{s=1}^{NSD} B^s K^s B^{s,T} \tag{47} \]

The projected preconditioned conjugate gradient algorithm is given in Table IV.

Solving the system $P z = w$ is easy, since $P^{-1}$ is formed from the matrices $K^s$; therefore no factorization is needed, just matrix-vector products. The preconditioning matrix (47) is sometimes referred to as the 'lumped' preconditioner, as it is equivalent to assembling the interface submatrices of the subdomain stiffnesses:

\[ P^{-1} = \operatorname*{A}_{s=1}^{NSD} \bar{K}^s \tag{48} \]

Another one, called the Dirichlet preconditioner, is given by:

\[ P_D^{-1} = \sum_{s=1}^{NSD} S^s \tag{49} \]

This is better than the preconditioner (47), but more expensive to compute. The lumped preconditioner has been successfully used in several references (Farhat and Roux, 1991; Farhat, Mandel and Roux, 1994).

In a second-order linear elasticity problem (plane stress, plane strain, 3D), consider a regular square mesh with elements of size $h$, partitioned into subdomains of size $H$. The condition number of the full-size matrix is

\[ \kappa(K) = O\Bigl(\frac{1}{h^2}\Bigr) \]

whereas with the Dirichlet-preconditioned FETI method the condition number of the interface problem is

\[ \kappa(P^{-1} F_I) = O\Bigl( \bigl(1 + \log \tfrac{H}{h}\bigr)^m \Bigr) \]

with $m = 2$ or $m = 3$. This shows the optimal convergence properties of this method. In effect:

• If the number of subdomains $NSD$ is kept fixed and the number of elements grows ($h \searrow$), the condition number grows as $O(\log \frac{1}{h})^2$, instead of $O(\frac{1}{h^2})$ as that of the full matrix would;

• If the element size $h$ is kept fixed and the number of subdomains grows ($NSD \nearrow$), the subdomain size decreases ($H \searrow$) and the condition number decreases;

• If we think of a case where each subdomain is allocated to one processor, with $\frac{H}{h}$ constant, the problem size may grow by increasing the number of processors (subdomains) while the condition number remains constant.

6.6 Spectral properties of the Schur Complement and FETI matrices

The following example is given in reference (Farhat, Mandel and Roux, 1994). A solid 3D cantilever beam (fig. 5) is analyzed with 2 subdomains. The numbers of iterations for two different meshes are given in Table V.

Figure 5: Example: 3D beam

Table V: Number of iterations for the 3D beam problem

                                       Mesh 1                   Mesh 2
                                       2400 internal d.o.f.     16320 internal d.o.f.
                                       192 interface d.o.f.     672 interface d.o.f.
condition number κ(S)                  64.0                     56.0
condition number κ(F_I)                64.6                     60.1
Global residue ‖Ku − f‖₂/‖f‖₂          10⁻²  10⁻⁴  10⁻⁶         10⁻²  10⁻⁴  10⁻⁶
Number of iterations, Schur complement  22    28    34           21    27    34
Number of iterations, FETI               7    16    21            7    17    24

The spectral densities for Mesh 2 are drawn in figure 6 for the matrices $S$ and $F_I$. In log scale, the abscissa represents the ratio of the eigenvalues to the minimum eigenvalue; the ordinate is the number of eigenvalues found in each band. The condition number, measured as the abscissa of the largest eigenvalue, is similar for both matrices, but for $S$ the eigenvalues are slightly more scattered than for $F_I$. In this structural problem the conjugate gradient method captures the higher eigenvalues first. When using the Schur complement matrix $S$, the first iterations thus contain the higher structural modes, which are far from the structural response.

In the case of FETI, the first iterations give mainly the lower modes and therefore they lead more rapidly to a good approximation to the deformed structure.

Figure 6: Spectral density for $S$ (left) and $F_I$ (right), for Mesh 2 (abscissa: $\lambda / \lambda_{min}$, log scale)

References

[1] Beguelin, A., Dongarra, J., Geist, G.A., Manchek, R., Sunderam, V.S., (1992) - “A user’s guide to PVM parallel virtual machine”, Oak Ridge National Lab. Int. Report, TN, June 1992.

[2] Bjørstad, P.E. and Widlund, O.B. (1986) - "Iterative Methods for the Solution of Elliptic Problems on Regions Partitioned into Substructures", SIAM J. Num. Analysis, Vol. 23, pp. 1097-1120.

[3] Bramble, J.H., Pasciak, J.E., Schatz, A.H. (1986) - "The Construction of Preconditioners for Elliptic Problems by Substructuring", Math. Computation, Vol. 47, pp. 103-134.

[4] De Roeck, Y.-H., Le Tallec, P. (1991) - "Analysis and Test of a Local Domain Decomposition Preconditioner", in IV Intl. Symp. on Domain Decomposition Methods for Partial Differential Equations (R. Glowinski, Y. Kuznetsov, G. Meurant, J. Périaux and O. Widlund, Eds.), SIAM, Philadelphia.

[5] Farhat, Ch., Crivelli, L. (1989) - "A General Approach to Nonlinear FE Computations on Shared Memory Multiprocessors", Comp. Meth. in Appl. Mechanics and Engineering, Vol. 72, pp. 153-172.

[6] Farhat, Ch., Géradin, M. (1990) - "Using a Reduced Number of Lagrange Multipliers for Assembling Parallel Incomplete Field Finite Element Approximations", Report CU-CSSC-90-23, Univ. of Colorado, Boulder.

[7] Farhat, Ch., Mandel, J., Roux, F.-X. (1994) - "Optimal convergence properties of the FETI domain decomposition method", Computer Methods in Applied Mechanics and Engineering, Vol. 115, pp. 365-385.

[8] Farhat, Ch., Roux, F.-X. (1991) - "A Method of Finite Element Tearing and Interconnecting and its Parallel Solution Algorithm", Int. J. Numerical Methods in Engineering, Vol. 32, pp. 1205-1227.

[9] Lesoinne, M., Farhat, Ch., Géradin, M. (1991) - "Parallel/Vector Improvements of the Frontal Method", Int. J. Numer. Methods in Engineering, Vol. 32, pp. 1267-1282.

[10] Le Tallec, P., De Roeck, Y.-H., Vidrascu, M. (1990) - "Domain Decomposition Methods for Large Linearly Elliptic 3D Problems", Tech. Report TR/PA/90/20, Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique, Toulouse, France.

[11] Mandel, J. (1993) - "Balancing Domain Decomposition", Communications in Numerical Methods in Engineering, Vol. 9, pp. 233-241.

[12] Nour-Omid, B. (1993) - "Solving Large Linearized Systems in Mechanics", in Solving Large-Scale Problems in Mechanics (M. Papadrakakis, Ed.), J. Wiley and Sons.

[13] Smith, B.F. (1991) - "An Optimal Domain Decomposition Preconditioner for the Finite Element Solution of Linear Elasticity Problems", SIAM J. Sci. Stat. Comput., Vol. 13.

[14] Sonzogni, V.E. (1999) - "A communication system for distributed computations on a network of personal computers", Advances in Engineering Software, Vol. 30, No. 4, pp. 255-260.

[15] Sonzogni, V.E. (1994) - "Domain Decomposition Techniques in Structural Mechanics", Proc. XV CILAMCE: Congresso Ibero Latino-Americano sobre Metodos Computacionais para Engenharia, pp. 1167-1176, 30 Nov - 2 Dec 1994, Belo Horizonte (Brasil).

[16] Widlund, O.B. (1988) - "Iterative Substructuring Methods: Algorithms and Theory for Elliptic Problems in the Plane", First Int. Symposium on Domain Decomposition Methods for Partial Differential Equations (R. Glowinski, G.H. Golub, G.A. Meurant and J. Périaux, Eds.), SIAM, Philadelphia.
