Theoretically supported scalable FETI for numerical solution of variational inequalities

Zdeněk Dostál and David Horák ∗

Abstract

The FETI method with a natural coarse grid is combined with recently proposed optimal algorithms for the solution of bound and/or equality constrained quadratic programming problems in order to develop a scalable solver for elliptic boundary variational inequalities such as those describing the equilibrium of a system of bodies in mutual contact. A discretized model problem is first reduced by the duality theory of convex optimization to a quadratic programming problem with bound and equality constraints. The latter is then modified by means of orthogonal projectors to the natural coarse grid introduced by Farhat, Mandel and Roux. Finally, the classical results on linear scalability for linear problems are extended to boundary variational inequalities. The results are validated by numerical experiments. The experiments also confirm that the algorithm enjoys the same parallel scalability as its linear counterpart.

Keywords: Domain decomposition, variational inequality, scalability, parallel algorithms, FETI

Acknowledgment: This research is supported by grant No. 101/04/1145 of the Grant Agency of the Czech Republic and by projects of the Ministry of Education No. 1ET400300415 and ME641.

∗ FEI VŠB-Technical University of Ostrava, CZ-70833 Ostrava, Czech Republic

1 Introduction

The FETI (Finite Element Tearing and Interconnecting) domain decomposition method was originally proposed by Farhat and Roux [26] for the parallel solution of linear problems described by elliptic partial differential equations. Its key ingredient is the decomposition of the spatial domain into non-overlapping subdomains that are "glued" by Lagrange multipliers, so that, after eliminating the primal variables, the original problem is reduced to a small, relatively well conditioned, typically equality constrained quadratic programming problem that is solved iteratively. The time necessary for both the elimination and the iterations can be reduced nearly proportionally to the number of processors, so that the algorithm enjoys parallel scalability. Observing that the equality constraints may be used to define the so-called "natural coarse grid", Farhat, Mandel and Roux [25] modified the basic FETI algorithm so that they were able to adapt the results of Bramble, Pasciak and Schatz [5] to prove its numerical scalability, i.e. asymptotically linear complexity. A comprehensive review of the mathematical results related to the FETI methods may be found in the monograph by Toselli and Widlund [38].

If the FETI procedure is applied to an elliptic variational inequality, the resulting quadratic programming problem has not only the equality constraints, but also non-negativity constraints. Even though the latter is a considerable complication as compared with linear problems, it seems that the FETI procedure should be even more powerful for the solution of variational inequalities than for linear problems. The reason is that FETI not only reduces the original problem to a smaller and better conditioned one, but also replaces, at no extra cost, all the inequalities by bound constraints. Promising experimental results by Dureisseix and Farhat [22] supported this claim and even indicated numerical scalability of their method. Similar results were achieved also for the FETI–DP (Dual–Primal) method introduced by Farhat et al. [24]. The FETI–DP method is very similar to the original FETI; the only difference is that it enforces the continuity of displacements at corners on the primal level.
A new Lagrange multiplier algorithm, FETI–C, based on FETI–DP and on active set strategies with additional planning steps and preconditioning, was introduced by Farhat et al. [1, 22]. Its scalability was demonstrated experimentally. Another approach yielding experimental evidence of scalability was proposed by Dostál, Friedlander, Gomes, Santos and Horák [12, 13, 14]. The algorithm combined FETI with a special variant of the augmented Lagrangian method [10]. Scalability was later proved for an algorithm that enforced the equality constraints by the optimal dual penalty [15, 16] and solved the resulting bound constrained problem by recently developed algorithms that are in a sense optimal [7, 21]. Using the same algorithms, Dostál, Horák and Stefanica then proved numerical scalability for a FETI–DP algorithm applied to coercive problems discretized by means of either nodal [18] or mortar [19] Lagrange multipliers. Most recently, scalability results were proved also for semicoercive problems [20]. The rate of convergence was given in terms of the effective condition number of the dual Schur complement of the stiffness matrix, which was proved to be bounded by $CH^2/h^2$, where $C$ is a constant independent of the discretization parameter $h$ and the decomposition parameter $H$. The estimates did not assume any preconditioning. Indeed, numerical experiments by the present authors, V. Vondrák and M. Lesoinne indicated that the performance of our FETI–DP based algorithms may be considerably improved by preconditioning.

It should be noted that the effort to develop scalable solvers for variational inequalities was not restricted to FETI. For example, using ideas related to Mandel [34], Kornhuber, Krause, Sander and Wohlmuth [30, 31, 40, 32, 33] gave experimental evidence of numerical scalability of an algorithm based on monotone multigrid. Probably the first theoretical results concerning the development of scalable algorithms were proved by Schöberl [36, 37].

In this paper, we use the FETI method with a natural coarse grid to develop a scalable algorithm for the numerical solution of both coercive and semicoercive variational inequalities. The rate of convergence is again given in terms of the effective condition number of the dual Schur complement of the stiffness matrix, but this time it is bounded by $CH/h$.

The paper is organized as follows. After describing a model problem, we briefly review the FETI methodology [12] that turns the variational inequality into a well conditioned quadratic programming problem with bound and equality constraints. Then we review our algorithms for the solution of the resulting bound and equality constrained quadratic programming problem, whose rate of convergence may be expressed in terms of bounds on the spectrum of the dual Schur complement matrix [21, 8, 9]. Finally, we present the main results about the optimality of our method and give results of numerical experiments with a parallel implementation of the algorithm in PETSc [3].

2 Model problem

For the sake of simplicity, we shall restrict our analysis to a simple model problem, but our reasoning is valid also in more general cases, including contact problems of 2D and 3D elasticity, provided that the conditions exploited in the proof of the results by Farhat, Mandel and Roux [25] are satisfied. Let $\Omega = \Omega^1 \cup \Omega^2$, $\Omega^1 = (0,1)\times(0,1)$ and $\Omega^2 = (1,2)\times(0,1)$ denote open domains with boundaries $\Gamma^1$, $\Gamma^2$ and their parts $\Gamma_u^i$, $\Gamma_f^i$, $\Gamma_c^i$ formed by the sides of $\Omega^i$, $i = 1, 2$, as in Figure 1a or Figure 1b. Let $H^1(\Omega^i)$, $i = 1, 2$, denote the Sobolev space of the first order in the space $L^2(\Omega^i)$ of the functions on $\Omega^i$ whose squares are integrable in the sense of Lebesgue. Let

\[ V^i = \{ v^i \in H^1(\Omega^i) : v^i = 0 \ \text{on}\ \Gamma_u^i \} \]

denote the closed subspaces of $H^1(\Omega^i)$, $i = 1, 2$, and let

\[ V = V^1 \times V^2 \quad\text{and}\quad K = \{ (v^1, v^2) \in V : v^2 - v^1 \ge 0 \ \text{on}\ \Gamma_c \} \]

denote the closed subspace and the closed convex subset of $H = H^1(\Omega^1)\times H^1(\Omega^2)$, respectively. The relations on the boundaries are in terms of traces. On $H$ we shall define a symmetric bilinear form

\[ a(u, v) = \sum_{i=1}^{2} \int_{\Omega^i} \left( \frac{\partial u^i}{\partial x}\frac{\partial v^i}{\partial x} + \frac{\partial u^i}{\partial y}\frac{\partial v^i}{\partial y} \right) d\Omega \]

and a linear form

\[ \ell(v) = \sum_{i=1}^{2} \int_{\Omega^i} f^i v^i \, d\Omega, \]

where $f^i \in L^2(\Omega^i)$, $i = 1, 2$, are the restrictions of

\[ f(x,y) = \begin{cases} -1 & \text{for } (x,y) \in (0,1)\times[0.75,1), \\ \phantom{-}0 & \text{for } (x,y) \in (0,1)\times[0,0.75) \text{ and } (x,y)\in(1,2)\times[0.25,1), \\ -3 & \text{for } (x,y) \in (1,2)\times[0,0.25) \end{cases} \]

for the coercive problem and

\[ f(x,y) = \begin{cases} -3 & \text{for } (x,y) \in (0,1)\times[0.75,1), \\ \phantom{-}0 & \text{for } (x,y) \in (0,1)\times[0,0.75) \text{ and } (x,y)\in(1,2)\times[0.25,1), \\ -1 & \text{for } (x,y) \in (1,2)\times[0,0.25) \end{cases} \]

for the semicoercive problem. Thus we can define the problem to find

\[ \min\ q(u) = \frac{1}{2} a(u,u) - \ell(u) \quad \text{subject to} \quad u \in K. \tag{2.1} \]

Figure 1: Model problems. Fig. 1a: Coercive model problem. Fig. 1b: Semicoercive model problem.

We shall consider two variants of the Dirichlet data. In the first case, both membranes are fixed on the outer edges as in Figure 1a, so that

\[ \Gamma_u^1 = \{ (0,y) \in \mathbb{R}^2 : y \in [0,1] \}, \qquad \Gamma_u^2 = \{ (2,y) \in \mathbb{R}^2 : y \in [0,1] \}. \]

Since the Dirichlet conditions are prescribed on parts $\Gamma_u^i$, $i = 1, 2$, of the boundaries of both membranes with positive measure, the quadratic form $a$ is coercive, which guarantees both existence and uniqueness of the solution [28, 27]. In the second case, only the left membrane is fixed on the outer edge and the right membrane has no prescribed displacement as in Figure 1b, so that

\[ \Gamma_u^1 = \{ (0,y) \in \mathbb{R}^2 : y \in [0,1] \}, \qquad \Gamma_u^2 = \emptyset. \]

Even though $a$ is in this case only semidefinite, the form $q$ is still coercive due to the choice of $f$, so that the problem again has a unique solution [28, 27]. More details about this particular model problem may be found in [12].

The solution of the model problem may be interpreted as the displacement of two membranes under the traction $f$. The left edge of the right membrane is not allowed to penetrate below the right edge of the left membrane.

3 Domain decomposition and discretization

In our definition of the problem, we have so far used only the natural decomposition of the spatial domain $\Omega$ into $\Omega^1$ and $\Omega^2$. However, to enable efficient application of the domain decomposition methods, we can optionally decompose each $\Omega^i$ into subdomains $\Omega^{i1}, \dots, \Omega^{ip}$, $p > 1$, as in Figure 2. The continuity in $\Omega^1$ and $\Omega^2$ of the global solution assembled from the local solutions $u^{ij}$ will be enforced by the "gluing" conditions $u^{ij}(x) = u^{ik}(x)$ that should be satisfied for any $x$ on the interface $\Gamma^{ij,ik}$ of $\Omega^{ij}$ and $\Omega^{ik}$. After modifying appropriately the definition of problem (2.1), introducing regular grids in the subdomains $\Omega^{ij}$ that match across the interfaces $\Gamma^{ij,kl}$, indexing contiguously the nodes and entries of the corresponding vectors in the subdomains, and using the Lagrangian finite element discretization, we get the discretized version of problem (2.1) with auxiliary domain decomposition that reads

\[ \min\ \frac{1}{2} u^\top K u - f^\top u \quad \text{s.t.} \quad B_I u \le 0 \ \text{ and } \ B_E u = 0. \tag{3.1} \]

In (3.1), $K$ denotes a block diagonal positive semidefinite stiffness matrix, the full rank matrices $B_I$ and $B_E$ describe the discretized non-penetration and gluing conditions, respectively, and $f$ represents the discrete analog of the linear term $\ell(u)$. The rows of $B_E$ and $B_I$ are filled with zeros except 1 and -1 in positions that correspond to the nodes with the same coordinates on the artificial or contact boundaries, respectively. In particular, if $b_i$ denotes a row of $B_E$ or $B_I$, then $b_i$ will not have more than four nonzero entries, and for any displacement vector $u$, $b_i u$ will denote the difference or jump between the displacements on each side of the boundary. Some more details may be found in [12].

Figure 2: Domain decomposition and discretization

Our next step is to simplify the problem, in particular to replace the general inequality constraints $B_I u \le 0$ by nonnegativity constraints using the duality theory. To this end, let us introduce the Lagrangian associated with problem (3.1) by

\[ L(u, \lambda_I, \lambda_E) = \frac{1}{2} u^\top K u - f^\top u + \lambda_I^\top B_I u + \lambda_E^\top B_E u, \tag{3.2} \]

where $\lambda_I$ and $\lambda_E$ are the Lagrange multipliers associated with the inequalities and equalities, respectively. Introducing the notation

\[ \lambda = \begin{bmatrix} \lambda_I \\ \lambda_E \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_I \\ B_E \end{bmatrix}, \]

we can observe that $B$ is a full rank matrix and write the Lagrangian briefly as

\[ L(u, \lambda) = \frac{1}{2} u^\top K u - f^\top u + \lambda^\top B u. \]

It is well known [4] that (3.1) is equivalent to the saddle point problem

\[ \text{Find } (u, \lambda) \text{ so that } L(u, \lambda) = \sup_{\lambda_I \ge 0} \inf_{u} L(u, \lambda). \tag{3.3} \]

For fixed $\lambda$, the Lagrange function $L(\cdot, \lambda)$ is convex in the first variable and the minimizer $u$ of $L(\cdot, \lambda)$ satisfies

\[ K u - f + B^\top \lambda = 0. \tag{3.4} \]

Equation (3.4) has a solution iff

\[ f - B^\top \lambda \in \mathrm{Im} K, \tag{3.5} \]

which can be expressed more conveniently by means of a matrix $R$ whose columns span the null space of $K$ as

\[ R^\top (f - B^\top \lambda) = 0. \tag{3.6} \]

The matrix $R$ may be formed directly so that each floating subdomain is assigned to a column of $R$ with ones in the positions of the nodal variables that belong to the subdomain and zeros elsewhere. It may be checked that $R^\top B^\top$ is a full rank matrix. The matrix $R$ may also be extracted from $K$ [23].

Now assume that $\lambda$ satisfies (3.5) and denote by $K^\dagger$ any matrix that satisfies

\[ K K^\dagger K = K. \tag{3.7} \]

Let us note that a generalized inverse $K^\dagger$ that satisfies (3.7) may be evaluated at a cost comparable with that of the Cholesky decomposition of the regularized $K$ [23]. It may be verified directly that if $u$ solves (3.4), then there is a vector $\alpha$ such that

\[ u = K^\dagger (f - B^\top \lambda) + R\alpha. \tag{3.8} \]

After substituting expression (3.8) into problem (3.3) and changing signs, we get the minimization problem to find

\[ \min\ \Theta(\lambda) \quad \text{s.t.} \quad \lambda_I \ge 0 \ \text{ and } \ R^\top (f - B^\top \lambda) = 0, \tag{3.9} \]

where

\[ \Theta(\lambda) = \frac{1}{2} \lambda^\top B K^\dagger B^\top \lambda - \lambda^\top B K^\dagger f. \tag{3.10} \]

Once the solution $\lambda$ of (3.9) is known, the vector $u$ that solves (3.1) can be evaluated by (3.8) and the formula [12]

\[ \alpha = -(R^\top \tilde{B}^\top \tilde{B} R)^{-1} R^\top \tilde{B}^\top \tilde{B} K^\dagger (f - B^\top \lambda), \tag{3.11} \]

where $\tilde{B} = [\tilde{B}_I^\top, B_E^\top]^\top$, and the matrix $\tilde{B}_I$ is formed by the rows $b_i$ of $B_I$ that correspond to the positive components of the solution $\lambda$, i.e. those with $\lambda_i > 0$.
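The role of the generalized inverse in (3.7) and of the null space term in (3.8) can be checked on a tiny example. The following NumPy sketch is an illustration only: the 3-node free bar, the right-hand side `b` (standing in for $f - B^\top\lambda$) and the use of the Moore-Penrose pseudoinverse as one particular $K^\dagger$ are assumptions of the sketch, not part of the method described above.

```python
import numpy as np

# Stiffness matrix of a free ("floating") 1D bar with three nodes; it is
# singular because a constant displacement produces no strain.
K = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

# R spans the null space of K: a single column of ones for the floating body.
R = np.ones((3, 1))

# The Moore-Penrose pseudoinverse is one matrix satisfying K K^+ K = K, cf. (3.7).
K_plus = np.linalg.pinv(K)

# Hypothetical right-hand side b = f - B^T lambda satisfying (3.6): R^T b = 0.
b = np.array([1.0, -3.0, 2.0])

def primal_from_dual(alpha):
    """Evaluate u = K^+ b + R alpha as in (3.8)."""
    return K_plus @ b + R @ np.atleast_1d(alpha)

u0 = primal_from_dual(0.0)
u1 = primal_from_dual(2.5)   # differs from u0 by a rigid body shift only
```

Both `u0` and `u1` solve $Ku = b$, which illustrates why the vector $\alpha$ has to be fixed separately, here by (3.11).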

4 Natural coarse grid

Even though problem (3.9) is much more suitable for computations than (3.1) and has been used for the efficient solution of discretized variational inequalities [11], further improvement may be achieved by adapting some simple observations and the results of Farhat, Mandel and Roux [25]. Let us denote

\[ F = B K^\dagger B^\top, \quad \tilde{d} = B K^\dagger f, \quad \tilde{G} = R^\top B^\top, \quad \tilde{e} = R^\top f, \]

and let $T$ denote a regular matrix that defines the orthonormalization of the rows of $\tilde{G}$, so that the matrix

\[ G = T \tilde{G} \]

has orthonormal rows. After denoting $e = T\tilde{e}$, problem (3.9) reads

\[ \min\ \frac{1}{2} \lambda^\top F \lambda - \lambda^\top \tilde{d} \quad \text{s.t.} \quad \lambda_I \ge 0 \ \text{ and } \ G\lambda = e. \tag{4.1} \]

Next we shall transform the problem of minimization on a subset of an affine space to that on a subset of a vector space by looking for the solution of (4.1) in the form $\lambda = \mu + \tilde{\lambda}$, where $G\tilde{\lambda} = e$. The following lemma shows that we can even find $\tilde{\lambda}$ such that $\tilde{\lambda}_I \ge 0$.

Lemma 4.1. Let $B$ be such that the negative entries of $B_I$ are in the columns that correspond to the nodes in the floating subdomain $\Omega^2$. Then there is $\tilde{\lambda}$ with $\tilde{\lambda}_I \ge 0$ such that $G\tilde{\lambda} = e$.

Proof: See [15] (coercive problem) or [16] (semicoercive problem). □

To carry out the transformation, denote $\lambda = \mu + \tilde{\lambda}$, so that

\[ \frac{1}{2}\lambda^\top F \lambda - \lambda^\top \tilde{d} = \frac{1}{2}\mu^\top F \mu - \mu^\top (\tilde{d} - F\tilde{\lambda}) + \frac{1}{2}\tilde{\lambda}^\top F \tilde{\lambda} - \tilde{\lambda}^\top \tilde{d}, \]

and problem (4.1) is, after returning to the old notation, equivalent to

\[ \min\ \frac{1}{2}\lambda^\top F \lambda - \lambda^\top d \quad \text{s.t.} \quad G\lambda = 0 \ \text{ and } \ \lambda_I \ge -\tilde{\lambda}_I \tag{4.2} \]

with $d = \tilde{d} - F\tilde{\lambda}$ and $\tilde{\lambda}_I \ge 0$. Our final step is based on the observation that problem (4.2) is equivalent to

\[ \min\ \frac{1}{2}\lambda^\top (PFP + \rho Q)\lambda - \lambda^\top P d \quad \text{s.t.} \quad G\lambda = 0 \ \text{ and } \ \lambda_I \ge -\tilde{\lambda}_I, \tag{4.3} \]

where $\rho$ is an arbitrary positive constant and

\[ Q = G^\top G \quad\text{and}\quad P = I - Q \]

denote the orthogonal projectors on the image space of $G^\top$ and on the kernel of $G$, respectively. The regularization term is introduced in order to simplify the reference to the results of quadratic programming that assume regularity of the Hessian matrix of the quadratic form. Problem (4.3) turns out to be a suitable starting point for the development of an efficient algorithm for variational inequalities due to the classical estimates of the extreme eigenvalues. To formulate them, we shall denote by $\alpha_{\min}(A)$ and $\alpha_{\max}(A)$ the smallest and the largest eigenvalue of a given symmetric matrix $A$, respectively.

Theorem 4.2. There are constants $C_1 > 0$ and $C_2 > 0$ independent of the discretization parameter $h$ and the decomposition parameter $H$ such that

\[ \alpha_{\min}(PFP|\mathrm{Im}P) \ge C_1 \quad\text{and}\quad \alpha_{\max}(PFP|\mathrm{Im}P) \le \|PFP\| \le C_2 \frac{H}{h}. \]

Proof: See Theorem 3.2 of Farhat, Mandel and Roux [25]. □

Note: The statement of Theorem 3.2 of Farhat, Mandel and Roux [25] gives only an upper bound on the spectral condition number $\kappa(PFP|\mathrm{Im}P)$. However, the reasoning that precedes and substantiates their estimate proves both bounds stated in Theorem 4.2.
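The construction of the projectors and the equivalence of (4.2) and (4.3) on the kernel of $G$ can be illustrated numerically. The sketch below is a hedged illustration with random stand-ins: `G_tilde` and `F` are hypothetical matrices of small size, not matrices assembled from the model problem, and the QR factorization is only one possible way to realize the orthonormalizing matrix $T$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 6                              # hypothetical sizes
G_tilde = rng.standard_normal((m, n))    # stands in for R^T B^T (full row rank)

# Orthonormalize the rows of G_tilde: with G_tilde^T = Q1 R1 (reduced QR),
# G = Q1^T = R1^{-T} G_tilde has orthonormal rows, i.e. T = R1^{-T}.
Q1, R1 = np.linalg.qr(G_tilde.T)
G = Q1.T

Q = G.T @ G                              # orthogonal projector onto Im G^T
P = np.eye(n) - Q                        # orthogonal projector onto Ker G

# Symmetric positive definite stand-in for F = B K^+ B^T.
A = rng.standard_normal((n, n))
F = A @ A.T + np.eye(n)

rho = 1.0
H_reg = P @ F @ P + rho * Q              # Hessian of problem (4.3)

# For any lam in Ker G the objectives of (4.2) and (4.3) coincide.
lam = P @ rng.standard_normal(n)
d = rng.standard_normal(n)
obj_42 = 0.5 * lam @ F @ lam - lam @ d
obj_43 = 0.5 * lam @ H_reg @ lam - lam @ (P @ d)
```

The term $\rho Q$ does not change the objective on the feasible set, but it makes the Hessian regular on the whole space.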

5 Optimal solvers to bound and equality constrained problems

We shall now briefly review our in a sense optimal algorithms for the solution of the bound and equality constrained problem (4.3). They combine our semimonotonic augmented Lagrangian method [8], which generates approximations for the Lagrange multipliers for the equality constraints in the outer loop, with the working set algorithm for bound constrained auxiliary problems in the inner loop [21]. If a new Lagrange multiplier vector $\mu$ is used for the equality constraints, the augmented Lagrangian for problem (4.3) can be written as

\[ L(\lambda, \mu, \rho) = \frac{1}{2}\lambda^\top (PFP + \rho Q)\lambda - \lambda^\top P d + \mu^\top G\lambda + \frac{\rho}{2}\lambda^\top Q \lambda. \]

The gradient of $L(\lambda, \mu, \rho)$ is given by

\[ g(\lambda, \mu, \rho) = (PFP + \rho Q)\lambda - P d + G^\top(\mu + \rho G\lambda). \]

Let $\mathcal{I}$ denote the set of the indices of the bound constrained entries of $\lambda$, i.e. those subject to $\lambda_i \ge -\tilde{\lambda}_i$. The projected gradient $g^P = g^P(\lambda, \mu, \rho)$ of $L$ at $\lambda$ is given componentwise by

\[ g_i^P = \begin{cases} g_i & \text{for } \lambda_i > -\tilde{\lambda}_i \text{ or } i \notin \mathcal{I}, \\ g_i^- & \text{for } \lambda_i = -\tilde{\lambda}_i \text{ and } i \in \mathcal{I}, \end{cases} \]

where $g_i^- = \min\{g_i, 0\}$. Our algorithm is a variant of that proposed by Conn, Gould and Toint [6] for identifying stationary points of more general problems. Its modification by Dostál, Friedlander and Santos [10] was used by Dostál and Horák to develop a scalable FETI based algorithm, as shown experimentally in [14]. The key to proving optimality results is to combine the adaptive precision control of auxiliary problems in Step 1 with the new update rule for the penalty parameter $\rho$ in Step 4. All the necessary parameters are listed in Step 0, and typical values of these parameters for our model problem are given in brackets.

Algorithm 5.1. Semi-monotonic augmented Lagrangian method for bound and equality constrained problems (SMALBE).

Step 0. {Initialization of parameters.}
  Given $\eta > 0$ [$\eta = \|Pd\|$], $\beta > 1$ [$\beta = 10$], $M > 0$ [$M = 1$], $\rho_0 > 0$ [$\rho_0 = 100$], and $\mu^0$ [$\mu^0 = 0$], set $k = 0$.
Step 1. {Inner iteration with adaptive precision control.}
  Find $\lambda^k$ such that $\lambda_I^k \ge -\tilde{\lambda}_I$ and
  $\|g^P(\lambda^k, \mu^k, \rho_k)\| \le \min\{M\|G\lambda^k\|, \eta\}$.
Step 2. {Stopping criterion.}
  If $\|g^P(\lambda^k, \mu^k, \rho_k)\|$ and $\|G\lambda^k\|$ are sufficiently small, then $\lambda^k$ is the solution.
Step 3. {Update of the Lagrange multipliers.}
  $\mu^{k+1} = \mu^k + \rho_k G\lambda^k$
Step 4. {Update of the penalty parameter.}
  If $k > 0$ and $L(\lambda^k, \mu^k, \rho_k) < L(\lambda^{k-1}, \mu^{k-1}, \rho_{k-1}) + \rho_k\|G\lambda^k\|^2/2$
  then $\rho_{k+1} = \beta\rho_k$, else $\rho_{k+1} = \rho_k$.
Step 5. Increase $k$ and return to Step 1.
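The steps of Algorithm 5.1 can be sketched in a few lines of NumPy for a generic problem $\min \frac12 x^\top A x - b^\top x$ subject to $Gx = 0$ and $x \ge \ell$. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the inner solver is plain gradient projection instead of MPRGP, the floor `0.1 * tol` on the inner tolerance is a safeguard added for the sketch, and the names `smalbe`, `low` and the test data are hypothetical.

```python
import numpy as np

def projected_gradient(g, lam, low):
    """g^P: g_i on free entries, min(g_i, 0) on entries sitting at their bound."""
    return np.where(lam <= low, np.minimum(g, 0.0), g)

def smalbe(A, b, G, low, rho=10.0, beta=10.0, M=1.0, tol=1e-8, max_outer=100):
    """Sketch of Algorithm 5.1 for min 1/2 x'Ax - b'x  s.t.  Gx = 0, x >= low."""
    lam = np.maximum(np.zeros_like(b), low)      # feasible starting point
    mu = np.zeros(G.shape[0])                    # Step 0: mu^0 = 0
    eta = np.linalg.norm(b)
    L_prev = None
    for k in range(max_outer):
        # Step 1: inexact inner solve by gradient projection (assumption:
        # this simple solver stands in for MPRGP).
        H = A + rho * G.T @ G
        alpha = 1.0 / np.linalg.norm(H, 2)
        for _ in range(100000):
            g = H @ lam - b + G.T @ mu
            gp = projected_gradient(g, lam, low)
            feas = np.linalg.norm(G @ lam)
            # The floor 0.1 * tol is a safeguard of this sketch only.
            if np.linalg.norm(gp) <= max(min(M * feas, eta), 0.1 * tol):
                break
            lam = np.maximum(lam - alpha * g, low)
        # Step 2: stopping criterion.
        if np.linalg.norm(gp) <= tol and feas <= tol:
            break
        # Augmented Lagrangian at (lam^k, mu^k, rho_k), needed in Step 4.
        L_val = 0.5 * lam @ A @ lam - b @ lam + mu @ (G @ lam) + 0.5 * rho * feas**2
        increase = L_prev is not None and L_val < L_prev + 0.5 * rho * feas**2
        mu = mu + rho * (G @ lam)                # Step 3
        if increase:                             # Step 4
            rho *= beta
        L_prev = L_val
    return lam, mu, rho

# Small test problem; its KKT solution is (0, 8/13, 1/13, -9/13).
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.array([-1.0, 2.0, 1.0, -2.0])
G = np.full((1, 4), 0.5)                         # one orthonormal row
low = np.array([0.0, 0.0, -np.inf, -np.inf])     # bounds on the first two entries
lam, mu, rho = smalbe(A, b, G, low)
```

In the test problem the first bound is active at the solution, so both the equality constraint and a bound constraint are exercised.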

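Step 1 may be implemented by any algorithm for minimization of the augmented Lagrangian $L$ with respect to $\lambda$ subject to $\lambda_I \ge -\tilde{\lambda}_I$ which guarantees convergence of the projected gradient to zero. More about the properties and implementation of the SMALBE algorithm may be found in [8].

The unique feature of the SMALBE algorithm is its capability to find an approximate solution of problem (4.3) in a number of steps which is uniformly bounded in terms of the bounds on the spectrum of $PFP + \rho Q$ [8]. To get a bound on the number of matrix-vector multiplications, it is necessary to have an algorithm which can solve the problem

\[ \text{minimize } L(\lambda, \mu, \rho) \quad \text{subject to} \quad \lambda_I \ge -\tilde{\lambda}_I \tag{5.1} \]

with the rate of convergence expressed in terms of the bounds on the spectrum of the Hessian matrix of $L$. To describe such an algorithm, let us recall that the unique solution $\lambda = \lambda(\mu, \rho)$ of (5.1) satisfies the Karush-Kuhn-Tucker conditions

\[ g^P(\lambda, \mu, \rho) = 0. \tag{5.2} \]

Let $\mathcal{A}(\lambda)$ and $\mathcal{F}(\lambda)$ denote the active set and the free set of indices of $\lambda$, respectively, i.e.,

\[ \mathcal{A}(\lambda) = \{ i \in \mathcal{I} : \lambda_i = -\tilde{\lambda}_i \} \quad\text{and}\quad \mathcal{F}(\lambda) = \{ i : \lambda_i > -\tilde{\lambda}_i \text{ or } i \notin \mathcal{I} \}. \]

To enable an alternative reference to the KKT conditions [4], let us define the free gradient $\varphi(\lambda)$ and the chopped gradient $\beta(\lambda)$ by

\[ \varphi_i(\lambda) = \begin{cases} g_i(\lambda) & \text{for } i \in \mathcal{F}(\lambda), \\ 0 & \text{for } i \in \mathcal{A}(\lambda), \end{cases} \qquad \beta_i(\lambda) = \begin{cases} 0 & \text{for } i \in \mathcal{F}(\lambda), \\ g_i^-(\lambda) & \text{for } i \in \mathcal{A}(\lambda), \end{cases} \]

so that the KKT conditions are satisfied if and only if the projected gradient $g^P(\lambda) = \varphi(\lambda) + \beta(\lambda)$ is equal to zero. We call $\lambda$ feasible if $\lambda_i \ge -\tilde{\lambda}_i$ for $i \in \mathcal{I}$. The projector $P$ to the set of feasible vectors (not to be confused with the matrix $P = I - Q$ of Section 4) is defined for any $\lambda$ by

\[ P(\lambda)_i = \max\{\lambda_i, -\tilde{\lambda}_i\} \ \text{ for } i \in \mathcal{I}, \qquad P(\lambda)_i = \lambda_i \ \text{ for } i \notin \mathcal{I}. \]

Let $A$ denote the Hessian of $L$ with respect to $\lambda$. The expansion step is defined by

\[ \lambda^{k+1} = P\left(\lambda^k - \alpha\varphi(\lambda^k)\right) \tag{5.3} \]

with the steplength $\alpha \in (0, \|A\|^{-1}]$. This step may expand the current active set. To describe it without $P$, let $\tilde{\varphi}(\lambda)$ be the reduced free gradient for any feasible $\lambda$, with entries

\[ \tilde{\varphi}_i = \tilde{\varphi}_i(\lambda) = \min\{\lambda_i/\alpha, \varphi_i\} \ \text{ for } i \in \mathcal{I}, \qquad \tilde{\varphi}_i = \varphi_i \ \text{ for } i \in \mathcal{E}, \]

such that

\[ P(\lambda - \alpha\varphi(\lambda)) = \lambda - \alpha\tilde{\varphi}(\lambda). \tag{5.4} \]

If the inequality

\[ \|\beta(\lambda^k)\|^2 \le \Gamma^2 \tilde{\varphi}(\lambda^k)^\top \varphi(\lambda^k) \tag{5.5} \]

holds, then we call the iterate $\lambda^k$ strictly proportional. The test (5.5) is used to decide which component of the projected gradient $g^P(\lambda^k)$ will be reduced in the next step. The proportioning step is defined by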

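\[ \lambda^{k+1} = \lambda^k - \alpha_{cg}\beta(\lambda^k). \]

The steplength $\alpha_{cg}$ is chosen to minimize $L(\lambda^k - \alpha\beta(\lambda^k), \mu^k, \rho_k)$ with respect to $\alpha$, i.e.,

\[ \alpha_{cg} = \frac{\beta(\lambda^k)^\top g(\lambda^k)}{\beta(\lambda^k)^\top A \beta(\lambda^k)}. \]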
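The splitting of the gradient and the proportionality test (5.5) can be illustrated by a short NumPy sketch. It is written for a general lower bound vector `low` (an assumption of this sketch; for zero bounds the reduced free gradient coincides with $\min\{\lambda_i/\alpha, \varphi_i\}$), and the sample vectors are hypothetical.

```python
import numpy as np

def split_gradient(g, lam, low, alpha):
    """Return the free, chopped and reduced free gradients at a feasible lam."""
    active = lam <= low                                  # active set A(lam)
    phi = np.where(active, 0.0, g)                       # free gradient
    beta = np.where(active, np.minimum(g, 0.0), 0.0)     # chopped gradient
    # Reduced free gradient for the bound `low`; entries without a bound
    # (low = -inf) keep phi_i because (lam_i - low_i) / alpha is infinite.
    phi_red = np.minimum((lam - low) / alpha, phi)
    return phi, beta, phi_red

# Hypothetical data: the last entry has no bound.
g = np.array([1.0, -2.0, 3.0, -1.0])
lam = np.array([0.0, 0.0, 0.1, 2.0])
low = np.array([0.0, 0.0, 0.0, -np.inf])
alpha, Gamma = 0.5, 1.0

phi, beta, phi_red = split_gradient(g, lam, low, alpha)

# Identity (5.4): the projected step equals a step along the reduced gradient.
expansion = np.maximum(lam - alpha * phi, low)

# Test (5.5): here ||beta||^2 = 4 exceeds Gamma^2 * phi_red^T phi = 1.6,
# so this lam is not strictly proportional and a proportioning step follows.
proportional = beta @ beta <= Gamma**2 * (phi_red @ phi)
```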

The purpose of the proportioning step is to remove indices from the active set. The conjugate gradient step is defined by

\[ \lambda^{k+1} = \lambda^k - \alpha_{cg} p^k, \tag{5.6} \]

where $p^k$ is the conjugate gradient direction [2], which is constructed recurrently. The recurrence starts (or restarts) with $p^s = \varphi(\lambda^s)$ whenever $\lambda^s$ is generated by the expansion step or the proportioning step. If $p^k$ is known, then $p^{k+1}$ is given by the formulae [2]

\[ p^{k+1} = \varphi(\lambda^k) - \gamma p^k, \qquad \gamma = \frac{\varphi(\lambda^k)^\top A p^k}{(p^k)^\top A p^k}. \tag{5.7} \]

The conjugate gradient steps are used to carry out efficiently the minimization in the face $W_J = \{\lambda : \lambda_i = 0 \text{ for } i \in J\}$ given by $J = \mathcal{A}(\lambda^s)$. The algorithm that we use may now be described as follows.

Algorithm 5.2. Modified proportioning with reduced gradient projections (MPRGP).

Let $\lambda^0$ be an $n$-vector such that $\lambda_i^0 \ge -\tilde{\lambda}_i$ for $i \in \mathcal{I}$, and let $\alpha \in (0, \|A\|^{-1}]$ and $\Gamma > 0$ be given. For $k \ge 0$ and $\lambda^k$ known, choose $\lambda^{k+1}$ by the following rules:

Step 1. If $g^P(\lambda^k) = 0$, set $\lambda^{k+1} = \lambda^k$.
Step 2. If $\lambda^k$ is strictly proportional and $g^P(\lambda^k) \ne 0$, try to generate $\lambda^{k+1}$ by the conjugate gradient step. If $\lambda_i^{k+1} \ge 0$ for $i \in \mathcal{I}$, then accept it, else generate $\lambda^{k+1}$ by the expansion step.
Step 3. If $\lambda^k$ is not strictly proportional, define $\lambda^{k+1}$ by proportioning.

The MPRGP algorithm has a linear rate of convergence in terms of the spectral condition number of the Hessian $A$ of $L$ [21]. More about the properties and implementation of the MPRGP algorithm may be found in [21] and [9].
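The three kinds of steps of Algorithm 5.2 can be put together in a compact NumPy sketch for $\min \frac12 x^\top A x - b^\top x$ subject to $x \ge \ell$. This is a simplified illustration, not the authors' code: it uses a general bound vector `low`, computes the new direction from the free gradient at the newly accepted iterate as in standard conjugate gradients, restarts the recurrence whenever the active set changes, and replaces the exact test $g^P = 0$ by a tolerance. The function name and the test data are hypothetical.

```python
import numpy as np

def mprgp(A, b, low, lam, Gamma=1.0, tol=1e-10, max_iter=1000):
    """Sketch of Algorithm 5.2 for min 1/2 x'Ax - b'x subject to x >= low."""
    alpha = 1.0 / np.linalg.norm(A, 2)           # steplength in (0, ||A||^-1]
    lam = np.maximum(lam, low)                   # make the start feasible
    g = A @ lam - b
    p = None                                     # None forces a CG restart
    for _ in range(max_iter):
        active = lam <= low
        phi = np.where(active, 0.0, g)           # free gradient
        beta = np.where(active, np.minimum(g, 0.0), 0.0)   # chopped gradient
        if np.linalg.norm(phi + beta) <= tol:    # Step 1 (with a tolerance)
            break
        phi_red = np.minimum((lam - low) / alpha, phi)
        if beta @ beta <= Gamma**2 * (phi_red @ phi):      # strictly proportional
            if p is None:
                p = phi.copy()
            Ap = A @ p
            a_cg = (g @ p) / (p @ Ap)
            trial = lam - a_cg * p
            if np.all(trial >= low):             # Step 2: feasible CG step
                lam, g = trial, g - a_cg * Ap
                if np.array_equal(lam <= low, active):
                    phi_new = np.where(active, 0.0, g)
                    p = phi_new - (phi_new @ Ap) / (p @ Ap) * p
                else:
                    p = None                     # active set changed: restart
            else:                                # Step 2: expansion step
                lam = np.maximum(lam - alpha * phi, low)
                g = A @ lam - b
                p = None
        else:                                    # Step 3: proportioning step
            Ab = A @ beta
            a_cg = (beta @ g) / (beta @ Ab)
            lam = lam - a_cg * beta
            g = g - a_cg * Ab
            p = None
    return lam

# Hypothetical test problem whose solution is (0.5, 0, 0.5).
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([1.0, -2.0, 1.0])
low = np.zeros(3)
x_from_zero = mprgp(A, b, low, np.zeros(3))
x_from_ones = mprgp(A, b, low, np.ones(3))
```

Starting from zero, a single proportioning step releases the first and the third component; starting from ones, two conjugate gradient steps with one restart reach the same point.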

6 Optimality

To show that Algorithm 5.1 with the inner loop implemented by Algorithm 5.2 is optimal for the solution of problem (or a class of problems) (4.3), we shall introduce new notation that complies with that used in [9]. We shall use

\[ \mathcal{T} = \{ (H,h) \in \mathbb{R}^2 : H \le 1,\ 2h \le H \text{ and } H/h \in \mathbb{N} \} \]

as the set of indices. Given a constant $C \ge 2$, we shall define a subset $\mathcal{T}_C$ of $\mathcal{T}$ by

\[ \mathcal{T}_C = \{ (H,h) \in \mathbb{R}^2 : H \le 1,\ 2h \le H,\ H/h \in \mathbb{N} \text{ and } H/h \le C \}. \]

For any $t \in \mathcal{T}$, we shall define

\[ A_t = PFP + \rho Q, \quad b_t = Pd, \quad C_t = G, \quad \ell_{t,I} = -\tilde{\lambda}_I \quad\text{and}\quad \ell_{t,E} = -\infty \]

by the vectors and matrices generated with the decomposition and discretization parameters $H$ and $h$, respectively, so that problem (4.3) is equivalent to the problem

\[ \text{minimize } \Theta_t(\lambda_t) \quad \text{s.t.} \quad C_t\lambda_t = 0 \ \text{ and } \ \lambda_t \ge \ell_t \tag{6.1} \]

with $\Theta_t(\lambda) = \frac{1}{2}\lambda^\top A_t\lambda - b_t^\top\lambda$. Using these definitions, Lemma 4.1 and $GG^\top = I$, we obtain

\[ \|C_t\| \le 1 \quad\text{and}\quad \|\ell_t^+\| = 0, \tag{6.2} \]

where for any vector $v$ with the entries $v_i$, $v^+$ denotes the vector with the entries $v_i^+ = \max\{v_i, 0\}$. Moreover, it follows by Theorem 4.2 that for any $C \ge 2$ there are constants $a_{\max}^C > a_{\min}^C > 0$ such that

\[ a_{\min}^C \le \alpha_{\min}(A_t) \le \alpha_{\max}(A_t) \le a_{\max}^C \tag{6.3} \]

for any $t \in \mathcal{T}_C$. Moreover, there are positive constants $C_1$ and $C_2$ such that $a_{\min}^C \ge C_1$ and $a_{\max}^C \le C_2 C$. In particular, it follows that the assumptions of Theorem 5 of [9] (i.e. the inequalities (6.2) and (6.3)) are satisfied for any set of indices $\mathcal{T}_C$, $C \ge 2$, and we have the following result:

Theorem 6.1. Let $C \ge 2$ denote a given constant, let $\{\lambda_t^k\}$, $\{\mu_t^k\}$ and $\{\rho_{t,k}\}$ be generated by Algorithm 5.1 (SMALBE) for (6.1) with $\|b_t\| \ge \eta_t > 0$, $\beta > 1$, $M > 0$, $\rho_{t,0} = \rho_0 > 0$, and $\mu_t^0 = 0$. Let $s \ge 0$ denote the smallest integer such that $\beta^s\rho_0 \ge M^2/a_{\min}$ and assume that Step 1 of Algorithm 5.1 is implemented by means of Algorithm 5.2 (MPRGP) with parameters $\Gamma > 0$ and $\alpha \in (0, (a_{\max} + \beta^s\rho_0)^{-1}]$, so that it generates the iterates $\lambda_t^{k,0}, \lambda_t^{k,1}, \dots, \lambda_t^{k,l} = \lambda_t^k$ for the solution of (6.1) starting from $\lambda_t^{k,0} = \lambda_t^{k-1}$ with $\lambda_t^{-1} = 0$, where $l = l_{t,k}$ is the first index satisfying

\[ \|g^P(\lambda_t^{k,l}, \mu_t^k, \rho_{t,k})\| \le M\|C_t\lambda_t^{k,l}\| \tag{6.4} \]

or

\[ \|g^P(\lambda_t^{k,l}, \mu_t^k, \rho_{t,k})\| \le \varepsilon\|b_t\|\min\{1, M^{-1}\}. \tag{6.5} \]

Then for any $t \in \mathcal{T}_C$ and problem (6.1), Algorithm 5.1 generates an approximate solution $\lambda_t^{k_t}$ which satisfies

\[ M^{-1}\|g^P(\lambda_t^{k_t}, \mu_t^{k_t}, \rho_{t,k_t})\| \le \|C_t\lambda_t^{k_t}\| \le \varepsilon\|b_t\| \tag{6.6} \]

at $O(1)$ matrix-vector multiplications by the Hessian of the augmented Lagrangian $L_t$ for (6.1), and $\rho_{t,k} \le \beta^s\rho_0$.
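The integer $s$ and the resulting bounds on the penalty parameter and on the MPRGP steplength in Theorem 6.1 are simple to evaluate once spectral bounds are available. The following sketch only illustrates the arithmetic; the numerical values of `a_min` and `a_max` are hypothetical, since the constants of (6.3) are not given explicitly in the text.

```python
def smallest_s(beta, rho0, M, a_min):
    """Smallest integer s >= 0 with beta**s * rho0 >= M**2 / a_min."""
    s = 0
    while beta**s * rho0 < M**2 / a_min:
        s += 1
    return s

# Hypothetical spectral bounds; beta, rho0 and M are chosen for illustration.
beta, rho0, M = 10.0, 10.0, 1.0
a_min, a_max = 0.002, 50.0

s = smallest_s(beta, rho0, M, a_min)     # 10 < 500, 100 < 500, 1000 >= 500
rho_bound = beta**s * rho0               # upper bound on all rho_{t,k}
alpha_max = 1.0 / (a_max + rho_bound)    # largest admissible MPRGP steplength
```

With these values the penalty parameter is increased at most twice, so that the Hessian of the augmented Lagrangian has a spectrum bounded independently of the iteration index.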

7 Numerical experiments

In this section we report some results of the numerical solution of the semicoercive model problem of Section 2 in order to illustrate the performance of the algorithm, in particular its numerical and parallel scalability. To this end, we have implemented Algorithm 5.1, with the solution of the auxiliary bound constrained problems by Algorithm 5.2, in C exploiting PETSc [3] to solve problem (4.3) with varying decomposition and discretization parameters.

Figure 3: Solution of model problems. Fig. 3a: Coercive problem. Fig. 3b: Semicoercive problem.

The experiments were run on the Lomond 18-processor Sun HPC 6500 Ultra SPARC-II based SMP system with 400 MHz, 18 GB of shared memory, 90 GB disc space, nominal peak performance 14.4 GFlops, 16 kB level 1 and 8 MB level 2 cache at EPCC Edinburgh, and on the Turing Cray T3E 1200 with 788 application processors, each 1.2 GFlops with 256 MB, 209 GB memory, 28 command processors, 2 TB disk space and a high-speed network with low latency at the University of Manchester. All the computations were carried out with the parameters

\[ M = 1, \quad \rho_0 = 10, \quad \Gamma = 1, \quad \lambda^0 = \max\{-\tilde{\lambda}, \tfrac{1}{2}Bf\}, \quad \varepsilon = 10^{-4}. \]

The results of the computations are summarized in Tables 1–3.

Table 1 illustrates the numerical scalability of Algorithm 5.1. In particular, for varying decomposition and discretization parameters, the upper row of each field of the table gives the corresponding primal dimension / dual dimension / time in seconds, while the number in the lower row gives the number of conjugate gradient iterations that were necessary for the solution of the problem to the given precision. We can see that the number of conjugate gradient iterations for a given ratio $H/h$ (in rows) varies very moderately.

Table 1: Performance for varying decomposition and discretization

H             1                  1/2                  1/4                  1/8
H/h \ procs   2                  8                    16                   16
128           33282/129/41.95    133128/1287/89.50    532512/6687/74.9     2130048/29823/421.5
              28                 59                   36                   47
64            8450/65/2.04       33800/647/4.14       135200/3359/7.10     540800/14975/53.48
              22                 47                   33                   43
32            2178/33/0.20       8712/327/0.50        34848/1695/1.48      139392/7551/11.66
              17                 33                   30                   37
16            578/17/0.04        2312/167/0.18        9248/863/0.68        36992/3839/4.30
              13                 29                   26                   32
8             162/9/0.03         648/87/0.10          2592/447/0.39        10365/1983/2.06
              10                 20                   23                   27
4             50/5/0.01          200/47/0.04          800/239/0.28         3200/1055/1.30
              7                  19                   22                   25
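The primal dimensions in Table 1 can be related to the decomposition by a short count. The formula below is an observation added here, not a statement of the text: it assumes that each of the two unit squares is cut into $(1/H)^2$ square subdomains and that every subdomain carries its own $(H/h + 1)^2$ grid nodes, which is how the torn (duplicated) interface nodes enter the count.

```python
def primal_dimension(n_side, ratio):
    """Primal unknowns for H = 1/n_side and H/h = ratio (assumed counting rule).

    Two unit squares, each cut into n_side * n_side subdomains, each with
    (ratio + 1)**2 nodes counted separately per subdomain.
    """
    return 2 * n_side**2 * (ratio + 1)**2

dims_row_128 = [primal_dimension(n, 128) for n in (1, 2, 4, 8)]
dim_largest = primal_dimension(8, 256)   # H = 1/8, h = 1/2048
```

Under this assumption the count for $H/h = 128$ matches the first row of Table 1, and the number of subdomains for $H = 1/8$ is $2 \cdot 8^2 = 128$.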

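Table 2 indicates that the algorithm presented enjoys high parallel scalability up to 32 processors, where the shortest time was recorded; for 64 and 128 processors the times increase again. The results for the largest problems are in Table 3.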

Table 2: Parallel scalability for 128 subdomains

processors   1         2         4       8       16      32      64      128
time [sec]   2907.13   1022.03   462.4   165.8   68.06   51.40   85.47   232.00
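Speedup and parallel efficiency follow directly from the times in Table 2. The short computation below uses only the tabulated values; the definitions speedup $= T_1/T_p$ and efficiency $=$ speedup$/p$ are the usual ones and are stated here as an assumption, since the text does not define them.

```python
procs = [1, 2, 4, 8, 16, 32, 64, 128]
times = [2907.13, 1022.03, 462.4, 165.8, 68.06, 51.40, 85.47, 232.00]

# Speedup relative to the single-processor run and efficiency per processor.
speedup = [times[0] / t for t in times]
efficiency = [s / p for s, p in zip(speedup, procs)]

# Processor count with the shortest wall-clock time.
best_procs = procs[times.index(min(times))]

for p, s, e in zip(procs, speedup, efficiency):
    print(f"{p:4d} processors: speedup {s:6.2f}, efficiency {e:5.2f}")
```

The shortest time is obtained with 32 processors, and the efficiency exceeds one for up to 16 processors, which would be consistent with cache or memory effects on the smaller per-processor workloads.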

Table 3: Highlights

h        H     prim. dim.   dual dim.   num. of subdom.   procs          out. iter.   cg iter.   time [sec]
1/1024   1/8   2130048      29823       128               32 of Turing   2            47         193.6
1/2048   1/8   8454272      59519       128               16 of Lomond   2            65         3959.0

8 Comments and conclusion

We have presented scalability results related to the application of augmented Lagrangians with the FETI based domain decomposition method using the natural coarse grid to the solution of variational inequalities by recently developed algorithms for the solution of special QP problems. In particular, we have shown that the solution of the discretized elliptic variational inequality to a prescribed precision may be found in a number of matrix-vector multiplications bounded independently of the discretization parameter, provided the ratio of the decomposition and the discretization parameters is kept bounded. Numerical experiments with the model variational inequality are in agreement with the theory and indicate that the algorithm may be efficient. The results remain valid also for the solution of frictionless 2D and 3D contact problems of elasticity and may be adapted to the solution of problems with Coulomb friction as indicated in [17]. The solution of auxiliary linear problems in the inner loop may be improved by standard preconditioners [29, 35, 38] and may be adapted to the mortar discretization [39]. We shall discuss these topics elsewhere.

References [1] P. Avery, G. Rebel, M. Lesoinne and C. Farhat, A umerically scalable dual- primal substructuring method for the solution of contact problems. I: the fric- tionless case. Comput. Methods Appl. Mech. Eng. 193, 23–26, (2004) 2403–2426. [2] O. Axelsson, Iterative Solution Methods. Cambridge University Press, Cam- bridge, 1994. [3] S. Balay , W. Gropp , L.C. McInnes and B. Smith, PETSc 2.0 Users Manual. Argonne National Laboratory, http://www.mcs.anl.gov/petsc/. [4] D. P. Bertsekas, Nonlinear Optimization. Athena Scientific, Belmont 1999. [5] J.H. Bramble, J. E. Pasciak, A. H. Schatz, The construction of preconditioners for elliptic problems by substructuring I, Mathematics of Comput. 47, 103-134 (1986). [6] A. R. Conn, N. I. M. Gould and Ph. L. Toint, A globally convergent aug- mented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on 28 (1991) 545–572. [7] Z. Dost´al,A proportioning based algorithm for bound constrained quadratic programming with the rate of convergence. Numerical Algorithms 34, 2–4(2003) 293–302. [8] Z. Dost´alInexact semi-monotonic augmented Lagrangians with optimal feasi- bility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numerical Analysis 43,1 (2005) 96–115.

[9] Z. Dostál, An optimal algorithm for bound and equality constrained quadratic programming problems with bounded spectrum, submitted.
[10] Z. Dostál, A. Friedlander and S. A. Santos, Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and equality constraints. SIAM Journal on Optimization 13, 4 (2003) 1120–1140.
[11] Z. Dostál, A. Friedlander and S. A. Santos, Solution of contact problems of elasticity by FETI domain decomposition. Contemporary Mathematics 218 (1998) 82–93.
[12] Z. Dostál, F. A. M. Gomes and S. A. Santos, Duality based domain decomposition with natural coarse space for variational inequalities. Journal of Computational and Applied Mathematics 126, 1–2 (2000) 397–415.
[13] Z. Dostál, F. A. M. Gomes and S. A. Santos, Solution of contact problems by FETI domain decomposition with natural coarse space projection. Computer Methods in Applied Mechanics and Engineering 190, 13–14 (2000) 1611–1627.
[14] Z. Dostál and D. Horák, Scalability and FETI based algorithm for large discretized variational inequalities. Mathematics and Computers in Simulation 61, 3–6 (2003) 347–357.
[15] Z. Dostál and D. Horák, Scalable FETI with Optimal Dual Penalty for a Variational Inequality. Numerical Linear Algebra with Applications 11, 5–6 (2004) 455–472.
[16] Z. Dostál and D. Horák, Scalable FETI with Optimal Dual Penalty for Semicoercive Variational Inequalities. Contemporary Mathematics 329 (2003) 79–88.
[17] Z. Dostál, D. Horák, R. Kučera, V. Vondrák, J. Haslinger, J. Dobiáš and S. Pták, FETI based algorithms for contact problems: scalability, large displacements and 3D Coulomb friction. Computer Methods in Applied Mechanics and Engineering 194, 2–5 (2005) 395–409.
[18] Z. Dostál, D. Horák and D. Stefanica, A scalable FETI-DP algorithm for coercive variational inequalities. IMACS Journal Applied Numerical Mathematics 54, 3–4 (2005) 378–390.
[19] Z. Dostál, D. Horák and D. Stefanica, A Scalable FETI–DP Algorithm with Non–penetration Mortar Conditions on Contact Interface, submitted.
[20] Z. Dostál, D. Horák and D. Stefanica, A Scalable FETI–DP Algorithm for Semicoercive Variational Inequalities, submitted.
[21] Z. Dostál and J. Schöberl, Minimizing quadratic functions over non-negative cone with the rate of convergence and finite termination. Comput. Optimiz. and Applications 30, 1 (2005) 23–44.
[22] D. Dureisseix and C. Farhat, A numerically scalable domain decomposition method for solution of frictionless contact problems. International Journal for Numerical Methods in Engineering 50, 12 (2001) 2643–2666.
[23] C. Farhat and M. Géradin, On the general solution by a direct method of a large scale singular system of linear equations: application to the analysis of floating structures. International Journal for Numerical Methods in Engineering 41, 4 (1998) 675–696.
[24] C. Farhat, M. Lesoinne, P. LeTallec, K. Pierson and D. Rixen, FETI-DP: A dual–primal unified FETI method. I: A faster alternative to the two–level FETI method. Int. J. Numer. Methods Eng. 50, 7 (2001) 1523–1544.
[25] C. Farhat, J. Mandel and F.-X. Roux, Optimal convergence properties of the FETI domain decomposition method. Computer Methods in Applied Mechanics and Engineering 115 (1994) 365–385.
[26] C. Farhat and F.-X. Roux, An unconventional domain decomposition method for an efficient parallel solution of large-scale finite element systems. SIAM Journal on Scientific Computing 13 (1992) 379–396.
[27] R. Glowinski, Variational Inequalities. Springer Verlag, Berlin, 1980.
[28] I. Hlaváček, J. Haslinger, J. Nečas and J. Lovíšek, Solution of Variational Inequalities in Mechanics. Springer Verlag, Berlin, 1988.
[29] A. Klawonn and O. B. Widlund, FETI and Neumann-Neumann iterative substructuring methods: connections and new results. Communications on Pure and Applied Mathematics LIV (2001) 57–90.
[30] R. Kornhuber, Adaptive monotone multigrid methods for nonlinear variational problems. Teubner-Verlag, Stuttgart, 1997.
[31] R. Kornhuber and R. Krause, Adaptive multigrid methods for Signorini's problem in linear elasticity. Computing and Visualization in Science 4, 1 (2001) 9–20.
[32] R. Krause and O. Sander, Fast solving of contact problems on complicated geometries. In R. Kornhuber et al. (eds.), Domain decomposition methods in science and engineering. Selected papers of the 15th international conference on domain decomposition, Berlin, Germany, July 21–25, 2003. Lecture Notes in Computational Science and Engineering 40, Springer, Berlin (2005) 495–502.
[33] R. H. Krause and B. I. Wohlmuth, A Dirichlet-Neumann type algorithm for contact problems with friction. Comput. Vis. Sci. 5, 3 (2002) 139–148.
[34] J. Mandel, Étude algébrique d'une méthode multigrille pour quelques problèmes de frontière libre (French). Comptes Rendus de l'Académie des Sciences, Sér. I 298 (1984) 469–472.
[35] J. Mandel and R. Tezaur, Convergence of substructuring method with Lagrange multipliers. Numerische Mathematik 73 (1996) 473–487.
[36] J. Schöberl, Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60, 4 (1998) 323–344.
[37] J. Schöberl, Efficient contact solvers based on domain decomposition techniques. Computers & Mathematics with Applications 42 (1998) 1217–1228.
[38] A. Toselli and O. B. Widlund, Domain Decomposition Methods – Algorithms and Theory. Springer Series on Computational Mathematics 34, Springer-Verlag, Berlin, 2005.
[39] B. I. Wohlmuth, Discretization Methods and Iterative Solvers Based on Domain Decomposition. Springer, Berlin, 2001.
[40] B. I. Wohlmuth and R. Krause, Monotone methods on nonmatching grids for nonlinear contact problems. SIAM J. Sci. Comput. 25, 1 (2003) 324–347.
