MATHEMATICS OF COMPUTATION
Volume 71, Number 239, Pages 1105–1135
S 0025-5718(01)01344-8
Article electronically published on November 14, 2001

CONVERGENCE RATE ANALYSIS OF AN ASYNCHRONOUS SPACE DECOMPOSITION METHOD FOR CONVEX MINIMIZATION

XUE-CHENG TAI AND PAUL TSENG

Abstract. We analyze the convergence rate of an asynchronous space decomposition method for constrained convex minimization in a reflexive Banach space. This method includes as special cases parallel domain decomposition methods and multigrid methods for solving elliptic partial differential equations. In particular, the method generalizes the additive Schwarz domain decomposition methods to allow for asynchronous updates. It also generalizes the BPX multigrid method to allow for use as solvers instead of as preconditioners, possibly with asynchronous updates, and is applicable to nonlinear problems. Applications to an overlapping domain decomposition for obstacle problems are also studied. The method of this work is also closely related to relaxation methods for nonlinear network flow. Accordingly, we specialize our convergence rate results to the above methods. The asynchronous method is implementable in a multiprocessor system, allowing for communication and computation delays among the processors.

1. Introduction

With the advent of multiprocessor computing systems, there has been much work in the design and analysis of iterative methods that can take advantage of the parallelism to solve large linear and nonlinear algebraic problems. In these methods, the computation per iteration is distributed over the processors and each processor communicates the result of its computation to the other processors. In some systems, the activities of the processors are highly synchronized (possibly via a central processor), while in other systems (typically those with many processors), the processors may experience communication or computation delays. The latter lack of synchronization makes the analysis of the methods much more difficult. To aid in this analysis, Chazan and Miranker [16] proposed a model of asynchronous computation that allows for communication and computation delays among processors, and they showed that the Jacobi method for solving a diagonally dominant

Received by the editor September 27, 2000.
2000 Mathematics Subject Classification. Primary 65J10, 65M55, 65Y05; Secondary 65K10, 65N55.
Key words and phrases. Convex minimization, space decomposition, asynchronous computation, convergence rate, domain decomposition, multigrid, obstacle problem.
The work of the first author was supported by the Norwegian Research Council Strategic Institute Program within Inverse Problems at RF-Rogaland Research, and by Project SEP-115837/431 at Mathematics Institute, University of Bergen. The work of the second author was supported by the National Science Foundation, Grant No. CCR-9311621.

© 2001 American Mathematical Society

License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use

system of linear equations converges under this model of asynchronous computation. Subsequently, there has been extensive study of asynchronous methods based on such a model (see [5, 6] and references therein). For these methods, convergence typically requires the algorithmic mapping to be either isotone or nonexpansive with respect to the L∞-norm or gradient-like. However, aside from the easy case where the algorithmic mapping is a contraction with respect to the L∞-norm, there have been few studies of the convergence rate of these methods. One such study was done in [55] for an asynchronous gradient-projection method. In this paper, we study the convergence rate of asynchronous Jacobi and Gauss-Seidel type methods for finite- or infinite-dimensional convex minimization of the form

(1)    min_{v_i ∈ K_i, i=1,…,m} F( Σ_{i=1}^m v_i ),

where each K_i is a nonempty closed convex set in a real reflexive Banach space V and F is a real-valued lower semicontinuous Gâteaux-differentiable function that is strongly convex on Σ_{i=1}^m K_i. Our interest in these methods stems from their close connection to relaxation methods for nonlinear network flow (see [4, 5, 56] and references therein) and to domain decomposition (DD) and multigrid (MG) methods for solving elliptic partial differential equations (see [7, 8, 9, 14, 18, 19, 33, 40, 45, 52, 53, 57] and references therein). For example, the additive and the multiplicative Schwarz methods may be viewed as Jacobi and Gauss-Seidel type methods applied to linear elliptic partial differential equations reformulated as (1) [9, 57]. DD and MG methods are also useful as preconditioners and it can be shown that such preconditioning improves the condition number of the discrete approximation [7, 8, 10, 9, 14, 33, 40, 45, 57]. In addition, DD and MG methods are well suited for parallel implementation, for which both synchronous and asynchronous versions have been proposed. Of the work on asynchronous methods [21, 22, 27, 38, 37, 39, 46], we especially mention the numerical tests by Frommer et al. [22] which showed that, through improved load balancing, asynchronous methods can be advantageous in solving even simple linear equations. Although these tests did not use the coarse mesh in their implementation of the DD method, it is plausible that the asynchronous method would still be advantageous when the coarse mesh is used. However, the convergence rate analysis of the above asynchronous methods seems still missing from the literature. In the case where the equation is linear (corresponding to F being quadratic and K_1,…,K_m being suitable subspaces of V) or almost linear, this issue has been much studied for synchronous methods (see [7, 8, 9, 14, 18, 19, 33, 40, 45, 52, 53, 57] and references therein) but little studied for asynchronous methods.
In the case where the equation is generally nonlinear (corresponding to K_1,…,K_m being suitable subspaces of V), there are some convergence studies for synchronous methods [15, 18, 44, 52, 53], and none for asynchronous methods. In the case where K_1,…,K_m are not all subspaces, there are various convergence studies for synchronous methods (see [1, 12, 23, 25, 28, 29, 30, 31, 34, 35, 36, 47, 50] and references therein) but, again, none for asynchronous methods. The contributions of the present work are two-fold.
• We consider an asynchronous version of Jacobi and Gauss-Seidel methods for solving (1), and we show that, under a Lipschitzian assumption on the


Gâteaux derivative F′ and a norm equivalence assumption on the product of K_1,…,K_m and their sum (see (5) and (6)), this asynchronous method attains a global linear rate of convergence with a convergence factor that can be explicitly estimated (see Theorem 1). This provides a unified convergence and convergence rate analysis for such asynchronous methods.
• We apply the above convergence result to (finite-dimensional) linearly constrained convex programs and, in particular, nonlinear network flow problems. This yields convergence rate results for some asynchronous network relaxation methods (see Section 6). Previous work studied the convergence of these methods, but no rate of convergence result was obtained. We also apply the above convergence result to certain nonlinear elliptic partial differential equations. This yields convergence rate results for some asynchronous parallel DD and MG methods for solving these equations and, in particular, the convergence factor is shown not to depend on the mesh parameters (see Section 7). When implementing multigrid methods on parallel processors, the nodal basis is often organized into different groups. The computation within each group can be sequential while the computation in different groups could be done in parallel. The asynchronous convergence rate analysis provides a convergence rate estimate when computation in different groups is not fully synchronized. Lastly, application to an overlapping DD method for obstacle problems is studied. We show that the method attains a linear rate of convergence with a convergence factor depending on the overlapping size, but not on the mesh size or the number of subdomains.
We note that alternative approaches such as Newton-type methods have also been applied to develop synchronous DD and MG methods for nonlinear partial differential equations without constraints [2, 3, 11, 26, 41, 58, 59].
However, these methods use the traditional DD and MG approach or use a special two-grid treatment. Our approach is different even for nonlinear partial differential equations without constraints.

2. Problem description and space decomposition

Let V be a real reflexive Banach space with norm ‖·‖ and let V′ be its dual space, i.e., the space of all real-valued linear continuous functionals on V. The value of f ∈ V′ at v ∈ V will be denoted by ⟨f, v⟩, i.e., ⟨·, ·⟩ is the duality pairing of V and V′. We wish to solve the minimization problem

(2)    min_{v∈K} F(v),

where K is a nonempty closed (in the strong topology) convex set in V and F : V → ℝ is a lower semicontinuous convex Gâteaux-differentiable function. We assume F is strongly convex on K or, equivalently, its Gâteaux derivative lim_{t→0} (F(v + tw) − F(v))/t, which is a well-defined linear continuous functional of w denoted by F′(v) (so F′ : V → V′), is strongly monotone on K, i.e.,

(3)    ⟨F′(u) − F′(v), u − v⟩ ≥ σ‖u − v‖²,   ∀ u, v ∈ K,

where σ > 0. It is known that, under the above assumptions, (2) has a unique solution ū [24, p. 23].


We assume that the constraint set K can be decomposed as the Minkowski sum

(4)    K = Σ_{i=1}^m K_i

for some nonempty closed convex sets K_i in V, i = 1,…,m. This means that, for any v ∈ K, we can find v_i ∈ K_i, not necessarily unique, satisfying Σ_{i=1}^m v_i = v and, conversely, for any v_i ∈ K_i, i = 1,…,m, we have Σ_{i=1}^m v_i ∈ K. Following Xu [57], we call (4) a space decomposition of K, with the term “space” used loosely here. Then we may reformulate (2) as the minimization problem (1), with (ū_1,…,ū_m) being a solution (not necessarily unique) of (1) if and only if ū_i ∈ K_i for i = 1,…,m and Σ_{i=1}^m ū_i = ū. As was noted earlier, the reformulated problem (1) is of interest because methods such as DD and MG methods may be viewed as Jacobi and Gauss-Seidel methods for its solution. The method we study will be an asynchronous version of these methods. The above reformulation was proposed in [9, 57] (for the case where F is quadratic and K = V) to give a unified analysis of DD and MG methods for linear elliptic partial differential equations. The general case was treated in [47, 50] (also see [48, 52] for the case of K = V).

For the above space decomposition, we will assume that there is a constant C_1 > 0 such that for any v_i ∈ K_i, i = 1,…,m, there exists ū_i ∈ K_i satisfying

(5)    ū = Σ_{i=1}^m ū_i   and   ( Σ_{i=1}^m ‖ū_i − v_i‖² )^{1/2} ≤ C_1 ‖ ū − Σ_{i=1}^m v_i ‖

(see [14, p. 95], [50, 52], [57, Lemma 7.1] for similar assumptions). We will also assume F′ has a weak Lipschitzian property in the sense that there is a constant C_2 > 0 such that

(6)    Σ_{i=1}^m ⟨F′(w_{ij} + u_{ij}) − F′(w_{ij}), v_i⟩ ≤ C_2 max_{i=1,…,m} ( Σ_{j=1}^m ‖u_{ij}‖² )^{1/2} ( Σ_{i=1}^m ‖v_i‖² )^{1/2},
       ∀ w_{ij} ∈ K, u_{ij} ∈ K̂_j, v_i ∈ K̂_i, i, j = 1,…,m,

where we define the set difference K̂_i = {u − v : u, v ∈ K_i} ⊂ V. The above assumption generalizes those in [50, 52, 53] for the case of K_i being a subspace, for

which K̂_i = K_i. Furthermore, we will paint each of the sets K_1,…,K_m one of c colors, with the colors numbered from 1 up to c, such that sets painted the same color k ∈ {1,…,c} are orthogonal in the sense that

(7)    ‖ Σ_{i∈I(k)} v_i ‖² = Σ_{i∈I(k)} ‖v_i‖²,   ∀ v_i ∈ K̂_i, i ∈ I(k),

(8)    ⟨ F′( u + Σ_{i∈I(k)} v_i ), Σ_{i∈I(k)} v_i ⟩ ≤ Σ_{i∈I(k)} ⟨F′(u + v_i), v_i⟩,
       ∀ u ∈ K, v_i ∈ K̂_i, i ∈ I(k),

where I(k) = {i ∈ {1,…,m} : K_i is painted color k} (see [14, §4.1], [53] for similar orthogonal decompositions in the case K_i is a subspace). Thus I(1),…,I(c) are disjoint subsets of {1,…,m} whose union is {1,…,m}, and I(k) comprises the indexes of the sets painted the color k. Although c = m is always a valid choice, in


some of the applications that we will consider, it is essential that c be independent of m. In the context of a network flow problem, each set K_i may correspond to a node of the network and sets are painted different colors if their corresponding nodes are joined by an arc. In the context of a partial differential equation defined on a domain Ω ⊂ ℝ^d, each set K_i may correspond to a subdomain of Ω and sets are painted different colors if their corresponding subdomains intersect (see Sections 6 and 7 for details).

Remark 1. It can be seen that condition (6) is implied by the following strengthened Cauchy-Schwarz inequality (also see [45, p. 155], [57] for the case of quadratic F and subspace K_i):

⟨F′(w_{ij} + u_{ij}) − F′(w_{ij}), v_i⟩ ≤ ε_{ij} ‖u_{ij}‖ ‖v_i‖,   ∀ w_{ij} ∈ K, u_{ij} ∈ K̂_j, v_i ∈ K̂_i,

with C_2 being the spectral radius of the matrix E = [ε_{ij}]_{i,j=1}^m, assumed to be symmetric.

Remark 2. For locally strongly convex problems, the constants σ, C_1, C_2 may depend on u, v, v_i, w_{ij}, u_{ij}. In this case, the subsequent analysis should be viewed as being local in nature, i.e., it is valid when the iterated solutions lie in a neighborhood of the true solution (see Section 7).
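The coloring requirement above is an ordinary graph coloring of a conflict graph: sets whose subdomains intersect, or whose network nodes are joined by an arc, must receive different colors. A minimal greedy sketch in Python; the path-like conflict pattern below is an illustrative assumption, not an example from the paper:

```python
# Greedy coloring of the sets K_1,...,K_m: indices joined by a conflict pair
# must get different colors (colors numbered 1 up to c, as in Section 2).

def greedy_coloring(m, conflicts):
    # conflicts: set of 0-based pairs (i, j) that must get different colors
    neighbors = {i: set() for i in range(m)}
    for i, j in conflicts:
        neighbors[i].add(j)
        neighbors[j].add(i)
    color = {}
    for i in range(m):  # smallest color unused by already-colored neighbors
        used = {color[j] for j in neighbors[i] if j in color}
        k = 1
        while k in used:
            k += 1
        color[i] = k
    return color

# Assumed overlap pattern: subdomain i overlaps only subdomain i+1 (a path),
# so two colors suffice regardless of m, i.e., c is independent of m.
col = greedy_coloring(5, {(0, 1), (1, 2), (2, 3), (3, 4)})
print(col)  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 1}
```

This illustrates the point made above: for a fixed overlap pattern the number of colors c stays bounded as the number of sets m grows.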

3. An asynchronous space decomposition method

Since F is lower semicontinuous and strongly convex, for each (u_1,…,u_m) ∈ K_1 × ··· × K_m and each i ∈ {1,…,m}, there exists a unique w_i ∈ K_i satisfying

(9)    F( Σ_{j≠i} u_j + w_i ) ≤ F( Σ_{j≠i} u_j + v_i ),   ∀ v_i ∈ K_i

(see [24, p. 23]). Let π_i(u_1,…,u_m) denote this w_i. Then (π_1,…,π_m) may be viewed as the algorithmic mapping associated with the block Jacobi method for solving (1). Consider an asynchronous version of the block Jacobi method, parameterized by a stepsize γ ∈ (0, 1], which for simplicity we assume to be fixed, that generates a sequence of iterates (u_1(t),…,u_m(t)), t = 0, 1,…, with (u_1(0),…,u_m(0)) ∈ K_1 × ··· × K_m given, according to the updating formula

(10)    u_i(t+1) = u_i(t) + γ s_i(t),   i = 1,…,m,

where we define

(11)    s_i(t) = w_i(t) − u_i(t) if t ∈ T^i, and s_i(t) = 0 otherwise,

(12)    w_i(t) = π_i( u_1(τ_1^i(t)),…, u_m(τ_m^i(t)) ),

and T^i is some subset of {0, 1,…} and each τ_j^i(t) is some nonnegative integer not exceeding t. Since each K_i is convex and γ ∈ (0, 1], an induction argument shows that (u_1(t),…,u_m(t)) ∈ K_1 × ··· × K_m for all t = 0, 1,…. We will assume that the iterates are updated in a partially asynchronous manner [5, Chap. 7], i.e., there exists an integer B ≥ 1 such that

(13)    {t, t+1,…, t+B−1} ∩ T^i ≠ ∅,   t = 0, 1,…, ∀ i,

(14)    0 ≤ t − τ_j^i(t) ≤ B − 1 and τ_i^i(t) = t,   ∀ t ∈ T^i, ∀ i, j.


We say that a color k ∈ {1,…,c} is active at time t if there exists an i ∈ I(k) such that t ∈ T^i. Recall that I(k) indexes those sets painted the color k. Denoting by c_t the total number of colors that are active at time t, we will also assume that

(15)    γ < min{ σ/(2C_2B), 1/c_t },   t = 0, 1,… .

Notice that γ does not depend on m nor on C_1. Although (15) may give a very conservative value of γ, this can be remedied by starting with a larger γ and decreasing γ whenever “sufficient progress” (defined in any reasonable way) is not made and (15) is not satisfied.

Remark 3. The above asynchronous method models a situation in which computation is distributed over m processors, with the ith processor being responsible for updating u_i and communicating the updated value to the other processors. T^i is the set of “times” at which u_i is updated by processor i (by applying π_i to its current copy of (u_1,…,u_m)); u_i(t) is the value of u_i known to processor i at time t; and τ_j^i(t) is the time at which the value of u_j used by processor i at time t is generated by processor j, so t − τ_j^i(t) is the communication delay from processor j to processor i at time t. Thus, the processors need not wait for each other when updating (u_i)_{i=1}^m, and the values used in the computation may be out-of-date.

Remark 4. The assumption that τ_i^i(t) = t can perhaps be removed through a more careful analysis, though this seems to be a reasonable assumption in practice. Intuitively, (13) says that each component u_i is updated at least once every B time units, and (14) says that the information used by processor i from processor j should not be out-of-date by more than B time units. This assumption of bounded communication and computation delay is needed for a convergence rate analysis.
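The method (10)–(14) can be sketched concretely for the finite-dimensional quadratic model F(v) = ½ vᵀAv − bᵀv with each K_i the ith coordinate axis, so that π_i is an exact one-dimensional minimization. Everything below the function signature is an illustrative assumption: the data A, b, the update sets T^i, the simulated delay pattern, and the stepsize γ (chosen larger than the conservative bound (15), which the text notes may be pessimistic):

```python
import numpy as np

def async_jacobi(A, b, gamma=0.2, B=3, iters=2000, seed=0):
    """Single-threaded simulation of partially asynchronous block Jacobi."""
    rng = np.random.default_rng(seed)
    m = len(b)
    u = np.zeros(m)                    # components u_i(t)
    history = [u.copy()]               # past iterates, to model stale reads
    for t in range(iters):
        s = np.zeros(m)
        for i in range(m):
            if t % B == i % B:         # T^i: processor i updates once per B steps
                # stale copies u_j(tau_j^i(t)), delay at most B-1 as in (14)
                delays = rng.integers(0, B, size=m)
                z = np.array([history[max(0, t - delays[j])][j]
                              for j in range(m)])
                z[i] = u[i]            # tau_i^i(t) = t: own value is current
                # w_i = pi_i(...): exact minimization of F along coordinate i
                w = (b[i] - A[i] @ z + A[i, i] * z[i]) / A[i, i]
                s[i] = w - u[i]        # s_i(t) = w_i(t) - u_i(t), as in (11)
        u = u + gamma * s              # damped update (10)
        history.append(u.copy())
    return u

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
u = async_jacobi(A, b)
print(np.allclose(u, np.linalg.solve(A, b), atol=1e-8))
```

The round-robin choice of T^i satisfies (13), and the random bounded delays satisfy (14); for this diagonally dominant A the iterates converge to the minimizer despite the stale reads.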

4. Convergence rate of the asynchronous method

In this section we prove that the iterates (u_1(t),…,u_m(t)), t = 0, 1,…, generated by the asynchronous method (10)–(15) attain a linear rate of convergence, with a factor that depends on σ, C_1, C_2, c and B, γ only (see Theorem 1). While parts of our proof use ideas from the analysis of asynchronous gradient-like methods [5, §7.5], [55], a number of new proof ideas are introduced to account for different problem assumptions and the different natures of the Jacobi and Gauss-Seidel algorithmic mappings. To simplify the notation in our analysis, define

(16)    u(t) = Σ_{j=1}^m u_j(t),   z_i(t) = Σ_{j=1}^m u_j(τ_j^i(t)),

for all i and t. If t ∈ T^i, then the definition (12) of w_i(t) and the fact that τ_i^i(t) = t and F is Gâteaux-differentiable imply that w_i(t) satisfies the optimality condition

(17)    ⟨F′( z_i(t) + w_i(t) − u_i(t) ), v_i − w_i(t)⟩ ≥ 0,   ∀ v_i ∈ K_i.

Our analysis will be based on estimates given in the following two key lemmas.

Lemma 1 (Descent estimate). Let A_1 and A_2 be defined by

(18)    A_2 = C_2²B²/σ,   A_1 = σ/4 − γ²A_2.


For t = 0, 1,…, we have

F(u(t+B)) ≤ F(u(t)) − γA_1 Σ_{j=1}^m Σ_{τ=t}^{t+B−1} ‖s_j(τ)‖² + γ³A_2 Σ_{j=1}^m Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖².

Proof. Fix any time t ∈ {0, 1,…}. Recall that c_t is the total number of colors active at time t and, without loss of generality, we assume that the first c_t colors are active. Then s_i(t) = 0 for all i ∈ I(k) and k > c_t, so by defining

e_k(t) = Σ_{i∈I(k)} s_i(t)

and using (16), (10) and the convexity of F, we have

F(u(t+1)) = F( u(t) + γ Σ_{i=1}^m s_i(t) )
          = F( u(t) + γ Σ_{k=1}^{c_t} Σ_{i∈I(k)} s_i(t) )
          = F( (1 − c_tγ)u(t) + Σ_{k=1}^{c_t} γ( u(t) + e_k(t) ) )
          ≤ (1 − c_tγ)F(u(t)) + γ Σ_{k=1}^{c_t} F( u(t) + e_k(t) )
(19)      = F(u(t)) + γ Σ_{k=1}^{c_t} ( F(u(t) + e_k(t)) − F(u(t)) ).

Since u(t) ∈ K and u(t) + e_k(t) ∈ K, the strong monotonicity of F′ on K given in (3) implies

(20)    F(u(t)) ≥ F(u(t) + e_k(t)) − ⟨F′(u(t) + e_k(t)), e_k(t)⟩ + (σ/2)‖e_k(t)‖².

Define

φ_j^i(t) = Σ_{k=1}^j u_k(τ_k^i(t)) + Σ_{k=j+1}^m u_k(t),   j = 0, 1,…, m.

Then φ_0^i(t) = u(t) and φ_m^i(t) = z_i(t) and

φ_j^i(t) − φ_{j−1}^i(t) = u_j(τ_j^i(t)) − u_j(t) ∈ K̂_j,   j = 1,…,m.

If t ∈ T^i, then setting v_i = u_i(t) in (17) and noting that s_i(t) = w_i(t) − u_i(t) (see (11)), we obtain that

0 ≤ −⟨F′(z_i(t) + s_i(t)), s_i(t)⟩
  = −⟨F′(z_i(t) + s_i(t)) − F′(u(t) + s_i(t)), s_i(t)⟩ − ⟨F′(u(t) + s_i(t)), s_i(t)⟩
  = − Σ_{j=1}^m ⟨F′(φ_j^i(t) + s_i(t)) − F′(φ_{j−1}^i(t) + s_i(t)), s_i(t)⟩ − ⟨F′(u(t) + s_i(t)), s_i(t)⟩.


If t ∉ T^i, then s_i(t) = 0 and the above inequality holds trivially. Combining the above inequality with (7), (8), and (20), we obtain that

(21)    Σ_{k=1}^{c_t} ( F(u(t) + e_k(t)) − F(u(t)) )
        ≤ Σ_{k=1}^{c_t} Σ_{i∈I(k)} ⟨F′(u(t) + s_i(t)), s_i(t)⟩ − (σ/2) Σ_{k=1}^{c_t} Σ_{i∈I(k)} ‖s_i(t)‖²
        = Σ_{i=1}^m ⟨F′(u(t) + s_i(t)), s_i(t)⟩ − (σ/2) Σ_{i=1}^m ‖s_i(t)‖²
        ≤ − Σ_{i=1}^m Σ_{j=1}^m ⟨F′(φ_j^i(t) + s_i(t)) − F′(φ_{j−1}^i(t) + s_i(t)), s_i(t)⟩ − (σ/2) Σ_{i=1}^m ‖s_i(t)‖².

Substituting (21) into (19) and using (6) yields

F(u(t+1)) ≤ F(u(t))

(22)    F(u(t+1)) ≤ F(u(t)) + γC_2 max_{i=1,…,m} ( Σ_{j=1}^m ‖u_j(τ_j^i(t)) − u_j(t)‖² )^{1/2} ( Σ_{i=1}^m ‖s_i(t)‖² )^{1/2} − γ(σ/2) Σ_{i=1}^m ‖s_i(t)‖².

Since t − B + 1 ≤ τ_j^i(t) ≤ t for all i and j, we also have from (10) and the triangle inequality that

(23)    ‖u_j(τ_j^i(t)) − u_j(t)‖² ≤ γ² ( Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖ )² ≤ γ²B Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖².

Combining (22) and (23) yields

F(u(t+1)) ≤ F(u(t)) + γ²C_2 √B ( Σ_{j=1}^m Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖² )^{1/2} ( Σ_{i=1}^m ‖s_i(t)‖² )^{1/2} − γ(σ/2) Σ_{i=1}^m ‖s_i(t)‖²

(24)    ≤ F(u(t)) + γ³ (C_2²B/σ) Σ_{j=1}^m Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖² − γ(σ/4) Σ_{i=1}^m ‖s_i(t)‖²,

where the second inequality uses the identity ab ≤ (a² + b²)/2 with a and b being the two square-root terms multiplied and divided, respectively, by B^{1/4}√(2γC_2/σ). Applying the above argument successively to t, t+1,…, t+B−1, we obtain

F(u(t+B)) − F(u(t)) ≤ −γ( σ/4 − γ²C_2²B²/σ ) Σ_{j=1}^m Σ_{τ=t}^{t+B−1} ‖s_j(τ)‖² + γ³ (C_2²B²/σ) Σ_{j=1}^m Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖².

This proves the lemma.


The next key lemma estimates the optimality gap F(u(t+B)) − F(ū), where ū is the unique solution of (2).

Lemma 2 (Optimality gap estimate). Let A_3 and A_4 be defined by

(25)    A_4 = C_2B²/2 + 8C_1²C_2²B/σ,   A_3 = 3C_2/2 + 6C_1²C_2²/σ + A_4.

For t = 0, 1,…, we have

F(u(t+B)) − F(ū) ≤ (1 − γ)( F(u(t)) − F(ū) ) + γA_3 Σ_{j=1}^m Σ_{τ=t}^{t+B−1} ‖s_j(τ)‖² + γ³A_4 Σ_{j=1}^m Σ_{τ=t−B+1}^{t−1} ‖s_j(τ)‖².

Proof. Fix any t ∈ {0, 1,…}. For each i ∈ {1,…,m}, let t^i denote the greatest element of T^i less than t + B. Then we have from (11) and (17) that

(26)    ⟨F′( z_i(t^i) + s_i(t^i) ), v_i − w_i(t^i)⟩ ≥ 0,   ∀ v_i ∈ K_i.

We also have from (10) and (16) that

u_i(t+B) = u_i(t^i) + γ s_i(t^i),

u(t+B) = Σ_{i=1}^m u_i(t^i + 1) = Σ_{i=1}^m u_i(t^i) + γ Σ_{i=1}^m s_i(t^i).

For notational simplicity, define

w(t) = Σ_{i=1}^m w_i(t^i),   û(t) = Σ_{i=1}^m u_i(t^i).

By assumption, there exists ū_i ∈ K_i, i = 1,…,m, such that (5) holds with v_i = w_i(t^i), i.e.,

(27)    ū = Σ_{i=1}^m ū_i   and   ( Σ_{i=1}^m ‖w_i(t^i) − ū_i‖² )^{1/2} ≤ C_1 ‖w(t) − ū‖.

Then (ū_1,…,ū_m) is a solution of the convex program (1) and, by F being Gâteaux-differentiable, it satisfies the optimality condition

(28)    Σ_{i=1}^m ⟨F′(ū), v_i − ū_i⟩ ≥ 0,   ∀ v_i ∈ K_i, i = 1,…,m.

Defining

φ_j^i(t) = Σ_{k=1}^j w_k(t^k) + Σ_{k=j+1}^m u_k(τ_k^i(t^i)),   j = 0, 1,…, m,

we have that φ_0^i(t) = z_i(t^i) and φ_m^i(t) = w(t) and

(29)    φ_j^i(t) − φ_{j−1}^i(t) = w_j(t^j) − u_j(τ_j^i(t^i)) ∈ K̂_j,   j = 1,…,m.


Setting v_i = ū_i in (26) and v_i = w_i(t^i) in (28), we obtain that

⟨F′(w(t)) − F′(ū), w(t) − ū⟩ ≤ ⟨F′(w(t)), w(t) − ū⟩

 ≤ Σ_{i=1}^m ⟨F′(w(t)) − F′(z_i(t^i) + s_i(t^i)), w_i(t^i) − ū_i⟩

 = Σ_{i=1}^m ⟨F′(w(t)) − F′(z_i(t^i)), w_i(t^i) − ū_i⟩ + Σ_{i=1}^m ⟨F′(z_i(t^i)) − F′(z_i(t^i) + s_i(t^i)), w_i(t^i) − ū_i⟩

 = Σ_{i=1}^m Σ_{j=1}^m ⟨F′(φ_j^i(t)) − F′(φ_{j−1}^i(t)), w_i(t^i) − ū_i⟩ + Σ_{i=1}^m ⟨F′(z_i(t^i)) − F′(z_i(t^i) + s_i(t^i)), w_i(t^i) − ū_i⟩

 ≤ C_2 max_{i=1,…,m} ( Σ_{j=1}^m ‖u_j(τ_j^i(t^i)) − w_j(t^j)‖² )^{1/2} ( Σ_{i=1}^m ‖w_i(t^i) − ū_i‖² )^{1/2} + C_2 ( Σ_{i=1}^m ‖s_i(t^i)‖² )^{1/2} ( Σ_{i=1}^m ‖w_i(t^i) − ū_i‖² )^{1/2}

 ≤ C_1C_2 ( Σ_{j=1}^m ( 4γ²B Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + 2‖s_j(t^j)‖² ) )^{1/2} ‖w(t) − ū‖

(30)   + C_1C_2 ( Σ_{i=1}^m ‖s_i(t^i)‖² )^{1/2} ‖w(t) − ū‖,

where the third inequality uses (6) and (29); the fourth inequality uses (27) and the fact that

‖u_j(τ_j^i(t^i)) − w_j(t^j)‖² = ‖u_j(τ_j^i(t^i)) − u_j(t^j) − s_j(t^j)‖²
 ≤ 2‖u_j(τ_j^i(t^i)) − u_j(t^j)‖² + 2‖s_j(t^j)‖²
 ≤ 2γ² ( Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖ )² + 2‖s_j(t^j)‖²
 ≤ 4γ²B Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + 2‖s_j(t^j)‖²

(see (10), (11), (13), (14)). Also, the strong monotonicity (3) of F′ on K implies

⟨F′(w(t)) − F′(ū), w(t) − ū⟩ ≥ σ‖w(t) − ū‖²,


which together with (30) yields

(31)    ‖w(t) − ū‖ ≤ (C_1C_2/σ) ( Σ_{j=1}^m ( 4γ²B Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + 2‖s_j(t^j)‖² ) )^{1/2} + (C_1C_2/σ) ( Σ_{i=1}^m ‖s_i(t^i)‖² )^{1/2}.

Next, since F′(w(t)) is a subgradient of F at w(t) [20, p. 23], we have

F(w(t)) − F(ū) ≤ ⟨F′(w(t)), w(t) − ū⟩,

so putting v_i = ū_i in (26) and adding it to the above inequality yields

F(w(t)) − F(ū) ≤ Σ_{i=1}^m ⟨F′(w(t)) − F′(z_i(t^i) + s_i(t^i)), w_i(t^i) − ū_i⟩

(32)  ≤ (C_1²C_2²/σ) ( ( Σ_{j=1}^m ( 4γ²B Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + 2‖s_j(t^j)‖² ) )^{1/2} + ( Σ_{i=1}^m ‖s_i(t^i)‖² )^{1/2} )²

      ≤ (2C_1²C_2²/σ) ( 4γ²B Σ_{j=1}^m Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + 3 Σ_{i=1}^m ‖s_i(t^i)‖² ),

where the second inequality uses (30) and (31) and the last inequality follows from the identity (a + b)² ≤ 2(a² + b²).

Next we estimate F(û(t)) − F(u(t)). Let t̄ = max_{i=1,…,m} t^i and, for each i ∈ {1,…,m} and τ ∈ {t,…,t̄}, define

(33)    ũ_i(τ) = u_i(min{τ, t^i}),   ũ(τ) = Σ_{i=1}^m ũ_i(τ).

Then, for each i ∈ {1,…,m} and τ ∈ {t,…,t̄ − 1}, either ũ_i(τ+1) = ũ_i(τ), so that

⟨F′(z_i(τ) + s_i(τ)), ũ_i(τ) − ũ_i(τ+1)⟩ = 0,

or ũ_i(τ+1) ≠ ũ_i(τ), so that τ ∈ T^i and τ < t^i, in which case ũ_i(τ) − ũ_i(τ+1) = −γ s_i(τ) = γ( u_i(τ) − w_i(τ) ) and (17) with v_i = u_i(τ) gives ⟨F′(z_i(τ) + s_i(τ)), ũ_i(τ) − ũ_i(τ+1)⟩ ≥ 0. In either case this inner product is nonnegative; hence, arguing as in the proof of Lemma 1,


we obtain that

(34)  F(ũ(τ+1)) − F(ũ(τ))
 ≤ −⟨F′(ũ(τ+1)), ũ(τ) − ũ(τ+1)⟩
 ≤ Σ_{i=1}^m ⟨F′(z_i(τ) + s_i(τ)) − F′(ũ(τ+1)), ũ_i(τ) − ũ_i(τ+1)⟩
 = Σ_{i=1}^m ⟨F′(z_i(τ)) − F′(ũ(τ+1)), ũ_i(τ) − ũ_i(τ+1)⟩ + Σ_{i=1}^m ⟨F′(z_i(τ) + s_i(τ)) − F′(z_i(τ)), ũ_i(τ) − ũ_i(τ+1)⟩
 = Σ_{i=1}^m Σ_{j=1}^m ⟨F′(φ_{j−1}^i(τ)) − F′(φ_j^i(τ)), ũ_i(τ) − ũ_i(τ+1)⟩ + Σ_{i=1}^m ⟨F′(z_i(τ) + s_i(τ)) − F′(z_i(τ)), ũ_i(τ) − ũ_i(τ+1)⟩
 ≤ C_2 max_{i=1,…,m} ( Σ_{j=1}^m ‖φ_{j−1}^i(τ) − φ_j^i(τ)‖² )^{1/2} ( Σ_{i=1}^m ‖ũ_i(τ) − ũ_i(τ+1)‖² )^{1/2} + C_2 ( max_{i=1,…,m} ‖s_i(τ)‖² )^{1/2} ( Σ_{i=1}^m ‖ũ_i(τ) − ũ_i(τ+1)‖² )^{1/2}
 ≤ γC_2 max_{i=1,…,m} ( Σ_{j=1}^m ‖ũ_j(τ+1) − u_j(τ_j^i(τ))‖² )^{1/2} ( Σ_{i=1}^m ‖s_i(τ)‖² )^{1/2} + γC_2 Σ_{i=1}^m ‖s_i(τ)‖²
 ≤ γC_2 ( γ²B Σ_{j=1}^m Σ_{ν=τ−B+1}^{τ+1} ‖s_j(ν)‖² )^{1/2} ( Σ_{i=1}^m ‖s_i(τ)‖² )^{1/2} + γC_2 Σ_{i=1}^m ‖s_i(τ)‖²
 ≤ γ³ (C_2B/2) Σ_{j=1}^m Σ_{ν=τ−B+1}^{τ+1} ‖s_j(ν)‖² + γ (3C_2/2) Σ_{i=1}^m ‖s_i(τ)‖²,

where the first inequality uses the subgradient property of F′(ũ(τ+1)) [20, p. 23]; the third inequality uses (6); the fourth and fifth inequalities use (33) and (10) and an inequality analogous to (23); the last inequality uses the identity ab ≤ (a² + b²)/2 with a and b being the two square-root terms. Summing the above inequality over τ = t, t+1,…, t̄ − 1 and observing that ũ(t̄) = û(t) and ũ(t) = u(t), we then have

F(û(t)) − F(u(t)) ≤ γ³ (C_2B/2) Σ_{j=1}^m Σ_{τ=t}^{t̄−1} Σ_{ν=τ−B+1}^{τ+1} ‖s_j(ν)‖² + γ (3C_2/2) Σ_{i=1}^m Σ_{τ=t}^{t̄−1} ‖s_i(τ)‖²

(35)  ≤ γ³ (C_2B²/2) Σ_{j=1}^m Σ_{τ=t−B+1}^{t+B−1} ‖s_j(τ)‖² + γ (3C_2/2) Σ_{i=1}^m Σ_{τ=t}^{t+B−1} ‖s_i(τ)‖².


Finally, using the convexity of F and γ ∈ (0, 1], we see from (11) and (32) and (35) that

F(u(t+B)) − F(ū)
 = F( Σ_{i=1}^m u_i(t+B) ) − F(ū)
 = F( Σ_{i=1}^m ( u_i(t^i) + γ( w_i(t^i) − u_i(t^i) ) ) ) − F(ū)
 = F( (1−γ)û(t) + γw(t) ) − F(ū)
 ≤ (1−γ)F(û(t)) + γF(w(t)) − F(ū)
 = (1−γ)( F(û(t)) − F(ū) ) + γ( F(w(t)) − F(ū) )
 ≤ (1−γ)( F(u(t)) − F(ū) ) + γ³ (C_2B²/2) Σ_{j=1}^m Σ_{τ=t−B+1}^{t+B−1} ‖s_j(τ)‖² + γ (3C_2/2) Σ_{i=1}^m Σ_{τ=t}^{t+B−1} ‖s_i(τ)‖²
   + γ³ (8C_1²C_2²B/σ) Σ_{j=1}^m Σ_{τ=t−B+1}^{t+B−2} ‖s_j(τ)‖² + γ (6C_1²C_2²/σ) Σ_{i=1}^m ‖s_i(t^i)‖².

Using γ ≤ 1 then proves the lemma.

We will now use Lemmas 1 and 2 to prove our convergence rate result. To simplify the notation, define

a_k = F(u(kB)) − F(ū),   b_k = Σ_{j=1}^m Σ_{τ=kB−B}^{kB−1} ‖s_j(τ)‖²,   k = 1, 2,… .

By Lemmas 1 and 2, we have

(36)    a_k ≤ a_{k−1} − γA_1 b_k + γ³A_2 b_{k−1},

(37)    a_k ≤ (1 − γ)a_{k−1} + γA_3 b_k + γ³A_4 b_{k−1},

where A_1, A_2, A_3, A_4 are given by (18) and (25). By (15), we have A_1 > 0. Choose γ sufficiently small so that

(38)    ϱ = max{ (1 + A_1/A_3)^{−1} ( 1 + (1−γ)A_1/A_3 + γ^{3/2}( A_2 + A_1A_4/A_3 ) ),  A_1^{−1}( γ^{1/2} + γ²A_2 ) } < 1.

Also, define a = max{a_1, γ^{3/2}b_1}/ϱ. We claim that

(39)    max{a_n, γ^{3/2}b_n} ≤ aϱ^n   for n = 1, 2,… .

We prove this by induction on n. Clearly (39) holds for n = 1 by our definition of a. Suppose (39) holds for n = k − 1, where k > 1. Multiplying (37) by A_1/A_3 and adding it to (36) gives

(1 + A_1/A_3) a_k ≤ ( 1 + (1−γ)A_1/A_3 ) a_{k−1} + γ^{3/2}( A_2 + A_1A_4/A_3 )( γ^{3/2} b_{k−1} ),


which together with the inductive hypothesis max{a_{k−1}, γ^{3/2}b_{k−1}} ≤ aϱ^{k−1} and (38) yields

a_k ≤ (1 + A_1/A_3)^{−1} ( 1 + (1−γ)A_1/A_3 + γ^{3/2}( A_2 + A_1A_4/A_3 ) ) aϱ^{k−1} ≤ aϱ^k.

Similarly, (36) and a_k ≥ 0 give

γ^{3/2}A_1 b_k ≤ γ^{1/2} a_{k−1} + γ²A_2 ( γ^{3/2} b_{k−1} ),

which together with max{a_{k−1}, γ^{3/2}b_{k−1}} ≤ aϱ^{k−1} and (38) yields

γ^{3/2} b_k ≤ A_1^{−1}( γ^{1/2} + γ²A_2 ) aϱ^{k−1} ≤ aϱ^k.

This shows that (39) holds for n = k, completing our induction proof. Thus, we have shown a linear rate of convergence (in the root sense) for both a_n and b_n, with a factor of ϱ. The latter implies u_i(t), t = 0, 1,…, is a Cauchy sequence for each i and hence it converges strongly. This is summarized in the theorem below.

Theorem 1. Consider the minimization problem (2) and the space decomposition (4) of Section 2 (see (3), (5)–(9)). Let (u_1(t),…,u_m(t)), t = 0, 1,…, be generated by the asynchronous space decomposition method of Section 3 (see (10)–(12) and (13), (14)) with stepsize γ satisfying (15), (38). Then, there exist a > 0 and ϱ ∈ (0, 1), depending on σ, C_1, C_2 and B, γ only, such that

F(u(nB)) − F(ū) ≤ aϱ^n,   n = 1, 2,… ,

where u(t) is given by (16) and ū denotes the unique solution of (2). Moreover, u(t) converges strongly to ū and, for each i ∈ {1,…,m}, u_i(t) converges strongly as t → ∞.
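The interplay of the constants can be checked numerically. The sketch below evaluates A_1, A_2 from (18), A_3, A_4 from (25), and the factor ϱ as reconstructed in (38); the problem constants σ, C_1, C_2, B are illustrative assumptions, not derived from any application, and the tiny γ reflects how conservative the admissible stepsize from (38) can be, as the remark after (15) anticipates:

```python
# Numerical check that rho in (38) drops below 1 for small enough gamma,
# using the constants (18) and (25). All problem constants are assumptions.

sigma, C1, C2, B = 1.0, 1.0, 1.0, 2

A2 = C2**2 * B**2 / sigma                               # (18)
gamma = 1e-6                                            # tiny; (38) is conservative
A1 = sigma / 4 - gamma**2 * A2                          # (18); positive, cf. (15)
A4 = C2 * B**2 / 2 + 8 * C1**2 * C2**2 * B / sigma      # (25)
A3 = 3 * C2 / 2 + 6 * C1**2 * C2**2 / sigma + A4        # (25)

r1 = (1 + (1 - gamma) * A1 / A3
      + gamma**1.5 * (A2 + A1 * A4 / A3)) / (1 + A1 / A3)
r2 = (gamma**0.5 + gamma**2 * A2) / A1
rho = max(r1, r2)                                       # (38)
print(0 < rho < 1)
```

For these values ϱ is barely below 1, which is consistent with the suggestion in Section 3 to start from a larger γ in practice and shrink it only when sufficient progress fails.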

5. Convergence rate of the synchronous sequential and parallel algorithms

It is readily seen that the following Jacobi version of the method is a special case of the asynchronous space decomposition method (10)–(12) with T^i = {0, 1,…} and τ_j^i(t) = t for all i, j, t (so B = 1 and c_t = c). Thus, Theorem 1 can be applied to establish its linear convergence and obtain an estimate of the factor ϱ under the assumptions of Section 2. Moreover, by observing that in this case the left-hand side of (23) is zero, so that Lemma 1 holds with A_2 = 0, the stepsize restriction (15) can be relaxed to γ ≤ 1/c_t.

Algorithm 1.
Step 1. Choose initial values u_i(0) ∈ K_i, i = 1,…,m, and stepsize γ = 1/c, where c is defined as in Section 2.
Step 2. For each t = 0, 1,…, find w_i(t) ∈ K_i in parallel for i = 1,…,m that satisfies

F( Σ_{j≠i} u_j(t) + w_i(t) ) ≤ F( Σ_{j≠i} u_j(t) + v_i ),   ∀ v_i ∈ K_i.

Step 3. Set

u_i(t+1) = u_i(t) + γ( w_i(t) − u_i(t) ),

and go to the next iteration.
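For the quadratic model F(v) = ½ vᵀAv − bᵀv on ℝ^m with each K_i the ith coordinate axis, Algorithm 1 reduces to a damped Jacobi iteration. A minimal sketch; the data A, b and the choice c = m (hence γ = 1/2 here) are illustrative assumptions:

```python
import numpy as np

def jacobi_step(A, b, u, gamma):
    # Step 2: w_i minimizes F along coordinate i with the others frozen at u
    w = (b - A @ u + np.diag(A) * u) / np.diag(A)
    # Step 3: damped update u_i(t+1) = u_i(t) + gamma (w_i(t) - u_i(t))
    return u + gamma * (w - u)

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u = np.zeros(2)
for t in range(500):
    u = jacobi_step(A, b, u, gamma=0.5)   # gamma = 1/c with c = m = 2 assumed
print(np.allclose(u, np.linalg.solve(A, b), atol=1e-10))
```

All coordinates are updated in parallel from the same iterate, which is exactly the B = 1, c_t = c specialization described above.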


The following Gauss-Seidel version of the method is also a special case of the asynchronous space decomposition method (10)–(12) with γ = 1, T^i = {i − 1 + km}_{k=0,1,…} and τ_j^i(t) = t for all i, j, t (so B = m and c_t = 1). Here Theorem 1 cannot be directly applied due to γ = 1 possibly violating (15). However, by observing that in this case the left-hand side of (23) is again zero, so that Lemma 1 holds with A_2 = 0, the proof of the theorem can be easily modified to establish linear convergence of this method under the assumptions of Section 2, with factor ϱ depending on m, σ, C_1, C_2 only. Moreover, by grouping sets of the same color into one set, we can ensure that m = c, where c is defined as in Section 2.

Algorithm 2.
Step 1. Choose initial values u_i(0) ∈ K_i, i = 1,…,m.
Step 2. For each t = 0, 1,…, find u_i(t+1) ∈ K_i sequentially for i = 1,…,m that satisfies

F( Σ_{j<i} u_j(t+1) + u_i(t+1) + Σ_{j>i} u_j(t) ) ≤ F( Σ_{j<i} u_j(t+1) + v_i + Σ_{j>i} u_j(t) ),   ∀ v_i ∈ K_i.

Step 3. Go to the next iteration.

The above two methods for solving (2) were studied in [47] (also see [48, 49, 50]), where convergence of the methods was proved under weaker assumptions. However, no rate of convergence result was given. In [52], a linear rate of convergence for the above two methods was proved for the unconstrained case of K = V. In the finite-dimensional case of K = V = ℝⁿ,
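Under the same illustrative quadratic model used for Algorithm 1, Algorithm 2 becomes exact coordinatewise Gauss-Seidel with γ = 1: each coordinate is minimized in turn using the freshest values. The data below are assumptions for illustration:

```python
import numpy as np

def gauss_seidel_sweep(A, b, u):
    # Step 2: minimize F over coordinate i with u_1..u_{i-1} already updated
    # and u_{i+1}..u_m still at their previous values
    for i in range(len(b)):
        u[i] = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
    return u

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u = np.zeros(2)
for t in range(100):
    u = gauss_seidel_sweep(A, b, u)
print(np.allclose(u, np.linalg.solve(A, b), atol=1e-12))
```

Viewed through (10)–(12), one full sweep corresponds to B = m time steps with a single active color at each step, matching the specialization described above.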

with 0 <σ0 <σ. However, σ would need to be known explicitly and both γ and % would depend on σ0.
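Under the same model assumptions as for the Jacobi sketch (a quadratic $F$ and box-constrained coordinate subspaces, both our choices), the Gauss–Seidel version with $\gamma = 1$ becomes a projected coordinate-descent sweep:

```python
import numpy as np

def gauss_seidel_space_decomposition(A, b, lo, hi, iters=200):
    """Sketch of Algorithm 2 for F(x) = 0.5 x^T A x - b^T x over the
    coordinate subspaces K_i = {t e_i : lo_i <= t <= hi_i}.  Components are
    updated sequentially with stepsize gamma = 1, so the freshly computed
    u_j(t+1), j < i, are used immediately within the sweep."""
    n = len(b)
    u = np.clip(np.zeros(n), lo, hi)
    for _ in range(iters):
        for i in range(n):
            # exact minimization over K_i given the current values of u_j
            t = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = np.clip(t, lo[i], hi[i])   # gamma = 1 update
    return u
```

The only difference from the Jacobi sketch is that each updated component is used immediately, which is exactly the $j<i$ versus $j>i$ split in Step 2 of Algorithm 2.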

6. Applications to convex programming

In this section we consider the Euclidean space $V = V' = \mathbb{R}^n$ […]


6.1. Primal applications. Consider the problem (2), where $F$ […] $> 0$. If $G$ is twice differentiable, this assumption essentially amounts to $G''$ having bounded eigenvalues and the inverse Hessian $(G'')^{-1}$ having bounded entries on $\mathbb{R}^n$. Let $\bar u$ denote the unique solution of (2) and let $A_i$ denote the $i$th row of $A$. We can decompose $K$ in the form (4) with […]

First we show that, for any $v_i \in K_i$, $i = 1, \dots, m$, there exists $\bar u_i \in K_i$ satisfying (5), where
$$ C_1 = \max_{\emptyset \ne I \subset \{1,\dots,m\}} \big\| D_I^{-1} B_I (B_I^T D_I^{-2} B_I)^{-1} \big\|, \tag{42} $$
with $D_I$ being the diagonal matrix with diagonal entries $\|A_i^T\|$, $i \in I$, and $B_I$ being a submatrix of $A_I = [A_i]_{i \in I}$ comprising linearly independent columns of


$A_I$ spanning the column space of $A_I$. To see this, notice that $\bar u = A^T \bar\lambda$ for some $\bar\lambda$ […]

where $r_I$ is the subvector of $r$ corresponding to the columns of $B_I$. This yields $\bar\xi_I = D_I^{-2} B_I (B_I^T D_I^{-2} B_I)^{-1} r_I$, and hence
$$
\text{optimal value of (43)} \le \|D_I \bar\xi_I\| = \big\| D_I^{-1} B_I (B_I^T D_I^{-2} B_I)^{-1} r_I \big\| \le C_1 \|r_I\| \le C_1 \|r\| = C_1 \big\| A^T(\bar\lambda - \mu) \big\| = C_1 \Big\| \bar u - \sum_{i=1}^m v_i \Big\|.
$$
Since $K_i$ and $K_j$ lie in orthogonal subspaces if $A_i A_j^T = 0$, we can color $K_1, \dots, K_m$ as discussed in subsection 6.1 and show that (6) holds with $C_2 = L\hat c$, where $L$ is the Lipschitz constant for $(G^*)'$ and $\hat c$ is the maximum number of rows $A_j$ that are not orthogonal to an arbitrary row $A_i$. If we replace the inequality $Ax \ge b$ in (40) by an equation $Ax = b$, the constraint […]


and $v_i = A_i^T \mu_i$ for some $\mu_i \in \mathbb{R}_+$, where $k \sim (i,j)$ means that column $k$ has a $1$ in row $i$ and a $-1$ in row $j$ or, equivalently, arc $k$ is directed from node $i$ to node $j$. Choose any spanning tree for the digraph and choose a node $\bar i$ such that $\bar\lambda_{\bar i} = \min_i \bar\lambda_i$. Let $\lambda_i = \mu_{\bar i} + (\bar\lambda_i - \bar\lambda_{\bar i})$ and $u_i = A_i^T \lambda_i$ for all nodes $i$ in the network. Since $\mu_{\bar i} \ge 0$ and $\bar\lambda_i \ge \bar\lambda_{\bar i}$, we have $\lambda_i \ge 0$ for all nodes $i$. Since each node $i$ can be reached from $\bar i$ via a simple path $P_i$ in the spanning tree, we also have
$$
|\lambda_i - \mu_i| = -\sum_{(k,l)\in P_i^-} (\lambda_k - \mu_k - \lambda_l + \mu_l) + \sum_{(k,l)\in P_i^+} (\lambda_k - \mu_k - \lambda_l + \mu_l)
= -\sum_{(k,l)\in P_i^-} (\bar\lambda_k - \bar\lambda_l - \mu_k + \mu_l) + \sum_{(k,l)\in P_i^+} (\bar\lambda_k - \bar\lambda_l - \mu_k + \mu_l)
\le \sum_{(k,l)\in P_i} |\bar\lambda_k - \bar\lambda_l - \mu_k + \mu_l|
\le \sqrt{h_i}\,\Big( \sum_{(k,l)\in P_i} |\bar\lambda_k - \bar\lambda_l - \mu_k + \mu_l|^2 \Big)^{1/2}
\le \sqrt{h_i}\,\Big( \sum_{\substack{j=1 \\ j\sim(k,l)}}^{n} |\bar\lambda_k - \bar\lambda_l - \mu_k + \mu_l|^2 \Big)^{1/2}
= \sqrt{h_i}\,\Big\| \bar u - \sum_{p=1}^m v_p \Big\|,
$$
where $P_i^+$ and $P_i^-$ denote the sets of forward and backward arcs in $P_i$, and $h_i$ denotes the number of arcs in $P_i$. Thus,
$$
\sum_{i=1}^m \|u_i - v_i\|^2 = \sum_{i=1}^m \big\| A_i^T(\lambda_i - \mu_i) \big\|^2 \le \sum_{i=1}^m \|A_i^T\|^2\,|\lambda_i - \mu_i|^2 \le \sum_{i=1}^m d_i h_i \Big\| \bar u - \sum_{p=1}^m v_p \Big\|^2,
$$
where $d_i$ is the number of arcs incident to node $i$. This shows that (5) holds with $C_1 = \big( \sum_{i=1}^m d_i h_i \big)^{1/2}$. Notice that $\sum_{i=1}^m d_i = 2n$ and $h_i$ is at most the diameter of the spanning tree. Since the choice of the spanning tree is arbitrary, we can choose it to minimize $C_1$. Also, $A_i A_j^T = 0$ if and only if nodes $i$ and $j$ are not joined by an arc, so $\hat c = \max\{d_1, \dots, d_m\}$ and the coloring of $K_1, \dots, K_m$ is equivalent to graph coloring on the digraph.

In the above case of a nonlinear network flow problem, if $G$ is also separable in the sense that $G(x) = \sum_{j=1}^n G_j(x_j)$ for all $x = (x_j)_{j=1}^n$, with $G_j : \mathbb{R} \to \mathbb{R}$, then $\pi_i(u_1, \dots, u_m)$ given by (9) depends on only those $u_k$ for which node $k$ is a neighbor of node $i$, and the asynchronous method (10)–(12) reduces to the asynchronous network relaxation method studied in [5, §7.2.3] and [56]. It is known that iterates generated by this method converge for any stepsize $\gamma \in (0,1)$, assuming $G^*$ is convex differentiable and (41) has a solution ($G$ need not be defined everywhere on […])


7. Applications to partial differential equations without constraints

In this section we consider the Sobolev space $V = H_0^1(\Omega) = \{ v \in H^1(\Omega) : v = 0 \text{ on } \partial\Omega \}$ with duality pairing $\langle u, v \rangle = \int_\Omega \big( \sum_{i=1}^d \partial_i u\,\partial_i v + uv \big)\,dx$ and norm $\|v\| = \|v\|_{H^1(\Omega)} = \langle v, v \rangle^{1/2}$, where $\Omega$ is an open, bounded, and connected subset of $\mathbb{R}^d$ with Lipschitz continuous boundary $\partial\Omega$, $H^1(\Omega) = \{ v \in L^2(\Omega) : \partial_i v \in L^2(\Omega),\ i = 1, \dots, d \}$, and $\partial_i v$ is the locally Lebesgue integrable real function defined on $\Omega$ satisfying $\int_\Omega \partial_i v\,\phi\,dx = -\int_\Omega v\,\frac{\partial\phi}{\partial x_i}\,dx$ for all $\phi \in C_0^\infty(\Omega) = \{ \phi \in C^\infty(\Omega) : \phi \text{ has compact support} \}$ [17, pp. 10–13]. We will consider two nonlinear elliptic partial differential equations formulated as the minimization problem (2) and, for each, we will consider the space decomposition (4) corresponding to, respectively, DD and MG methods, and we will develop corresponding estimates for $C_1$ in (5), for $C_2$ in (6), and for $c$ in (7)–(9). Throughout, we denote $|x| = \big( \sum_{i=1}^d |x_i|^2 \big)^{1/2}$ for any $x \in \mathbb{R}^d$ […]

$$ \max_{\substack{j = 1, \dots, d \\ k = 0, 1, \dots, d}} \Big\{ |a_i(x,p)|,\ \Big| \frac{\partial a_i}{\partial x_j}(x,p) \Big|,\ \Big| \frac{\partial a_i}{\partial p_k}(x,p) \Big| \Big\} \le L \tag{46} $$
for all $(x,p) \in \Omega \times \mathbb{R}^{d+1}$, […] $> 0$ a constant. Under these assumptions, the problem (2), which has the equation formulation
$$ \langle F'(u), v \rangle = 0, \qquad \forall v \in H_0^1(\Omega), \tag{48} $$
is well posed and has a unique solution $u \in H_0^1(\Omega)$ (see [18, p. 302] and [32]). Moreover, a straightforward calculation shows that
$$ \langle F'(u) - F'(v), u - v \rangle \ge \sigma \|u - v\|^2, \qquad \langle F'(u) - F'(v), w \rangle \le L(d+1)\,\|u - v\|\,\|w\|, \tag{49} $$
for all $u, v, w \in H^1(\Omega)$, so $F'$ is strongly monotone and Lipschitz continuous. The second partial differential equation corresponds to the minimization problem (2) with
$$ K = H_0^1(\Omega), \qquad F(v) = \int_\Omega \Big( \frac12 |\nabla v|^2 + \frac14 v^4 - fv \Big)\,dx, \tag{50} $$


where $f \in L^2(\Omega)$ and $d \in \{2, 3\}$. The corresponding equation is the simplified Ginzburg–Landau equation for superconductivity:
$$ -\Delta u + u^3 = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \tag{51} $$
where $u$ is the wave function, which is valid in the absence of an internal magnetic field [54], and $\Delta u = \sum_{i=1}^d \partial_i(\partial_i u)$ denotes the Laplacian of $u$. Notice that $F'$ has the form (44), with $a_0(x,p) = p_0^3$ and $a_i(x,p) = p_i$, $i = 1, \dots, d$, which does not satisfy (47). Nevertheless, a straightforward calculation shows
$$ \langle F'(u) - F'(v), u - v \rangle = \int_\Omega |\nabla u - \nabla v|^2 + (u^3 - v^3)(u - v)\,dx \ge \int_\Omega |\nabla u - \nabla v|^2\,dx = |u - v|^2_{1,\Omega}, $$
for all $u, v \in H^1(\Omega)$. Since the semi-norm $|\cdot|_{1,\Omega}$ is equivalent to the norm $\|\cdot\|$ on $H_0^1(\Omega)$ [17, p. 12], this shows $F'$ is strongly monotone on $H_0^1(\Omega)$.

In subsections 7.1 and 7.2 below, we will study asynchronous DD and MG methods for solving the above two equations (48) and (51). We will analyze the convergence rate of the methods by estimating the constants $C_1$, $C_2$ and $c$ for the corresponding space decomposition of the finite element approximation subspace and then applying Theorem 1. In particular, we will show that the above two equations can be solved in parallel with a convergence factor that is independent of the finite element mesh size $h$, i.e., the number of iterations needed to reach a desired solution accuracy is independent of $h$.

7.1. Domain decomposition methods.

7.1.1. Decomposition of the domain $\Omega$. In DD methods, the domain $\Omega$ is decomposed into the disjoint union of subdomains $\Omega_i$, $i = 1, \dots, m$, and their boundary, i.e., $\Omega \cup \partial\Omega = \bigcup_{i=1}^m (\Omega_i \cup \partial\Omega_i)$ and $\Omega_i \cap \Omega_j = \emptyset$ for $i \ne j$. This is illustrated in Figure 1, where a rectangular-shaped domain in $\mathbb{R}^2$ is decomposed into the disjoint union of $m = 25$ rectangular-shaped subdomains and their boundary. The subdomains, which are assumed to form a regular quasi-uniform division (see p. 124 and Eq. (3.2.28) of [17] for definitions) with a specified maximum diameter of $H$, are the finite elements of the coarse mesh.

To form the fine mesh for the finite element approximations, we further divide each $\Omega_i$ into finite elements of size (i.e., maximum diameter) $h$ such that all the fine-mesh elements together form a regular finite element division of $\Omega$. We denote this fine division by $\mathcal T_h$. For each $\Omega_i$, we consider an enlarged subdomain $\Omega_i^\delta = \bigcup \{ e \in \mathcal T_h : \operatorname{dist}(e, \Omega_i) \le \delta \}$, where $\operatorname{dist}(e, \Omega_i) = \min_{x \in e,\, y \in \Omega_i} |x - y|$. The union of the $\Omega_i^\delta$, $i = 1, \dots, m$, covers $\Omega$ with overlap proportional to $\delta$. Let $K_0 \subset H_0^1(\Omega)$ and $K \subset H_0^1(\Omega)$ denote the continuous, piecewise $r$th-order polynomial ($r \ge 1$) finite element subspaces, with zero trace on $\partial\Omega$, over the $H$-level and $h$-level subdivisions of $\Omega$, respectively. For $i = 1, \dots, m$, let $K_i$ denote the continuous, piecewise $r$th-order polynomial finite element subspace with zero trace on the boundary $\partial\Omega_i^\delta$, extended to have zero value outside $\Omega_i^\delta \cup \partial\Omega_i^\delta$. Then $K_i \subset K$ for $i = 0, 1, \dots, m$, and it can be shown that
$$ K = \sum_{i=0}^m K_i. $$
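Returning for a moment to the strong monotonicity computation for (50)–(51): the cubic term could be dropped there because of the pointwise identity $(u^3 - v^3)(u - v) = (u^2 + uv + v^2)(u - v)^2 \ge 0$. A quick numerical sanity check (our own illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=1000)
v = rng.normal(size=1000)
# (u^3 - v^3)(u - v) = (u^2 + uv + v^2)(u - v)^2 >= 0 pointwise, since
# u^2 + uv + v^2 = (u + v/2)^2 + 3 v^2 / 4 >= 0; hence the cubic term in
# <F'(u) - F'(v), u - v> only strengthens the monotonicity bound.
assert np.all((u**3 - v**3) * (u - v) >= 0)
```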

License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use ASYNCHRONOUS SPACE DECOMPOSITION 1125

a) The global fine mesh b) Color 0: the coarse mesh c) Color 1 subdomains

d) Color 2 subdomains e) Color 3 subdomains f) Color 4 subdomains

Figure 1. Decomposition of a rectangular-shaped domain in <2.

Thus the space decomposition (4), with summation index from 0 to $m$, holds. We assume that the overlapping subdomains are chosen such that each subdomain $\Omega_i^\delta$ and its corresponding finite element subspace $K_i$ can be painted one of $n_c$ colors (numbered from 1 to $n_c$), with subdomains painted the same color being pairwise nonintersecting. The coarse mesh and its corresponding subspace $K_0$ are painted the color 0. Moreover, $n_c$ should be independent of $h$. For a general domain $\Omega$, finding overlapping subdomains with this property is nontrivial. If $\Omega$ is the Cartesian product of intervals, we can easily find overlapping subdomains with $n_c = 2$ if $d = 1$, $n_c \le 4$ if $d = 2$, and $n_c \le 6$ if $d = 3$. For the example of Figure 1, $d = 2$ and $n_c = 4$. Then the total number of colors needed for (7) and (9) to hold is $c = n_c + 1$.

7.1.2. Estimating $C_1$ for equations (48) and (51). Let $\{\theta_i\}_{i=1}^m$ be a smooth partition of unity with respect to $\{\Omega_i^\delta\}_{i=1}^m$, i.e., $\theta_i \in C^\infty(\Omega)$ with $\theta_i \ge 0$, $\theta_i = 0$ outside of $\Omega_i^\delta$, and $\sum_{i=1}^m \theta_i = 1$. Let $I_h$ be the finite element interpolation mapping onto $K$ which uses the function values at the $h$-level nodes. For any $v \in K$, let $v_0$ be the projection in the $L^2$-norm of $v$ onto $K_0$, i.e., $v_0 \in K_0$ and $\int_\Omega (v_0 - v)\phi\,dx = 0$ for all $\phi \in K_0$, and let $v_i = I_h(\theta_i(v - v_0))$. Then it can be seen that $v_i \in K_i$ for $i = 0, 1, \dots, m$ and $v = \sum_{i=0}^m v_i$ [45, pp. 163–165], [57, p. 607]. Moreover, by further choosing $\theta_i$ so that $|\nabla\theta_i|$ has a certain boundedness property, it was recently shown in [53, Lem. 4.1] that, for any $s \ge 1$,
$$ \Big( \sum_{i=0}^m \|v_i\|^s \Big)^{1/s} \le C\, c^{\frac{s-1}{s}} \Big( 1 + \frac{H}{\delta} \Big)^{1/2} \|v\|, $$


where $C$ is a constant independent of the mesh parameters and $m$. Taking $s = 2$ and using the subspace nature of $K_i$, we obtain that, for any $v_i \in K_i$, $i = 0, 1, \dots, m$, there exists $\bar u_i \in K_i$ satisfying (5) (with summation index from 0 to $m$), where
$$ C_1 = C\sqrt{c}\,\Big( 1 + \frac{H}{\delta} \Big)^{1/2} $$
(also see [14, Thm. 16] and a work of Dryja and Widlund cited therein for related results). By choosing the overlapping size $\delta$ proportional to the coarse-mesh size $H$, the constant $C_1$ will be independent of the mesh parameters and the number of subdomains $m$.
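The $n_c = 4$ subdomain coloring of subsection 7.1.1 (Figure 1) can be written down explicitly. In the sketch below (function names are ours), a $p \times q$ array of overlapping rectangular subdomains is colored by the parities of the grid indices, so that any two subdomains whose enlargements $\Omega_i^\delta$ can intersect (the eight surrounding neighbors, assuming $\delta$ is small relative to $H$) receive distinct colors:

```python
import itertools

def color_grid_subdomains(p, q):
    """Assign one of nc = 4 colors (1..4) to subdomain (i, j) of a p x q
    rectangular partition, assuming the overlap delta is small enough that
    Omega_i^delta meets only the 8 surrounding subdomains.  The coarse mesh
    gets color 0 separately, so c = nc + 1 colors in total."""
    return {(i, j): 2 * (i % 2) + (j % 2) + 1
            for i, j in itertools.product(range(p), range(q))}

def coloring_is_valid(colors):
    """Check that touching subdomains (including diagonal neighbors) never
    share a color."""
    return all(colors[a] != colors[b]
               for a in colors for b in colors
               if a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1)
```

For the $5 \times 5$ example of Figure 1 this uses exactly four subdomain colors, plus color 0 for the coarse mesh.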

7.1.3. Estimating $C_2$ for equations (48) and (51). Consider $F$ given by (50), associated with the equation (51). By the mean value theorem, for any $u, v \in \mathbb{R}$ we have $|u^3 - v^3| = 3|\theta u + (1-\theta)v|^2\,|u - v| \le 3(|u| + |v|)^2\,|u - v| \le 6(|u|^2 + |v|^2)\,|u - v|$ for some $\theta \in [0,1]$. Thus, using the continuous embedding of $H^1(\Omega)$ in $L^p(\Omega)$ for $p < 2d/(d-2)$ and $d = 2, 3$ (see [17, p. 114], [24, p. 21]), we have for any $u, v \in H^1(\Omega)$ and any subdomain $\Omega'$ of $\Omega$ that $u, v \in L^4(\Omega)$ and
$$
\Big| \int_{\Omega'} (u^3 - v^3)\,w\,dx \Big| \le 6 \int_{\Omega'} \big( |u|^2\,|u-v|\,|w| + |v|^2\,|u-v|\,|w| \big)\,dx
\le 6 \Big( \Big( \int_{\Omega'} |u|^4\,dx \Big)^{1/2} + \Big( \int_{\Omega'} |v|^4\,dx \Big)^{1/2} \Big) \Big( \int_{\Omega'} |u-v|^2\,|w|^2\,dx \Big)^{1/2}
\le 6 \big( \|u\|^2_{L^4(\Omega')} + \|v\|^2_{L^4(\Omega')} \big)\,\|u-v\|_{L^4(\Omega')}\,\|w\|_{L^4(\Omega')}
\le C \big( \|u\|^2_{H^1(\Omega')} + \|v\|^2_{H^1(\Omega')} \big)\,\|u-v\|_{H^1(\Omega')}\,\|w\|_{H^1(\Omega')},
$$
where $C$ depends only on the embedding constant. Also, define $\Omega_0^\delta = \Omega$ for convenience, so that every $v \in K_i$ vanishes outside of $\Omega_i^\delta$ ($i = 0, 1, \dots, m$). Then, for $F$ given by (50), we have from the above inequality that, for $i, j = 0, 1, \dots, m$,
$$
a_{ij} = \langle F'(w_{ij} + u_{ij}) - F'(w_{ij}), v_i \rangle = \int_{\Omega_i^\delta \cap \Omega_j^\delta} \Big( (\nabla u_{ij})^T \nabla v_i + u_{ij} v_i + \big( (w_{ij} + u_{ij})^3 - w_{ij}^3 \big) v_i \Big)\,dx
\le \Big( 1 + C\,\|w_{ij} + u_{ij}\|^2_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)} + C\,\|w_{ij}\|^2_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)} \Big)\,\|u_{ij}\|_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)}\,\|v_i\|_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)}, \tag{52}
$$
for any $w_{ij} \in K$, $u_{ij} \in K_j$, $v_i \in K_i$, with $a_{ij} = 0$ whenever $\Omega_i^\delta \cap \Omega_j^\delta = \emptyset$. Assume there exists a constant $\alpha > 0$ such that $\|w_{ij} + u_{ij}\|^2_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)} + \|w_{ij}\|^2_{H^1(\Omega_i^\delta \cap \Omega_j^\delta)} \le \alpha$ for $i, j = 0, 1, \dots, m$. Also, for $i, j = 1, \dots, m$, let $\epsilon_{ij} = 0$ if $\Omega_i^\delta \cap \Omega_j^\delta = \emptyset$ and otherwise let $\epsilon_{ij} = 1$. Let $\hat c$ be the smallest integer such that every subdomain intersects at most $\hat c$ other subdomains. It is not difficult to show that the symmetric matrix $\mathcal E = [\epsilon_{ij}]_{i,j=1}^m$ has the following estimate of its spectral radius (see [53, Corollary 4.1] for a proof):
$$ \rho(\mathcal E) \le \max_{i=1,\dots,m} \sum_{j=1}^m \epsilon_{ij} \le \hat c. $$
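The bound $\rho(\mathcal E) \le \max_i \sum_j \epsilon_{ij} \le \hat c$ is easy to check numerically on a toy overlap pattern. Here we take $m = 5$ subdomains in a row, each meeting only its immediate neighbors (so $\hat c = 2$); the example is our own:

```python
import numpy as np

m = 5
# eps_ij = 1 iff Omega_i^delta and Omega_j^delta intersect (i != j); for a
# one-dimensional chain of subdomains this means |i - j| = 1, so c_hat = 2.
E = np.array([[1.0 if abs(i - j) == 1 else 0.0 for j in range(m)]
              for i in range(m)])
rho = max(abs(np.linalg.eigvalsh(E)))       # E is symmetric
# rho(E) <= max row sum <= c_hat, as in [53, Corollary 4.1]
assert rho <= E.sum(axis=1).max() <= 2
```

For this chain, $\rho(\mathcal E) = 2\cos(\pi/6) = \sqrt 3 \approx 1.73$, strictly below the row-sum bound $\hat c = 2$.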


This together with the estimate (52) yields
$$
\sum_{i=1}^m \sum_{j=1}^m a_{ij} \le (1+C\alpha) \sum_{i=1}^m \sum_{j=1}^m \epsilon_{ij}\,\|u_{ij}\|\,\|v_i\|
\le (1+C\alpha)\,\hat c \max_{i=1,\dots,m} \Big( \sum_{j=1}^m \|u_{ij}\|^2 \Big)^{1/2} \Big( \sum_{i=1}^m \|v_i\|^2 \Big)^{1/2}. \tag{53}
$$
Next, by using the fact that the $\Omega_j^\delta$, $j \in I(k)$, are disjoint subsets of $\Omega$ for $k = 1, \dots, c$, the estimate (52) yields
$$
\sum_{j=1}^m a_{0j} \le (1+C\alpha) \sum_{j=1}^m \|u_{0j}\|\,\|v_0\|_{H^1(\Omega_j^\delta)}
\le (1+C\alpha) \Big( \sum_{j=1}^m \|u_{0j}\|^2 \Big)^{1/2} \Big( \sum_{j=1}^m \|v_0\|^2_{H^1(\Omega_j^\delta)} \Big)^{1/2}
\le (1+C\alpha)\sqrt{c}\, \Big( \sum_{j=1}^m \|u_{0j}\|^2 \Big)^{1/2} \|v_0\|, \qquad \forall u_{0j} \in K_j,\ \forall v_0 \in K_0.
$$
Similar to the above argument, the estimate (52) gives
$$
\sum_{i=1}^m a_{i0} \le (1+C\alpha) \sum_{i=1}^m \|u_{i0}\|_{H^1(\Omega_i^\delta)}\,\|v_i\|
\le (1+C\alpha) \Big( \sum_{i=1}^m \|u_{i0}\|^2_{H^1(\Omega_i^\delta)} \Big)^{1/2} \Big( \sum_{i=1}^m \|v_i\|^2 \Big)^{1/2}, \qquad \forall u_{i0} \in K_0,\ \forall v_i \in K_i.
$$
We combine these estimates to obtain
$$
\sum_{i=0}^m \sum_{j=0}^m a_{ij} = a_{00} + \sum_{j=1}^m a_{0j} + \sum_{i=1}^m \sum_{j=1}^m a_{ij} + \sum_{i=1}^m a_{i0}
\le (1+C\alpha)\,\|u_{00}\|\,\|v_0\| + (1+C\alpha)\sqrt{c}\, \Big( \sum_{j=1}^m \|u_{0j}\|^2 \Big)^{1/2} \|v_0\| + (1+C\alpha)\,\hat c \max_{i=1,\dots,m} \Big( \sum_{j=1}^m \|u_{ij}\|^2 \Big)^{1/2} \Big( \sum_{i=1}^m \|v_i\|^2 \Big)^{1/2} + (1+C\alpha) \Big( \sum_{i=1}^m \|u_{i0}\|^2_{H^1(\Omega_i^\delta)} \Big)^{1/2} \Big( \sum_{i=1}^m \|v_i\|^2 \Big)^{1/2}
\le \tilde C_2 \max_{i=0,1,\dots,m} \Big( \sum_{j=0}^m \|u_{ij}\|^2 \Big)^{1/2} \Big( \sum_{i=0}^m \|v_i\|^2 \Big)^{1/2} + (1+C\alpha) \Big( \sum_{i=1}^m \|u_{i0}\|^2_{H^1(\Omega_i^\delta)} \Big)^{1/2} \Big( \sum_{i=1}^m \|v_i\|^2 \Big)^{1/2}, \tag{54}
$$
with $\tilde C_2$ a constant depending on $C\alpha$, $c$, $\hat c$ only. Compared with (6) (with $i, j = 0, 1, \dots, m$), we see that (54) has an extra term on the right-hand side. In the


appendix, we will show that this extra term does not affect the convergence rate result of Section 4. In particular, we will show that Lemmas 1 and 2 hold with $C_2 = \tilde C_2 + (1+C\alpha)\sqrt{c}$, so that Theorem 1 is still valid. For $F$ specified by (44) and associated with the equation (48), it can be similarly proved using (49) that (54) holds, possibly with different constants $C$ and $\alpha$. Upon applying the asynchronous method (10)–(12) with the above choice of space decomposition and under the assumptions (13)–(14), we obtain a parallel DD method for (48) and (51) whose convergence factor, according to Theorem 1 and the above estimates of $C_1$ and $C_2$, and assuming the overlapping size $\delta$ is proportional to the coarse mesh size $H$, is independent of the mesh parameters and the number of subdomains.

7.2. Multigrid methods.

7.2.1. Construction of the multigrid subspaces. In MG methods, $\Omega$ is divided into a finite element triangulation $\mathcal T$ by a successive refinement process. More precisely, we have $\mathcal T = \mathcal T_J$ for some $J > 1$, where $\mathcal T_k$, $k = 1, \dots, J$, is a nested sequence of regular quasi-uniform triangulations, i.e., $\mathcal T_k$ is a collection of simplices $\mathcal T_k = \{\tau_i^k\}$ of size (i.e., maximum diameter) $h_k$ such that $\Omega = \bigcup_i \tau_i^k$, for which the quasi-uniformity constants are independent of $k$ [17, Eq. (3.2.28)], and with each simplex in $\mathcal T_{k-1}$ being the union of simplices in $\mathcal T_k$. We further assume that there is a constant $r < 1$, independent of $k$, such that $h_k$ is proportional to $r^{2k}$.

For example, in the two-dimensional case of $d = 2$, if we construct $\mathcal T_k$ by connecting the midpoints of the edges of the triangles of $\mathcal T_{k-1}$, with $\mathcal T_1$ being the given coarsest initial triangulation, the resulting sequence of triangulations is quasi-uniform and $r = 1/\sqrt{2}$ (see Figure 2). Corresponding to each triangulation $\mathcal T_k$, we define the finite element subspace
$$ \mathcal M_k = \{ v \in H_0^1(\Omega) : v|_\tau \in \mathcal P_1(\tau),\ \forall \tau \in \mathcal T_k \}, $$

[Figure 2 shows, for levels $k = 1, 2, 3$, the mesh on the left and representative nodal basis functions on the right: level $k = 1$ has the single subspace $K_1^1$, level $k = 2$ the subspaces $K_i^2$, $i = 1, 2, \dots, 9$, and level $k = 3$ the subspaces $K_i^3$, $i = 1, 2, \dots, 49$.]

Figure 2. The multigrid mesh and basis functions.


where $\mathcal P_1(\tau)$ denotes the space of real-valued linear functions of $d$ real variables defined on $\tau$. We associate with $\mathcal M_k$ a nodal basis, denoted by $\{\phi_i^k\}_{i=1}^{n_k}$, that satisfies $\phi_i^k \in \mathcal M_k$ and
$$ \phi_i^k(x_j^k) = \delta_{ij}, \quad \text{the Kronecker delta}, $$
where $\{x_i^k\}_{i=1}^{n_k}$ is the set of all interior nodes of the triangulation $\mathcal T_k$. For each such nodal basis function, we define the one-dimensional subspace
$$ K_i^k = \operatorname{span}(\phi_i^k). $$
Then $(K_i^k)' = K_i^k$, and we have the space decomposition
$$ K = \sum_{k=1}^J \sum_{i=1}^{n_k} K_i^k, \qquad \text{with } K = \mathcal M_J. $$

On each level k, we color the nodes of Tk so that neighboring nodes are always of a different color. The number of colors needed for a regular mesh is a constant independent of the mesh parameters, which we denote by nc. Then the total number of colors needed for (7) and (9) (with summation indices adjusted accordingly) to hold is c = ncJ.
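For the uniform refinements of the unit square shown in Figure 2, the per-level subspace counts and the total color count $c = n_c J$ can be tabulated directly. This is a small sketch of our own; the value $n_c = 4$ is just an assumed per-level constant:

```python
def interior_nodes(k):
    """Interior nodes of a uniformly refined unit square at level k: the
    grid has (2^k + 1) points per side, of which 2^k - 1 are interior."""
    return (2 ** k - 1) ** 2

# matches the one-dimensional subspace counts K_i^k of Figure 2 for J = 3
counts = [interior_nodes(k) for k in (1, 2, 3)]   # 1, 9, 49

nc, J = 4, 3            # nc: per-level colors (mesh-dependent, assumed)
c = nc * J              # total colors needed for (7) and (9)
```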

7.2.2. Estimating $C_1$ for equations (48) and (51). Let $Q_k$ be the projection in the $L^2$-norm onto the subspace $\mathcal M_k$, which is well defined on $H_0^1(\Omega) \subset L^2(\Omega)$. For any $v \in K$, let $v^k = (Q_k - Q_{k-1})v$, $k = 1, \dots, J$. Then, by Prop. 8.6 in [57, p. 611], we have
$$ \sum_{k=1}^J \|v^k\|^2 \le C_0\,\|v\|^2, $$

where $C_0$ is a constant independent of the mesh parameters and $J$. By further decomposing each $v^k$ as
$$ v^k = \sum_{i=1}^{n_k} v_i^k \qquad \text{with } v_i^k = v^k(x_i^k)\,\phi_i^k, $$
the above estimate can be refined to show that
$$ v = \sum_{k=1}^J \sum_{i=1}^{n_k} v_i^k \qquad \text{and} \qquad \sum_{k=1}^J \sum_{i=1}^{n_k} \|v_i^k\|^2 \le C\,\|v\|^2, $$
where $C$ is a constant independent of the mesh parameters and the number of levels $J$ [53, §4.2]. Thus, for any $v_i^k \in K_i^k$, $i = 1, \dots, n_k$, $k = 1, \dots, J$, there exists $\bar u_i^k \in K_i^k$ satisfying (5) (with summation indices adjusted accordingly), where $C_1 = \sqrt{C}$.

7.2.3. Estimating $C_2$ for equations (48) and (51). Let $\Lambda_i^k$ denote the support set of the basis function $\phi_i^k$, for all $i$ and $k$. Also, recall the constant $r < 1$ defined earlier. Then, for any $k \le l$, the estimate
$$ \|u\|_{H^1(\Lambda_i^k \cap \Lambda_j^l)} \le C_0\, r^{d(l-k)}\, \|u\|, \qquad \forall u \in K_i^k, $$


can be shown, where $C_0$ is a constant independent of the mesh parameters and $J$ [53, Eq. (49)]. Then, for $F$ given by (50), we obtain as in (52) that
$$
\langle F'(w+u) - F'(w), v \rangle \le \Big( 1 + C\,\|w+u\|^2_{H^1(\Lambda_i^k \cap \Lambda_j^l)} + C\,\|w\|^2_{H^1(\Lambda_i^k \cap \Lambda_j^l)} \Big)\,\|u\|_{H^1(\Lambda_i^k \cap \Lambda_j^l)}\,\|v\|_{H^1(\Lambda_i^k \cap \Lambda_j^l)}
\le \Big( 1 + C\,\|w+u\|^2_{H^1(\Lambda_i^k \cap \Lambda_j^l)} + C\,\|w\|^2_{H^1(\Lambda_i^k \cap \Lambda_j^l)} \Big)\, C_0\, r^{d(l-k)}\,\|u\|\,\|v\|, \qquad \forall w \in K,\ u \in K_i^k,\ v \in K_j^l, \tag{55}
$$
where $C$ is the embedding constant. For any $i, j, k, l$, define
$$
\varepsilon_{i,j}^{k,l} = \begin{cases} r^{d|l-k|}, & \text{if } \operatorname{supp}(\phi_i^k) \cap \operatorname{supp}(\phi_j^l) \ne \emptyset; \\ 0, & \text{otherwise}. \end{cases}
$$
Assuming there exists a constant $\alpha > 0$ such that $\|w_{i,j}^{k,l} + u_{i,j}^{k,l}\|^2 + \|w_{i,j}^{k,l}\|^2 \le \alpha$ for all $i, j, k, l$, the estimate (55) then yields
$$
\sum_{k=1}^J \sum_{i=1}^{n_k} \sum_{l=1}^J \sum_{j=1}^{n_l} \langle F'(w_{i,j}^{k,l} + u_{i,j}^{k,l}) - F'(w_{i,j}^{k,l}),\, v_i^k \rangle \le C_0 (1+C\alpha) \sum_{i,k} \sum_{j,l} \varepsilon_{i,j}^{k,l}\,\|u_{i,j}^{k,l}\|\,\|v_i^k\|, \qquad \forall u_{i,j}^{k,l} \in K_j^l,\ \forall v_i^k \in K_i^k.
$$
With proper ordering of the indices, the matrix $\mathcal E = [\varepsilon_{i,j}^{k,l}]$ is symmetric and its spectral radius $\rho(\mathcal E)$ has been shown to be less than a constant independent of the mesh parameters and the number of levels [45, pp. 182–184]. Therefore,
$$
\sum_{k=1}^J \sum_{i=1}^{n_k} \sum_{l=1}^J \sum_{j=1}^{n_l} \langle F'(w_{i,j}^{k,l} + u_{i,j}^{k,l}) - F'(w_{i,j}^{k,l}),\, v_i^k \rangle \le C_0 (1+C\alpha)\,\rho(\mathcal E) \max_{i,k} \Big( \sum_{l=1}^J \sum_{j=1}^{n_l} \|u_{i,j}^{k,l}\|^2 \Big)^{1/2} \Big( \sum_{k=1}^J \sum_{i=1}^{n_k} \|v_i^k\|^2 \Big)^{1/2},
$$

which shows that (6) holds, with the constant $C_2 = C_0(1+C\alpha)\rho(\mathcal E)$ independent of the mesh parameters and the number of levels for the MG approximation. For $F$ specified by (44), it can be similarly proved that (6) holds with $C_2$ some constant independent of the mesh parameters and the number of levels. Upon applying the asynchronous method (10)–(12) with the above choice of space decomposition and under the assumptions (13)–(14), we obtain a parallel MG method for (48) and (51) whose convergence factor, according to the above estimates of $C_1$ and $C_2$ and Theorem 1, is independent of the mesh parameters. This method generalizes the BPX multigrid method proposed in [10], which was used as a preconditioner for linear elliptic problems. Here, the parallel MG method is used as a solver and is applicable not only to linear but also to nonlinear elliptic problems, and it further allows for asynchronous updates.


8. Applications to obstacle problems

In this section, we will apply our asynchronous algorithm to the obstacle problem
$$ -\Delta u \ge f \ \text{in } \Omega, \qquad u \ge \psi \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, $$
where $f \in L^2(\Omega)$ and $\psi \in H^2(\Omega)$ satisfies $\psi \le 0$ on $\partial\Omega$. This problem is equivalent to (2) with
$$ K = \{ v \in H_0^1(\Omega) : v \ge \psi \ \text{a.e. in } \Omega \}, \qquad F(v) = \int_\Omega \Big( \frac12 |\nabla v|^2 - fv \Big)\,dx. $$
We will use the overlapping domain decomposition without the coarse mesh. If a coarse mesh is added, it is not known how the coarse-mesh obstacle can be chosen to obtain an algorithm whose convergence factor is independent of the mesh parameters. Let $\Omega_i^\delta$ be defined as in subsection 7.1, $i = 1, \dots, m$. Let $\theta_i$ be the partition of unity with respect to $\Omega_i^\delta$ as described in subsection 7.1.2. Accordingly, let $\psi_i = I_h(\theta_i \psi)$ for all $i$ and let $\psi = \sum_{i=1}^m \psi_i$, where $I_h$ is the interpolation operator using the $h$-level nodal values. Thus, the obstacle function $\psi$ is replaced by its finite-element interpolation. Defining
$$ K_i = \{ v \in H_0^1(\Omega_i^\delta) : v \ge \psi_i,\ v|_e \in \mathcal P_1(e),\ \forall e \in \mathcal T_h \}, $$
and also assuming that $K$ has been replaced by its finite-element analog, it is easy to see that (4) holds. Suppose we apply the asynchronous algorithm with the initial values $u_i(0) \in K_i$ chosen to satisfy $\sum_{i=1}^m u_i(0) \le \bar u$. Then it can be shown (see, e.g., [1] and [51, §4]) that
$$ \sum_{i=1}^m w_i(t^i) \le \bar u, \qquad \forall t \ge 1, $$
where $w_i(\cdot)$ is defined by (12) and $t^i$ denotes the greatest element of $T^i$ less than $t + B$. Thus (27) remains valid if we assume there exist $\bar u_i \in K_i$ satisfying (5) only for those $v_i \in K_i$ satisfying $\sum_{i=1}^m v_i \le \bar u$. Since (27) is the only point in the proofs of Lemmas 1 and 2 where (5) is used, these lemmas and Theorem 1 would remain valid. Under the condition that $\sum_{i=1}^m v_i \le \bar u$, the constant $C_1$ in (5) can be estimated by choosing the partition of unity $\theta_i$ to satisfy $|\nabla\theta_i| \le C/\delta$ and setting
$$ \bar u_i = v_i + \theta_i \Big( \bar u - \sum_{j=1}^m v_j \Big). $$
Then it is straightforward to show that $\bar u_i \in K_i$ and
$$ \sum_{i=1}^m \bar u_i = \bar u, \qquad \sum_{i=1}^m \|\bar u_i - v_i\|^2 \le C(1+\delta^{-2}) \Big\| \bar u - \sum_{j=1}^m v_j \Big\|^2, $$
with the constant $C$ being independent of $\bar u$, $v_i$, the mesh size $h$, the overlapping size $\delta$, and the number of subdomains $m$ (cf. subsection 7.1.2). The above estimate shows that (5) holds with $C_1 = C\sqrt{1+1/\delta^2}$. Also, by dropping the coarse mesh and taking into account the difference between the above $F$ and the $F$ given by (50), we see from the proof of (53) that (6) holds with $C_2 = (1+C)\hat c$, with $C$ an embedding constant. Assuming that $B$ is bounded by a given constant and $\gamma$ is
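The construction $\bar u_i = v_i + \theta_i(\bar u - \sum_j v_j)$ can be checked at the discrete nodal level: since the $\theta_i \ge 0$ sum to one and $\sum_j v_j \le \bar u$, the pieces sum back to $\bar u$ and each stays above its obstacle. A small randomized sketch with entirely hypothetical nodal data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 4                                     # nodes, subdomains
theta = rng.random((m, n))
theta /= theta.sum(axis=0)                       # partition of unity
psi_i = rng.normal(size=(m, n)) - 2.0            # decomposed obstacles
v = psi_i + rng.random((m, n))                   # v_i >= psi_i nodewise
u_bar = v.sum(axis=0) + rng.random(n)            # ensures sum_i v_i <= u_bar
u_bar_i = v + theta * (u_bar - v.sum(axis=0))    # the construction above
assert np.allclose(u_bar_i.sum(axis=0), u_bar)   # pieces sum back to u_bar
assert np.all(u_bar_i >= psi_i)                  # each piece stays feasible
```

The nonnegative correction $\theta_i(\bar u - \sum_j v_j)$ is what makes feasibility $\bar u_i \ge \psi_i$ automatic once $v_i \ge \psi_i$.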


bounded by a constant less than 1, we then obtain from the definitions in Lemmas 1 and 2 that
$$ A_4 \le A_1 \le D, \qquad A_2 \le D, \qquad A_3 \le D(1+\delta^{-2}), \qquad \frac{1}{A_3} \le D, $$
for some $D > 0$ independent of $h$, $\delta$ and $m$. Then, the convergence factor given by (38) can be estimated by
$$
\varrho = \max\Big\{\, 1 - \frac{\gamma}{1+A_1 A_3^{-1}} + \gamma^{3/2}\,\frac{A_2 + A_1 A_4/A_3}{1+A_1 A_3^{-1}},\ \ \gamma^{1/2}\big( A_1 + \gamma^{3/2} A_2 A_3^{-1} \big) \Big\}
\le \max\Big\{\, 1 - \frac{\gamma}{1+D^2(1+\delta^{-2})} + \gamma^{3/2}(D+D^2),\ \ \gamma^{1/2}(D+D^2) \Big\}
\le \max\Big\{\, 1 - \frac{\gamma}{D_1(1+\delta^{-2})} + \gamma^{3/2} D_2,\ \ \gamma^{1/2} D_2 \Big\},
$$
where we let $D_1 = 1 + D^2$ and $D_2 = D + D^2$. Thus, for $\gamma$ […]

9. Appendix

In this appendix, we show that (54) can be used in place of (6) to prove Lemmas 1 and 2 for the DD method of subsection 7.1. Here, the indices $i$ and $j$ are understood to always range over $0, 1, \dots, m$, instead of $1, \dots, m$. First, we note that condition (6) is used only to show (22), (30) and (34) in the proofs. For (22), if we use condition (54) instead of (6), then (22) would have $\tilde C_2$ in place of $C_2$ and would have the following extra term on its right-hand side:
$$ E = (1+C\alpha)\gamma \Big( \sum_{i=1}^m \big\| u_0(\tau_0^i(t)) - u_0(t) \big\|^2_{H^1(\Omega_i^\delta)} \Big)^{1/2} \Big( \sum_{i=1}^m \|s_i(t)\|^2 \Big)^{1/2}. $$
Correspondingly, (24) would have $\tilde C_2$ in place of $C_2$ and would have the above extra term on its right-hand side. Using (23) and the fact that the $\Omega_i^\delta$, $i \in I(k)$, are disjoint subsets of $\Omega$ for $k = 1, \dots, c$, we see that
$$
E \le (1+C\alpha)\gamma\sqrt{B} \Big( \sum_{i=1}^m \sum_{\tau=t-B+1}^{t-1} \|s_0(\tau)\|^2_{H^1(\Omega_i^\delta)} \Big)^{1/2} \Big( \sum_{i=1}^m \|s_i(t)\|^2 \Big)^{1/2}
\le (1+C\alpha)\gamma\sqrt{Bc} \Big( \sum_{\tau=t-B+1}^{t-1} \|s_0(\tau)\|^2 \Big)^{1/2} \Big( \sum_{i=1}^m \|s_i(t)\|^2 \Big)^{1/2},
$$
which implies that (24) holds with $C_2 = \tilde C_2 + (1+C\alpha)\sqrt{c}$. The remainder of the proof of Lemma 1 then proceeds as before. For (30) and (34), a similar argument can be applied to show that Lemma 2 holds with the above choice of $C_2$.


References

1. L. Badea and J. Wang, An additive Schwarz method for variational inequalities, Math. Comp. 69 (2000), 1341–1354. MR 2001a:65075
2. R. Bank, Analysis of a multilevel iterative method for nonlinear finite element equations, Math. Comp. 39 (1982), 453–465. MR 83j:65105
3. R. Bank and T. F. Chan, PLTMGC: a multigrid continuation program for parametrized nonlinear systems, SIAM J. Sci. Statist. Comput. 7 (1986), 540–559. MR 87d:65125
4. D. P. Bertsekas, D. Castañon, J. Eckstein, and S. A. Zenios, Parallel computing in network optimization, Handbooks in Operations Research and Management Science 7: Network Models (Amsterdam) (M. O. Ball, T. L. Magnanti, C. L. Monma, and G. L. Nemhauser, eds.), Elsevier Scientific, 1995, pp. 331–399. CMP 97:04
5. D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: Numerical methods, Prentice-Hall, Englewood Cliffs, 1989.
6. D. P. Bertsekas and J. N. Tsitsiklis, Some aspects of parallel and distributed iterative algorithms—a survey, Automatica 27 (1991), 3–21. MR 91m:65137
7. J. Bramble and X. Zhang, The analysis of multigrid methods, Handbook of Numerical Analysis, Vol. VII, 173–415, North-Holland, Amsterdam, 2000. CMP 2001:08
8. J. H. Bramble, J. E. Pasciak, and A. H. Schatz, The construction of preconditioners for elliptic problems by substructuring. IV, Math. Comp. 53 (1989), 1–24. MR 89m:65098
9. J. H. Bramble, J. E. Pasciak, J. Wang, and J. Xu, Convergence estimates for product iterative methods with applications to domain decomposition, Math. Comp. 57 (1991), 1–21. MR 92d:65094
10. J. H. Bramble, J. E. Pasciak, and J. Xu, Parallel multilevel preconditioners, Math. Comp. 55 (1990), 1–22. MR 90k:65170
11. A. Brandt, Multilevel adaptive solutions to boundary value problems, Math. Comp. 31 (1977), 333–390. MR 55:4714
12. A. Brandt and C. W. Cryer, Multigrid algorithms for the solution of linear complementarity problems arising from free boundary problems, SIAM J. Sci. Comput. 4 (1983), 655–684. MR 85h:65261
13. J. V. Burke and P. Tseng, A unified analysis of Hoffman's bound via Fenchel duality, SIAM J. Optim. 6 (1996), 265–282. MR 97c:90120
14. T. F. Chan and T. P. Mathew, Domain decomposition algorithms, Acta Numerica (1994), 61–143. MR 95f:65214
15. T. F. Chan and I. Sharapov, Subspace correction multilevel methods for elliptic eigenvalue problems, Numer. Linear Algebra Appl. 1 (1993), 1–7.
16. D. Chazan and W. L. Miranker, Chaotic relaxation, Linear Algebra Appl. 2 (1969), 199–222. MR 40:5114
17. P. G. Ciarlet, The finite element method for elliptic problems, North-Holland, Amsterdam, 1978. MR 58:25001
18. M. Dryja and W. Hackbusch, On the nonlinear domain decomposition method, BIT 37 (1997), 296–311. MR 97m:65100
19. M. Dryja and O. B. Widlund, Towards a unified theory of domain decomposition algorithms for elliptic problems, Third international symposium on domain decomposition methods for partial differential equations (Philadelphia, PA) (T. Chan, R. Glowinski, J. Periaux, and O. B. Widlund, eds.), SIAM, 1990. MR 91m:65294
20. I. Ekeland and R. Temam, Convex analysis and variational problems, North-Holland, Amsterdam, 1976. MR 57:3931b
21. B. H. Elton, An asynchronous domain decomposition method for the two-dimensional Poisson equation, Parallel processing for scientific computing, Vol. I, II (Norfolk, VA, 1993), SIAM, Philadelphia, PA, 1993, pp. 691–695. CMP 93:10
22. A. Frommer, H. Schwandt, and D. B. Szyld, Asynchronous weighted additive Schwarz methods, Electronic Trans. Numer. Anal. 5 (1997), 48–61. MR 98c:65046
23. E. Gelman and J. Mandel, On multilevel iterative methods for optimization problems, Math. Programming 48 (1990), 1–17. MR 91b:90180
24. R. Glowinski and P. Le Tallec, Augmented Lagrangian and operator-splitting methods in nonlinear mechanics, SIAM, Philadelphia, 1989. MR 91f:73038
25. W. Hackbusch and H. D. Mittelmann, On multigrid methods for variational inequalities, Numer. Math. 42 (1983), 65–76. MR 84k:49010


26. W. Hackbusch and A. Reusken, Analysis of a damped nonlinear multilevel method, Numer. Math. 55 (1989), 225–246. MR 90f:65194
27. L. Hart and S. McCormick, Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors: basic ideas, Parallel Comput. 12 (1989), no. 2, 131–144. MR 90m:65250
28. R. H. Hoppe, Multigrid solutions to the elastic plastic torsion problem in multiply connected domains, Int. J. Numer. Methods Engin. 26 (1988), 631–646. MR 89a:73001
29. R. H. W. Hoppe, Two-sided approximations for unilateral variational inequalities by multigrid methods, Optimization 16 (1987), 867–881. MR 88m:49004
30. R. H. W. Hoppe and R. Kornhuber, Adaptive multilevel methods for obstacle problems, SIAM J. Numer. Anal. 31 (1994), 301–323. MR 95c:65181
31. R. Kornhuber, Monotone multigrid iterations for elliptic variational inequalities, II, Numer. Math. 72 (1996), 481–499. MR 96k:65081
32. O. A. Ladyzhenskaya and N. N. Uraltseva, Linear and quasilinear elliptic equations, Academic Press, 1968. MR 39:5941
33. P. Le Tallec, Domain decomposition methods in computational mechanics, Comput. Mechan. Adv. 1 (1994), 121–220. MR 95b:65147
34. Z.-Q. Luo and P. Tseng, On the linear convergence of descent methods for convex essentially smooth minimization, SIAM J. Control Optim. 30 (1992), 408–425. MR 93a:90064
35. Z.-Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), 157–178. MR 95a:90138
36. J. Mandel, A multilevel iterative method for symmetric, positive definite linear complementarity problems, Appl. Math. Optim. 11 (1984), 77–95. MR 85b:90082
37. S. McCormick and D. Quinlan, Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors: performance results, Parallel Comput. 12 (1989), no. 2, 145–156. MR 90m:65251
38. S. F. McCormick, Multilevel adaptive methods for partial differential equations, Frontiers in Applied Mathematics, vol. 6, SIAM, Philadelphia, 1989. MR 91h:65200
39. J.-C. Miellou, L. Giraud, A. Laouar, and P. Spiteri, Subdomain decomposition methods with overlapping and asynchronous iterations, Progress in partial differential equations: the Metz surveys, Longman Sci. Tech., Harlow, 1991, pp. 166–183. MR 94h:65125
40. A. Quarteroni and A. Valli, Domain decomposition methods for partial differential equations, Oxford Science Publications, Oxford, 1999.
41. R. Rannacher, On the convergence of the Newton-Raphson method for strongly nonlinear finite element equations, Nonlinear computational mechanics (P. Wriggers and W. Wagner, eds.), Springer-Verlag, 1991.
42. R. T. Rockafellar, Convex analysis, Princeton University Press, Princeton, 1970. MR 43:445
43. R. T. Rockafellar, Network flows and monotropic optimization, Wiley-Interscience, New York, 1984. MR 89d:90156
44. I. Sharapov, Multilevel subspace correction for large scale optimization problems, East-West J. Numer. Math. 9 (2000), 233–252.
45. B. F. Smith, P. E. Bjørstad, and W. D. Gropp, Domain decomposition: Parallel multilevel algorithms for elliptic partial differential equations, Cambridge University Press, Cambridge, 1996. MR 98g:65003
46. A. K. Stagg and G. F. Carey, Asynchronous nonlinear iteration and domain decomposition, Parallel processing for scientific computing (Houston, TX, 1991), SIAM, Philadelphia, PA, 1992, pp. 281–286. CMP 93:05
47. X.-C. Tai, Parallel function decomposition and space decomposition methods with applications to optimisation, splitting and domain decomposition, Preprint No. 231–1992, Institut für Mathematik, Technische Universität Graz (1992). See also http://www.mi.uib.no/~tai.
48. X.-C. Tai, Parallel function and space decomposition methods, Finite element methods, fifty years of the Courant element, Lecture notes in pure and applied mathematics (P. Neittaanmäki, ed.), vol. 164, Marcel Dekker, Inc., 1994, pp. 421–432. See also http://www.mi.uib.no/~tai. MR 95d:65008
49. X.-C. Tai, Domain decomposition for linear and nonlinear elliptic problems via function or space decomposition, Domain decomposition methods in scientific and engineering computing (Proc. of the 7th international conference on domain decomposition, Penn. State University, 1993) (D. Keyes and J. Xu, eds.), Contemp. Math. 180, American Mathematical Society, 1994, pp. 355–360. MR 95k:65121
50. X.-C. Tai, Parallel function decomposition and space decomposition methods: Part II. Space decomposition, Beijing Mathematics 1, part 2 (1995), 135–152. See also http://www.mi.uib.no/~tai.
51. X.-C. Tai, Convergence rate analysis of domain decomposition methods for obstacle problems, East-West J. Numer. Math. 9 (2000), 233–253.
52. X.-C. Tai and M. Espedal, Rate of convergence of some space decomposition method for linear and nonlinear elliptic problems, SIAM J. Numer. Anal. 35 (1998), 1558–1570. MR 99k:65101
53. X.-C. Tai and J.-C. Xu, Global and uniform convergence of subspace correction methods for some convex optimization problems, Math. Comp., posted on May 11, 2001, PII S0025-5718(01)01311-4 (to appear in print).
54. M. Tinkham, Introduction to superconductivity, McGraw Hill, New York, 1975.
55. P. Tseng, On the rate of convergence of a partially asynchronous gradient projection algorithm, SIAM J. Optim. 1 (1991), 603–619. MR 92k:90096
56. P. Tseng, D. P. Bertsekas, and J. N. Tsitsiklis, Partially asynchronous, parallel algorithms for network flow and other problems, SIAM J. Control Optim. 28 (1990), 678–710. MR 91e:65075
57. J.-C. Xu, Iterative methods by space decomposition and subspace correction, SIAM Rev. 34 (1992), 581–613. MR 93k:65029
58. J.-C. Xu, A novel two-grid method for semilinear elliptic equations, SIAM J. Sci. Comput. 15 (1994), 231–237. MR 94m:65178
59. J.-C. Xu, Two-grid discretization techniques for linear and nonlinear PDEs, SIAM J. Numer. Anal. 27 (1996), 1759–1777. MR 97i:65169
60. J. Zeng and S. Zhou, On monotone and geometric convergence of Schwarz methods for two-sided obstacle problems, SIAM J. Numer. Anal. 35 (1998), 600–616. MR 99d:65199

Department of Mathematics, University of Bergen, Johannes Brunsgate 12, 5007 Bergen, Norway
E-mail address: [email protected]

Department of Mathematics, Box 354350, University of Washington, Seattle, Washington 98195
E-mail address: [email protected]

License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use