SIAM J. SCI. COMPUT. © 2015 Axel Klawonn, Martin Lanser, Oliver Rheinbach. Vol. 37, No. 6, pp. C667–C696

TOWARD EXTREMELY SCALABLE NONLINEAR DOMAIN DECOMPOSITION METHODS FOR ELLIPTIC PARTIAL DIFFERENTIAL EQUATIONS∗

AXEL KLAWONN†, MARTIN LANSER†, AND OLIVER RHEINBACH‡

Abstract. The solution of nonlinear problems, e.g., in material science, requires fast and highly scalable parallel solvers. Finite element tearing and interconnecting dual primal (FETI-DP) domain decomposition methods are parallel solution methods for implicit problems discretized by finite elements. Recently, nonlinear versions of the well-known FETI-DP methods for linear problems have been introduced. In these methods, the nonlinear problem is decomposed before linearization. This approach can be viewed as a strategy to further localize computational work and to extend the parallel scalability of FETI-DP methods toward extreme-scale supercomputers. Here, a recent nonlinear FETI-DP method is combined with an approach that allows an inexact solution of the FETI-DP coarse problem. We combine the nonlinear FETI-DP domain decomposition method with an algebraic multigrid (AMG) method and thus obtain a hybrid nonlinear domain decomposition/multigrid method. We consider scalar nonlinear problems as well as nonlinear hyperelasticity problems in two and three space dimensions. For the first time for a domain decomposition method, weak parallel scalability can be shown beyond half a million cores and subdomains. We can show weak parallel scalability for up to 524 288 cores on the Mira Blue Gene/Q supercomputer for our new implementation and discuss the steps necessary to obtain these results. We solve a heterogeneous nonlinear hyperelasticity problem discretized using piecewise quadratic finite elements with a total of 42 billion degrees of freedom in about six minutes. Our analysis reveals that scalability beyond 524 288 cores depends critically on both efficient construction and solution of the coarse problem.

Key words. domain decomposition, FETI-DP, nonlinear, parallel computing

AMS subject classifications. 65N55, 65F08, 65F10, 65Y05

DOI. 10.1137/140997907

∗Submitted to the journal’s Software and High-Performance Computing section December 1, 2014; accepted for publication (in revised form) September 24, 2015; published electronically December 8, 2015. This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA), projects KL 2094/4-1 and RH 122/2-1. A preliminary version of this paper has been submitted as a proceedings paper [40]. The present paper has been completely revised and significantly extended. http://www.siam.org/journals/sisc/37-6/99790.html
†Mathematisches Institut, Universität zu Köln, Weyertal 86-90, 50931 Köln, Germany ([email protected], [email protected]).
‡Fakultät für Mathematik und Informatik, Institut für Numerische Mathematik und Optimierung, Technische Universität Bergakademie Freiberg, 09596 Freiberg, Germany ([email protected]).

1. Introduction. The discretization of elliptic partial differential equations leads to very large and very ill-conditioned problems. It is a challenging task to build a parallel scalable solver for the resulting discrete linear or nonlinear systems. This is partially due to the fact that iterative elliptic solvers, such as multigrid or domain decomposition methods, need a global communication mechanism to be scalable. Here, we consider new nonoverlapping domain decomposition methods which belong to the family of finite element tearing and interconnecting dual primal (FETI-DP) domain decomposition methods. Domain decomposition methods are divide and conquer algorithms that rely on a geometrical decomposition of the original problem. FETI-DP domain decomposition methods for linear or linearized problems were first introduced in Farhat et al. [24, 23]. They belong to the family of nonoverlapping domain decomposition methods that have reduced communication compared to other domain decomposition approaches such as overlapping Schwarz methods [56, 58]. In finite element tearing and interconnecting (FETI) methods, the original problem is decomposed into problems on nonoverlapping subdomains. The continuity of the global solution is enforced as a linear constraint by dual Lagrange multipliers. In FETI-DP methods, additional (primal) constraints are enforced throughout the iteration; see Figure 1; for further details, see [58, 46]. A classical FETI-DP method was awarded a Gordon Bell prize in 2002 [7] for a structural mechanics simulation on unstructured grids using 3 400 ASCI White cores. Modified versions, i.e., inexact FETI-DP domain decomposition methods, introduced in [43], have scaled up to 65 536 Blue Gene/P cores in 2009 [46]. For highly scalable parallel algebraic multigrid (AMG) solvers, see, e.g., [1], where BoomerAMG was shown to be parallel scalable to 100 000 cores in 2012. Scalability for multigrid solvers to the complete JUQUEEN machine at Forschungszentrum Jülich, Germany, has recently been shown for porous media [35] using DUNE [5, 6] and for earth mantle convection [55]. More recently, a computational scale bridging approach combined with FETI-DP methods was also scaled to the complete JUQUEEN [39].

The classical approach to solving nonlinear partial differential equations with FETI-DP methods uses a Newton–Krylov (NK) FETI-DP approach in which the discretized nonlinear problem is linearized by a Newton method, possibly within a globalization loop. Then, in each Newton step, the linear system defined by the tangent matrix is solved iteratively using a Krylov subspace method combined with a FETI-DP method. In [37, 38] a new approach was considered where the nonlinear problem is first decomposed and then linearized, yielding the new family of nonlinear FETI-DP methods. The numerical results presented in [37, 38] were obtained sequentially using MATLAB, and only scalar problems based on variants of the p-Laplace operator were considered. As in traditional (exact) FETI-DP methods, in the nonlinear FETI-DP algorithms in [37, 38], direct solvers are used for the elimination of the local subdomain problems and for the factorization of the coarse Schur complement matrix. The latter prevents the use of this method on parallel supercomputers with several hundreds of thousands of cores due to the superlinear memory and time complexity of the direct solver in the coarse problem.
In this paper, we overcome this limitation and discuss the steps necessary to obtain scalability to half a million cores. The paper presents several major new contributions to the field of nonlinear FETI-DP methods. First, a nonlinear FETI-DP method allowing for the inexact solution of the coarse problem is presented together with a new scalable implementation of this method. A different formulation allows the use of an AMG preconditioner for the inexact solution of the coarse problem. This is opposed to our earlier nonlinear FETI-DP methods [37, 38], where we only presented sequential results using MATLAB and solved the coarse problem exactly. The derivation of this new inexact nonlinear FETI-DP method alongside preliminary results was first presented at the 22nd International Conference on Domain Decomposition Methods; see [40]. Meanwhile, the scalability of this new method has been substantially improved. In the present paper, parallel numerical results based on these improvements are shown. Second, for the first time for any linear or nonlinear FETI-DP-type domain decomposition method, weak parallel scalability is achieved for more than 500 000 processor cores. Let us note that to date we are not aware of such a scalability result for any other domain decomposition method. Finally, we give a detailed analysis of our implementation to find components that limit the scalability. The corresponding findings are discussed, and solutions to overcome these issues and to further improve the parallel performance are provided.

We present weak parallel scaling results for nonlinear problems using 32 768 cores of SuperMUC at Leibniz-Rechenzentrum Munich, 131 072 cores of the Vulcan Blue Gene/Q supercomputer at Lawrence Livermore National Laboratory, and 524 288 cores of the Mira Blue Gene/Q supercomputer at Argonne National Laboratory. In our method, the nonlinear problem is decomposed before linearization, using Lagrange multipliers. We use a hybrid domain decomposition/multigrid method that combines a nonlinear FETI-DP domain decomposition method with a parallel AMG preconditioner. The nonlinear approach serves to reduce communication, and the multilevel approach allows the extension of the parallel scalability by adding further coarse levels. If the method is applied to linear problems and the coarse solver is iterated until convergence in each outer iteration, then the method is equivalent to the standard FETI-DP method. In the nonlinear method presented in this paper, however, one or two multigrid iterations for the coarse problem are sufficient in each outer iteration. Note that in nonlinear FETI-DP methods some global communication still takes place in the outermost loop. With respect to software development aspects it is interesting that the algorithmic building blocks of the new nonlinear methods and of a corresponding classical NK FETI-DP method remain largely identical; they are only arranged in a different way. A well-designed existing implementation of a traditional FETI-DP method can thus easily be transformed into a nonlinear FETI-DP method.

2. Nonlinear domain decomposition. This section puts the method discussed in this paper into the context of existing linear and nonlinear domain decomposition methods. The experienced reader may skip this section.
Fast solvers for nonlinear elliptic partial differential equations discretized by finite elements are iterative schemes with several levels of nested iterations. A classical strategy for a given discrete nonlinear problem

(2.1) $A(u) = 0$

is the reduction of the nonlinear problem to a sequence of linear problems. Superlinear convergence of such schemes can only be expected if the iterative correction $\delta u^{(k)}$ in

(2.2) $u^{(k+1)} = u^{(k)} - \delta u^{(k)}$

is sufficiently close to a step of the classical Newton method. Since convergence can only be expected from a sufficiently good initial guess, a suitable globalization technique is used, e.g., line search, trust region methods, or load stepping. Let us denote the sequence of linearized problems by

(2.3) $DA(u^{(k)})\,\delta u^{(k)} = A(u^{(k)}), \quad k = 1, \ldots, n_N.$

The left-hand side $DA(u^{(k)})$ is sparse but can be very large and is ill-conditioned. Today, fast solvers for these linearized systems are, again, iterative schemes such as multilevel, multigrid [59], or domain decomposition methods [56, 58]. These methods are often used as preconditioners; i.e., for acceleration they are embedded in a Krylov subspace iteration such as the conjugate gradient (CG) method or the generalized minimal residual method (GMRES) [28]. These algorithms are then referred to as preconditioned Newton–Krylov (NK) methods.
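To make the nesting of these loops concrete, the following self-contained C program sketches a plain NK iteration for a model problem that is not taken from this paper (a 1D finite-difference discretization of -u'' + u^3 = 1); the inner Krylov solver is a bare, unpreconditioned CG iteration. It only illustrates the structure of (2.1)-(2.3) and is not the authors' code.

```c
/* Illustration of a Newton-Krylov iteration, cf. (2.1)-(2.3).  The model
 * problem -u'' + u^3 = 1 on (0,1) with homogeneous Dirichlet data is NOT
 * taken from the paper; it only shows the nesting of the outer Newton loop
 * and the inner (here: unpreconditioned CG) Krylov loop. */
#include <math.h>
#include <stdio.h>

#define N 64                         /* interior grid points */
static const double h = 1.0 / (N + 1);

/* residual A(u), cf. (2.1) */
static void residual(const double *u, double *r) {
  for (int i = 0; i < N; ++i) {
    double ul = (i > 0)     ? u[i - 1] : 0.0;
    double ur = (i < N - 1) ? u[i + 1] : 0.0;
    r[i] = (2.0 * u[i] - ul - ur) / (h * h) + u[i] * u[i] * u[i] - 1.0;
  }
}

/* matrix-free application of the tangent DA(u), cf. (2.3) */
static void tangent_apply(const double *u, const double *x, double *y) {
  for (int i = 0; i < N; ++i) {
    double xl = (i > 0)     ? x[i - 1] : 0.0;
    double xr = (i < N - 1) ? x[i + 1] : 0.0;
    y[i] = (2.0 * x[i] - xl - xr) / (h * h) + 3.0 * u[i] * u[i] * x[i];
  }
}

static double dot(const double *a, const double *b) {
  double s = 0.0; for (int i = 0; i < N; ++i) s += a[i] * b[i]; return s;
}

/* inner Krylov solver: CG steps for DA(u) du = r */
static void cg(const double *u, const double *r, double *du, int maxit) {
  double p[N], q[N], res[N];
  for (int i = 0; i < N; ++i) { du[i] = 0.0; res[i] = r[i]; p[i] = r[i]; }
  double rho = dot(res, res);
  for (int it = 0; it < maxit && rho > 1e-20; ++it) {
    tangent_apply(u, p, q);
    double alpha = rho / dot(p, q);
    for (int i = 0; i < N; ++i) { du[i] += alpha * p[i]; res[i] -= alpha * q[i]; }
    double rho_new = dot(res, res);
    for (int i = 0; i < N; ++i) p[i] = res[i] + (rho_new / rho) * p[i];
    rho = rho_new;
  }
}

int main(void) {
  double u[N] = {0.0}, r[N], du[N];
  residual(u, r);
  double norm0 = sqrt(dot(r, r));
  for (int k = 0; k < 20; ++k) {                 /* outer Newton loop, (2.2) */
    cg(u, r, du, 200);                           /* inner Krylov loop        */
    for (int i = 0; i < N; ++i) u[i] -= du[i];   /* full step, no line search */
    residual(u, r);
    double norm = sqrt(dot(r, r));
    printf("Newton step %d: ||A(u)|| = %.3e\n", k + 1, norm);
    if (norm <= 1e-8 * norm0) break;             /* relative tolerance 1e-8  */
  }
  return 0;
}
```

In the methods discussed below, the inner solve would be preconditioned, e.g., by a domain decomposition or multigrid method, and the outer loop would additionally employ a globalization strategy such as a line search.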

The primitives of (preconditioned) Krylov methods are inner products, matrix-vector multiplications, daxpy operations, and the application of the preconditioner. For numerically scalable preconditioners, the latter is the computationally most expensive step. Numerical scalability demands that the number of Krylov iterations can be bounded independently of the problem size. This property is widely regarded as a prerequisite to obtaining parallel scalability for ill-conditioned problems. FETI-DP methods are, in this sense, numerically scalable domain decomposition methods.

The parallelization of the solution scheme is then performed on the level of the Krylov iteration, by providing parallel versions of the Krylov primitives, and on the level of the (linear) preconditioner. The parallelization of the latter is typically performed by a geometrical decomposition of the two- or three-dimensional problem. This decomposition can be computed by applying graph partitioners to the finite element mesh or the matrix graph. This approach has proven efficient for parallel multigrid methods as well as for parallel domain decomposition preconditioners. Global communication appears during the Krylov primitives and, to obtain numerical scalability, in the preconditioner. In preconditioned NK methods the decomposition into concurrent subproblems is thus performed after linearization; i.e., conceptually, it is introduced in an interior loop. As a result, in the innermost loop of NK FETI-DP methods, in every iteration step, nearest-neighbor communication at the subdomain interfaces is necessary in addition to (limited) global coarse communication.

On the other hand, in nonlinear FETI-DP methods [38] the decomposition into concurrent subproblems is performed early, i.e., before linearization. In consequence, the nearest-neighbor communication can be removed from the main loop, resulting in improved concurrency. Nonlinear approaches to domain decomposition can thus be viewed as a strategy to localize computational work in order to extend parallel scalability toward future extreme-scale supercomputers. Moreover, in these approaches local nonlinearities are resolved by local computational work, which may reduce the total energy to solution if other cores are idle long enough to make entering a power conservation state worthwhile.

Different approaches to nonlinear domain decomposition are known. The nonlinear, overlapping additive Schwarz preconditioned inexact Newton (ASPIN) approach was introduced in Cai and Keyes [11]; see also [34, 12, 33] and Groß and Krause [30, 29]. As opposed to our approach here, this is an overlapping domain decomposition method.

Nonlinear domain decomposition has also been used as a coupling method rather than as a solver, e.g., in fluid-structure interaction (FSI); see Deparis et al. [19, 18], Deparis [17], or Fernandez et al. [25]; it has also been used for the coupling of multiphase flow; see, e.g., Ganis et al. [27, 26]. In these approaches the number of subdomains is small, i.e., two in the case of FSI, and there is no need for a coarse problem. This is opposed to the method presented here, where, due to the large number of subdomains, the coarse space plays an important role. Indeed, no scalability can be achieved without a coarse space.

Nonlinear FETI-1 methods were introduced in Pebrel, Rey, and Gosselet [53], and nonlinear Neumann–Neumann methods, as a scalable solver approach, were introduced in Bordeu, Boucard, and Gosselet [8].
These methods are related to our method, but ours does not use a projection for the coarse space. This allows the use of a preconditioner for the coarse problem, which is central to the approach presented in this paper. Nonlinear Schwarz methods as a solver, i.e., not as a nonlinear preconditioner, have been considered earlier; see, e.g., [21, 10]. A technique to use local nonlinear problems in standard methods has been denoted nonlinear localization [13]. Nonlinear multigrid methods, where on each level a nonlinear problem is considered and thus nonlinear smoothers are used, have also been known for some time [59].

In domain decomposition methods of the FETI-DP [24, 23, 46, 47, 44, 48] and BDDC types [20, 14, 50, 49, 51], the coarse spaces are constructed from partial assembly of the finite elements (dual primal). This has facilitated the extension of the scalability of these methods; see, e.g., [61, 52, 43, 45, 60, 57]. Inexact FETI-DP methods were introduced in [43], and their parallel scalability was demonstrated in [46, 54]. These methods are nonoverlapping domain decomposition methods. Scalable overlapping domain decomposition preconditioners for linear problems are also known [56, 58]; see [36] for a recent study on robustness and scalability. A recent overview of approaches to nonlinear solution methods and their combination can be found in [9].

3. Nonlinear FETI-DP formulation. Let $\Omega \subset \mathbb{R}^d$, $d = 2, 3$, be a computational domain, discretized by finite elements, and let $V^h$ be the corresponding finite element space. We consider a nonlinear, typically elliptic, problem on $\Omega$. In nonlinear FETI-DP methods we decompose $\Omega$ into $N$ nonoverlapping subdomains $\Omega_i$, $i = 1, \ldots, N$. We will denote the size of a finite element by $h$ and the size of a subdomain by $H$. We consider the minimization of a nonlinear energy $J: V^h \to \mathbb{R}$,

(3.1) $\min_{u \in V^h} J(u)$.

For standard problems, such as hyperelasticity, discretized by finite elements, this global energy can be represented by a sum

(3.2) $J(u) = \sum_{i=1}^{N} J_i(u_i)$

of local energies on the nonoverlapping subdomains; for details, see [40, 38]. Our geometric decomposition into concurrent problems is thus performed directly on the nonlinear energy, i.e., on the highest possible level rather than after linearization as in standard NK FETI-DP approaches. The solution of concurrent nonlinear problems will therefore appear as a building block of all the different nonlinear FETI-DP methods. Let $\varphi_{i,j}$, $i = 1, \ldots, N$, $j = 1, \ldots, N_i$, be the nodal finite element basis functions for the local finite element space $W_i$. We write

$J_i'(u_i)(\varphi_{i,j}) = (K_i(u_i) - f_i)_j,$

where $f_i$ is independent of $u_i$. We define the nonlinear, discrete block operators and block vectors

$K(u) := \begin{pmatrix} K_1(u_1) \\ \vdots \\ K_N(u_N) \end{pmatrix}, \quad f := \begin{pmatrix} f_1 \\ \vdots \\ f_N \end{pmatrix}, \quad u := \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix},$

where $K_i(u_i)$, $i = 1, \ldots, N$, define independent, concurrent nonlinear subdomain problems.
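In a distributed-memory implementation, each MPI rank typically owns exactly the data belonging to its block of $K$, $f$, and $u$, plus the coupling information introduced below. The following C struct is a purely illustrative sketch of such a per-subdomain container; the member names are hypothetical, anticipate the building blocks of section 5, and are not taken from the authors' code.

```c
/* Hypothetical per-subdomain (per-rank) data container for a nonlinear
 * FETI-DP solver; the member names are illustrative only and do not
 * correspond to the authors' implementation. */
#include <stddef.h>

typedef struct {
  size_t  n_local;        /* local degrees of freedom of K_i(u_i)            */
  size_t  n_dual;         /* dual (interface) unknowns acted on by B^(i)     */
  size_t  n_primal;       /* primal unknowns coupled through R_Pi^(i)        */

  double *u_i;            /* local solution vector u_i                       */
  double *f_i;            /* local right-hand side f_i (independent of u_i)  */
  double *K_i_of_u;       /* local nonlinear residual K_i(u_i)               */

  void   *DK_i;           /* sparse tangent DK_i(u_i), e.g., in CSR storage  */
  void   *DK_i_BB_lu;     /* sparse LU factorization of its (B,B) block      */

  int    *dual_map;       /* local-to-global map for Lagrange multipliers    */
  int    *primal_map;     /* local-to-global map for primal (coarse) dofs    */
} SubdomainData;
```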

Fig. 1. The operator $\widetilde{K}$ represents partially assembled nonlinear subdomain problems. Here, $4 \times 4$ subdomain problems are assembled at the vertices.

Let $R_\Pi^T$ be the standard FETI-DP partial assembly operator that is used to define the FETI-DP coarse problem; see, e.g., [46, 58] for the notation. It introduces coupling in a small set of interface variables, denoted as primal variables. In typical scenarios the number of primal variables is three to four orders of magnitude smaller than that of the original problem; see, e.g., [42, 46]. Let $B$ be the standard FETI-DP jump operator; see, e.g., [46, 58]. The linear constraint $B\tilde{u} = 0$ enforces continuity of the solution in the dual variables across the subdomain boundaries. We define the nonlinear, partially assembled operator

(3.3) $\widetilde{K}(\tilde{u}) := R_\Pi^T K(R_\Pi \tilde{u})$

and the corresponding right-hand side $\tilde{f} := R_\Pi^T f$. We can then introduce the nonlinear FETI-DP master system [37, 38]

(3.4) $\widetilde{K}(\tilde{u}) + B^T\lambda - \tilde{f} = 0, \qquad B\tilde{u} = 0.$

We assume that the operator $\widetilde{K}$ is continuously differentiable and locally invertible. In a typical parallel implementation, the application of the operators $B$ and $B^T$ includes nearest-neighbor communication.

4. A hybrid nonlinear FETI-DP/multigrid method. We now describe the new hybrid nonlinear domain decomposition/multigrid approach that we propose and use in our new parallel code.

4.1. Derivation of the method. Newton linearization of (3.4) with respect to $(\tilde{u}, \lambda)$ results in the linear system

(4.1) $\begin{pmatrix} D\widetilde{K}(\tilde{u}) & B^T \\ B & 0 \end{pmatrix} \begin{pmatrix} \delta\tilde{u} \\ \delta\lambda \end{pmatrix} = \begin{pmatrix} \widetilde{K}(\tilde{u}) + B^T\lambda - \tilde{f} \\ B\tilde{u} \end{pmatrix}.$

As in standard FETI-DP methods, we partition $\delta\tilde{u}$ into primal variables $\delta\tilde{u}_\Pi$ and dual variables $\delta u_B$, i.e.,

$\delta\tilde{u} = \begin{pmatrix} \delta u_B \\ \delta\tilde{u}_\Pi \end{pmatrix}.$

Since, by construction, we have continuity in the primal variables $\delta\tilde{u}_\Pi$, the jump operator can be written as $B = [\,B_B \ \ O\,]$, where the first block $B_B$ corresponds to the dual variables $\delta u_B$ and the second, zero block to the (subassembled) primal variables $\delta\tilde{u}_\Pi$. We can write (4.1) accordingly, i.e., as a block system

(4.2) $\begin{pmatrix} (D\widetilde{K}(\tilde{u}))_{BB} & (D\widetilde{K}(\tilde{u}))_{\Pi B}^T & B_B^T \\ (D\widetilde{K}(\tilde{u}))_{\Pi B} & (D\widetilde{K}(\tilde{u}))_{\Pi\Pi} & 0 \\ B_B & 0 & 0 \end{pmatrix} \begin{pmatrix} \delta u_B \\ \delta\tilde{u}_\Pi \\ \delta\lambda \end{pmatrix} = \begin{pmatrix} (\widetilde{K}(\tilde{u}))_B + B_B^T\lambda - \tilde{f}_B \\ (\widetilde{K}(\tilde{u}))_\Pi - \tilde{f}_\Pi \\ B_B u_B \end{pmatrix}.$

Assuming that $(D\widetilde{K}(\tilde{u}))_{BB}$ is invertible, we perform one step of block Gauss elimination of $u_B$ and obtain a reduced system

(4.3) $A_r x_r = F_r$;

(4.4) $A_r = \begin{pmatrix} S_{\Pi\Pi} & -(D\widetilde{K}(\tilde{u}))_{\Pi B}\,(D\widetilde{K}(\tilde{u}))_{BB}^{-1} B_B^T \\ \text{(sym.)} & -B_B\,(D\widetilde{K}(\tilde{u}))_{BB}^{-1} B_B^T \end{pmatrix}$;

(4.5) $x_r = \begin{pmatrix} \delta\tilde{u}_\Pi \\ \delta\lambda \end{pmatrix}$;

cf. the notation in [43] for linear problems. The Schur complement $S_{\Pi\Pi}$ in (4.4), i.e.,

(4.6) $S_{\Pi\Pi} = (D\widetilde{K}(\tilde{u}))_{\Pi\Pi} - (D\widetilde{K}(\tilde{u}))_{\Pi B}\,(D\widetilde{K}(\tilde{u}))_{BB}^{-1}\,(D\widetilde{K}(\tilde{u}))_{\Pi B}^T,$

constitutes the coarse problem of the method. An elimination of $\tilde{u}_\Pi$ would yield the Nonlinear-FETI-DP-1 (NL1) system

$F_{\mathrm{NL1}}\,\delta\lambda = d$

introduced in [37, 38]. Here, we choose not to carry out the elimination of $\tilde{u}_\Pi$ but to solve the block system (4.4) iteratively instead, using the left block-triangular preconditioner

(4.7) $B_{r,L}^{-1} := B_{r,L}^{-1}\bigl(\widehat{S}_{\Pi\Pi}^{-1},\, (D\widetilde{K}(\tilde{u}))_{BB}^{-1},\, M^{-1}\bigr) := \begin{pmatrix} \widehat{S}_{\Pi\Pi}^{-1} & 0 \\ -M^{-1} B_B\,(D\widetilde{K}(\tilde{u}))_{BB}^{-1}\,(D\widetilde{K}(\tilde{u}))_{\Pi B}^T\, \widehat{S}_{\Pi\Pi}^{-1} & -M^{-1} \end{pmatrix}.$

This new approach, along with some preliminary results, was first proposed in a DD22 plenary talk [40]. It combines the inexact approach from [43, 46] with the Nonlinear-FETI-DP-1 method [37, 38] and may therefore also be denoted the irNonlinear-FETI-DP-1 (inexact reduced Nonlinear-FETI-DP-1) method. The application of $\widehat{S}_{\Pi\Pi}^{-1}$ will consist of a few cycles of a parallel AMG method; for $(D\widetilde{K}(\tilde{u}))_{BB}^{-1}$ we will use concurrent sparse direct solvers on the subdomains. Here, $M^{-1}$ is a good, parallel preconditioner for the dual Schur complement. We choose $M^{-1}$ as one of the standard FETI-DP preconditioners, e.g., the Dirichlet preconditioner [58]. Its application is embarrassingly parallel.

We use a Krylov subspace method suitable for unsymmetric systems since the preconditioner (4.7) is unsymmetric. The use of CG requires a symmetric reformulation; see, e.g., [43].

This method is, in general, not identical to a standard FETI-DP method applied after Newton linearization. In nonlinear FETI-DP methods, continuity of the solution may not be reached until convergence of the Newton iteration. This is different from standard exact or inexact NK FETI-DP methods, where each Newton iterate is continuous.
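The application of (4.7) to a residual $(r_\Pi, r_\lambda)$ thus consists of AMG cycles for the coarse block, local sparse forward-backward substitutions, the jump operator, and the Dirichlet preconditioner. The following C sketch only fixes this order of operations; all callbacks are hypothetical placeholders and this is not the authors' code.

```c
/* Sketch of applying the left block-triangular preconditioner (4.7) to a
 * residual (r_Pi, r_lambda):
 *   z_Pi     = S^_PiPi^{-1} r_Pi                                (AMG cycles)
 *   z_lambda = -M^{-1} ( B_B (DK)_BB^{-1} (DK)_PiB^T z_Pi + r_lambda ).
 * All callbacks are hypothetical placeholders. */
#include <stddef.h>

typedef void (*ApplyOp)(const double *in, double *out, void *ctx);

typedef struct {
  ApplyOp amg_SPiPi_inv;   /* a few BoomerAMG V-cycles approximating S_PiPi^{-1} */
  ApplyOp DK_PiB_T;        /* apply (DK)_PiB^T : primal-sized -> dual-sized      */
  ApplyOp DK_BB_inv;       /* local sparse LU forward/backward substitutions     */
  ApplyOp B_B;             /* jump operator (nearest-neighbor communication)     */
  ApplyOp M_inv;           /* Dirichlet preconditioner (embarrassingly parallel) */
  void   *ctx;
  size_t  n_primal, n_dual, n_lambda;
} BlockTriangularPrec;

void apply_Brl_inverse(const BlockTriangularPrec *P,
                       const double *r_Pi, const double *r_lambda,
                       double *z_Pi, double *z_lambda,
                       double *work_dual, double *work_lambda)
{
  size_t i;

  /* coarse block: z_Pi ~= S_PiPi^{-1} r_Pi */
  P->amg_SPiPi_inv(r_Pi, z_Pi, P->ctx);

  /* off-diagonal block: w = B_B (DK)_BB^{-1} (DK)_PiB^T z_Pi */
  P->DK_PiB_T(z_Pi, work_dual, P->ctx);
  P->DK_BB_inv(work_dual, work_dual, P->ctx);   /* assumed to allow in-place use */
  P->B_B(work_dual, work_lambda, P->ctx);

  /* dual block: z_lambda = -M^{-1} (w + r_lambda) */
  for (i = 0; i < P->n_lambda; ++i) work_lambda[i] += r_lambda[i];
  P->M_inv(work_lambda, z_lambda, P->ctx);
  for (i = 0; i < P->n_lambda; ++i) z_lambda[i] = -z_lambda[i];
}
```

Wrapped in, e.g., a PETSc PCSHELL, such a routine could serve as the left preconditioner for the outer Krylov iteration.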

4.2. Computing an initial value using concurrent nonlinear problems. A good initial value can be crucial for the convergence of Newton-type methods. A suitable initial value $\tilde{u}^{(0)}$ for the Newton method as presented in section 4.1 has to be continuous in all primal variables $\tilde{u}_\Pi^{(0)}$ and should provide a good local approximation of the given problem. Note that the start value is allowed to be discontinuous in the dual variables $u_B^{(0)}$. A possible choice of an initial value $\tilde{u}^{(0)}$ has been proposed in [38]. An initial value can be obtained from solving the nonlinear problem

(4.8) $\widetilde{K}(\tilde{u}^{(0)}) = \tilde{f} - B^T\lambda^{(0)}.$

For simplicity, we use $\lambda^{(0)} = 0$. Here, (4.8) requires the solution of concurrent nonlinear subdomain problems which are only coupled in the primal unknowns. The communication is thus limited to the primal variables, i.e., a few unknowns for each MPI rank. This step can also be seen as a nonlinear localization step. Linearization of (4.8) results in

(4.9) $\begin{pmatrix} (D\widetilde{K}(\tilde{u}))_{BB} & (D\widetilde{K}(\tilde{u}))_{\Pi B}^T \\ (D\widetilde{K}(\tilde{u}))_{\Pi B} & (D\widetilde{K}(\tilde{u}))_{\Pi\Pi} \end{pmatrix} \begin{pmatrix} \delta u_B \\ \delta\tilde{u}_\Pi \end{pmatrix} = \begin{pmatrix} (\widetilde{K}(\tilde{u}))_B + B_B^T\lambda - \tilde{f}_B \\ (\widetilde{K}(\tilde{u}))_\Pi - \tilde{f}_\Pi \end{pmatrix}.$

A block Gauss elimination of uB yields the symmetric system

(4.10) $S_{\Pi\Pi}\,\delta\tilde{u}_\Pi = \tilde{d}_\Pi,$

where $S_{\Pi\Pi}$ is defined as in (4.6). We solve (4.10) by a Krylov method using the AMG preconditioner $\widehat{S}_{\Pi\Pi}^{-1}$; see (4.7). We will also refer to the initialization phase as Phase 1 of the algorithm. The solution phase corresponding to section 4.1 is then referred to as Phase 2; cf. Figure 2.

5. Algorithmic building blocks. Although our approach differs substantially from the NK FETI-DP approach, the building blocks of the new algorithm are largely identical. In Figure 2 the hybrid nonlinear FETI-DP/multigrid algorithm presented in section 4 is summarized, and, for comparison, in Figure 3 the more traditional hybrid NK FETI-DP/multigrid approach (also known as NK irFETI-DP) is outlined. Both approaches include an inexact solution of the coarse problem using an AMG method.

In our implementation, for the solution of linearized sparse systems, we build upon sparse direct and iterative solvers. For linear problems on the subdomains we apply a multifrontal sparse direct solver which is known for its robustness, i.e., UMFPACK 5.6.2 [15]. Cycles of an AMG preconditioner (BoomerAMG [31] from Hypre 2.9.1a [22]) are applied to the (linearized) global coarse problem $S_{\Pi\Pi}$. This adds further (algebraic) coarse levels to the method.

All packages were interfaced through PETSc [3, 2, 4]. We make use of the infrastructure provided by PETSc where possible. On Mira, we first use PETSc 3.4.3 and later PETSc 3.5.3. On Vulcan and SuperMUC, PETSc 3.4.3 was used; on JUQUEEN, PETSc 3.5.2 was used. We use Hypre 2.9.1a on all supercomputers, except for Vulcan, where the newer version 2.9.4a was applied.
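As an illustration only: through PETSc's hypre interface, an AMG-preconditioned coarse solve of this kind could be configured as sketched below. This is not the authors' code; the option values in the comment correspond to the kind of BoomerAMG adaptation discussed later in section 6.4 (HMIS coarsening, interpolation truncation, larger strong threshold) and are assumed, plausible choices. The sketch assumes a PETSc 3.5-style KSPSetOperators signature.

```c
/* Minimal sketch (not the authors' code): solve the linearized coarse problem
 * S_PiPi * x = d with GMRES preconditioned by BoomerAMG, using PETSc's hypre
 * interface (PETSc >= 3.5-style KSPSetOperators signature). */
#include <petscksp.h>

PetscErrorCode solve_coarse(Mat S_PiPi, Vec d, Vec x)
{
  KSP ksp;
  PC  pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, S_PiPi, S_PiPi); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);

  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc, "boomeramg"); CHKERRQ(ierr);

  /* Illustrative BoomerAMG options in the spirit of section 6.4, e.g., passed
   * on the command line:
   *   -pc_hypre_boomeramg_coarsen_type HMIS
   *   -pc_hypre_boomeramg_P_max 3
   *   -pc_hypre_boomeramg_strong_threshold 0.5
   *   -pc_hypre_boomeramg_max_iter 1          (one V-cycle per application) */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);

  ierr = KSPSolve(ksp, d, x); CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  return 0;
}
```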

Hybrid Nonlinear FETI-DP/Multigrid Method

    init: $\tilde{u}^{(0)} \in \widetilde{W}$, $\lambda^{(0)} = 0$
    for k = 0, ..., convergence   // Phase 1: Compute initial value
        build: $\widetilde{K}(\tilde{u}^{(k)})$ and $D\widetilde{K}(\tilde{u}^{(k)})$
        iterative Krylov solve for $\delta\tilde{u}_\Pi^{(k)}$ using the AMG preconditioner $\widehat{S}_{\Pi\Pi}^{-1}$:
            $S_{\Pi\Pi}\,\delta\tilde{u}_\Pi^{(k)} = \tilde{d}_\Pi$   // see (4.10)
        update:
            $\delta u_B^{(k)} := (D\widetilde{K}(\tilde{u}^{(k)}))_{BB}^{-1}\bigl[(\widetilde{K}(\tilde{u}^{(k)}))_B + B_B^T\lambda^{(0)} - \tilde{f}_B - (D\widetilde{K}(\tilde{u}^{(k)}))_{\Pi B}^T\,\delta\tilde{u}_\Pi^{(k)}\bigr]$   // see (4.9)
            $\delta\tilde{u}^{(k)} := [\delta u_B^{(k)T},\, \delta\tilde{u}_\Pi^{(k)T}]^T$
            $\tilde{u}^{(k+1)} := \tilde{u}^{(k)} - \alpha^{(k)}\,\delta\tilde{u}^{(k)}$
    end
    $\tilde{u}^{(0)} := \tilde{u}^{(k+1)}$
    for k = 0, ..., convergence   // Phase 2: Main iteration loop
        build: $\widetilde{K}(\tilde{u}^{(k)})$, $D\widetilde{K}(\tilde{u}^{(k)})$, and $M^{-1}$
        iterative Krylov solve for $x_r = [\delta\tilde{u}_\Pi^{(k)T},\, \delta\lambda^{(k)T}]^T$ using the left preconditioner $B_{r,L}^{-1} := B_{r,L}^{-1}(\widehat{S}_{\Pi\Pi}^{-1}, (D\widetilde{K}(\tilde{u}))_{BB}^{-1}, M^{-1})$:
            $A_r x_r = F_r$   // see (4.3)
        update:
            $\delta u_B^{(k)} := (D\widetilde{K}(\tilde{u}^{(k)}))_{BB}^{-1}\bigl[(\widetilde{K}(\tilde{u}^{(k)}))_B + B_B^T(\lambda^{(k)} - \delta\lambda^{(k)}) - \tilde{f}_B - (D\widetilde{K}(\tilde{u}^{(k)}))_{\Pi B}^T\,\delta\tilde{u}_\Pi^{(k)}\bigr]$   // see (4.2)
            $\delta\tilde{u}^{(k)} := [\delta u_B^{(k)T},\, \delta\tilde{u}_\Pi^{(k)T}]^T$
            $\tilde{u}^{(k+1)} := \tilde{u}^{(k)} - \alpha^{(k)}\,\delta\tilde{u}^{(k)}$
            $\lambda^{(k+1)} := \lambda^{(k)} - \alpha^{(k)}\,\delta\lambda^{(k)}$
    end

Fig. 2. Pseudocode of the hybrid nonlinear FETI-DP/multigrid algorithm as an instance of the irNonlinear-FETI-DP-1 framework; cf. [46, 43]. The application of $\widehat{S}_{\Pi\Pi}^{-1}$ consists of cycles of a parallel AMG method; for $(D\widetilde{K}(\tilde{u}))_{BB}^{-1}$, concurrent forward-backward substitutions of a sparse direct solver on the subdomains are used.

5.1. Parallel application of $(D\widetilde{K})^{-1}$ to a vector. We have the product representation

(5.1) $(D\widetilde{K})^{-1} = \begin{pmatrix} I & -(D\widetilde{K})_{BB}^{-1}(D\widetilde{K})_{\Pi B}^T \\ 0 & I \end{pmatrix} \begin{pmatrix} (D\widetilde{K})_{BB}^{-1} & 0 \\ 0 & S_{\Pi\Pi}^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -(D\widetilde{K})_{\Pi B}(D\widetilde{K})_{BB}^{-1} & I \end{pmatrix}.$

Hybrid Newton–Krylov FETI-DP/Multigrid Method

    init: $\tilde{u}^{(0)} \in \widetilde{W}$
    for k = 0, ..., convergence
        build: $\widetilde{K}(\tilde{u}^{(k)})$, $D\widetilde{K}(\tilde{u}^{(k)})$, and $M^{-1}$
        iterative Krylov solve for $x_r = [\delta\tilde{u}_\Pi^{(k)T},\, \lambda^T]^T$ using the left preconditioner $B_{r,L}^{-1} := B_{r,L}^{-1}(\widehat{S}_{\Pi\Pi}^{-1}, (D\widetilde{K}(\tilde{u}))_{BB}^{-1}, M^{-1})$:
            $A_r x_r = F_r$   // see (4.3)
        update:
            $\delta u_B^{(k)} := (D\widetilde{K}(\tilde{u}^{(k)}))_{BB}^{-1}\bigl[(\widetilde{K}(\tilde{u}^{(k)}))_B - \tilde{f}_B + B_B^T\lambda - (D\widetilde{K}(\tilde{u}^{(k)}))_{\Pi B}^T\,\delta\tilde{u}_\Pi^{(k)}\bigr]$   // see (4.2)
            $\delta\tilde{u}^{(k)} := [\delta u_B^{(k)T},\, \delta\tilde{u}_\Pi^{(k)T}]^T$
            $\tilde{u}^{(k+1)} := \tilde{u}^{(k)} - \alpha^{(k)}\,\delta\tilde{u}^{(k)}$
    end

Fig. 3. Pseudocode of the hybrid NK FETI-DP/multigrid algorithm. The application of $\widehat{S}_{\Pi\Pi}^{-1}$ consists of cycles of a parallel AMG method; for $(D\widetilde{K}(\tilde{u}))_{BB}^{-1}$, concurrent forward-backward substitutions of a sparse direct solver on the subdomains are used.

Here, the block operator

(5.2) $(D\widetilde{K})_{BB}^{-1} = \begin{pmatrix} ((DK^{(1)})_{BB})^{-1} & & 0 \\ & \ddots & \\ 0 & & ((DK^{(N)})_{BB})^{-1} \end{pmatrix}$

is completely parallel and is implemented by embarrassingly parallel forward-backward substitutions of a sparse direct solver. For standard FETI-DP methods with an exact solution of the coarse problem the sparse direct solver is also used to factor $S_{\Pi\Pi}$. This can be performed by first broadcasting $S_{\Pi\Pi}$ to all ranks before factoring it locally. In this case, in the subsequent Krylov iteration, no communication is necessary for the coarse problem. Alternatively, $S_{\Pi\Pi}$ can be factored in parallel on a subset of ranks or on separate ranks. Of course, the parallel scalability of this approach is eventually limited by the sparse direct solver.

In our hybrid nonlinear FETI-DP/multigrid approach the application of $(D\widetilde{K})^{-1}$ to a vector is not required in the main iteration loop; see Figure 2 (Phase 2). Instead, only the application of $(D\widetilde{K})_{BB}^{-1}$ to a vector and of the AMG preconditioner $\widehat{S}_{\Pi\Pi}^{-1}$ to a vector are needed in each Krylov iteration step. In our experiments, only one or two V-cycles of the AMG preconditioner are used in each Krylov iteration.
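Where the full $(D\widetilde{K})^{-1}$ is needed (e.g., in Phase 1 updates), (5.1) is applied factor by factor. The following C sketch only illustrates that order of operations; the callbacks are hypothetical placeholders, this is not the authors' code, and in the inexact method the coarse solve would be replaced by AMG cycles.

```c
/* Sketch (not the authors' code) of applying the block factorization (5.1)
 * to a block right-hand side (r_B, r_Pi), from right to left:
 *   1. y_Pi = r_Pi - (DK)_PiB (DK)_BB^{-1} r_B           (third factor)
 *   2. x_B  = (DK)_BB^{-1} r_B,  x_Pi = S_PiPi^{-1} y_Pi (block-diagonal factor)
 *   3. x_B  = x_B - (DK)_BB^{-1} (DK)_PiB^T x_Pi         (first factor)   */
#include <stddef.h>

typedef void (*Apply)(const double *in, double *out, void *ctx);

typedef struct {
  Apply DK_BB_inv;   /* embarrassingly parallel local LU solves, cf. (5.2) */
  Apply DK_PiB;      /* apply (DK)_PiB   : dual-sized -> primal-sized      */
  Apply DK_PiB_T;    /* apply (DK)_PiB^T : primal-sized -> dual-sized      */
  Apply S_PiPi_inv;  /* coarse solve (exact, or AMG cycles in practice)    */
  void *ctx;
  size_t n_B, n_Pi;
} BlockOps;

void apply_DK_inverse(const BlockOps *op, const double *r_B, const double *r_Pi,
                      double *x_B, double *x_Pi, double *w_B, double *w_Pi)
{
  size_t i;

  /* step 1: y_Pi = r_Pi - (DK)_PiB (DK)_BB^{-1} r_B */
  op->DK_BB_inv(r_B, x_B, op->ctx);               /* x_B = (DK)_BB^{-1} r_B */
  op->DK_PiB(x_B, w_Pi, op->ctx);
  for (i = 0; i < op->n_Pi; ++i) w_Pi[i] = r_Pi[i] - w_Pi[i];

  /* step 2: x_Pi = S_PiPi^{-1} y_Pi */
  op->S_PiPi_inv(w_Pi, x_Pi, op->ctx);

  /* step 3: x_B = x_B - (DK)_BB^{-1} (DK)_PiB^T x_Pi */
  op->DK_PiB_T(x_Pi, w_B, op->ctx);
  op->DK_BB_inv(w_B, w_B, op->ctx);               /* assumed to allow in-place use */
  for (i = 0; i < op->n_B; ++i) x_B[i] -= w_B[i];
}
```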

5.2. Building the coarse operator $S_{\Pi\Pi}$ (cf. (4.6)) in parallel. As in multigrid methods, it is vital to build the coarse operator efficiently. In parallel multigrid methods this is achieved by carrying out a parallel triple matrix product to build the RAP Galerkin operator, where R and P are the restriction and prolongation operators, respectively, and A is the system matrix. It is well known that repartitioning of the coarse problem is also vital for efficiency. Fortunately, in FETI-DP methods the coarse problem is small compared to the fine level; i.e., it can be smaller by a factor of $10^3$ to $10^4$ compared to the original problem [42, 46]. FETI-DP and related domain decomposition methods can apply such aggressive coarsening since direct solvers are used on the fine level instead of smoothers.
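Concretely, one way to realize such an additive parallel assembly of the coarse operator from local contributions (formalized in (5.3) below) is sketched here with PETSc. This is not the authors' code; preallocation is omitted for brevity, and the row distribution (e.g., placing all rows on a subset of ranks) is left to the caller.

```c
/* Sketch (not the authors' code): additive assembly of the FETI-DP coarse
 * operator from local contributions, cf. (5.3) below.  Each MPI rank inserts
 * its dense local Schur complement contribution with ADD_VALUES; the row
 * distribution of S_PiPi may place all rows on a subset of the ranks. */
#include <petscmat.h>

PetscErrorCode assemble_coarse(MPI_Comm comm,
                               PetscInt n_global_primal,    /* size of S_PiPi          */
                               PetscInt n_local_rows,       /* 0 on "non-coarse" ranks */
                               PetscInt n_i,                /* # local primal dofs     */
                               const PetscInt *primal_map,  /* local -> global dofs    */
                               const PetscScalar *S_i,      /* dense n_i x n_i block   */
                               Mat *S_PiPi)
{
  PetscErrorCode ierr;

  ierr = MatCreate(comm, S_PiPi); CHKERRQ(ierr);
  ierr = MatSetSizes(*S_PiPi, n_local_rows, n_local_rows,
                     n_global_primal, n_global_primal); CHKERRQ(ierr);
  ierr = MatSetType(*S_PiPi, MATAIJ); CHKERRQ(ierr);
  ierr = MatSetUp(*S_PiPi); CHKERRQ(ierr);   /* preallocation omitted for brevity */

  /* additive (Galerkin-type) assembly of the local contribution, cf. (5.3) */
  ierr = MatSetValues(*S_PiPi, n_i, primal_map, n_i, primal_map,
                      S_i, ADD_VALUES); CHKERRQ(ierr);

  /* the assembly communicates off-process contributions to the owning ranks */
  ierr = MatAssemblyBegin(*S_PiPi, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*S_PiPi, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  return 0;
}
```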

The FETI-DP coarse operator $S_{\Pi\Pi}$ can be built from local contributions computed in parallel, i.e.,

(5.3) $S_{\Pi\Pi} = \sum_{i=1}^{N} R_\Pi^{(i)T} S_{\Pi\Pi}^{(i)} R_\Pi^{(i)},$

where the $S_{\Pi\Pi}^{(i)}$ are distributed among the processor cores. The parallel assembly can be partially overlapped with the parallel computation of the local Schur complement contributions $S_{\Pi\Pi}^{(i)} = (DK^{(i)})_{\Pi\Pi} - (DK^{(i)})_{\Pi B}(DK^{(i)})_{BB}^{-1}(DK^{(i)})_{\Pi B}^T$ of the processor cores. In some of our experiments on the Mira and Vulcan supercomputers, the coarse problem is first distributed over all MPI ranks and then redistributed to a subset of cores; i.e., 16K coarse cores are used for the problems running on 65K and 262K total cores. We also provide results for an improved approach on the Vulcan Blue Gene/Q, avoiding the redistribution process; see sections 6.4 and 6.5 for details. It is interesting to note that in our experiments the current implementation of the assembly of the coarse problem starts to be a bottleneck for more than 131K cores. This has not been observed previously.

5.3. Parallel matrix-free inversion of $B(D\widetilde{K}(\tilde{u}^{(k)}))^{-1}B^T$. Systems with the left-hand side $B(D\widetilde{K}(\tilde{u}^{(k)}))^{-1}B^T$ are solved using a Krylov subspace method such as GMRES. Nearest-neighbor communication is necessary when applying the operators $B^T$ and $B$ to a vector. The application of $(D\widetilde{K}(\tilde{u}^{(k)}))^{-1}$ to a vector is implemented as described above in section 5.1.

6. Numerical results.

6.1. Model problems. We consider different model problems based on the nonlinear p-Laplace operator $\Delta_p$ and on nonlinear hyperelasticity. Our first model problem is $-\Delta u - 4\Delta_p u = 1$ for p = 4, where $\Delta$ is the usual Laplace operator and $\Delta_p$ is the nonlinear p-Laplace operator. For this problem the number of Newton iterations and the time to solution are small. The computations are carried out on the unit square Ω = (0, 1) × (0, 1). For the discretization, we use linear finite elements. The energy is given by

$J(u) := \int_\Omega \tfrac{1}{2}|\nabla u|^2 + |\nabla u|^4 - u \, dx.$

Our second model problem is nonlinear hyperelasticity. We consider a Neo–Hooke material with a soft matrix material and stiff circular inclusions; see Figure 4 for the geometry. The strain energy density function W [62, 32] is given by

$W(u) = \tfrac{1}{2}\bigl(K - \tfrac{2}{3}G\bigr)\ln^2(\det F) + \tfrac{1}{2}G\bigl(\operatorname{tr}(F^T F) - 3 - 2\ln(\det F)\bigr)$

with

$K = \frac{E}{3(1-2\nu)}, \qquad G = \frac{E}{2(1+\nu)},$

and the deformation gradient $F(x) := \nabla\varphi(x)$; here, $\varphi(x) = x + u(x)$ denotes the deformation and $u(x)$ the displacement of $x$.

Fig. 4. Decomposition of the computational domain Ω into 64 subdomains; each subdomain has a (slightly off-centered) circular inclusion of stiffer material.

The energy functional whose stationary points are computed is given by

$J(u) = \int_\Omega W(u) - H_1(u)\,dx - \int_\Gamma H_2(u)\,ds,$

where $H_1(u)$ and $H_2(u)$ are functionals related to the volume and traction forces. In our experiments in two dimensions, we have the following material parameters E and ν (see Figure 4 for the geometry): In the circular inclusions we have E = 210 000, and in the surrounding matrix material E = 210. We have chosen ν = 0.3 in the complete domain Ω. The nonlinear elasticity problem is discretized with piecewise quadratic finite elements. For our three-dimensional problems, we have chosen a similar setup with stiff spherical inclusions in softer material. Again, we choose E = 210 000 in the inclusions and E = 210 inside the surrounding material.

We always use one iteration of BoomerAMG [31, 22, 1] with symmetric-SOR/Jacobi smoothing for the coarse problem. In all our two-dimensional experiments we choose all vertices to be primal. We additionally use edge averages as primal constraints on all edges in three-dimensional problems and perform a local transformation of basis; see [42, section 4] for details on the implementation of edge constraints. In the weak scaling experiments in this paper, the number of subdomains is identical to the number of cores. For the Newton iterations appearing in our methods, we use a relative tolerance of $10^{-8}$; i.e., the iteration is stopped when the norm of the residual has been reduced by eight orders of magnitude. The linearized systems are then solved using GMRES as a Krylov method with a relative tolerance of $10^{-10}$.

6.2. Computational platforms. We perform our computations on different supercomputers: Mira is a 49 152 node 10-petaflops Blue Gene/Q system at Argonne National Laboratory (Lemont, IL) with a total number of 786 432 processor cores and is ranked 5 in the current TOP500 list (November 2014, www.top500.org). JUQUEEN is a 28 672 node 6-petaflops Blue Gene/Q system at Jülich Supercomputing Center (JSC, Germany) with a total number of 458 752 processor cores and is ranked 8 in the current TOP500 list. Vulcan is a 24 576 node 5-petaflops Blue Gene/Q production system at Lawrence Livermore National Laboratory (Livermore, CA) with a total number of 393 216 processor cores and is ranked 9 in the current TOP500 list. All three Blue Gene/Q systems use a Power BQC 16C 1.6 GHz processor with 16 cores and 16 GB memory per node. SuperMUC is a 9 400 node 3.2-petaflops system at Leibniz Supercomputing Center (Munich, Germany) with a total number of

155 656 processor cores (Sandy Bridge-EP Xeon E5-2680 8C 2.7 GHz; 16 cores and 32 GB memory per node). It is ranked 14 in the current TOP500 list (November 2014). On SuperMUC the Intel compiler was used, and on Vulcan the IBM XL compiler was used. As we encountered internal compiler errors with the IBM compiler on the Blue Gene/Q systems in templated C++ code, we first had to resort to the GNU compiler 4.4.7 on the Mira supercomputer for Tables 2 and 3. The later results, i.e., those in Tables 8 and 9, use the IBM XL C/C++ compiler for Blue Gene V12.1, the ESSL, and the flags -O3 -qhot=level=0 -qsimd=auto -qmaxmem=-1 -qstrict -qstrict_induction.

6.3. Comparison with existing methods. In Table 1, a comparison of the new method discussed in this paper with the more traditional NK FETI-DP/multigrid method [46] is presented. For linear problems both methods are mathematically identical. We consider the second nonlinear model benchmark problem from [38, section 5.2.2], i.e., the p-Laplacian [38, section 5.2.3] with p = 4 in inclusions embedded inside the subdomains and p = 2 elsewhere. We have H/h = 128, and the inclusions have a distance of η = 2h to the subdomain boundary. Here, we use piecewise quadratic finite elements. The runs from 16 up to 65 536 cores have been performed on the Vulcan Blue Gene/Q at Lawrence Livermore National Laboratory. By parallel efficiency we refer to the ratio of the baseline time to solution to the actual time to solution; the baseline is, in this case, the time to solution on a single Blue Gene/Q node (16 cores). The total time to solution (or total execution time) includes the time for the assembly of the problem as well as its solution.

We see a reduction of the number of Krylov iterations and of the time used in the Krylov iterations by 70%. As a result, the parallel efficiency of the nonlinear FETI-DP/multigrid method appears to be superior to the efficiency of the more traditional NK FETI-DP/multigrid approach. Of course, for these simple nonlinear problems, the time used in the Krylov iteration is small. Nonetheless, the reduction of the total time increases from 28% on 16 cores to 37% on 65 536 cores; i.e., the advantage of the new method increases with the number of cores.

For numerically harder nonlinear problems the time used in the Krylov iteration can constitute a larger portion of the time to solution. Results for one such example are given in Figure 5. The model problem considered here is the third benchmark problem defined in [38, section 5.2.3]: We have considered the p-Laplacian with p = 4 and a multiplicative weight of α = $10^6$ in 64 channels of width $\frac{1}{4}H$ and p = 2, α = 1 elsewhere. In this example the total time to solution is reduced by a factor of two by using the nonlinear method on 4 096 cores. Here, the (small) coarse problem is solved directly as in [38]. This is feasible if the number of subdomains is small, and it then yields essentially the same results as when using an efficient multigrid preconditioner for the coarse problem; see [43].

6.4. Weak parallel scalability. In Table 2 and Figure 6, weak parallel scalability results on the Mira Blue Gene/Q are presented for the p-Laplacian ($-(\Delta + 4\Delta_p)u = 1$, p = 4, H/h = 180). Again, the total time to solution, denoted total execution time in the table, includes the time for the assembly of the problem as well as its solution using relative tolerances of $10^{-8}$ for Newton and $10^{-10}$ for GMRES. Again, for the parallel efficiency, our reference baseline is the time on a single node (16 cores).
The parallel scalability is satisfactory, although we do see an increase in the total time, especially when scaling from 65K to 262K cores. A parallel efficiency exceeding 100% stems from a variation in the number of Newton steps by one iteration. In Table 3 the weak parallel scalability for both solution phases is shown separately.

Fig. 5. Comparison of the nonlinear FETI-DP approach [38, Nonlinear-FETI-DP-2, third model problem, coefficient $10^6$] (cf. right column) with the more traditional NK FETI-DP approach (cf. left column) on 4 096 cores of a Cray XT6 (Universität Duisburg-Essen). In both methods a line search method with a Wolfe condition is used as the globalization technique. The total time to solution can be reduced by more than a factor of two for this problem.

Table 1
New, hybrid nonlinear FETI-DP/multigrid (abbreviated by NL) algorithm compared to the more traditional NK FETI-DP/multigrid (abbreviated by NK) on the Vulcan supercomputer at Lawrence Livermore National Laboratory; p-Laplace inclusions in a linear Laplace equation; p = 4; H/h = 128; and piecewise quadratic finite elements. The baseline for the parallel efficiency is a single Blue Gene/Q node (16 cores).

Vulcan Blue Gene/Q Supercomputer (LLNL)

Cores    Problem size    Solver   Newton steps        Krylov   Krylov    Total exec.   Parallel
                                  (Phase 1/Phase 2)   iter.    time      time          effic.
16       1 050 625       NK       –/20                219      57.5s     459.7s        100%
                         NL       13/7                 84      20.3s     332.0s        100%
64       4 198 401       NK       –/21                438      110.4s    534.9s         86%
                         NL       14/7                138      33.17s    356.1s         93%
256      16 785 409      NK       –/22                546      137.9s    587.1s         78%
                         NL       14/8                185      44.94s    396.6s         84%
1 024    67 125 249      NK       –/23                599      152.7s    624.3s         74%
                         NL       15/8                208      51.3s     413.6s         80%
4 096    268 468 225     NK       –/23                624      160.6s    630.1s         73%
                         NL       16/7                199      49.6s     410.6s         81%
16 384   1 073 807 361   NK       –/24                688      178.4s    676.9s         68%
                         NL       16/8                228      57.5s     441.2s         75%
65 536   4 295 098 369   NK       –/24                722      190.0s    720.5s         64%
                         NL       17/7                211      54.1s     453.9s         73%

As expected, since it includes significantly less communication, Phase 1 scales better than Phase 2. We proceed with a discussion of possible reasons for some of the inefficiencies observed. We are then able to present improved results on Vulcan and Mira considering hyperelasticity after having implemented some modifications; see Tables 5, 6, and 7 for Vulcan and Table 8 for Mira, and the corresponding discussion. A better understanding of why the time to solution increases when scaling from 65K to 262K cores on Mira can be achieved by an analysis of the phases of the method. Considering Figure 7, we can see that a better scalability for Phase 1 is prevented by

Table 2
Weak scalability for the hybrid nonlinear FETI-DP/multigrid algorithm with Wolfe line search on the Mira supercomputer at Argonne National Laboratory; see also the detailed analysis in Figure 7. Improvements resulting from this analysis are discussed in the text.

MIRA Blue Gene/Q Supercomputer (ANL)

Cores     Problem size    Newton steps        Krylov   Krylov   Total exec.   Parallel
                          (Phase 1/Phase 2)   iter.    time     time          effic.
16        514 089         5/1                 10       1.23s    64.4s         100%
64        2 053 489       5/1                 18       2.16s    66.0s          98%
256       8 208 225       4/1                 23       3.17s    57.0s         113%
1 024     32 821 441      4/1                 23       3.79s    57.9s         111%
4 096     131 262 849     4/1                 22       4.04s    58.2s         111%
16 384    525 005 569     5/1                 21       3.97s    69.8s          92%
65 536    2 099 930 625   4/1                 23       3.59s    64.9s          99%
262 144   8 399 539 201   4/1                 23       4.77s    90.2s          71%

Fig. 6. Weak parallel scalability on the Mira Blue Gene/Q; cf. the data in Table 2.

Table 3
Scalability of the two different solution phases for the hybrid nonlinear FETI-DP/multigrid algorithm with Wolfe line search on the Mira supercomputer at Argonne National Laboratory; see also the detailed analysis in Figure 7. Improvements resulting from this analysis are discussed in the text.

MIRA Blue Gene/Q Supercomputer (ANL)

                          Phase 1 (per Newton step)   Phase 2 (per Newton step)
Cores     Problem size    ø time     effic.           ø time     effic.
16        514 089         10.0s      100%             14.4s      100%
64        2 053 498       10.1s       99%             15.5s       93%
256       8 208 225       10.1s       99%             16.5s       87%
1 024     32 821 441      10.2s       98%             17.1s       84%
4 096     131 262 849     10.2s       98%             17.4s       83%
16 384    525 005 569     10.4s       96%             17.7s       81%
65 536    2 099 930 625   11.6s       86%             18.7s       77%
262 144   8 399 539 201   16.0s       63%             26.1s       55%

a noticeable, undesired increase of the time used in the Krylov iterations (yellow in Figure 7). Since, in Phase 1, the Krylov method operates exclusively on the coarse problem, this suggests an inefficiency in the coarse problem. Indeed, for this problem, as a result of the larger number of Newton iterations in Phase 1, the coarse problem has to be solved more often in Phase 1 than in Phase 2. For improving scalability, we therefore have to revisit the coarse problem. In fact, in recent results on the Vulcan Blue Gene/Q, the scalability of the coarse problem solve has been improved by optimizing several BoomerAMG parameters and by using a better distribution of the coarse problem to MPI ranks; see Figure 9, the corresponding discussion, and also section 6.5. For our second model problem, the following BoomerAMG parameters proved to be superior: We now use a nodal-based HMIS coarsening (cf. [16]), a stronger truncation of the interpolation operator to a maximum of three nonzero entries per row, and, finally, a higher threshold value for the determination of strong couplings.

Second, there is a noticeable increase in the remaining time phase (orange in Figure 7), especially in Phase 2 of the algorithm. Here, the remaining time consists of the update of the solution, the computation of parallel norms necessary for the stopping criterion of the global Newton iteration, e.g., the norm of the Newton residual, and the cleanup of parallel data structures. Further investigations on the Vulcan Blue Gene/Q (see the discussion below) exposed an inefficiency in the parallel update of the Lagrange multipliers which affected the solution phases of all our nonlinear FETI-DP implementations. Although the Lagrange multipliers can be updated completely locally, an unnecessary VecAssembly caused some communication. The update was then replaced by an improved implementation avoiding the parallel assembly process, which subsequently was used to obtain the weak scaling results for Vulcan in Table 7 and Figure 12 and the strong scaling results presented in Table 10 and Figure 15.

Third, for 262K cores we see an unexpected increase in the portion of the total time that is spent in the FETI-DP setup (green in Figure 7). This phase contains the assembly of the coarse problem, the local LU factorizations, the construction of several scatters, the AMG setup, and the construction of the Dirichlet preconditioner (only necessary in Phase 2). The increase in the FETI-DP setup time affects both Phases 1 and 2. Since we expect the local factorizations to scale perfectly, the increase can only be a result of the coarse setup, i.e., the construction of the coarse problem; cf. section 5.2. This has been confirmed by detailed additional analysis on the Vulcan Blue Gene/Q; see the discussion below. Better results have been achieved on Vulcan by avoiding the redistribution step described in section 5.2 for the assembly of the coarse problem; see Table 7 and Figures 11 and 12. We can conclude that not only the solution of the FETI-DP coarse problem but also its efficient construction will be key to obtaining scalability to one million cores.

The results presented in this paper are based on pure MPI; i.e., the largest problem uses 524 288 MPI ranks. A hybrid MPI/OpenMP model offers further potential to increase scalability. We can reduce the number of MPI ranks by using a threaded solver on the subdomain level.
Results for this approach are promising but beyond the scope of this paper. Hybrid MPI/OpenMP aspects and corresponding computational results are discussed elsewhere [41]. Threaded BLAS or LAPACK libraries were also not used here. From our experience on Blue Gene/P, the sparse direct solver [15] does not profit from the fine-grained parallelism of a threaded BLAS.

The same p-Laplace model problem as on Mira has been solved on SuperMUC

Fig. 7. Detailed analysis of Phase 1 (left) and Phase 2 (right), performed on the Mira supercomputer, of the first implementation of the hybrid nonlinear FETI-DP/multigrid algorithm; cf. Table 3. Improvements resulting from this analysis are discussed in the text; see also the improved results on Vulcan in Figure 12 and on Mira in Table 8.

Table 4
Hybrid nonlinear FETI-DP/multigrid algorithm on the SuperMUC supercomputer at Leibniz-Rechenzentrum (LRZ) in Munich; $-\Delta u - 4\Delta_p u = 1$, p = 4, $H_x/h_x = 768$, $H_y/h_y = 384$.

SuperMUC (LRZ)

                         Phase 1 (per Newton step)   Phase 2 (per Newton step)   Total
Cores    Problem size    ø time     effic.           ø time     effic.           time      effic.
32       9M              11.1s      100%             16.2s      100%             112.5s    100%
128      38M             11.4s       97%             17.2s       94%             117.8s     96%
512      151M            11.4s       97%             17.2s       94%             119.1s     95%
2 048    604M            11.5s       97%             17.4s       93%             119.1s     95%
8 192    2 416M          12.3s       90%             18.8s       86%             127.9s     88%
32 768   9 664M          14.6s       76%             23.3s       70%             151.4s     74%

at Leibniz-Rechenzentrum (LRZ) in Munich, Germany. The weak parallel scalability for both phases can be seen in Table 4, showing behavior similar to that on Mira.

As already mentioned, we now present improved weak scalability results and provide a detailed analysis of several timers. The computations are performed on the Vulcan Blue Gene/Q. In Table 5 the weak scalability of the complete application for a nonlinear two-dimensional elasticity problem is presented. Here, we consider a rectangular domain Ω with corners (0, 0) and (2, 1). As a Dirichlet boundary condition, we apply a fixed 1% displacement in the x-direction in each degree of freedom on the boundary, i.e., $x = \begin{pmatrix} 1.01 & 0 \\ 0 & 1 \end{pmatrix} X$ for all nodes $X \in \partial\Omega$ in the undeformed configuration. In these experiments, no further volume or traction forces $H_1(u)$ and $H_2(u)$ are considered. We use a Neo–Hooke material model and consider a heterogeneous material characterized by two different elasticity moduli; cf. section 6.1 and Figure 4 for the distribution of the coefficients. In order to illustrate the nonlinearity and heterogeneity of the chosen model problem, we provide a heat map picturing the deviation of the heterogeneous solution from the solution of an equivalent problem considering linear elasticity and a homogeneous material; see Figure 8. The smallest problem with 1.6 million degrees of freedom is decomposed into 32 subdomains, and the largest problem with 6.7 billion degrees of freedom is decomposed into 131 072 subdomains. All in all, a nearly constant behavior of the time spent in Krylov iterations can be observed. This is comparable to the behavior of our hybrid nonlinear FETI-DP/multigrid method

Table 5
Hybrid nonlinear FETI-DP/multigrid algorithm on the Vulcan supercomputer at Lawrence Livermore National Laboratory; Neo-Hooke material; E = 210 000 in circular inclusions and E = 210 in the surrounding matrix material; Poisson ratio ν = 0.3 in the complete domain; see Figure 4 for the geometry; a fixed displacement of 1% in the x-direction is prescribed in each boundary node; two-dimensional rectangular domain with corners (0, 0) and (2, 1); H/h = 80 and piecewise quadratic finite elements.

Vulcan Blue Gene/Q Supercomputer (LLNL)

Cores     Problem size    Newton steps        Krylov   Krylov   Total exec.   Parallel
                          (Phase 1/Phase 2)   iter.    time     time          effic.
32        1 642 242       4/3                  93      23.2s    250.3s        100%
128       6 561 282       4/3                 107      27.2s    256.0s         98%
512       26 229 762      4/3                 109      27.9s    257.7s         97%
2 048     104 888 322     4/3                 109      28.4s    258.7s         97%
8 192     419 491 842     4/3                 107      28.7s    261.0s         96%
32 768    1 677 844 482   4/3                 105      27.2s    261.7s         96%
131 072   6 711 132 162   4/3                 102      26.7s    278.9s         90%

applied to the p-Laplace model problem; cf. section 6.3. The parallel efficiency of the application is always better than 90%. But again we see an increase in runtime and, consequently, a drop in efficiency from 96% to 90% when scaling from 32K to 131K cores. We provide detailed time measurements to explain this effect and how we obtain improvements; see, e.g., Figures 9, 10, 11, and 12. We also present weak scaling results for nonlinear elasticity in three dimensions; see Table 6. The problem setup in this experiment is similar to the two-dimensional model problem: We have one stiff spherical inclusion (E = 210 000) in each subdomain, embedded in softer material (E = 210). To increase the memory available for each subdomain, we use 8 of the 16 cores per Vulcan node. The parallel efficiency on 65K cores compared to 128 cores is 78% and thus seems satisfactory.

Fig. 8. Deviation $\|u - u_0\|_2$, where u is the solution of the nonlinear problem and $u_0$ is the initial value. We chose $u_0$ as the solution of a linear and homogeneous problem of the same size and with the same boundary conditions. The geometry consists of 32 subdomains with circular inclusions. The image on the left shows the deviation, whereas the image on the right also shows the positions of the inclusions as white circles. The highest measured deviation is 0.0027 and corresponds to the dark red areas in the picture.

Let us now discuss the weak scalability results for our nonlinear Neo–Hooke model problems. In Figure 9, we partition the average runtime of one Newton step into four different phases: assembly of the local problems (blue in Figure 9), FETI-DP and preconditioner setup including the local LU factorizations (green in Figure 9), Krylov iteration (yellow in Figure 9), and the remaining time (orange in Figure 9). In Figure 9 (left) we present the average timings over all Newton steps of Phase 1, and in Figure 9 (right) we give the respective timings for Phase 2 (solution phase). Let us remark that Phase 2 is comparable to classical NK FETI-DP/multigrid.

Table 6
NK FETI-DP/multigrid on the Vulcan supercomputer at Lawrence Livermore National Laboratory; Neo-Hooke material; E = 210 000 in spherical inclusions and E = 210 in the surrounding matrix material; Poisson ratio ν = 0.3 in the complete domain; a fixed displacement of 1% in the x-direction is prescribed in each boundary node; three-dimensional cubic domain; H/h = 7 and piecewise quadratic finite elements.

Vulcan Blue Gene/Q Supercomputer (LLNL)

Cores    Problem size   Krylov   Krylov   Newton   Average time      Total exec.   Parallel
                        iter.    time     steps    per Newton step   time          effic.
128      698 691         85       9.3s    4        22.2s              88.7s        100%
1 024    5 447 811       98      11.8s    4        23.1s              92.2s         96%
8 192    43 022 595     116      15.5s    4        24.7s              98.9s         90%
65 536   341 955 075    181      27.2s    4        28.5s             113.9s         78%

The assembly time (blue in Figure 9) scales, as one should expect, nearly perfectly for both phases. In Phase 1, the Krylov iteration time can now be neglected after our improvements of the coarse problem solve: We have better adapted BoomerAMG to the properties of our FETI-DP coarse operator (HMIS nodal coarsening, truncation of the interpolation; see [16] and the discussion above for more details) and use a better distribution of the coarse problem to the MPI ranks; i.e., we use, e.g., 16K cores for the coarse problem in the run using a total of 131K cores. The combination of these two changes alone results in a noticeable improvement compared to the Mira results presented before. In Phase 2, the time spent in the Krylov method takes a larger part but scales quite well, also because the nonlinear FETI-DP method keeps the number of Krylov iterations nearly constant.

We also notice that the remaining time (orange in Figure 9) does not scale perfectly when reaching 131K cores. This was caused by an inefficiency in the implementation of the update of the Lagrange multipliers; see also the discussion above. This does not affect this weak scaling experiment on Vulcan substantially but proved to be a crucial point in later strong scaling tests on Vulcan; see section 6.5. A noticeable parallel inefficiency in the FETI-DP setup time (green in Figure 9) is caused by the assembly of the coarse problem. This corresponds to the earlier results on Mira; cf. the discussion above.

We present a detailed analysis of the FETI-DP setup time in Phase 1 in Figure 10. The FETI-DP setup phase is split into the LU factorizations (green in Figure 10), the assembly and redistribution of the globally coupled coarse problem (red and purple in Figure 10), the construction of the scatters representing $B$ and $R_\Pi$ (yellow in Figure 10), and the BoomerAMG setup (orange in Figure 10). Apparently, only the phases related to the coarse problem are not scaling perfectly. It is possible to avoid the redistribution process (see section 5.2) by assembling the coarse problem directly on a subset of MPI ranks. This proves to be more efficient, and improved results compared to Table 5 can be found in Table 7 and Figure 11. In Figure 11 the redistribution (purple in Figures 10 and 11) is removed, resulting in an improvement of the overall parallel efficiency from 90% (Table 5) to 92% in Table 7 and Figure 12. Nevertheless, we assume that improvements are still possible in the construction of the coarse problem. We may be able to learn from highly scalable parallel multigrid codes, which also rely on an efficient construction of the coarse operator.

We finally provide a large weak scalability test on Mira using our modified and

Fig. 9. Detailed analysis of Phase 1 (left) and Phase 2 (right) of the hybrid nonlinear FETI-DP/multigrid method; average time per Newton step; corresponds to the data in Table 5.

optimized software for the heterogeneous Neo–Hooke problem in two dimensions; see Table 8. We scale from 16 cores (a single node) up to 524 288 cores (32 768 nodes) and subdomains. The range of weak scalability is thus extended by almost one order of magnitude compared to the best earlier results obtained for FETI-DP methods [46]. Our largest nonlinear problem has 41 944 million degrees of freedom and is solved in 377.8s. This corresponds to an overall parallel efficiency of 96%, choosing the time to solution on 16 cores (364s) as a baseline; cf. Figure 13. The number of Newton steps slightly decreases for an increasing number of subdomains. Thus, a fairer comparison may be obtained by considering the average runtime spent in a Newton step. Choosing the average runtime on 16 cores as a baseline, we measure a parallel efficiency of 67% in Phase 1 and even 68% in Phase 2, which is a remarkable result; see also Figure 14.

A Blue Gene/Q core executes instructions in order in two pipelines (one integer/load/store/control pipeline and one floating point pipeline). At most one instruction is issued per thread in each cycle; each Blue Gene/Q core supports up to four hardware threads. However, the use of OpenMP threading on the node within our algorithmic framework is discussed elsewhere [41]. To make use of the hardware threads using pure MPI, we can use multiple MPI ranks per core; i.e., here we use 32 MPI ranks per node (option -p 32 on JUQUEEN) instead of 16 MPI ranks; see also the discussion of the --overcommit option for Vulcan in section 6.5. The results are presented in Table 9. As a reference, in the first line of Table 9, we provide the results for 16 ranks per node (-p 16, i.e., one MPI rank per core). We can see that our implementation can profit from the use of two MPI ranks per core if the memory constraint of 512 MB per MPI process can be accommodated; i.e., we can use half the nodes without doubling the time (95.2s versus 141.8s). Scaling up the number of nodes using this approach, we obtain a parallel efficiency of 80% for 524K MPI ranks compared to a baseline of 32 MPI ranks (a single node) for the nonlinear problem. For each Newton step we have an efficiency of 52% in Phase 1 and 60% in Phase 2, which indicates that the coarse problem is the limiting factor in this configuration with multiple MPI ranks per core.

6.5. Strong parallel scalability. Finally, we present strong scaling tests for a nonlinear elasticity problem in two dimensions performed on the Vulcan Blue Gene/Q;

Fig. 10. Detailed subtimers of the FETI-DP/multigrid setup in Phase 1; average time per Newton step; corresponds to the data in Table 5.

Fig. 11. Detailed subtimers of the FETI-DP/multigrid setup in Phase 1; average time per Newton step; corresponds to the data in Table 7. Improved results compared to Figure 10.

see Table 10. We again use the problem setup with stiff circular inclusions; cf. Figure 4. We decompose a problem with 419 million degrees of freedom and 131 072 inclusions into 131 072 subdomains and solve with our new hybrid nonlinear FETI-DP/multigrid method using an increasing number of Blue Gene/Q cores. We start with 1 024 cores (64 nodes) and scale up to 131K cores (8 192 nodes). This implies that we solve between one and 128 subdomain problems on each processor core, and each of these has approximately 6.6K degrees of freedom, which is comparatively small. We are not able to start our strong scaling test on fewer cores due to memory constraints. The result of 63% parallel efficiency on 131K cores is convincing. Let us also remark

Fig. 12. Detailed analysis of Phase 1 (left) and Phase 2 (right) of the hybrid nonlinear FETI-DP/multigrid method; average time per Newton step; corresponds to the data in Table 7. Improved results compared to Figure 9.

Table 7 Setting as in Table 5; assembly of the coarse problem on a subset of min(MPI-size, 16K) cores instead of redistribution of the coarse space.

Vulcan Blue Gene/Q Supercomputer (LLNL)
Cores   | Problem size  | Newton steps Phase 1/Phase 2 | Krylov iter | Krylov time | Total execution time | Parallel effic.
32      | 1 642 242     | 4/3 |  93 | 23.2s | 249.9s | 100%
128     | 6 561 282     | 4/3 | 107 | 27.4s | 256.2s |  98%
512     | 26 229 762    | 4/3 | 108 | 27.9s | 258.2s |  97%
2 048   | 104 888 322   | 4/3 | 109 | 28.6s | 258.9s |  97%
8 192   | 419 491 842   | 4/3 | 106 | 28.9s | 261.2s |  96%
32 768  | 1 677 844 482 | 4/3 | 105 | 28.3s | 263.7s |  95%
131 072 | 6 711 132 162 | 4/3 | 102 | 26.8s | 273.1s |  92%

that we even gain a notable speedup when scaling from 65K to 131K cores. A graphical presentation of the parallel speedup can be found in Figure 15.

We also provide and discuss some detailed timings for this strong scaling experiment. In Figure 16, we present four different subtimers: Krylov iterations (yellow in Figure 16), assembly of local problems (blue in Figure 16), FETI-DP setup (green in Figure 16), and updates and parallel norms (orange in Figure 16). On the left, the average time per Newton step of Phase 2 is presented for these subtimers. On the right, their percentage of the total time is depicted.

Let us note that three main improvements compared to our first results on Mira enabled these strong scaling results. First, the elimination of the inefficiency in the update of the Lagrange multipliers has made the remaining time (orange in Figure 16) insignificant. Second, we have removed the redistribution of the coarse problem and replaced it by a faster assembly process on a subset of MPI ranks. For the strong scaling results, we assemble the coarse problem on at most 16K cores. This also results in a better scalability of the FETI-DP setup phase. Third, we have better adapted BoomerAMG to the properties of the FETI-DP coarse problem; see our previous detailed discussion on the improved use of BoomerAMG.
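As a rough illustration of the second improvement, assembling the coarse problem on a subset of MPI ranks amounts to forming a subcommunicator and restricting the coarse operator to it. The C sketch below is not our implementation but a minimal example under stated assumptions: the limit of 16 384 ranks mirrors the choice reported above, and selecting the lowest-numbered ranks is an arbitrary illustrative choice.

```c
/* Illustrative sketch: restrict the FETI-DP coarse problem to at most
 * max_coarse_ranks MPI ranks via a subcommunicator. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int max_coarse_ranks = 16384;  /* example limit, cf. the text */
    int n_coarse = (size < max_coarse_ranks) ? size : max_coarse_ranks;

    /* Ranks 0 .. n_coarse-1 own a piece of the coarse problem; all other
     * ranks only send their local contributions to these owners. */
    int color = (rank < n_coarse) ? 0 : MPI_UNDEFINED;
    MPI_Comm coarse_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &coarse_comm);

    if (coarse_comm != MPI_COMM_NULL) {
        /* ... assemble the coarse operator and set up BoomerAMG on coarse_comm ... */
        MPI_Comm_free(&coarse_comm);
    }

    MPI_Finalize();
    return 0;
}
```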

Table 8
Hybrid nonlinear FETI-DP/multigrid algorithm on the Mira supercomputer at Argonne National Laboratory; hyperelastic Neo-Hooke material; E = 210 000 in circular inclusions and E = 210 in the surrounding matrix material; Poisson ratio ν = 0.3 in the complete domain; see Figure 4 for the geometry; a fixed displacement of 1% in the x-direction is prescribed in each boundary node; two-dimensional and rectangular domain; H/h = 100 and piecewise quadratic finite elements. The number of subdomains is identical to the number of cores and MPI ranks. Scalability 16 → 524K for each Newton step: 67% (Phase 1), 68% (Phase 2). A relative tolerance of rtol = 10^{-8} is used for Newton and rtol = 10^{-10} for GMRES.

Mira Blue Gene/Q Supercomputer (ANL)
Cores   | Problem size | Phase 1 time / Newton | Phase 2 time / Newton | Krylov iter | Total time | Parallel efficiency
16      | 1.3M    | 158.7s / 4 | 205.3s / 3 |  83 | 364.0s | 100%
32      | 2.6M    | 159.2s / 4 | 212.9s / 3 |  95 | 372.1s |  98%
64      | 5.1M    | 159.5s / 4 | 220.9s / 3 | 109 | 380.4s |  96%
128     | 10M     | 159.5s / 4 | 224.6s / 3 | 113 | 384.1s |  95%
256     | 20M     | 160.1s / 4 | 238.9s / 3 | 135 | 399.0s |  91%
512     | 41M     | 160.1s / 4 | 231.2s / 3 | 113 | 391.3s |  93%
1 024   | 82M     | 160.3s / 4 | 245.2s / 3 | 136 | 405.5s |  90%
2 048   | 164M    | 160.8s / 4 | 230.3s / 3 | 110 | 391.1s |  93%
4 096   | 328M    | 182.0s / 4 | 246.5s / 3 | 110 | 428.4s |  85%
8 192   | 655M    | 186.4s / 4 | 254.0s / 3 | 114 | 440.4s |  83%
16 384  | 1 311M  | 137.3s / 4 | 249.0s / 3 | 110 | 433.3s |  84%
32 768  | 2 622M  | 138.9s / 4 | 251.7s / 3 | 111 | 390.6s |  93%
65 536  | 5 243M  | 145.3s / 4 | 180.3s / 2 |  85 | 325.6s | 112%
131 072 | 10 486M | 147.5s / 3 | 182.0s / 2 |  84 | 329.5s | 110%
262 144 | 20 972M | 144.9s / 3 | 177.5s / 2 |  83 | 322.4s | 113%
524 288 | 41 944M | 177.6s / 3 | 200.2s / 2 |  82 | 377.8s |  96%
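As a consistency check, not an additional result, the overall weak scaling efficiency quoted above follows directly from the total times in Table 8:
\[
E_{\mathrm{weak}} \;=\; \frac{T(16)}{T(524\,288)} \;=\; \frac{364.0\,\mathrm{s}}{377.8\,\mathrm{s}} \;\approx\; 0.96 .
\]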

Fig. 13. Time to solution on the Mira Blue Gene/Q for our hyperelastic model problem; cf. the data in Table 8.

Still, the FETI-DP setup phase does not show optimal parallel efficiency, and its percentage of a single Newton step grows (green in Figure 16). Further investigations and optimizations of the assembly process of the coarse problem may be necessary.

It is possible to execute up to four MPI processes on each core on Vulcan by specifying the --overcommit option to make better use of the hardware threads of the Power BQC processor. In strong scaling experiments, as presented above, several subdomain problems have to be solved on each core. It is natural to handle these subdomains sequentially on each core. Alternatively, we can handle subdomains in parallel by using several MPI processes per core, which may help to fill the two

Fig. 14. Weak scalability on Mira Blue Gene/Q for our hyperelastic model problem; cf. the data in Table 8. Phase 1 and Phase 2 are average values over the Newton steps.

Table 9
Hybrid nonlinear FETI-DP/multigrid algorithm on the JUQUEEN supercomputer at the Jülich Supercomputing Center (JSC) using 32 MPI ranks per node (-p 32); hyperelastic Neo-Hooke material; E = 210 000 in circular inclusions and E = 210 in the surrounding matrix material; Poisson ratio ν = 0.3 in the complete domain; see Figure 4 for the geometry; a fixed displacement of 1% in the x-direction is prescribed in each boundary node; two-dimensional and rectangular domain; H/h = 70 and piecewise quadratic finite elements. The number of subdomains is identical to the number of MPI ranks. Scalability 32 → 524K MPI ranks for each Newton step: 52% (Phase 1), 60% (Phase 2). A relative tolerance of rtol = 10^{-8} is used for Newton and rtol = 10^{-10} for GMRES.

JUQUEEN Blue Gene/Q Supercomputer (JSC)
MPI ranks | Cores   | Problem size | Phase 1 time / Newton | Phase 2 time / Newton | Krylov iter | Total time | Par. eff.
32        | 32      | 1.3M    | 40.2s / 4 | 54.0s / 3 |  81 |  95.2s |
32        | 16      | 1.3M    | 59.1s / 4 | 82.7s / 3 |  89 | 141.8s | 100%
128       | 64      | 5M      | 59.3s / 4 | 88.0s / 3 | 104 | 147.3s |  96%
512       | 256     | 20M     | 59.6s / 4 | 90.9s / 3 | 108 | 150.6s |  94%
2 048     | 1 024   | 80M     | 60.2s / 4 | 91.6s / 3 | 106 | 151.8s |  94%
8 192     | 4 096   | 321M    | 60.7s / 4 | 90.7s / 3 | 104 | 151.1s |  94%
32 768    | 16 384  | 1 285M  | 47.0s / 3 | 64.6s / 2 |  81 | 111.6s | 127%
131 072   | 65 536  | 5 138M  | 54.1s / 3 | 70.8s / 2 |  80 | 124.9s | 114%
524 288   | 262 144 | 20 553M | 85.4s / 3 | 92.0s / 2 |  78 | 177.4s |  80%

pipelines and thus better utilize the execution units. The memory of the node will be partitioned accordingly. We present results for a Neo-Hooke hyperelasticity problem with 838 942 722 degrees of freedom in Figure 17 and compare computations on 1 024, 2 048, and 4 096 nodes with one MPI process per core (upper blue line) and with four, two, and one MPI processes per core, respectively (lower green line). We use the maximum number of four processes per core on 1 024 nodes and two processes per core on 2 048 nodes. Of course we cannot expect perfect scalability, but using two MPI processes per processor core we do achieve a significant speedup of 30% compared to the standard approach, which is surprising. We do not obtain an additional benefit from using four MPI processes per core. Thus, if feasible, an overcommit should be considered.

7. Conclusion. We have presented a new parallel nonlinear FETI-DP domain decomposition method for scalar nonlinear problems and nonlinear hyperelasticity in two and three dimensions which is especially suited for the currently leading computing facilities with their large number of cores. We obtain weak scalability to 524 288

Table 10
Hybrid nonlinear FETI-DP/multigrid algorithm on the Vulcan supercomputer at Lawrence Livermore National Laboratory; Neo-Hooke material; E = 210 000 in circular inclusions and E = 210 in the surrounding matrix material; Poisson ratio ν = 0.3 in the complete domain; see Figure 4 for the geometry; a fixed displacement of 1% in the x-direction is prescribed in each boundary node; piecewise quadratic finite elements.

Vulcan Blue Gene/Q Supercomputer (LLNL)
Cores   | Subdomains | Problem size | Total execution time | Actual speedup | Ideal speedup | Parallel effic.
1 024   | 131 072 | 419 471 361 | 3 365.1s |  1.0 |   1 | 100%
2 048   | 131 072 | 419 471 361 | 1 726.4s |  1.9 |   2 |  97%
4 096   | 131 072 | 419 471 361 |   868.0s |  3.9 |   4 |  97%
8 192   | 131 072 | 419 471 361 |   453.5s |  7.4 |   8 |  93%
16 384  | 131 072 | 419 471 361 |   231.4s | 14.6 |  16 |  91%
32 768  | 131 072 | 419 471 361 |   119.8s | 28.1 |  32 |  88%
65 536  | 131 072 | 419 471 361 |    64.3s | 51.6 |  64 |  81%
131 072 | 131 072 | 419 471 361 |    41.7s | 80.6 | 128 |  63%
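As a consistency check of the tabulated values, the speedup and efficiency in the last row of Table 10 follow, up to rounding of the printed times, from
\[
S(131\,072) \;=\; \frac{T(1\,024)}{T(131\,072)} \;=\; \frac{3365.1\,\mathrm{s}}{41.7\,\mathrm{s}} \;\approx\; 80.7,
\qquad
E \;=\; \frac{S(131\,072)}{131\,072/1\,024} \;\approx\; \frac{80.7}{128} \;\approx\; 0.63,
\]
consistent with the reported values of 80.6 and 63%; the small difference in the speedup presumably stems from rounding of the reported times.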

Fig. 15. Strong scaling on Vulcan: Visualization of the speedup from Table 10.

Blue Gene/Q cores with an efficiency of 96% for a nonlinear hyperelasticity problem using a pure MPI implementation and achieve strong scalability from 1 024 to 131 072 cores with an efficiency of 63%. In contrast to the nonlinear FETI-DP methods proposed in [38, 37], the new method allows for the inexact solution of the coarse problem, here using an AMG preconditioner, which enables the very good parallel scalability results on up to several hundreds of thousands of cores. The nonlinear domain decomposition approach helps to localize computational work and to reduce the number of Krylov iterations. Algorithmically, largely the same building blocks are used as in standard FETI-DP methods, but they are arranged in a different order to achieve increased locality and concurrency.

Fig. 16. Detailed subtimers for the strong scaling experiments from Table 10; average time per Newton step in Phase 2.

Fig. 17. Effect of using multiple MPI processes for each processor core (overcommit): One process per core compared to four/two/one processes per core. For the upper (blue) line we use 65 536 subdomains and 16 384, 32 768, and 65 536 MPI ranks on 16 384, 32 768, and 65 536 BG/Q cores, respectively. For the lower (green) line we use 65 536 subdomains and 65 536 MPI ranks on 16 384, 32 768, and 65 536 BG/Q cores.

Acknowledgments. The authors would like to thank Satish Balay and Barry Smith, Argonne National Laboratory, for the fruitful cooperation. The authors also gratefully acknowledge the cooperation with Ulrike Meier Yang, Lawrence Livermore National Laboratory.
This research used resources, i.e., Mira, of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Additional computing time was provided by the Argonne Leadership Computing Facility through the Director's Discretionary program and the Mira Bootcamp 2015, both in preparation for an Innovative and Novel Computational Impact on Theory and Experiment (INCITE) proposal. The computational experiments on the Vulcan supercomputer were performed while one of the authors (Martin Lanser) was visiting Lawrence Livermore National Laboratory (www.llnl.gov).
The authors also gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de) and JUQUEEN at Jülich Supercomputing Centre (JSC, www.fz-juelich.de/ias/jsc). GCS is the alliance of the three national supercomputing centres HLRS (Universität Stuttgart), JSC (Forschungszentrum Jülich), and LRZ (Bayerische Akademie der Wissenschaften), funded by the German Federal Ministry of Education and Research (BMBF) and the German State Ministries for Research of Baden-Württemberg (MWK), Bayern (StMWFK), and Nordrhein-Westfalen (MIWF). The use of the Cray XT6 at University of Duisburg-Essen is also gratefully acknowledged.

REFERENCES

[1] A. H. Baker, R. D. Falgout, T. V. Kolev, and U. M. Yang, Scaling hypre's multigrid solvers to 100,000 cores, in High-Performance Scientific Computing, M. W. Berry, K. A. Gallivan, E. Gallopoulos, A. Grama, B. Philippe, Y. Saad, and F. Saied, eds., Springer, London, 2012, pp. 261–279.
[2] S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc Users Manual, Technical Report ANL-95/11, Revision 3.5, Argonne National Laboratory, Lemont, IL, 2014.
[3] S. Balay, J. Brown, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc Web Page, http://www.mcs.anl.gov/petsc (2014).
[4] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, Efficient management of parallelism in object oriented numerical software libraries, in Modern Software Tools in Scientific Computing, E. Arge, A. M. Bruaset, and H. P. Langtangen, eds., Birkhäuser, Basel, 1997, pp. 163–202.
[5] P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, M. Ohlberger, and O. Sander, A generic grid interface for parallel and adaptive scientific computing. Part I: Abstract framework, Computing, 82 (2008), pp. 103–119.
[6] P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, M. Ohlberger, and O. Sander, A generic grid interface for parallel and adaptive scientific computing. Part II: Implementation and tests in DUNE, Computing, 82 (2008), pp. 121–138.
[7] M. Bhardwaj, K. H. Pierson, G. Reese, T. Walsh, D. Day, K. Alvin, J. Peery, C. Farhat, and M. Lesoinne, Salinas: A scalable software for high performance structural and mechanics simulation, in ACM/IEEE Proceedings of SC02: High Performance Networking and Computing, Gordon Bell Award, ACM, New York, IEEE, Washington, DC, 2002, pp. 1–19.
[8] F. Bordeu, P.-A. Boucard, and P. Gosselet, Balancing domain decomposition with nonlinear relocalization: Parallel implementation for laminates, in Proceedings of the 1st International Conference on Parallel, Distributed and Grid Computing for Engineering, B. H. V. Topping and P. Iványi, eds., Civil-Comp Press, Stirlingshire, UK, 2009.
[9] P. Brune, M. G. Knepley, B. Smith, and X. Tu, Composing Scalable Nonlinear Algebraic Solvers, Technical Report ANL/MCS-P2010-0112, Argonne National Laboratory, Lemont, IL, 2013.
[10] X.-C. Cai and M. Dryja, Domain decomposition methods for monotone nonlinear elliptic problems, in Domain Decomposition Methods in Scientific and Engineering Computing (University Park, PA, 1993), Contemp. Math. 180, AMS, Providence, RI, 1994, pp. 21–27.
[11] X.-C. Cai and D. E. Keyes, Nonlinearly preconditioned inexact Newton algorithms, SIAM J. Sci. Comput., 24 (2002), pp. 183–200.
[12] X.-C. Cai, D. E. Keyes, and L. Marcinkowski, Non-linear additive Schwarz preconditioners and application in computational fluid dynamics, Internat. J. Numer. Methods Fluids, 40 (2002), pp. 1463–1470.
[13] P. Cresta, O. Allix, C. Rey, and S. Guinard, Nonlinear localization strategies for domain decomposition methods: Application to post-buckling analyses, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 1436–1446.

[14] J.-M. Cros, A preconditioner for the Schur complement domain decomposition method, in Domain Decomposition Methods in Science and Engineering, O. Widlund, I. Herrera, D. Keyes, and R. Yates, eds., National Autonomous University of Mexico (UNAM), Mexico City, Mexico, 2003, pp. 373–380.
[15] T. A. Davis, A column pre-ordering strategy for the unsymmetric-pattern multifrontal method, ACM Trans. Math. Software, 30 (2004), pp. 165–195.
[16] H. De Sterck, U. M. Yang, and J. J. Heys, Reducing complexity in parallel algebraic multigrid preconditioners, SIAM J. Matrix Anal. Appl., 27 (2006), pp. 1019–1039.
[17] S. Deparis, Numerical Analysis of Axisymmetric Flows and Methods for Fluid-Structure Interaction Arising in Blood Flow Simulation, Ph.D. thesis, EPFL, Lausanne, Switzerland, 2004.
[18] S. Deparis, M. Discacciati, G. Fourestey, and A. Quarteroni, Heterogeneous domain decomposition methods for fluid-structure interaction problems, in Domain Decomposition Methods in Science and Engineering XVI, Lect. Notes Comput. Sci. Eng. 55, Springer, Berlin, Heidelberg, 2007, pp. 41–52.
[19] S. Deparis, M. Discacciati, G. Fourestey, and A. Quarteroni, Fluid-structure algorithms based on Steklov-Poincaré operators, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 5797–5812.
[20] C. R. Dohrmann, A preconditioner for substructuring based on constrained energy minimization, SIAM J. Sci. Comput., 25 (2003), pp. 246–258.
[21] M. Dryja and W. Hackbusch, On the nonlinear domain decomposition method, BIT, 37 (1997), pp. 296–311.
[22] R. D. Falgout, J. E. Jones, and U. M. Yang, The design and implementation of hypre, a library of parallel high performance preconditioners, in Numerical Solution of Partial Differential Equations on Parallel Computers, A. M. Bruaset, P. Bjorstad, and A. Tveito, eds., Lect. Notes Comput. Sci. Eng. 51, Springer-Verlag, Berlin, 2006, pp. 267–294.
[23] C. Farhat, M. Lesoinne, P. LeTallec, K. Pierson, and D. Rixen, FETI-DP: A dual-primal unified FETI method - part I: A faster alternative to the two-level FETI method, Internat. J. Numer. Methods Engrg., 50 (2001), pp. 1523–1544.
[24] C. Farhat, M. Lesoinne, and K. Pierson, A scalable dual-primal domain decomposition method, Numer. Linear Algebra Appl., 7 (2000), pp. 687–714.
[25] M. Á. Fernández, J.-F. Gerbeau, A. Gloria, and M. Vidrascu, Domain decomposition based Newton methods for fluid-structure interaction problems, in CANUM 2006—Congrès National d'Analyse Numérique, ESAIM Proc. 22, EDP Sci., Les Ulis, 2008, pp. 67–82.
[26] B. Ganis, K. Kumar, G. Pencheva, M. F. Wheeler, and I. Yotov, A global Jacobian method for mortar discretizations of a fully implicit two-phase flow model, Multiscale Model. Simul., 12 (2014), pp. 1401–1423.
[27] B. Ganis, G. Pencheva, M. F. Wheeler, T. Wildey, and I. Yotov, A frozen Jacobian multiscale mortar preconditioner for nonlinear interface operators, Multiscale Model. Simul., 10 (2012), pp. 853–873.
[28] A. Greenbaum, Iterative Methods for Solving Linear Systems, Frontiers Appl. Math. 17, SIAM, Philadelphia, 1997.
[29] C. Groß and R. Krause, A Generalized Recursive Trust-Region Approach—Nonlinear Multiplicatively Preconditioned Trust-Region Methods and Applications, Technical Report 2010-09, Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland, 2010.
[30] C. Groß and R. Krause, On the Globalization of ASPIN Employing Trust-Region Control Strategies—Convergence Analysis and Numerical Examples, Technical Report 2011-03, Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland, 2011.
[31] V. E. Henson and U. M. Yang, BoomerAMG: A parallel algebraic multigrid solver and preconditioner, Appl. Numer. Math., 41 (2002), pp. 155–177.
[32] G. A. Holzapfel, Nonlinear Solid Mechanics. A Continuum Approach for Engineering, John Wiley and Sons, Chichester, UK, 2000.
[33] F.-N. Hwang and X.-C. Cai, Improving robustness and parallel scalability of Newton method through nonlinear preconditioning, in Domain Decomposition Methods in Science and Engineering, Lect. Notes Comput. Sci. Eng. 40, Springer, Berlin, 2005, pp. 201–208.
[34] F.-N. Hwang and X.-C. Cai, A class of parallel two-level nonlinear Schwarz preconditioned inexact Newton algorithms, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 1603–1611.

[35] O. Ippisch, M. Blatt, J. Fahlke, and F. Heimann, MuPhi—Simulation of Flow and Transport in Porous Media, http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/muPhi/node.html (2014).
[36] P. Jolivet, F. Hecht, F. Nataf, and C. Prud'homme, Scalable domain decomposition preconditioners, in SuperComputing SC13, Denver, CO, 2013.
[37] A. Klawonn, M. Lanser, P. Radtke, and O. Rheinbach, On an adaptive coarse space and on nonlinear domain decomposition, in Domain Decomposition Methods in Science and Engineering XXI, Lect. Notes Comput. Sci. Eng. 98, J. Erhel, M. J. Gander, L. Halpern, G. Pichot, T. Sassi, and O. B. Widlund, eds., Springer-Verlag, Berlin, 2014, pp. 71–83.
[38] A. Klawonn, M. Lanser, and O. Rheinbach, Nonlinear FETI-DP and BDDC methods, SIAM J. Sci. Comput., 36 (2014), pp. A737–A765.
[39] A. Klawonn, M. Lanser, and O. Rheinbach, FE2TI (ex nl/fe2) EXASTEEL—Bridging Scales for Multiphase Steels, http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/FE2TI/node.html (2015).
[40] A. Klawonn, M. Lanser, and O. Rheinbach, A nonlinear FETI-DP method with an inexact coarse problem, in Domain Decomposition Methods in Science and Engineering XXII, R. Krause, M. J. Gander, Th. Dickopf, L. F. Pavarino, and L. Halpern, eds., Lect. Notes Comput. Sci. Eng., Springer-Verlag, Berlin, to appear.
[41] A. Klawonn, M. Lanser, O. Rheinbach, H. Stengel, and G. Wellein, Hybrid MPI/OpenMP parallelization in FETI-DP methods, in Proceedings of the Conference on Recent Trends in Computational Engineering (CE2014), Lect. Notes Comput. Sci. Eng. 105, Springer-Verlag, Berlin, 2015, pp. 67–84.
[42] A. Klawonn and O. Rheinbach, A parallel implementation of dual-primal FETI methods for three-dimensional linear elasticity using a transformation of basis, SIAM J. Sci. Comput., 28 (2006), pp. 1886–1906.
[43] A. Klawonn and O. Rheinbach, Inexact FETI-DP methods, Internat. J. Numer. Methods Engrg., 69 (2007), pp. 284–307.
[44] A. Klawonn and O. Rheinbach, Robust FETI-DP methods for heterogeneous three dimensional elasticity problems, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 1400–1414.
[45] A. Klawonn and O. Rheinbach, A hybrid approach to 3-level FETI, PAMM Proc. Appl. Math. Mech., 8 (2008), pp. 10841–10843.
[46] A. Klawonn and O. Rheinbach, Highly scalable parallel domain decomposition methods with an application to biomechanics, ZAMM Z. Angew. Math. Mech., 90 (2010), pp. 5–32.
[47] A. Klawonn and O. B. Widlund, Dual-primal FETI methods for linear elasticity, Comm. Pure Appl. Math., 59 (2006), pp. 1523–1572.
[48] A. Klawonn, O. B. Widlund, and M. Dryja, Dual-primal FETI methods for three-dimensional elliptic problems with heterogeneous coefficients, SIAM J. Numer. Anal., 40 (2002), pp. 159–179.
[49] J. Li and O. B. Widlund, FETI-DP, BDDC, and block Cholesky methods, Internat. J. Numer. Methods Engrg., 66 (2006), pp. 250–271.
[50] J. Mandel and C. R. Dohrmann, Convergence of a balancing domain decomposition by constraints and energy minimization, Numer. Linear Algebra Appl., 10 (2003), pp. 639–659.
[51] J. Mandel, C. R. Dohrmann, and R. Tezaur, An algebraic theory for primal and dual substructuring methods by constraints, Appl. Numer. Math., 54 (2005), pp. 167–193.
[52] J. Mandel, B. Sousedík, and C. R. Dohrmann, On multilevel BDDC, in Domain Decomposition Methods in Science and Engineering XVII, U. Langer, M. Discacciati, D. E. Keyes, O. B. Widlund, and W. Zulehner, eds., Lect. Notes Comput. Sci. Eng. 60, Springer, Berlin, Heidelberg, 2008, pp. 287–294.
[53] J. Pebrel, C. Rey, and P. Gosselet, A nonlinear dual-domain decomposition method: Application to structural problems with damage, Internat. J. Multiscale Comp. Eng., 6 (2008), pp. 251–262.
[54] O. Rheinbach, Parallel iterative substructuring in structural mechanics, Arch. Comput. Methods Eng., 16 (2009), pp. 425–463.
[55] U. Rüde, Terra-neo—Integrated Co-design of an Exa-scale Earth Mantle Modeling Framework, http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/Terra-Neo/node.html (2014).
[56] B. F. Smith, P. E. Bjørstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, Cambridge, UK, 1996.

[57] B. Sousedík and J. Mandel, On adaptive-multilevel BDDC, in Domain Decomposition Methods in Science and Engineering XIX, Lect. Notes Comput. Sci. Eng. 78, Springer, Heidelberg, 2011, pp. 39–50.
[58] A. Toselli and O. Widlund, Domain Decomposition Methods—Algorithms and Theory, Springer Ser. Comput. Math. 34, Springer, Berlin, 2004.
[59] U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, Academic Press, London, San Diego, 2001.
[60] X. Tu, Three-level BDDC in three dimensions, SIAM J. Sci. Comput., 29 (2007), pp. 1759–1780.
[61] X. Tu, Three-level BDDC in two dimensions, Internat. J. Numer. Methods Engrg., 69 (2007), pp. 33–59.
[62] O. C. Zienkiewicz and R. L. Taylor, The Finite Element Method for Solid and Structural Mechanics, Elsevier, Oxford, UK, 2005.