Numerical libraries solving large-scale problems developed at IT4Innovations Research Programme Supercomputing for Industry☆

Perspectives in Science (2016) 7, 140–150
Available online at www.sciencedirect.com (ScienceDirect)
Journal homepage: www.elsevier.com/pisc

Michal Merta a,b,∗, Jan Zapletal a,b, Tomas Brzobohaty a, Alexandros Markopoulos a, Lubomir Riha a, Martin Cermak a, Vaclav Hapla a,b, David Horak a,b, Lukas Pospisil a,b, Alena Vasatova a,b

a IT4Innovations National Supercomputing Center, 17. listopadu 15/2172, 708 00 Ostrava, Czech Republic
b Department of Applied Mathematics, VSB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava, Czech Republic

Received 26 October 2015; accepted 11 November 2015
Available online 15 December 2015

KEYWORDS
FETI; TFETI; BEM; Domain decomposition; Quadratic programming; HPC

Summary
The team of the Research Programme Supercomputing for Industry at the IT4Innovations National Supercomputing Center focuses on the development of highly scalable algorithms for the solution of linear and non-linear problems arising from different engineering applications. Domain decomposition methods (DDM) of the FETI type are used as the main parallelisation technique. These methods are combined with finite element (FEM) or boundary element (BEM) discretisation methods and quadratic programming (QP) algorithms. All these algorithms were implemented in our in-house software packages BEM4I, ESPRESO and PERMON, which demonstrate high scalability up to tens of thousands of cores.

© 2015 Published by Elsevier GmbH. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

☆ This article is part of a special issue entitled "Proceedings of the 1st Czech-China Scientific Conference 2015".
∗ Corresponding author at: IT4Innovations National Supercomputing Center, 17. listopadu 15/2172, 708 33 Ostrava, Czech Republic. E-mail address: [email protected] (M. Merta).
http://dx.doi.org/10.1016/j.pisc.2015.11.023
2213-0209

Introduction

The high performance of contemporary computers results from an increasing number of compute nodes in clusters and an increasing number of processor cores per node. While the most powerful current petascale or multi-petascale computers contain hundreds of thousands of CPU cores, the future exascale systems will comprise millions of them. To use such systems efficiently, algorithms with high parallel scalability have to be developed.

Discretisation of most engineering problems describable by partial differential equations (PDE) leads to large sparse linear systems of equations. However, problems that can be expressed as elliptic variational inequalities, such as those describing the equilibrium of elastic bodies in mutual contact, lead to quadratic programming (QP) problems.

Finite element tearing and interconnecting (FETI) and boundary element tearing and interconnecting (BETI) (Langer and Steinbach, 2003; Of and Steinbach, 2009) methods form a successful subclass of domain decomposition methods (DDM). They belong to the non-overlapping methods and combine sparse iterative and direct solvers. FETI was first introduced by Farhat and Roux (Farhat and Roux, 1991, 1992). The key ingredient of the FETI method is the decomposition of the spatial domain into non-overlapping subdomains that are "glued together" by Lagrange multipliers. Elimination of the primal variables reduces the original linear problem to a smaller, relatively well conditioned, equality constrained QP. If the FETI procedure is applied to a contact problem (Dostál et al., 1998, 2000, 2005, 2010, 2012; Dostál and Horák, 2004), the resulting QP has additional bound constraints. FETI methods allow highly accurate computations scaling up to tens of thousands of processors.

Our team has adapted the FETI approach to contact problems and designed new variants. One of them is Total-FETI (TFETI), developed by Dostál et al. (Dostál et al., 2006, 2010; Kruis et al., 2002; Čermák et al., 2015), which uses Lagrange multipliers to enforce the Dirichlet boundary conditions. This makes the stiffness matrix kernel simpler to build, as all subdomains are floating and the associated subdomain stiffness matrices have the same kernel, obtained without any computation. Hybrid-TFETI (HTFETI) reduces the coarse problem (CP) size by aggregating the subdomains into clusters, i.e. TFETI is applied twice. The resulting QP problems can then be solved by the efficient MPRGP and SMALBE algorithms, designed again by Dostál et al. (Dostál et al., 2003; Dostál and Schöberl, 2005; Dostál, 2009), whose rate of convergence is known and given by the spectral properties of the solved system.

We develop several software packages dealing with FETI: PERMON, based on PETSc, and ESPRESO, based on Intel MKL and Cilk. The BEM4I library implements the BEM discretisation and, together with the other two packages, the BETI method.

The paper is organised as follows. After the introduction, we describe the main principles of the FETI and BETI methods. Then the particular libraries and their modules are introduced together with the highlights achieved in various areas.

Numerical methods

FETI methods

FETI-1 (Farhat and Roux, 1991, 1992; Farhat et al., 1994; Kruis, 2006) is a non-overlapping DDM (Gosselet and Rey, 2006) based on decomposing the original spatial domain into non-overlapping subdomains. The subdomains are "glued together" by Lagrange multipliers, which have to satisfy certain equality constraints discussed later. The original FETI-1 method assumes that the boundary subdomains inherit the Dirichlet conditions from the original problem, where the conditions are embedded into the linear system arising from FEM. Physically, this means that subdomains whose interfaces intersect the Dirichlet boundary are fixed while the others are kept floating; in linear algebra terms, the corresponding subdomain stiffness matrices are non-singular and singular, respectively.

The basic idea of the Total-FETI (TFETI) method (Dostál et al., 2006, 2010; Čermák et al., 2015) is to keep all the subdomains floating and to enforce the Dirichlet boundary conditions by means of a constraint matrix and Lagrange multipliers, similarly to the gluing conditions along the subdomain interfaces. This simplifies the implementation of the stiffness matrix generalised inverse. The key point is that the kernels $R^s$ of the subdomain stiffness matrices $K^s$ are known a priori, have the same dimension and can be formed without any computation from the mesh data, so that the matrix $R$ ($\operatorname{Im} R = \operatorname{Ker} K$) also has a convenient block-diagonal layout. Furthermore, each local stiffness matrix can be regularised cheaply, and the inverse of the resulting non-singular matrix is at the same time a generalised inverse of the original singular one (Dostál et al., 2011; Brzobohatý et al., 2011).

FETI methods use the Lagrange multipliers to enforce both equality and inequality constraints (gluing and non-penetration conditions) in the original primal problem

$$\min \frac{1}{2} u^T K u - u^T f \quad \text{s.t.} \quad B_E u = o \ \text{ and } \ B_I u \le c_I.$$

The primal problem is then transformed using duality into a significantly smaller and better conditioned dual problem with an equality constraint and a non-negativity bound

$$\min \frac{1}{2} \lambda^T F \lambda - \lambda^T d \quad \text{s.t.} \quad G\lambda = e, \ \lambda_I \ge o$$

with

$$F = B K^+ B^T, \quad G = R^T B^T, \quad d = B K^+ f, \quad e = R^T f.$$

After homogenisation using the particular solution $\tilde\lambda = G^T (G G^T)^{-1} e$, with $\lambda = \hat\lambda + \tilde\lambda$, $\hat\lambda \in \operatorname{Ker} G$, $\tilde\lambda \in \operatorname{Im} G^T$, and after enforcing the homogenised equality constraint by means of the projector $P = I - Q$ onto $\operatorname{Ker} G$, where $Q = G^T (G G^T)^{-1} G$ is the projector onto $\operatorname{Im} G^T$, the SMALSE algorithm can be applied to the problem

$$\min \frac{1}{2} \hat\lambda^T P F P \hat\lambda - \hat\lambda^T P (d - F\tilde\lambda) \quad \text{s.t.} \quad G\hat\lambda = o, \ \hat\lambda_I \ge -\tilde\lambda_I.$$

For this dual problem the classical estimate of the spectral condition number is valid, i.e. $\kappa(PFP|_{\operatorname{Im} P}) \le C\,(H/h)$, with $H$ denoting the decomposition parameter and $h$ the discretisation parameter. On massively parallel computers it is natural to maximise the number of subdomains (decrease $H$), so that the subdomain stiffness matrices become smaller. This accelerates their factorisation and the subsequent application of the generalised inverse, and it also improves the conditioning and reduces the number of iterations. The negative effect is an increase of the dual and null space dimensions, which slows down the coarse problem (CP) solution, i.e. the solution of the system $G G^T x = y$, so that the application of the projector becomes the bottleneck of the TFETI method and dominates the solution time.

Hybrid FETI method

Although there are several efficient coarse problem parallelisation strategies (Hapla and Horák, 2012; Kozubek et al., 2012, 2013), the size of the coarse problem is still limited. Several hybrid (multilevel) methods were therefore proposed (Lee, 2009; Klawonn and Rheinbach, 2010). The key idea is to aggregate a small number of neighbouring subdomains into clusters (see Fig. 1), which naturally results in a smaller coarse problem. In our HTFETI, the aggregation of subdomains into clusters is again enforced by Lagrange multipliers. The TFETI method is thus used on both the cluster and the subdomain levels. This approach simplifies the implementation of hybrid FETI methods and, owing to its lower memory requirements, makes it possible to extend the parallelisation of the original problem up to tens of thousands of cores.

[...]

$$(D\gamma^0 u)(x) = \frac{1}{2}\gamma^1 u(x) - (K^* \gamma^1 u)(x) \quad \text{for } x \in \partial\Omega \qquad (1)$$

with $V$, $K$, $K^*$, and $D$ denoting the single-layer, double-layer, adjoint double-layer, and hypersingular boundary integral operators, respectively. The Galerkin discretisation of the single-layer operator equation (1) leads to the system of linear equations

$$V t = \left(\frac{1}{2} M + K\right) u$$

with the boundary element matrices

$$V[k,\ell] := \int_{\tau_k} \int_{\tau_\ell} v(x,y) \, ds_y \, ds_x, \qquad K[k,j] := \int_{\tau_k} \int_{\partial\Omega} \frac{\partial v}{\partial n_y}(x,y)\, \varphi_j(y) \, ds_y \, ds_x$$

and the sparse identity matrix $M$.
