Arch Computat Methods Eng (2015) 22:557–576 DOI 10.1007/s11831-014-9126-8

ORIGINAL PAPER

Alya: Computational Solid Mechanics for Supercomputers

E. Casoni · A. Jérusalem · C. Samaniego · B. Eguzkitza · P. Lafortune · D. D. Tjahjanto · X. Sáez · G. Houzeaux · M. Vázquez

Received: 17 July 2014 / Accepted: 22 July 2014 / Published online: 13 August 2014 © CIMNE, Barcelona, Spain 2014

Abstract  While solid mechanics codes are now conventional tools both in industry and research, the increasingly more exigent requirements of both sectors are fuelling the need for more computational power and more advanced algorithms. For obvious reasons, commercial codes are lagging behind academic codes, often dedicated either to the implementation of one new technique or to the upscaling of current conventional codes to tackle massively large scale computational problems. Only in a few cases have both approaches been followed simultaneously. In this article, a solid mechanics simulation strategy for parallel supercomputers based on a hybrid approach is presented. Hybrid parallelization exploits the thread-level parallelism of multicore architectures, combining MPI tasks with OpenMP threads. This paper describes the proposed strategy, programmed in Alya, a parallel multi-physics code. Hybrid parallelization is specially well suited for the current trend of supercomputers, namely large clusters of multicores. The strategy is assessed through transient non-linear solid mechanics problems, both for explicit and implicit schemes, running on thousands of cores. In order to demonstrate the flexibility of the proposed strategy under advanced algorithmic evolution of computational mechanics, a non-local parallel overset meshes method (Chimera-like) is implemented and the conservation of the scalability is demonstrated.

Keywords  Computational mechanics · Finite element method · Chimera

E. Casoni (B) · C. Samaniego · B. Eguzkitza · X. Sáez · G. Houzeaux · M. Vázquez
Barcelona Supercomputing Center (BSC-CNS), Edificio NEXUS I, Campus Nord UPC, Gran Capitán 2–4, 08034 Barcelona, Spain
e-mail: [email protected]

M. Vázquez
Artificial Intelligence Research Institute CSIC (IIIA-CSIC), Campus de la UAB, 08193 Bellaterra, Spain
e-mail: [email protected]

A. Jérusalem · D. D. Tjahjanto
IMDEA Materials Institute, Tecnogetafe, C/ Eric Kandel 2, 28906 Getafe, Spain

D. D. Tjahjanto
Department of Solid Mechanics, KTH Royal Institute of Technology, Teknikringen 8D, 10044 Stockholm, Sweden

P. Lafortune
Idra Simulation S.C.P., 08004 Barcelona, Spain

A. Jérusalem
Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
e-mail: [email protected]

1 Introduction

With the increasing need for more advanced modeling techniques involving non-local approaches or multiphysics interactions, finite element (FE) solid mechanics code requirements have traditionally slowed down the parallel efforts aimed at increasing the computational scalability of such codes. One of the direct consequences of this is that most commercial codes cannot efficiently scale on parallel computers when more than hundreds of cores are used [5,24]. Academic FE codes, on the other hand, have often relied on the need to develop one unique technique of interest, potentially followed by a secondary development phase aimed at scaling it up because of the prohibitive cost of the technique. As a direct consequence, researchers have often focused on the scalability of one technique independently of the other ones; see for example the Arbitrary Lagrangian-Eulerian (ALE) methods [51], Discontinuous Galerkin (DG) methods [21,71], fluid–structure interaction (FSI) [27,51,67] or even expensive constitutive models (and their also expensive meshing requirements) such as crystal plasticity [70]. It is also noticeable that the range of fields of application for these references is extremely wide, ranging from bio-medical, military, seismic, or fracture mechanics to polycrystalline texture analysis.

1.1 Fluid Mechanics Versus Solid Mechanics

Fluid mechanics computational codes have generally been exploiting advanced techniques conjointly with parallel efficiency (see for instance OpenFOAM [9], Alya [1] or CODE_SATURNE [4]). This can be explained by the fact that fluid problems have traditionally required larger meshes than Solid Mechanics problems, but also by the fact that model requirements for solids may hinder the parallelization itself, for instance, in fracture mechanics. In that sense, the development of parallel Solid Mechanics codes requires a more complex structure in order to not only solve the partial differential equation (PDE) involved, but also to add the complex features typical of computational Solid Mechanics: e.g., non-linear behavior of materials, non-locality of the model in fracture mechanics or continuum damage theory, non-linear (and unphysical) forces in contact mechanics, multi-scale models needed in fracture and cracks, etc.

1.2 FE Solid Mechanics Codes

Designing computational models of complex material deformation mechanisms such as fracture, phase change or impact requires a combination of new numerical methods and improved computing power. Beyond the advanced numerical techniques, an increased computational power is needed to resolve the many length and time scales of these problems. Most of the modern FE codes can parallelize over distributed memory clusters to allow for solution on larger meshes or to reduce computation time. Sandia Labs has developed the SIERRA framework [77], which provides a foundation for several computational mechanics codes, for instance the highly scalable software Salinas [23] or the open source code CODE_ASTER [3]. Other open source software such as Gmsh [36] or FEniCS [61] also offer PETSc [19] or Epetra [45]. Recently, however, shared memory multicore technology such as graphics processing units (GPUs) [55] has gained prominence due to its arithmetic ability and high power efficiency, for instance in the FEAST software package [41].

Large scale Solid Mechanics FE computation became a field of research on its own as early as the 80's [37]. However, while preconditioner optimization studies for large scale computing were tackled shortly after [49], and barring a few exceptions (e.g., Ref. [35] focusing on XFEM), they have almost always been focused on conventional FE codes (and often on one unique application) without consideration of all the previously listed evolutions [17,33,60,79]. In the particular case of FE for Solid Mechanics, many computer codes were written to solve one specific application and are not flexible enough to adapt to the newest schemes and technological improvements. These limitations of code structure make it difficult to integrate the latest research that is required to solve ever larger and more complicated problems. Instead, the rapid advances in parallel computation require the code to be developed from the beginning for any large scale application, and then enhanced with advanced techniques following the same large scale framework imposed by the code structure.

1.3 System of Equations

The cornerstone of any FE method involves the assembly and solution of a linear or non-linear system of equations. In an implicit scheme, this solution process requires solving a linear system of equations, which usually has a dimension of millions and even billions of degrees of freedom. The solution of large sparse linear systems of equations is the main concern in any parallel code [43]. The algorithms to solve these systems are basically grouped into two categories: direct methods and iterative methods. For decades, research has mainly focused on taking advantage of sparsity to design direct methods that can be economical. However, a substantial portion of the total computational work and storage required to solve stiff initial value problems is devoted to solving these linear algebraic systems, particularly if the system is large (see for instance the study in Ref. [38]). Over the last decade, several efficient iterative methods have been developed to solve large sparse (non-symmetric) systems of linear algebraic equations that involve a large number of variables (sometimes of the order of millions) [72]. In these cases, direct methods would be prohibitively expensive even with the best available computing power and iterative methods appear as the only rational alternative.

1.4 Algebraic Solvers

An algebraic system Ax = b can be approached by two categories of methods. Krylov subspace methods tackle the problem directly by solving for x. Some examples are the conjugate gradient (CG) method for the symmetric cases, and the generalized minimal residual (GMRes) and biconjugate gradient stabilized (BiCGSTAB) methods for the non-symmetric cases [72]. The other family of methods are the primal and dual Schur complement methods, which solve for the interface unknowns, namely the restriction of x on the subdomain interfaces and its dual (e.g. the FETI method [32,73]), respectively. For this last class of methods, Krylov solvers can then be used to solve the interface problem. In both approaches, a preconditioner is also required to enhance the convergence. Examples of classical preconditioners are the Gauss-Seidel, block diagonal, Deflated [62] or Multigrid [13,57,66] preconditioners. The preconditioner can also be based on domain decomposition (DD) methods. Some examples are Schwarz, Restricted Additive Schwarz (RAS), Block LU and Incomplete Block LU, where the support of the preconditioners coincides with the subdomains of the mesh partition.

Some definitions are now introduced to assess the efficiency of a solver plus a preconditioner in a parallel environment. Let np and dof be the number of parallel processes and the number of degrees of freedom of the algebraic system, respectively. If t_np(dof) is the time to achieve the convergence criterion of the solver, the efficiency of the strong scalability is defined as

    Efficiency of strong scalability = t_1(dof) / (np × t_np(dof)) .

Thus, a solver with a unit efficiency enables one to achieve convergence np times faster by using np processes, i.e. t_np(dof) = t_1(dof)/np. Krylov solvers with classical preconditioners can be made strongly scalable, as long as dof is sufficiently large on each process, in order to keep a high work versus communication ratio. However, if one wishes to increase the global number of degrees of freedom by refining the mesh, the strong scalability does not provide any information about the computing time. Weak scalability means that the time budget does not increase if one maintains a fixed number of dof per processor. In this case, perfect efficiency is achieved if the run time stays constant while the workload (dof) is increased in direct proportion to the number of processors:

    Efficiency of weak scalability = t_1(dof) / t_np(np × dof) .

This criterion is useful to estimate the CPU time while refining the mesh and to answer the following question: if one refines the mesh by a given factor, how many more CPUs should one use to converge in the same computing time? Even though it is relatively easy to maintain the number of iterations of the solver independently of dof, it is much harder to maintain the computing time. This is due to the fact that complex preconditioners must be used, in addition to some coarse space solver to propagate information across the CPUs. These coarse space solvers are usually the bottleneck of the weak scaling [18,39].

It must be emphasized that some solvers present the same convergence independently of np. This is the case, for example, of Krylov methods together with a diagonal preconditioner. Weakly scalable solvers do not, as the mesh partition is the base for the construction of the local solvers. Therefore, they do not enable reproducibility. This means that if a simulation is carried out on 1,024 CPUs on one day, different convergence or even different results may be obtained on another day using 256 CPUs. This is not a trivial remark, as reproducibility is a very relevant aspect in industry.

For some applications leading to very stiff systems, direct solvers could even be required. Direct methods appear as a feasible solution for solving linear systems that are ill-conditioned. Actually, they are commonly used in implicit FE codes for structural dynamic problems. These methods were initially discarded due to their high storage memory requirements. In the past, the limitations of CPU and memory have prevented the use of these methods, but nowadays, with the increasing advances in computing power, new trends in direct solvers are being investigated. These are either multifrontal [16] or supernodal techniques [34]. Multifrontal techniques are basically the multifrontal massively parallel solver (MUMPS) [28] and the Watson package (WSMP) [40]. SuperLU_DIST [58] is an MPI parallel version of the SuperLU family of solvers for unsymmetric systems based on a supernodal right LU factorization. A list of the available software for solving sparse linear systems via direct methods is presented in Ref. [12]. Performance results of these iterative solvers are compared against a direct solver routine of the commercial Harwell Software Library [53].

Many studies on the behavior of iterative solvers in classical Solid Mechanics problems have been carried out. For instance, a comparison was made between preconditioned conjugate gradients (PCG) and Multigrid methods (MG), applying them to a series of test problems of plane elasticity [52]. Different iterative solvers for large scale linear algebraic systems for 3D elasticity are compared in Ref. [30]. Solving nonlinear Solid Mechanics problems with a Newton-type method could be problematic if the determination, storage or solution cost associated with the Jacobian is high. The Jacobian-Free Newton–Krylov methods [54], initially developed for CFD problems, have been applied to non-linear computational Solid Mechanics in Ref. [42]. Multigrid methods, either as iterative methods or as preconditioners, have been studied in many application areas: for the Helmholtz equation when analyzing frequency responses of a structure [14], for plasticity problems [85] or recently in fracture mechanics with the XFEM method [83]. In contrast, direct solvers are not as popular as iterative solvers. MUMPS has been used for simulations of linear elasticity coupled with acoustics in [69]. SuperLU was used to solve FSI problems with large displacements in Ref. [44]. As a conclusion, despite some recent efforts, more focused research is needed to analyze the solvers' behavior and requirements in complex structures or in the recent techniques for the specific features of Solid Mechanics problems, such as fracture mechanics.
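As a side remark, the two efficiency definitions above translate directly into code. The following minimal C helper is purely illustrative (the timing values in the example are hypothetical, not measured data) and simply evaluates both metrics from solver run times:

    #include <stdio.h>

    /* Strong scaling: same problem size, np processes compared to 1 process. */
    static double strong_efficiency(double t1, double tnp, int np) {
        return t1 / ((double)np * tnp);
    }

    /* Weak scaling: problem size grown in proportion to np. */
    static double weak_efficiency(double t1, double tnp_scaled) {
        return t1 / tnp_scaled;
    }

    int main(void) {
        /* Hypothetical timings in seconds, for illustration only. */
        double t1 = 1000.0, t64_strong = 17.0, t64_weak = 1050.0;
        printf("strong efficiency on 64 processes: %.2f\n",
               strong_efficiency(t1, t64_strong, 64));
        printf("weak efficiency on 64 processes:   %.2f\n",
               weak_efficiency(t1, t64_weak));
        return 0;
    }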

1.5 Parallel Context

High performance computational mechanics codes exploit one or more levels of parallelism offered by modern supercomputers. A general supercomputer architecture is depicted in Fig. 1, together with some common programming languages. At the coarse level, the architecture consists of nodes, in a distributed memory environment. At the medium level, each node is composed of a given number of cores, in a shared memory environment. These nodes can, in turn, use accelerators (e.g. GPUs or Xeon Phi) as a fine level of parallelism [68].

MPI is the most commonly used message passing library on distributed memory architectures, while OpenMP [10] is the preferred one for shared memory architectures, i.e. for parallelizing the computation at the core level. Accelerators can be programmed using programming languages like OpenCL [8], OpenACC [7] or CUDA [63]. Very few unstructured mesh computational mechanics codes have been ported to accelerators [2,26,46]. There are three main reasons for this: the huge cost of porting large codes; the difficulty in mapping codes with irregular memory access to the vector architecture (SIMD) of accelerators; and the lack of a standard language [25]. One standard candidate is the new version of OpenMP, namely OpenMP4 (see new advances in [10]). On the other hand, Intel's solution for accelerators, the Xeon Phi, presents very appealing features for unstructured meshes due to its very low programming effort together with its computing power. See Ref. [82] for a more complete assessment of Solid Mechanics problems.

Fig. 1 General architecture of a supercomputer: nodes, cores and accelerators, and associated common programming languages

1.6 Libraries

Many different software packages are available to solve a linear system of equations. Most modern solid mechanics codes include an abstract interface to a LinearSolver class that operates on a sparse Matrix and a Vector for the right-hand side. Inherited from this LinearSolver base class are different solvers that can interface to packages. At the top of these packages are PETSc [20], which also supports MPI and GPU architectures, and Hypre [31], which also provides Multigrid preconditioning in a parallel MPI environment. Moreover, MUMPS methods are also included in a direct solver library developed in Ref. [16], which is supported by MPI. The WSMP software package [40] is also developed using a hybrid implementation with MPI and P-threads, among others. Abstracting the solver interface in this manner allows the application writer to choose the most effective solver for the particular problem.

1.7 Alya

Alya [1] has been conceived as a Computational Mechanics code developed at Barcelona Supercomputing Center (BSC–CNS) and aimed at solving PDEs on unstructured meshes. Alya exploits the similarities of the PDE-governed problems to solve, with high parallel performance, compressible and incompressible flows, thermal flows, excitable media or quantum mechanics for transient molecular dynamics [29,47,48,81], while running on thousands of cores. Parallelization is hidden behind a common solver that assembles matrices and residuals and carries out the solution scheme. The scalability of the code thus exclusively depends on one unique set of parallel communication subroutines, independently of the physics of the problem. Additionally, Alya is specially well-suited for the parallel solution of coupled multi-physics problems.

In the present work, the Solid Mechanics module of Alya is introduced. Using the large scale PDE solver capability of Alya's kernel, the solid module was implemented so that any future development of more complex FE techniques should conserve the high scalability of the code. These developments have been carried out along two parallel strategies: first, the MPI implementation at the node level, where the parallelization is based on a sub-structuring technique and uses a Master–Worker strategy [48]; and secondly, the OpenMP implementation used to treat many-core processors. The overall code is thus hybrid MPI/OpenMP. Non-symmetric problems are solved using GMRes or BiCGSTAB schemes, and symmetric ones using CG methods [72]. There are several available preconditioners: diagonal, Deflated, RAS, block LU, etc.

The Alya framework is specially well-suited to solve coupled problems. A Chimera overset meshes strategy is a particular way of coupling problems: two or more problems are solved on different meshes which are connected using their respective geometrical information. This connection scheme results in the fusion of all the problems into one single matrix. The problem can then be solved using the same parallelization scheme proposed earlier [29]. This scheme was thus implemented in Alya as an illustration of its large scale parallelization strategy, where the code structure naturally lends itself to complex non-local approaches without loss of scalability performance.

The article is structured as follows. Section 2 introduces the governing equations and briefly describes the numerical method. Section 3 describes the structure of Alya, with special attention on the parallel strategy, and the solver in Sect. 4. The parallelization and hybrid strategies are formally presented in Sect. 5 and the optimal scalability of the code is also shown. Section 6 presents some problems of application where the Alya modules have been used with good performance. The paper ends with some conclusions and future lines.

2 Numerical Scheme

The computational solid mechanics problem is solved using a standard Galerkin method in a large deformation framework with a generalized Newmark time integration scheme. This framework, developed in a total Lagrangian formulation, is only briefly summarized below; for more details, see Ref. [59].

2.1 Standard Galerkin Governing Equations

Let ϕ : R³ → R³ be the function that maps a material point of an undeformed body X ∈ B_0 in the reference configuration to its point x = ϕ(X) ∈ B in the current (or deformed) configuration. The deformation gradient tensor F is defined as F := ∇_0 x, where ∇_0 is the gradient operator with respect to the reference configuration. In a cartesian basis, the components of F are given by F_iJ = ∂x_i/∂X_J. Index i in vector x and J in vector X are written in lower and upper case to indicate that the component is referred to the current and reference configurations, respectively. Since x(X) = X + u(X), where u is the displacement vector, the deformation gradient can be given by F = I + ∇_0 u, with ∇_0 u the displacement gradient.

The equation of balance of momentum with respect to the reference configuration can be written as

    Div P + b_0 = ρ_0 ü ,   ∀X ∈ B_0 ,   (1)

where ρ_0 is the mass density (with respect to the reference volume) and Div is the divergence operator with respect to the reference configuration, with Div P = ∇_0 · P. Tensor P and vector b_0 stand for the first Piola–Kirchhoff stress and the distributed body force on the undeformed body, respectively. Prescribed displacements and tractions are applied at the reference boundary Γ_0 = Γ_d0 ∪ Γ_n0, where Γ_d0 and Γ_n0 correspond to the Dirichlet and Neumann boundary conditions, respectively:

    u = ū ,   ∀X ∈ Γ_d0 ,   (2)
    P · N_0 = t̄_0 ,   ∀X ∈ Γ_n0 ,   (3)

where N_0 is the normal to the boundary in the reference configuration.

As usual in finite element methods [86], the weak form of the balance of momentum (1) can be formulated for any arbitrary admissible virtual displacement w, such that

    ∫_B0 P · ∇_0 w dV + ∫_B0 ρ_0 ü · w dV = ∫_B0 b_0 · w dV + ∫_Γn0 w · t̄_0 dΓ .   (4)

For a FE approximation Ω_0 = ⋃_e Ω_0^e of the undeformed continuum body B_0, let u_h be a polynomial approximation of degree k to the actual displacement u,

    u_h(X) = N(X) u_h ,   (5)

where u_h denotes the time-dependent nodal displacements and N the three-dimensional matrix of shape functions. Then, the discrete form of (4) consists in solving for the array of nodal displacements u_h:

    M ü_h + f_int = f_ext ,   (6)

where M, f_int(u_h) and f_ext are, respectively, the mass matrix and the vectors of internal and external forces. For more references, see [59].

The generalized Newmark formulation used here for Eq. (6) can be written as [59]

    M ü_h^{n+1} + f_int^{n+1} = f_ext^{n+1} ,   (7)
    u_h^{n+1} = u_h^n + Δt u̇_h^n + (Δt)² [ (1/2 − β) ü_h^n + β ü_h^{n+1} ] ,   (8)
    u̇_h^{n+1} = u̇_h^n + Δt [ (1 − γ) ü_h^n + γ ü_h^{n+1} ] ,   (9)

where superscripts "n" and "n+1" indicate that the variables are evaluated at times t^n and t^{n+1}, respectively. Parameters β and γ set the characteristics of the Newmark scheme. The value β = 0 leads to an explicit Newmark scheme, whereas values 0 < β ≤ 0.5 lead to an implicit scheme. In the latter case, the set of equations (7)–(9) is solved for the unknown displacement u_h^{n+1}, velocity u̇_h^{n+1} and acceleration ü_h^{n+1} using the iterative Newton-Raphson algorithm.

3 Numerical Implementation

The Alya system is a computational mechanics code developed with two main motivations. First, it was designed to run with high efficiency standards in large scale supercomputing facilities. Secondly, various physical problems should be allowed to be solved individually or in a coupled manner, while conserving exceptional scalability and retaining their individual efficiency.

Alya's architecture is modular, grouping the different tasks into kernel, modules and services. The kernel, the core of Alya, contains all the facilities required to solve any set of discretized PDEs (e.g., the solver, the I/O, the coupling, the elements database, the geometrical information, etc.).

The physical description of a given problem is provided by its corresponding module (e.g., the discretized terms of the PDE, the meaning of the boundary and initial conditions, etc.). Other Alya modules than the one presented here include incompressible or compressible flow, thermal transport, or the N-body problem, among others. Finally, the services contain the toolboxes providing several independent procedures to be called by modules and kernel (e.g., parallelization or optimization).

3.1 Alya's Solid Mechanics Module

A large database of element types and integration schemes of different orders is available from previous work in the other modules. Well known constitutive equations for large deformation elasticity, such as the neo-Hookean or specific hyperelastic models, were also implemented.

Figure 2 shows a flowchart of the solidz module. All the geometrical and physical data of the problem are introduced as input files. Once read, Alya initializes the computation within the solidz module, either in serial or parallel mode. The parallel service must be specified in the input files.

Fig. 2 solidz module structure in Alya; kernel elements are in blue and services in red. (Color figure online)
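Before turning to the mesh convergence study, the explicit branch of the Newmark scheme of Sect. 2 (Eqs. (7)–(9) with β = 0) can be summarized in a short code sketch. The following C fragment is a minimal serial illustration only: it assumes a lumped (diagonal) mass matrix and externally assembled force vectors, and the function names are hypothetical, not Alya's actual Fortran routines.

    #include <stddef.h>

    /* Explicit Newmark (beta = 0) with a lumped (diagonal) mass matrix.
     * Arrays of length n: u displacement, v velocity, a acceleration,
     * m lumped mass, fint/fext assembled internal/external force vectors. */

    /* Predictor, Eq. (8) with beta = 0: u^{n+1} = u^n + dt v^n + dt^2/2 a^n. */
    void newmark_predict(size_t n, double dt, double *u,
                         const double *v, const double *a)
    {
        for (size_t i = 0; i < n; ++i)
            u[i] += dt * v[i] + 0.5 * dt * dt * a[i];
    }

    /* Corrector, Eqs. (7) and (9): a^{n+1} = M^{-1} (fext - fint(u^{n+1})) and
     * v^{n+1} = v^n + dt [(1-gamma) a^n + gamma a^{n+1}].
     * fint must have been re-assembled at the predicted displacement. */
    void newmark_correct(size_t n, double dt, double gamma,
                         const double *m, const double *fint, const double *fext,
                         double *v, double *a)
    {
        for (size_t i = 0; i < n; ++i) {
            double a_new = (fext[i] - fint[i]) / m[i];
            v[i] += dt * ((1.0 - gamma) * a[i] + gamma * a_new);
            a[i]  = a_new;
        }
    }

In the implicit case (β > 0), the same predictor-corrector structure is wrapped inside the Newton-Raphson iterations mentioned above.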

3.2 Mesh Convergence

The following test aims at studying the mesh convergence of the Alya-solidz module. To this end, a target solution u^(e) with a given degree of regularity is used. By using a stationary manufactured solution belonging to the finite element space (for linear and bilinear elements), the numerical scheme should provide an exact nodalwise solution, i.e., u = u^(e), thus (a) confirming that the equation is correctly coded and (b) allowing for the proposed mesh convergence study. The analysis below uses indicial notation and the following equation, obtained from Eq. (1), is solved:

    ρ_0 ü_i − ∂P_iJ/∂X_J = − ∂P_iJ^(e)/∂X_J ,   (10)

where P_iJ^(e) = P_iJ(u^(e)). Assuming large deformations and using a linear elastic constitutive law, the last term of Eq. (10) is given by:

    ∂P_iJ^(e)/∂X_J = C_KJML [ (F_iK^(e)/2) ( ∂²u_n^(e)/(∂X_J ∂X_M) F_nL^(e) + ∂²u_n^(e)/(∂X_J ∂X_L) F_nM^(e) ) + ∂²u_i^(e)/(∂X_K ∂X_J) E_ML^(e) ] ,   (11)

where C_KJML is the fourth order elasticity tensor and F_nL^(e) (or F_iK^(e) and F_nM^(e)) and E_ML^(e) are the deformation gradient and the Green-Lagrange strain tensors, defined as:

    F_mL^(e) = ∂u_m^(e)/∂X_L + δ_mL ,
    E_ML^(e) = F_nM^(e) F_nL^(e) − δ_ML .   (12)

The 2D manufactured solution considered here is arbitrarily chosen to be:

    u_1^(e) = X_1³ X_2⁴ ,
    u_2^(e) = X_1³ X_2³ ,   (13)

and is solved on a unit square. Figure 3 shows the resulting mesh convergence results computed on a regular mesh of Q1 (quadrilateral) elements. The L²-norm of the error and of the error of the gradient, ||u||_L² and ||∇u||_L², respectively, are shown, and they are computed as follows:

    ||u||_L² = sqrt( ∫_Ω (u_h − u^(e))² dΩ / ∫_Ω (u^(e))² dΩ ) ,
    ||u||_H¹ = sqrt( ∫_Ω (∇u_h − ∇u^(e))² dΩ / ∫_Ω (∇u^(e))² dΩ ) .

The results successfully confirm a first order convergence in the derivative and a second order convergence in the L²-norm of the displacement with respect to h for the solidz module [50].
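The discrete counterpart of these error norms can be approximated element by element with a quadrature rule. The C sketch below is purely illustrative (a one-point, element-centre quadrature on Q1 quadrilaterals, with a user-supplied exact solution callback; Alya's actual error evaluation may differ):

    #include <math.h>
    #include <stddef.h>

    /* Relative L2 error of a nodal field against a manufactured solution,
     * approximated with one quadrature point (the element centre) per Q1 quad.
     * coords: 2*nnode nodal coordinates, conn: 4*nelem connectivity,
     * area: nelem element areas, uh: nnode nodal values of one component,
     * uexact: callback returning the exact value at a point (x1, x2). */
    double l2_error(size_t nelem, const double *coords, const int *conn,
                    const double *area, const double *uh,
                    double (*uexact)(double x1, double x2))
    {
        double num = 0.0, den = 0.0;
        for (size_t e = 0; e < nelem; ++e) {
            double xc = 0.0, yc = 0.0, uc = 0.0;
            for (int a = 0; a < 4; ++a) {  /* bilinear shape functions equal 1/4 at the centre */
                int n = conn[4 * e + a];
                xc += 0.25 * coords[2 * n];
                yc += 0.25 * coords[2 * n + 1];
                uc += 0.25 * uh[n];
            }
            double ue = uexact(xc, yc);
            num += area[e] * (uc - ue) * (uc - ue);
            den += area[e] * ue * ue;
        }
        return sqrt(num / den);
    }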

Fig. 3 Mesh convergence study; L²-error of the displacement and of the gradient of the displacement

4 MPI Parallelization

4.1 Parallel Context

The parallelization for distributed memory supercomputers is based on a domain decomposition technique, using a Master–Workers paradigm and MPI as the message-passing library. At the core of the parallelization outer layer lies the automatic graph partition tool METIS [6]. First, the Master reads the mesh and performs the partition of the mesh into sub-meshes, or subdomains, using METIS. Each of these subdomains is called a Worker. The Master distributes each partition among the Workers, which carry out the solution computational work. The METIS mesh partition is done by maximizing load balance and minimizing communication (see Ref. [48] for more details). As a second step, the Workers build the local element matrices and local right-hand sides, and are in charge of solving the resulting system in parallel. In the assembling tasks, no communication is needed between the Workers, and the scalability only depends on the load balancing. In the iterative solvers, the scalability depends on the size of the interfaces and on the communication scheduling.

Depending on the constitutive model, the resulting equations can be symmetric or non-symmetric. In the implicit case, the basic iterative solvers are GMRes and CG [72]. During the execution of these iterative solvers, two main types of asynchronous communications are required:

– Point-to-point communications via MPI_ISend and MPI_IRecv, which are used when sparse matrix-vector products are calculated.
– Collective communications via MPI_AllReduce, which are used to compute residual norms and scalar products.

In the current implementation of Alya, the solution obtained in parallel is, up to round-off errors, the same as the sequential one all the way through the computation. This is because the mesh partition is only used for distributing work, without in any way altering the actual sequential algorithm. This would not be the case if one considered more complex solvers, like primal or dual Schur complement solvers [64], or more complex preconditioners, like linelet [74] or block LU [65]. Since the explicit framework is relatively straightforward to implement, in the following we only focus our attention on the implicit framework.

The numerical solution of a PDE (and consequently of the solid mechanics equations) consists mainly of two steps: first, the construction of the matrix A and right-hand side (RHS) b of the algebraic system Ax = b; second, the solution of this system using an iterative solver. As far as the matrix and RHS assemblies are concerned, only part of the matrix is assembled for the interface nodes. For the iterative solvers, the basic operations are the matrix-vector and the dot products. Let us consider these two operations for the simple one-dimensional case illustrated in Fig. 4.

In the context of FE, the coefficients of the matrix come from element computations (see Sect. 2.1). The contribution of node 3 comes from subdomains 1 and 2, namely A_33^(1) and A_33^(2), respectively. Since

    y_3 = A_32 x_2 + A_33 x_3 + A_34 x_4

and by rewriting it as

    y_3 = ( A_32 x_2 + A_33^(1) x_3 ) + ( A_33^(2) x_3 + A_34 x_4 ) = y_3^(1) + y_3^(2) ,

the parallelization only consists of a residual exchange. Therefore, the idea is to use the distributive law of the multiplication to carry out the matrix-vector product in parallel. Let us introduce two functions, which will be described formally in the next subsection as PAREXC and PARASM. In the previous 1D example, the asynchronous matrix-vector product can be carried out in parallel as follows (a minimal MPI sketch is given after this list):

– Perform the local matrix-vector product of boundary nodes (node 3);
– PAREXC: Exchange the results on the interface, y_3^(1) and y_3^(2), using non-blocking MPI functions;
– Perform the local matrix-vector product of interior nodes (rows 1 and 2 for subdomain 1, rows 4 and 5 for subdomain 2);
– Synchronization: MPI_Waitall;
– PARASM: Assemble (sum) the local contributions of each subdomain: y_3 = y_3^(1) + y_3^(2).
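The exchange-and-assemble pattern of the list above requires only a handful of MPI calls. The fragment below is a simplified illustration with hypothetical data structures (neighbour lists, send/receive buffers), not Alya's actual routines; it posts the non-blocking interface exchange, leaves room to overlap it with the interior product, and finally sums the neighbour contributions:

    #include <mpi.h>
    #include <stdlib.h>

    /* Simplified interface exchange for one subdomain.
     * nneigh:     number of neighbouring subdomains
     * rank[j]:    MPI rank of neighbour j
     * nbou[j]:    number of interface nodes shared with neighbour j
     * list[j][k]: local index of the k-th shared node (same ordering on both sides)
     * y:          local result vector; send/recv are per-neighbour buffers. */
    void exchange_and_assemble(int nneigh, const int *rank, const int *nbou,
                               int *const *list, double *const *send,
                               double *const *recv, double *y, MPI_Comm comm)
    {
        MPI_Request *req = malloc((size_t)(2 * nneigh) * sizeof *req);

        /* PAREXC: pack interface values and post non-blocking sends/receives. */
        for (int j = 0; j < nneigh; ++j) {
            for (int k = 0; k < nbou[j]; ++k) send[j][k] = y[list[j][k]];
            MPI_Isend(send[j], nbou[j], MPI_DOUBLE, rank[j], 0, comm, &req[2 * j]);
            MPI_Irecv(recv[j], nbou[j], MPI_DOUBLE, rank[j], 0, comm, &req[2 * j + 1]);
        }

        /* ... the local matrix-vector product on interior nodes would be carried
         *     out here, overlapping computation with communication ... */

        MPI_Waitall(2 * nneigh, req, MPI_STATUSES_IGNORE);

        /* PARASM: add the neighbours' contributions on the shared interface nodes. */
        for (int j = 0; j < nneigh; ++j)
            for (int k = 0; k < nbou[j]; ++k)
                y[list[j][k]] += recv[j][k];

        free(req);
    }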

Fig. 4 Iterative solver basic operations: matrix-vector product and dot product

Regarding the dot product, i.e., x · y, we only need to introduce the concept of own interface node. In the current implementation, all vectors are assembled on the interface, which implies:

    x_3^(1) = x_3^(2) = x_3   and   y_3^(1) = y_3^(2) = y_3 .

If both subdomains compute their entire local dot product, the sum of the two contributions will account for x_3 y_3 twice, hence

    α = x_1 y_1 + x_2 y_2 + 2 x_3 y_3 + x_4 y_4 + x_5 y_5 ,

which provides a wrong result. In order to account only once for the interface value, all the interface nodes are split into own interface nodes and oth (for other) interface nodes. That is, an interface node has the own status in only one subdomain, although it can be shared by others, but with status oth. The local dot product thus only involves interior nodes and interface nodes of own status.

In the next section we formally define the concepts of interior and interface nodes and introduce some notations and operators, as well as the functions PAREXC and PARASM carrying out the parallel matrix-vector and dot products. The term boundary will refer to the interfaces between subdomains.

Fig. 5 Node sets

4.2 Formal Set Definitions

Before introducing the parallel operators, it is first necessary to define the basic sets, categorizing the nodes and elements of each subdomain, as well as the relations between the subdomains. Some of the definitions are illustrated in Fig. 5.

Consider a computational domain Ω ⊂ R³. Let N = {n_1, n_2, ..., n_N} and E = {e_1, e_2, ..., e_E} be the sets of all nodes and elements, respectively, of the FE mesh used to discretize the computational domain. Here, N and E denote the total number of nodes and elements of the computational mesh. Any element e ∈ E can be defined as a tuple of nodes e = (n_1^e, n_2^e, ..., n_k^e), where k is the number of nodes per element.

Two sets are defined to describe the nodal connectivity with elements and nodes of the mesh, L_ele(n) and L_nod(n), respectively. For any arbitrary node n ∈ N:

Definition (Element connectivity of n.) Let L_ele(n) denote the set of elements in E directly connected to the node n. Formally,

    L_ele(n) = { e ∈ E : n ∈ e } .   (14)

Definition (Node connectivity of n.) Let L_nod(n) denote the set of nodes in N directly connected to n. Formally,

    L_nod(n) = { m ∈ N : ∃ e ∈ L_ele(n), m ∈ e } \ {n} .   (15)

The latter set represents the nodal connectivity of the mesh, and the matrix graph as well. It contains the information required to construct the Compressed Sparse Row (CSR) format used to assemble the sparse matrix of the algebraic system.

We then consider a domain decomposition by elements (see the example of Sect. 4.1). In this work, the METIS [6] library is used. Let N^I and E^I denote the sets of nodes and elements of any arbitrary subdomain Ω_I, respectively. The total numbers of nodes and elements of the mesh can be grouped by subdomains:

    E = ⋃_{I=1}^{S} E^I ,   N = ⋃_{I=1}^{S} N^I ,

where S is the number of subdomains. As the domain decomposition is performed by elements, the nodes can be shared between subdomains (these are the boundary nodes), but the element subdomains are non-overlapping.

The interface created by the partition of the mesh divides the set of nodes of any arbitrary subdomain Ω_I into two disjoint sets. On the one hand, the set of interior nodes:

Definition (Interior nodes of Ω_I.) Let N^I_int denote the set of interior nodes of the subdomain Ω_I. Formally,

    N^I_int = N^I \ ( ⋃_{J=1, J≠I}^{S} N^J ) .

These nodes do not belong to the boundary. On the other hand, the set of boundary nodes:

Definition (Boundary nodes of Ω_I.) Let N^I_bou denote the set of boundary nodes of Ω_I. Formally,

    N^I_bou = N^I \ N^I_int .

These nodes belong to the interface and are shared by different subdomains, including Ω_I.

In this new context of domain decomposition, let us define a subset of the set L_nod(n) related to any boundary node n ∈ N^I_bou as:

Definition (Node connectivity of n belonging to Ω_I.) Let L^I_nod(n) denote the set of nodes in N^I directly connected to n. Formally,

    L^I_nod(n) = { m ∈ N^I : ∃ e ∈ L_ele(n), m ∈ e } \ {n} .   (16)

The concept of adjacency between subdomains is defined as:

Definition (Adjacent subdomains.) Two arbitrary subdomains I and J are adjacent when N^I ∩ N^J ≠ ∅.

In order to carry out the scalar product we also need to introduce the concepts of own boundary and oth boundary, where oth is defined in Sect. 4.1. This is achieved by splitting the interfaces between the subdomains that share them, for example with METIS.

Definition (Own and other's boundaries of Ω_I.) Let us define N^I_bou,own and N^I_bou,oth such that

    N^I_bou,own ∩ N^I_bou,oth = ∅ ,
    N^I_bou,own ∪ N^I_bou,oth = N^I_bou ,
    N^I_bou,own ∩ N^J_bou,own = ∅ ,   J = 1, ..., S ,   J ≠ I .

4.3 Parallel Operators

We now define the parallel operators used in the algebraic solver to carry out the matrix-vector product and to assemble contributions in an asynchronous way.

The node and element numbering is represented by a local index, index^I, and an index used to exchange information between two neighboring subdomains I and J, index^{I,J}_bou. For any arbitrary subdomain Ω_I:

Definition (Local node numbering in Ω_I.) Let the function

    index^I : N^I → {1, 2, 3, ..., |N^I|}   (17)

be the local node numbering in Ω_I, defined as index^I(n) = i^I, where n ∈ N^I and i^I ∈ N. |N^I| is the cardinal number of the set.

For implementation purposes, the interior nodes are numbered first, followed by the boundary nodes, as shown in Fig. 6. This ordering is useful to carry out not only the matrix-vector product but also the dot product in an efficient way. Note that for two arbitrary adjacent subdomains Ω_I and Ω_J, index^I(n) is not necessarily equal to index^J(n) for a boundary node n ∈ N^I_bou ∩ N^J_bou.

Definition (Shared boundary node numbering in Ω_I and Ω_J.) Let the function

    index^{I,J}_bou : N^I_bou ∩ N^J_bou → {1, 2, 3, ..., |N^I_bou ∩ N^J_bou|}   (18)

be the shared boundary node numbering for two arbitrary adjacent subdomains Ω_I and Ω_J, defined as index^{I,J}_bou(n) = i, where n ∈ N^I_bou ∩ N^J_bou.

This index facilitates the data exchange between two subdomains and is constructed in such a way that both subdomains are able to order their shared boundary nodes in the same way, see Fig. 7.
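As noted above, the connectivity sets L_nod(n) carry exactly the information of the CSR matrix graph. A minimal C sketch of how such a graph could be assembled from precomputed adjacency lists is given below; the data layout and helper are hypothetical illustrations, not Alya's actual data structures:

    #include <stdlib.h>

    /* Build CSR row pointers (ia) and column indices (ja) of the matrix graph
     * from node-to-node connectivity lists L_nod(n).
     * nnode:     number of local nodes
     * nadj[n]:   number of nodes connected to n, i.e. |L_nod(n)|
     * adj[n][k]: k-th neighbour of n
     * The diagonal entry (n, n) is added explicitly. */
    void build_csr(int nnode, const int *nadj, int *const *adj,
                   int **ia_out, int **ja_out)
    {
        int *ia = malloc((size_t)(nnode + 1) * sizeof *ia);
        ia[0] = 0;
        for (int n = 0; n < nnode; ++n)
            ia[n + 1] = ia[n] + nadj[n] + 1;      /* +1 for the diagonal */

        int *ja = malloc((size_t)ia[nnode] * sizeof *ja);
        for (int n = 0; n < nnode; ++n) {
            int p = ia[n];
            ja[p++] = n;                          /* diagonal first */
            for (int k = 0; k < nadj[n]; ++k)
                ja[p++] = adj[n][k];
        }
        *ia_out = ia;
        *ja_out = ja;
    }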

Fig. 6 Node numbering of subdomain ΩI

In Alya, this task is carried out by the Master process, after the partition of the mesh.

Fig. 7 Shared node numbering index^{I,J}(n) of the interface nodes between Ω_I and Ω_J

Taking into account the previously defined functions, the implementation details of the operators PAREXC and PARASM, from the point of view of any arbitrary subdomain Ω_I, are provided in Algorithms 1 and 2. As the exchange of data between subdomains is asynchronous, the operator is divided into two parts: the data exchange first, given by Algorithm 1, and the assembly operation, given by Algorithm 2.

Algorithm 1 The parallel operator PAREXC from Ω_I: data exchange
Input: a numeric array data with length N^I
Output: void
  for each adjacent subdomain J of I do
    for each node n ∈ N^I_bou ∩ N^J_bou do
      i = index^I(n)
      i_bou = index^{I,J}_bou(n)
      Construct the array data_send(i_bou) = data(i)
    end for
    MPI_Isend to send the array data_send to J
    MPI_Irecv to receive the array data_receive^J from J
  end for

Algorithm 2 The parallel operator PARASM from Ω_I: data assembly
Input: a numeric array data with length N^I
Output: a modified array data
  for each adjacent subdomain J of I do
    for each node n ∈ N^I_bou ∩ N^J_bou do
      i = index^I(n)
      i_bou = index^{I,J}_bou(n)
      data(i) = data(i) + data_receive^J(i_bou)
    end for
  end for

4.4 Matrix-Vector and Dot Products

The two main parallel functions of an iterative solver, i.e., the matrix-vector and dot products, referred to as MATVEC and DOTPRO respectively, are described here using the previously defined parallel operators PAREXC and PARASM. The matrix-vector product is given by Algorithm 3. For the sake of clarity, the indices of the matrix are not given in the CSR format.

Algorithm 3 The parallel matrix-vector product MATVEC
  for each subdomain I do
    for each n ∈ N^I_bou do
      for each a ∈ L^I_nod(n) do
        i = index^I(a)
        Construct y^I(i) = 0
        for each b ∈ L^I_nod(n) do
          j = index^I(b)
          Construct y^I(i) = y^I(i) + A^I(i, j) × x^I(j)
        end for
      end for
    end for
    Exchange: PAREXC(y^I)
  end for
  for each subdomain I do
    for each n ∈ N^I_int do
      for each a ∈ L^I_nod(n) do
        i = index^I(a)
        Construct y^I(i) = 0
        for each b ∈ L^I_nod(n) do
          j = index^I(b)
          Construct y^I(i) = y^I(i) + A^I(i, j) × x^I(j)
        end for
      end for
    end for
  end for
  MPI_Waitall
  for each subdomain I do
    Assemble: y^I = PARASM(y^I)
  end for

Fig. 8 Scalability test: computational mesh

Fig. 10 Scalability test: scalability for a fixed number of iterations with a mesh of 8.5 million elements (speed-up versus number of CPUs, from 64 to 1,024; average number of elements per CPU from 133k down to 8k)

Algorithm 4 describes the dot product. As mentioned previously, the loop over the nodes of each subdomain excludes the interface nodes with oth status, i.e., only nodes in N^I_int or N^I_bou,own contribute.

Algorithm 4 The parallel dot product DOTPRO
  for each subdomain I do
    α = 0
    for each n ∈ N^I do
      i = index^I(n)
      if n ∈ N^I_int or n ∈ N^I_bou,own then
        α = α + x^I(i) × y^I(i)
      end if
    end for
  end for
  MPI_AllReduce of α
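In MPI terms, Algorithm 4 is a local accumulation followed by a single collective reduction. A minimal C sketch is given below; the ownership mask is a hypothetical illustration of the own/oth status, not the actual Alya routine:

    #include <mpi.h>

    /* Parallel dot product over a partitioned vector.
     * own[i] is nonzero for interior nodes and for interface nodes with
     * "own" status, so that shared interface nodes are counted exactly once. */
    double parallel_dot(int nlocal, const double *x, const double *y,
                        const int *own, MPI_Comm comm)
    {
        double alpha_local = 0.0, alpha = 0.0;
        for (int i = 0; i < nlocal; ++i)
            if (own[i]) alpha_local += x[i] * y[i];
        /* Single collective reduction, as in the last step of Algorithm 4. */
        MPI_Allreduce(&alpha_local, &alpha, 1, MPI_DOUBLE, MPI_SUM, comm);
        return alpha;
    }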

It must be emphasized that the functions PAREXC and PARASM are used to compute many other arrays during the execution of the code, such as the construction of the mass matrix and the diagonal of the stiffness matrix used to build the inverse diagonal preconditioner.

4.5 Scalability Test: Shear Stress of a Milling Cutter Punch

This example, run in implicit, addresses the capability of Alya to deal with structures composed of millions of elements, while maintaining optimal scalability. The test consists of a linear elastic drill (with arbitrary mechanical properties) grasped in a shank. A gravity force and a punctual force along one of its lips are applied. The computational mesh is a regular mesh of 8.5 million tetrahedra, see Fig. 8. The tests have been run on the MareNostrum cluster. The machine consists of 3,056 nodes, with two 8-core processors (Intel Xeon E5-2670 cores at 2.6 GHz) per node and 32 GBytes of memory per node. Figure 9 shows the displacement (left) and the maximum and minimum shear stress after 20 time steps. Figure 10 shows the scalability obtained up to 1,024 CPUs. Note that the curve loses its optimality at 1,024 CPUs, due to the small number of elements per CPU. In the next section the scalability with a finer mesh is studied.

Fig. 9 Scalability test: displacement field (left) and maximum and minimum shear stress field (right). The minimum value is shown in blue and the maximum in red. (Color figure online)


5 Hybrid Based Strategy for Parallelization

The most important trends in contemporary computer architecture are currently leaning towards processors with multiple cores per socket with access to the same memory bank. This approach allows one to run at lower frequencies with better overall performance than a single core processor. The parallelization strategy for this architecture is multithreading: a Master thread forks a number of Worker threads, and the computation is divided among them. The data communication and the synchronization between threads are done using the shared memory inside the multiprocessor. This strategy is known as parallelism at thread level and OpenMP is the standard interface for this model [11].

As detailed in the previous section, parallelization in Alya is mainly done via MPI. The domain decomposition strategy implemented only uses parallelism at task level, which is provided by MPI. For that reason, a hybrid OpenMP/MPI code is developed in order to take advantage of all levels of parallelism that a multicore architecture offers, and also to enable one MPI task to access all the memory of a node. The main feature of this multicore architecture relies on the fact that both the shared memory inside the processor and the message passing between nodes are leveraged. The structure consists of two steps: first, one task is assigned to each node and, secondly, the node creates a thread per core, see Fig. 11. An important reduction of the communication cost between cores is then expected.

Fig. 11 Multicore architecture for a hybrid OpenMP/MPI framework

5.1 Introducing OpenMP into the MPI Parallel Code

To exploit the thread-level parallelism of a multicore architecture, OpenMP directives are added to the MPI parallel code. The first step consists in selecting the most time-consuming routines of the code in terms of total execution time. Among these routines, not all of them are susceptible to be parallelized at thread level: they must contain a loop structure with independent iterations. The targeted loops must also contain a large body of code, since a considerable amount of computational work is needed to hide the overhead of thread management.

Table 1 depicts the most time-consuming routines of Alya in terms of the total execution time in the explicit case. Note that the cost refers to the percentage of the total execution time spent, also including the calls to other subroutines. In order to select the proper subroutine to be optimized by the parallelization with OpenMP, it is important to identify the calls between them. For instance, the routine that computes the matrix costs the same as the assembly one, which means that the time spent by the matrix construction routine itself is minimal, since it contains a call to the assembly subroutine. Hence, the assembly routine is the one that has to be optimized. In the FE matrix and RHS assembly routine, the discrete system to be solved is formed by looping over all the elements of the mesh, adding the contributions from each element to the global matrix and RHS. This routine is expensive for a number of reasons: significant loop nesting, many matrices being assembled and several calls to other subroutines (for instance, to constitutive models).

Table 1 Routine costs in terms of the sum of time of the function, including the time of the called subfunctions, with respect to the total execution time in the explicit case

    Cost (%)  Routine                       Description
    ...
    93.6      Iterative scheme              Controlling the internal loop of the equations
    93.6      Explicit time integration     Explicit time stepping scheme (Newmark)
    93.5      Iterative solution            Solving an iteration of the equations
    93.5      Matrix                        Computing the elemental matrix and RHS
    93.5      Assembly                      Assembling the matrix and RHS
    36.4      Finite element computations   Computing derivatives at Gauss points
    ...

In order to introduce thread parallelism in the assembly loop, two main OpenMP directives are used to indicate to the compiler how to parallelize appropriately:

– Guided scheduling: since the workload within each iteration is not the same, iterations are assigned to threads in blocks. In a guided scheduling, the block size decreases with each assignment (in contrast with the dynamic scheduling), thus obtaining a better relation between the thread management time and the workload balance.
– Data scope: the variables that are shared among the iterations are visible to all threads, while the ones that have an independent value among iterations have a different copy per thread.

A critical point of thread parallelism relates to the so-called race conditions, which might lead to non-deterministic results. Race conditions arise when different threads try to update the same state or array at the same time. Code paths accessing and manipulating shared data simultaneously are known as critical regions. It is clear that both the operation to assemble an element into a local matrix and the addition of that local matrix into the global matrix must be thread safe.

Even though these regions cannot be fully avoided, it is important to minimize their occurrence, because their abuse can serialize the execution, hence losing the optimal parallel architecture of the code. One possible solution is to specify different region names, combined with the use of atomic clauses (uninterruptible instructions) for a single memory location.

Such a scheme was implemented in Alya, paying particular attention to the specificities of solid mechanics codes. In order to evaluate the gain in execution time that the OpenMP architecture provides, a reduced case was computed. As an illustration of the strategy used here for all identified subroutines, Fig. 12 depicts the speedup analysis of the assembly routine for both explicit and implicit schemes. The performance with one thread is the same as that of the sequential version, and the threaded version for both schemes speeds up quasi-optimally up to at least 16 cores.

Fig. 12 Scalability of the assembly routine parallelized with OpenMP in Alya-solidz (speed-up versus number of threads, from 2 to 16, for the implicit and explicit schemes against the ideal curve)
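To make the directives discussed above concrete, the following C/OpenMP fragment sketches a thread-parallel RHS assembly with a guided schedule and an atomic scatter. It is an illustration only: the element kernel is a dummy stand-in and the data layout is hypothetical, not Alya's Fortran implementation.

    #include <omp.h>

    /* Hypothetical stand-in for the real element kernel: fills the 12-entry
     * elemental RHS of a 4-node tetrahedron with dummy values. */
    static void element_rhs(int e, double rhs_e[12])
    {
        for (int k = 0; k < 12; ++k) rhs_e[k] = 1.0e-3 * (double)(e + k);
    }

    /* Thread-parallel assembly of the global RHS for nelem tetrahedra.
     * conn is the 4*nelem connectivity; rhs is the global 3*nnode RHS. */
    void assemble_rhs(int nelem, const int *conn, double *rhs)
    {
        /* Guided schedule: block size shrinks as iterations are handed out,
         * balancing uneven per-element cost (first directive above). */
        #pragma omp parallel for schedule(guided) default(none) shared(nelem, conn, rhs)
        for (int e = 0; e < nelem; ++e) {
            double rhs_e[12];                     /* private per-thread workspace */
            element_rhs(e, rhs_e);
            for (int a = 0; a < 4; ++a) {         /* scatter to global node positions */
                int n = conn[4 * e + a];
                for (int d = 0; d < 3; ++d) {
                    /* atomic update: two threads may touch the same shared node */
                    #pragma omp atomic
                    rhs[3 * n + d] += rhs_e[3 * a + d];
                }
            }
        }
    }

The atomic clause confines serialization to single memory updates, which is what keeps the speed-up of Fig. 12 close to ideal.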

6 Numerical Examples

This section presents several examples of different nature with the purpose of showing the applicability, scalability, performance and flexibility of Alya when solving solid mechanics problems.

6.1 Example 1: Mesh Multiplication

In petascale applications, the pre- and post-process tasks are becoming a bottleneck in the complete simulation cycle. Techniques such as parallel I/O have been introduced to mitigate these effects in post-processing, but these are only effective within a limited range. Mesh Multiplication (MM) was introduced as an alternative [47]. This technique consists in refining the mesh uniformly, recursively, on-the-fly and in parallel. For tetrahedra, hexahedra and prisms, each level multiplies the number of elements by eight, while a pyramid is divided into ten new elements. Apart from generating a fine mesh in parallel, the MM strategy makes it possible to carry out the simulation on this fine mesh without having to permanently store it at any time during the simulation process. This technique is also very useful for studying mesh convergence as well as weak or strong scalability [47]. Figure 13 shows the recursive MM algorithm.

Fig. 13 Example 1: Mesh multiplication algorithm

The example consists of a linear elastic large structure with arbitrary mechanical properties under its own weight and a magnetic load. The initial mesh is composed of 491,415 tetrahedral elements. As an example of the efficiency of the algorithm, a mesh of 250 million elements was obtained in just 1 second on 8,000 CPUs. Figure 14 shows the original mesh and the one obtained after MM. Figure 15 depicts the displacement field after 200 iterations using an implicit scheme and the hybrid version of Alya.

Figure 16 shows the speedup obtained with a mesh of 32 million elements (2-level mesh) from 64 to 2,048 CPUs. The code shows optimal scalability when solving mechanical problems. Note that, in contrast with the drill problem, the number of elements per core is significantly larger.

6.2 Example 2: The Chimera Method—Two-Material Cube

As an illustration of the flexibility of the Alya-solidz module, the Chimera method is applied to the solid mechanics equations described in Sect. 2.1. This family of methods allows one to handle non-conforming and overlapping meshes, thus simplifying the construction of computational meshes of complex geometries.

The origin of the Chimera method is found in the context of Computational Fluid Dynamics. It was originally developed by Steger and coworkers [22,75,76].

Fig. 14 Example 1: Initial mesh of the structure (left) and detail of the mesh after mesh multiplication (right). Fusion reactor vacuum vessel. Geometry and mesh property of F4E

Fig. 16 Example 1: Scalability of the fusion reactor problem with a mesh of 32 million elements (speed-up versus number of CPUs, from 64 to 2,048, against the ideal curve)

Fig. 17 Example 2: Extension elements and hole (left) and boundary conditions (right)

It consists in superimposing an independent mesh, referred to as the patch mesh, on a larger mesh covering the overall computational domain and called the background mesh. In Alya, the patch mesh and its corresponding constitutive properties are responsible for the mechanical deformation of the body it covers, while the mechanics of the rest of the body is defined by the part of the background mesh not covered by the patch mesh. The coupling between the boundary of the patch mesh and the background mesh is then enforced by additional transmission conditions. In the framework of solid mechanics, this non-local approach allows for the independent meshing of complex intertwined geometries without being constrained by mesh conformity conditions at their boundaries. See Ref. [29] for more details.

Fig. 15 Example 1: Displacement field after 200 iterations using an implicit scheme

The Chimera method is applied here to the solution of the 3D example shown in Fig. 17. The geometry is composed of two different Neo-Hookean materials, one of which corresponds to the spherical patch mesh enclosed in a second material. In this example, arbitrary constitutive parameters have been considered for both materials, but material 2 (within the sphere) is stiffer than material 1. The cube is submitted to an increasing y-displacement on the top face, the bottom face being constrained in the y-direction. Additional lateral boundary conditions are applied to avoid rigid body motion (not shown for clarity).

Table 2 Example 2: Number of elements and mesh size

                    Elements total   Elements extension   Elements hole   Element size
    Mesh 1          374,848          14,848               17,120          0.0029
    Mesh 2          2,939,664        59,664               151,448         0.00036
    Reference mesh  10,790,400       –                    –               0.000093

Fig. 18 Example 2: Solution on the mid yz-plane for the three meshes: y-displacement (top), stress along the y-direction (middle) and computational mesh (bottom)

The problem was solved on two different meshes with the Chimera method: a coarse one of 400,000 elements and a fine one of 3 million elements. A reference solution without the Chimera method was also computed on a refined mesh of 10 million elements. Table 2 shows the geometrical details of the three different meshes. Figure 18 shows the displacement and the stress in the y direction, as well as the corresponding mesh, obtained on the mid yz-plane for the coarse and fine meshes of the Chimera method. For the Chimera meshes, the extension elements used to couple the patch and background meshes are delimited in red. Figure 19 shows the solution for an additional cut along the vertical displacement and stress. The solution on both meshes with the Chimera method is compared with the reference one. A good qualitative agreement is observed for the displacement, even for the larger mesh. The stress exhibits a lower convergence, but this should be tempered by the fact that the meshes are only first order for the solution derivatives. In the overlapping zone, covered by the extension elements, two solutions coexist: the one obtained on the patch and the one obtained on the background. The discontinuity in the stress inside the overlapping zone is appreciable for the coarsest mesh, but this jump decreases when refining the mesh. Nevertheless, the maximum error between the Chimera solution computed with the fine mesh and the reference solution is of order less than 10⁻⁴.

6.3 Example 3: Coupled Electro-Mechanical Model of the Heart

In this case, Alya is used to simulate the cardiac ventricular contraction, showing its potential for simulating coupled problems. See Refs. [56,80] for a complete discussion of the methodology. The heart is made of elongated cells called myocytes arranged as a compact fibered helicoidal structure. As an electrical impulse propagates, the fibers contract longitudinally, making the heart pump the blood out of its cavities thanks to the fiber arrangement.

Fig. 19 Example 2: Solution along mid line of mid yz-plane for the three meshes: y-displacement (left) and stress along the y-direction (right)

sidered as a continuum composite material with anisotropic The strain invariant I1 represents the non-collagenous behaviour. material while strain invariant I4 represents the stiffness The cardiac computational model can be decomposed into of the muscle fibers, and a, b, a f , b f are parameters to three parts: be determined experimentally. K is the bulk modulus and f defines the fibre direction. The active part comes

– Electrophysiology: Action potential propagation φ(x_i, t) is modelled through a transient anisotropic diffusion equation with a non-linear reaction term:

  \frac{\partial \phi}{\partial t} = \frac{\partial}{\partial x_i}\left( D_{ij} \frac{\partial \phi}{\partial x_j} \right) + L(\phi)    (19)

  Diffusion is governed by the tensor D_ij, which represents the fibre orientation at each physical point. Its diagonal components are the axial and crosswise fibre diffusion; the crosswise diffusion is around one third of the axial one. Finally, L(φ) is the non-linear reaction term corresponding to the cell model. Depending on the model used, this term ranges from a simple cubic equation to a system of one hundred coupled ordinary differential equations solved simultaneously. It models the interaction of the ionic currents behind the cellular muscular mechanisms.

– Mechanical deformation: From a mechanical point of view, cardiac tissue is a thick layered structure: endo-, epi- and myocardium. The material is compressible hyperelastic, with anisotropic behaviour ruled by the fibre structure. The composite character of the material is reflected in the internal stress, which is decomposed into a passive and an active part:

  \sigma = \sigma_{pas} + \sigma_{act}(\lambda, [Ca^{2+}])\, \mathbf{f} \otimes \mathbf{f}    (20)

  The passive part is governed by a transverse isotropic exponential strain energy function W(\mathbf{b}) that relates the Cauchy stress σ to the right Cauchy–Green deformation \mathbf{b} [56]:

  \sigma_{pas} = \frac{1}{J}\left[ a\, e^{b(I_1-3)}\, \mathbf{b} + 2 a_f (I_4 - 1)\, e^{b_f (I_4-1)^2}\, \mathbf{f} \otimes \mathbf{f} \right] + K(J-1)\, \mathbf{I}    (21)

  The strain invariant I_1 represents the non-collagenous material, while the strain invariant I_4 represents the stiffness of the muscle fibres; a, b, a_f and b_f are parameters to be determined experimentally, K is the bulk modulus and \mathbf{f} defines the fibre direction. The active part comes through the electro-mechanical coupling and is described as follows.

– Electro-mechanical Coupling: The contracting electrical component of the electro-mechanical coupling is initiated almost simultaneously in several zones of the ventricular epicardium. Cardiac mechanical deformation is the result of the active tension generated by the myocytes. The model assumes that the active stress is produced only in the direction of the fibre and depends on the calcium concentration of the cardiac cell:

  \sigma_{act} = \frac{[Ca^{2+}]^n}{[Ca^{2+}]^n + C_{50}^n}\, \gamma\, \sigma_{max} \left( 1 + \beta(\lambda_f - 1) \right)    (22)

  In this equation, n, C_{50}, \sigma_{max}, \beta and 0 < γ < 1 are model parameters, and \lambda_f is the fibre stretch.

Both the electrophysiology and the mechanical action are simulated in Alya on the same mesh, as fine as the case demands. The time integration scheme is staggered, with both problems solved explicitly; a minimal illustrative sketch of the stress evaluation of Eqs. (20)–(22) is given below. The fibre fields can come either from measurements (using a so-called diffusion tensor MRI technique) or from semi-empirical rule-based models [56].

Figure 20 shows several snapshots of a bi-ventricular geometry during systole. Figure 21 shows the evolution of the ejection rate, which represents the heart pumping action, measured as the change in the volume of the ventricular cavities.
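To make Eqs. (20)–(22) concrete, the sketch below evaluates the total Cauchy stress at a material point under stated assumptions: the parameter values are invented placeholders, the fibre vector is pushed forward with the deformation gradient, and the placement of the 1/J factor follows the reconstruction of Eq. (21) given above. It is an illustrative reading of the model, not Alya's implementation.

```python
import numpy as np

# Hypothetical material parameters; illustrative placeholders only, not the
# values used in the paper.
A, B, A_F, B_F, K_BULK = 0.5e3, 8.0, 1.0e3, 10.0, 1.0e6          # passive law, Eq. (21)
N_HILL, C50, SIGMA_MAX, BETA, GAMMA = 2.0, 0.5, 6.0e4, 1.45, 0.6  # active law, Eq. (22)

def passive_stress(F, f0):
    """Passive Cauchy stress of Eq. (21) from the deformation gradient F (3x3)
    and the reference unit fibre direction f0 (3,)."""
    J = np.linalg.det(F)
    b = F @ F.T                      # left Cauchy-Green tensor
    f = F @ f0                       # fibre vector pushed to the current configuration
    I1 = np.trace(b)
    I4 = f @ f                       # squared fibre stretch
    iso = A * np.exp(B * (I1 - 3.0)) * b
    fib = 2.0 * A_F * (I4 - 1.0) * np.exp(B_F * (I4 - 1.0) ** 2) * np.outer(f, f)
    return (iso + fib) / J + K_BULK * (J - 1.0) * np.eye(3)

def active_stress(ca, lam_f):
    """Scalar active fibre stress of Eq. (22): Hill-type calcium dependence,
    scaled by gamma and modulated by the fibre stretch lam_f."""
    hill = ca ** N_HILL / (ca ** N_HILL + C50 ** N_HILL)
    return hill * GAMMA * SIGMA_MAX * (1.0 + BETA * (lam_f - 1.0))

def total_stress(F, f0, ca):
    """Total Cauchy stress of Eq. (20): passive part plus fibre-aligned active part."""
    f = F @ f0
    lam_f = np.linalg.norm(f)        # fibre stretch
    n = f / lam_f                    # unit fibre direction in the current configuration
    return passive_stress(F, f0) + active_stress(ca, lam_f) * np.outer(n, n)
```

In the staggered explicit scheme described above, the calcium concentration passed to total_stress would be the one produced by the most recent electrophysiology step.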

6.4 Example 4: Fluid–Structure Interactions—Wall's Problem

This problem [84] is proposed to demonstrate the ability to solve FSI problems with complex flow–flexible-structure interactions exhibiting large deformations.

Fig. 20 Example 3: Bi-ventricular electrical activation propagation of a dog heart during systole

Fig. 21 Example 3: Heart ejection rate (total ejection fraction: ventricular cavity volume, in ml, versus time)

Fig. 22 Example 4: Boundary conditions

The problem consists of a thin elastic non-linear shell attached to a fixed square rigid body, both submerged in an incompressible fluid flow. Both media have arbitrary mechanical properties, chosen such that the Reynolds number is Re = 204 when the length of the square rigid body is taken as the characteristic length. Inflow boundary conditions are imposed on the left wall of the fluid domain, outflow conditions on the right wall, and slip boundary conditions on the top and bottom walls; see Fig. 22. No-slip boundary conditions are imposed along the body and the structure. Again, both problems are solved using Alya.

Fig. 23 Example 4: Reference finite element mesh

Fig. 24 Example 4: Fluid velocity field at different times

The finite element mesh is shown in Fig. 23. It is a hybrid mesh of 6,919 elements, composed of tetrahedra and quadrilateral elements. The fluid domain contains 6,695 elements, while the solid one has 224 elements.

A weak coupling scheme [15,78] was used to couple the fluid with the structure. The key point of the strategy implemented in Alya is to divide the problem geometrically into non-overlapping fluid and solid zones, so that, by properly distributing the number of processors working within each zone, the amount of computation performed by the CPUs is optimized.

Figure 24 shows the structural displacement at different times, t = 0, 3.58, 7.01 and 15.58 s, along with the fluid velocity. Note that the mesh adapts to the displacement of the solid structure, since the ALE formulation used here computes the convective velocity of the fluid from the difference between the material and mesh velocities. The results can be compared to the ones obtained in Ref. [84].
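The staggered nature of this weak coupling can be summarized with a short, hypothetical sketch; the fluid, solid and ALE-mesh objects and their methods are placeholders used only to show the data exchange described above, not Alya's API.

```python
def weak_fsi_step(fluid, solid, ale_mesh, dt):
    """One staggered (weakly coupled) fluid-structure step.

    fluid, solid and ale_mesh are hypothetical solver/mesh objects used only to
    illustrate the data exchange; this is not Alya's interface.
    """
    # 1) Advance the fluid on the current deformed mesh. In the ALE setting the
    #    convective velocity is the material velocity minus the mesh velocity.
    interface_traction = fluid.advance(dt, mesh_velocity=ale_mesh.velocity())

    # 2) Advance the solid with the interface traction applied as a load, and
    #    recover the new interface displacement.
    interface_displacement = solid.advance(dt, interface_load=interface_traction)

    # 3) Move the fluid mesh so that it follows the structure; this also updates
    #    the mesh velocity used by the fluid at the next step.
    ale_mesh.move(interface_displacement, dt)
    return interface_displacement
```

Because each field is advanced only once per step with data from the other field's previous state, the exchange stays explicit; a strong coupling would instead iterate these stages to convergence within each step.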

7 Conclusion

This paper presents the solid mechanics module of the Alya code for solving linear and non-linear continuum problems with large deformations, also addressing its potential for solving complex coupled problems in parallel. The parallelization strategy used in Alya is formally described in the context of MPI communications. To enhance the massively parallel performance of the code, OpenMP directives were added to the current MPI Alya code. The code was tested successfully using thousands of CPUs to solve problems with up to millions of elements. The flexibility of the solid module in Alya was demonstrated by using more complex preprocessing techniques, such as a mesh multiplication algorithm and the Chimera method, which were successfully tested on three-dimensional structural cases showing optimal results. Examples of FSI and electro-mechanical coupling were also shown, demonstrating the capability of the code for solving multi-physics problems with high-performance computing. Future work aims at implementing more complex FE techniques and at overcoming the barrier of more than 100,000 CPUs.

Acknowledgments D.D.T. and A.J. acknowledge funding through the SIMUCOMP and ERA-NET MATERA+ project financed by the Consejería de Educación y Empleo of the Comunidad de Madrid and by the European Union's Seventh Framework Programme (FP7/2007-2013). This work was partially supported by the grant SEV-2011-00067, Severo Ochoa Program, awarded by the Spanish Government. The authors would like to acknowledge PRACE infrastructure support.

References

1. Alya system. http://www.bsc.es/computer-applications/alya-system
2. BigDFT. http://bigdft.org/Wiki/index.php?title=Presenting_BigDFT
3. Code_Aster. http://www.code-aster.org/
4. CODE_SATURNE. http://code-saturne.org

5. FEBio. http://febio.org/
6. METIS, family of multilevel partitioning algorithms. http://glaros.dtc.umn.edu/gkhome/views/metis
7. OpenACC. https://developer.nvidia.com/openacc
8. OpenCL. https://developer.nvidia.com/opencl
9. OpenFOAM. http://www.openfoam.com/
10. OpenMP. http://openmp.org/
11. OpenMP. http://openmp.org/wp/
12. Summary of available software for sparse direct methods. http://www.cise.ufl.edu/research/sparse/codes/
13. Adams M (1999) Parallel multigrid algorithms for unstructured 3D large deformation elasticity and plasticity finite element problems. Technical report UCB/CSD-99-1036, EECS Department, University of California, Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/1999/5398.html
14. Adams MF (2007) Algebraic multigrid methods for direct frequency response analyses in solid mechanics. Comput Mech 39:497–507
15. Malan AG, Oxtoby O (2013) An accelerated, fully-coupled, parallel 3D hybrid finite-volume fluid–structure interaction scheme. Comput Methods Appl Mech Eng 253:426–438
16. Amestoy P, Duff I, L'Excellent JY (2000) Multifrontal parallel distributed symmetric and unsymmetric solvers. Comput Methods Appl Mech Eng 184(2–4):501–520
17. Arbenz P, van Lenthe G, Mennel U, Müller R, Sala M (2008) A scalable multi-level preconditioner for matrix-free μ-finite element analysis of human bone structures. Int J Numer Methods Eng 73:937–947. doi:10.1002/nme.2101
18. Badia S, Martin A, Principe J (2014) A highly scalable parallel implementation of balancing domain decomposition by constraints. SIAM J Sci Comput 36(2):C190–C218
19. Balay S, Abhyankar S, Adams MF, Brown J, Brune P, Buschelman K, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Rupp K, Smith BF, Zhang H (2014) PETSc web page. http://www.mcs.anl.gov/petsc
20. Balay S, Adams MF, Brown J, Brune P, Buschelman K, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Rupp K, Smith BF, Zhang H (2013) PETSc users manual. Technical report ANL-95/11, revision 3.4, Argonne National Laboratory. http://www.mcs.anl.gov/petsc
21. Becker G, Noels L (2013) A full-discontinuous Galerkin formulation of nonlinear Kirchhoff–Love shells: elasto-plastic finite deformations, parallel computation, and fracture applications. Int J Numer Methods Eng 93:80–117. doi:10.1002/nme.4381
22. Benek J (1986) Chimera. A grid-embedding technique. Technical report, DTIC Document
23. Bhardwaj M, Pierson K, Reese G, Walsh T, Day D, Alvin K, Peery J, Farhat C, Lesoinne M (2002) Salinas: a scalable software for high-performance structural and solid mechanics simulations. In: Proceedings of the 2002 ACM/IEEE conference on supercomputing, SC'02. IEEE Computer Society Press, Los Alamitos, pp 1–19
24. Blatt M (2009) DUNE on BlueGene/P. In: Proceedings of 15th SciComp. University Press
25. Casasdei F, Avotins J (1997) A language for implementing computational mechanics applications. In: Technology of object-oriented languages and systems, TOOLS 25, Proceedings, pp 52–67
26. Ciccozzi F (2013) Towards code generation from design models for embedded systems on heterogeneous CPU-GPU platforms. In: 2013 IEEE 18th conference on emerging technologies and factory automation (ETFA), pp 1–4
27. Cirak F, Deiterding R, Mauch S (2007) Large-scale fluid–structure interaction simulation of viscoplastic and fracturing thin-shells subjected to shocks and detonations. Comput Struct 85:1049–1065. doi:10.1016/j.compstruc.2006.11.014
28. Duff I, Reid J (1983) The multifrontal solution of indefinite sparse symmetric linear equations. ACM Trans Math Softw 9(3):302–325
29. Eguzkitza B, Houzeaux G, Aubry R, Owen H, Vázquez M (2013) A parallel coupling strategy for the Chimera and domain decomposition methods in computational mechanics. Comput Fluids 80:128–141
30. El Maliki A, Fortin M, Tardieu N, Fortin A (2010) Iterative solvers for 3D linear and nonlinear elasticity problems: displacement and mixed formulations. Int J Numer Methods Eng 83(13):1780–1802
31. Falgout RD, Yang UM (2002) hypre: a library of high performance preconditioners. In: Preconditioners, lecture notes in computer science, pp 632–641
32. Farhat C, Roux FX, Oden JT (1994) Implicit parallel processing in structural mechanics. Elsevier Science SA, Amsterdam
33. Flaig C, Arbenz P (2011) A scalable memory efficient multigrid solver for micro-finite element analyses based on CT images. Parallel Comput 37:846–854. doi:10.1016/j.parco.2011.08.001
34. George A, Liu JW (1981) Computer solution of large sparse positive definite systems. Prentice Hall, Englewood Cliffs (Professional technical reference)
35. Gerstenberger A, Tuminaro R (2013) An algebraic multigrid approach to solve extended finite element method based fracture problems. Int J Numer Methods Eng 94:248–272. doi:10.1002/nme.4442
36. Geuzaine C, Remacle JF (2009) Gmsh: a 3-D finite element mesh generator with built-in pre- and post-processing facilities. Int J Numer Methods Eng 79(11):1309–1331
37. Goudreau G, Hallquist J (1982) Recent developments in large-scale finite element Lagrangian hydrocode technology. Comput Methods Appl Mech Eng 33:725–757. doi:10.1016/0045-7825(82)90129-3
38. Gould NIM, Scott JA, Hu Y (2007) A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations. ACM Trans Math Softw 33(2):300–331
39. Grinberg L, Pekurovsky D, Sherwin SJ, Karniadakis GE (2009) Parallel performance of the coarse space linear vertex solver and low energy basis preconditioner for spectral/hp elements. Parallel Comput 35(5):284–304
40. Gupta A, Koric S, George T (2009) Sparse matrix factorization on massively parallel computers. In: Proceedings of the conference on high performance computing networking, storage and analysis, SC'09. ACM, New York, pp 1:1–1:12
41. Göddeke D, Wobker H, Strzodka R, Mohd-Yusof J, McCormick P, Turek S (2009) Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU. Int J Comput Sci Eng 4(4):254–269
42. Hales JD, Novascone SR, Williamson RL, Gaston DR, Tonks MR (2012) Solving nonlinear solid mechanics problems with the Jacobian-free Newton–Krylov method. Comput Model Eng Sci 84:84–123
43. Heath M, Ng E, Peyton B (1991) Parallel algorithms for sparse linear systems. SIAM Rev 33(3):420–460
44. Heil M, Hazel A, Boyle J (2008) Solvers for large-displacement fluid–structure interaction problems: segregated versus monolithic approaches. Comput Mech 43(1):91–101
45. Heroux MA, Bartlett RA, Howle VE, Hoekstra RJ, Hu JJ, Kolda TG, Lehoucq RB, Long KR, Pawlowski RP, Phipps ET, Salinger AG, Thornquist HK, Tuminaro RS, Willenbring JM, Williams A, Stanley KS (2005) An overview of the Trilinos project. ACM Trans Math Softw 31(3):397–423
46. Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 4(3):435–447. doi:10.1021/ct700301q
47. Houzeaux G, de la Cruz R, Owen H, Vázquez M (2013) Parallel uniform mesh multiplication applied to a Navier–Stokes solver. Comput Fluids 80:142–151

48. Houzeaux G, Vázquez M, Aubry R, Cela J (2009) A massively parallel fractional step solver for incompressible flows. J Comput Phys 228(17):6316–6332
49. Hughes T, Ferencz R (1987) Large-scale vectorized implicit calculations in solid mechanics on a Cray X-MP/48 utilizing EBE preconditioned conjugate gradients. Comput Methods Appl Mech Eng 61:215–248. doi:10.1016/0045-7825(87)90005-3
50. Hughes TJ (2012) The finite element method: linear static and dynamic finite element analysis. Dover Publications
51. Hussain M, Abid M, Ahmad M, Khokhar A, Masud A (2011) A parallel implementation of ALE moving mesh technique for FSI problems using OpenMP. Int J Parallel Program 39:717–745. doi:10.1007/s10766-011-0168-3
52. Jouglard C, Coutinho A (1998) A comparison of iterative multilevel finite element solvers. Comput Struct 69(5):655–670
53. Kilic SA, Saied F, Sameh A (2004) Efficient iterative solvers for structural dynamics problems. Comput Struct 82(28):2363–2375
54. Knoll D, Keyes D (2004) Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 193(2):357–397
55. Komatitsch D, Erlebacher G, Göddeke D, Michéa D (2010) High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. J Comput Phys 229:7692–7714. doi:10.1016/j.jcp.2010.06.024
56. Lafortune P, Arís R, Vázquez M, Houzeaux G (2012) Coupled electromechanical model of the heart: parallel finite element formulation. Int J Numer Methods Biomed Eng 28:72–86. doi:10.1002/cnm.1494
57. Lang S, Wieners C, Wittum G (2002) The application of adaptive parallel multigrid methods to problems in nonlinear solid mechanics. In: Ramm E, Rank E, Rannacher R, Schweizerhof K, Stein E, Wendland W, Wittum G, Wriggers P, Wunderlich W (eds) Error-controlled adaptive finite elements in solid mechanics, 422 pp. ISBN 978-0-471-49650-2
58. Li X, Demmel JW (2003) SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw 29:110–140
59. Liu WK, Belytschko T, Moran B (2000) Nonlinear finite elements for continua and structures. Wiley, New York
60. Liu Y, Zhou W, Yang Q (2007) A distributed memory parallel element-by-element scheme based on Jacobi-conditioned conjugate gradient for 3D finite element analysis. Finite Elem Anal Design 43:494–503. doi:10.1016/j.finel.2006.12.007
61. Logg A, Mardal KA, Wells GN (eds) (2012) Automated solution of differential equations by the finite element method. Lecture notes in computational science and engineering, vol 84. Springer, Berlin. doi:10.1007/978-3-642-23099-8
62. Löhner R, Mut F, Cebral J, Aubry R, Houzeaux G (2010) Deflated preconditioned conjugate gradient solvers for the pressure-Poisson equation: extensions and improvements. Int J Numer Methods Eng 10196–10208
63. Luebke D (2008) CUDA: scalable parallel programming for high-performance scientific computing. In: 5th IEEE international symposium on biomedical imaging: from nano to macro, ISBI 2008, pp 836–838
64. Maday Y, Magoulès F (2006) Absorbing interface conditions for domain decomposition methods: a general presentation. Comput Methods Appl Mech Eng 195(29–32):3880–3900
65. Maurer D, Wieners C (2011) A parallel block LU decomposition method for distributed finite element matrices. Parallel Comput 37(12):742–758
66. McCormick SF (1987) Multigrid methods. Frontiers in applied mathematics. Society for Industrial and Applied Mathematics, Philadelphia
67. Moore D, Jérusalem A, Nyein M, Noels L, Jaffee M, Radovitzky R (2009) Computational biology—modeling of primary blast effects on the central nervous system. NeuroImage 47(2):T10–T20. doi:10.1016/j.neuroimage.2009.02.019
68. Owens J, Houston M, Luebke D, Green S, Stone J, Phillips J (2008) GPU computing. Proc IEEE 96(5):879–899
69. Paszyński M, Jurczyk T, Pardo D (2013) Multi-frontal solver for simulations of linear elasticity coupled with acoustics. Comput Sci 12(0). http://journals.agh.edu.pl/csci/article/view/102
70. Quey R, Dawson PR, Barbe F (2011) Large-scale 3D random polycrystals for the finite element method: generation, meshing and remeshing. Comput Methods Appl Mech Eng 200:1729–1745. doi:10.1016/j.cma.2011.01.002
71. Radovitzky R, Seagraves A, Tupek M, Noels L (2011) A scalable 3D fracture and fragmentation algorithm based on a hybrid, discontinuous Galerkin, cohesive element method. Comput Methods Appl Mech Eng 200:326–344. doi:10.1016/j.cma.2010.08.014
72. Saad Y (2003) Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics
73. Smith BF (1995) Domain decomposition methods for partial differential equations. In: Proceedings of ICASE/LaRC workshop on parallel numerical algorithms. University Press
74. Soto O, Löhner R, Camelli F (2003) A linelet preconditioner for incompressible flow solvers. Int J Numer Methods Heat Fluid Flow 13(1):133–147
75. Steger J, Benek FDJ (1983) A chimera grid scheme. Adv Grid Gener 5:59–69
76. Steger J, Benek J (1987) On the use of composite grid schemes in computational aerodynamics. Comput Methods Appl Mech Eng 64:301–320
77. Stewart JR, Edwards H (2004) A framework approach for developing parallel adaptive multiphysics applications. Finite Elem Anal Design 40(12):1599–1617 (The Fifteenth Annual Robert J. Melosh Competition)
78. Kalro V, Tezduyar TE (2000) A parallel 3D computational method for fluid–structure interactions in parachute systems. Comput Methods Appl Mech Eng 190:1467–1482
79. van Rietbergen B, Weinans H, Huiskes R, Polman B (1996) Computational strategies for iterative solutions of large FEM applications employing voxel data. Int J Numer Methods Eng 39:2743–2767
80. Vázquez M, Arís R, Houzeaux G, Aubry R, Villar P, Garcia-Barnós J, Gil D, Carreras F (2011) A massively parallel computational electrophysiology model of the heart. Int J Numer Methods Biomed Eng 27(12):1911–1929. doi:10.1002/cnm.1443
81. Vázquez M, Houzeaux G, Grima R, Cela J (2007) Applications of parallel computational fluid mechanics in MareNostrum supercomputer: low-Mach compressible flows. In: PARCFD2007, Antalya, Turkey
82. Vázquez M, Rubio F, Houzeaux G, González J, Giménez J, Beltran V, de la Cruz R, Folch A (2014) Xeon Phi performance for HPC-based computational mechanics codes
83. Waisman H, Berger-Vergiat L (2013) An adaptive domain decomposition preconditioner for crack propagation problems modeled by XFEM. Int J Multiscale Comput Eng 11(6):633–654
84. Wall WA, Ramm E. Fluid–structure interaction based upon a stabilized (ALE) finite element method. In: IV World congress on computational mechanics, Barcelona. CIMNE
85. Wieners C (2001) The application of multigrid methods to plasticity at finite strains. ZAMM J Appl Math Mech [Zeitschrift für Angewandte Mathematik und Mechanik] 81(S3):733–734
86. Zienkiewicz O, Taylor R (2000) The finite element method for solid and structural mechanics, 5th edn. Butterworth-Heinemann, Boston