Overview of MFEM (Modular Finite Element Methods Library)

BOUT++ Workshop @ LLNL
Tzanio Kolev, with the MFEM and BLAST teams
August 15, 2018

LLNL-PRES-757077. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.

Finite elements are a good foundation for large-scale simulations on current and future architectures

§ Large-scale parallel multi-physics simulations
• radiation diffusion
• electromagnetic diffusion
• compressible hydrodynamics

§ Finite elements naturally connect different physics

The de Rham complex connects the four basic finite element spaces:

H(grad) --∇--> H(curl) --∇×--> H(div) --∇·--> L2
"nodes"        "edges"         "faces"        "elems"
High-order     High-order      High-order     High-order
kinematics     MHD             rad. diffusion thermodynamics

(Figure: 8th order Lagrangian simulation of shock triple-point interaction)

§ High-order finite elements on high-order meshes
• increased accuracy for smooth problems
• sub-element modeling for problems with shocks
• HPC utilization, FLOPs/bytes increase with the order
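In MFEM these four spaces map directly onto finite element collections; a minimal sketch (k and dim are assumed integers, and the order offsets follow the usual exact-sequence convention):

```cpp
// Finite element collections for a de Rham complex of order k in dim dimensions.
H1_FECollection h1_fec(k, dim);       // H(grad): continuous "nodal" elements
ND_FECollection nd_fec(k, dim);       // H(curl): Nedelec "edge" elements
RT_FECollection rt_fec(k - 1, dim);   // H(div):  Raviart-Thomas "face" elements
L2_FECollection l2_fec(k - 1, dim);   // L2:      discontinuous "element" dofs
```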

§ Need new (interesting!) R&D for full benefits
• meshing, discretizations, solvers, AMR, UQ, visualization, …

(Figure: Inertial Confinement Fusion)

Modular Finite Element Methods (MFEM)

MFEM is an open-source C++ library for scalable FE research and fast application prototyping

§ Triangular, quadrilateral, tetrahedral and hexahedral elements; volume and surface meshes
§ Arbitrary order curvilinear mesh elements
§ Arbitrary-order H1, H(curl), H(div) and L2 elements
§ Local conforming and non-conforming refinement
§ NURBS geometries and discretizations
§ Bilinear/linear forms for a variety of methods (Galerkin, DG, DPG, Isogeometric, …)
§ Integrated with: hypre, SUNDIALS, PETSc, SUPERLU, PUMI, VisIt, Spack, xSDK, OpenHPC, and more …
§ Parallel and highly performant: mfem.org (v3.4, May 2018)
§ Main component of ECP's co-design Center for Efficient Exascale Discretizations (CEED)
§ Native "in-situ" visualization: GLVis, glvis.org

(Figure: linear, quadratic and cubic finite element spaces on curved meshes)

Example 1 – Laplace equation

§ Mesh
§ Finite element space
§ Initial guess, linear/bilinear forms
§ Linear solve
§ Visualization

§ works for any mesh & any H1 order
§ builds without external dependencies

Example 1 – Laplace equation

§ Mesh
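The code on this slide is shown as an image in the original deck; the snippets below are a sketch that closely follows MFEM's serial ex1.cpp. First, the mesh step (star.mesh is just the example's default input):

```cpp
#include "mfem.hpp"
#include <fstream>
#include <iostream>
using namespace std;
using namespace mfem;

int main(int argc, char *argv[])
{
   // 1. Read the mesh from a file and refine it uniformly a couple of times.
   //    Any mesh MFEM can read and any dimension works.
   Mesh *mesh = new Mesh("star.mesh", 1, 1);
   int dim = mesh->Dimension();
   for (int l = 0; l < 2; l++) { mesh->UniformRefinement(); }
```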

Example 1 – Laplace equation

§ Finite element space
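Continuing the ex1.cpp-style sketch, the finite element space for an arbitrary H1 order:

```cpp
   // 2. Define a continuous H1 finite element space of the chosen order on the
   //    (possibly curved) mesh.
   int order = 3;
   H1_FECollection fec(order, dim);
   FiniteElementSpace fespace(mesh, &fec);
   cout << "Number of unknowns: " << fespace.GetTrueVSize() << endl;
```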

Example 1 – Laplace equation

§ Initial guess, linear/bilinear forms
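Continuing the sketch: right-hand side, zero initial guess, the diffusion bilinear form, and reduction to a linear system with homogeneous Dirichlet conditions eliminated:

```cpp
   // 3. Linear form b (right-hand side) and bilinear form a (Laplace operator).
   ConstantCoefficient one(1.0);
   LinearForm b(&fespace);
   b.AddDomainIntegrator(new DomainLFIntegrator(one));
   b.Assemble();

   GridFunction x(&fespace);
   x = 0.0;                                  // initial guess / boundary values

   BilinearForm a(&fespace);
   a.AddDomainIntegrator(new DiffusionIntegrator(one));
   a.Assemble();

   // Impose homogeneous Dirichlet conditions on all boundary attributes.
   Array<int> ess_bdr(mesh->bdr_attributes.Max()), ess_tdof_list;
   ess_bdr = 1;
   fespace.GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   SparseMatrix A;
   Vector B, X;
   a.FormLinearSystem(ess_tdof_list, x, b, A, X, B);
```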

Example 1 – Laplace equation

§ Linear solve

§ Visualization
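Finishing the serial sketch: a preconditioned CG solve and output files that GLVis can display:

```cpp
   // 4. Solve A X = B with CG preconditioned by a Gauss-Seidel smoother, then
   //    recover the finite element solution x.
   GSSmoother M(A);
   PCG(A, M, B, X, 1, 200, 1e-12, 0.0);
   a.RecoverFEMSolution(X, b, x);

   // 5. Save the refined mesh and the solution; view them with
   //    "glvis -m refined.mesh -g sol.gf".
   ofstream mesh_ofs("refined.mesh");
   mesh->Print(mesh_ofs);
   ofstream sol_ofs("sol.gf");
   x.Save(sol_ofs);

   delete mesh;
   return 0;
}
```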

Example 1 – parallel Laplace equation

§ Parallel mesh
§ Parallel finite element space
§ Parallel initial guess, linear/bilinear forms
§ Parallel linear solve with AMG
§ Visualization
§ highly scalable with minimal changes
§ build depends on hypre and METIS

MPI-based parallel finite elements in MFEM (first parallel layer: CPU/MPI domain decomposition)

Parallel data decomposition (as in BLAST): each CPU is assigned a subdomain consisting of a number of zones. MFEM handles the translation between local finite element bilinear forms / grid functions and global parallel matrices / vectors, using just a few MPI calls (MPI_Bcast and MPI_Allreduce).

Parallel mesh: (1) parallel mesh splitting (domain decomposition using METIS); (2) parallel mesh refinement.

Parallel finite element space: (1) find shared degrees of freedom (dofs); (2) form groups of dofs and assign ownership; (3) build a parallel Boolean matrix P : truedofs ↦ dofs identifying each dof with a master (true) dof. We use the ParCSR format in the hypre library for parallel matrix storage.

Parallel stiffness matrix and load vector (parallel assembly):

  A = P^T a P,   B = P^T b,   x = P X
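The parallel version mirrors MFEM's ex1p.cpp: the same structure with Par* classes and hypre's BoomerAMG-preconditioned CG. A condensed sketch of only the lines that change relative to the serial sketch above (fec, one and ess_bdr as before; in ex1p the MPI_Init call is the first statement in main):

```cpp
   // Parallel mesh: split the serial mesh across MPI ranks (METIS partitioning
   // under the hood); the serial mesh is then no longer needed.
   MPI_Init(&argc, &argv);
   ParMesh *pmesh = new ParMesh(MPI_COMM_WORLD, *mesh);
   delete mesh;
   pmesh->UniformRefinement();

   // Parallel finite element space, forms and grid function.
   ParFiniteElementSpace fespace(pmesh, &fec);
   ParLinearForm b(&fespace);
   b.AddDomainIntegrator(new DomainLFIntegrator(one));
   b.Assemble();
   ParGridFunction x(&fespace);
   x = 0.0;
   ParBilinearForm a(&fespace);
   a.AddDomainIntegrator(new DiffusionIntegrator(one));
   a.Assemble();

   Array<int> ess_tdof_list;
   fespace.GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   // Global parallel system in hypre's ParCSR format (A = P^T a P, B = P^T b).
   HypreParMatrix A;
   Vector B, X;
   a.FormLinearSystem(ess_tdof_list, x, b, A, X, B);

   // Parallel linear solve: CG preconditioned with BoomerAMG.
   HypreBoomerAMG amg(A);
   HyprePCG pcg(A);
   pcg.SetTol(1e-12);
   pcg.SetMaxIter(200);
   pcg.SetPrintLevel(2);
   pcg.SetPreconditioner(amg);
   pcg.Mult(B, X);
   a.RecoverFEMSolution(X, b, x);

   delete pmesh;
   MPI_Finalize();
```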

MFEM example codes – mfem.org/examples

High-Order Methods

High-order discretizations pose unique challenges

(Figures: shock triple-point interaction, 4 elements; smooth Rayleigh-Taylor instability, 2 elements)

Unstructured Mesh R&D: Mesh optimization and high-quality interpolation between meshes

We target high-order curved elements + unstructured meshes + moving meshes

(Figures: high-order mesh relaxation by neo-Hookean evolution (Example 10, ALE remesh); DG advection-based interpolation (ALE remap, Example 9, radiation transport))

Unstructured Mesh R&D: Accurate and flexible finite element visualization

Two visualization options for high-order functions on high-order meshes

GLVis: native MFEM lightweight OpenGL visualization tool. VisIt: general data analysis tool, with MFEM support since version 2.9.

BLAST computation on 2nd order tet mesh

glvis.org, visit.llnl.gov

ceed.exascaleproject.org

2 Labs, 5 Universities, 30+ researchers

CEED Bake-off Problem 1 on CPU

• All runs done on BG/Q (for repeatability), 8192 cores in C32 mode. Order p = 1, …, 16; quadrature points q = p + 2.
• BP1 results of MFEM+xlc (left), MFEM+xlc+intrinsics (center), and deal.II+gcc (right) on BG/Q.
• Preliminary results – paper in preparation.
• Cooperation/collaboration is what makes the bake-offs rewarding.

CEED Bake-off Kernel 5 on GPU

• BK5 – the BP5 kernel, just the local (unassembled) matvec with E-vectors
• OCCA-based kernels with a lot of sophisticated tuning
• > 2 TFLOPS on a single V100 GPU

Laghos miniapp – Initial GPU runs on Sierra

Sierra system:
Nodes: 4,320
POWER9 procs. per node: 2
GV100 (Volta) GPUs per node: 4
Node peak: 29.1 TFLOP/s
System peak: 125 PFLOP/s
Peak power: ~12 MW

Unstructured AMR

MFEM's unstructured AMR infrastructure

Adaptive mesh refinement on library level:
– Conforming local refinement on simplex meshes
– Non-conforming refinement for quad/hex meshes
– h-refinement with fixed p

General approach (see the sketch below):
– any high-order finite element space, H1, H(curl), H(div), …, on any high-order curved mesh
– 2D and 3D
– arbitrary order hanging nodes
– anisotropic refinement
– derefinement
– serial and parallel, including parallel load balancing
– independent of the physics (easy to incorporate in applications)

(Figures: Example 15; Shaper miniapp)
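A minimal sketch of what this looks like at the library level, assuming an existing Mesh `mesh`, FiniteElementSpace `fespace`, GridFunction `x`, and a per-element error indicator Vector `error` (MFEM also ships estimator/refiner classes, e.g. in Example 15):

```cpp
   // Local non-conforming refinement driven by a per-element error indicator.
   mesh.EnsureNCMesh();                    // allow hanging nodes on quads/hexes

   Array<int> marked;
   for (int e = 0; e < mesh.GetNE(); e++)
   {
      if (error(e) > 0.7 * error.Max()) { marked.Append(e); }
   }
   mesh.GeneralRefinement(marked);         // refine only the marked elements

   fespace.Update();                       // follow the mesh change
   x.Update();                             // interpolate the solution to the new mesh
```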

AMR = smaller error for same number of unknowns

(Figure: error vs. number of unknowns for uniform refinement at 1st, 2nd, 4th and 8th order, compared with 1st, 2nd, 4th and 8th order AMR)

Anisotropic adaptation to shock-like fields in 2D & 3D

Static parallel refinement, Lagrangian Sedov problem

8 cores, random non-conforming ref. 4096 cores, random non-conforming ref.

Shock propagates through non-conforming zones without imprinting

Parallel dynamic AMR, Lagrangian Sedov problem

Adaptive, viscosity-based refinement and derefinement, 2nd order Lagrangian Sedov. Parallel load balancing based on space-filling curve partitioning, 16 cores.

Application to electromagnetic problems

Magnetostatics in an angled pipe with anisotropic AMR (similar to Example 3). A simple ZZ-like anisotropic estimator works well for high-order H1 and H(curl) problems.

Solvers

H(curl) diffusion is difficult to solve

EM diffusion modeled by second order definite Maxwell

Challenging due to the large "near-nullspace" of the curl-curl operator:
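The equations themselves are not reproduced in the extracted text; for reference, the standard second-order definite Maxwell problem and the near-nullspace in question are:

```latex
% EM diffusion: second-order definite Maxwell problem for the electric field E
\nabla \times (\alpha\, \nabla \times \mathbf{E}) + \beta\, \mathbf{E} = \mathbf{f},
\qquad \alpha, \beta > 0.
% The curl-curl term annihilates every gradient field,
%   \nabla \times (\nabla \varphi) = 0,
% so for small \beta the gradients form a large "near-nullspace" that point
% smoothers and standard AMG cannot damp efficiently.
```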

The Auxiliary-space Maxwell Solver (AMS) achieves scalability by reduction to the nodal subspaces

AMS a top 10 breakthrough in 2008 DOE/SC report

AMS combines a point smoother for the original H(curl) matrix with AMG solvers for the auxiliary nodal subspace problems.
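In MFEM, AMS is available through the hypre integration; a sketch in the style of MFEM's ex3p.cpp, assuming an assembled parallel H(curl) matrix A, vectors B and X, and a Nédélec ParFiniteElementSpace nd_fespace:

```cpp
   // Auxiliary-space Maxwell Solver (AMS) as a preconditioner for CG.
   // MFEM derives the required discrete gradient and nodal vertex information
   // from the given edge (Nedelec) finite element space.
   HypreAMS ams(A, &nd_fespace);
   HyprePCG pcg(A);
   pcg.SetTol(1e-12);
   pcg.SetMaxIter(500);
   pcg.SetPrintLevel(2);
   pcg.SetPreconditioner(ams);
   pcg.Mult(B, X);
```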

AMS has been highly scalable for low-order discretizations on conforming meshes

AMS implementation in hypre requires minimal additional fine-grid information, which MFEM can generate and pass automatically.

(Figure: AMS setup and solve times in seconds vs. number of processors for a flux compression generator problem)

• Significantly outperforms previous solvers: up to 25x speedup
• Scaled up to 125K cores (12B unknowns)
• Solves previously intractable problems

We recently applied AMS to high-order non-conforming AMR geo-electromagnetics

• In the high-order case, G_h and Π_h are more complicated and depend on the choice of H1 and H(curl) bases
• MFEM has general objects to evaluate these in parallel (see the sketch below)
• Discrete gradient in the non-conforming case is defined by commutativity: P_V G_{c,h} = G_{nc,h} P_S, i.e. G_{c,h} = R_V G_{nc,h} P_S
• Magnetotelluric application: ∇×∇×E + iαE
• 2nd order AMR for a Kronotsky volcano model with ~12K-170K cells
• AMS iterations (N_iter, N̄_iter^CG) for 1e-2 tolerance:

  Refinement step    # DoFs       N_iter    N̄_iter^CG
  0                  647 544      22        3
  6                  786 436      22        3
  12                 2 044 156    23        3
  18                 9 291 488    24        4

Table 5: Numerical results for the model shown in Figure 3 and 18 refinement steps. The number of outer FGMRES iterations and the average number of inner CG iterations are denoted by N_iter and N̄_iter^CG, respectively.
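One example of those "general objects" is the parallel discrete interpolation operator; a sketch of assembling a discrete gradient between an H1 space and a Nédélec space, assuming existing ParFiniteElementSpace objects h1_fespace and nd_fespace:

```cpp
   // Discrete gradient G : nodal H1 space -> Nedelec H(curl) space.
   ParDiscreteLinearOperator grad(&h1_fespace, &nd_fespace);
   grad.AddDomainInterpolator(new GradientInterpolator);
   grad.Assemble();
   grad.Finalize();
   HypreParMatrix *G = grad.ParallelAssemble();   // global true-dof matrix
```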

Excerpt from the corresponding paper (construction of the conforming discrete gradient and interpolation):

These computations can be parallelized and automated in general finite element libraries, such as (Bangerth et al., 2011; MFEM, 2014). We now consider the general setting of high-order elements on a non-conforming mesh. In this case, each finite element space has two versions: a conforming one, e.g. V_c(T), where the hanging DoFs are constrained by the conforming (real) DoFs, and a non-conforming one, e.g. V_nc(T), where the non-conforming DoFs (hanging and real) are unconstrained. The matrix representation, P_V, of the embedding operator E ∈ V_c(T) ↦ E ∈ V_nc(T) is precisely the conforming interpolation, see the section on non-conforming meshes. One can similarly define the matrices P_S and P_𝐒 representing the conforming interpolation between the nodal spaces S_c(T), 𝐒_c(T) and S_nc(T), 𝐒_nc(T), respectively. We also introduce the matrix representations, R_V, R_S and R_𝐒, of the restriction operators from the non-conforming to the conforming space, which simply ignore the non-conforming components of their input. Note that with these definitions, R_V P_V, R_S P_S and R_𝐒 P_𝐒 are just the identity matrices on the conforming DoFs.

As mentioned in the non-conforming meshes section, the definite Maxwell matrix B in (16) is defined on the space V_c(T), i.e. B = B_c = P_V^T B_nc P_V, where B_nc is the V_nc(T) version of B, assembled without interpolatory DoF constraints. Therefore, in order to apply AMS to B, we need to construct a "conforming discrete gradient" matrix G_{c,h} that corresponds to the gradient mapping from S_c(T) to V_c(T). To obtain an appropriate definition for G_{c,h}, we consider the diagram

    S_c(T)  --- G_{c,h} --->  V_c(T)
    P_S ↓ ↑ R_S               P_V ↓ ↑ R_V
    S_nc(T) --- G_{nc,h} ---> V_nc(T)

where G_{nc,h} is the matrix representation of the gradient mapping φ_nc ∈ S_nc(T) ↦ ∇φ_nc ∈ V_nc(T), which can be computed element-by-element, as before, without imposing any constraints on the hanging DoFs. We now require that this diagram commutes from S_c(T) to V_nc(T), i.e.

    P_V G_{c,h} = G_{nc,h} P_S .

This means that G_{c,h} and G_{nc,h} are compatible when computing the gradient of a conforming nodal function. Since R_V P_V = I, this implies the conforming discrete gradient definition

    G_{c,h} = R_V G_{nc,h} P_S ,

which is what we pass to AMS. Note that by commutativity, the variational auxiliary space matrix for G_{c,h} is just the constrained version of the variational auxiliary space matrix for G_{nc,h}:

    G_{c,h}^T B_c G_{c,h} = G_{c,h}^T P_V^T B_nc P_V G_{c,h} = P_S^T (G_{nc,h}^T B_nc G_{nc,h}) P_S .

Similar considerations imply that the conforming Nédélec interpolation matrix should be defined as

    Π_{c,h} = R_V Π_{nc,h} P_𝐒 .

In the high-order case, Π_{nc,h} and consequently Π_{c,h} can be computed element-wise by evaluating the Nédélec DoFs of the nodal vector basis functions as discussed above. We note, however, that for low-order elements on quadrilateral/hexahedral meshes, the matrix Π_{c,h} can still be computed based only on G_{c,h} and the conforming vertex coordinates as in (17). Indeed, the entries of R_V and P_S are positive in this case, so

    |G_{c,h}| = R_V |G_{nc,h}| P_S .

Furthermore, the conformity of the mesh implies G_{nc,h} x_nc = G_{nc,h} P_S x_c = P_V G_{c,h} x_c, so R_V D_{G_{nc,h} x_nc} = D_{G_{c,h} x_c} R_V.

Applications

Application to high-order ALE shock hydrodynamics

hypre: scalable linear solvers library – www.llnl.gov/casc/hypre
MFEM: modular finite element methods library – mfem.org
BLAST: high-order ALE shock hydrodynamics research code – www.llnl.gov/casc/blast

§ hypre provides scalable algebraic multigrid solvers

§ MFEM provides finite element discretization abstractions • uses hypre’s parallel data structures, provides finite element info to solvers

§ BLAST solves the equations using a high-order ALE framework • combines and extends MFEM’s objects

BLAST models shock hydrodynamics using high-order FEM in both Lagrangian and Remap phases of ALE

Lagrange phase: physical time evolution, based on physical motion. Remap phase: pseudo-time evolution, based on mesh motion.

(Figure: mesh evolution at t = 0, 1.5, 3.0 during the Lagrange phase and τ = 0, 0.5, 1 during the remap phase)

Lagrangian phase (c = 0):
  Momentum conservation:  ρ dv/dt = ∇·σ
  Mass conservation:      dρ/dt = −ρ ∇·v
  Energy conservation:    ρ de/dt = σ : ∇v
  Equation of motion:     dx/dt = v

Advection (remap) phase (c = v_m):
  Momentum conservation:  d(ρv)/dτ = v_m · ∇(ρv)
  Mass conservation:      dρ/dτ = v_m · ∇ρ
  Energy conservation:    d(ρe)/dτ = v_m · ∇(ρe)
  Mesh velocity:          v_m = dx/dτ

Kinematic variables: continuous Galerkin FEM, Gauss-Lobatto basis. Thermodynamic variables: discontinuous Galerkin, Bernstein basis.

High-order finite elements lead to more accurate, robust and reliable hydrodynamic simulations

Symmetry in 3D implosion

Symmetry in Sedov blast

Robustness in Lagrangian axisymmetric shock triple-point interaction. Parallel ALE for Q4 Rayleigh-Taylor instability (256 cores).

High-order finite elements have excellent strong scalability

(Figures: strong scaling with p-refinement (SGH, 2D, ~600 dofs/zone, 256 cores, 1 zone/core) – more FLOPs, same runtime; strong scaling with fixed #dofs (256K DOFs) with finite element partial assembly – FLOPs increase faster than runtime)

34 Petra-M: Physics equation translator with MFEM

• Integrated FEM analysis environment using MFEM as the finite element library
• Developed at MIT in close collaboration with the MFEM team at LLNL
– Can be used to solve many different systems and geometries
• Recent models solved: Lower Hybrid scattering, core-to-edge ICRF, ICRF far-field sheath modeling

• Workflow management using πScope – http://piscope.psfc.mit.edu

πScope: Model setup interface & workflow management software

Python data analysis environment:
– Shell/Editor/Debugger
– Interactive plotting

References: Shiraiwa et al., EPJ Web of Conferences 157, 03048 (2017); Shiraiwa, invited talk at 59th APS-DPP (2017); Shiraiwa et al., Nucl. Fusion 57, 086048 (2017).

Core-edge integrated solution for C-Mod field-aligned ICRF antenna

(Figure panels: Jsurf, Enorm)

• Wave propagates smoothly from the antenna to the core
• Surface currents indicate the phasing is not exactly 0-π-0-π

Far-SOL MFEM-based solver on non-field-aligned mesh is being investigated

• High order: implemented NIMROD benchmarks with MFEM
– Anisotropic transport problem with analytic solution to assess error (numerical diffusion)
• Tested Cartesian (non-aligned) grid using MFEM
– Demonstrates acceptable error at challenging anisotropy levels
• Flux-based numerical approach results in an order-of-magnitude improvement

(Figure: MFEM vs. NIMROD results for polynomial degrees 3, 4 and 5)

MFEM: free, lightweight, scalable C++ library for finite element methods. Supports arbitrary high order discretizations and meshes for a wide variety of applications.

▪ Flexible discretizations on unstructured grids
— Triangular, quadrilateral, tetrahedral and hexahedral meshes.
— Local conforming and non-conforming refinement.
— High-order mesh optimization.
— Bilinear/linear forms for a variety of methods: Galerkin, DG, DPG, …

▪ High-order methods and scalability
— Arbitrary-order H1, H(curl), H(div) and L2 elements. Arbitrary order curvilinear meshes.
— MPI scalable to millions of cores.
— Enables application development on a wide variety of platforms: from laptops to exascale machines.

▪ Solvers and preconditioners
— Integrated with: HYPRE, SUNDIALS, PETSc, SUPERLU, …
— Auxiliary-space AMG preconditioners for the full de Rham complex.

▪ Open-source software
— Open-source (GitHub) with thousands of downloads/year worldwide.
— Part of FASTMath, ECP/CEED, xSDK, OpenHPC, …

(Figures: high order curved elements; parallel non-conforming AMR; surface meshes; compressible flow ALE simulations; heart modelling)

http://mfem.org

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-757077

Disclaimer This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.