Overview of MFEM (Modular Finite Element Methods Library)

BOUT++ Workshop @ LLNL
Tzanio Kolev, with the MFEM and BLAST teams
August 15, 2018

LLNL-PRES-757077. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.

Finite elements are a good foundation for large-scale simulations on current and future architectures

§ Large-scale parallel multi-physics simulations
• radiation diffusion
• electromagnetic diffusion
• compressible hydrodynamics

§ Finite elements naturally connect different physics

The de Rham complex connects the four basic finite element spaces:

H(grad) --∇--> H(curl) --∇×--> H(div) --∇·--> L2
"nodes"        "edges"         "faces"        "elems"
High-order     High-order      High-order     High-order
kinematics     MHD             rad. diffusion thermodynamics

(Figure: 8th order Lagrangian simulation of shock triple-point interaction)

§ High-order finite elements on high-order meshes
• increased accuracy for smooth problems
• sub-element modeling for problems with shocks
• HPC utilization, FLOPs/bytes increase with the order
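In MFEM these four spaces map directly onto finite element collections; a minimal sketch (k and dim are assumed integers, and the order offsets follow the usual exact-sequence convention):

```cpp
// Finite element collections for a de Rham complex of order k in dim dimensions.
H1_FECollection h1_fec(k, dim);       // H(grad): continuous "nodal" elements
ND_FECollection nd_fec(k, dim);       // H(curl): Nedelec "edge" elements
RT_FECollection rt_fec(k - 1, dim);   // H(div):  Raviart-Thomas "face" elements
L2_FECollection l2_fec(k - 1, dim);   // L2:      discontinuous "element" dofs
```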

§ Need new (interesting!) R&D for full benefits
• meshing, discretizations, solvers, AMR, UQ, visualization, …

(Figure: Inertial Confinement Fusion)

Modular Finite Element Methods (MFEM)

MFEM is an open-source C++ library for scalable FE research and fast application prototyping

§ Triangular, quadrilateral, tetrahedral and hexahedral elements; volume and surface meshes
§ Arbitrary order curvilinear mesh elements
§ Arbitrary-order H1, H(curl), H(div) and L2 elements
§ Local conforming and non-conforming refinement
§ NURBS geometries and discretizations
§ Bilinear/linear forms for a variety of methods (Galerkin, DG, DPG, Isogeometric, …)
§ Integrated with: hypre, SUNDIALS, PETSc, SUPERLU, PUMI, VisIt, Spack, xSDK, OpenHPC, and more …
§ Parallel and highly performant: mfem.org (v3.4, May 2018)
§ Main component of ECP's co-design Center for Efficient Exascale Discretizations (CEED)
§ Native "in-situ" visualization: GLVis, glvis.org

(Figure: linear, quadratic and cubic finite element spaces on curved meshes)

Example 1 – Laplace equation

§ Mesh
§ Finite element space
§ Initial guess, linear/bilinear forms
§ Linear solve
§ Visualization

§ works for any mesh & any H1 order
§ builds without external dependencies

Example 1 – Laplace equation

§ Mesh
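The code on this slide is shown as an image in the original deck; the snippets below are a sketch that closely follows MFEM's serial ex1.cpp. First, the mesh step (star.mesh is just the example's default input):

```cpp
#include "mfem.hpp"
#include <fstream>
#include <iostream>
using namespace std;
using namespace mfem;

int main(int argc, char *argv[])
{
   // 1. Read the mesh from a file and refine it uniformly a couple of times.
   //    Any mesh MFEM can read and any dimension works.
   Mesh *mesh = new Mesh("star.mesh", 1, 1);
   int dim = mesh->Dimension();
   for (int l = 0; l < 2; l++) { mesh->UniformRefinement(); }
```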

Example 1 – Laplace equation

§ Finite element space
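Continuing the ex1.cpp-style sketch, the finite element space for an arbitrary H1 order:

```cpp
   // 2. Define a continuous H1 finite element space of the chosen order on the
   //    (possibly curved) mesh.
   int order = 3;
   H1_FECollection fec(order, dim);
   FiniteElementSpace fespace(mesh, &fec);
   cout << "Number of unknowns: " << fespace.GetTrueVSize() << endl;
```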

Example 1 – Laplace equation

§ Initial guess, linear/bilinear forms
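Continuing the sketch: right-hand side, zero initial guess, the diffusion bilinear form, and reduction to a linear system with homogeneous Dirichlet conditions eliminated:

```cpp
   // 3. Linear form b (right-hand side) and bilinear form a (Laplace operator).
   ConstantCoefficient one(1.0);
   LinearForm b(&fespace);
   b.AddDomainIntegrator(new DomainLFIntegrator(one));
   b.Assemble();

   GridFunction x(&fespace);
   x = 0.0;                                  // initial guess / boundary values

   BilinearForm a(&fespace);
   a.AddDomainIntegrator(new DiffusionIntegrator(one));
   a.Assemble();

   // Impose homogeneous Dirichlet conditions on all boundary attributes.
   Array<int> ess_bdr(mesh->bdr_attributes.Max()), ess_tdof_list;
   ess_bdr = 1;
   fespace.GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   SparseMatrix A;
   Vector B, X;
   a.FormLinearSystem(ess_tdof_list, x, b, A, X, B);
```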

Example 1 – Laplace equation

§ Linear solve

§ Visualization
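Finishing the serial sketch: a preconditioned CG solve and output files that GLVis can display:

```cpp
   // 4. Solve A X = B with CG preconditioned by a Gauss-Seidel smoother, then
   //    recover the finite element solution x.
   GSSmoother M(A);
   PCG(A, M, B, X, 1, 200, 1e-12, 0.0);
   a.RecoverFEMSolution(X, b, x);

   // 5. Save the refined mesh and the solution; view them with
   //    "glvis -m refined.mesh -g sol.gf".
   ofstream mesh_ofs("refined.mesh");
   mesh->Print(mesh_ofs);
   ofstream sol_ofs("sol.gf");
   x.Save(sol_ofs);

   delete mesh;
   return 0;
}
```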

Example 1 – parallel Laplace equation

§ Parallel mesh
§ Parallel finite element space
§ Parallel initial guess, linear/bilinear forms
§ Parallel linear solve with AMG
§ Visualization
§ highly scalable with minimal changes
§ build depends on hypre and METIS

MPI-based parallel finite elements in MFEM (first parallel layer: CPU/MPI domain decomposition)

Parallel data decomposition (as in BLAST): each CPU is assigned a subdomain consisting of a number of zones. MFEM handles the translation between local finite element bilinear forms / grid functions and global parallel matrices / vectors, using just a few MPI calls (MPI_Bcast and MPI_Allreduce).

Parallel mesh: (1) parallel mesh splitting (domain decomposition using METIS); (2) parallel mesh refinement.

Parallel finite element space: (1) find shared degrees of freedom (dofs); (2) form groups of dofs and assign ownership; (3) build a parallel Boolean matrix P : truedofs ↦ dofs identifying each dof with a master (true) dof. We use the ParCSR format in the hypre library for parallel matrix storage.

Parallel stiffness matrix and load vector (parallel assembly):

  A = P^T a P,   B = P^T b,   x = P X
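The parallel version mirrors MFEM's ex1p.cpp: the same structure with Par* classes and hypre's BoomerAMG-preconditioned CG. A condensed sketch of only the lines that change relative to the serial sketch above (fec, one and ess_bdr as before; in ex1p the MPI_Init call is the first statement in main):

```cpp
   // Parallel mesh: split the serial mesh across MPI ranks (METIS partitioning
   // under the hood); the serial mesh is then no longer needed.
   MPI_Init(&argc, &argv);
   ParMesh *pmesh = new ParMesh(MPI_COMM_WORLD, *mesh);
   delete mesh;
   pmesh->UniformRefinement();

   // Parallel finite element space, forms and grid function.
   ParFiniteElementSpace fespace(pmesh, &fec);
   ParLinearForm b(&fespace);
   b.AddDomainIntegrator(new DomainLFIntegrator(one));
   b.Assemble();
   ParGridFunction x(&fespace);
   x = 0.0;
   ParBilinearForm a(&fespace);
   a.AddDomainIntegrator(new DiffusionIntegrator(one));
   a.Assemble();

   Array<int> ess_tdof_list;
   fespace.GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   // Global parallel system in hypre's ParCSR format (A = P^T a P, B = P^T b).
   HypreParMatrix A;
   Vector B, X;
   a.FormLinearSystem(ess_tdof_list, x, b, A, X, B);

   // Parallel linear solve: CG preconditioned with BoomerAMG.
   HypreBoomerAMG amg(A);
   HyprePCG pcg(A);
   pcg.SetTol(1e-12);
   pcg.SetMaxIter(200);
   pcg.SetPrintLevel(2);
   pcg.SetPreconditioner(amg);
   pcg.Mult(B, X);
   a.RecoverFEMSolution(X, b, x);

   delete pmesh;
   MPI_Finalize();
```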

MFEM example codes – mfem.org/examples

High-Order Methods

High-order discretizations pose unique challenges

(Figures: shock triple-point interaction, 4 elements; smooth Rayleigh-Taylor instability, 2 elements)

Unstructured Mesh R&D: Mesh optimization and high-quality interpolation between meshes

We target high-order curved elements + unstructured meshes + moving meshes

(Figures: high-order mesh relaxation by neo-Hookean evolution (Example 10, ALE remesh); DG advection-based interpolation (ALE remap, Example 9, radiation transport))

Unstructured Mesh R&D: Accurate and flexible finite element visualization

Two visualization options for high-order functions on high-order meshes

GLVis: native MFEM lightweight OpenGL visualization tool. VisIt: general data analysis tool, with MFEM support since version 2.9.

BLAST computation on 2nd order tet mesh

glvis.org, visit.llnl.gov

ceed.exascaleproject.org

2 Labs, 5 Universities, 30+ researchers

CEED Bake-off Problem 1 on CPU

• All runs done on BG/Q (for repeatability), 8192 cores in C32 mode. Order p = 1, …, 16; quadrature points q = p + 2.
• BP1 results of MFEM+xlc (left), MFEM+xlc+intrinsics (center), and deal.II+gcc (right) on BG/Q.
• Preliminary results – paper in preparation.
• Cooperation/collaboration is what makes the bake-offs rewarding.

CEED Bake-off Kernel 5 on GPU

• BK5 – the BP5 kernel, just the local (unassembled) matvec with E-vectors
• OCCA-based kernels with a lot of sophisticated tuning
• > 2 TFLOPS on a single V100 GPU

Laghos miniapp – Initial GPU runs on Sierra

Sierra system:
Nodes: 4,320
POWER9 procs. per node: 2
GV100 (Volta) GPUs per node: 4
Node peak: 29.1 TFLOP/s
System peak: 125 PFLOP/s
Peak power: ~12 MW

Unstructured AMR

MFEM's unstructured AMR infrastructure

Adaptive mesh refinement on library level:
– Conforming local refinement on simplex meshes
– Non-conforming refinement for quad/hex meshes
– h-refinement with fixed p

General approach (see the sketch below):
– any high-order finite element space, H1, H(curl), H(div), …, on any high-order curved mesh
– 2D and 3D
– arbitrary order hanging nodes
– anisotropic refinement
– derefinement
– serial and parallel, including parallel load balancing
– independent of the physics (easy to incorporate in applications)

(Figures: Example 15; Shaper miniapp)
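A minimal sketch of what this looks like at the library level, assuming an existing Mesh `mesh`, FiniteElementSpace `fespace`, GridFunction `x`, and a per-element error indicator Vector `error` (MFEM also ships estimator/refiner classes, e.g. in Example 15):

```cpp
   // Local non-conforming refinement driven by a per-element error indicator.
   mesh.EnsureNCMesh();                    // allow hanging nodes on quads/hexes

   Array<int> marked;
   for (int e = 0; e < mesh.GetNE(); e++)
   {
      if (error(e) > 0.7 * error.Max()) { marked.Append(e); }
   }
   mesh.GeneralRefinement(marked);         // refine only the marked elements

   fespace.Update();                       // follow the mesh change
   x.Update();                             // interpolate the solution to the new mesh
```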

AMR = smaller error for same number of unknowns

(Figure: error vs. number of unknowns for uniform refinement at 1st, 2nd, 4th and 8th order, compared with 1st, 2nd, 4th and 8th order AMR)

Anisotropic adaptation to shock-like fields in 2D & 3D

Static parallel refinement, Lagrangian Sedov problem

8 cores, random non-conforming ref. 4096 cores, random non-conforming ref.

Shock propagates through non-conforming zones without imprinting

Parallel dynamic AMR, Lagrangian Sedov problem

Adaptive, viscosity-based refinement and derefinement, 2nd order Lagrangian Sedov. Parallel load balancing based on space-filling curve partitioning, 16 cores.

Application to electromagnetic problems

Magnetostatics in an angled pipe with anisotropic AMR (similar to Example 3). A simple ZZ-like anisotropic estimator works well for high-order H1 and H(curl) problems.

Solvers

H(curl) diffusion is difficult to solve

EM diffusion modeled by second order definite Maxwell

Challenging due to the large "near-nullspace" of the curl-curl operator:
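The equations themselves are not reproduced in the extracted text; for reference, the standard second-order definite Maxwell problem and the near-nullspace in question are:

```latex
% EM diffusion: second-order definite Maxwell problem for the electric field E
\nabla \times (\alpha\, \nabla \times \mathbf{E}) + \beta\, \mathbf{E} = \mathbf{f},
\qquad \alpha, \beta > 0.
% The curl-curl term annihilates every gradient field,
%   \nabla \times (\nabla \varphi) = 0,
% so for small \beta the gradients form a large "near-nullspace" that point
% smoothers and standard AMG cannot damp efficiently.
```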

The Auxiliary-space Maxwell Solver (AMS) achieves scalability by reduction to the nodal subspaces

AMS a top 10 breakthrough in 2008 DOE/SC report

AMS combines a point smoother for the original H(curl) matrix with AMG solvers for the auxiliary nodal subspace problems.
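In MFEM, AMS is available through the hypre integration; a sketch in the style of MFEM's ex3p.cpp, assuming an assembled parallel H(curl) matrix A, vectors B and X, and a Nédélec ParFiniteElementSpace nd_fespace:

```cpp
   // Auxiliary-space Maxwell Solver (AMS) as a preconditioner for CG.
   // MFEM derives the required discrete gradient and nodal vertex information
   // from the given edge (Nedelec) finite element space.
   HypreAMS ams(A, &nd_fespace);
   HyprePCG pcg(A);
   pcg.SetTol(1e-12);
   pcg.SetMaxIter(500);
   pcg.SetPrintLevel(2);
   pcg.SetPreconditioner(ams);
   pcg.Mult(B, X);
```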

AMS has been highly scalable for low-order discretizations on conforming meshes

AMS implementation in hypre requires minimal additional fine-grid information, which MFEM can generate and pass automatically.

(Figure: AMS setup and solve times in seconds vs. number of processors for a flux compression generator problem)

• Significantly outperforms previous solvers: up to 25x speedup
• Scaled up to 125K cores (12B unknowns)
• Solves previously intractable problems

We recently applied AMS to high-order non-conforming AMR geo-electromagnetics

• In the high-order case, G_h and Π_h are more complicated and depend on the choice of H1 and H(curl) bases
• MFEM has general objects to evaluate these in parallel (see the sketch below)
• Discrete gradient in the non-conforming case is defined by commutativity: P_V G_{c,h} = G_{nc,h} P_S, i.e. G_{c,h} = R_V G_{nc,h} P_S
• Magnetotelluric application: ∇×∇×E + iαE
• 2nd order AMR for a Kronotsky volcano model with ~12K-170K cells
• AMS iterations (N_iter, N̄_iter^CG) for 1e-2 tolerance:

  Refinement step    # DoFs       N_iter    N̄_iter^CG
  0                  647 544      22        3
  6                  786 436      22        3
  12                 2 044 156    23        3
  18                 9 291 488    24        4

Table 5: Numerical results for the model shown in Figure 3 and 18 refinement steps. The number of outer FGMRES iterations and the average number of inner CG iterations are denoted by N_iter and N̄_iter^CG, respectively.
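One example of those "general objects" is the parallel discrete interpolation operator; a sketch of assembling a discrete gradient between an H1 space and a Nédélec space, assuming existing ParFiniteElementSpace objects h1_fespace and nd_fespace:

```cpp
   // Discrete gradient G : nodal H1 space -> Nedelec H(curl) space.
   ParDiscreteLinearOperator grad(&h1_fespace, &nd_fespace);
   grad.AddDomainInterpolator(new GradientInterpolator);
   grad.Assemble();
   grad.Finalize();
   HypreParMatrix *G = grad.ParallelAssemble();   // global true-dof matrix
```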

Excerpt from the corresponding paper (construction of the conforming discrete gradient and interpolation):

These computations can be parallelized and automated in general finite element libraries, such as (Bangerth et al., 2011; MFEM, 2014). We now consider the general setting of high-order elements on a non-conforming mesh. In this case, each finite element space has two versions: a conforming one, e.g. V_c(T), where the hanging DoFs are constrained by the conforming (real) DoFs, and a non-conforming one, e.g. V_nc(T), where the non-conforming DoFs (hanging and real) are unconstrained. The matrix representation, P_V, of the embedding operator E ∈ V_c(T) ↦ E ∈ V_nc(T) is precisely the conforming interpolation, see the section on non-conforming meshes. One can similarly define the matrices P_S and P_𝐒 representing the conforming interpolation between the nodal spaces S_c(T), 𝐒_c(T) and S_nc(T), 𝐒_nc(T), respectively. We also introduce the matrix representations, R_V, R_S and R_𝐒, of the restriction operators from the non-conforming to the conforming space, which simply ignore the non-conforming components of their input. Note that with these definitions, R_V P_V, R_S P_S and R_𝐒 P_𝐒 are just the identity matrices on the conforming DoFs.

As mentioned in the non-conforming meshes section, the definite Maxwell matrix B in (16) is defined on the space V_c(T), i.e. B = B_c = P_V^T B_nc P_V, where B_nc is the V_nc(T) version of B, assembled without interpolatory DoF constraints. Therefore, in order to apply AMS to B, we need to construct a "conforming discrete gradient" matrix G_{c,h} that corresponds to the gradient mapping from S_c(T) to V_c(T). To obtain an appropriate definition for G_{c,h}, we consider the diagram

    S_c(T)  --- G_{c,h} --->  V_c(T)
    P_S ↓ ↑ R_S               P_V ↓ ↑ R_V
    S_nc(T) --- G_{nc,h} ---> V_nc(T)

where G_{nc,h} is the matrix representation of the gradient mapping φ_nc ∈ S_nc(T) ↦ ∇φ_nc ∈ V_nc(T), which can be computed element-by-element, as before, without imposing any constraints on the hanging DoFs. We now require that this diagram commutes from S_c(T) to V_nc(T), i.e.

    P_V G_{c,h} = G_{nc,h} P_S .

This means that G_{c,h} and G_{nc,h} are compatible when computing the gradient of a conforming nodal function. Since R_V P_V = I, this implies the conforming discrete gradient definition

    G_{c,h} = R_V G_{nc,h} P_S ,

which is what we pass to AMS. Note that by commutativity, the variational auxiliary space matrix for G_{c,h} is just the constrained version of the variational auxiliary space matrix for G_{nc,h}:

    G_{c,h}^T B_c G_{c,h} = G_{c,h}^T P_V^T B_nc P_V G_{c,h} = P_S^T (G_{nc,h}^T B_nc G_{nc,h}) P_S .

Similar considerations imply that the conforming Nédélec interpolation matrix should be defined as

    Π_{c,h} = R_V Π_{nc,h} P_𝐒 .

In the high-order case, Π_{nc,h} and consequently Π_{c,h} can be computed element-wise by evaluating the Nédélec DoFs of the nodal vector basis functions as discussed above. We note, however, that for low-order elements on quadrilateral/hexahedral meshes, the matrix Π_{c,h} can still be computed based only on G_{c,h} and the conforming vertex coordinates as in (17). Indeed, the entries of R_V and P_S are positive in this case, so

    |G_{c,h}| = R_V |G_{nc,h}| P_S .

Furthermore, the conformity of the mesh implies G_{nc,h} x_nc = G_{nc,h} P_S x_c = P_V G_{c,h} x_c, so R_V D_{G_{nc,h} x_nc} = D_{G_{c,h} x_c} R_V.

Applications

Application to high-order ALE shock hydrodynamics

hypre: scalable linear solvers library – www.llnl.gov/casc/hypre
MFEM: modular finite element methods library – mfem.org
BLAST: high-order ALE shock hydrodynamics research code – www.llnl.gov/casc/blast

§ hypre provides scalable algebraic multigrid solvers

§ MFEM provides finite element discretization abstractions • uses hypre’s parallel data structures, provides finite element info to solvers

§ BLAST solves the equations using a high-order ALE framework • combines and extends MFEM’s objects

BLAST models shock hydrodynamics using high-order FEM in both Lagrangian and Remap phases of ALE

Lagrange phase: physical time evolution, based on physical motion. Remap phase: pseudo-time evolution, based on mesh motion.

(Figure: mesh evolution at t = 0, 1.5, 3.0 during the Lagrange phase and τ = 0, 0.5, 1 during the remap phase)

Lagrangian phase (c = 0):
  Momentum conservation:  ρ dv/dt = ∇·σ
  Mass conservation:      dρ/dt = −ρ ∇·v
  Energy conservation:    ρ de/dt = σ : ∇v
  Equation of motion:     dx/dt = v

Advection (remap) phase (c = v_m):
  Momentum conservation:  d(ρv)/dτ = v_m · ∇(ρv)
  Mass conservation:      dρ/dτ = v_m · ∇ρ
  Energy conservation:    d(ρe)/dτ = v_m · ∇(ρe)
  Mesh velocity:          v_m = dx/dτ

Kinematic variables: continuous Galerkin FEM, Gauss-Lobatto basis. Thermodynamic variables: discontinuous Galerkin, Bernstein basis.

High-order finite elements lead to more accurate, robust and reliable hydrodynamic simulations

Symmetry in 3D implosion

Symmetry in Sedov blast

Robustness in Lagrangian axisymmetric shock triple-point interaction. Parallel ALE for Q4 Rayleigh-Taylor instability (256 cores).

High-order finite elements have excellent strong scalability

(Figures: strong scaling with p-refinement (SGH, 2D, ~600 dofs/zone, 256 cores, 1 zone/core) – more FLOPs, same runtime; strong scaling with fixed #dofs (256K DOFs) with finite element partial assembly – FLOPs increase faster than runtime)

34 Petra-M: Physics equation translator with MFEM

• Integrated FEM analysis environment using MFEM as the finite element library
• Developed at MIT in close collaboration with the MFEM team at LLNL
– Can be used to solve many different systems and geometries
• Recent models solved: Lower Hybrid scattering, core-to-edge ICRF, ICRF far-field sheath modeling

• Workflow management using πScope – http://piscope.psfc.mit.edu

πScope: Model setup interface & workflow management software

Python data analysis environment:
– Shell/Editor/Debugger
– Interactive plotting

References: Shiraiwa et al., EPJ Web of Conferences 157, 03048 (2017); Shiraiwa, invited talk at 59th APS-DPP (2017); Shiraiwa et al., Nucl. Fusion 57, 086048 (2017).

Core-edge integrated solution for C-Mod field-aligned ICRF antenna

(Figure panels: Jsurf, Enorm)

• Wave propagates smoothly from the antenna to the core
• Surface currents indicate the phasing is not exactly 0-π-0-π

Far-SOL MFEM-based solver on non-field-aligned mesh is being investigated

• High order: implemented NIMROD benchmarks with MFEM
– Anisotropic transport problem with analytic solution to assess error (numerical diffusion)
• Tested Cartesian (non-aligned) grid using MFEM
– Demonstrates acceptable error at challenging anisotropy levels
• Flux-based numerical approach results in an order-of-magnitude improvement

(Figure: MFEM vs. NIMROD results for polynomial degrees 3, 4 and 5)

MFEM: free, lightweight, scalable C++ library for finite element methods. Supports arbitrary high order discretizations and meshes for a wide variety of applications.

▪ Flexible discretizations on unstructured grids
— Triangular, quadrilateral, tetrahedral and hexahedral meshes.
— Local conforming and non-conforming refinement.
— High-order mesh optimization.
— Bilinear/linear forms for a variety of methods: Galerkin, DG, DPG, …

▪ High-order methods and scalability
— Arbitrary-order H1, H(curl), H(div) and L2 elements. Arbitrary order curvilinear meshes.
— MPI scalable to millions of cores.
— Enables application development on a wide variety of platforms: from laptops to exascale machines.

▪ Solvers and preconditioners
— Integrated with: HYPRE, SUNDIALS, PETSc, SUPERLU, …
— Auxiliary-space AMG preconditioners for the full de Rham complex.

▪ Open-source software
— Open-source (GitHub) with thousands of downloads/year worldwide.
— Part of FASTMath, ECP/CEED, xSDK, OpenHPC, …

(Figures: high order curved elements; parallel non-conforming AMR; surface meshes; compressible flow ALE simulations; heart modelling)

http://mfem.org

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-757077

Disclaimer This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.