Computational Mechanics manuscript No. (will be inserted by the editor)

Scalable parallel implementation of CISAMR: A non-iterative mesh generation algorithm

Bowen Liang · Anand Nagarajan · Soheil Soghrati

Received: date / Accepted: date

Abstract We present the parallel implementation of a non-iterative mesh generation algorithm, named Conforming to Interface Structured Adaptive Mesh Refinement (CISAMR). The partitioning phase is tightly integrated with a microstructure reconstruction algorithm to determine the optimized arrangement of partitions based on the shapes/sizes of particles. Each processor then creates a structured sub-mesh with one layer of ghost elements for its designated partition. The h-adaptivity and r-adaptivity phases of the CISAMR algorithm are also carried out independently in each sub-mesh. Processors then communicate to merge mesh/hanging nodes along the faces shared between neighboring partitions. The final mesh is constructed by performing the face-swapping and element subdivision phases, after which a minimal communication phase is required in 3D CISAMR to merge new nodes created on partition boundaries. Several example problems, together with scalability tests demonstrating a super-linear speedup, are provided to show the application of parallel CISAMR for generating massive conforming meshes.

Keywords Parallel mesh generation, Finite element, Scalability, CISAMR, Heterogeneous materials

Bowen Liang
Department of Mechanical and Aerospace Engineering
The Ohio State University
201 West 19th Avenue, Columbus, OH

Anand Nagarajan
Department of Mechanical and Aerospace Engineering
The Ohio State University
201 West 19th Avenue, Columbus, OH

Soheil Soghrati (corresponding author)
Department of Mechanical and Aerospace Engineering
Department of Materials Science and Engineering
The Ohio State University
201 W. 19th Avenue, Columbus, OH
E-mail: [email protected]

1 Introduction

The availability of high-performance parallel computing resources has enabled scientists and engineers to simulate a variety of physical phenomena using large-scale models with unprecedented geometrical details [1,2]. To perform such computationally expensive simulations using the finite element method (FEM), the typical workflow begins with constructing the geometrical model and discretizing it into an appropriate conforming mesh on a workstation [2]. Next, mesh files are transferred to a supercomputer to perform mesh partitioning [3] and subsequently approximate the governing partial differential equations (PDEs) using a parallel solver [4]. However, as such simulations grow beyond tens of millions of degrees of freedom (DOFs), the initial phase of constructing a massive unstructured mesh could alone be a highly computationally demanding task that involves solving billions of nonlinear geometrical equations. Besides the high computational cost associated with this process, it may not even be feasible to generate such meshes sequentially due to memory limitations [5]. Therefore, significant research effort [6,7,8] has been dedicated to developing parallel mesh generation algorithms that satisfy four major objectives [8,9,10]: (i) stability, to enable constructing high-quality meshes for a variety of geometrical models; (ii) efficient domain decomposition, to achieve load balancing and reduce the pre-processing overhead; (iii) code re-use, to allow utilizing a pre-existing optimized sequential mesh generation code; and (iv) scalability, to achieve a linear speedup for large-scale problems.

In order to create massive meshes while satisfying the criteria enumerated above, parallel mesh generation algorithms often have a sequential pre-processing phase for partitioning the entire domain into optimized sub-regions [2]. Two objectives of this partitioning phase are to achieve balanced load measures for all processors and to minimize the total area of shared interfaces between them [11,12]. Several domain decomposition techniques have been introduced, which can be categorized into discrete and continuous methods [10]. The former begins with generating an initial coarse background mesh that conforms to domain boundaries using a sequential mesh generator. Graph partitioning algorithms [13,14,15] are then employed to decompose vertices of the corresponding graph structure into a number of similarly sized partitions while simultaneously minimizing the number of connecting edges. Popular open-source graph partitioning libraries such as Metis/Parallel Metis [3] and Chaco [16] can be used for this purpose. In continuous domain decomposition methods, on the other hand, the original domain is directly partitioned using quadtree/octree [11,12] or medial axis [17] techniques. In addition to avoiding the overhead associated with creating an initial background mesh and the corresponding graph data structure, continuous methods enable better code re-use by applying the sequential meshing code to each partition independently [18]. However, these advantages come at the price of generating polyhedral surfaces, which can deteriorate the convergence rate of the meshing algorithm, as well as the quality of the resulting mesh [10].

After partitioning the domain, several robust algorithms can be utilized to build a massive conforming mesh in parallel, among which we can mention Delaunay based techniques [17], advancing front methods [19,20], and edge subdivision methods [21]. In the Delaunay triangulation algorithm, a nonlinear system of equations is iteratively solved and new mesh nodes are created/relocated until a set of constraints on the mesh quality and element size is satisfied [22,23]. A parallel 3D implementation of this algorithm is introduced in [17], which achieves a linear speedup using data-parallel architectures and by expanding open faces via a bucketing technique. By combining boundary merging with an incremental construction algorithm, the independent parallel near (IPNDT) method [24] allows efficient mesh generation with a low overhead. Two fully decoupled parallel Delaunay methods are introduced in [18,25], which utilize a continuous domain decomposition approach, together with a pre-processing interface zone for mesh generation. The parallel sparse Voronoi refinement (SVR) algorithm [7] yields a near-linear speedup and high performance on shared memory machines.

In the advancing front method, new elements are progressively inserted to discretize the entire domain by applying certain geometrical constraints on active fronts [26]. A parallel version of this algorithm is introduced in [19], where all interior sub-regions are first meshed independently and then a buffer zone is utilized to synchronize corresponding meshes along shared interfaces of neighboring partitions. Using a discrete domain decomposition approach and surface mesh generation as pre-processing phases, a parallel advancing front algorithm is presented in [5] that generates the mesh for each sub-region independently. In the last step of this method, an iterative smoothing algorithm is applied to reconstruct elements near shared interfaces to improve the mesh quality. In edge subdivision based algorithms [27,28], a triangle (2D) or a tetrahedron (3D) is bisected by its longest-edge midpoint and vertices to generate a conforming mesh. In their parallel versions introduced in [29,30], a number of subdivision templates are employed to decouple the refinement process on different processors and minimize the communication cost. The terminal-edge bisection method [8] is an inherently decoupled algorithm and thus scalable in the parallel implementation. A critical review of advantages and limitations of different parallel mesh generation algorithms is provided in [10].

The majority of existing algorithms for parallel mesh generation [5,20,31], including those discussed above, require an iterative smoothing/optimization phase involving edge/face swapping [32], removal of bad elements, or Laplacian smoothing [33] to improve the quality of elements. This iterative phase could be time-consuming and in some cases difficult to converge due to the geometrical complexity, especially for the construction of massive meshes [10]. This often leads to a tightly coupled problem near partition interfaces, which could cause slow convergence, excessive communication cost, and even inability to build the final mesh. Recently, Soghrati et al. [34,35] have introduced the Conforming to Interface Structured Adaptive Mesh Refinement (CISAMR) algorithm, which enables the non-iterative construction of high-quality conforming meshes. In this approach, an initial structured mesh is transformed into a conforming mesh with proper element shapes and aspect ratios in four consecutive steps: (i) h-adaptive refinement in the vicinity of material interfaces; (ii) r-adaptivity to relocate selected nodes; (iii) non-iterative face-swapping to eliminate highly distorted tetrahedrons; and (iv) subdivision of remaining nonconforming elements. A more comprehensive overview of the sequential CISAMR algorithm is provided in Section 2.
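The four consecutive steps above can be summarized in a short driver sketch. The phase functions below are hypothetical placeholders for the operations described in [34,35], not the authors' implementation; the sketch only emphasizes that each phase runs exactly once, in a fixed order, with no convergence loop:

```python
# Hedged sketch of the non-iterative CISAMR pipeline: every phase is a
# hypothetical stand-in that tags the mesh, to show there is no iteration.

def h_adaptive_refinement(mesh, interfaces):
    # (i) refine background elements near material interfaces (SAMR)
    return mesh + ["samr"]

def r_adaptivity(mesh, interfaces):
    # (ii) snap selected nodes onto the material interfaces
    return mesh + ["r-adapt"]

def face_swapping(mesh):
    # (iii) 3D only: eliminate sliver/cap tetrahedrons by swapping faces
    return mesh + ["face-swap"]

def subdivide_nonconforming(mesh, interfaces):
    # (iv) cut remaining nonconforming elements / elements with hanging nodes
    return mesh + ["subdivide"]

def cisamr(background_mesh, interfaces, dim=3):
    mesh = h_adaptive_refinement(background_mesh, interfaces)
    mesh = r_adaptivity(mesh, interfaces)
    if dim == 3:
        mesh = face_swapping(mesh)
    return subdivide_nonconforming(mesh, interfaces)

print(cisamr([], interfaces=None, dim=3))
# -> ['samr', 'r-adapt', 'face-swap', 'subdivide']
```

Because no phase feeds back into an earlier one, the cost of each phase can be estimated independently, which is what makes the decoupled parallel implementation described later possible.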

In this manuscript, we present a scalable parallel implementation of CISAMR and demonstrate its application for the construction of massive conforming meshes for materials with intricate microstructures. In order to eliminate the overhead associated with building the background mesh and a graph structure, a new quadtree/octree-based continuous domain partitioning algorithm is introduced, which yields load-balanced sub-domains. Each processor independently creates an initial structured mesh for its designated sub-region with one layer of ghost elements, which significantly reduces the coupling (and thus communication cost) between neighboring partitions. The h-adaptive and r-adaptive refinement phases of CISAMR are then carried out for each partition independently, i.e., without any communication between processors. Communication only happens after the completion of these phases, to merge resulting mesh nodes and hanging nodes along partition interfaces. The subsequent face-swapping (only needed in 3D) and subdividing phases, as well as the removal of ghost elements, also occur independently for each partition to construct the final conforming mesh. In the 3D implementation, an additional minor communication phase is required between processors after subdividing elements to merge new nodes generated at intersection points of element edges and material interfaces.

The remainder of this manuscript is structured as follows: In Section 2 we provide a brief overview of the sequential CISAMR algorithm. We then introduce a new partitioning algorithm in Section 3, which is tightly integrated with a virtual microstructure reconstruction algorithm for the parallelization of this method. The parallel CISAMR algorithm is described in Section 4, where we also discuss implementation details such as imposing the periodic boundary condition (PBC) and provide proper pseudo-codes. Several 2D and 3D example problems, each of which is accompanied by scalability tests, are presented in Section 5. Final concluding remarks are summarized in Section 6.

2 Sequential CISAMR algorithm

For completeness, in this section we provide a brief overview of the (sequential) CISAMR algorithm [34,35], which is the basis of the parallel mesh generation algorithm presented in this manuscript. This non-iterative algorithm transforms a structured background mesh composed of either quadrilateral elements in 2D or tetrahedral elements in 3D into an adaptively-refined conforming mesh. As schematically shown in Figure 1, this process involves the use of customized versions of h-adaptivity, r-adaptivity, face-swapping, and element subdividing algorithms. Each step is described in more detail next.

◦ h-adaptive refinement: In order to reduce the geometric discretization error and, more importantly, to obtain a more accurate approximation of gradients (e.g., stress concentrations), a Structured Adaptive Mesh Refinement (SAMR) algorithm is applied to background elements in the vicinity of material interfaces. Given the fact that in 3D CISAMR each cubic block of the background mesh is composed of four orthocentric and one regular tetrahedrons, different subdivision patterns are applied to each type of tetrahedron in this recursive SAMR algorithm to maintain the same arrangement of elements after refinement (see [35] for details). As shown in Figure 1b, in addition to elements cut by material interfaces, a selected number of their neighboring elements are also subjected to refinement to satisfy a set of constraints associated with this phase. For example, no background element can be cut by more than one interface or hold more than one hanging node on any of its edges [35]. Besides satisfying these constraints, an a priori number of refinement levels is assigned to each interface based on its curvature or the accuracy needed in the recovery of gradients. Note that during the SAMR phase, each background tetrahedron is subjected to refinement when either it is cut by an interface or requested by its neighboring element (which intersects the interface) to satisfy the aforementioned constraints. We use this feature in the parallel implementation of CISAMR to minimize the coupling between neighboring partitions, as background elements cut by the interface only need to communicate with their neighboring tetrahedrons during the SAMR phase.

◦ r-adaptivity: Next, an r-adaptivity algorithm is applied to nodes of orthocentric tetrahedrons intersecting material interfaces, during which they either maintain their location or snap to the interface along one of the edges of such elements. A unique feature of this non-iterative relocation pattern is that the new position of a node only depends on its distance to the interface. During the parallel implementation, this means that the r-adaptivity phase can be applied to each node independently in each partition, i.e., without any communication between processors. As illustrated in Figure 1c, after the completion of this phase, ≈ 50% of the elements originally cut by the interface are transformed into conforming tetrahedrons (remaining nonconforming tetrahedrons are shown in black).

◦ Face-swapping (3D): Certain relocation patterns during the r-adaptivity phase of 3D CISAMR could lead to the formation of sliver or cap shaped tetrahedrons with exceedingly high aspect ratios. Clearly, the presence of such poor-quality elements in the final con-


Fig. 1: Process of constructing a conforming mesh using CISAMR: (a) initial structured background mesh and morphologies of two embedded particles; (b) adaptively refined mesh after performing SAMR along material interfaces; (c) deformed mesh after performing r-adaptivity and face-swapping; (d) final mesh after the completion of the sub-tetrahedralization phase

forming mesh could be detrimental to the accuracy of the corresponding FE approximation, in particular during the recovery of gradients along material interfaces. Given the fixed arrangement of elements in the initial structured mesh, sliver/cap-shaped elements can easily be eliminated via a non-iterative face-swapping algorithm. For this purpose, one must only visit each cubic block (composed of five tetrahedral elements) that includes a sliver or cap shaped tetrahedron, perform the face-swapping, and introduce an internal cut to create seven new tetrahedrons, all with proper aspect ratios (see [35] for more details). This algorithm is designed such that it does not disturb the conformity of element edges in the face-swapped block with those of neighboring blocks. It must be noted that sliver-shaped tetrahedrons always appear as a pair of two elements in two neighboring blocks. Thereby, element connectivities are only affected in pairs of neighboring blocks originally containing such poor-quality elements after face-swapping, which is another key feature of CISAMR that minimizes the communication cost in the parallel implementation.
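A standard way to flag such sliver/cap elements is a tetrahedron aspect-ratio measure. The sketch below is an illustrative quality metric, not necessarily the exact criterion used in CISAMR: it normalizes the longest edge by the inradius so that a regular tetrahedron scores exactly 1 and degenerate shapes score much higher.

```python
import math

def tet_aspect_ratio(p0, p1, p2, p3):
    """Longest edge / (2*sqrt(6) * inradius); equals 1 for a regular
    tetrahedron and grows without bound for sliver/cap shaped elements."""
    pts = [p0, p1, p2, p3]
    def sub(a, b):   return [a[i] - b[i] for i in range(3)]
    def cross(a, b): return [a[1]*b[2] - a[2]*b[1],
                             a[2]*b[0] - a[0]*b[2],
                             a[0]*b[1] - a[1]*b[0]]
    def dot(a, b):   return sum(x * y for x, y in zip(a, b))
    def norm(a):     return math.sqrt(dot(a, a))
    l_max = max(norm(sub(pts[i], pts[j]))
                for i in range(4) for j in range(i + 1, 4))
    # volume from the scalar triple product, total area from the 4 faces
    volume = abs(dot(sub(p1, p0), cross(sub(p2, p0), sub(p3, p0)))) / 6.0
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    area = sum(0.5 * norm(cross(sub(pts[b], pts[a]), sub(pts[c], pts[a])))
               for a, b, c in faces)
    r_in = 3.0 * volume / area          # inradius of the tetrahedron
    return l_max / (2.0 * math.sqrt(6.0) * r_in)

def is_sliver_or_cap(p0, p1, p2, p3, limit=5.0):
    # CISAMR guarantees final aspect ratios <= 5 in 3D [34,35]; elements
    # exceeding such a limit are candidates for face-swapping.
    return tet_aspect_ratio(p0, p1, p2, p3) > limit

print(round(tet_aspect_ratio((1, 1, 1), (1, -1, -1),
                             (-1, 1, -1), (-1, -1, 1)), 6))  # -> 1.0
print(is_sliver_or_cap((0, 0, 0), (1, 0, 0),
                       (0, 1, 0), (0.5, 0.5, 0.01)))         # -> True
```

The threshold of 5 mirrors the 3D aspect-ratio bound quoted for the final CISAMR mesh; any comparable shape-quality measure would serve the same screening purpose.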

◦ Subdividing nonconforming elements: Finally, as shown in Figure 1d, all remaining nonconforming elements (cf. black tetrahedrons in Figure 1c) and elements with hanging nodes (generated during the SAMR phase) are subdivided to create the final conforming mesh. As discussed in [34,35], specific patterns are used for cutting elements during this phase to ensure the lowest possible aspect ratios for the resulting sub-elements after subdivision. In 3D, edge angles of tetrahedron faces and a scalar invariant function based on nodal coordinates are utilized for selecting a cut on such faces to automatically guarantee the conformity of cuts between neighboring elements without additional bookkeeping. This allows performing the subdivision phase by visiting each element independently, which is crucial to minimizing its communication cost in the parallel implementation. It must be noted that CISAMR guarantees that element aspect ratios of the final conforming mesh do not exceed 3 in 2D and 5 in 3D.

3 Partitioning of heterogeneous domains

In this section, we present a new partitioning scheme for parallel CISAMR, which is specifically tuned for meshing materials with heterogeneous microstructures. This algorithm is tightly integrated with a virtual microstructure reconstruction technique recently introduced by the authors in [36]. In this method, a set of hierarchical bounding boxes (BBoxes) is utilized to virtually pack arbitrary-shaped particles in the domain, followed by an optimization phase involving the selective relocation and/or elimination of particles to simulate target statistical microstructure descriptors (e.g., volume fraction and spatial arrangement of inclusions).

In the proposed partitioning scheme, we reutilize the BBox representation of particles (already used in the virtual packing phase of the reconstruction algorithm) to determine the optimized arrangement/size of partitions by balancing their load measures. As schematically shown in Figure 2, three levels of BBoxes are assigned to each particle during the packing process to approximate its morphology. Intersections between BBoxes of a new particle and those of existing particles determine whether it can be added to the microstructure such that none of the particles overlap with one another. To minimize the computational cost, this process begins with checking for intersections using enlarged BBoxes (cf. Figure 2a) to quickly identify the relative locations of existing particles surrounding a new particle. Checking overlaps between primary BBoxes (cf. Figure 2b) and then secondary BBoxes (cf. Figure 2c) is done in the next steps to more accurately determine if the new and existing particles overlap. For more details regarding this virtual packing algorithm refer to [36].

After the virtual reconstruction of the microstructure, a quadtree/octree-based continuous domain over-decomposition algorithm is employed to initiate the domain partitioning for parallel CISAMR mesh generation. The objective is to achieve similar computational costs (load balancing) for each partition during the mesh generation process while minimizing the areas of shared interfaces between them. This can be achieved by solving a multi-objective optimization problem for the domain decomposition, which is defined as [10]: Given a continuous domain Ω = (Ω, ∂Ω), partition Ω into n subsets Ω_1, Ω_2, ..., Ω_n such that

    minimize  { max(l(Ω_i)),  Σ_i ‖∂Ω_i‖ }
    subject to  ∪_i Ω_i = Ω  and  Ω_i ∩ Ω_j = ∅ for i ≠ j.    (1)

In the equation above, the maximum load measure function, max(l(Ω_i)), and the total area of shared partition faces, Σ_i ‖∂Ω_i‖, for all partitions Ω_i are simultaneously minimized. While (1) is a rather straightforward optimization problem, it requires an a priori assessment of the load measure function l(Ω_i) associated with each partition, which could be a challenging task. In CISAMR, conforming elements can be generated for each embedded particle in a heterogeneous microstructure rather independently. Note that refined background elements and hanging nodes caused by a particle could affect the mesh structure in the vicinity of its neighboring particles. Therefore, we must check for such pre-existing refinement effects when performing SAMR for each particle. Although this does not considerably affect the computational time associated with the mesh generation for each individual particle, the search in the background mesh to identify pre-refined elements and existing hanging nodes would be a computationally demanding process.

As discussed in Section 2, the 3D CISAMR algorithm relies on four major phases for transforming background elements into conforming elements along particle interfaces: SAMR, r-adaptivity, face-swapping, and element subdivision. After studying the computational cost associated with each step, we determined that the number of SAMR levels has the most prominent impact on the computational cost of CISAMR in each partition. Note that face-swapping is only needed in a small fraction of elements along material interfaces during the execution of CISAMR; it can thus be overlooked while estimating the load measure function. Perform-


Fig. 2: Schematic process of the BBox-based packing algorithm: (a) creating the hierarchical BBox representation of a particle; (b) checking intersections between the enlarged BBox of a new particle and primary BBoxes of existing particles; (c) checking intersections between secondary BBoxes of new and existing particles to more accurately determine if they overlap

ing the r-adaptivity and element subdivision phases requires visiting all elements cut by the interface; thus their computational cost is linearly proportional to the surface area of each particle. The cost of performing SAMR, on the other hand, does not change linearly with increasing the number of refinement levels, as not only a larger number of elements are affected during this process, but also a number of data structures must be used to keep track of elements being refined, as well as hanging nodes formed on edges of their neighboring elements.

Without going further into implementation details for the SAMR phase (see [35] for a more in-depth discussion), we define the load measure function l(Ω_i) in sub-region Ω_i that includes N particles as

    l(Ω_i) = Σ_{k=1}^{N} R_k^M ‖A_k‖,    (2)

where R_k and ‖A_k‖ are the number of refinement levels and the surface area of particle k, respectively. Also, M determines how increasing the number of refinement levels affects the load measure, which is a function of the number of elements visited for each level of refinement and the numerical cost associated with insertion/search in the data containers used during the SAMR phase (details will be discussed in Section 4.2). Our numerical study revealed that while M = 1 is an appropriate value in 2D, M = 1.5 better reflects the increase in the load measure caused by increasing the number of SAMR levels in 3D.

The proposed quadtree/octree-based continuum domain over-decomposition algorithm relies on solving the multi-objective optimization problem given in (1) to determine the final size and arrangement of partitions. As schematically shown in Figure 3a, as the first step, we reutilize the primary and secondary BBoxes of each particle to, respectively, determine its location with respect to the underlying sub-regions and approximate its surface area. Note that secondary BBoxes can be considered a pixelated (2D) or voxelated (3D) representation of the actual morphology of an inclusion; thus the sum of their diagonal distances is a proper approximation of the surface area. A continuous domain over-decomposition is then carried out by subdividing the domain into m equally sized small rectangular/cuboid shaped blocks Ω_j such that m > n, where n is the targeted number of partitions. In this process, a quadtree (2D) or an octree (3D) data structure is used to iteratively refine the domain into smaller sub-regions and store embedded particles in leaves of the corresponding tree.

The final n partitions are created by solving (1), which first requires evaluating the load measure function l(Ω_j) for all m rectangular/cuboid sub-regions generated after over-decomposing the domain. The process begins by estimating l(Ω_j) based on refinement levels R_k and surface areas ‖A_k‖ of particles using (2). Solving (1) is then a computationally inexpensive task, which reduces the number of sub-regions from m to n by merging adjacent pairs with the least load measure. To better elucidate this process, an example is shown in Figure 3b, where four initial equally-sized rectangular sub-regions (m = 4) are reduced to three (n = 3) by merging the two sub-regions on the right. This optimized arrangement of partitions is achieved assuming the same number of refinement levels across all material interfaces, meaning that the load measures estimated for the initial sub-regions merely depend on the sum of surface areas of particles confined in them.
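Under these definitions, the load estimate (2) and the greedy pairwise merging that reduces the m sub-regions to n partitions can be sketched as follows. The particle tuples and the 1D strip adjacency model are illustrative simplifications of the quadtree/octree neighbor structure, not the authors' data layout:

```python
def load_measure(particles, M=1.5):
    """Eq. (2): l = sum_k R_k^M * ||A_k|| over the particles of a
    sub-region, where R_k is the number of SAMR levels and ||A_k|| the
    surface area of particle k. M = 1 in 2D and M = 1.5 in 3D per the
    numerical study described in the text."""
    return sum((R ** M) * A for R, A in particles)

def merge_subregions(subregions, n, M=1.5):
    """Greedily merge the adjacent pair with the least combined load
    until only n partitions remain. For illustration, sub-regions are
    modeled as a 1D strip, so only index-adjacent pairs may merge."""
    regions = [list(p) for p in subregions]
    while len(regions) > n:
        loads = [load_measure(r, M) for r in regions]
        i = min(range(len(regions) - 1),
                key=lambda k: loads[k] + loads[k + 1])
        regions[i:i + 2] = [regions[i] + regions[i + 1]]
    return regions

# Figure 3b style example: four equal sub-regions with uniform refinement
# (R = 1, so M has no effect); the two lightest neighbors merge to n = 3.
subs = [[(1, 8.0)], [(1, 6.0)], [(1, 2.0)], [(1, 3.0)]]
parts = merge_subregions(subs, n=3, M=1.0)
print([load_measure(p, 1.0) for p in parts])   # -> [8.0, 6.0, 5.0]
```

Raising the refinement level of the particles in one sub-region (larger R_k, amplified by M) shifts which pair has the least combined load, which is exactly the effect illustrated by Figures 3b versus 3c.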


Fig. 3: (a) Using primary and secondary BBoxes of particles to assign particles to each sub-region of an over-decomposed domain and evaluating corresponding load measure functions; (b) optimized arrangement of partitions after merging the two sub-regions on the right, considering the same level of SAMR for all particles; (c) optimized arrangement of partitions after merging the two sub-regions on the left, considering 3 and 1 levels of SAMR for the blue and gray particles, respectively

Algorithm 1 (Quadtree/octree based domain partitioning)

function partition(Ω, n, Sp, Sr)
    m, Ωj ← over_decompose(Ω, n)              ▷ quadtree/octree over-decomposition of domain Ω into m sub-regions Ωj (m > n)
    SBBox ← get_BBox(Sp)                      ▷ create a set of BBoxes representing morphologies of particles in set Sp
    Spj ← map_particle(SBBox, Sp, Ωj)         ▷ map each particle in Sp to each sub-region Ωj overlapping with it
    Lj ← load_measure(Ωj, Spj, Sr)            ▷ use secondary BBoxes of particles Spj overlapping with Ωj and their
                                              ▷ corresponding SAMR levels from Sr to estimate the load measure
    while (m > n) do                          ▷ while the number of sub-regions m exceeds the target number of partitions n
        Ωa, Ωb ← subregion_locator(Lj, Ωj)    ▷ find two neighboring sub-regions with the minimum combined load measure
        Ωj, m ← update_subregions(Ωa, Ωb, Ωj) ▷ update the arrangement and number of sub-regions by merging Ωa and Ωb
    end while
    rankj ← assign_rank(Ωj)                   ▷ assign a rank (processor) to each of the final partitions Ωj
    neighborj ← neighbor_locator(Ωj)          ▷ locate neighboring partitions of each partition and their shared interfaces
    output(Ωj, rankj, neighborj, Spj)         ▷ output size, location, and rank of each partition, its neighbors, and confined particles
end function
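The map_particle step of Algorithm 1 reduces to axis-aligned bounding-box intersection tests between particle BBoxes and the rectangular sub-regions. The following minimal 2D sketch uses our own helper names and box representation, not the authors' implementation:

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test; a box is ((xmin, ymin), (xmax, ymax))."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def map_particles(particle_bboxes, subregions):
    """Assign each particle to every sub-region its BBox overlaps; a
    particle straddling a shared interface maps to several sub-regions,
    which is why its refinement cost is charged to each of them."""
    mapping = {j: [] for j in range(len(subregions))}
    for p, bbox in enumerate(particle_bboxes):
        for j, sub in enumerate(subregions):
            if boxes_overlap(bbox, sub):
                mapping[j].append(p)
    return mapping

# Two unit sub-regions side by side; particle 0 straddles their interface.
subs = [((0.0, 0.0), (1.0, 1.0)), ((1.0, 0.0), (2.0, 1.0))]
parts = [((0.8, 0.2), (1.2, 0.6)), ((1.5, 0.5), (1.9, 0.9))]
print(map_particles(parts, subs))   # -> {0: [0], 1: [0, 1]}
```

The same test, applied first to the cheap enlarged BBoxes and only then to the primary and secondary BBoxes, gives the coarse-to-fine screening described for the packing phase.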

A different case of determining the optimized partitions based on the same 2 × 2 arrangement of initial sub-regions is depicted in Figure 3c, where this time the two sub-regions on the left are merged. While the underlying microstructures in Figures 3b and 3c are identical, the difference is that in the latter, the number of refinement levels assigned to the smaller particles (shown in blue) is three times that assigned to the two larger particles. According to (2), the number of refinement levels affects the load measure in each block, which in turn affects the final arrangement of partitions after performing the optimization. The pseudo-code for the partitioning algorithm presented in this section is provided in Algorithm 1.

Figure 4 shows the virtually reconstructed microstructural model of a particulate composite partitioned into three different partition arrangements (n = 20, 40, and 100) using the algorithm presented above. Note that while the optimization phase of the partitioning algorithm allows the formation of non-rectangular (non-cuboid in 3D) partitions by merging multiple neighboring sub-regions of the over-decomposed domain, here all final partitions are rectangular shaped. This constraint does not allow a perfect load balance between all partitions, but has deliberately been imposed due to three key advantages: (i) reducing the computational cost associated with the partitioning phase, which must be carried out sequentially; (ii) facilitating the parallel construction of the initial structured background mesh for each rectangular/cuboid shaped partition by all processors; and (iii) minimizing the communication cost between neighboring processors during the parallel CISAMR mesh generation process. The last item, which is the key benefit of maintaining rectangular/cuboid shaped parti-

(a) 20 partitions (b) 40 partitions (c) 100 partitions

Fig. 4: Using the proposed partitioning algorithm to subdivide the 2D microstructural model of a particulate composite into various numbers of partitions

tions, is partially achieved by minimizing the lengths/areas of shared interfaces between two neighboring partitions. However, the principal reason becomes evident in Section 4, where we discuss the computational cost associated with merging mesh nodes and mapping hanging nodes along shared interfaces. These advantages outweigh the tradeoff of less balanced partitions, resulting in a lower overall computational time for the parallel construction of massive conforming meshes using CISAMR.

4 Parallel CISAMR algorithm

In this section, we introduce the parallel CISAMR algorithm for generating massive 2D and 3D conforming meshes. To facilitate the discussion, the meshing process is schematically shown in Figure 5 for a simplified 2D microstructure. A detailed explanation of each step of the proposed parallel algorithm is presented next, which allows nearly 100% sequential code re-use with a minimized communication cost between processors.

4.1 Construction of background sub-meshes

The parallel implementation of CISAMR begins with assigning each sub-region of the partitioned domain to a processor, as shown in Figure 5a. For meshing a virtually reconstructed heterogeneous microstructure, each processor separately reads an input file characterizing the morphologies of inclusions confined within or cutting the boundaries of its corresponding partition. Each processor is then responsible for independently creating a structured background mesh with the designated element size for its partition. The rectangular/cuboid constraint imposed on the shape of partitions allows using a simple algorithm for constructing a structured mesh composed of rectangular (2D) or tetrahedral (3D) elements as the starting point in the CISAMR algorithm (cf. Figure 5b). This scalable process is carried out by each processor independently, meaning no communication is needed between them at this stage. It must be noted that, unlike the majority of existing parallel meshing algorithms, both tasks described here do not involve sequentially reading any input file (geometrical model or background mesh), which considerably reduces the overall computational time.

As shown in Figure 5b, the structured sub-mesh constructed for each partition includes a layer of ghost elements along its shared interfaces with adjacent partitions. At the end of the mesh generation process, this additional layer of ghost elements, which extends beyond the dimensions of each partition and overlaps with neighboring partitions, must be eliminated. Although ghost elements may seem redundant and indeed slightly increase the computational cost associated with mesh generation in each processor, they significantly reduce the communication time (≈ 50%) between processors in parallel CISAMR. These advantages will become more evident as we describe subsequent steps of this parallel mesh generation algorithm, including the SAMR and r-adaptivity phases discussed in the following Section 4.2.

4.2 SAMR and r-adaptivity phases

Owing to the presence of ghost elements in background sub-meshes, both the SAMR and r-adaptivity phases of the CISAMR algorithm can be executed independently in each partition (cf. Figure 5c). First, the sequen-

[Figure 5, panels (a)-(d): Partitions #1-#3, ghost layers, and a one-sided hanging node]
Fig. 5: Parallel CISAMR mesh generation process: (a) subdividing the domain into 3 partitions; (b) parallel construction of initial structured sub-meshes with ghost elements for each partition; (c) performing SAMR and r-adaptivity phases, followed by merging/mapping mesh and hanging nodes on shared interfaces of partitions through communication between processors; (d) eliminating ghost elements and subdividing remaining nonconforming elements and elements with hanging nodes to build the final conforming mesh

tial SAMR code is reused by each processor to adaptively refine the background elements of the corresponding partition in the vicinity of material interfaces. The r-adaptivity phase is then carried out by visiting the nodes of all refined elements cut by material interfaces to determine whether they maintain their location or must be snapped to the interface (100% sequential code reuse). Note that during this process, each node is only allowed to relocate along one of the vertical/horizontal element edges connected to it, meaning 4 or 6 directions for 2D quadrilateral or 3D tetrahedral background meshes, respectively. The benefit of using ghost layers in the initial structured sub-meshes is realized at this stage, as we have access to the interface morphology and the mesh structure to correctly determine the relocation direction of each node across the actual boundaries of partitions (cf. yellow nodes in Figure 5c). In other words, all shared nodes along physical interfaces between neighboring partitions experience consistent relocation directions without any communication. The reason is the emergence of identical relocation patterns for boundary nodes due to the fixed arrangement of elements in structured background sub-meshes and the use of ghost elements as a probe across partition boundaries. This consistency also facilitates merging duplicate nodes along the partition boundaries when processors later communicate with one another, which will be discussed in the following section.

It must be noted that the ability to independently execute the SAMR phase in each partition highly accelerates this process and leads to a considerable reduction in the overall mesh generation time.
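As an illustration of the direction-constrained node relocation described above, the following Python sketch snaps a 2D node toward a signed-distance interface along one of the 4 axis-aligned element edges attached to it. All names (`snap_node`, `phi`, the snapping threshold `tol`) and the bisection-based intersection search are our own illustrative assumptions, not the actual CISAMR implementation.

```python
import math

def snap_node(node, phi, h, tol=0.6):
    """Snap a background-mesh node onto the material interface (2D sketch).

    phi is a signed distance function describing the interface. A node may
    only relocate along one of the 4 axis-aligned element edges attached to
    it (edge length h), mirroring the direction constraint described above.
    The edge-fraction threshold tol is an assumption of this sketch.
    """
    x, y = node
    best_t, best_p = None, None
    for dx, dy in ((h, 0.0), (-h, 0.0), (0.0, h), (0.0, -h)):
        f0, f1 = phi(x, y), phi(x + dx, y + dy)
        if f0 == 0.0:
            return node                      # already on the interface
        if f0 * f1 > 0.0:
            continue                         # this edge is not cut
        lo, hi = 0.0, 1.0                    # bisect for the cut point
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if f0 * phi(x + mid * dx, y + mid * dy) <= 0.0:
                hi = mid
            else:
                lo = mid
        t = 0.5 * (lo + hi)
        if t <= tol and (best_t is None or t < best_t):
            best_t, best_p = t, (x + t * dx, y + t * dy)
    return best_p if best_p is not None else node

# Toy interface: circle of radius 1 centered at the origin
circle = lambda px, py: math.hypot(px, py) - 1.0
```

Because the rule is deterministic, applying it to identical ghost-padded element patterns on both sides of a partition boundary is what can guarantee consistent relocations without communication.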
In the numerical examples presented in Section 5, we show that parallelizing this phase in fact leads to a super-linear (more than ideal) speedup with respect to the number of partitions/processors. In order to better understand the reason behind this significant speedup, which is much more considerable than for other steps of parallel CISAMR (i.e., r-adaptivity, face-swap, and element subdivision), it is worth taking a closer look at the implementation details of the SAMR phase.

As the first step of the CISAMR algorithm, performing SAMR requires a pre-processing phase during which one must determine the relative locations of background mesh nodes and elements with respect to each material interface. As noted previously, we use NURBS parametric functions to characterize the morphology of each particle. Therefore, determining the relative location (inside, outside, or on the surface) of each node with respect to its interface, or identifying whether an element edge intersects this interface, requires solving a nonlinear geometrical problem. For a massive problem such as a heterogeneous microstructural model composed of thousands of particles, which would also require a huge initial background mesh for discretizing the domain, this process could be highly computationally demanding. Further, in order to avoid redoing this process, one must store the relative locations of nodes/elements in an appropriate data structure (e.g., a red-black tree) to later access them in subsequent steps of the CISAMR algorithm. Although we employ efficient techniques such as a quadtree/octree search to quickly identify background nodes/elements in the vicinity of a particle and discard those far away, the insertion/search within this data container becomes computationally expensive as its size grows. Parallelization of the SAMR phase significantly reduces the cost of the insertion/search process and the size of the data containers, as a limited number of particles are assigned to each partition and no communication is needed with neighboring partitions.

An equally important aspect that slows down the SAMR phase when implemented sequentially is the need to constantly check whether underlying elements were previously refined by neighboring particles when interacting a new particle with the background mesh. In order to facilitate the discussion, we refer back to the adaptively refined background mesh in the vicinity of the two particles shown in Figure 1b. Assume that the red particle is first interacted with the mesh to perform SAMR and then the blue particle is added. In this situation, we must know whether the background elements intersecting the latter and their neighboring elements require SAMR, or if they are already refined because of the existing neighboring particle. Moreover, as noted in Section 2, one of the constraints of the SAMR phase is that no element edge can have more than one hanging node. In other words, an element must be further refined if two hanging nodes are generated on one of its edges by two particles that are in close proximity. Thus, for the simple case scenario shown in Figure 1b, we must have access to the element edges with hanging nodes generated after applying SAMR to the red particle when interacting the blue particle with the background mesh. This gives rise to a bookkeeping issue similar to that discussed in the paragraph above when dealing with a massive microstructural model with thousands of particles: repeated insertion/search in an excessively large data container to identify elements that are already refined or hold hanging nodes on their edges. By executing the SAMR phase in each partition independently in the parallel framework, this insertion/search process is limited to a considerably smaller container associated with each processor, resulting in a significant reduction in the computational cost.

4.3 Communicating duplicate/hanging nodes

After performing r-adaptivity, we must merge the duplicate mesh nodes along shared partition interfaces, which are shown in yellow in Figure 5c. It is also possible that a number of hanging nodes are generated only on one side of the shared interface between two neighboring partitions (cf. green nodes in Figure 5c). Such hanging nodes must be mapped to the shared edge/face belonging to the neighboring partition to ensure the creation of a conforming mesh after the completion of the element subdivision phase. In 2D parallel CISAMR, merging duplicate nodes and mapping one-sided hanging nodes along shared interfaces of neighboring partitions is the only communication required between processors. In 3D, another marginal communication phase would be necessary after subdividing tetrahedral elements cut by the interface, which will be discussed in Section 4.4.

The rectangular/cuboid shape of each partition, together with the fact that a structured mesh is used as the starting point for constructing the conforming mesh in each partition, considerably reduces the computational cost associated with the communication phase. Although this structured pattern is slightly disturbed after performing r-adaptivity, with the aid of ghost elements, duplicate nodes on the shared interface between two partitions undergo identical relocations on either side. Therefore, when visiting a node on a partition's boundary, no search algorithm is needed to verify the existence of a duplicate node on the mirroring face of the adjacent partition or to classify it as a hanging node that is only created on one side. Instead, we first sort all nodes on the actual boundaries (and not ghost boundaries) of each partition based on their coordinates, in the order of x, y, and z coordinate

[Figure 6 labels: material interface; mesh node; interface node; conforming sub-tetrahedrons]

Fig. 6: One of the sub-tetrahedralization case scenarios in 3D CISAMR for a regular tetrahedron, requiring the creation of new nodes at intersection points of the material interface with non-conforming element edges

values. Assume S1 and S2 refer to the sorted node sets belonging to adjacent faces shared between two partitions. Visiting each entry of both node sets simultaneously, if S1(i) = S2(i), then the visited nodes are duplicates and must be merged. Otherwise, we check whether S1(i) is identical to the next member of the second node set, i.e., whether S1(i) = S2(i + 1). If this is the case, then S2(i) is a hanging node generated in only one of the partitions but does not belong to the neighboring partition; thus it must be added to the corresponding edge. Otherwise, S1(i + 1) = S2(i) indicates that S1(i) is a hanging node that does not exist on the other side. Note that this low-cost communication algorithm considerably reduces the overall parallel meshing time and is one of the main reasons for imposing the constraint of a rectangular/cuboid shape for partitions, which compensates for the lack of a perfect load balance between processors.

4.4 Face swapping and subdividing

As explained in Section 2, a small subset of the tetrahedral elements subjected to r-adaptivity might be highly deformed into cap or sliver shaped elements with high aspect ratios in 3D CISAMR. A non-iterative face-swapping algorithm is applied to the (originally cubic) blocks containing such elements, which replaces the initial 5 tetrahedrons (one being highly deformed) with 7 new tetrahedrons, all with good aspect ratios. The fact that this phase is limited to each block containing cap/sliver shaped tetrahedrons and one of its neighboring blocks obviates the need for passing information beyond the ghost layer of elements, which in turn eliminates the need for any communication during this process. After completing the face-swap phase (only needed in 3D), we discard the ghost elements and proceed to subdividing the remaining elements that are either still intersecting material interfaces or have hanging nodes on their edges. This process is rather straightforward and is carried out by visiting such elements in each partition independently (cf. Figure 5d). Specific rules on how to cut elements are followed during the subdivision phase to ensure that the resulting conforming sub-elements have the lowest possible aspect ratios (see [34,35] for details).

In 2D CISAMR, all remaining non-conforming quadrilateral elements of the background mesh after the completion of r-adaptivity are diagonally cut by material interfaces. Thus, although interior nodes might be generated when using the double-diagonal sub-triangulation rule in such elements (see [34] for details), no new node is generated on their edges during the subdivision phase. This means no further communication is needed between processors after completing this phase and the final conforming mesh is already constructed. The situation is completely different in 3D, as after performing r-adaptivity and face-swap, on average 50% of the originally nonconforming background tetrahedrons still intersect with material interfaces. Sub-tetrahedralizing such elements requires creating new nodes at the intersections of material interfaces with non-conforming edges (cf. Figure 6). Thus, a second communication phase is required after subdividing elements in 3D parallel CISAMR, during which interface nodes across shared faces of neighboring partitions must be merged. The communication process is similar to that explained in Section 4.3, meaning we first sort the interface nodes based on their coordinates on each side of the shared faces and then use a one-to-one mapping to merge them. Note that the computational cost associated with this second communication phase is much smaller than that of the major communication performed after r-adaptivity, as the number of nodes that must be merged is much smaller and we no longer need to identify hanging nodes that could only exist on one side.

4.5 PBC and cohesive elements

Analyzing the micromechanical behavior of materials often involves using the periodic boundary condition (PBC) to avoid unrealistic stress concentrations along domain boundaries [37]. Also, approximating the failure response of composite materials requires implementing cohesive elements along the interface between embedded inclusions and the surrounding matrix to simulate the debonding process [38]. Imposing PBC in CISAMR is a rather straightforward task, where as discussed in [35], applying similar levels of SAMR to blocks of elements adjacent to the boundary for a periodic microstructural model automatically leads to the construction of a periodic mesh. In parallel CISAMR, PBC can easily be imposed by establishing an extra set of com-

Algorithm 2 (Parallel CISAMR algorithm for processor rank j)

function PARALLEL_CISAMR(Ω_j, neighbor_j, S_p^j, S_r, h)
    M_j ← background_mesh(Ω_j, h)                   ▷ generate background mesh of size h with a ghost layer of elements for Ω_j
    for i := 1 to size(S_p^j) do                    ▷ loop over all particles overlapping with Ω_j (identified during partitioning phase)
        R_i ← S_r(P_i)                              ▷ determine the number of h-adaptive refinement levels R_i corresponding to particle P_i
        S_node^i, S_el^i ← mesh_interactor(M_j, P_i)  ▷ identify locations of background nodes/elements relative to P_i interface
        for k := 1 to R_i do
            M_j ← SAMR(M_j, S_el^i, P_i)            ▷ recursively apply SAMR to background elements near P_i interface
            S_node^i, S_el^i ← mesh_updator(M_j, P_i) ▷ update locations of nodes/elements relative to P_i interface after SAMR
        end for
    end for
    for i := 1 to size(S_p^j) do
        M_j ← r_adaptivity(M_j, S_node^i, S_el^i, P_i) ▷ apply r-adaptivity to nodes of background elements intersecting P_i interface
    end for
    S_bnode^j ← node_sorter_1(M_j)                  ▷ sort all nodes on boundary of mesh M_j associated with processor rank j
    M_j ← communicator_1(S_bnode^j, neighbor_j)     ▷ merge/map boundary nodes with those of neighboring partitions
    if (3D CISAMR) then
        for i := 1 to size(S_p^j) do
            M_j ← face_swap(M_j, S_el^i)            ▷ perform face-swap to eliminate cap/sliver tetrahedrons with high aspect ratios
        end for
    end if
    M_j ← ghost_eliminator(M_j)                     ▷ eliminate ghost layer of elements from mesh M_j
    for i := 1 to size(S_p^j) do
        M_j ← subdivide(M_j, S_el^i, S_node^i)      ▷ subdivide elements cut by material interfaces or holding hanging nodes
    end for
    if (3D CISAMR) then
        S_new_bnode^j ← node_sorter_2(M_j)          ▷ sort new (interface) boundary nodes created during sub-tetrahedralization
        M_j ← communicator_2(S_new_bnode^j, neighbor_j) ▷ merge new boundary nodes with those of neighboring partitions
    end if
end function
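The boundary-node merge/classification of Section 4.3 (the communicator step in Algorithm 2) can be sketched as follows. This is a hedged illustration: it uses a two-pointer variant of the lookahead comparison described in the text, and bare coordinate tuples stand in for the richer node records of the actual code; all names are our own.

```python
def classify_shared_nodes(s1, s2, tol=1e-12):
    """Classify nodes on the face shared between two partitions.

    s1, s2: node coordinates on either side of the shared interface, each
    sorted lexicographically by (x, y, z) as described in Section 4.3.
    Returns (merged, hang_in_1, hang_in_2): coordinates present on both
    sides (duplicates to merge), and one-sided hanging nodes that must be
    mapped onto the neighboring partition's edge/face.
    """
    same = lambda a, b: all(abs(u - v) <= tol for u, v in zip(a, b))
    i = j = 0
    merged, hang_in_1, hang_in_2 = [], [], []
    while i < len(s1) and j < len(s2):
        if same(s1[i], s2[j]):            # duplicate node: merge
            merged.append(s1[i]); i += 1; j += 1
        elif s1[i] < s2[j]:               # node exists only on side 1
            hang_in_1.append(s1[i]); i += 1
        else:                             # node exists only on side 2
            hang_in_2.append(s2[j]); j += 1
    hang_in_1 += s1[i:]
    hang_in_2 += s2[j:]
    return merged, hang_in_1, hang_in_2

s1 = sorted([(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 1.0, 0.0)])
s2 = sorted([(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
merged, h1, h2 = classify_shared_nodes(s1, s2)
```

Because both sides sort by the same coordinate order, a single linear pass suffices and no per-node search is required, which is the point made in Section 4.3.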

munications between partition faces located on parallel domain boundaries, during which similar equation numbers are assigned to periodic nodes. The process is identical to merging mesh nodes and mapping hanging nodes across shared interfaces of adjacent partitions by sorting them based on their coordinates.

Also, cohesive elements/nodes are simply inserted along material interfaces after completing r-adaptivity and before starting the major (first) communication phase in parallel CISAMR. In order to insert such elements in a conforming mesh generated using CISAMR, at each point two nodes with the same coordinates must be generated along material interfaces, each corresponding to one of the material phases. The only special treatment needed in the parallel implementation is when merging such nodes (i.e., those located on material interfaces) shared between two adjacent partitions during the communication phase. We can still implement the same sorting and communication algorithm to merge mesh nodes and map one-sided hanging nodes. However, we must consider the fact that two nodes co-exist at the same coordinates along material interfaces, one belonging to the matrix phase and the other to an inclusion. After sorting these node pairs, we first communicate (merge) the nodes belonging to the matrix and then to the inclusion phase. Afterwards, two different nodes with the same coordinates are created and used to construct cohesive elements. The parallel CISAMR algorithm described in this section is summarized in the pseudo code provided in Algorithm 2.

5 Numerical examples

Four example problems are presented in this section to demonstrate the performance and scalability of parallel CISAMR for constructing massive 2D and 3D conforming meshes for composite materials (in some cases, > 10^8 elements). For each example, the scalability test is broken down into the speedup associated with different phases of the CISAMR algorithm to show their performance in the parallel implementation. In addition to the parallel construction of meshes, we present the FE approximation of the micromechanical behavior in each example problem, including linear elasticity, thermo-elasticity, and continuum ductile damage simulations. It must be noted that, by being implemented in an existing parallel FE code, the same partitions used by parallel CISAMR for mesh generation in this work can be reused for approximating the field.
This is a key advantage while performing a parallel simulation, as


Fig. 7: First example problem: 800 µm × 800 µm virtually reconstructed RVE of an SiC particle reinforced aluminum composite, where insets show portions of the conforming mesh generated using parallel CISAMR

the sequential partitioning is carried out only once for the entire modeling process (meshing and FE analysis). Further, throughout this process, no input file is read sequentially (particle morphologies are read in parallel by designated processors) and the initial structured sub-meshes are generated for each partition independently by its corresponding processor.

5.1 2D particulate composite

In this example, we use parallel CISAMR to generate conforming meshes for the representative volume element (RVE) of an SiC reinforced aluminum composite with more than 1.12 × 10^4 embedded particles. Similarly to all other examples presented hereafter, this 800 µm × 800 µm RVE (cf. Figure 7) is virtually reconstructed using the BBox-based packing/optimization algorithm briefly described in Section 3 (refer to [36] for details). Note that the non-uniform spatial arrangement of the embedded SiC particles and the variations in their size distribution lead to a non-uniform arrangement of load-balanced partitions for the parallel mesh generation, which was previously shown in Figure 4 for a quadrant of this RVE.

In order to perform the scalability study, in addition to the sequential CISAMR (1 processor), parallel meshes are constructed using 4, 16, 64, 256, and 1,000 processors. A minimum of 2 levels of SAMR is applied along each material interface, with some particles in close proximity subjected to additional refinement if intersecting the same background element. The resulting conforming mesh has more than 21 million elements and 28 million DOFs, two small portions of which are depicted in the insets of Figure 7. Note that the different colors of element edges in this figure correspond to the different partitions used for the mesh generation.

The FE model generated using parallel CISAMR is used to approximate the linear elastic micromechanical response of the particulate composite RVE subjected to a prescribed displacement of uy = 1 µm in the vertical direction. The elastic modulus and Poisson's ratio of the aluminum matrix are EAl = 72.4 GPa and νAl = 0.33, respectively, while those of the embedded SiC particles are ESiC = 415 GPa and νSiC = 0.16. The FE approximation of the von Mises stress field in this particulate composite RVE is depicted in Figure 8. This simulation is performed using the same 64 partitions used for the parallel mesh generation (no repartitioning). As seen in the insets of this figure, the stress concentrations along the particle-matrix interfaces are accurately captured by the adaptively refined conforming elements in the mesh constructed using CISAMR.

Figure 9 presents the scalability test results for this example, broken down into the different phases of the CISAMR algorithm to better show the impact of parallelization on their performance. The speedup curves in these plots yield the ratio of the computational time associated

[Figure 8 color bar: von Mises stress, 11-404 MPa]

Fig. 8: First example problem: FE approximation of the von Mises stress field in the particulate composite RVE shown in Figure 7, where insets show sites of stress concentrations along particle-matrix interfaces

with the sequential CISAMR (or its constituent phases) to that of the parallel algorithm for various numbers of processors. Each curve is also compared with the ideal speedup curve, a linear curve with unity slope representing very good scalability of any parallel algorithm. As shown in Figure 9a, parallel CISAMR yields a super-linear (more than optimal) speedup as the number of processors used for mesh generation in this example increases. In other words, the scalability of this algorithm in 2D is even better than the ideal case scenario expected for a parallel algorithm. Thus, for example, using 4 partitions to perform the simulation leads to more than a 4-fold reduction in the overall computational time. Also, note the second speedup curve provided in Figure 9a, in which the computational cost of the sequential mesh partitioning is added to the parallel mesh generation time. This cumulative curve shows minimal difference compared to the isolated parallel CISAMR curve, indicating the negligible time needed for the domain partitioning compared to the overall time spent on the parallel mesh generation.

It is worth mentioning that the entire time (including partitioning) needed to generate the massive mesh shown in Figure 7 on 1,000 processors (Intel® Xeon® x5650, Ohio Supercomputer Center) was only 18.9 seconds. On the other hand, using one processor on the same machine to generate this mesh sequentially led to approximately 30 days of computing time. Note that by using only 16 processors for the parallel mesh generation in this example, the huge computational cost associated with the sequential algorithm was reduced to less than 6 hours.

The reason for the super-linear speedup of parallel CISAMR in this example can be explained by comparing the speedup of the SAMR phase versus that of the remaining phases of this algorithm (i.e., r-adaptivity, sub-triangulation, and communication). As shown in Figure 9b, while the speedup associated with the SAMR phase is super-linear with a slope of approximately 2, the remaining three phases still yield a nearly ideal speedup. The latter is not exactly a linear speedup, in part due to the communication cost, although the fact that the slope of this curve is almost unity indicates the negligible time spent on the communication phase. Note that another reason for a small deviation from a linear speedup is the presence of ghost elements in each partition, which imposes an additional computational cost per processor during the mesh generation process; a seemingly disadvantageous feature at first glance that proves to be an overall net advantage by significantly reducing the communication cost. Note that the impact of the computational burden imposed by ghost elements on reducing the speedup becomes slightly more pronounced when using 1,000 partitions, as the ratio of the total area spanned by the ghost layer to the actual area of each partition becomes sufficiently large. As shown in Figure 9a, this effect visibly lowers the speedup rate of parallel CISAMR, although a super-linear speedup is still maintained.

As discussed in Section 4.2, the expected super-linear speedup of the SAMR phase is attributed to two key factors: (i) a significant reduction in the size of data containers and therefore a reduced computational cost associated with insertion/search operations while determining background node/element locations with respect to material interfaces; (ii) a similar effect when tracking previously refined elements and hanging nodes while performing h-adaptivity for each particle. A comparison between Figures 9a and 9b shows that the overall scalability curve of CISAMR in this example is similar to the scalability curve of the SAMR phase (super-linear), demonstrating once again that the collective computational cost of r-adaptivity, sub-triangulation, and inter-processor communication is less than that of SAMR. This is due to using at least two levels of SAMR for each particle in this example, which affects a large number of background elements and thereby accrues a huge computational cost that is significantly reduced in the parallel implementation.

Fig. 9: First example problem: (a) speedup of parallel CISAMR for meshing the 2D particulate composite RVE shown in Figure 7, which also considers the impact of the sequential partitioning phase; (b) speedups of the SAMR phase and subsequent phases (r-adaptivity, sub-triangulation, and communication) of the parallel CISAMR algorithm

5.2 Corroded steel sheet

This second example problem demonstrates the application of parallel CISAMR for constructing conforming meshes for a St14 steel sheet with several microscopic holes caused by severe localized corrosion. The microstructure of this 800 µm × 800 µm corroded sheet is illustrated in Figure 10, which consists of more than 1.2 × 10^4 corrosion pits. Note that some of the NURBS curves representing each pit are deliberately overlapped with one another to mimic merged corrosion pits, resulting in C^0-continuous material interfaces (sharp edges) in the macrostructural model. Figure 10 also shows the optimized arrangement of 80 partitions used for the parallel mesh generation, together with portions of the conforming mesh generated using at least two levels of SAMR for all material interfaces. This mesh, which consists of approximately 24 million elements and more than 31 million DOFs, also shows the ability of parallel CISAMR to non-iteratively handle sharp geometric features of material interfaces.

The FE simulation of the ductile damage response of the corroded steel sheet subject to a tensile traction in the vertical direction applied on the top edge is illustrated in Figure 11. Using the phenomenological continuum ductile damage model presented in [39] for this simulation, the scalar damage parameter Ω distinguishes undamaged (Ω = 0) and fully damaged (Ω = 1) regions of the material. Using the von Mises yield condition with an isotropic hardening law, this model considers both the elasto-plastic behavior and the damage response of the steel. Defining the yield surface f(σ) as

f(σ) = q − σ_Y(ε̄_eq^pl) = 0,   (3)

where q refers to the von Mises stress, ε̄_eq^pl is the equivalent plastic strain, and σ_Y(ε̄_eq^pl) is the yield function. The microscopic stress tensor σ_m is then related to the damage parameter Ω as

σ_m = (1 − Ω) σ_eff,   (4)

where σ_eff is the effective stress tensor. The damage initiation criterion is characterized using the strain rate ε̄̇_eq^pl and the stress triaxiality η, after


Fig. 10: Second example problem: 800 µm × 800 µm RVE of a corroded steel sheet, together with the arrangement of 80 partitions used by parallel CISAMR to generate a conforming mesh. Portions of the resulting conforming mesh (generated using 40 partitions) are depicted in insets of the figure, showing the ability of parallel CISAMR to handle sharp features of material interfaces

which an effective plastic displacement ū^pl is employed to quantify the damage evolution as

ū̇^pl = L ε̄̇_eq^pl.   (5)

In the equation above, L is a characteristic length parameter that minimizes mesh dependency effects. Based on the experimental data provided in [40], the following parameters are calibrated and used in the simulation shown in Figure 11: Est = 210 GPa, νst = 0.3, ε̄_eq^pl = 0.33, ū^pl = 0.4 µm. The yield function σ_Y(ε̄_eq^pl) is given in Table 1.

σ_Y (MPa)    580      650      750      770
ε̄_eq^pl      0.0      1.5e-4   4.0e-3   1.01e-2

Table 1: Second example problem: calibrated yield function used in the continuum ductile damage model to simulate the failure response of the corroded steel sheet.
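For reference, the tabulated hardening data in Table 1 can be evaluated as a piecewise-linear yield function. Holding the stress constant beyond the last tabulated strain is our own assumption here, as the paper does not state the extrapolation rule.

```python
import bisect

# Calibrated hardening data from Table 1
# (equivalent plastic strain -> yield stress in MPa)
EPS  = [0.0, 1.5e-4, 4.0e-3, 1.01e-2]
SIGY = [580.0, 650.0, 750.0, 770.0]

def yield_stress(eps_pl):
    """Piecewise-linear yield function sigma_Y(eps_pl) from Table 1.

    Values beyond the tabulated range are clamped to the end values
    (an assumption of this sketch).
    """
    if eps_pl <= EPS[0]:
        return SIGY[0]
    if eps_pl >= EPS[-1]:
        return SIGY[-1]
    k = bisect.bisect_right(EPS, eps_pl) - 1
    t = (eps_pl - EPS[k]) / (EPS[k + 1] - EPS[k])
    return SIGY[k] + t * (SIGY[k + 1] - SIGY[k])
```

A ductile damage driver would call such a lookup at every integration point to evaluate the yield surface in Eq. (3).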

Fig. 11: Second example problem: FE simulation of the damage pattern at the failure point in the corroded steel sheet, where the dashed yellow curve highlights the major crack leading to failure

Parallel CISAMR scalability test results for this example, together with the impact of the sequential partitioning on the speedup of this algorithm, are presented in Figure 12a. Similar to the previous example problem, a super-linear speedup is achieved and the partitioning effect on the overall computational cost is shown to be minimal. Studying the speedups of each phase of

parallel CISAMR also reveals a similar trend as the example presented in Section 5.1: a super-linear speedup for the SAMR phase and a nearly linear speedup for the r-adaptivity and sub-triangulation phases, while also taking into account the communication cost. This observation once again shows that, by applying two levels of refinement, the time spent on the SAMR phase dominates the overall computational cost. It is worth mentioning that for the case of 1,000 partitions, the entire modeling process (partitioning and parallel mesh generation) in this example was finished in 87.8 seconds.

Fig. 12: Second example problem: (a) speedup of parallel CISAMR for meshing the 2D corroded steel sheet shown in Figure 10, which also considers the impact of the sequential partitioning phase; (b) speedups of the SAMR phase and subsequent phases (r-adaptivity, sub-triangulation, and communication) of the parallel CISAMR algorithm

Fig. 13: Third example problem: virtually reconstructed RVE of a cross-ply CFRP with more than 200 embedded carbon fibers

5.3 3D carbon fiber reinforced polymer

In this example, we implement the 3D parallel CISAMR algorithm for modeling and simulating the micromechanical behavior of the virtually reconstructed RVE of a cross-ply carbon fiber reinforced polymer (CFRP). As shown in Figure 13, the cubic macrostructural model (l = 100 µm) is composed of two plies with more than 200 embedded fibers. Portions of the conforming mesh generated for this problem using parallel CISAMR are shown in Figure 14, where only one level of SAMR is applied along the interface of each fiber. This mesh consists of more than 34 million tetrahedral elements, which corresponds to > 21 million DOFs in the FE model.

The conforming mesh generated using CISAMR is utilized to simulate the thermo-mechanical response of the CFRP RVE subject to an increase of ∆u = 100 °C in the temperature. The material properties used for the epoxy matrix in this simulation are Em = 4.35 GPa, νm = 0.36, and αm = 43.92 × 10^−6/°C (coefficient of thermal expansion). For the transversely isotropic carbon fibers [41], the material properties are E1 = 233.1 GPa (longitudinal fiber direction), E2 = 23.1 GPa (transverse fiber direction), G1 = 9.0 GPa, G2 = 8.3 GPa, ν1 = 0.2, ν2 = 0.4, α1 = −0.54 × 10^−6/°C, and α2 = 10.08 × 10^−6/°C. All faces of the domain are subjected to free displacement boundary conditions, while six DOFs are constrained along the bottom face to eliminate rigid body motions. The approximated thermally induced displacement field, as well as the distribution of the von Mises stress in the fiber and matrix phases, are

illustrated in Figure 15. Note that the high-quality conforming mesh generated using CISAMR properly captures sites of stress concentration within the CFRP microstructure (stress contours are presented without any smoothing).

Fig. 14: Third example problem: portions of the conforming mesh generated using parallel CISAMR corresponding to inboxes with similar colors in the RVE shown in Figure 13

Variations of the parallel CISAMR speedup versus the number of processors for this example, with and without taking into account the partitioning cost, are depicted in Figure 16a. Note that the overall computational time associated with the latter on 512 processors was less than 13 minutes, while one processor required more than 4 days to generate the same mesh sequentially. The key difference between the speedup curves in this case and those of the 2D examples presented previously is that we no longer achieve a super-linear speedup, although 3D parallel CISAMR still shows an ideal scalability (in fact, slightly better than a linear speedup). The reason for this behavior can be understood by comparing the speedup of the SAMR phase with that of the remaining four phases of the algorithm, i.e., the r-adaptivity, face-swap, sub-tetrahedralization, and communication (major and minor) phases. As shown in Figure 16b, similar to the 2D examples, the SAMR phase still yields a super-linear speedup and the remaining phases yield a nearly linear speedup. The key difference is the considerably lower slope of the super-linear speedup of SAMR in this 3D example (1.21 vs. ≈ 2 in the 2D examples), which has two main reasons:

◦ Reason 1: In the current example, we have used only one level of SAMR along material interfaces, versus two levels in the previous examples. This requires a much smaller data container to keep track of refined elements and hanging nodes throughout the h-adaptivity process. Thus, the speedup achieved by parallelization is less pronounced, due to the lower cost of the insertion/search operations in this data container for the sequential case, which is used as the reference point.

◦ Reason 2: Compared to 2D CISAMR, the ratio of the computational cost of the numerical calculations for locating material interfaces within the background mesh and subdividing tetrahedrons to that of the insertion/search operations in the corresponding data containers during SAMR is much higher in 3D. Intersecting a 3D NURBS surface characterizing the morphology of a material interface with the background mesh requires solving more computationally demanding nonlinear problems to determine the relative locations of nodes/elements with respect to that surface, compared to similar operations in 2D. Further, depending on whether it is an orthocentric or a regular tetrahedron, a background element is subdivided into 6 or 8 sub-tetrahedrons, respectively, during the SAMR process. Compared to subdividing a rectangular background element into 4 sub-elements in 2D SAMR, a larger portion of the overall computational cost of the SAMR phase must be dedicated to this process in 3D. Since the super-linear speedup observed in the SAMR phase is primarily a result of the significant time reduction in insertion/search operations, the reduction in the ratio of the computational cost of data handling to that of the entire SAMR process in 3D leads to a lower speedup, although it is still super-linear.

Given the lower super-linear speedup rate of the SAMR phase in this 3D example, when combined with the nearly linear speedup of the subsequent phases, parallel CISAMR yields an ideal but not super-linear scalability (cf. Figure 16a). This is in contrast to the previous two examples, where the cumulative speedup was closer to that of the SAMR phase (super-linear). This behavior

0.43 2.61

0.0 0.03 ( ) (MPa) z z

x y x y

(a) (b)

2.01

0.03 (MPa) z

x y

(c) (d)

in the current example is not only attributed to the reduced speedup rate of SAMR (Reason 2), but also to the fact that, by applying only one SAMR level, the corresponding computational cost is a much smaller portion of the total time spent on the parallel mesh generation process (Reason 1).

Fig. 15: Third example problem: (a) displacement field and (b–d) von Mises stress fields in the fibers and matrix phases subject to a uniform temperature jump of ∆u = 100 °C

5.4 3D particulate composite

In this final example problem, we demonstrate the application of parallel CISAMR for meshing the 3D particulate composite RVE shown in Figure 17. This virtually reconstructed cubic microstructural model with a length of l = 800 µm is composed of an epoxy matrix (Em = 3.9 GPa, νm = 0.39) and 1,418 embedded silica particles (ESiC = 71.7 GPa, νSiC = 0.17). Figure 17 also illustrates the partitioned domain for the parallel mesh generation using 40 processors. Note that, in order to minimize the difference between estimated load measures, the resulting partitions have larger sizes in regions with lower particle densities.

Portions of the massive conforming mesh generated using parallel CISAMR for discretizing the particulate composite RVE shown in Figure 17 are illustrated in Figure 18. Using a fine background mesh and applying at least two levels of SAMR along particle-matrix interfaces, this mesh consists of approximately 172 million

tetrahedral elements and 98 million DOFs. The mesh is then used to simulate the linear elastic micromechanical behavior of this composite material subject to a macroscopic displacement jump [[uM]] = 0.02 µm in the z-direction along the top boundary. The resulting stress fields in the direction of the z-axis (σzz) in the matrix and particles are shown in Figure 19 for a simulation performed using the 64 partitions that were already utilized for the parallel mesh generation. Note how the adaptively refined conforming elements generated using CISAMR along silica particle interfaces, some of which have very high curvatures, enable capturing sites of stress concentration in this complex microstructural model.

Fig. 16: Third example problem: (a) speedup of parallel CISAMR for meshing the cross-ply CFRP RVE shown in Figure 13, which also considers the impact of the sequential partitioning phase; (b) speedups of the SAMR phase and subsequent phases (r-adaptivity, face-swapping, sub-tetrahedralization, and communication) of the parallel CISAMR algorithm

Fig. 17: Fourth example problem: virtually reconstructed 3D particulate composite (epoxy matrix, silica particles) RVE, together with the arrangement of optimized partitions for the parallel CISAMR simulation using 40 processors

Fig. 18: Fourth example problem: (a) small portion of the conforming mesh generated using parallel CISAMR for the 3D particulate composite microstructure shown in Figure 17; (b) conforming elements for discretizing silica particles corresponding to the inbox in figure (a)

Similar to what was observed previously, the speedup curves presented in Figure 20a for this example show the negligible impact of the sequential partitioning phase on the overall computational cost of the mesh generation using parallel CISAMR. However, unlike the 3D example presented in Section 5.3, which demonstrated a linear speedup, here once again we achieve a super-linear speedup for parallel CISAMR. The reasons for this behavior are similar to those provided for that example to justify its linear scalability. By using two levels of refinement along particle-matrix interfaces, in the current example a larger portion of the computational cost is dedicated to the SAMR phase, and therefore to the computationally demanding insertion/search process in the corresponding data containers, which more significantly affects the overall scalability curve. Therefore, as confirmed by Figure 20b, the super-linear speedup of the SAMR phase has a larger slope compared to that of the previous example, indicating a better scalability for this phase. However, as discussed in Section 5.3, due to the more computationally intensive processes associated with the mesh-interface interaction and sub-tetrahedralization during 3D SAMR, this slope is lower than those of the 2D examples.

Adding the computational cost of SAMR to that of the subsequent phases of parallel CISAMR (including the inter-processor communication), we can still achieve a super-linear speedup for constructing the massive conforming mesh in this example. Note that, given the exceedingly large size of the problem and the memory limitation, it was not feasible to generate the mesh using CISAMR sequentially. Thus, the reference point for evaluating speedup values in Figure 20 is the parallel simulation carried out on 8 processors, i.e., the smallest feasible number of processors to build this mesh. It is worth noting that using 512 processors led to a computational time of 18.5 minutes for the entire mesh generation process in this example.

6 Conclusion

The parallel implementation of the CISAMR mesh generation algorithm was presented for 2D and 3D problems, which enables nearly 100% code reuse from its sequential non-iterative algorithm. As a pre-processing phase, a computationally inexpensive domain partitioning algorithm was introduced, which is specifically designed for modeling heterogeneous material microstructures. After over-decomposing the domain, a set of hierarchical bounding box representations of inclusion morphologies, together with the number of refinement levels along each interface, are employed to determine the load measure in each sub-region. A multi-objective optimization problem is then solved to determine the final arrangement of partitions by balancing their load measures under the constraint of rectangular/cuboid partition shapes.

A structured sub-mesh is then independently constructed by the processor assigned to each partition, which includes a ghost layer of elements to minimize the communication cost. The parallel mesh generation proceeds by performing the SAMR and r-adaptivity phases of CISAMR in each partition independently. Before moving to the final phase(s) of CISAMR (sub-triangulation in 2D and a combination of face-swap and sub-tetrahedralization in 3D), a computationally inexpensive communication phase is carried out to merge/map mesh nodes and hanging nodes along shared interfaces of neighboring partitions. A second marginal communication phase is required in 3D CISAMR to merge new nodes generated after the sub-tetrahedralization of elements. Several example problems were provided to demonstrate the application of parallel CISAMR for generating massive conforming meshes for various material systems with complex microstructures, together with the FE simulation of their micromechanical behavior. The scalability tests conducted for these examples showed that, at worst, parallel CISAMR yields a linear speedup (ideal scalability); for higher SAMR levels, this non-iterative mesh generation algorithm can even achieve a super-linear speedup.
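As a concrete illustration of the load-based partitioning idea summarized above, the following is a minimal 1D sketch, not the authors' implementation: the domain is over-decomposed into thin strips, a load measure is estimated for each strip from the bounding boxes of the particles overlapping it (weighted by the number of refinement levels along their interfaces), and consecutive strips are then greedily merged into a prescribed number of box-shaped partitions with roughly balanced loads. All function names and the cost model (cost roughly doubling per refinement level) are hypothetical.

```python
# Minimal 1D sketch of load-based partitioning (hypothetical names;
# the actual CISAMR partitioner solves a multi-objective optimization
# over rectangular/cuboid partitions using hierarchical bounding boxes).

def strip_loads(particles, domain_len, n_strips, base_load=1.0):
    """Estimate a load measure for each thin strip of the over-decomposed
    domain: a background-mesh cost plus a refinement-level-weighted term
    for every particle bounding box overlapping the strip."""
    width = domain_len / n_strips
    loads = [base_load] * n_strips
    for x_min, x_max, ref_levels in particles:  # 1D bounding "boxes"
        first = max(0, int(x_min // width))
        last = min(n_strips - 1, int(x_max // width))
        for s in range(first, last + 1):
            loads[s] += 2.0 ** ref_levels  # assumed: cost doubles per level
    return loads

def merge_strips(loads, n_parts):
    """Greedily merge consecutive strips into n_parts partitions whose
    loads are as balanced as possible (partitions stay box-shaped)."""
    target = sum(loads) / n_parts
    cuts, acc = [], 0.0
    for i, w in enumerate(loads):
        acc += w
        if acc >= target and len(cuts) < n_parts - 1:
            cuts.append(i + 1)  # partition boundary after strip i
            acc = 0.0
    return cuts

# Three particles; the particle-dense region [10, 30] ends up covered
# by narrower partitions, mirroring the behavior noted for Figure 17.
particles = [(10, 20, 2), (15, 30, 2), (70, 80, 1)]
loads = strip_loads(particles, domain_len=100, n_strips=20)
print(merge_strips(loads, n_parts=4))
```

The same principle extends to 2D/3D by replacing strips with slabs along each coordinate direction and the greedy merge with the multi-objective optimization described above.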


Fig. 19: Fourth example problem: FE approximation of the stress field in the z-direction, σzz, in different phases/regions of the 3D particulate composite RVE shown in Figure 17 when the microstructure is subject to a displacement jump of [[uM]] = 0.02 µm. The subfigures show σzz in (a) the silica particles; (b) three slices of the domain; (c) a portion of the matrix phase; (d) a smaller portion of the matrix corresponding to the inbox in figure (c)
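For reference, the speedup values plotted in the scalability figures, and the slopes quoted in the text, can be reproduced from raw wall-clock timings along the following lines. This is a generic post-processing sketch with illustrative (not measured) timings; as noted for the fourth example, when no sequential run fits in memory the curve is normalized so that the smallest feasible run (e.g., 8 processors) lies on the ideal line, via the `ref_procs` argument.

```python
import math

def speedups(times, ref_procs=1):
    """Speedup S(p) = ref_procs * T(ref_procs) / T(p), so that the
    reference run sits exactly on the ideal (linear) curve."""
    t_ref = times[ref_procs]
    return {p: ref_procs * t_ref / t for p, t in times.items()}

def loglog_slope(speedup):
    """Least-squares slope of log S versus log p; slope = 1 indicates an
    ideal (linear) speedup, slope > 1 a super-linear one."""
    xs = [math.log(p) for p in speedup]
    ys = [math.log(s) for s in speedup.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Illustrative timings (seconds): cost drops slightly faster than 1/p,
# giving a mildly super-linear slope.
times = {1: 1000.0, 10: 90.0, 100: 8.0}
print(round(loglog_slope(speedups(times)), 2))
```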

Fig. 20: Fourth example problem: (a) speedup of parallel CISAMR for meshing the particulate composite RVE shown in Figure 17, which also considers the impact of the sequential partitioning phase; (b) speedups of the SAMR phase and subsequent phases (r-adaptivity, face-swapping, sub-tetrahedralization, and communication) of the parallel CISAMR algorithm

Acknowledgement

This work has been supported by the Air Force Office of Scientific Research (AFOSR) under grant number FA9550-17-1-0350. The authors also acknowledge partial support from the Ohio State University Simulation Innovation and Modeling Center (SIMCenter), as well as the allocation of computing time from the Ohio Supercomputer Center (OSC).

References

1. R. Espinha, K. Park, G.H. Paulino, and W. Celes. Scalable parallel dynamic fracture simulation using an extrinsic cohesive zone model. Computer Methods in Applied Mechanics and Engineering, 266:144–161, 2013.
2. T. Tu, H. Yu, L. Ramirez-Guzman, J. Bielak, O. Ghattas, K.L. Ma, and D.R. O'Hallaron. From mesh generation to scientific visualization: An end-to-end approach to parallel supercomputing. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 91. ACM, 2006.
3. G. Karypis and V. Kumar. METIS – unstructured graph partitioning and sparse matrix ordering system, version 2.0. 1995.
4. S. Balay, S. Abhyankar, M.F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, D.A. May, L.C. McInnes, R.T. Mills, T. Munson, K. Rupp, P. Sanan, B.F. Smith, S. Zampini, and H. Zhang. PETSc Web page, 2018.
5. Y. Ito, A.M. Shih, A.K. Erukala, B.K. Soni, A. Chernikov, N.P. Chrisochoides, and K. Nakahashi. Parallel unstructured mesh generation by an advancing front method. Mathematics and Computers in Simulation, 75(5-6):200–209, 2007.
6. T. Tu, D.R. O'Hallaron, and O. Ghattas. Scalable parallel octree meshing for terascale applications. In Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, page 4. IEEE Computer Society, 2005.
7. B. Hudson, G.L. Miller, and T. Phillips. Sparse parallel Delaunay mesh refinement. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 339–347. ACM, 2007.
8. M.C. Rivara, C. Calderon, A. Fedorov, and N. Chrisochoides. Parallel decoupled terminal-edge bisection method for 3D mesh generation. Engineering with Computers, 22(2):111–119, 2006.
9. M.S. Shephard, J.E. Flaherty, H.L. de Cougny, C. Ozturan, C.L. Bottasso, and M.W. Beall. Parallel automated adaptive procedures for unstructured meshes. Parallel Computing in CFD, 807:6–1, 1995.
10. N. Chrisochoides. Parallel mesh generation. In Numerical Solution of Partial Differential Equations on Parallel Computers, pages 237–264. Springer, 2006.
11. R. Löhner and J.R. Cebral. Parallel advancing front grid generation. In International Meshing Roundtable, Sandia National Labs. Citeseer, 1999.
12. M. Saxena and R. Perucchio. Parallel FEM algorithms based on recursive spatial decomposition I. Automatic mesh generation. Computers & Structures, 45(5-6):817–831, 1992.
13. G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48(1):96–129, 1998.
14. B. Hendrickson and T.G. Kolda. Graph partitioning models for parallel computing. Parallel Computing, 26(12):1519–1534, 2000.
15. K. Andreev and H. Racke. Balanced graph partitioning. Theory of Computing Systems, 39(6):929–939, 2006.
16. B. Hendrickson and R. Leland. The Chaco users guide, version 1.0. Technical report, Sandia National Labs., Albuquerque, NM, 1993.
17. Y.A. Teng, F. Sullivan, I. Beichl, and E. Puppo. A data-parallel algorithm for three-dimensional Delaunay triangulation and its implementation. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pages 112–121. ACM, 1993.
18. J. Galtier and P.L. George. Prepartitioning as a way to mesh subdomains in parallel. In 5th International Meshing Roundtable. Citeseer, 1996.
19. R. Löhner, J. Camberos, and M. Merriam. Parallel unstructured grid generation. Computer Methods in Applied Mechanics and Engineering, 95(3):343–357, 1992.
20. R. Löhner. A parallel advancing front grid generation scheme. International Journal for Numerical Methods in Engineering, 51(6):663–678, 2001.
21. S.N. Muthukrishnan, P.S. Shiakolas, R.V. Nambiar, and K.L. Lawrence. Simple algorithm for adaptive refinement of three-dimensional finite element tetrahedral meshes. AIAA Journal, 33(5):928–932, 1995.
22. L.P. Chew. Guaranteed-quality triangular meshes. Technical report, Cornell University, 1989.
23. J.R. Shewchuk. Delaunay refinement algorithms for triangular mesh generation. Computational Geometry, 22(1-3):21–74, 2002.
24. M.B. Chen, T.R. Chuang, and J.J. Wu. Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran. Concurrency and Computation: Practice and Experience, 16(12):1143–1159, 2004.
25. G.E. Blelloch, G.L. Miller, and D. Talmor. Developing a practical projection-based parallel Delaunay algorithm. In Proceedings of the Twelfth Annual Symposium on Computational Geometry, pages 186–195. ACM, 1996.
26. S.H. Lo. A new mesh generation scheme for arbitrary planar domains. International Journal for Numerical Methods in Engineering, 21(8):1403–1426, 1985.

27. M.C. Rivara. Algorithms for refining triangular grids suitable for adaptive and multigrid techniques. International Journal for Numerical Methods in Engineering, 20(4):745–756, 1984.
28. M.C. Rivara, N. Hitschfeld, and B. Simpson. Terminal-edges Delaunay (small-angle based) algorithm for the quality triangulation problem. Computer-Aided Design, 33(3):263–277, 2001.
29. M.T. Jones and P.E. Plassmann. Parallel algorithms for the adaptive refinement and partitioning of unstructured meshes. In Proceedings of the Scalable High-Performance Computing Conference, pages 478–485. IEEE, 1994.
30. H.L. De Cougny and M.S. Shephard. Parallel refinement and coarsening of tetrahedral meshes. International Journal for Numerical Methods in Engineering, 46(7):1101–1125, 1999.
31. T. Coupez, H. Digonnet, and R. Ducloux. Parallel meshing and remeshing. Applied Mathematical Modelling, 25(2):153–175, 2000.
32. P.L. George. Tet meshing: construction, optimization and adaptation. In 8th International Meshing Roundtable, pages 133–141. Citeseer, 1999.
33. D.A. Field. Laplacian smoothing and Delaunay triangulations. Communications in Applied Numerical Methods, 4(6):709–712, 1988.
34. S. Soghrati, A. Nagarajan, and B. Liang. Conforming to interface structured adaptive mesh refinement: new technique for the automated modeling of materials with complex microstructures. Finite Elements in Analysis and Design, 125:24–40, 2017.
35. A. Nagarajan and S. Soghrati. Conforming to interface structured adaptive mesh refinement: 3D algorithm and implementation. Computational Mechanics, pages 1–26, 2018.
36. M. Yang, A. Nagarajan, B. Liang, and S. Soghrati. New algorithms for virtual reconstruction of heterogeneous microstructures. Computer Methods in Applied Mechanics and Engineering, 338:275–298, 2018.
37. V.P. Nguyen, M. Stroeven, and L.J. Sluys. Multiscale continuous and discontinuous modeling of heterogeneous materials: a review on recent developments. Journal of Multiscale Modelling, 3(04):229–270, 2011.
38. K. Park and G.H. Paulino. Cohesive zone models: A critical review of traction-separation relationships across fracture surfaces. Applied Mechanics Reviews, 64(6):060802, 2011.
39. H. Hooputra, H. Gese, H. Dell, and H. Werner. A comprehensive failure model for crashworthiness simulation of aluminium extrusions. International Journal of Crashworthiness, 9(5):449–464, 2004.
40. V.N. Van Do. The behavior of ductile damage model on steel structure failure. Procedia Engineering, 142:26–33, 2016.
41. Z.H. Karadeniz and D. Kumlutas. A numerical study on the coefficients of thermal expansion of fiber reinforced composite materials. Composite Structures, 78(1):1–10, 2007.