
Computational Mechanics manuscript No.
(will be inserted by the editor)

Scalable parallel implementation of CISAMR: A non-iterative mesh generation algorithm

Bowen Liang · Anand Nagarajan · Soheil Soghrati

Received: date / Accepted: date

Bowen Liang
Department of Mechanical and Aerospace Engineering
The Ohio State University
201 West 19th Avenue, Columbus, OH

Anand Nagarajan
Department of Mechanical and Aerospace Engineering
The Ohio State University
201 West 19th Avenue, Columbus, OH

Soheil Soghrati (corresponding author)
Department of Mechanical and Aerospace Engineering
Department of Materials Science and Engineering
The Ohio State University
201 W. 19th Avenue, Columbus, OH
E-mail: [email protected].

Abstract We present the parallel implementation of a non-iterative mesh generation algorithm, named Conforming to Interface Structured Adaptive Mesh Refinement (CISAMR). The partitioning phase is tightly integrated with a microstructure reconstruction algorithm to determine the optimized arrangement of partitions based on shapes/sizes of particles. Each processor then creates a structured sub-mesh with one layer of ghost elements for its designated partition. The h-adaptivity and r-adaptivity phases of the CISAMR algorithm are also carried out independently in each sub-mesh. Processors then communicate to merge mesh/hanging nodes along faces shared between neighboring partitions. The final mesh is constructed by performing face-swapping and element subdivision phases, after which a minimal communication phase is required in 3D CISAMR to merge new nodes created on partition boundaries. Several example problems, together with scalability tests demonstrating a super-linear speedup, are provided to show the application of parallel CISAMR for generating massive conforming meshes.

Keywords Parallel mesh generation, Finite element, Scalability, CISAMR, Heterogeneous materials

1 Introduction

The availability of high-performance parallel computing resources has enabled scientists and engineers to simulate a variety of physical phenomena using large-scale models with unprecedented geometrical details [1,2]. In order to perform such computationally expensive simulations using the finite element method (FEM), the typical workflow begins with constructing the geometrical model and discretizing it into an appropriate conforming mesh on a workstation [2]. Next, mesh files are transferred to a supercomputer to perform mesh partitioning [3] and subsequently approximate the governing partial differential equations (PDEs) using a parallel solver [4]. However, as such simulations grow beyond tens of millions of degrees of freedom (DOFs), the initial phase of constructing a massive unstructured mesh could alone be a highly computationally demanding task that involves solving billions of nonlinear geometrical equations. Besides the high computational cost associated with this process, it may not even be feasible to generate such meshes sequentially due to memory limitations [5]. Therefore, significant research effort [6,7,8] has been dedicated to developing parallel mesh generation algorithms that satisfy four major objectives [8,9,10]: (i) stability to enable constructing high-quality meshes for a variety of geometrical models; (ii) efficient domain decomposition to achieve load balancing and reduce the pre-processing overhead; (iii) code re-use to allow utilizing a pre-existing optimized sequential mesh generation code; and (iv) scalability to achieve a linear speedup for large-scale problems.

In order to create massive meshes while satisfying the criteria enumerated above, parallel mesh generation algorithms often have a sequential pre-processing phase for partitioning the entire domain into optimized sub-regions [2]. Two objectives of this partitioning phase are to achieve balanced load measures for all processors and to minimize the total area of shared interfaces between them [11,12]. Several domain decomposition techniques have been introduced, which can be categorized into discrete and continuous methods [10]. The former begins with generating an initial coarse background mesh that conforms to domain boundaries using a sequential mesh generator. Graph partitioning algorithms [13,14,15] are then employed to decompose vertices of the corresponding graph structure into a number of similarly sized partitions and simultaneously minimize the number of connecting edges. Popular open-source graph partitioning libraries such as Metis/Parallel Metis [3] and Chaco [16] can be used for this purpose. In continuous domain decomposition methods, on the other hand, the original domain is directly partitioned using quadtree/octree [11,12] or medial axis [17] techniques. In addition to avoiding the overhead associated with creating an initial background mesh and the corresponding graph data structure, continuous methods enable better code re-use by applying the sequential meshing code to each partition independently [18]. However, these advantages come at the price of generating polyhedral surfaces, which can deteriorate the convergence rate of the meshing algorithm, as well as the quality of the resulting mesh [10].

After partitioning the domain, several robust algorithms can be utilized to build a massive conforming mesh in parallel, among which we can mention Delaunay triangulation based techniques [17], advancing front [19,20], and edge subdivision methods [21]. In the Delaunay triangulation algorithm, a nonlinear system of equations is iteratively solved and new mesh nodes are created/relocated until a set of constraints on the mesh quality and element size is satisfied [22,23]. A parallel 3D implementation of this algorithm is introduced in [17], which achieves a linear speedup using data-parallel architectures and by expanding open faces via a bucketing technique. By combining boundary merging with an incremental construction algorithm, the independent parallel near Delaunay triangulation (IPNDT) method [24] allows efficient mesh generation with a low overhead. Two fully decoupled parallel Delaunay methods are introduced in [18,25], which utilize a continuous domain decomposition approach, together with a pre-processing interface zone for mesh generation. The parallel sparse Voronoi refinement (SVR) algorithm [7] yields a near-linear speedup and high performance on shared memory machines.

In the advancing front method, new elements are progressively inserted to discretize the entire domain by applying certain geometrical constraints on active fronts [26]. A parallel version of this algorithm is introduced in [19], where all interior sub-regions are first meshed independently and then a buffer zone is utilized to synchronize corresponding meshes along shared interfaces of neighboring partitions. Using a discrete domain decomposition approach and surface mesh generation as pre-processing phases, a parallel advancing front algorithm is presented in [5] that generates the mesh for each sub-region independently. In the last step of this method, an iterative smoothing algorithm is applied to reconstruct elements near shared interfaces to improve the mesh quality. In edge subdivision based algorithms [27,28], a triangle (2D) or a tetrahedron (3D) is bisected through the midpoint of its longest edge and the opposing vertices to generate a conforming mesh. In their parallel versions introduced in [29,30], a number of subdivision templates are employed to decouple the refinement process on different processors and minimize the communication cost. The terminal-edge bisection method [8] is an inherently decoupled algorithm and thus scalable in the parallel implementation. A critical review of advantages and limitations of different parallel mesh generation algorithms is provided in [10].

The majority of existing algorithms for parallel mesh generation [5,20,31], including those discussed above, require an iterative smoothing/optimization phase involving edge/face swapping [32], removal of bad elements, or Laplacian smoothing [33] to improve the quality of elements. This iterative phase could be time-consuming and in some cases difficult to converge due to the geometrical complexity, especially for the construction of massive meshes [10]. This often leads to a tightly coupled problem near partition interfaces, which could cause slow convergence, excessive communication cost, and even inability to build the final mesh. Recently, Soghrati et al. [34,35] have introduced the Conforming to Interface Structured Adaptive Mesh Refinement (CISAMR) algorithm, which enables the non-iterative construction of high-quality conforming meshes. In this approach, an initial structured mesh is transformed into a conforming mesh with proper element shapes and aspect ratios in four consecutive steps: (i) h-adaptive refinement in the vicinity of material interfaces; (ii) r-adaptivity to relocate selected nodes; (iii) non-iterative face-swapping to eliminate highly distorted tetrahedrons; and (iv) sub-division of remaining nonconforming elements. A more comprehensive overview of the sequential CISAMR algorithm is provided in Section 2.

In this manuscript, we present a scalable parallel implementation of CISAMR and demonstrate its application for the construction of massive conforming meshes for materials

subdividing algorithms. Each step is described in more detail next.

◦ h-adaptive refinement: In order to reduce the geo-