Increased Efficiency in Finite Element Computations Through Template
Total Page:16
File Type:pdf, Size:1020Kb
Increased Efficiency In Finite Element Computations Through Template Metaprogramming Karl Rupp Christian Doppler Laboratory for Reliability Issues in Microelectronics at the Institute for Microelectronics, TU Wien Gußhausstraße 27–29/E360, A-1040 Wien, Austria [email protected] Keywords: Template Metaprogramming, Finite Elements, ble selection of the functionality needed for a particular prob- Symbolic Integration, lem at compile time. In order to decouple algorithms acting on a certain geom- Abstract etry, a general way to store and access quantities on domain In the area of scientific computing, abstraction was long said elements is necessary. Our approach works for quantities of to be achievable only in exchange for run time efficiency. arbitrary types and is discussed in Sec. 3. With the rise of template metaprogramming [1] in C++ in The first step towards FEM discretizations is to derive the the 1990s, run time efficiency comparable to hand tuned code weak formulation, which is for second order PDEs obtained could be achieved for isolated operations such as matrix- by multiplication with a test function v, integration over the vector multiplication [2]. In more complex scenarios such whole domain and integration by parts. The generic proce- as large finite element simulation packages, traditional object dure then is to select a suitable basis for a finite dimensional oriented programmingis used for most abstractions, so the re- space of test functions and similarly for trial functions, which sulting code usually suffers from reduced run time efficiency. ultimately leads to a system of linear equations that can be We have applied rigorous template metaprogramming to both solved by either direct or iterative solvers. More precisely, if the mesh handling and the mathematical algorithms acting the weak formulation can be written in the abstract form on top, and obtain a high level of abstraction at a run time Find u ∈ U s.t. a(u,v)= L(v) ∀v ∈ V (1) efficiency comparable to that of hand-tuned code. Since the weak formulation of the underlying mathematical problem is with (in case of linear PDEs) bilinear form a(·,·) and linear directly transferred to code, the code effectively meets the ab- form L(·), the resulting system matrix S is given by straction of the mathematical description, including an even- S = (S )N , S = a(ϕ ,ψ ) , (2) tual independence from the underlying spatial dimension. i j i, j=1 i, j j i where ϕ j and ψi are a basis for the finite-dimensional spaces of trial and test functions respectively. A generic finite ele- 1. INTRODUCTION AND OVERVIEW ment implementation must therefore be able to evaluate bilin- The mathematical description of the finite element method ear forms for varying arguments. Moreover, the bilinear form (FEM) is usually independent from the dimension of the is to be supplied by the user, thus the specification must be problem domain Ω, provided that this holds true for the un- as easy and as convenient as possible. Our approach accom- derlying (system of) partial differential equations (PDEs). plishes this via extensive template metaprogramming and is Thus, one can obtain a high level of abstraction in FEM codes discussed in Sec. 4. only if this holds true for the domain handling code (such By the use of metaprogramming,the full information about as iteration over cells and subcells) as well. An abstraction the weak formulation is then available at compile time. This of the underlying geometry is, if at all, in current FEM soft- allows to precompute so-called local element matrices over ware packages typically achieved by the use of standard ob- a reference cell already during the compilation process as is ject oriented programming. The drawback of this approach is presented in Sec. 5. a significant decrease of run time efficiency due to type dis- Type dispatches in standard object oriented programming patches. Moreover, the code is often not flexible enough to are carried out at run time. By the use of template metapro- selectively add or remove required functionality for a partic- gramming, such dispatches are already resolved at compile ular problem, hence a lot of CPU time and memory might time, hence the resulting code is expected to be faster at the be wasted for unnecessary calculations or data in such cases. cost of longer compilation times. We quantify this increase in Our approach to a flexible and fast domain managementusing compilation times and compare execution times of our new template metaprogramming in C++ is presented in Sec. 2. It approach with existing finite element packages and a hand- avoids unnecessary dispatches at run time and allows a flexi- tuned reference implementation in Sec. 6. 4 3 1 struct VertexTag 2 { enum { TopoLevel = 0 }; }; Level 3 1 2 3 struct LineTag 3 3 3 4 { enum { TopoLevel = 1 }; }; 4 4 4 Level 2 5 struct TriangleTag 12 1 2 1 2 6 { enum { TopoLevel = 2 }; }; 4 4 3 3 7 ... 4 3 Level 1 1 1 2 1 2 2 The next step is the specification of tags for the handling of each topological level of a cell, were we introduce the follow- Level 0 1 2 3 4 ing two models: • Figure 1. Topological decomposition of a tetrahedron TopoLevelFullHandling indicates a full storage of elements on that topological level. For example, in a tetrahedron this tag specified for the facets means that all facet triangles are set up at initialization and stored within the cell. 2. DOMAIN MANAGEMENT • If TopoLevelNoHandling is used, the cell does not For the implementation of algorithms for arbitrary spatial care about elements on that topology level. dimensions, a clean and general way to access mesh related A configuration class for each element type configures the de- quantities is necessary. Our approach is to first break cells sired implementation: down into topological levels [3] and then to use configura- 1 //declaration: tion classes that specify the desired implementation for each 2 template <typename ElementTag_, level separately. The subelements at lower topological lev- 3 long level> els of a tetrahedron are shown in Fig. 1. Elements on level 4 struct TopologyLevel; zero are called vertices, on level one they are called edges, 5 whereas elements of maximum topological level are called 6 // topological description of cells and elements with codimension one are called facets.A 7 // a tetrahedron’s vertices domain corresponds to the full problem domain and consists 8 template <> of at least one segment, which is the container for all elements 9 struct TopologyLevel<TetrahedronTag, 0> located therein. 10 { typedef PointTag ElementTag; 11 typedef TopoLevelFullHandling Our aim is to provide a simple means of customizing the 12 HandlingTag; domain management for the topological needs of a certain al- 13 14 enum{ ElementNum = 4 }; //4 vertices gorithm. The performance critical path in such algorithms is 15 }; in many cases related to the iteration over domain elements of 16 different levels, hence this functionality must be provided in 17 // topological description of a generic way at minimum computational costs. Consider for 18 // a tetrahedron’s edges example a tetrahedral mesh and an algorithm that is a-priori 19 template <> known to iterate over edges only. In such a case it does not 20 struct TopologyLevel<TetrahedronTag, 1> make any sense to explicitly store facets in memory, whereas 21 { typedef LineTag ElementTag; this would make perfect sense for a different algorithm that 22 typedef TopoLevelNoHandling needs to iterate over facets. To be able to handle both scenar- 23 HandlingTag; ios at minimum costs, each topological level can be config- 24 25 enum{ ElementNum = 6 }; //6 edges ured separately via type definitions. 26 }; 27 Following the ideas of policy classes and tagging [4], 28 // similar for other elements and levels we define small classes that indicate or realize a spe- cific behavior. The concept can be applied directly to The snippet above shows the configuration for a tetrahe- polyhedral shapes, but for the sake of clarity we re- dron that stores its vertices, but does not store its edges. strict ourselves to simplex elements in the following. The The actual realization of the storage scheme for subele- first step is the introduction of simple tag classes that ments, say for triangular facets, are then again speci- hold the topological level (dimension) of the element: fied by additional partial specializations TopologyLevel< TriangleTag, level> for each topological level level. the CellType. Since the topological level (in the above snip- This allows rather complex scenarios. Suppose an algorithm pet equal to 1) is provided as template parameter, the com- needs to iterate over all edges of all facets in the domain. piler is able to eliminate all indirections at compile time and Clearly, there is no need to store the edges on the cell, instead generate a very efficient executable. This is in contrast to tra- it is sufficient to store edges on the facets. Such behavior can ditional object-oriented programming, where the use of a sin- be configured easily with the above configuration method. gle iterator class would lead to excessive run time dispatches This way all the requirements for mesh handling can be en- in order to provide a similar functionality. coded into a class hierarchy that can be evaluated at compile time and the compiler can then select the appropriate imple- mentations. 3. QUANTITY STORAGE The final domain configuration is again supplied by type Since the domain management was not tailored to a par- definitions. For example, for a two-dimensionalmesh consist- ticular algorithm, a convenient and fast way of storing and ing of triangles and double precision arithmetic, one defines accessing quantities of arbitrary type is the key for the ap- plication of any kind of algorithms.