DISSERTATION

Concepts for Scientific Computing

carried out for the purpose of attaining the academic degree of Doctor of Technical Sciences (Doktor der technischen Wissenschaften)

submitted to the Technische Universität Wien, Faculty of Electrical Engineering and Information Technology, by

RENÉ HEINZL

Rosensteingasse 7-9/2, A-1160 Wien, Österreich

Matr. Nr. 9826031, born on 6 September 1977 in Vienna

Vienna, July 2007

Abstract

Scientific computing has traditionally been concerned with numerical issues such as the convergence of discrete approximations to partial differential equations, the stability of integration methods for time-dependent systems, and the computational efficiency of software implementations of these numerical methods. While computer performance is steadily increasing, the additional complexity of simulation models easily outgrows this gain in computational power. It is therefore of utmost importance to employ the latest techniques of software development to obtain high performance and thereby ensure adequate simulation times even for complex problems. As a consequence, the development of high performance simulation software is quite challenging.

Originating in the field of technology computer aided design, an important and complex area of scientific computing, this work is motivated by the fact that different concepts have been developed for the field of scientific computing during the last decades. The great diversity of physical phenomena present in semiconductor devices themselves and in the processes involved in their manufacture makes the field of TCAD extremely challenging. Each of the phenomena can be described by differential equations of varying complexity. The development of several different discretization schemes has been necessary in order to best model the underlying physics and to accommodate the mathematical peculiarities of each of these equations while transferring them to the discrete world of digital computing. But up to now, neither a common application nor widely usable modules for application design have emerged which can be used in various areas.

This work deals with concepts for scientific computing, especially concepts for generic and high performance application design. It therefore introduces different generic data models to provide effective representations across a very wide range of application areas.
Finally, a set of scientific computing components that operate on any data represented in the model is presented. Not only are generic models introduced, but also the corresponding paradigms are presented, which inherently model the necessary requirements for an orthogonal extension and enhancement. The final application section presents generic algorithms and implementations in a general purpose way. Customized algorithms can be implemented easily where they prove most valuable, with the corresponding programming paradigm, orthogonally.

Danksagung

Most of all I want to thank Prof. Selberherr for his everlasting support in many areas. The working environment, from the hardware to the various people at the Institute for Microelectronics, is more than fabulous. And most importantly, I thank him for supporting my ideas from the beginning. I thank Prof. Grasser for giving me the opportunity to work at the institute, for his support in various areas, and for the opportunity to learn a lot. My special thanks go to Philipp Schwaha, who is one of the most wonderful and brilliant people I have ever worked with and from whom I have learned so much; for all the valuable discussions and for always believing in the work; for all the travels around the world and for making the hard days much easier. I also want to thank our former colleague Michael Spevak, who introduced me to his world of mathematics and his special view on different topics. Many parts of my own work were influenced by these two people, and a deep gratitude will always be with you two. I thank Franz Stimpfl for joining and fitting so well into the generic group at the institute, and Andreas Hoessinger, who introduced me to the C++ and process simulation world, which was the entry point for this work. My thanks go also to Enichlmayer (Infineon) and Minixhofer (AMS) for valuable insights, real world examples, and support from the company side; to Tatsumi-san and Kimura-san (Sony) for giving me the chance to work at the forefront of technology; and to Peter Fleischmann, Steven Cea, and Stefan Halama from Intel for the great opportunity to work on the leading edge and for providing real examples. I thank Markus Schordan from the Institute of Computer Languages for valuable support in the field of compiler optimizations and for insights into compiler technology and AST handling, and Hartmut Kaiser and Joel de Guzman for deep C++ insights, especially in parser and functional programming methodologies.
I thank Werner Benger for introducing me to the fiber bundle model and for being a great source of inspiration in various fields. I also want to thank Yaakoub El and the rest of the group at CCT for CFD. To Joyce Visne for her never ending help and support. I also want to thank the “next generation” of students, who are already using a lot of the concepts introduced in this work, namely Josef Weinbub, Georg Mach, Clemens Klöckler, Jürgen Steininger, and Markus Schwaha. And most importantly I want to thank my parents for giving me the chance to see and become who I am.

Contents

0.1 Summary of Notation ...... 2

1 Introduction 4
1.1 Scientific Computing ...... 5
1.2 Technology Computer Aided Design ...... 7
1.3 Motivation ...... 8
1.4 Contributions ...... 9

I Theoretical Concepts 10

2 Mathematical Concepts 11
2.1 Introduction ...... 11
2.2 Common Terms and Linear Algebra ...... 15
2.3 Topological Toolkit ...... 17
2.3.1 Order Theory and Sets ...... 20
2.3.2 Cell Topology ...... 22
2.3.3 Complex Topology ...... 23
2.4 Fiber Bundles ...... 25
2.5 Manifolds ...... 26
2.6 Computational Topology ...... 30
2.6.1 Chains ...... 31
2.6.2 Cochains ...... 34
2.6.3 Duality between Chains and Cochains ...... 36
2.6.4 Homology and Cohomology ...... 37
2.7 Fiber Bundles and Chains/Cochains ...... 39
2.8 Topological Data Structures ...... 40
2.9 Physical Fields and Electromagnetics ...... 43

3 Numerical Discretization Schemes 49
3.1 Generic Discretization Concepts ...... 51

3.1.1 Domain Discretization ...... 51
3.1.2 Topological Equations ...... 53
3.1.3 Constitutive Relation Discretization ...... 53
3.2 Finite Volumes ...... 55
3.2.1 Basic Concepts ...... 55
3.2.2 Analysis of the Finite Volume Method ...... 57
3.3 Finite Elements ...... 59
3.3.1 Basic Concepts for a Galerkin Method ...... 60
3.3.2 Analysis of the Finite Element Method ...... 63
3.3.3 Further Analysis ...... 64
3.4 Finite Differences ...... 65
3.4.1 Basic Concepts ...... 65
3.4.2 Analysis of the Finite Difference Method ...... 67
3.4.3 Further Analysis ...... 69
3.5 Summary ...... 72

II Practical Concepts 73

4 Introduction 74

5 Computational Concepts 76
5.1 Generic Data Model ...... 76
5.1.1 Abstraction Levels ...... 77
5.1.2 Fiber Bundle as Data Model ...... 78
5.1.3 Fiber Bundle Data Hierarchy ...... 80
5.2 Evolution of Programming Paradigms ...... 82
5.3 Library-Centric Software Design ...... 85
5.3.1 Multi-Paradigm Approach ...... 86
5.3.2 Analysis of Programming Paradigms ...... 86

6 Programming Concepts 88
6.1 Generic Data Structures ...... 89
6.2 Separation of Base and Fiber Space ...... 93
6.3 Standard Algorithms for Scientific Computing ...... 96
6.3.1 Generic Traversal ...... 97
6.3.2 Quantity Accessors ...... 98
6.3.3 Generic Calculation ...... 98
6.3.4 Interpolation ...... 98

6.3.5 Point Location ...... 99
6.3.6 Assembly Algorithms ...... 99
6.4 Domain Specific Embedded Language ...... 101
6.4.1 Requirements for Programming Languages ...... 103
6.4.2 Building Elements for the Embedded Language ...... 104
6.4.3 Application ...... 104
6.5 Visual Programming ...... 106
6.5.1 Visually Aided Specification Process ...... 106
6.5.2 Stable Program Development ...... 108
6.6 Solution Techniques ...... 109
6.6.1 Requirements from Algebraic Solving Systems ...... 109
6.6.2 Linearized Functions ...... 109

III Applied Concepts 112

7 Areas of Application 113

8 A Generic Scientific Simulation Environment 114
8.1 Generic Topology Library: GTL ...... 116
8.1.1 Quantity Accessors ...... 118
8.1.2 The 0-Cell Complex ...... 119
8.1.3 The 1-Cell Complex ...... 121
8.1.4 The 2-Cell Complex ...... 121
8.1.5 Higher-Dimensional Cell Complexes ...... 124
8.1.6 Topological Mappings ...... 129
8.1.7 Implementation Details ...... 129
8.2 Generic Functor Library: GFL ...... 131
8.3 Generic Solver Interface: GSI ...... 134
8.4 Generic Orthogonal Range Queries ...... 135
8.5 Generic File Interface: GFI ...... 136

9 Related Work 138
9.1 Wafer-State Server ...... 139
9.2 Amigos, Fedos ...... 140
9.3 Minimos-NT ...... 140
9.4 STAP, SCAP ...... 140
9.5 Other Related Works ...... 141

10 Generic Application Design 144
10.1 Generic Libraries for TCAD ...... 144
10.2 Electrostatic Simulation ...... 145
10.2.1 Transformation into the GSSE: Finite Volume ...... 146
10.2.2 Transformation into the GSSE: Finite Elements ...... 147
10.2.3 Results ...... 147
10.3 Wave Equation ...... 149
10.4 Device Simulation ...... 150
10.4.1 The Equations ...... 150
10.4.2 Transformation into the GSSE ...... 151
10.4.3 Results ...... 152
10.5 Biological Simulation ...... 158
10.5.1 The Equations ...... 158
10.5.2 Transformation into the GSSE ...... 158
10.5.3 Results ...... 160
10.6 Diffusion Simulation ...... 161
10.6.1 The Equations ...... 161
10.6.2 Transformation into the GSSE ...... 162
10.6.3 Results ...... 163

11 Performance Analysis and Optimization 164
11.1 Topological Traversal ...... 166
11.2 Equation Specification ...... 168
11.3 Interconnect Performance Analysis ...... 170

12 Own Publications 173

Curriculum Vitae 185

List of Tables

6.1 Classification scheme based on the dimension of cells ...... 91
6.2 Classification scheme based on the dimension of cells and the cell topology ...... 92
6.3 Classification scheme based on the dimension of cells, the cell topology, and the complex topology ...... 92
6.4 Language comparison ...... 103

List of Figures

2.1 Overview of the relations of the introduced mathematical concepts, related to the separation of a physical field domain into a base space and a fiber space following the concept of fiber bundles. The concept of algebraic topology uses chain and cochain complexes to obtain vector space structures within the tangential and cotangential space. Concepts from order theory are used to describe the internal structure of the cells of the cell complex to allow a generic data structure specification as well as generic traversal mechanisms. Generic data access mechanisms are based on the concepts of discrete field representations by cochains with their corresponding geometrical orientation and dimension ...... 14
2.2 A relation R of a set A and a set ...... 15
2.3 The image of a under ...... 15
2.4 The kernel of f ...... 17
2.5 Representation of a 1-cell complex with cells (edges, C) and vertices (V) ...... 20
2.6 Left: Integers form a chain, totally ordered by ≤. Middle: Incomparable items forming an anti-chain. Right: The power-set of {a, b, c} ordered by ≤ ...... 21
2.7 Cell topology of a simplex cell in two dimensions ...... 22
2.8 Cell topology of a cuboid cell in two dimensions ...... 23
2.9 Cell topology of a 3-simplex cell ...... 23
2.10 Complex topology of a simplex cell complex ...... 23
2.11 Complex topology of a cuboid cell complex ...... 24

2.12 Left: a fiber bundle with the homeomorphism φ. Right: a homeomorphism into Ub × F which does not preserve the projection, thus not revealing a fiber bundle [1] ...... 25
2.13 Zero section of a vector bundle ...... 26
2.14 A hierarchy of concepts with partial specialization. The most general form is represented by a sheaf concept. The concept of fiber bundles is obtained by using constant fibers. If the fiber space respects linear vector space properties, then the concept of a vector bundle is derived. Finally, by confining the dimension of the base and fiber space, a tangential bundle is obtained ...... 28
2.15 Representation of a 1-chain with the respective boundary (left) as well as a 2-chain with the respective boundary (right) ...... 32
2.16 A three-dimensional chain complex ...... 33
2.17 Overview of terms. A, B, C are cycles. C is a boundary, but A and B are not ...... 33
2.18 Coboundary: K1 −δ→ δK1 −δ→ δδK1 = 0, [2] ...... 35
2.19 A graphical representation of homology and cohomology ...... 38

2.20 Boundary operator applied onto a simplex poset ...... 41
2.21 Coboundary operator applied onto a simplex poset ...... 42
2.22 Top: internal orientation. Bottom: external orientation for geometric objects in three dimensions ...... 46

3.1 Discretization concept for the primary and secondary cell complex with consistent orientation ...... 51
3.2 Time stepping for the discretization concepts ...... 52
3.3 Time stepping for the discretization concepts ...... 52
3.4 Finite volume requirements for a primary and secondary spatial discretization ...... 55
3.5 Primary control volumes used in the finite volume method. Left: cell centered. Right: vertex centered ...... 56
3.6 Finite volume and the rendered cell average value within each cell for a one-dimensional cell complex ...... 56
3.7 Finite element requirements for the mesh ...... 59

3.8 Geometrical interpretation of the basis of the expanded solution variable c_h for finite elements for a one-dimensional cell complex ...... 61
3.9 Finite difference requirements for the mesh ...... 65
3.10 Different geometric interpretations of the first-order finite difference approximation related to forward, backward, and central difference approximations ...... 66
3.11 Approximation of two-dimensional mixed derivatives ...... 66
3.12 Boundary treatment for the finite difference scheme. The necessary grid points are not available for all different types of approximation schemes ...... 67
3.13 Geometrical interpretation of the ansatz functions for the finite volume method (left) and the finite element method (right). The FV method describes consistent crisp cells which can be interpreted as a cell complex, whereas the FE method uses spread cells, which do not conform with the properties of a consistent cell complex ...... 72

5.1 Tetrahedron spanned by the four abstraction levels of a data model and their interfaces, all together constituting a certain application scenario ...... 78
5.2 Fiber bundle structure [1] ...... 79
5.3 Fiber data model for a two-dimensional simplex mesh ...... 81

6.1 Complex topology of a singly linked list ...... 89
6.2 Complex topology of a doubly linked list ...... 89
6.3 Complex topology of a tree ...... 90
6.4 Complex topology of an array ...... 90
6.5 Complex topology of a graph ...... 90
6.6 Local cell complex (left) and a cell representation (right). Vertices are marked with black circles ...... 91
6.7 Global cell complex (left) and a cell representation (right) ...... 91

6.8 Representation of a 1-cell complex with cells (edges, C) and vertices (V) ...... 91
6.9 A fiber bundle over a 0-cell complex ...... 94
6.10 A fiber bundle over a 2-cell complex ...... 95
6.11 Illustration of an index space depth of zero (left) and one (right). The base space is modeled by a 0-cell complex ...... 95
6.12 Connection matrix C_j^i for a 3-simplex cell complex ...... 97
6.13 Application design based on the object-oriented programming paradigm. Each of the blue blocks has to be implemented, resulting in an implementation effort of O(n · m), where n stands for the number of algorithms and m for the number of data structures ...... 102
6.14 Application design based on the generic programming paradigm. Only the blue parts have to be implemented, which reduces the implementation effort to O(n + m) ...... 102
6.15 Different equation types within a domain: boundary area (red), inner domain (blue) ...... 106
6.16 Different equation types within a domain: Dirichlet (red) and Neumann (blue) boundary conditions as well as initial conditions (green) ...... 107
6.17 Visualization of different equation areas: domain inside (blue), boundary parts (red) ...... 107
6.18 Potential of a pn diode during different stages of the Newton iteration. From the initial guess (left) to the final result (right) ...... 108
6.19 Element-wise assembly. All cells are traversed and sub-matrices are calculated. The sub-matrices are inserted into the system matrix ...... 109
6.20 Line-wise assembly. All vertices are traversed and linear equations are assembled. Linear equations are inserted into the system matrix ...... 109
6.21 Linear equation. The linear equation data structure stores the value as well as the dependence on the unknown variables ∆x_i ...... 110

8.1 Building blocks and overview of the GSSE ...... 114
8.2 Traversal methods induced by the incidence relation. Horizontal: traversal schemes of the same base element. Vertical: traversal schemes of the same traversal elements ...... 117
8.3 Conceptual view of the GTL ...... 117
8.4 Representation of a 0-cell complex with a topological structure equivalent to a standard container ...... 120
8.5 Incidence relation and traversal operation ...... 122
8.6 Poset of a 4-simplex, the pentachoron ...... 125
8.7 Local cell complex (left) and a cell representation (right). Vertices are marked with black circles ...... 125
8.8 Global cell complex (left) and a cell representation (right) ...... 125
8.9 Rendering of a four-dimensional simplex ...... 127
8.10 Rendering of a four-dimensional cube ...... 128
8.11 Concept refinement for the generalized iterator within the topological framework ...... 128

10.1 Left: the given interconnect structure modeled by the GSSE mesh engine (ref). Right: the current density depicted on the surface of the interconnect line ...... 148

10.2 Left: the current density depicted by a vector field visualization. Right: ?? ...... 148
10.3 Left: a given structure suitable for capacitance measurement, modeled by the layout conversion tool (ref). Right: the iso-surface potential distribution within the oxide ...... 148
10.4 Wave equation with a harmonic oscillating source in the x- plane at two different time steps ...... 149
10.5 Domain for a two-dimensional PN-junction diode ...... 152
10.6 Net doping concentration C of the two-dimensional PN diode. Donors are given in red, acceptors in blue ...... 153
10.7 Two-dimensional diode with potential distribution for (left) equilibrium and (right) reverse mode ...... 153
10.8 Two-dimensional diode with potential distribution for forward mode ...... 154
10.9 Two-dimensional diode with space charge distribution for (left) equilibrium and (right) reverse mode ...... 154
10.10 Two-dimensional diode with space charge distribution for (left) forward and (right) heavy forward mode ...... 154
10.11 Bands in forward (left) and reverse (right) operation ...... 155
10.12 Domain for a two-dimensional PNP-transistor ...... 156
10.13 Two-dimensional PNP-transistor with potential distribution for forward mode ...... 156
10.14 Two-dimensional PNP-transistor with potential distribution for forward mode ...... 156
10.15 Two-dimensional PNP-transistor with space charge distribution for (left) equilibrium and (right) reverse mode ...... 157
10.16 Two-dimensional PNP-transistor with space charge distribution for (left) forward and (right) heavy forward mode ...... 157
10.17 Bands in forward (left) and reverse (right) operation ...... 157
10.18 A triangulated model of an electrically active fish. The electrically active organ is marked in red ...... 158
10.19 Different geometries and meshes ...... 159
10.20 Simulation setting ...... 159
10.21 Results of a simulation done with a very fine mesh ...... 160
10.22 Two-dimensional diffusion simulation for structured and unstructured meshes with the forward Euler time stepping ...... 163

11.1 Topological traversal of vertices ...... 165
11.2 Topological traversal of cells and incident vertices of a cell ...... 166
11.3 0-cell traversal on the following computer systems (from left to right): P4, AMD64, G5, and IBM ...... 167
11.4 Incidence traversal for the BGL and our approach on the following computer systems (from left to right): P4, AMD64, G5, and IBM ...... 168
11.5 Performance of the evaluated expression on the following computer systems (from left to right): P4, AMD64, G5, and IBM ...... 169

11.6 Blitz++ benchmark with different compiler versions: GCC-4.0.3, GCC-4.1.0, GCC-4.1.1, GCC-4.2.0 ...... 170
11.7 Different compiler versions on the P4: GCC-4.0.3, GCC-4.1.0, GCC-4.2.0, Intel 9.0 ...... 170
11.8 Temperature distribution due to self-heating in a tapered interconnect line with cylindrical vias ...... 171
11.9 Comparison of the finite element assembly times ...... 172

0.1 Summary of Notation

Due to the wide variety of material in this work, drawn from different fields of scientific computing, some conflicts or ambiguities in the notation cannot be avoided. The following table summarizes the notation used:

⟨a, b⟩   scalar product of a, b
(a, b)   inner product of a, b

A              set
F              field, e.g. R
V              vector space
(X, T)         topological space
f_h            homeomorphism
V^*            dual vector space
M              manifold
K              cell complex
R              relation
f              function
               differential operator
T_P            tangent space
T_P^*          cotangent space
O              (globally) ordered set
P              partially ordered set
im f           image of a map f
ker f          kernel of a map f
(B × F, B, pr1)   trivial fiber bundle
(F, B, pr1)       fiber bundle
T(M)           tangent bundle

C_j^i          connection matrix
M_{p,p−1}      incidence matrix
τ_p^i          i-th p-cell
c_p            p-chain
c^p            p-cochain
C_*            chain complex
C^*            cochain complex
∂              boundary operator, partial derivative
δ              coboundary operator
∂̃              boundary operator on dual mesh
d              coboundary operator (Kotiuga), exterior derivative
Z_p(K; R)      p-cycle group of K with coefficients in module R
Z^p(K; R)      p-cocycle group of K with coefficients in module R
⟨ , ⟩          !to be defined!

ε   dielectric permittivity
µ   magnetic permeability
̺   volume electric charge density
σ   electrical conductivity
ϕ   electric scalar potential
Ω   domain in R^n

E   electric intensity vector / electric field intensity
B   magnetic flux density vector
H   magnetic intensity vector / magnetic field intensity
D   electric flux density vector / electric displacement field
A   magnetic vector potential
J   current flux density vector

Φ2   cochain of magnetic flux
U1   cochain of electric voltage
Ψ̃2   dual cochain of electric flux
˜3   dual cochain of electric charge

Chapter 1

Introduction

This work deals with concepts for scientific computing, especially concepts for generic and high performance application design. The Institute for Microelectronics has a long history reaching back to the roots of computer simulation in the field of semiconductor research and has therefore been able to track the various changes in the field of computational science. Scientific computing, as a common term for different tasks in computational science, is concerned with concepts for closing the gap between the theoretical and the experimental branches of science.

One of the most important issues in scientific computing is the exponential advancement of the brute-force number crunching capabilities of modern computer systems, predicted by Moore's empirical law of doubling the computational power every 18 months. But in almost the same manner, hidden in the shadow of the number crunching, various other paradigm shifts have happened during the last three decades. One of them can be attributed to the rigorous shift in the interaction between engineers and computer systems. In the early days, stacks of punched cards had to be used to access the resources of huge computer systems. With the advent of terminals, and the even more drastic change to multiple terminals on one screen, user interaction has changed in a way that today's computer users can hardly imagine the old-fashioned style of the early days. It has to be highlighted that these early days are no more than about three decades away. Nowadays we can interact with our computer systems from almost all locations and parts of the world. Workstations offer complete virtual realities with real-time calculation and multi-dimensional visualization capabilities. But what can still be observed is the fact that computer implementations are written over and over again in almost the same way,
sometimes with minor adaptations or small adjustments. This trend is not only noticeable in the area of computer languages, with the development of a great number of different languages over and over again [3–7].

This work is therefore focused on the mechanisms and concepts for the transformation processes and the identification of interactions between theoretical scientific computing concepts and their effective and efficient practical application from an engineering perspective. Concepts are summarized, newly introduced, or unified, and can thereby help to formalize the engineer's intuition so that computers can be integrated more effectively into the scientific process. Computers should provide as much assistance as they can. Based on the long experience at the institute, the main focus of this work is on Technology Computer-Aided Design (TCAD): the process of gaining insight into the processes and mechanisms of semiconductor manufacturing and advancement. The concepts introduced and presented here are, however, not restricted to the area of TCAD. They can easily be adapted to other areas, if they are not already specified in a more general context. Therefore most of this work deals with the generic part of scientific computing

with a strong emphasis on TCAD. Scientific computing is here understood as the collection of concepts which deal with the assembly of large equation systems by utilizing discretized partial differential equations from different fields of physics. Different types of partial differential equations and their discretization schemes, such as the finite element method or the finite volume method, have to be considered for the diverse types of problems. Types of topological cell complexes, different dimensions, and solving strategies have to be considered during application development.

1.1 Scientific Computing

Scientific computing has traditionally been concerned with numerical issues such as the convergence of discrete approximations to partial differential equations, the stability of integration methods for time-dependent systems, and the computational efficiency of software implementations of these numerical methods. The very heart of scientific computing can therefore be expressed by the field of algorithms. This field has made enormous progress since the first days of computers. For the implementation of these algorithms, the field of computer science (computational science) uses appropriate techniques to transform these theoretical concepts into implementations under given performance constraints.

The reuse of non-trivial data structures and algorithms suitable for scientific computing is by no means a solved problem. Currently, the development of software for the numerical simulation of disparate physical phenomena such as electromagnetic wave propagation, heat transfer, mechanical deformation, fluid flow, and quantum effects is not straightforward and remains complex even today. The field becomes even more complex if couplings of various phenomena and different discretization schemes, such as finite differences, finite elements, and finite volumes, are used to obtain a solution. Each of these schemes has its merits and shortcomings and is therefore more or less suited to different classes of equations. All of these methods have in common that they require a proper tessellation and adaptation of the simulation domain.

In the area of scientific computing there is a manifold of software applications and tools available which provide methods and libraries for the solution of very specific problem classes. But they are mostly specialized for a certain type of underlying mathematical model, resulting in a solution process which is highly predictable.
Only in the recent past have environments for various problems at hand been developed and published, all with their advantages and disadvantages. Chapter 9 reviews different recent and traditional environments with a focus on scientific computing. However, it is important to note that applications not developed with interoperability in mind impose restrictions on possible solution methods which cannot be foreseen at the beginning of program development.

This work therefore investigates, reviews, and introduces concepts based not only on the mathematical side, but also investigates different programming paradigms and re-factorizes design patterns and commonly used idioms. A possible implementation of all the given concepts is provided with the generic scientific simulation environment, with formally defined interoperable interfaces as well as an overall high performance comparable to manually tuned applications and even Fortran applications.

The theoretical chapters introduce different areas of mathematical concepts which provide:
- a proven foundation
- a roadmap for application design
- greater flexibility and longevity of resulting software

Mathematical abstraction can also unify similar concepts from different fields. By separating shape manipulation from application specific operations, an improvement in the reliability of geometric computing in

many domains can be obtained. An example which has received ongoing attention is computational topology. Topology separates global space properties from local geometric attributes. Additionally, and more importantly for this work, it provides a precise notation and language for discussing and handling various properties. Such a notation and language is essential for composing software applications, e.g. connecting a mesh generator to a TCAD application.

Software for scientific computing is normally written by domain experts. Currently, the domain expert, software designer, programmer, tester, and end-user are often one and the same person, which complicates and slows down the complete development process. At present the scientific community, and even more the TCAD community, relies either on domain-specific closed applications or on components tied very closely to specific representations, which were often built for very specific purposes and then tossed into a huge and monolithic toolkit. To this date, data structures and algorithms are implemented in a heavily application and discretization scheme specific way, making their reuse practically impossible.

A little history of scientific software development:
- 1960-1980: domain specific software development
  - complete and comprehensive tool sets were built around each code base
  - data exchange between tools and code was simply an I/O problem
- 1980-2000: two new concepts emerged:
  - data structure libraries: read and write arrays and structured objects (CDF, HDF, PDB)
  - scientific data base libraries: read and write scientific "objects" (Exodus 1982, Silo 1988, GMV format 1993)
- 2000+: math oriented scientific data base systems
  - SAF: sets and field modeling system (vector bundle data model)
  - libsheaf/libfield: sheaf data model

Of central importance has been the ultimate use of these techniques in the solution of complex problems in science and engineering, e.g. technology computer aided design.
However, as these applications have become more complex, the local convergence properties of numerical methods have not proven sufficient to ensure either correctness or robustness. There are a number of research areas where topological and differential methods could be integrated with existing numerical techniques in scientific computing to help resolve these difficulties. Different tasks in the regime of scientific computing require quantities originally associated with different types of storage location, e.g. vertices or edges. The projections of fluxes onto edges also need to be assembled into vector-valued quantities associated with, e.g., an edge. The question remains unanswered why a super-application [8] has not emerged.

This work introduces different generic data models to provide effective representations across a very wide range of application areas. Finally, a set of scientific computing components that operate on any data represented in the model is presented. Not only are generic models introduced, but also the corresponding paradigms are presented, which inherently model the necessary requirements for an orthogonal extension and enhancement. The final application section presents generic algorithms and implementations in a general purpose way. Customized algorithms can be implemented easily where they prove most valuable, with the corresponding programming paradigm, orthogonally.

A main part of this work is the introduction, overview, and most of all the transformation of theoretical concepts into applicable concepts for generic application design. The main purpose of the given theoretical and practical concepts is to reduce the problems, e.g. in computational electromagnetics, to a common theoretical and practical form.

Most importantly it has to be stated that scientific visualization is a major part of this work, although it is not explicitly introduced or explained, because the area of scientific visualization is a field of its own. However, some concepts introduced by the usage of fiber bundles for general relativistic visualization [1] were used in parallel by this work. Many concepts can be directly transferred to the area of generic application design, but some differences remain. Sophisticated and real-time scientific visualization is an important step towards developing comprehensive scientific applications. The development of applications and the analysis of the results were greatly supported by the visualization tools [1, 9, 10]. This work uses some of these concepts, but in a different context, namely generic data structures and the generic and topological specification of data structures. Application design in the field of scientific visualization requires quite different concepts; numerical application design, on the other hand, requires the concepts introduced here.

1.2 Technology Computer Aided Design

The great diversity of physical phenomena present in semiconductor devices themselves and in the processes involved in their manufacture makes the field of TCAD extremely challenging. Each of the phenomena can be described by differential equations of varying complexity. The development of several different discretization schemes has been necessary in order to best model the underlying physics and to accommodate the mathematical peculiarities of each of these equations while transferring them to the discrete world of digital computing. TCAD is the application and development of computational methods and software tools for the analysis and design of integrated semiconductor devices and their fabrication processes, and deals with the assembly of large equation systems by utilizing discretized partial differential equations from different fields of physics. All types of PDEs (parabolic, elliptic, hyperbolic) have to be considered for the various types of problems such as process and device simulation [11–13]. Different topological cell complex types, different dimensions of topological elements, and linear and nonlinear solvers with their associated numerical issues have to be considered during application development; they demand great care when writing source code to ensure high software quality while also addressing performance issues. With the steady evolution of software tools and languages, the shift and change of programming paradigms can be clearly observed. During its first years our institute only used one- and two-dimensional data structures due to the limitations of computer resources. The imperative programming paradigm was sufficient for this type of task [14]. With the improvement of computer hardware and the arising object-oriented programming paradigm, the shift to more complex data structures became possible. The development of complex applications for process [12] and device [15] simulation are examples of this evolutionary path.
More complexity is added when modeling requires a change of the underlying topological data structure, usually from structured grids to unstructured meshes, the use of alternative linear or nonlinear solver mechanisms [16, 17], different types of quantities, e.g. vectorial or tensorial quantities, a change of the discretization scheme such as the switch from finite elements to finite volumes, or even a change of the underlying mathematical problem, usually posed in partial differential equations. Not only does application development in TCAD require the utilization of up-to-date data structures and algorithms, it also demands overall high performance. Technology computer aided design (TCAD) is a crucial tool for achieving the set goals, as simulations help to greatly reduce the costs of developing new devices as well as manufacturing equipment. TCAD is a highly specialized branch of scientific computing, which is one of the most important and successful genres of computing. Due to its exceedingly high complexity, sound software engineering is needed even more than in other genres.

1.3 Motivation

Different approaches focus on topics like expression templates, high performance matrix manipulation, geometrical algorithms, and finite element discretization. The nature of dealing with partial differential equations (PDEs), with the inherent coupling of topological traversal and functional description, complicates the usage of these libraries. During the last decades our institute has developed different applications to tackle these issues, such as SAP and Minimos-NT. This work therefore does not develop another general purpose kernel but instead provides and develops generic reusable components for scientific computing. The two basic mechanisms for this are, on the one hand, a generic data structure specification language and, on the other hand, a functional equation specification language. Based on these building blocks, the complete range of application design, from small generic components up to complete applications, is possible. Scientific computing is an algorithm-dominated field [18], as are computational geometry, computational graph theory, computational electromagnetics, and the area of TCAD, with the additional constraint of high flexibility for the modeling of not only the physical models but also the manufacturing processes. While computer performance is steadily increasing, the additional complexity of these simulation models easily outgrows this gain in computational power. It is therefore of utmost importance to employ the latest techniques of software development to obtain high performance and thereby ensure adequate simulation times even for complex problems. As a consequence the development of high performance simulation software is quite challenging. The shift to library-centric application design has shown significant advances since the advent of the C++ STL or the Java library collection. The effort of application development is thereby greatly reduced.
Especially the formal and abstract interface specification of the generic programming approach, implemented by means of parametric polymorphism in C++, offers the user a formal way of specification which the compiler can verify. Therewith re-usability and orthogonality of the libraries are accomplished. Up to now there is no comprehensive approach in the field of Technology Computer Aided Design (TCAD), due to the complex requirements arising from various discretization methods for partial differential equations, the assembly of large equation systems, finite numerical treatment, and the requirement of high overall performance. Our investigation and research has shown that libraries can be extracted and implemented which greatly ease the design of applications as a whole. This work presents the paradigms, our libraries, and different applications based on these libraries. The motivation for developing a topological framework is derived from the need for flexibility in high performance applications in the field of scientific computing, especially in Technology Computer-Aided Design (TCAD). With a growing number of different simulation tools, see Section 9, and various requirements on the underlying data structures, the question arises which part of an application can be re-used if it is properly implemented. The main aim of our work is to provide a library that offers the functionality common to many different kinds of simulation tools in the field of scientific computing. This is especially true for functions which are evaluated on a topological cell complex. The overview given in Section 9 shows that so far no approach can successfully deal with the mathematical formulation of physical phenomena within a programming language efficiently. As a consequence we have extracted the most promising concepts from all of these approaches. The result of this analysis is a generic scientific simulation environment (GSSE).
Concepts from the Boost MPL and Boost Phoenix libraries have been used to optimize the run-time performance of GSSE. Not only are the theoretical concepts introduced, but also the practical and computational concepts needed to develop and build a comprehensive environment with superior performance. The parts given here are intended not only as an introduction and overview of the developed environment, but also as a way of dealing with problems arising in TCAD or, more generally, in scientific

computing. The approach of programming an application and then patching it up to meet the requirements of examples is increasingly becoming a handicap. The required models, physical processes, and simulation methodologies expand in an almost exponential manner. And as the former decades have shown, the software engineering community was most of the time several steps behind. Therefore the focus of how to implement an application can be shifted to identifying the most important parts, using already existing modules, and implementing the remaining parts in a library-centric application design way.

1.4 Contributions

Many different works have greatly influenced my own work. The very start of my research was focused on the field of spatial discretization, mesh generation. The path then led me to error estimation for partial differential equations and mesh adaptation. During this research I found that most of the current work is focused on several specialized areas, see Section 9, with little consideration for interoperability, in software libraries as well as mathematical frameworks. The review of the Boost graph library with its concepts embedded within a programming language, and even more the modular building principle of the Boost libraries, shaped this work. The mechanisms which have led to the rapidly growing C++ library collection of the Boost project were used to develop several concepts for scientific computing. Also, several burdens of different software projects were removed by the approach proposed here, using modern programming techniques. Library development can therefore be seen as an investment in the future. My own contributions can be briefly summarized within three different areas, as given next:

- Common theory for data structures and traversal
- Identification of the cursor and property map concept with fiber bundle/sheaf theory
- Introduction of the index space depth for the distance of base space to fiber space and the corresponding backward compatibility
- Theoretical specification for data structures
- Separation of the cell topology from the complex topology
- Theory and algebraic properties of data structures
- Domain-specific language for mathematical modeling

Each theoretical concept part concludes with the presentation of a contribution of this work.

Part I

Theoretical Concepts

Chapter 2

Mathematical Concepts

2.1 Introduction

One of the most fundamental concepts in scientific computing is the transformation of a physical problem into a mathematical one. This step enables the scientific computing part to take place. The reason for the following sections and definitions is to extract the important concepts for scientific computing. The subsequent sections then use these definitions to transform the still theoretical concepts into applicable application design concepts. The result, as presented in Part III, is a minimal, concise, and yet high-performance means for application design in an orthogonal and interoperable way. The main motivation for the following introduction of theoretical concepts is that the broad area of scientific computing requires the reduction of concepts to a minimal set of building blocks. In the spirit of the generic STL and its successor Boost for general purpose programming tasks, a minimal but concise set of building blocks is specified on which all other parts can be built. The main focus hereby is the numerical and discrete treatment of the transformation of physical field problems into mathematical problems and then into the finite and discrete area of computationally feasible algorithms and data structures. Based on this task, two important parts can be separated. On the one hand, space, or more generally space-time, has to be discretized into simple computable blocks, e.g. into a collection of consistent cells. The second part is the utilization of physical quantities with respect to their inherent structure, such as the corresponding dimensionality and orientation. Therefore the two basic parts are introduced separately. First, the theory of cell complexes and their topological structure is introduced. Special concepts from topology and fibrations are used to ease the transition between the continuous space and the finite regime of computer simulation.
Secondly, parts of algebraic topology, namely cohomology (or more particularly cochains), are introduced to handle the different types of physical quantities with algebraic structures, therewith making them available to the computer resources for simulation. Based on these two concepts, different types of discretization are introduced, whereby the already introduced concepts of cell complexes, algebraic topology, and fibrations are used to formulate the necessary and minimal building blocks for the subsequent sections on the practical application of the still theoretical concepts. Special focus is given to the efficient translation of theoretical concepts into applicable methods and the consistent evaluation and transformation of physical problems into computationally feasible problems. Topological concepts are introduced to build formal languages for data structures and algorithms.

The introduction and usage of terms mainly follows the works of [19] and [20] for the chain and cochain approach, and [21] for the algebraic topology part. The topological toolkit is based on the works of [22] for common topological terms, [18] for cell complexes, and [23] for the computational topology part. Fiber bundles and their application to data models are based on the work of [24], the usage of CW-cell complexes for the base space on [1]. Topology, chains, and the fiber model, which are introduced within this chapter, are axiomatic studies. This results in the fact that a large number of definitions is required. This thesis uses only the most basic definitions, all equipped with examples and links to the applications in further sections. Based on the topological introduction, which describes the discrete nature of space and the projection onto discrete representations within a computer, the field of numerical analysis offers manifold methods to project the still continuous specifications and equations. In every physical theory two types of variables can be observed [19]: configuration variables, which describe the configuration of the system or of the field, and source variables, which describe the sources of the phenomenon. Next to these two basic kinds of quantities some intermediate variables can also be found. The concept of intermediate variables is based on three types of equations which compose the fundamental equation:

- definition equations
- balance equations
- constitutive equations

The first part of this work is based on the gradual lifting of several concepts into an abstract form for scientific computing. The chapters on functional discretization are based on the works of [19] and [20] and the subsequent developments led by [25] and Bochev [2]. Algebraic topology [21] and differential forms are used to cover the area of discretization schemes to numerically deal with the continuous problems arising in scientific computing.
Most parts of this section are introduced by means of the Maxwell equations, but are not constrained to this topic. Furthermore, the introduced formal handling of field theories and the usage of algebraic topology for identifying the underlying space explain the nature of different discretization methods, such as finite elements (FE) and finite volumes (FV). The apparent difficulties with finite differences can also be explained, which underlines some optimization opportunities related to the scientific computing and implementation area. Concepts from algebraic topology are introduced to formalize the generic data structures and to introduce a link to differential forms. Finally, algebraic concepts are introduced to guarantee the link of this theoretical section to the next part, the practical concepts. Another direction was investigated by [24] to generalize data structures and data models. This direction enables the separation of data structural components from the data storage mechanisms. A further step towards a generic data structure model was investigated by [1], where the identification of the base space of the fiber bundle with a CW-cell complex was also introduced. The application to common data structures and the derivation of well known problems with the generic programming iterator concept is a partial contribution of this work. A formal separation of traversal and data access by means of the mathematical concept of fiber bundles is then presented to achieve a generic data structure. Thereby a clean separation of the combinatorial properties of data structures and the mechanisms of data storage and access is achieved. With the introduction of an additional concept (index space depth) we show that full backward compatibility to all existing data structures is achieved. The contribution of this section is the preparation of basic formal mechanisms for the subsequently introduced and used methodology of discretization schemes.
Structural information and guidelines for the applied numerical analysis can therewith be obtained, as well as the basis for different types of error estimation. The result of this chapter is a basis for a formal language for the following chapters. This language and these concepts build a framework which intuitively introduces the common concepts hidden under the surface of a variety of physical and mathematical modeling concepts, normally represented by equations of different types.

For the subsequent parts of this work the concept of a CW-cell complex is required. We use the mechanism of a projection of arbitrary data structures onto finite CW-cell complexes. Two different classification mechanisms are introduced to characterize a CW-cell complex: the complex topology and the cell topology. With this separation a great degree of orthogonality can be accomplished. These introduced concepts are then used to formalize the interfaces to various data structures and enable, on the one hand, high interoperability of different approaches and, on the other hand, generic traversal mechanisms. In the end, a unique characterization for several data structures and dimensions is derived. This formal topological identification enables new possibilities for the optimization of developed algorithms, as presented in Section 11. The starting point for the practical chapters arises from the search by Stepanov [26] for a way of implementing generic algorithms. Several programming languages were used, but no satisfactory language was found. The advent of C++ with its template mechanism and the related generic programming paradigm changed this, culminating in the C++ STL [27], the first generic template library in which the data structures were separated from the algorithms. The further quest for a mathematical notation was greatly advanced by the work of [18], where algorithms were specified independently of the underlying mesh. The theoretical contribution of that work was very powerful, but the implementation into usable software components is still missing. This work tries to close this gap by advancing the concepts developed in the GrAL library and further developing the started concepts into a generic topological data structure specification. In other words, the spatial discretization can be described generically. Figure 2.1 depicts an overview of the terms used and their relation to each other.

Figure 2.1: Overview of the relations of the introduced mathematical concepts, related to the separation of a physical quantity into base space (point set, topological space, manifold, CW-complex) and fiber space (chain and cochain complexes, homology and cohomology, tangent and cotangent bundles, tensor fields).

2.2 Common Terms and Linear Algebra

First, some generic terms are introduced to obtain a consistent subsequent concept introduction. Familiarity with sets, as covered in [22], is assumed.

Definition 1 (Relation) A relation R between sets A and B is a collection of ordered pairs (a, b) such that a ∈ A and b ∈ B. If (a, b) ∈ R, the relationship is denoted by a R b.

Figure 2.2: A relation R between a set A and a set B.

Definition 2 (Function) A function or mapping f is a rule that assigns to each element a ∈ A exactly one element b ∈ B. Thus f maps a to b, which is denoted by f(a) = b.

A permutation of a finite set is usually given by its action on the elements of the set. An example is given by the basic set {1, 2, 3, 4, 5} with the transformation into {5, 4, 3, 2, 1}, where the notation states that the permutation maps 1 → 5, 2 → 4, and so on. Therewith all permutations of a finite set can be divided into two classes.

Definition 3 (Parity) A permutation of a finite set can be expressed as either an even or an odd number of transpositions, but not both. In the former case, the permutation is even, in the latter it is odd.

Definition 4 (Image) Let A and B be sets, f the function f : A → B, and a ∈ A. Then the image of a under f is the unique element b ∈ B that f associates with a. The image of f, denoted by im(f), is the set

f(A) = {b = f(a) | a ∈ A} (2.1)

Figure 2.3: The image of a under f, with domain A and codomain B.

Definition 5 (Composite Function) If f and g are functions with f : A → B and g : B → C, then there is a natural function mapping A into C, the composite function, consisting of f followed by g. It can be written g(f(a)) = c; the composite function is denoted by g ◦ f.

Definition 6 (Preimage) The preimage (fiber, inverse image) of a set B ⊆ Y under f is the subset of X defined by

f⁻¹(B) = {x ∈ X | f(x) ∈ B} (2.2)

The preimage of a singleton, f⁻¹(y), is a fiber or a level set. A fiber can also be the empty set ∅. The union of all fibers of a function is called the total space, E.

Definition 7 (Group) A group (G, ∗) is a set G with a binary operation ∗ on G such that the following axioms are satisfied:

- ∗ is associative.
- ∃e ∈ G such that e ∗ x = x ∗ e = x for all x ∈ G. The element e is an identity element for ∗ on G.
- ∀a ∈ G, ∃a′ ∈ G such that a′ ∗ a = a ∗ a′ = e. The element a′ is an inverse of a with respect to the operation ∗.

Definition 8 (Abelian) A group G is Abelian if its binary operation ∗ is commutative.

Definition 9 (Subgroup) A subset H ⊆ G of a group (G, ∗) is a subgroup of G if H is a group and closed under the operation ∗. The subgroup consisting only of the identity element of G, {e}, is the trivial subgroup of G. All other subgroups are nontrivial.

Definition 10 (Vector Space) A vector space V over a field F is a set associated with two binary operations. The elements of V are called vectors, while the elements of F are called scalars. The first binary operation is addition

+ : V × V → V (2.3)

which is commutative as well as associative and which possesses a neutral as well as an inverse element. The second binary operation is scalar multiplication

· : F × V → V (2.4)

which is distributive with addition. V is closed under its two binary operations.

Definition 11 (Injective) A mapping f : A → B is called injective if there is at most one element a ∈ A with f(a) = b for each element b ∈ B. This means that every element of B has at most one preimage in A, but there may be elements in B which are not obtained as images.

Definition 12 (Surjective) A mapping f : A → B is called surjective if b = f(a) has at least one solution a ∈ A for every b ∈ B. Several elements of A may thereby be mapped to the same image in B.

Definition 13 (Bijective) A mapping f : A → B that is both injective and surjective is called bijective. Every element of B then has exactly one preimage in A, and there are no elements in B that are not obtained as images.

Definition 14 (Linear Mapping) A mapping f : A → B is called linear if the following relation holds:

f(λ a + µ b) = λ f(a) + µ f(b) (2.5)

A mapping

ϕ : L × M → N (2.6)

that is linear in both of its arguments, as in the following relations

ϕ(λ a + µ b, c) = λ ϕ(a, c) + µ ϕ(b, c) (2.7)
ϕ(a, λ c + µ d) = λ ϕ(a, c) + µ ϕ(a, d) (2.8)

is called bilinear.

Definition 15 (Homomorphism) A map f of a group G into a group G′ is a homomorphism if f(ab) = f(a)f(b) ∀a, b ∈ G. For any groups G and G′, there is always at least one homomorphism f : G → G′, namely the trivial homomorphism defined by f(g) = e′ ∀g ∈ G, where e′ is the identity in G′.

Definition 16 (Kernel) Let f : G → G′ be a homomorphism. The subgroup f⁻¹({e′}) ⊆ G, consisting of all elements of G mapped by f into the identity e′ of G′, is the kernel of f, denoted by ker f.

Figure 2.4: The kernel of f: all elements of G mapped onto the identity e′ of G′.

2.3 Topological Toolkit

Besides the introduced common terms based on sets and relations, this section focuses on the inherent structure of spaces: topology. This topological toolkit is necessary to handle the most basic part of physical modeling, topological spaces. To model space and time within the regime of a finite computer representation, the continuous domain has to be projected onto finite domains or cells. This chapter describes how these cells can be introduced and handled. Topology separates global space properties from local geometric attributes. Additionally, and more importantly for this work, it provides a precise notation and language for discussing and handling various properties. Topological spaces offer several operations for sets and subsets. All sets which contain only one element (singletons) can later be used to describe a vertex. The formal definition of a topological space is given next:

Definition 17 (Topological spaces) A topological space (X, T ) consists of a set X and a family T of subsets of X such that:

- (T1) ∅ ∈ T and X ∈ T .
- (T2) a finite intersection of members of T is in T .
- (T3) an arbitrary union of members of T is in T .

The second property (T2) states that finite intersections of subsets have to be in the topological space again. Also the union (T3) of subsets is contained in the space. These properties are later used to describe inter-dimensional elements of the complex such as the edges of a cell. An example is given in the following, which presents a basic set X = {a, b, c} and the corresponding topology. This pair of the original set and the set of subsets generates the topological space. Vertices are modeled by the singletons {a}, {b}, and {c}.

T = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}

This topological space is used in several examples throughout this work. To handle all subsets in a concise way the concept of a subspace is introduced.

Definition 18 (Subspaces) Let (X, T ) be a topological space. Any subset Y of X inherits a topology in a natural way. It is given by:

T_Y := {V ⊆ Y | V = U ∩ Y for some U ∈ T } (2.9)

It is required to create a topology on a set X in which a specified family S of subsets of X are open sets.

Definition 19 (Bases and subbases) If S is already closed under finite intersections, then T can be defined as those sets which are unions of sets in S. Then T satisfies (T1), (T2), and (T3), and S is said to be a basis for T .

In general, to obtain a topology containing S, one first forms B, the family of sets which are finite intersections of members of S, and then defines T to be all arbitrary unions of members of B. In this case S is called a subbasis for T .

Definition 20 (Continuity) Let (X, T ) and (X′, T ′) be topological spaces and f : X → X′ a map. Then the following conditions are equivalent:

- (i) f⁻¹(U) is open in X whenever U is open in X′.
- (i′) f⁻¹(V ) is closed in X whenever V is closed in X′.
- (ii) f⁻¹(U) is open in X for every U ∈ S, where S is a given basis or subbasis for T ′.

Definition 21 (Homeomorphism) A homeomorphism f : X → Y is a one-to-one, onto function such that both f and f⁻¹ are continuous.

Then X is homeomorphic to Y , written X ≈ Y , and X and Y have the same topological type. Homeomorphisms are topology's isomorphisms. With these definitions, sets of subsets can be handled with additional properties in a common way. The general definition of a topological space allows several topological spaces which are not useful in the field of data structures, e.g. a topological space (X, T ) with the trivial topology T = {∅, X}. Therefore the basic mechanism of separation within a topological space is introduced.

Definition 22 (Hausdorff spaces) Of a hierarchy of possible separation conditions augmenting the topological space axioms, the most important to us is the Hausdorff condition. The topological space (X, T ) is said to be Hausdorff if, given x, y ∈ X with x ≠ y, there exist open sets U1, U2 such that x ∈ U1, y ∈ U2, and U1 ∩ U2 = ∅.

Computer data structures model the separation characteristics of a Hausdorff space. As soon as a topological space is a Hausdorff space, the view of data structures correlates with this topological space. A common traversal operation requires a basic property of how two subsets of a topological space can interact. Therefore the following concept is introduced:

Definition 23 (Adjacence) Given two sets a, b ⊆ X, a binary adjacency relation Radj(a, b) is defined with the following property:

Radj(a, b) ⇐⇒ a ∩ b ≠ ∅

Definition 24 (Incidence) As a special case of adjacency, an incidence relation Rin(a, b) is defined by:

Rin(a, b) ⇐⇒ a ∩ b = a ∨ a ∩ b = b

The concept of a cover is introduced to equip a topological space with a type of dimension.

Definition 25 (Cover) A cover of a set X is a set of nonempty subsets of X whose union is X.

A cover is an open cover, if it is contained in the topology.

Definition 26 (Covering Dimension) A topological space has a (Lebesgue) covering dimension n if any open cover C has a second open cover D, the refinement, where each element d ∈ D is a subset of an element in the first cover such that no point is included in more than n + 1 elements.

If a topological space is homeomorphic to Rn with n ∈ N, then its dimension is n. If a space does not have a Lebesgue covering dimension n for any n ∈ N, it is called infinite dimensional. The dimension of the empty set ∅ is defined as −1. Another fundamental topological concept is compactness, which may be regarded as a substitute for finiteness. It frequently compensates for the restriction to finite intersections in axiom (T2) by allowing arbitrary families of open sets to be reduced to finite families.

Definition 27 (Compactness) Let (X, T ) be a topological space and let U := {Ui}i∈I ⊆ T . The family U is called an open cover of Y ⊆ X if Y ⊆ ∪i∈I Ui. A finite subset of U whose union still contains Y is a finite subcover. Y is called compact if every open cover of Y has a finite subcover.

The lemmas below contain basic results about compact spaces which are also Hausdorff. The first relates compactness and closedness and shows that continuous maps behave well.

Lemma 1 Let (X, T ) be a compact Hausdorff space.

- A subset Y of X is compact iff it is closed.
- Let f : X → X′ be a continuous map, where (X′, T ′) is any topological space. Then f(X) is a compact subset of X′.
- If (X′, T ′) is Hausdorff and f : X → X′ is bijective, then f is a homeomorphism.

An important part of the characterization of data structures is the possibility to identify the object with the highest dimension, a cell.

Definition 28 (Open Cell) A subset c ⊂ X of a Hausdorff space X is an open cell, if it is homeomorphic to the open n-dimensional ball Dn = {x ∈ Rn : |x| < 1}.

Homeomorphic means that two or more spaces have the same topological characteristics [18]. Collections of cells form larger structures, so-called complexes, which are identified by the cell with the highest dimension. With the concepts introduced above, the concept of a cell complex can be introduced, which models the concept of a data structure directly.

Definition: CW-Complex C, [22] A pair (T , E), with T a Hausdorff space and a decomposition E into cells, is called a CW-complex, iff the following axioms are satisfied: - (C1) mapping function: for each n-cell c ∈ E a continuous function Φc : Dn → T exists, which transforms the interior of Dn homeomorphically onto the cell c and Sn−1 into the union of the at most (n − 1)-dimensional cells. Dn represents an n-dimensional ball and Sn−1 represents its boundary, an (n − 1)-dimensional sphere. - (C2) finite hull: the closed hull(c) of each cell c ∈ E connects only with a finite number of other cells. - (C3) weak topology: A ⊂ T is open, iff each A ∩ hull(c) is open.

A CW-cell complex with the underlying topological space guarantees that all inter-dimensional objects are connected in an appropriate manner. The respective subspaces X(n) are called the n-skeletons of the cell complex. A p-cell describes the cell with the highest dimension: - zero-dimensional (0D) cell complex: vertex - one-dimensional (1D) cell complex: edge - two-dimensional (2D) cell complex: triangle

For this work, the most important property of a CW-complex can be explained by the usage of different p-cells and the consistent way of attaching sub-dimensional cells to the p-cells. This fact is covered by the mapping function. From now on, an abbreviation is used to specify the CW-complex with its dimensionality, e.g., a 1-cell complex describes a one-dimensional CW-complex. An illustration of this type of cell complex is given in Figure 2.5.

For a common case, a p-cell can be described by the oriented collection of its 0-cells: τp = {v0, v1, . . . , vk}, vi ∈ K

Figure 2.5: Representation of a 1-cell complex with cells (edges, C) and vertices (V).

So far, all mechanisms have been introduced to handle the underlying topological space of data structures. Each subspace can be uniquely characterized by its dimension. All these subspaces are connected appropriately by the concept of a finite CW-cell complex.

Definition 29 (Decomposition) A cell complex K is said to decompose a region R if R is equal to the union of the cells of K.

Definition 30 (Faces) The faces of a p-cell c ∈ K are the (p − 1)-cells in K comprising its boundary. If f is a face of c, then c is a coface of f.

Definition 31 (Oriented cell) An oriented cell is a pair c = (u, o), where u is an (unoriented) cell, and o ∈ {1, −1}.

2.3.1 Order Theory and Sets

First, the most basic notion of a binary relation is introduced.

Definition 32 (Binary relation) A binary relation R is a subset of the Cartesian product of two sets A and B. That is, any R ⊆ A × B is a binary relation.

Based on this concept a partial order is defined. A partial order is required to create a hierarchy for our subsets, which can then be used to create our traversal mechanisms.

Definition 33 (Partial order) A partial order is a binary relation ≤ on a set A that satisfies the following three properties: - (PO1) Reflexivity: a ≤ a ∀a ∈ A. - (PO2) Antisymmetry: if a ≤ b and b ≤ a, then a = b ∀a, b ∈ A. - (PO3) Transitivity: if a ≤ b and b ≤ c, then a ≤ c ∀a, b, c ∈ A.
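The three properties can be verified mechanically for any explicit relation; a minimal sketch, using divisibility on a small set as an assumed example relation:

```python
from itertools import product

# Check (PO1)-(PO3) of Definition 33 for an explicit relation:
# divisibility on {1, 2, 3, 4, 6, 12}, an assumed example set.
A = [1, 2, 3, 4, 6, 12]
leq = {(a, b) for a, b in product(A, A) if b % a == 0}  # a <= b iff a | b

reflexive = all((a, a) in leq for a in A)                        # (PO1)
antisymmetric = all(a == b                                       # (PO2)
                    for a, b in product(A, A)
                    if (a, b) in leq and (b, a) in leq)
transitive = all((a, c) in leq                                   # (PO3)
                 for a, b in product(A, A) if (a, b) in leq
                 for c in A if (b, c) in leq)

print(reflexive, antisymmetric, transitive)  # True True True
```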

Definition 34 (Closure) The closure cl{y} of a subset y of a topological space (X, T ) is the intersection of all closed sets containing y.

Definition 35 (Partial order on a topological space) Let (X, T ) be a Hausdorff space. For any x, y ∈ X, a binary relation ≤ on X is defined by x ≤ y iff x ∈ cl{y}, where cl{y} represents the closure of a set within the topological space. Then this binary relation is a partial order.

This definition translates a topological space (X, T ) into a partially ordered set (P, ≤).
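On a finite space this translation can be computed directly from the definitions; the concrete three-point space below (a vertex shared by two edges, chosen so that the resulting order is non-trivial) is an illustrative assumption:

```python
# Sketch of Definitions 34 and 35 on a small finite topological space:
# cl{y} is the intersection of all closed sets containing y, and
# x <= y is declared iff x is in cl{y}. The three-point space
# (a vertex 'v' shared by two edges) is an assumption for illustration.

X = {'v', 'e1', 'e2'}
opens = [set(), {'e1'}, {'e2'}, {'e1', 'e2'}, set(X)]
closed = [X - o for o in opens]          # complements of the open sets

def closure(y):
    """Intersection of all closed sets containing y (Definition 34)."""
    result = set(X)
    for c in closed:
        if y in c:
            result &= c
    return result

order = {(x, y) for x in X for y in X if x in closure(y)}
print(sorted(order))   # the vertex lies below both edges
```

The computed order places the vertex below both edges, which is exactly the hierarchy used later for the traversal mechanisms.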

Definition 36 (Ordered Sets) Let P be a set and ≤ a (partial) order on P . Then P and ≤ form a (partially) ordered set (P, ≤).

In other words, an ordered set can be classified with the following terms: - If the order is total, so that no two elements of P are incomparable, then the ordered set is a totally ordered set, called a chain (Figure 2.6). - If no two elements are comparable unless they are equal, then the ordered set is an anti-chain (Figure 2.6). - If P has two or more incomparable elements, then the ordered set is a partially ordered set. An example of two incomparable elements is {a} and {b, c} in Figure 2.6 on the right.

Figure 2.6: Left: Integers form a chain, totally ordered by ≤. Middle: Incomparable items forming an anti-chain. Right: The power set of {a, b, c} ordered by ⊆.

Definition 37 (Extrema of Partially Ordered Sets) Let S be an ordered set. - u ∈ S is said to be maximal in S iff there is no v ∈ S with v ≠ u such that u ≤ v. A set may have any number of maximal elements, including zero. - If u is S’s only maximal element, then u is the maximum of S. - The maximum element u (if it exists) is also called the top of S and is denoted by ⊤. - Dually, t ∈ S is said to be minimal in S iff there is no s ∈ S with s ≠ t such that s ≤ t. A set may have any number of minimal elements, including zero. - If t is S’s only minimal element, then t is the minimum of S. - The minimum element t (if it exists) is also called the bottom of S and is denoted by ⊥.

Finally, the concept of covering is needed. For x, y ∈ P ordered by ≤, we say x is covered by y (written x ≺ y), if x < y and there is no z ∈ P with x < z < y.

Definition 38 (Hasse Diagram) A Hasse diagram of an ordered set is a graph in which: - each node corresponds to an element of the set, - each edge corresponds to a covering relation between the nodes it connects, and - if x is covered by y (x ≺ y), then the node for x is drawn in a lower position than the node for y.
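The covering relation behind such a diagram can be sketched directly; the example below computes it for the power set of {a, b, c} ordered by inclusion, the right-hand poset of Figure 2.6 (the list encoding is an assumption for illustration):

```python
from itertools import combinations

# Sketch of the covering relation underlying a Hasse diagram, computed
# for the power set of {a, b, c} ordered by inclusion: x is covered by
# y iff x < y and no z of the poset lies strictly between them.

base = ['a', 'b', 'c']
P = [frozenset(c) for r in range(len(base) + 1)
     for c in combinations(base, r)]

def covers(x, y):
    """True iff x ≺ y (x < y is proper-subset comparison on frozensets)."""
    return x < y and not any(x < z < y for z in P)

edges = [(set(x), set(y)) for x in P for y in P if covers(x, y)]
print(len(edges))   # 12 edges: the cube-shaped Hasse diagram
```

Note that ∅ is not covered by {a, b}, since {a} and {b} lie strictly between them; only the twelve one-element steps survive as diagram edges.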

In interpreting such diagrams, it does not matter whether one node is drawn above or below another unless there is a monotonic path between them; and if there is a monotonic path from y through one or more nodes down to x, there is no separate edge directly from y to x.

Definition 39 (Lattice [18]) A lattice is a bounded poset in which any two elements a, b possess a unique maximal lower bound ⊥ and a unique minimal upper bound ⊤.

A graded lattice of rank d + 1 is atomic if each element is the ⊤ of elements of rank 1 (the atoms); it is coatomic if each element is the ⊥ of elements of maximal rank d (the coatoms).

The ⊥ of two elements c1, c2 corresponds to their geometric intersection c̄1 ∩ c̄2, the ⊤ to the element c of least dimension containing both of them, c1 ∪ c2 ⊆ c̄. The ⊤ of two elements may be the whole mesh. The poset of a convex polytope is an atomic and coatomic lattice. For an arbitrary complex, the poset is in general not a lattice. If a grid’s lattice is atomic, k-cells are uniquely determined by their vertex sets; if the lattice is coatomic, the vertex sets of k-cells for k ≤ d can be determined from the knowledge of the vertex sets of d-cells alone.

2.3.2 Cell Topology

The introduced order concepts can be used to formalize the internal structure of an arbitrary p-cell, e.g., with the Hasse diagram. Therewith a simplex cell is distinguishable from a cuboid cell type in several dimensions as depicted in Figure 2.7 and Figure 2.8, respectively.

Figure 2.7: Cell topology of a simplex cell in two dimensions.

Also, sub-cells such as the corresponding edges and facets and their direct relations to the cell can be identified. For the traversal operations used later, this internal structure can easily be used to switch between the levels of the ordered set, according to their order relation. An example is the extraction of all edges of the simplex cell, which corresponds to the middle layer of the Hasse diagram. Therewith all different combinations can be obtained from this internal structure alone. As can already be seen, the topological structure and the corresponding order hierarchy are invariant with respect to the type of cell used. The (p − 1) layer of the cube cell represents the edges of this type of cell. The given relations hold for arbitrary dimension.

Figure 2.8: Cell topology of a cuboid cell in two dimensions.

As a higher-dimensional example, a three-dimensional simplex is given in Figure 2.9, where the (p − 1)-cells are the facets, in this particular example triangles.

Figure 2.9: Cell topology of a 3-simplex cell.

By using the actual dimension of the p-cell and deriving the (p − n)-cells only by means of the order structure, a flexible framework can be built.
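For simplex cells this derivation can be sketched concisely, since a k-subcell is just a (k + 1)-element vertex subset; the vertex labels below are illustrative assumptions:

```python
from itertools import combinations

# Sketch of Section 2.3.2 for simplex cells: every layer of the Hasse
# diagram falls out of the order structure on vertex subsets alone,
# so the (p-1) layer (edges of a triangle, facets of a tetrahedron)
# is obtained directly from itertools.combinations.

def simplex_layers(vertices):
    """All k-cells of the simplex on the given vertices, by dimension k."""
    n = len(vertices) - 1
    return {k: [frozenset(c) for c in combinations(vertices, k + 1)]
            for k in range(n + 1)}

layers = simplex_layers(['v0', 'v1', 'v2', 'v3'])   # a 3-simplex
print(len(layers[1]), len(layers[2]))   # 6 edges and 4 triangular facets
```

The 3-simplex of Figure 2.9 yields four vertices, six edges, four facets, and one 3-cell, all derived from the order structure alone.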

2.3.3 Complex Topology

Compared to the cell topology, a cell complex requires incidence information of the neighboring cells. There are different possibilities of storing this information efficiently [20, 28]. This work develops an abstract means of storing all different types of cells based on the cell topology and, orthogonally, on the complex topology. For dimensions greater than one, Figure 2.10 depicts the complex topology of a 2-simplex cell complex, where the bottom sets are now the cells. The rectangle in the figure marks the relevant cell number.

Figure 2.10: Complex topology of a simplex cell complex.

The topology of the cell complex is only available locally, because the meta-cell set can have an arbitrary number of elements. In other words, there can be an arbitrary number of triangles in this cell complex attached to the innermost vertex.

Figure 2.11 presents the complex topology of a cuboid cell complex. The rectangle in the figure marks the main cell (5) under consideration. As can be seen, the number of attached cells is constant inside the space. The topology of the cell complex can therewith be used as a globally known property.

Figure 2.11: Complex topology of a cuboid cell complex.

2.4 Fiber Bundles

For the handling of complex sets of scientific data on manifolds, different concepts are already available. In this work the concept of fiber bundles is introduced. The first concept, which is then used by the concept of a fiber bundle, is the projection map.

Definition 40 (Projection Map) The projection map pr1 is a function that maps each element of a prod- uct space to the element of the first space:

pr1 : X × Y −→ X, (x, y) 7−→ x

Definition 41 (Fiber Bundle) A fiber bundle structure on a topological space E, with fiber F , consists of a projection map π : E → B such that each point of the topological space B has a neighborhood Ub for which there is a homeomorphism f : π−1(Ub) → Ub × F making the following diagram commute, i.e., pr1 ◦ f = π on π−1(Ub).

E is called the total space, B is called the base space, and F is called the fiber space. Commutativity of the diagram means that f carries each fiber Fb = π−1(b) homeomorphically onto the copy {b} × F of F . Thus the fibers Fb are arranged locally as in the product B × F , though not necessarily globally.


Figure 2.12: Left: a fiber bundle with the homeomorphism φ. Right: A homeomorphism into Ub × F , which does not preserve the projection, thus not revealing a fiber bundle [1].

By the introduced concept of fiber bundles, the concept of a CW-complex can be seen as a hierarchical system of spaces X(−1) ⊆ X(0) ⊆ X(1) ⊆ · · · ⊆ X(n), constructed by pairwise disjoint open cells such that X(n) is obtained from X(n−1) by attaching adjacent n-cells to each (n − 1)-cell, with X(−1) = ∅. The respective subspaces X(n) are called the n-skeletons of the base space. The fiber space, usually continuous, is well described by concepts of differential geometry such as tensor algebra. In the simpler case of a vector bundle, operations from linear algebra can be utilized and the fiber space is fully described by its dimensionality, also called the multiplicity, as it defines the number of quantities per point.

Definition 42 (Trivial Bundle) If a fiber bundle over a space B with fiber F can be written as B × F globally, then it is called a trivial bundle (B × F, B, pr1).

Definition 43 (Section) A map σ : B → E is called a section of a fiber bundle (E, B, p : E → B) if p ◦ σ = idB .

A section gives an element of the fiber over every point in B.
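These definitions can be sketched with a trivial bundle over a finite stand-in for the base space; the concrete sets below are assumptions for illustration:

```python
# Sketch of Definitions 40-43: a trivial bundle B x F with projection
# pr1 and a section sigma choosing one fiber element over each base
# point; p composed with sigma must be the identity on B. The finite
# stand-ins for base and fiber are assumptions for illustration.

B = ['b0', 'b1', 'b2']                    # base space
F = [0.0, 1.0]                            # fiber
E = [(b, f) for b in B for f in F]        # total space B x F

def p(point):
    """Projection map pr1: (b, f) -> b."""
    return point[0]

def sigma(b):
    """A zero section as in Figure 2.13: pick 0.0 over every base point."""
    return (b, 0.0)

print(all(p(sigma(b)) == b for b in B))   # True: p o sigma = id_B
```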

Definition 44 (Vector Bundle) If the fiber of a bundle is a vector space and its structure group is a linear group GLn, then it is called a vector bundle.

Figure 2.13: Zero section of a vector bundle.

2.5 Manifolds

Manifolds are the type of topological spaces which are used in this work for the base space.

Definition 45 (Topological Manifold) A Hausdorff space H is a topological manifold if it is locally homeomorphic to an open subset M of RN , where N is the dimension of H; that means every point P ∈ H has such a neighborhood. This implies that for a neighborhood U around a point P ∈ H a mapping of the form

κ : U → M ⊆ RN (2.10)

exists.

The neighborhood U is called Euclidean. κ is called a chart and assigns a set of values from RN , commonly called coordinates, to the points in the neighborhood U. In non-empty intersections of two neighborhoods U1, U2 ⊆ H it is possible to define a transition from one chart to another in the following manner:

κ1 ◦ κ2−1 : κ2(U1 ∩ U2) → κ1(U1 ∩ U2) (2.11)

This expresses the agreement of different charts in overlapping regions. A union of Euclidean neighborhoods that yields the topological space in combination with their respective charts is called an atlas. The specification of a Hausdorff space and an atlas characterizes a topological manifold.

Definition 46 (Differentiable Manifold) A differentiable manifold M is obtained by demanding that the transition functions defined in Equation 2.11 be differentiable bijections of a differentiability class Ck.

Thereby the charts κ are demanded to be diffeomorphisms of class Ck. The structure of a differentiable manifold M allows tangent spaces to be attached at each point.

Definition 47 (Tangent Space) A tangent space TP at a point P is defined as the union of the directional derivatives in this point P . The tangent space TP thus defined is a vector space of the same dimension as M

dimTP = dimM (2.12)

Finally, the connection to the fibrations can be seen with the example of a tangent bundle. The disjoint union of the tangent spaces at all points of a manifold M is called a tangent bundle. The dimension of the tangent bundle is twice the dimension of the differentiable manifold. One of the most important examples of a vector bundle is the tangent bundle T (M) −→π M of an n-dimensional differentiable manifold M.

Definition 48 (Tangent Bundle) The union of all tangent spaces Tp(M) on a manifold M together with the manifold is called the tangent bundle T (M):

T (M) := {(p, v) : p ∈ M, v ∈ Tp(M)} (2.13)

A vector field is a section of the tangent bundle. The zero section of a vector bundle is the submanifold of the bundle that consists of the zero vectors. In general, a vector bundle of rank (i, j) is spanned locally by i + j independent sections. Every differentiable manifold possesses a tangent bundle T (M). The dimension of T (M) is twice the dimension of the underlying manifold M, its elements are points plus tangential vectors. The mapping π : T (M) → M is called a natural projection. The inverse image of a point p ∈ M is the tangent space Tp(M). Tp(M) is the fiber of the tangent bundle over the point p. In the following some examples of different fiber bundles are given:

• The tangential bundle of a manifold is a vector bundle.

• The infinite Möbius strip and the infinite cylinder S1 × R are vector bundles, because the fiber R is a vector space.

• The Hopf bundle is not a vector bundle, because the fiber S1 is not a vector space.

The data in the fiber space can be categorized by the amount of information that is available on it. The fiber space of a fiber bundle has a certain dimension and thus an element has the same dimensionality at each point. If the dimensionality is unknown or may vary at each point, then the generalization of fiber bundles to a sheaf with stalks is modeled (cite kod85). If more information for an object with fixed dimension is available, e.g., some linearity relationship, then a vector bundle is specified. In the special case of the same dimensionality of the fiber space and the base space, a tangential bundle is obtained. To give an overview of these relationships, Figure 2.14 presents this abstract hierarchy. To introduce the concept of dual spaces, a non-degenerate mapping between two spaces is used.

Definition 49 (Scalar Product) A scalar product is a non-degenerate bilinear mapping of two vector spaces into a field

η : V×V∗ → R (2.14)

Non-degenerate means that for any fixed vector v from one of the vector spaces, except for the 0 element, η(v, ·) does not vanish for all elements of the other vector space.
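Non-degeneracy can be sketched on R² with an explicit Gram matrix; the concrete matrices below are assumptions for illustration:

```python
# Sketch of Definition 49 on R^2: a pairing eta(v, w) = v^T G w is
# non-degenerate iff its Gram matrix G is invertible, i.e. eta(v, .)
# vanishes for all w only when v = 0. The matrices are illustrative.

def eta(v, w, G):
    """Bilinear pairing with a 2x2 Gram matrix G."""
    return sum(v[i] * G[i][j] * w[j] for i in range(2) for j in range(2))

def nondegenerate(G):
    """2x2 case: non-degenerate iff det G != 0."""
    return G[0][0] * G[1][1] - G[0][1] * G[1][0] != 0

identity = [[1, 0], [0, 1]]
singular = [[1, 1], [1, 1]]

print(nondegenerate(identity), nondegenerate(singular))  # True False
# with the degenerate pairing, v = (1, -1) pairs to zero with everything:
print(all(eta([1, -1], w, singular) == 0 for w in ([1, 0], [0, 1])))  # True
```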


Figure 2.14: A hierarchy of concepts with partial specialization. The most general form is represented by a sheaf concept. The concept of fiber bundles is obtained by using constant fibers. If the fiber space respects linear vector space properties, then the concept of a vector space is derived. Finally, by confining the dimension of the base and fiber space, a tangential bundle is received.

Definition 50 (Dual Vector Space) A vector space V∗ is called dual if it is connected to a vector space V by a scalar product.

The elements of V∗ act as linear forms on elements of V. In the same fashion that the vector space V has a dual space V∗, the dual vector space is also associated with a dual vector space V∗∗ that again contains the linear forms on the elements of V∗. These linear forms v∗∗ ∈ V∗∗ can be obtained by applying the scalar product η to the linear forms α ∈ V∗ for fixed v0 ∈ V

⟨α, v0⟩ = v0∗∗(α) (2.15)

As the elements from V now uniquely identify linear forms on V∗ by using only the structure of the scalar product that makes V∗ the dual of V, the vector space V∗∗ can be identified with the original vector space V. Accordingly a scalar product

η∗ : V∗ × V → R (2.16)

is defined that is connected to the initial scalar product by

⟨v, α⟩∗ = ⟨α, v⟩ (2.17)

Definition 51 (Cotangential Space) The vector space structure of a tangent space TP along with a scalar product induces a dual vector space, the cotangent space TP∗ .

Overview: a differentiable manifold (a Hausdorff space with points, an atlas, and a tangent bundle) carries at each point P the tangential space TP of vectors and, via the scalar product η, the dual tangential space TP∗ of linear forms on TP ; applying the scalar product η∗ again yields TP∗∗, which is isomorphic to TP .

2.6 Computational Topology

The name computational topology stands for the fact that this section introduces concepts from algebraic topology to enable a computational treatment of the so far theoretical structures and concepts. This section follows the line of [29], where it is stated that the construction of a homology theory for oriented complexes is a purely algebraic process. Compared to the introduced theory of CW-complexes, this section focuses more on the details of the algebraic-topological properties of chains and cochains, and therewith more on the connection between geometrical objects and the corresponding physical quantities attached to them. The usage of maps to build algebraic structures for each type of cell and all skeletons is given. The complex is thereby equipped with an algebraic structure. The representatives of this mapping, the chain groups and the boundary operators, are defined, and then the homology groups follow easily. Therefore the chain and cochain concepts are introduced first, and afterwards the homology groups. The reason for this section is that, in contrast to the vast amount of literature on homology theory, there seems to be no systematic exposition of its role in boundary value problems of electromagnetics [28]. From the view of computational and implementation concepts, this section introduces properties to describe the topological space in a combinatorial manner, e.g., by cellular decompositions such as simplices or cuboids. In other words, the geometrical objects and physical quantities are abstracted to pure algebraic concepts. As a brief overview of this section, homological concepts are used to measure connectivity and holes in a topological space. Therewith, distinguishing capabilities are derived to separate structures such as a cylinder, a torus, and a sphere. Also, homology theory reduces topological problems that arise in the use of the classical integral theorems of vector analysis to more easily resolved algebraic problems.
Homology is a procedure to associate a sequence of abelian groups or modules with a given mathematical object such as a topological space [21]. The concepts introduced in the following formalize in a clean way how to handle geometrical objects and physical quantities. An object with, e.g., an orientation is really not the same object as one without. If objects are equipped with structure, then these objects cannot be seen as the same as before without this structure. Therefore this section introduces the remaining properties of objects, geometrical as well as topological, to handle the computational mechanisms appropriately. The basic idea behind homology can be estimated by an algebraization of the first layer of geometry in cell structures, e.g., how cells of dimension n attach to cells of dimension n − 1 [21]. Like the Euler characteristic, homology is an invariant of the underlying space of the complex.

2.6.1 Chains

To use the elements of a cell complex in a computational manner, a mapping from the cells onto an algebraic structure is introduced. Therewith an algebraic representation of the assembly of cells with a given orientation is available. This chain (group) of oriented cells later defines the homology groups. A formal definition of a p-chain (the pth chain group) is given by:

Definition 52 (P-Chain) A p-chain cp defined over a cell complex K and a vector space X is a formal sum

cp = ∑i=1..np wi τpi , τpi ∈ K, wi ∈ X (2.18)

and respecting the orientation that is closed under orientation reversal:

∀cp ∈ Cp there is −cp ∈ Cp (2.19)

All different topological parts are called cells, and the dimensionality can be expressed with the dimension: a 3-cell for a volume, 2-cells for surface elements, 1-cells for lines, and 0-cells for vertices.

Given any n-dimensional domain K, the set of p-chains Cp(K), with 0 ≤ p ≤ n, can be extended to abelian groups, the result being the set of all linear combinations of elements of Cp(K) with coefficients in Z, the integers. This group is denoted by Cp(K, Z) and called the group of p-chains with coefficients in Z. In the simplest form of representing the cells, when coefficients in Z are used, each cell can be attributed with: - 0: if the cell is not in the complex - 1: if the unchanged cell is in the complex - −1: if the orientation is changed.

A geometrical interpretation of the wi can be given by the fact that each cell of a complex can be weighted differently. Algebraically, the set Cp(K, G) of all p-chains over a given complex K with coefficients in G forms a vector space, which enables us to use all vector operations on chains. In other words, a p-chain can be viewed as a map from the p-cells (cells of dimension p) of a complex K to some domain G; it assigns an element of G to each p-cell ci ∈ K. Thus, we may add two p-chains, or multiply a p-chain by a scalar. In addition, p-chains support algebraic-topological operations, including the boundary and coboundary operations. A cell complex can therewith be seen as a formal structure where cells can be added, subtracted, and multiplied. The cell complexes used in this work have a chain group in every dimension. The role of homology can be introduced here for the first time, due to the fact that homology examines the connectivity between two immediate dimensions. A structure-relating map between chain groups is therefore introduced.

Definition 53 (Boundary Homomorphism) Let K be a cell complex and τp ∈ K, τp = [k0, k1, . . . , kn]. The boundary homomorphism ∂p : Cp(K) → Cp−1(K) is:

∂p τp = ∑i (−1)i [k0, k1, . . . , k̃i, . . . , kn] (2.20)

where k̃i indicates that ki is deleted from the sequence.

It should not be confused with the geometric boundary of a point set. This algebraic-topological operation defines a (p − 1)-chain in terms of a p-chain, is compatible with the additive and the external multiplicative structure of chains, and builds a linear transformation:

∂p : {Cp} −→ {Cp−1} (2.21)

31 Therefore the boundary operator can be used linearly

∂ ( ∑i wi τpi ) = ∑i wi (∂ τpi ) (2.22)

which means that the boundary operator can be applied to each cell separately. The cell complex properties can be easily calculated by means of chains. A 3-cell intersects other 3-cells in 2-cells or has an empty intersection. This operation can be described by the boundary operator ∂cp of an oriented p-cell cp, which is defined as the (p − 1)-chain composed of the (p − 1)-cells of the cell complex and the corresponding induced orientation on them:

Definition 54 (Cell complex) A cell complex K is a set of cells that satisfies the following properties: - The boundary of each p-cell τpi is a finite union of (p − 1)-cells in K: ∂p τpi = ∪m τp−1m - The intersection of any two cells τpi, τqj in K is either empty, or is a unique cell in K

To give a simple example, the boundary operator ∂p is used to derive the boundary of a simplex, e.g., in one and two dimensions as a triangle.

Figure 2.15: Representation of a 1-chain with the respective boundary (left) as well as a 2-chain with the respective boundary (right).

By applying the boundary homomorphism, Figure 2.15 reads:

∂2τ2 = τ1¹ + τ1² + τ1³ + τ1⁴ (2.23)

∂1(τ1¹ + τ1² + τ1³ + τ1⁴) = (τ0² − τ0¹) + (τ0³ − τ0²) + (τ0⁴ − τ0³) + (τ0¹ − τ0⁴) = 0 (2.24)

Using the boundary operator on a sequence of different chains, the following concept is introduced:

Definition 55 (Chain Complex) A chain complex C∗ = {Cp, ∂p} is a sequence of modules Cp over a ring R and a sequence of homomorphisms

∂p : Cp → Cp−1 (2.25) such that

∂p−1∂p = 0 (2.26)

and therefore builds the following sequence:

0 ←−∂ C0 ←−∂ C1 ←−∂ C2 ←−∂ C3 ←− 0 (2.27)

because there are no (p + 1)-cells in K. Figure 2.16 depicts the homological parts of the three-dimensional chain complex with the respective images and kernels, where the chain complex of K is defined by im ∂p+1 ⊆ ker ∂p, or in a less abstract way, ∂p ∂p+1 = 0.
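The boundary homomorphism and the defining relation ∂p∂p+1 = 0 can be sketched directly; the dictionary encoding of chains below is an assumption for illustration:

```python
# Sketch of Definition 53 and the chain-complex property (2.26): a
# p-chain is stored as {oriented simplex: integer weight}, the boundary
# deletes one vertex at a time with alternating sign, and applying it
# twice yields the empty chain. The tuple encoding is an assumption.

def boundary(chain):
    """Boundary homomorphism applied linearly to a chain (cf. (2.22))."""
    result = {}
    for simplex, w in chain.items():
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]      # delete vertex k_i
            result[face] = result.get(face, 0) + (-1) ** i * w
    return {s: w for s, w in result.items() if w != 0}

triangle = {(0, 1, 2): 1}          # one oriented 2-cell
edges = boundary(triangle)
print(edges)                        # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(edges))              # {} : the boundary of a boundary is 0
```

The empty result of the second application is the computational counterpart of im ∂p+1 ⊆ ker ∂p.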


Figure 2.16: A three-dimensional chain complex.

For this work, the ring R will be R or Z, in which case the modules Cp are vector spaces or abelian groups, respectively. A familiar example is the chain complex:

C∗(K; R) = {Cp(K; R), ∂p} (2.28)

C∗(K; R = {Cp(K; R), ∂p}) (2.28)

The chain complex for the coefficient group Z is given by:

C∗(K; Z) = {Cp(K; Z), ∂p} (2.29)

Theorem 1 (im ∂p+1, ker ∂p [23]) im ∂p+1 and ker ∂p are free abelian normal subgroups of Cp. im ∂p+1 is a normal subgroup of ker ∂p.

The following two definitions are the detailed definitions of these subgroups:

Definition 56 (Cycle) The pth cycle (group) is Zp = ker ∂p. A chain that is an element of Zp is a p-cycle.

Definition 57 (Boundary) The pth boundary (group) is Bp = im ∂p+1. A chain that is an element of Bp is a p-boundary.

Figure 2.17: Overview of terms. A, B, C are cycles. C is a boundary, but A and B are not.

2.6.2 Cochains

Before the concept of chains, only the simple structure of a cell complex was available. The cell complex only contains the set of cells and their connectivity. The introduced concept of a chain provides the concept of an assembly of cells and the corresponding algebraic structure. Therewith chains can be seen as mappings from oriented cells, as part of a cell complex, to another space. This definition establishes the algebraic access of computational methods to handle the concept of a cell complex. The only requirement for this mapping is that it is closed under orientation reversal. At this state, no mechanism is available to associate values with collections of cells. Next to cell complexes, scientific computing requires the notation and access mechanisms for global quantities related to macroscopic p-dimensional space-time domains. This collection of possible quantities, which can be measured, can then be called a field, which permits the modeling of these measurements as a field function that can be integrated on arbitrary p-dimensional subdomains. An important fact which has to be stated here is that all quantities which can be measured are always attached to a finite region of space. A field function can then be seen as the abstracted process of measurement of this quantity. The concept of cochains, which is introduced next, is this required mechanism, which associates numbers not only with single cells, as chains do, but also with assemblies of cells. Briefly, the necessary requirements are that this mapping is not only orientation dependent, but also linear with respect to the assembly of cells, modeled by chains. A cochain representation is then the association of global quantities with subdomains of a cell complex, which can be built arbitrarily to discretize a domain. The fields are therefore manifested on a linear assembly of cells. Based on cochains, topological laws can be given a discrete representation.

Definition 58 (Cochains [2]) A linear transformation σ of the p-chains into a vector space G of real numbers:

σ : Cp −→ G (2.30)

is a vector-valued p-dimensional cochain, or short, a p-cochain.

The space of all bounded linear functionals on Cp is denoted by Cᵖ, where the elements of Cᵖ are called cochains, and the duality pairing between chains and cochains is denoted by ⟨·, ·⟩. Therewith cochains express a representation for fields over discretized domains. Addition and multiplication by a scalar are defined for the field functions and so for cochains. To extend the expression possibilities, coboundaries of cochains are introduced.

Definition 59 (Coboundary) The coboundary δ of a p-cochain is a (p + 1)-cochain defined as:

δcp = ∑i wi τi , where wi = ∑f ∈ faces(τi) σ(f, τi) cp(f) (2.31)

Thus, the coboundary operator assigns non-zero coefficients only to those (p + 1)-cells that have cp as a face. As can be seen, δcp depends not only on cp but on how cp lies in the complex K. This is a fundamental difference between the two operators ∂ and δ. The coboundary of a p-cochain is a (p + 1)-cochain, which assigns to each (p + 1)-cell the sum of the values that the p-cochain assigns to the p-cells which form the boundary of the (p + 1)-cell. Each quantity appears in the sum multiplied by the corresponding incidence number. The adjoint of ∂, δ : Cᵖ → Cᵖ⁺¹, is defined by:

⟨∂cp+1, cᵖ⟩ = ⟨cp+1, δcᵖ⟩ (2.32)

34 + / −

+ / − + / −

Figure 2.18: Coboundary: K1 −→δ δK1 −→δ δδK1 =0, [2]

Cochain complexes [21, 28] are defined in a way similar to chain complexes except that the arrows are reversed.

Definition 60 (Cochain Complex) A cochain complex C∗ = {Cp, δp} is a sequence of modules Cp and homomorphisms:

δp : Cp → Cp+1 (2.33) such that

δp+1δp = 0 (2.34) building the following sequence with δδ = 0:

0 −→δ C0 −→δ C1 −→δ C2 −→δ C3 −→δ 0 (2.35)
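The coboundary and the relation δδ = 0 can be sketched on a single triangle; the concrete complex and cochain values below are assumptions for illustration:

```python
# Sketch of Definitions 58-60 on one triangle: a p-cochain assigns a
# value to each p-cell, and the coboundary sums a cochain over the
# faces of every (p+1)-cell weighted by the incidence number, so delta
# acts as the transpose of the boundary operator, and delta delta = 0.

cells1 = [(0, 1), (0, 2), (1, 2)]        # the 1-cells of the triangle
cells2 = [(0, 1, 2)]                     # the single 2-cell

def incidence(face, cell):
    """Signed incidence number: (-1)^i if face = cell minus vertex i."""
    for i in range(len(cell)):
        if cell[:i] + cell[i + 1:] == face:
            return (-1) ** i
    return 0

def delta(cochain, upper_cells):
    """Coboundary: on each upper cell, the signed sum over its faces."""
    return {c: sum(incidence(f, c) * v for f, v in cochain.items())
            for c in upper_cells}

omega0 = {(0,): 1, (1,): 2, (2,): 4}     # a 0-cochain (vertex values)
omega1 = delta(omega0, cells1)
print(omega1)                             # {(0, 1): 1, (0, 2): 3, (1, 2): 2}
print(delta(omega1, cells2))              # {(0, 1, 2): 0} : delta delta = 0
```

On each edge the coboundary of the vertex cochain is simply the difference of the endpoint values, and summing these differences around the triangle with the incidence signs cancels to zero.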

Generic field representations are described by cochain complexes. An example of a cochain complex is

C∗(K; R = {Cp(K; R), δp) (2.36)

2.6.3 Duality between Chains and Cochains

A statement [29] was given that the concepts of chains and cochains on finite complexes coincide. Geometrically, $C_p$ and $C^p$ are distinct [2] despite an isomorphism. An element of $C_p$ is a formal sum of p-cells, whereas an element of $C^p$ is a linear function that maps elements of $C_p$ into a field. Moreover, the coefficients of chains are dimensionless multiplicities, whereas those associated by cochains are physical quantities [20]. The extension of cochains from weights on single cells to quantities associated with assemblies of cells is not trivial and makes cochains very different from chains, even on finite cell complexes. For a chain $c \in C_p(K)$ and a cochain $\omega \in C^p(K)$, the integral of $\omega$ over $c$ is denoted by $\int_c \omega$, and integration can be regarded as a mapping:

$$\int : C_p(K) \times C^p(K) \to \mathbb{R}, \quad \text{for } 0 \le p \le n \quad (2.37)$$

where $\mathbb{R}$ is the set of real numbers. Integration with respect to p-forms is a linear operation: given $a_1, a_2 \in \mathbb{R}$, $\omega_1, \omega_2 \in C^p(K)$, and $c \in C_p(K)$, we have

$$\int_c (a_1\omega_1 + a_2\omega_2) = a_1 \int_c \omega_1 + a_2 \int_c \omega_2 \quad (2.38)$$

Reversing the orientation of a chain means that integrals over that chain acquire the opposite sign:

$$\int_{-c} \omega = -\int_c \omega \quad (2.39)$$

Consider the set of p-chains with vector space properties, $C_p(K, \mathbb{R})$, i.e., linear combinations of p-chains with coefficients in the field $\mathbb{R}$. For this case, for $a_1, a_2 \in \mathbb{R}$, $c_1, c_2 \in C_p(K, \mathbb{R})$, $\omega \in C^p(K, \mathbb{R})$,

$$\int_{a_1 c_1 + a_2 c_2} \omega = a_1 \int_{c_1} \omega + a_2 \int_{c_2} \omega \quad (2.40)$$

Similarly, taking a ring R and forming linear combinations of p-chains with coefficients in R, we obtain an R-module $C_p(K, R)$. This definition has the previous two as special cases. For coefficients in $\mathbb{R}$, the operation of integration can be regarded as a bilinear pairing between p-chains and p-forms. Furthermore, for reasonable p-chains and p-forms this bilinear pairing is nondegenerate:

$$\text{if } \int_c \omega = 0 \;\; \forall c \in C_p(K), \text{ then } \omega = 0 \quad (2.41)$$

and

$$\text{if } \int_c \omega = 0 \;\; \forall \omega \in C^p(K), \text{ then } c = 0 \quad (2.42)$$

In conclusion, it is important to regard $C_p(K)$ and $C^p(K)$ as vector spaces and to consider integration as a bilinear pairing between them. The duality pairing between chains and cochains is denoted by $\langle \cdot,\cdot \rangle$, given in the following example:

$$\langle C_p, C^p \rangle = C^p \quad (2.43)$$

where $C_p$ is a p-chain and $C^p$ is a p-field representation over the cell complex. Cochains thereby express a representation for fields over discretized domains; addition and multiplication by a scalar are defined for the field functions and hence for cochains. Stokes' theorem on manifolds is the generalization of the classical integral theorems. To see the topological nature of these problems, the process of integration must be reinterpreted algebraically [28]. In a more general way, the adjoint of the exterior differential can be defined formally as the boundary of a weighted domain:

$$\int_{w_{p+1}} d\omega^p = \int_{\partial w_{p+1}} \omega^p \quad (2.44)$$

It has to be noted that this differential form expression is more general than the vector calculus notation, since it is valid for div(), grad(), and curl() alike; the vector calculus notations are derived automatically from the type of the underlying field with the general notion of:

$$\langle C_{(p+1)}, \delta C^{(p)} \rangle = \langle \partial C_{(p+1)}, C^{(p)} \rangle \quad (2.45)$$

Examples of cochain complexes for the differential operators encountered in different works [19, 30] are, for vector analysis in three dimensions:

$$0 \to \{\text{scalar functions}\} \xrightarrow{\mathrm{grad}()} \{\text{field vectors}\} \xrightarrow{\mathrm{curl}()} \{\text{flux vectors}\} \xrightarrow{\mathrm{div}()} \{\text{volume densities}\} \to 0 \quad (2.46)$$

and for two dimensions

$$0 \to \{\text{scalar functions}\} \xrightarrow{\mathrm{grad}()} \{\text{field vectors}\} \xrightarrow{\mathrm{curl}()} \{\text{densities}\} \to 0 \quad (2.47)$$

2.6.4 Homology and Cohomology

The introduced concept of chains can also be used to characterize properties of spaces, namely homology and cohomology. Here it is only necessary to use $C_p(K, \mathbb{Z}_2)$. The set of all n-cycles $Z_n$ and the set of all n-boundaries $B_n$ are vector spaces over $\mathbb{Z}_2$. Since $\partial^2 = 0$, we have $B_n \subset Z_n$. The homology is then defined by $H_n = Z_n/B_n$. The homology of a space is a sequence of vector spaces, and the topological classification by homology can therewith be introduced.

$$B_p = \operatorname{im} \partial_{p+1} \quad \text{and} \quad Z_p = \ker \partial_p \quad (2.48)$$

so that

$$B_p \subset Z_p \quad \text{and} \quad H_p = Z_p/B_p \quad (2.49)$$

where

$$\beta_p = \operatorname{rank} H_p \quad (2.50)$$

For cohomology

$$B^p = \operatorname{im} d^{p-1} \quad \text{and} \quad Z^p = \ker d^p \quad (2.51)$$

so that

$$B^p \subset Z^p \quad \text{and} \quad H^p = Z^p/B^p \quad (2.52)$$

where

$$\beta^p = \operatorname{rank} H^p \quad (2.53)$$

Figure 2.19: A graphical representation of homology and cohomology.

The dimensions of the vector spaces $H_n$ are the Betti numbers $\beta_n$. The Betti numbers represent the number of non-homologous cycles which are not boundaries, i.e., the "n-dimensional holes". In 3D cell complexes made from full 3-dimensional cubes:

- $\beta_0$ counts the number of connected components
- $\beta_1$ counts the number of tunnels
- $\beta_2$ counts the number of enclosed cavities

Examples are:

- cylinder: $\beta_0 = 1$, $\beta_1 = 1$, $\beta_n = 0 \;\forall n \ge 2$
- sphere: $\beta_0 = 1$, $\beta_1 = 0$, $\beta_2 = 1$, $\beta_n = 0 \;\forall n \ge 3$
- torus: $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 1$, $\beta_n = 0 \;\forall n \ge 3$

The Euler characteristic of a cellular complex is a topological invariant, which for 2-dimensional complexes is defined by $\xi = \#\text{vertices} - \#\text{edges} + \#\text{2-cells}$. The Euler characteristic can be recovered from the Betti numbers by $\xi = \beta_0 - \beta_1 + \beta_2$.
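The relation between cell counts and Betti numbers can be checked numerically; the following sketch (illustrative cell counts, hypothetical helper name) evaluates the Euler characteristic of a sphere and a torus against the Betti numbers listed above:

```python
# Hypothetical helper: Euler characteristic of a 2-complex from cell counts.
def euler_characteristic(n_vertices, n_edges, n_faces):
    return n_vertices - n_edges + n_faces

# sphere, discretized as the surface of a cube: 8 vertices, 12 edges, 6 faces
chi_sphere = euler_characteristic(8, 12, 6)
assert chi_sphere == 1 - 0 + 1      # beta0 - beta1 + beta2 for the sphere

# torus, discretized as a 4x4 periodic quad grid: 16 vertices, 32 edges, 16 faces
chi_torus = euler_characteristic(16, 32, 16)
assert chi_torus == 1 - 2 + 1       # beta0 - beta1 + beta2 for the torus
print(chi_sphere, chi_torus)   # 2 0
```

The particular discretizations are arbitrary: the Euler characteristic is a topological invariant, so any valid cell decomposition of the same surface yields the same value.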

Definition 61 (Homology Maps) A continuous map $f : X \to Y$ induces a linear transformation on homology, $f_n : H_n(X) \to H_n(Y)$.

The definition of this transformation is simple when f maps, e.g., cubes to cubes. Otherwise the map must first be approximated by a special map. Homology maps can be useful in comparing topologies, especially inclusion, projection, and translation maps.

2.7 Fiber Bundles and Chains/Cochains

Based on the concepts of fiber bundles, chains, and the corresponding cochains, a combined model is introduced. Therewith a formal base for the practical concepts of Section II and the subsequent application design section is given. The fiber bundle approach and the identification of the base space offer great means to separate different types of issues during the development of applications for scientific computing. A complete modular system of implementation parts is therewith directly given, whose parts always communicate over a formal interface, the preimage property. Not only is the fiber bundle approach the most general approach to deal with scientific data, it is also the basic separation for application design concepts and scientific visualization [1, 24]. The identification of the already developed and used STL container and the further development of cursor and property map with the fiber bundle approach offer new means for applied concepts, as given in Section 6.2. The concept of chains transforms the properties of a cell complex directly into a computationally manageable algebraic structure. The practical concepts in the subsequent sections use these mechanisms to derive different operations on the cells. The identification and separation of the cell and complex topology offers possibilities to develop generic data structures and traversal mechanisms. Not only can the dimensional attribute be removed, but also the concepts which have to be implemented can be drastically reduced. Section 6.1 presents this approach. Due to the identification of the base space and fiber space on the one hand, and the chain and cochain approach on the other hand, various new concepts can be derived. As presented in the following figure, an example based on a tangent bundle and cotangent bundle, algebraic topology, and especially homology and cohomology, can be identified with a common vector bundle structure with a common base space Y.

[Figure: homology and cohomology, paired by the scalar product $\langle \cdot,\cdot \rangle$, identified with the tangential space $T_P$ and cotangential space $T_P^*$ of the tangent and cotangent bundles, all over the common Hausdorff base space Y of points with a common atlas.]

Based on these concepts, the main concepts for the transition to computational mechanisms can be clearly seen. All of the given concepts require a common base space identification with a common atlas mechanism. The tangential and cotangential space require algebraic properties of the used objects, which can be implemented easily by chain and cochain models.

2.8 Topological Data Structures

The concept of chains transforms the properties of a cell complex directly into a computationally manageable algebraic structure. The practical concepts in the subsequent sections use these mechanisms to derive different operations on the cells, such as incidence, adjacency, and boundary operations. The topological data structure concept introduced in this section, with its separation of the cell and complex topology, offers possibilities to develop generic data structures as well as generic traversal mechanisms. Not only can the dimensional attribute be removed, but also the concepts which have to be implemented can be drastically reduced. The first requirement for a topological data structure is to satisfy the homological concepts of a chain and cochain in a computational environment, and therewith the concept of the boundary homomorphism

$$\partial_p : C_p \to C_{p-1} \quad (2.54)$$

thus obtaining, for a complex of dimension n as a collection of Abelian groups $C_p(K, G)$ and boundary homomorphisms, the following sequence:

$$0 \leftarrow C_0 \xleftarrow{\;\partial_1\;} C_1 \xleftarrow{\;\partial_2\;} C_2 \xleftarrow{\;\partial_3\;} C_3 \leftarrow 0 \quad (2.55)$$

These requirements have to be transferred into an applicable data structure. As already introduced, the complete chain complex of K is defined by $\operatorname{im} \partial_{p+1} \subseteq \ker \partial_p$, or in a less abstract way, $\partial_p \partial_{p+1} = 0$. It is of particular interest because, while $\operatorname{im} \partial_{p+1} \subseteq \ker \partial_p$, in general $\operatorname{im} \partial_{p+1} \ne \ker \partial_p$, and the part of $\ker \partial_p$ not in the inclusion contains useful information, formulated concisely via homology groups and the exact homology sequence.

The basic property for a consistent data structure is the total ordering of the $m_n$ n-cells of an n-dimensional complex K (the chain concept), whereas the vertices of a cell are modeled by a partial ordering, as introduced in Section 2.3.1. Based on the lattice property of a cell complex, which all cell complexes used in this work fulfill, each cell can be uniquely identified by its set of vertices, e.g. a p-cell:

$$\tau_p = \{v_{k_0}, \dots, v_{k_p}\} \quad (2.56)$$

Therefore the only information to be stored is the connectivity information of the cell-vertex relation with the corresponding orientation.

The following connection matrix with a size of $m_n \times m_0$ is obtained. With additional data structural concepts, this information can be stored as a sparse $m_n \times k$ matrix, with constant $k =$ number of vertices of a cell for homogeneous cell complexes. The connection matrix, where j stands for the n-cells and i for the 0-cells, reads:

$$C^i_j = \begin{cases} +1 & \text{if vertex } \tau_0^i \text{ is a consistently oriented part of cell } \tau_n^j \\ -1 & \text{if vertex } \tau_0^i \text{ is not a consistently oriented part of cell } \tau_n^j \\ 0 & \text{otherwise} \end{cases} \quad (2.57)$$

This efficient way of storing the connection matrix gives the possibilities for:

- Boundary operators
- Incidence relations
- Adjacency relations
- Basis for the chain group $C_p(K)$

Based on the depiction of a poset by a Hasse diagram and the cell topology, a boundary operator can be used to traverse the levels of the poset, e.g. a three-dimensional simplex, Figure 2.20.
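A minimal sketch of this storage scheme (illustrative names, not the data structure implemented in this work): only the sparse cell-vertex table is stored, and boundary facets, incidence, and adjacency are derived from it on the fly:

```python
# Hypothetical sketch: sparse m_n x k cell-vertex storage for two
# tetrahedra sharing the facet {1,2,3}; everything else is derived.
cells = [(0, 1, 2, 3), (1, 2, 3, 4)]

def facets(cell):
    """Boundary facets of a simplex: omit one vertex at a time."""
    return [tuple(v for i, v in enumerate(cell) if i != j)
            for j in range(len(cell))]

# incidence: vertex -> cells containing it
vertex_to_cells = {}
for ci, cell in enumerate(cells):
    for v in cell:
        vertex_to_cells.setdefault(v, []).append(ci)

# adjacency: two cells are adjacent if they share a facet
def adjacent(c1, c2):
    return any(set(f) == set(g) for f in facets(cells[c1])
                                for g in facets(cells[c2]))

print(facets(cells[0]))    # the four triangles of the first tetrahedron
print(vertex_to_cells[2])  # [0, 1]
print(adjacent(0, 1))      # True
```

The point of the design is that no facet or adjacency table needs to be stored explicitly; for homogeneous complexes the cell-vertex table alone determines the full poset.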


Figure 2.20: Boundary operator applied onto a simplex poset.

As can be seen, the boundary operator simply decreases the level of evaluation in the cell topology poset. The boundary operator ∂, which transforms a p-chain into a (p-1)-chain, is compatible with the additive and the external multiplicative structure of chains. By the linearity property of the boundary operator, a unique and simple way of consistently deriving the boundary of a chain is given by:

$$\sum_i w_i (\partial \tau_p^i) = \partial \left( \sum_i w_i \tau_p^i \right) \quad (2.58)$$

For simplex cells, the following extraction map [28] can be derived, which extracts the jth p-face of the kth (p+1)-cell:

$$f_j(\sigma_{p+1,k}) = \{\sigma_{0,0}, \dots, \tilde{\sigma}_{0,j}, \dots, \sigma_{0,p+1}\} \quad (2.59)$$

where the tilde marks the omitted vertex. This particular map is only valid for a simplex; for different types of cells, the extraction map is more complex. The incidence matrix for the cochain and coboundary representation [20] can therewith be derived: given the ith p-cell $\tau_p^i$ and the jth (p-1)-cell $\tau_{p-1}^j$ of a cell complex, the incidence number $[\tau_p^i, \tau_{p-1}^j]$ is specified by:

$$[\tau_p^i, \tau_{p-1}^j] = \begin{cases} 0 & \text{if } \tau_{p-1}^j \text{ is not a face of } \tau_p^i \\ +1 & \text{if } \tau_{p-1}^j \text{ is a face of } \tau_p^i \text{ and has the induced orientation} \\ -1 & \text{if } \tau_{p-1}^j \text{ is a face of } \tau_p^i \text{ and has not the induced orientation} \end{cases} \quad (2.60)$$

Therewith n incidence matrices can be specified for an n-dimensional cell complex K:

$$M_{p,p-1} = \left[ \tau_p^i, \tau_{p-1}^j \right] \quad (2.61)$$

where the index i runs over all the p-cells and j runs over all the (p-1)-cells. The dual coboundary operator acts in the same way. The main difference is that the topological information is only simple for structured topologies; for the unstructured approach, additional information has to be stored (incidence/adjacency information).
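As an illustrative sketch (a hypothetical toy complex, not this work's implementation), the incidence matrices of a single oriented triangle can be written down directly, and the chain-complex property ∂∂ = 0 appears as the vanishing matrix product $M_{2,1} \cdot M_{1,0}$:

```python
# Incidence matrices for one oriented triangle:
# vertices a=0, b=1, c=2; oriented edges e0=(a,b), e1=(b,c), e2=(c,a);
# one 2-cell (a,b,c) with boundary e0 + e1 + e2.
M10 = [[-1, +1,  0],    # rows: edges, columns: vertices (boundary e0 = b - a)
       [ 0, -1, +1],
       [+1,  0, -1]]
M21 = [[+1, +1, +1]]    # rows: faces, columns: edges

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

print(matmul(M21, M10))   # [[0, 0, 0]] -- the boundary of a boundary vanishes
```

For larger complexes the same matrices would be stored in a sparse format, but the algebraic check is identical.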


Figure 2.21: Coboundary operator applied onto a simplex poset.

Using the definition of the coboundary operator:

$$\delta c^p = \sum_i w_i \tau_i, \quad \text{where} \quad w_i = \sum_{f \in \mathrm{faces}(\tau_i)} \sigma(f,\tau_i)\, c^p(f) \quad (2.62)$$

If the quantities within a fiber space are seen as an abstract cell complex, the coboundary operator can be calculated by going one level up within the complex topology poset.

2.9 Physical Fields and Electromagnetics

This section is concerned with the transformation of physical fields into the concise framework of differential forms. The important part for the subsequent sections is the fact that it is necessary to equip the inherently discrete parts of the computer implementation with as much information as possible about the physical entity to be projected. Therewith not only the dimension, but also the orientation and the information of a primary and secondary quantity has to be transferred correspondingly. One of the fundamental concepts of mathematical physics is that of a field: briefly, a spatial distribution of a mathematical object representing a physical quantity [20]. The transition of this modeling step into the regime of the computer necessitates another approximation, the reformulation of the given problem in discrete terms, as a finite set of algebraic equations. A common structure behind most of the currently known physical theories builds on a long tradition [31], emphasizing the following:

- The existence within physical theories of a natural association of many physical quantities with geometric objects in space-time.
- The necessity to consider as oriented the geometric objects to which physical quantities are associated.
- The existence of two kinds of orientation for these geometric objects.
- The primacy and priority, in the foundation of each theory, of global physical quantities associated with geometric objects, over the corresponding densities.

Based on these topics, a natural classification of physical quantities [32], depicted in Tonti diagrams, follows from the type and kind of orientation of the geometric objects associated with the quantities. The following definitions are required:

Definition 62 (Global quantities) Physical quantities which are associated with macroscopic space-time domains.

Compared to the global quantities, which are field measurements always performed on domains having finite extension, local quantities are defined as:

Definition 63 (Local quantities) Local quantities are quantities with a local support, e.g., densities and rates.

This section briefly introduces Maxwell's equations as a special field of scientific computing. It was shown [19, 32] that most physical theories can be treated in a similar way, sometimes depicted by a Tonti diagram, due to the intrinsic nature and structure of the physical concepts. Here, the classic theory of electromagnetics, which treats electric and magnetic macroscopic phenomena including their interaction, is introduced. The following four space-time dependent vector fields are considered:

- the electric field strength E [V/m]
- the magnetic field strength H [A/m]
- the electric displacement field (electric flux) D [As/m²]
- the magnetic induction field (magnetic flux) B [Vs/m²]

The sources of electromagnetic fields are electric charges and currents, described by:

- the charge density ̺ [As/m³]
- the current density J [A/m²]

where the SI units denote meters m, seconds s, Ampère A, and Volt V. The equations related to electromagnetics in integral form read:

$$\oint_{\partial S} \mathbf{E} \cdot d\mathbf{l} = -\frac{d}{dt} \int_S \mathbf{B} \cdot d\mathbf{S} \qquad \text{Faraday's law} \quad (2.63)$$
$$\oint_{\partial V} \mathbf{B} \cdot d\mathbf{S} = 0 \qquad \text{Gauss' law for magnetic charge} \quad (2.64)$$
$$\oint_{\partial S'} \mathbf{H} \cdot d\mathbf{l} = \frac{d}{dt} \int_{S'} \mathbf{D} \cdot d\mathbf{S} + \int_{S'} \mathbf{J} \cdot d\mathbf{S} \qquad \text{Ampère's law} \quad (2.65)$$
$$\oint_{\partial V} \mathbf{D} \cdot d\mathbf{S} = \int_V \varrho \, dV \qquad \text{Gauss' law} \quad (2.66)$$

If $S' = \partial V$ in Ampère's law, then, using $\partial\partial = 0$, the field vector can be eliminated between Ampère's and Gauss' law to obtain a statement of charge conservation:

$$0 = \frac{d}{dt} \int_V \varrho \, dV + \oint_{\partial V} \mathbf{J} \cdot d\mathbf{S} \quad (2.67)$$

If the surfaces and volumes are stationary with respect to the inertial reference frame of the field vectors, the local differential versions of the Maxwell equations can be used:

$$\operatorname{curl} \mathbf{E} = -\partial_t \mathbf{B} \quad (2.68)$$
$$\operatorname{div} \mathbf{B} = 0 \quad (2.69)$$

$$\operatorname{curl} \mathbf{H} = \partial_t \mathbf{D} + \mathbf{J} \quad (2.70)$$

The following equations are the local differential forms of the conservation laws.

$$\operatorname{div} \mathbf{D} = \varrho \quad (2.72)$$

$$\operatorname{div} \mathbf{J} = -\partial_t \varrho \quad (2.73)$$

The differential versions of Maxwell's equations are much less general than the integral laws for three main reasons [28]:

- They need to be modified for use in problems involving noninertial reference frames. Differential forms are an essential tool for this modification.
- The differential laws do not contain "global topological" information which can be deduced from the integral laws. The machinery of cohomology groups is the remedy for this problem.
- The differential laws assume differentiability of the field vectors with respect to spatial coordinates and time. However, there can be discontinuities in fields across medium interfaces, so it is not possible to deduce the proper interface conditions from the differential laws.

For the third issue, the global integral laws have to be used to derive interface conditions, which give relations for the fields across material or media interfaces:

$$\mathbf{n} \times (\mathbf{E}^a - \mathbf{E}^b) = 0 \quad (2.74)$$
$$\mathbf{n} \cdot (\mathbf{B}^a - \mathbf{B}^b) = 0 \quad (2.75)$$
$$\mathbf{n} \times (\mathbf{H}^a - \mathbf{H}^b) = \mathbf{K} \quad (2.76)$$
$$\mathbf{n} \cdot (\mathbf{D}^a - \mathbf{D}^b) = \sigma_s \quad (2.77)$$

where the superscripts refer to the limiting values of the field on the interface when the interface is approached from the side of the indicated medium. It is important to highlight that the source vectors do not really exist but rather represent a way of modeling current distributions in the limit of zero skin depth. For general field problems, the given equations are not sufficient for determining the electromagnetic field, since there are six independent equations in twelve unknowns E, B, D, H. To obtain a consistent equation system, constitutive laws have to be introduced additionally. For the simple case of linear isotropic media the constitutive laws are given by:

- $\mathbf{D} = \varepsilon \mathbf{E}$
- $\mathbf{B} = \mu \mathbf{H}$
- $\mathbf{J} = \sigma \mathbf{E}$

Here, two different classes of equations can be seen. The first class represents equations which are based on the conservation of a physical quantity.

Definition 64 (Structural law) Structural laws are related to conservation, balance, and equilibrium laws.

Structural laws can be treated based on topological invariants and can be expressed using the operations of boundary and coboundary. The second class are constitutive laws, which act as links between the different orientations of structural laws.

Definition 65 (Constitutive law) Constitutive laws are related to local links.

An example is given by the well known correspondence of the electric field and the dielectric flux

D = εE (2.78)
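A small numeric sketch of this constitutive link for a parallel-plate arrangement (all values illustrative): the global flux Ψ follows from D = εE, and the ratio Ψ/V reproduces the familiar capacitance εA/l:

```python
# Illustrative parallel-plate capacitor: D = eps * E evaluated with
# hypothetical geometry and material values.
eps0 = 8.854e-12          # vacuum permittivity [As/Vm]
eps_r = 4.0               # relative permittivity of the filling medium
A = 1e-4                  # plate area [m^2]
l = 1e-3                  # plate distance [m]
V = 10.0                  # applied voltage [V]

E = V / l                 # homogeneous field strength [V/m]
D = eps0 * eps_r * E      # flux density [As/m^2]
Psi = D * A               # global flux through the plate [As]
C = Psi / V               # capacitance, equals eps*A/l [F]
print(C)
```

This is the metric content of the constitutive law made explicit: unlike the topological balance equations, it depends on lengths, areas, and material data.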

With the assumption of a capacitor with two planar parallel plates of area A, having a distance l between them and filled with a uniform, linear, isotropic medium having relative permittivity $\varepsilon_r$, the equation can be written as:

$$\frac{\Psi}{A} = \varepsilon \frac{V}{l} \quad (2.79)$$

Unlike topological laws, constitutive equations imply the recourse to metrical concepts. This is not apparent in the first equation because the use of vectors to represent field quantities tends to hide the geometric details of the theory. If the first equation is written in terms of differential forms or a geometric expression, the natural connection to geometry is easily seen. The result is a distinction of topological laws, which are intrinsically discrete, from constitutive relations, which admit only an approximate discrete rendering. One problem with the classical notation can be easily seen in the following expressions, from which no idea of the corresponding orientation can be obtained:

$$\int_S \mathbf{B} \cdot d\mathbf{S} \quad (2.80)$$
$$\int_V \varrho \, dV \quad (2.81)$$

A more suitable representation refers directly to the oriented geometric object to which a quantity is assigned. Such a representation is given within the formalism of ordinary and twisted differential forms (Burke; de Rham, 1931). Based on this formalism, the vector field B becomes an ordinary 2-form $b^2$ and the scalar field $\varrho$ a twisted 3-form $\tilde{\varrho}^3$:

$$\mathbf{B} \to b^2 \quad (2.82)$$
$$\varrho \to \tilde{\varrho}^3 \quad (2.83)$$

The corresponding numbers and orientations give the dimension of the objects on which these quantities can be integrated. Each internally oriented geometric object induces an internal orientation on the objects that constitute its boundary.

differential form    physical quantity    associated oriented geometric object
0-form               φ → (P × I)          point (internal orientation)
1-form               E → (L × I)          line (internal orientation)
1-form               A → (L × T)          line (internal orientation)
2-form               B → (S × T)          area/surface (internal orientation)
twisted 1-form       H → (L̃ × I)          line (external orientation)
twisted 2-form       D → (S̃ × T)          area/surface (external orientation)
twisted 2-form       J → (S̃ × I)          area/surface (external orientation)
twisted 3-form       ̺ → (Ṽ × T)          volume (external orientation)

The necessity of a primary and a secondary cell complex can therewith be easily seen. The requirement of housing the global physical quantities of a problem implies that both objects with internal orientation and objects with external orientation must be available. Hence, two logically distinct meshes must be defined, one with internal and the other with external orientation.

Figure 2.22: Top: internal orientation. Bottom: external orientation for geometric objects in three dimensions.

Next, the transition from global exact physical fields into the finite and discrete regime of cochains is given. Starting with the global form

$$\oint_{\partial V} \mathbf{B} \cdot d\mathbf{S} = 0 \qquad \text{Gauss' law for magnetic charge} \quad (2.84)$$

which can be written in global form by:

$$\int_T \oint_{\partial V} \mathbf{B} = \int_T \int_V 0 \qquad \text{Gauss' law for magnetic charge} \quad (2.85)$$

This expression can be stated by global quantities only, and in a four-dimensional space-time also a 4D geometrical depiction, see Section (ref), can be given:

$$\Phi^b(\partial V \times T) = 0(V \times T) \quad (2.86)$$

where it has to be mentioned that this form does not use any material properties and is therefore not an approximation. It can be seen that these physical fields possess an intrinsic discrete nature. De Rham (cite deRham) proved in 1931 that from the global validity of these statements in a homologically trivial space follows the existence of field quantities that can be considered the potentials of the densities of the physical quantities appearing in these global statements. With the chain and cochain concepts, the following can be obtained, from:

$$\Phi^b(\partial V) = 0(V) \quad (2.87)$$

to:

$$\Phi^b(\partial \tau_3) = 0(\tau_3) \quad \forall \tau_3 \in K \quad (2.88)$$

This expression can then be rewritten in terms of the chain/cochain pairing:

$$(\partial\tau_3, \Phi^2) = (\tau_3, 0^3) \quad \forall \tau_3 \in K \quad (2.89)$$

which lists equivalences of global physical quantities but still compares cochains of different dimension. The coboundary operator can then be used to obtain the following expression:

$$(\tau_3, \delta\Phi^2) = (\tau_3, 0^3) \quad \forall \tau_3 \in K \quad (2.90)$$

which states the identity of the two cochains:

$$\delta\Phi^2 = 0^3 \quad (2.91)$$

which can be identified with the vector calculus notation div() for a two-dimensional field. A graphical representation of this coboundary operator is depicted in the corresponding figure. With the concept of the topological data structure, the following expression can be stated:

$$\delta c^p = M_{p+1,p} \cdot c^p \quad (2.92)$$
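As a hedged sketch of this matrix form (toy triangle complex, illustrative values, not this work's code): once the incidence matrices are stored, the coboundary of a cochain held as a coefficient vector is a plain matrix-vector product:

```python
# delta c^p = M_{p+1,p} . c^p for one oriented triangle:
# vertices a,b,c; edges ab, bc, ca; one face with boundary ab + bc + ca.
M10 = [[-1, +1,  0],   # edges x vertices incidence
       [ 0, -1, +1],
       [+1,  0, -1]]
M21 = [[+1, +1, +1]]   # faces x edges incidence

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

c0 = [3.0, 5.0, -1.0]        # a 0-cochain: values on the vertices
c1 = matvec(M10, c0)         # its coboundary, a 1-cochain on the edges
c2 = matvec(M21, c1)         # delta delta c0 vanishes
print(c1)   # [2.0, -6.0, 4.0]
print(c2)   # [0.0]
```

No differentiation occurs anywhere: the discrete operator is pure incidence algebra, which is exactly why topological laws transfer to the discrete setting without approximation error.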

Finally, the following discrete system is obtained:

$$\delta\Phi^2 = 0^3 \quad (2.93)$$
$$\delta\tilde{Q}^3 = \tilde{0}^4 \quad (2.94)$$
$$\delta U^1 = \Phi^2 \quad (2.95)$$
$$\delta\tilde{\Psi}^2 = \tilde{Q}^3 \quad (2.96)$$

which represents the fields in terms of cochains. The introduced concept of constitutive links closes the gap between ordinary and twisted cochains with discrete links between them. Two different types can be obtained:

$$L_1 : C^p(K) \to C^q(\tilde{K}) \quad (2.97)$$
$$L_2 : C^r(\tilde{K}) \to C^s(K) \quad (2.98)$$

From here the p-forms are introduced, where it can finally be seen that a discrete coboundary operator mimics the continuous exterior derivative of p-forms.

Chapter 3

Numerical Discretization Schemes

An important step for handling partial differential equations is to use and develop stable, consistent, and accurate algebraic replacements, where most of the global/continuous information of the original problem, and more importantly the inherent structure, is retained. Several methods are currently in use, such as the finite element (FE), finite volume (FV), and finite difference (FD) method, each with its specific approach to discretization. Topological equations have an intrinsic discrete nature, in contrast to the constitutive parts of the field equations, which are the central issue in the construction of effective discretization schemes and the only place where the recourse to local representations is fully justified. This chapter introduces the nature of numerical discretization schemes, together with the requirements needed for the practical implementation section ref(). Numerical discretization schemes can be briefly explained by a model reduction, e.g.:

$$Lu = f \;\rightarrow\; Ax = b \quad (3.1)$$

which transforms an infinite-dimensional operator equation into a finite-dimensional algebraic equation. Here it can already be seen that this is always accompanied by an inevitable loss of information due to the dimensional reduction. Briefly, each of the given discretization schemes addresses the task of replacing the partial differential equation system with an algebraic one differently. Therefore, a generic discretization concept, based on [20, 33], where this part is called the reference discretization scheme, is introduced first. These concepts are then presented within the context of the other methods.

- The finite volume method is, with respect to the global and discrete formulation based on topological laws, the most natural. The nature of this method is based on approximating conservation laws directly in its formulation, and it is therefore conservative by construction. This method is suitable for arbitrary meshes with the Delaunay property. This approach optimizes the approximation of the constitutive equation, the flow through the boundary of the cell. Topological laws as well as time stepping procedures can be integrated easily.
- The finite element method can be seen as a remarkably flexible and general method for solving partial differential equations. Compared to the finite volume method, the spatial discretization can be much more arbitrary, with only some quality constraints. The finite volume method can be modeled with special assumptions by the finite element method, as presented in Section 3.3.2. The continuous problem with an infinite number of degrees of freedom is reformulated as an equivalent variational problem over a finite-dimensional space. As described in the corresponding finite element section, this method is able to incorporate the constitutive relations appropriately. Topological laws and time-dependent problems are more complex to integrate.

- The last scheme used in this work is the finite difference scheme, which addresses the problem of numerical analysis from a quite different direction. Instead of using the conservation form of the original problem (FV) or projecting the continuous problem into a finite-dimensional space (FE), the finite difference method uses a finite difference approximation for the differential operators. In other words, it formulates an approximation of nodal derivatives. Therewith this approach optimizes the approximation of the Laplacian operator in the central point of the patch. Despite the fact that this method is simple and effective, easy to derive and implement, it gives an optimal solution to a problem which is not the one the discretization methods for field problems are expected to face. Furthermore, this method is limited to structured topological cell complexes; for arbitrary meshes it is difficult to obtain accurate or higher-order formulas for field problems. An important fact is that in the one-dimensional case the balance equations are written for secondary 1-cells, the boundaries of which are formed by 0-cells. Therefore, in this case the FD strategy, which optimizes on points, can coincide with the FV strategy, which optimizes on 1-cell boundaries.
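The nodal-derivative idea can be sketched in a few lines (a hypothetical 1D example, not tied to a particular scheme of this work): on a structured 1D complex the classic second difference approximates the Laplacian at the inner 0-cells:

```python
# Finite difference sketch: approximate u'' at inner nodes of a uniform
# 1D grid by (u[i-1] - 2*u[i] + u[i+1]) / h**2. Grid and data illustrative.
h = 0.25
x = [i * h for i in range(5)]
u = [xi * xi for xi in x]     # u(x) = x^2, so u'' = 2 everywhere

lap = [(u[i - 1] - 2 * u[i] + u[i + 1]) / h ** 2 for i in range(1, 4)]
print(lap)   # the stencil is exact for quadratics: 2.0 at every inner node
```

Because the stencil is derived from a local Taylor expansion at the node, it is exact for quadratic fields; for general fields it carries an O(h²) truncation error, which is the optimization target mentioned above.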

3.1 Generic Discretization Concepts

This section overviews the basic concepts for finite discretization schemes in scientific computing. The most important mechanisms are introduced, which are later converted into applicable and well-known discretization schemes, such as finite differences, finite volumes, and finite elements. The reason for this section is the approach to unify the different discretization schemes into a common kernel, which is then directly converted to software design concepts in Section 6.3. The presented concepts for discretization are mostly derived from [20], where an abstract reference discretization scheme is given which is not a full discretization scheme by itself. Here only the necessary concepts for time-dependent electromagnetic problems for the subsequent chapters are introduced, with additional concepts from the works [19, 33–35]. A field can be thought of as a collection of cochains, each of which is a projection of the field onto a particular cell complex. The natural representation of a cochain is a vector of global physical quantities associated with cells.

3.1.1 Domain Discretization

The simulation domain, either a simple space or a more complex space-time, is discretized by means of two dually oriented cell complexes for the primary and the secondary mesh. The dually oriented cell complexes, and therewith the primary and secondary mesh, are not required to be physically or geometrically distinct; the logical separation is required only to distinguish the different cochains/differential forms. With dual cell complexes the attribution of boundary conditions and the treatment of material discontinuities can be greatly simplified; without them, e.g., operator adjointness would be more complex to obtain. The primary p-cells are identified by $\tau_p^i$, whereas the corresponding secondary (n-p)-cells are identified by $\tilde{\tau}_{n-p}^i$ with the same index i and the default orientation.


Figure 3.1: Discretization concept for the primary and secondary cell complex with consistent orientation.

The domain in time is also subdivided into two dual cell complexes. Here the 0-cells are time instants, indexed in increasing time. Derived from the Yee formulation (see Section 3.4.2), to simplify the notation the time interval going from $t^n$ to $t^{n+1}$ is indexed by $t^{n+1/2}$. Given an arbitrary time-dependent equation

$$\operatorname{curl} \mathbf{G} = \partial_t \mathbf{F} \quad (3.2)$$

which can be interpreted as a higher-dimensional object via the Cartesian product of the spatial domain and the time domain:

$$G(\partial A \times I) + F(A \times \partial I) = 0(A \times I) \quad (3.3)$$


Figure 3.2: Time stepping for the discretization concepts.

A graphical representation is given in Figure 3.3, where two 0-cells are used for the corresponding time steps t_0^n and t_0^{n+1} and one 1-cell for the time interval t_1^{n+1/2}. The corresponding quantity G is attached to the time interval t_1^{n+1/2}, whereas the quantity F corresponds to the t_0 cells.


Figure 3.3: Time stepping for the discretization concepts.

This topological time stepping was proven [20] to be consistent with charge conservation and the absence of sources for the Faraday and Maxwell laws. Based on this unique topological time stepping and various discrete constitutive links, a variety of numerical time-stepping procedures, explicit as well as implicit ones, can be derived. Using the representation of cochains as vectors of global physical quantities and the incidence matrices:

Φ^b_{n+1} = Φ^b_n − D_{2,1} Φ^e_{n+1/2}   (3.4)

3.1.2 Topological Equations

A subset of the equations which form a physical field theory links a global quantity associated with an oriented geometric object to the global quantity that, within the theory, is associated with the boundary of that object. These laws are intrinsically discrete, for they state a balance of these global quantities (or conservation, if one of the terms is zero) whose validity does not depend on metrical or material properties and is therefore invariant under very general transformations. This gives them a topological significance, which is the basic reason for calling them topological equations. The discretization of a simple balance equation means that the equation must be written for domains of finite extension. Starting from a discretized simple balance equation,

div(D) = ̺          Q_src(D) = Q_flow(∂D)   (3.5)

where the left equation represents the local or differential form and the right equation represents the global or discrete form. This transition takes place without any approximation error [33]. The balance equation by itself does not require uniform fields, homogeneous materials, or any other condition. The name topological equation was used [33] to express an idea of invariance under arbitrary homeomorphic transformations. For a topological equation, the discrete or global version appears as the fundamental one, with the differential statement proceeding from it if additional requirements are fulfilled.
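The intrinsically discrete character of such a balance law can be illustrated with a small numerical sketch; the one-dimensional setting and all numerical values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative sketch of the discrete balance Q_src(D) = Q_flow(dD) of
# Equation 3.5: global flux values are attached to the interfaces (dual
# 0-cells) of a 1D cell complex; the values themselves are made up.
rng = np.random.default_rng(0)
F = rng.normal(size=11)          # fluxes at the interfaces x_0 .. x_10
Q_src = F[1:] - F[:-1]           # source content of the cells c_1 .. c_10

# The balance holds exactly for ANY union of consecutive cells: the inner
# flux contributions cancel pairwise (telescoping), no approximation enters.
total_source = Q_src[2:7].sum()  # cells c_3 .. c_7
boundary_flow = F[7] - F[2]      # flux through the boundary of that union
print(abs(total_source - boundary_flow))  # zero up to rounding
```

No metric or material information enters this computation, which is exactly the sense in which the global form is "topological".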

3.1.3 Constitutive Relation Discretization

Discretizing constitutive relations determines links between cochains which approximate the local constitutive equations. This step offers a great number of different choices. Here, only two of them are discussed, which are later used to analyze the finite element, finite volume, and finite difference schemes. A third one is given in [20].

Field Function Reconstruction and Projection

For most finite volume and finite element schemes, a field reconstruction and projection is used, starting from the local constitutive relation

B = f_µ(H)   (3.6)

In the discrete setting, the field functions B and H do not belong to the problem's variables. Instead, the magnetic flux cochain Φ^b and the magnetomotive force cochain Ψ̃^h are given, which have to be linked by a relation Φ^b = F_µ(Ψ̃^h). From the cochain Ψ̃^h a field function is derived by a reconstruction operator:

H = R_h(Ψ̃^h)   (3.7)

Next, the local constitutive link, Equation 3.6, is used to derive B. Then the cochain Φ^b has to be obtained by means of a projection operator P^b which produces a cochain for each field function B:

Φ^b = P^b(B)   (3.8)

The discrete constitutive link F_µ is finally given by:

Φ^b = F_µ(Ψ̃^h) = P^b(f_µ(R_h(Ψ̃^h)))   (3.9)

A natural requirement for the reconstruction and projection operators is that for each cochain cp the following relation holds true:

P^*(R_*(c^p)) = c^p   (3.10)

This means that the combined operator P^* R_* must be the identity operator. In the later sections, indices for the reconstruction and projection operators are used to highlight the fact that these operators have to be evaluated in the correct context of their operating field. To obtain a sparse matrix representation, the reconstruction process is usually performed locally, so that the value of the reconstructed field function at each point depends only on the values taken by the original cochain on the cells of a sufficiently small neighborhood of the point. During the last decade, a comprehensive framework for the reconstruction and projection operators was derived, called mimetic discretization [2, 25, 36–39]. This framework analyzes in detail how these two operators can be used to mimic the analytical and continuous nature of partial differential equations.
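The identity requirement of Equation 3.10 can be exercised with a minimal sketch under illustrative assumptions: a one-dimensional cell complex, a piecewise-linear local reconstruction R, and a projection P that samples on the 0-cells. None of these particular choices are prescribed by the text.

```python
import numpy as np

# Sketch of the consistency requirement P(R(c^p)) = c^p (Equation 3.10),
# using hypothetical operators: R reconstructs a field function from a
# 0-cochain by piecewise-linear interpolation, P samples it on the 0-cells.
nodes = np.linspace(0.0, 1.0, 6)                     # 0-cells of a 1D complex
cochain = np.array([0.0, 2.0, 1.5, 3.0, 2.5, 4.0])   # global nodal values

def R(c):
    """Local reconstruction: returns a field function on the whole domain."""
    return lambda x: np.interp(x, nodes, c)

def P(field):
    """Projection: produces a cochain by sampling on the 0-cells."""
    return field(nodes)

# The combined operator P R must act as the identity on cochains.
print(np.allclose(P(R(cochain)), cochain))  # True
```

Because the interpolation only uses neighboring nodal values, the resulting discrete operators lead to the sparse matrices mentioned above.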

Global Application of Local Constitutive Relations

For a minor part of problems, with the assumption of uniform fields in space and time as well as homogeneous materials, the local constitutive link, e.g. B = µH, holds true also in a macroscopic space-time region [20]. With sufficient spatial and temporal cell complex refinement (adaptive mesh and time step controls) this assumption can be made valid, although the computational costs can increase greatly. It is important to mention that this approach can only be valid if the fields do not vary rapidly in space or time, or if the mesh refinement can be tolerated. Based on these local approximations, a series of links between cochains is obtained which are the required discrete constitutive links. The finite discretization time domain scheme of Section 3.4.2 uses this type of approximation. It is therefore important to highlight that a discretization based on this constitutive approximation can only be used on structured topologies and under the special material property and space constraints given here.

3.2 Finite Volumes

A numerical method is called a finite volume method if it uses a tessellation of the domain and writes the field equations in integral form on the resulting cells. In the area of TCAD, this method is usually called finite boxes [40] to express the geometrical origin of the discretized domain. The finite volume method is, with respect to the global and discrete formulation based on integral forms of conservation laws instead of partial differential equations, the most natural one. It subdivides the domain into cells and evaluates the field equations in integral form on these cells. Most applications of the finite volume method do not include the time variable, which has to be enforced by a separate discretization. This approach optimizes the approximation of the constitutive equation, the flow through the boundary of the cell. The topological laws as well as time stepping procedures can be integrated easily. The concepts and implementation are simple for different dimensions and different types of cell complexes. In the recent past, the finite volume method has been greatly extended. The following presentation is based on the work of [41].

3.2.1 Basic Concepts

In the finite volume method, the computational domain Ω ⊂ Rn is first subdivided into non-overlapping control volumes Vi of a conforming cell complex. In each control volume, an integral conservation law is imposed:

Definition 66 (Integral conservation law) An integral conservation law asserts that the rate of change of the total amount of a quantity with density u in a fixed control volume V is equal to the total flux of the quantity through the boundary ∂V .

∂_t ∫_V u dV + ∫_{∂V} f(u) · dA = 0   (3.11)

As the step towards the discrete space, the integral conservation law is transferred to small control volumes:

V = ⋃_{i=1}^{N} V_i,   V_i ∩ V_j = ∅   ∀ i ≠ j   (3.12)


Figure 3.4: Finite volume requirements for a primary and secondary spatial discretization.

The integral conservation law is readily obtained upon spatial integration of, e.g., a divergence equation in a region Ω_i. The choice of control volume tessellation is flexible in the finite volume method. The primary requirement is a secondary mesh with special properties related to the primary mesh. In a vertex-centered method, the control volumes are formed as a geometric dual to the cell complex, and solution unknowns are stored on a per-vertex basis. In a cell-centered method, the cells/triangles serve directly as control volumes, with solution unknowns (degrees of freedom) stored on a per-cell basis.

Figure 3.5: Primary control volumes used in the finite volume method. Left: cell centered. Right: vertex centered.

The integral conservation law is satisfied for each control volume and for the entire domain. To obtain a linear system, integrals must be expressed in terms of mean values. Two issues are fundamental to the finite volume method. The first is the introduction of a piecewise constant cell average for each control volume:

u_i = (1/|V_i|) ∫_{V_i} u dV   (3.13)

This cell average renders a discontinuity at the cell interfaces, so the corresponding single solution flux is ambiguous at the interface. A geometrical interpretation is given in Figure 3.6 for a one-dimensional cell complex, e.g. for cell c_i.


Figure 3.6: Finite volume and the rendered cell average value within each cell for a one-dimensional cell complex.

The second is the replacement of the true flux at the interfaces by a numerical flux function g(u, v) : ℝ × ℝ → ℝ. For arbitrary space dimensions, the flux integral is approximated by a numerical integration scheme, e.g. a quadrature rule:

∫_{∂V} f(u) · dA ≈ Σ_{e_jk ∈ ∂V} g_jk(u_j, u_k)   (3.14)

The numerical flux has to satisfy the following properties:

- Flux conservation: the fluxes resulting from adjacent control volumes sharing an interface exactly cancel under summation:

g_jk(u, v) = −g_kj(v, u)   (3.15)

- Consistency: the numerical flux, evaluated with identical arguments, has to reduce to the true total flux passing through e_jk:

g_jk(u, u) = ∫_{e_jk} f(u) · dA   (3.16)

Using Equation 3.14 and the following interpretation of finite volumes for stationary meshes as producing an evolution equation for cell averages:

d/dt ∫_{V_i} u dV = |V_i| d/dt u_i   (3.17)

one of the simplest finite volume schemes in a semi-discrete formulation can be obtained, utilizing solution representations continuous in time, t ∈ [0, ∞], and piecewise constant in space, u_h(t) ∈ V_h^0, such that:

u_j(t) = (1/|V_i|) ∫_{V_i} u_h(x_i, t) dV   (3.18)

with initial data

u_j(0) = (1/|V_i|) ∫_{V_i} u_0(x_i) dV   (3.19)

and the numerical flux function g_jk(u_j, u_k). The scheme is given by the following system of ordinary differential equations:

d/dt u_j + (1/|V_i|) Σ_{e_jk ∈ ∂V} g_jk(u_j, u_k) = 0   ∀ V_i ∈ V   (3.20)

One final issue remains: the solution is available only at the computational nodes, the control volume centers. Interpolation is needed to obtain the function values at vertices. Using an explicit or implicit time integration mechanism, e.g. a forward Euler scheme:

d/dt u_j ≈ (u_j^{n+1} − u_j^n)/∆t   (3.21)

produces the fully discrete finite volume form:

u_j^{n+1} = u_j^n − (∆t/|V_i|) Σ_{e_jk ∈ ∂V} g_jk(u_j^n, u_k^n)   ∀ V_i ∈ V   (3.22)
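A minimal instance of Equation 3.22 can be sketched for the 1D linear advection law f(u) = a·u; the upwind numerical flux, the mesh, and the initial data below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative instance of Equation 3.22: 1D linear advection f(u) = a*u
# with the (assumed) upwind numerical flux g(u_left, u_right) = a*u_left,
# which is conservative (Eq. 3.15) and consistent (Eq. 3.16) for a > 0.
a, dx, dt = 1.0, 0.02, 0.01             # CFL number a*dt/dx = 0.5
x = np.arange(0.0, 1.0, dx) + dx / 2    # control volume centers
u = np.exp(-100.0 * (x - 0.5) ** 2)     # cell averages of the initial data

def step(u):
    g = a * np.roll(u, 1)               # flux at the left interface of each cell
    return u - dt / dx * (np.roll(g, -1) - g)   # periodic boundaries

mass0 = u.sum() * dx
for _ in range(50):                     # advance to t = 0.5
    u = step(u)

# Flux conservation implies exact conservation of the total cell content.
print(abs(u.sum() * dx - mass0))        # zero up to rounding
```

The cancellation of shared interface fluxes under summation is precisely what makes the total cell content conserved regardless of the accuracy of the chosen numerical flux.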

3.2.2 Analysis of the Finite Volume Method

The finite volume method uses a dual mesh, based on a primary cell complex with internal orientation and a secondary cell complex with external orientation.

Global values are directly attached to cells. The quantity E · s_k of the electric field intensity vector E, which is assumed to be constant along the cell, is projected onto the primary 1-cells τ_1^k. Therewith, global values are associated with each primary 1-cell.

Balance equations are expressed by the finite volume method directly by

∫_{∂V} D · dA = ∫_V ̺ dV   ∀ V ∈ D   (3.23)

where it can be applied to each 3-cell of the cell complex. The discrete notation reads:

⟨∂c_(p), D_flow^(p−1)⟩ = ⟨c_(p), D_source^(p)⟩   (3.24)

Here, only global quantities are involved; no average values are required. The geometrical interpretation is that finite volumes describe the balance equations on crisp cells. Compared to finite differences, finite volumes and even finite elements do not discretize the differential operators representing the local version of the equation. The finite volume method, like the finite element method, uses the global version directly. The constitutive equations, e.g. D = εE, can be seen as a local transformation which links the field functions D and E and describes the property of the surrounding medium. This transformation need not necessarily be representable by a tensor. The finite volume method formulates the relation of the two cochains E_1 and D_2.

3.3 Finite Elements

The finite element method is a systematic approach to approximate the unknown exact solution of a partial differential equation, based on basis functions and the projection of a given domain onto a consistent finite cell complex, where the finite elements correspond to the n-cells of the complex for a low order h-approximation. These cells are later represented algebraically by homological chains. The origin of the finite element method is in solid mechanics, where the first formulation was based on a direct physical approach. Some constraints have to be placed on the shape functions to guarantee that with an increasing number of shape functions the exact solution is approximated ever more closely; in the limit, the exact solution should be obtained. A good approximation is obtained by a residual formulation, where the residuum is given by the difference between the unknown exact solution and the calculated approximate solution. This residuum is weighted over the simulation domain and integrated, with the requirement that the integral vanishes for a set of linearly independent weighting functions. The other possible mechanism is the variational formulation of the partial differential equation, where the resulting functional is made stationary (method of Ritz). One of the main advantages of the finite element method is the possible adaptation of the shape functions to the eigenfunctions of the differential operators. A high precision can be obtained with a moderate number of mesh elements. Different material regions as well as anisotropic, inhomogeneous, and non-linear quantities can be integrated. If no boundary conditions are declared, homogeneous/natural Neumann boundary conditions are implicitly given.


Figure 3.7: Finite elements requirements for the mesh.

Section 6.3 covers the implementation details of this method.

3.3.1 Basic Concepts for a Galerkin Method

In the following, the theoretical part of the finite element method is introduced, which is a special Galerkin method [42, 43] based on the following construction of finite dimensional subspaces V_h:

- Ω is triangulated into a consistent cell complex
- V_h consists of piecewise polynomials
- V_h has a finite basis ϕ_i with local support

The main part of this section and the corresponding notation is based on the following works [43, 44]. Galerkin's method can be briefly explained as a general technique for the construction of solution approximations of variational problems, represented not by basis functions ϕ_i ∈ V of an infinite-dimensional space, but by those of a finite-dimensional space V_h ⊂ V. The examples presented in this work are modeled by the following general form:

L(c) = f(x)   (3.25)

which is defined on arbitrary dimensional, simply connected domains Ω with boundary ∂Ω. The PDE 3.25 is fulfilled by functions

c = c(x) ∈ V_t = C²(Ω)   (3.26)

Here L(·) denotes a spatial differential operator and f = f(x) ∈ C(Ω). Additionally, it is assumed that the domain Ω has a piecewise smooth boundary Γ_i ∈ Γ (∀ i ≠ j: Γ_i ∩ Γ_j = ∅, Γ = ⋃_i Γ_i). Therewith, a general form of boundary conditions can be specified by:

A_i c + B_i ∂_n c = g_i(c)   on Γ_i   (3.27)

where A_i and B_i are matrices consisting of functions sufficiently smooth on Γ_i, and g_i is a vector of continuous linear functionals. ∂_n c denotes the outward normal derivative. A weak formulation of the PDE 3.25 is obtained by using a test function v:

Find c ∈ V : (v, L(c)) = (v, f)   ∀ v ∈ C¹(Ω)   (3.28)

The infinite-dimensional space V is then replaced by a sequence of finite-dimensional spaces V_h with N linearly independent functions ϕ_i ∈ V which span the space V_h, e.g.

V_h = { a : a(x) = Σ_{i=1}^{N} s_i ϕ_i(x), s_i ∈ ℝ }   (3.29)

The basis should build a complete function system to guarantee the quality of the approximation for N → ∞. Any function v_h ∈ V_h is uniquely determined by a finite number of degrees of freedom, e.g. function values. This results in the following discrete variational problem for conforming methods (V_h ⊂ V):

Find c_h ∈ V_h : (v_h, L(c_h)) = (v_h, f)   ∀ v_h ∈ V_h   (3.30)

By selecting ϕ_i ∈ V_h, or in other words, using the ansatz functions as test functions, the following system is obtained:

(ϕ_i, L(c_h)) = (ϕ_i, f)   for i = 1, .., N   (3.31)

The solution variable c_h is then also expanded in terms of the basis (ϕ_i)_{1≤i≤N} of V_h.

c_h = Σ_{i=1}^{N} c_i ϕ_i   (3.32)
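The discrete system of Equations 3.30 to 3.32 can be exercised in one dimension; the model problem −u'' = f with homogeneous Dirichlet conditions, the hat-function basis, the lumped load vector, and the chosen right-hand side are all illustrative assumptions.

```python
import numpy as np

# Galerkin sketch for Equations 3.30-3.32: -u'' = f on (0,1), u(0) = u(1) = 0,
# with piecewise-linear hat functions phi_i; (phi_i, L(c_h)) is integrated by
# parts to (phi_i', c_h'). Mesh, load, and exact solution are illustrative.
N = 63                                   # number of interior nodes
h = 1.0 / (N + 1)
x = np.linspace(h, 1.0 - h, N)

# Stiffness matrix (phi_i', phi_j') of the hat basis: tridiag(-1, 2, -1)/h.
K = (np.diag(2.0 * np.ones(N))
     - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h

f = np.pi ** 2 * np.sin(np.pi * x)       # chosen so that u(x) = sin(pi x)
b = h * f                                # lumped load vector (phi_i, f)

c = np.linalg.solve(K, b)                # expansion coefficients c_i of c_h
print(np.abs(c - np.sin(np.pi * x)).max())  # small, O(h^2) error
```

The coefficients c_i are exactly the degrees of freedom of the expansion in Equation 3.32; refining the mesh reduces the nodal error quadratically for this basis.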

A graphical representation of this approximation of the solution variable c_h is given in Figure 3.8.

Figure 3.8: Geometrical interpretation of the basis of the expanded solution variable ch for finite elements for a one-dimensional cell complex.

By the given selection of the test functions and the expansion of the solution variable, it can be observed [44] that the scalar product in Equation 3.31 transforms the differential operator L(·) into the operator

l : ℝ^N → ℝ^N   (3.33)

The residuum is obtained by using the operator 3.33

R_i = l(c_1, .., c_N) − (ϕ_i, f)   (3.34)

The next step for the discrete approximation of a given problem is the subdivision of the domain Ω with respect to cell complex properties, see Section 2.3. The elements of the cell complex, the collection of cells, are then used as finite elements for the finite space V_h. The basis functions ϕ_i ∈ V_h are defined on the global vertices τ_0^i, the 0-cells, only, and are therefore called basis nodal functions. They can be expressed as:

ϕ_i(τ_0^k) = δ_ik,   i, k = 1, .., N   (3.35)

Based on this subdivision of the finite element space, the functions c can be expressed in local cell terms:

c_e = Σ_{i=1}^{N_p} ϕ_i c_i   (3.36)

where i represents the local index of the element and N_p the number of element vertices. Now, the FEM operator from Equation 3.33 can be defined locally:

l^e : ℝ⁴ → ℝ⁴   (3.37)

and the local residuum is then defined by:

R_i^e = l^e(c_1, .., c_N) − (ϕ_i, f)   (3.38)

To determine the operator 3.38 for a given cell type, the basic nodal functions have to be calculated, e.g. for a 3-simplex cell (tetrahedron):

ϕ_k(x) = ϕ_k(x_1, x_2, x_3) = (1/(3V)) (a_k x_1 + b_k x_2 + c_k x_3 + d_k)   (3.39)

where the coefficients a_k, b_k, c_k, d_k are functions of the vertex coordinates and V is the volume of the element. Next, the following integral has to be calculated:

M_pq = ∫_{τ_3^i} ϕ_p(x) ϕ_q(x) dΩ,   p, q = 1, 2, 3, 4   (3.40)

The key to an efficient practical realization is that global finite elements are defined as transformations of a reference element in a normalized coordinate system. Each point (x_1, x_2, x_3) ∈ τ_3^i of the cell can be expressed as a bijective function of a reference point (ξ, η, ζ) ∈ τ̃_3^i:

x_1 = x_1^1 + (x_1^2 − x_1^1)ξ + (x_1^3 − x_1^1)η + (x_1^4 − x_1^1)ζ   (3.41)
x_2 = x_2^1 + (x_2^2 − x_2^1)ξ + (x_2^3 − x_2^1)η + (x_2^4 − x_2^1)ζ   (3.42)
x_3 = x_3^1 + (x_3^2 − x_3^1)ξ + (x_3^3 − x_3^1)η + (x_3^4 − x_3^1)ζ   (3.43)

The basis nodal functions on the reference element τ̃_3^i are:

ϕ_1^0(ξ, η, ζ) = 1 − ξ − η − ζ   (3.44)
ϕ_2^0(ξ, η, ζ) = ξ   (3.45)
ϕ_3^0(ξ, η, ζ) = η   (3.46)
ϕ_4^0(ξ, η, ζ) = ζ   (3.47)

Equation 3.40 is then calculated by:

M_pq = det(J) ∫_{τ̃_3^0} ϕ_p^0(ξ, η, ζ) ϕ_q^0(ξ, η, ζ) dξ dη dζ,   p, q = 1, 2, 3, 4   (3.49)

where J is the Jacobian of the projection in Equation 3.41. The final assembly process consists of calculating stencil matrices Π(τ_n^i) for each element of the cell complex, mapping the local vertex indices to a global index, and then building the global matrix A. For an electrostatic problem formulated by the Laplace equation:

−div(ε grad(ϕ)) = 0   (3.50)

the stencil matrix is represented by:

Π(τ_3^i) = K_pq(τ_3^i) = ∫_{τ_3^i} grad(ϕ_p(x)) · grad(ϕ_q(x)) dΩ,   p, q = 1, 2, 3, 4   (3.51)

This problem is linear, and the global matrix A is simply obtained by adding Π, with the corresponding index transformation, into the global matrix.
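The local stencil matrix of Equation 3.51 for a single 3-simplex can be sketched through the reference-element mapping of Equations 3.41 to 3.47; the vertex coordinates below are an arbitrary example and ε is taken as 1.

```python
import numpy as np

# Sketch of the local stencil matrix of Equation 3.51 for one tetrahedron,
# computed via the reference element; vertex coordinates are illustrative.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

J = (verts[1:] - verts[0]).T                 # Jacobian: columns are edge vectors
V = abs(np.linalg.det(J)) / 6.0              # volume of the tetrahedron

# Gradients of the reference basis 1-xi-eta-zeta, xi, eta, zeta (Eqs. 3.44-3.47).
grad_ref = np.array([[-1.0, -1.0, -1.0],
                     [ 1.0,  0.0,  0.0],
                     [ 0.0,  1.0,  0.0],
                     [ 0.0,  0.0,  1.0]])
grad = grad_ref @ np.linalg.inv(J)           # chain rule: global gradients

# The gradients are constant on the cell, so the integral reduces to V * (.,.).
K = V * grad @ grad.T
print(np.allclose(K.sum(axis=1), 0.0))       # constant fields lie in the kernel
```

The vanishing row sums reflect that a constant potential produces no flux, which the global assembly inherits element by element.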

3.3.2 Analysis of the Finite Element Method

The introduced finite element method is analyzed with respect to the chain and cochain model of Section 2.6. Here, a simple example, the Poisson equation, is used to discuss the method. As shown, the finite element method requires the complete field equations to be formulated, whereas the continuous formulation consists of a reconstruction of the unknown quantities based on their discrete representation:

−div(ε grad(ϕ)) = ̺   (3.52)

Based on this equation and the corresponding boundary conditions, a Galerkin expression is obtained. If the weighting functions are differentiable and the operator L(u) is simple, the domain integrals can be transformed by Green's theorem. The differentiability requirements for the basis functions are thereby reduced, but information related to the weighting functions is lost. Therefore a broader class of shape functions can be used, and the resulting formulation is called a weak formulation. By the choice of the weighting functions of the boundary residuum, w̄_i = −w_i, both boundary integrals with the conormal derivative vanish. Using Galerkin's approach of choosing the weighting functions equal to the ansatz functions, w_i = n_i, the final expression is obtained. Starting from the local version of the balance equation and using the weighted residual formulation, where the cell τ_n^i is used as the support of the weight function n_i, together with Green's theorem:

∫_{τ_n^i} n_i div(D) dτ_n^i =   (3.53)
∫_{∂τ_n^i} n_i n · D d(∂τ_n^i) − ∫_{τ_n^i} grad(n_i) · D dτ_n^i =   (3.54)
∫_{τ_n^i} n_i ̺ dτ_n^i   (3.55)

Equation 3.53 and Equation 3.55 represent the balance between the electric charge associated with the corresponding spread cell and the electric flux through the boundary of that spread cell. Using the divergence theorem, the following expression is obtained:

∫_{∂(n_i τ_n^i)} n · D d(∂V) = ∫_{n_i τ_n^i} ̺ dV   (3.56)

A geometrical interpretation [20] can be obtained with the weighted domain as a chain composed of infinitesimal cells with real multiplicity; therefore, on each weighted cell n_i τ_n^i, with the divergence theorem:

∫_{n_i τ_n^i} div(D) = ∫_{n_i τ_n^i} ̺   (3.57)
∫_{∂(n_i τ_n^i)} D = ∫_{n_i τ_n^i} ̺   (3.58)

Equation 3.58 constitutes the balance equations as written by the finite element method. Finally, the following implicit definition, obtained by comparing Equation 3.58 and Equation 3.54, reads:

∫_{∂(n_i τ_n^i)} D := ∫_{∂τ_n^i} n_i D − ∫_{τ_n^i} grad(n_i) · D   (3.59)

As can be seen from the last expressions, the boundary of a weighted domain is spread over the whole domain, except when the weighting function is constant on its support, in which case the term containing grad(n_i) on the right-hand side of Equation 3.59 vanishes and no internal parts remain. The weight function w defines the continuous counterpart of a chain.

3.3.3 Further Analysis

Related to the projection and reconstruction concepts, the finite element method uses a reconstruction of the field ϕ based on the 0-cochain ϕ^v = {c_i} with the basis functions. This can be represented as the action of a reconstruction operator R^v as follows:

ϕ = R^v(ϕ^v)   (3.60)

Then the topological equation −grad (ϕ) = E is used for the reconstructed field to obtain the E field, where the constitutive relation D = fε(E) is applied to obtain D.

D = f_ε(−grad(R^v(ϕ^v)))   (3.61)

Another analysis is accomplished by geometrically interpreting the weight functions. Weight functions define the spread cells, which constitute a continuous counterpart of the secondary mesh to which the corresponding topological equations are applied. Different categories of solution methods can be obtained with different choices of these functions. The subdomain method is based on a characteristic function of the supported cell for the weight functions, w_i = 1 inside the cell τ_n^i and w_i = 0 elsewhere. Based on the equation ∫_{∂V} D = ∫_V ̺ and the fact that this formulation results in crisp cells, this method can be seen as a finite volume method. Compared to the generic discretization concepts, see Section 3.1, which enforce both topological equations in discrete terms on crisp cells, the finite element approach differs in that it applies one topological equation in differential terms and the other one in integral terms on spread cells. A possibility to reduce this difference [20] is obtained by reformulating the reconstruction of E to start from the V^e cochain, and then applying to it the corresponding topological equation in coboundary terms. In any case, the projection of the reconstructed field is not performed, because of the presence of (secondary) spread cells, which, contrary to the case of crisp cells, do not require this step. Note that if the reconstruction is based on V^e, edge elements and not nodal interpolation must be used to obtain a physically sound reconstruction of E. This is always true in the case of magnetostatic problems, since the magnetic potential A is a 1-field, and a correct reconstruction of it must start from the cochain ϕ^a and use (ordinary) 1-edge elements. The realization of this fact was one of the reasons that led to the introduction of Whitney forms (edge elements).
The importance of the integration by parts in the FE method can thereby be explained by the necessity to operate with the boundary of the FEM spread cells in order to express topological equations. The integration by parts, shown in Equations 3.62-3.64, can therewith be identified as the counterpart of the generalized Stokes theorem.

∫_{n_i τ_3^i} div(D) dV = ∫_{∂(n_i τ_3^i)} D · dA   (3.62)

∫_{n_i τ_2^i} curl(A) · dA = ∫_{∂(n_i τ_2^i)} A · dL   (3.63)

∫_{n_i τ_1^i} grad(T) · dL = ∫_{∂(n_i τ_1^i)} T dP   (3.64)

3.4 Finite Differences

The finite difference discretization scheme is one of the simplest forms of discretization and does not easily include the topological nature of equations. A classical finite difference approach approximates the differential operators constituting the complete field equation locally. Therefore, a structured grid is required, on which local field quantities are stored. For each of these points, the differential operators appearing in the main problem specification are rendered in a discrete expression. The order of the differential operator of the original problem formulation directly dictates the number of nodes to be evaluated. In other words, differential operators in the original formulation are approximated by linear combinations of function values at the cell complex points. Here, the main drawback of finite differences can already be seen: the association of physical field values only with points cannot handle higher dimensional geometrical objects. In addition, the n-point discretization given by the order of the original formulation includes only the direct logically orthogonal neighbors; the remaining neighbors are neglected.


Figure 3.9: Finite difference requirements for the mesh.

The advantages of this method are that it is easy to understand and to implement, at least for simple material relations. The finite difference method optimizes the approximation of the differential operator in the central node of the considered patch. Enhancements related to the usage of non-orthogonal grids and the low order of accuracy were developed, but have not proven successful (Taflove, 1998).

3.4.1 Basic Concepts

The derivatives of the partial differential equation are approximated by linear combinations of function values at the structured grid points. Arbitrary order approximations can be derived from the Taylor series expansion:

u(x) = Σ_{n=0}^{∞} ((x − x_i)^n / n!) (∂^n u/∂x^n)|_i,   u ∈ C^∞([0, X])   (3.65)

For a geometric interpretation, the different equations from the given definition can be depicted as:

forward difference:   (∂u/∂x)_i ≈ (u_{i+1} − u_i)/∆x
backward difference:  (∂u/∂x)_i ≈ (u_i − u_{i−1})/∆x
central difference:   (∂u/∂x)_i ≈ (u_{i+1} − u_{i−1})/(2∆x)

Figure 3.10: Different geometric interpretation of the first-order finite difference approximation related to forward, backward, and central difference approximation.

For second order derivatives, the central difference scheme can be used:

T1:  u_{i+1} = u_i + ∆x (∂u/∂x)_i + ((∆x)²/2) (∂²u/∂x²)_i + ((∆x)³/6) (∂³u/∂x³)_i   (3.66)
T2:  u_{i−1} = u_i − ∆x (∂u/∂x)_i + ((∆x)²/2) (∂²u/∂x²)_i − ((∆x)³/6) (∂³u/∂x³)_i   (3.67)

T1 + T2  ⇒  (∂²u/∂x²)_i = (u_{i+1} − 2u_i + u_{i−1})/(∆x)² + O((∆x)²)   (3.68)

The accuracy of the finite difference approximations is given by:

- forward difference: truncation error O(∆x)
- backward difference: truncation error O(∆x)
- central difference: truncation error O((∆x)²)

Mixed derivatives can be approximated by, e.g. for two dimensions:

∂²u/∂x∂y = ∂/∂x (∂u/∂y) = ∂/∂y (∂u/∂x)   (3.69)

∂2u ∂ ∂u ∂ ∂u = = (3.69) ∂x∂y ∂x ∂y ∂y ∂x    

(∂²u/∂x∂y)_{i,j} = ((∂u/∂y)_{i+1,j} − (∂u/∂y)_{i−1,j})/(2∆x) + O((∆x)²)
(∂u/∂y)_{i+1,j} = (u_{i+1,j+1} − u_{i+1,j−1})/(2∆y) + O((∆y)²)
(∂u/∂y)_{i−1,j} = (u_{i−1,j+1} − u_{i−1,j−1})/(2∆y) + O((∆y)²)

Figure 3.11: Approximation of two-dimensional mixed derivatives.

A second order finite difference approximation for two dimensions is given by:

(∂²u/∂x∂y)_{i,j} = (u_{i+1,j+1} − u_{i+1,j−1} − u_{i−1,j+1} + u_{i−1,j−1})/(4∆x∆y) + O((∆x)², (∆y)²)   (3.70)

Figure 3.12: Boundary treatment for the finite difference scheme. The necessary grid points are not available for all different types of approximation schemes.

Boundary treatment within the finite difference scheme:

(∂u/∂x)_0 = (u_1 − u_0)/∆x + O(∆x)   (3.71)

In this case, with the boundary on the left side, the backward and central difference approximations would need u_{−1}, which is not available. The boundary treatment for finite differences is therefore more complex than for the other discretization schemes. This particular problem becomes more severe if higher order discretization schemes are used, because more grid points are needed to evaluate the corresponding approximation.

(∂u/∂x)_i = (2u_{i+1} + 3u_i − 6u_{i−1} + u_{i−2})/(6∆x) + O((∆x)³)   backward difference   (3.72)
(∂u/∂x)_i = (−u_{i+2} + 6u_{i+1} − 3u_i − 2u_{i−1})/(6∆x) + O((∆x)³)   forward difference   (3.73)
(∂u/∂x)_i = (−u_{i+2} + 8u_{i+1} − 8u_{i−1} + u_{i−2})/(12∆x) + O((∆x)⁴)   central difference   (3.74)
(∂²u/∂x²)_i = (−u_{i+2} + 16u_{i+1} − 30u_i + 16u_{i−1} − u_{i−2})/(12(∆x)²) + O((∆x)⁴)   central difference   (3.75)
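The stated truncation orders can be checked numerically; the choice of test function u = sin and evaluation point x = 1 below is illustrative.

```python
import numpy as np

# Numerical check of the truncation order of the fourth-order central
# stencil of Equation 3.74; u = sin at x = 1 is an illustrative choice.
def central4(u, x, dx):
    return (-u(x + 2*dx) + 8*u(x + dx) - 8*u(x - dx) + u(x - 2*dx)) / (12*dx)

u, du, x = np.sin, np.cos, 1.0
e1 = abs(central4(u, x, 0.10) - du(x))
e2 = abs(central4(u, x, 0.05) - du(x))
print(e1 / e2)   # close to 2^4 = 16, confirming the O((dx)^4) error term
```

Halving ∆x reduces the error by roughly a factor of sixteen, matching the O((∆x)⁴) term of the stencil.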

3.4.2 Analysis of the Finite Difference Method

One method of directly transferring the discretization concepts is the finite difference time domain method, here analyzed for the time-dependent Maxwell equations, which was first introduced by Yee [45]. It is one of the exceptional examples of engineering which anticipated great insights into discretization processes. Section 10.3 gives a very efficient transformation of the concepts introduced here into an application by the GSSE. With this method, the partial spatial and time derivatives are replaced by finite difference approximations. The resulting system is solved by an explicit time evolution. One of the main advantages of this method is that no matrix operations or algebraic solution methods have to be used. The space domain is discretized by two dual orthogonal regular Cartesian grids based on cubes with spatial subdivisions ∆x, ∆y, ∆z, whereas the time domain is subdivided into intervals of ∆t. The original formulation was based on half-step staggered grids in space and time. The quantities of the second complex are denoted by ∆x̃, ∆ỹ, ∆z̃, ∆t̃. It is important to highlight that two different grids are

necessary due to the fact that different quantities with different orientation reside on these two different grids, even if, in this method, the secondary quantities coincide numerically with the primary ones. In a linear, homogeneous, isotropic material the Maxwell equations are given by:

B = µH,   D = εE,   J = γE   (3.76)

∂_t H = −(1/µ) curl(E),   ∂_t E = (1/ε) curl(H) − (γ/ε) E   (3.77)
div(H) = 0,   div(E) = 0   (3.78)

Projected onto a regular Cartesian structured grid, the following six coupled scalar equations are equivalent to the given Maxwell equations:

∂_t H_x = (1/µ)(∂_z E_y − ∂_y E_z)   (3.79)
∂_t H_y = (1/µ)(∂_x E_z − ∂_z E_x)   (3.80)
∂_t H_z = (1/µ)(∂_y E_x − ∂_x E_y)   (3.81)
∂_t E_x = (1/ε)(∂_y H_z − ∂_z H_y − γE_x)   (3.82)
∂_t E_y = (1/ε)(∂_z H_x − ∂_x H_z − γE_y)   (3.83)
∂_t E_z = (1/ε)(∂_x H_y − ∂_y H_x − γE_z)   (3.84)

These equations are the basic expressions for the finite difference time domain method. The divergence relations are implicitly fulfilled by this method.

The components of the electric and magnetic fields E and H, with their corresponding projections onto the coordinate axes, are the used variables. These variables and the local values of the material properties are attached to the midpoints of the grid edges. The variable indexing scheme is used consistently with [45]:

E_x(i∆x, (j ± 1/2)∆y, k∆z, n∆t) = E_x|^n_{i,j±1/2,k}    (3.85)

With this notation the following expressions are obtained by using the central difference approximation:

∂_x E_z(i∆x, j∆y, k∆z, n∆t) = (E_z|^n_{i+1/2,j,k} − E_z|^n_{i−1/2,j,k}) / ∆x + O(∆x²)    (3.86)

∂_t E_z(i∆x, j∆y, k∆z, n∆t) = (E_z|^{n+1/2}_{i,j,k} − E_z|^{n−1/2}_{i,j,k}) / ∆t + O(∆t²)    (3.87)

The FDTD time-stepping formulas for H_x are:

H_x|^{n+1}_{i+1/2,j,k} = H_x|^n_{i+1/2,j,k}    (3.88)
  + ∆t / (∆z µ|_{i+1/2,j,k}) (E_y|^{n+1/2}_{i+1/2,j,k+1/2} − E_y|^{n+1/2}_{i+1/2,j,k−1/2})    (3.89)
  − ∆t / (∆y µ|_{i+1/2,j,k}) (E_z|^{n+1/2}_{i+1/2,j+1/2,k} − E_z|^{n+1/2}_{i+1/2,j−1/2,k})    (3.90)

and for the time-stepping of E_x:

E_x|^{n+1/2}_{i,j+1/2,k+1/2} = E_x|^{n−1/2}_{i,j+1/2,k+1/2}    (3.91)
  + ∆t̃ / (∆z̃ ε|_{i,j+1/2,k+1/2}) (H_y|^n_{i,j+1/2,k+1} − H_y|^n_{i,j+1/2,k})    (3.92)
  − ∆t̃ / (∆ỹ ε|_{i,j+1/2,k+1/2}) (H_z|^n_{i,j+1,k+1/2} − H_z|^n_{i,j,k+1/2})    (3.93)
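The update formulas above translate almost directly into code. The following C++ sketch is a minimal illustration (not the GSSE formulation referenced in Section 10.3): it applies the H_x and E_x time stepping on a small uniform grid with constant, lossless material parameters. All class, function, and variable names are chosen for this example only, and half-integer Yee indices are stored at integer array offsets.

```cpp
#include <vector>
#include <cstddef>

// Minimal 3D scalar field on a regular grid; index (i,j,k) maps to flat storage.
struct Field3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;
    Field3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
};

// One leap-frog step: Hx advanced from E (Equations 3.88-3.90), then Ex from H
// (Equations 3.91-3.93); both dual grids use the same spacings here.
void step(Field3D& Hx, Field3D& Ey, Field3D& Ez, Field3D& Ex, Field3D& Hy, Field3D& Hz,
          double dt, double dx, double dy, double dz, double mu, double eps) {
    // Hx^{n+1} = Hx^n + dt/(dz mu)(Ey_{k+1/2} - Ey_{k-1/2}) - dt/(dy mu)(Ez_{j+1/2} - Ez_{j-1/2})
    for (std::size_t i = 0; i < Hx.nx; ++i)
        for (std::size_t j = 0; j + 1 < Hx.ny; ++j)
            for (std::size_t k = 0; k + 1 < Hx.nz; ++k)
                Hx(i, j, k) += dt / (dz * mu) * (Ey(i, j, k + 1) - Ey(i, j, k))
                             - dt / (dy * mu) * (Ez(i, j + 1, k) - Ez(i, j, k));
    // Ex^{n+1/2} = Ex^{n-1/2} + dt/(dz eps)(Hy_{k+1} - Hy_k) - dt/(dy eps)(Hz_{j+1} - Hz_j)
    for (std::size_t i = 0; i < Ex.nx; ++i)
        for (std::size_t j = 0; j + 1 < Ex.ny; ++j)
            for (std::size_t k = 0; k + 1 < Ex.nz; ++k)
                Ex(i, j, k) += dt / (dz * eps) * (Hy(i, j, k + 1) - Hy(i, j, k))
                             - dt / (dy * eps) * (Hz(i, j + 1, k) - Hz(i, j, k));
}
```

Since the scheme is explicit, each grid point is updated independently; no matrix has to be assembled or solved, as stated above.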

3.4.3 Further Analysis

Compared to the basic concepts for discretization, where only the discretization of the constitutive relations should be improved, the finite difference method tries to increase the accuracy of the approximation of the derivatives appearing in the partial differential equations [20]. As can easily be seen, this issue greatly complicates the generalization to different types of meshes and therefore greatly reduces the applicability of this method. As can be seen here, the FDTD method does not use global quantities. Instead, only local nodal values of the corresponding vector components are used. Based on the problem formulation, it can be seen that these local values are the projections of averaged field components onto 2-cells, and therefore are local representatives of the global quantities. With the local constitutive relations, where only the x part is given:

B_x|^n_{i+1/2,j,k} = µ|_{i+1/2,j,k} H_x|^n_{i+1/2,j,k}    (3.94)

and

D_x|^{n+1/2}_{i,j+1/2,k+1/2} = ε|_{i,j+1/2,k+1/2} E_x|^{n+1/2}_{i,j+1/2,k+1/2}    (3.95)

the following expressions are obtained for a global quantity approach:

∆y∆z µ|_{i+1/2,j,k} H_x|^{n+1}_{i+1/2,j,k} = Φ_{b,x}|^{n+1}_{i+1/2,j,k}    (3.96)
∆y∆t E_y|^{n+1/2}_{i+1/2,j,k±1/2} = Φ_{e,y}|^{n+1/2}_{i+1/2,j,k±1/2}    (3.97)
∆z∆t E_z|^{n+1/2}_{i+1/2,j±1/2,k} = Φ_{e,z}|^{n+1/2}_{i+1/2,j±1/2,k}    (3.98)

and

∆ỹ∆z̃ ε|_{i,j+1/2,k+1/2} E_x|^{n+1/2}_{i,j+1/2,k+1/2} = Ψ_{d,x}|^{n+1/2}_{i,j+1/2,k+1/2}    (3.99)
∆z̃∆t̃ H_z|^n_{i,j+1,k+1/2} = Ψ_{h,z}|^n_{i,j+1,k+1/2}    (3.100)

The following conversions of Equation 3.96 and Equation 3.100 for the x-case yield:

H_x|^n_{i+1/2,j,k} = 1 / (∆y∆z µ|_{i+1/2,j,k}) Φ_{b,x}|^n_{i+1/2,j,k}    (3.101)
H_x|^n_{i+1/2,j,k} = 1 / (∆x̃∆t̃) Ψ_{h,x}|^n_{i+1/2,j,k}    (3.102)

which can be expressed for Φ:

Φ_{b,x}|^n_{i+1/2,j,k} / (∆y∆z) = µ|_{i+1/2,j,k} Ψ_{h,x}|^n_{i+1/2,j,k} / (∆x̃∆t̃)    (3.103)

The corresponding conversion for Ψ yields:

Ψ_{d,x}|^{n+1/2}_{i,j+1/2,k+1/2} / (∆ỹ∆z̃) = ε|_{i,j+1/2,k+1/2} Φ_{e,x}|^{n+1/2}_{i,j+1/2,k+1/2} / (∆x∆t)    (3.104)

This, together with the link between Φ_{b,x} and Ψ_{h,x} in Equation 3.103, represents a discrete constitutive equation of the simplest type, obtained by extending the local constitutive equations B = µH and D = εE with the assumption of planarity, regularity, and orthogonality of the cells.

Evaluating the time-stepping formula for H_x yields the following time stepping for Φ_{b,x}:

Φ_{b,x}|^{n+1}_{i+1/2,j,k} = Φ_{b,x}|^n_{i+1/2,j,k}    (3.105)
  + Φ_{e,y}|^{n+1/2}_{i+1/2,j,k+1/2} − Φ_{e,y}|^{n+1/2}_{i+1/2,j,k−1/2}    (3.106)
  − Φ_{e,z}|^{n+1/2}_{i+1/2,j+1/2,k} + Φ_{e,z}|^{n+1/2}_{i+1/2,j−1/2,k}    (3.107)

and for the E_x and the corresponding Ψ_{d,x} part:

Ψ_{d,x}|^{n+1/2}_{i,j+1/2,k+1/2} = Ψ_{d,x}|^{n−1/2}_{i,j+1/2,k+1/2}    (3.109)
  + Ψ_{h,y}|^n_{i,j+1/2,k+1} − Ψ_{h,y}|^n_{i,j+1/2,k}    (3.110)
  − Ψ_{h,z}|^n_{i,j+1,k+1/2} + Ψ_{h,z}|^n_{i,j,k+1/2}    (3.111)

The expressions can then be transformed into equations depending only on the two global values:

Φ_{b,x}|^{n+1}_{i+1/2,j,k} = Φ_{b,x}|^n_{i+1/2,j,k} + ∆x∆t / (∆ỹ∆z̃) ·    (3.113)
  ( 1/ε|_{i+1/2,j,k+1/2} Ψ_{d,x}|^{n+1/2}_{i+1/2,j,k+1/2} − 1/ε|_{i+1/2,j,k−1/2} Ψ_{d,x}|^{n+1/2}_{i+1/2,j,k−1/2}    (3.114)
  − 1/ε|_{i+1/2,j+1/2,k} Ψ_{d,x}|^{n+1/2}_{i+1/2,j+1/2,k} + 1/ε|_{i+1/2,j−1/2,k} Ψ_{d,x}|^{n+1/2}_{i+1/2,j−1/2,k} )    (3.115)

For the special case of dominant systems, the following equations represent a dominant magnetic system, the TM mode:

∂_t H_x = (1/µ)(∂_z E_y − ∂_y E_z)    (3.116)
∂_t H_y = (1/µ)(∂_x E_z − ∂_z E_x)    (3.117)
∂_t E_z = (1/ε)(∂_x H_y − ∂_y H_x − γE_z)    (3.118)

For a dominant electric system, the TE mode is given by:

∂_t E_x = (1/ε)(∂_y H_z − ∂_z H_y − γE_x)    (3.119)
∂_t E_y = (1/ε)(∂_z H_x − ∂_x H_z − γE_y)    (3.120)
∂_t H_z = (1/µ)(∂_y E_x − ∂_x E_y)    (3.121)

With this expression the TM mode at t = n∆t is discretized by:

∂_t H_x = (1/µ)(∂_z E_y − ∂_y E_z)    (3.122)
∂_t H_y = (1/µ)(∂_x E_z − ∂_z E_x)    (3.123)

H_x|^{n+1/2}_{i,j,k} − H_x|^{n−1/2}_{i,j,k} = −∆t / (µ∆y) (E_z|^n_{i,j+1/2,k} − E_z|^n_{i,j−1/2,k})    (3.124)

H_y|^{n+1/2}_{i,j,k} − H_y|^{n−1/2}_{i,j,k} = ∆t / (µ∆x) (E_z|^n_{i+1/2,j,k} − E_z|^n_{i−1/2,j,k})    (3.125)

With these expressions the TM mode at t = (n + 1/2)∆t is discretized by:

∂_t E_z = (1/ε)(∂_x H_y − ∂_y H_x − γE_z)    (3.126)

E_z|^{n+1}_{i,j,k} − E_z|^n_{i,j,k} = ∆t/ε [ (H_y|^{n+1/2}_{i+1/2,j,k} − H_y|^{n+1/2}_{i−1/2,j,k}) / ∆x − (H_x|^{n+1/2}_{i,j+1/2,k} − H_x|^{n+1/2}_{i,j−1/2,k}) / ∆y − γ E_z|^{n+1/2}_{i,j,k} ]    (3.127)

As can be seen, the E_z values for the full time steps and the H_x, H_y values for the half time steps are already available, and therewith the update for H_x, H_y can be done without any further calculation. On the contrary, E_z|^{n+1/2}_{i,j,k} is not available and has to be approximated by:

E_z|^{n+1/2}_{i,j,k} = 1/2 (E_z|^{n+1}_{i,j,k} + E_z|^n_{i,j,k})    (3.128)

The following expression is finally obtained:

E_z|^{n+1}_{i,j,k} = (1 − γ∆t/(2ε)) / (1 + γ∆t/(2ε)) E_z|^n_{i,j,k} + (∆t/ε) / (1 + γ∆t/(2ε)) [ (H_y|^{n+1/2}_{i+1/2,j,k} − H_y|^{n+1/2}_{i−1/2,j,k}) / ∆x − (H_x|^{n+1/2}_{i,j+1/2,k} − H_x|^{n+1/2}_{i,j−1/2,k}) / ∆y ]    (3.129)

Based on the last equations, it can be seen that only the neighboring values of H_x, H_y are used to evaluate the spatial derivatives for E_z. Also, only the E_z elements neighboring H_x, H_y are used to calculate the spatial derivatives of H_x, H_y. Therefore, this method calculates both fields, E and H, based on the curl expressions, with special requirements on the given field. Therewith a broader class of problems can be calculated. Additionally, this discretization can also be derived from the global integral discretization, which eases the evaluation of boundary conditions.
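The TM-mode scheme of Equations 3.124, 3.125, and 3.129 can be sketched as follows. This is a minimal C++ illustration with constant material parameters µ, ε and conductivity γ; the grid class and all names are invented for this example, the two staggered positions are stored at integer offsets, and boundary handling is omitted.

```cpp
#include <vector>
#include <cstddef>

// 2D TM-mode Yee grid holding Hx, Hy, Ez in flat (i*ny + j) storage.
struct TMGrid {
    std::size_t nx, ny;
    std::vector<double> Hx, Hy, Ez;
    TMGrid(std::size_t nx_, std::size_t ny_)
        : nx(nx_), ny(ny_), Hx(nx_ * ny_, 0.0), Hy(nx_ * ny_, 0.0), Ez(nx_ * ny_, 0.0) {}
    std::size_t at(std::size_t i, std::size_t j) const { return i * ny + j; }
};

void tm_step(TMGrid& g, double dt, double dx, double dy,
             double mu, double eps, double gamma) {
    // H update at half steps, discretizing Equations 3.122-3.123
    // (no z-dependence: only the Ez differences remain):
    for (std::size_t i = 0; i < g.nx; ++i)
        for (std::size_t j = 0; j + 1 < g.ny; ++j)
            g.Hx[g.at(i, j)] -= dt / (mu * dy) * (g.Ez[g.at(i, j + 1)] - g.Ez[g.at(i, j)]);
    for (std::size_t i = 0; i + 1 < g.nx; ++i)
        for (std::size_t j = 0; j < g.ny; ++j)
            g.Hy[g.at(i, j)] += dt / (mu * dx) * (g.Ez[g.at(i + 1, j)] - g.Ez[g.at(i, j)]);
    // Semi-implicit Ez update, Equation 3.129, with the averaged Ez^{n+1/2}:
    double a = (1.0 - gamma * dt / (2.0 * eps)) / (1.0 + gamma * dt / (2.0 * eps));
    double b = (dt / eps) / (1.0 + gamma * dt / (2.0 * eps));
    for (std::size_t i = 1; i < g.nx; ++i)
        for (std::size_t j = 1; j < g.ny; ++j)
            g.Ez[g.at(i, j)] = a * g.Ez[g.at(i, j)]
                + b * ((g.Hy[g.at(i, j)] - g.Hy[g.at(i - 1, j)]) / dx
                     - (g.Hx[g.at(i, j)] - g.Hx[g.at(i, j - 1)]) / dy);
}
```

Only nearest-neighbor values enter each update, which makes the locality discussed above explicit in the loop bodies.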

3.5 Summary

When Equations 3.24 and 3.58 are compared, it can be seen that FE writes balance equations that are very similar in spirit to the ones written by FV. FV writes them on crisp cells, whereas FE does it on spread cells. This issue can be a starting point for further research on the issue of quantity conservation in FE. Note that FV writes the balance equations in terms of global quantities, whereas FE, making use of spread cells with spread boundaries, is forced to use field functions defined over the whole domain. Despite this difference, neither FV nor FE discretizes the operator representing the local version of the balance equation. Instead they both resort to the global version of it, and with reason, since a topological equation applies directly to regions with finite extension. Figure 3.13 depicts the different approaches within the FV and the FE method. The x-axis is divided into one-dimensional cells, marked with vertical lines. The FV method uses crisp cells which are aligned accordingly with the dual complex of the given cells of the underlying cell complex. The FE method uses spread cells which do not build a consistent cell complex. Each cell, depicted by its ansatz function, spreads over, in this example, two cells. From the point of view of the dual complex, which is not used by the FE method, the cells leave the area of a dual cell.


Figure 3.13: Geometrical interpretation of the ansatz functions for the (left) finite volume method and (right) for the finite element method. The FV method describes consistent crisp cells which can be interpreted as a cell complex, whereas the FE method uses spread cells, which do not conform with the properties of a consistent cell complex.

Part II

Practical Concepts

Chapter 4

Introduction

In many areas of scientific computing there is a multitude of software applications and tools available which provide methods and libraries for the solution of very specific problem classes. They are mostly specialized in a certain type of underlying mathematical model, resulting in a solution process which is highly predictable. However, it is important to note that such applications impose restrictions on possible solution methods which cannot be foreseen at the beginning of modern program development. But, as can be observed from the last section, the development of highly reusable software components for this highly complex area demands different programming paradigms for an efficient realization. A software component is reusable if it can be used beyond its initial use within a single application or group of applications without modification. Initially, only one- and two-dimensional data structures were used due to the limitations of computer resources. The imperative programming paradigm was sufficient for this type of task. With the improvement of computer hardware and the rise of the object-oriented programming paradigm, the shift to more complex data models became possible. Finally, the generic programming paradigm has emerged and has eased the development of independent algorithms. Therewith more operations on each data structure became possible and the distance to the mathematical semantics was steadily reduced. The following chapter therefore serves as a catalyst to transform the still theoretical concepts into applicable concepts necessary to reduce the parts of an application to a consistent kernel. Therewith a consistent minimal base for application design can be specified without restricting the global applicability.
Briefly, the following sections transform the already introduced concepts into a concise and minimal basis related to the following issues:

- The fiber bundle concept is used to introduce concepts for separating the structure of data structures from the structure of the stored data, together with the corresponding mechanism to guarantee the conservation of neighborhoods. As presented in the next sections, not all currently used data structure concepts can model this property appropriately. The main benefit of using this concept in the area of application design is a clean and consistent interface and therewith a set of orthogonal libraries.

- The concept of mapping cell complex properties into algebraic structures by means of chains, and of projecting the physical fields by means of cochains into a dual algebraic space, translates to practical concepts of generic data structures and generic quantity accessors. Generic traversal mechanisms as well as the identification and unification of already developed iteration mechanisms can be derived.

- Next to these two transformations it is shown that each task requires a different programming paradigm if an overall high performance is required. Therewith each concept can be transferred to the best suitable programming paradigm, and even domain-specific languages can be derived from these concepts. A great reduction of software engineering effort can thereby be obtained.

- Additionally, it can be shown that only a small set of standard algorithms for scientific computing is

required for different tasks such as discretization or interpolation. As this section introduces, the language of choice is going to be C++, for various reasons. It has to be mentioned that, up to now, only C++ can transform these concepts into code with an overall high performance and one of the shortest distances between the specification of concepts and their realization.

Chapter 5

Computational Concepts

5.1 Generic Data Model

For the practical realization of the transition of the given theoretical concepts into practical concepts, an effective data model is essential to the design and development of scientific software [46]. The concept of a data model, denoting a data structure together with a set of operations on it, is exactly this type of concept and was first considered by E. F. Codd [47], who defined a relational scheme for structuring data and provided access and query operations for relational databases. A data model is therefore a primary consideration in the design of any scientific software system.

Definition 67 (Data Structure) A data structure is defined via its representation of data and a set of operations that are valid on that representation.

Definition 68 (Data Model) A data model is an abstract model that describes how data is represented and used and has three components:

- The structural part: a collection of structured data
- The integrity part: a collection of rules governing the constraints on these data structures
- The manipulation part: a collection of operators which can be applied to these data structures
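As a minimal illustration of this definition (all names here are invented for the example), the following C++ sketch exhibits the three components in one class: the container is the structural part, an ordering invariant checked on every mutation is the integrity part, and the offered operators form the manipulation part.

```cpp
#include <vector>
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <cstddef>

// Hypothetical example: a sequence whose data model guarantees sortedness.
class SortedSequence {
    std::vector<int> data_;                        // structural part
    void check() const {                           // integrity part
        for (std::size_t i = 1; i < data_.size(); ++i)
            if (data_[i - 1] > data_[i])
                throw std::logic_error("ordering constraint violated");
    }
public:
    void insert(int v) {                           // manipulation part
        auto pos = std::lower_bound(data_.begin(), data_.end(), v);
        data_.insert(pos, v);
        check();                                   // every mutation re-validates
    }
    int sum() const { return std::accumulate(data_.begin(), data_.end(), 0); }
    std::size_t size() const { return data_.size(); }
    int operator[](std::size_t i) const { return data_[i]; }
};
```

The point of the separation is that users interact only with the manipulation part, while the integrity part keeps the structural part consistent at all times.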

Data exchange and data access, whether internally within an application via independent software components or externally among applications, require interfaces which explicitly communicate all properties of a given dataset. The more explicit the description of a certain dataset through a certain interface is, the wider is its coverage, as it avoids implicit assumptions that are only known within a small set of applications or application components. Especially with the current worldwide development of shared and open source resources, the concept of data or even application interoperability is a growing demand and a necessity. A generic data model is therefore of utmost importance and is required to handle the following aspects of dealing with the different data occurring in scientific computing:

- Various cell types and cell complex types. The spatial, or even spatial-temporal, discretization is an important part of the transition of given problems to a discrete calculation area. Various types of cells, such as vertices, simplices, hypercubes, or space-time objects, equipped with orientation and topological structure, are required to represent a domain with physical quantities. The issues of domain partitioning and mesh generation are tasks within this part.

- Different coordinate systems. Scientific computing can benefit greatly from the utilization of symmetries, such as rotational or mirror symmetries. Orders of magnitude of simulation time can be saved by

using the symmetry of a given problem. Also, the issue of local charts, combined into an atlas, is a task for this part.

- Various basis functions and interpolation schemes. The different discretization schemes require these types.

- Sparse and dense field representations. The computational aspect of scientific computing requires the efficient transformation of given problems into an application. Therefore the utilization of sparsity is an important means of enabling high performance computing by developing and using various data structures.

- Various types of field values. The introduced different types of physical quantities are modeled best by maintaining their inherent orientation and data locality.

Thereby a semantic language for describing data is necessary; it is introduced in the next section based on the identification and separation properties of the fiber bundle concept. Applied to the concept of application design for scientific computing, the following section introduces abstraction levels for each of these tasks.

5.1.1 Abstraction Levels

The transformation in this work of concepts for scientific computing into generic applicable concepts is based on an identification and separation of different levels of abstraction. Identifying each level within a complex context eases the development of reusable software components, because each level can be implemented independently. Therefore four different concepts for application design and scientific visualization are identified which are required to describe scientific data (similar to the scheme in the Sophus library [48]). The different requirements for the interfaces between these levels are analyzed through their relationships, as depicted in the simplex of Fig. 5.1. From experience (and from the evaluation in the related work section), single applications with proprietary I/O mix the abstraction levels given here. Another important part of this analysis of abstraction levels is to highlight the necessity for different programming paradigms, as presented in Section 5.2. Each level intrinsically requires a special transformation to be applied efficiently in application design. Therefore each link between these levels is classified by computational issues of scientific computing. The expressiveness of a data model can be measured by its coverage with respect to these abstraction levels. A model of a higher abstraction level can always be imposed on a model of a lower abstraction level. Also, the need for various programming paradigms can be identified by the respective couplings among these abstraction layers. These levels of abstraction are introduced in more detail:

I.) The level of binary representation is the bottommost level of each implementation-related design. Various specifications have to be introduced to guarantee an operational module. An important step in abstracting this level is that the high-level application must not depend on any of the assumptions within this low level. For a modern application design approach, and the therewith correlated rapid development of computer hardware and computer systems, it is not even allowed for high performance calculations to depend on this layer. Therefore a low-level input/output layer has to take care of, e.g., IEEE floating point numbers such as float or double and their byte ordering.

II.) The semantic topological information of multiplicity and parallelism is required to state how many atomic types are needed to describe the domain of interest, for instance, how many atomic types, e.g., floats or doubles, are required to numerically describe coordinates given on a two-dimensional domain. This structural multiplicity information allows building the "carrier" of elementary, possibly parallelized, numerical operations.

Figure 5.1: Tetrahedron spanned by the four abstraction levels of a data model and their interfaces, constituting a certain application scenario altogether.

III.) Mathematical semantics deals with the identification of the mathematical purpose of a certain object. The separation of a tangential vector, a normal vector, or a coordinate location, which are all represented numerically by the same number of atomic types, is located within this level. The mathematical properties of the given objects, as defined by their chart transition rules, are quite different. Other examples are objects implementing the semantics of differential geometry, such as ∂x vs. dx.

IV.) The final level of physical semantics represents the physical meaning of a certain data set. A certain physical meaning has to be associated in order to determine its purpose, e.g., given a vector field, its corresponding role in a differential equation. The physical semantics is therefore related to instantiating given multiple scalar fields which relate to, e.g., temperature or pressure. While the mathematical semantics (level III) specifies which operations are possible on a certain field, the specification of which operations are meaningful on it (level IV) is mostly user-driven and on the application level. Currently no ontological scheme for physical quantities exists, and a textual representation which is subject to human interpretation is used instead. A possible automatized approach would be to utilize physical quantities or to refer to the governing equations that were used to create the respective data sets.
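The level III distinction can be made explicit in a statically typed language: the same triple of atomic values receives different types according to its mathematical semantics, so that meaningless operations are rejected at compile time. The following C++ sketch uses invented type names purely for illustration:

```cpp
#include <array>

// The same three doubles, carrying different mathematical semantics.
struct TangentVector { std::array<double, 3> v; };
struct Position      { std::array<double, 3> x; };

// Displacing a position by a tangent vector is meaningful ...
Position operator+(Position p, TangentVector t) {
    for (int i = 0; i < 3; ++i) p.x[i] += t.v[i];
    return p;
}
// ... adding two tangent vectors is meaningful (vector space structure) ...
TangentVector operator+(TangentVector a, TangentVector b) {
    for (int i = 0; i < 3; ++i) a.v[i] += b.v[i];
    return a;
}
// ... but Position + Position is simply not defined: the compiler rejects it.
```

No run-time check is involved; the semantic constraint is enforced entirely by the type system.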

5.1.2 Fiber Bundle as Data Model

Using the introduced levels of abstraction, the requirements for a generic data model are deployed. One possible data model was suggested by Butler and Pendley [24, 49], who identified an abstraction for an object-oriented scientific data model of considerable generality based on the mathematical concept of fiber bundles. This introduction was up to now related to scientific visualization, which is not part of this work. Here, large-scale simulations can generate large amounts of numerical data. The analysis of this data has become an increasingly important research problem, and various overviews and related implementations are available [1, 8, 9, 46]. Butler's approach was originally limited to trivial bundles and implemented in Eiffel. It was the first step towards a generic data model for scientific computing, but not without practical issues [1].

With this approach there was no concept of cell and complex topology (cells and skeletons) on a discretized manifold. The use of different types of p-cells is a fundamental requirement for arbitrary p-forms; e.g., the use of cells instead of vertices is important when interpolating data at an arbitrary location. Finally, a semantic language was obtained to fully cover the complete information of a discretized manifold [1] with the following structure:


Figure 5.2: Fiber bundle structure [1].

Compared to the object-oriented programming concept, which is based on a hierarchical abstraction of data and associated functions, the fiber bundle concept as a data model uses a shared abstraction of the corresponding scientific information. In this work, the theory of fiber bundles, and more generally of fibrations, is introduced with a broader area of application, giving examples of its importance in several areas beyond scientific visualization. Also, fibrations are introduced to show that the cell complex properties reside within their own space, the base space. Attached to this space, a fiber space can be introduced which can use different algebraic structures, e.g., a vector space. The concept of fiber bundles is later used to separate the arbitrary quantities from the cell complex and to house the algebraic chain and cochain concepts in dual spaces related to the base space. One of the contributions of this work is that the concept of fiber bundles is not only used for visualization, but also for the following issues:

- One of the most important properties of the fibrations used in the subsequent chapters is their usage as an organizing device. It separates the application modules into base space and fiber space modules. Base space modules are used for traversal within the cell complex and for cell complex issues, e.g., derivation of cell and complex topology as well as an efficient data structure for storing the connection matrix. Fiber space modules deal with the access of data and the structure, orientation, and semantic information.

- The unification of various concepts into a generic data model for scientific data and concepts for application design.

- The fiber bundle concept can also be used to introduce a data model to house all different types of file

formats, as well as to explain the problems of currently used computer implementations. A common layer for file format, data storage, data identification, and access mechanisms can thereby be identified.

- Supporting topological mechanisms for chains and cochains and the corresponding operations for the corresponding spaces.

- Another important contribution of the fibration approach is the uniform perspective for the introduced domain-specific embedded language to implement a powerful mechanism for specifying non-trivial topological data structures (see Section 8.1).

This work also introduces concepts for cell and complex topologies, where the concepts of a cell complex and skeletons are extended to an algebraic system called a chain, see Section 2.6.1.

5.1.3 Fiber Bundle Data Hierarchy

A data model, e.g. fiber bundles, is necessary, but not sufficient, to complete an exchange of data. A data model provides a language for data description. Therefore, an application of the concept of fiber bundles was introduced [1] to model even the bottommost layer of a file format. The data structure of the fiber bundle data hierarchy is formed by a tree-like structure with cross references. The nature of fiber semantics is mapped to associative arrays, which allow mapping certain semantic objects to another hierarchy level. This data structure, together with a set of related functions that are parameterized by these semantic objects, forms the final fiber bundle data model. The model is represented by an acyclic graph of five levels with actual data sets only at the end nodes of the graph. The location of a data set throughout the path in the graph defines the semantics of the specific data set, allowing to naturally group data sets into "index spaces" which share common properties. These levels are

1. slice (grouping all data which belong to a certain point in the parameter space)

2. grid (grouping data which refer to a certain data source)

3. topology (grouping data which describe a - mostly topological - property of a grid instance)

4. representation (grouping all data which describe a topological property with respect to "something else", e.g. a coordinate system or cell-vertex relationships)

5. field (actual data sets storing numerical information).

This model eases database-like queries, such as which data exist for a certain parameter step or which vertices exist per cell of a certain mesh. Internal reverse representations also allow inverse queries, such as asking for which time steps or on which grids a certain field exists, or for the cells per vertex. From these five levels, only the grid and field levels are exposed to the application. Their identifiers carry semantic information beyond the pure mathematical description. The topology level corresponds to the sub-complex space of a cell complex and is used to identify the topological properties of a mesh. The representation level allows multiple coordinate representations of the same field to co-exist, such that a numerical scheme may choose the one best suitable for a certain problem. Not all data sets need to be stored explicitly; e.g., functional representations such as the vertex coordinates of a uniform grid may be built procedurally from geometrical and topological parameters only.
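A possible transcription of this five-level hierarchy into nested associative arrays, together with one of the database-like queries mentioned above, is sketched below in C++. All key names and the query function are illustrative assumptions, not part of the data model itself.

```cpp
#include <map>
#include <string>
#include <vector>

// The five levels (slice / grid / topology / representation / field) as
// nested associative arrays; actual numerical data only at the leaves.
using Field          = std::vector<double>;                   // level 5: raw data
using Representation = std::map<std::string, Field>;          // level 4
using Topology       = std::map<std::string, Representation>; // level 3
using Grid           = std::map<std::string, Topology>;       // level 2
using Slice          = std::map<std::string, Grid>;           // level 1
using Bundle         = std::map<double, Slice>;               // keyed by parameter value

// A database-like query: on which grids does a given field exist at time t?
std::vector<std::string> grids_with_field(const Bundle& b, double t,
                                          const std::string& field) {
    std::vector<std::string> result;
    auto slice = b.find(t);
    if (slice == b.end()) return result;
    for (const auto& [grid_name, grid] : slice->second) {
        bool found = false;
        for (const auto& [topo_name, topo] : grid)
            for (const auto& [rep_name, rep] : topo)
                if (rep.count(field)) found = true;
        if (found) result.push_back(grid_name);
    }
    return result;
}
```

The path through the nested maps encodes the semantics of each data set, exactly as the graph-path description above states.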


Figure 5.3: Fiber data model for a two-dimensional simplex mesh.

5.2 Evolution of Programming Paradigms

Several programming paradigms are currently available, such as:

- Imperative programming
- Object-oriented programming
- Functional programming
- Generic programming

Within these classes of paradigms, one important issue has to be highlighted: monomorphic versus polymorphic programming. Whereas several newer programming languages such as Python, Ruby, and other derivatives do not bother the user with data types and therefore offer various automatically converted types, statically typed programming languages such as C, C++, and Java offer different strategies.

Implications for application development can be observed clearly by studying the evolution of the object-oriented paradigm from imperative programming. The object-oriented programming paradigm [50] has significantly eased the software development of complex tasks, due to the decomposition of problems into modular entities. Concerning software reuse, it allowed the specification of class hierarchies using so-called subtyping polymorphism, which was a major enhancement for many different types of applications. But another important goal in the field of scientific computing, orthogonal libraries, cannot be achieved easily by this paradigm due to the type of polymorphism [51, 52]. A simple example of an orthogonal library is a software component which is completely exchangeable, e.g. a sorting algorithm for different data structures [52]. An inherent property of this paradigm is the divergence of generality and specialization [53-55]. Hereby the object-oriented programming paradigm is pushed to its limits by the requirements of the field of scientific computing, due to interface specifications, performance issues, and orthogonality. Even though the trend of combining algorithms and data structures is able to provide generalized access to the data structures through objects, it is observable that the interfaces of these objects become more complex as more functionality is added. The intended generality often results in inefficiency of the programs, due to virtual function calls which have to be evaluated at run-time. Compiler optimizations such as inlining or loop unrolling cannot be used efficiently, if at all. A lot of research was carried out to circumvent these issues [56], but major problems arise in the details [57].

Modern paradigms, such as the generic programming paradigm (GP [26, 58]), have the same major goals as object-oriented programming, such as reusability and orthogonality. However, the problem is tackled from a different point of view [59]. Together with meta-programming (MP) [60], generic programming accomplishes both a general solution for most application scenarios and highly specialized code parts for minor scenarios without sacrificing performance [61-63], due to partial specialization. The C++ language supports this paradigm with another type of polymorphism, parametric polymorphism, which is realized with template programming [64]. Combining this type of polymorphism with meta-programming, the compiler can generate highly specialized code without adversely affecting orthogonality. This allows the programmer to focus on libraries which provide concise interfaces with an emphasis on orthogonality, as can be found in the STL [27] or the BGL [52].

Although Java has gained more functionality with respect to a multi-paradigm approach [65, 66], its performance up to now cannot be compared to the C++ run-time performance [7]. Another comparison of several languages can be found in [67]. In the following, a simple problem, the inner product, is used to demonstrate the discussed issues of several programming paradigms:

v = Σ_i x_i · y_i    (5.1)

The first snippet presents an implementation of the given problem in a fully object-oriented language, Java.

Object-oriented implementation in Java

public class WaferVector {
    public int dimension = 0;
    public double[] value;

    public WaferVector(int i) {
        this.dimension = i;
        this.value = new double[i];
    }

    public double getValue(int i) {
        return this.value[i];
    }

    public double innerProduct(WaferVector vec) {
        double val = 0;
        for (int i = 0; i < this.dimension; i++)
            val += this.getValue(i) * vec.getValue(i);
        return val;
    }
}

class WaferVectorTest extends WaferVector {
    public WaferVectorTest(int i) {
        super(i);
    }

    public static void main(String[] args) {
        WaferVectorTest wvt1 = new WaferVectorTest(3);
        WaferVectorTest wvt2 = new WaferVectorTest(3);
        System.out.println("output: " + wvt1.innerProduct(wvt2));
    }
}

As can be observed, the connection between the implementation of the algorithm and the corresponding class WaferVector is very close. All classes and methods which want to have access to this algorithm have to interface this class structure. Otherwise, the whole class has to be transferred into an interface class, from which all other classes that want to use this algorithm have to be derived. Either way, this approach is intrusive, which means that the application of this algorithm requires an additional modification of existing code. The object-oriented modeling, as presented, is strictly narrowed down to reusability within the class hierarchy. Despite the different aids for crossing this border, the inherent restriction of this example shows that pure object-orientation does not lead to fully reusable components. Functional programming (FP) [68] eases the specification of equations and offers extendable expressions while retaining the functional dependence of formulae by higher order functions. An implementation of the given problem is presented in the following source snippet with Haskell as host language.

Functional programming in Haskell

inner_product :: [[Integer]] -> Integer
inner_product = foldr (+) 0 . map (foldr (*) 1) . transpose

transpose :: [[a]] -> [[a]]
transpose []             = []
transpose ([] : xss)     = transpose xss
transpose ((x:xs) : xss) =
   (x : [h | (h:t) <- xss]) : transpose (xs : [t | (h:t) <- xss])

val = inner_product x

As can be seen, this representation of the algorithm is completely generic for all types of input sequences which can be added, folded, and transposed. The drawback of this mechanism is that it is already detached from the conventional programming style of C, Java, C++, or C#. The functional modeling is fully reusable because no assumptions about the used data types are made. Fully functional programming languages, such as Haskell or ML, are completely polymorphic, concept-based programming languages. The problem remains twofold. On the one hand, the compiler has to do a lot of optimization work to reach the excellent performance of other programming languages such as Fortran or C++ [69]. On the other hand, some tasks, such as input/output, are inherently not functional. This part of an application is always state driven, due to the fact that its sources, e.g. hard disk or memory, are always state based. Pure functional programming does not support states from the beginning. The generic programming paradigm establishes homogeneous interfaces between algorithms and data structures without sub-typing polymorphism by an abstract access mechanism, the iterator concept. This concept specifies an abstract mechanism for accessing data structures in a concise way.

Generic programming in C++

template<typename InputIterator1, typename InputIterator2, typename NumericT>
NumericT inner_product(InputIterator1 start1, InputIterator1 end1,
                       InputIterator2 start2, NumericT startval)
{
   for (; start1 != end1; ++start1, ++start2)
      startval += (*start1) * (*start2);
   return startval;
}

vector_type vec1, vec2;
vector_type::numeric_t val;
val = inner_product(vec1.begin(), vec1.end(), vec2.begin(),
                    vector_type::numeric_t(0));

This algorithm can be used with any data structure which offers iterator access, where the most simple implementation of an iterator is a plain pointer to the data storage of a data structure. Based on this indirection of the iterator concept the overall implementation effort can be greatly reduced. Problems arise in the details with the requirements for the iterators, which are discussed in Section 6.1. Nevertheless, an overall high performance can be easily obtained and, additionally, this approach is completely orthogonal. The generic modeling with the corresponding programming paradigm supports both the derivation and state model mechanisms of the object-oriented modeling and the fully polymorphic support of functional programming. In other words, the generic paradigm supports the state-driven input/output with iterator or traversal concepts as well as the functional specification mechanism of functional programming languages. It can therewith be seen as the connector between the data sources and the specification. Not only can the implementation effort be reduced by orders of magnitude, it can also be used to implement high performance run-time code.

5.3 Library-Centric Software Design

Libraries have indeed become a central part of all major programming efforts connected to scientific and engineering areas. An optimal library design can be specified by:

- Re-usability: applications should reuse source code as much as possible
- Orthogonality: each part can interact with all other libraries
- Performance: a maximum of performance with zero abstraction penalty should be achieved [70]

The current trend, however, is to combine several programming languages, resulting in multi-language applications [71]. Different languages are utilized, each within the field where they perform best. Languages such as Python [71] are used to combine different modules. But problems of interface specification and implementation arise with the combination of several programming languages, further complicating matters. Moreover, the handling of different languages on different platforms is considerably more difficult. In the area of TCAD, or more generally in the field of scientific computing, the performance aspects should be handled orthogonally to the development of applications. Optimizations can thereby be treated separately. With the multi-language approach, performance aspects cannot be considered orthogonally because of the use of compiled modules which require an interface layer in order to build applications. However, in the field of scientific computing the re-usability of libraries is often extremely limited with respect to the following issues:

- Numerical data types. There are numerous well known numerical data types which contain, e.g., matrices or complex numbers. These data types are often optimized for a special application in order to yield high performance. Only with generic interfaces can these performance enhancing measures be used in different kinds of applications.
- Topologies. Practical numerical schemes often require different underlying topological data structures. While some applications perform well using structured grids, other applications require unstructured meshes with varying local feature sizes. Although the nature of these topologies is totally different, standardized interfaces for all topological data types have to be provided.
- Different dimensions. Special symmetries that are encountered in many problems of scientific computing can be used to reduce the effective dimension of a calculation. Even though all problems can be treated in three dimensions, the enormous gain in performance by using lower dimensional data structures cannot be neglected. In some cases of sophisticated models a three-dimensional treatment is still impossible due to the enormous consumption of resources.
- Equation system assembly. Most of the solver mechanisms require an initialization of the values through their own interfaces. An interface is required which abstracts these specialties and makes the solvers accessible in a general manner. With this interface one can formulate the governing equations independently of the actual data structures of the solver.
- Solution mechanisms for large equation systems. Many problems result in large equation systems which have to be solved simultaneously. There are solvers available for various special cases which perform well under certain circumstances but fail to converge in general. Therefore, interface design has to guarantee that different solver mechanisms can be used.

To handle these different issues we have separated the data structures in the field of TCAD from all different types of algorithms. Investigations concerning the developed applications at our institute have shown that the topological structures can be abstracted and generalized into a generic topology library (GTL), which is presented in more detail in Section 8.1. This new approach of a generalized topology enables a new functional description for the discretization of different partial differential equations (PDEs) without sacrificing any performance. This is accomplished in our generic discretization library (GDL), which fulfills the requirements of scientific computing, especially TCAD [72–74]. The GDL is presented in Section 8.2. The connection between these layers is established by the cursor concept, which is an enhancement of the iterator design pattern [75, 76]. One can go a step further with the encapsulation of simple models into libraries, such as the BGL, the STL, or Boost. As stated above, an optimal library design can be specified by:

• Re-usability: applications should reuse source code as much as possible [77]

• Orthogonality: each library part can interact with all other parts of different libraries or self- developed code [64]

• Performance: a maximum of performance with zero abstraction penalty should be achieved [63, 70]

5.3.1 Multi-Paradigm Approach

The non-trivial and highly complex scenario of scientific computing fits the combination of different programming paradigms exceptionally well. The generic programming paradigm establishes homogeneous interfaces between algorithms and data structures without sub-typing polymorphism by the iterator/cursor mechanism. Functional programming (FP [68]) eases the specification of equations and offers extendable expressions while retaining the functional dependence of formulae by higher order functions. The features of meta-programming offer the embedding of domain-specific terms and mechanisms directly into the language, as well as compile-time algorithms to obtain optimal run-time.

5.3.2 Analysis of Programming Paradigms

To analyze the shift of paradigms and the advantages of using different paradigms, we review the development of applications at the institute. For further information see Section 9. The programming paradigms used are stated as well, to highlight the development and achievements enabled by the different paradigms:

Name        Year  Paradigm        Information
MINIMOS     1980  imperative      [78]
S*AP        1989  imperative      [79]
MINIMOS-NT  1996  imperative, OO  [15]
ELSA        1999  OO              [13]
WSS         2000  OO              [12]
VGML        2005  OO, GP          [80]
GSSE        2006  OO, GP, FP, MP  [74]

The shift from imperative programming to a large variety of different programming paradigms, such as generic programming, functional programming, and meta-programming, results in an enormous simplification of code. This development can be estimated by the total lines of code required to write applications [63]. The features of meta-programming offer the embedding of domain-specific terms and mechanisms directly into the language, as well as compile-time algorithms to obtain optimal run-time.

1980  100,000 lines of code  imperative      Fortran
1990  300,000 lines of code  imperative      C, Fortran
2000  600,000 lines of code  imperative, OO  C, C++
2006   20,000 lines of code  OO, GP, FP, MP  C++

By using generic components, the application design is reduced to specifying the important parts only. All other source code is hidden in the components. The drastic reduction of source lines arises from the fact that all complex data-structure-related code is now concentrated in one library, a generic topology library (GTL, [81]). The mathematical formalism is covered by a domain-specific embedded language. Library-centric software design in the field of scientific computing has only recently become feasible, due to the rise of new programming paradigms which offer generality and specialization together.

Chapter 6

Programming Concepts

This section introduces a generic data structure, which is further developed and reviewed against already existing concepts. The fiber bundle approach is compared with the cursor/property map concept. Additionally, several issues of the currently used iterator categorization are briefly investigated. Concepts for a domain specific embedded language are given to model the various parts of scientific computing. The corresponding multi-paradigm approach is introduced to accomplish this language. The following principles for writing successful code can be summarized:

- the code should be partitioned into functions
- every function should be at most 20 lines of code
- functions should not depend on the global state but only on their arguments
- every function is either general or application specific, where a general function is useful to other applications
- every function that could be made general should be made general
- the interface to every function should be documented
- the global state should be documented by describing both the semantics of individual variables and the global invariants

Based on these principles it can be observed that more than 90 percent of the GSSE code consists of general functions. Decomposing an application into a collection of general purpose algorithms and data structures makes it more robust. The extension capability of the GSSE goes far beyond the currently implemented parts. Due to the parametrization of all modules and even all small parts of the GSSE, it can be used in the future with data structures and numerical data types which have not been developed yet. For example, the solver interface and the quantity structure of the GSSE are parametrized by their numerical data type, e.g., double. If double-double arithmetic or even a new standard for numerical data types comes into use, the implementation of the GSSE does not change at all.
Not only is the GSSE parametrized by the numeric data type, but also by data structures, topological structures, solver interfaces, file interfaces, and so on.

6.1 Generic Data Structures

This section can be seen as a part of the transformation of the theoretical concepts introduced in Sections 2.6 and 2.4 into applicable concepts, in this case a specification of generic data structures. The specification is based on topological properties only and therewith separates the actual data storage structure from the stored data. In terms of the introduced fiber bundle theory, generic data structures can be seen as modeling the base space. From another point of view, these concepts specify the properties of chains and can therewith be seen as the translation of the concepts introduced in Section 2.8. Section 6.2 uses this separation and extends it with the concept of the index space. The first step towards a generic data structure is the integration of the various already existing data structures. Therefore, Figure 6.1 presents the structure of a single linked list. In terms of topological concepts, only locally neighboring cells can be traversed by this data structure, which can be seen from the representation of the complex topology by the poset on the right side. The number of elements in the meta-cell subsets is bounded by a constant number, unique for each type of data structure in the 0-cell complex regime. The term meta-cell is used to describe various subsets with a common name. The single linked list uses two elements in the meta-cell, later stated as local(2) and called the forward concept in the STL.

Figure 6.1: Complex topology of a single linked list.

The same mechanisms as for the single linked list are used to describe the double linked list. Compared to the single linked list, three elements in the meta-cell are used, local(3) as can be seen in Figure 6.2. The concept of bidirectional traversal is used in the STL for this type of data structure. A binary tree data structure can also be modeled by this type of complex topology.

Figure 6.2: Complex topology of a double linked list.

Next, the complex topology of a tree structure is depicted in Figure 6.3, where the term local(4) is used for the characterization of the meta-cell. As can be seen, the single linked list, the double linked list, and the tree share a similar structure and differ only in the number of elements in their respective meta-cells. The last data structure related to commonly used structures within the 0-cell complex is based on the concept of an array, whose complex topology is shown in Figure 6.4. In contrast to the already reviewed data structures, the poset is built upon only two layers: the 0-cell sets and the complex set. This type of data structure therefore uses a so-called global property. All cells are directly accessible from the complex. In terms of the STL, this is called the random access property.

Figure 6.3: Complex topology of a tree.

Figure 6.4: Complex topology of an array.

The next example uses a cell dimension of one and therewith a cell complex with dimension one. Figure 6.5 introduces the complex topology of this type of complex, commonly known as graph. Vertices incident to an edge as well as adjacent edges can be traversed.

Figure 6.5: Complex topology of a graph.

These two types of complexes, the zero- and one-dimensional complexes, can now be abstracted into higher dimensional complexes with the two different types of complex topologies: the local (Figure 6.6) and the global cell complex (Figure 6.7). Based on the cell complex types the following cell types are available:

- 0-cell: vertex
- 1-cell: edge
- 2-cell: triangles, quadrilaterals
- 3-cell: tetrahedra, cubes
- 4-cell: hyper-tetrahedra, tesseracts

The most important property of a CW-complex can be explained by the usage of different n-cells and the consistent way of attaching sub-dimensional cells to the n-cells. This fact is covered by the mapping function. From now on, an abbreviation specifying the CW-complex by its dimensionality is used; e.g., a 1-cell complex describes a one-dimensional CW-complex. An illustration of this type of cell complex is given in Figure 6.8.

Figure 6.6: Local cell complex (left) and a cell representation (right). Vertices are marked with black circles.

Figure 6.7: Global cell complex (left) and a cell representation (right).

Figure 6.8: Representation of a 1-cell complex with cells (edges, C) and vertices (V).

So far all mechanisms to handle the underlying topological space of data structures were introduced. Each subspace can be uniquely characterized by its dimension. All these subspaces are connected appropriately by the concept of a finite CW-cell complex. The first part of the data structure characterization is listed in Table 6.1.

data structure                          cell dimension
array                                   0
single linked list (SLL) / stream       0
double linked list (DLL) / binary tree  0
arbitrary tree                          0
graph                                   1
grid, mesh                              2,3,4,..

Table 6.1: Classification scheme based on the dimension of cells.

A common interface for characterizing data structures by their dimension is now possible. A drawback of this characterization can easily be seen with the 0-cell complexes. The topological space and subspaces only guarantee the correct connection of their subsets, but one cannot derive a traversal mechanism based on these concepts. Therefore, we use concepts of order theory to formalize the internal structure of our cells and the structure of our cell complex. Incidence and adjacency traversal can be directly derived from this concept. The characterization given in Table 6.1 is extended with the cell topology in Table 6.2. As can be seen, the zero-dimensional cell complex types do not differ at all in this part of the classification. For dimensions greater than one, a distinction can be achieved by specifying the cell topology.

data structure   cell dimension  cell topology
array            0               simplex
SLL/stream       0               simplex
DLL/binary tree  0               simplex
arbitrary tree   0               simplex
graph            1               simplex
grid             2,3,4,..        cuboid
mesh             2,3,4,..        simplex

Table 6.2: Classification scheme based on the dimension of cells and the cell topology.

Based on this formal introduction, a concise definition is derived in Table 6.3. The complex topology uses the number of elements of the corresponding meta-cells.

data structure   cell dimension  cell topology  complex topology
array/vector     0               simplex        global
SLL/stream       0               simplex        local(2)
DLL/binary tree  0               simplex        local(3)
arbitrary tree   0               simplex        local(4)
graph            1               simplex        local
grid             2,3,4,..        cuboid         global
mesh             2,3,4,..        simplex        local

Table 6.3: Classification scheme based on the dimension of cells, the cell topology, and the complex topology.

6.2 Separation of Base and Fiber Space

As introduced in the section on programming paradigms, generic programming together with active library programming is one of the most recently developed paradigms. This section focuses on recent developments within the C++ programming language and the corresponding generic programming paradigm, which is not limited to C++. With the advent of the C++ STL, the separation of access to data structures and algorithms by means of iterators has become one of the key elements of the generic programming paradigm. Thereby the implementation complexity of algorithms and data structures is significantly reduced. Unfortunately, it combines traversal and data access. An alternative approach was suggested [76], separating traversal and access into a cursor and a property map concept. To give an example of this mechanism the following source snippet is presented:

Cursor and Property Map Concept

val = pmap(*cursor);     // read data at the cursor position
pmap(*cursor, val);      // write data at the cursor position
++cursor;                // advance traversal

But up to now this concept is not widely used. An example of the problems with the old iterator concept is the well-known case of a std::vector which can actually be modeled by a random access container [76] due to the container data structure; because of the used classification of iterators, in this particular case the associated constant return value property, this container cannot be used with respect to this classification type. The proposed new iterator concept [82] and another version [83], with a different focus on these problems, were accepted into the TR1 of C++. Based on the formal classification introduced in Section 6.1, the combination of traversal and data access is analyzed in more detail. The following list gives an overview of the basic iterator traits [27]:

iterator category  specification  property
input/output       -              data property
forward            local(2)       topological property
bidirectional      local(3)       topological property
random access      global         topological property

As we have seen, a unique and distinguishable definition is possible for all of these data structural properties. On the one hand, the backward and forward compatibility of the new iterator categories is a major problem [84], and they are not included in the new C++0x standard [85]. Problems are encountered if the iterator categories [27, 85] are integrated into the topological specification. The transition from the old iterator categories to the new categories is also a point of investigation [84].

iterator category  specification  property
incrementable      local(2)       topological property
single pass        -              data property
forward            local(2)       data and topological property

The combinatorial property of the underlying space of these three categories is the same: a 0-cell complex with a local topological structure such as:

data structure               dimension  cell type  complex topology
single linked list / stream  0          simplex    local(2)

Only the incrementable property can be described by a topological property, whereas the other two categories are data dependent. The former iterator properties [27] use only two different categories which specify the behavior of data, namely the input and the output property. Based on the generic data structure specification and the introduced fiber bundle concepts, the following overall classification can be obtained:

total space  data structure
base space   index space [24], cell complex [1]
fiber space  attached data

The decomposition of the base space is modeled by the formal introduction to data structures. As an example Figure 6.9 depicts an array data structure based on the concept of a fiber bundle. Attached to each cell, marked with a dot in the figure, a simple fiber space is given, which conserves the neighborhood of our base space and carries the data of our array.

Figure 6.9: A fiber bundle over a 0-cell complex.

Organizing data which represents a total space by the concept of a fiber bundle means separating the properties of the base space (which is usually given by a manifold, but more general topological spaces are possible as well) from the properties of the fiber space (e.g. the tangential space containing vector fields). Both properties can be implemented independently: for instance, the same mechanisms can be used to access a vector field given on a tetrahedral mesh, a structured grid, or a particle set, because the fiber space is identical. On the other hand, the base space is identical for scalar, vector, tensor, and other data given on the same tetrahedral grid. Another example is given in Figure 6.10, where the base space is modeled by a two-dimensional triangular cell complex. By the concept of fiber bundles the base space is separated from the fiber space. For the base space of lower dimensional data structures, such as an array or single linked list, the only relevant information is the number of elements determined by the index space. Therefore most of the data structures do not separate these two spaces. For backward compatibility with common data structures the concept of an index space depth is introduced. As can be seen in Figure 6.11, common data structures such as arrays and lists, where the content is directly stored at the given location, e.g. an address, can be modeled with an index space depth of zero. An index space depth of zero means that the index (later the value from the dereferenced traversal object) can be used directly as the corresponding value. An index space depth of one or larger means that the returned index value has to be used in another container, e.g. a fiber. In other words, the stored content is only an indirection, e.g. an address of a memory location, which has to be dereferenced to obtain the content. The cursor and property map concept always models an index space depth of one.

Figure 6.10: A fiber bundle over a 2-cell complex.

Figure 6.11: Illustration of an index space depth of zero (left) and one (right). The base space is modeled by a 0-cell complex.

The advantages of this approach are similar to the cursor and property map concept [76] but differ in several details. The similarity is that both properties can be implemented independently. But the fiber bundle approach equips the fiber space with more structure, e.g., storing more than one value corresponding to the traversal position, as well as preservation of neighborhoods. This feature is especially useful in the area of scientific computing, where different data sets have to be managed, e.g., multiple scalar or vector values on vertices, faces, or cells. Based on the structure of the fiber space, vertical and horizontal subsets can be extracted. The horizontal parts are called slices. Another important property of the fiber bundle approach is that an equal (isomorphic) base space can be exchanged with another cell complex of the same dimension. The concept of an index space depth allows backward compatibility of all data structures in the same context as higher dimensional data structures, such as grids and meshes. The following table summarizes the common features and the differences.

                         cursor and property map  fiber bundles
isomorphic base space    no                       yes
traversal possibilities  STL iteration            cell complex traversal
traversal base space     yes                      yes
traversal fiber space    no                       yes
data access              single data              topological space / fiber space
slices                   no                       yes
backward compatibility   no                       yes

6.3 Standard Algorithms for Scientific Computing

The natural unit for software reuse in the field of scientific computing is a given algorithm or a small set of coupled algorithms in combination with additional data structure requirements, e.g. performance constraints. Different applications then combine these small sets of algorithms in various ways to achieve their specific results. During the last decades several software projects were developed, at the institute and by other research groups. As the experience gained thereby has shown, reuse of single algorithms is normally quite hard. Even complete algorithms, e.g. the finite volume discretization, could only be shared by different applications with minor adaptations. There are a number of software packages for different types of PDE solutions, but conventional packages are not organized in a way that would allow algorithmic re-usability immediately. Different works have contributed small parts to the field of scientific computing, but up to now no small set of algorithms suitable for scientific computing has been developed. Some works have developed parts, e.g. generic grid components [18], but the other parts were always left out due to the various requirements in scientific computing. Normally, implementations of algorithms make certain assumptions about the way the data they require is represented. But the spirit of generic programming and its transformation into the STL has shown that highly reusable data structures and algorithms [26, 27] can be developed and implemented, even with an overall high performance comparable to handwritten and hand-optimized implementations. But up to now, this transformation and development has stopped due to the complexity of the field of scientific computing. Normally, the more complex a data structure is, the less stable are the assumptions and the lower the probability that they hold across the boundary of the current context.
Therefore different types of problems occur whenever algorithms are to be reused outside the currently used context or if the context changes, e.g. to a higher dimensional simulation. The GRAL [18] was a major contribution with an in-depth analysis of this problem related to algorithms operating on computational grids. The work analyzed the relationship between data and algorithms based on topological and combinatorial concepts. A small set of kernel components was developed which greatly eases the specification of algorithms for grid applications. A contribution of this work is the presentation of several small algorithms suitable for various tasks within scientific computing. Special attention was paid to specifying and developing concepts and modules which can be used in an orthogonal way. Section 5.2 introduced means for library-centric application design. The derived concept which has a major impact on module development is a mathematical embedded specification language, a DSEL. The following section presents a major contribution of this work related to these outstanding issues with a small collection of algorithms. The reviewed discretization schemes, see Section 3, require different traversal schemes as well as different locations for their quantity storage. As introduced in Section 5.1, a generic data model supported by the concept of fiber bundles is used to separate the implementation into two distinct domains: the base space with the corresponding cell complex and traversal concepts, and the fiber space with the space structure and quantity storage mechanisms. These concepts were used to derive a complete set of standard algorithms. An example is the development of applications which implement the given discretization schemes. Other areas, such as interpolation and dimensionally and topologically neutral access to data structures and quantity storages, are also covered.

6.3.1 Generic Traversal

One of the most fundamental standard algorithms is related to traversal. The pioneering work on generic programming and the corresponding implementation of the C++ STL [26, 27] introduced the important concept of the separation of data structures and algorithms, similar to the separation of base space and fiber space within the fiber bundle concept, see Section 2.4. Generic programming uses a hierarchy of iterators, specified by a mix of traversal and data access [76]. Topological functionality is mostly needed to fulfill the requirements of discretization schemes. All these schemes need a set of neighboring elements based on the topological property of incidence. Section 2.8 introduced concepts for topological data structures. The corresponding connection matrix, e.g. for a 3-simplex cell complex, is depicted in Figure 6.12.

[Figure: sparse connection matrix relating the n-cells C0 ... C5 (rows) to the 0-cells P0 ... P7 (columns), with oriented entries of +1 and -1.]

Figure 6.12: Connection matrix C^i_j for a 3-simplex cell complex.

Here it can be seen that the connection matrix uses the orientation of the n-cells and is sparse. Based on this connection matrix, the following traversal concepts are available:
- Unique hull property, lattice property (required)
- Homological boundary operator
- Incidence relation (from left to bottom)
- Adjacency relation (from bottom to left)
Additionally, the generic traversal concept uses the separation property of the fiber bundle. Based on this separation, a generic traversal mechanism, which uses the concepts of the cell and complex topology and the specified connection matrix, is given by the following concept:

traverse traversal_object

The developed GTL, see Section 8.1, implements this generic traversal mechanism, which can use virtually any derived internal structure of a cell or of the whole cell complex to build the requested traversal. The following source snippet uses a given cell type and derives the boundary of this cell for traversal: gsse::traverse

An example of this cell type is a 3-cuboid cell, which generates the (3-1)-cell traversal operation. In other words, this traversal operation generates a facet-on-cell iterator. Simple cell traversal can be specified by: gsse::traverse or, shorter, by using the partial specialization mechanism of C++: gsse::traverse

6.3.2 Quantity Accessors

For the fiber space, various quantity accessors are required which comply with the physical dimension and orientation.

access / pcochain

In the following code snippet a simple example of the generic use of a data accessor, similar to the property map concept, is given, where a scalar value is assigned to each vertex. The data accessor is extracted from the container to enable a functional access mechanism with a key value, which can be modeled by an integer or string.

data_accessor_type da(container, key_d);
for_each(container.vertex_begin(), container.vertex_end(), da = 1.0);

6.3.3 Generic Calculation

The introduced discretization schemes, as well as various other mechanisms, require standard arithmetic operations. Therefore a generic calculation algorithm is necessary which can be specified with the corresponding arithmetic operation. It is necessary that this generic calculation automatically deduces the neutral element of the given operator for initialization.

calculate

The GFL, given in Section 8.2, implements a comprehensive functor mechanism. Using these functors as a foundation, the following algebraic generic components were developed, which are sufficient for all discrete representations of the operations of integration and differentiation: gsse::calculate[ ] gsse::calculate[ ]

Shortcuts are available for these operations: gsse::sum<traversal>[ ] gsse::diff<traversal>[ ]

It is important that all functors can be used completely orthogonally with other functors.

6.3.4 Interpolation

Interpolation is frequently used in various areas of scientific computing. An important part is the interpolation of higher dimensional cell values onto 0-cells, e.g. for visualization, since very few visualization packages offer the possibility of visualizing higher dimensional cells.

interpolate<, linear>

A selected example of the introduced discretization schemes is the FDTD discretization of the Maxwell equations. To finally visualize the resulting fields, the higher dimensional cell values, such as the E or H fields, are interpolated onto vertices. In the following source snippet, the interpolation of the Ex field vector is given:

iterate [ Ex(_1) = sum(0.0)[E(_1)] / sum(0.0)[1.0] ](itup);

6.3.5 Point Location

Point location is a necessary requirement for efficient algorithms, such as interpolation, intersection tests, or boundary operators. Different types of point location mechanisms can be summarized with the concept of orthogonal range queries. The following source snippet presents the generic version of this algorithm:

orthogonal_range_query

A recent generic orthogonal range query package was released [86] which supports the given generic software component requirements. The implemented version, compared to the generic algorithm, reads:

point_container = domain.query_pointcontainer(point_t(0,0,0), point_t(0.2, 0.5, 0.5));

6.3.6 Assembly Algorithms

Finite volume as well as finite difference schemes can already be completely described by the given standard algorithms. The assembly procedure of finite elements requires an additional step for an efficient implementation, the assembly of local submatrices, as given in Section 3.3. The finite element method additionally requires pre-calculated stencil matrices, for which the following algorithm was developed:

gsse::stiffness( matrix );

which uses precalculated element matrices for different types of cells. Finally, to enable a different strategy of matrix assembly for the finite element method, the following algorithm, gsse::fem_neighborhood( ), was developed:

void fem_neighborhood( .. )
{
   iter.reset();
   for (int l = start; iter.valid(); ++iter, ++l)
   {
      entry_list(0, l) = vec[(*iter).handle()];
   }
}

Based on these generic components, all three different types of discretization schemes can be used easily as building blocks for generic application design.

void assemble_stencil( .. )
{
   for (int i = 0; i < entry_list.size(); ++i)
   {
      long line = entry_list(0, i);
      for (int j = 0; j < entry_list.size(); ++j)
      {
         long column = entry_list(0, j);
         mat_add(line, column, stencil(i, j));
      }
      int rhs_pos = entry_list.size();
      rhs_add(line, stencil(i, rhs_pos));
   }
}

6.4 Domain Specific Embedded Language

A domain specific language can be introduced with the basic requirement of being a language with a set of symbols, a well-defined set of rules specifying how the symbols are used to build well-formed compositions, and finally a well-defined subset of all well-formed compositions that are assigned specific meanings [60]. The domain-specificity of a DSL can be attributed to an increased level of abstraction: not only is the grammar reduced, but the expressiveness is also increased. The possibility of writing code in terms close to the abstraction of the given problem domain is thereby the characteristic property and, more importantly, the driving force behind all DSLs. Programmers deal with such microscopic languages/DSLs every day. Finally, a successful DSL should be a declarative language, providing the user with notations to describe what is to be done rather than how. Therefore most DSLs are given by a set of rules rather than by imperative statements of what to do. A purely declarative process describes only entities (symbols) and their relationships, e.g. the Make tool chain. A mathematical domain specific language without operators would deliver the following:

equ = sum(mul(diff(a,1.2), add(b,3.3)),diff(c,1e3))

By implicitly using a mathematical notation one obtains:

equ= (a-1.2)*(b+3.3) + (c-1e3)

This DSL is included in most programming languages as a built-in feature. Some languages also support the specification and overloading of almost all operators, and therewith offer the possibility of coming very close to the normal mathematical notation of different areas, such as tensors or vector calculus. The advent of multiple paradigms available in one programming language opens new possibilities for domain specific languages. The presented advantages and disadvantages of several programming paradigms and the reduction of specification and implementation effort clearly suggest the usage of a domain specific embedded language. In contrast to the object-oriented programming paradigm, the generic programming paradigm supports an orthogonal means for application design with the separation of algorithms and data structures, connected by the concept of iterators, or in a more general way by traversal and abstract data accessors. By the concept of a DSEL the implementation effort can be reduced even further, to an implementation effort of O(1 + 1), where only the corresponding mechanisms have to be specified. But the implementation of a DSEL for non-trivial areas is far from trivial. Domain specific languages are very common in different types of computer systems, e.g. regular expressions, markup and grammar tools like HTML and YACC (cite), UNIX shell scripts, SQL, VHDL, BNF, or the emerging Boost libraries (Phoenix, Proto, ..). The already given examples of DSLs always use a separate entity to obtain the final running code, such as an extra compiler for YACC or the make executable for the Make tools. Therefore these tools have to be maintained, and the user has to learn their tweaks and flaws. And finally, but most importantly, there is the issue of interoperability: most of the external tools cannot have a close interaction with the program. Even worse, a DSL normally does not offer complex mechanisms for creating interoperable protocols to interface with the program.
But that was the intent of creating a DSL: to get closer to one domain, and thereby reducing the coverage of the others. The other way of dealing with interoperability is to integrate the DSL into a common host language.

Figure 6.13: Application design based on the object-oriented programming paradigm. Each of the blue blocks (discretization schemes such as finite volume and finite element, combined with structured and unstructured data structures in 1D, 2D, 3D, and nD) has to be implemented, resulting in an implementation effort of O(n · m), where n stands for the number of algorithms and m for the number of data structures.

Figure 6.14: Application design based on the generic programming paradigm. Only the blue parts have to be implemented, which reduces the implementation effort to O(n + m).

In combination with an application library, any general purpose programming language can act as a DSL, so why were DSLs developed in the first place? Simply because they can offer domain-specificity in better ways:
- Appropriate or established domain-specific notations are usually beyond the limited user-definable operator notation offered by general purpose languages. A DSL offers domain-specific notations from the start. Their importance cannot be overestimated, as they are directly related to the suitability for end user programming and, more generally, to the programmer productivity improvement associated with the use of DSLs.
- Appropriate domain-specific constructs and abstractions cannot always be mapped in a straightforward way onto functions or objects that can be put in a library. This means a general purpose language using an application library can only express these constructs indirectly. Again, a DSL would incorporate domain-specific constructs from the start.
A formal definition of a DSL can now be given:

Definition 69 (DSL) A domain specific language incorporates domain-specific notation, constructs, and abstractions as fundamental design considerations.

A domain specific embedded language (DSEL) is simply a library that meets the same criteria. The concept of a DSEL addresses many of the problems of translation units such as YACC and of interpreters such as Make. All software engineering tasks, such as designing, implementing, and maintaining the DSL, are reduced to implementing and maintaining a library. More importantly, the interoperability can be enhanced to the highest level, due to the fact that both concepts are now available in one host language; all already developed libraries and functionality are available together. The advantages of domain specific embedded languages can be briefly summarized:
- Abstracting the underlying language/system/compiler towards the user/domain expert
- The overhead of learning or adopting a new language is greatly reduced
- Reduction of documentation due to expressive names and self-documentation
- Semantic validation, e.g. by a compiler

6.4.1 Requirements for Programming Languages

The introduced concepts for domain specific languages require mechanisms from the host/parent language. This section deals with the analysis of mechanisms and concepts which have to be offered by the host language to successfully build such a domain specific embedded language. Operator overloading means that the different operators, such as + and -, can be overloaded. Parametric polymorphism means that a kind of template system is available. Functional mechanisms means that higher-order functions and function objects are available. Time of evaluation means that program code can be specified for compile time as well as for run time.

Language   | operator overloading | parametric polymorphism | functional mechanisms | time of evaluation
C          | no                   | partial (macros)        | no                    | compile/run time
D [6]      | partial              | yes                     | no                    | compile/run time
Java       | no                   | partial (version)       | no                    | run time
C# [5]     | no                   | partial (version)       | no                    | run time
Haskell    | yes                  | partial (version)       | yes                   | run time
C++        | yes                  | yes                     | yes                   | compile/run time

Table 6.4: Language comparison.

As can be seen from this brief language comparison, not many languages are available which allow the effective modeling of the operators. Languages like Haskell were not developed with an overall high performance approach, but do offer the definition of new operators which extend the built-in language syntax. Based on this overview, the one remaining language with the broadest approach for DSELs is C++, with syntactic expressiveness and runtime efficiency:
- A static type system

- The ability to achieve near-zero abstraction penalty [87, 88]
- Powerful optimizers
- A template system that can be used to:
  - generate new types and functions
  - perform arbitrary computations at compile-time
  - dissect existing program components
- A rich set of built-in symbolic operators (48) that can be overloaded.
This unique combination of features makes C++ the single viable language for the concepts given here. Next to the great possibilities of using C++ as a host language for a DSEL, the multi-paradigm approach offers a great degree of freedom for implementing high performance scientific code and even complete applications. One of the most important aspects of the here introduced GSSE is the fact that, with the presented concepts, C++ can optimize a whole application and therefore generates an unmatched run-time performance for various tasks. A DSEL can function as an extension to the general-purpose host language.

6.4.2 Building Elements for the Embedded Language

When building an arbitrary DSEL, the following goals should be addressed:
- Interoperability: one of the most important parts in a heterogeneous environment such as scientific computing
- Declarativeness: scientists are used to a mathematical notation and not to the program constructs of a programming language; the distance from the actual mathematical notation should be reduced as far as possible
- Expressiveness: the final code should look like the normal notation
- Efficiency: a simple example within the DSEL should compile to final code without any overhead
- Static type safety: the host compiler should catch as many problems as possible
- Maintainability: simple changes to the problem should only result in simple changes to the model within the host language
- Scalability: if the environment does not support a needed feature, it should be easily extensible

6.4.3 Application

The following example uses finite volume schemes in order to assemble a simple Laplace equation. We use the linear equation data type instead of the standard numerical types (e.g. double) and replace the function values by linear equations. The constructor of the linear equation contains the value ψi as well as the index i and returns an equation object with the meaning ψi + ∆xi. The following code snippet presents an application of the introduced linear function concept to obtain the formulation of the Laplace equation:

linear_equation laplace_equ;
vertex_edge eov_it(vertex);
for (; eov_it.valid(); ++eov_it)
{
   linear_equation equation;
   edge_vertex voe_it(*eov_it);
   for (; voe_it.valid(); ++voe_it)
   {
      equation += linear_equation(f(*voe_it), i(*voe_it)) * orient(*voe_it, *eov_it);
   }
   equation *= A(*eov_it) / d(*eov_it);
   laplace_equ += equation;
}

The linear equation can also be specified using functional programming:

laplace_equ = sum
[
   sum(_e) [ lineqn(psi(_1), i(_1)) * orient(_1, _e) ] * A(_1) / d(_1)
](vertex);

It has to be highlighted that most of the implemented parts are targeted at simple modules such as traverse, sum, and diff. The reason is that if an environment cannot deal with simple things like these, how should it deal with complex problems such as a nonlinear drift-diffusion application?

6.5 Visual Programming

Based on the concepts of topological manifolds, the separation facilities of the fiber bundle approach, and the correlated programming concepts introduced in Section 5.2, new possibilities for programming concepts can be derived. One of these is the concept of visual programming. Visual programming is not only the illustration of the final structure but can also be used in different abstract ways, such as the convergence analysis of non-linear algebraic solver steps as well as the definition of model boundaries by identifying different types of boundary conditions. Next to a real-time visualization interface to update the visualization engine at each new time step, the following abstract quantities are necessary:
- Equation type: EType
- Coupling coefficients
- Domain identifier
- Boundary conditions
Based on these abstract quantities, not only can the application design effort be greatly reduced, but the search for errors can also be supported by this type of programming. The following sections introduce two different types of visually enhanced application design.

6.5.1 Visually Aided Specification Process

The specification of boundary conditions as well as initial conditions in higher dimensions requires an abstract quantity type to attribute the given region of space with the appropriate type, e.g. Dirichlet or Neumann boundary conditions. In the following, a simple set of scalar Dirichlet boundary conditions for a quasi electrostatic simulation is depicted, where the inner domain is separated by the boundary area. Compared to different currently used applications, the specification of this type of abstract quantity can be used with arbitrary input, in contrast to the special input requirements of different ways of modeling the contacts as volume or area parts.

Figure 6.15: Different equation types within a domain: boundary area (red), inner domain (blue).

Based on different types of partial differential equations and the corresponding discretization schemes, different types of boundary or initial conditions have to be specified. To guarantee the greatest extensibility for specifying these abstract quantities, all different types of boundary concepts have to be supported. The following figure depicts three different types of boundary conditions for a finite difference time domain simulation. Another possibility is to specify non-trivial boundary conditions for structures which are given analytically or with a high symmetry.

Figure 6.16: Different equation types within a domain: Dirichlet (red) and Neumann (blue) boundary conditions as well as initial conditions (green).

Figure 6.17: Visualization of different equation areas: domain inside (blue), boundary parts (red).

Corresponding script:

selectboundary segment0
assign border 1.0
select x > -10
assign rho 1.0
assign EType 1.0
select y > 0.3 & x < 0.23 & x > -0.23 & border == 1
assign EType 2.0
assign PotBnd 1.0
rotate -36 coordinate
select y < -0.30 & border == 1 & x < 0.23
assign Type 2.0
assign PotBnd 0.0
rotate 36 coordinate
select x > -10 on edges
seledges full on edges
init eps
assign eps 1.0

6.5.2 Stable Program Development

A complete example of an abstract process which can be greatly supported by visual programming is given next. A nonlinear solving procedure is used, and each linear solution step is presented together with the corresponding error in the implementation. The visualization of the calculation is available in real time, making it possible to observe the evolution even of a non-linear solution process. GSSE interfaces for real-time analysis are available, e.g. by an extension to OpenDX [9], which proves to be invaluable for the adjustment of simulation parameters. The leftmost figure shows the initial potential, while the rightmost depicts the final solution. The center image shows an intermediate result that has not yet fully converged.

Figure 6.18: Potential of a pn diode during different stages of the Newton iteration, from the initial state (left) to the final result (right).

As can be seen, the overall process of visual programming greatly supports the development of non-trivial applications.

6.6 Solution Techniques

Differential equations are discretized on elements of the topological structure and entered into a matrix by assembly methods. By traversing the domain and accessing the algebraic properties of the stored quantities, the result is obtained by means of algebraic matrix solver systems [16, 17, 89]. As introduced in Section 3, different discretization schemes require different types of traversal.

6.6.1 Requirements from Algebraic Solving Systems

In general there exist two main methods of equation assembly which differ in the type of sub-matrices assembled in one step. Element-wise assembly (Figure 6.19) is typically used in the finite element method. All cells (finite elements) are traversed and for each of the cells a local matrix is calculated. This matrix introduces coupling factors between the various shape functions. As each shape function is mapped to an element of the underlying cell complex, couplings between functions can be seen as couplings between values on elements. The local matrix entries are written into the global matrix according to a global vertex/cell numbering scheme.

Figure 6.19: Element-wise assembly. All cells are traversed and sub-matrices are calculated. The sub-matrices are inserted into the system matrix.

Figure 6.20: Line-wise assembly. All vertices are traversed and linear equations are assembled. Linear equations are inserted into the system matrix.

The finite volume scheme uses a different technique of matrix assembly (Figure 6.20). The differential equation is discretized on a vertex of the simulation domain. Couplings to other values are described by sums over topological elements. One of the major advantages of this method is that each degree of freedom causing a matrix entry has its own governing equation. This also implies that the matrix regions of assembly are disjoint, which allows a larger degree of parallelization, because each assembling element has exclusive access to matrix lines. By introducing the requirement that all functional objects have to provide their derivative, it is possible to obtain an additional benefit from this implementation, because the tedious and error prone calculation of the Jacobian, outlined in Equation 6.1, as needed for the solution of the nonlinear equation system, can be automated.

J = ∂f/∂xi   (6.1)

6.6.2 Linearized Functions

In order to simplify the line-wise assembly method, e.g. for finite volumes, the notion of a linear function data type is introduced. Consider some differential operator L(ψ) and the differential equation L(ψ) = 0. Figure 6.21 shows a one-dimensional simulation domain with a quantity distribution.

Figure 6.21: Linear equation. The linear equation data structure stores the value as well as the dependence on the unknown variables ∆xi.

While the depicted values are not the solution of the considered equation, the residual can be determined by using the finite volume formulation. Not only the residual value is considered, but also the effects of linear changes of single values on the residuum. In order to provide an environment which is able to handle small variations, an index i is assigned to all variables. The small variations are called ∆xi and the function value is denoted by ψi. A function value including dependencies thus yields ψi + ∆xi. Based on these considerations, addition is introduced:

(ψi +∆xi) + (ψj +∆xj) = (ψi + ψj)+(∆xi +∆xj) (6.2)

In order to define a closed algebraic structure, the concept of a linear function is used. This function can be written in the following manner

Σi ai · ∆xi + c   (6.3)

The relations described in Equation 6.2 can be generalized to algebraic expressions. A data structure which implements these operations consequently provides the following property: specifying the residuum and replacing the function value ψi by ψi + ∆xi results in a linear equation for the values of ∆xi. The enormous variety of physical models in device as well as process simulation requires fast and flexible implementation methods for many different discretization schemes. In order to have such an environment which can handle even complicated models, our approach is to ease the specification, while providing the functional dependence of the formulae. Using the widely known drift-diffusion model as an example, it can be shown that the formal specification overhead can be reduced to a minimum. During the implementation of any physical model, the most error prone as well as time consuming part is the linearization required to obtain the elements of the Jacobian matrix. Four methods are commonly employed in order to obtain all partial derivatives. Our new approach overcomes these problems by implementing only the linear (or higher order polynomial) functional dependence of equations on variables around x. A calculation framework is introduced where derivatives are implicitly available and do not have to be specified explicitly. This is one of the major advantages, especially if high flexibility is expected for different models. The amount of manual work is reduced to the specification of the equations, and amounts to about O(n).

The elements of the framework are truncated Taylor series of the following form: f0 + Σi ci · ∆xi. Figure 1 shows the internal structure of the Taylor expansion. To use a quantity x within a formula, its value f0 has to be specified, together with the linear dependence ci = 1 on the vector x of quantities. This step, however, can be performed implicitly by the computer. A framework of basic operations on Taylor series is introduced which can handle truncated polynomial expansions. In the following we will only specify our nonlinear functionals using linearized functions in upper case letters. All necessary numerical operations on these data structures can be performed in a straightforward manner.

F = f0 + Σi ci · ∆xi ,   G = g0 + Σi di · ∆xi   (6.4)

F ⊕ G = (f0 + g0) + Σi (ci + di) · ∆xi ,   F ⊗ G = (f0 · g0) + Σi (g0 · ci + f0 · di) · ∆xi   (6.5)

F ÷ G = (f0/g0) + Σi ((g0 · ci − f0 · di)/g0²) · ∆xi ,   G(F) = g(f0) + Σi g′(f0) · ci · ∆xi   (6.6)

Having implemented these schemes, we are able to derive all required functions on these mathematical structures. This means that we have a consistent framework for formulae in the following sense: if Ā is the linearized version of a function A at x0, we obtain ∂Ā/∂xi = ∂A/∂xi |x0 around the point of linearization.

We now apply the given mathematical framework to the numerical integration of the non-isothermal drift-diffusion problem. We have the following governing equations for electrons when discretizing using the optimum artificial diffusivity method [90]. Figure 1 shows the multiplication of two truncated Taylor series, F = 3 + ∆x1 and G = 4 + ∆x3. As a result we obtain F ⊗ G = 12 + 4∆x1 + 3∆x3.

Figure 1: The multiplication of two Taylor series.

J = (µn,i,j · k / d) · [ nj · B(αi,j) − ni · B(−αi,j) ] ,   αi,j = α(∆ni,j, ∆Ti,j, ∆Ψi,j) ,   B(x) = x / (e^x − 1)   (6.7)

In order to obtain all partial derivatives ∂xi J, where x is the vector of unknowns (n, p, Ψ, T), we introduce the base quantities for potential, carrier concentration, and temperature, xi = xi0 + ∆xi. We directly insert these quantities xi into the discretized equation and use the operators for truncated polynomial expansions (Equations 6.4-6.6). For α and µn we obtain a model dependent functional dependence, and for J we can directly evaluate:

αi,j = α0 + Σi ai · ∆xi ,   µn,i,j = µ0 + Σi mi · ∆xi   (6.8)

J = (k/d) · µn ⊗ [ (nj0 + ∆nj) ⊗ B(αi,j) ⊖ (ni0 + ∆ni) ⊗ B(−αi,j) ]   (6.9)

Next, all necessary operations (Equations 6.4-6.6) are evaluated automatically to bring the discrete formulation into the standard form of Equation 6.3. Note that applying these operations yields the same results as if we manually evaluated all derivatives and explicitly formed the linearized equation.

We obtain the matrix line of the Jacobian in the form J = J0 + Σk ∂xk J · ∆xk, where the xk are the dependent variables ni, nj, Ti, Tj, Ψi, and Ψj. In contrast to the first approach, all operations are performed automatically by the computer, because we only use the linearized formulation and not the complete formula. Our method results in the same equations as conventional methods, however, it imposes a minimum of external specification effort. In general, all discretization schemes which use line-wise assembly based on finite differences as well as finite volumes can be handled using the described formalism. Therefore it is possible to implement complicated physical models, for instance six moments based transport models [91], much faster and more efficiently than with conventional methods.

Part III

Applied Concepts

Chapter 7

Areas of Application

In this chapter, the area of application in the field of TCAD, and more generally scientific computing, is introduced with several examples from different areas and with different discretization schemes. The great diversity of physical phenomena present in semiconductor devices themselves and in the processes involved in their manufacture makes the field of TCAD extremely challenging. The quest for ever decreasing device dimensions and faster switching speeds results in ever growing requirements on the employed simulation methodologies. New, more detailed, and more sophisticated models are required to simulate not only the operation but also the manufacture of the devices. While computer performance is steadily increasing, the additional complexity of the simulation models used easily outgrows this gain in computational power. It is therefore of utmost importance to employ the latest techniques of software development to obtain high performance and thereby ensure adequate simulation times even for complex problems. Recognizing this, the semiconductor industry jointly works on the further refinement of the necessary technologies (ITRS) [92]. TCAD is a crucial tool for achieving the set goals, as simulations help to greatly reduce the costs of developing new devices as well as manufacturing equipment. A demonstration of the developed applications, based on the introduced theoretical concepts transformed into practical ones, which solve partial differential equations based on the facilities of the GSSE, is given. These equations can be divided into elliptic equations, such as the Laplace or Poisson equation, parabolic equations, which describe diffusive processes, and hyperbolic equations, such as wave equations. The need for an advanced equation processing framework offering high performance as well as high expressiveness is presented.
A system of coupled non-linear equations from the field of TCAD is used to demonstrate the approach, and results for the investigated equations are presented. Not only is the GSSE introduced, but related work is also reviewed. Finally, several examples are given to demonstrate that the proposed concept is indeed functional and even superior to previously developed applications with respect to run-time performance and implementation effort.

Chapter 8

A Generic Scientific Simulation Environment

The final question remains: why does a new environment have to be implemented? Are there not already more than enough environments available? Of course, there are a lot of available environments. But none of the reviewed environments implements a DSEL for scientific notation, with an overall high performance, with generic interfaces to all different types of linear solver systems...

[Figure 8.1 depicts the building blocks of the GSSE: a visualization layer (IBM OpenDX, IUE SmartV, Amira, Paraview); orthogonal mechanisms (interpolation, matrix-solver interface, range queries) with solver back-ends (Trilinos, PetSc, IUE QQQ, direct solvers, Atlas); the domain layer with traversal, data accessors, and utility library concepts (nD-Voronoi, metric tool, norms, boundary and coboundary operators); the segment and cell complex layer (incidence information, quantity storage, cell complex properties); and input/output and mesh interfaces (WSS, F5, GSSE:VGModeler, Netgen, CGAL, GSSE:binary, deLink, Tetgen).]

Figure 8.1: Building blocks and overview of the GSSE.

The GSSE approach separates two main parts for the transformation of the presented concepts into applicable modules for application design in TCAD. The first part of the GSSE is based on a comprehensive topological library (GTL). A complete topological framework is presented that is able to provide incidence traversal operations for various topological elements. This enables one to perform the necessary topological operations for several discretization schemes. The traversal concept allows algorithms to be formulated based on this interface independently of the actual implementation of the topological data structure, including dimension and archetype. The consequent use of this interface leads to dimensionally and topologically independent formulations of algorithms. This approach of a generalized topology enables a new functional description for the discretization of PDEs without sacrificing any performance. This is accomplished in a generic functor library (GFL), which finally implements the building blocks (functors) for the domain-specific embedded language. The GFL is therefore the important part for easing the development of algorithms for scientific computing in an expressive way. Following the given abstraction interfaces within the software components from Section 5.1:

- I.) → II.) data structure: GSSE as a purely generic application design environment does not provide any direct storage mechanisms at all but utilizes iterators that implement a data model of abstraction level II.
- II.) → III.) functional programming: this abstraction layer is implemented in GSSE through meta-data that allow vectors and co-vectors to be identified and the set of data model operations to be extended.
- I.) ↔ III.) generic programming: GSSE implements one such set of (mathematical) operations independently of a specific data structure.
- III.) → IV.) physical interpretation: GSSE allows this abstraction layer through textual human-interpretable identifiers for so-called grid and field objects, but does not yet provide support beyond this.

The generic programming paradigm realized by the template mechanism offers homogeneous interfaces between algorithms and data structures by the iterator/cursor pattern.
Functional programming enables an efficient means to specify equations and offers extensible expressions. The complete process of application design can, based on these two libraries, be greatly reduced. Therefore, the GSSE implements concepts accomplishing the following criteria:

- A generic library should be complete, so that all applications can be written exclusively using this library (as well as standard libraries). Indeed, completeness increases the usability enormously, because no components have to be added, while existing components can be adapted.
- The parts of the library should be usable for a broad range of different applications. Each of the software components is written not only for a very specific purpose, but for a manifold of problems.
- The interoperability of the library must not be affected by its completeness. Even if the complete library can be used by itself, it has to provide standardized interfaces which guarantee compatibility for data structures which have not been foreseen in the initial design.

Generic library design deals with the conceptual categorization of computational domains, the reduction of algorithms to their minimal conceptual requirements, and strict performance guarantees. The benefits of this approach are the re-usability and the orthogonality of the resulting software. Generic libraries were pioneered by the Standard Template Library (STL) in C++, but both software and language technology have long gone beyond the STL.

8.1 Generic Topology Library: GTL

The GTL can be seen as the most reduced form of specifying a data structure with performance constraints, offering a unique interface to all different types of data structures in a topological way and therefore dimensionally neutral. Topological functionality is mostly needed to fulfill the requirements of discretization schemes. All these schemes need a set of neighboring elements based on the topological property of incidence. Therefore an implementation of the GTL [81] provides comprehensive incidence traversal and orientation operations with a generic interface similar to the C++ STL [27] and is based on GrAL [93]. The GTL implements the theoretical and practical concepts of the topological toolkit as well as the concepts of chains. Therewith all different types of data structures can be topologically interpreted as cell complexes of a certain dimension. Because of the combinatorial properties of complexes, a higher-dimensional data structure can also be considered as the projection onto any lower-dimensional one and can thereby be traversed in multiple ways. Related to the base space property of the fiber model, the GTL operates on the n-skeletons. Because of the distinction between local and global properties of the complex topology, data type designers can at the same time control which incidence relations, and thus which kinds of efficient traversal, are available. Local properties can be depicted by local or explicit neighborhood information, whereas global properties are modelled by implicit neighborhood information. As introduced in Section 6.3, a generic traversal mechanism is used by the connection matrix concept. The GTL implements this connection matrix as a bi-map (the Boost bimap library) with a sparse representation of the vertex on cell information. Additionally, only one bit is used to store the orientation and connection of each vertex on cell and cell on vertex information. Therewith the global storage of incidence and adjacence information can be estimated by:

O(C_j^i) = m_n × m_0 · 1 bit        (8.1)

To obtain such an efficient storage, the information stored is twofold. First, only as many bits as there are vertices are allocated. Then, to encode whether vertex i is oriented consistently with the n-cell orientation, only a single bit 0 or 1 is stored, corresponding to consistent or inconsistent orientation, respectively. Figure 8.2 lists all methods available from the connection matrix which are possible between topological elements of different dimension, e.g., for simplex cells. It can easily be seen that the incidence relation of elements of the same dimension can be modeled by the equality relation. The first row shows all edges, faces, as well as cells which are incident with the same base vertex. The first column shows vertices which are incident with one base edge, face, or cell. The classification scheme of Section 6.1 is used to implement a generic data structure interface. The features of this interface and some of the implementation details are presented. The GTL, as a sublibrary within the GSSE, is based on a layered and orthogonal concept, which means each layer can be used separately. A schematic diagram is given in Figure 8.3. The bottom layer represents mechanisms for set handling. This layer also integrates homogeneous access to compile-time and runtime mechanisms. The next layer implements the poset traversal mechanisms. Generic traversal operations use the set information and specify the corresponding traversal objects. Cell and complex topology modules offer various access mechanisms based on the underlying set operations. Local and global properties of the complex topology part are implemented with their respective interfaces, as given in detail in Section 8.1.7. The implementation basically follows the property map concept but differs in several details, as given in Section 8.1.7. Finally, the container layer combines the cell complex part with the data storage part (quantity) with respect to the index space depth.
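The one-bit orientation storage estimated in Equation 8.1 can be sketched in plain standard C++; this is an illustrative sketch (the class name orientation_matrix is ours, not the GTL bimap implementation), using the bit-packed std::vector<bool> specialization:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-bit-per-entry storage for vertex-on-cell orientation: for m_n cells
// with m_0 vertices each, a single bit states whether vertex i is oriented
// consistently with the n-cell orientation.
class orientation_matrix
{
   std::size_t vertices_per_cell;
   std::vector<bool> bits;   // std::vector<bool> packs one bit per entry
public:
   orientation_matrix(std::size_t cells, std::size_t m0)
      : vertices_per_cell(m0), bits(cells * m0, true) {}

   void set(std::size_t cell, std::size_t vertex, bool consistent)
   { bits[cell * vertices_per_cell + vertex] = consistent; }

   bool consistent(std::size_t cell, std::size_t vertex) const
   { return bits[cell * vertices_per_cell + vertex]; }

   std::size_t size_in_bits() const { return bits.size(); }  // m_n * m_0
};
```

The total size in bits is exactly m_n × m_0, matching Equation 8.1.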
It is important to note the different evaluation times of various parts of the GTL. Cell types, e.g., Figure

Figure 8.2: Traversal methods induced by the incidence relation. Horizontal: traversal schemes of the same base element. Vertical: traversal schemes of the same traversal element.

Figure 8.3: Conceptual view of the GTL.

??, cell topology traversal mechanisms, and global complex topologies can be evaluated at compile time. The runtime part is used, e.g., for local complex topologies such as a linked list or a simplex mesh. With the advent of the Fusion 2 library [94] this interplay between compile time and runtime is encapsulated cleanly. The following table gives an overview of the different parts of the library with their corresponding time of evaluation.

                            compile time   runtime
cell topology, simplex           yes          no
cell topology, cuboid            yes          no
cell topology, polytope          no           yes
complex topology, local          no           yes
complex topology, global         yes          no

In the following we present the transition from the formal introduction to the notation possible with the

GTL. The first part presents the cell type. The declaration of a simplex and a cuboid object is given in the code snippet with their respective dimension. The corresponding cell topologies for these two cell types are already implemented within the GTL. Most of the typedefs are omitted due to space constraints.

Cell Type Definition

cell_type<0, simplex> cell_s;  // vertex
cell_type<3, cuboid>  cell_c;  // cuboid

Some complex types are presented in the next code snippet; however, the complex type is certainly not restricted to these types of cells. Arbitrary types can be used which model the corresponding concepts.

Complex Type Definition

complex_type<...> complex_1;  // array
complex_type<...> complex_2;  // double linked list / tree
complex_type<...> complex_3;  // regular grid

The container type combines the complex types (base space properties) with the data types (fiber space properties), where the index space depth property is used to introduce the distance of the data accessors from the base space.

Container Type Definition

container_t<complex_t, data_t> container;

In the following code snippet a simple example of the generic use of a data accessor similar to the property map concept is given, where a scalar value is assigned to each vertex. The data accessor is extracted from the container to enable a functional access mechanism with a key value which can be modeled by an integer or string.

Data Accessor

da_type da(container, key_d);
for_each(container.vertex_begin(), container.vertex_end(), da = 1.0);

The following sections give several examples of cell complex types of different dimension as well as the possibility of identifying other libraries by their topological structure.

8.1.1 Quantity Accessors

The topological data structure itself, however, is not sufficient to perform complex calculations on such a structure. For this reason we provide means for the storage of values on the topological container. Each of the topological elements such as vertices or edges can be associated with one or more values. In our approach, the property map concept [76] is adopted, which offers the possibility of accessing the quantities in a functional way by a mechanism called quantity accessor. The quantity accessor implementation also takes care of accessing quantities with different data locality, e.g., quantities on vertices, edges, faces, or cells. The quantity accessor is initialized with a domain. During initialization, the quantity accessor quan is bound to a specific domain with its quantity key. The operator() is evaluated with a vertex of the cell complex as argument and returns a reference to the stored value.

Quantity assignment

string key_quan = "user_quantity";
quan_t quan = scalar_quan(domain, key_quan);
quan(vertex) = 1.0;

In the following code snippet a simple example of the generic use of this accessor is given, where a scalar value is assigned to each vertex in a domain. The quantity accessor creates an assignment which is passed to the std::for each algorithm.

Quantity assignment

string key_quan = "user_quantity";
quan_t quan = scalar_quan(domain, key_quan);
for_each(cc.vertex_begin(), cc.vertex_end(), quan = 1.0);

The quantity and data accessor are introduced to present the corresponding interfaces. Similar to the property map interface, the data storage container for the accessors can be chosen arbitrarily to satisfy the application requirements.

Data Accessors for the Container

value = container.get_value(base_handle, quan_key);
value = container.get_value(*vertex_on_cell, quan_key);

As can be seen, the base_handle (index) selects the respective fiber, whereas the quan_key selects the corresponding fiber element. In contrast to the cursor and property map concept, the fiber bundle approach allows traversal operations within the fiber space as well:

Fiber Access

fiber = container.get_fiber(base_handle);
for (it = fiber.begin(); it != fiber.end(); ++it)
   // ...

Next, a slice can be extracted, which means that a complete data set attached to a base space can be returned:

Slice Access

slice = container.get_slice(quan_key);
for (it = slice.begin(); it != slice.end(); ++it)
   // ..
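The fiber and slice accessors can be mimicked with a small standalone container; the following sketch uses illustrative names (quantity_container, value) and plain standard-library storage, not the GSSE implementation. Each base handle owns a fiber (quantity key to value), and a slice collects one quantity over all base elements:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Fiber-bundle style storage sketch: one fiber per base element; a slice is
// the set of values of one quantity over the whole base space.
class quantity_container
{
   std::vector<std::map<std::string, double>> fibers;
public:
   explicit quantity_container(std::size_t n) : fibers(n) {}

   // access one fiber element: base handle selects the fiber,
   // the quantity key selects the element within the fiber
   double& value(std::size_t base_handle, std::string const& quan_key)
   { return fibers[base_handle][quan_key]; }

   std::map<std::string, double>& get_fiber(std::size_t base_handle)
   { return fibers[base_handle]; }

   std::vector<double> get_slice(std::string const& quan_key) const
   {
      std::vector<double> slice;
      for (auto const& fiber : fibers)
         slice.push_back(fiber.at(quan_key));
      return slice;
   }
};
```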

8.1.2 The 0-Cell Complex

The best known and most widely used data structures in programming can be described by the 0-cell complex types. The data structures within the C++ STL are an example of this type. The tags global and local in the following code snippet specify the locality of the complex topology.

C++ STL approach

std::vector<int> container;
std::vector<int>::iterator it;
it = container.begin();
++it;             // topological traversal
int value = *it;  // data access

Figure 8.4: Representation of a 0-cell complex with a topological structure equivalent to a standard container.

On the one hand, the iterator concept is one of the key elements of the C++ STL. It separates the actual data structure access from algorithms, thereby significantly reducing implementation complexity. On the other hand, it combines iteration and data access. The improvements obtained by separating iteration and data access are outlined with the cursor and property map concept [76]. A possible application of this approach is demonstrated in the next code snippet:

Separated iteration and data access

vector<bool> container;
vector<bool>::iterator it;
property_map pm(container);
it = container.begin();
++it;                  // iteration
bool value = pm(*it);  // data access
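The separation shown above can be sketched self-contained in a few lines; property_map here is an illustrative stand-in for the concept of [76], with the index playing the role of the handle yielded by the cursor:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cursor/property-map separation: iteration yields a handle (here an
// index), and the property map maps the handle to the stored data.
struct property_map
{
   std::vector<double>& storage;
   double& operator()(std::size_t handle) const { return storage[handle]; }
};
```

Any number of property maps can be layered over the same traversal, which is exactly what makes the separation attractive for quantity storage.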

STL Data Structure Definitions

complex_type<...> cx;  // array
complex_type<...> cx;  // single linked list / stream
complex_type<...> cx;  // double linked list / tree
complex_type<...> cx;  // arbitrary tree

For this dimension of cell complexes the STL iterator traits can be used to classify the data structure as well:

STL Data Structure Definitions, STL Iterator Traits

complex_type<...> cx;  // simplex, global
complex_type<...> cx;  // simplex, local<2>
complex_type<...> cx;  // simplex, local<3>

To demonstrate the equivalence of the STL and the GTL data structures a simple code snippet is presented:

Equivalence of Data Structures

typedef cell_type<0, simplex>         cells_t;
typedef complex_type<cells_t, global> complex_t;
typedef long                          data_t;
container_t<complex_t, data_t>        container;

// is equivalent to the STL container type
std::vector<long> container;

Here, the separation of the topological structure and the data specification can be clearly seen.

8.1.3 The 1-Cell Complex

This type of cell complex is usually called a graph. Figure 6.5 shows a typical example. A cell of this type of cell complex is called an edge. Incidence and adjacence information exists between edges and vertices. The next code snippet presents a traversal (iteration) using mechanisms of the BGL. In this example all edges are traversed.

BGL Iteration

typedef adjacency_matrix<undirectedS> Graph;
Graph gr(number_of_points);
// edge initialization
edge_iterator eit, eit_end;
for (tie(eit, eit_end) = edges(gr); eit != eit_end; ++eit)
{
   test_source1 += source(*eit, gr);
   test_source2 += target(*eit, gr);
}

With the GTL the same functionality can be accomplished, as demonstrated in the following code snippet. The global tag is used to highlight the global structure of the graph, which means that the internal data layout is prepared for dense graph storage. In terms of the BGL, an adjacency graph data structure is used.

GTL Traversal

typedef cell_type<1, simplex>         cells_t;
typedef complex_type<cells_t, global> complex_t;
typedef long                          data_t;
container_t<complex_t, data_t> container(number_of_vertices);
// cell initialization
container_t::cell_on_vertex_it covit, covit_end;
for (tie(covit, covit_end) = cells(container); covit != covit_end; ++covit)
{
   test_source1 += source(*covit, container);
   test_source2 += target(*covit, container);
}

Related to interoperability, all developed algorithms of the BGL can be used as well. The property map concept and the related fiber bundle approach can also be used, because the property map can be seen as a specialization of the fiber bundle approach. So the full power of the BGL can be used with the GTL as an interface layer.

8.1.4 The 2-Cell Complex

This type of cell complex is usually called a mesh or grid. For the first example, we use a two-dimensional simplex cell complex. The following examples specify 2-cell complexes with the GTL.

Grid and Mesh Definitions

cell_type<2, simplex> cells_t;
cell_type<2, cuboid>  cellc_t;
complex_type<cells_t, global> cx_ig;  // grid
complex_type<...>             cx_rg;  // mesh

To illustrate the traversal mechanisms of the GTL the following code snippet is presented. A traversal which starts with an arbitrary cell traversal evaluated on a 2-cell complex is given. Then a vertex on cell traversal is initialized with a cell of the complex. During this loop an edge on vertex traversal is created and initialized with the evaluated vertex. This edge traversal operation starts the next traversal. The corresponding graphical representation is given in Figure 8.5. The marked objects depict the currently evaluated objects. In the first iteration state the vertex v1 is used and the iteration is performed over the incident edges; then the iteration continues with the remaining vertices.

Figure 8.5: Incidence relation and traversal operation.

A more complex traversal

cell_iterator c_it = container.cell_begin();
vertex_on_cell_iterator voc_it  = container.voc_begin(*c_it);
vertex_on_cell_iterator voce_it = container.voc_end(*c_it);
for (; voc_it != voce_it; voc_it.increment())
{
   edge_on_vertex_iterator eov_it  = container.eov_begin(*voc_it);
   edge_on_vertex_iterator eove_it = container.eov_end(*voc_it);
   for (; eov_it != eove_it; eov_it.increment())
   {
      // operations on edges
   }
}

As can be seen, the traversal can be used independently of the dimension or type of cell complex. Three different objects have to be guaranteed by the cell complex: vertices, edges, and cells. All cell complex types which support these three objects can be used for this iteration. The following example compares the traversal mechanisms of the CGAL and the GTL. A traversal of all facets related to a cell is shown.

CGAL Traversal for the Cell Topology

// ..... Polyhedron P definition
Facet_iterator i = P.facets_begin();
for (; i != P.facets_end(); ++i)
{
   // operations on a facet
}

In terms of the GTL, the special geometric object Polyhedron is implemented by a common container type.

GTL Traversal for the Cell Topology

// container definition
facet_on_cell_iterator foc_it  = container.foc_begin(cell);
facet_on_cell_iterator foce_it = container.foc_end(cell);
for (; foc_it != foce_it; ++foc_it)
{
   // operations on a facet
}

The next example compares the GrAL traversal with the traversal operations of the GTL. Several different grid types can be used in GrAL, similar to the GTL container type. The main difference is the orthogonality of the cell types and the complex types in the GTL.

GrAL Traversal

typedef grid_types<Triang2D> gt;
Triang2D CC;
for (gt::CellIterator c(CC); !c.IsDone(); ++c)
{
   // loop over all vertices vc of cell c
   gt::VertexOnCellIterator vc;
   for (vc = (*c).FirstVertex(); !vc.IsDone(); ++vc)
   {
      // use *vc
   }

   // loop over all neighbors
   for (gt::CellOnCellIterator cc(c); !cc.IsDone(); ++cc)
   {
      // use cc
   }
}

As can be seen, the traversal mechanisms are implemented very close to the grid types. The random access property cannot be used by all traversal operations. This complicates the optimization and performance steps related to application design. The next code snippet presents the GTL implementation. It has to be mentioned that the GTL interface can easily be implemented using GrAL.

GTL Traversal

// container definition
cell_iterator c_it  = container.cell_begin();
cell_iterator ce_it = container.cell_end();
for (; c_it != ce_it; ++c_it)
{
   vertex_on_cell_iterator voc_it  = container.voc_begin(*c_it);
   vertex_on_cell_iterator voce_it = container.voc_end(*c_it);
   for (; voc_it != voce_it; voc_it.increment())
   {
      // use *voc_it
   }

   cell_on_cell_iterator coc_it  = container.coc_begin(*c_it);
   cell_on_cell_iterator coce_it = container.coc_end(*c_it);
   for (; coc_it != coce_it; ++coc_it)
   {
      // use *coc_it
   }
}

As can be seen, the interfaces can be exchanged easily and each of these libraries can be used with the GTL interface. A major advantage of the GTL approach is therefore the reuse of source code.
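The dimension- and library-independent incidence traversal can be illustrated with a standalone sketch over a plain cell-to-vertex table (two triangles sharing an edge); the helper names are ours, not the GTL interface. Vertex-on-cell comes directly from the table; cell-on-vertex and cell-on-cell are the derived incidence relations:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using cell = std::array<int, 3>;  // a 2-cell given by its three vertices

// cell_on_vertex: all cells incident with vertex v
std::set<int> cells_on_vertex(std::vector<cell> const& cells, int v)
{
   std::set<int> result;
   for (std::size_t c = 0; c < cells.size(); ++c)
      for (int vertex : cells[c])
         if (vertex == v) result.insert(static_cast<int>(c));
   return result;
}

// cell_on_cell: neighbors of c, obtained by nesting the two traversals
std::set<int> cells_on_cell(std::vector<cell> const& cells, int c)
{
   std::set<int> result;
   for (int v : cells[c])                          // vertex_on_cell
      for (int nc : cells_on_vertex(cells, v))     // cell_on_vertex
         if (nc != c) result.insert(nc);
   return result;
}
```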

8.1.5 Higher-Dimensional Cell Complexes

The 3-cell complex is similar if simplex or cuboid cell types are used. Therefore we omit the corresponding figures and code snippets. Based on our formal definition of finite cell complexes, higher dimensions can be used as well. The only required extension to the GTL is the introduction of the cell topology, as is demonstrated with a 4-simplex, a pentachoron, depicted in Figure 8.6. The following code snippet shows the implementation of an arbitrary topology with the structure of the 4-cell simplex complex. All algorithms can be used as well due to the common traversal mechanisms.

4-D Simplex Complex

typedef cell_type<4, simplex>  cells_t;
typedef complex_type<...>      complex_t;
typedef long                   data_t;
container_t<complex_t, data_t> container;

[Figure 8.6 shows the poset of a 4-simplex: the top element {0,1,2,3,4}; the five tetrahedra {0,1,2,3}, {0,1,2,4}, {0,2,3,4}, {0,1,3,4}, {1,2,3,4}; the ten triangles {0,1,2}, {0,2,3}, {0,1,3}, {0,2,4}, {0,1,4}, {0,3,4}, {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}; the ten edges {0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}; and the five vertices {0}, {1}, {2}, {3}, {4}.]

Figure 8.6: Poset of a 4-simplex, pentachoron.
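The level sizes of this poset follow directly from combinatorics: an n-simplex contains C(n+1, k+1) cells of dimension k, which reproduces the counts visible in Figure 8.6 (5, 10, 10, 5, 1 for the pentachoron). A small sketch:

```cpp
#include <cassert>

// binomial coefficient C(n, k), computed iteratively
long binomial(int n, int k)
{
   long result = 1;
   for (int i = 1; i <= k; ++i)
      result = result * (n - i + 1) / i;
   return result;
}

// number of k-cells of an n-simplex: choose k+1 of its n+1 vertices
long k_cells_of_simplex(int n, int k) { return binomial(n + 1, k + 1); }
```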

We extend the 0-cell and 1-cell complex types to arbitrary-dimensional cell complexes. In this work we restrict the topological spaces to the most important ones for scientific computing: the local (Figure 8.7) and the global cell complex (Figure 8.8). Based on our cell complex types, the following cell types are available:

- 0-cell: vertex
- 1-cell: edge
- 2-cell: triangles, quadrilaterals
- 3-cell: tetrahedra, cubes
- 4-cell: hyper-tetrahedra, tesseract

Figure 8.7: Local cell complex (left) and a cell representation (right). Vertices are marked with black circles.

Figure 8.8: Global cell complex (left) and a cell representation (right).

The following code snippet presents the implementation of an arbitrary topology with the structure of a

local 2-cell complex. The stored data is based on scalar values using a double for representation.

Iteration with our approach

typedef topology<2, local, 4> topo_t;
topology_traits<topo_t>::iterator it;
typedef data<...> data_t;
data_t data;
data_traits<data_t>::value value;
it = topo.vertex_begin();
++it;               // iteration
value = data(*it);  // access

A more complex iteration

cell_iterator ce_it = topo.cell_begin();
vertex_on_cell_iterator vocit(*ce_it);
for (; vocit.valid(); ++vocit)
{
   edge_on_vertex_iterator eovit(*vocit);
   for (; eovit.valid(); ++eovit)
   {
      // operations on edges
   }
}

As can be seen, the iteration mechanism can be used independently of the dimension or type of cell complex. The iteration is initialized with a cell iterator only. Three different objects have to be guaranteed by the cell complex: vertices, edges, and cells. All cell complex types which support these three objects can be used for this iteration. Examples of higher-dimensional complexes in C++ with the GSSE notation are given next.

cell_type<3, simplex> cell_s;  // tetrahedron
cell_type<4, cuboid>  cell_c;  // hypercube
complex_t<...> base_s;  // {1}
complex_t<...> base_c;  // {2}

Here {1} describes an unstructured tetrahedral mesh and {2} describes a cell complex based on hypercubes. In the following the poset structure and a possible rendering of a four-dimensional simplex are presented. The poset structure demonstrates the traversal capabilities of the GSSE data structure interface. The poset of the four-dimensional cube is omitted due to the large number of facets. Only one possible render image is given next. A direct consequence of the use of the iterator ranges vertex_begin(), vertex_end() is that standard algorithms such as for_each are automatically supported. Using the function objects on_vertex as well as on_cell we can conveniently formulate traversal in the following manner.

Global Traversal

for_each(cc.vertex_begin(), cc.vertex_end(), on_vertex);
for_each(cc.cell_begin(),   cc.cell_end(),   on_cell);

Generic Traversal Operations

In order to allow an arbitrary number of nested traversal operations, which is often required in applications, it is necessary to have appropriate data structures for traversal in order to keep the program code as concise as possible. For this reason we use the random access iterator concept in order to provide access to the topological elements. We refine the iterator concept [76] for data structural convenience (Fig. 8.11). In contrast to the STL iterator concept, dereferentiation of this generalized iterator does not provide the data content but a handle that is used as a key to access a property map or quantity [72] in order to obtain the contained data. Our iterator concept is a refinement of the random access iterator concept and introduces the following additional concepts. To demonstrate the application of the valid() predicate we show a simple example of two nested iterations. We traverse from a base vertex to all incident edges and from these edges to incident vertices. (Fig. 1, (a) and (d))

The validity concept and its application

cell_vertex voc_it(cell);
while (voc_it.valid())
{
   vertex_edge eov_it(*voc_it);
   while (eov_it.valid())
   {
      // operations on edges
      ++eov_it;
   }
   ++voc_it;
}

In the first traversal state, a cell is used to perform an iteration over all its incident vertices. Each

Figure 8.9: Rendering of a four-dimensional simplex.

Figure 8.10: Rendering of a four-dimensional cube.

Name             Description                      Requirements
bool valid()     validity of iterator (not end)   const
iterator end()   past the end of the iterator     const
void reset()     set to the start point           const

Figure 8.11: Concept refinement for the generalized iterator within the topological framework.

incident vertex *voc_it is available in the outer loop. The inner traversal loop is initiated using the actually traversed element of the outer loop. From this vertex we obtain all incident edges using the dereferentiation of the inner iterator. In order to use standard algorithms of the STL, a data structure has to provide an initial as well as a terminal element. For this reason the end() function can be employed.

The iterator end functionality and its application

struct func_obj
{
   void operator()(cell_t cell) { cout << cell << endl; }
};
cell_vertex voc_it(cell);
for_each(voc_it, voc_it.end(), func_obj());

As many of the iterators require a non-negligible time for construction, it is often more efficient to use an iterator twice rather than creating a second instance for the same purpose. For this reason we introduce the reset function that enables us to reset an iterator to the beginning of the iteration range.

Usage of the reset function

cell_vertex voc_it(cell);
for (; voc_it.valid(); ++voc_it)
{
   // ... some operation
}
voc_it.reset();
for (; voc_it != voc_it.end(); ++voc_it)
{
   // ... some operation
}
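The valid()/reset() refinement can be sketched as a small standalone class (illustrative, not the GTL iterator; end() is omitted here for brevity). Dereferencing yields a handle, in line with the generalized iterator concept above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Range-aware iterator sketch: the iterator knows its own range, so
// validity can be queried and the iterator can be reused via reset().
class handle_iterator
{
   std::vector<int> const& handles;
   std::size_t pos;
public:
   explicit handle_iterator(std::vector<int> const& h) : handles(h), pos(0) {}
   bool valid() const { return pos < handles.size(); }  // not past the end
   void reset()       { pos = 0; }                      // back to the start
   handle_iterator& operator++() { ++pos; return *this; }
   int operator*() const { return handles[pos]; }       // yields a handle
};
```

Because the range is carried along, nested traversals need no separately managed end iterators, which keeps deeply nested loops concise.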

8.1.6 Topological Mappings

The concept of chains, see Section 2.6.1, introduces the necessary concept to map the collection of cells of a cell complex into a space with algebraic properties. For the computational mapping, a so-called handle is used to map a single cell of given dimension and orientation into a data structure. The GSSE implements simple and lightweight wrapper data structures for single topological elements, only to be distinguishable from other topological elements. Basically any type can be used for these handles which allows the element to be uniquely identified within all elements of the same dimension. The value of such a handle itself does not have any semantic meaning apart from offering the concept of equality comparison. The only valid operation on handles is, therefore, the equality relation. For the skeleton elements a unique identification can be found either by storing all the vertices or by storing a cell and a local index which determines the element within the cell. Based on the identification of a unique description of all skeleton elements and even cells by their set of vertices, this work uses the vertex identification. As we only store incidence information between cells and vertices (Fig. 8.2 (c) and (j)), the incidence relation between skeleton elements is still undefined. There are many methods to specify this kind of information, for instance using containers and explicitly storing this information. Even though this is possible and can be used in cases where high performance is required, such methods result in a high memory consumption, because most of the information has to be duplicated.
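The equality-only handle semantics can be sketched as a lightweight wrapper; the names below are illustrative, not the GSSE types. A dimension tag additionally keeps handles of different dimension from being mixed up at compile time:

```cpp
#include <cassert>

// Equality-only handle wrapper: the stored value merely distinguishes
// elements of the same dimension; no ordering or arithmetic is exposed.
template<typename T, int Dim>
class handle
{
   T value;
public:
   explicit handle(T v) : value(v) {}
   friend bool operator==(handle a, handle b) { return a.value == b.value; }
   friend bool operator!=(handle a, handle b) { return !(a == b); }
};

typedef handle<long, 0> vertex_handle;
typedef handle<long, 3> cell_handle;   // distinct, incompatible type
```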

8.1.7 Implementation Details

Formal concepts and the interfaces derived from these are actually the main focus of the GTL. We want to present some implementation details of our approach in this section. Due to the fact that all information of sets and subsets is available at compile time for the cell topology of the simplex and the cuboid type, common access mechanisms for different data types can be introduced. We analyzed the following data types from several libraries: std::tr1::array, mpl::vector, and boost::fusion::vector. In order to achieve efficient construction and access mechanisms for the GTL, we combined the compile-time specification of the std::tr1::array with the constructor mechanisms of the fusion::vector. The result can be used as presented in the next code snippet for a two-dimensional simplex cell:

Cell Set Type

typedef gtl::array<int, 3> set3_type;
// ....
set3_type cell(0, 1, 2);

The cell topology sets for this type of cell are given in the next code snippet. The corresponding number of elements in each set is calculated at compile time.

Cell Topology Sets (with DIM as dimension, e.g., 2)
typedef gtl::array<...> cont_2_d;
typedef gtl::array<...> cont_1_d;
typedef gtl::array<...> cont_0_d;

Based on these sets, the cell topology container is implemented by a fusion::vector to combine several different data types:

Cell Topology Type Definition
fusion::vector<cont_2_d, cont_1_d, cont_0_d> cell_topology_type;

The final generic traversal mechanism is straightforwardly constructed with the specification of the entry point to the cell topology type at compile time.

Generic Traversal Operation Definition
generic_traversal<...> vertex_on_cell_iterator;
generic_traversal<...> edge_on_cell_iterator;

For the traverse operation we use Phoenix 2 actors to create the traversal mechanisms, as given in the next code snippet. A simple object generator is finally used to generate the traverse() operation.

Listing 8.0:
template<typename Initial>
struct traversal_generator
{
   traversal_generator(Initial const& init) : init(init) {}

   template<typename Operand>
   boost::phoenix::actor<
      typename boost::phoenix::as_composite<
         traversal_eval, Initial, Operand>::type>
   operator[](Operand const& op)
   {
      return boost::phoenix::compose<traversal_eval>(init, op);
   }
private:
   Initial init;
};

Listing 8.0:
template<typename Initial>
detail_gtl::traversal_generator<Initial> traverse(Initial const& init)
{
   return detail_gtl::traversal_generator<Initial>(init);
}

The following code snippet finally presents generic traversal operations with the GTL:

Listing 8.0:
traverse(0.0)        [accessor(_1)](object);
traverse(0.0, named) [accessor(_1)](object);

We present interfaces for the different types of complex topology. Due to the fact that local complex topologies rely on the addition of single cells, the interface is implemented as:

Local Complex Topology
class complex_topology_local
{
   void add_cell(cell_type cell);
   // ...
};

Global complex topologies can be modeled easily with the following interface, where only the corresponding global dimension has to be adjusted.

Global Complex Topology
class complex_topology_global
{
   void resize_dimension(long new_size, long index);
   // ..
};

8.2 Generic Functor Library: GFL

The GFL was developed to describe a set of common functors for accessing data and for arithmetic operations on them. This library implements the ability to specify whole equations in a concise yet expressive way. The availability of a topology or traversal library, e.g. the GTL, is a fundamental requirement to specify equations in a functional way. Together these two libraries offer mechanisms to separate discrete mathematics into topological and numerical operations. Therewith a mathematics-specific embedded language becomes possible. Based on the GTL, methods for discrete mathematics with respect to their separation into topological and numerical operations are introduced. While quantities located on topological elements can be specified as simple functional dependencies using functions and operators, topological traversal operations are traditionally specified using loops. Even though the imperative approach of loop specification offers enormous flexibility, it also imposes many drawbacks. The divergence of functional quantity access and imperative topological traversal mainly results in an enormous bloat of code. This problem is aggravated by the fact that most of the traversal loops are used to accumulate values. For this reason local variables are required to store the provisional results. All these considerations show that any resulting code suffers from a lack of maintainability and leads to unnecessarily long and error-prone formulations. As a consequence, the traversal through topological elements is completely separated from the actual algorithm or functional specification. The GFL is connected to the GTL only by means of the topological traversal interfaces and the quantity accessors. The GFL implements concepts for a generalized functional specification formalism. As a consequence the formulae can be specified using the functional paradigm, see Section 5.2, which perfectly matches their functional character.
Since loops are mostly used to represent the sums or products of discrete mathematics, replacing them by non-imperative constructs simplifies the formal specification greatly. A simple arithmetic operation, the summation of values stored on vertices, is presented using the functional programming paradigm supported by the Boost Phoenix library [95]. The GTL is used to calculate the vertex-vertex adjacency operation. Here the sum of the quantities is evaluated and assigned to the base vertex.

Quantity Calculation
for_each(
   domain.vertex_begin(), domain.vertex_end(),
   pot = sum(0.0, _1 + _2)[pot]
);

The expressions _1 and _2 are placeholders of the functional approach and evaluate the given arguments returned by pot. To present the usage of the GFL in more detail a simple equation is used:

∑_{v→e} (∆_{e→v} q) = 0

where q denotes the quantity located on a vertex and ∆ denotes the difference. The implementation with the GTL and the GFL is presented in the next code snippet without any dependence on the dimension or type of cell complex. The compiler will always generate the most suitable code for each architecture and application. Therewith overall high performance can be achieved.

for (vit = (*segit).vertex_begin(); vit != (*segit).vertex_end(); ++vit)
{
   eq = sum
        [
           sum(0.0, e) [ q * orient(_1, e) ]
        ](*vit);
}

The expression resulting from this mapping is completed by specifying the current vertex object *vit at run-time, which clearly demonstrates the compile-time and run-time border. For a comprehensive explanation the specific parts are separated. The first part specifies a traversal over all vertices.

for (vit = (*segit).vertex_begin(); vit != (*segit).vertex_end(); ++vit)
{
   // ...
}

The next part stands for the actual functional token. The *vit represents the currently evaluated terminal object, in this case the currently evaluated vertex.

eq = sum
     [
        // ....
     ](*vit);

The heart of the equation is shown in the next code snippet. Here a sum is used as the discrete representation of the differential operator. The function orient denotes the orientation of the vertex and the edge. It returns 1 if the edge and the vertex are oriented consistently, −1 otherwise. This formulation is finally equivalent to the mathematical formulation of the difference. The topological traversal is given by the vertices incident to the edge. This inner part receives the edge information from the outer sum and builds a traversal over the vertices on this edge with a local variable e.

sum(0.0, e)
[
   q * orient(_1, e)
]

The quantity accessor q supplies the sum algorithm with values stored on the vertices. Finally the summed value obtained from all inner sum operations is stored in the equation object eq. Additionally, a small functor which is based on the Phoenix 2 [96] library is implemented. To initialize each functor with the correct start value, we have used the algebraic properties of the SGI STL related to the identity elements of groups. For the two most important functors, sum and prod, we use the appropriate identity element to initialize the calculation functor. Various topics in the field of scientific computing such as the linearization of equations and the discretization of differential operators can be performed with these two operations. The following code snippet is only sketched with pseudo code due to the complex data type extraction within C++.

template <typename Iterator, typename CalcObject, typename Functor>
CalcObject calculate(Iterator first, Iterator last, Functor& binary_op)
{
   // operator+ : identity element = 0
   // operator* : identity element = 1
   CalcObject init = identity_element(binary_op);

   for (; first != last; ++first)
      init = binary_op(init, *first);
   return init;
}
// .....
calculate(container.begin(), container.end(), std::plus<...>());
calculate(container.begin(), container.end(), std::multiplies<...>());
As an example of the GFL used with the GTL and of the corresponding extensibility, a discretized formula of the Laplace equation is given by:

L_{ell,fv}(Ψ) ≡ ∑_{v→e} (∆_{e→v} Ψ) ε A/d = 0 ,

where Ψ denotes the solution quantity and ∆ denotes the difference. The geometrical factors A and d denote the cross section for the flux and the distance between the two edge points, respectively. The final source code based on the GSSE for the Laplace equation is given next, without any assumption about the dimension or type of cell complex.

eq = sum
     [
        sum(0.0, e) [ pot * orient(_1, e) ] * A / d * eps
     ]

With small changes the Laplace equation can be extended to a Poisson equation as well as to general self-adjoint equations:

L_{ell,fv}(Ψ) ≡ ∑_{v→e} (∆_{e→v} Ψ) ε A/d = V ϱ ,

With the next code snippet the great extensibility of the functional programming approach can be clearly observed. Most of the source code remains unchanged; only minor parts have to be added.

eq = sum
     [
        sum(0.0, e) [ pot * orient(_1, e) ] * A / d * eps
     ] - V * rho
The Poisson equation includes a non-zero right-hand side, where the integration is performed using finite volumes and leads to a multiplication with the volume V .

8.3 Generic Solver Interface, GSI

As introduced in Section 6.6 the access to algebraic solver systems requires different mechanisms if used for a wide variety of discretization schemes. Therefore the GSSE provides a generic solver interface, which is based on the introduced traversal and quantity mechanisms as well as the linearization mechanisms. In the following, short code snippets representing the most important parts of the GSI are given. First, the compile-time or run-time specification is illustrated:

#include "gsse/matrix_interface/trilinos_solver_interface.hpp"
#include "gsse/matrix_interface/gauss_solver_interface.hpp"

typedef trilinos_solver_interface solver_t;

The generic matrix wrapper can be accessed with the corresponding mapping type, which builds the bijective connection between the assembled cells and the corresponding domain:

gsse::gsi::solver_traits<solver_t>::matrix_interface msi;
gsse::gsi::solver_traits<solver_t>::matrix_insert    m_ins(msi);
gsse::gsi::solver_traits<solver_t>::matrix_rhs       m_rhs(msi);

gsse::gsi::solver_traits<solver_t>::type m_mapping;

The quantity access and the linearization framework can be specified by:

typedef gsse::quan_entry<...>::type quan_entry_t;

quan_entry_t pot_quan(segment, key_pot, m_mapping);

The final solver procedure with different options is given in the next code snippet.

msi.prepare_solver();
msi.switch_to_full_output_mode();
msi.set_options_pack4();
msi.dump();
bool solved = msi.solve();

Different algebraic solver software is already accessible, such as:

- Self-developed direct and iterative solvers [16]
- Trilinos [17]
- PETSc [89]

8.4 Generic Orthogonal Range Queries

Compared to [97, 98], the GSSE handles orthogonal range queries in an orthogonal, i.e. interchangeable, manner, based on the concepts and libraries of [86]. Not only can different range query algorithms and libraries be exchanged freely, but with the corresponding library-centric application design the best performance can be obtained. With the generic compile-time libraries of [86] the following modules can be used interchangeably:

- Sequential scans.

- Projection methods: the advantage of this method is that it is fairly easy to implement, since only simple sorting of arrays and binary searches are required. It is therefore suitable for compile-time as well as run-time methods. But the overall performance is poor compared to the other methods. This method does not scale well with the input size and is therefore only suitable for small record distributions.

- Kd-trees, quadtrees, and octrees: the kd-tree outperforms the octree methods in most cases. The octree methods have higher execution times and use several times the memory of a kd-tree. This is due to how these methods organize their records: the octree partitions the space, whereas the kd-tree only partitions the records. For a given problem there is usually a cell method with better execution times and less memory use, but kd-trees can be used as a generic range query method in a wide range of applications.

- Cell and sparse cell methods: sparse cell arrays have execution times that are almost as low as those of dense cell arrays. If the dense cell array has many empty cells, the sparse cell method reduces the amount of memory storage without increasing the execution time.

This means that all mechanisms from the Wafer-State-Server [97, 98] can be used as well as various newly developed techniques.
Since the performance of orthogonal range queries depends on many parameters, such as the dimension of the stored records, the number of records, their distribution, and the query range, it is of utmost importance to have a wide variety of range query modules available. There is no single best method [86] for all application scenarios. For randomly distributed points in a given domain, the cell method normally outperforms the other methods by one order of magnitude: with 10,000 elements the sequential scan search requires 8.1 seconds, the projection search requires 1.3 seconds, the octree requires 0.363 seconds, and the optimized cell search only requires 0.067 seconds. The amount of memory used is about 40 MB for the sequential scan method, about 120 MB for the projection method, 398 MB for the octree, and 326 MB for the optimized cell method. The cell methods can, however, be adjusted greatly depending on the parameter to optimize; e.g., the query times range from 0.173 seconds down to 0.067 seconds for a memory consumption of 52 MB up to 326 MB. The GSSE implementation for, e.g., the cell array is accomplished by:

// GSSE domain class
typedef std::map<...> global_point_list_t;
typedef typename global_point_list_t::const_iterator Record_t;
typedef point_t Multikey_t;

struct gp_accessor : public std::unary_function<...>
{
   // The functor to access the multikey.
   // ...
};

typedef geom::CellArray<3, Record_t, Multikey_t,
                        typename Multikey_t::value_type, gp_accessor> CellArray;
typedef typename CellArray::BBox             BBox;
typedef typename CellArray::SemiOpenInterval SemiOpenInterval;

The usage for all different types of range query methods is always given by:

typedef typename domain_traits<domain_t>::point_t           point_t;
typedef typename domain_traits<domain_t>::point_container_t point_container_t;

domain_t          domain;
point_container_t point_container;

point_container = domain.query_pointcontainer(point_t(0, 0, 0),
                                              point_t(0.2, 0.5, 0.5));

8.5 Generic File Interface, GFI

The fiber bundle model of [1] is used with an implementation for a file format based on HDF5. Beyond the pure in-memory representation of the data model, its data structures map more or less directly to various file formats. The hierarchical structure inherent to HDF5 provides a natural representation. We made this HDF5 representation of the data model usable as an independent C library in order to equip external applications with a compatible I/O layer. This C library, called "F5", provides a simple interface for most application scenarios which occur in external applications [10]. The file resulting from a numerical simulation has to encompass the time-dependent output of the dynamic Equation (10.17) as Ψ, n, p and for the Maxwell equations as A, D, E, H. These fields live on different skeletons of the grid:

• Vertex fields: Ψ,n,p

• Edge fields: A, D

• Face fields: E, H

Using the HDF5 I/O layer of FiberLib2, this output information will appear in a file listing (using the HDF5 standard tool h5ls) as: /T=1.0/GSSE/Points Group /T=1.0/GSSE/Points/Cartesian Group /T=1.0/GSSE/Points/Cartesian/Positions Dataset {98317} /T=1.0/GSSE/Points/Cartesian/psi Dataset {98317} /T=1.0/GSSE/Points/Cartesian/n Dataset {98317} /T=1.0/GSSE/Points/Cartesian/p Dataset {98317} /T=1.0/GSSE/Edges Group /T=1.0/GSSE/Edges/Points Group /T=1.0/GSSE/Edges/Points/Positions Dataset {81317} /T=1.0/GSSE/Edges/Cartesian/A Dataset {81317} /T=1.0/GSSE/Edges/Cartesian/D Dataset {81317} /T=1.0/GSSE/Faces Group /T=1.0/GSSE/Faces/Points Group /T=1.0/GSSE/Faces/Points/Positions Dataset {53965} /T=1.0/GSSE/Faces/Cartesian/E Dataset {53965} /T=1.0/GSSE/Faces/Cartesian/H Dataset {53965} /T=1.0/GSSE/Connectivity Group

/T=1.0/GSSE/Connectivity/Points Group /T=1.0/GSSE/Connectivity/Points/Positions Dataset {70965}

The file structure provides a grouping of the simulation fields according to their topological relationship within the mesh. From the mathematical point of view, this file structure is self-describing, yielding an abstraction level III. What is left for future work is the replacement of textual descriptions such as psi, A, D, etc. by some ontology-based scheme for describing physical quantities independently of their naming conventions, which even among physicists lead to confusion. The planned introduction of metric units into HDF5 will be useful for this purpose.

Chapter 9

Related Work

Modern semiconductor process steps require a great number of different physical models and numerical methods. Increasingly complex physical formulations are required in today's simulation problems. Flexibility in the definition of models as well as in numerical solving methods is required. But the transformation of a physical model into a computer model is normally a very time-consuming procedure. Therefore a great number of different applications were developed during the last decades. The interoperability between different simulators is becoming more important due to the complex requirements of TCAD modeling [12, 14, 15, 79, 99]. In the last decade many approaches towards implementing a general purpose simulation environment for the solution of partial differential equations have been taken. Most of the tools resulting from these attempts use topological structures which are specialized to a particular discretization scheme. This reduces resource use, but it comes at the cost of greatly diminishing the flexibility of topological traversal. As an example, the finite volume method does not require vertex-face traversal. However, for some reasons it might be advantageous to implement discretization equations based on a mixed finite element/finite volume scheme which requires such traversal operations. A major step towards a more flexible use of topological structures is presented in [93]. The grid algorithm library (GrAL) introduces the first generalized iterator concept [76] based on multi-dimensional data structures. Most of the other environments completely veil the topological information by formalisms such as element matrices [100] and control functions [101]. Some commercial simulation tools, such as FEMLab, accept the input in the form of a final PDE. For this reason calculations which use non-standard traversal mechanisms are cumbersome or impossible to specify.
The following table shows significant steps towards a more flexible application design:

- 1993: usage of fiber bundles for scientific visualization [9]
- 1997: C++ STL
- 2000: generic traversal operations [93]
- 2001: generic graph library [52]
- 2003: separation of traversal and data access [76]
- 2004: usage of fiber bundles for scientific visualization [1]

To deal with all of these issues, our institute has developed different simulation environments, libraries, and applications during the last decade:

- STAP [79] is based on a set of high-speed simulation programs for two- and three-dimensional analysis of interconnect structures. The simulators are based on the finite element method and can be used for highly accurate capacitance extraction, resistance calculation, and transient electric and coupled electro-thermal simulations.

- Minimos-NT [15] is a general-purpose semiconductor device simulator providing steady-state, transient, and small-signal analysis of arbitrary device structures. In addition, Minimos-NT offers mixed-mode device/circuit simulation to embed numerically simulated devices in circuits with compact models.

- FEDOS [44] is a finite-element based simulator for oxidation and diffusion phenomena with integrated mesh adaptation.

9.1 Wafer-State Server

A generic data model suitable for process and device simulation, the Wafer-State-Server [97, ?], is based on an analysis of the problems arising in coupling different kinds of process simulators. The analysis was carried out to allow an efficient data exchange between simulators even when they are based on different native file formats. The data model also defines algorithms to perform geometrical operations as they are used in, e.g., topography simulation. A major drawback of the different types of file formats are the semantic incompatibilities, which are mainly based on oversimplifications of the used data model. These simplifications are mostly tied to one kind of simulator and cannot be used elsewhere. In 2002 it was stated [97] that there is no standard for data transformation between TCAD tools. All the solutions are based on a file format transfer instead of a data format with a corresponding data model. Optimization tasks as they occur in the simulation of semiconductor devices were also investigated, and algorithms for the Wafer-State-Server were introduced to aid complex high-level simulation tasks. One of the main contributions of this work was the concept of a Wafer-State, which defines a state after a simulation step. Here, too, no standard has been established. Another contribution was the extraction of requirements for a common TCAD data model:

- Persistent storage of simulation data. The tools must store the result of a simulation for later reference, and the user does not want to bother with the file format specifics.

- Handling of gridding/meshing steps. The tool needs a standardized way to interact with different meshing tools. It must be possible to switch to another mesh generator with the least possible effort.

- The user needs support to extract topological information in order to manipulate the underlying geometry.

- All data structures and algorithms offered by the data model should be available in several spatial dimensions.
The Wafer-State-Server implements various geometrical and topological libraries with geometrical algorithms, interpolation mechanisms, and consistency checks for simplex objects, especially in three dimensions. The approach developed in this thesis, the GSSE (Chapter 8), is mainly built upon the concepts of the Wafer-State-Server, but with a quite different transformation into an application. First and most importantly, the programming paradigm was shifted from object-oriented programming to

a multi-paradigm style. The GSSE does not use any object-oriented hierarchy (Section 5.2) for the implemented objects. Secondly, all algorithms and traversal mechanisms are built on libraries rather than on specific data structures; two examples are the GSSE traversal mechanisms and the GSSE orthogonal range queries with the corresponding interpolation mechanisms. Next, the data format of the fiber bundle concept is used to develop a generic approach for the projection of data into a single file format with the greatest extensibility. In conclusion, the GSSE includes all parts of the Wafer-State-Server with a multi-paradigm approach and a complete reusability of each module with the greatest flexibility based on the fiber bundle data model.

9.2 Amigos, Fedos

The analytical model interface and general object-oriented solver (Amigos, [102]) was developed due to the requirements for a general partial differential equation solver. A common kernel for physical modeling, parameter and model hierarchies, grid adaptation, numerical solvers, a simple front-end controller as well as geometry and boundary definitions was defined. Amigos can handle different discretization schemes, such as finite elements and finite volumes. An interesting comparison between STAP and Amigos is the implementation time for a new model: 5 days vs. 2 hours. Compared to the GSSE, Amigos is highly flexible and a great achievement towards generic application design. However, the object-oriented programming paradigm used reduces reusability greatly. Also, the GSSE uses another high performance approach and, as presented in Section (gsse performance), it is as fast as STAP. The GSSE adopts the mechanisms of a mathematical specification from Amigos, but not with an extra language which has to be analysed; instead, the GSSE uses an embedded language for this specification purpose. The main advantage is that the complete resulting application can be greatly optimized and benefits even more from each new compiler version.

9.3 Minimos-NT

Minimos-NT [103, 104] is the successor of the well-known Minimos [99] and implements concepts for multi-dimensional device simulation based on the finite volume discretization scheme. Various models were developed during the last decades.

9.4 STAP, SCAP

The STAP and SCAP tools were developed for interconnect analysis in TCAD [105, 106], especially for thermal (STAP) and capacitive (SCAP) problems. The finite element discretization scheme was used to implement electrostatic and magnetostatic approximations in two and three dimensions. Mesh packages for layouts (cite CDF) as well as visualization packages were developed. The main advantage of the SAP tools (STAP and SCAP) is the overall high performance. All algorithms and solving procedures were hand-optimized by C specialists. Compared to the GSSE approach, this tool chain has one main advantage. It is only written in C and

therewith highly portable, and no sophisticated techniques have to be taught or learned to understand or extend the code. The main drawback is the non-existent interoperability of the software modules, besides the fact that great effort was spent writing converter and transformation programs. Another drawback is the missing expressiveness of the resulting code. Highly optimized code parts can barely be extended by other developers.

9.5 Other Related Works

Various research groups have put a lot of effort into the development of libraries for sub-problems occurring in scientific computing. We briefly review the most important library approaches suitable for application design:

FiberLib2: Inspired by Butler's vector bundle data model [24] and OpenDX, "FiberLib2" is a re-implementation of a data model which was originally conceived [1] for visualizing numerical data originating from general relativity (GR). Since the mathematics of GR requires explicit treatment of otherwise implicitly assumed properties of space and time, designing a data model to cover GR improves its genericity.

The Sophus C++ library [48] aims at coordinate-free formulations. This approach allows exchanging different implementations of the same mathematical concepts through components that are organized into four layers: 1. the mesh layer (implements grids for sequential and parallel HPC machines), 2. the scalar field layer (numerical discretization schemes such as finite differences and finite elements, including partial derivatives), 3. the tensor layer (coordinate systems, matrices and vectors, and general differentiation operators), and 4. the application layer (solvers for PDEs). However, this approach suffers from a severe abstraction penalty and requires a code transformation tool [107].

The FEniCS project [108], which is a unified framework for several tasks in the area of scientific computing, is a great step towards generic modules for scientific computing. Up to now most of the modules are in the prototype state. Compared to our approach, the GSSE rather provides building blocks for scientific computing, especially an embedded domain language to express mathematical dependencies directly, not only for finite elements. The GSSE provides modules for various discretization schemes such as finite volumes (FV), finite elements (FE), finite differences (FD), and boundary element methods (BE).
Femster [109] is a class library for finite element calculations. This means that users must provide their own code for assembling global FE matrices. In other words, Femster implements a general finite element API. Our approach tries to circumvent the tedious tasks, such as matrix assembly, with a generic solver interface.

The Generative Matrix Computation Library (GMCL [110]) is a framework based on expression templates, generative C++ programming idioms, and many template metaprogramming facilities, e.g. control structures for static metaprogramming. For our approach, the GMCL can be easily integrated into the matrix handling and the physical quantity space.

The Template Numerical Toolkit (TNT [111]) is a collection of interfaces and reference implementations of numerical objects (matrices) in C++. The toolkit defines interfaces for basic data structures, such as multidimensional arrays and sparse matrices, commonly used in numerical applications. Like the GMCL framework, this approach can be easily combined with our proposed approach.

The Boost Graph Library (BGL [52]) is a generic interface which enables access to a graph's structure, but hides the details of the actual implementation. All libraries which implement this type of interface are interoperable with the BGL generic algorithms. This approach was one of the first in the field of non-trivial data structures with respect to interoperability. The property map concept [52] was introduced and heavily used. Our quantity accessors can be seen as a direct successor of this approach. Unfortunately, the BGL was designed for graphs only. Lower or higher dimensional data structures cannot be handled by

this approach. The BGL is based on a generic approach to the topic of graph handling and traversal with a standardized generic interface. Therewith all different kinds of graph algorithms can be implemented without worrying about the actual implementation. The interface makes it possible for any graph library that implements the interface to interoperate with the BGL. The approach is similar to the C++ STL approach for the interoperability of algorithms and containers.

The Graph Template Library (GTL [112]) is, next to the BGL, an extension to the STL related to graph data structures and algorithms.

The Grid Algorithms Library (GrAL [18]) was one of the first contributions to the unification of data structures of arbitrary dimension for the field of scientific computing. A common interface for grids with a dimensionally and topologically independent way of access and traversal was designed. Mathematical concepts for topological spaces were introduced and applied to grids. Applications for the field of solving PDEs were developed as well as mechanisms for distributed grids and meshes. Concepts for an abstract way of accessing data similar to the property map were also introduced and demonstrated in applications.

The Computational Geometry Algorithms Library (CGAL [113]) provides components for several geometrical algorithms and data structures in a generic library approach. The main contribution of CGAL is the concept of a kernel [114], which can be parameterized in several ways related to the robustness of the operations. CGAL implements generic classes and procedures for geometric computing with generic programming techniques. It is based on a three-layer structure, where the kernel layer consists of constant-size non-modifiable geometric primitive objects and operations on these objects. The second layer consists of a collection of basic geometric data structures and algorithms.
This layer includes polygons, half-edge data structures, polyhedral surfaces, topological maps, planar maps, arrangements of curves, triangulations, convex hulls, shapes, optimization algorithms, dynamic point sets for geometric queries, and multidimensional search trees. The third layer consists of non-geometric support facilities, such as support for number types and circulators.

deal.II [115] provides a framework for finite element methods and adaptive refinement for finite elements. It uses modern programming techniques of C++ and enables the treatment of a variety of finite element schemes in one, two, or three spatial dimensions, as well as time-dependent problems. Therewith modern finite element algorithms using, among other aspects, sophisticated error estimators and adaptive meshes can be developed easily. This approach is specialized to finite element discretization schemes and cannot be used easily for other schemes, e.g. finite volumes or finite differences.

Prophet [101] is an environment for the solution of PDEs. It introduces four different levels of abstraction. The first and base library layer consists of a database for models of coefficients and material parameters, macros for expressions, grid routines, and a linear solver. The PDE layer consists of an assembly control, discretizations, and modeling modules. The third layer consists of different modules, for example modules for solving. The fourth layer is the user layer, which consists of an input parser and the visualization mechanism. Prophet separates the geometric information from the physical information; the user should not need to know about the spatial discretization. Prophet was originally developed for semiconductor process simulation and was enhanced for device simulation afterwards.

ExPDE [116] collects efficient high-performance libraries for PDEs using the C++ technique of expression templates. These libraries are an important step towards library-centric application design.
But most of these libraries were not developed with interoperability as a necessary constraint. As a consequence, additional code has to be introduced which slows the development process down and impedes the execution speed of the final

application. An analysis has revealed that, up to now, no related work can be used directly: none of these libraries has been developed with emphasis on interoperability. This issue complicates the transition from one library to another. Therefore, our approach (Section 8.1) introduces a common layer with data structure definitions and access routines, within which all of these libraries can be used. With the generic programming paradigm and an implementation based on C++ templates, the abstraction penalty can be minimized (Section 11).

Chapter 10

Generic Application Design

This chapter introduces various applications developed on the basis of the GSSE and the corresponding concepts. Each of these examples uses the automatic model quantity evaluation and communicates over the fiber bundle F5 module.

10.1 Generic Libraries for TCAD

Generic library design deals with the conceptual categorization of computational domains, the reduction of algorithms to their minimal conceptual requirements, and strict performance guarantees. The benefits of this approach are the re-usability and the orthogonality of the resulting software. Generic libraries were pioneered by the Standard Template Library (STL) in C++, but both software and language technology have long gone beyond the STL. We compiled libraries which contain functionality meeting four main criteria [72, 74, 117, 118]. First, the library should be complete, so that all applications can be written exclusively using this library (together with standard libraries). Indeed, completeness increases the usability enormously, because no components have to be added, while existing components can be adapted. Second, the parts of the library should be usable for a broad range of different applications [118]. Each of the software components is written not only for a very specific purpose but for a manifold of problems. Third, the interoperability of the library must not be affected by its completeness. Even if the complete library can be used by itself, it has to provide standardized interfaces which guarantee compatibility with data structures which have not been foreseen in the initial design. Fourth, meta-programming techniques are used to address performance issues [60]. We thereby show that the benefits of a generic library can be combined with high overall performance when using modern programming paradigms and techniques.

10.2 Electrostatic Simulation

Various areas within TCAD can be treated with electrostatic simulations, e.g. interconnect examples. The following equations are used to analyze a potential distribution as well as to extract the capacitance of a given domain. The equation used in this section is a generic Poisson equation:

−div(ε grad(ϕ)) = f  in Ω,    ϕ = 0  on Σ    (10.1)

where Ω is a bounded open domain with boundary Σ and f is a given function. For the electrostatic problem, which is used in various sections of this work, the following equation is used:

div(−ε grad(ϕ)) = 0    (10.2)

and for Dirichlet boundary conditions:

∀x_i ∈ ∂Ω_i : ϕ(x_i) = V_i    (10.3)

where ∂Ω_i denotes the surface of conductor i. Due to the finite extension of the simulation area, homogeneous Neumann boundary conditions are used, expressed by the potential:

∀x ∈ ∂Ω_a : n · ε grad(ϕ) = 0    (10.4)

Based on this equation and the corresponding boundary conditions the following Galerkin expression is obtained:

∑_{j=1}^{N} ϕ_j ∫_Ω w_i div(ε grad(n_j)) dΩ +    (10.5)

∑_{j=1}^{N} ϕ_j ∫_{∂Ω} w̄_i n · ε grad(n_j) dA −    (10.6)

∫_{∂Ω} w̄_i σ dA = 0    (10.7)

If the weighting functions are differentiable and the operator L(u) is simple, the domain integrals can be transformed by Green's theorem. The differentiability requirements on the basis functions are thereby reduced, at the cost of losing information related to the weighting functions. A broader class of shape functions can therefore be used, and the resulting formulation is called a weak formulation.

− ∑_{j=1}^{N} ϕ_j ∫_Ω grad(w_i) · ε grad(n_j) dΩ + ∑_{j=1}^{N} ϕ_j ∫_{∂Ω} w_i n · ε grad(n_j) dA +    (10.8)

∑_{j=1}^{N} ϕ_j ∫_{∂Ω} w̄_i n · ε grad(n_j) dA − ∫_{∂Ω} w̄_i σ dA = 0    (10.9)

Based on the choice of the weighting functions of the boundary residuum as w̄_i = −w_i, both boundary integrals with the conormal derivative vanish. Using Galerkin's approach of choosing the weighting functions equal to the ansatz functions, w_i = n_i, the following final expression is obtained:

∑_{j=1}^{N} ϕ_j ∫_Ω grad(n_i) · ε grad(n_j) dΩ = ∫_{∂Ω} n_i σ dA    (10.10)

where the following ansatz for the charge distribution σ is used:

σ = ∑_{j=1}^{N} σ_j n_j    (10.11)

which finally yields:

∑_{j=1}^{N} ϕ_j ∫_Ω grad(n_i) · ε grad(n_j) dΩ =    (10.12)

∑_{j=1}^{N} σ_j ∫_{∂Ω} n_i n_j dA    (10.13)

which can be written in matrix form as:

[A]{ϕ} = {b}    (10.14)

where the matrix [A] is built by an assembly procedure in which the local indices are transferred to the global indices of the vertices:

[A] = ∑_{m=1}^{M} [A_L^{e_m}]    (10.15)

10.2.1 Transformation into the GSSE: Finite Volume

In order to show the applicability of the generic discretization library for elliptic problems such as the Laplace equation, the given equation is rendered into the following source code. A generic Poisson equation as well as general self-adjoint equations are described by:

L_ell^fv(ϕ) ≡ ∑_{v→e} (Δ_{e→v} ϕ) (A/d) ε = V ρ

With the next code snippet the transformation of the equation discretized by finite volumes is presented:

eq = sum[ sum(0.0, _e)[ pot * orient(_1, _e) ] * A / d * eps ] - V * rho

10.2.2 Transformation into the GSSE: Finite Elements

For the finite element approach Galerkin's method is used, see Section 3.3. To obtain the solution, several tasks have to be accomplished automatically by the environment:

• Computation of the element stiffness matrices G and the element load vectors b.

• Assembly of the global stiffness matrix A and load vector b.

• Solution of the algebraic equation system Ax = b.

The corresponding algorithms were introduced in Section 6.3. From the G matrix we obtain the local stiffness matrix by using pre-integrated stencil matrices which are covered in the stencil function object. The neighborhood of the cell is traversed and all adjacent elements are obtained automatically. In the case of first-order FEM on a triangle this contains exactly 3 elements, namely the matrix entry indices of the vertices. The following source code presents our approach for a generic finite element application for the Poisson equation.

Listing 10.0:

for (cit = (*segit).cell_begin(); cit != (*segit).cell_end(); ++cit)
{
   array_type G       = g_matrix(points);
   array_type stencil = stiffness(G);
   stencil = fem_neighborhood(*segit, *cit);
   assemble_stencil(stencil, matrix_interface);
}
matrix_interface.solve();

The final line invokes the solution process through the generic solver interface. The solver is switched automatically to ensure efficient matrix operations for the discretization method used. GSSE contains precalculated matrices for simplices (1D, 2D, 3D) and cuboids (1D, 2D, 3D).

10.2.3 Results

This section presents various examples within the area of TCAD using the given set of discretized equations.

Interconnect Structure

Down-scaling of integrated circuits to the deep sub-micron regime and beyond increases the influence of interconnects on circuit behavior drastically. Parasitic effects are becoming more and more important as devices get faster and line widths smaller. These effects become the limiting factor for further improve- ments of circuit speed.

Figure 10.1: Left: the given interconnect structure modeled by the GSSE mesh engine (ref). Right: the current density depicted on the surface of the interconnect line.

Figure 10.2: Left: the current density depicted by a vector field visualization. Right: ??

Capacitance Measurement

The following structure was created by the automatic layout conversion tool (ref). No explicit measurements can be given, but the resulting capacitance values with different types of finite elements (p-refinement) and different mesh refinement levels (h-refinement) are given.

Figure 10.3: Left: a given structure suitable for capacitance measurement, modeled by the Layout conversion tool (ref). Right: the iso-surface potential distribution within the oxide.

10.3 Wave Equation

The modeling of the wave equation with the GSSE, based on the introduced concepts for the finite difference method and in particular on the concepts for generic discretization, demonstrates the close correspondence between the physical field model and the final implementation. An implementation based on the GSSE is given in this section. Using the insights from Section 3.4.1, an efficient way of specifying the Maxwell equations is given by the corresponding mapping of the cochains onto the chain elements. The staggered grids with the dual complex are mapped onto elements of different dimension, such as edges and faces, for the representation of the electric and the magnetic field. A transversal magnetic mode is presented:

E_z^{n+1}(i,j) = E_z^n(i,j) + (Δt/Δx) [H_y^{n+1/2}(i+1/2, j) − H_y^{n+1/2}(i−1/2, j)]
                            − (Δt/Δy) [H_x^{n+1/2}(i, j+1/2) − H_x^{n+1/2}(i, j−1/2)]

With the transfer of all index calculations to topological iteration and traversal mechanisms, e.g. the electric field quantity to edges E_e and the magnetic field quantity to facets H_f, the formula can be rewritten as:

E_e − E_e^old = Δt · l_e · Δ_{e→f} [ H_f / A_f ]

where e → f denotes the traversal of all facets incident to the edge, and A_f represents the area of the corresponding facet. The evaluation of all quantities on their corresponding dimensional and topological objects is carried out automatically. The final source code is presented in the following snippet; the minimal requirements to specify such complex equations can be seen clearly:

equation_E += t * sum(_e)[ H ]

Figure 10.4 presents a simulation output for a three-dimensional calculation with a simple harmonically oscillating source in the x-y plane.

Figure 10.4: Wave equation with a harmonic oscillating source in the x-y plane at two different time steps.

10.4 Device Simulation

Semiconductor devices have become a ubiquitous commodity, and users have come to expect a constant increase of device performance at higher integration densities and falling prices. To most efficiently eliminate the technological obstacles encountered, the international technology roadmap for semiconductors (ITRS) [119] has been jointly developed by the semiconductor industry. It is this quest for ever decreasing device dimensions and faster switching speeds that results in ever growing requirements on the employed simulation methodology. New, more detailed and more sophisticated models are required to simulate not only the operation but also the manufacture of the devices. While computer performance is steadily increasing, the additional complexity of these simulation models easily outgrows this gain in computational power. It is therefore of utmost importance to employ the latest techniques of software development to obtain high performance and thereby ensure adequate simulation times even for complex problems.

10.4.1 The Equations

Boltzmann's equation for electron transport in semiconductors, as given in Equation 10.16, is the basis of many calculations for device simulation.

∂f/∂t + v⃗ · grad_r(f) + F⃗ · grad_k(f) = ∂f/∂t |_collisions    (10.16)

Here f is the distribution function and v⃗ the velocity of the charge carriers, while F⃗ denotes the force of an electric field on these particles. The unavailability of the computational power required for the solution of Boltzmann's equation resulted in the development of simplifications that can be used to calculate several important quantities. One of these simplifications is the drift-diffusion model, which can be derived from Equation 10.16 by applying the method of moments [78]. While it only yields the electron and hole concentrations and the current, and is unable to incorporate non-local effects, it has been used very successfully in semiconductor simulation. Equation 10.17 shows the resulting current relations. These equations are solved self-consistently with Poisson's equation, given in Equation 10.18.

div(J_p) = div(q p μ_p grad(Ψ) − q D_p grad(p)) = −q R

div(J_n) = div(q n μ_n grad(Ψ) + q D_n grad(n)) = q R    (10.17)

div(ε grad(Ψ)) = −q (p − n + C)    (10.18)

To this end the equations are discretized using the Scharfetter-Gummel [120] scheme, resulting in a non-linear equation system of the form

J_{n,ij} = (q μ_n U_th / d_ij) [ n_j B(Λ_ij) − n_i B(−Λ_ij) ]    (10.19)

Λ_ij = (Ψ_j − Ψ_i) / U_th,    B(x) = x / (e^x − 1)    (10.20)

The discretization of the given problem using finite volumes reads:

div(x) ≈ (1/V) ∑_{v→e} A x,    grad(x) ≈ (1/d) Δ_{e→v} x    (10.21)

The formulation so obtained can be implemented in virtually any programming language, but, as has been explained previously, a solution offering high performance is greatly desirable. In order to make maintenance of the code as easy as possible and to achieve a maximum of flexibility it is important to keep the code expressive. The great question is how to achieve both of these seemingly contradictory goals at the same time.

10.4.2 Transformation into the GSSE

In the following, the application of functional programming to achieve these goals of high expressiveness and high performance is outlined using an example from the field of semiconductor simulation. Calculation results based on the presented techniques are provided as proof of their feasibility. The transformation of the presented equations into highly expressive C++ code under highest performance requirements is presented below. The source code of the discretized equations is shown in the accompanying code snippet.

// Poisson equation
equation_pot = ( sum()[ diff(ZERO)[ pot_quan ] * area / dist ]
               + (p - n + C) * (vol * q / (eps0 * epsr)) )(*vit);

// Continuity equation for electrons
equation_n = ( sum()[ diff( -n_quan * Bern( diff()[ pot_quan] / U_th),
                            -n_quan * Bern( diff()[-pot_quan] / U_th))
                      * q * mu_n * U_th * area / dist ] )(*vit);

// Continuity equation for holes
equation_p = ( sum()[ diff( p_quan * Bern( diff()[-pot_quan] / U_th),
                            p_quan * Bern( diff()[ pot_quan] / U_th))
                      * q * mu_p * U_th * area / dist ] )(*vit);

As can be observed, the above implementation makes no assumptions about the dimension of the problem. The concept of mathematical equations is modeled by directly mapping the mathematical expressions to function objects. The Bernoulli function, given in Equation 10.20, is mapped to Bern, while pot_quan is a function object providing access to a quantity. The discretized differential operators are easily recognized as the diff and sum constructs. The whole expression is examined at compile time to construct an optimized means of obtaining the needed data. Thereby a link between the still continuous formulation of the equation and a specific tessellation of the simulation domain is formed. As a consequence the implementation makes no assumptions about the dimension or topology of the problem and is therefore suitable for any dimension and any topology. This dimension-independent formulation is only made possible by the combination of the separation of algorithms from data sources by the use of iterators and the employment of function objects. Because the specified equations are evaluated at compile time by means of meta-programming, the associated calculations can be highly optimized by the compiler, which ensures superb runtime performance. The combination of high expressiveness at such performance levels is only possible by using all the facilities provided by C++. Currently no other language offers sufficient support for all the necessary programming concepts [121] to enable this high level of abstraction at this runtime speed.

10.4.3 Results

To demonstrate that the proposed computational scheme is indeed operational, several examples are presented.

Two-Dimensional PN-Diode

First, a silicon PN-diode is simulated based on the given code. The complete application uses no more than 200 lines of source code for all different dimensions and topological input structures. Figure 10.5 depicts the input structure for two dimensions.

Figure 10.5: Domain for a two-dimensional PN-junction diode. The P-region lies above the N-region; the Dirichlet contacts Γ_D form the top and bottom boundaries.

Two neighboring regions of different doping, one doped with acceptors, the other with donors, result in the well-known rectifying PN diode. PN diodes are among the simplest devices in microelectronics.

We show the simulation results for a two-dimensional diode that makes use of the presented source code. The code is at the core of our simulation and needs no variations to accommodate different dimensions and different mesh types. The concentration of acceptors has a maximum of N_A = 10^16 cm^−3, the donor concentration has a maximum of N_D = 10^15 cm^−3. The net doping profile (N_D − N_A) of the device presented in Figure 10.5 is illustrated in Figure 10.6. The concentration of acceptors is higher to make sure that the depletion region extends further into the larger, donor-doped region.

Figure 10.6: Net doping concentration C of the two-dimensional PN diode. Donors are given in red, acceptors in blue.

The shape of the initial barrier present without external bias and for reverse mode can be seen in Figure 10.7. When operated in reverse mode the current is blocked, as illustrated by the increased potential barrier; in forward mode the barrier is lowered, as shown in Figure 10.8. The visualization of the calculation is available in real time, making it possible to observe the evolution even of a non-linear solution process. GSSE interfaces for real-time analysis are available, e.g. by an extension to OpenDX [9], and prove to be invaluable for the adjustment of simulation parameters.

Figure 10.7: Two-dimensional diode with potential distribution for (left) equilibrium and (right) reverse mode.

Figure 10.8: Two-dimensional diode with potential distribution for forward mode.

The space charge distribution for equilibrium and reverse mode can be seen in Figure 10.9, and for the forward modes in Figure 10.10.

Figure 10.9: Two-dimensional diode with space charge distribution for (left) equilibrium and (right) reverse mode.

Figure 10.10: Two-dimensional diode with space charge distribution for (left) forward and (right) heavy forward mode.

By applying various post-processing algorithms, the solved potential and the corresponding dopant distribution are used to calculate the energy levels E_c, E_v, and E_f. The given expressions are evaluated at compile time to obtain high performance even for the post-processing step. All quantities are equipped with data accessors so that they are evaluated automatically at the corresponding locality:

for (vertex_iterator vit = (*segit).vertex_begin(); vit != (*segit).vertex_end(); ++vit)
{
   ( rho = (n - nD - p + nA) * (-q) )(*vit);
   ( Ec  = -q * pot )(*vit);
   ( Ev  = -q * pot - Eg )(*vit);
   ( if_(n > p)
        [ Ef = (Ec + Ev)/2 + q * U_th * Log(n / ni) ]
     .else_
        [ Ef = (Ec + Ev)/2 - q * U_th * Log(p / ni) ]
   )(*vit);
}

Figure 10.11 shows the shape of the band structure in forward and reverse direction, respectively. In forward direction the barrier between the two doped regions is lowered, allowing current to flow (Figure 10.11, left).


Figure 10.11: Bands in forward (left) and reverse (right) operation.

Bipolar Transistor

The leftmost figure shows the initial potential, while the rightmost depicts the final solution. The center image shows an intermediate result that has not yet fully converged.

Figure 10.12: Domain for a two-dimensional PNP-transistor: emitter and base contacts at the top, collector contact at the bottom, with a P-region, an N-region, and a P-region stacked in between.

Figure 10.13: Two-dimensional PNP-transistor with potential distribution for forward mode.

Figure 10.14: Two-dimensional PNP-transistor with potential distribution for forward mode.

Figure 10.17 shows the shape of the band structure in forward and reverse direction, respectively. In forward direction the barrier between the two doped regions is lowered, allowing current to flow (Figure 10.17, left).

Figure 10.15: Two-dimensional PNP-transistor with space charge distribution for (left) equilibrium and (right) reverse mode.

Figure 10.16: Two-dimensional PNP-transistor with space charge distribution for (left) forward and (right) heavy forward mode.


Figure 10.17: Bands in forward (left) and reverse (right) operation.

10.5 Biological Simulation

Electric phenomena are common in biological organisms, such as the discharges within the nervous system, but usually remain at a small scale. In some organisms, however, the electric phenomena take a more prominent role. Some species of fish, such as Gnathonemus petersii from the family of Mormyridae [122, 123], use them for the detection of their prey and for communication among their own kind. The Mormyridae inhabit fresh water environments of Africa, especially rivers. The animals remain dormant during the day and begin their activity only after the sun has set. They feed on small prey within the muddy benthic substrate of their habitat and on plankton that dwells near the river bed. The nose region of G. petersii is modified to a chin- or snout-like form. The shape representing the fish in the simulations is given in Figure 10.18. The snout possesses very sensitive receptor cells which help to find small prey in the mud. These animals are very interesting for biologists, firstly because of their electric sense and secondly because of their behavior: the play behavior they exhibit is unique within the bony fishes.

Figure 10.18: A triangulated model of an electrically active fish. The electrically active organ is marked in red.

The up to 30 cm long fish actively generates electric pulses with an organ located near its tail fin (marked in Figure 10.18). The distortions of the generated field are then picked up by the fish's receptors in its nose region. It can thereby distinguish between inanimate objects, prey, predators, and members of its own kind. The goal of the simulation is to approximate the electric field generated by the fish and to determine how the electric field changes at the detecting cells when different objects are introduced in the vicinity of the fish. This is difficult to determine reliably with live specimens.

10.5.1 The Equations

The electrical phenomena due to the fish are modeled by a relaxation equation, which can be directly derived from Maxwell's equations in the quasi-electrostatic case. It takes the following shape:

∂t div (ε grad (Ψ)) + div (γ grad (Ψ)) = P (10.22)

Here Ψ is the electrostatic potential, while ε and γ describe the permittivity and the conductivity, respectively. The separation of charge which is actively performed by the fish is modeled by the inhomogeneity P. Equation 10.22 has to be solved numerically, because a closed-form analytical solution does not exist for the general case.

10.5.2 Transformation into the GSSE

Equation 10.22 is discretized using the finite volume discretization scheme and is then implemented using the facilities offered by the GSSE. The high semantic level of the specification made possible by the GSSE is illustrated by the following snippet of code:

Figure 10.19: Different geometries and meshes.

equation = sum(_v)[ sum(_e)[ lineqn(pot, psi) ] * (area / dist) * (gamma * deltat + eps) ] + vol * ((P * deltat) + rho)

The generic traversal mechanism encapsulates the corresponding dimensional properties and grid types within the sum<> traversal operations. While this satisfies the topological requirements of the finite volume discretization method, additional functionality is required from the framework to correctly assemble the equation system. Automatic data accessors for gamma, eps, and rho ensure that the correct locality of the quantities can be determined. Functional programming in C++ uses local scopes, which appear in the code snippets as the [ ] brackets. Due to these local scopes, the outer function objects have to make quantities explicitly available by stating _v and _e. The area and dist quantities are used to calculate the values related to the finite volume scheme. It has to be mentioned that the given source code can be used for several dimensions and for different types of underlying grids.

Figure 10.20: Simulation Setting

The simulation domain is divided into several parts as shown in Figure 10.20. The parts include the fish itself, its skin, which serves as insulation, the water the fish lives in, and an object that represents either an inanimate object or prey. The parameters of each part can be adjusted separately.

While the shown specification of the simulation retains a lot of high-level semantics and yields a highly performant executable, a simple interface also needs to be provided to make the simulation accessible to a broad range of people.

10.5.3 Results

For a possible setting of the simulation domain, the potential distribution given in Figure 10.21 is obtained. Various mesh densities can be used for the unstructured discretization of the domain.

Figure 10.21: Results of a simulation performed with a very fine mesh.

The realization of this example is implemented as a web application, which can be accessed under http://webapplication.gsse.at/ to give the reader the opportunity for experimentation. The user can choose from three different geometric constellations of the fish and an object (thing), as shown in Figure 10.19. Three different mesh resolutions are available, with approximate simulation times given for each of them.

10.6 Diffusion Simulation

The diffusion simulation can be easily combined with the wave equation simulation.

10.6.1 The Equations

Differential Equation:

div(grad(f)) + α ∂_t f = 0 on Ω    (10.23)

Boundary Condition:

∂_n f = 0 on ∂Ω_1    (10.24)

f = B(x,y,z) on ∂Ω2 (10.25)

A diffusion equation of parabolic type can be assembled in analogy to the Laplace equation. The discretized formulation of such an equation looks remarkably similar to the elliptic equation, but includes an additional time derivative:

L_para^fv(f) ≡ ∑_{v→e} (Δ_{e→v} f) (A/d) + (f − f_old)/Δt = 0

where Δt denotes the time step and f_old stands for the solution function f at the last time step. A proper functional formulation of the operator can be given by:

eq = sum[ sum(0.0, _e)[ f(_1) * orient(_1, _e) ] * A / d ] + (f(_1) - f_old(_1)) / dt

This scheme is called the backward Euler [124] discretization of the diffusion problem. The forward Euler discretization [124], which avoids the solution of an equation system but is unstable if the time steps are too large, can be formulated analogously.

L_para^fv(f) ≡ ∑_{v→e} (Δ_{e→v} f_old) (A/d) + (f − f_old)/Δt = 0

The corresponding source code is given by:

eq = sum[ sum(0.0, _e)[ f_old(_1) * orient(_1, _e) ] * A / d ] + (f(_1) - f_old(_1)) / dt

10.6.2 Transformation into the GSSE

Backward Euler

if (domain.get_type(*vit) == domaintype_interior_neumann)
{
   equation = ( sum(_v)[ sum(_e)[ lequ(f, f_entry) ] * (area / dist) ]
              + lequ(0.0, f_entry) * vol / time_step )(*vit);
}
else if (domain.get_type(*vit) == domaintype_dirichlet)
{
   equation = ( lequ(f, f_entry) - f_bnd )(*vit);
}

Forward Euler

if (domain.get_type(*vit) == domaintype_interior_neumann)
{
   equation = ( sum(_v)[ sum(_e)[ f_old ] * (area / dist) ]
              + (lequ(f, f_entry) - f_old) * vol / time_step )(*vit);

   // Update the values
   double residuum  = equation();
   double dependent = equation[f_entry(*vit)];
   f(*vit) -= residuum / dependent;
}
else if (domain.get_type(*vit) == domaintype_dirichlet)
{
   equation = f_bnd(*vit);
}

Crank-Nicolson

if (domain.get_type(*vit) == domaintype_interior_neumann)
{
   equation = ( sum(_v)[ sum(_e)[ (cn_factor * lequ(0.0, f_entry)) + f ] * (area / dist) ]
              + lequ(0.0, f_entry) * vol / time_step )(*vit);
}
else if (domain.get_type(*vit) == domaintype_dirichlet)
{
   equation = ( lequ(f, f_entry) - f_bnd )(*vit);
}

10.6.3 Results

Figure 10.22: Two-dimensional diffusion simulation for structured and unstructured meshes with the forward Euler time stepping.

Chapter 11

Performance Analysis and Optimization

The performance of computer systems with respect to a given set of applications depends on numerous factors and cannot be attributed only to the speed of the central processing unit (CPU). Among the most important factors is the connection of the CPU to the computer's main memory [125]. Naturally, the focus of the evolution of CPUs was to increase their processing speed. This goal was greatly aided, and in fact only made possible, by the continuous downscaling of the dimensions of the devices they are built from. This downscaling of the densely packed logic found in CPUs made it possible to attain ever higher clock speeds, thereby increasing their maximum performance. The main effect for random access memory (RAM) modules, on the other hand, was to increase their size, again by an ever growing level of integration. The trend of increasing clock speeds, especially of CPUs, has already led to the problem that CPUs require data at a faster rate than memories are able to supply it. This has resulted in the development of memory hierarchies introducing several levels of caches and instruction pipelines, thereby increasing the overall performance of the systems. From a software point of view, new paradigms such as OO, GP, FP, and MP are now available and can even be used together. The non-trivial and highly complex scenario of scientific computing lends itself exceptionally well to the combination of these different programming paradigms, each applied to its advantage. The features of meta-programming offer the embedding of domain-specific terms and mechanisms directly into the language as well as compile-time algorithms to obtain an optimal run-time. The generic programming paradigm establishes homogeneous interfaces between algorithms and data structures without subtyping polymorphism. Functional programming eases the specification of equations and offers extendable expressions while retaining the functional dependence of formulae due to higher-order functions.
The language of choice is C++ due to the high execution speed of the compiled code [87] and its support for multiple paradigms: object-oriented, generic (GP), functional (FP [68, 95, 126]), and meta-programming (MP [60]). It has also been shown that C++ supports the development of efficient and high-performance libraries capable of handling complex topics such as graph treatment [52, 63, 121], unlike pure functional programming languages like Haskell or ML. The C and Fortran languages do not offer techniques for a variable degree of optimization, such as controlled loop unrolling [127, 128]. Such tasks are left to the compiler. Therefore, libraries have to use special techniques such as those practiced by ATLAS [129] or have to rely on manually tuned code elements which have been assembled by domain experts or the vendors of the microprocessor architecture used.

Thereby, a strong dependence on the vendor of the microprocessor is incurred. In short, these methods hugely complicate the development process of high performance libraries. The basic means of achieving high performance in C++ can be summarized as follows:

• Parametric polymorphism: with the compiler’s data-type-based function selection at compile time, a global optimization with inlined function blocks and inter-procedure/inter-library optimization is possible.

• Lightweight object optimization [88]: allocation to registers is possible with the reduction of structures to their basic parts.
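A minimal sketch of such a lightweight object (hypothetical, not from the GSSE sources): a plain aggregate without virtual functions or heap allocation, whose operations the compiler can inline and keep in registers, and whose overload is selected purely from the static type:

```cpp
// A lightweight value class: no virtual functions, no heap allocation.
// The compiler can reduce such objects to their basic parts (three
// doubles) and keep them in registers across inlined operations.
struct Point3
{
   double x, y, z;
};

inline Point3 operator+(const Point3& a, const Point3& b)
{
   Point3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
   return r;
}

// Data-type-based function selection at compile time: the overload is
// chosen from the static type, so no run-time dispatch is involved.
template<typename T>
inline T norm1(const T& v)                  // scalar case: |v|
{
   return v < T(0) ? -v : v;
}

inline double norm1(const Point3& v)        // vector case: |x|+|y|+|z|
{
   return norm1(v.x) + norm1(v.y) + norm1(v.z);
}
```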

The unique way parametric polymorphism is realized in C++ [64, 130] makes it possible to write compile-time libraries that enable an optimization across the boundaries of the libraries, thereby reaching new performance optima at the expense of increased compile time. This has already been demonstrated in the field of numerical analysis, yielding figures comparable to Fortran [69, 87, 131], the previously undisputed candidate for this kind of calculation. Although functional and generic programming support parallelization significantly, the currently used computer technology restricts this effort due to its architecture. A typical representative of a 0-cell complex is the topological structure of a simple array.
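The controlled loop unrolling that plain C or Fortran sources cannot express can be written directly with templates; in the following sketch (an illustration, not ATLAS code) the unrolling factor is a compile-time parameter:

```cpp
// Controlled loop unrolling through recursive template instantiation:
// the factor N is chosen at compile time, and the compiler flattens the
// recursion into N straight-line multiply-add operations.
template<int N>
struct Unroll
{
   static double dot(const double* a, const double* b)
   {
      return a[N - 1] * b[N - 1] + Unroll<N - 1>::dot(a, b);
   }
};

template<>
struct Unroll<0>   // recursion anchor: empty product contributes nothing
{
   static double dot(const double*, const double*) { return 0.0; }
};
```

A call such as Unroll<4>::dot(a, b) is fully inlined, which is exactly the variable degree of optimization referred to above.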
The C++ STL containers such as vector or list are representatives and are schematically depicted in Figure 11.1. The points represent the cells on which data values are stored, whereas Figure 11.2 presents a 1-cell complex type with different dimensional traversal mechanisms. The arrows highlight the actual topological traversal mechanisms.

Figure 11.1: Topological traversal of vertices.

Based on these traversal mechanisms, a transformation of a simple discretized equation to GSSE code is straightforward. Here we present an example of the Poisson equation with a finite volume discretization scheme [40, 78]. This equation can be formulated as:

Poisson equation with our DSEL

165 Figure 11.2: Topological traversal of cells and incident vertices of a cell.

equation = ( sum [ diff [pot] * A/d * sum[epsilon] / 2 ] - q )(vertex);

The term pot represents the distributed potential, A the Voronoi area, d the distance of two points from each other, q the charge, and epsilon the permittivity. It is important to stress that all quantities have to be evaluated in their proper context (data locality), that is pot, epsilon, and q on vertices and A, d on the corresponding edges. To show the maximum achievable performance, different computer systems are summarized in the following table, listing the CPU types, amount of RAM, and the compilers used. A balance factor (BF) is evaluated by dividing the maximum MFLOPS measured by ATLAS [129] by the maximum memory bandwidth (MB) in megabytes per second measured with STREAM [125]. This factor states the relative cost of arithmetic calculations compared to memory access.

Type     Speed        RAM    GCC    MFLOPS   MB/s    Balance factor
P4       1 x 2.8 GHz  2 GB   4.0.3  2310.9   3045.8  0.7
AMD64    1 x 2.2 GHz  2 GB   3.4.4  3543.0   2667.7  1.2
IBMP655  8 x 1.5 GHz  64 GB  4.0.1  16361.7  4830.4  3.4
G5       4 x 2.5 GHz  8 GB   4.0.0  24434.0  3213.3  7.6

11.1 Topological Traversal

First, we will focus on the advantages of a generic topology library that includes all classes and algorithms for the most important topological cell complex types such as simplex and cuboid types. This library is based on all the requirements extracted from the applications developed at our institute. It greatly reduces the application development effort and the information necessary to construct applications. It accomplishes this by storing topological data behind a formal interface specification [72]. It provides incidence traversal and orientation operations with a generic interface similar to the C++ STL. Therewith algorithms and complete discretization schemes can be specified orthogonally, without specifying the actual topological structure. Problems which are hard to adapt to existing libraries due to the restrictions of the underlying data structures can be handled easily using this library. To investigate the abstraction penalty of our generic code we analyze a simple C implementation without

any generic overhead compared to our topological environment. The next code snippet presents the C source code for a three-dimensional 0-cell complex (cube):

C approach for 0-cell traversal

for (i3 = 0; i3 < sized3; i3++)
   for (i2 = 0; i2 < sized2; i2++)
      for (i1 = 0; i1 < sized1; i1++)
      {
         // operations
         // use i1, i2, i3
      }

Generic approach for 0-cell traversal

vit1 = container.vertex_begin();
vit2 = container.vertex_end();

for (; vit1 != vit2; ++vit1)
{
   // operations
   // use *vit1
}
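The two snippets above can be condensed into a small self-contained sketch (identifiers are illustrative, not from the GSSE sources); both routines perform the same summation, once with explicit indices and once through STL-style iterators:

```cpp
#include <vector>

// C style: the 0-cell complex is addressed through an explicit index.
inline long sum_indexed(const std::vector<long>& v)
{
   long s = 0;
   for (std::size_t i = 0; i < v.size(); ++i)
      s += v[i];
   return s;
}

// Generic style: the same traversal through iterators; after inlining,
// the compiler emits essentially the same loop as above.
inline long sum_iterated(const std::vector<long>& v)
{
   long s = 0;
   for (std::vector<long>::const_iterator it = v.begin(); it != v.end(); ++it)
      s += *it;
   return s;
}
```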

Figure 11.3 presents results obtained on four computer systems. As can be seen, the performance is comparable on all different systems, without the highly generic code incurring an abstraction penalty compared to a simple C implementation.

[Four log-log plots: iterations/s versus number of points, comparing C code and DSEL.]

Figure 11.3: 0-cell traversal on the following computer systems (from left to right): P4, AMD64, G5, and IBM.

To provide a more complex example, we compare the traversal mechanisms of the BGL data structures to our own approach. The following code snippets present the corresponding implementations.

BGL incidence traversal

typedef adjacency_list<> Graph;
Graph gr(number_of_points);
// edge initialization
graph_traits<Graph>::edge_iterator ei, ei_end;

for (tie(ei, ei_end) = edges(gr); ei != ei_end; ++ei)
{
   test_source1 += source(*ei, gr);
   test_source2 += target(*ei, gr);
}

Generic approach for incidence traversal

typedef topology_t t_t;
t_t container(number_of_points);
// cell initialization
typedef topology_traits<t_t>::cell_on_vertex_iterator covit_t;
covit_t covit = covit_t(*container.vertex_begin());
while (covit.valid())
{
   test_source1 += source(*covit, container);
   test_source2 += target(*covit, container);
   ++covit;
}
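To make the traversal pattern concrete, the following self-contained sketch replays the benchmark loop on a deliberately simple hand-rolled incidence structure (neither the BGL nor the GSSE topology library; all names are illustrative):

```cpp
#include <vector>
#include <utility>

// An edge is stored as the pair of its endpoint indices.
typedef std::pair<int, int> Edge;

// Minimal incidence structure: for each vertex, the list of edges
// incident to it.
struct SimpleGraph
{
   std::vector<std::vector<Edge> > edges_on_vertex;
};

// The benchmark loop above in miniature: visit every edge incident to
// vertex v and accumulate its source and target indices.
inline long traverse_incident(const SimpleGraph& g, int v)
{
   long sum = 0;
   const std::vector<Edge>& inc = g.edges_on_vertex[v];
   for (std::size_t i = 0; i < inc.size(); ++i)
      sum += inc[i].first + inc[i].second;   // source + target
   return sum;
}
```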

[Four log-log plots: traversals/s versus number of vertices, comparing the BGL and the DSEL.]

Figure 11.4: Incidence traversal for the BGL and our approach on the following computer systems (from left to right): P4, AMD64, G5, and IBM.

Figure 11.4 presents the run-time results for edge on vertex traversal. The overall run-time behavior of our approach is comparable to that of the BGL.

11.2 Equation Specification

The second important part is based on a functional specification library [72, 118] which implements the basic features of a mathematical domain-specific language. The specification of equations with automatic derivation by means of truncated Taylor series, as well as discretized differential operators, constitute the major parts of this library. The library offers support for different discretization schemes embedded into a functional specification language which offers higher-order functions. Therewith algorithms and equations are specified independently of the actual dimension, topology, and numerical data type of the cell complex type. Due to the use of meta-programming and its evaluation at compile time, the calculations associated with the specified equations can be highly optimized by the compiler, thereby ensuring excellent run-time performance. To compare different equation specification approaches we analyze several techniques which are available in C++. All approaches are compared to a hand-optimized Fortran 77 implementation:

- Naive C++ implementation with std::vector
- Blitz++ Version 0.9 [131]
- Simple version of expression templates [60]
- C++ std::valarray
- Boost phoenix library (our DSEL)

The following compiler switches were used:

AMD64: -O3 -march=k8 -funroll-loops -mtune=k8 -fforce-addr
P4:    -O3 -march=pentium4 -funroll-loops -mtune=pentium4 -mfpmath=sse -ffast-math
G5:    -O3 -mcpu=G5 -funroll-loops -mtune=G5
IBM:   -O3 -funroll-loops -mcpu=powerpc64 -maix64 -pipe

The test is performed using a vector addition Af = Ab + Ac + Ad, evaluated with different vector sizes. Figure 11.5 compares the different approaches. For vector lengths smaller than 10^4, cache hits reveal the full computation power of the CPU; longer vectors show the limits imposed by memory bandwidth.
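The expression template technique [60] evaluated here can be sketched in a few lines (a minimal illustration, not the Blitz++ or phoenix implementation): the right-hand side Ab + Ac + Ad is encoded as a type and evaluated in one fused loop without temporaries:

```cpp
#include <vector>

// An addition node: holds references to its operands and computes
// element i on demand. The whole expression becomes a nested type.
template<typename L, typename R>
struct Add
{
   const L& l;
   const R& r;
   Add(const L& l_, const R& r_) : l(l_), r(r_) {}
   double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Building an expression allocates no temporaries, only a small node.
template<typename L, typename R>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

// One single loop evaluates the complete right-hand side per element.
template<typename Expr>
void assign(std::vector<double>& dst, const Expr& e)
{
   for (std::size_t i = 0; i < dst.size(); ++i)
      dst[i] = e[i];
}
```

In assign(f, b + c + d) the compiler inlines the nested operator[] calls, so the emitted code is the single loop a hand-written C version would contain.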

[Four plots: MOps/s versus vector length for Vector, Valarray, Blitz++, F77, ET, Naive C++, and DSEL.]

Figure 11.5: Performance of the evaluated expression on the following computer systems (from left to right): P4, AMD64, G5, and IBM.

For the functional equation specification, the Blitz++ benchmark is used to estimate the overall performance behavior of our computer systems. Afterwards, simple vector additions are compared on different computer systems and with different compiler versions. The influence of the actual computer system as well as of the compiler used with the corresponding optimization switches is presented. To analyze the ongoing advancements in compiler technology, we have analyzed different compiler versions. Therewith the influence on different machines can be clearly separated into system related differences and compiler based differences. As a foundation we use the Blitz++ benchmark system, which uses a vector operation (DAXPY) f = α · x + y, evaluated with different vector sizes. Finally, we compare our vector addition benchmark on the P4 with different compilers.

[Four plots: MOps/s versus vector length for Vector, Blitz++, F77, BLAS, and Valarray.]

Figure 11.6: Blitz++ benchmark with different compiler versions: GCC-4.0.3, GCC-4.1.0, GCC-4.1.1, GCC-4.2.0

[Four plots: MOps/s versus vector length for Vector, Blitz++, F77, ET, Valarray, and DSEL.]

Figure 11.7: Different compiler versions on the P4: GCC-4.0.3, GCC-4.1.0, GCC-4.2.0, Intel 9.0

As can be seen, the functional approach based on the Boost phoenix library performs well with recent compilers such as the GCC-4.1.0 compiler generation. GCC-4.2.0 is still under development and is only presented to show the possibilities of the emerging compiler generations. The Intel 9.0 compiler does not perform well with any of the specification mechanisms.

11.3 Interconnect Performance Analysis

It should be noted that for high precision simulations it is essential to model the simulation domain as exactly as possible. The accuracy and efficiency of a finite element and finite volume simulation strongly depend on the quality of the tessellation of the domain. As a consequence we introduced a comprehensive solid modeling, mesh generation, and adaptation approach [80]. Our interconnect simulation tools use the finite element method to discretize the partial differential equations, resulting in a system of equations that eventually has to be linearized and thereafter solved with a preconditioned conjugate gradient algorithm [17]. To give a glimpse of the details we consider a typical problem of forming the equation system. The problem we consider is posed in the following way:

LΨ := div(−ε grad(Ψ)) − ̺ = 0 in Ω (11.1)

Ψ − ΨD = 0 on ∂Ω , (11.2)

where ε denotes the (isotropic) permittivity of the considered domain, which is assumed to be constant in an element of the tessellation. Due to the weak formulation using Galerkin finite elements [42], weighting coefficients for the local element matrices have to be derived for tetrahedra:

g1^e = ε (K11^2 + K21^2 + K31^2) / det J (11.3)

Here, K is the adjoint matrix of the Jacobian J, which is derived by the affine transformation of the mesh elements to the standard element. Due to operator overloading, different mathematical structures such as scalars, vectors, and even matrices can be handled using identical notation. Therewith the transformation into code results in the following code snippet:

double g1 = epsilon*(K11*K11 + K21*K21 + K31*K31)/detJ;
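Equation (11.3) and the code line above can be made concrete with a hedged numerical sketch (the function name and geometric setup are assumptions for illustration): the Jacobian of the affine map from the reference tetrahedron, the first column of its adjoint matrix, and the resulting coefficient:

```cpp
struct Vec3 { double x, y, z; };

// Weighting coefficient g1 of equation (11.3) for a tetrahedron with
// vertices p[0]..p[3] and constant permittivity epsilon.
inline double g1_coefficient(const Vec3 p[4], double epsilon)
{
   // Columns of the Jacobian J: the edge vectors from vertex p[0],
   // i.e. the affine map from the standard element to this element.
   double J[3][3] = {
      { p[1].x - p[0].x, p[2].x - p[0].x, p[3].x - p[0].x },
      { p[1].y - p[0].y, p[2].y - p[0].y, p[3].y - p[0].y },
      { p[1].z - p[0].z, p[2].z - p[0].z, p[3].z - p[0].z }
   };

   // First column of the adjoint (adjugate) matrix K: the cofactors
   // of the first row of J.
   double K11 =  (J[1][1] * J[2][2] - J[1][2] * J[2][1]);
   double K21 = -(J[1][0] * J[2][2] - J[1][2] * J[2][0]);
   double K31 =  (J[1][0] * J[2][1] - J[1][1] * J[2][0]);

   // Laplace expansion of det J along the first row reuses the cofactors.
   double detJ = J[0][0] * K11 + J[0][1] * K21 + J[0][2] * K31;

   return epsilon * (K11 * K11 + K21 * K21 + K31 * K31) / detJ;
}
```

For the unit reference tetrahedron J is the identity, so the coefficient reduces to ε itself, a convenient sanity check.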

To assemble the system matrix, a local element matrix has to be computed for each element. Se stands for the local element stiffness matrix, which is derived by:

S^e = g1 S1^e + g2 S2^e + g3 S3^e + g4 S4^e + g5 S5^e + g6 S6^e (11.4)

The corresponding C++ code reads:

Se = g1*S1 + g2*S2 + g3*S3 + g4*S4 + g5*S5 + g6*S6;

S1-S6 denote the linear form function matrices and g1-g6 are calculated at the nodes of the tetrahedra in the global coordinate system [105]. Figure 11.8 presents the example structure under investigation with a coarse mesh for the following performance analysis. In order to obtain sufficiently accurate results, the mesh size typically has to be in the order of 10^4 to some 10^5 nodes.
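The operator overloading mentioned above can be sketched with a tiny fixed-size matrix class (illustrative only, not the actual GSSE implementation), which lets the assembly line read exactly like equation (11.4):

```cpp
// A fixed-size 4x4 matrix for tetrahedral element matrices.
struct Mat4
{
   double m[4][4];
};

// Element-wise addition of two matrices.
inline Mat4 operator+(const Mat4& a, const Mat4& b)
{
   Mat4 r;
   for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j)
         r.m[i][j] = a.m[i][j] + b.m[i][j];
   return r;
}

// Scalar times matrix, so that g1*S1 mirrors the mathematical notation.
inline Mat4 operator*(double s, const Mat4& a)
{
   Mat4 r;
   for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j)
         r.m[i][j] = s * a.m[i][j];
   return r;
}
```

With these two operators, Se = g1*S1 + g2*S2 + ... compiles as written, matching the code snippet above.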

Figure 11.8: Temperature distribution due to self-heating in a tapered interconnect line with cylindrical vias.

For a rigorous analysis we evaluate three different implementations in C++ and compare them to a hand-optimized Fortran 77 implementation on different computer architectures. The first implementation is based on the GNU GCC valarray data type, which is a standardized data structure representing a mathematical vector. This data type has shown excellent performance on different computer architectures with recent compilers. Secondly, we utilize the Blitz++ [132] library, which introduced high performance calculation comparable to Fortran 77 directly in C++. Lastly, a naive C++ implementation is used that creates two temporary objects, one for the addition and one for the assignment. As a consequence all elements have to be accessed three times. The tests were performed on four different computer systems. The figures compare these different approaches on different hardware architectures. The y-axis is labeled with million operations per second. The vector addition consists of three operations: two additions and one assignment. For vector lengths smaller than 10^4, cache hits reveal the full computation power of the CPU; longer vectors show the limits imposed by memory bandwidth. The poor performance of the naive C++ code is indeed remarkable. Based on these observations of restrictions due to the limited bandwidth, we illustrate the influence of a problem's size on the overall finite element assembly. We therefore investigate our test structure with different levels

of refinement. By resolving a three-dimensional simulation domain, the number of points easily exceeds the critical threshold and thereby leads to severe problems caused by memory bandwidth restrictions (Figure 11.9).

[Plot: equations per millisecond versus number of vertices, for meshes of increasing refinement.]

Figure 11.9: Comparison of the finite element assembly times.

Investigations of parallelization attempts on multi-processor machines (G5) show that the inner loop of the finite element assembly cannot be parallelized easily. On the one hand, the update mechanisms of the element matrices may require access to the same part of memory simultaneously, which could be avoided by a different assembly scheme, e.g. node-based assembly. On the other hand, the inner loops are compiled very efficiently and are only bounded by memory bandwidth. Tests with four CPUs have shown that parallel assembly does not speed up the total assembly process at all. Restrictions resulting from memory bandwidth completely negate any benefit of parallelization.
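The bandwidth limitation discussed above is exactly what the STREAM benchmark [125] measures; a sketch of its triad-style kernel (timing code omitted, names illustrative) shows the access pattern that dominates the assembly loop:

```cpp
#include <vector>

// STREAM-style triad a[i] = b[i] + s * c[i]: every iteration performs
// two loads and one store. For vectors far beyond the cache size the
// run time is dictated by memory bandwidth, not by the floating-point
// units -- hence adding CPUs does not help.
inline void triad(std::vector<double>& a,
                  const std::vector<double>& b,
                  const std::vector<double>& c,
                  double s)
{
   for (std::size_t i = 0; i < a.size(); ++i)
      a[i] = b[i] + s * c[i];
}
```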

Chapter 12

Own Publications

Contributions to Books:

[B1] H. Ceric, R. Heinzl, Ch. Hollauer, T. Grasser, S. Selberherr: Microstructure and Stress Aspects of Electromigration Modeling. In Stress-Induced Phenomena in Metallization, American Institute of Physics, Melville, 2006, ISBN: 0-7354-03104, 262 - 268.

[B2] W. Benger, R. Heinzl, W. Kapferer, W. Schoor, M. Tyagi, S. Venkataraman, G. Weber: Proceedings of the 4th High-End Visualization Workshop. Tyrol, Austria, July 2007, ISBN: 978-3-86541-216-4.

Papers in Journals:

[J1] R. Heinzl, P. Schwaha, and T. Grasser. A High Performance Generic Scientific Simulation Envi- ronment. In Springer Journal, accepted.

[J2] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser. A Computational Framework for Topological Operations. In Springer Journal, accepted.

[J3] R. Heinzl, P. Schwaha, and S. Selberherr. A Generic Topology Library. In Elsevier Science Ltd., accepted.

Publication in Proceedings:

[P1] A. Sheikholeslami, E. Al-Ani, R. Heinzl, C. Heitzinger, F. Parhami, F. Badrieh, H. Puchner, T. Grasser, and S. Selberherr. Level Set Method Based General Topography Simulator and its Applications in Interconnect Processes. In Intl. Conf. on Ultimate Integration of Silicon, pages 139–142, Bologna, Italy, July 2005.

[P2] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. A Novel Technique for Coupling Three- Dimensional Mesh Adaptation with an A Posteriori Error Estimator. In Proc. 2005 PhD Research in Microel. and Elect., pages 175–178, Lausanne, Switzerland, July 2005.

[P3] A. Sheikholeslami, F. Parhami, R. Heinzl, F. Badrieh, H. Puchner, T. Grasser, and S. Selberherr. Applications of Three-Dimensional Topography Simulation in the Design of Interconnect Lines. In Proc. Conf. Sim. of Semiconductor Processes and Devices, pages 187–190, Tokio, Japan, September 2005.

[P4] P. Schwaha, R. Heinzl, M. Spevak, and T. Grasser. Coupling Three-Dimensional Mesh Adaptation with an A Posteriori Error Estimator. In Proc. Conf. Sim. of Semiconductor Processes and Devices, pages 235–238, Tokyo, Japan, September 2005.

[P5] R. Heinzl and T. Grasser. Generalized Comprehensive Approach for Robust Three-Dimensional Mesh Generation for TCAD. In Proc. Conf. Sim. of Semiconductor Processes and Devices, pages 211–214, Tokyo, Japan, September 2005.

[P6] P. Schwaha, R. Heinzl, W. Brezna, J. Smoliner, H. Enichlmair, R. Minixhofer, and T. Grasser. Fully Three-Dimensional Analysis of Leakage Current in Non-Planar Oxides. In Proc. 2005 Europ. Sim. and Modeling Conf., pages 469–473, Porto, Portugal, October 2005.

[P7] E. Al-Ani, R. Heinzl, P. Schwaha, T. Grasser, and S. Selberherr. Three-Dimensional State of the Art Topography Simulation. In Proc. 2005 Europ. Sim. and Modeling Conf., pages 430–432, Porto, Portugal, October 2005.

[P8] R. Heinzl, P. Schwaha, M. Spevak, and T. Grasser. Adaptive Mesh Generation for TCAD with Guaranteed Error Bounds. In Proc. 2005 Europ. Sim. and Modeling Conf., pages 425–429, Porto, Portugal, October 2005.

[P9] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser. Simulation of Microelectronic Structures using A Posteriori Error Estimation and Mesh Adaptation. In Proc. of the 5th MATHMOD Conf., page 317, Vienna, Austria, February 2006.

[P10] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. Concepts for High Performance Generic Scientific Computing. In Proc. of the 5th MATHMOD Conf., volume 1, Vienna, Austria, February 2006.

[P11] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser. Process and Device Simulation with a Generic Scientific Simulation Environment. In Proc. MIEL 2006, volume 2, pages 475–478, Belgrade, Serbia and Montenegro, April 2006.

[P12] P. Schwaha, R. Heinzl, W. Brezna, J. Smoliner, H. Enichlmair, R. Minixhofer, and T. Grasser. Leakage Current Analysis of a Real World Silicon-Silicon Dioxide Capacitance. In Proc. of the ICCDCS, pages 365–370, Playa del Carmen, Mexico, April 2006.

[P14] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. Multidimensional and Multitopological TCAD with a Generic Simulation Environment. In Proc. of the ICCDCS, pages 173–176, Playa del Carmen, Mexico, April 2006.

[P15] R. Heinzl, M. Spevak, and P. Schwaha. A Novel High Performance Approach for Technology Computer Aided Design. In Proc. of the 12th EEICT Conf., volume 4, pages 446–450, Brno, Czech Rep., April 2006.

[P16] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. A Generic Scientific Simulation Environment for Multidimensional Simulation in the Area of TCAD. In Proc. of the NSTI, volume 4, pages 526–529, Boston, USA, May 2006.

[P17] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. High Performance Process and Device Simulation with a Generic Environment. In Proc. of the 14th Iranian Conf. on El. Eng. ICEE 2006, pages 446–450, Teheran, Iran, May 2006.

[P18] A. Sheikholeslami, R. Heinzl, S. Holzer, C. Heitzinger, M. Spevak, M. Leicht, O. Häberlen, J. Fugger, F. Badrieh, F. Parhami, H. Puchner, T. Grasser, and S. Selberherr. Applications of Two- and Three-Dimensional General Topography Simulator in Semiconductor Manufacturing Processes. In Proc. of the 14th Iranian Conf. on El. Eng. ICEE 2006, pages 446–450, Teheran, Iran, May 2006.

[P19] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser. A Computational Framework for Topological Operations. In Proc. of the PARA Conf., page 57, Umea, Sweden, June 2006.

[P20] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser. A High Performance Generic Scientific Simula- tion Environment. In Proc. of the PARA Conf., page 61, Umea, Sweden, June 2006.

[P21] P. Schwaha, R. Heinzl, M. Spevak, and T. Grasser. Advanced Equation Processing for TCAD. In Proc. of the PARA Conf., page 64, Umea, Sweden, June 2006.

[P22] R. Heinzl, P. Schwaha, M. Spevak, and T. Grasser. Performance Aspects of a DSEL for Scientific Computing with C++. In Proc. of the POOSC Conf., pages 37–41, Nantes, France, July 2006.

[P23] P. Schwaha, R. Heinzl, M. Spevak, and T. Grasser. A Generic Approach to Scientific Computing. In Proc. of the ICCAM, page 137, Leuven, Belgium, July 2006.

[P24] M. Spevak, P. Schwaha, R. Heinzl, and T. Grasser. Automatic Linearization using Functional Programming for Scientific Computing. In Proc. of the ICCAM, page 147, Leuven, Belgium, July 2006.

[P25] M. Spevak, R. Heinzl, P. Schwaha, T. Grasser, and S. Selberherr. A Generic Discretization Library. In Proc. of the OOPSLA, pages 95–100, Portland, USA, October 2006.

[P26] R. Heinzl, M. Spevak, P. Schwaha, and S. Selberherr. A Generic Topology Library. In Proc. of the OOPSLA, pages 85–93, Portland, USA, October 2006.

[P27] R. Heinzl, M. Spevak, P. Schwaha, and S. Selberherr. Performance Analysis for High-Precision Interconnect Simulation. In Proc. of the ESMC, pages 113–116, Toulouse, France, October 2006.

[P28] P. Schwaha, R. Heinzl, G. Mach, C. Pogoreutz, S. Fister, and S. Selberherr. A High Performance Web Application for an Electro-Biological Problem. In Proc. of the ECMS, pages 218–222, Prague, Czech Republic, June 2007.

[P29] R. Heinzl, P. Schwaha, C. Giani, and S. Selberherr. Modeling of Non-Trivial Data Structures with a Generic Scientific Simulation Environment. In Proc. of the 4th High-End Visualization Workshop, pages 3–11, Tyrol, Austria, June 2007.

[P30] P. Schwaha, C. Giani, R. Heinzl, and S. Selberherr. Visualization of Polynomials Used in Series Expansion. In Proc. of the 4th High-End Visualization Workshop, pages 137–146, Tyrol, Austria, June 2007.

[P31] W. Benger, G. Ritter, and R. Heinzl. The Concepts of VISH. In Proc. of the 4th High-End Visualization Workshop, pages 26–39, Tyrol, Austria, June 2007.

[P32] R. Heinzl, P. Schwaha, and S. Selberherr. Modern Concepts for High-Performance Scientific Computing. In Proc. of the ICSoft, Barcelona, Spain, July 2007.

[P33] P. Schwaha, R. Heinzl, and S. Selberherr. Simulation Methodologies for Scientific Computing. In Proc. of the ICSoft, Barcelona, Spain, July 2007.

[P34] R. Heinzl, G. Mach, and P. Schwaha. Labtool - A Managing Software for Computer Courses. In Proc. of the ESM, St. Julian's, Malta, October 2007.

[P35] R. Heinzl, G. Mach, and P. Schwaha. Performance Test Platform. In Proc. of the ESM, St. Julian's, Malta, October 2007.

[P36] R. Heinzl, P. Schwaha, and S. Selberherr. FV, FD, FE and the GSSE. In Proc. of the ESM, St. Julian's, Malta, October 2007.

[P37] P. Schwaha, R. Heinzl, and S. Selberherr. Biological Simulation with GSSE. In Proc. of the ESM, St. Julian's, Malta, October 2007.

[P38] P. Schwaha, R. Heinzl, and S. Selberherr. Conforming Discretization of Semiconductor Equations. In Proc. of the ESM, St. Julian's, Malta, October 2007.

[P39] F. Stimpfl, R. Heinzl, P. Schwaha, and S. Selberherr. A Multi-Mode Mesh Generation Approach for Scientific Computing. In Proc. of the ESM, St. Julian's, Malta, October 2007.

Bibliography

[1] W. Benger, “Visualization of General Relativistic Tensor Fields via a Fiber Bundle Data Model,” Ph.D. dissertation, Freie Universität Berlin, 2004.

[2] P. Bochev and M. Hyman, “Principles of Compatible Discretizations,” in Proc. of IMA Hot Topics Workshop on Compatible Discretizations, vol. IMA 142. Springer Verlag, 2006, pp. 89–120.

[3] B. Stroustrup, The C++ Programming Language. New York: Addison Wesley, 1995.

[4] D. Gregor, J. Järvi, J. Siek, B. Stroustrup, G. Dos Reis, and A. Lumsdaine, “Concepts: Linguistic Support for Generic Programming in C++,” SIGPLAN Not., vol. 41, no. 10, pp. 291–310, 2006.

[5] T. Archer and A. Whitechapel, Inside C#, 2nd ed. Redmond, WA, USA: Microsoft Press, 2002.

[6] W. Bright, “The D programming language,” Dr. Dobb’s J., vol. 27, no. 2, pp. 36–40, 2002.

[7] I. Kazi, H. Chen, B. Stanley, and D. Lilja.

[8] E. Bethel, “Interoperability of Visualization Software and Data Models is Not an Achievable Goal,” in VIS ’03: Proceedings of the 14th IEEE Visualization 2003 (VIS’03). Washington, DC, USA: IEEE Computer Society, 2003, p. 114.

[9] IBM Visualization Data Explorer, 3rd ed., IBM Corporation, Yorktown Heights, NY, USA, Feb. 1993.

[10] W. Benger, G. Ritter, and R. Heinzl, “The Concepts of VISH,” in Proceedings of the 4th High-End Visualization Workshop, (2007), Tyrol, Austria, June 18–22 2007, pp. 28–41.

[11] A. Hössinger, J. Cervenka, and S. Selberherr, “A Multistage Smoothing Algorithm for Coupling Cellular and Polygonal Datastructures,” in Proc. SISPAD, 2003, pp. 259–262.

[12] T. Binder, A. Hössinger, and S. Selberherr, “Rigorous Integration of Semiconductor Process and Device Simulators,” IEEE Trans. Comp.-Aided Design of Int. Circ. and Systems, vol. 22, no. 9, pp. 1204–1214, 2003.

[13] A. Sheikholeslami, E. Al-Ani, R. Heinzl, C. Heitzinger, F. Parhami, F. Badrieh, H. Puchner, T. Grasser, and S. Selberherr, “Level Set Method Based General Topography Simulator and its Applications in Interconnect Processes,” in Intl. Conf. on Ultimate Integration of Silicon, Bologna, Italy, July 2005, pp. 139–142.

[14] S. Halama, C. Pichler, G. Rieger, G. Schrom, T. Simlinger, and S. Selberherr, “VISTA — User Interface, Task Level, and Tool Integration,” IEEE J.Techn. Comp. Aided Design, vol. 14, no. 10, pp. 1208–1222, 1995.

[15] IµE, MINIMOS-NT 2.1 User’s Guide, Institut für Mikroelektronik, Technische Universität Wien, Austria, 2004, http://www.iue.tuwien.ac.at/software/minimos-nt.

[16] S. Wagner, T. Grasser, and S. Selberherr, “Performance Evaluation of Linear Solvers Employed for Semiconductor Device Simulation,” in Proc. Conf. in Sim. of Semiconductor Processes and Devices, 2004, pp. 351–354.

[17] M. Heroux, R. Bartlett, V. H. R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, and A. Williams, “An Overview of Trilinos,” Sandia National Laboratories, Tech. Rep. SAND2003-2927, 2003.

[18] G. Berti, “Generic Software Components for Scientific Computing,” Ph.D. dissertation, Technische Universität Cottbus, 2000.

[19] E. Tonti, “Finite Formulation of the Electromagnetic Field,” in Geometric Methods in Computational Electromagnetics, PIER 32, F. L. Teixeira, Ed. Cambridge, Mass.: EMW Publishing, 2001, pp. 1–44.

[20] C. Mattiussi, “The Geometry of Time-Stepping,” in Geometric Methods in Computational Electromagnetics, PIER 32, F. L. Teixeira, Ed. Cambridge, Mass.: EMW Publishing, 2001, pp. 123–149.

[21] A. Hatcher, Algebraic Topology. Cambridge University Press, 2002.

[22] K. Jänich, Topologie. Heidelberg: Springer, 2001.

[23] A. J. Zomorodian, “Topology for Computing,” in Cambridge Monographs on Applied and Computational Mathematics, 2005.

[24] D. M. Butler and S. Bryson, “Vector Bundle Classes Form Powerful Tool for Scientific Visualization,” Computers in Physics, vol. 6, pp. 576–584, November/December 1992.

[25] J. M. Hyman and M. J. Shashkov, “Mimetic Discretizations for Maxwell’s Equations and Equations of Magnetic Diffusion,” in Proc. of the Fourth International Conference on Mathematical and Numerical Aspects of Wave Propagation, Golden, Colorado, June 1-5, 1998, J. A. DeSanto, Ed. SIAM, 1998, pp. 561–563.

[26] D. R. Musser and A. A. Stepanov, “Generic Programming,” in Proc. of the ISSAC’88 on Symb. and Alg. Comp. London, UK: Springer-Verlag, 1988, pp. 13–25.

[27] M. H. Austern, Generic Programming and the STL: Using and Extending the C++ Standard Template Library. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1998.

[28] P. Gross and P. R. Kotiuga, Electromagnetic Theory and Computation: A Topological Approach. Cambridge University Press, 2004.

[29] J. Hocking and G. Young, Topology. Dover Publications, New York: Addison-Wesley, Dover, 1961.

[30] F. H. Branin, “The Algebraic-Topological Basis for Network Analogies and for Vector Calculus,” in Symposium on Generalized Networks (12–14 April 1966). Polytechnic Institute of Brooklyn, 1966, pp. 453–491.

[31] E. Tonti, “The reason for analogies between physical theories,” Appl. Math. Modelling, vol. 1, no. 1, pp. 37–50, 1976/77.

[32] ——, “The Reason for Analogies between Physical Theories,” Appl. Math. Modelling, vol. 1, no. 1, pp. 37–50, 1976/77.

[33] C. Mattiussi, “An Analysis of Finite Volume, Finite Element, and Finite Difference Methods using some Concepts From Algebraic Topology,” J. Comput. Phys., vol. 133, no. 2, pp. 289–309, 1997.

[34] E. Tonti, "On the geometrical structure of electromagnetism," in Gravitation, Electromagnetism and Geometrical Structures, G. Ferrarese, Ed. Bologna: Pitagora, 1996, pp. 281–308.

[35] C. Mattiussi, “A Reference Discretization Strategy for the Numerical Solution of Physical Field Problems,” Advances in Imaging and Electron Physics, vol. 121, pp. 143–279, 2002.

[36] J. M. Hyman and M. J. Shashkov, “Natural Discretizations for the Divergence, Gradient, and Curl on Logically Rectangular Grids,” Comput. Math. Appl., vol. 33, no. 4, pp. 81–104, 1997.

[37] ——, “Adjoint Operators for the Natural Discretizations of the Divergence, Gradient and Curl on Logically Rectangular Grids,” Appl. Numer. Math., vol. 25, no. 4, pp. 413–442, 1997.

[38] J. M. Hyman, M. J. Shashkov, and S. Steinberg, “The Numerical Solution of Diffusion Problems in Strongly Heterogeneous Non-Isotropic Materials,” J. Comput. Phys., vol. 132, no. 1, pp. 130–148, 1997.

[39] J. M. Hyman and M. J. Shashkov, “The Approximation of Boundary Conditions for Mimetic Finite Difference Methods,” Computers & Mathematics with Applications, vol. 36, pp. 79–99, 1998.

[40] P. A. Markowich, C. Ringhofer, and C. Schmeiser, Semiconductor Equations. Wien-New York: Springer, 1990.

[41] T. Barth and M. Ohlberger, “Finite Volume Methods: Foundation and Analysis,” 2004. [Online]. Available: citeseer.ist.psu.edu/barth04finite.html

[42] O. C. Zienkiewicz and R. L. Taylor, The Finite Element Method. Berkshire, England: McGraw-Hill, 1987.

[43] C. Johnson, Numerical Solutions of Partial Differential Equations by the Finite Element Method. Cambridge, UK: Cambridge University Press, 1987.

[44] H. Ceric, "Numerical Techniques in Interconnect and Process Simulation," Ph.D. dissertation, Technische Universität Wien, 2004.

[45] K. S. Yee, "Numerical Solution of Initial Boundary Value Problems involving Maxwell's Equations in Isotropic Media," IEEE Trans. Antennas and Propagation, vol. 14, no. 3, pp. 302–307, 1966.

[46] R. B. Haber, B. Lucas, and N. Collins, “A Data Model for Scientific Visualization with Provisions for Regular and Irregular Grids,” in VIS ’91: Proceedings of the 2nd conference on Visualization ’91. Los Alamitos, CA, USA: IEEE Computer Society Press, 1991, pp. 298–305.

[47] E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Commun. ACM, vol. 13, no. 6, pp. 377–387, 1970.

[48] M. Haveraaen, H. Friis, and T. Johansen, "Formal Software Engineering for Computational Modeling," Nordic Journal of Computing, vol. 6, no. 3, pp. 241–270, 1999.

[49] D. Butler and M. Pendley, "A Visualization Model Based on the Mathematics of Fiber Bundles," Computers in Physics, vol. 3, no. 5, pp. 45–51, September/October 1989.

[50] P. Wegner, "Concepts and Paradigms of Object-Oriented Programming," SIGPLAN OOPS Mess., vol. 1, no. 1, pp. 7–87, 1990.

[51] K. Bruce, L. Cardelli, G. Castagna, G. T. Leavens, and B. Pierce, “On Binary Methods,” Theor. Pract. Object Syst., vol. 1, no. 3, pp. 221–242, 1995.

[52] J. Siek, L.-Q. Lee, and A. Lumsdaine, The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, 2002.

[53] E. N. Volanschi, C. Consel, G. Muller, and C. Cowan, "Declarative Specialization of Object-Oriented Programs," in Proc. of the OOPSLA Conf. New York, NY, USA: ACM Press, 1997, pp. 286–300.

[54] R. Affeldt, H. Masuhara, E. Sumii, and A. Yonezawa, “Supporting Objects in Run-Time Bytecode Specialization,” in Proc. of the Symp. on Part. Eval. and Semantics-Based Prog. Manip. New York, NY, USA: ACM Press, 2002, pp. 50–60.

[55] H. M. Andersen and U. P. Schultz, "Declarative Specialization for Object-Oriented-Program Specialization," in PEPM '04: Proc. of the 2004 ACM SIGPLAN Symp. on Part. Eval. and Semantics-Based Prog. Manip. New York, NY, USA: ACM Press, 2004, pp. 27–38.

[56] D. F. Bacon, S. L. Graham, and O. J. Sharp, "Compiler Transformations for High-Performance Computing," ACM Computing Surveys, vol. 26, no. 4, pp. 345–420, 1994.

[57] D. Gay and B. Steensgaard, "Fast Escape Analysis and Stack Allocation for Object-Based Programs," in CC '00: Proc. of the 9th Conf. on Compiler Constr. London, UK: Springer-Verlag, 2000, pp. 82–93.

[58] J. Järvi, D. Gregor, J. Willcock, A. Lumsdaine, and J. G. Siek, "Algorithm Specialization in Generic Programming - Challenges of Constrained Generics in C++," in PLDI '06: Proceedings of the ACM SIGPLAN 2006 conference on Programming language design and implementation. New York, NY, USA: ACM Press, June 2006.

[59] T. Geraud and A. Duret-Lutz, “Generic Programming Redesign Pattern,” in Proc. of the 5th Conf. on Pattern Lang. of Progr. (EuroPLoP ’2000), Irsee, Germany, 2000. [Online]. Available: citeseer.ist.psu.edu/geraud00generic.html

[60] D. Abrahams and A. Gurtovoy, C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series). Addison-Wesley Professional, 2004.

[61] J. Järvi, J. Willcock, and A. Lumsdaine, "Concept-Controlled Polymorphism," in GPCE '03: Proc. of the 2nd Conf. on Generative Prog. and Comp. Eng. New York, NY, USA: Springer-Verlag New York, Inc., 2003, pp. 228–244.

[62] A. Alexandrescu, Modern C++ Design: Generic Programming and Design Patterns Applied. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2001.

[63] D. Gregor, J. Järvi, M. Kulkarni, A. Lumsdaine, D. Musser, and S. Schupp, "Generic Programming and High-Performance Libraries," Int. J. of Parallel Prog., vol. 33, no. 2, June 2005.

[64] J. G. Siek and A. Lumsdaine, “Concept Checking: Binding Parametric Polymorphism in C++,” in Proceedings of the First Workshop on C++ Template Programming, Erfurt, Germany, 2000. [Online]. Available: citeseer.nj.nec.com/siek00concept.html

[65] A. Dekker, “Lazy Functional Programming in Java,” ACM SIGPLAN Notices, vol. 41, no. 3, pp. 30–39, March 2006.

[66] M. Naftalin and P. Wadler, Java Generics and Collections. O'Reilly & Associates, Inc., 2006.

[67] L. Prechelt, “An Empirical Comparison of Seven Programming Languages,” Computer, vol. 33, no. 10, pp. 23–29, 2000. [Online]. Available: citeseer.ist.psu.edu/article/prechelt00empirical.html

[68] B. McNamara and Y. Smaragdakis, “Functional Programming in C++ using the FC++ Library,” SIGPLAN, vol. 36, no. 4, pp. 25–30, Apr. 2001.

[69] T. L. Veldhuizen, “Expression Templates,” C++ Report, vol. 7, no. 5, pp. 26–31, June 1995, reprinted in C++ Gems, ed. Stanley Lippman.

[70] R. Heinzl, P. Schwaha, M. Spevak, and T. Grasser, “Performance Aspects of a DSEL for Scientific Computing with C++,” in Proc. of the POOSC Conf., Nantes, France, July 2006, pp. 37–41.

[71] H. P. Langtangen and X. Cai, “Mixed Language Programming for HPC Applications,” in Proc. of the PARA Conf., Umea, Sweden, June 2006, p. 154.

[72] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser, “Concepts for High Performance Generic Sci- entific Computing,” in Proc. of the 5th MATHMOD, vol. 1, Vienna, Austria, February 2006.

[73] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser, "Process and Device Simulation With a Generic Scientific Simulation Environment," in Proceedings, MIEL 2006, vol. 2, Belgrade, Serbia and Montenegro, April 2006, pp. 475–478.

[74] R. Heinzl, M. Spevak, P. Schwaha, and T. Grasser, "A High Performance Generic Scientific Simulation Environment," in Proc. of the PARA Conf., Umea, Sweden, June 2006, p. 61.

[75] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, "Design Patterns: Abstraction and Reuse of Object-Oriented Design," Springer series Lecture Notes in Computer Science (LNCS), vol. 707, pp. 406–431, 1993. [Online]. Available: citeseer.ist.psu.edu/gamma93design.html

[76] D. Abrahams, J. Siek, and T. Witt, "New Iterator Concepts," ISO/IEC JTC 1, Information Technology, Subcommittee SC 22, Programming Language C++, Tech. Rep. N1477=03-0060, 2003.

[77] J. A. Lewis, S. M. Henry, D. G. Kafura, and R. S. Schulman, "An Empirical Study of the Object-Oriented Paradigm and Software Reuse," in OOPSLA '91: Conference proceedings on Object-oriented programming systems, languages, and applications. New York, NY, USA: ACM Press, 1991, pp. 184–196.

[78] S. Selberherr, Analysis and Simulation of Semiconductor Devices. Wien–New York: Springer, 1984.

[79] R. Sabelka and S. Selberherr, “A Finite Element Simulator for Three-Dimensional Analysis of Interconnect Structures,” Microelectronics Journal, vol. 32, no. 2, pp. 163–171, 2001.

[80] R. Heinzl and T. Grasser, "Generalized Comprehensive Approach for Robust Three-Dimensional Mesh Generation for TCAD," in Proc. Conf. in Sim. of Semiconductor Processes and Devices, Tokyo, September 2005, pp. 211–214.

[81] R. Heinzl, M. Spevak, P. Schwaha, and S. Selberherr, "A Generic Topology Library," in Library Centric Software Design, OOPSLA, Portland, OR, USA, October 2006, pp. 85–93.

[82] D. Abrahams, J. Siek, and T. Witt, "New Iterator Concepts," ISO/IEC JTC 1, Information Technology, Subcommittee SC 22, Programming Language C++, Tech. Rep. N1550=03-0133, 2003.

[83] D. Gregor, J. Willcock, and A. Lumsdaine, "Concepts for the C++0x Standard Library: Iterators," ISO/IEC JTC 1, Information Technology, Subcommittee SC 22, Programming Language C++, Tech. Rep. N2039=06-0109, June 2006.

[84] M. Zalewski and S. Schupp, "Changing Iterators with Confidence. A Case Study of Change Impact Analysis Applied to Conceptual Specifications," in Library Centric Software Design, OOPSLA, San Diego, CA, USA, October 2005.

[85] D. Gregor, J. Willcock, and A. Lumsdaine, “Concepts for the C++0x Standard Library: Iterators,” ISO/IEC JTC 1, Information Technology, Subcommittee SC 22, Programming Language C++, Tech. Rep. N2039=06-0109, June 2006.

[86] S. Mauch, "Efficient Algorithms for Solving Static Hamilton-Jacobi Equations," Ph.D. dissertation, California Institute of Technology, Pasadena, CA, 2003. [Online]. Available: citeseer.ist.psu.edu/mauch03efficient.html

[87] T. L. Veldhuizen and D. Gannon, "Active Libraries: Rethinking the Roles of Compilers and Libraries," in Proc. of the SIAM Workshop on Obj.-Oriented Methods for Inter-Operable Sci. and Eng. Comp. (OO'98). SIAM-Verlag, 1998.

[88] J. Siek and A. Lumsdaine, “Mayfly: A Pattern for Lightweight Generic Interfaces,” in Pattern Languages of Programs, July 1999.

[89] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, “PETSc Web page,” 2005.

[90] T. Tang and M.-K. Ieong, "Discretization of Flux Densities in Device Simulations Using Optimum Artificial Diffusivity," IEEE Trans. Comp.-Aided Design of Int. Circ. and Systems, vol. 14, no. 11, pp. 1309–1315, 1995.

[91] T. Grasser, H. Kosina, M. Gritsch, and S. Selberherr, “Using Six Moments of Boltzmann’s Transport Equation for Device Simulation,” Journal of Applied Physics, vol. 90, no. 5, pp. 2389–2396, 2001.

[92] ITRS, “International Technology Roadmap for Semiconductors,” 2005.

[93] G. Berti, "GrAL - The Grid Algorithms Library," in ICCS '02: Proc. of the Conf. on Comp. Sci., vol. 2331. London, UK: Springer-Verlag, 2002, pp. 745–754. [Online]. Available: http://link.springer-ny.com/link/service/series/0558/bibs/2331/23310745.htm

[94] Boost Fusion 2, Boost, 2006. [Online]. Available: http://spirit.sourceforge.net/

[95] Boost Phoenix, Boost, 2004. [Online]. Available: http://spirit.sourceforge.net/

[96] Boost Phoenix 2, Boost, 2006. [Online]. Available: http://spirit.sourceforge.net/

[97] T. Binder, "Rigorous Integration of Semiconductor Process and Device Simulators," Ph.D. dissertation, Technische Universität Wien, 2002.

[98] A. Hössinger, R. Minixhofer, and S. Selberherr, "Full Three-Dimensional Analysis of a Non-Volatile Memory Cell," in Proc. Conf. in Sim. of Semiconductor Processes and Devices, 2004, pp. 129–132.

[99] S. Selberherr, A. Schütz, and H. Pötzl, "MINIMOS—A Two-Dimensional MOS Transistor Analyzer," IEEE Trans. Electron Dev., vol. ED-27, no. 8, pp. 1540–1550, 1980.

[100] W. Bangerth, R. Hartmann, and G. Kanschat, "deal.II – A General Purpose Object-Oriented Finite Element Library," Institute for Scientific Computation, Texas A&M University, Tech. Rep. ISC-06-02-MATH, 2006.

[101] C. S. Rafferty and R. K. Smith, "Solving Partial Differential Equations with the Prophet Simulator," Bell Laboratories, Lucent Technologies, 1996. [Online]. Available: http://www-tcad.stanford.edu/~prophet/math.pdf

[102] M. Radi, "Three-Dimensional Simulation of Thermal Oxidation," Ph.D. dissertation, Technische Universität Wien, 1998.

[103] C. Fischer, "Bauelementsimulation in einer computergestützten Entwurfsumgebung," Ph.D. dissertation, Technische Universität Wien, 1994.

[104] R. Klima, "Three-Dimensional Device Simulation with Minimos-NT," Ph.D. dissertation, Technische Universität Wien, 2002.

[105] R. Bauer, "Numerische Berechnung von Kapazitäten in dreidimensionalen Verdrahtungsstrukturen," Ph.D. dissertation, Technische Universität Wien, 1994.

[106] R. Sabelka, "Dreidimensionale Finite Elemente Simulation von Verdrahtungsstrukturen auf Integrierten Schaltungen," Ph.D. dissertation, Technische Universität Wien, 2001.

[107] O. Bagge, "CodeBoost: A Framework for Transforming C++ Programs," Master's thesis, University of Bergen, P.O.Box 7800, N-5020 Bergen, Norway, March 2003.

[108] A. Logg, T. Dupont, J. Hoffman, C. Johnson, R. C. Kirby, M. G. Larson, and L. R. Scott, “The FEniCS Project,” Chalmers Finite Element Center, Tech. Rep. 2003-21, 2003.

[109] P. Castillo, R. Rieben, and D. White, “FEMSTER: An Object-Oriented Class Library of High-Order Discrete Differential Forms,” ACM Trans. Math. Softw., vol. 31, no. 4, pp. 425–457, 2005.

[110] K. Czarnecki and U. W. Eisenecker, “Components and Generative Programming,” in ESEC/FSE-7: Proc. 7th ESEC. London, UK: Springer-Verlag, 1999, pp. 2–19.

[111] R. Pozo, “Template Numerical Toolkit for Linear Algebra: High Performance Programming with C++ and the Standard Template Library,” International Journal of High Performance Computing Applications, vol. 11, no. 3, pp. 251–263, Fall 1997.

[112] M. Forster, A. Pick, and M. Raitner, Graph Template Library (GTL). [Online]. Available: http://www.fmi.uni-passau.de/Graphlet/GTL/

[113] A. Fabri, “CGAL - The Computational Geometry Algorithm Library,” 2001. [Online]. Available: http://citeseer.ist.psu.edu/fabri01cgal.html

[114] S. Pion and A. Fabri, "A Generic Lazy Evaluation Scheme for Exact Geometric Computations," in Library Centric Software Design, OOPSLA, Portland, OR, USA, October 2006, pp. 75–84.

[115] W. Bangerth, R. Hartmann, and G. Kanschat, deal.II Differential Equations Analysis Library, Technical Reference. [Online]. Available: http://www.dealii.org/

[116] J. Härdtlein, A. Linke, and C. Pflaum, "Fast Expression Templates - Object Oriented High Performance Programming," Eng. with Comput., vol. 3515, pp. 1055–1063, 2005.

[117] M. Spevak, R. Heinzl, P. Schwaha, and T. Grasser, “A Computational Framework for Topological Operations,” in Proc. of the PARA Conf., Umea, Sweden, June 2006, p. 57.

[118] P. Schwaha, R. Heinzl, M. Spevak, and T. Grasser, “Advanced Equation Processing for TCAD,” in Proc. of the PARA Conf., Umea, Sweden, June 2006, p. 64.

[119] ITRS, “International Technology Roadmap for Semiconductors, Update,” 2006.

[120] D. Scharfetter and H. Gummel, “Large-Signal Analysis of a Silicon Read Diode Oscillator,” IEEE Trans. Electron Dev., vol. 16, no. 1, pp. 64–77, 1969.

[121] R. Garcia, J. Järvi, A. Lumsdaine, J. Siek, and J. Willcock, "A Comparative Study of Language Support for Generic Programming," in Proc. of the 18th Annual ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '03). New York, NY, USA: ACM Press, 2003, pp. 115–134.

[122] J. Dudel, R. Menzel, and R. F. Schmidt, Neurowissenschaft. Vom Molekül zur Kognition. Berlin: Springer, 2001.

[123] W. Westheide and R. Rieger, Spezielle Zoologie. Teil 2: Wirbel- oder Schädeltiere. Elsevier, 2003.

[124] H. R. Schwarz, Numerische Mathematik. Stuttgart: B. G. Teubner, 1997.

[125] Stream - Sustainable Memory Bandwidth in High Performance Computers. [Online]. Available: http://www.cs.virginia.edu/stream/

[126] Boost Lambda Library, Boost, http://www.boost.org. [Online]. Available: http://www.boost.org

[127] J. Siek and A. Lumsdaine, "The Matrix Template Library: Generic Components for High-Performance Scientific Computing," Computing in Science and Engineering, vol. 1, no. 6, pp. 70–78, Nov/Dec 1999. [Online]. Available: http://dx.doi.org/10.1109/5992.805137

[128] L. Lee and A. Lumsdaine, “Generic Programming for High Performance Scientific Applications,” in JGI ’02: Proc. of the 2002 joint ACM-ISCOPE Conf. on Java Grande. New York, NY, USA: ACM Press, 2002, pp. 112–121.

[129] R. C. Whaley and J. Dongarra, "Automatically Tuned Linear Algebra Software," in 9th SIAM Conf. on Parallel Proc. for Sci. Comp., 1999, CD-ROM Proceedings.

[130] C. E. Oancea and S. M. Watt, “Parametric Polymorphism for Software Component Architectures,” in Proc. of the OOPSLA Conf. New York, NY, USA: ACM Press, 2005, pp. 147–166.

[131] T. L. Veldhuizen, “Using C++ Template Metaprograms,” C++ Report, vol. 7, no. 4, pp. 36–43, May 1995, reprinted in C++ Gems, ed. Stanley Lippman.

[132] ——, “Arrays in Blitz++,” in Proc. Symp. on Comp. in Obj.-Oriented Parallel Env., ser. Lecture Notes in Computer Science. Springer-Verlag, 1998.

Curriculum Vitae

September 6th, 1977   Born in Vienna, Austria
1997                  High school graduation at the HTL Donaustadt, Vienna
1997–1998             Military service
October 1998          Enrolled in Electrical Engineering at the Technical University of Vienna
October 2003          Received the degree of "Diplomingenieur" in Electrical Engineering with a specialization in Computer Technology from the Technical University of Vienna
November 2003         Entered the doctoral program at the Institute for Microelectronics, TU Vienna
July 2007             Finished the doctoral program at the Institute for Microelectronics, TU Vienna
