Automated computational modelling for complicated partial differential equations


Dissertation

for the purpose of obtaining the degree of doctor at Technische Universiteit Delft, by the authority of the Rector Magnificus prof. ir. K..A.M. Luyben, chairman of the Board for Doctorates, to be defended publicly on Tuesday 3 December 2013 at 12:30

by

Kristian Breum ØLGAARD
Master of Science in Civil Engineering, Aalborg Universitet Esbjerg
born in Ringkøbing, Denmark

This dissertation has been approved by the promotor: Prof. dr. ir. L. J. Sluys

Copromotor: Dr. G. N. Wells

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. ir. L. J. Sluys, Technische Universiteit Delft, promotor
Dr. G. N. Wells, University of Cambridge, copromotor
Dr. ir. M. B. van Gijzen, Technische Universiteit Delft
Prof. dr. P. H. J. Kelly, Imperial College London
Prof. dr. R. Larsson, Chalmers University of Technology
Prof. dr. L. R. Scott, University of Chicago
Prof. dr. ir. C. Vuik, Technische Universiteit Delft
Prof. dr. A. Scarpas, Technische Universiteit Delft, reserve member

Copyright © 2013 by K. B. Ølgaard
Printed by Ipskamp Drukkers B.V., Enschede, The Netherlands
ISBN 978-94-6191-990-8

Foreword

This thesis represents the formal end of my long and interesting journey as a PhD student. The sum of many experiences over the past years has increased my knowledge and contributed to my personal development. All these experiences originate from the interaction with many people to whom I would like to express my gratitude. I am most grateful to Garth Wells for giving me the opportunity to come to Delft and to study under his competent supervision. His constructive criticism and vision combined with our nice discussions greatly improved the quality of my research. As the head of the computational mechanics group, Bert Sluys has played a vital role by creating a very nice and supportive working environment where people enjoy a lot of creative freedom. As creativity is key in this research I consider myself lucky to have been part of Bert’s group. Ronnie Pedersen did a very good job in persuading me to come to Delft for a PhD, and I am happy that he managed to convince me. I am also grateful for enjoying his friendship throughout the years, the good times on the football pitch, and the even better times in ’t Proeflokaal watching football and discussing work and life in general. A friendly and inspiring working environment is important in order to produce quality work. Therefore, I would like to thank past and present colleagues Rafid Al-Khoury, Roberta Bellodi, Frank Custers, Frank Everdij, Huan He, Cecilia Iacono, Cor Kasbergen, Oriol Lloberas-Valls, Prithvi Mandapalli, Frans van der Meer, Andrei Metrikine, Peter Moonen, Dung Nguyen, Vinh Phu Nguyen, Mehdi Nikbakth, Marjon van der Perk, Frank Radtke, Zahid Shabir, Xuming Shan, Angelo Simone, Mojtaba Talebian, Andy Terrel, Ilse Vegt, Jaap Weerheijm, Sigurd Blöndal, Lars Damkilde, Niels Dollerup, Jens Hagelskjær, Michael Jepsen, Sven Krabbenhøft and Søren Lambertsen. In particular, I would like to thank Frans for the years that we shared the same office and for translating the propositions into Dutch.
A special thanks goes to Mehdi, my ‘brother-in-arms’, the only person remaining in the group who was also involved with the FEniCS Project after Garth left for Cambridge and Xuming left for home. The research presented in this thesis is centered around the FEniCS Project and, therefore, I would also like to thank all the people in the FEniCS community, in particular my close collaborators from Simula, Anders Logg, Martin Alnæs, Marie Rognes and Johan Hake, for all the nice discussions, debugging assistance and good ideas. During my PhD, I also had the pleasure of visiting the University of Michigan, and in this regard I want to thank Krishna Garikipati, Jake Ostien and his wife Erin for their hospitality during my stay in Ann Arbor. Outside the office, I enjoyed many hours in the good company of my friends Linda Grimstrup and Lars Freising, which definitely improved the quality of my social life a lot. I also want to thank all my former teammates at Vitesse Delft for the many memorable hours on the football pitch trying to learn the secrets behind ‘totaalvoetbal’ (total football). Although The Netherlands and Denmark are quite similar in terms of weather, nature and culture, it was always nice to receive visitors from home. For this, I would like to thank my friends Kenneth Guldager, Henrik Hansen, Mads Madsen, Christian Meyer, Nick Nørreby and Thomas Sørensen. Last, but certainly not least, I want to thank my parents and my brother and sisters for their encouragement, support, help and visits during my years in Delft. I also wish to thank both of my sons for putting things in perspective, which helped me to focus during the last iterations towards finishing this thesis. Of all people, I am most grateful to my wife. I know her patience has been tested to the limit, yet she remained supportive, loving and caring during all the years. For this, and for our sons, I am forever indebted.

The research presented in this thesis was carried out at the Faculty of Civil Engineering and Geosciences at Delft University of Technology. The research was supported by the Netherlands Technology Foundation STW, the Netherlands Organisation for Scientific Research and the Ministry of Public Works and Water Management.

Kristian Breum Ølgaard
Ølgod, Denmark, November 2013

Contents

1 Introduction
1.1 Research objectives and approach
1.2 Outline
1.3 The FEniCS Project
1.3.1 Simple model problem
1.3.2 Unified Form Language
1.3.3 FEniCS Form Compiler
1.3.4 Unified Form-assembly Code
1.3.5 DOLFIN

2 FEniCS applications to solid mechanics
2.1 Governing equations
2.1.1 Preliminaries
2.1.2 Balance of momentum
2.1.3 Potential energy minimisation
2.2 Constitutive models
2.2.1 Linearised elasticity
2.2.2 Flow theory of plasticity
2.2.3 Hyperelasticity
2.3 Linearisation issues for complex constitutive models
2.3.1 Consistency of linearisation
2.3.2 Quadrature elements
2.4 Implementations and examples
2.4.1 Linearised elasticity
2.4.2 Plasticity
2.4.3 Hyperelasticity
2.4.4 Elastodynamics
2.5 Current and future developments

3 Representations and optimisations of finite element variational forms
3.1 Motivation and approach
3.2 Representation of finite element tensors
3.2.1 Quadrature representation
3.2.2 Tensor contraction representation
3.3 Quadrature optimisations
3.3.1 Eliminate operations on zeros
3.3.2 Simplify expressions
3.3.3 Precompute integration point constants
3.3.4 Precompute basis constants
3.3.5 Further optimisations
3.4 Performance comparisons of representations
3.4.1 Performance for a selection of forms
3.4.2 Performance for common, simple forms
3.4.3 Performance for forms of increasing complexity
3.5 Performance comparisons of quadrature optimisations
3.6 Automatic selection of representation
3.7 Future optimisations

4 Automation of discontinuous Galerkin methods
4.1 Extending the framework to discontinuous Galerkin methods
4.1.1 Extending the Unified Form Language
4.1.2 Extending the Unified Form-assembly Code
4.1.3 Extending the FEniCS Form Compiler
4.1.4 Extending DOLFIN
4.2 Examples
4.2.1 The Poisson equation
4.2.2 Steady state advection–diffusion equation
4.2.3 The Stokes equations
4.2.4 Biharmonic equation
4.2.5 Further applications

5 Automation of lifting-type discontinuous Galerkin methods
5.1 Lifting-type formulation for the Poisson equation
5.2 Semi-automated implementation of lifting-type formulations
5.3 Comparison of IP and lifting-type formulations
5.4 Future developments

6 Strain gradient plasticity
6.1 A strain gradient plasticity model
6.2 A discontinuous Galerkin formulation for the plastic multiplier
6.3 Linearisation of the governing equations
6.4 Implementation
6.4.1 The predictor step
6.4.2 The corrector step
6.4.3 Implementing the variational forms
6.5 Numerical examples
6.5.1 Unit square loaded in shear with strain softening
6.5.2 Plate under compressive loading with strain softening
6.5.3 Plate under compressive loading with strain hardening
6.5.4 Micro-indentation
6.5.5 Computational notes

7 Conclusions and future developments

References

Summary

Samenvatting (summary in Dutch)

Propositions

Stellingen (propositions in Dutch)

Curriculum vitae

1 Introduction

Since the advent of the modern programmable computer in the 1940s, the cost of computing power relative to manpower has decreased significantly. As a consequence, high-level programming languages have emerged, allowing programs to be implemented in source code using abstractions that are independent of the specific computer architectures on which the program is intended to run. A compiler is then invoked to translate the source code into machine code targeted at the given computer’s central processing unit (CPU). Among other things, this development has allowed researchers and scientists to write programs for investigating and solving various classes of problems numerically. In engineering, physical phenomena are often described mathematically by partial differential equations (PDEs), and a commonly used method to solve these equations is the finite element method (FEM). Standard finite element software typically provides a problem solving environment for a set of engineering problems using a predefined selection of finite elements. As part of the application programming interface (API), a user can often supply subroutines which implement special methods, for instance, the constitutive model in the case of a solid mechanics problem. This offers a degree of customisation and flexibility in terms of implementing certain models, but the approach may fall short as the complexity of a model increases. Strain gradient plasticity is an example of a class of models which can be difficult to implement in traditional finite element software, and researchers often resort to implementing their own unique solver targeting a specific model. An implementation involves translating the abstract mathematical representation of the model into source code which can be handled by a compiler, a process which can be tedious, time consuming and error prone.
However, by introducing a higher level of abstraction, the burden of this process can be alleviated when it comes to implementing mathematical representations of the FEM for solving PDEs. A possible abstraction consists of a form language for expressing the mathematical formulation of the given problem, together with a compiler which automatically generates efficient source code from the given mathematical expressions. This thesis is centered around this type of automated mathematical modelling.

1.1 Research objectives and approach

The research presented in this thesis aims at developing concepts, tools and methods which allow researchers and application developers to create efficient solvers for complicated partial differential equations with relatively little effort. Several software projects aim at providing a flexible framework for solving partial differential equations using the finite element method. These software projects include, among others, traditional finite element libraries and toolboxes such as deal.II (http://www.dealii.org/, Bangerth et al. (2007)), Diffpack (http://www.diffpack.com/, Langtangen (1999)), DUNE (http://www.dune-project.org, Bastian et al. (2008b,a)), GetFEM++ (http://home.gna.org/getfem/), OpenFOAM (http://www.openfoam.com/) and Cactus (http://cactuscode.org/, Allen et al. (2000)). However, a bit of ‘hand coding’ is often needed in order to use the above mentioned software. For instance, a user must typically implement (parts of) the assembly algorithm, which becomes cumbersome as the complexity of the problem increases. A number of software projects have, therefore, emerged that try to automate the finite element method. These projects include, among others, FINGER (Wang, 1986), Archimedes (Shewchuk and Ghattas, 1993), Symbolic Mechanics System (Korelc, 1997), GetDP (http://geuz.org/getdp/, Dular et al. (1998)), FreeFEM++ (http://www.freefem.org/), Sundance (http://www.math.ttu.edu/~kelong/Sundance/html/, Long et al. (2010)), Feel++ (http://www.feelpp.org/, Prud’homme (2006)) and the FEniCS Project (http://fenicsproject.org, Logg et al. (2012a)). A common feature of these approaches is that they provide a higher level of abstraction for expressing variational forms and thereby lessen the burden on application developers. The developments presented in this thesis are implemented in various software components of the FEniCS Project, which was chosen for a number of reasons.
The software is released under an open source license¹, which makes it possible to obtain and modify the source code. This provides a high degree of freedom and flexibility in terms of implementing advanced models and applications. Furthermore, if the application source code is published, the implementation becomes completely transparent and reproducible, both of which are important properties in research. The software contains a problem solving environment that handles the assembly, the application of boundary conditions and the solution of sparse systems of equations. What distinguishes the software from more conventional finite element packages is that it provides a high degree of mathematical abstraction by implementing a form language for expressing variational forms, and relies on form compilers to automatically generate computer code for the local finite element tensor. This approach offers several advantages, of which two are of particular interest. Firstly, the time needed to implement, test and debug the code for the local finite element tensor can be reduced. Secondly, various optimisations can be employed by the form compilers during the code generation stage to make the generated code competitive with hand optimised code. The importance of these two advantages grows with the complexity of the variational form. Finally, the software is under active development by a growing community, which is helpful for receiving feedback when implementing new features and during debugging sessions.

¹ All FEniCS core components are licensed under the GNU LGPL version 3 (or any later version) as published by the Free Software Foundation (http://www.fsf.org).

The potential of the FEniCS framework is evident. However, at the time this work commenced, the functionality in the FEniCS software was only available for a limited class of problems. For instance, only integration over element interiors was supported, which precluded, among other things, discontinuous Galerkin methods from being handled, as these methods involve integration over element boundaries. Furthermore, problems like conventional plasticity could not be solved because the software could only handle functions coming from a finite element space. Also, the generated code was only efficient for limited classes of problems. The objectives of this work can thus be condensed into the following: extend the automated mathematical modelling framework of FEniCS such that

• discontinuous Galerkin methods can be handled;

• rapid prototyping of advanced models and applications is possible; and

• efficiency is maintained also for complex problems in general.

As will be demonstrated in this work, addressing the above three issues has had a significant impact on the range of problems which can be handled in the FEniCS framework, thereby making life easier for researchers and application developers. A complex application from solid mechanics, in the form of a strain gradient plasticity model, is considered as an example to demonstrate the extensions to the FEniCS framework developed in this work. Strain gradient models are often used to provide regularisation in softening problems and to account for observed size effects at small length scales. An abundance of strain gradient models have been proposed in the literature, including the models by Aifantis (1984), Gurtin (2004), Fleck and Hutchinson (1997), Fleck and Hutchinson (2001) and Gao et al. (1999), to name a few. The focus in this work is on the class of models involving gradients of fields such as the equivalent plastic strain. An example of such a model is that proposed by Aifantis (1984), which involves the addition of the Laplacian of the equivalent plastic strain to the classical yield condition. A feature of this particular model is that the classical consistency condition leads to a partial differential equation rather than an algebraic equation, as is the case in classical flow theory of plasticity. The partial differential equation is only active in the region undergoing plastic deformation, which introduces the difficulty of imposing non-standard boundary conditions on the secondary field on the evolving boundary.

Motivated by the work of Wells et al. (2004) and Molari et al. (2006), who used a discontinuous Galerkin formulation for a strain gradient-dependent damage model, a discontinuous basis can be used to interpolate the secondary field. This provides a natural framework for handling evolving elastic–plastic boundaries and provides local (cell-wise) satisfaction of the yield condition. To satisfy the regularity requirement of the secondary field, a discontinuous Galerkin formulation is used to enforce weak continuity across cell facets. In order to allow the use of a discontinuous constant basis for the secondary field, a so-called lifting-type discontinuous Galerkin formulation, proposed by Bassi and Rebay (1997, 2002), is adopted. A discontinuous constant basis is the natural choice for the secondary field when a linear continuous basis is used for the displacement field. Considering that the formulation involves an additional field variable, it is also computationally more efficient if discontinuous constant elements can be used for this particular field.

1.2 Outline

The rest of this chapter contains an overview of the FEniCS Project, including details on the components pertinent to the present work. Chapter 2 continues with a demonstration of how to use the FEniCS toolchain for solid mechanics applications. The purpose of this demonstration is twofold. Firstly, it serves as an introduction to the concepts of automated modelling from a solid mechanics point of view, which will give an understanding of how the automated modelling approach can be utilised to also tackle more complex problems. Secondly, the presented models and applications will be used in subsequent chapters, either by extending the models or by using them as a platform for discussing the development of FEniCS components in connection to the work presented in this thesis. Local finite element tensors can be evaluated using different representations of the tensors. In Chapter 3, the two representations adopted by the FEniCS Form Compiler (FFC), the quadrature representation and the tensor contraction representation, are presented and compared. Furthermore, optimisation strategies for the quadrature representation are discussed and their performance is investigated. Chapter 4 introduces the extensions implemented in the FEniCS framework to allow a class of discontinuous Galerkin (DG) formulations to be handled in an automated fashion. Building on these abstractions, a semi-automated approach to implementing lifting-type DG formulations is presented in Chapter 5. This chapter also contains a brief comparison, in terms of implementation complexity and numerical implications, between a lifting-type formulation and an interior penalty (IP) DG formulation for the Poisson equation. In Chapter 6, the extensions to the FEniCS framework developed in the previous chapters are brought together in an implementation of a lifting-type discontinuous formulation for a simple strain gradient plasticity model proposed by Aifantis (1984). The purpose is to illustrate how researchers and application developers may create solvers for more complex problems on top of the FEniCS software. Finally, in Chapter 7, conclusions are drawn and recommendations for future development related to this work are presented.

1.3 The FEniCS Project

The FEniCS Project is a suite of open source programs for automating the solution of PDEs. Only the concepts and components which are most important in relation to this work, and which will be elaborated on in subsequent chapters, are presented here. Further details on these components, and on other components associated with the FEniCS Project, can be found in the FEniCS book (Logg et al., 2012a) or online at http://fenicsproject.org. All FEniCS software components, and the software developed in this work, can be obtained freely at https://bitbucket.org/fenics-project². The FEniCS Project is under continuous development; however, unless stated otherwise, this presentation, all example code, and the software developed and described in this work are compliant with version 1.0 of the project and its associated components.

The majority of the developments in this work are implemented in the core components of FEniCS. However, some of the developments are implemented in the FEniCS Solid Mechanics library³ (Ølgaard and Wells, 2013). In this thesis, several code snippets are presented along with many results from numerical experiments. All example code, and the code which has been used to obtain all the results, can be downloaded from https://bitbucket.org/k.b.oelgaard/oelgaard-thesis-supporting-material. Note that in order to run the code, working installations of FEniCS version 1.0 and FEniCS Solid Mechanics version 1.0 are required⁴. The procedure of solving PDEs using the FEM can be broken down into the following four steps:

1. Formulate the variational problem of the PDE
2. Discretise the formulation
3. Finite element assembly
4. Solve the global system of equations

Figure 1.1: FEniCS toolchain for solving a PDE using the FEM (UFL → FFC → UFC → DOLFIN).

² The FEniCS software components have recently moved from Launchpad (https://launchpad.net/fenics) to Bitbucket. However, as the FEniCS Project is being actively developed, the location might change again in the future. The FEniCS website (http://fenicsproject.org), which is less likely to move, might be a better starting point for locating the software.
³ The FEniCS Solid Mechanics library was formerly known as FEniCS Plasticity (https://launchpad.net/fenics-plasticity), which focussed solely on plasticity problems. However, to reflect that the scope of the library has increased to also include more general solid mechanics problems, the name was changed during a recent migration from Launchpad to Bitbucket.
⁴ Version 1.0 of the FEniCS Solid Mechanics library can be downloaded from https://bitbucket.org/fenics-apps/fenics-solid-mechanics.

Facilities for each of these steps are implemented in separate software components in FEniCS. The relationship between the input and output of each component in the FEniCS toolchain for the finite element procedure is shown in Figure 1.1. In short, the variational form of the PDE is expressed in the Unified Form Language (UFL) (Alnæs et al., 2013; Alnæs, 2012), which is given as input to the FEniCS Form Compiler (FFC)⁵ (Kirby and Logg, 2006, 2007; Logg et al., 2012c; Ølgaard et al., 2008a), which automatically generates efficient C++ code for evaluating the local element tensors. The output from FFC is compliant with the interface defined in Unified Form-assembly Code (UFC) (Alnæs et al., 2009, 2012) and is used by DOLFIN (Logg and Wells, 2010; Logg et al., 2012d), the finite element assembler and solver of FEniCS, although in principle any assembly library which supports UFC can be used. The key advantage of this modular construction is that it becomes more transparent where and how new features and functionality should be implemented. Furthermore, developers and users can pick individual components to form their own applications. In this work, for instance, UFL is augmented with discontinuous Galerkin operators⁶, compiler optimisations are implemented in FFC, while more complex solvers for lifting-type formulations and solid mechanics problems can be implemented on top of the FEniCS toolbox.
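To make the four steps concrete independently of the FEniCS toolchain, the following pure-Python sketch solves the one-dimensional analogue −u'' = f on (0, 1) with u(0) = u(1) = 0, a constant source f and continuous piecewise linear elements. All function names are hypothetical and only mirror the division of labour described above: `local_stiffness` and `local_load` play the part of the generated code for the local element tensors, `assemble` plays the part of the assembler, which adds each local tensor into the global system through a local-to-global map (denoted ι_T in Section 1.3.3), and `solve` stands in for the linear solver. It is an illustration under these simplifying assumptions, not a description of how DOLFIN is implemented.

```python
# Hypothetical pure-Python sketch (not FEniCS code) of the four steps for the
# 1D problem -u'' = f on (0, 1) with u(0) = u(1) = 0 and P1 elements.
# Step 1 (formulate): find u such that int u' v' dx = int f v dx for all v.


def local_stiffness(h):
    """Step 2: local element tensor A_T of a(u, v) = int u' v' dx on a cell of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def local_load(h, f):
    """Step 2: local element vector b_T of L(v) = int f v dx for a constant source f."""
    return [f * h / 2.0, f * h / 2.0]


def assemble(num_cells, f):
    """Step 3: add each local tensor into the global system via iota_T."""
    h = 1.0 / num_cells
    n = num_cells + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for cell in range(num_cells):
        iota_T = [cell, cell + 1]  # local-to-global map of the two cell vertices
        A_T, b_T = local_stiffness(h), local_load(h, f)
        for i in range(2):
            b[iota_T[i]] += b_T[i]
            for j in range(2):
                A[iota_T[i]][iota_T[j]] += A_T[i][j]
    return A, b


def apply_dirichlet(A, b, dofs, value=0.0):
    """Impose u = value at the given dofs by replacing their rows with identity rows."""
    for d in dofs:
        A[d] = [0.0] * len(A)
        A[d][d] = 1.0
        b[d] = value


def solve(A, b):
    """Step 4: Gaussian elimination without pivoting (adequate for this small system)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (b[i] - s) / A[i][i]
    return u


num_cells = 8
A, b = assemble(num_cells, f=1.0)
apply_dirichlet(A, b, dofs=[0, num_cells])
u = solve(A, b)
```

For f = 1 the exact solution is u(x) = x(1 − x)/2, and in one dimension continuous piecewise linear elements reproduce it exactly at the vertices, which makes the sketch easy to check.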

1.3.1 Simple model problem

As a model boundary value problem for presenting the FEniCS framework, consider the Poisson equation, which for a body $\Omega \subset \mathbb{R}^d$, where $1 \le d \le 3$, with boundary $\partial\Omega$ and outward unit normal vector $n : \partial\Omega \to \mathbb{R}^d$ reads:

$$
\begin{aligned}
-\Delta u &= f && \text{in } \Omega, \\
u &= g && \text{on } \Gamma_D, \\
\nabla u \cdot n &= h && \text{on } \Gamma_N.
\end{aligned} \tag{1.1}
$$

Here, $u$ is an unknown scalar field, $f$ is a source term, $g$ is a prescribed value for $u$ on the Dirichlet boundary $\Gamma_D$, and $h$ is a prescribed value for the outward normal derivative of $u$ on the Neumann boundary $\Gamma_N$. The boundaries $\Gamma_D$ and $\Gamma_N$ divide the boundary such that $\Gamma_D \subseteq \partial\Omega$ and $\Gamma_N = \partial\Omega \setminus \Gamma_D$. To apply the FEniCS framework, the problem must be posed as a variational formulation in the following canonical form: find $u \in V$ such that

$$
a(u, v) = L(v) \quad \forall v \in \hat{V}, \tag{1.2}
$$

where $V$ is the trial space and $\hat{V}$ is the test space, and $a(u, v)$ and $L(v)$ denote the bilinear and linear forms, respectively. A typical variational form⁷ of (1.1) defines the bilinear and linear forms as:

$$
a(u, v) := \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x, \tag{1.3}
$$

$$
L(v) := \int_{\Omega} f v \, \mathrm{d}x + \int_{\Gamma_N} h v \, \mathrm{d}s, \tag{1.4}
$$

with the trial and test spaces defined as:

$$
V := \left\{ v \in H^1(\Omega) : v = g \text{ on } \Gamma_D \right\}, \tag{1.5}
$$

$$
\hat{V} := \left\{ v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \right\}. \tag{1.6}
$$

The variational problem in (1.2) must be discretised to compute a finite element solution to the Poisson problem. This is done by using a pair of discrete function spaces for the test and trial functions: find $u_h \in V_h \subset V$ such that

$$
a(u_h, v) = L(v) \quad \forall v \in \hat{V}_h \subset \hat{V}. \tag{1.7}
$$

Thus, after transforming the strong form of the problem into its variational counterpart, the FEniCS toolchain, starting with UFL, can be invoked to compute a solution.

⁵ Any compiler that supports UFL as input, and outputs UFC code, can be used instead of FFC in the described toolchain. The Symbolic Form Compiler (Alnæs and Mardal, 2010, 2012), which is also part of FEniCS, is one such example.
⁶ Historically, the DG operators were implemented in the original form language of FFC, which was later merged into the richer UFL.

⁷ Chapters 4 and 5 present discontinuous Galerkin formulations for (1.1).
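The step from the strong form (1.1) to the forms (1.3) and (1.4) is the standard one: multiply the PDE by a test function $v \in \hat{V}$, integrate over $\Omega$ and integrate the Laplacian term by parts:

$$
\begin{aligned}
-\int_{\Omega} \Delta u \, v \, \mathrm{d}x
&= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x - \int_{\partial\Omega} (\nabla u \cdot n) \, v \, \mathrm{d}s \\
&= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x - \int_{\Gamma_N} h \, v \, \mathrm{d}s,
\end{aligned}
$$

where the second line uses $v = 0$ on $\Gamma_D$ and $\nabla u \cdot n = h$ on $\Gamma_N$. Setting this equal to $\int_{\Omega} f v \, \mathrm{d}x$ and moving the boundary term to the right-hand side gives $a(u, v) = L(v)$ with $a$ and $L$ exactly as in (1.3) and (1.4). The Dirichlet condition $u = g$ is not part of the forms; it enters through the definition of the trial space $V$ in (1.5).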

UFL code
element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element)
h = Coefficient(element)

a = inner(grad(u), grad(v))*dx
L = f*v*dx + h*v*ds

Figure 1.2: UFL code for the Poisson problem using continuous-piecewise linear Lagrange polynomials on triangles.
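To give a feel for what a form compiler has to produce for the bilinear form in Figure 1.2, the following plain-Python sketch evaluates the local element tensor $A_{T,ij} = \int_T \nabla\phi_j \cdot \nabla\phi_i \, \mathrm{d}x$ for continuous piecewise linear Lagrange basis functions on a single triangle. It is not FFC output, and the function name `p1_stiffness` is hypothetical. Because the gradients of linear basis functions are constant on $T$, the integral reduces to the cell area times dot products of gradients, so no quadrature is needed in this particular case.

```python
def p1_stiffness(vertices):
    """Return A_T[i][j] = int_T grad(phi_j) . grad(phi_i) dx for P1 Lagrange on one triangle.

    vertices: three (x, y) pairs; phi_i is the linear basis function equal to 1 at vertex i.
    """
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Signed twice-area of T, i.e. the determinant of the affine map from the reference cell.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    area = abs(det) / 2.0
    # Gradients of the P1 (barycentric) basis functions; they are constant on T.
    grads = [
        ((y1 - y2) / det, (x2 - x1) / det),
        ((y2 - y0) / det, (x0 - x2) / det),
        ((y0 - y1) / det, (x1 - x0) / det),
    ]
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads] for gi in grads]


A_T = p1_stiffness([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
# A_T == [[1.0, -0.5, -0.5], [-0.5, 0.5, 0.0], [-0.5, 0.0, 0.5]]
```

Each row of $A_T$ sums to zero because the three basis functions sum to one on $T$, so their gradients sum to zero; this is a convenient sanity check on any generated local tensor for this form.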

1.3.2 Unified Form Language

In order to compute a solution to the variational problem using the FEM, it is necessary to discretise the formulation. The Unified Form Language (UFL) (Alnæs et al., 2013; Alnæs, 2012) enables a user to express the discretisation compactly using a notation which closely resembles the mathematical notation. UFL is implemented as a domain-specific embedded language (DSEL) in Python which, among other things, allows users to define custom operators using all features of the Python programming language when writing UFL code. This section presents the most basic features used throughout this work, while some of the more advanced functionality is presented in subsequent chapters as needed. For a detailed description of the language, refer to Alnæs et al. (2013). The Poisson problem in (1.7) can be expressed in UFL by the code shown in Figure 1.2. The first line in the code defines the local finite element basis that spans the discrete function space $V_h$ on an element $T \in \mathcal{T}_h$, where $\mathcal{T}_h$ denotes the standard triangulation of $\Omega$. Generally, finite elements are defined in UFL by their family, cell and degree:

UFL code
element = FiniteElement(family, cell, degree)

which in the given case, in Figure 1.2, means that the basis is a continuous piecewise linear Lagrange element on a triangle. UFL contains a set of predefined finite element family names, for instance, "Lagrange" as already shown, "Discontinuous Lagrange" (short name "DG") and "Brezzi-Douglas-Marini" (short name "BDM"). The cell argument denotes the polygonal shape of the finite element, while the degree argument denotes the degree of the polynomial space. Although the valid cell shapes in UFL are interval, triangle, tetrahedron, quadrilateral and hexahedron, FFC presently supports only the first three. Also note that the permitted values of cell and degree depend on the choice of finite element family. It is important to realise that UFL is only concerned with the abstract operations related to the finite element function spaces; it is left to the form compiler to support the element families, that is, to generate meaningful code for the representation of elements and forms.

Left: tensor algebraic operators
Mathematical notation                  UFL notation
$A \cdot B$                            dot(A, B)
$A : B$                                inner(A, B)
$AB \equiv A \otimes B$                outer(A, B)
$A^{\mathrm{T}}$                       transpose(A), A.T
$\operatorname{sym} A$                 sym(A)
$\operatorname{tr} A \equiv A_{ii}$    tr(A)
$\det A$                               det(A)

Right: differential operators
Mathematical notation                  UFL notation
$A_{,i}$                               A.dx(i)
$\partial A / \partial x_i$            Dx(A, i)
$\mathrm{d}A / \mathrm{d}n$            Dn(A)
$\nabla A$                             grad(A)
$\nabla \cdot A$                       div(A)

Table 1.1: (Left) Table of tensor algebraic operators. (Right) Table of differential operators.

For mixed finite element methods, product spaces like:

$$
V = [V_2 \times V_2] \times V_1 \tag{1.8}
$$

can easily be generated by either the MixedElement class or the * operator:

UFL code
V_2 = FiniteElement("Lagrange", triangle, 2)
V_1 = FiniteElement("Lagrange", triangle, 1)
V = (V_2*V_2)*V_1
W = MixedElement(MixedElement((V_2, V_2)), V_1)

meaning that V and W are identical. To create a mixed element in which all the component spaces are identical, the VectorElement can be used:

UFL code
V = VectorElement(family, cell, degree, dim=None)

where dim defaults to the dimension of the given cell unless explicitly specified. After defining the local finite element basis, the trial function $u \in V_h$, the test function $v \in \hat{V}_h$ and the coefficient functions $f, h \in V_h$ can be defined in a straightforward fashion, as seen in the code in Figure 1.2. The bilinear and linear forms from (1.3) and (1.4) can then be implemented simply by using the tensor and differential operators defined in UFL, some of which are listed in Table 1.1. An important thing to note is that the UFL definition of the gradient operator grad(A) of, for instance, a vector valued function is $\{\mathrm{grad}(u)\}_{ij} = \partial u_i / \partial x_j$ and not $\{\mathrm{grad}(u)\}_{ij} = \partial u_j / \partial x_i$. The latter operator is, however, provided in UFL by nabla_grad(A). A similar convention applies to the divergence operator, where nabla_div(A) is provided as an alternative to div(A). In this work, the operators $\nabla u$ and $\nabla \cdot u$ follow the UFL definitions of the gradient and divergence operators, grad(u) and div(u), respectively, and should not be confused with the UFL operators nabla_grad(u) and nabla_div(u).

To complete the implementation of the variational forms, integration over the relevant domains must be expressed. In UFL, the integral over a subdomain, $\int_{\Omega_k} I \, \mathrm{d}x$, is denoted by I*dx(k), while the integral over the exterior boundary of the domain, $\int_{\partial\Omega_k} I \, \mathrm{d}s$, is denoted by I*ds(k), where k is the subdomain number and I is a valid UFL expression. Thus, having completed the implementation of (1.2) in the near-mathematical notation of UFL, the form compiler can be invoked to generate code from the abstract UFL representation.

The last two classes of expressions to be presented in this short introduction to UFL are nonlinear scalar functions and geometric quantities. UFL provides a set of nonlinear scalar functions, presented in Table 1.2, which can be applied to, for instance, scalar valued coefficient functions such as f and h in the Poisson example. It is illegal to apply these functions to any test or trial function, as this would render the variational form nonlinear in those arguments.

Left: elementary functions
Mathematical notation          UFL notation
$a / b$                        a / b
$a^b$                          a**b, pow(a,b)
$\sqrt{f}$                     sqrt(f)
$\exp f$                       exp(f)
$\ln f$                        ln(f)
$|f|$                          abs(f)
$\operatorname{sign} f$        sign(f)

Right: trigonometric functions
Mathematical notation          UFL notation
$\cos f$                       cos(f)
$\sin f$                       sin(f)
$\tan f$                       tan(f)
$\arccos f$                    acos(f)
$\arcsin f$                    asin(f)
$\arctan f$                    atan(f)

Table 1.2: (Left) Table of elementary functions. (Right) Table of trigonometric functions.

Geometric quantities are related to the local finite element cell T. For instance, the coordinate of the integration point currently being evaluated on T (including its boundary) can be accessed via cell.x. Other geometric quantities which are particularly useful in relation to this work are the outward normal to the facet⁸ currently being evaluated, cell.n, and the circumradius, the radius of the circumscribed circle of T, cell.circumradius.
Basic usage of nonlinear functions and the integration point coordinate is demonstrated later in Section 1.3.5, while the facet normal and circumradius are frequently used when defining discontinuous Galerkin variational forms in Chapter 4.

⁸ A facet is a topological entity of a computational mesh of dimension D − 1 (codimension 1), where D is the topological dimension of the cells of the computational mesh. Thus, for a triangular mesh, the facets are the edges and for a tetrahedral mesh, the facets are the faces.
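The difference between the UFL operators grad and nabla_grad amounts to a transpose. The plain-Python sketch below illustrates the index convention numerically with central differences. It does not use UFL: the helpers `grad` and `nabla_grad` are hypothetical stand-ins that only mimic the UFL index convention, and the field u(x) = (x0·x1, x0²) is an arbitrary example.

```python
# Plain-Python illustration of the gradient index convention (not UFL).
# grad mimics UFL grad:        G[i][j] = du_i/dx_j
# nabla_grad mimics nabla_grad: N[i][j] = du_j/dx_i (the transpose)

def u(x):
    """Example vector field u(x) = (x0*x1, x0**2)."""
    return [x[0] * x[1], x[0] ** 2]

def grad(field, x, h=1e-6):
    """Central-difference approximation of G[i][j] = du_i/dx_j."""
    G = []
    for i in range(len(field(x))):
        row = []
        for j in range(len(x)):
            xp = list(x)
            xm = list(x)
            xp[j] += h
            xm[j] -= h
            row.append((field(xp)[i] - field(xm)[i]) / (2 * h))
        G.append(row)
    return G

def nabla_grad(field, x, h=1e-6):
    """Transpose of grad: entry [i][j] approximates du_j/dx_i."""
    G = grad(field, x, h)
    return [list(col) for col in zip(*G)]

x = [2.0, 3.0]
print(grad(u, x))        # approximately [[3.0, 2.0], [4.0, 0.0]]
print(nabla_grad(u, x))  # approximately [[3.0, 4.0], [2.0, 0.0]]
```

Row i of grad holds the derivatives of component u_i, which is why grad(u) contracts naturally with a test function gradient in forms such as inner(grad(u), grad(v)).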

1.3.3 FEniCS Form Compiler

As shown in Figure 1.1, the FEniCS Form Compiler (FFC) (Kirby and Logg, 2006, 2007; Logg et al., 2012c; Ølgaard et al., 2008a; Ølgaard and Wells, 2010) takes as input a variational form specified in UFL and generates as output C++ code which conforms to the UFC interface, to be described in Section 1.3.4. Central to the finite element method is the assembly of sparse tensors, described in Section 1.3.5, which relies on the computation of the local element tensor A_T as well as the local-to-global mapping ι_T. Although it is possible to hand-code A_T and ι_T, the process is both tedious and error-prone, especially for complex problems. This issue is eliminated by letting FFC generate the code automatically. Introducing a compiler also makes it possible to apply various optimisation strategies for efficient computation of A_T, which would normally not be feasible when developing code by hand. The automated code generation for the general and efficient solution of finite element variational problems is one of the key features of FEniCS.

There are three different interfaces to FFC: a Python interface, a just-in-time (JIT) compilation interface and a command-line interface. Only the latter is presented here, while details on the other two interfaces can be found in Logg et al. (2012c). The command-line interface takes a UFL form file or a list of form files as input:

Bash code
$ ffc Poisson.ufl

The form file contains the UFL specification of elements and/or forms, for instance the code from Figure 1.2, which in this case is saved in the file Poisson.ufl. The content of a form file is wrapped in a Python script and then executed for further processing in FFC. A number of command-line options are available to control the code generation. Related to this work, the most important options are:

-l language
    This parameter controls the output format for the generated code. The default value is “ufc”, which indicates that the code is generated according to the UFC specification. Alternatively, the value “dolfin” may be used to generate code according to the UFC format with a small set of additional DOLFIN-specific wrappers.

-r representation
    This parameter controls the representation used for the generated element tensor code. There are three possibilities: “auto” (the default), “quadrature” and “tensor”. FFC implements two different approaches to code generation: one based on traditional quadrature and another based on a special tensor representation. These are discussed in Section 3.2. In the case “auto”, FFC will try to select the better of the two representations; that is, the representation that is believed to yield the best run-time performance for the problem at hand. This issue is addressed in detail in Section 3.6.

-O
    If this option is used, the code generated for the element tensor is optimised for run-time performance. The optimisation strategy used depends on the chosen representation. In general, this will increase the time required for FFC to generate code, but should reduce the run-time of the generated code. Note that for very complicated variational forms, hardware limitations can make compilation with some optimisation options impossible. Optimisation strategies are treated in Chapter 3.

As an illustration of the options presented above, the command:

Bash code
$ ffc -l dolfin -r quadrature -O Poisson.ufl

will cause FFC to generate code for the Poisson problem, including DOLFIN wrappers, using the quadrature representation with the default optimisation. A list of all available command-line parameters can be found in the FFC manual page by typing ‘man ffc’ on the command line.

FFC follows the conventional design of a compiler in that it breaks compilation into several sequential stages. The output generated at each stage serves as input for the following stage, as illustrated in Figure 1.3. Introducing separate stages allows each stage to be developed and improved without affecting other stages of the compilation. Furthermore, adding new stages and dropping existing stages becomes trivial. Each of the stages involved when compiling a form is described in the following. Compilation of elements follows a similar (but simpler) set of stages, and is not described here.

Compiler stage 0: Language (parsing). In this stage, the user-specified form is interpreted and stored as a UFL abstract syntax tree (AST). The actual parsing is handled by Python and the transformation to a UFL form object is implemented by operator overloading in UFL.
Input: Python code or .ufl file
Output: UFL form

Compiler stage 1: Analysis. This stage preprocesses the UFL form and extracts form metadata (FormData), such as which elements were used to define the form, the number of coefficients and the cell type (interval, triangle or tetrahedron). This stage also involves selecting a suitable quadrature scheme and representation (as discussed earlier) for the form if these have not been specified by the user.
Input: UFL form
Output: preprocessed UFL form and form metadata

[Diagram: Foo.ufl → Stage 0: Language → UFL → Stage 1: Analysis → UFL + metadata → Stage 2: Representation → IR → Stage 3: Optimization → OIR → Stage 4: Code generation → C++ code → Stage 5: Code formatting → Foo.h / Foo.cpp]

Figure 1.3: Compilation of finite element variational forms broken into six sequential stages: Language, Analysis, Representation, Optimisation, Code generation and Code formatting. Each stage generates output based on input from the previous stage. The input/output data consist of a UFL form file, a UFL object, a UFL object and metadata computed from the UFL object, an intermediate representation (IR), an optimised intermediate representation (OIR), C++ code and, finally, C++ code files (from Logg et al. (2012c)).

Compiler stage 2: Code representation. Most of the complexity of compilation is handled in this stage, which examines the input and generates all data needed for the code generation. This includes generation of finite element basis functions, extraction of data for mapping of degrees of freedom, and generation of the form representation (see Section 3.2), which may involve precomputation of integrals. Both representations available in FFC use tabulated values of finite element basis functions and their derivatives at a suitable set of integration points on the reference element. FFC itself does not generate these values, but relies on the library FIAT (Kirby, 2004, 2012) for the computation of basis functions and their derivatives. The intermediate representation is stored as a Python dictionary, mapping names of UFC functions to the data needed for generation of the corresponding code. In simple cases, like ufc::form::rank, this data may be a simple number like 2. In other cases, like ufc::cell_integral::tabulate_tensor, the data may be a complex data structure that depends on the choice of form representation.
Input: preprocessed UFL form and form metadata
Output: intermediate representation (IR)

Compiler stage 3: Optimisation. This stage examines the intermediate representation and performs optimisations. The optimisation strategy depends on the chosen form representation; see Section 3.3 for optimisations pertinent to the quadrature representation. Data stored in the intermediate representation dictionary is then replaced by new data that encode an optimised version of the function in question.
Input: intermediate representation (IR)
Output: optimised intermediate representation (OIR)

Compiler stage 4: Code generation. This stage examines the optimised intermediate representation and generates the actual C++ code for the body of each UFC function. The code is stored as a dictionary, mapping names of UFC functions to strings containing the C++ code.
As an example, the data generated for ufc::form::rank may be the string “return 2;”. This demonstrates the importance of separating stages 2, 3 and 4, as it allows stages 2 and 3 to focus on algorithmic aspects related to finite elements and variational forms, while stage 4 is concerned only with generating C++ code from a set of instructions prepared in earlier compilation stages.
Input: optimised intermediate representation (OIR)
Output: C++ code

Compiler stage 5: Code formatting. This stage examines the generated C++ code and formats it according to the UFC format, generating as output one or more .h/.cpp files conforming to the UFC specification. This is where the actual writing of C++ code takes place, and the stage relies on templates for UFC code available as part of the UFC module ufc_utils.
Input: C++ code
Output: C++ code files

ufc::mesh      ufc::function          ufc::cell_integral
ufc::cell      ufc::finite_element    ufc::exterior_facet_integral
ufc::form      ufc::dofmap            ufc::interior_facet_integral

Table 1.3: C++ classes defined in the UFC interface.

The interface to the code which is generated by FFC is discussed in the following section.
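The staged design of FFC can be illustrated with a toy pipeline in plain Python. This is a minimal sketch and not FFC's internals: every stage function and the dictionary layout are hypothetical. Only two ideas are taken from the description of the stages: the intermediate representation maps names of UFC functions to data (e.g. ufc::form::rank to 2), and code generation turns that data into C++ strings such as “return 2;”.

```python
# Toy six-stage "form compiler"; all functions are hypothetical illustrations.

def stage0_language(source):
    # "Parse": here a form is just a comma-separated list of arguments.
    return {"arguments": source.split(",")}

def stage1_analysis(form):
    # Extract metadata, e.g. the rank (number of arguments).
    return form, {"rank": len(form["arguments"])}

def stage2_representation(form_and_metadata):
    form, metadata = form_and_metadata
    # IR: map names of UFC functions to the data needed to generate them.
    return {"ufc::form::rank": metadata["rank"]}

def stage3_optimisation(ir):
    # Nothing to optimise for a constant; return an unchanged copy.
    return dict(ir)

def stage4_code_generation(oir):
    # Map each UFC function name to the body of the C++ function as a string.
    return {name: "return %d;" % value for name, value in oir.items()}

def stage5_formatting(code):
    # Wrap each body in a (greatly simplified) C++ member function.
    lines = []
    for name, body in code.items():
        method = name.split("::")[-1]
        lines.append("virtual unsigned int %s() const { %s }" % (method, body))
    return "\n".join(lines)

def compile_form(source):
    # Each stage consumes the output of the previous one.
    result = source
    for stage in (stage0_language, stage1_analysis, stage2_representation,
                  stage3_optimisation, stage4_code_generation, stage5_formatting):
        result = stage(result)
    return result

print(compile_form("v,u"))
# virtual unsigned int rank() const { return 2; }
```

Because stages 2 and 3 only exchange plain dictionaries, a new optimisation could be added as another dictionary-to-dictionary function without touching the code generation stage, which mirrors the motivation given above for separating the stages.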

1.3.4 Unified Form-assembly Code

The purpose of Unified Form-assembly Code (UFC) (Alnæs et al., 2009, 2012) is to provide an interface between the problem-specific code generated by form compilers and general-purpose problem solving environments like DOLFIN (described in Section 1.3.5), which implements, among other things, the finite element assembly algorithm. In contrast to other FEniCS components, few changes are made to UFC in order to maintain a stable interface between form compilers and DOLFIN. This section gives a brief introduction to the interface, with emphasis on the functions relevant for this work. Furthermore, the UFC numbering convention for mesh entities is discussed.

The UFC interface provides a small set of abstract C++ classes, shown in Table 1.3, which are commonly used for assembling finite element tensors. The mesh and cell classes are simple data structures that provide information such as the geometric dimension and the topological dimension. In addition, the cell class provides an array of global indices for the mesh entities belonging to the given cell (cell.entity_indices) and an array of coordinates of the vertices of the cell (cell.coordinates). The classes function and finite_element define interfaces for general tensor-valued functions and finite elements, respectively. The form class defines an interface for assembly of the global tensor corresponding to the given form. This includes functions to create finite_element, dofmap and integral objects (ufc::cell_integral, ufc::exterior_facet_integral and ufc::interior_facet_integral) of the variational form. Of particular interest in relation to this work are the dofmap and integral classes. The local-to-global degree of freedom mapping on the finite element cell T, ι_T, is computed by the dofmap::tabulate_dofs function, for which the UFC interface is defined as:

C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs,
                           const mesh& m,
                           const cell& c) const = 0;

where dofs is a pointer to an array for the tabulated values on T. UFC only provides the interface of this function; it is not concerned with computing ι_T. The code to compute ι_T must be generated by the form compiler. For example, FFC will generate the following code for linear Lagrange elements on triangles:

C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs,
                           const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = c.entity_indices[0][0];
  dofs[1] = c.entity_indices[0][1];
  dofs[2] = c.entity_indices[0][2];
}

Note that FFC associates each degree of freedom with the global vertex number, which can be extracted from the cell::entity_indices array. For discontinuous linear Lagrange elements on triangles, the generated code is

C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs,
                           const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = 3*c.entity_indices[2][0];
  dofs[1] = 3*c.entity_indices[2][0] + 1;
  dofs[2] = 3*c.entity_indices[2][0] + 2;
}

because FFC considers all degrees of freedom local to the given element and therefore computes degree of freedom numbers based on the global cell index. The local finite element tensor is computed inside the tabulate_tensor function, which is implemented by all three integral classes, although the interface varies slightly. For the cell_integral, the interface is

C++ code
/// Tabulate the tensor for the contribution from a local cell
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const cell& c) const = 0;

where A is a pointer to an array which will hold the values of the local element tensor and w contains nodal values of any coefficient functions present in the integral. The code which FFC generates for this function varies depending on, for example, the choice of representation and optimisation, issues which are discussed in Chapter 3. (Figures 3.2 and 3.3, on pages 63 and 65 respectively, show examples of code generated by FFC for this function.) The interface for exterior_facet_integral::tabulate_tensor is similar in nature to the interface for interior_facet_integral::tabulate_tensor, which is discussed in Section 4.1.2 in connection with automation of discontinuous Galerkin methods.

Figure 1.4: The UFC reference triangle and the coordinates of the vertices: v0 at x = (0, 0), v1 at x = (1, 0) and v2 at x = (0, 1).

The UFC specification also defines a numbering scheme for mesh entities which allows form compilers to access necessary data consistently when generating code, for example, for computing the local tensors and local-to-global mapping as discussed above. Important aspects of this numbering scheme are summarised in the following for triangular cells. Further details on the UFC numbering convention can be found in Alnæs et al. (2012). The UFC reference triangle, including the coordinates of the three vertices, is shown in Figure 1.4. Mesh entities are identified by the tuple (d, i), where d is the topological dimension of the mesh entity and i is a unique global index of the mesh entity. For convenience, mesh entities of topological dimension 0 are referred to as vertices, entities of dimension 1 are referred to as edges and entities of dimension 2 are referred to as faces. Mesh entities of topological dimension D − 1 (codimension 1), with D denoting the topological dimension of the cells of the computational mesh, are referred to as facets. Thus, for a triangular mesh, the facets are the edges and for a tetrahedral mesh, the facets are the faces. Following this convention, the vertices of a triangle are identified as v0 = (0, 0), v1 = (0, 1) and v2 = (0, 2), the edges (facets) are e0 = (1, 0), e1 = (1, 1) and e2 = (1, 2), and the cell itself is c0 = (2, 0).
The vertices of simplicial cells (intervals, triangles and tetrahedra) are numbered locally based on the corresponding global vertex numbers, such that a tuple of increasing local vertex numbers corresponds to a tuple of increasing global vertex numbers. This is illustrated for a simple mesh in Figure 1.5. The remaining mesh entities are numbered within each topological dimension based on a lexicographical ordering of ordered tuples of non-incident vertices. For example, the first edge, e0, of a triangle is located opposite vertex v0, as shown in Figure 1.6a. The numbering of mesh entities on triangular cells is shown in Table 1.4. The relative ordering of mesh entities with respect to other incident mesh entities follows by sorting the entities by their indices. Therefore, the pair of vertices incident to edge e0 in Figure 1.6a is (v1, v2), not (v2, v1). Due to the vertex numbering convention, this means that two incident simplicial cells will always agree on the orientation of incident subsimplices (for instance facets). This is demonstrated in Figure 1.6b, which shows two incident triangles which agree on the orientation of the common edge. This feature is advantageous when generating code for discontinuous Galerkin methods, as will be demonstrated in Chapter 4.

Figure 1.5: Local vertex numbering of a simplicial mesh based on global vertex numbers.

Entity         Incident vertices    Non-incident vertices
v0 = (0, 0)    (v0)                 (v1, v2)
v1 = (0, 1)    (v1)                 (v0, v2)
v2 = (0, 2)    (v2)                 (v0, v1)
e0 = (1, 0)    (v1, v2)             (v0)
e1 = (1, 1)    (v0, v2)             (v1)
e2 = (1, 2)    (v0, v1)             (v2)
c0 = (2, 0)    (v0, v1, v2)         ∅

Table 1.4: Local numbering of mesh entities on triangular cells.
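The numbering rules can be reproduced in a few lines of plain Python. This is a sketch of the rules as stated in the text, not UFC or DOLFIN code; the function names and the two-triangle example mesh (global vertices 0, 1, 2 and 1, 2, 3, sharing the edge between global vertices 1 and 2) are hypothetical. Vertices keep their own order, higher-dimensional entities are sorted by their tuple of non-incident vertices, and local vertices are assigned by sorting global vertex numbers.

```python
from itertools import combinations

def local_entities(num_vertices):
    """Local entities of a simplex with vertices 0..num_vertices-1, keyed by dimension d.

    Vertices keep their own order. For d >= 1, entities are ordered
    lexicographically by their sorted tuple of non-incident vertices, and each
    entity is stored as its sorted tuple of incident vertices.
    """
    vertices = tuple(range(num_vertices))
    entities = {0: [(v,) for v in vertices]}
    for d in range(1, num_vertices):
        incident = list(combinations(vertices, d + 1))
        incident.sort(key=lambda inc: tuple(v for v in vertices if v not in inc))
        entities[d] = incident
    return entities

def local_vertices(global_vertices):
    """Local vertex k is the k-th smallest global vertex number."""
    return sorted(global_vertices)

def edges_in_global_numbers(global_vertices):
    """Edges e0, e1, e2 of a triangle expressed as tuples of global vertex numbers."""
    local = local_vertices(global_vertices)
    return [tuple(local[v] for v in edge) for edge in local_entities(3)[1]]

print(local_entities(3))
# {0: [(0,), (1,), (2,)], 1: [(1, 2), (0, 2), (0, 1)], 2: [(0, 1, 2)]}

# Two triangles sharing the edge between global vertices 1 and 2
print(edges_in_global_numbers([2, 0, 1]))  # [(1, 2), (0, 2), (0, 1)]
print(edges_in_global_numbers([3, 1, 2]))  # [(2, 3), (1, 3), (1, 2)]
```

The first output reproduces Table 1.4. In the two-triangle example, the shared edge is e0 of the first triangle and e2 of the second, but both cells traverse it from global vertex 1 to global vertex 2, which is the orientation agreement described above.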

Figure 1.6: Edge numbering and orientation based on sorted tuples of incident and non-incident vertices. (a) Edges are numbered based on the non-incident vertex; therefore, e0 is located opposite vertex v0. (b) The orientation of facets (edges) is defined by the ordered tuple of incident vertices, thus e0 = (v1, v2) and e2 = (v0, v1). As a consequence, two incident triangles will always agree on the orientation of the common facet for simplicial cells.

1.3.5 DOLFIN

Up until now, only the variational form and finite element discretisation have been defined. To obtain a solution to the boundary value problem in (1.1), the computational domain and boundary conditions must be specified, which in the context of FEniCS is handled via a component called DOLFIN, a C++/Python library which also provides algorithms for finite element assembly and linear algebra functionality to solve the arising system of equations. DOLFIN provides a problem solving environment and is the main user interface to FEniCS. A detailed presentation of DOLFIN is outside the scope of this work, but can be found in Logg and Wells (2010) and Logg et al. (2012d). The DOLFIN functionality necessary to implement a complete solver for the Poisson problem is presented here. The intention is to give an impression of the possibilities offered by DOLFIN and an understanding of the basic concepts that are developed and used in subsequent chapters.

For the model problem under consideration, the domain Ω = [0, 1] × [0, 1], in which the source term f = 8π² sin(2πx) sin(2πy) is present, is subjected to homogeneous Dirichlet boundary conditions, g = 0 on Γ_D = ∂Ω. A complete C++ solver for this problem is shown in Figures 1.7 and 1.8. The first line in Figure 1.7 includes the DOLFIN library, while the second line includes the UFC-conforming code generated by FFC based on the UFL input for the Poisson problem shown in Figure 1.2. Then follows the definition of the class Source, which is a subclass of the Expression class. An Expression represents a function that can be evaluated on a finite element space, and to suit this purpose it implements an eval function. This function takes as arguments an array values, which holds the return values, and an array x, which contains the coordinates of the point where the Expression is currently being evaluated. The Source class overloads the eval function, which in this case simply inserts the value 8π² sin(2πx) sin(2πy) into the values array.

C++ code
#include <dolfin.h>
#include "Poisson.h"

using namespace dolfin;

// Source term
class Source : public Expression
{
  void eval(Array<double>& values, const Array<double>& x) const
  {
    values[0] = 8*pow(DOLFIN_PI, 2)*sin(2*DOLFIN_PI*x[0])*sin(2*DOLFIN_PI*x[1]);
  }
};

// Sub domain for Dirichlet boundary condition
class DirichletBoundary : public SubDomain
{
  bool inside(const Array<double>& x, bool on_boundary) const
  {
    return on_boundary;
  }
};

Figure 1.7: Implementation of source term and Dirichlet boundary for the C++ solver for the boundary value problem in (1.1). Program continues in Figure 1.8.

Next follows the definition of the class DirichletBoundary, a subclass of SubDomain, for the part of the boundary where Dirichlet boundary conditions are to be applied. The SubDomain class implements the function inside, which evaluates to true or false depending on whether or not the point given by the coordinates x is part of the subdomain. In addition to the argument x, the inside function also takes the argument on_boundary, a boolean value supplied by DOLFIN, which is true if the point x is located on ∂Ω. In the given case, the Dirichlet condition is indeed applied on ∂Ω, which means that the overloaded inside function can simply be implemented by returning the on_boundary argument.

The remaining part of the C++ solver, the main function, is shown in Figure 1.8. The first line defines the computational mesh, which consists of 2048 triangles, as the unit square is divided into 32 × 32 cells and each cell is divided into two triangles. DOLFIN provides functionality for creating simple meshes through the classes UnitInterval, UnitSquare, UnitCube, UnitCircle, UnitSphere, Interval, Rectangle and Box, which are useful for testing. For ‘real’ applications, a user can read a mesh from file in the following way:

C++ code
Mesh mesh("mesh.xml");

provided that the mesh is saved in the DOLFIN XML format. Meshes can be generated by external libraries, such as Gmsh (http://geuz.org/gmsh/), stored in the Gmsh data format and converted by the dolfin-convert script to the DOLFIN XML data format. Next, the FunctionSpace is defined for the finite element function space V_h in (1.7). A function space is represented by a Mesh, a DofMap and a FiniteElement. The DofMap and FiniteElement classes are generated by FFC based on the element definition in Figure 1.2. However, by including the ‘-l dolfin’ option when compiling the UFL input with FFC:

Bash code
$ ffc -l dolfin Poisson.ufl

the DOLFIN wrappers are generated, permitting a user to instantiate a function space simply by providing the mesh as argument to the constructor. The next three lines define an object for the Dirichlet boundary condition u = g = 0 on the boundary Γ_D defined by the DirichletBoundary class from Figure 1.7. The value g = 0 is simply represented as a constant.

Then follows the creation of the bilinear and linear forms of the Poisson problem using the function space V as argument. The Poisson::BilinearForm and Poisson::LinearForm classes are part of the code in Poisson.h generated by FFC from the UFL input in Figure 1.2. Note how the coefficients f and h are defined and attached to the linear form. The coefficient f is defined by the class Source as shown in Figure 1.7, while the coefficient h, the Neumann boundary condition, is zero in the given case.

C++ code
int main()
{
  // Create mesh and function space
  UnitSquare mesh(32, 32);
  Poisson::FunctionSpace V(mesh);

  // Define boundary condition
  Constant g(0.0);
  DirichletBoundary boundary;
  DirichletBC bc(V, g, boundary);

  // Define variational forms
  Poisson::BilinearForm a(V, V);
  Poisson::LinearForm L(V);
  Source f;
  L.f = f;
  Constant h(0.0);
  L.h = h;

  // Compute solution
  Function u(V);
  Matrix A;
  Vector b;
  assemble(A, a);
  assemble(b, L);
  bc.apply(A, b);
  solve(A, *u.vector(), b);

  // Save solution in PVD format
  File file("poisson.pvd");
  file << u;

  // Plot solution
  plot(u);

  return 0;
}

Figure 1.8: Continuation from Figure 1.7 of C++ code for the Poisson boundary value problem.

A Function u is then declared to hold the computed solution. The Function class represents a finite element function in V and therefore takes a function space as argument. The function u also holds a vector of values of the degrees of freedom associated with the function. A function is evaluated based on linear combinations of basis functions and the values of this vector. This is in contrast to the Expression class, which is evaluated by overloading the eval function as seen in Figure 1.7.

To compute a solution for u which satisfies the variational problem, defined by the bilinear and linear forms a and L, the following three steps are applied. Firstly, the bilinear and linear forms a and L are assembled into the Matrix A and the Vector b by calling the free function assemble, which implements an algorithm to assemble finite element variational forms. The assembly algorithm will be presented later in this section. Secondly, the Dirichlet boundary condition is applied to the linear system of equations using the apply member function of the DirichletBC object bc. Thirdly, after applying the boundary condition, the system of equations can be solved by calling the free function solve, which solves linear systems of the form Ax = b, using the assembled matrix A, the vector of degree of freedom values from u and the assembled vector b as arguments. As an alternative to the three steps outlined above, the solve function provides functionality to solve variational problems in a straightforward fashion, namely by:

C++ code
solve(a == L, u, bc);

which automatically assembles the system, applies the boundary conditions and solves the linear system, storing the solution in the function u. Finally, the solution is saved in ParaView Data (PVD) format (http://www.paraview.org/) for external post-processing and plotted by the built-in plot command of DOLFIN, which enables a quick visual inspection of the computed solution. The computed solution to the Poisson boundary value problem is shown in Figure 1.9.
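The three steps (assemble, apply the Dirichlet condition, solve) can be mimicked on a one-dimensional analogue in plain Python. This is a sketch under stated assumptions, not DOLFIN code: the problem is −u'' = 1 on (0, 1) with u(0) = u(1) = 0 and linear elements. The Dirichlet condition is imposed by replacing each constrained row with an identity row, which is one common approach and is assumed here rather than taken from the DOLFIN source. All function names are hypothetical.

```python
def assemble_1d(n):
    """Assemble A and b for -u'' = 1 on (0, 1) with n linear elements (dense lists)."""
    h = 1.0 / n
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for e in range(n):
        dofs = (e, e + 1)                          # local-to-global map of element e
        A_T = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
        b_T = [h / 2.0, h / 2.0]                   # exact integral of 1 * phi_i over the element
        for i in range(2):
            b[dofs[i]] += b_T[i]
            for j in range(2):
                A[dofs[i]][dofs[j]] += A_T[i][j]
    return A, b

def apply_dirichlet(A, b, dofs, value=0.0):
    """Replace each constrained row by an identity row and set b to the prescribed value."""
    for d in dofs:
        A[d] = [0.0] * len(A)
        A[d][d] = 1.0
        b[d] = value

def solve(A, b):
    """Gaussian elimination with partial pivoting (works on copies of A and b)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

n = 8
A, b = assemble_1d(n)
apply_dirichlet(A, b, dofs=[0, n])
u = solve(A, b)
# In 1D, linear elements reproduce the exact solution x(1 - x)/2 at the nodes,
# so the maximum nodal error printed below is at the level of rounding errors.
print(max(abs(u[i] - (i / n) * (1 - i / n) / 2) for i in range(n + 1)))
```

Replacing rows makes the matrix non-symmetric, which is why a general elimination routine is used here; symmetric alternatives also modify the columns and the right-hand side.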

Python interface

As already mentioned, DOLFIN also provides a Python interface as an alternative to the C++ interface. Most of the Python interface is generated automatically from the C++ interface using SWIG (http://www.swig.org/). In addition, the Python interface offers seamless integration with UFL and FFC through just-in-time compilation of variational forms and elements which, in combination with the expressiveness of Python, allows solvers to be implemented very compactly. For this reason, the Python interface to DOLFIN is preferred, whenever feasible, for the examples presented in this thesis.

Figure 1.9: Computed solution to the Poisson boundary value problem. The warped scalar field u in the figure on the right has been scaled by a factor of 0.5.

As an example, the complete solver for the Poisson boundary value problem using the Python interface is shown in Figure 1.10. The code is very similar to the C++ code in Figures 1.7 and 1.8, and the differences are mainly due to the difference between Python and C++ syntax. The two main differences are the definition of the FunctionSpace and the definition of the variational forms, which are implemented directly as part of the solver and not in a separate file. Also note that the UFL coordinates have been used to implement the source term f directly as part of the variational formulation. It could also be implemented by subclassing the Expression class and overloading the eval function in a similar way to the approach in the C++ example:

Python code
class Source(Expression):
    def eval(self, values, x):
        values[0] = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])

f = Source()

As an alternative, it could be implemented by:

Python code
f = Expression("8*pow(pi,2)*sin(2*pi*x[0])*sin(2*pi*x[1])")

where the string argument to the Expression class is given in C++ syntax, which is automatically just-in-time compiled in order to evaluate the Expression. Compared to the subclassing approach, this is more efficient, as the callback to the eval function will take place in C++ rather than Python.

Python code
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(32, 32)
V = FunctionSpace(mesh, "Lagrange", 1)

# Define Dirichlet boundary
def boundary(x, on_boundary):
    return on_boundary

# Define boundary condition
g = Constant(0.0)
bc = DirichletBC(V, g, boundary)

# Define variational problem
u = TrialFunction(V)
v = TestFunction(V)
x = V.cell().x
f = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])
a = inner(grad(u), grad(v))*dx
L = f*v*dx

# Compute solution
U = Function(V)
solve(a == L, U, bc)

# Save solution in PVD format
file = File("poisson.pvd")
file << U

# Plot solution
plot(U, interactive=True)

Figure 1.10: Complete Python solver for the boundary value problem in (1.1).
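A useful consistency check for the solvers in Figures 1.7, 1.8 and 1.10 is that the chosen source term corresponds to a known exact solution. Assuming that (1.1) is the Poisson equation −Δu = f with u = g on Γ_D, the function u = sin(2πx) sin(2πy) satisfies −Δu = 8π² sin(2πx) sin(2πy) and vanishes on ∂Ω, so it can serve as a reference when inspecting the computed solution. The plain-Python sketch below (no DOLFIN; the helper names are hypothetical) verifies this with a five-point finite-difference Laplacian at a few sample points.

```python
from math import pi, sin

def u_exact(x, y):
    return sin(2 * pi * x) * sin(2 * pi * y)

def f(x, y):
    # Same expression as the Source term: 8*pi^2*sin(2*pi*x)*sin(2*pi*y)
    return 8 * pi ** 2 * sin(2 * pi * x) * sin(2 * pi * y)

def minus_laplacian(g, x, y, h=1e-3):
    """Five-point finite-difference approximation of -(g_xx + g_yy) at (x, y)."""
    return -(g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h ** 2

for (x, y) in [(0.1, 0.2), (0.25, 0.25), (0.7, 0.4)]:
    rel_err = abs(minus_laplacian(u_exact, x, y) - f(x, y)) / abs(f(x, y))
    print(f"({x}, {y}): relative error {rel_err:.1e}")

# u_exact vanishes on the boundary of the unit square (up to rounding)
print(max(abs(u_exact(s, t)) + abs(u_exact(t, s)) for s in (0.0, 1.0) for t in (0.0, 0.3, 0.9)))
```

The relative errors are of the order of the O(h²) truncation error of the stencil, consistent with −Δu = f holding exactly.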

Assembly algorithm

To conclude this short introduction to the FEniCS Project, the assembly algorithm, implemented in the DOLFIN assemble function, is presented. The presentation is given for the assembly of the rank two tensor corresponding to the bilinear form of the Poisson problem in (1.2). A generalisation of the algorithm for multilinear forms is given in Alnæs et al. (2009) and Logg et al. (2012b). Setting the function space $\hat{V} = V$, the tensor A which arises from assembling the bilinear form a is defined by

$$A_I = a\left(\phi_{I_2}, \phi_{I_1}\right), \tag{1.9}$$

where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a basis for V. The tensor A is a sparse rank two tensor, a matrix, of dimensions $N \times N$. The matrix A is computed by iterating over the cells of the mesh and adding the contribution from each local cell to the global matrix A. In this case, from (1.3), the local cell tensor $A_T$ is defined as

$$A_{T,i} = a_T\left(\phi^T_{i_2}, \phi^T_{i_1}\right) = \int_T \nabla\phi^T_{i_2} \cdot \nabla\phi^T_{i_1} \,\mathrm{d}x, \tag{1.10}$$

where $i = (i_1, i_2)$ is a multi-index, $A_{T,i}$ is the $i$th entry of the cell tensor $A_T$, $a_T$ is the local contribution to the form from a cell $T \in \mathcal{T}_h$ and $\{\phi^T_k\}_{k=1}^{3}$ is the local finite element basis for V on T, which in this case consists of linear Lagrange elements on triangles. To formulate the assembly algorithm, a local-to-global mapping of degrees of freedom is needed. Let $\iota_T : \mathcal{I}_T \to \mathcal{I}$ denote the collective local-to-global mapping for each $T \in \mathcal{T}_h$,

$$\iota_T(i) = \left(\iota^1_T(i_1), \iota^2_T(i_2)\right) \quad \forall\, i \in \mathcal{I}_T, \tag{1.11}$$

where $\iota^j_T : [1, 3] \to [1, N]$ denotes the local-to-global mapping for each discrete function space $V_j$ and $\mathcal{I}_T$ is the index set

$$\mathcal{I}_T = \prod_{j=1}^{2} [1, 3] = \{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}. \tag{1.12}$$

That is, $\iota_T$ maps a tuple of local degrees of freedom to a tuple of global degrees of freedom. DOLFIN calls the tabulate_tensor and tabulate_dofs functions presented in Section 1.3.4 in order to compute the cell tensor $A_T$ and the local-to-global mapping $\iota^j_T$ for each discrete function space, from which DOLFIN constructs the collective local-to-global mapping $\iota_T$.
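The index set (1.12) and the collective mapping (1.11) can be written down directly in plain Python. In this minimal sketch the per-space map for one triangle (local dofs 1, 2, 3 mapped to global dofs 4, 7, 9) is a hypothetical example, and the same map is used for both arguments since $\hat{V} = V$; indices are 1-based to match (1.12).

```python
from itertools import product

def index_set(local_dim=3, rank=2):
    """I_T = [1, local_dim]^rank as 1-based tuples, ordered as in (1.12)."""
    return list(product(range(1, local_dim + 1), repeat=rank))

def collective_map(iota_per_space):
    """Build iota_T(i) = (iota^1_T(i_1), ..., iota^r_T(i_r)) from per-space maps."""
    def iota_T(i):
        return tuple(iota_j[i_j] for iota_j, i_j in zip(iota_per_space, i))
    return iota_T

# Hypothetical local-to-global map for one triangle: local 1, 2, 3 -> global 4, 7, 9
iota = {1: 4, 2: 7, 3: 9}
iota_T = collective_map([iota, iota])   # same space for both arguments

I_T = index_set()
print(len(I_T), I_T[0], I_T[-1])   # 9 (1, 1) (3, 3)
print(iota_T((2, 3)))              # (7, 9)
```

For a rank one tensor (a linear form), the same functions apply with rank=1 and a single per-space map.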

The assembly of the matrix A can now be carried out efficiently by iterating over all cells $T \in \mathcal{T}_h$. On each cell T, the cell tensor $A_T$ is computed and then added to the global tensor A, as outlined in Algorithm 1. The algorithm can be extended to handle assembly over exterior and interior facets; the latter is demonstrated in Section 4.1.4.

Algorithm 1 Assembly algorithm.
A = 0
for T ∈ 𝒯_h
    (1) Compute ι_T
    (2) Compute A_T
    (3) Add A_T to A according to ι_T:
        for i ∈ 𝓘_T
            A_{ι_T(i)} += A_{T,i}
        end for
end for
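Algorithm 1 can be written out in plain Python for the Poisson bilinear form with linear Lagrange elements. This is a sketch, not DOLFIN's implementation: the functions tabulate_dofs and tabulate_tensor below are hypothetical Python stand-ins that only borrow the UFC names, the mesh generator mimics the UnitSquare layout described earlier (n × n squares, each cut into two triangles), and dense lists are used instead of a sparse matrix for brevity.

```python
def unit_square_mesh(n):
    """Vertices and triangles of the unit square: n x n squares, each cut into two triangles."""
    vertices = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    cells = []
    for j in range(n):
        for i in range(n):
            v0 = j * (n + 1) + i      # bottom-left corner of the square
            v1 = v0 + 1               # bottom-right
            v2 = v0 + (n + 1)         # top-left
            v3 = v2 + 1               # top-right
            cells.append((v0, v1, v3))
            cells.append((v0, v2, v3))
    return vertices, cells

def tabulate_dofs(cell):
    """iota_T for linear Lagrange: one dof per vertex, numbered by global vertex index."""
    return list(cell)

def tabulate_tensor(coords):
    """A_T[a][b] = integral over T of grad(phi_b) . grad(phi_a) for linear basis functions."""
    (x0, y0), (x1, y1), (x2, y2) = coords
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)   # twice the signed area
    area = abs(det) / 2.0
    # Gradients of the barycentric (linear Lagrange) basis functions, constant on T
    grads = [((y1 - y2) / det, (x2 - x1) / det),
             ((y2 - y0) / det, (x0 - x2) / det),
             ((y0 - y1) / det, (x1 - x0) / det)]
    return [[area * (ga[0] * gb[0] + ga[1] * gb[1]) for gb in grads] for ga in grads]

def assemble(vertices, cells):
    N = len(vertices)
    A = [[0.0] * N for _ in range(N)]                        # A = 0
    for cell in cells:                                       # for T in T_h
        dofs = tabulate_dofs(cell)                           # (1) compute iota_T
        A_T = tabulate_tensor([vertices[v] for v in cell])   # (2) compute A_T
        for a, I in enumerate(dofs):                         # (3) add A_T to A via iota_T
            for b, J in enumerate(dofs):
                A[I][J] += A_T[a][b]
    return A

vertices, cells = unit_square_mesh(4)
A = assemble(vertices, cells)
print(len(cells), len(vertices))   # 32 25
print(A[6][6])                     # diagonal entry of an interior vertex
```

With n = 32 the generator produces 2 · 32² = 2048 triangles, matching the mesh used in Figure 1.8. The symmetric form means A_T and A are symmetric here; for a non-symmetric form, the index order (trial, test) in (1.10) would matter.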

2 FEniCS applications to solid mechanics

One of the goals of this work is to tackle complicated solid mechanics models using automated modelling tools. In the previous chapter it was shown how automated modelling could be employed to solve the finite element variational formulation of a Poisson boundary value problem. The Poisson problem provides a simple platform for introducing the concepts behind automated modelling as it is implemented in the FEniCS framework. However, from the simple presentation it may not be immediately clear how more complex problems, like plasticity, can be solved. A natural step is, therefore, to apply the concept of automated modelling to some standard solid mechanics problems. Solid mechanics problems typically involve the standard momentum balance equation, posed in a Lagrangian setting, with different models distinguished by the choice of nonlinear or linearised kinematics, and the constitutive model for determining the stress. The traditional development approach to solid mechanics problems, and traditional finite element codes, places a strong emphasis on the implementation of constitutive models at the quadrature point level. Automated methods, on the other hand, tend to stress more heavily the governing balance equations. Widely used finite element codes for solid mechanics applications provide application programming interfaces (APIs) for users to implement their own constitutive models. The interface supplies kinematic and history data, and the user code computes the stress tensor, and when required also the linearisation of the stress. Users of such libraries will typically not be exposed to code development other than via the constitutive model API. In addition to demonstrating how solid mechanics problems can be solved using automation tools, this chapter presents some of the models that will be further investigated and extended in subsequent chapters. 
It is not intended as a comprehensive treatment of solid mechanics problems, but should be viewed as a stepping stone towards implementation of classes of plasticity models. The chapter also focuses on some pertinent issues that arise due to the nature of the constitutive models. These issues, and solid mechanics problems in general, have motivated a number of developments in the FEniCS framework. The common problems of linearised elasticity, plasticity, hyperelasticity and elastic wave propagation are considered. Topics that are addressed in this chapter via these problems include ‘off-line’ computation of stress updates, linearisation of problems with off-line stress updates, automatic differentiation and time stepping for problems with second-order time derivatives. The presentation starts with the relevant governing equations and the constitutive models under consideration. The important issue of solving and linearising problems in which the governing equation is expressed in terms of the stress tensor (rather than explicitly in terms of the displacement field, or derivatives of the displacement field), and the stress tensor is computed via a separate algorithm, is then addressed. These topics are followed by a number of examples that demonstrate implementation approaches for the described models. To conclude the chapter, which is primarily based on the work in Ølgaard and Wells (2012a) and Ølgaard et al. (2008b), extensions of the FEniCS framework that are particularly interesting with respect to solid mechanics problems, and consequently to this work, are summarised.

2.1 Governing equations

2.1.1 Preliminaries

The considered problems will be posed on a polygonal domain $\Omega \subset \mathbb{R}^d$, where $1 \le d \le 3$. The boundary of Ω, denoted by ∂Ω, is decomposed into regions $\Gamma_D$ and $\Gamma_N$ such that $\Gamma_D \cup \Gamma_N = \partial\Omega$ and $\Gamma_D \cap \Gamma_N = \emptyset$. The outward unit normal vector on ∂Ω will be denoted by n. For time-dependent problems, a time interval of interest $I = (0, T]$ will be considered, where superimposed dots denote time derivatives. The current configuration of a solid body is denoted by Ω; that is, the domain Ω depends on the displacement field. It is sometimes convenient to also define a reference domain $\Omega_0 \subset \mathbb{R}^d$ that remains fixed. For convenience, cases in which Ω and Ω0 coincide at time t = 0 are considered. To indicate boundaries, outward unit normal vectors, and other quantities relative to Ω0, the subscript ‘0’ will be used. When considering linearised kinematics, the domains Ω and Ω0 are both fixed and coincide at all times t. A triangulation of the domain Ω will be denoted by $\mathcal{T}_h$, and a triangulation of the domain Ω0 will be denoted by $\mathcal{T}_{h,0}$. The governing equations for the different models will be formulated in the common framework of: find $u \in V$ such that

$$F(u; w) = 0 \quad \forall\, w \in V, \tag{2.1}$$

where $F : V \times V \to \mathbb{R}$ is linear in w and V is a suitable function space. If F is also linear in u, then F can be expressed as

$$F(u; w) := a(u, w) - L(w), \tag{2.2}$$

where $a : V \times V \to \mathbb{R}$ is linear in u and in w, and $L : V \to \mathbb{R}$ is linear in w. For this case, the problem can be cast in the canonical setting of: find $u \in V$ such that

$$a(u, w) = L(w) \quad \forall\, w \in V, \tag{2.3}$$

which is identical to the form in (1.2). For nonlinear problems, a Newton method is typically employed to solve (2.1). Linearising F about $u = u_0$ leads to a bilinear form,

$$a(du, w) := D_{u_0} F(u_0; w)[du] = \left. \frac{d F(u_0 + \epsilon\, du; w)}{d\epsilon} \right|_{\epsilon = 0}, \tag{2.4}$$

and a residual given by:

$$L(w) := F(u_0; w). \tag{2.5}$$

Using the definitions of a and L in (2.4) and (2.5), respectively, a Newton step involves solving a problem of the type in (2.3), followed by the correction $u_0 \leftarrow u_0 - du$. The process is repeated until (2.1) is satisfied to within a specified tolerance.
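The Newton procedure built from (2.4) and (2.5) can be illustrated on a small algebraic system. The sketch below is an assumption-laden stand-in for the variational setting: the residual `F` is a hypothetical two-equation system with root $u = (1, 1)$, and each Jacobian column is obtained as the directional derivative (2.4) in a unit direction, approximated by central differences (a swapped-in technique; FEniCS differentiates forms symbolically).

```python
def F(u):
    """Hypothetical nonlinear residual with root u = (1, 1)."""
    return [u[0] + u[0] ** 3 + 0.5 * u[1] - 2.5,
            u[1] + u[1] ** 3 + 0.5 * u[0] - 2.5]


def directional_derivative(F, u, du, eps=1e-7):
    """Central-difference approximation of d/de F(u + e du) at e = 0, cf. (2.4)."""
    up = [a + eps * b for a, b in zip(u, du)]
    um = [a - eps * b for a, b in zip(u, du)]
    return [(p - m) / (2.0 * eps) for p, m in zip(F(up), F(um))]


def solve2(A, b):
    """Cramer's rule for a 2 x 2 linear system A x = b."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def newton(F, u, tol=1e-10, max_iter=30):
    for it in range(max_iter):
        res = F(u)                                    # residual, cf. (2.5)
        if max(abs(r) for r in res) < tol:
            return u, it
        # column j of the Jacobian is the directional derivative in direction e_j
        cols = [directional_derivative(F, u, e) for e in ([1.0, 0.0], [0.0, 1.0])]
        A = [[cols[j][i] for j in range(2)] for i in range(2)]
        du = solve2(A, res)
        u = [a - b for a, b in zip(u, du)]            # correction u0 <- u0 - du
    raise RuntimeError("Newton iteration did not converge")


u, iterations = newton(F, [0.0, 0.0])
print(u, iterations)
```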

2.1.2 Balance of momentum

The standard balance of linear momentum problem for the body Ω reads:

$$\rho \ddot{u} - \nabla \cdot \sigma = b \quad \text{in } \Omega \times I, \tag{2.6}$$
$$u = g \quad \text{on } \Gamma_D \times I, \tag{2.7}$$
$$\sigma n = h \quad \text{on } \Gamma_N \times I, \tag{2.8}$$
$$u(x, 0) = u_0 \quad \text{in } \Omega, \tag{2.9}$$
$$\dot{u}(x, 0) = v_0 \quad \text{in } \Omega, \tag{2.10}$$

where $\rho : \Omega \times I \to \mathbb{R}$ is the mass density, $u : \Omega \times I \to \mathbb{R}^d$ is the displacement field, $\sigma : \Omega \times I \to \mathbb{R}^{d \times d}$ is the symmetric Cauchy stress tensor, $b : \Omega \times I \to \mathbb{R}^d$ is a body force, $g : \Omega \times I \to \mathbb{R}^d$ is a prescribed boundary displacement, $h : \Omega \times I \to \mathbb{R}^d$ is a prescribed boundary traction, $u_0 : \Omega \to \mathbb{R}^d$ is the initial displacement and $v_0 : \Omega \to \mathbb{R}^d$ is the initial velocity. To complete the boundary value problem, a constitutive model that relates σ to u is required.

To develop finite element models, it is necessary to cast the momentum balance equation in a weak form by multiplying the balance equation (2.6) by a weight function w and integrating. It is possible to formulate a space-time method by considering a weight function that depends on space and time, and then integrating over Ω × I. However, it is far more common in solid mechanics applications to consider a weight function that depends on spatial position only and to apply finite difference methods to deal with time derivatives. Following this approach, at a time $t \in I$ equation (2.6) is multiplied by a function w (w is assumed to satisfy w = 0 on $\Gamma_D$) and integrated over Ω:

$$\int_\Omega \rho \ddot{u} \cdot w \, dx - \int_\Omega (\nabla \cdot \sigma) \cdot w \, dx - \int_\Omega b \cdot w \, dx = 0. \tag{2.11}$$

Applying integration by parts, using the divergence theorem and inserting the boundary condition (2.8), equation (2.11) can be expressed in the form of (2.2) as:

$$F := \int_\Omega \rho \ddot{u} \cdot w \, dx + \int_\Omega \sigma : \nabla w \, dx - \int_{\Gamma_N} h \cdot w \, ds - \int_\Omega b \cdot w \, dx = 0. \tag{2.12}$$

In this section, the momentum balance equation has been presented on the current configuration Ω. It can also be posed on the fixed reference domain Ω0 via a pull-back operation. However, for the particular presentation of geometrically nonlinear models used in this chapter, details of the pull-back will not be needed.

2.1.3 Potential energy minimisation

An alternative approach to solving static problems (problems without an inertia term) is to consider the minimisation of potential energy. This approach leads to the same governing equation when applied to a standard problem, but may be a preferable framework for problems that are naturally posed in terms of stored energy densities and for which external forcing terms are conservative (see Holzapfel (2000, p. 159) for an explanation of conservative loading), and for problems that involve coupled physical phenomena that are best described energetically. Consider a system for which the total potential energy Π associated with a body can be expressed as

$$\Pi = \Pi_{\text{int}} + \Pi_{\text{ext}}, \tag{2.13}$$

where $\Pi_{\text{int}}$ is the internal potential energy stored in Ω and $\Pi_{\text{ext}}$ is the energy associated with external forces acting on the domain Ω. An internal potential energy functional of the form

$$\Pi_{\text{int}} = \int_{\Omega_0} \Psi_0(v) \, dx, \tag{2.14}$$

where Ψ0 is the stored strain energy density on the reference domain, and an external potential energy functional of the form

$$\Pi_{\text{ext}} = -\int_{\Omega_0} b_0 \cdot v \, dx - \int_{\Gamma_{0,N}} h_0 \cdot v \, ds, \tag{2.15}$$

are considered. It is the form of the stored energy density function Ψ0 that defines a particular constitutive model. For later convenience, the potential energy terms have been presented on the reference domain Ω0. A minimiser u of (2.13) minimises the potential energy:

$$\min_{v \in V} \Pi, \tag{2.16}$$

where V is a suitably defined function space. Minimisation of Π corresponds to the directional derivative of Π being zero for all possible variations of u. Therefore, minimisation of Π corresponds to solving (2.1) with

$$F(u; w) := D_u \Pi(u)[w] = \left. \frac{d \Pi(u + \epsilon w)}{d\epsilon} \right|_{\epsilon = 0}. \tag{2.17}$$

For suitable definitions of the stress tensor, it is straightforward to show that minimising Π is equivalent to solving the balance of momentum problem for the static case.
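Relation (2.17) can be checked numerically. The sketch below assumes a hypothetical 1D bar discretised with P1 elements, a stored energy $\Psi = \tfrac{K}{2} e^2 + \tfrac{C_4}{4} e^4$ of the strain $e = v_{,x}$ and a constant body force B (all names and values are illustrative). It compares the form F(u; w), differentiated by hand, with a central-difference approximation of the directional derivative of Π.

```python
K, C4, B = 1.0, 0.5, 2.0   # assumed stiffness, quartic coefficient and body force


def strains(v, h):
    """Element-wise constant strain v_x of a P1 field with nodal values v."""
    return [(v[i + 1] - v[i]) / h for i in range(len(v) - 1)]


def load(v, h):
    """int b v dx, exact for piecewise-linear v."""
    return sum(B * h * 0.5 * (v[i] + v[i + 1]) for i in range(len(v) - 1))


def potential_energy(v, h):
    """Pi = int Psi(v_x) dx - int b v dx."""
    return sum(h * (0.5 * K * e ** 2 + 0.25 * C4 * e ** 4) for e in strains(v, h)) - load(v, h)


def residual_form(u, w, h):
    """F(u; w) = int (K u_x + C4 u_x^3) w_x dx - int b w dx, derived by hand."""
    return sum(h * (K * a + C4 * a ** 3) * c
               for a, c in zip(strains(u, h), strains(w, h))) - load(w, h)


def directional_derivative(u, w, h, eps=1e-6):
    """Central-difference approximation of d/de Pi(u + e w) at e = 0, cf. (2.17)."""
    up = [a + eps * c for a, c in zip(u, w)]
    um = [a - eps * c for a, c in zip(u, w)]
    return (potential_energy(up, h) - potential_energy(um, h)) / (2.0 * eps)


h = 0.25
u = [0.0, 0.1, 0.3, 0.2, 0.5]
w = [0.0, 1.0, -0.5, 0.25, 2.0]
print(residual_form(u, w, h), directional_derivative(u, w, h))
```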

2.2 Constitutive models

A constitutive model describes the relationship between stress and deformation. The stress can be defined explicitly in terms of primal functions like the displacement field for linearised elasticity, it can be implicitly defined via stored energy density functions, or it can be defined as the solution to a secondary problem, for instance via the yield criterion in the case of plasticity. The constitutive model can be either linear or nonlinear. In the following sections, examples of these cases are presented in the form of linearised elasticity, plasticity and hyperelasticity. The expressions for the stress or stored energy density presented in this section can be inserted into the balance equations or the minimisation framework in the preceding section to yield a governing equation.

2.2.1 Linearised elasticity

For linearised elasticity, the stress tensor as a function of the strain tensor for an isotropic, homogeneous material is given by

$$\sigma = 2\mu\varepsilon + \lambda \operatorname{tr}(\varepsilon) I, \tag{2.18}$$

where $\varepsilon = \left(\nabla u + (\nabla u)^T\right)/2$ is the strain tensor, µ and λ are the Lamé parameters and I is the second-order identity tensor. The relationship between the stress and the strain can also be expressed as

$$\sigma = \mathcal{C} : \varepsilon, \tag{2.19}$$

where

$$\mathcal{C}_{ijkl} = \mu \left(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\right) + \lambda \delta_{ij}\delta_{kl}, \tag{2.20}$$

and $\delta_{ij}$ is the Kronecker delta.
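The equivalence of (2.18) and (2.19) with (2.20) can be confirmed numerically. The following sketch builds $\mathcal{C}_{ijkl}$ as nested lists and contracts it with a symmetric strain; the Lamé values and the strain are arbitrary assumptions. Symmetry of ε matters: the $\delta_{il}\delta_{jk}$ term produces $\varepsilon_{ji}$, which equals $\varepsilon_{ij}$ only for a symmetric strain.

```python
MU, LMBDA = 1.5, 2.0   # assumed Lame parameters


def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0


def elasticity_tensor(mu, lmbda, d=3):
    """C_ijkl = mu (d_ik d_jl + d_il d_jk) + lambda d_ij d_kl, cf. (2.20)."""
    return [[[[mu * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))
               + lmbda * delta(i, j) * delta(k, l)
               for l in range(d)] for k in range(d)] for j in range(d)] for i in range(d)]


def contract(C, eps):
    """sigma_ij = C_ijkl eps_kl, cf. (2.19)."""
    d = len(eps)
    return [[sum(C[i][j][k][l] * eps[k][l] for k in range(d) for l in range(d))
             for j in range(d)] for i in range(d)]


def stress_direct(mu, lmbda, eps):
    """sigma = 2 mu eps + lambda tr(eps) I, cf. (2.18)."""
    d = len(eps)
    tr = sum(eps[i][i] for i in range(d))
    return [[2.0 * mu * eps[i][j] + lmbda * tr * delta(i, j) for j in range(d)]
            for i in range(d)]


eps = [[0.010, 0.002, 0.000],
       [0.002, -0.005, 0.001],
       [0.000, 0.001, 0.003]]   # symmetric strain
s_tensor = contract(elasticity_tensor(MU, LMBDA), eps)
s_direct = stress_direct(MU, LMBDA, eps)
print(max(abs(s_tensor[i][j] - s_direct[i][j]) for i in range(3) for j in range(3)))
```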

2.2.2 Flow theory of plasticity

The standard flow theory model of plasticity is considered, and only the background necessary to support the examples will be presented. In-depth coverage can be found in many textbooks, such as Lubliner (2008) and Simo and Hughes (1998). For a geometrically linear plasticity problem, the stress tensor is computed by

$$\sigma = \mathcal{C} : \varepsilon^e, \tag{2.21}$$

where $\varepsilon^e$ is the elastic part of the strain tensor. It is assumed that the strain tensor can be decomposed additively into elastic and plastic parts:

$$\varepsilon = \varepsilon^e + \varepsilon^p. \tag{2.22}$$

If $\varepsilon^e$ can be determined, then the stress can be computed. The stress tensor in classical plasticity models must satisfy the yield criterion:

$$f(\sigma, \varepsilon^p, \kappa) := \phi\left(\sigma, q_{\text{kin}}(\varepsilon^p)\right) - q_{\text{iso}}(\kappa) - \sigma_y \le 0, \tag{2.23}$$

where $\phi\left(\sigma, q_{\text{kin}}(\varepsilon^p)\right)$ is a scalar effective stress measure, $q_{\text{kin}}$ is a stress-like internal variable used to model kinematic hardening, $q_{\text{iso}}$ is a scalar stress-like term used to model isotropic hardening, κ is a scalar internal variable and $\sigma_y$ is the initial scalar yield stress. For the commonly adopted von Mises model (also known as J2-flow) with linear isotropic hardening, φ and $q_{\text{iso}}$ read:

$$\phi(\sigma) = \sqrt{\tfrac{3}{2} s_{ij} s_{ij}}, \tag{2.24}$$
$$q_{\text{iso}}(\kappa) = H\kappa, \tag{2.25}$$

where $s_{ij} = \sigma_{ij} - \sigma_{kk}\delta_{ij}/3$ is the deviatoric stress and the constant scalar H > 0 is a hardening parameter. In the flow theory of plasticity, the plastic strain rate is given by:

$$\dot{\varepsilon}^p = \dot{\lambda} \frac{\partial g}{\partial \sigma}, \tag{2.26}$$

where $\dot{\lambda}$ is the rate of the plastic multiplier and the scalar g is known as the plastic potential. In the case of associative plastic flow, g = f. The term $\dot{\lambda}$ determines the magnitude of the plastic strain rate, and the direction is given by ∂g/∂σ. For isotropic strain-hardening, it is usual to set

$$\dot{\kappa} = \sqrt{\tfrac{2}{3} \dot{\varepsilon}^p_{ij} \dot{\varepsilon}^p_{ij}}, \tag{2.27}$$

which for associative von Mises plasticity implies that $\dot{\kappa} = \dot{\lambda}$. A feature of the flow theory of plasticity is that the constitutive model is postulated in a rate form. This requires the application of algorithms to compute the stress from increments of the total strain. A discussion of the algorithmic aspects of how the stress tensor can be computed from the equations presented in this section is postponed to Section 2.4.2.
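The von Mises ingredients (2.23) to (2.27) can be evaluated directly. The sketch below assumes no kinematic hardening, 3×3 nested lists for stresses and arbitrary stress and parameter values. It also checks the claim $\dot{\kappa} = \dot{\lambda}$: with $\partial f/\partial\sigma = \tfrac{3}{2} s/\phi$, one has $\tfrac{2}{3}\,\tfrac{\partial f}{\partial \sigma} : \tfrac{\partial f}{\partial \sigma} = \tfrac{3}{2}\, s : s/\phi^2 = 1$.

```python
import math


def deviator(s):
    """s_ij = sigma_ij - sigma_kk delta_ij / 3."""
    p = (s[0][0] + s[1][1] + s[2][2]) / 3.0
    return [[s[i][j] - (p if i == j else 0.0) for j in range(3)] for i in range(3)]


def effective_stress(s):
    """phi(sigma) = sqrt(3/2 s_ij s_ij), cf. (2.24)."""
    d = deviator(s)
    return math.sqrt(1.5 * sum(d[i][j] ** 2 for i in range(3) for j in range(3)))


def yield_function(s, kappa, sigma_y, H):
    """f = phi - H kappa - sigma_y, cf. (2.23) and (2.25), without kinematic hardening."""
    return effective_stress(s) - H * kappa - sigma_y


def plastic_rates(s, lam_dot):
    """Associative flow (2.26) with df/dsigma = 3 s / (2 phi), and kappa_dot from (2.27)."""
    d, phi = deviator(s), effective_stress(s)
    eps_p_dot = [[lam_dot * 1.5 * d[i][j] / phi for j in range(3)] for i in range(3)]
    kappa_dot = math.sqrt(2.0 / 3.0 * sum(e ** 2 for row in eps_p_dot for e in row))
    return eps_p_dot, kappa_dot


uniaxial = [[300.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(effective_stress(uniaxial))                      # 300.0 for uniaxial stress
print(yield_function(uniaxial, 0.02, 250.0, 1000.0))   # positive: stress is not admissible

stress = [[120.0, 30.0, -10.0], [30.0, -40.0, 25.0], [-10.0, 25.0, 60.0]]
eps_p_dot, kappa_dot = plastic_rates(stress, lam_dot=0.004)
print(kappa_dot)                                             # equals lam_dot up to round-off
print(eps_p_dot[0][0] + eps_p_dot[1][1] + eps_p_dot[2][2])   # close to 0: isochoric flow
```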

2.2.3 Hyperelasticity

Hyperelastic models are characterised by the existence of a stored strain energy density function Ψ0. The linearised model presented at the start of this section falls within the class of hyperelastic models. Assuming linearised kinematics, the stored energy function

$$\Psi_0 = \frac{\lambda}{2} (\operatorname{tr} \varepsilon)^2 + \mu\, \varepsilon : \varepsilon \tag{2.28}$$

corresponds to the linearised model in (2.18). It is straightforward to show that using this stored energy function in the potential energy minimisation approach in (2.17) leads to the same equation as inserting the stress from (2.18) into the weak momentum balance equation (2.12). More generally, stored energy functions that correspond to nonlinear models can be defined. A wide range of stored energy functions for hyperelastic models have been presented and analysed in the literature (see, for example, Bonet and Wood (1997) for a selection). In order to present concrete examples, it is necessary to introduce some kinematics, and in particular strain measures. The Green–Lagrange strain tensor E is defined in terms of the deformation gradient $F : \Omega_0 \times I \to \mathbb{R}^{d \times d}$ and the right Cauchy–Green tensor $C : \Omega_0 \times I \to \mathbb{R}^{d \times d}_{\text{sym}}$:

$$F = I + \nabla u, \tag{2.29}$$
$$C = F^T F, \tag{2.30}$$
$$E = \tfrac{1}{2}\left(C - I\right), \tag{2.31}$$

where I is the second-order identity tensor. Using E in place of the infinitesimal strain tensor ε in (2.28), the following expression for the strain energy density function is obtained:

$$\Psi_0 = \frac{\lambda}{2} (\operatorname{tr} E)^2 + \mu\, E : E, \tag{2.32}$$

which is known as the St. Venant–Kirchhoff model. Unlike the linearised case, this energy density function is not quadratic in u (or spatial derivatives of u), which means that when minimising the total potential energy Π, the resulting equations are nonlinear. Other examples of hyperelastic models are the Mooney–Rivlin model:

$$\Psi_0 = c_1 (I_C - 3) + c_2 (II_C - 3), \tag{2.33}$$

where $I_C = \operatorname{tr} C$ and $II_C = \tfrac{1}{2}\left(I_C^2 - \operatorname{tr} C^2\right)$, and the compressible neo-Hookean model:

$$\Psi_0 = \frac{\mu}{2}(I_C - 3) - \mu \ln J + \frac{\lambda}{2}(\ln J)^2, \tag{2.34}$$

where J = det F. In most presentations of hyperelastic models, one would proceed from the definition of the stored energy function to the derivation of a stress tensor, and then often to a linearisation of the stress for use in a Newton method. This process can be lengthy and tedious. For a range of models, features of UFL will permit problems to be posed as energy minimisation problems, and it will not be necessary to compute an expression for a stress tensor, or its linearisation, explicitly. A particular model can then be posed in terms of a particular expression for Ψ0, as will be demonstrated in the example in Section 2.4.3. It is also possible to follow the momentum balance route, in which case UFL can be used to compute the stress tensor and its linearisation automatically from an expression for Ψ0.
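The idea of obtaining a stress from Ψ0 without deriving it by hand can be illustrated outside UFL. The sketch below swaps UFL's symbolic differentiation for central finite differences: it differentiates the neo-Hookean energy (2.34) with respect to F to get the first Piola–Kirchhoff stress $P = \partial\Psi_0/\partial F$, and compares it with the hand-derived $P = \mu F + (\lambda \ln J - \mu) F^{-T}$. Material parameters and the deformation gradient are assumed values.

```python
import math

MU, LMBDA = 1.0, 2.5   # assumed material parameters


def det3(F):
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def inv_transpose3(F):
    """F^{-T} = cof(F) / det(F)."""
    J = det3(F)
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            minor = F[r[0]][c[0]] * F[r[1]][c[1]] - F[r[0]][c[1]] * F[r[1]][c[0]]
            cof[i][j] = (-1) ** (i + j) * minor
    return [[cof[i][j] / J for j in range(3)] for i in range(3)]


def psi_neo_hookean(F):
    """Psi_0 = mu/2 (I_C - 3) - mu ln J + lambda/2 (ln J)^2, cf. (2.34)."""
    I_C = sum(F[i][j] ** 2 for i in range(3) for j in range(3))   # tr(F^T F)
    lnJ = math.log(det3(F))
    return 0.5 * MU * (I_C - 3.0) - MU * lnJ + 0.5 * LMBDA * lnJ ** 2


def piola_analytic(F):
    FinvT, lnJ = inv_transpose3(F), math.log(det3(F))
    return [[MU * F[i][j] + (LMBDA * lnJ - MU) * FinvT[i][j] for j in range(3)]
            for i in range(3)]


def piola_numeric(F, delta=1e-6):
    """Central differences of Psi_0 with respect to each component F_ij."""
    P = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            Fp, Fm = [row[:] for row in F], [row[:] for row in F]
            Fp[i][j] += delta
            Fm[i][j] -= delta
            P[i][j] = (psi_neo_hookean(Fp) - psi_neo_hookean(Fm)) / (2.0 * delta)
    return P


F = [[1.1, 0.2, 0.0], [0.05, 0.95, 0.1], [0.0, -0.1, 1.05]]
P_a, P_n = piola_analytic(F), piola_numeric(F)
print(max(abs(P_a[i][j] - P_n[i][j]) for i in range(3) for j in range(3)))
```

At F = I the analytic stress vanishes, consistent with a stress-free reference configuration.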

2.3 Linearisation issues for complex constitutive models

Solving problems with nonlinear constitutive models, such as plasticity, using Newton’s method requires linearisation of (2.12). There are two particular issues that deserve attention. The first is that if the stress σ is computed via some algorithm, then proper linearisation of F requires linearisation of the algorithm for computing the stress, and not linearisation of the continuous problem. This point is well known in computational plasticity, and has been extensively studied (Simo and Taylor, 1985). The second issue is that the stress field, and its linearisation, will not in general come from a finite element space. Hence, if all functions are assumed to be in a finite element space, or are interpolated in a finite element space, suboptimal convergence of a Newton method will be observed. This is illustrated in the following sections.

2.3.1 Consistency of linearisation

Consider the following one-dimensional problem:

$$F(u; w) := \int_\Omega \sigma\, w_{,x} \, dx, \tag{2.35}$$

where the scalar stress σ is a nonlinear function of the strain field $u_{,x}$, and will be computed via a separate algorithm. A continuous, piecewise quadratic displacement field (and likewise for w) is considered. The strain field $u_{,x}$ is computed via an L2-projection onto the space of discontinuous, piecewise linear elements. For the considered spaces, this is equivalent to a direct evaluation of the strain. Because the stress is computed via a separate algorithm based on nodal values from the strain field, it is chosen to also represent the stress using a discontinuous, piecewise linear basis. Since the polynomial degree of the integrand is two, (2.35) can be integrated exactly using two Gauss quadrature points on an element $T \in \mathcal{T}_h$:

$$f_{T,i_1} := \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, \sigma_\alpha\, \phi^T_{i_1,x}(x_q)\, W_q, \tag{2.36}$$

where q is the integration point index, α is the degree of freedom index for the local basis of σ, $\psi^T$ and $\phi^T$ denote the linear and quadratic basis functions on the element T, respectively, and $W_q$ is the quadrature weight at integration point $x_q$. Note that $\sigma_\alpha$ is the computed value of the stress at the element node α. To apply a Newton method, the Jacobian (linearisation) of (2.36) is required. This will be denoted by $A^\star_{T,i}$. To achieve quadratic convergence of a Newton method, the linearisation must be exact. The Jacobian of (2.36) is:

$$A^\star_{T,i} := \frac{d f_{T,i_1}}{d u_{i_2}}, \tag{2.37}$$

where $u_{i_2}$ are the displacement degrees of freedom. Because the stress is computed from the strain field $u_{,x}$, only $\sigma_\alpha$ in (2.36) depends on $u_{i_2}$, and the linearisation of this term reads:

$$\frac{d\sigma_\alpha}{d u_{i_2}} = \frac{d\sigma_\alpha}{d\varepsilon_\alpha} \frac{d\varepsilon_\alpha}{d u_{i_2}} = D_\alpha \frac{d\varepsilon_\alpha}{d u_{i_2}}, \tag{2.38}$$

where $D_\alpha$ is the tangent value at node α. To compute the values of the strain at nodes, $\varepsilon_\alpha$, from the displacement field, the derivative of the displacement field is evaluated at $x_\alpha$:

$$\varepsilon_\alpha = \sum_{i_2=1}^{3} \phi^T_{i_2,x}(x_\alpha)\, u_{i_2}. \tag{2.39}$$

Inserting (2.38) and (2.39) into (2.37) yields:

$$A^\star_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_\alpha)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.40}$$

This is the exact linearisation of (2.36).

The linearisation of the weak form (2.35) is now considered, which leads to the bilinear form:

$$a(u, w) := \int_\Omega D\, u_{,x} w_{,x} \, dx, \tag{2.41}$$

where D = dσ/dε is the tangent. As before, D is represented using a discontinuous, piecewise linear basis where the nodal values of D are computed via a separate algorithm. If two quadrature points are used to integrate the form (which is exact for this form), the resulting element matrix is:

$$A_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_q)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.42}$$

The representation of the element matrix in (2.42) is what would be produced by FFC. Equations (2.40) and (2.42) are not identical since $\phi^T_{i_2,x}$ is being evaluated in different locations ($x_q \neq x_\alpha$ in general). As a consequence, the bilinear form in (2.42) is not an exact linearisation of (2.35), and a Newton method will therefore exhibit suboptimal convergence. For the special case where a continuous, piecewise linear basis is used for u and w and a discontinuous, piecewise constant basis is used for the strain, stress and tangent fields, only one integration point is needed and thus $x_q = x_\alpha$, which makes the linearisation exact. In general, the illustrated problem arises when some coefficients in a form are computed by a nonlinear operation elsewhere, and then interpolated and evaluated at points that differ from where the coefficient values were computed. This situation is different from the use of nonlinear operators in UFL (see Table 1.2, page 10). An example of such an operator is the ‘ln J’ term in the neo-Hookean model (2.34), where ‘J’ will be computed at quadrature points during assembly, after which the operator ‘ln’ is applied to compute ‘ln J’. The linearisation issue highlighted in this section is further illustrated in the following section, as is a solution in the context of automated modelling that involves the definition of so-called ‘quadrature elements’.
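The mismatch between (2.40) and (2.42) can be reproduced on a single element. The sketch below assumes the element T = [0, 1], quadratic Lagrange nodes at 0, 1/2 and 1, stress nodes $x_\alpha$ at the two element end points, and a hypothetical constitutive law $\sigma(\varepsilon) = \varepsilon + \varepsilon^3$. Both element matrices are compared against a finite-difference Jacobian of the element vector (2.36).

```python
import math

H_T = 1.0                          # hypothetical element T = [0, H_T]
NODES_U = [0.0, 0.5 * H_T, H_T]    # quadratic Lagrange nodes for u and w
NODES_S = [0.0, H_T]               # nodes x_alpha of the linear stress basis
GAUSS_X = [0.5 * H_T * (1.0 - 1.0 / math.sqrt(3.0)),
           0.5 * H_T * (1.0 + 1.0 / math.sqrt(3.0))]
GAUSS_W = [0.5 * H_T, 0.5 * H_T]


def dphi(i, x):
    """Derivative of quadratic Lagrange basis function i at x."""
    a, b = [NODES_U[j] for j in range(3) if j != i]
    xi = NODES_U[i]
    return ((x - a) + (x - b)) / ((xi - a) * (xi - b))


def psi(alpha, x):
    """Linear basis function alpha of the stress and tangent space at x."""
    x0, x1 = NODES_S
    return (x1 - x) / (x1 - x0) if alpha == 0 else (x - x0) / (x1 - x0)


def sigma(eps):       # assumed nonlinear 'constitutive model'
    return eps + eps ** 3


def tangent(eps):     # D = d sigma / d eps
    return 1.0 + 3.0 * eps ** 2


def nodal_strains(u):
    """eps_alpha = sum_i2 phi'_i2(x_alpha) u_i2, cf. (2.39)."""
    return [sum(dphi(i, xa) * u[i] for i in range(3)) for xa in NODES_S]


def element_vector(u):
    """f_T from (2.36) with two-point Gauss quadrature."""
    s = [sigma(e) for e in nodal_strains(u)]
    return [sum(psi(al, xq) * s[al] * dphi(i1, xq) * wq
                for xq, wq in zip(GAUSS_X, GAUSS_W) for al in range(2))
            for i1 in range(3)]


def element_matrix(u, exact):
    """A*_T from (2.40) if exact, otherwise A_T from (2.42)."""
    d = [tangent(e) for e in nodal_strains(u)]
    return [[sum(psi(al, xq) * d[al] * dphi(i2, NODES_S[al] if exact else xq)
                 * dphi(i1, xq) * wq
                 for xq, wq in zip(GAUSS_X, GAUSS_W) for al in range(2))
             for i2 in range(3)] for i1 in range(3)]


def fd_matrix(u, delta=1e-6):
    """Central-difference Jacobian of element_vector."""
    J = [[0.0] * 3 for _ in range(3)]
    for i2 in range(3):
        up, um = list(u), list(u)
        up[i2] += delta
        um[i2] -= delta
        fp, fm = element_vector(up), element_vector(um)
        for i1 in range(3):
            J[i1][i2] = (fp[i1] - fm[i1]) / (2.0 * delta)
    return J


def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


u = [0.0, 0.3, 0.1]   # arbitrary element degrees of freedom
J_fd = fd_matrix(u)
print("|A* - J_fd| =", max_diff(element_matrix(u, exact=True), J_fd))
print("|A  - J_fd| =", max_diff(element_matrix(u, exact=False), J_fd))
```

When the nodal tangents $D_\alpha$ are equal (for example, a displacement with constant strain such as `[0.0, 0.5, 1.0]`), the two matrices coincide, because linear interpolation of the linear function $\phi^T_{i_2,x}$ is exact; the mismatch only appears when D varies over the element.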

2.3.2 Quadrature elements

Before introducing the concept of quadrature elements, a model problem that will be used in numerical examples is presented. Given the finite element space

$$V := \left\{ w \in H^1_0(\Omega),\; w \in P_k(T)\; \forall\, T \in \mathcal{T}_h \right\}, \tag{2.43}$$

where $\Omega \subset \mathbb{R}$ and $k \ge 1$, the model problem of interest involves: given $f \in V$, find $u \in V$ such that

$$F := \int_\Omega \left(1 + u^2\right) u_{,x} w_{,x} \, dx - \int_\Omega f w \, dx = 0 \quad \forall\, w \in V. \tag{2.44}$$

Solving this problem via Newton’s method involves solving a series of linear problems with

$$L(w) := \int_\Omega \left(1 + u_n^2\right) u_{n,x} w_{,x} \, dx - \int_\Omega f w \, dx, \tag{2.45}$$
$$a(du_{n+1}, w) := \int_\Omega \left(1 + u_n^2\right) du_{n+1,x} w_{,x} \, dx + \int_\Omega 2 u_n u_{n,x}\, du_{n+1} w_{,x} \, dx, \tag{2.46}$$

with the update $u_n \leftarrow u_n - du_{n+1}$. To draw an analogy with complex constitutive models, the above is rephrased as:

$$L(w) := \int_\Omega \sigma_n w_{,x} \, dx - \int_\Omega f w \, dx, \tag{2.47}$$
$$a(du_{n+1}, w) := \int_\Omega C_n\, du_{n+1,x} w_{,x} \, dx + \int_\Omega 2 u_n u_{n,x}\, du_{n+1} w_{,x} \, dx, \tag{2.48}$$

where $\sigma_n = \left(1 + u_n^2\right) u_{n,x}$ and $C_n = 1 + u_n^2$. Apart from the second term in the bilinear form, the forms now resemble those for a plasticity problem where σ is the ‘stress’, C is the ‘tangent’ and $u_{,x}$ is the ‘strain’. Similar to a plasticity problem, the idea is to compute nodal values of σ and C ‘off-line’, and to supply σ and C as functions in a space W to the forms used in the Newton solution process. To access $u_{n,x}$ for use off-line, an approach is to perform an L2-projection of the derivative of u onto a space W. For the example in question, the term $1 + u^2$ will also be projected onto W. A natural choice would be to make W one polynomial order less than V and discontinuous across cell facets. However, following this approach leads to a convergence rate for a Newton solver that is less than the expected quadratic rate. The reason is that the linearisation that follows from this process is not consistent with the problem being solved, as explained in the previous section. To resolve this issue within the context of UFL and FFC, the concept of quadrature elements has been developed¹. This special type of element is used to represent ‘functions’ that can only be evaluated at particular points (quadrature points), and cannot be differentiated, but can be integrated (approximately). In the remainder of this section, key features of the quadrature element are presented together with a demonstration of its use for the model problem considered above. A quadrature element is declared in UFL by:

UFL code
element = FiniteElement("Quadrature", tetrahedron, k)

where k is the polynomial degree that the underlying quadrature rule will be able to integrate exactly. The declaration of a quadrature element is similar to the declaration of any other element in UFL, as demonstrated in Section 1.3.2, and it can be used as such, with some limitations. Note, however, the subtle difference that the element order does not refer to the polynomial degree of the finite element shape functions, but instead relates to the quadrature scheme. For ‘sufficient’ integration of a second-order polynomial in three dimensions, FFC will use four quadrature points per cell. FFC interprets the quadrature points of the quadrature element as degrees of freedom, where the value of a shape function for a degree of freedom is equal to one at the quadrature point and zero otherwise. This has the implication that a function that is defined on a quadrature element can only be evaluated at quadrature points. Furthermore, it is not possible to take derivatives of functions defined on a quadrature element.

¹ The concept was introduced in Ølgaard et al. (2008b), although the syntax for declaring a ‘quadrature element’ and the underlying interpretation have changed slightly. Specifically, the argument k used to refer to the number of integration points in each spatial direction of the quadrature scheme, which differs from the current interpretation in which it refers to the polynomial degree that the underlying quadrature rule will be able to integrate exactly.

The following examples illustrate simple usage of a quadrature element. Consider the bilinear form for a mass matrix weighted by a coefficient f that is defined on a quadrature element:

$$a(u, w) := \int_\Omega f u w \, dx. \tag{2.49}$$

If the test and trial functions w and u come from a space of linear Lagrange functions, the polynomial degree of their product is two. This means that the coefficient f should be defined as:

UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 2)
f = Coefficient(ElementQ)

to ensure appropriate integration of the form in (2.49). The reason for this is that the quadrature element in the form dictates the quadrature scheme that FFC will use for the numerical integration, since the quadrature element, as described above, only has nonzero values at points that coincide with its underlying quadrature scheme. Thus, if the degree of ElementQ is set to one, the form will be integrated using only one integration point, since one point is enough to integrate a linear polynomial exactly, and as a result the form is under-integrated. If quadratic Lagrange elements are used for w and u, the polynomial degree of the integrand is four; therefore, the declaration for the coefficient f should be changed to:

UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 4)
f = Coefficient(ElementQ)
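The consequence of choosing the quadrature degree too low can be illustrated with a 1D analogue. FFC uses different rules on tetrahedra, so this is only an assumption-based sketch: it evaluates the P1 mass matrix, i.e. (2.49) with f = 1, on the unit interval with one-point and two-point Gauss–Legendre rules. An n-point Gauss–Legendre rule integrates polynomials up to degree 2n − 1 exactly.

```python
import math

# Gauss-Legendre rules mapped to [0, 1]: n points integrate degree 2n - 1 exactly
RULES = {
    1: ([0.5], [1.0]),
    2: ([0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)], [0.5, 0.5]),
}


def mass_matrix(num_points):
    """M_ij = int_0^1 phi_i phi_j dx for P1 functions, using the chosen rule."""
    points, weights = RULES[num_points]
    M = [[0.0, 0.0], [0.0, 0.0]]
    for x, w in zip(points, weights):
        phi = [1.0 - x, x]
        for i in range(2):
            for j in range(2):
                M[i][j] += phi[i] * phi[j] * w
    return M


print(mass_matrix(1))   # every entry 0.25: the quadratic integrand is under-integrated
print(mass_matrix(2))   # exact: [[1/3, 1/6], [1/6, 1/3]] up to round-off
```

The one-point result is even singular, which is a concrete symptom of the under-integration described above.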

As a final demonstration of quadrature elements, consider the DOLFIN code in Figure 2.1 for solving the nonlinear model problem in (2.44) with a source term f = x² − 4, Dirichlet boundary conditions u = 1 at x = 0, continuous quadratic elements for V, and quadrature elements of degree two for W. NonlinearModelProblem is a subclass of the DOLFIN class NonlinearProblem, which implements the linear form F and the bilinear form J, the derivative or Jacobian of F, according to (2.5) and (2.4), respectively. The DOLFIN class NewtonSolver solves problems expressed in the canonical form of (2.1) based on the information provided by the NonlinearModelProblem object. Further details on the DOLFIN classes NonlinearProblem and NewtonSolver can be found in Logg et al. (2012d).

The relative residual norm after each iteration of the Newton solver for four different combinations of spaces V and W is shown in Table 2.1. Continuous, discontinuous and quadrature elements are denoted by CGk, DGk and Qk, respectively, where k refers to the polynomial degree as discussed previously. It is clear from the table that using quadratic elements for V requires the use of quadrature elements for W in order to ensure quadratic convergence of the Newton solver.

| Iteration | CG1/DG0 | CG1/Q1 | CG2/DG1 | CG2/Q2 |
|---|---|---|---|---|
| 1 | 1.114e+00 | 1.101e+00 | 1.398e+00 | 1.388e+00 |
| 2 | 2.161e-01 | 2.319e-01 | 2.979e-01 | 2.691e-01 |
| 3 | 3.206e-03 | 3.908e-03 | 2.300e-02 | 6.119e-03 |
| 4 | 7.918e-07 | 7.843e-07 | 1.187e-03 | 1.490e-06 |
| 5 | 9.696e-14 | 3.662e-14 | 2.656e-05 | 1.242e-13 |
| 6 | | | 5.888e-07 | |
| 7 | | | 1.317e-08 | |
| 8 | | | 2.963e-10 | |

Table 2.1: Computed relative residual norms after each iteration of the Newton solver for the nonlinear model problem using different elements for V and W. Quadratic convergence is observed when using quadrature elements, and when using piecewise constant functions for W, which coincides with a one-point quadrature element. The presented results are computed using the code in Figure 2.1 with the different combinations of function spaces.
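A stand-alone analogue of the model-problem solver can make the consistent Newton iteration concrete without DOLFIN. The sketch below rests on several assumptions: P1 elements on [0, 1] with 8 cells, one midpoint quadrature point per cell for σ and C (mirroring the one-point CG1/DG0 case, for which the linearisation is exact), a dense Gaussian elimination in place of DOLFIN's solvers, and the initial guess u = 1. Function names are hypothetical, and the printed residuals are not expected to reproduce the numbers in Table 2.1.

```python
import math


def source(x):
    return x * x - 4.0   # f = x^2 - 4, as in Figure 2.1


def residual_and_jacobian(u, h):
    """F from (2.45) and J from (2.46), P1 elements, one midpoint quadrature point."""
    n = len(u)
    F = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    w_x = (-1.0 / h, 1.0 / h)              # derivatives of the two local P1 functions
    for e in range(n - 1):
        dofs = (e, e + 1)
        x_m = (e + 0.5) * h                # quadrature point (cell midpoint)
        u_m = 0.5 * (u[e] + u[e + 1])
        u_x = (u[e + 1] - u[e]) / h
        C = 1.0 + u_m ** 2                 # 'tangent' C_n at the quadrature point
        sig = C * u_x                      # 'stress' sigma_n at the quadrature point
        # d sigma / d u_j = C phi_j,x + 2 u u_x phi_j, with phi_j(x_m) = 1/2
        dsig = [C * w_x[j] + 2.0 * u_m * u_x * 0.5 for j in range(2)]
        for i in range(2):
            F[dofs[i]] += h * (sig * w_x[i] - source(x_m) * 0.5)
            for j in range(2):
                J[dofs[i]][dofs[j]] += h * dsig[j] * w_x[i]
    return F, J


def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def newton(num_cells=8, tol=1e-12, max_iter=20):
    h = 1.0 / num_cells
    u = [1.0] * (num_cells + 1)            # initial guess satisfying u(0) = 1
    norms = []
    for _ in range(max_iter):
        F, J = residual_and_jacobian(u, h)
        F[0] = u[0] - 1.0                  # Dirichlet condition u = 1 at x = 0
        J[0] = [1.0] + [0.0] * num_cells
        norms.append(math.sqrt(sum(r * r for r in F)))
        if norms[-1] < tol:
            break
        du = gauss_solve(J, F)
        u = [ui - dui for ui, dui in zip(u, du)]   # u_n <- u_n - du_{n+1}
    return u, norms


u, norms = newton()
for k, r in enumerate(norms):
    print(k, r / norms[0])                 # relative residual norm per iteration
```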

2.4 Implementations and examples

This section presents implementation examples that correspond to the previously presented models. Where feasible, complete solvers are presented; when this is not feasible, relevant code extracts are presented. Python examples are preferred due to the compactness of the code extracts; however, in the case of plasticity, efficiency demands a C++ implementation. It is possible that in the future an efficient Python interface for plasticity problems will be made available via just-in-time compilation.

Python code
from dolfin import *

# Sub domain for Dirichlet boundary condition
class DirichletBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return x[0] < DOLFIN_EPS and on_boundary

# Class for interfacing with the Newton solver
class NonlinearModelProblem(NonlinearProblem):
    def __init__(self, a, L, u, C, S, W, bc):
        NonlinearProblem.__init__(self)
        self.a, self.L = a, L
        self.u, self.C, self.S, self.W, self.bc = u, C, S, W, bc

    def F(self, b, x):
        assemble(self.L, tensor=b)
        self.bc.apply(b, x)

    def J(self, A, x):
        assemble(self.a, tensor=A)
        self.bc.apply(A)

    def form(self, A, b, x):
        C = project((1.0 + self.u**2), self.W)
        self.C.vector()[:] = C.vector()
        S = project(Dx(self.u, 0), self.W)
        self.S.vector()[:] = S.vector()
        self.S.vector()[:] = self.S.vector()*self.C.vector()

# Create mesh and define function spaces
mesh = UnitInterval(8)
V = FunctionSpace(mesh, "Lagrange", 2)
W = FunctionSpace(mesh, "Quadrature", 2)

# Define boundary condition
bc = DirichletBC(V, Constant(1.0), DirichletBoundary())

# Define source and functions
f = Expression("x[0]*x[0] - 4")
u, C, S = Function(V), Function(W), Function(W)

# Define variational problems
du, w = TrialFunction(V), TestFunction(V)
L = S*Dx(w, 0)*dx - f*w*dx
a = C*Dx(du, 0)*Dx(w, 0)*dx + 2*u*Dx(u, 0)*du*Dx(w, 0)*dx

# Create nonlinear problem, solver and solve
problem = NonlinearModelProblem(a, L, u, C, S, W, bc)
solver = NewtonSolver()
solver.solve(problem, u.vector())

Figure 2.1: DOLFIN implementation for the nonlinear model problem in (2.44) with ‘off-line’ computation of terms used in the variational forms.

In the code extracts, commentary is only provided for non-trivial aspects, as the more generic aspects, such as the creation of meshes, application of boundary conditions and the solution of linear systems, have already been treated in the introduction to the FEniCS Project in Section 1.3.

2.4.1 Linearised elasticity

This example is particularly simple since the stress can be expressed as a straightforward function of the displacement field, and the expression for the stress in (2.18) can be inserted directly into (2.12). For the steady case (inertia terms are ignored), a complete solver for a linearised elasticity problem is presented in Figure 2.2. The solver in Figure 2.2 is for a simulation on a unit cube with a source term b = (1, 0, 0) and u = 0 on ∂Ω. A continuous, piecewise quadratic finite element space is used. The expressiveness of the UFL input means that the expressions for sigma and F in Figure 2.2 closely resemble the mathematical expressions used in the text for σ and F. To unify the presentation of linear and nonlinear equations, the problem in Figure 2.2 is presented in terms of F, where the UFL functions lhs (left-hand side) and rhs (right-hand side) have been used to automatically extract the bilinear and linear forms, respectively, from F (Alnæs et al., 2013).

2.4.2 Plasticity

The computation of the stress tensor, and its linearisation, for the model outlined in Section 2.2.2 in a displacement-driven finite element model is rather involved. A method of computing, point-wise, a stress tensor that satisfies (2.23) from the strain, strain increment and history variables is known as a ‘return mapping algorithm’. Return mapping strategies are discussed in detail in Simo and Hughes (1998). A widely used return mapping approach, the ‘closest-point projection’, is summarised below for a plasticity model with linear isotropic hardening. From (2.21) and (2.22) the stress at the end of a strain increment reads:

$$\sigma_{n+1} = \mathcal{C} : \left(\varepsilon_{n+1} - \varepsilon^p_{n+1}\right). \tag{2.50}$$

Therefore, given $\varepsilon_{n+1}$, it is necessary to determine the plastic strain $\varepsilon^p_{n+1}$ in order to compute the stress. In a closest-point projection method the increment in plastic strain is computed from:

$$\varepsilon^p_{n+1} - \varepsilon^p_n = \Delta\lambda \frac{\partial g(\sigma_{n+1})}{\partial \sigma}, \tag{2.51}$$

where g is the plastic potential function and $\Delta\lambda = \lambda_{n+1} - \lambda_n$. Since $\partial_\sigma g$ is evaluated at $\sigma_{n+1}$, (2.50) and (2.51) constitute a system of coupled equations with unknowns Δλ and $\sigma_{n+1}$. In general, the system is nonlinear.

Python code
from dolfin import *

# Create mesh
mesh = UnitCube(8, 8, 8)

# Create function space
V = VectorFunctionSpace(mesh, "Lagrange", 2)

# Create test and trial functions, and source term
u, w = TrialFunction(V), TestFunction(V)
b = Constant((1.0, 0.0, 0.0))

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))

# Stress
sigma = 2*mu*sym(grad(u)) + lmbda*tr(grad(u))*Identity(w.cell().d)

# Governing balance equation
F = inner(sigma, grad(w))*dx - dot(b, w)*dx

# Extract bilinear and linear forms from F
a, L = lhs(F), rhs(F)

# Dirichlet boundary condition on entire boundary
c = Constant((0.0, 0.0, 0.0))
bc = DirichletBC(V, c, DomainBoundary())

# Set up PDE and solve
u = Function(V)
problem = LinearVariationalProblem(a, L, u, bcs=bc)
solver = LinearVariationalSolver(problem)
solver.parameters["symmetric"] = True
solver.solve()

Figure 2.2: DOLFIN solver for a linearised elasticity problem on a unit cube.

To obtain a solution, Newton’s method is employed as follows, with k denoting the iteration number. First, a ‘trial stress’ is computed:

$$\sigma_{\text{trial}} = \mathcal{C} : \left(\varepsilon_{n+1} - \varepsilon^p_n\right). \tag{2.52}$$

Subtracting (2.52) from (2.50) and inserting (2.51), the following equation is obtained:

$$R_{n+1} := \sigma_{n+1} - \sigma_{\text{trial}} + \Delta\lambda\, \mathcal{C} : \frac{\partial g(\sigma_{n+1})}{\partial \sigma} = 0, \tag{2.53}$$

where $R_{n+1}$ is the ‘stress residual’. During the Newton iterations this residual is driven towards zero. If the trial stress in (2.52) leads to satisfaction of the yield criterion in (2.23), then $\sigma_{\text{trial}}$ is the new stress and the Newton procedure is terminated. Otherwise, the Newton increment of Δλ is computed from:

$$d\lambda_k = \frac{f_k - R_k : Q_k : \partial_{\sigma} f_k}{\partial_{\sigma} f_k : \Xi_k : \partial_{\sigma} g_k + h}, \tag{2.54}$$

where $Q = \left[ I + \Delta\lambda\, \mathcal{C} : \partial^{2}_{\sigma\sigma} g \right]^{-1}$, $\Xi = Q : \mathcal{C}$ and h is a hardening parameter, which for the von Mises model with linear hardening is equal to H (the constant hardening parameter). The stress increment is computed from:

$$\Delta\sigma_k = \left[ -d\lambda_k\, \mathcal{C} : \partial_{\sigma} g_k - R_k \right] : Q_k, \tag{2.55}$$

after which the increment of the plastic multiplier and the stresses for the next iteration can be computed:

$$\Delta\lambda_{k+1} = \Delta\lambda_k + d\lambda_k, \tag{2.56}$$

$$\sigma_{k+1} = \sigma_k + \Delta\sigma_k. \tag{2.57}$$

The yield criterion is then evaluated again using the updated values, and the procedure continues until the yield criterion is satisfied to within a prescribed tolerance. Note that to start the procedure $\Delta\lambda_0 = 0$ and $\sigma_0 = \sigma_{\rm trial}$. After convergence is achieved, the consistent tangent can be computed:

$$\mathcal{C}_{\rm tan} = \Xi - \frac{\Xi : \partial_{\sigma} g \otimes \partial_{\sigma} f : \Xi}{\partial_{\sigma} f : \Xi : \partial_{\sigma} g + h}, \tag{2.58}$$

which is used when assembling the global Jacobian (stiffness matrix). The return mapping algorithm is applied at all quadrature points. The closest-point return mapping algorithm described above (Simo and Hughes, 1998) is common to a range of plasticity models that are defined by the form of the functions f and g. The process can be generalised for models with more complicated hardening behaviour. To aid the implementation of different models, a return mapping algorithm and support for quadrature-point-level history parameters are provided by the FEniCS Solid Mechanics library. The library is implemented in C++ and adopts a polymorphic design, with the base class PlasticityModel providing an interface for users to implement, thereby supplying the functions f, $\partial_{\sigma} f$, $\partial_{\sigma} g$ and $\partial_{\sigma\sigma} g$. Figure 2.3 shows the public interface of the PlasticityModel class. Supplied with details of f (and possibly g), the library can compute stress updates and linearisations using the closest-point projection method. Computational efficiency is important in the return mapping algorithm as the stress and its linearisation are computed at all quadrature points at each global Newton iteration. Therefore, FEniCS Solid Mechanics relies on the linear algebra library Armadillo (http://arma.sourceforge.net/) to perform the block operations inside the return mapping algorithm, so that it benefits from BLAS. Furthermore, the algorithm is executed in C++ rather than in Python. For this reason, the FEniCS Solid Mechanics library provides only a C++ interface at this stage. To reconcile ease of use and efficiency, it would be possible to use just-in-time compilation for a Python implementation of the PlasticityModel interface, just as DOLFIN presently does for the Expression class (see Logg et al. (2012d)).
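The steps (2.52) to (2.58) can be traced by hand in one dimension. The plain-Python sketch below assumes a uniaxial bar with f = g = |σ| − (σ_y + Hλ), that is associative flow and linear isotropic hardening, so that h = H and ∂²g/∂σ² = 0, which reduces Q to 1 and Ξ to E. It illustrates the closest-point iteration only; it is not the FEniCS Solid Mechanics implementation, and all names are hypothetical.

```python
def return_mapping_1d(eps_new, eps_p_old, lam_old, E, sigma_y, H,
                      tol=1e-12, max_iter=25):
    """Closest-point return mapping for a 1D bar with linear isotropic
    hardening. Returns (stress, plastic strain, lambda, consistent tangent)."""

    def f(sigma, lam):
        # Yield function; g = f is assumed (associative plasticity)
        return abs(sigma) - (sigma_y + H * lam)

    # Trial stress, eq. (2.52)
    sigma_trial = E * (eps_new - eps_p_old)
    if f(sigma_trial, lam_old) <= tol:
        return sigma_trial, eps_p_old, lam_old, E    # elastic step

    sigma, dlam = sigma_trial, 0.0                   # sigma_0 = sigma_trial, Delta lambda_0 = 0
    for _ in range(max_iter):
        df = 1.0 if sigma >= 0.0 else -1.0           # df/dsigma = dg/dsigma
        R = sigma - sigma_trial + dlam * E * df      # stress residual, eq. (2.53)
        fk = f(sigma, lam_old + dlam)
        if abs(fk) <= tol and abs(R) <= tol:
            break
        Q = 1.0                                      # [1 + dlam*E*d2g]^-1 with d2g = 0
        Xi = Q * E
        ddlam = (fk - R * Q * df) / (df * Xi * df + H)   # eq. (2.54)
        dsigma = (-ddlam * E * df - R) * Q               # eq. (2.55)
        dlam += ddlam                                    # eq. (2.56)
        sigma += dsigma                                  # eq. (2.57)
    else:
        raise RuntimeError("return mapping did not converge")

    df = 1.0 if sigma >= 0.0 else -1.0
    Xi = E
    tangent = Xi - (Xi * df) * (df * Xi) / (df * Xi * df + H)   # eq. (2.58)
    eps_p_new = eps_p_old + dlam * df                            # eq. (2.51)
    return sigma, eps_p_new, lam_old + dlam, tangent


sigma, eps_p, lam, c_tan = return_mapping_1d(0.01, 0.0, 0.0, E=200.0, sigma_y=1.0, H=20.0)
```

With ∂²g = 0 the iteration converges after a single Newton update, and the consistent tangent reduces to EH/(E + H). In three dimensions Q and Ξ become full matrices (6×6 in Voigt notation), which is where a library such as Armadillo is used.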
In the following, the outline of a solver based on the FEniCS Solid Mechanics library is presented. The UFL input for a formulation in three dimensions using a continuous, piecewise quadratic basis is shown in Figure 2.4. Note that the stress and the linearised tangent, s and t, are defined using quadrature elements and supplied as coefficients to the form (lines 2, 3 and 7), as they are computed inside the plasticity library. Note also in Figure 2.4 that symmetry has been exploited to flatten the stress and the tangent terms (lines 13 and 18). Recall from Section 2.3 that when constitutive updates are computed outside of the form file, care must be taken to ensure quadratic convergence of a Newton method. By using quadrature elements in Figure 2.4, it is possible to achieve quadratic convergence during a Newton solve for plasticity problems. The solver is implemented in C++, and Figure 2.5 shows an extract of the most relevant parts of the solver in the context of plasticity. First, the necessary function spaces are created (lines 3-5). V is used to define the bilinear and linear forms and the displacement field u, while Vt and Vs are used for the two coefficient spaces: the consistent tangent and the stress, which enter the bilinear and linear forms of the plasticity problem. The forms defining the plasticity problem are then created and the relevant functions are attached to the forms (lines 8-12). Then the object defining the plasticity model is created (line 25). The class VonMises is a subclass of the PlasticityModel class shown in Figure 2.3, and it implements functions for f, ∂σ f and ∂σσ g. It is constructed with values for the Young’s modulus, Poisson’s ratio, yield stress and linear hardening parameter. This object can then be passed to the constructor of the PlasticityProblem class along with the forms, displacement field u, coefficient functions and boundary conditions (line 28).
The PlasticityProblem class, a subclass of the DOLFIN class NonlinearProblem, handles the assembly over cells, loops over cell quadrature points, and variable updates, in addition to defining the linear and bilinear forms of the plasticity problem. The PlasticityProblem is solved by the NewtonSolver like any other NonlinearProblem object, as described earlier in this chapter (line 41). After each Newton solve, the history variables are updated by calling the update_variables function before proceeding with the next solution increment (line 44).

C++ code
class PlasticityModel
{
public:

  /// Constructor
  PlasticityModel(double E, double nu);

  /// Return hardening parameter
  virtual double hardening_parameter(double eps_eq) const;

  /// Equivalent plastic strain
  virtual double kappa(double eps_eq, const arma::vec& stress,
                       double lambda_dot) const;

  /// Value of yield function f
  virtual double f(const arma::vec& stress,
                   double equivalent_plastic_strain) const = 0;

  /// First derivative of f with respect to sigma
  virtual void df(arma::vec& df_dsigma, const arma::vec& stress) const = 0;

  /// First derivative of g with respect to sigma
  virtual void dg(arma::vec& dg_dsigma, const arma::vec& stress) const;

  /// Second derivative of g with respect to sigma
  virtual void ddg(arma::mat& ddg_ddsigma, const arma::vec& stress) const = 0;

};

Figure 2.3: PlasticityModel public interface defined by the plasticity library. Users are required to supply implementations for at least the pure virtual functions. These functions describe the plasticity model.

UFL code
 1  element = VectorElement("Lagrange", tetrahedron, 2)
 2  elementT = VectorElement("Quadrature", tetrahedron, 2, 36)
 3  elementS = VectorElement("Quadrature", tetrahedron, 2, 6)
 4
 5  u, w = TrialFunction(element), TestFunction(element)
 6  b, h = Coefficient(element), Coefficient(element)
 7  t, s = Coefficient(elementT), Coefficient(elementS)
 8
 9  def eps(u):
10      return as_vector([u[i].dx(i) for i in range(3)] \
11                       + [u[i].dx(j) + u[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])
12
13  def sigma(s):
14      return as_matrix([[s[0], s[3], s[4]],
15                        [s[3], s[1], s[5]],
16                        [s[4], s[5], s[2]]])
17
18  def tangent(t):
19      return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])
20
21  a = inner(dot(tangent(t), eps(u)), eps(w))*dx
22  L = inner(sigma(s), grad(w))*dx - dot(b, w)*dx - dot(h, w)*ds

Figure 2.4: Definition of the linear and bilinear variational forms for plasticity expressed using UFL syntax.
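The flattening in Figure 2.4 relies on two conventions: eps(u) stores the engineering shear strains u_i,j + u_j,i, and the 6×6 tangent is stored row-major in a 36-component vector, t[i*6 + j]. The NumPy sketch below checks, for an isotropic linear elastic tangent with arbitrary illustrative values (it is not the von Mises tangent (2.58)), that with these conventions dot(tangent(t), eps(u))·eps(w) reproduces ε(w) : C : ε(u), and that sigma(s) : grad(w) gives the same number. The helper names mirror those in Figure 2.4 but are ordinary Python functions, not UFL.

```python
import numpy as np

SHEAR_PAIRS = [(0, 1), (0, 2), (1, 2)]   # same ordering as eps(u) in Figure 2.4


def eps_voigt(grad_u):
    """Strain 6-vector: normal strains followed by engineering shear strains."""
    normal = [grad_u[i, i] for i in range(3)]
    shear = [grad_u[i, j] + grad_u[j, i] for i, j in SHEAR_PAIRS]
    return np.array(normal + shear)


def sigma_matrix(s):
    """Unflatten a stress 6-vector into a symmetric 3x3 matrix, as sigma(s) does."""
    return np.array([[s[0], s[3], s[4]],
                     [s[3], s[1], s[5]],
                     [s[4], s[5], s[2]]])


def tangent_matrix(t):
    """Unflatten a row-major 36-vector into a 6x6 matrix, as tangent(t) does."""
    return np.array([[t[i * 6 + j] for j in range(6)] for i in range(6)])


def isotropic_voigt_tangent(mu, lmbda):
    """Isotropic elasticity in Voigt form, consistent with engineering shear strains."""
    D = np.zeros((6, 6))
    D[:3, :3] = lmbda + 2.0 * mu * np.eye(3)
    D[3:, 3:] = mu * np.eye(3)
    return D


mu, lmbda = 3.0, 2.0                                   # illustrative values
rng = np.random.default_rng(seed=1)
grad_u, grad_w = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

t = isotropic_voigt_tangent(mu, lmbda).flatten()       # row-major 36-vector
s = tangent_matrix(t) @ eps_voigt(grad_u)               # flattened stress

# Reference stress in full tensor notation
eps_u = 0.5 * (grad_u + grad_u.T)
sigma_ref = 2.0 * mu * eps_u + lmbda * np.trace(eps_u) * np.eye(3)

bilinear_voigt = tangent_matrix(t) @ eps_voigt(grad_u) @ eps_voigt(grad_w)
linear_term = np.sum(sigma_matrix(s) * grad_w)          # sigma(s) : grad(w)
```

Using engineering shear strains is what allows the shear block of the tangent to be μ rather than 2μ, and it is also why contracting the symmetric sigma(s) with the full grad(w) gives the same result as the Voigt product.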

2.4.3 Hyperelasticity

The construction of a solver for a hyperelastic problem, phrased as a minimisation problem, is now presented; it follows the minimisation framework presented in Section 2.1.3. The compressible neo-Hookean model in (2.34) is adopted. The automatic functional differentiation features of UFL permit the solver code to closely resemble the abstract mathematical presentation. Differentiation of forms with respect to functions is handled by the UFL function derivative. For instance, given the potential energy functional Π(u) as a function of the displacements u, the derivative of Π with respect to u in the direction w is given by

$$D_{u}\Pi(u)[w] := \left. \frac{d\, \Pi(u + \epsilon w)}{d\epsilon} \right|_{\epsilon = 0}, \tag{2.59}$$

which can be implemented in UFL by the expression:

UFL code
derivative(Pi, u, w)

If w is a test function, the result of applying the derivative is a linear form, which can be differentiated again to yield a bilinear form as shown in (2.4). Noteworthy in this approach is that it is not necessary to provide an explicit expression for the stress tensor. Changing the model is therefore as simple as redefining the stored energy density function Ψ₀. A complete hyperelastic solver is presented in Figure 2.6. It corresponds to a problem posed on a unit cube, loaded by a body force b₀ = (0, −0.5, 0) and restrained such that u = (0, 0, 0) where x = 0. Elsewhere on the boundary the traction h₀ = (0.1, 0, 0) is applied. Continuous, piecewise linear functions for the displacement field are used. The code in Figure 2.6 adopts the same notation used in Sections 2.1.3 and 2.2.3. The problem is posed on the reference domain, and for convenience the subscripts ‘0’ have been dropped in the code. The solver in Figure 2.6 solves the problem in a single load increment (one Newton solve). For problems with stronger nonlinearities, perhaps as a result of greater volumetric or surface forcing terms, it may be necessary to apply a pseudo time-stepping approach and solve the problem in a number of load increments, or it may be necessary to apply a path-following solution method.
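What derivative does for the whole functional can be mimicked at a single material point. The NumPy sketch below evaluates the stored energy density of Figure 2.6 as a function of the deformation gradient F and compares a central-difference estimate of the directional derivative in (2.59) with the hand-derived first Piola-Kirchhoff stress P = μ(F − F⁻ᵀ) + λ ln(J) F⁻ᵀ. It is only a numerical consistency check under these assumptions; UFL differentiates the form symbolically, and the hand-derived P is exactly the expression the minimisation approach makes it unnecessary to write down.

```python
import numpy as np


def psi(F, mu, lmbda):
    """Compressible neo-Hookean stored energy density, as defined in Figure 2.6."""
    C = F.T @ F                        # right Cauchy-Green tensor
    Ic, J = np.trace(C), np.linalg.det(F)
    return 0.5 * mu * (Ic - 3.0) - mu * np.log(J) + 0.5 * lmbda * np.log(J) ** 2


def first_piola(F, mu, lmbda):
    """Hand-derived dPsi/dF = mu (F - F^-T) + lmbda ln(J) F^-T."""
    F_inv_T = np.linalg.inv(F).T
    J = np.linalg.det(F)
    return mu * (F - F_inv_T) + lmbda * np.log(J) * F_inv_T


def directional_derivative(F, G, mu, lmbda, e=1e-6):
    """Central-difference estimate of d/de Psi(F + e G) at e = 0, cf. (2.59)."""
    return (psi(F + e * G, mu, lmbda) - psi(F - e * G, mu, lmbda)) / (2.0 * e)


E, nu = 10.0, 0.3                                     # values used in Figure 2.6
mu, lmbda = E / (2 * (1 + nu)), E * nu / ((1 + nu) * (1 - 2 * nu))

rng = np.random.default_rng(seed=2)
F = np.eye(3) + 0.05 * rng.standard_normal((3, 3))   # small perturbation of the identity
G = rng.standard_normal((3, 3))                      # arbitrary direction

numerical = directional_derivative(F, G, mu, lmbda)
analytical = np.sum(first_piola(F, mu, lmbda) * G)   # P : G
```

At F = I the stress vanishes, consistent with the reference configuration being stress free.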

C++ code
 1  // Create mesh and define function spaces
 2  UnitCube mesh(4, 4, 4);
 3  Plasticity::FunctionSpace V(mesh);
 4  Plasticity::BilinearForm::CoefficientSpace_t Vt(mesh);
 5  Plasticity::LinearForm::CoefficientSpace_s Vs(mesh);
 6
 7  // Create functions, forms and attach functions
 8  Function u(V); Function tangent(Vt); Function stress(Vs);
 9  Plasticity::BilinearForm a(V, V);
10  Plasticity::LinearForm L(V);
11  a.t = tangent;
12  L.s = stress;
13
14  // Young's modulus and Poisson's ratio
15  double E = 20000.0; double nu = 0.3;
16
17  // Slope of hardening (linear) and hardening parameter
18  double E_t(0.1*E);
19  double hardening_parameter = E_t/(1.0 - E_t/E);
20
21  // Yield stress
22  double yield_stress = 200.0;
23
24  // Object of class von Mises
25  fenicssolid::VonMises J2(E, nu, yield_stress, hardening_parameter);
26
27  // Create PlasticityProblem
28  fenicssolid::PlasticityProblem nonlinear_problem(a, L, u, tangent, stress, bcs, J2);
29
30  // Create nonlinear solver
31  NewtonSolver nonlinear_solver;
32
33  // Pseudo time stepping parameters
34  double t = 0.0; double dt = 0.005; double T = 0.02;
35
36  // Apply load in steps
37  while (t < T)
38  {
39    // Increment time and solve nonlinear problem
40    t += dt;
41    nonlinear_solver.solve(nonlinear_problem, *u.vector());
42
43    // Update variables for next load step
44    nonlinear_problem.update_variables();
45  }

Figure 2.5: DOLFIN code extract for solving a plasticity problem using the FEniCS Solid Mechanics library.

Python code
from dolfin import *

# Optimization options for the form compiler
parameters["form_compiler"]["cpp_optimize"] = True

# Create mesh and define function space
mesh = UnitCube(16, 16, 16)
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Define Dirichlet boundary (x = 0)
def left(x):
    return x[0] < DOLFIN_EPS
bc = DirichletBC(V, Constant((0.0, 0.0, 0.0)), left)

# Define test and trial functions
du, w = TrialFunction(V), TestFunction(V)

# Displacement from previous iteration
u = Function(V)
b = Constant((0.0, -0.5, 0.0))  # Body force per unit mass
h = Constant((0.1, 0.0, 0.0))   # Traction force on the boundary

# Kinematics
I = Identity(V.cell().d)  # Identity tensor
F = I + grad(u)           # Deformation gradient
C = F.T*F                 # Right Cauchy-Green tensor
Ic, J = tr(C), det(F)     # Invariants of deformation tensors

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2*(1 + nu)), E*nu/((1 + nu)*(1 - 2*nu))

# Stored strain energy density (compressible neo-Hookean model)
Psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2

# Total potential energy
Pi = Psi*dx - dot(b, u)*dx - dot(h, u)*ds

# Compute first variation of Pi (directional derivative about u in the direction of w)
F = derivative(Pi, u, w)

# Compute Jacobian of F
dF = derivative(F, u, du)

# Create nonlinear variational problem and solve
problem = NonlinearVariationalProblem(F, u, bcs=bc, J=dF)
solver = NonlinearVariationalSolver(problem)
solver.solve()

Figure 2.6: Complete DOLFIN solver for the compressible neo-Hookean model, formulated as a minimisation problem.

2.4.4 Elastodynamics

As a final example, a linearised elastodynamics problem is considered to illustrate the solution of time-dependent problems. The example is based on the Newmark family of methods, which are widely used in structural dynamics. These are direct integration methods, in which the equations are evaluated at discrete points in time separated by a time increment Δt, so that $t_{n+1} = t_n + \Delta t$. While this section addresses the Newmark scheme, it is straightforward to extend the approach (and implementation) to generalised-α methods (Hilber et al., 1977). The Newmark relations between displacements, velocities and accelerations at $t_n$ and $t_{n+1}$ read:

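$$u_{n+1} = u_n + \Delta t\, \dot{u}_n + \frac{\Delta t^{2}}{2} \left( 2\beta\, \ddot{u}_{n+1} + (1 - 2\beta)\, \ddot{u}_n \right), \tag{2.60}$$

$$\dot{u}_{n+1} = \dot{u}_n + \Delta t \left( \gamma\, \ddot{u}_{n+1} + (1 - \gamma)\, \ddot{u}_n \right), \tag{2.61}$$

where β and γ are scalar parameters. Various well-known schemes are recovered for particular combinations of β and γ. Setting β = 1/4 and γ = 1/2 leads to the trapezoidal scheme, and setting β = 0 and γ = 1/2 leads to a central difference scheme. For β > 0, re-arranging (2.60) and using (2.61) leads to:

$$\ddot{u}_{n+1} = \frac{1}{\beta \Delta t^{2}} \left( u_{n+1} - u_n - \Delta t\, \dot{u}_n \right) - \left( \frac{1}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.62}$$

$$\dot{u}_{n+1} = \frac{\gamma}{\beta \Delta t} \left( u_{n+1} - u_n \right) - \left( \frac{\gamma}{\beta} - 1 \right) \dot{u}_n - \Delta t \left( \frac{\gamma}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.63}$$

in which $u_{n+1}$ is the only unknown term on the right-hand side. To solve a time-dependent problem, the governing equation can be posed at time $t_{n+1}$,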

$$F(u_{n+1}; w) = 0 \quad \forall\, w \in V, \tag{2.64}$$

with the expressions in (2.62) and (2.63) used for the second and first time derivatives of u, respectively, at time $t_{n+1}$. The viscoelastic model under consideration is a minor extension of the elasticity model in (2.18). For the viscoelastic model, the stress tensor is given by:

$$\sigma = 2\mu\varepsilon + \left( \lambda\, \mathrm{tr}(\varepsilon) + \eta\, \mathrm{tr}(\dot{\varepsilon}) \right) I, \tag{2.65}$$

where the constant scalar η ≥ 0 is a viscosity parameter.

A simple, but complete, elastodynamics solver is presented in Figures 2.7 and 2.8. The solver mirrors the notation used in (2.62), (2.63) and (2.65), with expressions for the acceleration, velocity and displacement at time $t_n$ (a0, v0, u0), and expressions for the acceleration and velocity at time $t_{n+1}$ (a1, v1) in terms of the displacement at $t_{n+1}$ (u1) and other fields at time $t_n$. The problem is posed on the unit square. For simplicity, the source term b = (0, 0). The body is fixed such that u = (0, 0) at x = 0, and the initial conditions are u₀ = v₀ = (0, 0). A traction h is applied at x = 1 and is increased linearly from zero to one over the first five time steps. Therefore, no forces are acting on the body at t = 0 and the initial acceleration is zero. Again, the UFL functions lhs and rhs have been used to extract the bilinear and linear terms from the form. This is particularly convenient for time-dependent problems since it allows the code implementation to be posed in the same format as is usually adopted in the mathematical presentation, with the equation of interest posed in terms of fields at some point between times $t_n$ and $t_{n+1}$. The presented solver could be made more efficient by exploiting the linearity of the governing equation and thereby re-using the factorisation of the system matrix.
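How (2.62) and (2.63) turn the time-discrete equation into a problem for $u_{n+1}$ alone can be seen on a single degree of freedom. The plain-Python sketch below assumes a scalar oscillator m ü + c u̇ + k u = p(t) as a stand-in for the discretised system (c plays the role of the viscous term controlled by η). It substitutes (2.62) and (2.63) at $t_{n+1}$, solves for $u_{n+1}$, and then updates the acceleration and velocity in the same way as the update function in Figure 2.7. It is an illustration, not DOLFIN code, and the function names are hypothetical.

```python
def newmark_sdof(m, c, k, p, u0, v0, dt, n_steps, beta=0.25, gamma=0.5):
    """Implicit Newmark integration of m*u'' + c*u' + k*u = p(t) for one DOF.

    Returns a list of (t, u, v, a) tuples, starting with the initial state."""
    a0 = (p(0.0) - c * v0 - k * u0) / m       # initial acceleration from equilibrium
    A = 1.0 / (beta * dt ** 2)                 # coefficient of u_{n+1} in (2.62)
    B = gamma / (beta * dt)                    # coefficient of u_{n+1} in (2.63)
    k_eff = m * A + c * B + k
    t = 0.0
    history = [(t, u0, v0, a0)]
    for _ in range(n_steps):
        t += dt
        # Load at t_{n+1} plus the terms of (2.62) and (2.63) involving only fields at t_n
        rhs = (p(t)
               + m * (A * (u0 + dt * v0) + (1.0 / (2.0 * beta) - 1.0) * a0)
               + c * (B * u0 + (gamma / beta - 1.0) * v0
                      + dt * (gamma / (2.0 * beta) - 1.0) * a0))
        u1 = rhs / k_eff
        a1 = A * (u1 - u0 - dt * v0) - (1.0 / (2.0 * beta) - 1.0) * a0   # (2.62)
        v1 = (B * (u1 - u0) - (gamma / beta - 1.0) * v0
              - dt * (gamma / (2.0 * beta) - 1.0) * a0)                   # (2.63)
        u0, v0, a0 = u1, v1, a1
        history.append((t, u0, v0, a0))
    return history


def energy(m, k, u, v):
    """Kinetic plus strain energy of the oscillator."""
    return 0.5 * m * v ** 2 + 0.5 * k * u ** 2


# Undamped oscillator with the trapezoidal choice beta = 1/4, gamma = 1/2
undamped = newmark_sdof(m=1.0, c=0.0, k=4.0, p=lambda t: 0.0,
                        u0=1.0, v0=0.0, dt=0.05, n_steps=1000)
# Same oscillator with viscous damping
damped = newmark_sdof(m=1.0, c=0.5, k=4.0, p=lambda t: 0.0,
                      u0=1.0, v0=0.0, dt=0.05, n_steps=1000)
```

For the undamped case with β = 1/4 and γ = 1/2, the discrete energy is preserved up to round-off, a known property of the trapezoidal rule for linear systems; with c > 0 it decays.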

2.5 Current and future developments

In this chapter a range of standard solid mechanics problems has been presented in the context of automated modelling. The implementation of the models was shown to be relatively straightforward due to the high level of abstraction provided by the FEniCS framework. The presented cases cover a range of typical solid mechanics problems that can be solved using FEniCS version 1.0. To broaden the range of problems that can be handled in the FEniCS framework, the following two extensions are of particular interest from a solid mechanics viewpoint:

Assembly of forms on manifolds. In FEniCS version 1.0, it is assumed that two-dimensional elements, like triangles, are embedded in R² and three-dimensional elements, like tetrahedra, are embedded in R³. At the time of writing, support for two-dimensional elements embedded in R³ and one-dimensional elements embedded in R² or R³ is being implemented in the development version of FEniCS. This will, among other things, facilitate the development of support for shell and truss problems within the automated framework.

Isoparametric elements. This issue relates to quadrilateral and hexahedral elements, which are currently not supported, and to elements with higher-order mappings that allow curved mesh boundaries to be represented.

Finally, to attract more users with a solid mechanics background, another extension to consider is improving the interface of the FEniCS Solid Mechanics library to make it more similar to conventional finite element software packages. This involves supplying users with information such as strain, strain rates and possibly strain gradients at integration point level, so that the constitutive relation can be formulated without working with the weak form of the governing equations.

Python code
from dolfin import *

# External load
class Traction(Expression):

    def __init__(self, end):
        Expression.__init__(self)
        self.t = 0.0
        self.end = end

    def eval(self, values, x):
        values[0] = 0.0
        values[1] = 0.0
        if x[0] > 1.0 - DOLFIN_EPS:
            values[0] = self.t/self.end if self.t < self.end else 1.0

    def value_shape(self):
        return (2,)

def update(u, u0, v0, a0, beta, gamma, dt):
    # Get vectors (references)
    u_vec, u0_vec = u.vector(), u0.vector()
    v0_vec, a0_vec = v0.vector(), a0.vector()

    # Update acceleration and velocity
    a_vec = (1.0/(2.0*beta))*((u_vec - u0_vec - v0_vec*dt)/(0.5*dt*dt) - (1.0 - 2.0*beta)*a0_vec)

    # v = dt * ((1 - gamma)*a0 + gamma*a) + v0
    v_vec = dt*((1.0 - gamma)*a0_vec + gamma*a_vec) + v0_vec

    # Update (t(n) <-- t(n+1))
    v0.vector()[:], a0.vector()[:] = v_vec, a_vec
    u0.vector()[:] = u.vector()

# Create mesh
mesh = UnitSquare(32, 32)

# Define function space
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Test and trial functions
u1, w = TrialFunction(V), TestFunction(V)

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))

# Mass density and viscous damping coefficient
rho, eta = 1.0, 0.2

Figure 2.7: DOLFIN code for solving a dynamic problem using an implicit Newmark scheme. The program continues in Figure 2.8.

Python code
# Time stepping parameters
beta, gamma = 0.25, 0.5
dt = 0.1
t, T = 0.0, 20*dt

# Fields from previous time step (displacement, velocity, acceleration)
u0, v0, a0 = Function(V), Function(V), Function(V)
h = Traction(T/4.0)

# Velocity and acceleration at t_(n+1)
v1 = (gamma/(beta*dt))*(u1 - u0) - (gamma/beta - 1.0)*v0 - dt*(gamma/(2.0*beta) - 1.0)*a0
a1 = (1.0/(beta*dt**2))*(u1 - u0 - dt*v0) - (1.0/(2.0*beta) - 1.0)*a0

# Stress tensor
def sigma(u, v):
    return 2.0*mu*sym(grad(u)) + (lmbda*tr(grad(u)) + eta*tr(grad(v)))*Identity(u.cell().d)

# Governing equation
F = (rho*dot(a1, w) + inner(sigma(u1, v1), sym(grad(w))))*dx - dot(h, w)*ds

# Extract bilinear and linear forms
a, L = lhs(F), rhs(F)

# Set up boundary condition at left end
zero = Constant((0.0, 0.0))
def left(x):
    return x[0] < DOLFIN_EPS
bc = DirichletBC(V, zero, left)

# Set up PDE, advance in time and solve
u = Function(V)
problem = LinearVariationalProblem(a, L, u, bcs=bc)
solver = LinearVariationalSolver(problem)

# Save solution in VTK format
file = File("displacement.pvd")

while t <= T:
    t += dt
    h.t = t
    solver.solve()
    update(u, u0, v0, a0, beta, gamma, dt)
    file << u

Figure 2.8: Continuation from Figure 2.7 of the DOLFIN code for solving a dynamic problem.

3 Representations and optimisations of finite element variational forms

The previous chapter demonstrated that solvers for various solid mechanics problems can be implemented with relatively little effort using an automated modelling approach which relies on the abstractions offered by UFL and the ability of FFC to generate C++ code from the UFL input. For the approach to be competitive with hand-written code, it is important that the run-time performance of the corresponding low-level code generated from the UFL representation is comparable to that of hand-written code. To this end, FFC implements two different types of representations of finite element tensors, the so-called tensor contraction representation and the classical quadrature-loop representation, including optimisations of both representations. The development of different strategies for representing and optimising finite element variational forms has been motivated by the desire to apply the automated modelling approach to problems of increasing complexity. The first representation available in FFC was the tensor contraction. However, this representation is not effective for problems like the plasticity problem in Section 2.4.2. This led to the development of a representation based on quadrature which included the optimisations described in Sections 3.3.1 and 3.3.2. With the availability of automatic differentiation in UFL, problems like hyperelasticity could easily be implemented in the automated framework (Sections 2.2.3 and 2.4.3). For these types of problems, further optimisations of the quadrature representation were necessary for efficient computation. These optimisations are important as FFC will automatically select the quadrature representation for moderately complex and highly complex problems if the representation is not set by the user. The automatic selection of representation is discussed in Section 3.6.
Many FEniCS users will, therefore, be using the quadrature representation and optimisations unknowingly, particularly if they work through the Python interface of DOLFIN. This chapter presents the developments in FFC in terms of representations and optimisations for finite element variational forms and is primarily based on the work in Ølgaard and Wells (2009, 2010, 2012b), with the main difference being that code examples and results have been updated to be compliant with FEniCS version 1.0. The developments have been applied by researchers and application developers to various problems such as multiphase flow through porous media (Wells et al., 2008), free surface flows (Labeur and Wells, 2009), the Navier–Stokes equations (Mortensen et al., 2011; Labeur and Wells, 2012; Jansson et al., 2011; Selim et al., 2012), fluid–structure interaction (Selim, 2012; Hoffman et al., 2013), shape memory alloys (Grandi et al., 2012), electromagnetics (Marchand and Davidson, 2011; Lezar and Davidson, 2012), magnetic fluid hyperthermia for cancer therapy (Miaskowski et al., 2012), oscillatory hydraulic tomography (Saibaba et al., 2012), the Föppl–Von Kármán shell model (Vidoli, 2013), nonlinear elliptic problems (Lakkis and Pryer, 2011), microstructural processes (Maraldi et al., 2011, 2012), mantle convection simulations (Vynnytska et al., 2013, 2012), glacier ice motion (Riesen et al., 2010; Riesen, 2011), PDE-constrained optimisation and optimal control (Brandenburg et al., 2012; Funke and Farrell, 2013; Rosseel and Wells, 2012; Clason and Kunisch, 2012; Rognes and Logg, 2012), Nitsche’s method for overlapping meshes (Massing et al., 2012b,a, 2013), automated modelling of evolving discontinuities (Nikbakht and Wells, 2009; Nikbakht, 2012), liquid crystal elastomers (Luo and Calderer, 2012), and crack propagation in elastomers (Horst et al., 2013).

3.1 Motivation and approach

The tensor contraction representation of element tensors (Kirby and Logg, 2006; Ølgaard et al., 2008a) is based on the multiplicative decomposition of an element tensor into two tensors, one of which depends only on the differential equation and the chosen finite element bases and can be computed prior to run time. It has been shown for classes of problems that the tensor contraction representation is more efficient than the traditional quadrature approach, and the speed-ups can be dramatic (Kirby and Logg, 2006; Ølgaard and Wells, 2010). Furthermore, strategies which analyse the structure of the tensor contraction representation can yield improved performance (Kirby et al., 2005, 2006). However, in contrast to the quadrature-loop approach, the tensor contraction representation is somewhat specialised: it cannot be extended trivially to non-affine isoparametric mappings while maintaining efficiency, and it is not effective for classes of nonlinear problems which require the integration of functions that do not come from a finite element space (Ølgaard et al., 2008b). The attractive feature of the approach is its run-time performance for certain classes of problems. A general experience is that the tensor contraction approach does not scale well when forms become more complicated. This is manifest in three ways: the time required to generate low-level code for a variational form becomes prohibitive, or code generation may fail due to memory limitations or limitations of underlying libraries¹; the size of the generated code is such that the compilation of the generated low-level code is prohibitively slow and file size limitations of compilers acting on the low-level code may be exceeded; and the run-time performance deteriorates rapidly relative to a quadrature approach. Complicated forms are by no means exotic. Many common nonlinear equations, when linearised, result in forms which involve numerous function products.
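A minimal sketch of the multiplicative decomposition, assuming the unweighted Laplacian (w = 1) on affine linear triangles so that the reference tensor stays small: A_T[i, j] = Σ_{α,γ} A0[i, j, α, γ] G_T[α, γ], where A0 = ∫_{T0} ∂Φ_i/∂X_α ∂Φ_j/∂X_γ dX depends only on the form and the element and can be computed once, while the geometry tensor G_T = |det J| K Kᵀ, with K = J⁻¹, is evaluated per cell. With a coefficient w present, A0 would carry an additional index for the coefficient's degrees of freedom. The NumPy code is illustrative and does not reproduce FFC's implementation; it compares the contraction with a direct evaluation.

```python
import numpy as np

# Gradients dPhi_i/dX_alpha of the linear basis on the reference triangle
# with vertices (0, 0), (1, 0), (0, 1)
DPHI_REF = np.array([[-1.0, -1.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])
REF_AREA = 0.5

# Reference tensor A0[i, j, alpha, gamma]; independent of the cell,
# so it can be computed once, before run time
A0 = REF_AREA * np.einsum("ia,jg->ijag", DPHI_REF, DPHI_REF)


def jacobian(vertices):
    """Jacobian J = dx/dX of the affine map from the reference triangle."""
    v = np.asarray(vertices, dtype=float)
    return np.column_stack((v[1] - v[0], v[2] - v[0]))


def geometry_tensor(vertices):
    """G_T[alpha, gamma] = |det J| sum_beta K[alpha, beta] K[gamma, beta], K = J^-1."""
    J = jacobian(vertices)
    K = np.linalg.inv(J)
    return abs(np.linalg.det(J)) * K @ K.T


def element_tensor_contraction(vertices):
    """A_T = A0 : G_T, contracting over the reference-coordinate indices."""
    return np.einsum("ijag,ag->ij", A0, geometry_tensor(vertices))


def element_tensor_direct(vertices):
    """Direct evaluation: constant physical gradients times the cell area."""
    J = jacobian(vertices)
    dphi = DPHI_REF @ np.linalg.inv(J)        # row i holds dphi_i/dx_beta
    return 0.5 * abs(np.linalg.det(J)) * dphi @ dphi.T


triangle = [(0.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
A_contraction = element_tensor_contraction(triangle)
A_direct = element_tensor_direct(triangle)
```

Only the small 2×2 geometry tensor is computed per cell; for more complicated forms the rank of A0 grows with every coefficient and derivative, which is the scaling problem described above.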
Factors that determine the complexity of a form are the number of coefficient functions, the number of derivatives and the polynomial orders of the finite element basis functions. Approaches to reduce the time required for the code generation phase when using the tensor contraction representation have been developed and implemented in FFC (Kirby and Logg, 2007), although these cannot mitigate the inherently expensive nature of the approach for complicated forms. Using a quadrature representation for more complicated forms mitigates the problems regarding the time required to generate the code and the file size of the generated code. However, a naive implementation of the quadrature representation can have a serious impact on the run-time performance of the generated code. Fortunately, the automated generation of computer code provides scope for various optimisations to be applied such that optimal or near-optimal run-time performance is maintained also for complex forms. The optimisations that have been developed in this work are discussed in Section 3.3; see also Ølgaard and Wells (2010, 2012b). To demonstrate the issues pertinent to automated code generation for complicated forms, this chapter presents the tensor contraction representation and the quadrature representation, and discusses four optimisation strategies for the latter that target the run-time performance of the generated code. Adopting the approach in Ølgaard and Wells (2010), the two representations are then compared to each other by considering

1. The run-time performance of the generated code;

2. The size of the generated code; and

3. The speed of the code generation phase.

The relative importance of these points may well shift during a development cycle. During initial development, it is likely that the speed of the code generation phase and the size of the generated code are most important, whereas at the end of the development cycle run-time performance is likely to be the most crucial consideration. However, there is typically a correlation between the three points. After comparing the two representations, the four optimisations for the quadrature representation are compared to each other in terms of run-time performance.

¹ For instance, the implementation of the tensor contraction representation in FFC relies on the Python module NumPy (http://www.numpy.org/) for computations involving n-dimensional arrays. The maximum number of array dimensions allowed is version-specific; for NumPy version 1.6.2, n_max = 32.

It should be noted that the presented representations and optimisation techniques can also be implemented with conventional ‘hand’ coding. Automation, however, makes the approach generic and allows these strategies, which are simple but tedious to implement by hand, to be applied to an unlimited range of problems. Automated code generation is most appealing for complicated variational forms, for which a developer could not reasonably be expected to program the strategies by hand.

3.2 Representation of finite element tensors

The bilinear form for the weighted Laplace operator $-\nabla \cdot (w \nabla u)$, where u is unknown and w is a prescribed coefficient, is chosen as a canonical example to illustrate the two different representations and the optimisations implemented in FFC. The bilinear form for this operator reads

$$a(u, v) := \int_{\Omega} w\, \nabla u \cdot \nabla v \, dx. \tag{3.1}$$

The quadrature approach can deal with cases in which not all functions come from a finite element space, including nonlinear functions like ln, exp, sin, etc., using ‘quadrature functions’ (see Section 2.3.2) that can be evaluated directly at quadrature points. The tensor representation approach only supports cases in which all functions come from a finite element space (using interpolation if necessary). Therefore, to ensure a proper performance comparison between the representations, it is assumed in this chapter that all functions in a form, including coefficient functions, come from a finite element function space. In the case of (3.1), all functions will come from

$$V_h := \left\{ v \in H^{1}(\Omega) : v|_{T} \in P_q(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{3.2}$$

where $P_q(T)$ denotes the space of Lagrange polynomials of degree q on the element T of the standard triangulation of Ω, which is denoted by $\mathcal{T}_h$. Letting $\{\phi^{T}_{i}\}$ denote the local finite element basis that spans the discrete function space $V_h$ on T, the local element tensor for an element T can be computed as

$$A_{T,i} = \int_{T} w\, \nabla \phi^{T}_{i_1} \cdot \nabla \phi^{T}_{i_2} \, dx, \tag{3.3}$$

where $i = (i_1, i_2)$ is a multi-index. The UFL input for (3.1) is shown in Figure 3.1 for continuous piecewise linear functions on triangles as a basis for all functions in the form.
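The difference between evaluating a nonlinear function directly at quadrature points and first interpolating it into a finite element space can be seen on a single one-dimensional element. The plain-Python sketch below assumes u_h(x) = x on [0, 1], a linear Lagrange function with nodal values 0 and 1, and integrates exp(u_h) in two ways: by evaluating exp at 3-point Gauss-Legendre points, as a quadrature function would, and by interpolating exp(u_h) into the same linear space before integrating, as a representation restricted to finite element functions would require. The element, the function and the rule are choices made purely for illustration.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]
GAUSS_POINTS = [-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)]
GAUSS_WEIGHTS = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]


def u_h(x, u_nodes=(0.0, 1.0)):
    """Linear Lagrange function on [0, 1] with the given nodal values."""
    return u_nodes[0] * (1.0 - x) + u_nodes[1] * x


def integrate_at_quadrature_points(func):
    """Evaluate func(u_h(x)) directly at the mapped Gauss points on [0, 1]."""
    total = 0.0
    for xi, wq in zip(GAUSS_POINTS, GAUSS_WEIGHTS):
        x = 0.5 * (xi + 1.0)               # map [-1, 1] -> [0, 1]
        total += 0.5 * wq * func(u_h(x))   # 0.5 is the Jacobian of the map
    return total


def integrate_after_interpolation(func):
    """Interpolate func(u_h) into the linear space first, then integrate exactly."""
    nodal = [func(u_h(0.0)), func(u_h(1.0))]
    return 0.5 * (nodal[0] + nodal[1])     # exact integral of a linear function on [0, 1]


exact = math.e - 1.0
quad = integrate_at_quadrature_points(math.exp)
interp = integrate_after_interpolation(math.exp)
```

Here the quadrature-function value agrees with e − 1 to about six digits, whereas interpolating first changes the integrand itself and overestimates the integral by roughly 8%. This is why the comparison in this chapter restricts all functions to finite element spaces.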

UFL code
element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)
v = TestFunction(element)
w = Coefficient(element)

a = w*inner(grad(u), grad(v))*dx

Figure 3.1: UFL input for the weighted Laplacian form on linear triangular elements.

3.2.1 Quadrature representation

FFC generates an intermediate representation of the UFL input in Figure 3.1 as explained in Section 1.3.3. Assuming a standard affine mapping $F_T : T_0 \to T$ from a reference element $T_0$ to a given element $T \in \mathcal{T}_h$, this intermediate representation reads

$$A_{T,i} = \sum_{q=1}^{N} \sum_{\alpha_3=1}^{n} \Phi_{\alpha_3}(X^q)\, w_{\alpha_3} \sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}} \sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T'\, W^q, \tag{3.4}$$

where a change of variables from the reference coordinates $X$ to the real coordinates $x = F_T(X)$ has been used. In the above equation, $N$ denotes the number of integration points, $d$ is the dimension of $\Omega$, $n$ is the number of degrees of freedom for the local basis of $w$, $\Phi_i$ denotes basis functions on the reference element, $\det F_T'$ is the determinant of the Jacobian, and $W^q$ is the quadrature weight at integration point $X^q$. By default, FFC applies a quadrature scheme that integrates the variational form exactly. From the intermediate representation in (3.4), code for computing entries of the local element tensor is generated; this code is shown in Figure 3.2.

Code generated for the quadrature representation is structured in the following way. First, values of geometric quantities that depend on the current element $T$, like the components of the inverse of the Jacobian matrix $\partial X_{\alpha_1}/\partial x_\beta$ and $\partial X_{\alpha_2}/\partial x_\beta$, are computed and assigned to variables like K_01 in the code (this code is not shown as it is not important for understanding the nature of the quadrature representation). Next, values of basis functions and their derivatives at integration points on the reference element, like $\Phi_{\alpha_3}(X^q)$ and $\partial \Phi_{i_1}(X^q)/\partial X_{\alpha_1}$, are tabulated. Finite element basis functions are computed by FIAT. Basis functions and their derivatives on a reference element are independent of the current element $T$ and are, therefore, tabulated at compile-time and stored in the tables Psi_w, Psi_vu_D01 and Psi_vu_D10 in Figure 3.2. After the tabulation of basis function values, the loop over integration points begins. In the example, linear elements are considered, and only one integration point is necessary for exact integration. The loop over integration points has therefore been omitted.
The first task inside the loop over integration points is to compute the values of coefficients at the current integration point. For the considered problem, this involves computing the value of the coefficient $w$. The code for evaluating F0 in Figure 3.2 is an exact translation of the representation $\sum_{\alpha_3=1}^{n} \Phi_{\alpha_3}(X^q)\, w_{\alpha_3}$. The last part of the code in Figure 3.2 is the loop over the basis function indices $i_1$ and $i_2$ (j and k in the code), where the contribution to each entry in the local element tensor, $A_T$, from the current integration point is added. The code presented in Figure 3.2 is the default output of the quadrature representation and is not optimised for run-time performance; optimisation strategies are discussed in Section 3.3. To generate code using the quadrature representation, the FFC command-line option -r quadrature should be used.
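To make the structure of Figure 3.2 concrete, the sketch below transcribes the generated loops into Python. The tables are copied from Figure 3.2. The function name `tabulate_tensor_quadrature` and the way K, det and w are passed in are assumptions made for illustration; this is not the UFC interface.

```python
from typing import List, Sequence

# Tables tabulated at compile time (values from Figure 3.2): one integration
# point, the centroid of the reference triangle, with weight 0.5.
WEIGHTS = [0.5]
PSI_W = [[1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]]  # Phi_alpha3 at the integration point
PSI_VU_D01 = [[-1.0, 0.0, 1.0]]               # dPhi/dX_1
PSI_VU_D10 = [[-1.0, 1.0, 0.0]]               # dPhi/dX_0


def tabulate_tensor_quadrature(K: Sequence[Sequence[float]], det: float,
                               w: Sequence[float]) -> List[float]:
    """Quadrature evaluation of (3.4); returns the 3x3 element matrix row-wise.

    K[alpha][beta] = dX_alpha/dx_beta (K_00, K_01, K_10, K_11 in Figure 3.2),
    det is the Jacobian determinant and w holds the nodal values of w.
    """
    A = [0.0] * 9
    for q, W in enumerate(WEIGHTS):
        # Coefficient at the integration point: sum_alpha3 Phi_alpha3(X^q) w_alpha3.
        F0 = sum(PSI_W[q][r] * w[r] for r in range(3))
        D10, D01 = PSI_VU_D10[q], PSI_VU_D01[q]
        # Loop over the basis function indices i1 (j) and i2 (k).
        for j in range(3):
            for k in range(3):
                A[j * 3 + k] += ((K[0][0] * D10[j] + K[1][0] * D01[j]) *
                                 (K[0][0] * D10[k] + K[1][0] * D01[k]) +
                                 (K[0][1] * D10[j] + K[1][1] * D01[j]) *
                                 (K[0][1] * D10[k] + K[1][1] * D01[k])) * F0 * W * det
    return A
```

Setting K to the identity and det = 1 reproduces the reference-triangle stiffness matrix for $w \equiv 1$; with nodal values w = (1, 2, 3), F0 = 2 and every entry doubles.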

3.2.2 Tensor contraction representation

An alternative to the run-time quadrature approach presented in the previous section is the tensor contraction representation, which is reviewed here following the work of Kirby and Logg (2006). Taking equation (3.4) as the point of departure, the tensor contraction representation of the element matrix for the weighted Laplacian is expressed as

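$$A_{T,i} = \sum_{\alpha_1=1}^{d} \sum_{\alpha_2=1}^{d} \sum_{\alpha_3=1}^{n} \det F_T'\, w_{\alpha_3} \sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \,\mathrm{d}X. \tag{3.5}$$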

Note that the integral appearing in equation (3.5) is independent of the cell geometry and can, therefore, be evaluated prior to run-time. The remaining terms, with the exception of $w_{\alpha_3}$, depend only on the geometry of the cell. Exploiting this observation, the element tensor $A_{T,i}$ can be expressed as a tensor contraction,

$$A_{T,i} = \sum_{\alpha} A^0_{i\alpha}\, G_T^{\alpha}, \tag{3.6}$$

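where the tensors $A^0_{i\alpha}$ (the reference tensor) and $G_T^{\alpha}$ (the geometry tensor) are defined as

$$A^0_{i\alpha} = \int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \,\mathrm{d}X, \tag{3.7}$$

$$G_T^{\alpha} = \det F_T'\, w_{\alpha_3} \sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta}. \tag{3.8}$$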
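The split into (3.7) and (3.8) can be checked numerically. The following Python sketch assumes linear Lagrange elements on triangles, so that the reference gradients are constant and $\int_{T_0} \Phi_{\alpha_3} \,\mathrm{d}X = 1/6$ for every $\alpha_3$. It builds the reference tensor once, builds the geometry tensor for a given cell and contracts the two as in (3.6). All function names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

# Reference-element data for P1 on the triangle (0,0), (1,0), (0,1).
REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]  # dPhi_i/dX_alpha (constant)
INT_PHI = [1.0 / 6.0] * 3                            # int_{T_0} Phi_alpha3 dX

Index = Tuple[int, ...]


def reference_tensor() -> Dict[Index, float]:
    """A^0_{i alpha} of (3.7), keyed by (i1, i2, alpha3, alpha1, alpha2).

    Independent of the cell, so it can be computed before run time.
    """
    return {(i1, i2, a3, a1, a2): INT_PHI[a3] * REF_GRADS[i1][a1] * REF_GRADS[i2][a2]
            for i1, i2, a3, a1, a2 in itertools.product(range(3), range(3), range(3),
                                                         range(2), range(2))}


def geometry_tensor(K: Sequence[Sequence[float]], det: float,
                    w: Sequence[float]) -> Dict[Index, float]:
    """G_T^alpha of (3.8), keyed by (alpha3, alpha1, alpha2)."""
    return {(a3, a1, a2): det * w[a3] * sum(K[a1][b] * K[a2][b] for b in range(2))
            for a3, a1, a2 in itertools.product(range(3), range(2), range(2))}


def contract(A0: Dict[Index, float], G: Dict[Index, float]) -> List[List[float]]:
    """A_{T,i} = sum_alpha A^0_{i alpha} G_T^alpha, equation (3.6)."""
    A = [[0.0] * 3 for _ in range(3)]
    for (i1, i2, a3, a1, a2), a0 in A0.items():
        A[i1][i2] += a0 * G[(a3, a1, a2)]
    return A
```

Only `geometry_tensor` and `contract` depend on the cell; in generated code the reference tensor appears as literal constants, as in Figure 3.3.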

```cpp
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const ufc::cell& c) const
{
  ...
  // Quadrature weight.
  static const double W1 = 0.5;

  // Tabulated basis functions at quadrature points.
  static const double Psi_w[1][3] = \
  {{0.33333333333333, 0.33333333333333, 0.33333333333333}};
  static const double Psi_vu_D01[1][3] = \
  {{-1.0, 0.0, 1.0}};
  static const double Psi_vu_D10[1][3] = \
  {{-1.0, 1.0, 0.0}};

  // Compute coefficient value.
  double F0 = 0.0;
  for (unsigned int r = 0; r < 3; r++)
    F0 += Psi_w[0][r]*w[0][r];

  // Loop basis functions.
  for (unsigned int j = 0; j < 3; j++)
  {
    for (unsigned int k = 0; k < 3; k++)
    {
      A[j*3 + k] += ((K_00*Psi_vu_D10[0][j] + K_10*Psi_vu_D01[0][j])*
                     (K_00*Psi_vu_D10[0][k] + K_10*Psi_vu_D01[0][k]) +
                     (K_01*Psi_vu_D10[0][j] + K_11*Psi_vu_D01[0][j])*
                     (K_01*Psi_vu_D10[0][k] + K_11*Psi_vu_D01[0][k]))*F0*W1*det;
    }
  }
}
```

Figure 3.2: Part of the generated code for quadrature representation of the bilinear form associated with the weighted Laplacian using linear elements in two dimensions. The variables like K_00 are components of the inverse of the Jacobian matrix and det is the determinant of the Jacobian. The code to compute these variables is not shown. A holds the values of the local element tensor and w contains nodal values of the weighting function w.

During assembly, one may then iterate over all elements of the triangulation $\mathcal{T}_h$ and on each element $T$ compute the geometry tensor $G_T^{\alpha}$, compute the tensor contraction (3.6) and then add the resulting element tensor $A_{T,i}$ to the global sparse matrix $A$. A generalisation of the approach to general multilinear variational forms is presented in Kirby and Logg (2007). The code which FFC generates from the representation in (3.6) is shown in Figure 3.3. As was the case with the quadrature representation, values of geometric quantities that depend on the current element $T$ are computed first and assigned to variables like K_01 in the code (again, this code is not shown as it is not important for understanding the nature of the tensor contraction representation). Based on these values, the geometry tensor (3.8) is computed and the contraction in (3.6) is performed using the reference tensor from (3.7), which is precomputed during the code generation stage (the literal constants 0.166667). Notice that the contraction to compute entries in $A_{T,i}$ is unrolled, which allows any zero-valued entry of the reference tensor to be detected during the code generation stage so that the corresponding code can be omitted. For a certain class of simple forms this can lead to a tremendous speed-up when evaluating the element matrices relative to a quadrature approach (Kirby and Logg, 2006). Inevitably, the tensor contraction approach, due to unrolling the contraction, leads to code which is much less compact than that of the quadrature representation (see Figure 3.2). Furthermore, as the number of functions and derivatives present in the variational form increases, the rank of both the reference tensor and the geometry tensor increases, thereby increasing the complexity of the tensor contraction.
Thus, for complicated forms the size of the generated code may cause problems for the compilers acting on the generated low-level code, and the complexity of the tensor contraction may exceed that of the quadrature representation, leading to poor run-time performance. This influence of the complexity on the performance is investigated in Section 3.4. To generate code using the tensor contraction representation, the FFC command-line option -r tensor should be used.
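The zero-skipping performed when the contraction is unrolled can be imitated with a small generator. The following Python sketch is a toy, not FFC's actual code generator: it emits one assignment per entry of $A_T$ for the P1 weighted Laplacian and omits every term whose reference-tensor entry vanishes. The `G0_<a3>_<a1>_<a2>` naming mimics Figure 3.3; the function name is hypothetical.

```python
import itertools
from typing import List

REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]  # P1 reference gradients
INT_PHI = 1.0 / 6.0                                  # int_{T_0} Phi_alpha3 dX


def generate_unrolled_contraction() -> List[str]:
    """Return C++-style statements computing A[3*i1 + i2] from the geometry tensor."""
    lines = []
    for i1, i2 in itertools.product(range(3), range(3)):
        terms = []
        for a3, a1, a2 in itertools.product(range(3), range(2), range(2)):
            a0 = INT_PHI * REF_GRADS[i1][a1] * REF_GRADS[i2][a2]
            if a0 == 0.0:
                continue  # zero entry of A^0: no code is emitted for this term
            terms.append(f"{a0:+.6f}*G0_{a3}_{a1}_{a2}")
        lines.append(f"A[{3 * i1 + i2}] = " + " ".join(terms) + ";")
    return lines
```

Only 48 of the 108 possible products survive, matching the term counts in Figure 3.3 (twelve for A[0], three for A[4]). Since the loops are fully unrolled, the length of the generated code grows with the number of nonzero reference-tensor entries.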

3.3 Quadrature optimisations

The automated generation of code provides scope for employing optimisations which may not be practically feasible in hand-generated code. An example of such an approach which is pertinent to the tensor contraction representation involves the analysis of the reference tensor, $A^0_{i\alpha}$, in order to find so-called complexity-reducing relations between subtensors which minimise the number of floating point operations required to compute the element tensor (Kirby et al., 2005, 2006; Kirby and Logg, 2008). For simple problems, this can lead to a significant reduction in the number of operations required to compute the local element tensor, $A_{T,i}$. However,

```cpp
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const ufc::cell& c) const
{
  ...
  // Compute geometry tensor
  const double G0_0_0_0 = det*(w[0][0]*((K_00*K_00 + K_01*K_01)));
  const double G0_0_0_1 = det*(w[0][0]*((K_00*K_10 + K_01*K_11)));
  const double G0_0_1_0 = det*(w[0][0]*((K_10*K_00 + K_11*K_01)));
  const double G0_0_1_1 = det*(w[0][0]*((K_10*K_10 + K_11*K_11)));
  const double G0_1_0_0 = det*(w[0][1]*((K_00*K_00 + K_01*K_01)));
  const double G0_1_0_1 = det*(w[0][1]*((K_00*K_10 + K_01*K_11)));
  const double G0_1_1_0 = det*(w[0][1]*((K_10*K_00 + K_11*K_01)));
  const double G0_1_1_1 = det*(w[0][1]*((K_10*K_10 + K_11*K_11)));
  const double G0_2_0_0 = det*(w[0][2]*((K_00*K_00 + K_01*K_01)));
  const double G0_2_0_1 = det*(w[0][2]*((K_00*K_10 + K_01*K_11)));
  const double G0_2_1_0 = det*(w[0][2]*((K_10*K_00 + K_11*K_01)));
  const double G0_2_1_1 = det*(w[0][2]*((K_10*K_10 + K_11*K_11)));

  // Compute element tensor
  A[0] = 0.166667*G0_0_0_0 + 0.166667*G0_0_0_1 + 0.166667*G0_0_1_0 + 0.166667*G0_0_1_1
       + 0.166667*G0_1_0_0 + 0.166667*G0_1_0_1 + 0.166667*G0_1_1_0 + 0.166667*G0_1_1_1
       + 0.166667*G0_2_0_0 + 0.166667*G0_2_0_1 + 0.166667*G0_2_1_0 + 0.166667*G0_2_1_1;
  A[1] = -0.166667*G0_0_0_0 - 0.166667*G0_0_1_0 - 0.166667*G0_1_0_0
       - 0.166667*G0_1_1_0 - 0.166667*G0_2_0_0 - 0.166667*G0_2_1_0;
  A[2] = -0.166667*G0_0_0_1 - 0.166667*G0_0_1_1 - 0.166667*G0_1_0_1
       - 0.166667*G0_1_1_1 - 0.166667*G0_2_0_1 - 0.166667*G0_2_1_1;
  A[3] = -0.166667*G0_0_0_0 - 0.166667*G0_0_0_1 - 0.166667*G0_1_0_0
       - 0.166667*G0_1_0_1 - 0.166667*G0_2_0_0 - 0.166667*G0_2_0_1;
  A[4] = 0.166667*G0_0_0_0 + 0.166667*G0_1_0_0 + 0.166667*G0_2_0_0;
  A[5] = 0.166667*G0_0_0_1 + 0.166667*G0_1_0_1 + 0.166667*G0_2_0_1;
  A[6] = -0.166667*G0_0_1_0 - 0.166667*G0_0_1_1 - 0.166667*G0_1_1_0
       - 0.166667*G0_1_1_1 - 0.166667*G0_2_1_0 - 0.166667*G0_2_1_1;
  A[7] = 0.166667*G0_0_1_0 + 0.166667*G0_1_1_0 + 0.166667*G0_2_1_0;
  A[8] = 0.166667*G0_0_1_1 + 0.166667*G0_1_1_1 + 0.166667*G0_2_1_1;
}
```

Figure 3.3: Part of the generated code for tensor contraction representation of the bilinear form associated with the weighted Laplacian using linear elements in two dimensions. The variables like K_00 are components of the inverse of the Jacobian matrix and det is the determinant of the Jacobian. The code to compute these variables is not shown. A holds the values of the local element tensor and w contains nodal values of the weighting function w. Due to space considerations, the number of digits of the literal constant 0.166667 has been reduced from fifteen to six.

when dealing with complicated, or even moderately complicated, variational formulations, experience shows that sophisticated optimisation strategies are often not well rewarded. For moderately complex variational forms, such strategies may not scale well in terms of the computer time required to perform the optimisations, and they can prove prohibitive in terms of time and memory. Experience indicates that simple optimisations, some of which are described in this section, offer the greatest rewards, even to the extent that the cost of evaluating element tensors becomes negligible relative to other aspects of a computation, such as insertion of entries into a sparse matrix. This section discusses four automated a priori optimisation strategies (eliminate operations on zeros, simplify expressions, precompute integration point constants and precompute basis constants) that have been developed for the quadrature representation from Section 3.2.1 to improve the run-time performance of the generated code. The underlying philosophy of the optimisation strategies, which are implemented in FFC, is to manipulate the representation in such a way that the number of operations to compute the local element tensor decreases.
Each strategy described in the following sections, with the exception of eliminate operations on zeros, shares some features which can be categorised as:

Loop invariant code motion: This procedure seeks to identify terms that are independent of one or more of the summation indices and to move them outside the loop over those particular indices. For instance, in (3.4) the terms regarding the coefficient $w$, the quadrature weight $W^q$ and the determinant $\det F_T'$ are all independent of the basis function indices $i_1$ and $i_2$ and therefore only need to be computed once for each integration point. A generic discussion of this technique, which is also known as ‘loop hoisting’, can be found in Alfred et al. (1986).

Reuse common terms: Terms that appear multiple times in an expression can be identified, computed once, stored as temporary values and then reused in all occurrences in the expression. This can have a great impact on the operation count since the expression to compute an entry in $A_T$ is located inside loops over the basis function indices, as shown in the code for the standard quadrature representation in Figure 3.2.
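Both ideas can be illustrated on the loop body of Figure 3.2. The Python sketch below is written for illustration only: the function names are hypothetical and the operation counters are tallied by hand, statement by statement. It compares a version that recomputes everything inside the double loop with one in which the invariant factor F0*W*det is hoisted and the physical gradients, which appear in many entries, are computed once and reused.

```python
from typing import List, Sequence, Tuple

D10 = (-1.0, 1.0, 0.0)  # dPhi/dX_0 for the P1 basis (Psi_vu_D10 in Figure 3.2)
D01 = (-1.0, 0.0, 1.0)  # dPhi/dX_1 (Psi_vu_D01)

Matrix = Sequence[Sequence[float]]


def physical_grad(K: Matrix, j: int) -> Tuple[float, float]:
    """Gradient of basis function j in physical coordinates: 4 mul + 2 add."""
    return (K[0][0] * D10[j] + K[1][0] * D01[j],
            K[0][1] * D10[j] + K[1][1] * D01[j])


def naive_element_matrix(K: Matrix, F0: float, W: float, det: float) -> Tuple[List[float], int]:
    A, ops = [0.0] * 9, 0
    for j in range(3):
        for k in range(3):
            gj, gk = physical_grad(K, j), physical_grad(K, k)  # recomputed every iteration
            A[3 * j + k] += (gj[0] * gk[0] + gj[1] * gk[1]) * F0 * W * det
            ops += 12 + 3 + 3 + 1  # gradients, dot product, scaling, accumulation
    return A, ops


def hoisted_element_matrix(K: Matrix, F0: float, W: float, det: float) -> Tuple[List[float], int]:
    A, ops = [0.0] * 9, 0
    scale = F0 * W * det                             # loop invariant: computed once
    grads = [physical_grad(K, j) for j in range(3)]  # common terms: computed once, reused
    ops += 2 + 3 * 6
    for j in range(3):
        for k in range(3):
            A[3 * j + k] += (grads[j][0] * grads[k][0] + grads[j][1] * grads[k][1]) * scale
            ops += 3 + 1 + 1  # dot product, scaling, accumulation
    return A, ops
```

For this small form the count drops from 171 to 65 operations while the element matrix is unchanged; the point of the automated strategies is that such rewrites need not be carried out by hand.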

The optimisations described in this section take place after the representation stage of the code generation process (see Figure 1.3 on page 13), where any given form is represented as simple loop and algebra instructions. Therefore, the optimisations are general and apply to all forms and elements that can be handled by FFC. While the above optimisations are straightforward for simple forms and elements, their implementation using conventional programming approaches requires manual inspection of the form and the basis. This is often done in specialised codes, but the extension to non-trivial forms is difficult, time consuming and error prone. Furthermore, the optimised code may bear little relation to the mathematical problem at hand. This makes maintenance and re-use of the hand-generated code problematic. To switch on optimisation, the command-line option -O should be used in addition to any of the FFC optimisation options presented in the following sections.

3.3.1 Eliminate operations on zeros

Some basis functions, in particular those concerning mixed elements, and derivatives of basis functions may be zero-valued at all integration points for a particular problem. Since these values are tabulated at compile-time, the columns containing nonzero values can be identified. This enables a reduction in the loop dimension for indices concerning these tables, a process which is comparable to dead-code elimination in compiler jargon. However, a consequence of reducing the tables is that a mapping of indices must be created in order to access values correctly. The mapping results in memory not being accessed contiguously at run-time and can lead to a decrease in run-time performance. In some cases the elimination of operations on zero terms is similar to the strategy that the tensor contraction representation applies when unrolling the code, as shown in Figure 3.3. The major difference is that the quadrature representation can only eliminate contributions that are zero for all quadrature points, whereas the tensor contraction representation can eliminate all zero-valued contributions. The unrolled tensor contraction code is, however, longer, which introduces some drawbacks, such as increased C++ compile-time as discussed previously. To generate code with this optimisation, the FFC command-line option -f eliminate_zeros should be used. Code for the weighted Laplace equation generated with this option is shown in Figure 3.4. For brevity, only code different from the standard quadrature code in Figure 3.2 has been included. As seen in Figure 3.4, the loop dimension for the loops involving the indices j and k has decreased from three to two due to the elimination of zeros when compared to the standard quadrature code in Figure 3.2. However, the total number of operations has increased.
The reason is that the mapping causes four entries to be computed at the same time inside the loop, and the code to compute each entry has not been reduced significantly compared to the code in Figure 3.2. In fact, using this optimisation strategy by itself is usually not recommended, but in combination with the strategies outlined in the following sections it can improve run-time performance significantly. This effect is particularly pronounced when forms contain mixed elements in which many of the values in the basis function tables are zero. Another reason for being careful when applying this strategy is that it might prevent FFC compilation due to hardware limitations, because the increased number of entries computed inside the loop requires more memory during compilation.
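The table compression behind this optimisation can be sketched in a few lines of Python. Assuming tables laid out as in Figure 3.2 (one row per integration point, one column per basis function), the hypothetical helper `compress_table` keeps only columns that are nonzero at some integration point and returns the index map needed to scatter results back, analogous to nzc0 and nzc1 in Figure 3.4. The second hypothetical helper, `scatter_outer`, shows how the map is used when accumulating into the full element tensor.

```python
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]


def compress_table(table: Table, tol: float = 1e-14) -> Tuple[List[int], List[List[float]]]:
    """Return (nonzero column indices, table restricted to those columns).

    A column can only be dropped if it is zero at *every* integration point.
    """
    nzc = [c for c in range(len(table[0])) if any(abs(row[c]) > tol for row in table)]
    return nzc, [[row[c] for c in nzc] for row in table]


def scatter_outer(nzc_a: Sequence[int], vals_a: Sequence[float],
                  nzc_b: Sequence[int], vals_b: Sequence[float], n: int) -> List[float]:
    """Accumulate vals_a[j]*vals_b[k] into a dense n-by-n array through the index maps."""
    A = [0.0] * (n * n)
    for j in range(len(nzc_a)):
        for k in range(len(nzc_b)):
            A[nzc_a[j] * n + nzc_b[k]] += vals_a[j] * vals_b[k]
    return A


psi_vu_d01 = [[-1.0, 0.0, 1.0]]  # tables from Figure 3.2
psi_vu_d10 = [[-1.0, 1.0, 0.0]]
nzc0, psi0 = compress_table(psi_vu_d01)
nzc1, psi1 = compress_table(psi_vu_d10)
```

Both compressed tables coincide, which is why Figure 3.4 needs only the single table Psi_vu; the price is the indirect, non-contiguous indexing through nzc0 and nzc1.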

```cpp
// Tabulated basis functions.
static const double Psi_vu[1][2] = {{-1.0, 1.0}};

// Arrays of nonzero columns.
static const unsigned int nzc0[2] = {0, 2};
static const unsigned int nzc1[2] = {0, 1};

// Loop basis functions.
for (unsigned int j = 0; j < 2; j++)
{
  for (unsigned int k = 0; k < 2; k++)
  {
    A[nzc0[j]*3 + nzc0[k]] += (K_10*Psi_vu[0][j]*K_10*Psi_vu[0][k] +
                               K_11*Psi_vu[0][j]*K_11*Psi_vu[0][k])*F0*W1*det;
    A[nzc0[j]*3 + nzc1[k]] += (K_11*Psi_vu[0][j]*K_01*Psi_vu[0][k] +
                               K_10*Psi_vu[0][j]*K_00*Psi_vu[0][k])*F0*W1*det;
    A[nzc1[j]*3 + nzc0[k]] += (K_00*Psi_vu[0][j]*K_10*Psi_vu[0][k] +
                               K_01*Psi_vu[0][j]*K_11*Psi_vu[0][k])*F0*W1*det;
    A[nzc1[j]*3 + nzc1[k]] += (K_01*Psi_vu[0][j]*K_01*Psi_vu[0][k] +
                               K_00*Psi_vu[0][j]*K_00*Psi_vu[0][k])*F0*W1*det;
  }
}
```

Figure 3.4: Part of the generated code for the weighted Laplacian using linear elements in two dimensions with optimisation option -f eliminate_zeros. The arrays nzc0 and nzc1 contain the nonzero column indices for the mapping of values. Note how eliminating zeros makes it possible to replace the two tables with derivatives of basis functions Psi_vu_D01 and Psi_vu_D10 from Figure 3.2 with one table (Psi_vu).

3.3.2 Simplify expressions

The code expressions to evaluate an entry in the local element tensor can become very complex. Since such expressions are typically located inside loops, a reduction in complexity can reduce the total operation count significantly. The approach can be illustrated by the expression $x(y + z) + 2xy$, which after expansion of the first term and grouping of common terms reduces to $x(y + z) + 2xy \to xy + xz + 2xy \to 3xy + xz$. As $x$ appears in both products in the sum, a reduction of one operation can be achieved by moving $x$ outside parentheses, $3xy + xz \to x(3y + z)$. By applying these simplifications, the number of operations has been reduced from five to three, which may seem trivial although it is, in fact, a reduction of 40%. The algorithm developed and implemented in FFC to perform simplifications as described above bears resemblance to the algorithm presented by Hosangadi et al. (2006), later extended and applied to optimised code generation for finite element assembly by Russell and Kelly (2013). An additional benefit of this strategy is that the expansion of expressions, which takes place before the simplification, will typically allow more terms to be precomputed and hoisted outside loops, as explained at the beginning of this section. The FFC command-line option -f simplify_expressions should be used to generate code with this optimisation enabled. Code generated by this option for the representation in (3.4) is presented in Figure 3.5, where again only code different from that in Figure 3.2 has been included. The number of operations has decreased compared to the code in Figure 3.2 for the standard quadrature representation, so an improvement in run-time performance can be expected. To understand how the optimisations lead to the code in Figure 3.5, consider the terms

$$\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}} \sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T'\, W^q, \tag{3.9}$$

in the representation (3.4) for the weighted Laplace equation.
These terms are transformed by FFC into an expression equivalent to the code

```cpp
((K_00*Psi_vu_D10[0][j] + K_10*Psi_vu_D01[0][j])*
 (K_00*Psi_vu_D10[0][k] + K_10*Psi_vu_D01[0][k]) +
 (K_01*Psi_vu_D10[0][j] + K_11*Psi_vu_D01[0][j])*
 (K_01*Psi_vu_D10[0][k] + K_11*Psi_vu_D01[0][k]))*W1*det;
```

which is, apart from a missing F0, identical to the standard quadrature code inside the loops in Figure 3.2. This expression is then expanded into a new expression, a sum of products, equivalent to the code

```cpp
// Geometry constants.
double G[3];
G[0] = W1*det*(K_00*K_00 + K_01*K_01);
G[1] = W1*det*(K_00*K_10 + K_01*K_11);
G[2] = W1*det*(K_10*K_10 + K_11*K_11);

// Integration point constants.
double I[3];
I[0] = F0*G[0];
I[1] = F0*G[1];
I[2] = F0*G[2];

// Loop basis functions.
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += (Psi_vu_D10[0][j]*Psi_vu_D10[0][k]*I[0] +
                   Psi_vu_D10[0][j]*Psi_vu_D01[0][k]*I[1] +
                   Psi_vu_D01[0][j]*Psi_vu_D10[0][k]*I[1] +
                   Psi_vu_D01[0][j]*Psi_vu_D01[0][k]*I[2]);
  }
}
```

Figure 3.5: Part of the generated code for the weighted Laplacian using linear elements in two dimensions with optimisation option -f simplify_expressions.

```cpp
K_00*K_00*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_00*K_10*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_00*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_10*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k] +
K_01*K_01*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_01*K_11*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_01*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_11*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];
```

In the next step of the optimisation process, identical terms depending on the loop indices j and k are identified and grouped such that the expression is equivalent to

```cpp
(K_00*K_00*W1*det + K_01*K_01*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
(K_10*K_10*W1*det + K_11*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];
```

where the terms in parentheses only depend on geometry information. The terms in parentheses can, therefore, be moved outside of the loops over the basis function indices j and k and stored in the array G. During the process of generating values for G, FFC discovers that two of the four parenthesised terms are identical, and thus only three unique values in G are computed. The expressions to compute the values in G have been simplified further by moving the variables det and W1, which appear in both products, outside the parentheses, as seen in Figure 3.5. The weighting coefficient F0 (left out of the detailed explanation above) will generally depend on the integration point. Therefore, each value in G is multiplied by F0 and the result is stored in the array I, which contains values that are constant for the current integration point and can therefore be computed before the loops over j and k. The optimisation described above is the most expensive of the quadrature optimisations to perform in terms of FFC code generation time and memory consumption, as it involves creating new terms when expanding the expressions. The procedure does not scale well for complex expressions, but it is in many cases the most effective approach in terms of reducing the number of operations. This particular optimisation strategy, in combination with the elimination of zeros outlined in the previous section, was the first to be implemented in FFC. It has been investigated and compared to the tensor representation in Ølgaard and Wells (2010).
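The equivalence between the loop body of Figure 3.2 and the expanded, grouped and hoisted form of Figure 3.5 can be verified numerically. The Python sketch below uses hypothetical function names and arbitrary test values for the inverse Jacobian. It evaluates both versions and shows that three geometry constants suffice, because the two mixed groups share G[1].

```python
from typing import List, Sequence

D10 = (-1.0, 1.0, 0.0)  # Psi_vu_D10[0] from Figure 3.2
D01 = (-1.0, 0.0, 1.0)  # Psi_vu_D01[0]


def standard_loop(K: Sequence[Sequence[float]], det: float, W1: float, F0: float) -> List[float]:
    """Loop body of the unoptimised quadrature code (Figure 3.2)."""
    (K00, K01), (K10, K11) = K
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += ((K00 * D10[j] + K10 * D01[j]) * (K00 * D10[k] + K10 * D01[k]) +
                             (K01 * D10[j] + K11 * D01[j]) * (K01 * D10[k] + K11 * D01[k])
                             ) * F0 * W1 * det
    return A


def simplified_loop(K: Sequence[Sequence[float]], det: float, W1: float, F0: float) -> List[float]:
    """Expanded, grouped and hoisted form (Figure 3.5)."""
    (K00, K01), (K10, K11) = K
    # Geometry constants: the two identical mixed groups share G[1].
    G = [W1 * det * (K00 * K00 + K01 * K01),
         W1 * det * (K00 * K10 + K01 * K11),
         W1 * det * (K10 * K10 + K11 * K11)]
    # Integration point constants.
    I = [F0 * g for g in G]
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += (D10[j] * D10[k] * I[0] + D10[j] * D01[k] * I[1] +
                             D01[j] * D10[k] * I[1] + D01[j] * D01[k] * I[2])
    return A
```

Inside the double loop the simplified version only combines tabulated values with precomputed constants, while the geometry-dependent work is done once per integration point.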

3.3.3 Precompute integration point constants

The optimisations described in the previous section are performed at the expense of increased code generation time. In order to reduce the generation time while

```cpp
// Geometry constants.
double G[1];
G[0] = W1*det;

// Integration point constants.
double I[1];
I[0] = F0*G[0];

// Loop basis functions.
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += ((Psi_vu_D01[0][j]*K_10 + Psi_vu_D10[0][j]*K_00)*
                   (Psi_vu_D01[0][k]*K_10 + Psi_vu_D10[0][k]*K_00) +
                   (Psi_vu_D01[0][j]*K_11 + Psi_vu_D10[0][j]*K_01)*
                   (Psi_vu_D01[0][k]*K_11 + Psi_vu_D10[0][k]*K_01))*I[0];
  }
}
```

Figure 3.6: Part of the generated code for the weighted Laplacian using linear elements in two dimensions with optimisation option -f precompute_ip_const.

achieving a reduction in the operation count, another approach can be taken which hoists expressions that are constant for a given integration point without expanding the expression first. To generate code with this optimisation, the FFC command-line option -f precompute_ip_const should be used. Code generated by this method for the representation in (3.4) can be seen in Figure 3.6, which includes only code different from that in Figure 3.2. It is clear from the generated code that this strategy does not lead to a significant reduction in the number of operations for this particular form. The only difference between the code inside the loop in Figure 3.2 and Figure 3.6 is that F0*W1*det has been reduced to I[0], which reduces the number of operations by sixteen (two operations for each of the nine times the loop is executed, minus the two operations needed to compute G[0] and I[0]). However, for more complex forms with many coefficients, the number of terms that can be hoisted increases significantly, leading to improved run-time performance.
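The operation count quoted above follows from a simple balance: the work removed from every iteration of the basis-function loops, minus the work added to precompute the hoisted constants. The short Python helper below (hypothetical, for illustration only) encodes that balance for a square loop nest.

```python
def hoisting_savings(n_basis: int, ops_removed_per_iteration: int, ops_added: int) -> int:
    """Net operations saved by hoisting terms out of an n_basis x n_basis loop nest."""
    return n_basis * n_basis * ops_removed_per_iteration - ops_added


# Figure 3.6: F0*W1*det (two multiplications) becomes I[0] inside a 3 x 3 loop nest,
# at the cost of computing G[0] = W1*det and I[0] = F0*G[0] (one multiplication each).
savings_fig_3_6 = hoisting_savings(n_basis=3, ops_removed_per_iteration=2, ops_added=2)
```

Because the removed work is multiplied by the number of loop iterations, the benefit grows with the number of basis functions and with the number of hoistable coefficient factors.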

3.3.4 Precompute basis constants

This optimisation strategy is an extension of the strategy described in the previous section. In addition to hoisting terms related to the geometry and the integration points, values that depend on the basis indices are precomputed inside the loops. This results in a reduction in operations for cases in which some terms appear frequently inside the loop, such that a given value can be reused once computed. To generate code with this optimisation, the FFC command-line option -f precompute_basis_const should be used. Code generated by this method for the representation in (3.4) can be seen in Figure 3.7, where only code that differs from that in Figure 3.6 has been included. Inside the loop, the value of each binary operation is stored in the array B such that it can be reused in subsequent computations. The UFL representation of (3.4), which is the input to FFC, can be viewed as a directed acyclic graph (DAG). When FFC generates code from this input, it uses algorithms from UFL to traverse the DAG such that code to evaluate subexpressions is generated before code to evaluate any expression which depends on these subexpressions. This ensures that values in B are computed in the correct order. In this particular case, no additional reduction in operations has been achieved compared to the previous method, since no terms can be reused inside the loop over the indices j and k. However, as the complexity of forms increases, so does the scope for reusing terms inside the loop, leading to improved run-time performance.
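The ordering requirement can be met with a post-order traversal of the expression DAG. The Python sketch below assumes a toy expression format (a leaf is a variable name, an interior node is a tuple `(op, lhs, rhs)`) rather than UFL's own classes. The hypothetical function `emit_temporaries` emits one temporary B[n] per distinct binary operation, always after the temporaries it depends on, and reuses a temporary when a structurally identical subexpression reappears.

```python
from typing import Dict, List, Tuple, Union

Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # leaf name or (operator, lhs, rhs)


def emit_temporaries(expr: Expr) -> Tuple[List[str], str]:
    """Return (statements, name of the temporary holding the full expression)."""
    seen: Dict[Expr, str] = {}
    lines: List[str] = []

    def visit(node: Expr) -> str:
        if isinstance(node, str):
            return node                # leaves are used directly
        if node in seen:
            return seen[node]          # reuse a value computed earlier
        op, lhs, rhs = node
        a, b = visit(lhs), visit(rhs)  # children first: post-order
        name = f"B[{len(lines)}]"
        lines.append(f"{name} = {a} {op} {b};")
        seen[node] = name
        return name

    result = visit(expr)
    return lines, result
```

For the expression (a*b + c)*(a*b + d), the product a*b is emitted once as B[0] and reused by B[1] and B[2]. In the loop body of Figure 3.7 no subexpression repeats, which is why this strategy gives no extra saving for the weighted Laplacian.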

3.3.5 Further optimisations

Preliminary investigations suggest that the performance of the quadrature representation can be improved by applying two additional optimisations. Looking at the code in Figure 3.7, it is seen that six of the sixteen temporary values in the array B (B[0] to B[2] and B[7] to B[9]) depend only on the loop index j, and they can therefore be hoisted out of the loop over k, as has been done for other terms in previous sections. Another approach is to unroll the loops with respect to j and k in the generated code. This will lead to a dramatic increase in the number of values that can be reused, and the approach can be readily combined with all of the other optimisation strategies. However, the total number of temporary values will also increase, so this optimisation strategy might not be feasible for all forms.

FFC implements a few efficient quadrature schemes for integrating polynomials of degree less than or equal to six on simplices. For polynomials of degree higher than six, it calls FIAT to compute the quadrature scheme. FIAT supplies schemes that are based on the Gauss–Legendre–Jacobi rule mapped onto simplices (see Karniadakis and Sherwin (2005) for details of such schemes). This means that for integrating a seventh-order polynomial, FFC will use four quadrature points in each spatial direction, that is, $4^3 = 64$ points per cell in three dimensions. A further optimisation of the quadrature representation can thus be achieved by implementing more efficient quadrature schemes for higher order polynomials on simplices, since a reduction in the number of integration points will yield improved run-time performance. FFC does, however, provide an option for a user to specify

```cpp
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    double B[16];
    B[0] = Psi_vu_D01[0][j]*K_10;
    B[1] = Psi_vu_D10[0][j]*K_00;
    B[2] = (B[0] + B[1]);
    B[3] = Psi_vu_D01[0][k]*K_10;
    B[4] = Psi_vu_D10[0][k]*K_00;
    B[5] = (B[3] + B[4]);
    B[6] = B[2]*B[5];
    B[7] = Psi_vu_D01[0][j]*K_11;
    B[8] = Psi_vu_D10[0][j]*K_01;
    B[9] = (B[7] + B[8]);
    B[10] = Psi_vu_D01[0][k]*K_11;
    B[11] = Psi_vu_D10[0][k]*K_01;
    B[12] = (B[10] + B[11]);
    B[13] = B[12]*B[9];
    B[14] = (B[13] + B[6]);
    B[15] = B[14]*I[0];
    A[j*3 + k] += B[15];
  }
}
```

Figure 3.7: Part of the generated code for the weighted Laplacian using linear elements in two dimensions with optimisation option -f precompute_basis_const. The array B contains precomputed values that depend on the indices j and k.

the quadrature degree of a variational form, thereby permitting inexact quadrature. For instance, to set the quadrature degree equal to two, the command-line option -f quadrature_degree=2 should be used, in which case FFC will use a quadrature rule which is able to integrate a quadratic polynomial exactly. For tetrahedra, this will result in a four-point quadrature scheme.
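The point counts mentioned above follow from the exactness of Gauss rules: an $n$-point Gauss rule integrates polynomials up to degree $2n - 1$ exactly, and a collapsed (tensor-product) scheme on a $d$-dimensional simplex uses $n$ points per direction. The Python sketch below is a back-of-the-envelope calculator for such collapsed schemes, with hypothetical function names; it does not call FIAT, and it does not model the dedicated low-degree rules that FFC uses up to degree six.

```python
import math


def points_per_direction(degree: int) -> int:
    """Smallest n with 2n - 1 >= degree, i.e. the n-point Gauss rule that is exact."""
    return math.ceil((degree + 1) / 2)


def collapsed_rule_size(degree: int, dim: int) -> int:
    """Number of points of a collapsed tensor-product rule on a simplex of dimension dim."""
    return points_per_direction(degree) ** dim
```

For degree seven in three dimensions this gives the 64 points quoted above. For degree two on a tetrahedron the collapsed construction would need eight points, twice as many as the four-point rule FFC uses, which illustrates why dedicated schemes for higher degrees could pay off.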

3.4 Performance comparisons of representations

Generated tensor contraction and quadrature-based code is now compared in terms of the metrics outlined in Section 3.1, namely the run-time performance, the size of the generated code and the speed of the code generation phase. The aim is to elucidate features of the two representations for various problems, with the goal of finding a guiding principle for selecting the most appropriate representation for a given problem. First, some typical forms of differing complexity and nature are considered to illustrate some trends and differences between the representations. This leads to a systematic comparison using some very simple forms for which the tensor contraction representation is expected to prove superior, before the complexity of the forms is increased in order to investigate the cross-over point at which the quadrature representation becomes the better representation in terms of run-time performance. Exact quadrature is used for all examples. All tests were performed on an Intel Core i7-2600 CPU at 3.40GHz (4 cores with 8 hardware threads, although tests were run in serial) with 15.7GB of RAM running Ubuntu 12.10 with kernel 3.5.0-23. Python version 2.7.3 and NumPy version 1.6.2 (both used by FFC) are used when generating code, while g++ version 4.7.2 with the ‘-O2 -funroll-loops’ optimisation flags is used to compile the generated C++ code, which is compliant with UFC version 2.0.5. DOLFIN version 1.0.0 is used to assemble the global sparse matrix for tests which involve compressed sparse matrices. DOLFIN provides various linear algebra backends, and PETSc (Balay et al., 2001) is used as the backend for the assembly tests. The nonzero structure of the compressed sparse matrix is initialised and no special reordering of degrees of freedom has been used in the assembly tests. Results presented in this section are obtained with FFC version 1.0.0 using the optimisation options -f eliminate_zeros and -f simplify for the quadrature representation.

3.4.1 Performance for a selection of forms

The two representations are now compared for three different ‘real’ forms to demonstrate their strengths and weaknesses. The first form considered is a mixed Poisson formulation using fifth-order Brezzi–Douglas–Marini (BDM) elements (Brezzi et al., 1985), automation aspects of which have been addressed by Rognes et al. (2010). The bilinear form, which leads to the finite element stiffness matrix for the mixed Poisson problem, reads

$$a(\sigma, u; \tau, w) := \int_\Omega \sigma \cdot \tau - u \, (\nabla \cdot \tau) + (\nabla \cdot \sigma) \, w \, \mathrm{d}x, \tag{3.10}$$

where τ, σ ∈ V, w, u ∈ W and

$$V := \left\{ \tau \in H(\mathrm{div}, \Omega) : \tau|_T \in \mathrm{BDM}_k(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.11}$$

$$W := \left\{ w \in L^2(\Omega) : w|_T \in P_{k-1}(T) \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.12}$$

The UFL code for this form with k = 5 is shown in Figure 3.8.

UFL code
BDM = FiniteElement("Brezzi-Douglas-Marini", triangle, 5)
DG = FiniteElement("Discontinuous Lagrange", triangle, 5 - 1)

mixed_element = BDM*DG

(sigma, u) = TrialFunctions(mixed_element)
(tau, w) = TestFunctions(mixed_element)

a = (dot(sigma, tau) - u*div(tau) + div(sigma)*w)*dx

Figure 3.8: UFL code for the stiffness matrix of the mixed Poisson problem in (3.10) using BDM elements of order five.

The generation of code for a discontinuous Galerkin formulation of the biharmonic equation with Lagrange basis functions, which involves both cell and interior facet integrals (Ølgaard et al., 2008a), is also considered. The bilinear form for this problem reads

$$a(u, v) := \int_\Omega \nabla^2 u \, \nabla^2 v \, \mathrm{d}x - \int_{\Gamma^0} \langle \nabla^2 u \rangle \cdot \llbracket \nabla v \rrbracket \, \mathrm{d}s - \int_{\Gamma^0} \llbracket \nabla u \rrbracket \cdot \langle \nabla^2 v \rangle \, \mathrm{d}s + \int_{\Gamma^0} \frac{\alpha}{h} \llbracket \nabla u \rrbracket \cdot \llbracket \nabla v \rrbracket \, \mathrm{d}s, \tag{3.13}$$

where the functions u, v ∈ V and

$$V := \left\{ v \in H^1_0(\Omega) : v|_T \in P_k(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.14}$$

Γ⁰ denotes the set of interior facets, α > 0 is a penalty parameter and h is a measure of the cell size. See Section 4.2.4 for more details. The UFL code for this bilinear form for the case k = 3 is shown in Figure 3.9.

UFL code
element = FiniteElement("Lagrange", triangle, 3)
u = TrialFunction(element)
v = TestFunction(element)

n = VectorConstant(element.cell())
h = Constant(element.cell())
h_avg = 0.5*(h('+') + h('-'))

alpha = 10.0

a = inner(div(grad(u)), div(grad(v)))*dx \
  - inner(avg(div(grad(u))), jump(grad(v), n))*dS \
  - inner(jump(grad(u), n), avg(div(grad(v))))*dS \
  + alpha/h_avg*inner(jump(grad(u), n), jump(grad(v), n))*dS

Figure 3.9: UFL code for the stiffness matrix of a discontinuous Galerkin formulation for the biharmonic equation (3.13) using two-dimensional elements of order three.

The third example is a complicated form which has arisen in modelling temperature-dependent multiphase flow through porous media (Wells et al., 2008). It comes from the approximate linearisation of a stabilised finite element formulation for a particular problem, and is characterised by standard Lagrange basis functions of low order but products of many functions from a number of different spaces. The physical significance of the equation is unimportant in the context of this work; therefore it is presented in an abstract form. The bilinear form reads:

$$\begin{aligned} a(p, q) := \int_\Omega \Bigg[ & f_0 g_2 g_3 g_4 \, p \, q - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \right) \cdot \nabla p \, q - g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla p \cdot \nabla q \\ & + f_0 g_2 g_3 g_4 \, p \, g_7 \left( \sum_{i=0}^{2} g_i u_i \right) \cdot \nabla q - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \right) \cdot \nabla p \, g_7 \left( \sum_{i=0}^{2} g_i u_i \right) \cdot \nabla q \\ & - g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla^2 p \, g_7 \left( \sum_{i=0}^{2} g_i u_i \right) \cdot \nabla q \Bigg] \mathrm{d}x, \end{aligned} \tag{3.15}$$

where the test and trial functions q, p ∈ V with

$$V := \left\{ v \in H^1(\Omega) : v|_T \in P_2(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.16}$$

and the functions f_i ∈ V_f, g_i ∈ V_g and u_i ∈ V_u are coefficient functions. The coefficient spaces are:

$$V_f := \left\{ f \in H^1(\Omega) : f|_T \in P_1(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.17}$$

$$V_g := \left\{ g \in L^2(\Omega) : g|_T \in P_1(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.18}$$

$$V_u := \left\{ u \in \left[ L^2(\Omega) \right]^2 : u|_T \in \left[ P_1(T) \right]^2 \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.19}$$

UFL code
scalar_p = FiniteElement("Lagrange", triangle, 2)
scalar = FiniteElement("Lagrange", triangle, 1)
dscalar = FiniteElement("Discontinuous Lagrange", triangle, 0)
vector = VectorElement("Discontinuous Lagrange", triangle, 1)

p = TrialFunction(scalar_p)
q = TestFunction(scalar_p)

f0, f1, f2, f3, f4, f5, f6 = [Coefficient(scalar) for i in range(7)]
g0, g1, g2, g3, g4, g5, g6, g7 = [Coefficient(dscalar) for i in range(8)]
u0, u1, u2 = [Coefficient(vector) for i in range(3)]

Sgu = g0*u0 + g1*u1 + g2*u2
S = g6*(1 - g5)*(f1 + f3 + f5)

a_0 = p*g3*f0*g2*g4*q \
    - (1 - g5)*inner(Sgu, grad(p))*q \
    - S*inner(grad(p), grad(q))

a_1 = g3*f0*g2*g4*p*g7*inner(Sgu, grad(q)) \
    - (1 - g5)*inner(Sgu, grad(p))*g7*inner(Sgu, grad(q)) \
    + S*div(grad(p))*g7*inner(Sgu, grad(q))

a = (a_0 + a_1)*dx

Figure 3.10: UFL code for the ‘pressure equation’ (3.15) in two dimensions.

The coefficient functions are either prescribed or come from the solution of other equations. The UFL input to the compiler for this form is shown in Figure 3.10. Due to the origins of this form, it will informally be denoted as the ‘pressure equation’.

The three forms have been compiled with FFC using the tensor contraction and quadrature representations. In Table 3.1, the time required to generate the code, the size of the generated code and the time required to compile the C++ code are reported for each form. Results are presented for the tensor contraction case, together with the ratio of the time/size for the quadrature representation case divided by the time/size for the tensor contraction representation case, denoted by q/t. In measuring the C++ compile-time and the run-time performance, the generated code has been compiled against the library DOLFIN.

| Form | generation [s] | q/t | size [kB] | q/t | C++ [s] | q/t |
|---|---|---|---|---|---|---|
| mixed Poisson | 6.3 | 0.79 | 4300 | 0.91 | 27.2 | 0.11 |
| DG biharmonic | 23.4 | 0.04 | 4800 | 0.07 | 77.1 | 0.06 |
| pressure equation | 4.0 | 0.14 | 5300 | 0.05 | 356.0 | 0.01 |

Table 3.1: Timings and code size for the compilation phase for the various variational forms. ‘generation’ is the time required by FFC to generate the tensor contraction code; ‘size’ is the size of the generated tensor contraction code; and ‘C++’ is the time required to compile the generated C++ code. The ratio q/t is the ratio between quadrature and tensor contraction representations.

Noteworthy from the results in Table 3.1 is that the generation phase for the quadrature representation is faster than the generation phase for the tensor contraction representation for all forms. In all cases the generated quadrature code is smaller than the tensor contraction code, which is reflected in the C++ compile-time. The differences in the C++ compile-time are substantial for all forms (more than a factor of one hundred for the pressure equation), which is important during the code development phase with frequent recompilations.²

² It should be noted that the C++ compile-time for the tensor contraction representation reduces substantially (by approximately a factor of ten) if no g++ optimisations are used. The C++ compile-time for the quadrature representation is typically a couple of seconds, irrespective of which g++ optimisation option is used.

Timings and operation counts for the three forms are presented in Table 3.2. The number of floating point operations (flops) is defined as the sum of all ‘+’ and ‘*’ operators in the code for computing the element matrix. Although multiplications are generally more expensive than additions, this definition provides a good measure of the performance of the generated code. The compound operator ‘+=’ is counted as one operation. For the run-time performance, the time required to compute the local element tensors N times is recorded; the time needed to insert the local tensor into the global sparse matrix is not included. For the mixed Poisson problem N = 5 × 10⁵, and for the discontinuous Galerkin biharmonic problem and the pressure equation N = 1 × 10⁶. Table 3.2 presents the timings and operation counts for the tensor contraction representation, together with the ratio q/t of the quadrature representation case to the tensor contraction representation case.

| Form | flops | q/t | run-time [s] | q/t |
|---|---|---|---|---|
| mixed Poisson | 38138 | 34.26 | 11.7 | 17.600 |
| DG biharmonic | 37353 | 1.41 | 15.3 | 1.175 |
| pressure equation | 271356 | 0.04 | 158.8 | 0.014 |

Table 3.2: Run-time performance for the various variational forms.

The run-time results highlight an important aspect of the two representations: there can be significant performance differences depending on the nature of the differential equation. For the mixed Poisson problem, the tensor contraction representation is close to a factor of twenty faster than the quadrature representation, whereas for the pressure equation the quadrature representation is close to a factor of seventy faster than the tensor contraction representation. Furthermore, the run-time performance ratio and the flops ratio are of the same order of magnitude, suggesting a correlation between the two. These dramatic differences in run-time performance suggest the possibility of devising a strategy for determining the best representation without generating the code for each case. Such concepts have been successfully developed in digital signal processing (Püschel et al., 2005). For forms with a relatively simple structure, devising such a scheme is straightforward. However, it turns out to be non-trivial for arbitrary forms.

UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)

u = TrialFunction(element)
v = TestFunction(element)

a = u*v*dx

Figure 3.11: UFL code for the mass matrix in three dimensions with element order q = 2.
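The flop metric used in this section (every ‘+’ and ‘*’ in the element tensor code, with ‘+=’ counted as one operation) is simple enough to reproduce for a snippet of generated code. The following is a minimal Python sketch; `count_flops` is a hypothetical helper, not part of FFC, and it assumes the snippet contains only the arithmetic statements that compute the element tensor, so that pointer declarations or integer index arithmetic do not contribute spurious ‘*’ or ‘+’ tokens. Loop increments (‘++’) are skipped, and ‘-’ is not counted, following the definition.

```python
import re

# '++' is listed first so that loop increments are matched (and then skipped)
# before a single '+'; '+=' is matched as one token, i.e. one operation.
_OPERATOR = re.compile(r"\+\+|\+=?|\*")


def count_flops(code: str) -> int:
    """Count the '+' and '*' operators in a snippet of generated C++ code.

    Assumes the snippet holds only the arithmetic statements that compute the
    element tensor (no pointer declarations or integer index arithmetic).
    """
    return sum(1 for token in _OPERATOR.findall(code) if token != "++")


statement = "A[0] += W[ip]*det*(FE0[ip][0]*FE0[ip][0]);"
print(count_flops(statement))  # 4: one '+=' and three '*'
```

Counting tokens with a regular expression rather than single characters keeps ‘+=’ as one operation and makes it easy to exclude loop increments.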

3.4.2 Performance for common, simple forms

The performance of the two representations is now investigated for two canonical examples: the scalar ‘mass’ matrix and the ‘elasticity-like’ stiffness matrix. The input for the mass matrix form is shown in Figure 3.11 and the input for the elasticity-like stiffness matrix is shown in Figure 3.12. The performance of the two representations is compared for three-dimensional cases on simplices and for various polynomial orders. Code is generated using FFC, and the number of floating point operations required to form the element matrix is reported for all cases. In addition to the number of floating point operations, the time required to compute the element matrix N times is also presented, which is expected in most cases to be strongly correlated with the floating point operation count. As before, values are reported for the tensor contraction representation case, together with the ratio of the quadrature value over the tensor contraction value.

UFL code
element = VectorElement("Lagrange", tetrahedron, 3)

u = TrialFunction(element)
v = TestFunction(element)

def eps(v):
    return grad(v) + grad(v).T

a = 0.25*inner(eps(u), eps(v))*dx

Figure 3.12: UFL code for the elasticity-like matrix in three dimensions with element order q = 3.

The time required for insertion into a sparse matrix, which is independent of the element matrix representation, is also reported. The total assembly time is the ‘run-time’ plus the ‘insertion’ time, which provides a picture of the overall assembly performance. The ratio of the total assembly time for the quadrature representation over the total assembly time for the tensor contraction representation, denoted by aq/at, is also presented. When insertion is taken into account, the difference in performance between the representations appears less drastic for some forms.

The various timings for the mass matrix problem are reported in Table 3.3. It is clear from these results that tremendous speed-ups in computing the element matrices can be achieved using the tensor contraction representation, particularly as the element order is increased. This is perhaps not surprising, since the geometry tensor for this case is simply a scalar, and therefore the entire element matrix is essentially precomputed. Also note that the g++ compiler appears to perform particularly well for the tensor contraction representation in the two cases q = 2 and q = 3. For the case q = 3, the ratio of flops suggests that the run-time ratio should be around one hundred, while in fact it is close to 6500. However, as the number of flops for the tensor contraction representation increases, this effect disappears and the two ratios become almost equal (compare 365 to 378 for the case q = 4). The benefit of the faster element matrix computation is reduced, however, if the time required to insert terms into a sparse matrix is taken into account. For the case q = 4, the tensor contraction representation is a factor of 378 faster for computing the element matrix, but when insertion is included an overall speed-up factor of 9.72 is observed.

Although this is a substantial speed-up, the efficiency of matrix insertion must be addressed to reap the full benefits of the tensor contraction approach for these types of problems. If, in addition, the time required to perform the remaining parts of the finite element procedure, such as mesh initialisation, application of boundary conditions and solution of the resulting system of equations, is taken into account, the q/t ratio will become even closer to unity.
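Why a scalar geometry tensor makes the mass matrix so cheap can be seen in the affine case, where the element mass matrix is a reference matrix A⁰, computed once, scaled by the single quantity |det J|. The NumPy sketch below computes the same element matrix both ways. It is an illustration under simplifying assumptions (linear elements on a triangle rather than the tetrahedral elements of Table 3.3, hand-written Python rather than FFC-generated C++); the names `p1_basis`, `mass_tensor` and `mass_quadrature` are hypothetical.

```python
import numpy as np


def p1_basis(x, y):
    """Linear Lagrange basis on the reference triangle (0,0), (1,0), (0,1)."""
    return np.array([1.0 - x - y, x, y])


# Three-point rule on the reference triangle, exact for polynomials of degree 2.
QP = np.array([[1 / 6, 1 / 6], [2 / 3, 1 / 6], [1 / 6, 2 / 3]])
QW = np.array([1 / 6, 1 / 6, 1 / 6])

# Tensor representation: the reference tensor A0 is computed once, ahead of time.
A0 = sum(w * np.outer(p1_basis(*x), p1_basis(*x)) for x, w in zip(QP, QW))


def jacobian(vertices):
    """Jacobian of the affine map from the reference triangle to the cell."""
    return np.column_stack((vertices[1] - vertices[0], vertices[2] - vertices[0]))


def mass_tensor(vertices):
    """Element matrix as A0 times the scalar geometry tensor |det J|."""
    return abs(np.linalg.det(jacobian(vertices))) * A0


def mass_quadrature(vertices):
    """Same element matrix, accumulated over quadrature points for every cell."""
    det = abs(np.linalg.det(jacobian(vertices)))
    A = np.zeros((3, 3))
    for x, w in zip(QP, QW):
        phi = p1_basis(*x)
        A += w * det * np.outer(phi, phi)
    return A


vertices = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
print(np.allclose(mass_tensor(vertices), mass_quadrature(vertices)))  # True
```

Per cell, the tensor variant costs one determinant and one scaling of a precomputed matrix, while the quadrature variant repeats the basis function products at every point. This gap widens with polynomial order.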

| Order | N | flops | q/t | run-time [s] | q/t | insertion [s] | aq/at |
|---|---|---|---|---|---|---|---|
| q = 1 | 1 × 10⁹ | 52 | 3.8 | 1.05 | 4.1 | 21.4 | 1.03 |
| q = 2 | 1 × 10⁸ | 136 | 31.0 | 0.11 | 764.3 | 67.1 | 2.25 |
| q = 3 | 1 × 10⁸ | 316 | 91.2 | 0.12 | 6493.3 | 362.3 | 3.15 |
| q = 4 | 1 × 10⁷ | 1260 | 364.7 | 3.40 | 377.9 | 143.5 | 9.72 |

Table 3.3: Timings for the mass matrix in three dimensions for basis functions of varying polynomial order q.
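The way insertion dilutes the element-level speed-up can be checked with a one-line formula. Writing t for the tensor contraction run-time, r for the run-time ratio q/t and i for the insertion time (assumed, as stated above, to be the same for both representations), the total assembly ratio is aq/at = (r t + i)/(t + i). A minimal Python sketch, with a hypothetical function name:

```python
def total_assembly_ratio(t_tensor: float, runtime_ratio: float, t_insert: float) -> float:
    """Ratio aq/at of total assembly times (element computation plus insertion).

    t_tensor      -- time to compute N element tensors, tensor contraction [s]
    runtime_ratio -- run-time ratio q/t for the element computation
    t_insert      -- time to insert N element tensors into the sparse matrix [s],
                     assumed identical for both representations
    """
    t_quadrature = runtime_ratio * t_tensor
    return (t_quadrature + t_insert) / (t_tensor + t_insert)


# q = 4 row of Table 3.3: a 378-fold faster element computation ...
print(round(total_assembly_ratio(3.40, 377.9, 143.5), 2))  # 9.72
```

With the values in Table 3.3 this reproduces, to the reported precision, the aq/at values for q = 2, 3 and 4. Once insertion dominates the denominator, even very large run-time ratios translate into modest overall gains.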

| Order | N | flops | q/t | run-time [s] | q/t | insertion [s] | aq/at |
|---|---|---|---|---|---|---|---|
| q = 1 | 1 × 10⁷ | 2242 | 0.6 | 2.47 | 1.4 | 10.17 | 1.09 |
| q = 2 | 1 × 10⁶ | 18046 | 2.7 | 4.79 | 3.2 | 9.68 | 1.74 |
| q = 3 | 1 × 10⁵ | 91522 | 9.5 | 2.63 | 10.5 | 5.08 | 4.24 |
| q = 4 | 1 × 10⁴ | 321984 | 16.3 | 1.13 | 13.7 | 1.86 | 5.79 |

Table 3.4: Timings for the elasticity-like matrix in three dimensions for basis functions of varying polynomial order q.

The various timings for the elasticity-like stiffness matrix are presented in Table 3.4. Compared to the mass matrix, the differences in performance of the tensor contraction representation relative to the quadrature representation are less dramatic, but nonetheless substantial, especially for higher-order functions.

3.4.3 Performance for forms of increasing complexity

The complexity of the forms investigated in the previous section is now increased systematically in order to examine under which circumstances the quadrature representation will be more favourable in terms of run-time performance. The comparison is based on the floating point operation count³ and the size of the generated file for a large class of problems. The ‘complexity’ of a variational form is considered to increase when the number of function products increases and when the number of derivatives present increases. Increasing the number of derivatives and/or the number of functions appearing in a form leads to higher rank tensors for the tensor contraction representation. Also, increasing the polynomial order of the basis of a coefficient function leads to an increase in the complexity of the geometry tensor G_T^α, while increasing the polynomial order of the basis of the test and trial functions leads to an increase in the complexity of the reference tensor A^0_{iα}; see (3.8) and (3.7). Initially, attention is restricted to manipulating the number of function multiplications in the forms and the polynomial order of these functions, before introducing products of derivatives.

³ While the tables concerning flops and run-time performance in the previous two sections suggest that the flop count is a reasonably good indicator of performance, it is demonstrated in Section 3.5 that this is not always the case.

To generate forms of greater complexity than those in the previous section, the mass matrix and elasticity-like variational forms with a Lagrange basis of order q are premultiplied with n_f functions of order p. In the case of the mass matrix, the modified form reads:

$$a(u, v) := \int_\Omega \left( \prod_{i=1}^{n_f} f_i \right) u \, v \, \mathrm{d}x, \tag{3.20}$$

where the test and trial functions v, u ∈ V with

$$V := \left\{ v \in H^1(\Omega) : v|_T \in P_q(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.21}$$

and f_i ∈ V_f are coefficient functions with

$$V_f := \left\{ v \in H^1(\Omega) : v|_T \in P_p(T) \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.22}$$

An example of UFL code is shown in Figure 3.13 for the mass matrix premultiplied by coefficient functions where q = 2, n_f = 2 and p = 3.

UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)
element_f = FiniteElement("Lagrange", tetrahedron, 3)

u = TrialFunction(element)
v = TestFunction(element)

f = Coefficient(element_f)
g = Coefficient(element_f)

a = f*g*u*v*dx

Figure 3.13: UFL code for the mass matrix in three dimensions with q = 2, premultiplied by two coefficient functions (n_f = 2) of order p = 3.

A comparison of the representations for the mass matrix with different numbers of premultiplying functions and a range of orders p and q is presented in Table 3.5. In terms of flops, a ratio q/t > 1 indicates that the tensor representation is more efficient, while q/t < 1 indicates that the quadrature representation is more efficient. What is clear from Table 3.5 is that with few premultiplying functions, the tensor contraction approach is generally more efficient, even for relatively high-order premultiplying functions. The situation changes quite dramatically as the number of premultiplying functions increases, and as the polynomial order of the premultiplying functions increases. The cases with numerous premultiplying functions are typical of the Jacobian resulting from the linearisation of a nonlinear differential equation in a practical simulation, and are therefore important. It is also noted that the tensor contraction representation becomes relatively more efficient as q increases; however, this effect is less pronounced for the cases where n_f > 1 and p > 1. Clearly, the selection of the representation can have a tremendous impact on performance. For the most complicated cases, where n_f = 4, p = 3 and q > 1, FFC was stopped after more than one hour of generating code for the tensor contraction representation. FFC generated the quadrature representation code for all cases in a couple of seconds.

| | n_f = 1 flops | q/t | n_f = 2 flops | q/t | n_f = 3 flops | q/t | n_f = 4 flops | q/t |
|---|---|---|---|---|---|---|---|---|
| p = 1, q = 1 | 156 | 1.86 | 580 | 1.61 | 2324 | 0.49 | 9492 | 0.21 |
| p = 1, q = 2 | 648 | 7.18 | 3136 | 2.44 | 12512 | 1.68 | 52416 | 0.80 |
| p = 1, q = 3 | 2700 | 28.68 | 12484 | 12.21 | 46628 | 3.29 | 205716 | 1.30 |
| p = 1, q = 4 | 7994 | 57.62 | 38058 | 20.97 | 155850 | 5.13 | 622970 | 2.04 |
| p = 2, q = 1 | 360 | 2.72 | 3472 | 0.63 | 36020 | 0.39 | 370020 | 0.08 |
| p = 2, q = 2 | 1884 | 4.10 | 20236 | 2.12 | 203926 | 0.39 | 2044176 | 0.06 |
| p = 2, q = 3 | 7656 | 19.95 | 79936 | 3.36 | 766628 | 0.57 | 8049636 | 0.08 |
| p = 2, q = 4 | 23330 | 34.23 | 239550 | 5.32 | 2452810 | 0.78 | 24548810 | 0.11 |
| p = 3, q = 1 | 700 | 1.93 | 14020 | 1.17 | 288020 | 0.13 | 5920020 | 0.02 |
| p = 3, q = 2 | 3808 | 5.75 | 81136 | 1.02 | 1572608 | 0.09 | FFC stopped | – |
| p = 3, q = 3 | 14740 | 10.53 | 315652 | 1.39 | 6380156 | 0.11 | FFC stopped | – |
| p = 3, q = 4 | 47850 | 16.78 | 980010 | 1.96 | 19602234 | 0.14 | FFC stopped | – |

Table 3.5: The number of operations and the ratio between the numbers of operations for the two representations for the mass matrix in three dimensions as a function of different polynomial orders and numbers of functions.

Interestingly, for complicated forms the operation count is not always a good indicator of performance. For the three-dimensional mass matrix case with p = 1, q = 4 and n_f = 4, it would be expected from the operation count (q/t = 2.04) that the tensor contraction representation would be faster. However, when computing the element tensor 100000 times, a ratio of q/t = 0.81 is observed, meaning that the quadrature representation is faster. Noteworthy for this case is that the size of the generated code for the tensor contraction representation is 13 MB, while the size of the generated quadrature code is only 2.4 MB. This size difference not only leads to a significant difference in the C++ compile-time (almost twenty minutes for the tensor contraction code and only two seconds for the quadrature code), but also appears to result in a drop in run-time performance. The performance drop could be attributed to the increased memory traffic noted by Kirby and Logg (2006). It may also be that the compiler is unable to perform effective optimisations on the unrolled code, or that the compiler is particularly effective at optimising the loops in the generated quadrature code.

A similar comparison is made for elasticity-like forms, and the results are presented in Table 3.6. The trends in this table are similar to those observed for the mass matrix. Again, FFC was stopped after one hour of generating code for a number of the more complex forms when using the tensor contraction representation, whereas code generation using the quadrature representation completes in a few seconds for all cases. Compared to the mass matrix case, the number of operations has increased significantly, which has a big impact on both the FFC generation time and the size of the generated code. As an example, FFC spent 63 minutes generating a file of 2.8 GB for the case where n_f = 2, p = 3 and q = 4 for the tensor contraction representation. For the quadrature representation the code was generated in 8.6 seconds and the resulting file size was 9.2 MB.

| | n_f = 1 flops | q/t | n_f = 2 flops | q/t | n_f = 3 flops | q/t |
|---|---|---|---|---|---|---|
| p = 1, q = 1 | 9928 | 0.13 | 42832 | 0.11 | 183088 | 0.03 |
| p = 1, q = 2 | 80020 | 0.75 | 331228 | 0.51 | 1154620 | 0.16 |
| p = 1, q = 3 | 405064 | 2.31 | 1466704 | 1.02 | 6806512 | 0.59 |
| p = 1, q = 4 | 1426374 | 9.82 | 5920974 | 4.60 | 23425902 | 1.17 |
| p = 2, q = 1 | 24940 | 0.19 | 268120 | 0.06 | 2758888 | 0.01 |
| p = 2, q = 2 | 204760 | 0.82 | 2071972 | 0.14 | 21617452 | 0.07 |
| p = 2, q = 3 | 902188 | 1.66 | 10789336 | 0.72 | FFC stopped | – |
| p = 2, q = 4 | 3680298 | 7.43 | 37846422 | 1.25 | FFC stopped | – |
| p = 3, q = 1 | 19936 | 0.29 | 750880 | 0.04 | 21556504 | 0.01 |
| p = 3, q = 2 | 367732 | 0.49 | 8611804 | 0.18 | FFC stopped | – |
| p = 3, q = 3 | 2068552 | 1.93 | 43364368 | 0.31 | FFC stopped | – |
| p = 3, q = 4 | 7366950 | 3.71 | 152974350 | 0.50 | FFC stopped | – |

Table 3.6: The number of operations and the ratio between the numbers of operations for the two representations for the elasticity-like tensor in three dimensions as a function of different polynomial orders and numbers of functions.
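A back-of-the-envelope count helps explain why generation time and code size explode for the tensor contraction representation as n_f grows. For the premultiplied mass matrix (3.20), the reference tensor carries two indices over the P_q basis and one index per premultiplying function over the P_p basis. Before any symmetry or sparsity is exploited, its size therefore grows exponentially in n_f. By contrast, the number of quadrature points needed for exact integration depends only on the total degree 2q + n_f p. The sketch below assumes a collapsed (tensor-product) Gauss–Jacobi rule with m points per direction, exact up to degree 2m − 1; the point counts FFC actually uses may differ, and the function names are hypothetical.

```python
from math import comb


def dim_p(degree: int, d: int = 3) -> int:
    """Dimension of the polynomial space P_degree on a d-dimensional simplex."""
    return comb(degree + d, d)


def reference_tensor_entries(q: int, p: int, n_f: int, d: int = 3) -> int:
    """Entries of A^0 for (3.20): two P_q indices (test, trial) and one P_p
    index per premultiplying coefficient; symmetry and sparsity are ignored."""
    return dim_p(q, d) ** 2 * dim_p(p, d) ** n_f


def quadrature_points(q: int, p: int, n_f: int, d: int = 3) -> int:
    """Points of an (assumed) collapsed tensor-product Gauss-Jacobi rule that
    integrates the degree 2q + n_f*p integrand of (3.20) exactly."""
    degree = 2 * q + n_f * p
    m = (degree + 2) // 2  # an m-point Gauss rule is exact up to degree 2m - 1
    return m ** d


for n_f in range(1, 5):
    print(n_f, reference_tensor_entries(2, 3, n_f), quadrature_points(2, 3, n_f))
# 1 2000 64
# 2 40000 216
# 3 800000 343
# 4 16000000 729
```

For q = 2 and p = 3, the case in which FFC was stopped at n_f = 4 in Table 3.5, the naive reference tensor has sixteen million entries, while the assumed quadrature rule needs 729 points.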

As seen in Table 3.6, increasing the number of coefficient functions n_f in the form clearly works in favour of the quadrature representation. For n_f = 3 the quadrature representation can be expected to perform best for all values of q and p, even though q/t = 1.17 for the case where p = 1 and q = 4. In this specific case the size of the generated code for the tensor contraction representation is 442 MB, which will reduce the run-time performance as discussed previously, assuming that g++ is able to compile the code at all. Increasing the polynomial order of the coefficients, p, also works in favour of the quadrature representation, although the effect is less pronounced than the effect of increasing the number of coefficients. The tensor representation appears to perform better when the polynomial order of the test and trial functions, q, is increased, although the effect is most pronounced when the number of coefficients is low. However, file size considerations will rule out the tensor contraction representation for a number of forms where, based on the ratio, it would be expected to outperform the quadrature representation. It is more difficult in these cases to make broad generalisations as to the best representation. This again suggests that a method for automatically determining the best representation based on inspection of the form may be of interest. A discussion of such a strategy is, however, postponed until Section 3.6.

Finally, the influence of premultiplying a vector-valued Poisson variational form by the divergence of vector-valued functions is investigated. The UFL code for the case n_f = 2, p = 3 and q = 2 is shown in Figure 3.14.

UFL code
element = VectorElement("Lagrange", triangle, 2)
element_f = VectorElement("Lagrange", triangle, 3)

u = TrialFunction(element)
v = TestFunction(element)

f = Coefficient(element_f)
g = Coefficient(element_f)

a = div(f)*div(g)*inner(grad(u), grad(v))*dx

Figure 3.14: UFL code for the vector-valued Poisson problem in two dimensions with q = 2, premultiplied by the divergence of two vector-valued functions (n_f = 2) of order p = 3.

A comparison of the tensor contraction and quadrature representations is performed as in the previous cases, and the results are shown in Table 3.7. Premultiplying forms with derivatives of functions clearly increases the complexity to such a degree that the tensor contraction representation involves fewer operations for only a very limited number of the considered cases.

| | n_f = 1 flops | q/t | n_f = 2 flops | q/t |
|---|---|---|---|---|
| p = 1, q = 1 | 708 | 0.29 | 6148 | 0.07 |
| p = 1, q = 2 | 2202 | 0.90 | 18394 | 0.13 |
| p = 1, q = 3 | 8090 | 1.48 | 66394 | 0.19 |
| p = 1, q = 4 | 22548 | 2.53 | 183892 | 0.32 |
| p = 2, q = 1 | 1412 | 0.16 | 24580 | 0.04 |
| p = 2, q = 2 | 7790 | 0.52 | 162766 | 0.03 |
| p = 2, q = 3 | 24902 | 0.57 | 516606 | 0.05 |
| p = 2, q = 4 | 60156 | 1.27 | 1246436 | 0.10 |
| p = 3, q = 1 | 2116 | 0.30 | 96772 | 0.02 |
| p = 3, q = 2 | 11862 | 0.36 | 545422 | 0.02 |
| p = 3, q = 3 | 45086 | 0.54 | 1695358 | 0.03 |
| p = 3, q = 4 | 110668 | 1.08 | 4093924 | 0.04 |

Table 3.7: The number of operations and the ratio between the numbers of operations for the two representations for the vector-valued Poisson problem in two dimensions as a function of different polynomial orders and numbers of functions.

3.5 Performance comparisons of quadrature optimisations

In this section the impact of the optimisation strategies outlined in Section 3.3 on the run-time performance is investigated. The point is not to present a rigorous analysis of the optimisations, but to provide indications as to when the different strategies will be most effective. The performance of the quadrature optimisations will be investigated using two forms, namely the bilinear form for the weighted Laplace equation (3.1), see the UFL input in Figure 3.1, and the bilinear form for the Mooney–Rivlin hyperelasticity model from (2.33), page 36, in three dimensions. The UFL input for the hyperelasticity model is shown in Figure 3.15. In both cases quadratic Lagrange finite elements will be used.

UFL code
element = VectorElement("Lagrange", tetrahedron, 2)

w = TestFunction(element)
du = TrialFunction(element)
u = Coefficient(element)
c1 = Constant(tetrahedron)
c2 = Constant(tetrahedron)

I = Identity(3)                  # Identity tensor
F = I + grad(u)                  # Deformation gradient
C = F.T*F                        # Right Cauchy-Green tensor

I_C = tr(C)                      # First invariant of C
II_C = (I_C**2 - tr(C*C))/2.0    # Second invariant of C

# Stored strain energy density (Mooney-Rivlin model)
Psi = c1*(I_C - 3.0) + c2*(II_C - 3.0)

Pi = Psi*dx                      # Potential energy
F = derivative(Pi, u, w)         # First variation of Pi about u in direction w
J = derivative(F, u, du)         # Jacobian

Figure 3.15: UFL input for the Mooney–Rivlin hyperelasticity model in three dimensions using quadratic elements. It is the bilinear form, the Jacobian J, which is of interest in the performance comparison.

All tests were performed using the same hardware and software setup as described in the previous section, with the small difference that the g++ compiler options are varied. The two forms are compiled with the different FFC optimisations, and the number of floating point operations (flops) required to compute the local element tensor is determined. The number of flops is defined as in the previous section, that is, as the sum of all appearances of the operators ‘+’ and ‘*’ in the code. The ratio o/q between the number of flops for the current FFC optimisation and for the standard quadrature representation is also computed. The generated code is then compiled with g++ using four different optimisation options, and the time needed to compute the element tensor N times is measured. In the following, -zeros is used as shorthand for the -f eliminate_zeros option, -simplify for the -f simplify_expressions option, -ip for the -f precompute_ip_const option and -basis for the -f precompute_basis_const option.

The operation counts for the weighted Laplace equation with different FFC optimisations are given in Table 3.8, while Figure 3.16 shows the run-time performance for different compiler options for N = 5 × 10⁷. The FFC compiler options are shown on the x-axis of the figure, and the four g++ compiler options are shown with different colours. The FFC and g++ compile-times were less than one second for all optimisation options.

| FFC optimisation | flops | o/q |
|---|---|---|
| None | 4176 | 1.00 |
| -zeros | 6672 | 1.60 |
| -simplify | 2712 | 0.65 |
| -simplify -zeros | 1920 | 0.46 |
| -ip | 3756 | 0.90 |
| -ip -zeros | 4290 | 1.03 |
| -basis | 3756 | 0.90 |
| -basis -zeros | 3690 | 0.88 |

Table 3.8: Operation counts for the weighted Laplace equation.

It is clear from Figure 3.16 that run-time performance is greatly influenced by the g++ optimisations. Compared to the case where no g++ optimisations are used (the -O0 flag), the run-time for the standard quadrature code improves by a factor of 4.70 when using the -O2 option, 6.86 when using the -O2 -funroll-loops option and 10.65 when using the -O3 option. The -O3 option does not appear to improve the run-time noticeably beyond the improvement observed for the -O2 -funroll-loops option when the FFC optimisation option -zeros is used. Using the FFC optimisation option -zeros alone for this form does not improve run-time performance. In fact, using this option in combination with any of the other optimisation options increases the run-time, even when combined with the option -simplify, which has a significantly lower operation count than the standard quadrature representation. A curious point to note is that without g++ optimisation there is a significant difference in run-time between the -ip and -basis options, even though they involve the same number of flops. When g++ optimisations are switched on, this difference is eliminated completely and the run-times for the two FFC optimisations are identical. This suggests that it is not possible to predict run-time performance from the operation count alone, since the type of FFC optimisation must be taken into account, as well as the intended g++ compiler options. The optimal combination of optimisations for this form is the FFC option -ip or -basis combined with the g++ option -O2 -funroll-loops, in which case the run-time is improved by a factor of 12.3 compared to standard quadrature code with no g++ optimisations.
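As its name suggests, the -ip option (-f precompute_ip_const) is about precomputing quantities that are constant at each integration point. The underlying idea, loop-invariant hoisting, can be illustrated for a weighted Laplace-type element matrix A_ij = Σ_ip w_ip |det J| c(x_ip) ∇φ_i · ∇φ_j. The Python sketch below is a hypothetical illustration of the principle, not the code FFC generates. The basis gradients `dphi`, coefficient values `c` and weights `w` are assumed inputs, and the multiplication counts are tallied by hand in the code rather than measured.

```python
import numpy as np


def element_matrix_naive(dphi, c, w, det):
    """A_ij = sum over ip of w[ip]*det*c[ip]*(grad phi_i . grad phi_j).

    dphi has shape (n_ip, n_dofs, gdim): basis gradients at quadrature points.
    The factor w[ip]*det*c[ip] is recomputed for every (i, j) pair.
    Returns the matrix and a hand-tallied count of multiplications.
    """
    n_ip, n_dofs, gdim = dphi.shape
    A = np.zeros((n_dofs, n_dofs))
    mults = 0
    for ip in range(n_ip):
        for i in range(n_dofs):
            for j in range(n_dofs):
                scale = w[ip] * det * c[ip]                     # 2 mults, invariant in i, j
                A[i, j] += scale * (dphi[ip, i] @ dphi[ip, j])  # gdim + 1 mults
                mults += 2 + gdim + 1
    return A, mults


def element_matrix_hoisted(dphi, c, w, det):
    """Same matrix, with the integration-point constant hoisted out of the i, j loops."""
    n_ip, n_dofs, gdim = dphi.shape
    A = np.zeros((n_dofs, n_dofs))
    mults = 0
    for ip in range(n_ip):
        scale = w[ip] * det * c[ip]  # computed once per integration point
        mults += 2
        for i in range(n_dofs):
            for j in range(n_dofs):
                A[i, j] += scale * (dphi[ip, i] @ dphi[ip, j])
                mults += gdim + 1
    return A, mults
```

For three integration points and six degrees of freedom in two dimensions (for example, quadratic elements on a triangle), the tallies are 540 multiplications for the naive loop and 330 for the hoisted one, with identical results. An optimising C++ compiler may perform simple hoisting of this kind on its own, so such flop savings need not translate into run-time savings.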
The operation counts and FFC code generation time for the bilinear form for hyperelasticity with different FFC optimisations are presented in Table 3.9, while Figure 3.17 shows the run-time performance for different compiler options for 90 Chapter 3. Representations and optimisations of finite element variational forms

103 -O0 -O2 -O2 -funroll-loops -O3

102 time [s]

101

none -ip -zeros -basis -simplify -ip -zeros -basis -zeros -simplify -zeros

Figure 3.16: Run-time performance for the weighted Laplace equation for different compiler options. The x-axis shows the FFC compiler options, and the colors denote the g++ compiler options.

$N = 5 \times 10^4$. Comparing the number of flops needed to compute the element tensor with that of the weighted Laplace example, it is clear that this problem is considerably more complex. The FFC code generation times in Table 3.9 show that the -simplify optimisation, as anticipated, is the most expensive to perform. The g++ compile times for all test cases were less than three seconds for all optimisation options. Note that the scope for reducing the flop count is considerably greater for this problem than for the weighted Laplace problem: the flop counts of the different FFC optimisations differ by up to a factor of roughly 300 (56228760 flops without optimisation versus 185697 flops with -simplify -zeros). By comparison, the flop count for the weighted Laplace problem differs by only a factor of roughly two between the non-optimised and the most effective optimisation strategy. When no g++ optimisation is used, the run-time performance for the hyperelasticity problem can be directly related to the number of floating point operations. When the g++ optimisation -O2 is switched on, this effect becomes less pronounced. A further point, in connection with the g++ optimisations, is that switching on additional optimisations beyond -O2 does not seem to provide any further improvement in run time. For the hyperelasticity example, the option -zeros has a positive effect on the performance, in particular when combined with the -basis and -simplify optimisations. This is in contrast with the weighted Laplace equation. The reason is that the test and trial functions are vector valued rather than scalar valued, which allows more zeros to be eliminated.

| FFC optimisation | FFC time [s] | Time o/q | Flops | Flops o/q |
|---|---|---|---|---|
| None | 1.8 | 1.00 | 56228760 | 1.000 |
| -zeros | 1.8 | 1.00 | 38844456 | 0.691 |
| -simplify | 6.9 | 3.83 | 3086595 | 0.055 |
| -simplify -zeros | 5.8 | 3.22 | 185697 | 0.003 |
| -ip | 2.0 | 1.11 | 44310392 | 0.788 |
| -ip -zeros | 2.9 | 1.61 | 12562106 | 0.223 |
| -basis | 2.0 | 1.11 | 3664392 | 0.065 |
| -basis -zeros | 3.0 | 1.67 | 1609430 | 0.029 |

Table 3.9: FFC code generation times and operation counts for the hyperelasticity example. The o/q columns give each value relative to the non-optimised case (None).

[Figure 3.17 plot: run time [s] on a logarithmic axis (10⁰ to 10⁴) for the FFC options none, -ip, -zeros, -basis, -simplify, -ip -zeros, -basis -zeros and -simplify -zeros, with one bar per g++ option set: -O0, -O2, -O2 -funroll-loops, -O3.]

Figure 3.17: Run-time performance for the hyperelasticity example for different compiler options. The x-axis shows the FFC compiler options, and the colors denote the g++ compiler options.

Finally, it is noted that the -simplify option performs particularly well for this example compared with the weighted Laplace problem. The reason is that the nature of the hyperelasticity form results in a relatively complex expression for the entries of the local element tensor. However, this expression consists of only a few different variables (components of the inverse of the Jacobian and basis function values), which makes the -simplify option very efficient since many terms are common and can be precomputed and hoisted. For the hyperelasticity form, the optimal combination of optimisations is the FFC option -simplify -zeros with the g++ option -O2 -funroll-loops. This combination improves the run-time performance by approximately one order of magnitude compared with all other FFC options when g++ optimisations are included. Compared with the case where no optimisation is used by either FFC or g++, the run-time performance of the code is improved by a factor of 744. For the considered examples, it is clear that no single optimisation strategy is the best for all cases. Furthermore, which generation-phase optimisations are best depends on the optimisations performed by the g++ compiler. It is also very likely that different C++ compilers will give different results for the test cases presented in this section. The general recommendation is therefore to base the choice of optimisation for production code on a benchmark of the specific problem.
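The relative flop counts of Table 3.9, and the overall reduction between the non-optimised code and -simplify -zeros, can be recomputed directly from the raw counts. The following Python sketch performs only that arithmetic; the dictionary simply transcribes the table and involves no FFC functionality.

```python
# Flop counts for the hyperelasticity example, transcribed from Table 3.9.
FLOPS = {
    "None": 56228760,
    "-zeros": 38844456,
    "-simplify": 3086595,
    "-simplify -zeros": 185697,
    "-ip": 44310392,
    "-ip -zeros": 12562106,
    "-basis": 3664392,
    "-basis -zeros": 1609430,
}


def relative_flops(counts, reference="None"):
    """Return each flop count divided by the count of the reference option."""
    base = counts[reference]
    return {option: count / base for option, count in counts.items()}


for option, ratio in relative_flops(FLOPS).items():
    print(f"{option:>17s}  {ratio:.3f}")

# Overall reduction from no optimisation to -simplify -zeros
print(f"reduction factor: {FLOPS['None'] / FLOPS['-simplify -zeros']:.0f}")
```

Rounded to three decimals, the computed ratios reproduce the "Flops o/q" column of the table.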

3.6 Automatic selection of representation

In this chapter it has been illustrated how the run-time performance of the generated code for variational forms can be improved by using various optimisation options for the FFC and g++ compilers, and by changing the representation of the form. Numerical experiments have shown that the relative run-time performance of the two representations can differ substantially depending on the nature of the considered variational form. In general, the tensor contraction approach deals well with forms which involve high-order bases and few coefficient functions, whereas the quadrature representation becomes more efficient as the number of coefficient functions (other than constant coefficients) and derivatives in a form increases. Hence, the quadrature representation is in general significantly faster for more complicated forms. In an automated modelling framework like FEniCS, it seems natural to attempt to select the most favourable representation automatically. When comparing the two representations in Section 3.4, it was found that the operation count is a reasonably good indicator of which representation will exhibit the best run-time performance. FFC presently computes the operation count for the code which is generated, on the basis of which a choice could be made, but this involves generating computer code for each case, which can be time consuming. Ideally, the form compiler would select the best representation based on an a priori inspection of the form. It turns out, however, that this is a non-trivial task if the goal is a general approach which holds for any form which FFC can handle. Furthermore, as shown in the previous section, the code with the lowest number of flops, at least for the quadrature representation, does not always perform best for a given form. Finally, the run-time performance even depends on which g++ options are used.
A strategy for selecting between representations based only on an estimation of flops therefore does not seem feasible. Choosing the combination of form representation and optimisation options that leads to optimal performance will inevitably require a benchmark study of the specific problem. However, many variational forms of varying complexity are often needed to solve more complex problems, and setting up benchmarks for all of them is cumbersome and time consuming. Additionally, during the model development stage, run-time performance is of minor importance compared to rapid prototyping of variational forms, as long as the generated code performs reasonably well. The default behaviour of FFC is, therefore, to automatically determine which form representation should be used based on a measure of the cost of using the tensor representation. In short, the cost is simply computed as the maximum, over the monomials representing the form, of the sum of the number of coefficients and derivatives present in each monomial. If this cost is larger than a specified threshold, currently set to three, the quadrature representation is selected. Recall from Table 3.6 that when $n_f = 3$ the flop count for the quadrature representation was significantly lower for virtually all the test cases. Although this approach may seem ad hoc, it will work well in those situations where the difference in run-time performance is significant. It is important to remember that the generated code is only concerned with the evaluation of the local element tensor, and that the time needed to insert the values into a sparse matrix and to solve the system of equations will reduce any difference, particularly for simple forms. Therefore, making a correct choice of representation is less important for forms where the difference in run-time performance is small. A future improvement could be to devise a strategy that also lets the system select the optimisation strategy for the quadrature representation automatically.
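The selection rule described above can be sketched in a few lines of Python. The sketch works on a deliberately simplified, hypothetical model of a form: a list of monomials, each a list of factors that record whether the factor is a coefficient and how many derivatives act on it. This is not FFC's internal data structure or code; only the threshold of three is taken from the text, and each example form is treated as a single monomial for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Factor:
    """Hypothetical, simplified description of one factor in a monomial."""
    is_coefficient: bool  # True for a (non-constant) coefficient function
    num_derivatives: int  # number of derivatives acting on this factor


Monomial = List[Factor]

# Threshold quoted in the text: above this cost, quadrature is selected.
COST_THRESHOLD = 3


def tensor_cost(monomials: List[Monomial]) -> int:
    """Maximum over all monomials of (#coefficients + #derivatives)."""
    return max(
        sum(f.is_coefficient for f in m) + sum(f.num_derivatives for f in m)
        for m in monomials
    )


def select_representation(monomials: List[Monomial],
                          threshold: int = COST_THRESHOLD) -> str:
    """Pick 'quadrature' when the estimated tensor cost exceeds the threshold."""
    return "quadrature" if tensor_cost(monomials) > threshold else "tensor"


# dot(grad(u), grad(v)); w*dot(grad(u), grad(v)); and a monomial with two
# coefficients, one of them differentiated, times dot(grad(u), grad(v)).
laplace = [[Factor(False, 1), Factor(False, 1)]]
weighted_laplace = [[Factor(True, 0), Factor(False, 1), Factor(False, 1)]]
nonlinear = [[Factor(True, 1), Factor(True, 0), Factor(False, 1), Factor(False, 1)]]

for name, form in [("laplace", laplace),
                   ("weighted_laplace", weighted_laplace),
                   ("nonlinear", nonlinear)]:
    print(name, tensor_cost(form), select_representation(form))
```

With the threshold at three, the weighted Laplace monomial sits exactly at the boundary and keeps the tensor representation in this sketch; how a real form is split into monomials is decided by the form compiler.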
Regardless of whether it is possible to define an optimal strategy for automatically selecting the representation (and possibly the optimisation), the applicability of automated modelling is definitely extended by having both tensor contraction and quadrature representations, and their optimisations, as part of the computational arsenal.

3.7 Future optimisations

The optimisations proposed in Section 3.3.5 for the quadrature representation are primarily concerned with the run-time performance of the generated code, and the strategies follow along similar lines as the ones already implemented and discussed in Section 3.3. However, as the number of FEniCS users has increased, so has the complexity of the problems that users are trying to solve. In Section 3.4 it was demonstrated that, for some of the more complicated forms, the tensor contraction representation can take hours to generate code for the given problem and that the size of the generated code can become very large. For very complex forms, typically nonlinear forms that are linearised automatically by UFL, similar trends can also be observed for the quadrature representation. It is, therefore, necessary to develop new strategies for the code generation process to reduce the generation time and the size of the generated code. Two possible approaches that could be investigated are outlined below. Currently, the code to compute derivatives of, for instance, basis functions, like the term

$$\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}$$

in (3.4), is located inside the loop over basis function indices j and k; see for instance Figure 3.2. From the UFL input

UFL code
```python
element = FiniteElement("Lagrange", triangle, 1)
u = TrialFunction(element)
v = TestFunction(element)
a = inner(grad(u), grad(v))*dx
```
the generated code for the loop over basis function indices will be

C++ code
```cpp
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += (((K_00*FE0_D10[0][j] + K_10*FE0_D01[0][j]))*
                   ((K_00*FE0_D10[0][k] + K_10*FE0_D01[0][k])) +
                   ((K_01*FE0_D10[0][j] + K_11*FE0_D01[0][j]))*
                   ((K_01*FE0_D10[0][k] + K_11*FE0_D01[0][k])))*W1*det;
  }
}
```
which is almost identical to that in Figure 3.2. However, because u and v are defined using the same finite element, the code computing the derivatives of u and of v differs only in the loop index (j or k). Thus, precomputing the derivatives outside the loop will reduce the code size (and the number of operations needed). The improved code for the given case would then become:

C++ code
```cpp
// Precompute the derivatives of the three basis functions once
double FE0_d0[3];
double FE0_d1[3];
for (unsigned int r = 0; r < 3; r++)
{
  FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
  FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}

// Reuse the precomputed values inside the loop over basis function indices
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += ((FE0_d0[j]*FE0_d0[k]) + (FE0_d1[j]*FE0_d1[k]))*W1*det;
  }
}
```
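The equivalence of the two C++ loops above can be checked with a small Python transcription. This is a sketch under stated assumptions: D10 and D01 hold the reference derivatives of the linear Lagrange basis on the reference triangle ($\phi_0 = 1 - X - Y$, $\phi_1 = X$, $\phi_2 = Y$) at a single quadrature point, and K, the weight and det are sample values chosen for illustration. The naive version evaluates four derivative expressions for each of the nine (j, k) pairs (36 in total), whereas the hoisted version evaluates six.

```python
# Reference derivatives of the P1 basis on the reference triangle
# (phi0 = 1 - X - Y, phi1 = X, phi2 = Y); they are constant.
D10 = [-1.0, 1.0, 0.0]  # d/dX
D01 = [-1.0, 0.0, 1.0]  # d/dY


def naive(K, W, det):
    """Mirror of the first C++ loop: derivatives recomputed for every (j, k)."""
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += ((K[0][0] * D10[j] + K[1][0] * D01[j])
                             * (K[0][0] * D10[k] + K[1][0] * D01[k])
                             + (K[0][1] * D10[j] + K[1][1] * D01[j])
                             * (K[0][1] * D10[k] + K[1][1] * D01[k])) * W * det
    return A


def hoisted(K, W, det):
    """Mirror of the improved C++ code: derivatives computed once, then reused."""
    d0 = [K[0][0] * D10[r] + K[1][0] * D01[r] for r in range(3)]
    d1 = [K[0][1] * D10[r] + K[1][1] * D01[r] for r in range(3)]
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += (d0[j] * d0[k] + d1[j] * d1[k]) * W * det
    return A


K_sample = [[0.8, -0.3], [0.1, 1.2]]  # arbitrary inverse-Jacobian entries
diff = max(abs(a - b) for a, b in
           zip(naive(K_sample, 0.5, 2.0), hoisted(K_sample, 0.5, 2.0)))
print(diff)

# On the reference triangle (K = identity, det = 1, one-point weight 1/2)
# the result is the P1 stiffness matrix, stored row by row.
print(hoisted([[1.0, 0.0], [0.0, 1.0]], 0.5, 1.0))
```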

The drawback of this approach is that the optimisations discussed in Section 3.3, particularly the -simplify optimisation, could be less effective, as fewer common expressions involving geometry constants like K_00 will be present. To reduce the size of the code even further (and possibly also improve run-time performance), a linear algebra library, for instance Armadillo (http://arma.sourceforge.net/), could be employed to perform block operations using optimised BLAS. The generated code would then become:

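C++ code
```cpp
arma::vec FE0_d0(3);
arma::vec FE0_d1(3);
for (unsigned int r = 0; r < 3; r++)
{
  FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
  FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}

// Outer products of the derivative vectors give the 3 x 3 element matrix
arma::mat R = (FE0_d0*arma::trans(FE0_d0) + FE0_d1*arma::trans(FE0_d1))*W1*det;

// Copy values to A (Armadillo stores matrices column-major; R is symmetric
// here, so the copy matches the row-major layout of A)
double* p = R.memptr();
for (int r = 0; r < 9; r++)
  A[r] = p[r];
```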

In the given case, the size of the code has not been reduced significantly. The approach would, however, be particularly effective in situations involving, for instance, the inverse operator in UFL. The inverse operator in UFL (only defined for 1 × 1, 2 × 2 and 3 × 3 matrices) is hardcoded as a function of the matrix components. This leads to a very complex expression inside the loop over basis functions when following the conventional quadrature approach, which could be substituted by a simple function call to arma::inv.⁴ The strategy outlined above could, however, have a negative influence on the run-time performance due to overhead in the linear algebra library or by making it more difficult for the g++ compiler to perform optimisations.

⁴ This approach might not be feasible for linearisations of the inverse when using the automatic differentiation functionality in UFL.
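To illustrate what "hardcoded as a function of the matrix components" means in practice, the following Python sketch writes out the 2 × 2 and 3 × 3 inverses via the adjugate (cofactor) formula. It is not UFL's implementation; it only shows why every entry of the inverse becomes a sizeable explicit expression, which inflates the generated code when inlined in a loop over basis functions.

```python
def inverse_2x2(A):
    """Closed-form inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]


def inverse_3x3(A):
    """Closed-form inverse of a 3 x 3 matrix: adjugate divided by determinant."""
    (a, b, c), (d, e, f), (g, h, i) = A
    # Cofactors c_rs of A
    c00, c01, c02 = e * i - f * h, -(d * i - f * g), d * h - e * g
    c10, c11, c12 = -(b * i - c * h), a * i - c * g, -(a * h - b * g)
    c20, c21, c22 = b * f - c * e, -(a * f - c * d), a * e - b * d
    det = a * c00 + b * c01 + c * c02  # expansion along the first row
    # Inverse = transpose of the cofactor matrix, divided by det
    return [[c00 / det, c10 / det, c20 / det],
            [c01 / det, c11 / det, c21 / det],
            [c02 / det, c12 / det, c22 / det]]


def matmul(A, B):
    """Plain matrix product for nested lists."""
    return [[sum(A[r][m] * B[m][s] for m in range(len(B)))
             for s in range(len(B[0]))]
            for r in range(len(A))]


M = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
print(matmul(M, inverse_3x3(M)))  # approximately the identity
```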

As demonstrated in this chapter, having multiple representations and optimisations available when considering variational forms of different complexity is an advantage in an automated framework, as it is the combination of form complexity, FFC optimisations and g++ compiler options that determines the run-time performance of the generated code. Implementing the strategies outlined above would, therefore, extend the applicability of FEniCS to a range of even more complex problems than can be handled at present.

4 Automation of discontinuous Galerkin methods

Discontinuous Galerkin methods in space have emerged as a generalisation of finite element methods for solving a range of partial differential equations. While historically used for first-order hyperbolic equations, discontinuous Galerkin methods are now applied to a range of hyperbolic, parabolic and elliptic problems. In addition to the usual integration over cell volumes that characterises the conventional finite element method, discontinuous Galerkin methods also involve the integration of flux terms over interior facets. Discontinuous Galerkin methods exist in many variants, and are generally distinguished by the form of the flux on facets. A sample of fluxes for elliptic problems can be found in Arnold et al. (2002). Integrating functions on interior facets and evaluating flux terms, expressed as jumps and averages of quantities of interest, adds complexity to the standard finite element procedure. It is therefore desirable to also handle these types of formulations in an automated fashion, as this permits the rapid prototyping and testing of new methods. In this chapter the necessary extensions to the FEniCS framework for implementing discontinuous Galerkin formulations are presented. Specifically, new abstractions in UFL, FFC, UFC and DOLFIN are needed in order to handle the automation of the characteristic features of discontinuous Galerkin methods. The extended framework is then demonstrated through a range of common problems, including the Poisson, advection–diffusion, Stokes and biharmonic equations. The presentation of the extensions in this chapter is based on the work in Ølgaard et al. (2008a).¹ Although the functionality is implemented with discontinuous Galerkin methods in mind, it also allows a range of novel finite element methods that draw upon discontinuous Galerkin methods to be handled automatically by the FEniCS framework.
These methods may not involve discontinuous function spaces but do involve integration over interior facets. Such examples can be found in Hughes et al. (2006), Wells and Dung (2007) and Labeur and Wells (2007).

¹ In the original paper, the discontinuous Galerkin operators were implemented in the form language of FFC, which was later merged into UFL. The code examples from the paper have also been updated to be compliant with FEniCS version 1.0.

Figure 4.1: Two cells $T^+$ and $T^-$ sharing a common facet $S$.

4.1 Extending the framework to discontinuous Galerkin methods

Discontinuous Galerkin methods involve variational forms that include integrals over the interior facets of a finite element mesh. Consider for example the following bilinear form, which may appear as a term in a discontinuous Galerkin formulation:

$$a(u, v) := \sum_{S \in \Gamma_0} \int_S [\![u]\!]\,[\![v]\!] \,\mathrm{d}s, \qquad (4.1)$$

where $\Gamma_0$ denotes the set of all interior facets of the triangulation $\mathcal{T}_h$ and $[\![v]\!]$ denotes the jump in the function value of $v$ across the facet $S$:

$$[\![v]\!] = v^+ - v^-. \qquad (4.2)$$

Here, $v^+$ and $v^-$ denote the values of $v$ on the facet $S$ as seen from the two cells $T^+$ and $T^-$ incident with $S$, respectively (see Figure 4.1). Note that each interior facet is incident to exactly two cells, which may be labelled $T^+$ and $T^-$. The union of these two cells, $\mathcal{T} = T^+ \cup T^-$, will be referred to as the macro cell. In order to handle variational forms such as (4.1) in the FEniCS framework, additional functionality is needed in a number of components. Obviously, UFL must be extended to support the definition of integrals over interior facets. These integrals may involve functions which can be evaluated on either of the two cells incident to the interior facet. DOLFIN must be extended to support assembly of multilinear forms containing interior facet integrals, which in turn requires the UFC interface to be extended with a new integral class. As UFC is only concerned with the interface of this class, FFC must support code generation for interior facet integrals defined using the UFL syntax. The following sections describe the extensions that have been developed in each of these four components.

| Mathematical notation | UFL notation |
|---|---|
| $f^+$, $f^-$ | f('+'), f('-') |
| $\langle f \rangle$ | avg(f) |
| $[\![f]\!]$ | jump(f), jump(f, n) |

Table 4.1: Discontinuous Galerkin operators in UFL.

4.1.1 Extending the Unified Form Language

As illustrated in (4.1) and (4.2), a central concept of discontinuous Galerkin methods is the possibility that an expression $f$ has two values, denoted $f^+$ and $f^-$, on an interior facet $S$ when it is evaluated based on the two cells $T^+$ and $T^-$ which are incident with $S$. The UFL notation f('+') and f('-') is used to restrict an expression f to $T^+$ and $T^-$ respectively. A number of common operators for discontinuous Galerkin methods can be implemented using these simple definitions. For convenience, UFL provides the set of operators presented in Table 4.1 to facilitate compact implementation of these methods. Two typical operators are the average and jump operators, frequently denoted by $\langle f \rangle$ and $[\![f]\!]$ respectively. The average operator is defined as $\langle f \rangle = (f^+ + f^-)/2$, while the jump operator is, in general, defined as $[\![f]\!] = f^+ - f^-$, as shown in (4.2). For convenience, these two operators are available in UFL as avg(f) and jump(f). It is common to use the outward unit normal to the interior facet, denoted by $n$, when defining the jump operator, such that $[\![f]\!] = f^+ n^+ + f^- n^-$ for a scalar-valued expression, while $[\![f]\!] = f^+ \cdot n^+ + f^- \cdot n^-$ for a vector- or tensor-valued expression. In both definitions, $n^+$ and $n^-$ denote the outward unit normal to the interior facet $S$ as seen from the two cells $T^+$ and $T^-$ respectively. These two definitions are implemented in a single operator jump(f, n) by letting UFL automatically determine the rank of the expression f and return the appropriate definition. It should be pointed out that, because UFL is an embedded language and because of the restriction operators f('+') and f('-'), a user can easily implement custom operators for discontinuous Galerkin methods. What remains, in order to express variational forms of the type shown in (4.1), is to define a notation for the interior facet integral.
Following the notation for the domain and exterior boundary integrals introduced in Section 1.3.2, the integral over interior facets $\int_{\Gamma_{0,k}} I \,\mathrm{d}S$ is simply written as I*dS(k), where k is the subdomain number and I is a valid UFL expression. The extensions described above facilitate compact implementation of a range of discontinuous Galerkin methods using a syntax which is close to the mathematical notation. As a simple illustration, the bilinear form in (4.1) is represented in UFL by:

UFL code
```python
a = jump(u)*jump(v)*dS
```
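As a plain-number illustration of the definitions behind Table 4.1, the following Python sketch evaluates the average and the two jump variants from given one-sided values $f^+$, $f^-$ and normals $n^+$, $n^-$. The function names mirror the UFL operators, but they are hypothetical helpers that operate on numbers and tuples, not on UFL expressions; the tensor-valued case of jump(f, n) is omitted.

```python
from numbers import Real
from typing import Sequence, Union

Value = Union[float, Sequence[float]]


def avg(f_plus: float, f_minus: float) -> float:
    """<f> = (f+ + f-)/2."""
    return 0.5 * (f_plus + f_minus)


def jump(f_plus: float, f_minus: float) -> float:
    """[[f]] = f+ - f-, as in (4.2)."""
    return f_plus - f_minus


def jump_n(f_plus: Value, f_minus: Value,
           n_plus: Sequence[float], n_minus: Sequence[float]):
    """Jump with normals; as for jump(f, n), the result depends on the rank of f.

    scalar f: [[f]] = f+ n+ + f- n-      (a vector)
    vector f: [[f]] = f+ . n+ + f- . n-  (a scalar)
    """
    if isinstance(f_plus, Real):
        return [f_plus * p + f_minus * m for p, m in zip(n_plus, n_minus)]
    return (sum(a * p for a, p in zip(f_plus, n_plus))
            + sum(b * m for b, m in zip(f_minus, n_minus)))


n_plus, n_minus = (1.0, 0.0), (-1.0, 0.0)  # outward normals on a vertical facet
print(avg(3.0, 1.0), jump(3.0, 1.0))
print(jump_n(3.0, 1.0, n_plus, n_minus))
print(jump_n((1.0, 2.0), (4.0, 5.0), n_plus, n_minus))
```

Because $n^- = -n^+$, the scalar result equals $(f^+ - f^-)\,n^+$, which connects the normal-based jump to (4.2).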

4.1.2 Extending the Unified Form-assembly Code

As DOLFIN relies on the UFC interface when evaluating local finite element tensors, the UFC interface must define the tabulate_tensor function also for interior facet integrals. This function is provided by the class ufc::interior_facet_integral and the interface is

C++ code
```cpp
/// Tabulate the tensor for the contribution from a local interior facet
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const cell& c0,
                             const cell& c1,
                             unsigned int facet0,
                             unsigned int facet1) const = 0;
```
where A is a pointer to an array which will hold the values of the local element tensor and w contains nodal values of any coefficient functions present in the integral. The two cells c0 and c1 correspond to the cells $T^+$ and $T^-$ incident with the given facet $S$, while facet0 and facet1 are the local indices of the facet $S$ relative to the cells c0 and c1 respectively. This is illustrated in Figure 1.6b, page 19, where the local facet (edge) index of the shared facet is e0 relative to one cell while it is e2 relative to the other cell. The implication of this aspect is elaborated in the following section.

4.1.3 Extending the FEniCS Form Compiler

FFC must also be extended in order to generate code for the new integral class in UFC to evaluate the local facet tensor. In Section 3.2.2, it was shown how the cell tensor (element tensor) can be computed from the tensor representation

$$A_{T,i} = \sum_{\alpha} A^0_{i\alpha} G_T^{\alpha}. \qquad (4.3)$$

Similarly, one may use the affine mappings $F_{T^+}$ and $F_{T^-}$ (defined in Section 3.2.1) to obtain a tensor representation for the interior facet tensor $A_S$. However, depending on the topology of the macro cell $\mathcal{T}$, one obtains different tensor representations. For a triangular mesh, each cell has three facets (edges) and there are thus $3 \times 3 = 9$ different topologies to consider; there are nine different ways in which two edges can meet. Similarly, for a tetrahedral mesh, there are $4 \times 4 = 16$ different topologies to consider. Notice that this is only true because FFC assumes the UFC numbering convention of mesh entities, outlined in Section 1.3.4 and illustrated in Figure 1.6b, which guarantees that two incident simplicial cells always agree on the orientation of an incident facet. If no particular ordering of the mesh entities is assumed, one needs to consider $3 \times 3 \times 2 = 18$ different topologies for triangles and $4 \times 4 \times 6 = 96$ topologies for tetrahedra. This is because there are two different ways to superimpose two edges, and there are six different ways to superimpose two faces. The tensor representation for the interior facet tensor can then be written in the form

$$A_{S,i} = \sum_{\alpha} A^{0, f^+(S), f^-(S)}_{i\alpha} \, G^{\alpha}_{T(S)}, \qquad (4.4)$$

where $f^+$ and $f^-$ denote the local numbers of the two facets that meet at $S$ relative to the two cells $T^+$ and $T^-$ respectively. Note that the geometry tensor $G^\alpha_T$ in (4.3) involves the mapping from the reference cell and differs from the geometry tensor $G^\alpha_{T(S)}$ in (4.4), which may involve both the mapping from the reference cell and the mapping from the reference facet. The reference tensor $A^{0,f^+,f^-}$ is precomputed for each facet–facet combination $(f^+, f^-)$, and a run-time decision must be made as to which reference tensor should be contracted with the geometry tensor.
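The run-time selection of a precomputed reference tensor in (4.4) can be sketched as a lookup keyed by the facet pair $(f^+, f^-)$, followed by the contraction with the geometry tensor. In the Python sketch below, the entries produced by the hypothetical placeholder_reference_tensor are dummy numbers (a form compiler would obtain them by integrating reference basis functions), and the flattened index layout is an assumption for illustration. The topology counts quoted above are reproduced as well.

```python
from itertools import product


def num_topologies(num_facets: int, num_orientations: int = 1) -> int:
    """Number of (f+, f-) combinations, times the number of relative orientations."""
    return num_facets * num_facets * num_orientations


def placeholder_reference_tensor(f_plus: int, f_minus: int):
    """Hypothetical stand-in for A^{0, f+, f-}: rows indexed by i, columns by alpha."""
    return [[f_plus + 1.0, f_minus + 1.0],
            [1.0, 0.0]]


def contract(A0, G):
    """A_{S,i} = sum_alpha A0[i][alpha] * G[alpha]."""
    return [sum(a * g for a, g in zip(row, G)) for row in A0]


# Precompute one reference tensor per facet-facet combination (triangles: 3 facets).
REFERENCE = {(fp, fm): placeholder_reference_tensor(fp, fm)
             for fp, fm in product(range(3), repeat=2)}


def tabulate_interior_facet_tensor(f_plus: int, f_minus: int, G):
    """Run-time decision: pick the reference tensor for (f+, f-), then contract."""
    return contract(REFERENCE[(f_plus, f_minus)], G)


print(len(REFERENCE))  # one entry per facet-facet combination
print(tabulate_interior_facet_tensor(1, 2, [1.0, 10.0]))
print(num_topologies(3), num_topologies(4), num_topologies(3, 2), num_topologies(4, 6))
```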
The FFC machinery which generates code for each facet–facet combination based on UFL expressions is largely unaffected by the extensions to discontinuous Galerkin methods. As a consequence, the quadrature representation can be extended in a similar fashion, taking into account the differences between the two representations described in Section 3.2. Furthermore, the optimisations presented in Section 3.3 also apply to variational forms containing interior facet integrals.

4.1.4 Extending DOLFIN

To assemble the global sparse tensor A for variational forms that contain integrals over interior facets as in (4.1), one may extend the standard assembly algorithm over the cells of the computational mesh (see Algorithm 1, page 27) by including an iteration over the interior facets of the mesh. The approach is described for the bilinear form in (4.1) where, for ease of notation, it is assumed that $u, v \in V$. Adopting the notation from Section 1.3.5, the tensor A which arises from assembling the bilinear form in (4.1) can be expressed as:

$$A_I = a(\phi_{I_2}, \phi_{I_1}) = \sum_{S \in \Gamma_0} a_S(\phi_{I_2}, \phi_{I_1}) = \sum_{S \in \Gamma_0} \int_S [\![\phi_{I_2}]\!]\,[\![\phi_{I_1}]\!] \,\mathrm{d}s, \qquad (4.5)$$

where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a global (possibly discontinuous) basis for $V$. To assemble the global sparse tensor A efficiently by iterating over the interior facets of the mesh, a local-to-global mapping that maps the basis functions on the local facet $S$ to the set of global basis functions is needed. This mapping is constructed by considering two cells $T^+$ and $T^-$ sharing a common facet $S$, as shown in Figure 4.1. Let $\{\phi^{T^+}_k\}_{k=1}^{n}$ and $\{\phi^{T^-}_k\}_{k=1}^{n}$ denote the local finite element basis on $T^+$ and $T^-$ respectively. These local basis functions are now extended to the macro cell $\mathcal{T}$ by the following construction:

$$\bar\phi^{\mathcal{T}}_k(x) = \left\{ \begin{array}{lll} \phi^{T^+}_k(x), & k = 1, 2, \dots, n, & x \in T^+, \\ 0, & k = 1, 2, \dots, n, & x \in T^-, \\ 0, & k = n+1, n+2, \dots, 2n, & x \in T^+, \\ \phi^{T^-}_{k-n}(x), & k = n+1, n+2, \dots, 2n, & x \in T^-. \end{array} \right. \qquad (4.6)$$

The local basis functions on $T^+$ and $T^-$ are thus extended to $\mathcal{T}$ by zero to obtain a local finite element space, $\{\bar\phi^{\mathcal{T}}_k\}_{k=1}^{2n}$, on $\mathcal{T}$ of dimension $2n$. Recall from Section 1.3.5 that, for each $T \in \mathcal{T}_h$, $\iota^j_T : [1, n] \to [1, N]$ denotes the local-to-global mapping for each discrete function space $V^j$. The local-to-global mapping for $\mathcal{T}$ (or $S$), $\iota^j_{\mathcal{T}}$, can then be obtained by the construction (4.6) such that

$$\iota^j_{\mathcal{T}}(1) = \iota^j_{T^+}(1), \ \dots, \ \iota^j_{\mathcal{T}}(n) = \iota^j_{T^+}(n), \ \iota^j_{\mathcal{T}}(n+1) = \iota^j_{T^-}(1), \ \dots, \ \iota^j_{\mathcal{T}}(2n) = \iota^j_{T^-}(n). \qquad (4.7)$$
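The construction (4.6) and the mapping (4.7) can be made concrete in one dimension, where a facet is a single point. The Python sketch below assumes a hypothetical macro cell made of $T^+ = [0, 1]$ and $T^- = [1, 2]$ with linear Lagrange basis functions on each cell ($n = 2$), and uses 0-based indices; since both cells contain the facet point $x = 1$, the side from which a function is evaluated is passed explicitly.

```python
def p1(k: int, x: float, a: float, b: float) -> float:
    """Linear Lagrange basis function k (0 or 1) on the interval [a, b]."""
    t = (x - a) / (b - a)
    return 1.0 - t if k == 0 else t


def phi_bar(k: int, x: float, side: str, n: int = 2,
            cell_plus=(0.0, 1.0), cell_minus=(1.0, 2.0)) -> float:
    """Macro-cell basis function (4.6), 0-based k in [0, 2n); side is '+' or '-'."""
    if k < n:  # functions of T+, extended by zero to T-
        return p1(k, x, *cell_plus) if side == "+" else 0.0
    # functions of T-, shifted by n and extended by zero to T+
    return p1(k - n, x, *cell_minus) if side == "-" else 0.0


def macro_dofmap(dofs_plus, dofs_minus):
    """(4.7): the macro-cell map lists the dofs of T+ followed by those of T-."""
    return list(dofs_plus) + list(dofs_minus)


# Jumps phi_bar(+) - phi_bar(-) of the four macro-cell functions at the facet x = 1
print([phi_bar(k, 1.0, "+") - phi_bar(k, 1.0, "-") for k in range(4)])

dg = macro_dofmap([0, 1], [2, 3])  # discontinuous: no shared dofs
cg = macro_dofmap([0, 1], [1, 2])  # continuous: dof 1 shared by both cells
print(dg, len(set(dg)) == len(dg))  # injective
print(cg, len(set(cg)) == len(cg))  # not injective
```

Only the two functions that do not vanish at the facet have a nonzero jump, and the continuous map [0, 1, 1, 2] is the non-injective situation treated below in connection with (4.12).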

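The local interior facet tensor $A_S$ can now be defined. Consider first the case when $\iota^j_{\mathcal{T}}$ is an injective mapping, and note that $\iota^j_{\mathcal{T}}$ is injective when the ranges of $\iota^j_{T^+}$ and $\iota^j_{T^-}$ are disjoint (which is the case for discontinuous elements). Continuing from (4.5), the tensor A can be computed from

$$A_I = \sum_{S} a_S(\phi_{I_2}, \phi_{I_1}) = \sum_{S} a_S\Big(\bar\phi^{\mathcal{T}}_{(\iota^2_{\mathcal{T}})^{-1}(I_2)}, \bar\phi^{\mathcal{T}}_{(\iota^1_{\mathcal{T}})^{-1}(I_1)}\Big) = \sum_{S} A_{S, (\iota^1_{\mathcal{T}})^{-1}(I_1), (\iota^2_{\mathcal{T}})^{-1}(I_2)}, \qquad (4.8)$$

where the local interior facet tensor $A_S$ is thus defined by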

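$$A_{S,i} = a_S\big(\bar\phi^{\mathcal{T}}_{i_2}, \bar\phi^{\mathcal{T}}_{i_1}\big), \qquad (4.9)$$

where $i = (i_1, i_2)$ is a multi-index. Note that the size of $A_S$, due to the construction in (4.6), is $2n \times 2n$ and not $n \times n$ as would be the case for a local cell tensor $A_T$. Similar to (1.11) and (1.12), the collective local-to-global mapping for each $S \in \Gamma_0$ is defined as

$$\iota_{\mathcal{T}}(i) = \big(\iota^1_{\mathcal{T}}(i_1), \iota^2_{\mathcal{T}}(i_2)\big) \quad \forall i \in \mathcal{I}_{\mathcal{T}}, \qquad (4.10)$$

where $\mathcal{I}_{\mathcal{T}}$ is the index set

$$\mathcal{I}_{\mathcal{T}} = \prod_{j=1}^{2} [1, 2n] = \{(1,1), (1,2), \dots, (2n, 2n-1), (2n, 2n)\}. \qquad (4.11)$$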

The global tensor A can now be computed by Algorithm 2.

```text
Algorithm 2: Assembly algorithm over interior facets.
A = 0
for S ∈ Γ0
    (1) Compute ι_𝒯
    (2) Compute A_S
    (3) Add A_S to A according to ι_𝒯:
        for i ∈ I_𝒯
            A_{ι_𝒯(i)} += A_{S,i}
        end for
end for
```

Now, if $\iota^j_{\mathcal{T}}$ is not injective (two local basis functions are restrictions of the same global basis function), which may happen if the basis functions are continuous, one may still assemble the global tensor A by Algorithm 2 and compute the interior facet tensor according to (4.9). To see this, assume that $\iota^1_{\mathcal{T}}(i_1) = \iota^1_{\mathcal{T}}(i_1') = I_1$ for some $i_1 \neq i_1'$. It then follows that the entry $A_{I_1, \iota^2_{\mathcal{T}}(i_2)}$ will be a sum of the two terms $A_{S,i_1,i_2}$ and $A_{S,i_1',i_2}$ (and possibly other terms). Since $a_S$ is bilinear, we have

$$A_{S,i_1,i_2} + A_{S,i_1',i_2} = a_S\big(\bar\phi^{\mathcal{T}}_{i_2}, \bar\phi^{\mathcal{T}}_{i_1}\big) + a_S\big(\bar\phi^{\mathcal{T}}_{i_2}, \bar\phi^{\mathcal{T}}_{i_1'}\big) = a_S\big(\bar\phi^{\mathcal{T}}_{i_2}, \bar\phi^{\mathcal{T}}_{i_1} + \bar\phi^{\mathcal{T}}_{i_1'}\big) = a_S\big(\bar\phi^{\mathcal{T}}_{i_2}, \phi_{I_1}\big), \qquad (4.12)$$

where, by the construction (4.6), $\phi_{I_1}$ is the global basis function to which both $\bar\phi^{\mathcal{T}}_{i_1}$ and $\bar\phi^{\mathcal{T}}_{i_1'}$ are mapped. DOLFIN implements Algorithm 2 in the assemble function. To compute the local contribution $a_S$, DOLFIN calls the tabulate_tensor function for interior facet integrals using the interface described in Section 4.1.2. For each discrete function space, DOLFIN calls the tabulate_dofs function (see Section 1.3.4) on the cells $T^+$ and $T^-$ to construct the local-to-global mapping $\iota^j_{\mathcal{T}}$ from (4.7). These mappings are then used by DOLFIN to construct the collective local-to-global mapping $\iota_{\mathcal{T}}$ from (4.10).
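Algorithm 2 can be sketched in Python for the one-dimensional analogue of (4.1), where each interior facet is a point and $a_S(u, v) = [\![u]\!][\![v]\!]$. The input format (a list of facets, each given by the dofs of $T^+$, the dofs of $T^-$ and the jumps of the macro-cell basis functions at the facet) is a hypothetical choice for illustration, not DOLFIN's data structures. Test and trial spaces are taken to be the same, so a single macro-cell map serves both indices.

```python
from typing import List, Sequence, Tuple

Facet = Tuple[Sequence[int], Sequence[int], Sequence[float]]


def local_facet_tensor(jumps: Sequence[float]) -> List[List[float]]:
    """A_S[i1][i2] = [[phi_bar_{i2}]] * [[phi_bar_{i1}]], cf. (4.9)."""
    return [[j1 * j2 for j2 in jumps] for j1 in jumps]


def assemble_interior_facets(num_dofs: int,
                             facets: Sequence[Facet]) -> List[List[float]]:
    """Dense version of Algorithm 2 for the bilinear form sum_S [[u]][[v]]."""
    A = [[0.0] * num_dofs for _ in range(num_dofs)]
    for dofs_plus, dofs_minus, jumps in facets:
        iota = list(dofs_plus) + list(dofs_minus)  # (1) macro-cell map, cf. (4.7)
        A_S = local_facet_tensor(jumps)            # (2) local interior facet tensor
        for i1, I1 in enumerate(iota):             # (3) add A_S to A according to iota
            for i2, I2 in enumerate(iota):
                A[I1][I2] += A_S[i1][i2]
    return A


# Three piecewise-constant (DG0) cells: the macro basis jumps are +1 (T+) and -1 (T-).
dg0 = assemble_interior_facets(3, [([0], [1], [1.0, -1.0]),
                                   ([1], [2], [1.0, -1.0])])
print(dg0)

# Two continuous P1 cells sharing dof 1: iota = [0, 1, 1, 2] is not injective.
cg1 = assemble_interior_facets(3, [([0, 1], [1, 2], [0.0, 1.0, -1.0, 0.0])])
print(cg1)
```

In the continuous case the contributions of the two local functions that share dof 1 cancel and the assembled matrix is zero. This agrees with (4.12): the continuous global basis function has no jump at the facet.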

4.2 Examples

The developments described in the previous section extend the applicability of the FEniCS framework to a new range of problems. In this section, it is demonstrated how the extensions make it possible to apply discontinuous Galerkin formulations to a number of problems. The examples are presented in the usual form: find $u \in V$ such that

$$a(u, v) = L(v) \quad \forall v \in \hat{V}, \qquad (4.13)$$

where $V$ is the trial space, $\hat{V}$ is the test space, and $a(u, v)$ and $L(v)$ denote the bilinear and linear forms respectively. Some of the examples are presented as complete DOLFIN solvers, while others only present the UFL input for the bilinear and linear forms of the corresponding problem. For all examples, the test and trial functions are assumed to come from the same function space, that is, $\hat{V} = V$.

4.2.1 The Poisson equation

Consider the function space $V$,

$$V := \{ v \in L^2(\Omega) : v|_T \in P_k(T) \ \forall T \in \mathcal{T}_h \}, \qquad (4.14)$$

where $P_k(T)$ denotes the space of polynomials of degree $k$ on the element $T$. The bilinear and linear forms for the Poisson equation with homogeneous Dirichlet boundary conditions, enforced in a weak sense, read (Arnold et al., 2002)

$$\begin{aligned} a(u, v) := {} & \int_\Omega \nabla u \cdot \nabla v \,\mathrm{d}x - \int_{\Gamma_0} [\![u]\!] \cdot \langle \nabla v \rangle \,\mathrm{d}s - \int_{\Gamma_0} \langle \nabla u \rangle \cdot [\![v]\!] \,\mathrm{d}s \\ & - \int_{\partial\Omega} [\![u]\!] \cdot \nabla v \,\mathrm{d}s - \int_{\partial\Omega} \nabla u \cdot [\![v]\!] \,\mathrm{d}s + \int_{\Gamma_0} \frac{\alpha}{h} [\![u]\!] \cdot [\![v]\!] \,\mathrm{d}s + \int_{\partial\Omega} \frac{\alpha}{h} u v \,\mathrm{d}s \end{aligned} \qquad (4.15)$$

and

$$L(v) := \int_\Omega f v \,\mathrm{d}x, \qquad (4.16)$$

where $\alpha > 0$ is a penalty parameter and $h$ is a measure of the cell size, defined on interior facets as $h = (h^+ + h^-)/2$, with $h^+$ and $h^-$ denoting the cell size of the two cells $T^+$ and $T^-$, respectively, incident with the given interior facet. Due to the term involving the penalty parameter $\alpha$, this formulation is commonly referred to as an interior penalty (IP) formulation. The size of a cell is defined here as twice the circumradius. The jump and average operators are defined as $[\![v]\!] = v^+ n^+ + v^- n^-$ and $\langle \nabla v \rangle = (\nabla v^+ + \nabla v^-)/2$ on the set of interior facets, $\Gamma_0$, and $[\![v]\!] = v n$ on $\partial\Omega$. A domain and source term identical to those used in Section 1.3.5 are considered, that is, $\Omega = [0, 1] \times [0, 1]$ and $f = 8\pi^2 \sin(2\pi x) \sin(2\pi y)$. The corresponding DOLFIN solver for this problem is shown in Figure 4.2 for linear polynomials on triangular elements. Note in particular how the form and syntax of the definitions of the bilinear and linear forms (a and L) closely resemble the mathematical notation in (4.15) and (4.16). Also, note the close resemblance with the code in Figure 1.10, page 25, for the continuous solution. This demonstrates the ease of switching between formulations for a given problem by only changing the definitions of the bilinear and linear forms in the computational setup. The functions FacetNormal and CellSize are convenience functions implemented in DOLFIN according to the definitions above. Because the solution is computed on discontinuous elements, it is common to project the solution onto a continuous basis for visualisation. This can be accomplished easily by using the project function in DOLFIN.
The computed solution for the Poisson problem, projected onto a piecewise linear basis, is seen in Figure 4.3, which is almost identical to the solution presented in Figure 1.9, page 24, for the continuous case.
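The cell size used in the penalty terms, defined above as twice the circumradius, and its average over an interior facet can be computed as in the following Python sketch. It follows the definitions stated in the text rather than DOLFIN's CellSize implementation, and it assumes that the UnitSquare(32, 32) mesh consists of right triangles with legs of length 1/32, two of which share each diagonal.

```python
import math


def circumradius(p0, p1, p2):
    """Circumradius R = abc / (4 * area) of the triangle with vertices p0, p1, p2."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    return a * b * c / (4.0 * area)


def cell_size(p0, p1, p2):
    """Cell size h = 2 * circumradius."""
    return 2.0 * circumradius(p0, p1, p2)


def h_avg(h_plus, h_minus):
    """Facet average (h+ + h-)/2 used in the interior penalty term."""
    return 0.5 * (h_plus + h_minus)


s = 1.0 / 32.0
h_plus = cell_size((0.0, 0.0), (s, 0.0), (0.0, s))  # right triangle, legs 1/32
h_minus = cell_size((s, 0.0), (s, s), (0.0, s))     # neighbour across the diagonal
alpha = 4.0
print(h_plus, h_avg(h_plus, h_minus), alpha / h_avg(h_plus, h_minus))
```

For a right triangle the circumradius is half the hypotenuse, so $h$ equals the hypotenuse, $\sqrt{2}/32$ here, and the interior penalty factor $\alpha/h$ evaluates to $64\sqrt{2} \approx 90.5$.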

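4.2.2 Steady state advection–diffusion equation

Next, the advection–diffusion equation is considered with Dirichlet boundary conditions on inflow boundaries and full upwinding of the advective flux at element facets. Using the same definition of $V$ as in (4.14), the bilinear and linear forms read

$$\begin{aligned} a(u, v) := {} & \int_\Omega (\kappa \nabla u - b u) \cdot \nabla v \,\mathrm{d}x + \int_{\Gamma_0} b u^\star \cdot [\![v]\!] \,\mathrm{d}s + \int_{\partial\Omega} b u^\star \cdot [\![v]\!] \,\mathrm{d}s \\ & - \int_{\Gamma_0} \kappa \langle \nabla u \rangle \cdot [\![v]\!] \,\mathrm{d}s - \int_{\Gamma_0} \kappa [\![u]\!] \cdot \langle \nabla v \rangle \,\mathrm{d}s + \int_{\Gamma_0} \frac{\kappa \alpha}{h} [\![u]\!] \cdot [\![v]\!] \,\mathrm{d}s \\ & + \int_{\Gamma_D} \frac{\alpha}{h} u v \,\mathrm{d}s - \int_{\Gamma_D} [\![u]\!] \cdot \nabla v \,\mathrm{d}s - \int_{\Gamma_D} \nabla u \cdot [\![v]\!] \,\mathrm{d}s \end{aligned} \qquad (4.17)$$

and

$$L(v) := \int_{\Gamma_D} \frac{\alpha}{h} g v \,\mathrm{d}s - \int_{\Gamma_D} [\![g]\!] \cdot \nabla v \,\mathrm{d}s - \int_{\Gamma_D} \nabla g \cdot [\![v]\!] \,\mathrm{d}s, \qquad (4.18)$$

where the vector $b$ is a given velocity field, $u^\star$ is equal to $u$ restricted to the upwind side of a facet,

$$u^\star = \left\{ \begin{array}{ll} u^+, & b \cdot n^+ > 0, \\ u^-, & b \cdot n^+ < 0, \end{array} \right. \qquad (4.19)$$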

$\kappa$ is the diffusion coefficient and $\Gamma_D$ is the part of the boundary where the Dirichlet condition $u = g$ is applied. The definitions of the jump and average operators and the parameters $h$ and $\alpha$ are the same as for the Poisson equation. Again, the unit square, $\Omega = [0, 1] \times [0, 1]$, is considered, with $g = \sin(5\pi y)$ applied on the boundary at $x = 1$ and a constant velocity field $b = (-3, -2)$. The diffusion coefficient $\kappa$ is set to zero, in which case the DOLFIN solver for this problem can be implemented as shown in Figure 4.4 for linear triangular elements. The implementation is again a reflection of the mathematical formulation, with a small exception regarding the upwind value $u^\star$. In the code, the variable bn is computed as $\mathrm{bn} = (b \cdot n + |b \cdot n|)/2$. Relative to the two elements $T^+$ and $T^-$ associated with a given facet, it evaluates to:

$$\mathrm{bn}|_{T^+} = \left\{ \begin{array}{ll} b \cdot n^+, & b \cdot n^+ > 0, \\ 0, & b \cdot n^+ < 0, \end{array} \right. \qquad \text{and} \qquad \mathrm{bn}|_{T^-} = \left\{ \begin{array}{ll} b \cdot n^-, & b \cdot n^- > 0, \\ 0, & b \cdot n^- < 0. \end{array} \right. \qquad (4.20)$$

Python code
```python
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(32, 32)
V = FunctionSpace(mesh, "DG", 1)

# Define test and trial functions
v = TestFunction(V)
u = TrialFunction(V)

# Define normal component, mesh size, penalty parameter and right-hand side
n = FacetNormal(mesh)
h = CellSize(mesh)
h_avg = (h('+') + h('-'))/2
alpha = 4.0
x = V.cell().x
f = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])

# Define variational problem
a = dot(grad(u), grad(v))*dx \
   - dot(jump(u, n), avg(grad(v)))*dS \
   - dot(avg(grad(u)), jump(v, n))*dS \
   - dot(u*n, grad(v))*ds \
   - dot(grad(u), v*n)*ds \
   + alpha/h_avg*dot(jump(u, n), jump(v, n))*dS \
   + alpha/h*u*v*ds
L = f*v*dx

# Compute solution
u = Function(V)
solve(a == L, u)

# Project solution to piecewise linears
u_proj = project(u)

# Plot solution
plot(u_proj, interactive=True)
```

Figure 4.2: Complete DOLFIN solver for the interior penalty method applied to the Poisson equation on a unit square using k = 1.

Figure 4.3: Computed solution of the Poisson problem. The solution has been projected onto a piecewise linear basis for visualisation. The warped scalar field u has been scaled by a factor of 0.5.

Recalling that jump(v) in UFL is equivalent to $v^{+} - v^{-}$, the line in the code concerning upwinding on interior facets, dot(jump(v), bn('+')*u('+') - bn('-')*u('-')), is equivalent to:

$$\left(v^{+} - v^{-}\right)\left(\mathrm{bn}^{+} u^{+} - \mathrm{bn}^{-} u^{-}\right) = v^{+}\mathrm{bn}^{+}u^{+} - v^{-}\mathrm{bn}^{+}u^{+} - v^{+}\mathrm{bn}^{-}u^{-} + v^{-}\mathrm{bn}^{-}u^{-}, \tag{4.21}$$

where $\mathrm{bn}^{\pm}$ denotes bn evaluated on $T^{\pm}$ (bn('+') and bn('-') in the code). Since either $\mathrm{bn}^{+}$ or $\mathrm{bn}^{-}$ is zero, (4.21) is identical to the $b u^{\star} \cdot [\![ v ]\!]$ term in (4.17) when the definition of $u^{\star}$ in (4.19) is used. The implementation of the upwind value is a good example of the flexibility offered by the operators in Table 4.1 for implementing more complex expressions. Also note how the Dirichlet boundary conditions are applied only to the $\Gamma_D$ part of the exterior boundary. This is achieved in the code by first creating a class DirichletBoundary, see Section 1.3.5, which overloads the inside function to return true when $x = 1$. Then, a FacetFunction is created which holds an integer value, a marker, for every facet of the mesh; the value for all facets is initially set to 0. The DirichletBoundary class is then used to mark the facets located at $x = 1$ with the value 1. The variable boundary_facets now contains the index of every facet and the associated value (0 or 1), which indicates whether or not the facet is part of the $\Gamma_D$ boundary. This variable is used to redefine the Measure object ds such that *ds(1) and *ds(0) in the forms a and L indicate integration over the $\Gamma_D$ and $\partial\Omega \setminus \Gamma_D$ parts of the boundary, respectively, see Section 1.3.2. The computed solution to this problem, projected onto a piecewise linear basis, is shown in Figure 4.5.
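The equivalence between the code line and the upwind term can be checked numerically on a single interior facet. The following sketch is illustrative only: it treats $u$, $v$ and $b \cdot n^{+}$ as plain numbers on the two sides of one facet, uses $n^{-} = -n^{+}$, and assumes $b \cdot n^{+} \neq 0$ so that the upwind side in (4.19) is defined; the function names are hypothetical and not part of DOLFIN or UFL.

```python
def positive_part(b_dot_n):
    """bn = (b.n + |b.n|)/2, as computed in the code of Figure 4.4."""
    return (b_dot_n + abs(b_dot_n)) / 2.0


def upwind_as_coded(u_p, u_m, v_p, v_m, b_dot_n_p):
    """dot(jump(v), bn('+')*u('+') - bn('-')*u('-')) for scalar u and v."""
    b_dot_n_m = -b_dot_n_p          # n- = -n+ on an interior facet
    jump_v = v_p - v_m              # UFL jump(v) for a scalar
    return jump_v * (positive_part(b_dot_n_p) * u_p - positive_part(b_dot_n_m) * u_m)


def upwind_as_defined(u_p, u_m, v_p, v_m, b_dot_n_p):
    """b u* . [[v]] with u* from (4.19) and [[v]] = v+ n+ + v- n-."""
    u_star = u_p if b_dot_n_p > 0 else u_m
    # b . [[v]] = v+ (b.n+) + v- (b.n-) = (b.n+)(v+ - v-)
    return u_star * b_dot_n_p * (v_p - v_m)


# Flow leaving T+ (b.n+ > 0) and flow entering T+ (b.n+ < 0)
for b_dot_n_p in (0.5, -0.5):
    print(upwind_as_coded(2.0, 5.0, 1.0, 3.0, b_dot_n_p),
          upwind_as_defined(2.0, 5.0, 1.0, 3.0, b_dot_n_p))
```

In both cases only one side contributes, which is exactly the role of the positive part bn in the DOLFIN code.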

```python
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(64, 64)
V = FunctionSpace(mesh, "DG", 1)

# Define test and trial functions
v = TestFunction(V)
u = TrialFunction(V)

# Define normal component, mesh size, penalty parameter and velocity
n = FacetNormal(mesh)
h = CellSize(mesh)
alpha = 4.0
b = Constant((-3.0, -2.0))

# Define Dirichlet subdomain and value
class DirichletBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return abs(x[0] - 1.0) < DOLFIN_EPS and on_boundary

boundary_facets = FacetFunction("uint", mesh, 0)
DirichletBoundary().mark(boundary_facets, 1)
ds = ds[boundary_facets]
g = sin(5.0*pi*V.cell().x[1])

# bn = dot(b, n) on outflow facets (b.n > 0), 0 otherwise
bn = (dot(b, n) + abs(dot(b, n)))/2.0

# Define forms
a = dot(-b*u, grad(v))*dx \
    + dot(bn('+')*u('+') - bn('-')*u('-'), jump(v))*dS + dot(bn*u, v)*ds(0) \
    + (alpha/h*u*v - dot(grad(u), v*n) - dot(u*n, grad(v)))*ds(1)
L = (alpha/h*g*v - dot(g*n, grad(v)) - dot(grad(g), v*n))*ds(1)

# Compute solution
u = Function(V)
solve(a == L, u)

# Project solution to piecewise linears and plot
u_proj = project(u)
plot(u_proj, interactive=True)
```

Figure 4.4: Complete DOLFIN solver for the advection–diffusion equation with diffusion coefficient κ = 0.

Figure 4.5: Computed solution of the advection–diffusion problem. The solution has been projected onto a piecewise linear basis for visualisation. The warped scalar field u has been scaled by a factor of 0.5.

4.2.3 The Stokes equations

The Stokes equations are now addressed using a mixture of continuous and discontinuous functions, with basis functions of possibly different polynomial orders. Consider the function spaces W and Q,

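$$W := \left\{ w \in \left[L^{2}(\Omega)\right]^{d} : w_i \in P_k(T) \ \forall T \in \mathcal{T}_h, \ 1 \leq i \leq d \right\}, \tag{4.22}$$

$$Q := \left\{ q \in H^{1}(\Omega) : q \in P_j(T) \ \forall T \in \mathcal{T}_h \right\}, \tag{4.23}$$

where $\Omega$ is a bounded domain in $\mathbb{R}^{d}$ with $d \geq 2$. Setting $V = W \times Q$ and $u = 0$ on $\partial\Omega$, particular bilinear and linear forms for the Stokes equation read (Baker et al., 1990)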

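$$\begin{aligned} a(u, p; v, q) :={}& \int_{\Omega} \nu \nabla u : \nabla v \, \mathrm{d}x + \int_{\Omega} \nabla p \cdot v \, \mathrm{d}x - \int_{\Omega} u \cdot \nabla q \, \mathrm{d}x + \int_{\Gamma_0} [\![ u \cdot n ]\!] q \, \mathrm{d}s + \int_{\partial\Omega} u \cdot n \, q \, \mathrm{d}s \\ &- \int_{\Gamma_0} \nu \langle \nabla u \rangle : [\![ v ]\!] \, \mathrm{d}s - \int_{\Gamma_0} \nu [\![ u ]\!] : \langle \nabla v \rangle \, \mathrm{d}s - \int_{\partial\Omega} \nu \nabla u : [\![ v ]\!] \, \mathrm{d}s - \int_{\partial\Omega} \nu [\![ u ]\!] : \nabla v \, \mathrm{d}s \\ &+ \int_{\Gamma_0} \frac{\nu \alpha}{h} [\![ u ]\!] : [\![ v ]\!] \, \mathrm{d}s + \int_{\partial\Omega} \frac{\nu \alpha}{h} [\![ u ]\!] : [\![ v ]\!] \, \mathrm{d}s, \end{aligned} \tag{4.24}$$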

```python
# Create mixed function space
W = VectorElement("Discontinuous Lagrange", "triangle", 1)
Q = FiniteElement("Lagrange", "triangle", 1)
element = W * Q

# Define test and trial functions
(v, q) = TestFunctions(element)
(u, p) = TrialFunctions(element)

# Define normal component, mesh size, penalty parameter and right-hand side
n = element.cell().n
h = 2.0*triangle.circumradius
h_avg = (h('+') + h('-'))/2
alpha = 4.0
f = Coefficient(W)

# Define forms
a = inner(grad(u), grad(v))*dx + inner(grad(p), v)*dx - inner(u, grad(q))*dx \
    + inner(jump(u, n), q('+'))*dS \
    + inner(u, n)*q*ds \
    - inner(dot(avg(grad(u)), n('+')), jump(v))*dS \
    - inner(jump(u), dot(avg(grad(v)), n('+')))*dS \
    - inner(dot(grad(u), n), v)*ds \
    - inner(u, dot(grad(v), n))*ds \
    + alpha/h_avg*inner(jump(u), jump(v))*dS \
    + alpha/h*inner(u, v)*ds

L = dot(f, v)*dx
```

Figure 4.6: UFL input for the Stokes equation using k = 1 and ν = 1.0.

The corresponding linear form is

$$L(v, q) := \int_{\Omega} f \cdot v \, \mathrm{d}x. \tag{4.25}$$

The jump and average operators are defined as $[\![ v ]\!] = v^{+} \otimes n^{+} + v^{-} \otimes n^{-}$, $[\![ v \cdot n ]\!] = v^{+} \cdot n^{+} + v^{-} \cdot n^{-}$ and $\langle \nabla v \rangle = (\nabla v^{+} + \nabla v^{-})/2$ on $\Gamma_0$, and $[\![ v ]\!] = v \otimes n$ on $\partial\Omega$. The UFL input in two dimensions for this problem with $k = j = 1$, as proposed in Baker et al. (1990), and kinematic viscosity $\nu = 1.0$ is shown in Figure 4.6.
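The Stokes code writes $\langle \nabla u \rangle : [\![ v ]\!]$ as inner(dot(avg(grad(u)), n('+')), jump(v)). This relies on $[\![ v ]\!] = (v^{+} - v^{-}) \otimes n^{+}$, which follows from $n^{-} = -n^{+}$, together with $A : (a \otimes n) = a \cdot (A n)$. A quick NumPy check with random values is sketched below; it assumes the UFL convention $(\nabla u)_{ij} = \partial u_i / \partial x_j$ and uses arbitrary numbers in place of actual facet values.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Arbitrary stand-ins for facet values in two dimensions
grad_u_avg = rng.standard_normal((2, 2))            # <grad u>, rows = components of u
v_plus, v_minus = rng.standard_normal(2), rng.standard_normal(2)
theta = rng.uniform(0.0, 2.0 * np.pi)
n_plus = np.array([np.cos(theta), np.sin(theta)])   # unit outward normal of T+
n_minus = -n_plus

# Tensor jump [[v]] = v+ (x) n+ + v- (x) n- and the double contraction <grad u> : [[v]]
jump_v = np.outer(v_plus, n_plus) + np.outer(v_minus, n_minus)
lhs = np.sum(grad_u_avg * jump_v)

# What the UFL line computes: (<grad u> n+) . (v+ - v-)
rhs = np.dot(grad_u_avg @ n_plus, v_plus - v_minus)

print(np.isclose(lhs, rhs))  # True
```

The same argument applies to the $[\![ u ]\!] : \langle \nabla v \rangle$ term, written in the code with dot(avg(grad(v)), n('+')).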

4.2.4 Biharmonic equation

Classically, Galerkin methods for the biharmonic equation seek approximate solutions in a subspace of $H^{2}(\Omega)$. However, such functions are difficult to construct in a finite element context. Based on discontinuous Galerkin principles, methods have been developed which utilise functions from $H^{1}(\Omega)$ (Engel et al., 2002; Wells and Dung, 2007). Rather than considering jumps in functions across element boundaries, terms involving the jump in the normal derivative across element boundaries are introduced. Unlike fully discontinuous approaches, this method does not involve double degrees of freedom on element edges and, therefore, does not lead to a significant increase in the number of degrees of freedom relative to conventional methods. Consider the continuous function space

$$V := \left\{ v \in H^{1}_{0}(\Omega) : v \in P_k(T) \ \forall T \in \mathcal{T}_h \right\}. \tag{4.26}$$

The bilinear and linear forms for the biharmonic equation, with the boundary conditions $u = 0$ on $\partial\Omega$ and $\nabla^{2} u = 0$ on $\partial\Omega$, read

$$a(u, v) := \int_{\Omega} \nabla^{2} u \, \nabla^{2} v \, \mathrm{d}x - \int_{\Gamma_0} [\![ \nabla u ]\!] \cdot \langle \nabla^{2} v \rangle \, \mathrm{d}s - \int_{\Gamma_0} \langle \nabla^{2} u \rangle \cdot [\![ \nabla v ]\!] \, \mathrm{d}s + \int_{\Gamma_0} \frac{\alpha}{h} [\![ \nabla u ]\!] \cdot [\![ \nabla v ]\!] \, \mathrm{d}s, \tag{4.27}$$

$$L(v) := \int_{\Omega} f v \, \mathrm{d}x. \tag{4.28}$$

The jump and average operators are defined as $[\![ \nabla v ]\!] = \nabla v^{+} \cdot n^{+} + \nabla v^{-} \cdot n^{-}$ and $\langle \nabla^{2} v \rangle = (\nabla^{2} v^{+} + \nabla^{2} v^{-})/2$ on $\Gamma_0$. The UFL input for this problem with $k = 4$ is shown in Figure 4.7.

As an example for the biharmonic equation, consider the domain $\Omega = [0, 1] \times [0, 1] \times [0, 1]$ with $f = 9\pi^{4} \sin(\pi x) \sin(\pi y) \sin(\pi z)$, in which case the exact solution is $u = \sin(\pi x) \sin(\pi y) \sin(\pi z)$. The observed convergence behaviour for this problem is illustrated in Figure 4.8 for various polynomial orders. As predicted by a priori estimates, a convergence rate of $k + 1$ is observed for $k > 2$ (Engel et al., 2002), and a rate of $k$ for polynomial order $k = 2$ (Wells and Dung, 2007).
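The right-hand side can be checked symbolically. The sketch below assumes SymPy is available; it verifies that $\nabla^{4} u = f$ for the stated exact solution, and that $\nabla^{2} u = -3\pi^{2} u$, so $\nabla^{2} u$ vanishes on $\partial\Omega$ together with $u$, consistent with the boundary conditions used for (4.27).

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
u = sp.sin(sp.pi * x) * sp.sin(sp.pi * y) * sp.sin(sp.pi * z)


def laplacian(w):
    """Laplace operator in three dimensions."""
    return sum(sp.diff(w, s, 2) for s in (x, y, z))


f = 9 * sp.pi**4 * u

# The biharmonic operator applied to u should reproduce f
print(sp.simplify(laplacian(laplacian(u)) - f))      # 0
# The Laplacian of u is proportional to u, so it vanishes wherever u does
print(sp.simplify(laplacian(u) + 3 * sp.pi**2 * u))  # 0
```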

The error in the $L^{2}$ norm for the convergence rates in Figure 4.8 is computed via the code shown in Figure 4.9, where the finite element solution $u_h$ has been computed using fourth-order Lagrange basis functions. Given the exact solution $u$ and the finite element solution $u_h$, the error $e = u - u_h$ can be computed by the functional M in the code. Note that the exact solution has been approximated by interpolating it using a continuous eighth-order polynomial. The extension of the FEniCS framework to discontinuous Galerkin methods also permits the computation of other norms, such as the mesh-dependent semi-norm of the error

$$|||e|||^{2} = \int_{\Omega} \nabla e \cdot \nabla e \, \mathrm{d}x + \int_{\Gamma_0} [\![ e ]\!] \cdot [\![ e ]\!] \, \mathrm{d}s, \tag{4.29}$$

in a straightforward fashion. The UFL input for this functional is shown in Figure 4.10.
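The rates quoted for Figure 4.8 correspond to the slope of the error against $h$ on logarithmic axes. A small helper for estimating observed rates between successive meshes is sketched below; the function name is hypothetical, and the error values in the usage lines are synthetic (generated from $e = C h^{5}$), not the results behind Figure 4.8. Since M in Figure 4.9 assembles the squared error, its square root would be taken before computing rates.

```python
import math


def observed_rates(h, err):
    """Estimated convergence rates log(e_i/e_{i+1}) / log(h_i/h_{i+1})."""
    return [math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(h) - 1)]


# Synthetic data with a known rate of k + 1 = 5 (the rate expected for k = 4)
h = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
err = [0.3 * hi**5 for hi in h]
print([round(r, 6) for r in observed_rates(h, err)])  # [5.0, 5.0, 5.0]
```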

```python
# Define test and trial functions
element = FiniteElement("Lagrange", tetrahedron, 4)
u = TrialFunction(element)
v = TestFunction(element)

# Normal component, mesh size and right-hand side
n = element.cell().n
h = 2.0*element.cell().circumradius
h_avg = (h('+') + h('-'))/2
f = Coefficient(element)

# Parameters
alpha = 16.0

# Bilinear form
a = inner(div(grad(u)), div(grad(v)))*dx \
    - inner(avg(div(grad(u))), jump(grad(v), n))*dS \
    - inner(jump(grad(u), n), avg(div(grad(v))))*dS \
    + alpha/h_avg*inner(jump(grad(u), n), jump(grad(v), n))*dS

# Linear form
L = f*v*dx
```

Figure 4.7: UFL input for the biharmonic equation using k = 4.

[Figure 4.8 plot: error $\|u - u_h\|$ against cell size $h$ on logarithmic axes for $k = 2$, $k = 3$ and $k = 4$.]

Figure 4.8: Error in the $L^{2}$ norm for the biharmonic equation with penalty parameters α = 4, α = 20 and α = 20 for k = 2, k = 3 and k = 4, respectively.

```python
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Lagrange", tetrahedron, 4)

u = Coefficient(element_u)
u_h = Coefficient(element_uh)

e = u - u_h
M = e*e*dx
```

Figure 4.9: Computation of the error in the L2 norm (squared).

```python
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Discontinuous Lagrange", tetrahedron, 4)

u = Coefficient(element_u)
u_h = Coefficient(element_uh)

e = u - u_h
M = inner(grad(e), grad(e))*dx + inner(jump(e), jump(e))*dS
```

Figure 4.10: Computation of the error in a mesh-dependent semi-norm (squared).

4.2.5 Further applications

As demonstrated by the examples in this section, many problems involving concepts from discontinuous Galerkin methods can now be implemented in the FEniCS framework in a relatively straightforward fashion owing to the extensions developed in this chapter. In addition to the presented examples, the developments for automation of discontinuous Galerkin methods in the FEniCS framework have also been applied by researchers and application developers to other problems, such as free surface flows (Labeur and Wells, 2009), the Navier–Stokes equations (Labeur and Wells, 2012; Selim et al., 2012; Giesselmann et al., 2012), microstructural processes (Maraldi et al., 2011), the advection–diffusion–reaction equation (Wells, 2011), magnetic advection (Sukys et al., 2010; Heumann and Hiptmair, 2012), mantle convection simulations (Vynnytska et al., 2013, 2012), wave surface elevation (Lopes et al., 2011), Nitsche's method for overlapping meshes (Massing et al., 2013), PDE-constrained optimisation (Funke and Farrell, 2013) and the p-biharmonic equation (Pryer, 2012). Clearly, automation of discontinuous Galerkin methods is a valuable extension to the FEniCS framework. However, in order to handle an even broader range of problems, automation of a particular class of discontinuous Galerkin methods remains to be addressed. This is the topic of the following chapter.

5 Automation of lifting-type discontinuous Galerkin methods

This chapter addresses the automation of a so-called lifting-type discontinuous Galerkin method proposed by Bassi and Rebay (1997) and Bassi and Rebay (2002) for the compressible Navier–Stokes equations. The method was analysed in Brezzi et al. (2000) and successfully used by, for example, Dung and Wells (2006), Wells and Dung (2007) and Dung and Wells (2008) for thin bending problems. This chapter discusses a particular lifting-type formulation for the Poisson equation and provides a basis for a similar formulation which is developed in the next chapter in the context of gradient plasticity. The formulation has two major advantages compared to the interior penalty (IP) formulation in Section 4.2.1. Firstly, no experiments are needed to determine the value of the stabilisation parameter, as the formulation is stable for all positive values¹ (unlike the penalty parameter α in (4.15), page 104). Secondly, numerical experiments (see Section 5.3) indicate that one can use a constant basis for the Poisson equation, which is not possible when using the interior penalty method. These properties make the formulation particularly interesting for the gradient plasticity model which will be introduced in Chapter 6. In addition, the lifting-type formulation is, unlike the IP formulation, also suitable for nonlinear problems, as the former preserves symmetry of the formulation (Ten Eyck and Lew, 2006; Ten Eyck et al., 2008; Wells and Dung, 2007; Dung and Wells, 2008). However, these advantages come at a price. The three main drawbacks of the lifting-type formulation are that the formulation is more complex, the local assembly is more expensive to perform and the global tensor arising from assembling the variational form is less sparse. Due to the complexity of lifting-type formulations, it is desirable to support them in the FEniCS framework. Unfortunately, fully automated support is not yet available, but it is possible to implement the methods in a semi-automated fashion by taking advantage of the functionality developed in the previous chapter. This chapter first presents a lifting-type formulation for the Poisson equation. Then follows the implementation of this formulation in the FEniCS framework, including some developments for semi-automated support. The two formulations for the Poisson equation are then compared to each other to illustrate the influence of the penalty parameter and the performance for constant elements. Finally, future developments to enable fully automated support for lifting-type formulations in the FEniCS framework are discussed.

¹Although the method is stable for all positive values of the stabilisation parameter, it should be chosen small enough such that it does not dominate the results when constant elements are used. This will be investigated in Section 5.3.

5.1 Lifting-type formulation for the Poisson equation

This section describes the basic concepts of a lifting-type formulation for the Poisson equation. The notation from the previous chapter is adopted, and some definitions and concepts are repeated in the following for convenience. Recall from Section 4.2.1 the discontinuous scalar function space $V$:

$$V := \left\{ v \in L^{2}(\Omega) : v|_{T} \in P_k(T) \ \forall T \in \mathcal{T}_h \right\}, \tag{5.1}$$

where $P_k(T)$ denotes the space of Lagrange polynomials of degree $k$ on the element $T$ of the standard triangulation of $\Omega$, which is denoted by $\mathcal{T}_h$. Again, let $T^{+}$ and $T^{-}$ denote the two cells sharing a common facet $S$ as shown in Figure 4.1. Let the jump of a function $v \in V$ be defined as:

$$[\![ v ]\!] = \begin{cases} v^{+} n^{+} + v^{-} n^{-} & \text{on } \Gamma_0, \\ v n & \text{on } \partial\Omega, \end{cases} \tag{5.2}$$

where $(\cdot)^{+}$ and $(\cdot)^{-}$ denote the value of a quantity $(\cdot)$ on $T^{+}$ and $T^{-}$ respectively, $n$ is the outward unit normal and $\Gamma_0$ is the set of interior facets in $\Omega$. A function space for the gradient of functions in $V$ is also defined:

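$$Q := \left\{ q \in \left[L^{2}(\Omega)\right]^{d} : q|_{T} \in \left[P_l(T)\right]^{d} \ \forall T \in \mathcal{T}_h \right\}, \tag{5.3}$$

where Lagrange polynomials of degree $l$ are used on the local element $T$. As functions in $Q$ should contain the gradient of functions in $V$, it is required that $l \geq k - 1$, with $k$ being the polynomial degree of functions in $V$. The average of a function $q \in Q$ is defined as:

$$\langle q \rangle = \begin{cases} \frac{1}{2}\left(q^{+} + q^{-}\right) & \text{on } \Gamma_0, \\ q & \text{on } \partial\Omega. \end{cases} \tag{5.4}$$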

An operator $r_S : V \to Q$ (Brezzi et al., 2000) is now defined as follows: for a given $v \in V$,

find $r_S(v) \in Q$ such that:

$$\int_{E} r_S(v) \cdot q \, \mathrm{d}x = -\int_{S} [\![ v ]\!] \cdot \langle q \rangle \, \mathrm{d}s \quad \forall q \in Q, \tag{5.5}$$

where $E = T^{+} \cup T^{-}$ (identical to the macro cell in Figure 4.1) for $S \in \Gamma_0$, and $E$ is the element $T \in \mathcal{T}_h$ associated with the facet $S$ for $S \in \partial\Omega$. Based on this operator, a function can now be defined:

$$R(v) = \sum_{S \in \Gamma_0 \cup \Gamma_D} r_S(v), \tag{5.6}$$

which can be interpreted as an approximation of the gradient of $v$, as the operation defined in (5.5) transforms the jump in $v$ across element facets into a gradient-like quantity on element interiors. The function $R(v)$ will be referred to as a lifting function, which is defined by the lifting operator $r_S$. Notice that because $S \in \Gamma_0 \cup \Gamma_D$ in (5.6), the function is not defined on $\Gamma_N$, and it is therefore taken to be zero there. The bilinear form for the Poisson equation, corresponding to (4.15) in Section 4.2.1, can now be defined in terms of the lifting function and the lifting operator:

$$a(u, v) := \int_{\Omega} \left(\nabla u + R(u)\right) \cdot \left(\nabla v + R(v)\right) \mathrm{d}x + \sum_{S \in \Gamma_0 \cup \Gamma_D} \int_{\Omega} \alpha \, r_S(u) \cdot r_S(v) \, \mathrm{d}x, \tag{5.7}$$

where the last term is a stabilisation term with $\alpha$ being a stabilisation parameter. An important property of the lifting-type discontinuous Galerkin formulation is that the method is stable for any $\alpha > 0$, which is in contrast to the IP formulation in (4.15), see Arnold et al. (2002). In addition, no parameter for the mesh size is needed ($h$ in (4.15))². The linear form for the Poisson problem using a lifting-type formulation remains identical to (4.16) when considering homogeneous Dirichlet boundary conditions.
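The role of the stabilisation term can be illustrated with a one-dimensional analogue of (5.7). The sketch below is not the implementation used in this work; it assumes piecewise-constant spaces for $V$ and $Q$ on a uniform mesh of $(0, 1)$, homogeneous Dirichlet conditions at both end points (so both boundary facets belong to $\Gamma_D$), and takes $T^{+}$ as the left cell of each interior facet. Solving (5.5) by hand in this setting gives $r_S(u) = (u_s - u_{s-1})/(2h)$ on both cells of an interior facet and $r_S(u) = -u n / h$ on the cell adjacent to a boundary facet. Since $\nabla u = 0$ on each cell, only the $R(u) \cdot R(v)$ and stabilisation terms remain. The function name is hypothetical.

```python
import numpy as np


def lifting_matrix_p0_1d(n, alpha):
    """Matrix of a 1D analogue of (5.7) with P0 spaces on a uniform mesh of (0, 1)."""
    h = 1.0 / n
    R = np.zeros((n, n))  # row i holds the coefficients of R(u) on cell i, cf. (5.6)
    A = np.zeros((n, n))

    # Interior facet at x = s*h between cell s-1 (T+, n+ = +1) and cell s (T-)
    for s in range(1, n):
        g = np.zeros(n)
        g[s], g[s - 1] = 1.0 / (2 * h), -1.0 / (2 * h)  # r_S(u) = (u_s - u_{s-1})/(2h)
        R[s - 1] += g
        R[s] += g
        A += alpha * (2 * h) * np.outer(g, g)  # alpha * int_E r_S(u) r_S(v) dx, |E| = 2h

    # Boundary facets: E is the adjacent cell and [[u]] = u n, so r_S(u) = -u n / h
    for cell, normal in ((0, -1.0), (n - 1, 1.0)):
        g = np.zeros(n)
        g[cell] = -normal / h
        R[cell] += g
        A += alpha * h * np.outer(g, g)

    # int_Omega R(u) R(v) dx (grad u = 0 on each cell for P0)
    A += h * R.T @ R
    return A


checkerboard = (-1.0) ** np.arange(8)
for alpha in (0.0, 1e-3):
    A = lifting_matrix_p0_1d(8, alpha)
    print(alpha, np.linalg.norm(A @ checkerboard), np.linalg.eigvalsh(A).min())
```

For an even number of cells, the alternating vector $(1, -1, 1, \ldots)$ gives $R(u) = 0$ on every cell, so in this sketch the matrix is singular when $\alpha = 0$; any $\alpha > 0$ penalises the jumps of this mode and makes the matrix positive definite, in line with the statement that the method is stable for any $\alpha > 0$.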

5.2 Semi-automated implementation of lifting-type formulations

In the previous chapter, it was shown how the IP formulation for the Poisson equation in (4.15) and (4.16) can be implemented in a straightforward fashion using the FEniCS framework, see Figure 4.2 on page 106. The implementation of lifting-type discontinuous Galerkin forms like (5.7) is, however, more involved. This is due to the nature of the lifting function $R(v)$ defined in (5.6) through the variational problem in (5.5), which adds complexity to the assembly procedure. However, it is possible to use the tools provided by FEniCS as building blocks to extend the framework to also handle lifting-type formulations in a semi-automated fashion. This section describes a possible approach to achieve this.

²The mesh size parameter can be defined differently depending on the problem. For the IP formulation of the Poisson equation in Arnold et al. (2002) the parameter $h_e$ is defined as the length of an edge, while Djoko et al. (2007b) define $h_e$ as the distance between the centroids of elements sharing a common edge for a similar problem.

Let $\{\phi_k^{T,v}\}_{k=1}^{n_v}$ and $\{\phi_k^{T,q}\}_{k=1}^{n_q}$ denote the local finite element bases for $V$ and $Q$ on a cell $T$, respectively. From (5.5), two tensors, $A_E$ and $A_S$, can be identified:

$$A_{E,i} = a_E\left(\bar{\phi}_{i_2}^{E,q}, \bar{\phi}_{i_1}^{E,q}\right) = \int_{E} r_S(v) \cdot q \, \mathrm{d}x, \tag{5.8}$$

$$A_{S,i} = a_S\left(\bar{\phi}_{i_2}^{E,v}, \bar{\phi}_{i_1}^{E,q}\right) = -\int_{S} [\![ v ]\!] \cdot \langle q \rangle \, \mathrm{d}s, \tag{5.9}$$

where $i = (i_1, i_2)$ is the usual multi-index and $\bar{\phi}^{E}$ is a (possibly macro) basis which can be constructed from (4.6), page 102. The lifting operator $r_S(v)$ on the cell $E$ can be represented as:

$$r_S(v) = \sum_{k=1}^{N} r_k \bar{\phi}_k^{E,q}, \tag{5.10}$$

where $N = 2n_q$ if $S$ is an interior facet and $N = n_q$ otherwise, and $r \in \mathbb{R}^{N}$ is the vector of degree of freedom values for the function $r_S(v)$, which can be computed from:

$$r_k = (A_E)^{-1} A_S. \tag{5.11}$$

Here, $(A_E)^{-1} A_S$ maps the degrees of freedom of $v$ on $E$ to those of $r_S(v)$. The vector $r_k$ is then used to compute the local cell tensor corresponding to (5.7). To keep things simple, only the $\int_{\Omega} R(u) \cdot \nabla v \, \mathrm{d}x$ term is now considered and it is assumed that $S$ is an interior facet. The local cell tensor $A_T$ for this term is equal to:

$$A_{T,i} = a_T\left(\bar{\phi}_{i_2}^{E,q}, \phi_{i_1}^{T,v}\right) = \int_{T} r_S(u) \cdot \nabla v \, \mathrm{d}x, \tag{5.12}$$

with $r_S(u)$ defined by (5.10). Owing to the extensions presented in the previous chapter, a bilinear form to compute $A_S$ from (5.9) in two dimensions can be implemented directly in UFL as:

```python
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
q = TestFunction(Q)
u = TrialFunction(V)
n = triangle.n
a = -inner(jump(u, n), avg(q))*dS
```

where discontinuous constant elements and discontinuous linear elements have been used for $Q$ and $V$, respectively. The tensor $A_E$ in (5.8) is a macro tensor computed over the domain $E = T^{+} \cup T^{-}$. It is currently not possible to handle integrals over macro cells in the FEniCS framework, but $A_E$ can be computed by evaluating

```python
Q = VectorElement("DG", triangle, 0)
v = TestFunction(Q)
u = TrialFunction(Q)
# Mass matrix of Q on a single cell; inner() because Q is vector-valued
a = inner(u, v)*dx
```

on $T^{+}$ and $T^{-}$, and then inserting the resulting tensors into $A_E$ (which is essentially a macro mass matrix) following the construction in (4.6). The bilinear form to compute $A_T$ only involves the standard integral over a cell and can, like $A_S$, be implemented directly in UFL as:

```python
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
v = TestFunction(V)
R = TrialFunction(Q)
a = inner(R, grad(v))*dx
```
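How $A_S$, $A_E$ and (5.11) combine can be seen in a one-dimensional setting where all tensors are small enough to write by hand. The sketch below is illustrative and independent of FEniCS; it assumes a uniform mesh with cell size $h$, piecewise-constant spaces for both $V$ and $Q$ (so $l = 0 \geq k - 1$), takes $T^{+}$ as the left cell of each facet, and lifts interior facets only. The function name is hypothetical.

```python
import numpy as np


def lifting_1d_p0(v, h):
    """R(v) from (5.5)-(5.6) on a uniform 1D mesh with P0 spaces for V and Q.

    Only interior facets are lifted, so each of the two end cells receives a
    single facet contribution.
    """
    R = np.zeros(len(v))
    jump = np.array([1.0, -1.0])  # [[phi]] for the P0 basis on T+ and T-: n+ = +1, n- = -1
    avg = np.array([0.5, 0.5])    # <q> for q = indicator of T+ or of T-
    A_E = h * np.eye(2)           # macro mass matrix on E = T+ u T-, cf. (5.8)
    A_S = -np.outer(avg, jump)    # -int_S [[v]] <q> ds; S is a point in 1D, cf. (5.9)
    for s in range(1, len(v)):    # interior facet between cells s-1 and s
        cells = [s - 1, s]        # T+ (left) and T- (right)
        r = np.linalg.solve(A_E, A_S @ v[cells])  # dofs of r_S(v), cf. (5.11)
        R[cells] += r                              # sum over facets, cf. (5.6)
    return R


h = 0.1
x = (np.arange(10) + 0.5) * h  # cell midpoints
v = 3.0 * x                    # P0 interpolant of a linear function
print(lifting_1d_p0(v, h))
```

On interior cells the result is the central difference quotient $(v_{i+1} - v_{i-1})/(2h)$, so for $v$ sampled from a linear function the lifting recovers its slope (here 3), even though the elementwise gradient of a piecewise-constant function is zero; the two end cells receive half of that value because their boundary facets are not lifted in this sketch.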

Before formulating the assembly algorithm, a collective local-to-global mapping $\iota_{T,S}$ is needed. Note that this mapping depends on both $T$ and $S$ due to (5.6). In this particular example, the collective local-to-global mapping for each $T \in \mathcal{T}_h$ and $S \in \Gamma_0$ is defined as

$$\iota_{T,S}(i) = \left(\iota_T^{1}(i_1), \iota_S^{2}(i_2)\right) \quad \forall i \in \mathcal{I}_{T,S}, \tag{5.13}$$

where the mappings $\iota_T^{1}(i_1)$ and $\iota_S^{2}(i_2)$ are computed according to (1.11), page 26, and (4.7), page 102, respectively, and $\mathcal{I}_{T,S}$ is the index set

$$\mathcal{I}_{T,S} = \left\{ (1, 1), (1, 2), \ldots, (n_v, 2n_q - 1), (n_v, 2n_q) \right\}, \tag{5.14}$$

where it is assumed that $S$ is an interior facet (otherwise $2n_q$ is replaced by $n_q$). An algorithm to compute the contribution to the global tensor $A$ from the local cell tensor $A_T$ for the term $\int_{\Omega} R(u) \cdot \nabla v \, \mathrm{d}x$ in (5.7) is outlined in Algorithm 3. An extension of the assemble function in DOLFIN (Sections 1.3.5 and 4.1.4) based on the approach outlined in Algorithm 3 for lifting-type formulations is implemented in the C++ class LiftingAssembler in the FEniCS Solid Mechanics library. Provided that the user supplies the necessary variational forms, the LiftingAssembler computes $A_E$, $A_S$ and $A_T$ by simply calling the tabulate_tensor function ($A_E$ is constructed from $A_{T^{+}}$ and $A_{T^{-}}$, as already mentioned).

Algorithm 3: Assembly algorithm for $\int_{\Omega} R(u) \cdot \nabla v \, \mathrm{d}x$.
1: for $T \in \mathcal{T}_h$ do
2:   for $S \in \partial T$ do
3:     Compute $A_E$ and $A_S$ from (5.8) and (5.9) to obtain $r_S(v)$ via (5.11)
4:     Compute $A_T$ from (5.12)
5:     Compute local-to-global degree of freedom mapping $\iota_{T,S}$ from (5.13)
6:     Add $A_T$ to $A$ according to $\iota_{T,S}$:
7:     for $i \in \mathcal{I}_{T,S}$ do
8:       $A_{\iota_{T,S}(i)} \mathrel{+}= A_{T,i}$
9:     end for
10:  end for
11: end for

The tensors $A_E$ and $A_S$ are then used to compute the degree of freedom values $r_k$ by solving (5.11), which are then passed³ as function values to the form (5.12) to compute $A_T$ in line 4. The tabulate_dofs function for each discrete function space on the cells $T$, $T^{+}$ and $T^{-}$ is called to construct the collective local-to-global mapping $\iota_{T,S}$, after which the global tensor $A$ can be updated. It is clear from Algorithm 3 that the local element assembly of lifting-type formulations is more complex, and thus more expensive to perform, compared to the assembly outlined in Algorithm 2, page 103, for IP formulations. The increase in complexity also leads to a global tensor that is much less sparse. This is illustrated in Figures 5.1 and 5.2, which indicate the location of nonzero entries in the global tensor $A$ obtained by assembling the bilinear forms in (4.15) and (5.7), respectively, on the unit square $\Omega = [0, 1] \times [0, 1]$ using the mesh shown in Figure 5.3a. Discontinuous constant elements have been used such that cell indices correspond to degree of freedom numbers. The increase in off-diagonal entries in Figure 5.2 is due to the presence of the $R(u) \cdot R(v)$ term in (5.7), see Brezzi et al. (2000). The reason is that each of the lifting functions contains a loop over the facets of the local cell, see (5.6), which effectively couples degrees of freedom on cells that ‘share a neighbouring cell’. This is different from the interior facet integrals in the IP formulation, which only couple degrees of freedom on cells that share a facet. Figures 5.3b and 5.3c indicate the cells involved when computing the entry for degree of freedom number six when using the IP and lifting-type formulations, respectively. In general, when considering a cell sufficiently far from the boundary of a mesh consisting of triangles, the IP formulation will involve four cells while the lifting-type formulation will involve ten cells when evaluating an entry.
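The dense local work in line 3 and lines 6 to 9 of Algorithm 3 can be sketched in a few lines. This is a Python/NumPy illustration under simplifying assumptions, not the C++ LiftingAssembler (which, according to footnote 3, relies on Armadillo); the helper names are hypothetical, and a linear solve replaces the explicit inverse in (5.11), which gives the same result for an invertible $A_E$.

```python
import numpy as np


def lifting_dofs(A_E, A_S, u_E):
    """Degrees of freedom of r_S(u), cf. (5.11), via a solve instead of (A_E)^{-1}."""
    return np.linalg.solve(A_E, A_S @ u_E)


def add_local_tensor(A, A_T, rows, cols):
    """Lines 6-9 of Algorithm 3: A[iota_{T,S}(i)] += A_T[i] for all i in I_{T,S}.

    rows: global indices iota^1_T(i1) for the first index of A_T
    cols: global indices iota^2_S(i2) for the second index of A_T
    """
    for i1, row in enumerate(rows):
        for i2, col in enumerate(cols):
            A[row, col] += A_T[i1, i2]


# Two overlapping contributions accumulate in the shared entries
A = np.zeros((4, 4))
add_local_tensor(A, np.ones((2, 3)), rows=[0, 1], cols=[1, 2, 3])
add_local_tensor(A, np.ones((2, 3)), rows=[1, 2], cols=[0, 1, 2])
print(A)
print(lifting_dofs(2.0 * np.eye(2), np.eye(2), np.array([2.0, 4.0])))
```

Accumulating with += rather than assigning is what allows several $(T, S)$ pairs to contribute to the same global entry, which is the source of the extra nonzeros discussed above.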

³These local computations involve many linear algebra operations on dense data structures, including computing the inverse of $A_E$. In order to keep the implementation simple, the FEniCS Solid Mechanics library employs Armadillo (http://arma.sourceforge.net/) to perform these computations.

[Figure 5.1 matrix: 15 × 15 sparsity pattern with rows and columns numbered 0 to 14; asterisks mark the nonzero entries.]

Figure 5.1: Illustration of nonzero entries in the global tensor $A_{\mathrm{ip}}$ arising from assembling the IP formulation in (4.15) on the mesh shown in Figure 5.3a using discontinuous constant elements.

[Figure 5.2 matrix: 15 × 15 sparsity pattern with rows and columns numbered 0 to 14; asterisks mark the nonzero entries.]

Figure 5.2: Illustration of nonzero entries in the global tensor $A_{\mathrm{lift}}$ arising from assembling the lifting-type formulation in (5.7) on the mesh shown in Figure 5.3a using discontinuous constant elements.

[Figure 5.3 panels: (a) Simple mesh including global cell indices 0 to 14. (b) IP formulation. (c) Lifting-type formulation.]

Figure 5.3: Simple mesh of the domain $\Omega = [0, 1] \times [0, 1]$ including cell indices. Because constant elements are used, the index numbering is equal to the degree of freedom numbering. Panels (b) and (c) show the cells involved when computing the entry for degree of freedom number 6 using the IP and lifting-type formulations, respectively.
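The cell counts quoted above (four for IP, ten for the lifting-type formulation) can be reproduced by treating each formulation as a graph on the cells. For constant elements, the IP form couples a cell with the cells sharing one of its facets, while the $R(u) \cdot R(v)$ term couples all cells in $\{T\} \cup \{\text{facet neighbours of } T\}$ for every $T$, which amounts to neighbours of neighbours. The sketch below builds its own structured mesh with all diagonals pointing to the right (the pattern of DOLFIN's default UnitSquare mesh); it does not reproduce the mesh of Figure 5.3a, and its cell numbering and helper names are assumptions that need not match DOLFIN.

```python
from itertools import combinations


def right_mesh(nx, ny):
    """Triangles of a structured mesh of the unit square, diagonals pointing right.

    Square (i, j) is split along the diagonal from vertex (i, j) to (i+1, j+1);
    vertex (i, j) gets the index j*(nx + 1) + i.
    """
    def vid(i, j):
        return j * (nx + 1) + i

    cells = []
    for j in range(ny):
        for i in range(nx):
            cells.append((vid(i, j), vid(i + 1, j), vid(i + 1, j + 1)))  # lower triangle
            cells.append((vid(i, j), vid(i + 1, j + 1), vid(i, j + 1)))  # upper triangle
    return cells


def facet_neighbours(cells):
    """Map each cell index to the set of cells sharing an edge with it."""
    edge_to_cells = {}
    for c, tri in enumerate(cells):
        for edge in combinations(sorted(tri), 2):
            edge_to_cells.setdefault(edge, []).append(c)
    nbrs = {c: set() for c in range(len(cells))}
    for shared in edge_to_cells.values():
        if len(shared) == 2:  # interior facet
            a, b = shared
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs


def ip_coupling(c, nbrs):
    """Cells whose constant dofs couple with cell c in the IP form."""
    return {c} | nbrs[c]


def lifting_coupling(c, nbrs):
    """Cells coupled with c through R(u).R(v): neighbours of neighbours."""
    coupled = ip_coupling(c, nbrs)
    for d in nbrs[c]:
        coupled |= nbrs[d]
    return coupled


nbrs = facet_neighbours(right_mesh(6, 6))
interior_cell = 42  # lower triangle of square (3, 3), away from the boundary
print(len(ip_coupling(interior_cell, nbrs)), len(lifting_coupling(interior_cell, nbrs)))  # 4 10
```

Counting the couplings for every cell in the same way would give the number of nonzeros per row, which is what Figures 5.1 and 5.2 visualise for the smaller mesh of Figure 5.3a.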

5.3 Comparison of IP and lifting-type formulations

The disadvantages of the lifting-type formulation in terms of increased complexity and a less sparse global tensor have been treated in the previous section. This section concerns the advantages of lifting-type discontinuous Galerkin formulations (5.7) over the IP formulation (4.15) in terms of the penalty parameter and applicability to constant elements. For the IP formulation, it is not straightforward to determine the value of the penalty parameter $\alpha$ a priori, and it is, therefore, usually determined through numerical experiments. As already mentioned, this is not the case for the lifting-type formulation, which is stable for any $\alpha > 0$. Another drawback of the IP formulation is that, if discontinuous constant elements are used for $V$ ($k = 0$ in (5.1)), then all terms in (4.15) vanish except the stabilisation term. As a consequence, the value of the penalty parameter will govern the solution. This is in contrast to the lifting-type formulation in (5.7), where the $R(u) \cdot R(v)$ term is also nonzero, in addition to the stabilisation term, for constant elements.

The model problem from Section 4.2.1 on the unit square $\Omega = [0, 1] \times [0, 1]$ with $f = 8\pi^{2} \sin(2\pi x) \sin(2\pi y)$ is considered again. The exact solution for this problem is $u = \sin(2\pi x) \sin(2\pi y)$. First, the convergence of the two formulations using discontinuous linear elements and different values of $\alpha$ is investigated. Then, the convergence of the two formulations using discontinuous constant elements on two different types of structured meshes is investigated, and the results obtained for the ‘optimal’ value of the penalty parameter are presented. Finally, a comparison is made between results obtained using an unstructured mesh of discontinuous constant elements. The two different types of structured meshes that will be used are shown in Figure 5.4.

[Figure 5.4 panels: (a) Structured mesh, the ‘right’ mesh, with the direction of the diagonals pointing to the right. (b) Structured mesh, the ‘left/right’ mesh, with alternating direction of the diagonals.]

Figure 5.4: Two types of structured meshes for the domain $\Omega = [0, 1] \times [0, 1]$.

The diagonals of the mesh in Figure 5.4a all point to the right, and this type of structured mesh will therefore be referred to as the ‘right’ mesh. This particular mesh is created in DOLFIN by:

```cpp
UnitSquare mesh(4, 4);
```

and is the default mesh type in DOLFIN. The direction of the diagonals of the mesh in Figure 5.4b alternates between right and left; this mesh will be referred to as the ‘left/right’ mesh and is created in DOLFIN by:

```cpp
UnitSquare mesh(4, 4, "left/right");
```
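The observation that, for $k = 0$, the penalty parameter governs the IP solution can be made concrete in one dimension. The sketch below is an illustrative analogue of (4.15) and (4.16), not the two-dimensional computations of this section; it assumes a uniform mesh of $(0, 1)$, homogeneous Dirichlet conditions imposed through the boundary penalty term, $h$ equal to the cell size, and one-point quadrature for the load. With piecewise constants every gradient term vanishes, leaving only the penalty terms. The function names are hypothetical.

```python
import numpy as np


def ip_p0_1d(n, alpha, f):
    """Penalty-only system of a 1D analogue of (4.15)-(4.16) with P0 elements on (0, 1)."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h          # cell midpoints
    A = np.zeros((n, n))
    for s in range(1, n):                 # interior facet between cells s-1 and s
        i, j = s - 1, s
        # (alpha/h) [[u]][[v]] with [[u]] = u_i - u_j
        A[i, i] += alpha / h
        A[j, j] += alpha / h
        A[i, j] -= alpha / h
        A[j, i] -= alpha / h
    A[0, 0] += alpha / h                  # (alpha/h) u v at x = 0
    A[-1, -1] += alpha / h                # (alpha/h) u v at x = 1
    b = h * f(x)                          # one-point quadrature of int f v dx
    return x, np.linalg.solve(A, b)


def source(x):
    return np.pi**2 * np.sin(np.pi * x)   # -u'' = f for u = sin(pi x)


x, u1 = ip_p0_1d(32, 1.0, source)
_, u2 = ip_p0_1d(32, 2.0, source)
print(np.allclose(u2, u1 / 2.0))  # True: doubling alpha halves the computed solution
```

Because the matrix is proportional to $\alpha$ while the load is not, the computed solution scales exactly as $1/\alpha$. In this uniform 1D setting $\alpha = 1$ happens to reproduce the three-point difference stencil on interior cells, but in general the magnitude of the solution is set by the choice of $\alpha$ rather than by the differential operator.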

The convergence of the two formulations in the $L^{2}$ norm is first investigated using discontinuous linear elements, $k = 1$ in (5.1), on the ‘right’ mesh. The results can be seen in Figure 5.5 for various values of the penalty parameter $\alpha$. As expected, a convergence rate of $k + 1$ is observed for the two formulations in general. In the case where $\alpha = 2$, the IP formulation appears to be unstable, which indicates that the value of $\alpha$ is too small, while for $\alpha > 4$ the formulation is stable. The lifting-type formulation is stable for all values of $\alpha > 0$, as predicted by Brezzi et al. (2000), which builds confidence in the implementation outlined in the previous section. Convergence rates for the two formulations using discontinuous constant elements, $k = 0$ in (5.1), on the two different meshes for various values of $\alpha$ are shown in Figure 5.6. Convergence of the IP formulation on the ‘right’ mesh is not observed for any value of $\alpha$. On the ‘left/right’ mesh, convergence can only be achieved for a very limited range of values of the penalty parameter when using constant elements. For $\alpha = 2.0$ and $\alpha = 4.0$ no convergence is observed, and only for $\alpha = 2.45$

[Figure 5.5 plots: error $\|u - u_h\|$ against cell size $h$ on logarithmic axes for several values of $\alpha$. (a) Convergence for lifting-type formulation. (b) Convergence for IP formulation.]

Figure 5.5: Error in the $L^{2}$ norm as a function of the cell size for various values of $\alpha$ using discontinuous linear elements ($k = 1$). The results were computed on the ‘right’ mesh (similar results can be obtained using the ‘left/right’ mesh).

is the convergence rate as expected, while for $\alpha = 2.3$ convergence is suboptimal. On this particular mesh, convergence is thus very sensitive to the value of the penalty parameter. This contrasts with the lifting-type formulation, where the expected convergence rate is observed for values in the range $0 < \alpha < 10^{-2}$ for both types of meshes. The important thing to note here is that the stabilisation parameter should be chosen small enough such that the solution is not dominated by the stabilisation term. Figure 5.7 shows computed results for the two formulations on both types of meshes using ‘optimal’ values of $\alpha$, that is, $\alpha = 2.45$ for the IP formulation and $\alpha = 10^{-3}$ for the lifting-type formulation, although $\alpha$ could take any value in the range $0 < \alpha < 10^{-2}$. From the figure it is obvious that the IP result is greatly influenced by the orientation of the mesh, which is the reason for the poor convergence in the ‘right’ mesh case, see Figure 5.6b. The lifting-type formulation, on the other hand, seems largely unaffected by the mesh orientation. The reason is that the IP formulation only considers jumps in the value between adjacent cells on the shared facet, while the $R(u) \cdot R(v)$ term of the lifting-type formulation couples the jump across one facet with the jumps across all other facets associated with a given cell (also compare Figures 5.3b and 5.3c). This procedure results in an averaging of the gradient experienced by the cell and reduces the influence of mesh orientation. Figure 5.8 shows the computed solutions to the Poisson problem for the two formulations on an unstructured mesh using constant elements. The computed solution from the lifting-type formulation with $\alpha = 10^{-3}$ is compared to the solutions from the IP formulation with different values of $\alpha$. From the results in

100 100

1 1 10− 10− || ||

h h 1

u 1 u − − u u

|| 1 || 1 2 2 10 0 10 − α = 10 − α = 1.00 = 10 1 2.00 α −2 α = α = 10− α = 2.30 = 10 3 2.45 α −4 α = α = 10 α = 4.00 3 − 3 10− 10− 0.1 0.01 0.1 0.01 h h (a) Lifting-type formulation ‘right’ mesh. (b) IP formulation ‘right’ mesh 100 100

1 1 10− 10− || ||

h h 1

u 1 u − − u u

|| 1 || 1 2 2 10 0 10 − α = 10 − α = 1.00 = 10 1 2.00 α −2 α = α = 10− α = 2.30 = 10 3 2.45 α −4 α = α = 10 α = 4.00 3 − 3 10− 10− 0.1 0.01 0.1 0.01 h h (c) Lifting-type formulation on ‘left/right’ (d) IP formulation on ‘left/right’ mesh. mesh.

Figure 5.6: Error in the L2 norm as a function of the cell size for various values of α on different types of meshes using discontinuous constant elements (k = 0). 126 Chapter 5. Automation of lifting-type discontinuous Galerkin methods

(a) Computed solution for the lifting-type (b) Computed solution for the IP formula- 3 formulation with α = 10− on the ‘right’ tion with α = 2.45 on the ‘right’ mesh. mesh.

(c) Computed solution for the lifting-type (d) Computed solution for the IP formula- 3 formulation with α = 10− on the ‘left/right’ tion with α = 2.45 on the ‘left/right’ mesh. mesh.

Figure 5.7: Computed solutions to the Poisson problem for the two formulations on the two structured meshes using constant elements and a ‘optimal’ value of α. 5.4. Future developments 127

Figures 5.6d and 5.7 one could expect that the IP formulation with α = 2.45 would perform reasonably well on an unstructured mesh. But from Figure 5.8, α = 2 seems to produce the best result as higher values of α reduce the magnitude of the values in the solution. The lifting-type formulation is not affected by the unstructured mesh and the result is comparable to the ones obtained in Figures 5.7c and 5.7a for the ‘left/right’ and ‘right’ meshes respectively. In conclusion, a lifting-type formulation is needed to obtain reliable results when using discontinuous constant elements for the Poisson problem. 5.4 Future developments Rather than the semi-automated approach outlined in Section 5.2 it is obviously desirable to have lifting-type formulations fully supported in the FEniCS toolchain starting from the UFL input. However, it is still not completely clear what ab- stractions are needed to accomplish this, but general support for performing static condensation in FEniCS would be a good start. A possible future syntax for R implementing the R (u) v dx term from (5.7) could be: Ω · ∇ UFL code Q= VectorElement("DG", triangle, 0) V= FiniteElement("DG", triangle, 1) v= TestFunction(V) u= TrialFunction(V) R= LiftingFunction(Q) a= inner(R(u), grad(v)) *dE

The LiftingFunction represents the operations defined in (5.5) and (5.6), and FFC must be extended to support code generation for these operations. A new type of integral, dE, is introduced to denote integration over a patch of elements, such that evaluating R(u) on T will involve T and all of its neighbours. UFC will then need to provide an interface for this new integral class, and DOLFIN must be updated with an algorithm to perform the assembly, including the construction of the collective local-to-global mapping for the patch of elements under consideration. The procedure outlined above is further complicated by the definition of the lifting function in (5.6), where the loop over facets depends on the set $\Gamma_{D}$ on which Dirichlet boundary conditions are to be applied. In the event of a moving boundary inside the domain Ω, the assembly algorithm must thus be provided with information about which cells and facets to consider. Another possible future direction, which is perhaps more easily achieved, is to extend the assembly algorithm. In Algorithm 3 the loop over facets S to compute $r_{S}(v)$ is nested inside a loop over the cells of the mesh. It should be possible to compute the entire lifting function R(v) in a single loop over all facets of the mesh to avoid redundant computations of $r_{S}(v)$. The collective local-to-global mapping must, however, still be constructed by looping over the facets of the cell T during assembly.
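The difference between the nested and the single-pass facet loop can be illustrated in a deliberately simplified setting. The sketch below is hypothetical and is not part of DOLFIN or the LiftingAssembler class: it assumes a uniform 1D mesh with cell size h, piecewise constant w and Q, and Dirichlet conditions (w = 0) on both boundary facets. Under these assumptions an interior facet S has $r_{S}(w) = (w^{+} - w^{-})/(2h)$ on both adjacent cells (with '−' the left cell), while the left and right boundary facets give $r_{S}(w) = w/h$ and $r_{S}(w) = -w/h$ on their single cell. The nested loop evaluates every interior $r_{S}$ once from each neighbouring cell; the facet loop evaluates it once and scatters it to both cells.

```python
from typing import Dict, List, Tuple


def facet_lifting(w: List[float], s: int, h: float) -> Dict[int, float]:
    """Values of r_S(w) for facet s, keyed by cell index (hypothetical 1D DG0 model).

    Facet s lies between cell s - 1 (the '-' side) and cell s (the '+' side).
    Boundary facets are treated as Dirichlet facets with w = 0 outside the mesh.
    """
    n = len(w)
    if s == 0:                       # left boundary, outward normal -1
        return {0: w[0] / h}
    if s == n:                       # right boundary, outward normal +1
        return {n - 1: -w[n - 1] / h}
    value = (w[s] - w[s - 1]) / (2.0 * h)   # interior facet
    return {s - 1: value, s: value}


def lifting_nested(w: List[float], h: float) -> Tuple[List[float], int]:
    """Cell loop with a nested facet loop: each interior r_S is evaluated twice."""
    R = [0.0] * len(w)
    evaluations = 0
    for cell in range(len(w)):
        for s in (cell, cell + 1):   # left and right facet of the cell
            R[cell] += facet_lifting(w, s, h)[cell]
            evaluations += 1
    return R, evaluations


def lifting_single_pass(w: List[float], h: float) -> Tuple[List[float], int]:
    """Single loop over all facets: each r_S is evaluated once and scattered."""
    R = [0.0] * len(w)
    evaluations = 0
    for s in range(len(w) + 1):
        for cell, value in facet_lifting(w, s, h).items():
            R[cell] += value
        evaluations += 1
    return R, evaluations


R_nested, n_nested = lifting_nested([1.0, 4.0, 9.0, 16.0], 0.5)
R_single, n_single = lifting_single_pass([1.0, 4.0, 9.0, 16.0], 0.5)
print(R_single, n_nested, n_single)
```

Both variants give R = (5, 8, 12, −25) for this data, using 8 and 5 facet evaluations respectively. In the interior cells, R equals the central difference $(w_{i+1} - w_{i-1})/(2h)$, which is one way to picture the averaging of facet jumps over a cell. The sketch deliberately leaves out the collective local-to-global mapping, which still has to be built cell by cell.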

[Figure 5.8: (a) lifting-type formulation with α = 10^-3; (b) IP formulation with α = 2; (c) IP formulation with α = 2.45; (d) IP formulation with α = 4; all on the same unstructured mesh.]

Figure 5.8: Computed solutions to the Poisson problem for the two formulations on an unstructured mesh using constant elements. The solution computed using the lifting-type formulation (a) is compared to the solutions obtained using the IP formulation with different values of α.

6 Strain gradient plasticity

This chapter brings together the tools and extensions of the previous chapters in an implementation of a strain gradient plasticity model proposed by Aifantis (1984) in the FEniCS framework. Strain gradient plasticity models can be used to model size effects, which cannot be accounted for by classical plasticity theory. Size effects in plasticity manifest themselves as an increase in the strength of a material as the size of a specimen becomes smaller. This effect has been observed in many applications at the micron scale, for instance micro-indentation (Poole et al., 1996; Nix and Gao, 1998; Begley and Hutchinson, 1998), micro-bending (Stölken and Evans, 1998) and wire torsion (Fleck et al., 1994). For softening problems, classical plasticity theory exhibits a pathological mesh dependence, as it does not provide a length scale for the shear band width. Strain gradient models that define an internal length scale can, therefore, be used under certain conditions to provide regularisation in softening problems. The considered plasticity model involves the addition of the Laplacian of the plastic multiplier to the classical yield condition. By considering a weak formulation for the yield condition, $H^{1}$-regular functions can be used for representing the plastic multiplier. By employing a discontinuous formulation, the yield condition can be satisfied locally (cell-wise). Following this approach, the standard balance of momentum equations for the displacements is defined in the entire domain. The formulation for the yield condition, however, is only defined in the plastic domain. Furthermore, it is necessary to impose boundary conditions for the plastic multiplier on the potentially moving boundary of the plastic domain. This poses a numerically challenging problem. The chapter is organised as follows. First, the strain gradient plasticity model is presented.
The presentation builds on the notation and equations presented in Chapter 2, in particular Sections 2.2.2 and 2.4.2 concerning plasticity, and where convenient some of the definitions are reiterated. A lifting-type formulation for the plastic multiplier based on the work in the previous chapter is then proposed. This is followed by the linearisation of the governing equations, after which the implementation in the FEniCS framework is discussed. Finally, numerical examples are presented, followed by some computational observations.

6.1 A strain gradient plasticity model

This section introduces the strain gradient model which will be investigated in the remainder of this chapter. In the particular model under consideration, the yield criterion from (2.23), page 34, is augmented with the Laplacian of the internal hardening variable κ, as suggested by Aifantis (1984, 1987) and investigated by, for instance, Mühlhaus and Aifantis (1991):

$$f(\sigma, \varepsilon^{p}, \kappa) := \phi\big(\sigma, q_{\mathrm{kin}}(\varepsilon^{p})\big) - q_{\mathrm{iso}}(\kappa) - \sigma_{y} + G \nabla^{2}\kappa \leq 0, \tag{6.1}$$

where $\phi\big(\sigma, q_{\mathrm{kin}}(\varepsilon^{p})\big)$ is a scalar effective stress measure, $q_{\mathrm{kin}}$ is a stress-like internal variable used to model kinematic hardening, $q_{\mathrm{iso}}$ is a scalar stress-like term used to model isotropic hardening, κ is a scalar internal variable, $\sigma_{y}$ is the initial scalar yield stress and the constant scalar G > 0 is a hardening parameter. The hardening parameter G determines the contribution of the gradient effect to hardening, and in the case G = 0 the model reduces to the classical problem in (2.23). As in Section 2.2.2, a von Mises model with linear isotropic hardening is adopted, see (2.25). Classical associative plastic flow is assumed¹ (see (2.26)), and isotropic strain hardening according to (2.27) is adopted, from which it follows that

$$\dot{\kappa} = \dot{\lambda}. \tag{6.2}$$
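As a pointwise illustration of (6.1), the minimal sketch below evaluates the yield function under stated assumptions: the von Mises measure $\phi = \sqrt{3 J_{2}}$, no kinematic hardening ($q_{\mathrm{kin}} = 0$) and linear isotropic hardening $q_{\mathrm{iso}}(\kappa) = H\kappa$ with a hardening modulus H. The function names and the numbers (in particular $\sigma_{y} = 250$) are hypothetical. The Laplacian of κ is a field quantity and is therefore passed in as a number: it cannot be recovered from the point value of κ alone. Where κ has a local maximum ($\nabla^{2}\kappa < 0$) the gradient term lowers f and thus acts as additional hardening.

```python
import math


def von_mises_stress(sigma):
    """Equivalent stress sqrt(3 J2) of a symmetric 3x3 stress given as nested lists."""
    p = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    s = [[sigma[i][j] - (p if i == j else 0.0) for j in range(3)] for i in range(3)]
    j2 = 0.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3))
    return math.sqrt(3.0 * j2)


def gradient_yield_function(sigma, kappa, laplacian_kappa, H, sigma_y, G):
    """Point value of f in (6.1), assuming q_kin = 0 and q_iso = H * kappa."""
    return von_mises_stress(sigma) - H * kappa - sigma_y + G * laplacian_kappa


uniaxial = [[300.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# G = 0 recovers the classical yield function (2.23).
f_classical = gradient_yield_function(uniaxial, 0.01, -2.0, H=2000.0, sigma_y=250.0, G=0.0)
f_gradient = gradient_yield_function(uniaxial, 0.01, -2.0, H=2000.0, sigma_y=250.0, G=800.0)
print(f_classical, f_gradient)
```

With the same stress and κ, the classical function is positive (30) while the gradient term makes f negative (−1570), so the point would not yield.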

In addition to the yield criterion (6.1), the Kuhn–Tucker loading–unloading conditions

$$f(\sigma, \kappa) \leq 0, \qquad \dot{\lambda} \geq 0, \qquad f(\sigma, \kappa)\, \dot{\lambda} = 0 \tag{6.3}$$

and Prager's consistency condition

$$\dot{f}(\sigma, \kappa) = 0 \tag{6.4}$$

must also be satisfied in regions undergoing plastic deformation. In the remainder of this chapter, regions undergoing plastic deformation will be denoted by $\Omega^{p} \subset \Omega$. Furthermore, the boundary of $\Omega^{p}$, denoted by $\partial\Omega^{p}$, is decomposed into regions $\Gamma_{D}^{p}$ and $\Gamma_{N}^{p}$ such that $\Gamma_{D}^{p} \cup \Gamma_{N}^{p} = \partial\Omega^{p}$ and $\Gamma_{D}^{p} \cap \Gamma_{N}^{p} = \emptyset$. As a consequence of the Laplacian term in (6.1) and the relationship between the internal hardening variable and the plastic multiplier in (6.2), the consistency condition (6.4) leads to a partial differential equation defined in the plastic domain $\Omega^{p}$, rather than an algebraic equation for determining λ at a point. Therefore, conventional return mapping strategies, such as the one outlined in Section 2.4.2, are not suitable.

¹It has been argued that the plastic flow direction for strain gradient plasticity is governed by a microstress and not the deviatoric Cauchy stress, see for instance Gudmundson (2004). However, to ensure a proper comparison of the approach taken in this chapter with the approach of other researchers investigating the model by Aifantis, this argument is not taken into account.

A different approach is, therefore, adopted in which the yield criterion (6.1) is satisfied in a weak sense inside the plastic region at the end of a loading step:

$$\int_{\Omega^{p}} f(\sigma_{n+1}, \lambda_{n+1})\, \eta \,\mathrm{d}x = 0 \quad \forall\, \eta \in W, \tag{6.5}$$

for a suitable choice of W. Together with the standard balance of momentum equations, this forms a coupled system of equations for computing the unknowns u and λ. In the remainder of this section the subscript n + 1 is dropped for brevity. Consider a weak formulation for the yield criterion (6.5):

$$a\big((u, \lambda); \eta\big) = 0 \quad \forall\, \eta \in W, \tag{6.6}$$

with

$$a\big((u, \lambda); \eta\big) := \int_{\Omega^{p}} \Big(\phi\big(\sigma(u, \lambda)\big) - H\lambda - \sigma_{y}\Big)\, \eta \,\mathrm{d}x - \int_{\Omega^{p}} G\, \nabla\lambda \cdot \nabla\eta \,\mathrm{d}x + \int_{\partial\Omega^{p}} G\, (\nabla\lambda \cdot n)\, \eta \,\mathrm{d}s, \tag{6.7}$$

where n is the outward unit normal to $\partial\Omega^{p}$ and the last two integrals arise from applying integration by parts to the Laplacian term. The variational form is nonlinear due to the way the stress is computed from u and λ; however, the linearisation of the equations is postponed to Section 6.3. Due to the presence of integrals involving ∇λ and ∇η in (6.7), the functions interpolating λ and η should be in the space $H^{1}(\Omega)$. A further implication of (6.7) is the necessity of imposing boundary conditions on λ on the elastic–plastic boundary $\partial\Omega^{p}$. Frequently, the homogeneous boundary conditions

$$\lambda = 0 \quad \text{on } \Gamma_{D}^{p}, \tag{6.8}$$

$$\nabla\lambda \cdot n = 0 \quad \text{on } \Gamma_{N}^{p}, \tag{6.9}$$

are adopted, see for instance Mühlhaus and Aifantis (1991); De Borst and Mühlhaus (1992), where $\Gamma_{D}^{p}$ and $\Gamma_{N}^{p}$ denote the parts of the elastic–plastic boundary where Dirichlet and Neumann conditions for λ are applied respectively. These boundary conditions bear resemblance to the microhard and microfree boundary conditions suggested by Gurtin (2004); Gurtin and Needleman (2005) and defined as:

microscopically hard boundary conditions meant to characterise, for example, microscopic behaviour at the boundary of a ductile metal perfectly bonded to a ceramic;

microscopically free boundary conditions meant to characterise microscopic behaviour at a boundary whose environment exerts no microscopic forces on the body.

The first boundary condition thus prevents plastic flow out of a domain, while the latter does not. Considering a discontinuous Galerkin formulation for satisfying the yield criterion in a weak sense carries certain advantages. For instance, by using discontinuous elements, the yield condition can be satisfied in a local sense (cell-wise). Another advantage of a discontinuous Galerkin formulation is that the boundary conditions in (6.8) and (6.9) on the possibly moving internal elastic–plastic boundary can be included naturally. Due to the discontinuous elements, jumps in the λ field across element boundaries can be represented, which is necessary to accommodate the ∇λ · n = 0 boundary condition on the elastic–plastic boundary. With continuous function spaces, this issue can be resolved by leaving λ undefined in the elastic region $\Omega^{e}$; however, this approach is computationally challenging if the elastic–plastic boundary is moving. Finally, discontinuous piecewise constant elements can be used for λ, which is computationally more efficient. The choice of boundary condition for λ on the elastic–plastic boundary is crucial for the behaviour of the model. For instance, setting ∇λ · n = 0 on the elastic–plastic boundary does not guarantee regularisation for a softening problem. The reason is that this boundary condition permits a constant field for λ inside the plastic domain, which in turn does not introduce a gradient effect. Therefore, the model does not provide a mechanism by which the plastic domain can expand. On the other hand, by enforcing λ = 0 on the elastic–plastic boundary, a constant nonzero λ field is no longer possible. This activates the gradient term inside the plastic domain and thereby provides a mechanism which allows the plastic domain to expand.
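The argument about constant λ fields can be checked in a one-dimensional setting with piecewise constant λ. There ∇λ vanishes inside each cell, so the gradient term of the lifting-type formulation (6.19) is carried entirely by the lifting function R(λ) of (6.18). The sketch below is a hypothetical illustration under these assumptions: a uniform mesh of the plastic zone with cell size h, DG0 lifting values $r_{S} = (\lambda^{+} - \lambda^{-})/(2h)$ on both cells next to an interior facet ('−' being the left cell), $r_{S} = \lambda/h$ and $-\lambda/h$ on the boundary cell of a left or right Dirichlet facet, and facets on $\Gamma_{N}^{p}$ left out of the sum. The stabilisation term is omitted.

```python
def lifting_dg0(lam, h, dirichlet_left, dirichlet_right):
    """Lifting function R(lam) per cell for DG0 data on a uniform 1D plastic zone."""
    n = len(lam)
    R = [0.0] * n
    for i in range(n - 1):                 # interior facets (Gamma_0^p)
        r = (lam[i + 1] - lam[i]) / (2.0 * h)
        R[i] += r
        R[i + 1] += r
    if dirichlet_left:                     # lam = 0 imposed on the left facet
        R[0] += lam[0] / h
    if dirichlet_right:                    # lam = 0 imposed on the right facet
        R[-1] -= lam[-1] / h
    return R                               # Neumann facets contribute nothing


def lifting_contribution(lam, h, dirichlet_left, dirichlet_right):
    """h * sum(R_i^2): the integral of R(lam) . R(lam) over the plastic zone."""
    R = lifting_dg0(lam, h, dirichlet_left, dirichlet_right)
    return h * sum(r * r for r in R)


constant = [1.0, 1.0, 1.0, 1.0]
print(lifting_contribution(constant, 0.25, False, False))  # microfree: 0.0
print(lifting_contribution(constant, 0.25, True, True))    # microhard: 8.0
```

With ∇λ · n = 0 on both ends, the constant field produces no gradient contribution at all; with λ = 0 on both ends, the boundary cells receive nonzero lifting values and the gradient term becomes active.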
Note, however, that for a softening problem the plastic domain will only expand if the gradient parameter is large enough to overcome the softening behaviour of the plastic domain and make adjacent elastic elements yield. The model is, therefore, not suitable for softening problems; see also Engelen et al. (2006), who investigated the model proposed in Fleck and Hutchinson (2001) as a representative of a wider class of gradient plasticity models, including that of Aifantis presented in this section. For hardening problems, on the other hand, the plastic domain can expand regardless of the choice of boundary condition for λ, although an expanding plastic domain will lead to jumps in the λ field. The effect of boundary conditions on the behaviour of the model is demonstrated via numerical examples in Section 6.5. For the considered plasticity model and boundary conditions, others (De Borst and Mühlhaus, 1992; De Borst and Pamin, 1996; Djoko et al., 2007a) have shown, via numerical examples, a regularising effect in the presence of strain softening. However, the regularising effect is achieved either by defining the plastic multiplier on the entire domain (elastic and plastic) and imposing excessive regularity of the plastic multiplier across the elastic–plastic boundary (using a $C^{1}$-conforming basis (De Borst and Mühlhaus, 1992) or introducing penalty terms (De Borst and Pamin, 1996)), or by allowing the plastic multiplier to spread into the elastic region (Djoko et al., 2007a). This is in contrast to the formulation pursued in this chapter which, as already mentioned, uses a discontinuous basis for the plastic multiplier and only considers the gradient term active in the plastic regions.

6.2 A discontinuous Galerkin formulation for the plastic multiplier

Instead of seeking λ in the space $H^{1}(\Omega)$, as is implied by the variational formulation in (6.7), it follows from the argumentation in the previous section that a more appropriate function space for λ is

$$W := \left\{ w \in L^{2}(\Omega) : w|_{T} \in P_{k}(T) \ \forall\, T \in \mathcal{T}_{h} \right\}, \tag{6.10}$$

where $P_{k}(T)$ denotes the space of Lagrange polynomials of degree k on the element T of the standard triangulation of Ω, which is denoted by $\mathcal{T}_{h}$. As a consequence of the choice of function space W and the regularity requirement imposed inside the plastic region $\Omega^{p}$ by the variational formulation in (6.7), a discontinuous Galerkin formulation is needed. First, consider a function space for the displacement field u:

$$V := \left\{ v \in \left[H^{1}(\Omega)\right]^{d} : v|_{T} \in \left[P_{m}(T)\right]^{d} \ \forall\, T \in \mathcal{T}_{h} \right\}, \tag{6.11}$$

with Lagrange polynomials of degree m. A weak formulation for the yield criterion corresponding to (6.5) on a single cell $T \subset \Omega^{p}$ can then be formulated as: find $(u, \lambda) \in V \times W$ such that for all $w \in W$

$$\int_{T} \Big(\phi\big(\sigma(u, \lambda)\big) - H\lambda - \sigma_{y}\Big)\, w \,\mathrm{d}x - \int_{T} G\, \nabla\lambda \cdot \nabla w \,\mathrm{d}x + \int_{\partial T} G\, \tau(\lambda, \nabla\lambda) \cdot n\, w \,\mathrm{d}s = 0, \tag{6.12}$$

where the numerical flux $\tau(\lambda, \nabla\lambda)$ is an approximation to ∇λ on the boundary of T. Various discontinuous Galerkin methods, including the IP and lifting-type formulations presented in the preceding chapters, can be recovered by defining the numerical flux appropriately and summing over all cells in $\Omega^{p}$; see Arnold et al. (2002) for a detailed presentation. By setting w = 1 in (6.12), the equation reduces to

$$\int_{T} \Big(\phi\big(\sigma(u, \lambda)\big) - H\lambda - \sigma_{y}\Big) \,\mathrm{d}x + \int_{\partial T} G\, \tau(\lambda, \nabla\lambda) \cdot n \,\mathrm{d}s = 0, \tag{6.13}$$

which demonstrates local conservation in terms of the numerical flux, provided that the numerical flux is single valued on cell facets. As this is the case for all the variants of the numerical flux presented in Arnold et al. (2002), the yield criterion is satisfied locally on each cell for both the IP and lifting-type formulations. Taking guidance from Section 4.2.1, the variational form for (6.6) using an interior penalty formulation can be defined as

$$a_{f}^{\mathrm{IP}}\big((u, \lambda); w\big) := \int_{\Omega^{p}} \Big(\phi\big(\sigma(u, \lambda)\big) - H\lambda - \sigma_{y}\Big)\, w \,\mathrm{d}x - \int_{\Omega^{p}} G\, \nabla\lambda \cdot \nabla w \,\mathrm{d}x + \int_{\Gamma_{0}^{p}} G \Big( \langle \nabla\lambda \rangle \cdot [\![ w ]\!] + [\![ \lambda ]\!] \cdot \langle \nabla w \rangle \Big) \,\mathrm{d}s - \int_{\Gamma_{0}^{p}} G\, \frac{\alpha}{h_{e}}\, [\![ \lambda ]\!] \cdot [\![ w ]\!] \,\mathrm{d}s, \tag{6.14}$$

where the last term ensures stability of the formulation, $h_{e}$ denotes the distance between the centroids of two neighbouring elements, α is the usual stabilisation parameter and $\Gamma_{0}^{p}$ denotes the set of interior facets inside the plastic region $\Omega^{p}$. Both λ and w are assumed to be functions in W, while u ∈ V. This particular type of formulation was used by Djoko et al. (2007a,b) for a gradient plasticity model similar to the one described in the previous section. However, as demonstrated in the previous chapter, the IP formulation is not suitable when discontinuous piecewise constant elements are used. A lifting-type formulation is, therefore, developed which is similar in nature to the lifting-type formulation for the Poisson equation discussed in Section 5.1, although some definitions are slightly different to take into account that (6.5) is only valid for regions undergoing plastic deformation. The jump of a function w ∈ W is defined as

$$[\![ w ]\!] = \begin{cases} w^{+} n^{+} + w^{-} n^{-} & \text{on } \Gamma_{0}^{p}, \\ w\, n & \text{on } \partial\Omega \cup \partial\Omega^{p}, \end{cases} \tag{6.15}$$

which is comparable to (5.2), page 116. The function space for the gradient of functions in W is again denoted by Q and is defined in (5.3). The definition of the average of a function q ∈ Q is slightly different from that in (5.4), namely:

$$\langle q \rangle = \begin{cases} \tfrac{1}{2}\left(q^{+} + q^{-}\right) & \text{on } \Gamma_{0}^{p}, \\ q & \text{on } \partial\Omega \cup \partial\Omega^{p}. \end{cases} \tag{6.16}$$

The definitions of the lifting operator and the lifting function (equations (5.5) and (5.6) respectively) are also slightly different. The lifting operator $r_{S} : W \to Q$ is defined as follows: for a given w ∈ W, find $r_{S}(w) \in Q$ such that

$$\int_{E} r_{S}(w) \cdot q \,\mathrm{d}x = -\int_{S} [\![ w ]\!] \cdot \langle q \rangle \,\mathrm{d}s \quad \forall\, q \in Q, \tag{6.17}$$

where $E = T^{+} \cup T^{-}$, as seen in Figure 4.1, for $S \in \Gamma_{0}^{p}$; E is the element of $\mathcal{T}_{h}$ associated with the facet S for $S \in \partial\Omega$; and for $S \in \Gamma_{D}^{p}$, E is the element inside $\Omega^{p}$ which is associated with the facet S. The lifting function is then defined as

$$R(w) = \sum_{S \in \Gamma_{0}^{p} \cup \Gamma_{D}^{p}} r_{S}(w), \tag{6.18}$$

which is very similar to the lifting function in (5.6). Note that, due to the definitions of E and S in (6.17), the function is defined neither in the elastic region of Ω nor on $\Gamma_{N}^{p}$, and it is therefore set to zero in both of these cases. The lifting-type formulation corresponding to the variational form in (6.14) for the yield criterion then reads

$$a_{f}\big((u, \lambda); w\big) := \int_{\Omega^{p}} \Big(\phi\big(\sigma(u, \lambda)\big) - H\lambda - \sigma_{y}\Big)\, w \,\mathrm{d}x - \int_{\Omega^{p}} G \big(\nabla\lambda + R(\lambda)\big) \cdot \big(\nabla w + R(w)\big) \,\mathrm{d}x - \sum_{S \in \Gamma_{0}^{p} \cup \Gamma_{D}^{p}} \int_{\Omega^{p}} \alpha G\, r_{S}(\lambda) \cdot r_{S}(w) \,\mathrm{d}x, \tag{6.19}$$

where α is a stabilisation parameter. Note the close resemblance to the formulation for the Poisson equation in (5.7).

6.3 Linearisation of the governing equations

The steady-state balance of momentum equation from (2.12), page 32, at the end of a loading step reads

$$\int_{\Omega} \sigma(u_{n+1}, \lambda_{n+1}) : \nabla v \,\mathrm{d}x - \int_{\Gamma_{N}} h_{n+1} \cdot v \,\mathrm{d}s - \int_{\Omega} b_{n+1} \cdot v \,\mathrm{d}x = 0, \tag{6.20}$$

where v ∈ V is a weight function with

$$V := \left\{ v \in \left[H^{1}(\Omega)\right]^{d} : v|_{T} \in \left[P_{m}(T)\right]^{d} \ \forall\, T \in \mathcal{T}_{h},\ v = 0 \text{ on } \Gamma_{D} \right\}, \tag{6.21}$$

where Lagrange polynomials of degree m are used and homogeneous Dirichlet boundary conditions are assumed for the displacement field u. The yield criterion must also be satisfied at the end of the loading step:

$$a_{f}\big((u_{n+1}, \lambda_{n+1}); w\big) = 0. \tag{6.22}$$

Together these equations form a coupled system of equations that are nonlinear in general. Newton's method is, therefore, employed to obtain a solution by linearising about a state defined at Newton iteration k.

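At the end of a loading step the stress tensor can be computed from (2.21) and (2.22), page 34, by

$$\sigma_{n+1} = \mathcal{C} : \left(\varepsilon_{n+1} - \varepsilon^{p}_{n+1}\right). \tag{6.23}$$

The increment in plastic strain is computed from (2.26):

$$\varepsilon^{p}_{n+1} - \varepsilon^{p}_{n} = \Delta\lambda \frac{\partial f(\sigma_{n+1})}{\partial \sigma} = \Delta\lambda\, N(\sigma_{n+1}), \tag{6.24}$$

where the increment of the plastic multiplier is $\Delta\lambda = \lambda_{n+1} - \lambda_{n}$. The Newton increment of the stress tensor is determined by inserting (6.24) into (6.23) and linearising, such that at iteration k:

$$d\sigma = \mathcal{C} : \left( \nabla^{s} du - N_{k}\, d\lambda - \Delta\lambda_{k} \frac{\partial N_{k}}{\partial \sigma} : d\sigma \right), \tag{6.25}$$

which after rearranging terms yields

$$d\sigma = \mathcal{C}_{\mathrm{tan}} : \left( \nabla^{s} du - N_{k}\, d\lambda \right), \tag{6.26}$$

with

$$\mathcal{C}_{\mathrm{tan}} = \left( \mathcal{C}^{-1} + \Delta\lambda_{k} \frac{\partial N_{k}}{\partial \sigma} \right)^{-1}. \tag{6.27}$$

Here, $\Delta\lambda_{k} = \lambda_{k} - \lambda_{n}$ denotes the total increment in the plastic multiplier measured from the previously converged state at load step n.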

In a similar fashion, the increment of the yield function can be found by linearising (6.1), such that

$$df = N_{k} : d\sigma - H\, d\lambda + G \nabla^{2} d\lambda, \tag{6.28}$$

which after inserting (6.26) results in the following expression for the increment of the yield function:

$$df = N_{k} : \mathcal{C}_{\mathrm{tan}} : \left( \nabla^{s} du - N_{k}\, d\lambda \right) - H\, d\lambda + G \nabla^{2} d\lambda. \tag{6.29}$$
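For the von Mises model adopted here (assuming no kinematic hardening), the gradient $N = \partial f / \partial \sigma$ appearing in (6.24)–(6.29) is unaffected by the Laplacian term in (6.1), because $G\nabla^{2}\kappa$ does not depend on σ. With the deviatoric stress s and $q = \sqrt{\tfrac{3}{2}\, s : s}$, one has $N = \tfrac{3}{2}\, s / q$. The minimal sketch below evaluates N and the plastic strain update (6.24) at a single integration point; tensors are plain nested lists and the function names are hypothetical.

```python
import math


def deviator(sigma):
    """Deviatoric part s of a 3x3 stress given as nested lists."""
    p = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    return [[sigma[i][j] - (p if i == j else 0.0) for j in range(3)] for i in range(3)]


def flow_direction(sigma):
    """N = 3 s / (2 q) with q = sqrt(3/2 s:s); assumes a nonzero deviator."""
    s = deviator(sigma)
    q = math.sqrt(1.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))
    return [[1.5 * s[i][j] / q for j in range(3)] for i in range(3)]


def update_plastic_strain(eps_p_n, delta_lambda, sigma):
    """Plastic strain at n + 1 from (6.24): eps_p_n + delta_lambda * N(sigma)."""
    N = flow_direction(sigma)
    return [[eps_p_n[i][j] + delta_lambda * N[i][j] for j in range(3)] for i in range(3)]


sigma = [[300.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
N = flow_direction(sigma)
eps_p = update_plastic_strain([[0.0] * 3 for _ in range(3)], 0.002, sigma)
print(N, eps_p)
```

For uniaxial stress, N = diag(1, −1/2, −1/2). In general the trace of N vanishes (isochoric plastic flow) and N : N = 3/2 for any nonzero deviator.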

Using these increments, the linearised coupled variational formulation for the equations (6.20) and (6.22) reads: find $(du, d\lambda) \in V \times W$ such that

$$a\big((du, d\lambda); (v, w)\big) = L\big((v, w)\big) \quad \forall\, (v, w) \in V \times W, \tag{6.30}$$

where

$$\begin{aligned} a\big((du, d\lambda); (v, w)\big) = {} & \int_{\Omega} \mathcal{C}_{\mathrm{tan}} : \nabla^{s} du : \nabla v \,\mathrm{d}x - \int_{\Omega} d\lambda\, H\, w \,\mathrm{d}x \\ & - \int_{\Omega^{p}} d\lambda\, \mathcal{C}_{\mathrm{tan}} : N_{k} : \nabla v \,\mathrm{d}x + \int_{\Omega^{p}} N_{k} : \mathcal{C}_{\mathrm{tan}} : \nabla^{s} du\, w \,\mathrm{d}x \\ & - \int_{\Omega^{p}} d\lambda\, N_{k} : \mathcal{C}_{\mathrm{tan}} : N_{k}\, w \,\mathrm{d}x - \int_{\Omega^{p}} G \big(\nabla d\lambda + R(d\lambda)\big) \cdot \big(\nabla w + R(w)\big) \,\mathrm{d}x \\ & - \sum_{S \in \Gamma_{0}^{p} \cup \Gamma_{D}^{p}} \int_{\Omega^{p}} \alpha G\, r_{S}(d\lambda) \cdot r_{S}(w) \,\mathrm{d}x \end{aligned} \tag{6.31}$$

and

$$L\big((v, w)\big) = \int_{\Omega} \sigma_{k} : \nabla v \,\mathrm{d}x - \int_{\Omega} b \cdot v \,\mathrm{d}x - \int_{\Gamma_{N}} h \cdot v \,\mathrm{d}s + \int_{\Omega^{p}} f_{k}\, w \,\mathrm{d}x. \tag{6.32}$$

In the linearised formulation, the homogeneous Dirichlet and Neumann boundary conditions for λ, see (6.8) and (6.9), have been adopted. An important point to note is that the term $\int_{\Omega} d\lambda\, H\, w \,\mathrm{d}x$ in (6.31) is active in the entire domain although, strictly speaking, it should only be active in regions undergoing plastic deformation. This is necessary in order to avoid a singular global system when solving the equations. However, it does not affect the solution, because the term $\int_{\Omega^{p}} f_{k}\, w \,\mathrm{d}x$ in (6.32) is only nonzero in regions undergoing plastic deformation. The variational problem in (6.30) is solved for each Newton iteration, followed by the corrections $u_{k} \leftarrow u_{k} - du$ and $\lambda_{k} \leftarrow \lambda_{k} - d\lambda$, after which $\sigma_{k}$ and $f_{k}$ can be updated before proceeding with the next iteration $k \leftarrow k + 1$. Note that although $N_{k}$, $\sigma_{k}$ and $\mathcal{C}_{\mathrm{tan}}$ are computed at integration points, they are assembled into a global system of equations for computing dλ. The classical local return mapping scheme is thus effectively replaced by a global Newton scheme. Implementation details of a solver for these linearised equations are discussed in the following section.

6.4 Implementation

Implementing a solver for the coupled nonlinear equations of the gradient plasticity problem involves advancing the solution from the pseudo-time $t_{n}$ to the time $t_{n+1}$, where the state defined at $t_{n}$ is known. This is achieved by a series of iterations using a predictor–corrector algorithm, outlined in Algorithms 4 and 5, which is implemented in the C++ class GradPlasProblem in the FEniCS Solid Mechanics library. The algorithm is inspired by the work of Djoko et al. (2007b), although there are a few notable differences. Firstly, the evolving plastic region is determined based on the cell average value of the yield criterion instead of the value at integration points. This means that an element is either elastic or plastic and that the elastic–plastic boundary is located on element facets and not inside elements.

This is a necessary requirement for the discontinuous formulation developed in the previous section. Secondly, the value of $G\nabla^{2}\lambda$ is not computed at integration points for evaluating the yield criterion. Instead, the yield criterion is evaluated by solving a variational problem. Thirdly, and most importantly, the value of the yield function in an elastic element is independent of λ values in adjacent plastic elements, as the gradient terms are only active inside the plastic region. This property prevents the artificial spread of λ from the plastic region into the elastic region and the resulting spurious regularisation. The solution procedure is outlined below.

6.4.1 The predictor step

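Algorithm 4 Predictor step of the predictor–corrector algorithm for the coupled variational problem in (6.30) at iteration k and time step n + 1.

1: Solve system (6.30) at configuration k − 1 to get $u_{k}$ and $\lambda_{k}$.
2: for $T \in \mathcal{T}_{h}$ do
3:   Compute $\Delta\lambda_{k} = \lambda_{k} - \lambda_{n}$.
4:   Compute cell average $\Delta\lambda_{k}^{\mathrm{avg}}$.
5:   if $\Delta\lambda_{k}^{\mathrm{avg}} < 0$ then
6:     Force all integration points on T to be elastic during the entire load step n → n + 1.
7:     Use the elastic tangent, $\mathcal{C}_{\mathrm{tan}} = \mathcal{C}$, and set $\lambda_{k} = \lambda_{n}$.
8:   end if
9:   Compute trial stress $\sigma_{\mathrm{tr}} = \mathcal{C} : (\varepsilon_{k} - \varepsilon^{p}_{n})$.
10:  Update N to trial state $N_{\mathrm{tr}} = \partial f(\sigma_{\mathrm{tr}}) / \partial\sigma$.
11: end for
12: Solve problem (6.33) using $\Omega^{p}_{k-1}$ to get $f_{\mathrm{tr}} = f(\sigma_{\mathrm{tr}}, \lambda_{k})$.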

Algorithm 4 shows the computations for the predictor step. The body force b and boundary traction h are updated to the state n + 1, and the global system in (6.30) is assembled and solved to get the Newton increments du and dλ, which are used to update the values of u and λ at time n + 1 and iteration number k such that, in general, $u_{n+1,k} \leftarrow u_{n+1,k-1} + du$ and $\lambda_{n+1,k} \leftarrow \lambda_{n+1,k-1} + d\lambda$. For the first iteration, $u_{n+1,k} \leftarrow u_{n} + du$ and $\lambda_{n+1,k} \leftarrow \lambda_{n} + d\lambda$. In the following, and in Algorithms 4 and 5, the subscripts n + 1 are omitted. The total increment in the plastic multiplier for the entire load step is computed at every integration point, line 3. If the cell average of this increment is negative, the cell is marked as elastic during the entire load step n → n + 1 (lines 5–8) to avoid the unstable situation where elements switch back and forth between the elastic and plastic state. Note that it is only the total increment of λ which is not allowed to be negative; dλ for an element can thus be negative during iterations. This situation will often occur because, due to the gradient effect, the λ field is redistributed while the plastic domain is expanding. A trial stress is then computed locally for all integration points based on $u_{k}$ and the plastic strain from the previous converged load step n, line 9. To determine which elements are yielding, the yield criterion (6.1) must be evaluated. The value of the yield function also enters the linear form in (6.32), and it must, therefore, be consistent with the linearisation in the previous section. This means that the boundary conditions for λ should be identical to those enforced in the bilinear form (6.31) and that the gradient terms are only active in the plastic region.
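A variational formulation to compute the value of the yield function at some known state k can be defined in the form: find $f_{k} \in W$ such that

$$a(f_{k}, w) = L(w) \quad \forall\, w \in W, \tag{6.33}$$

where

$$a(f_{k}, w) := \int_{\Omega} f_{k}\, w \,\mathrm{d}x \tag{6.34}$$

and

$$L(w) := \int_{\Omega} \big(\phi(\sigma_{k}) - H\lambda_{k} - \sigma_{y}\big)\, w \,\mathrm{d}x - \int_{\Omega^{p}} G \big(\nabla\lambda_{k} + R(\lambda_{k})\big) \cdot \big(\nabla w + R(w)\big) \,\mathrm{d}x - \sum_{S \in \Gamma_{0}^{p} \cup \Gamma_{D}^{p}} \int_{\Omega^{p}} \alpha G\, r_{S}(\lambda_{k}) \cdot r_{S}(w) \,\mathrm{d}x. \tag{6.35}$$

The yield criterion is evaluated (line 12), using the trial stress, by solving the variational problem (6.33) under the assumption that the set of plastic elements remained constant during the last iteration, that is, using $\Omega^{p}_{k-1}$ (or $\Omega^{p}_{n}$ in case k = 0).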
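The cell-wise part of the predictor (lines 3–10 of Algorithm 4) can be sketched for a one-dimensional, uniaxial analogue, assumed here only to keep the example short: stress, strain and λ are scalars at each integration point, the modulus E replaces $\mathcal{C}$, the von Mises measure reduces to |σ| so that $N_{\mathrm{tr}} = \operatorname{sign}(\sigma_{\mathrm{tr}})$, and the cell average is the arithmetic mean of the integration point values (equal quadrature weights). The Cell class and its field names are hypothetical and do not mirror GradPlasProblem.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    lam_k: List[float]       # lambda at the integration points, iteration k
    lam_n: List[float]       # lambda at the last converged load step n
    eps_k: List[float]       # total strain at iteration k
    eps_p_n: List[float]     # plastic strain at load step n
    forced_elastic: bool = False


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def predictor_cell(cell: Cell, E: float) -> Tuple[List[float], List[float]]:
    """Lines 3-10 of Algorithm 4 for one cell; returns (sigma_tr, N_tr)."""
    dlam = [lk - ln for lk, ln in zip(cell.lam_k, cell.lam_n)]        # line 3
    if mean(dlam) < 0.0:                                              # lines 4-5
        cell.forced_elastic = True      # line 6; the elastic tangent is implied
        cell.lam_k = list(cell.lam_n)   # line 7
    sigma_tr = [E * (e - ep) for e, ep in zip(cell.eps_k, cell.eps_p_n)]  # line 9
    n_tr = [math.copysign(1.0, s) for s in sigma_tr]                  # line 10
    return sigma_tr, n_tr


unloading = Cell(lam_k=[0.001, -0.003], lam_n=[0.0, 0.0],
                 eps_k=[0.002, 0.002], eps_p_n=[0.0005, 0.0005])
sigma_tr, n_tr = predictor_cell(unloading, E=200000.0)
print(unloading.forced_elastic, unloading.lam_k, sigma_tr, n_tr)
```

Here the integration point values of $\Delta\lambda_{k}$ have mixed signs but a negative mean, so the whole cell is frozen in the elastic state for the rest of the load step and λ is reset to its converged value.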

6.4.2 The corrector step

The value of the yield function, based on the trial stresses and the old set of plastic cells, was computed in the predictor step. In the corrector step, Algorithm 5, this value is used to determine the new set of plastic elements together with a corrected stress. In lines 2–5, the total increment of the plastic multiplier is again evaluated to test if the element should be forced to be elastic. If this is the case, the corrected stress is equal to the trial stress. Based on element averages of the yield function, the new set of plastic cells can be determined, lines 6–8. This means that integration points inside an element can become part of the plastic domain even though the value of the yield function is negative at that particular integration point. As already discussed, this is necessary because the discontinuous Galerkin terms in (6.35) and (6.31) are only defined on element boundaries.

Algorithm 5 Corrector step of the predictor–corrector algorithm for the coupled variational problem in (6.30) at iteration k and time step n + 1.

1: for $T \in \mathcal{T}_{h}$ do
2:   if $\Delta\lambda_{k}^{\mathrm{avg}} < 0$ for given cell then
3:     Cell is already marked elastic, so use trial stress $\sigma_{k} = \sigma_{\mathrm{tr}}$.
4:     continue
5:   end if
6:   Compute cell average of the yield function $f_{\mathrm{tr}}^{\mathrm{avg}}$.
7:   if $f_{\mathrm{tr}}^{\mathrm{avg}} > 0$ then
8:     Cell is marked as plastic, that is, $T \in \Omega^{p}_{k}$.
9:     Correct trial stress $\sigma_{k} = \mathcal{C} : (\varepsilon_{k} - \varepsilon^{p}_{n} - \Delta\lambda_{k} N_{\mathrm{tr}})$.
10:    Update N such that $N_{k} = \partial f(\sigma_{k}) / \partial\sigma$ and compute $\mathcal{C}_{\mathrm{tan}}$ from (6.27).
11:  else
12:    Mark current cell as elastic, that is, $T \notin \Omega^{p}_{k}$.
13:    Use the elastic tangent, $\mathcal{C}_{\mathrm{tan}} = \mathcal{C}$, and set $\sigma_{k} = \sigma_{\mathrm{tr}}$ and $\lambda_{k} = \lambda_{n}$.
14:  end if
15: end for
16: Solve problem (6.33) using $\Omega^{p}_{k}$ to get $f_{k} = f(\sigma_{k}, \lambda_{k})$ for the linear form in (6.32).
17: Assemble system (6.30) at state k.
18: if Global convergence then
19:   Advance state $(\cdot)_{n+1} \rightarrow (\cdot)_{n}$.
20: else
21:   Return to line 1 in the predictor step, Algorithm 4, and increment iteration number $k \leftarrow k + 1$.
22: end if

After updating the plastic domain and correcting the trial stress for the relevant integration points (line 9), the yield criterion (6.33) is evaluated again (line 16), such that corrected values enter the linear form of the coupled problem in (6.32) as $f_{k}$. Notice that values of $f_{k}$ might be negative, which allows for negative dλ in the next iteration and thereby permits a redistribution of the λ field within the load step. Finally, the global system in (6.30) is assembled and checked for convergence, lines 17–18. If convergence is achieved, the system is advanced to the next load step (line 19); otherwise the algorithm returns to the predictor step and continues iterating (line 21).
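Lines 2–13 of Algorithm 5 can be sketched in the same simplified one-dimensional setting (scalar quantities per integration point, modulus E in place of $\mathcal{C}$, cell averages as arithmetic means). The point illustrated is that the decision is made per cell from the averaged yield function, so an integration point with a negative trial value can still be corrected plastically. The function name and its arguments are hypothetical; resetting $\lambda_{k} = \lambda_{n}$ and computing $\mathcal{C}_{\mathrm{tan}}$ are left out.

```python
from typing import List, Tuple


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def correct_cell(E: float, f_tr: List[float], eps_k: List[float],
                 eps_p_n: List[float], dlam_k: List[float], n_tr: List[float],
                 forced_elastic: bool) -> Tuple[bool, List[float]]:
    """Return (is_plastic, sigma_k) for one cell, a 1D analogue of Algorithm 5."""
    sigma_tr = [E * (e - ep) for e, ep in zip(eps_k, eps_p_n)]
    if forced_elastic:                      # lines 2-5: keep the trial stress
        return False, sigma_tr
    if mean(f_tr) > 0.0:                    # lines 6-8: the cell average decides
        sigma_k = [E * (e - ep - dl * n)    # line 9: corrected stress
                   for e, ep, dl, n in zip(eps_k, eps_p_n, dlam_k, n_tr)]
        return True, sigma_k
    return False, sigma_tr                  # lines 12-13: elastic cell


plastic, sigma_k = correct_cell(E=1000.0, f_tr=[5.0, -1.0],
                                eps_k=[0.01, 0.01], eps_p_n=[0.0, 0.0],
                                dlam_k=[0.002, 0.002], n_tr=[1.0, 1.0],
                                forced_elastic=False)
print(plastic, sigma_k)
```

Although the second integration point has $f_{\mathrm{tr}} < 0$, the positive cell average marks the whole cell as plastic and both points receive the corrected stress.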

6.4.3 Implementing the variational forms

The bilinear and linear forms of the two variational problems (6.30) and (6.33), which are solved during the iterations in the algorithm above, can be implemented in the FEniCS framework by utilising the functionality outlined in the preceding chapters. To accomplish this, integrals of the variational forms which only contain conventional terms are implemented in a standard UFL file, while integrals involving the lifting function R and the lifting operator $r_{S}$ are handled separately. For the coupled problem (6.30), the linear form (6.32) and the terms from the bilinear form (6.31) which are independent of R and $r_{S}$ can be implemented in UFL in a straightforward fashion, as shown in Figure 6.1. As was the case for conventional plasticity in Section 2.4.2, the stress and the linearised tangent (and also the gradient of the yield function N) are supplied as coefficients to the form using quadrature elements. In the code, subdomain 0 refers to the elastic region while subdomain 1 refers to the plastic region. The remaining terms from (6.31) which involve R and $r_{S}$, and which in essence are identical to the bilinear form for the Poisson equation in (5.7), can be implemented as outlined in Section 5.2 using the LiftingAssembler class. The variational problem (6.33) is implemented using a similar approach.

6.5 Numerical examples

In this section, the gradient plasticity model is applied to different example problems to demonstrate how the boundary conditions influence the ability of the model to handle softening and hardening problems. First, a simple softening problem with negligible plastic strain gradients inside the plastic domain is considered to demonstrate that the model does not guarantee regularisation. Then, another strain softening problem, in which plastic strain gradients are present inside the plastic domain, is considered to demonstrate that some degree of regularisation can be achieved with the microfree boundary condition. Finally, a hardening problem is considered to demonstrate that the model is only capable of modelling size effects when the microhard boundary condition is used.

UFL code

```python
# Legacy UFL syntax; the import is implicit in a .ufl file processed by FFC
from ufl import *

V = VectorElement("Lagrange", tetrahedron, 1)          # displacement, P1
W = FiniteElement("DG", tetrahedron, 0)                # plastic multiplier, P0
EPS = VectorElement("Quadrature", tetrahedron, 1, 6)   # stress and N (Voigt)
TAN = VectorElement("Quadrature", tetrahedron, 1, 36)  # 6x6 tangent, row-major
element = V * W

(v, w) = TestFunctions(element)
(du, dl) = TrialFunctions(element)

N0 = Coefficient(EPS)    # gradient of the yield function
sig0 = Coefficient(EPS)  # stress
f0 = Coefficient(W)      # yield function value
t = Coefficient(TAN)     # linearised tangent
H = 2000.0
G = 800.0

def tangent(t):
    return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])

def epsilon(U):
    # Voigt strain with engineering shear strains
    return as_vector([U[i].dx(i) for i in range(3)]
                     + [U[i].dx(j) + U[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])

def sigma(s):
    return as_matrix([[s[0], s[3], s[4]],
                      [s[3], s[1], s[5]],
                      [s[4], s[5], s[2]]])

# Subdomain 0: elastic region, subdomain 1: plastic region
a = inner(dot(tangent(t), epsilon(du)), epsilon(v))*dx(0) - inner(dl*H, w)*dx(0) \
  + inner(dot(tangent(t), epsilon(du)), epsilon(v))*dx(1) - inner(dl*H, w)*dx(1) \
  - dl*inner(dot(tangent(t), N0), epsilon(v))*dx(1) \
  + inner(dot(N0, tangent(t)), epsilon(du))*w*dx(1) \
  - dl*w*inner(dot(N0, tangent(t)), N0)*dx(1) \
  - G*inner(grad(dl), grad(w))*dx(1)

L = inner(sigma(sig0), grad(v))*dx(0) \
  + inner(sigma(sig0), grad(v))*dx(1) + f0*w*dx(1)
```

Figure 6.1: UFL input for the conventional parts of the variational problem in (6.30) in three dimensions. In the specific case, continuous, piecewise linear elements are used for the displacements while discontinuous piecewise constant elements are used for the plastic multiplier.
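The helpers in Figure 6.1 rely on two storage conventions. First, the 6×6 tangent is stored row-major in a 36-component quadrature vector. Second, strains are stored as [ε11, ε22, ε33, γ12, γ13, γ23] with engineering shear strains γij = 2εij, while `sigma(s)` expands six stress components into a symmetric tensor. Under these conventions, `inner(sigma(sig0), grad(v))` in the linear form equals the Voigt product used in the bilinear form. The NumPy check below mirrors the UFL helpers with plain arrays. The names `epsilon_voigt` and `sigma_tensor` are hypothetical counterparts, not part of the FEniCS implementation.

```python
import numpy as np


def tangent(t):
    """Reshape a 36-component vector into a 6x6 matrix, row i holding t[6*i:6*i+6]."""
    return np.asarray(t, dtype=float).reshape(6, 6)


def epsilon_voigt(grad_u):
    """Voigt strain [e11, e22, e33, g12, g13, g23] with engineering shear g_ij = 2 e_ij."""
    normal = [grad_u[i, i] for i in range(3)]
    shear = [grad_u[i, j] + grad_u[j, i] for i, j in [(0, 1), (0, 2), (1, 2)]]
    return np.array(normal + shear)


def sigma_tensor(s):
    """Symmetric 3x3 stress tensor from [s11, s22, s33, s12, s13, s23]."""
    return np.array([[s[0], s[3], s[4]],
                     [s[3], s[1], s[5]],
                     [s[4], s[5], s[2]]])


rng = np.random.default_rng(0)
s = rng.standard_normal(6)            # arbitrary stress components
grad_v = rng.standard_normal((3, 3))  # arbitrary, non-symmetric gradient of v

lhs = np.sum(sigma_tensor(s) * grad_v)  # inner(sigma(s), grad(v))
rhs = s @ epsilon_voigt(grad_v)         # s . epsilon(v)
print(np.isclose(lhs, rhs), tangent(np.arange(36.0))[1, 2])
```

The equality holds because the stress tensor is symmetric, so only the symmetric part of grad(v) contributes. Each off-diagonal stress component therefore multiplies the sum of both shear gradients.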

The lifting-type formulation of the model is considered, and the solver is implemented in the FEniCS framework as outlined in the previous section. Two combinations of finite element discretisations for the displacement and plastic multiplier fields are considered. The first combination uses a continuous, piecewise linear displacement field and a discontinuous, piecewise constant field for the plastic multiplier λ; it is referred to as the P1/P0 case. The second combination uses a continuous, piecewise quadratic displacement field and a discontinuous, piecewise linear field for λ; it is referred to as the P2/P1 case. In the latter case, discontinuous, piecewise linear polynomials are also used for the gradient space Q, although constant elements could be used according to the definition in (5.3). Using equal order elements for the plastic multiplier and the gradient space improves the convergence of the Newton solver when large gradients are present. Similar observations have been reported by Bassi and Rebay (1997). The previous chapter drew conclusions about lifting-type formulations for discontinuous, piecewise constant fields. Based on these conclusions, the stabilisation parameter is set to $\alpha = 10^{-3}$ for all examples to prevent the stabilisation term from governing the solution.

Two types of boundary conditions for λ, see (6.8) and (6.9), are considered in all examples. The first type is referred to as the microhard boundary condition, where $\lambda = 0$ on $\Gamma_D^p = \partial\Omega^p \setminus \partial\Omega$, that is, on the facets of the elastic–plastic boundary that are not located on the exterior of the domain. For the microhard boundary condition, $\nabla\lambda \cdot n = 0$ is imposed on the remaining facets of the plastic boundary, $\Gamma_N^p = \partial\Omega^p \cap \partial\Omega$. The second type is referred to as the microfree boundary condition, where $\nabla\lambda \cdot n = 0$ on $\partial\Omega^p$.
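The two conditions differ only in how the facets of the plastic boundary ∂Ω^p are split between λ = 0 and ∇λ · n = 0. Given a facet-to-cell adjacency map and the set of plastic cells, the split can be sketched as follows. The data layout and the function name `split_plastic_boundary` are hypothetical; in FEniCS this information would come from the mesh connectivity. Facets that are not on ∂Ω^p are ignored.

```python
def split_plastic_boundary(facet_cells, plastic, condition):
    """Split the facets of the plastic boundary into Dirichlet and Neumann sets.

    facet_cells: dict mapping facet id -> tuple of adjacent cell ids
                 (one cell for an exterior facet, two for an interior facet).
    plastic:     set of cell ids in the plastic domain Omega^p.
    condition:   "microhard" or "microfree".
    Returns (dirichlet, neumann): facets with lambda = 0 and with grad(lambda).n = 0.
    """
    if condition not in ("microhard", "microfree"):
        raise ValueError(f"unknown condition: {condition}")
    dirichlet, neumann = set(), set()
    for facet, cells in facet_cells.items():
        n_plastic = sum(c in plastic for c in cells)
        if len(cells) == 1 and n_plastic == 1:
            # Exterior facet of a plastic cell: part of dOmega^p ∩ dOmega.
            neumann.add(facet)
        elif len(cells) == 2 and n_plastic == 1:
            # Interior elastic-plastic interface: part of dOmega^p \ dOmega.
            (dirichlet if condition == "microhard" else neumann).add(facet)
    return dirichlet, neumann


# A chain of five cells; cells 0 and 1 are plastic.
facet_cells = {0: (0,), 1: (0, 1), 2: (1, 2), 3: (2, 3), 4: (3, 4), 5: (4,)}
print(split_plastic_boundary(facet_cells, {0, 1}, "microhard"))  # ({2}, {0})
print(split_plastic_boundary(facet_cells, {0, 1}, "microfree"))  # (set(), {0, 2})
```

Facet 1 lies inside the plastic domain and belongs to neither set.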

6.5.1 Unit square loaded in shear with strain softening

This example considers a unit square under shear loading with strain softening and negligible plastic strain gradients in the plastic domain. The purpose is to demonstrate that the model does not provide a mechanism that guarantees regularisation in the softening regime. The domain, $\Omega = [0, 1] \times [0, 1]$, is divided into $5 \times 5$ cells and each cell is divided into two triangles, see Figure 6.3. The left-hand and right-hand sides of the domain are fixed in the horizontal direction, and the bottom is fixed in the vertical direction for $x \le 0.4$. A sequence of downward displacements is prescribed at the top of the domain for $x \ge 0.6$. An elastic load step of ∆u = 0.0006mm is followed by 14 plastic load steps of ∆u = 0.0001mm, such that the total downward displacement after the final load step is u = 0.002mm. The test is performed under plane strain conditions using the material parameters shown in Table 6.1. The material is weakened in the center of the domain such that $\sigma_y = 150$ MPa in $[0.2, 0.8] \times [0.2, 0.8]$ and for $0.4 \le x \le 0.6$. For this particular example, only P1/P0 elements are considered.

First, the microfree boundary condition, $\nabla\lambda \cdot n = 0$, is applied. The net force

| Parameter | Value [unit] |
|---|---|
| Young’s modulus, E | 200.0E3 [MPa] |
| Poisson’s ratio, ν | 0.3 |
| Yield strength, σy | 200.0 [MPa] |
| Hardening modulus, H | -25.0E3 [MPa] |

Table 6.1: Material parameters for the localisation example.

(Plot: net force [N] versus displacement [mm] for G = 0, 1E2, 1E4 and 1E6.)

Figure 6.2: Load-displacement curve for different values of G using the microfree boundary condition for λ on a unit square loaded in shear.

acting at the top of the domain as a function of the displacement is shown in Figure 6.2 for different values of the gradient parameter G. As expected, the results are independent of the value of G, even for very large values, because no significant plastic strain gradients are present inside the plastic region. The distribution of the λ field after the last load step can be seen in Figure 6.3. Clearly, the distribution of the λ field is independent of the value of the gradient parameter, and the plastic zone is localised in the middle column of elements.

The microhard boundary condition, λ = 0, is now applied. The net force acting at the top of the domain as a function of the displacement is shown in Figure 6.4 for different values of the gradient parameter G. As expected, the results for this type of boundary condition are very sensitive to the value of G. This is because the microhard boundary condition on the elastic–plastic boundary introduces a gradient effect. Note also that the values of G are substantially lower than those used for the microfree case. For the case where G = 50MPa the specimen

(a) G = 0MPa. (b) G = 100MPa. (c) G = 1E6MPa.

Figure 6.3: Localisation of λ in the middle column of elements for different values of the gradient parameter G after the last load step using the microfree boundary condition.

(Plot: net force [N] versus displacement [mm] for G = 0, 50, 250 and 1000.)

Figure 6.4: Load-displacement curve for different values of G using the microhard boundary condition for λ on a unit square loaded in shear.

(a) G = 50MPa. (b) G = 250MPa. (c) G = 1000MPa.

Figure 6.5: Distribution of λ for different values of the gradient parameter G after the last load step using the microhard boundary condition.

still exhibits softening, although less than for G = 0MPa. As the value of G is increased, the softening becomes less pronounced, and for G = 250MPa the load-displacement curve is almost perfectly plastic. In other words, the gradient effect counterbalances the material softening governed by the hardening parameter H. For G = 1000MPa, the specimen exhibits hardening behaviour because the gradient effect becomes dominant compared to the softening term in the yield function. The distribution of the λ field after the last load step for the microhard boundary condition can be seen in Figure 6.5. Note that for the two cases G = 50MPa and G = 250MPa, Figures 6.5a and 6.5b, the plastic zone is still localised in the middle column of elements. However, the values of λ differ from the classical plasticity case, Figure 6.3a: higher values of G correspond to lower values of λ, because the microhard boundary condition drives the values towards zero. When the gradient parameter is large enough for the specimen to enter the hardening regime, the plastic zone expands to the adjacent elements, as shown in Figure 6.5c. The expanding plastic zone also accounts for the jumps observed in the load-displacement curve.

As demonstrated in this example, the model is incapable of providing regularisation under the given conditions when the microfree boundary condition is used. Switching to the microhard boundary condition makes the softening less pronounced for lower values of the gradient parameter, although it does not lead to an expansion of the plastic zone. To make the plastic zone expand, a high value of the gradient parameter is needed. Such a value effectively changes the load-displacement response from softening to hardening.
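The load schedule of this example (one elastic step of 0.0006mm followed by fourteen plastic steps of 0.0001mm) gives a final displacement of 0.0006 + 14 × 0.0001 = 0.002mm. The small helper below lists the cumulative prescribed displacement after each step. The name `load_schedule` is hypothetical.

```python
def load_schedule(du_elastic, n_plastic, du_plastic):
    """Cumulative prescribed displacement after each load step (elastic step first)."""
    u = [du_elastic]
    for _ in range(n_plastic):
        u.append(u[-1] + du_plastic)
    return u


u = load_schedule(0.0006, 14, 0.0001)  # unit square in shear, displacements in mm
print(len(u), round(u[-1], 6))         # 15 0.002
```

The same helper applies to any schedule of this form with one elastic step followed by equal plastic increments.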

6.5.2 Plate under compressive loading with strain softening

This example considers shear band formation in a plate subjected to compressive loading, which means that gradients of the plastic multiplier will be present inside the plastic region. Therefore, in contrast to the previous example, the results will

(a) Mesh 1 consisting of 690 triangles. (b) Mesh 2 consisting of 1566 triangles. (c) Mesh 3 consisting of 6370 triangles.

Figure 6.6: Three unstructured meshes of the plate subjected to compressive loading.

depend on the value of the gradient parameter also when the microfree boundary condition is used. As a consequence, some degree of regularisation of the softening problem can be expected. Three different unstructured meshes, shown in Figure 6.6, are considered to demonstrate the influence of the mesh size in this softening problem. The width of the plate is 10mm, the height is 15mm, and the imperfection in the lower left corner has an extent of 1mm. The left-hand side of the plate is fixed in the horizontal direction, and the bottom is fixed in the vertical direction. The test is performed under plane strain conditions using material parameters identical to those in Table 6.1, except that H = −4000MPa and the yield stress is uniform in the entire domain.
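To relate the three meshes to a length scale, a rough characteristic cell size can be estimated as h ≈ sqrt(2A/n) for n triangles covering an area A. This estimate assumes a uniform mesh of right isosceles triangles and neglects the imperfection, so A ≈ 10 × 15 = 150mm². Because the meshes in Figure 6.6 are unstructured, the values are indicative only. The function name `cell_size` is hypothetical.

```python
import math

AREA = 10.0 * 15.0  # plate area in mm^2; the 1mm imperfection is neglected (assumption)
MESHES = {"Mesh 1": 690, "Mesh 2": 1566, "Mesh 3": 6370}


def cell_size(area, n_triangles):
    """Leg length h of a right isosceles triangle of area area / n_triangles (= h^2 / 2)."""
    return math.sqrt(2.0 * area / n_triangles)


for name, n in MESHES.items():
    print(f"{name}: {n} triangles, h ~ {cell_size(AREA, n):.2f} mm")
```

This gives h of roughly 0.66mm, 0.44mm and 0.22mm for meshes 1 to 3. In other words, mesh 3 is about three times finer than mesh 1.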

P1/P0 elements with G = 0

A sequence of downward displacements is prescribed at the top of the plate. An elastic load step of ∆u = 0.005mm is followed by twelve plastic load steps of ∆u = 0.0025mm, such that the total downward displacement after the final load step is u = 0.035mm. First, the gradient parameter G is set to zero to verify that the classical theory is mesh dependent in the current implementation. The net force acting at the top of the plate as a function of the displacement is shown in Figure 6.7 for the three different meshes. Clearly, the result is mesh dependent in that less energy is dissipated as the mesh is refined. The Newton solver failed to converge for the last load steps in the case of mesh 3. The distribution of the λ field after the last converged load step can be seen in Figure 6.8. As the mesh is refined, the plastic zone localises in a shear band of

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.7: Mesh dependent softening with G = 0MPa (classical plasticity) for the P1/P0 case.

decreasing width, as shown in the figure. Apart from the convergence problems, the overall behaviour is as anticipated. In general, the approach is very sensitive to the choice of model parameters for softening problems. In particular, the step size, the mesh size, and the values of H and G affect the stability of the problem.
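The statement that less energy is dissipated can be quantified by comparing the areas under the load-displacement curves. The area under a curve equals the external work, and subtracting the elastic energy still stored at the end of the test leaves the dissipated energy. Below is a minimal trapezoidal-rule sketch. The function name is hypothetical, and the input is a toy curve rather than data from Figure 6.7.

```python
def external_work(u, F):
    """Trapezoidal approximation of W = integral of F du along a load-displacement curve.

    u: displacements [mm], F: forces [N]; the result is in N mm (= mJ).
    """
    if len(u) != len(F) or len(u) < 2:
        raise ValueError("u and F must have equal length of at least 2")
    return sum(0.5 * (F[i] + F[i + 1]) * (u[i + 1] - u[i]) for i in range(len(u) - 1))


# Toy curve (not data from Figure 6.7): linear loading, then linear softening to zero.
print(external_work([0.0, 1.0, 2.0], [0.0, 2.0, 0.0]))  # 2.0
```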

P1/P0 elements and microfree boundary condition for λ

It is now demonstrated that the results for the microfree boundary condition are influenced by a nonzero gradient parameter. For this example, the gradient parameter is set to G = 200MPa; otherwise, the test setup is identical to the previous example. The resulting load-displacement curve is shown in Figure 6.9. As seen in the figure, the nonzero gradient parameter influences the results because plastic strain gradients are present inside the plastic domain. (Compare with Figure 6.2, where the value of the gradient parameter has no influence on the load-displacement curve.) Although the gradient parameter does influence the results, they are clearly not mesh independent.² Figure 6.10 shows the distribution of λ for the three different meshes after the final load step. The width of the plastic zone is still mesh dependent: for identical values of the gradient parameter, a finer mesh results in a narrower plastic zone than a coarser mesh. However, the width of the plastic zone depends less on the cell size than in the results in Figure 6.8. The reason

² For other values of the gradient parameter a similar story holds: the finer mesh always exhibits more softening than the coarser mesh for a given value of the gradient parameter.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.8: Localisation of λ with G = 0MPa for the three different mesh cases after the last converged load step using P1/P0 elements.

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.9: Load-displacement curve with G = 200MPa using P1/P0 elements and the microfree boundary condition for λ.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.10: Distribution of λ with G = 200MPa for the three different mesh cases after the final load step using P1/P0 elements and the microfree boundary condition for λ. is that the microfree boundary condition only has a regularising effect while the plastic zone is developing. Once the plastic zone is fully developed, there is no mechanism by which the plastic zone can expand as the ‘shape’ of the λ field does not change. Therefore, no additional gradient effects are introduced and plastic flow localises in the zone which is already plastic. The microfree boundary condition is, therefore, not able to produce mesh independent results for this softening problem even if some plastic strain gradients are present inside the plastic domain.

P1/P0 elements and microhard boundary condition for λ

The microhard boundary condition is now enforced on the elastic–plastic boundary. The resulting load-displacement curve is shown in Figure 6.11. As for the simple shear example, see Figure 6.4, the results are influenced by the nonzero gradient parameter. Note that the results in the first part of the load-displacement curve, before the jumps, appear to be less mesh dependent than the results for the microfree boundary condition. For mesh 2, the jump in load bearing capacity at load step 11 is due to the expanding plastic zone, as explained in Section 6.5.1; see also Figures 6.4 and 6.5. This is illustrated in Figure 6.12, which shows the distribution of λ on mesh 2 at load steps 8-13. In load step 8, the plastic zone is fully developed, and in load steps 9 and 10 plastic flow increases inside this zone. However, because λ = 0 is enforced on the boundary, the gradients also increase. This introduces a hardening effect, and as a result the plastic zone expands in load step 11. Then, in load steps 12 and 13, plastic flow simply increases inside the now larger plastic zone. For mesh 3, the solution becomes unstable. This is caused by the spreading of the plastic zone described above in combination with the predictor–corrector algorithm. If the

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.11: Load-displacement curve with G = 200MPa using P1/P0 elements and the microhard boundary condition for λ.

plastic zone expands ‘too much’ during an iteration, $\Delta\lambda_k^{\mathrm{avg}}$ for a given cell T can become negative due to the diffusive nature of the yield function. For stability reasons, line 5 of Algorithm 4 then marks the cell as elastic for the entire load step. However, this introduces an artificial elastic–plastic boundary in the otherwise plastic domain. Because of the boundary conditions for λ, this boundary in turn introduces additional hardening. This is illustrated in Figure 6.13, which shows the distribution of λ on mesh 3 at load steps 6-11. In load steps 6-8, the distribution of λ develops as expected for the given problem. Then, in load steps 9-11, artificial elastic–plastic boundaries develop as loading progresses, which causes ‘plastic islands’ to emerge in the computational domain. This effect naturally has an impact on the load-displacement curve, as already shown in Figure 6.11. For fine meshes, it is difficult to avoid this situation when using the microhard boundary condition for λ. However, while the plastic zone is expanding smoothly, convergence is usually better than in the microfree case. Reducing the load step size does not improve the stability of the algorithm, because even a small expansion of the plastic zone can result in a negative $\Delta\lambda_k^{\mathrm{avg}}$ for a cell well inside the plastic region.
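The rule behind the ‘plastic islands’ can be isolated in a few lines. Once the cell average $\Delta\lambda_k^{\mathrm{avg}}$ of a cell is negative in some iteration, that cell stays elastic for the rest of the load step, even if its average would later become positive again. The sketch below assumes that the cell averages are available as plain lists and illustrates only this bookkeeping. The function name `frozen_cells` is hypothetical, and the code is not taken from Algorithm 4.

```python
def frozen_cells(dlam_avg_per_iteration):
    """Hypothetical sketch of the 'frozen elastic' rule within one load step.

    dlam_avg_per_iteration: one list per Newton iteration holding the cell
    averages of the plastic multiplier increment. A cell whose average becomes
    negative is kept elastic for all remaining iterations of the load step.
    Returns the set of frozen cells after each iteration.
    """
    frozen = set()
    history = []
    for dlam_avg in dlam_avg_per_iteration:
        frozen |= {T for T, d in enumerate(dlam_avg) if d < 0.0}
        history.append(set(frozen))
    return history


# Cell 1 turns negative in iteration 2 and stays frozen even though it recovers.
history = frozen_cells([[1e-4, 2e-4, 3e-4],
                        [1e-4, -1e-6, 3e-4],
                        [1e-4, 2e-4, -1e-6]])
print(history)  # [set(), {1}, {1, 2}]
```

Calling the function again for the next load step starts from an empty set, which mirrors the reset of the frozen state between load steps.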

P2/P1 elements

The influence of using higher order elements for the softening problem is now investigated. Using higher order elements makes the algorithm even more sensitive to the choice of model parameters. The hardening modulus is set to H = −2000MPa

(a) Load step 8. (b) Load step 9. (c) Load step 10.

(d) Load step 11. (e) Load step 12. (f) Load step 13.

Figure 6.12: Development in the distribution of λ at different load steps for G = 200MPa on mesh 2 using P1/P0 elements and the microhard boundary condition for λ.

(a) Load step 6. (b) Load step 7. (c) Load step 8.

(d) Load step 9. (e) Load step 10. (f) Load step 11.

Figure 6.13: Development in the distribution of λ at different load steps for G = 200MPa on mesh 3 using P1/P0 elements and the microhard boundary condition for λ.

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.14: Mesh dependent softening with G = 0MPa (classical plasticity) for the P2/P1 case.

and the size of the plastic load steps is reduced. An elastic load step of ∆u = 0.005mm is followed by sixty plastic load steps of ∆u = 0.0005mm, such that the total downward displacement after the final load step is still u = 0.035mm. Again, the gradient parameter G is set to zero to verify that the classical theory is mesh dependent in the current implementation. The resulting load-displacement curve for the three meshes is shown in Figure 6.14. For this test setup, the Newton solver failed to converge for the last few load steps in the case of mesh 2. For mesh 3, convergence was only achieved for a couple of load steps after the plastic zone was fully developed. The distribution of the λ field after the last converged load step can be seen in Figure 6.15. It is clear that the higher order elements allow the plastic zone to localise in a zone which is only a couple of elements wide. (Compare with Figure 6.8 for the P1/P0 case.)

The microfree boundary condition is now applied for the λ field with G = 200MPa, and the resulting load-displacement curve can be seen in Figure 6.16. Compared to the P1/P0 case, the results are now almost mesh independent. However, the convergence rate of the Newton solver was poor, and for mesh 3 the solver failed to converge after a few plastic load steps. Figure 6.17 shows the distribution of λ for the three different meshes after the last converged load step. The width of the plastic zone is almost identical for the three meshes and depends much less on the cell size than in the results in Figure 6.15. Note that the softening for G = 200MPa in Figure 6.16 is much less pronounced than in Figure 6.14 for

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.15: Localisation of λ with G = 0MPa for the three different mesh cases after the last converged load step using P2/P1 elements.

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.16: Load-displacement curve with G = 200MPa using P2/P1 elements and the microfree boundary condition for λ.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.17: Distribution of λ with G = 200MPa for the three different mesh cases after the final load step using P2/P1 elements and the microfree boundary condition for λ.

G = 0MPa. However, for smaller values of G, the resulting load-displacement curve becomes mesh dependent, similar to that shown in Figure 6.9 for the P1/P0 case, and the Newton solver still fails to converge for mesh 3. Finally, the influence of the microhard boundary condition for λ is demonstrated for the P2/P1 case. As seen in Figure 6.18, the results are highly unstable for the same reasons as in the P1/P0 case, and no softening is observed.

As demonstrated in the previous examples, the model is not suitable for regularising a softening problem, regardless of which type of boundary condition is applied for λ. If the microfree boundary condition is used, the results are independent of the gradient parameter when no gradients of the plastic strain are present inside the plastic domain. Even if gradients are present inside the plastic domain, the results are not necessarily mesh independent for a given value of the gradient parameter. The microhard boundary condition for λ does provide a mechanism by which the plastic zone can expand, but it is also unsuitable for regularising softening problems. The reason is that the plastic zone only expands for sufficiently large values of the gradient parameter, and for such values the boundary condition results in a hardening effect. The expanding plastic zone, in turn, makes the solution algorithm highly unstable. For both types of boundary conditions, convergence problems often arise because the approach is very sensitive to the choice of model parameters.

6.5.3 Plate under compressive loading with strain hardening

The softening problem is now changed into a hardening problem by setting the hardening modulus to H = 2000MPa, in order to investigate whether the numerical difficulties from the softening problem disappear. Mesh 3 is investigated for different values of the gradient parameter G, while all other parameters remain the same as for the

(Plot: net force [N] versus displacement [mm] for Mesh 1, Mesh 2 and Mesh 3.)

Figure 6.18: Load-displacement curve with G = 200MPa using P2/P1 elements and the microhard boundary condition for λ.

P1/P0 case of the previous softening problem. The load-displacement curves for the P1/P0 case using the microfree and microhard boundary conditions can be seen in Figures 6.19 and 6.20 respectively, where the instability has indeed disappeared for all values of G. Note that increasing the value of G seems to have a negligible effect on the results. This is because the microhard boundary condition λ = 0 on the elastic–plastic boundary (inside the computational domain) only results in a gradient effect while the plastic zone is developing. As soon as the plastic zone is fully developed, the positive hardening modulus causes (almost) the entire domain to become plastic. This activates the boundary condition $\nabla\lambda \cdot n = 0$ on $\Gamma_N^p = \partial\Omega^p \cap \partial\Omega$. During continued loading, the value of λ simply increases inside the plastic region without introducing additional gradients, and the results thus become independent of G. Similar results can be seen in Figures 6.21 and 6.22 for the microfree and microhard boundary conditions, respectively, for the P2/P1 case. Again, the results are almost independent of the value of G when the microfree boundary condition is used. When the microhard boundary condition is used, some artificial hardening is introduced for G = 800MPa and G = 1600MPa. The reason is that there is a small elastic region close to the imperfection where yielding initiates. During continued loading, the plastic zone tries to expand into this region. However, due to the high concentration of plastic flow at the imperfection, $\Delta\lambda_k^{\mathrm{avg}}$ becomes negative, which introduces artificial elastic–plastic boundaries as already discussed. Despite this numerical difficulty, the model and implementation appear to be working

(Plot: net force [N] versus displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.19: Influence of different values of the gradient parameter G for mesh 3 and an isotropic linear hardening modulus H = 2000MPa using P1/P0 elements and the microfree boundary condition for λ.

(Plot: net force [N] versus displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.20: Influence of different values of the gradient parameter G for mesh 3 and an isotropic linear hardening modulus H = 2000MPa using P1/P0 elements and the microhard boundary condition for λ.

(Plot: net force [N] versus displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.21: Influence of different values of the gradient parameter G for mesh 3 and an isotropic linear hardening modulus H = 2000MPa using P2/P1 elements and the microfree boundary condition for λ.

(Plot: net force [N] versus displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.22: Influence of different values of the gradient parameter G for mesh 3 and an isotropic linear hardening modulus H = 2000MPa using P2/P1 elements and the microhard boundary condition for λ.

much better for a hardening problem and it is less sensitive to the choice of model parameters.
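The argument for why G drops out once (almost) the entire domain is plastic can be checked with a one-dimensional finite-difference sketch. As an illustrative assumption, the gradient term (the G∇λ·∇w contribution to the bilinear form) is represented by a discrete energy ½G∫|dλ/dx|² dx. A uniform increase of λ leaves this energy unchanged. Pinning λ = 0 at the boundary, as the microhard condition does on an interior elastic–plastic boundary, makes the same increase generate new gradients. The profile and the name `gradient_energy` are hypothetical.

```python
import numpy as np


def gradient_energy(lam, h, G):
    """1D finite-difference approximation of 0.5 * G * integral (dlambda/dx)^2 dx."""
    dlam_dx = np.diff(lam) / h
    return 0.5 * G * h * np.sum(dlam_dx**2)


h, G = 0.1, 800.0
x = np.linspace(0.0, 1.0, 11)                 # 10 intervals of width h
lam = 1e-3 * (1.0 + 0.1 * np.sin(np.pi * x))  # illustrative plastic multiplier profile

E_ref = gradient_energy(lam, h, G)
# Uniform increment (fully plastic, grad(lambda).n = 0): gradients unchanged.
E_free = gradient_energy(lam + 5e-4, h, G)
# Same increment, but lambda = 0 pinned at both ends: new gradients appear.
lam_pinned = lam + 5e-4
lam_pinned[[0, -1]] = 0.0
E_pinned = gradient_energy(lam_pinned, h, G)
print(np.isclose(E_ref, E_free), E_pinned > E_ref)  # True True
```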

6.5.4 Micro-indentation

As shown for the hardening problem in the previous section, the value of the gradient parameter had a negligible effect on the load-displacement curves, because the microfree boundary condition on the exterior boundary did not introduce additional hardening. Therefore, a micro-indentation problem is now investigated. It demonstrates that the model is capable of capturing size effects when the microhard boundary condition for λ is used on an elastic–plastic boundary located inside the computational domain. A three-dimensional model problem is considered in which the specimen of interest has a width of 10mm and a height of 5mm. The specimen is constrained such that displacements in the normal direction on the four sides and at the bottom are prevented. The indenter is located at the center of the top part of the domain. It has a spherical tip with a radius of 1mm and is initially embedded within the specimen, to which it is rigidly attached, at a depth equal to the radius. The domain in this initial state is assumed to be stress free. A sequence of downward displacements, measured from this initial state, is prescribed on the indenter. An elastic load step of ∆u = 0.0008mm is followed by seven plastic load steps of ∆u = 0.0004mm, such that the total downward displacement after the final load step is u = 0.0036mm. Rather than modelling the indenter explicitly, the prescribed displacements are imposed on the degrees of freedom located on the surface of the indenter. Due to the symmetry of the problem, only one quarter of the domain is modelled. A front and top view of the computational mesh used for this problem is shown in Figure 6.23. The mesh is refined in the region around the indenter tip. The material parameters for this example are identical to those in Table 6.1, except that H = 2000MPa.
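Imposing the indenter displacement on degrees of freedom, rather than modelling a contact body, amounts to selecting the mesh nodes that lie on the embedded spherical surface. The NumPy sketch below rests on several assumptions about the coordinate system:

- the quarter model has its symmetry planes at x = 0 and y = 0;
- the specimen occupies 0 ≤ z ≤ 5mm;
- the tip centre lies on the top surface, because the tip is embedded to a depth equal to its radius.

The function `indenter_nodes` and this coordinate system are hypothetical. The cumulative prescribed displacement follows 0.0008 + 0.0004k mm for k = 0 to 7.

```python
import numpy as np

RADIUS = 1.0                        # indenter tip radius [mm]
CENTER = np.array([0.0, 0.0, 5.0])  # assumed tip centre: symmetry corner, top surface


def indenter_nodes(coords, radius=RADIUS, center=CENTER, tol=1e-8):
    """Indices of nodes on the embedded spherical indenter surface (below the top face)."""
    dist = np.linalg.norm(coords - center, axis=1)
    on_sphere = np.abs(dist - radius) < tol
    below_top = coords[:, 2] <= center[2] + tol
    return np.flatnonzero(on_sphere & below_top)


coords = np.array([[1.0, 0.0, 5.0],   # on the sphere, at the top surface
                   [0.0, 0.0, 4.0],   # lowest point of the tip
                   [0.0, 0.6, 4.2],   # on the sphere: 0.6^2 + 0.8^2 = 1
                   [2.0, 2.0, 5.0]])  # away from the indenter
nodes = indenter_nodes(coords)

# Cumulative downward displacement [mm]: one elastic and seven plastic steps.
u_steps = [0.0008 + 0.0004 * k for k in range(8)]
uz = {int(i): -u_steps[-1] for i in nodes}  # vertical displacement at the final step
print(nodes, round(u_steps[-1], 6))
```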
The net force acting on the indenter tip as a function of the indentation depth for the P1/P0 case using the microfree boundary condition is shown in Figure 6.24. Increasing the value of G has only a small effect on the load–displacement curve. In contrast, the load–displacement curve for the microhard boundary condition, shown in Figure 6.25, depends much more strongly on the value of G. After load step 6, the load bearing capacity for G = 1600MPa and G = 3200MPa increases dramatically. Again, this can be attributed to the effect of forcing cells to be elastic in Algorithm 4 if $\Delta\lambda_k^{\mathrm{avg}}$ is negative, as explained in the previous section. Figures 6.26 and 6.27 show the load–displacement curves for the P2/P1 case using the microfree and microhard boundary conditions, respectively. For the microfree boundary condition, the effect of increasing G has completely vanished.

(a) Front view. (b) Top view.

Figure 6.23: Finite element mesh for the micro-indentation example consisting of 10979 tetrahedra.

(Plot: net force [N] versus indenter displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.24: The resulting force on the indenter as a function of the indentation depth for different values of the gradient parameter G using P1/P0 elements and the microfree boundary condition for λ.

(Plot: net force [N] versus indenter displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.25: The resulting force on the indenter as a function of the indentation depth for different values of the gradient parameter G using P1/P0 elements and the microhard boundary condition for λ.

Again, the microhard boundary condition results in a load–displacement curve which is clearly influenced by the value of G. Also note that the sudden jumps in load bearing capacity from Figure 6.25 have disappeared. As demonstrated in these numerical experiments, one should use the microhard boundary condition to model size effects with the current gradient plasticity model. Figures 6.28 and 6.29 show the distribution of λ at the final load step for different values of the gradient parameter for the P2/P1 case using microfree and microhard boundary conditions respectively. For both the microfree and microhard cases the extent of the plastic region increases and the distribution of the λ field becomes smoother as the value of the gradient parameter is increased. The extent of the plastic region is comparable for the microfree and microhard cases, but note that the values of λ near the boundary are much smaller, as they should be, for the microhard case. The gradient of λ is thus larger which is reflected in the load–displacement curve.

6.5.5 Computational notes

As demonstrated in the previous four sections, the gradient plasticity model is not suitable for producing mesh independent results for softening problems, but it is able to capture size effects for the micro-indentation test when the microhard boundary condition for λ is used. However, the numerical experiments also revealed that the computed solutions are sensitive to the effect of the evolving

(Plot: net force [N] versus indenter displacement [mm] for G = 0, 800, 1600 and 3200.)

Figure 6.26: The resulting force on the indenter as a function of the indentation depth for different values of the gradient parameter G using P2/P1 elements and the microfree boundary condition for λ.

500

400

300

Net force [N] 200

100 G = 0 G = 800 G = 1600 G = 3200 0 0 0.001 0.002 0.003 0.004 Indenter displacement [mm]

Figure 6.27: The resulting force on the indenter as a function of the indentation depth for different values of the gradient parameter G using P2/P1 elements and the microhard boundary condition for λ. 164 Chapter 6. Strain gradient plasticity

(a) G = 0MPa. (b) G = 800MPa.

(c) G = 1600MPa. (d) G = 3200MPa.

Figure 6.28: Close-ups of the region around the indenter tip which show the distribution of λ at the final load step for different values of G using P2/P1 elements and the microfree boundary condition for λ. 6.5. Numerical examples 165

(a) G = 0MPa. (b) G = 800MPa.

(c) G = 1600MPa. (d) G = 3200MPa.

Figure 6.29: Close-ups of the region around the indenter tip which show the distribution of λ at the final load step for different values of G using P2/P1 elements and the microhard boundary condition for λ. 166 Chapter 6. Strain gradient plasticity elastic–plastic boundary particularly when the microhard boundary condition for λ is used. The problem does not appear to originate from the gradient model, the for- mulation and linearisation or the implementation of the variational forms. This conclusion is based on the observation that the convergence of the Newton solver during a load step is quadratic provided that the plastic region does not expand. If indeed the plastic region expands during a load step, the Newton solver does not begin to converge until the plastic region becomes stable after which convergence is quadratic. The sensitivity of the solution with respect to the evolving elastic–plastic bound- ary can, therefore, be attributed to the numerical effect caused by line 5 in Algo- avg rithm4 where a cell T is forced to be elastic during a load step should ∆λk be negative for the given cell at a given iteration. This line is, however, needed to ensure convergence in situations where a cell switches back and forth between being elastic and plastic during iterations. Essentially, problems arise as the nonlinear equations are solved using an iterative procedure where in each iteration the computational domain might change. Future work in this regard involves investigating different approaches in order to alleviate this numerical problem. Firstly, a different way of stabilising the algorithm in which cells are not forced to be elastic might be considered. Secondly, a staggered approach to solving the coupled nonlinear equations and the evolving elastic–plastic boundary can be pursued. Thirdly, an adaptive mesh refinement scheme in front of the evolving elastic–plastic boundary can be implemented to allow the plastic region to expand smoothly. 
The last approach should probably be used in combination with adaptive mesh coarsening behind the evolving elastic– plastic boundary in order to reduce computational cost. 7 Conclusions and future developments

In this work, the automated modelling framework of FEniCS has been extended in a number of directions with the aim of facilitating rapid implementation and testing for a wider range of problems. The developed extensions are widely used by researchers and application developers in a number of different fields; see the introduction to Chapter 3 and Section 4.2.5 for examples. The main contributions can be summarised as follows. Efficiency is an issue when large-scale problems are solved using the finite element method. The development of the quadrature representation and its optimisations has, therefore, extended the applicability of the automated modelling concepts to more complex problems. Discontinuous Galerkin methods, and methods that use discontinuous Galerkin concepts, may be applied to problems other than the strain gradient plasticity problem demonstrated in this work. The extensions to FEniCS for discontinuous Galerkin methods developed in this work, therefore, also apply to these problems. Finally, the quadrature element, developed for the correct linearisation of plasticity problems, can be used for other problems where functions do not come from a finite element space.

Conclusions

The main conclusions of this work relate to the representations and optimisations of finite element forms, the automation of discontinuous Galerkin methods and strain gradient plasticity. Numerical experiments have shown that the relative run-time performance of the quadrature representation and the tensor contraction representation can differ substantially depending on the nature of the considered variational form. In general, the tensor contraction approach deals well with forms which involve high-order bases and few coefficient functions, whereas the quadrature representation is more efficient as the number of coefficient functions (other than constant coefficients) and derivatives in a form increases. Hence, the quadrature representation is, in general, significantly faster for more complicated forms. Furthermore, it has been shown that quadrature optimisations can have a significant impact on run-time performance. It is, therefore, desirable to select the most favourable representation and optimisation strategy based on an a priori inspection of the variational form. However, the code with the lowest number of flops, at least for the quadrature representation, does not always perform best for a given form. In addition, the run-time performance even depends on which C++ compiler options are used. A strategy for selecting between representations and optimisations based only on an estimate of the number of flops therefore does not seem feasible. By developing extensions that support discontinuous Galerkin methods, a range of discontinuous variational formulations can be implemented in a relatively straightforward fashion in the FEniCS framework. Moreover, the new abstractions permit other formulations that build on concepts from discontinuous Galerkin methods to be implemented by using the developed extensions as building blocks.
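To make the difference between the two representations concrete, the following hand-written Python sketch computes the element matrix of the weighted mass form ∫ f u v dx for linear (P1) Lagrange elements on a single triangle in both ways. It is an illustration under simplifying assumptions, not code produced by the form compiler: the quadrature version loops over a standard four-point rule (with a negative centroid weight) that is exact for cubic polynomials, while the tensor contraction version precomputes a reference tensor A0[i][j][a] = ∫ φ_i φ_j φ_a over the reference triangle and contracts it with a cell-dependent geometry tensor built from the nodal values of f and the Jacobian determinant. All function names are hypothetical. Each additional coefficient function would add one index to the reference tensor, which is one way to see why the tensor contraction approach loses ground as the number of coefficients grows.

```python
from math import factorial

# P1 basis functions on a triangle are the barycentric coordinates, so each
# quadrature point is stored as its barycentric triple (l0, l1, l2).

# Four-point rule on the reference triangle (area 1/2), exact for cubics.
QUAD_POINTS = [
    (1 / 3, 1 / 3, 1 / 3),
    (0.6, 0.2, 0.2),
    (0.2, 0.6, 0.2),
    (0.2, 0.2, 0.6),
]
QUAD_WEIGHTS = [-27 / 96, 25 / 96, 25 / 96, 25 / 96]


def jacobian_determinant(vertices):
    """Absolute determinant of the affine map from the reference triangle."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def mass_matrix_quadrature(vertices, f):
    """Element matrix of f*u*v computed by looping over quadrature points.

    f holds the three nodal values of the P1 coefficient function.
    """
    det_j = jacobian_determinant(vertices)
    A = [[0.0] * 3 for _ in range(3)]
    for w, lam in zip(QUAD_WEIGHTS, QUAD_POINTS):
        f_q = sum(f[a] * lam[a] for a in range(3))  # coefficient at the point
        for i in range(3):
            for j in range(3):
                A[i][j] += w * f_q * lam[i] * lam[j] * det_j
    return A


def reference_tensor():
    """A0[i][j][a] = integral of phi_i*phi_j*phi_a over the reference triangle.

    Uses the exact formula for products of barycentric coordinates on the
    reference triangle: k0! k1! k2! / (k0 + k1 + k2 + 2)!.
    """
    A0 = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for i in range(3):
        for j in range(3):
            for a in range(3):
                k = [0, 0, 0]
                for idx in (i, j, a):
                    k[idx] += 1  # exponent of each barycentric coordinate
                num = factorial(k[0]) * factorial(k[1]) * factorial(k[2])
                A0[i][j][a] = num / factorial(sum(k) + 2)
    return A0


A0_REF = reference_tensor()  # computed once, independent of the cell


def mass_matrix_tensor(vertices, f):
    """Element matrix of f*u*v computed by contracting the reference tensor."""
    det_j = jacobian_determinant(vertices)
    G = [f[a] * det_j for a in range(3)]  # geometry tensor (cell dependent)
    return [[sum(A0_REF[i][j][a] * G[a] for a in range(3)) for j in range(3)]
            for i in range(3)]
```

For example, with vertices (0, 0), (2, 0), (0, 1) and nodal values f = (1, 2, 3), both functions return the same 3 × 3 matrix up to rounding. The example is far too small to say anything about performance; as noted above, which representation is faster depends on the form and even on the compiler options.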
This has been demonstrated in a semi-automated implementation of a lifting-type discontinuous Galerkin formulation for the Poisson equation. Compared with the interior penalty formulation, the lifting-type formulation has two main advantages in the context of this work. Firstly, it is stable for all positive values of the stabilisation parameter. Secondly, numerical experiments indicate that a piecewise constant basis can be used for the Poisson equation, something which is not possible when using the interior penalty method. The Aifantis strain gradient plasticity model was implemented in the FEniCS framework using a continuous, piecewise linear displacement field and a discontinuous, piecewise constant field for the plastic multiplier. The latter was possible because a lifting-type discontinuous Galerkin formulation was used for the plastic multiplier. The implementation was also tested successfully for a continuous, piecewise quadratic displacement field and a discontinuous, piecewise linear field for the plastic multiplier. It was demonstrated that the model is not suitable for softening problems. Size effects, on the other hand, were observed for a hardening problem in the micro-indentation example, provided that the microhard boundary condition was employed for the plastic multiplier. Some numerical problems were, however, observed during load steps in which the plastic region did not expand smoothly. The observed problems originate from the algorithm that updates the state variables, since it forces an element to be elastic during a load step if the average of the total increment of the plastic multiplier for that element becomes negative in a given iteration. This issue should be resolved in order to produce reliable results.
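The state-update rule behind these problems can be illustrated with a small Python sketch. It is a hypothetical reconstruction from the description above, not a transcription of Algorithm 4: the names `CellState`, `update_cell_states` and `start_load_step` are invented for illustration, and the yield check is taken as a given input because its details are not described here. The essential feature is the latch: once the cell average of the total increment of the plastic multiplier becomes negative, the cell is kept elastic for the remainder of the load step. This prevents the cell from switching back and forth between the elastic and plastic states, but it also freezes the cell's state, which is the behaviour identified above as the source of the problems.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    """Hypothetical per-cell state kept during one load step."""
    plastic: bool = False
    forced_elastic: bool = False  # latched once the average increment < 0


def start_load_step(states):
    """Release the latch at the beginning of each load step."""
    for state in states:
        state.forced_elastic = False


def update_cell_states(states, yielding, dlambda_avg):
    """Update the elastic/plastic flag of every cell after one iteration.

    states      -- list of CellState, one per cell, kept across iterations
    yielding    -- list of bool, True if the cell's trial state violates the
                   yield condition (how this is computed is not modelled here)
    dlambda_avg -- list of float, cell average of the total increment of the
                   plastic multiplier in the current load step
    Returns the number of cells whose flag changed in this iteration.
    """
    changed = 0
    for state, is_yielding, dl in zip(states, yielding, dlambda_avg):
        if dl < 0.0:
            # Force the cell to stay elastic for the rest of the load step.
            state.forced_elastic = True
        new_plastic = is_yielding and not state.forced_elastic
        if new_plastic != state.plastic:
            changed += 1
        state.plastic = new_plastic
    return changed
```

In a driver loop, `start_load_step` would be called once per load step and `update_cell_states` after each Newton iteration; a return value of zero means that the plastic region did not change in that iteration, which is the situation in which quadratic convergence of the Newton solver was reported.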

Future developments

Using this work as a basis, the following future developments of the FEniCS framework could be of interest. As the user base of FEniCS grows, so does the demand for solving problems of increasing complexity. Therefore, continued investigations into further optimising the quadrature representation are desirable. The optimisations should focus on both the run-time performance of the generated code and compile-time performance. Two areas of particular importance in terms of compile-time performance are the size of the generated code and the speed of the code generation stage. Related to these developments is the automatic selection of representation and/or optimisation strategy. The advantages of the lifting-type formulation mentioned above come at a price. The three main drawbacks of the lifting-type formulation are that the formulation is more complex, the local assembly is more expensive to perform and the global tensor arising from assembling the variational form becomes less sparse. The latter drawback is difficult to remedy, but the first drawback can be alleviated by adding fully automated support for lifting-type discontinuous Galerkin formulations. As it is not entirely clear how this should be implemented, a first step is to address the second drawback by improving the algorithm that evaluates the lifting function in the current semi-automated approach, which will make the local assembly less expensive. In order to improve the current implementation of the Aifantis model, the following approaches, outlined in the previous chapter, may be attempted. Firstly, a different way of stabilising the algorithm, in which cells are not forced to be elastic, might be considered. Secondly, a staggered approach to solving the coupled nonlinear equations and the evolving elastic–plastic boundary can be pursued. Thirdly, an adaptive mesh refinement scheme in front of the evolving elastic–plastic boundary can be implemented to allow the plastic region to expand smoothly.
The last approach should probably be used in combination with adaptive mesh coarsening behind the evolving elastic–plastic boundary in order to reduce computational cost. Finally, as the overall aim of this work was to promote rapid prototyping and testing of complex problems while maintaining high performance, it seems natural to implement other gradient models using the FEniCS framework. To facilitate this, continued development of the FEniCS Solid Mechanics library to improve its interface is important. For solid mechanics problems in general, continued development of support for isoparametric elements and shell problems within the FEniCS framework is also important.
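For the adaptive refinement approach, one simple way to decide where to refine is to mark the elastic cells that lie within a fixed number of facet-neighbour layers of the current plastic region. The sketch below is an illustrative assumption, not an existing FEniCS function: the mesh is represented by a hypothetical cell-to-neighbour adjacency list, and `mark_for_refinement` performs a breadth-first search that starts from all plastic cells at once, so each cell's recorded distance is its number of layers from the nearest plastic cell.

```python
from collections import deque


def mark_for_refinement(neighbours, plastic, layers=1):
    """Return the elastic cells within `layers` neighbour steps of a plastic cell.

    neighbours -- list of lists, neighbours[c] are the cells sharing a facet with c
    plastic    -- list of bool, True if cell c is currently plastic
    layers     -- width of the refinement band in front of the plastic region
    """
    # Multi-source breadth-first search: every plastic cell has distance 0.
    distance = {c: 0 for c, is_plastic in enumerate(plastic) if is_plastic}
    queue = deque(distance)
    while queue:
        c = queue.popleft()
        if distance[c] == layers:
            continue  # band is wide enough; do not search further from here
        for n in neighbours[c]:
            if n not in distance:
                distance[n] = distance[c] + 1
                queue.append(n)
    # Cells with distance > 0 are elastic cells in front of the plastic region.
    return {c for c, d in distance.items() if d > 0}
```

For a one-dimensional chain of six cells whose first two cells are plastic, a band of one layer marks only the third cell, and a band of two layers marks the third and fourth. Cells far inside the plastic region, behind the front, could be found in the same way by starting the search from the elastic cells instead, which might serve as candidates for coarsening.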

References

Aho, A. V., Sethi, R., and Ullman, J. D. (1986). Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading, Massachusetts.

Aifantis, E. C. (1984). On the microstructural origin of certain inelastic models. Journal of Engineering Materials and Technology, 106(4):326–330.

Aifantis, E. C. (1987). The physics of plastic deformation. International Journal of Plasticity, 3:211–247.

Allen, G., Benger, W., Goodale, T., Hege, H.-C., Lanfermann, G., Merzky, A., Radke, T., Seidel, E., and Shalf, J. (2000). The Cactus code: a problem solving environment for the grid. In Proceedings of the Ninth International Symposium on High-Performance Distributed Computing, pages 253–260.

Alnæs, M. S. (2012). UFL: a finite element form language. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 17. Springer.

Alnæs, M. S., Logg, A., and Mardal, K.-A. (2012). UFC: a finite element code generation interface. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 16. Springer.

Alnæs, M. S., Logg, A., Mardal, K.-A., Skavhaug, O., and Langtangen, H. P. (2009). Unified framework for finite element assembly. International Journal of Computational Science and Engineering, 4(4):231–244.

Alnæs, M. S., Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2013). Unified Form Language: A domain-specific language for weak formulations of partial differential equations. ACM Transactions on Mathematical Software, To appear. http://arxiv.org/abs/1211.4047.

Alnæs, M. S. and Mardal, K.-A. (2010). On the efficiency of symbolic computations combined with code generation for finite element methods. ACM Transactions on Mathematical Software, 37(1).

Alnæs, M. S. and Mardal, K.-A. (2012). SyFi and SFC: Symbolic finite elements and form compilation. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 15. Springer.

Arnold, D. N., Brezzi, F., Cockburn, B., and Marini, L. D. (2002). Unified analysis for discontinuous Galerkin methods for elliptic problems. SIAM Journal on Numerical Analysis, 39(5):1749–1779.

Baker, G. A., Jureidini, W. N., and Karakashian, O. A. (1990). Piecewise solenoidal vector fields and the Stokes problem. SIAM Journal on Numerical Analysis, 27(6):1466–1485.

Balay, S., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H. (2001). PETSc Web page. http://www.mcs.anl.gov/petsc/.

Bangerth, W., Hartmann, R., and Kanschat, G. (2007). deal.II – a general-purpose object-oriented finite element library. ACM Transactions on Mathematical Software, 33(4).

Bassi, F. and Rebay, S. (1997). A high-order accurate discontinuous finite element method for the numerical solution of the compressible Navier–Stokes equations. Journal of Computational Physics, 131(2):267–279.

Bassi, F. and Rebay, S. (2002). Numerical evaluation of two discontinuous Galerkin methods for the compressible Navier–Stokes equations. International Journal for Numerical Methods in Fluids, 40(1–2):197–207.

Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Kornhuber, R., Ohlberger, M., and Sander, O. (2008a). A Generic Grid Interface for Parallel and Adaptive Scientific Computing. Part II: Implementation and Tests in DUNE. Computing, 82(2–3):121–138.

Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Ohlberger, M., and Sander, O. (2008b). A Generic Grid Interface for Parallel and Adaptive Scientific Computing. Part I: Abstract Framework. Computing, 82(2–3):103–119.

Begley, M. R. and Hutchinson, J. W. (1998). The mechanics of size-dependent indentation. Journal of the Mechanics and Physics of Solids, 46(10):2049–2068.

Bonet, J. and Wood, R. D. (1997). Nonlinear Continuum Mechanics for Finite Element Analysis. Cambridge University Press.

Brandenburg, C., Lindemann, F., Ulbrich, M., and Ulbrich, S. (2012). Advanced numerical methods for PDE constrained optimization with application to optimal design in Navier Stokes flow. In Leugering, G., Engell, S., Griewank, A., Hinze, M., Rannacher, R., Schulz, V., Ulbrich, M., and Ulbrich, S., editors, Constrained Optimization and Optimal Control for Partial Differential Equations, volume 160 of International Series of Numerical Mathematics, pages 257–275. Springer Basel.

Brezzi, F., Douglas Jr., J., and Marini, L. D. (1985). Two families of mixed finite elements for second order elliptic problems. Numerische Mathematik, 47:217–235.

Brezzi, F., Manzini, G., Marini, D., Pietra, P., and Russo, A. (2000). Discontinuous Galerkin approximations for elliptic problems. Numerical Methods for Partial Differential Equations, 16(4):365–378.

Clason, C. and Kunisch, K. (2012). A measure space approach to optimal source placement. Computational Optimization and Applications, 53(1):155–171.

De Borst, R. and Mühlhaus, H.-B. (1992). Gradient-dependent plasticity: Formulation and algorithmic aspects. International Journal for Numerical Methods in Engineering, 35(3):521–539.

De Borst, R. and Pamin, J. (1996). Some novel developments in finite element procedures for gradient-dependent plasticity. International Journal for Numerical Methods in Engineering, 39(14):2477–2505.

Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007a). A discontinuous Galerkin formulation for classical and gradient plasticity – Part 1: Formulation and analysis. Computer Methods in Applied Mechanics and Engineering, 196(37–40):3881–3897.

Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007b). A discontinuous Galerkin formulation for classical and gradient plasticity. Part 2: Algorithms and numerical analysis. Computer Methods in Applied Mechanics and Engineering, 197(1–4):1–21.

Dular, P., Geuzaine, C., Henrotte, F., and Legros, W. (1998). A general environment for the treatment of discrete problems and its application to the finite element method. IEEE Transactions on Magnetics, 34(5):3395–3398.

Dung, N. T. and Wells, G. N. (2006). A study of discontinuous Galerkin methods for thin bending problems. In III European Conference on Computational Mechanics: Solids, Structures and Coupled Problems in Engineering, Lisbon, Portugal.

Dung, N. T. and Wells, G. N. (2008). Geometrically nonlinear formulation for thin shells without rotation degrees of freedom. Computer Methods in Applied Mechanics and Engineering, 197(33–40):2778–2788.

Engel, G., Garikipati, K., Hughes, T. J. R., Larson, M. G., and Taylor, R. L. (2002). Continuous/discontinuous finite element approximations of fourth-order elliptic problems in structural and continuum mechanics with applications to thin beams and plates, and strain gradient elasticity. Computer Methods in Applied Mechanics and Engineering, 191(34):3669–3750.

Engelen, R. A. B., Fleck, N. A., Peerlings, R. H. J., and Geers, M. G. D. (2006). An evaluation of higher-order plasticity theories for predicting size effects and localisation. International Journal of Solids and Structures, 43(7–8):1857–1877.

Fleck, N. and Hutchinson, J. (1997). Strain gradient plasticity. In Advances in Applied Mechanics, volume 33, pages 295–361. Elsevier.

Fleck, N. and Hutchinson, J. (2001). A reformulation of strain gradient plasticity. Journal of the Mechanics and Physics of Solids, 49(10):2245–2271.

Fleck, N., Muller, G., Ashby, M., and Hutchinson, J. (1994). Strain gradient plasticity: theory and experiment. Acta Metallurgica et Materialia, 42(2):475–487.

Funke, S. W. and Farrell, P. E. (2013). A framework for automated PDE-constrained optimisation. arXiv preprint. http://arxiv.org/abs/1302.3894.

Gao, H., Huang, Y., Nix, W., and Hutchinson, J. (1999). Mechanism-based strain gradient plasticity – I. Theory. Journal of the Mechanics and Physics of Solids, 47(6):1239–1263.

Giesselmann, J., Makridakis, C., and Pryer, T. (2012). Energy consistent DG methods for the Navier–Stokes–Korteweg system. arXiv preprint. http://arxiv.org/abs/1207.4647.

Grandi, D., Maraldi, M., and Molari, L. (2012). A macroscale phase-field model for shape memory alloys with non-isothermal effects: Influence of strain rate and environmental conditions on the mechanical response. Acta Materialia, 60(1):179–191.

Gudmundson, P. (2004). A unified treatment of strain gradient plasticity. Journal of the Mechanics and Physics of Solids, 52(6):1379–1406.

Gurtin, M. E. (2004). A gradient theory of small-deformation isotropic plasticity that accounts for the Burgers vector and for dissipation due to plastic spin. Journal of the Mechanics and Physics of Solids, 52(11):2545–2568.

Gurtin, M. E. and Needleman, A. (2005). Boundary conditions in small-deformation, single-crystal plasticity that account for the Burgers vector. Journal of the Mechanics and Physics of Solids, 53(1):1–31.

Heumann, H. and Hiptmair, R. (2012). Stabilized Galerkin methods for magnetic advection. ETH Zürich. ftp://ftp.sam.math.ethz.ch/pub/sam-reports/reports/reports2012/2012-26.pdf.

Hilber, H. M., Hughes, T. J., and Taylor, R. L. (1977). Improved numerical dissipation for time integration algorithms in structural dynamics. Earthquake Engineering & Structural Dynamics, 5(3):283–292.

Hoffman, J., Jansson, J., de Abreu, R. V., Degirmenci, N. C., Jansson, N., Müller, K., Nazarov, M., and Spühler, J. H. (2013). Unicorn: Parallel adaptive finite element simulation of turbulent flow and fluid–structure interaction for deforming domains and complex geometry. Computers & Fluids, 80:310–319. Selected contributions of the 23rd International Conference on Parallel Fluid Dynamics.

Holzapfel, G. A. (2000). Nonlinear Solid Mechanics: A Continuum Approach for Engineering. John Wiley & Sons.

Horst, T., Heinrich, G., Schneider, M., Schulze, A., and Rennert, M. (2013). Linking mesoscopic and macroscopic aspects of crack propagation in elastomers. In Grellmann, W., Heinrich, G., Kaliske, M., Klüppel, M., Schneider, K., and Vilgis, T., editors, Fracture Mechanics and Statistical Mechanics of Reinforced Elastomeric Blends, volume 70 of Lecture Notes in Applied and Computational Mechanics, pages 129–165. Springer Berlin Heidelberg.

Hosangadi, A., Fallah, F., and Kastner, R. (2006). Optimizing polynomial expressions by algebraic factorization and common subexpression elimination. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(10):2012–2022.

Hughes, T. J. R., Scovazzi, G., Bochev, P. B., and Buffa, A. (2006). A multiscale discontinuous Galerkin method with the computational structure of a continuous Galerkin method. Computer Methods in Applied Mechanics and Engineering, 195(19–22):2761–2787.

Jansson, N., Hoffman, J., and Nazarov, M. (2011). Adaptive simulation of turbulent flow past a full car model. In 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–8. IEEE.

Karniadakis, G. E. and Sherwin, S. J. (2005). Spectral/hp Element Methods for Computational Fluid Dynamics. Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford, second edition.

Kirby, R. C. (2004). Algorithm 839: FIAT, A new paradigm for computing finite element basis functions. ACM Transactions on Mathematical Software, 30:502–516.

Kirby, R. C. (2012). FIAT: Numerical construction of finite element basis functions. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 13. Springer.

Kirby, R. C., Knepley, M. G., Logg, A., and Scott, L. R. (2005). Optimizing the evaluation of finite element matrices. SIAM Journal on Scientific Computing, 27(3):741–758.

Kirby, R. C. and Logg, A. (2006). A compiler for variational forms. ACM Transactions on Mathematical Software, 32:417–444.

Kirby, R. C. and Logg, A. (2007). Efficient compilation of a class of variational forms. ACM Transactions on Mathematical Software, 33(3).

Kirby, R. C. and Logg, A. (2008). Benchmarking domain-specific compiler optimizations for variational forms. ACM Transactions on Mathematical Software, 35(2).

Kirby, R. C., Logg, A., Scott, L. R., and Terrel, A. R. (2006). Topological optimization of the evaluation of finite element matrices. SIAM Journal on Scientific Computing, 28(1):224–240.

Korelc, J. (1997). Automatic generation of finite-element code by simultaneous optimization of expressions. Theoretical Computer Science, 187(1–2):231–248.

Labeur, R. and Wells, G. (2012). Energy stable and momentum conserving hybrid finite element method for the incompressible Navier–Stokes equations. SIAM Journal on Scientific Computing, 34(2):889–913.

Labeur, R. J. and Wells, G. N. (2007). A Galerkin interface stabilisation method for the advection-diffusion and incompressible Navier–Stokes equations. Computer Methods in Applied Mechanics and Engineering, 196(49–52):4985–5000.

Labeur, R. J. and Wells, G. N. (2009). Interface stabilised finite element method for moving domains and free surface flows. Computer Methods in Applied Mechanics and Engineering, 198(5–8):615–630.

Lakkis, O. and Pryer, T. (2011). A finite element method for fully nonlinear elliptic problems. arXiv preprint. http://arxiv.org/abs/1103.2970.

Langtangen, H. P. (1999). Computational partial differential equations: numerical methods and Diffpack programming. Springer Verlag.

Lezar, E. and Davidson, D. (2012). Electromagnetic waveguide analysis. In Logg, A., Mardal, K.-A., and Wells, G., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 34, pages 629–642. Springer Berlin Heidelberg.

Logg, A., Mardal, K.-A., and Wells, G. N., editors (2012a). Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering. Springer.

Logg, A., Mardal, K.-A., and Wells, G. N. (2012b). Finite element assembly. In Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 6. Springer.

Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2012c). FFC: the FEniCS form compiler. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 11. Springer.

Logg, A. and Wells, G. N. (2010). DOLFIN: Automated finite element computing. ACM Transactions on Mathematical Software, 37(2):20:1–20:28.

Logg, A., Wells, G. N., and Hake, J. (2012d). DOLFIN: A C++/Python finite element library. In Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 10. Springer.

Long, K., Kirby, R., and Van Bloemen Waanders, B. (2010). Unified embedded parallel finite element computations via software-based Fréchet differentiation. SIAM Journal on Scientific Computing, 32:3323–3351.

Lopes, N., Pereira, P., and Trabucho, L. (2011). A numerical analysis of a class of generalized Boussinesq-type equations using continuous/discontinuous FEM. International Journal for Numerical Methods in Fluids, 69(7):1186–1218.

Lubliner, J. (2008). Plasticity Theory. Dover Publications.

Luo, C. and Calderer, M. C. (2012). Numerical study of liquid crystal elastomers by a mixed finite element method. European Journal of Applied Mathematics, 23:121–154.

Maraldi, M., Molari, L., and Grandi, D. (2012). A unified thermodynamic framework for the modelling of diffusive and displacive phase transitions. International Journal of Engineering Science, 50(1):31–45.

Maraldi, M., Wells, G., and Molari, L. (2011). Phase field model for coupled displacive and diffusive microstructural processes under thermal loading. Journal of the Mechanics and Physics of Solids, 59(8):1596–1612.

Marchand, R. and Davidson, D. (2011). The method of manufactured solutions for the verification of computational electromagnetics. In 2011 International Conference on Electromagnetics in Advanced Applications (ICEAA), pages 487–490.

Massing, A., Larson, M., and Logg, A. (2013). Efficient implementation of finite element methods on nonmatching and overlapping meshes in three dimensions. SIAM Journal on Scientific Computing, 35(1).

Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012a). A stabilized Nitsche fictitious domain method for the Stokes problem. arXiv preprint. http://arxiv.org/abs/1206.1933.

Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012b). A stabilized Nitsche overlapping mesh method for the Stokes problem. arXiv preprint. http://arxiv.org/abs/1205.6317.

Miaskowski, A., Sawicki, B., and Krawczyk, A. (2012). The use of magnetic nanoparticles in low frequency inductive hyperthermia. COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 31(4):1096–1104.

Molari, L., Wells, G. N., Garikipati, K., and Ubertini, F. (2006). A discontinuous Galerkin method for strain gradient-dependent damage: Study of interpolations and convergence. Computer Methods in Applied Mechanics and Engineering, 195(13–16):1480–1498.

Mortensen, M., Langtangen, H., and Wells, G. (2011). A FEniCS-based programming framework for modeling turbulent flow by the Reynolds-averaged Navier–Stokes equations. Advances in Water Resources, 34(9):1082–1101.

Mühlhaus, H.-B. and Aifantis, E. C. (1991). A variational principle for gradient plasticity. International Journal of Solids and Structures, 28:845–857.

Nikbakht, M. (2012). Automated Solution of Partial Differential Equations with Discontinuities using the Partition of Unity Method. PhD thesis, Delft University of Technology.

Nikbakht, M. and Wells, G. (2009). Automated modelling of evolving discontinuities. Algorithms, 2(3):1008–1030.

Nix, W. D. and Gao, H. (1998). Indentation size effects in crystalline materials: a law for strain gradient plasticity. Journal of the Mechanics and Physics of Solids, 46(3):411–425.

Poole, W., Ashby, M., and Fleck, N. (1996). Micro-hardness of annealed and work-hardened copper polycrystals. Scripta Materialia, 34(4):559–564.

Prud’homme, C. (2006). A domain specific embedded language in C++ for automatic differentiation, projection, integration and variational formulations. Scientific Programming, 14:81–110.

Pryer, T. (2012). Discontinuous Galerkin methods for the p-biharmonic equation from a discrete variational perspective. arXiv preprint. http://arxiv.org/abs/1209.4002.

Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. (2005). SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232–275.

Riesen, P., Hutter, K., and Funk, M. (2010). A viscoelastic Rivlin–Ericksen material model applicable to glacier ice. Nonlinear Processes in Geophysics, 17:673–684.

Riesen, P. D. (2011). Variations of the surface ice motion of Gornergletscher during drainages of the ice-dammed lake Gornersee. PhD thesis, ETH Zürich. http://dx.doi.org/10.3929/ethz-a-006526655.

Rognes, M., Kirby, R., and Logg, A. (2010). Efficient assembly of H(div) and H(curl) conforming finite elements. SIAM Journal on Scientific Computing, 31(6):4130–4151.

Rognes, M. E. and Logg, A. (2012). Automated goal-oriented error control I: Stationary variational problems. arXiv preprint. http://arxiv.org/abs/1204.6643.

Rosseel, E. and Wells, G. N. (2012). Optimal control with stochastic PDE constraints and uncertain controls. Computer Methods in Applied Mechanics and Engineering, 213–216:152–167.

Russell, F. P. and Kelly, P. H. J. (2013). Optimized code generation for finite element local assembly using symbolic manipulation. ACM Transactions on Mathematical Software, 39(4).

Saibaba, A. K., Bakhos, T., and Kitanidis, P. K. (2012). A flexible Krylov solver for shifted systems with application to oscillatory hydraulic tomography. arXiv preprint. http://arxiv.org/abs/1212.3660.

Selim, K. (2012). An adaptive finite element solver for fluid–structure interaction problems. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 29. Springer.

Selim, K., Logg, A., and Larson, M. (2012). An adaptive finite element splitting method for the incompressible Navier–Stokes equations. Computer Methods in Applied Mechanics and Engineering, 209–212:54–65.

Shewchuk, J. R. and Ghattas, O. (1993). A compiler for parallel finite element methods with domain-decomposed unstructured meshes. In Keyes, D. E. and Xu, J., editors, Proceedings of the Seventh International Conference on Domain Decomposition Methods in Scientific and Engineering Computing (Pennsylvania State University), Contemporary Mathematics, volume 180, pages 445–450. American Mathematical Society.

Simo, J. C. and Hughes, T. J. R. (1998). Computational Inelasticity. Springer Verlag.

Simo, J. C. and Taylor, R. L. (1985). Consistent tangent operators for rate-independent elastoplasticity. Computer Methods in Applied Mechanics and Engineering, 48(1):101–118.

Stölken, J. and Evans, A. (1998). A microbend test method for measuring the plasticity length scale. Acta Materialia, 46(14):5109–5115.

Sukys, J., Hiptmair, R., and Heumann, H. (2010). Discontinuous Galerkin discretization of magnetic convection. ETH Zürich. http://math1.unice.fr/~hheumann/Files/Report_Sukys.pdf.

Ten Eyck, A., Celiker, F., and Lew, A. (2008). Adaptive stabilization of discontinuous Galerkin methods for nonlinear elasticity: Motivation, formulation, and numerical examples. Computer Methods in Applied Mechanics and Engineering, 197(45–48):3605–3622.

Ten Eyck, A. and Lew, A. (2006). Discontinuous Galerkin methods for non-linear elasticity. International Journal for Numerical Methods in Engineering, 67(9):1204–1243.

Vidoli, S. (2013). Discrete approximations of the Föppl–Von Kármán shell model: From coarse to more refined models. International Journal of Solids and Structures, 50(9):1241–1252.

Vynnytska, L., Clark, S. R., and Rognes, M. E. (2012). Dynamic simulations of convection in the Earth’s mantle. In Logg, A., Mardal, K.-A., and Wells, G., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 31, pages 585–600. Springer Berlin Heidelberg.

Vynnytska, L., Rognes, M., and Clark, S. (2013). Benchmarking FEniCS for mantle convection simulations. Computers & Geosciences, 50:95–105. Benchmark problems, datasets and methodologies for the computational geosciences.

Wang, P. (1986). FINGER: A symbolic system for automatic generation of numerical programs in finite element analysis. Journal of Symbolic Computation, 2(3):305–316.

Wells, G. (2011). Analysis of an interface stabilized finite element method: The advection-diffusion-reaction equation. SIAM Journal on Numerical Analysis, 49(1):87–109.

Wells, G. N. and Dung, N. T. (2007). A C0 discontinuous Galerkin formulation for Kirchhoff plates. Computer Methods in Applied Mechanics and Engineering, 196(35–36):3370–3380.

Wells, G. N., Garikipati, K., and Molari, L. (2004). A discontinuous Galerkin formulation for a strain gradient-dependent damage model. Computer Methods in Applied Mechanics and Engineering, 193(33–35):3633–3645.

Wells, G. N., Hooijkaas, T., and Shan, X. (2008). Modelling temperature effects on multiphase flow through porous media. Philosophical Magazine, 88(28–29):3265–3279.

Ølgaard, K. B., Logg, A., and Wells, G. N. (2008a). Automated code generation for discontinuous Galerkin methods. SIAM Journal on Scientific Computing, 31(2):849–864.

Ølgaard, K. B. and Wells, G. N. (2009). Supporting material. http://www.dspace.cam.ac.uk/handle/1810/218612.

Ølgaard, K. B. and Wells, G. N. (2010). Optimisations for quadrature representations of finite element tensors through automated code generation. ACM Transactions on Mathematical Software, 37(1):8:1–8:23.

Ølgaard, K. B. and Wells, G. N. (2012a). Applications in solid mechanics. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 26. Springer.

Ølgaard, K. B. and Wells, G. N. (2012b). Quadrature representation of finite element variational forms. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering, chapter 7. Springer.

Ølgaard, K. B. and Wells, G. N. (2013). FEniCS Solid Mechanics. https://bitbucket.org/fenics-apps/fenics-solid-mechanics.

Ølgaard, K. B., Wells, G. N., and Logg, A. (2008b). Automated computational modelling for solid mechanics. In Reddy, B. D., editor, IUTAM Symposium on Theoretical, Computational and Modelling Aspects of Inelastic Media, volume 11 of IUTAM Bookseries, pages 192–204. Springer.

Summary

In engineering, physical phenomena are often described mathematically by partial differential equations (PDEs), and a commonly used method to solve these equations is the finite element method (FEM). Implementing a solver based on this method for a given PDE in a computer program written in source code can be tedious, time consuming and error prone. Recently, compilers that automatically generate source code from the mathematical representation of a given PDE expressed in a form language have been introduced. This approach to automated mathematical modelling, which is key in the FEniCS Project (http://fenicsproject.org), has reduced the burden on application developers working with the FEM when it comes to implementing solvers for new models.

In this thesis, the automated modelling framework of the FEniCS Project is extended such that discontinuous Galerkin methods can be handled; rapid prototyping of advanced models and applications is possible; and efficiency is also maintained for complex problems in general.

The extensions are implemented in various components of the FEniCS framework. For instance, the Unified Form Language (UFL) is extended with new abstractions that allow operators pertinent to discontinuous Galerkin methods to be represented in a straightforward fashion. The FEniCS Form Compiler (FFC) is also extended such that code can be generated from expressions that contain the discontinuous Galerkin operators introduced in UFL. In order to maintain computational efficiency for complex problems, various optimisation strategies for computing the local finite element tensor are implemented in FFC. The central philosophy of these optimisation strategies is to manipulate the representation in such a way that the number of operations needed to compute the local element tensor decreases.
As an example, to demonstrate the extensions to the FEniCS framework developed in this work, a strain gradient plasticity model which includes a lifting-type discontinuous Galerkin formulation for the plastic multiplier is presented. It is demonstrated that the model is not suitable for softening problems. On the other hand, the model is able to capture size effects for a hardening problem in a micro-indentation simulation in three dimensions.
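The central idea behind the optimisation strategies summarised above, manipulating the representation so that fewer operations are needed to compute the local element tensor, can be illustrated with a small sketch. The example is hypothetical and is not code generated by FFC: it evaluates A_ij = Σ_q w_q f_q φ_i(x_q) φ_j(x_q) for linear Lagrange basis functions on the reference triangle with a three-point quadrature rule, once naively and once with loop-invariant factors hoisted out of the inner loops, and counts floating-point operations. The function names, the coefficient values and the operation-counting convention are assumptions made for illustration.

```python
"""Hypothetical sketch: hoisting loop-invariant factors when computing a
local element tensor by quadrature. Not FFC-generated code."""

# Three-point rule on the reference triangle (exact for degree-2 polynomials).
POINTS = [(1 / 6, 1 / 6), (2 / 3, 1 / 6), (1 / 6, 2 / 3)]
W = [1 / 6, 1 / 6, 1 / 6]


def p1_basis(x, y):
    """Linear Lagrange basis functions on the reference triangle."""
    return [1.0 - x - y, x, y]


# PHI[q][i]: basis function i evaluated at quadrature point q.
PHI = [p1_basis(x, y) for (x, y) in POINTS]

# Hypothetical coefficient values f at the quadrature points.
F_VALUES = [1.0, 2.0, 3.0]


def naive_tensor(w, f, phi):
    """A[i][j] = sum_q w[q]*f[q]*phi[q][i]*phi[q][j], all work in the inner loop."""
    n = len(phi[0])
    A = [[0.0] * n for _ in range(n)]
    flops = 0
    for q in range(len(w)):
        for i in range(n):
            for j in range(n):
                A[i][j] += w[q] * f[q] * phi[q][i] * phi[q][j]
                flops += 4  # three multiplications and one addition
    return A, flops


def hoisted_tensor(w, f, phi):
    """Same tensor, with factors moved to the loop level they depend on."""
    n = len(phi[0])
    A = [[0.0] * n for _ in range(n)]
    flops = 0
    for q in range(len(w)):
        wf = w[q] * f[q]  # depends on q only
        flops += 1
        for i in range(n):
            wfi = wf * phi[q][i]  # depends on q and i only
            flops += 1
            for j in range(n):
                A[i][j] += wfi * phi[q][j]
                flops += 2  # one multiplication and one addition
    return A, flops


if __name__ == "__main__":
    A_naive, ops_naive = naive_tensor(W, F_VALUES, PHI)
    A_hoisted, ops_hoisted = hoisted_tensor(W, F_VALUES, PHI)
    print(ops_naive, ops_hoisted)  # 108 66
```

Both versions perform the multiplications in the same order, so the resulting tensors agree; only the operation count differs (108 versus 66 here). With f equal to one everywhere the result is the familiar P1 mass matrix on the reference triangle, with 1/12 on the diagonal and 1/24 off the diagonal. The relative saving grows with the number of basis functions, since the innermost loop drops from four operations to two.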

Samenvatting

In engineering practice, physical phenomena are often described mathematically with partial differential equations (PDEs), and a widely used method for solving such equations is the finite element method (FEM). Implementing a solver based on this method for a given PDE in a computer program written in source code is cumbersome, time consuming and error prone. Recently, compilers have been introduced that automatically generate source code from a mathematical representation of a given PDE in a form language. This approach to automated modelling, which is central to the FEniCS Project (http://fenicsproject.org), has reduced the workload of developers working with the FEM when it comes to implementing solvers for new models.

In this thesis, the automated modelling framework of the FEniCS Project has been extended such that discontinuous Galerkin methods can be used; rapid prototyping of advanced models and applications is possible; and efficiency is generally ensured for complex problems.

The extensions have been implemented in various components of the FEniCS framework. For example, the Unified Form Language (UFL) has been extended with new abstractions with which operators belonging to discontinuous Galerkin methods can be represented easily. The FEniCS Form Compiler (FFC) has also been extended, so that code can be generated that contains the discontinuous Galerkin operators introduced in UFL. To ensure numerical efficiency for complex problems, various optimisation strategies for computing the local finite element tensor have been implemented in FFC. The central philosophy of the optimisation strategies is to adapt the representation such that the number of operations for computing the local element tensor decreases.
To demonstrate the extensions to the FEniCS framework developed in this work, a strain gradient plasticity model with a lifting discontinuous Galerkin formulation for the plastic multiplier is presented as an example. It is shown that the model is not suitable for problems with softening. On the other hand, the model is able to reproduce size effects for a hardening problem of micro-indentation in three dimensions.

Propositions

1. Automatic code generation can reduce the time needed to implement finite element solvers, but only if one has faith in the generator that generates the code. (This proposition was conceived after many hours of debugging the FEniCS Form Compiler only to find out that there was a sign error in the input code.)

2. In the past, finite element solvers for partial differential equations (PDEs) were written in languages (source code) that compilers translated into machine code. In the present, a high-level language for expressing the mathematical formulation of a given PDE makes it possible for compilers to automatically generate source code. In the future, automated model generators may create PDEs from experimental data.

3. Eventually, artificial general intelligence (strong AI) will allow autonomous systems to select which part of reality to model and how, thus making humans redundant.

4. In the big picture, humans, as a species, are already redundant, but this does not imply that strong AI has been developed yet.

5. In terms of exposing bugs and directing the development of a software project, a large user base is worth more than a large number of developers.

6. “A young man naturally conceives an aversion to labour, when for a long time he receives no benefit from it.” – Adam Smith, reflections on apprenticeships in An Inquiry into the Nature and Causes of the Wealth of Nations. Similarly, a PhD student may experience a drop in motivation if project funding runs out. The solution is to improve project planning rather than to extend the funding.

7. Dijkstra’s shortest path algorithm works well for planning tasks of short duration, but it is not suitable for planning long-term research projects.

8. Time (or money) is the penalty parameter closing the gap between ambition and actual work done.

9. Economic growth is driving social differences between countries to zero.

10. A child does not satisfy boundary conditions by construction. Boundary conditions must, therefore, be enforced in a weak sense by penalties, rewards and by setting a good example.

11. Gradients in the distribution of wealth are a prerequisite for a dynamic society.

12. Collaboration makes it possible for a strong group of individuals to outperform a group of strong individuals. This concept applies to science as well as sports and is illustrated, e.g., by the 1992 and 2012 European Championship football matches between The Netherlands and Denmark.

13. The recent debate on whether or not Sinterklaas (Saint Nicholas) can have Zwarte Piet (Black Peter) as his helper (provided his employment is in accordance with the collective agreement) misses the point. The real problem arises if Zwarte Piet is dismissed because of his skin colour.

14. Although a PhD thesis is rarely read cover to cover, most people will read the propositions and then go through the references to see how many papers have been published based on the present work.

These propositions are regarded as opposable and defendable, and have been approved as such by the supervisors Prof. dr. ir. L. J. Sluys and Dr. G. N. Wells.

Stellingen

1. Automatic code generation can reduce the time needed to implement finite element solvers, but only for those who have confidence in the generator that generates the code. (This proposition came about after hours of debugging the FEniCS Form Compiler, only to discover that there was a sign error in the input code.)

2. In the past, finite element solvers for partial differential equations (PDEs) were written in languages (source code) that were translated into machine language by compilers. At present, a high-level language for expressing the mathematical formulation of a given PDE makes it possible for compilers to automatically generate source code. In the future, automated model generators may create PDEs from experimental data.

3. Ultimately, artificial general intelligence (strong AI) will make it possible for autonomous systems to choose which part of reality to model and how, thereby making humans redundant.

4. In the bigger picture, humans, as a species, are already redundant, but this does not imply that strong AI has already been developed.

5. In terms of discovering bugs and steering the development of a software project, a large user group is worth more than a large number of developers.

6. “A young man naturally conceives an aversion to labour, when for a long time he receives no benefit from it.” – Adam Smith, reflections on apprenticeship in An Inquiry into the Nature and Causes of the Wealth of Nations. Likewise, a PhD candidate may experience a decline in motivation when the project funding ends. The solution is to improve the project planning rather than to extend the funding.

7. Dijkstra’s shortest-path algorithm works well for planning short-term tasks, but it is not suitable for planning long-term research projects.

8. Time (or money) is the penalty parameter that closes the gap between ambition and work actually done.

9. Economic growth drives social differences between countries to zero.

10. A child does not satisfy boundary conditions from birth. Boundary conditions must therefore be enforced in a weak sense through punishment, reward and setting a good example.

11. Gradients in the distribution of wealth are a prerequisite for a dynamic society.

12. Collaboration makes it possible for a strong group of individuals to beat a group of strong individuals. This concept works for science as well as for sport and could be seen, for example, in the European Championship football matches between the Netherlands and Denmark in 1992 and 2012.

13. The recent discussion about whether or not Sinterklaas may have Zwarte Piet as his helper (given that his employment is in accordance with the collective labour agreement) misses the essential point. The real problem arises if Zwarte Piet is dismissed because of his skin colour.

14. Although a PhD thesis is rarely read from cover to cover, most people will read the propositions and then go through the bibliography to see how many articles have been published based on the work in question.

These propositions are considered opposable and defendable and have been approved as such by the supervisors Prof. dr. ir. L. J. Sluys and Dr. G. N. Wells.

Curriculum vitae

2 August 1978 Born in Ringkøbing, Denmark

Aug. 2000–Oct. 2005 Civil engineering studies, Aalborg University.

October 2005 Master of Science in Civil Engineering, Aalborg University.

Nov. 2005–Nov. 2009 Research assistant, Faculty of Civil Engineering and Geosciences, Delft University of Technology.

Apr. 2010–Sep. 2010 Scientific programmer, Simula Research Laboratory.

Nov. 2011–present Research assistant, Department of Civil Engineering, Aalborg University.