for numerical stabilization of advection-diffusion PDEs

Courses: Numerical Analysis for Partial Differential Equations - Advanced Programming for Scientific Computing

Margherita Guido, Michele Vidulis
Supervisor: prof. Luca Dedè

09/09/2019

Contents

1 Introduction

2 Problem: SUPG stabilization and Isogeometric analysis
  2.1 Advection-diffusion problem
    2.1.1 Weak formulation and numerical discretization using FE
  2.2 SUPG stabilization
    2.2.1 A Variational Multiscale (VMS) approach
  2.3 Isogeometric analysis
    2.3.1 B-Spline basis functions
    2.3.2 B-Spline geometries
    2.3.3 NURBS basis functions
    2.3.4 NURBS as trial space for the solution of the Advection-Diffusion problem
    2.3.5 Mesh refinement and convergence results

3 Artificial Neural Networks
  3.1 Structure of an Artificial Neural Network
    3.1.1 Notation
    3.1.2 Design of an ANN
  3.2 Universal approximation property
  3.3 Backpropagation and training
    3.3.1 Some terminology
    3.3.2 The learning algorithm

4 A Neural Network to learn the stabilization parameter
  4.1 Our Neural Network scheme
  4.2 Mathematical formulation of the problem
  4.3 Expected results
  4.4 Implementation aspects
    4.4.1 Keras
    4.4.2 C++ libraries

5 IsoGlib and OpenNN
  5.1 Structure of IsoGlib
    5.1.1 Definition of the problem
    5.1.2 Main steps in solving process
    5.1.3 Export and visualization of the results
  5.2 Structure of OpenNN
    5.2.1 Uses of the library
    5.2.2 Vectors and matrices
    5.2.3 DataSet class
    5.2.4 NeuralNetwork class
    5.2.5 TrainingStrategy class
    5.2.6 LossIndex class
    5.2.7 OptimizationAlgorithm class
    5.2.8 Backpropagation


6 Implementation: interface of OpenNN and IsoGlib
  6.1 SUPG solver in IsoGlib
    6.1.1 Our test class: SUPGdata
    6.1.2 SUPGLocalMatrix class
  6.2 New OpenNN classes
    6.2.1 IsoglibInterface class
    6.2.2 Customized loss function: OutputFunction class
  6.3 SUPG example in OpenNN
    6.3.1 Data of the problem
    6.3.2 Training
    6.3.3 Testing

7 Numerical Tests
  7.1 Training of the network
  7.2 Prediction of the SUPG parameter
  7.3 Analysis of the results
  7.4 L2 error trend in a different PDE

8 Conclusions

Bibliography

Chapter 1

Introduction

High-order numerical methods for the approximation of advection-diffusion PDEs based on the Galerkin finite element method may be affected by numerical instabilities, which appear as spurious oscillations and compromise the accuracy of the solution. A stabilized and strongly consistent method, the Streamline Upwind Petrov-Galerkin (SUPG) method, can be obtained by adding a further term to the Galerkin approximation (see (HB82) and (Q17, chap. 13)). This term includes an elementwise stabilization parameter that should be carefully determined, since an exact formula defining its optimal value is still lacking. Through the years, several approximations of this coefficient have been proposed: we present the simplest and most used ones, trying to frame the ideas that lead to their definition (see (HSF18) and (C97)).

The aim of our work is to exploit the universal approximation power of Artificial Neural Networks to reconstruct a suitable approximation of such a stabilization parameter. We train an Artificial Neural Network, implemented with the C++ library OpenNN (OpenNN), so that, once the parameters characterizing the problem and its discretization are known, we can predict an optimal value of the stabilization parameter. To train our network we developed a customized loss function that requires several numerical resolutions of the stabilized PDE, for which we employ a C++ library (IsoGlib) based on Isogeometric analysis (CHB09).

We start by presenting the problem in its theoretical framework, describing the family of PDEs we want to solve, some instability issues and how they can be contained, thus constructing a theoretical background for this problem. In Chapter 2 we also introduce Isogeometric Analysis, the technique that will be used to solve the PDE problems numerically. Chapter 3 is devoted to the description of Artificial Neural Networks and the algorithm involved in their training, with particular attention to how we designed our customized Neural Network. In Chapter 4 our original problem is formalized, with the purpose of connecting the world of PDEs with that of ANNs, and the effort we made to realize the project is described. Chapter 5 contains a description of the libraries used for the implementation (OpenNN and IsoGlib), while in Chapter 6 we present the classes we developed to interface them. Numerical results are presented in Chapter 7 to demonstrate the performance of this network-based technique; the results are critically discussed by comparing them with the known formulas. At the end (Chapter 8), possible further developments of this technique are presented. Indeed, we start from the implementation of a neural network that works on a simplified version of the PDE problem but, once this network is tested, we already have in mind how to customize it for more complicated problems.


Chapter 2

Problem: SUPG stabilization and Isogeometric analysis

2.1 Advection-diffusion problem

Advection-diffusion PDEs (see (Q17, chap. 13)) are used to model a wide range of phenomena such as semiconductor device modeling, magnetostatic and electrostatic flows, heat and mass transfer, and flows in porous media related to oil and groundwater applications. The linear advection-diffusion equation on a domain $\Omega \subset \mathbb{R}^d$, $d = 2, 3$, is a boundary value problem of the form:

$$
\begin{cases}
Lu = -\operatorname{div}(\mu(x)\nabla u) + b(x)\cdot\nabla u = f & \text{in } \Omega \\
u = 0 & \text{on } \partial\Omega
\end{cases}
\tag{2.1}
$$

where $\mu(x)$ and $b(x)$ are respectively the diffusion and the advection coefficient. In many practical applications, the diffusion term $-\operatorname{div}(\mu(x)\nabla u)$ is dominated by the transport term $b(x)\cdot\nabla u$. In such cases the solution can give rise to boundary layers, namely regions, generally close to the boundary of $\Omega$, where the solution is characterized by strong gradients. From a numerical point of view, solving these types of advection-dominated problems using the Galerkin Finite Element method leads to nonphysical oscillations near the boundary layers. An example of this phenomenon can be seen in Figure 2.1, obtained using the data $\mu = 10^{-4}$, $b = [1, 1]$ in $\Omega = (0,1)^2$ and a forcing term $f$ such that the exact solution is given by
$$ u_{ex} = -\arctan\left( \frac{(x-0.5)^2 + (y-0.5)^2 - \frac{1}{16}}{\sqrt{\mu}} \right). $$
A challenging problem is to find numerical schemes that work well in all types of regime, without increasing the computational effort needed to solve the PDE problem.

The goal of this project is to find an innovative way to code the SUPG stabilization method, exploiting Artificial Neural Networks to select the best value of the stabilization parameter.


Figure 2.1: Numerical oscillations in 2D

2.1.1 Weak formulation and numerical discretization using FE

Let $V = H^1_0(\Omega)$ be the reference Sobolev space, let $a : V \times V \to \mathbb{R}$ be the bilinear form
$$ a(u, v) = \int_\Omega \mu \nabla u \cdot \nabla v \, d\Omega + \int_\Omega b \cdot \nabla u \, v \, d\Omega $$
and let $F : V \to \mathbb{R}$ be the linear functional

$$ F(v) = \int_\Omega f v \, d\Omega. $$
The weak formulation of problem (2.1) reads

$$ \text{Find } u \in V : \quad a(u, v) = F(v) \quad \forall v \in V \tag{2.2} $$

The existence of a unique solution follows from the Lax-Milgram Theorem. By choosing a suitable family $\{V_h,\ h > 0\}$ of finite dimensional subspaces of $V$ we can state the Galerkin formulation of the problem:

$$ \text{Find } u_h \in V_h : \quad a(u_h, v_h) = F(v_h) \quad \forall v_h \in V_h \tag{2.3} $$
which is well posed thanks to the previous analysis. Moreover, the Galerkin error inequality gives:
$$ \|u - u_h\|_V \le \frac{M}{\alpha} \inf_{v_h \in V_h} \|u - v_h\|_V \tag{2.4} $$
where $\alpha = \frac{\mu_0}{1 + C_\Omega^2}$ and $M = \|\mu\|_{L^\infty(\Omega)} + \|b\|_{L^\infty(\Omega)}$ are the coercivity and continuity constants of $a(\cdot,\cdot)$, $\mu_0$ being a lower bound for the diffusion coefficient $\mu$.
By the definitions of $\alpha$ and $M$, the upper-bounding constant $\frac{M}{\alpha}$ of the error becomes larger (and, correspondingly, the estimate (2.4) meaningless) as the ratio $\frac{\|b\|_{L^\infty(\Omega)}}{\|\mu\|_{L^\infty(\Omega)}}$ grows and tends to infinity. This happens in the advection dominated regime: in such cases the Galerkin method can give inaccurate solutions, presenting numerical oscillations. The dimensionless coefficient that measures the relation between transport and diffusion is called the Peclet number and is defined as
$$ \mathrm{Pe} = \frac{|b| h}{2\mu} \tag{2.5} $$
where $h$ is the mesh size. To understand the significance of the Peclet number it can be useful to look at the 1D problem:

$$
\begin{cases}
-\mu u'' + b u' = 0, & 0 < x < 1 \\
u(0) = 0, \quad u(1) = 1
\end{cases}
$$

The exact solution of this problem is given by $u(x) = \frac{\exp(bx/\mu) - 1}{\exp(b/\mu) - 1}$. When $b \gg \mu$ the solution is close to zero on almost all of the interval, except for the boundary layer that forms in a neighborhood of $x = 1$. It can be shown that the Finite Element solution is monotone if and only if the condition $\mathrm{Pe} \le 1$ is satisfied: in practice, in order to avoid oscillations, $h$ has to be chosen small enough to resolve the boundary layer, which, in turn, becomes thinner as the ratio $b/\mu$ tends to infinity. Figure 2.2 shows the different behaviors of the solution as the magnitude of the Peclet number varies.

Figure 2.2: Numerical solutions in 1D with different Peclet numbers

2.2 SUPG stabilization

A stabilized and strongly consistent method that improves numerical stability for convection dominated flows can be obtained by adding a further term to the Galerkin approximation, so that the stabilized system reads:

$$ \text{Find } u_h \in V_h : \quad a(u_h, v_h) + \mathcal{L}_h(u_h, f; v_h) = (f, v_h) \quad \forall v_h \in V_h \tag{2.6} $$
with $\mathcal{L}_h(u, f; v_h) = 0$ $\forall v_h \in V_h$, $u$ being the exact solution (strong consistency). One of the possible choices for $\mathcal{L}_h(u_h, f; v_h)$ leads to the so called Streamline Upwind Petrov-Galerkin (SUPG) method:

$$ \mathcal{L}_h(u_h, f; v_h) = \sum_{K \in \mathcal{T}_h} \left( L u_h - f,\ \frac{\tau_K}{2}\, \operatorname{div}(b(x)\, v_h) \right)_{L^2(K)} \tag{2.7} $$
In particular, the new consistent term is proportional to the residual $L u_h - f$ and contains an elementwise stabilization parameter $\tau_K$ to be chosen. Several formulas for computing this parameter have been proposed; to get an idea of where they come from, it is useful to have a look at the Variational Multiscale framework. An analogy will guide us towards a better understanding of the SUPG stabilization method.

2.2.1 A Variational Multiscale (VMS) approach

VMS is a general framework for stabilization methods. Stabilized methods can be derived from a variational multiscale approach, which combines ideas of physical modeling with numerical approximation. This is motivated by the fact that a straightforward application of Galerkin's method employing standard bases, such as finite elements, is not a robust approach in the presence of multiscale phenomena, as in the advection-diffusion case. This approach not only explains where instabilities come from, it also clearly identifies the intrinsic stabilization parameter $\tau$. A more detailed description of this method can be found in (HSF18) and (HEV07).

The idea of the variational multiscale approach applied to problems that produce an inaccurate Galerkin solution is based on decomposing the solution into two terms. An approximation is made to determine one of them analytically, while the other one is computed using the Galerkin method, which will then be able to produce accurate solutions. In symbols, we write the exact solution $u \in V$ as the sum of $u_h$ and $u^*$, belonging respectively to the FE space $V_h$ and to $V^*$, which is any complement of $V_h$ in $V$. Each VMS-type method depends on the way $V^*$ is approximated. The same decomposition is performed for the test functions. Rewriting the weak formulation of the problem as
$$ \text{Find } u \in V : \quad (Lu, v) = (f, v) \quad \forall v \in V \tag{2.8} $$
where $L$ is the second order differential operator corresponding to the advection-diffusion equation, we can introduce the multiscale decomposition:

$$ \text{Find } (u_h, u^*) \in V_h \times V^* : \quad (L u_h, v_h + v^*) + (L u^*, v_h + v^*) = (f, v_h + v^*) \quad \forall (v_h, v^*) \in V_h \times V^* \tag{2.9} $$
By taking alternatively $v_h = 0$ and $v^* = 0$, we obtain the following system of two equations:

$$
\begin{cases}
(L u_h, v_h) + (L u^*, v_h) = (f, v_h) & \forall v_h \in V_h \\
(L u_h, v^*) + (L u^*, v^*) = (f, v^*) & \forall v^* \in V^*
\end{cases}
\tag{2.10}
$$
where the first equation is at the grid scale while the second one is at the subgrid scale. The idea is, at first, to "solve" the subgrid scale equation by approximating its solution, and then to substitute this solution into the grid scale equation. We rewrite the subgrid scale equation as:
$$ (L u^*, v^*) = (L u_h - f, v^*) \quad \forall v^* \in V^* \tag{2.11} $$
which, in strong form, reads $L u^* = L u_h - f$. The expression for the infinite dimensional part of the solution follows:
$$ u^* = L^{-1}(L u_h - f) $$

This solution is driven by $L u_h - f$, which represents the residual of the subgrid equation. Now we make our approximation: since the inverse of the differential operator is not easy to compute, we estimate it with a constant $\tau$:
$$ u^* \simeq \tau (L u_h - f) \tag{2.12} $$
Now we can substitute the approximation of $u^*$ into the grid scale equation:

$$ (L u_h, v_h) + (L(\tau(L u_h - f)), v_h) = (f, v_h) \quad \forall v_h \in V_h \tag{2.13} $$
and, using the definition of $L^*$, the adjoint operator of $L$, we get:
$$ (L u_h, v_h) + (\tau(L u_h - f), L^* v_h) = (f, v_h) \quad \forall v_h \in V_h \tag{2.14} $$
Here we can see the analogy with the SUPG stabilization formulation, where the advective part of the differential operator $L$ substitutes $L^*$ in (2.7). Indeed, this formulation can be obtained with a similar procedure, leading to a stabilization parameter $\tau$ strictly related to the approximate solution of the subproblem for $u^* \in V^*$. Using a more advanced analysis based, for example, on the employment of Green's functions or of the maximum principle, several formulas for the approximation of $\tau$ can be derived (see (HSF18, chap. 3.6)). In particular, for linear finite elements, $\tau$ can be obtained as:
$$ \tau = \frac{h}{2|b|} \left( \coth(\mathrm{Pe}) - \frac{1}{\mathrm{Pe}} \right) \tag{2.15} $$
where $\mathrm{Pe}$ is the Peclet number defined in (2.5). This choice comes from the imposition of nodal exactness of the solution in 1D problems with uniform grids (see (C97)). Alternative formulas will be presented in section 4.3.

2.3 Isogeometric analysis

Isogeometric Analysis ((CHB09), (Q17, chap. 11)) – commonly abbreviated as IGA – is a technique for the spatial approximation of PDEs based on the so called isogeometric concept. Extensively developed in the last decade, IGA originally aimed at restoring the centrality of the geometric representation of the computational domain in the numerical approximation of PDEs. In particular, it allows the exact representation of certain curved shapes, eliminating, for example in the case of circular boundaries, the domain approximation from the sources of error. This approach uses Non-Uniform Rational B-Spline (NURBS) bases both for the construction of the finite dimensional space and for the geometry representation. In this way IGA integrates perfectly with CAD procedures that exploit the same kind of functions. We will provide a brief overview of NURBS-based IGA in the framework of the Galerkin method, starting from the definition of B-spline and NURBS basis functions and geometries. Then we will address the isogeometric concept and the approximation properties of the Galerkin method.

2.3.1 B-Spline basis functions

B-splines are piecewise polynomials which are built from a linear combination of basis functions with local support and controlled continuity. They are built starting from a knot vector $\Xi = \{\xi_1, \xi_2, ..., \xi_{n+p+1}\}$, where $p$ is the polynomial degree and $n$ is the number of basis functions. The vector $\Xi$ contains non-decreasing real values (possibly repeated) belonging to a parameter domain (usually $[0, 1]$). We indicate by $m_i$ the multiplicity of the $i$-th knot and we call element each of the intervals delimited by subsequent knots (leaving the possibility of having null size elements in case $m_i > 1$). We will consider only open knot vectors, meaning that the first and last knots are taken with multiplicity $p + 1$. A knot vector defines the set of basis functions through the following recursive relation (Cox-de Boor formula)
$$ N_{i,0}(\xi) = \begin{cases} 1, & \xi_i \le \xi < \xi_{i+1} \\ 0, & \text{otherwise} \end{cases} \tag{2.16} $$

$$ N_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i}\, N_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}}\, N_{i+1,p-1}(\xi), \qquad i = 1, ..., n \tag{2.17} $$

We indicated the $i$-th basis function of degree $p$ as $N_{i,p}(\xi)$. All the basis functions are non-negative, have a regularity degree at each knot equal to $p - m_i$ and are infinitely differentiable away from the knots. Moreover, each function has support over $p + 1$ knot spans and shares its support with at most $2p + 1$ other basis functions. A remarkable property of the set $\{N_{i,p}(\xi)\}_i$ for any fixed $p$ is that it constitutes a partition of unity.
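To make the recursion concrete, here is a minimal C++ sketch (not part of IsoGlib) that evaluates a single B-spline basis function from a 0-based knot vector; the function name bsplineBasis and the handling of zero-length knot spans with the usual 0/0 := 0 convention are our own choices.

#include <vector>

// Evaluate the i-th B-spline basis function of degree p at xi,
// given the (0-based) knot vector Xi: Cox-de Boor recursion (2.16)-(2.17).
double bsplineBasis(std::size_t i, unsigned p, double xi, const std::vector<double>& Xi)
{
    if (p == 0)
        return (Xi[i] <= xi && xi < Xi[i + 1]) ? 1.0 : 0.0;   // piecewise constant (2.16)

    // Terms associated with zero-length knot spans are dropped (0/0 := 0).
    double left = 0.0, right = 0.0;
    if (Xi[i + p] > Xi[i])
        left = (xi - Xi[i]) / (Xi[i + p] - Xi[i]) * bsplineBasis(i, p - 1, xi, Xi);
    if (Xi[i + p + 1] > Xi[i + 1])
        right = (Xi[i + p + 1] - xi) / (Xi[i + p + 1] - Xi[i + 1]) * bsplineBasis(i + 1, p - 1, xi, Xi);
    return left + right;
}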

2.3.2 B-Spline geometries

Through a geometrical mapping, defined from the parametric domain into the physical space $\mathbb{R}^d$, B-Spline curves (or, more in general, surfaces and solids if $\mathbb{R}^k$ is chosen as parameter space, with $k \le d$) can be defined starting from the basis functions presented in the previous section. A set of control points belonging to the physical space, each one associated to a basis function, is needed too. Control points are indicated as $\{P_i\}_{i=1}^{n}$. The B-Spline geometry $\phi : \hat{\Omega} \to \mathbb{R}^d$ is defined as
$$ \phi(\xi) = \sum_{i=1}^{n} P_i \, N_{i,p}(\xi) \tag{2.18} $$
It is interesting to note that open knot vectors give rise to curves having extrema coinciding with the first and the last control points. It is not true, in general, that the other control points belong to the curve.

2.3.3 NURBS basis functions

B-Splines, being piecewise polynomials, do not allow the exact representation of conic sections. That is why NURBS are introduced. To define the basis functions a set of real positive weights is needed: let us indicate it with $\{w_1, ..., w_n\}$. The $i$-th NURBS basis function of degree $p$ is given by:

$$ R_{i,p}(\xi) = \frac{N_{i,p}(\xi)\, w_i}{W(\xi)} = \frac{N_{i,p}(\xi)\, w_i}{\sum_{j=1}^{n} N_{j,p}(\xi)\, w_j} \tag{2.19} $$

Note that $R_{i,p} : \hat{\Omega} \to \mathbb{R}$ is no longer a piecewise polynomial; anyway, the letter $p$ is still used to indicate the degree of the B-Splines from which the NURBS are derived. NURBS geometries can be generated with a procedure analogous to the one used in the previous section.
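Building on the previous sketch, formula (2.19) can be evaluated as follows (again a hypothetical helper, not IsoGlib code; bsplineBasis is the function sketched above).

#include <vector>

double bsplineBasis(std::size_t i, unsigned p, double xi, const std::vector<double>& Xi); // see above

// Evaluate the i-th NURBS basis function of degree p at xi,
// given the knot vector Xi and the positive weights w (formula (2.19)).
double nurbsBasis(std::size_t i, unsigned p, double xi,
                  const std::vector<double>& Xi, const std::vector<double>& w)
{
    double W = 0.0;                                    // weighting function W(xi)
    for (std::size_t j = 0; j < w.size(); ++j)
        W += bsplineBasis(j, p, xi, Xi) * w[j];
    return bsplineBasis(i, p, xi, Xi) * w[i] / W;
}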

2.3.4 NURBS as trial space for the solution of the Advection-Diffusion problem

NURBS-based IGA, as commonly intended, relies on the very same NURBS basis functions first used to represent the computational domain of a PDE also to build, later, the finite dimensional trial space. In standard finite elements it is the choice of the basis functions for the solution that conditions the meshing procedure. In isogeometric analysis, on the other hand, the basis is chosen to be suitable for the geometry, and then it is used for the solution space as well. Let us choose, in the Galerkin formulation (2.3), $V_h$ as the space spanned by the NURBS basis functions:
$$ V_h = \mathrm{span}\{R_i\}_{i=1}^{N_h} $$
so that the Galerkin solution can be expressed as

$$ u_h(x) = \sum_{i=1}^{N_h} R_i(x)\, U_i \tag{2.20} $$
for some suitable coefficients $U_i$, known as control variables (also referred to as degrees of freedom or DOFs). It can be shown that this produces a numerical solution that converges to the exact one, since the NURBS basis functions satisfy the properties required by the Galerkin theory. The NURBS basis is not an interpolatory basis, unlike the finite element Lagrangian basis. Indeed, if we assume that a control point $P_i$ lies in $\Omega$, the value $u_h(P_i)$ taken by the approximate solution at such a control point does not coincide in general with the corresponding control variable $U_i$. Moreover, some of the control points used to build the computational domain may even lie outside $\Omega$; in those cases, the approximate solution at these points is not defined.

2.3.5 Mesh refinement and convergence results

The convergence of the method is guaranteed by the theory of Galerkin methods and isoparametric elements. The proof of the result analogous to the one for classical finite elements is complicated by the fact that it is hard to find suitable interpolation estimates for NURBS functions, because the basis functions are not polynomial and because they can have support over several elements. By introducing suitable functional spaces and deriving estimates on those spaces, an interpolation result can be obtained, and it can be deduced that the order of convergence of the solution obtained with isogeometric analysis using NURBS of degree $p$ is the same as that of classical finite elements of the same degree. In particular, provided the solution and the domain are regular enough, the following estimates hold:

$$ \|u - u_h\|_{L^2} \le C h^{p+1} \tag{2.21} $$

$$ \|u - u_h\|_{H^1} \le C h^{p} \tag{2.22} $$

Note that the order of convergence is independent of the degree of continuity of the basis functions. This fact is particularly relevant if we consider that we can refine the mesh introducing fewer degrees of freedom than we would in classical finite elements. By the term refinement of a geometry, in this context, we refer to procedures that enrich the functional basis used to parametrize such a geometry while keeping it geometrically and parametrically unchanged. More general convergence results can be found in (CHB09). B-splines offer three possibilities for refinement:

• h-refinement (also known as knot insertion or knot refinement): we keep the same polynomial order, but insert additional knots in the knot vector(s)

• p-refinement (also known as order elevation): we keep the same knot vector(s) but increase the polynomial degree of the basis functions;

• k-refinement: we perform p-refinement and subsequently insert new knots, so that the degree of the basis functions and their continuity across knots increase simultaneously, while knot spans become smaller

Instead of h-refinement, p-refinement or k-refinement can be used to exploit the convergence properties of the method. Under suitable conditions and regularity hypotheses, the following interpolation estimate holds:

$$ \|u - \eta_h\|_{H^r} \le C\, h^{s-r} (p - k + 1)^{-(s-r)} \|u\|_{H^s} \tag{2.23} $$
where $p \ge 1$ is the polynomial degree, $k \ge 0$ is the global continuity of the basis, and they are such that $p \le 2k + 1$, for any $0 \le r \le k \le s$ and $u \in H^s(\Omega)$. In the estimate, $C$ is independent of $h$, $p$ and $k$.

Chapter 3

Artificial Neural Networks

In this chapter we will explain how Artificial Neural Networks (ANNs) work. We will describe their structure, introducing the notation that will also be used in the following chapter for the formalization of our problem, and we will recall their fundamental property of universal approximation of continuous functions (R96). After explaining the meaning of some recurrent terms, we will describe the algorithm used during the training phase, highlighting the steps which are going to be modified by our original implementation.

3.1 Structure of an Artificial Neural Network

The fundamental unit of an ANN is the neuron. It consists of a processing unit which, taking a finite number of signals as input, computes a weighted sum, subtracts a threshold and produces a single output by applying a non-linear activation function. In this work we focus on feed forward ANNs (also known as multilayer perceptrons), namely groups of neurons organized in layers so that the outputs of one layer are the inputs of the next one. Every neuron is connected to all the neurons of the next and the previous layer. As described here, a feed forward ANN can be seen as a (highly non-linear) map between inputs and outputs.

3.1.1 Notation

In general, a multilayer perceptron consists of $L + 1$ layers (the input is considered as layer 0, so that there are $L - 1$ hidden layers and the output one is the $L$-th), which can have different numbers of neurons each ($n_k$ with $k = 0, ..., L$). Neurons will be identified by their local number in the layer (neuron $i$ in layer $k$, with $i = 1, ..., n_k$). Weights and biases will be denoted in the following way: $w_{ij}^{(k)} \in \mathbb{R}$ is the weight of the connection from the $j$-th neuron of layer $k - 1$ to the $i$-th neuron of layer $k$, and $b_i^{(k)} \in \mathbb{R}$ is the threshold of the $i$-th neuron of layer $k$. Weights and biases will frequently be called just parameters. A generic activation function will be indicated by $\sigma$, so that the output of a single neuron can be calculated as

$$ a_i^{(k)} = \sigma(z_i^{(k)}) $$

where $z_i^{(k)}$ is the biased weighted sum, computed as

$$ z_i^{(k)} = \sum_{j=1}^{n_{k-1}} w_{ij}^{(k)} a_j^{(k-1)} + b_i^{(k)} $$
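As a concrete illustration of the two formulas above, here is a minimal C++ sketch (not OpenNN code) of the forward pass of a single fully connected layer with a tanh activation; the function name denseLayerForward and the data layout are our own assumptions.

#include <cmath>
#include <vector>

// Forward pass of one fully connected layer:
// z_i = sum_j w_ij * a_j + b_i,   a_i = sigma(z_i)   (here sigma = tanh).
std::vector<double> denseLayerForward(const std::vector<std::vector<double>>& weights,
                                      const std::vector<double>& biases,
                                      const std::vector<double>& inputs)
{
    std::vector<double> outputs(biases.size());
    for (std::size_t i = 0; i < biases.size(); ++i)
    {
        double z = biases[i];                   // start from the bias b_i
        for (std::size_t j = 0; j < inputs.size(); ++j)
            z += weights[i][j] * inputs[j];     // accumulate w_ij * a_j^(k-1)
        outputs[i] = std::tanh(z);              // activation a_i^(k) = sigma(z_i^(k))
    }
    return outputs;
}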


Figure 3.1: Neural network structure, and action of the i-th neuron

3.1.2 Design of an ANN

Some features of an ANN, the ones which define its structure, have to be chosen a priori, before the training process takes place (actually, some of these characteristics can be tuned during the model selection phase, which we will not describe since it is out of the scope of this project). For example, the number of hidden layers and the distribution of neurons inside the ANN are two examples of so-called hyperparameters, namely a group of parameters (which will not be updated during training) that not only affects the performance of the ANN but also represents its actual definition. The number of layers and neurons which maximizes the learning capability of the ANN depends on the specific application. Another important design feature to be chosen is the set of activation functions. In this field there are a few classical choices we will now present (a minimal code sketch of these functions follows the list), remembering that any Heaviside-like function mimicking the behavior of biological neurons can be accepted. The most common activation functions are:

• hyperbolic tangent: sigmoidal function with output belonging to $(-1, 1)$
$$ \sigma(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

• logistic: sigmoidal function with output belonging to $(0, 1)$
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

• rectified linear (relu): output is non-negative
$$ \sigma(x) = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases} $$

• softplus: strictly monotone with output strictly positive
$$ \sigma(x) = \ln(1 + e^x) $$
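A minimal C++ sketch of the four activation functions listed above (the function names are ours; OpenNN provides its own implementations):

#include <algorithm>
#include <cmath>

// Common activation functions (scalar versions).
double hyperbolic_tangent(double x) { return std::tanh(x); }               // output in (-1, 1)
double logistic(double x)           { return 1.0 / (1.0 + std::exp(-x)); } // output in (0, 1)
double relu(double x)               { return std::max(0.0, x); }           // output >= 0
double softplus(double x)           { return std::log1p(std::exp(x)); }    // smooth, strictly positive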

ANNs of the type described above can be employed in solving two kinds of problems: classification and approximation. The first one arises when there is a pattern which needs to be identified, some features and some classes to which the objects of the analysis belong. We will not go into the details of pattern recognition since our task falls under the second category. We only notice that proper activation functions have to be chosen to obtain as (discrete) output the classification of the instance. Approximation tasks, instead, aim at recovering the continuous relation between inputs and outputs. Note that, despite the importance of the choice of the hyperparameters, there are no universal rules to determine the best ANN setup to solve a generic problem. Anyway, the following fundamental result justifies our effort.

3.2 Universal approximation property

As we already pointed out, in an approximation problem an ANN can be seen as a map continuously linking input and output values. The following result (Cybenko, 1989; see (C89)) states that the set of feed forward ANNs with a single hidden layer can approximate any continuous function over a compact set. Let $\sigma : \mathbb{R} \to \mathbb{R}$ be a nonconstant, bounded and continuous function (activation function). Let $I_m$ be the $m$-dimensional unit hypercube and $C(I_m)$ the space of real-valued continuous functions on $I_m$. Then the following theorem holds:

Theorem 1 $\forall \varepsilon > 0$, $\forall f \in C(I_m)$, $\exists N \in \mathbb{N}$, $\exists v_i, b_i \in \mathbb{R}$, $\exists w_i \in \mathbb{R}^m$ such that we may define
$$ F(x) = \sum_{i=1}^{N} v_i \, \sigma(w_i^T x + b_i) $$
as an approximate realization of $f$, that is: $|F(x) - f(x)| < \varepsilon$ $\forall x \in I_m$.

The theorem can be reformulated in the framework of ANNs in the following way:

Theorem 2 Every ANN with a single hidden layer can approximate with arbitrarily small error any continuous function on a compact set, provided that a sufficient number of hidden neurons are employed.

3.3 Backpropagation and training

We will now describe the training algorithm, at first giving a rapid overview and then, once the notation and some important terms are well understood, going into some more technical details (see (R96, chap. 7) for more detailed descriptions). Learning, from the point of view of an ANN, means adapting to some sample observations (the training set), gaining in this way the ability to generalize and predict results starting from inputs not included in the training set itself. The key ingredients for the learning phase are a loss function (which evaluates the current performance of the Neural Network) and an optimization algorithm (e.g. gradient descent, which is responsible for finding the minimum of the loss). The scheme of the algorithm is the following:

1. Weights and biases are randomly initialized.

2. Inputs are fed into the ANN and outputs are obtained forward-propagating the values; outputs are used to evaluate the loss function.

3. The gradient of the loss function with respect to every parameter is computed.

4. Weights and biases are updated according to the direction indicated by the gradient.

Steps 2, 3 and 4 are repeated until the algorithm stops because a termination condition is reached; possible termination criteria are a maximum number of iterations, a desired value of the loss function, or the elapsed time. Step 3 contains the actual core of the algorithm and will be explained in detail shortly. In the following section we will enumerate the most important concepts involved in the training process, accompanying them with a brief explanation.
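Before moving to the terminology, the overall loop can be summarized in a few schematic lines of C++ (not OpenNN code); the types and the functions initialize_randomly, compute_loss, compute_gradient and update_parameters are hypothetical placeholders for the four steps above.

#include <cstddef>

// Hypothetical interfaces for the four steps of the training scheme.
struct Network;
struct Batch;
void   initialize_randomly(Network& net);                       // step 1
double compute_loss(const Network& net, const Batch& batch);    // step 2 (forward propagation)
void   compute_gradient(Network& net, const Batch& batch);      // step 3 (backpropagation)
void   update_parameters(Network& net, double learning_rate);   // step 4

void train(Network& net, const Batch& batch,
           std::size_t max_epochs, double loss_goal, double learning_rate)
{
    initialize_randomly(net);
    for (std::size_t epoch = 0; epoch < max_epochs; ++epoch)     // termination: maximum iterations
    {
        const double loss = compute_loss(net, batch);
        if (loss < loss_goal) break;                             // termination: loss goal reached
        compute_gradient(net, batch);
        update_parameters(net, learning_rate);
    }
}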

3.3.1 Some terminology

Loss function
Measures the performance of the ANN. Results obtained from the inputs are compared to targets through a suitable measure and the loss value summarizes the goodness of the comparison. Classical examples are the mean squared error measure (for approximation problems) and cross entropy (for classification).

Optimization algorithm
Needed to find the minimum of the loss function. Optimization algorithms can be classified depending on two criteria: memory consumption and time to reach convergence. Usually the fastest ones also need more memory to run: it is the case of Levenberg-Marquardt and quasi-Newton methods, which exploit the computation of second derivatives to accelerate convergence; gradient descent and conjugate gradient, instead, are better suited for optimization over big data sets, when memory consumption can become an issue.

Training set It’s the set of instances (inputs and corresponding outputs) used to make the model learn. Its values are fed into the Neural Network repeatedly until the predicted outputs are close enough to the target values.

Validation (or selection) set
It is used during the learning phase too, but the ANN parameters are not updated based on its values. Inputs and targets are used to control overfitting: the loss computed on this data set (which contains values that the Neural Network meets for the first time) is compared to the one obtained with the training set; if the ANN performs well on the training set but not on the validation one, then overfitting has occurred and learning should be stopped.

Testing set
Contains data used to test the performance of the ANN once it has been trained.

Epoch
Defines the time frame in which the whole data set is scanned once by the Neural Network.

Batch
Usually, data instances are not processed one at a time; in fact, groups of inputs are chosen to calculate groups of outputs. The evaluation of the loss function takes into account the results from the whole batch so that, during the update of the parameters, single values don't spoil the generalization capability of the ANN.

Learning rate
States the magnitude of the update of the parameters during training. It can be considered the most important hyperparameter of an ANN, even if, differently from the numbers of layers and neurons, it can be adapted during training.

3.3.2 The learning algorithm

The learning algorithm in its generality is called backpropagation. The idea is the following: one of the parameters is slightly perturbed and the effect of this perturbation on the value of the loss is observed. In this way the sensitivity of the loss with respect to that specific parameter (namely: its derivative) is measured. Once the procedure is done for every parameter, the gradient of the loss function is known and it is possible to update the parameters in order to reduce the loss value during the next iteration. In practice, the derivatives of the loss can be calculated via the chain rule, propagating error information backwards through the layers, and this is the reason why the algorithm is named this way. To show how the differentiation process works in detail we need to select, without loss of generality, a specific loss function (from now on denoted by $\mathcal{L}$). Since we used an approximation of an $L^2$ norm in our implementation, the (mean) squared error can be considered as the benchmark loss. Explicitly:

$$ \mathcal{L} = \sum_{i=1}^{n_L} \underbrace{\left( a_i^{(L)} - \hat{y}_i \right)^2}_{E_i} \tag{3.1} $$
where $\hat{y}_i$, $i = 1, ..., n_L$, are the target values (contained in the training set) corresponding to the $n_L$ outputs of the ANN, while $E_i$ is the contribution to the total error given by the $i$-th output. Let's analyze the algorithm in detail following the numbering of the steps introduced in the previous section. About steps 1 and 2 there are only a couple of things to be pointed out.

1. Parameters initialization: since the Neural Network is optimized to work with signals of order zero, not only do the inputs have to be scaled, but the parameters should also respect some kind of constraint. That is why, usually, the initialization values are chosen randomly in the interval $(-1, 1)$.

2. Forward propagation: a batch of inputs is selected from the training set and processed by the ANN. Each neuron computes an affine transformation of its (local) inputs, applying its own activation function to the result. The value of the local output of each neuron is saved for later use and, at the end of the propagation, the global outputs of the Neural Network are available. They are used to evaluate the loss and check whether the ANN performance has improved.

3. Backward propagation: here is where the gradient of the loss function is calculated. Let's start by considering the simplest case: the derivatives of the loss w.r.t. the weights of the connections between layer $L - 1$ and the output layer. Recalling the notation previously introduced, they can be computed as:

$$ \frac{\partial \mathcal{L}}{\partial w_{ij}^{(L)}} = \sum_{i=1}^{n_L} \frac{\partial E_i}{\partial w_{ij}^{(L)}} = \sum_{i=1}^{n_L} \frac{\partial E_i}{\partial a_i^{(L)}} \cdot \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} \cdot \frac{\partial z_i^{(L)}}{\partial w_{ij}^{(L)}} \tag{3.2} $$
with:
$$ \frac{\partial E_i}{\partial a_i^{(L)}} = 2\left( a_i^{(L)} - \hat{y}_i \right) \tag{3.3} $$
$$ \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} = \sigma'(z_i^{(L)}) \tag{3.4} $$
$$ \frac{\partial z_i^{(L)}}{\partial w_{ij}^{(L)}} = a_j^{(L-1)} \tag{3.5} $$
Note that, at this point, all the values in (3.3) are known thanks to the forward propagation performed in step 2. Computing derivatives with respect to parameters belonging to previous layers is just a matter of applying the chain rule. We will show how it can be done for a weight in the last but one layer; the procedure can then be generalized to all the parameters of the ANN. In this case the derivative is:

$$ \frac{\partial \mathcal{L}}{\partial w_{ij}^{(L-1)}} = \sum_{t=1}^{n_L} \frac{\partial E_t}{\partial w_{ij}^{(L-1)}} = \sum_{t=1}^{n_L} \left( \frac{\partial E_t}{\partial a_t^{(L)}} \cdot \frac{\partial a_t^{(L)}}{\partial z_t^{(L)}} \cdot \frac{\partial z_t^{(L)}}{\partial a_i^{(L-1)}} \right) \cdot \frac{\partial a_i^{(L-1)}}{\partial z_i^{(L-1)}} \cdot \frac{\partial z_i^{(L-1)}}{\partial w_{ij}^{(L-1)}} \tag{3.6} $$
with:
$$ \frac{\partial z_t^{(L)}}{\partial a_i^{(L-1)}} = w_{ti}^{(L)} \tag{3.7} $$
and the value of the other terms understood from the previous formulas.

Note that the derivatives with respect to the biases can be written in the same way, just substituting the value of the last term, since for the biases it holds that

$$ \frac{\partial z_i^{(k)}}{\partial b_i^{(k)}} = 1 \tag{3.8} $$

The gradient is now available and can be written (after a proper ordering of the parameters) as:

$$ \nabla \mathcal{L} = \left[ \frac{\partial \mathcal{L}}{\partial w_{11}^{(1)}}, ..., \frac{\partial \mathcal{L}}{\partial w_{1 n_0}^{(1)}}, ..., \frac{\partial \mathcal{L}}{\partial w_{n_1 1}^{(1)}}, ..., \frac{\partial \mathcal{L}}{\partial w_{n_1 n_0}^{(1)}}, \frac{\partial \mathcal{L}}{\partial b_1^{(1)}}, ..., \frac{\partial \mathcal{L}}{\partial b_{n_1}^{(1)}}, \frac{\partial \mathcal{L}}{\partial w_{11}^{(2)}}, ..., \frac{\partial \mathcal{L}}{\partial b_{n_L}^{(L)}} \right] \tag{3.9} $$

4. Parameters update: depending on the optimization algorithm selected for training, the gradient is used to determine the "best direction" of decrease in the parameter space. The size of the increment has to be chosen wisely: large enough to decrease the loss as much as possible, but without overshooting the minimum. There is no general rule to set a priori an optimal learning rate; on the contrary, this hyperparameter should be tuned during each epoch exploiting specific algorithms. In simple ANNs it is sometimes set and kept fixed during the whole training.

Chapter 4

A Neural Network to learn the stabilization parameter

Our idea is to construct a Neural Network that predicts the best value of the SUPG stabilization parameter ($\tau$) given the coefficients that characterize the advection-diffusion PDE problem. To achieve this goal we exploit the structure of the multilayer perceptron described in the previous chapter, adding a couple of steps to the computation of the loss function. This addition is required by the fact that, differently from what happens in a standard approximation problem, in which the training set directly contains the targets that have to be compared with the outputs of the Neural Network, in this case we do not know which is the best value of $\tau$. What is available is the exact solution (possibly analytical, but the same idea also holds for a reference solution computed on a grid fine enough to resolve the boundary layers), which we use to bypass the direct comparison.

4.1 Our Neural Network scheme

The input candidates for our neural network are the coefficients that define the advection-diffusion problem and the Finite Element space (see figure 4.1), namely:

• element number k

• diffusion coefficient $\mu(x)$

• advection coefficient $b(x)$

• mesh granularity $h$

• polynomial degree of the Finite Element approximation p

In the actual implementation that we will describe in chapter 6, we chose to consider the diffusion coefficient as the unique input (which, moreover, we assumed constant in space), while $b$, $h$ and $p$ are kept fixed. Moreover, the predicted value of $\tau_K$ will be assumed to be constant over the domain: in the following we will neglect the subscript $K$ that indicates the dependency on the specific element of the mesh.

Figure 4.1: Scheme of the ANN built to predict the SUPG parameter

Possible developments of this simple (but not trivial) model will be discussed in the conclusions. As can be seen from figure 4.1, we designed our ANN scheme in such a way that its output can be interpreted as the value of the stabilization parameter. At this point, since the best value of $\tau$ is the unknown object of interest, we needed to involve the exact solution in the computation of the error (note that, if an analytical solution is not available, a reference solution on a fine grid should be computed for every training instance). An $L^2$ norm error between the numerical solution obtained from the stabilized Finite Elements and the reference one is computed, possibly approximating it via proper quadrature rules. To make the backpropagation process work, we have to implement a proper way to numerically differentiate the mapping just introduced - the one that associates to $\tau$ the value of the FE solution at the Gauss quadrature points - since direct computation via the chain rule is not possible in this case.

4.2 Mathematical formulation of the problem

In this section we address the issue of translating into formal notation the training of the ANN presented above. To achieve this goal, we will make use of the notation introduced in chapter 3. Moreover, we have seen that the characterizing element of our scheme is the introduction of a new method to evaluate the correctness of the output, a method which involves the computation of an integral over the domain. So, we will also need the formal language normally adopted for the analysis of PDEs, with particular attention to numerical aspects such as Gaussian integration. First, we write the loss function as:

$$ \mathcal{L} = \int_\Omega \left( u_h(x; \tau) - u(x) \right)^2 d\Omega \tag{4.1} $$

where $u$ is the exact solution and $u_h(x; \tau)$ highlights the dependency of the FE solution on the stabilization parameter. From now on we will drop the dependency on $\tau$ to lighten the notation a little bit. Obviously, the integral in (4.1) cannot be computed exactly and a Gaussian quadrature rule is needed to estimate its value. We indicate with $x_k$ and $m_k$ the Gauss points and the Gauss weights, respectively. Given the total number of elements $n_E$, the (local) number of Gauss quadrature points $n_G$ and its global counterpart $\hat{n}_G$, we can write:

$$ \mathcal{L} \approx \sum_{j=1}^{n_E} \sum_{k=1}^{n_G} m_k^j \left( u_h(x_k^j; \tau) - u(x_k^j) \right)^2 = \sum_{\hat{k}=1}^{\hat{n}_G} m_{\hat{k}} \left( u_h(x_{\hat{k}}; \tau) - u(x_{\hat{k}}) \right)^2 \tag{4.2} $$

To write the equality in (4.2), a proper global ordering of the Gaussian nodes has to be performed. An example is depicted in figure 4.2. In analogy with the symbols used in section 3.3.2 we can indicate the contribution of a single point to the total error with $E_{\hat{k}} = \left( u_h(x_{\hat{k}}; \tau) - u(x_{\hat{k}}) \right)^2$, which allows us to write the expression of the loss function as:

$$ \mathcal{L} \approx \sum_{\hat{k}=1}^{\hat{n}_G} m_{\hat{k}} E_{\hat{k}} \tag{4.3} $$

Notice the similarity of (4.3) with (3.1): $E_{\hat{k}}$ recalls the term $E_i$ and the values $m_{\hat{k}}$ are just constants.
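The discrete loss (4.3) is straightforward to evaluate once the stabilized and the exact solutions have been sampled at the Gauss points. The following is a minimal C++ sketch under that assumption (the function and variable names are hypothetical, not part of OpenNN or IsoGlib).

#include <cassert>
#include <vector>

// Discrete L2-type loss of (4.3): sum over all Gauss points of
// weight * (u_h(x_k; tau) - u(x_k))^2.
double quadratureLoss(const std::vector<double>& gauss_weights,   // m_k, global ordering
                      const std::vector<double>& uh_at_gauss,     // u_h(x_k; tau)
                      const std::vector<double>& uex_at_gauss)    // u(x_k)
{
    assert(gauss_weights.size() == uh_at_gauss.size());
    assert(gauss_weights.size() == uex_at_gauss.size());

    double loss = 0.0;
    for (std::size_t k = 0; k < gauss_weights.size(); ++k)
    {
        const double diff = uh_at_gauss[k] - uex_at_gauss[k];   // pointwise error
        loss += gauss_weights[k] * diff * diff;                 // m_k * E_k
    }
    return loss;
}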

Figure 4.2: Numbering of Gauss points (in black) follows the global numbering of the elements

At last, we recall that the ANN models the relation between the input $\mu$ and the output $\tau$ exploiting its layered structure. The SUPG parameter can then be expressed as:

$$ \tau = a_1^{(L)} = \sigma(z_1^{(L)}) = \sigma\!\left( \sum_{i=1}^{n_{L-1}} w_{1i}^{(L)} \, \sigma(z_i^{(L-1)}) + b_1^{(L)} \right) = \ ... \ = f(\mu) \tag{4.4} $$
Clearly, differentiating $\tau$ with respect to each one of the parameters of the ANN follows the usual rules presented in the previous chapter. We are now interested in calculating the gradient of the loss function in the parameter space (see (3.9)). Once again, the chain rule comes into play: the derivative with respect to, for example, the weight of a connection between layers $L - 1$ and $L$ can be computed as:

$$ \frac{\partial \mathcal{L}}{\partial w_{1j}^{(L)}} = \sum_{\hat{k}=1}^{\hat{n}_G} m_{\hat{k}} \frac{\partial E_{\hat{k}}}{\partial w_{1j}^{(L)}} = \sum_{\hat{k}=1}^{\hat{n}_G} m_{\hat{k}} \frac{\partial E_{\hat{k}}}{\partial u_h(x_{\hat{k}})} \cdot \frac{\partial u_h(x_{\hat{k}})}{\partial a_1^{(L)}} \cdot \frac{\partial a_1^{(L)}}{\partial z_1^{(L)}} \cdot \frac{\partial z_1^{(L)}}{\partial w_{1j}^{(L)}} \tag{4.5} $$

Observing that in this case $a_1^{(L)}$ coincides with $\tau$, we can assign the following values to the derivatives above:

$$ \frac{\partial E_{\hat{k}}}{\partial u_h(x_{\hat{k}})} = 2\left( u_h(x_{\hat{k}}) - u(x_{\hat{k}}) \right) $$

$$ \frac{\partial \tau}{\partial z_1^{(L)}} = \sigma'(z_1^{(L)}) \tag{4.6} $$

$$ \frac{\partial z_1^{(L)}}{\partial w_{1j}^{(L)}} = a_j^{(L-1)} $$
Note that the last two expressions are identical to the ones in (3.4) and (3.5), while the first one resembles (3.3). What does not have any analytic expression is the rate of change of the sampled solution w.r.t. the output of the Neural Network. We chose to estimate it numerically using a finite difference:

$$ \frac{\partial u_h(x_{\hat{k}})}{\partial a_1^{(L)}} = \frac{\partial u_h(x_{\hat{k}}; \tau)}{\partial \tau} \approx \frac{u_h(x_{\hat{k}}; \tau + h) - u_h(x_{\hat{k}}; \tau)}{h} \tag{4.7} $$

where the increment $h$ (not to be confused with the mesh size) has to be tuned. As in the classical ANN differentiation, derivatives w.r.t. parameters belonging to previous layers can be computed via the chain rule with no modification of what we have shown in section 3.3.2.
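A minimal C++ sketch of the finite difference (4.7) follows; the solve argument is a hypothetical stand-in for one stabilized PDE solve (in our setting, a call into IsoGlib) returning u_h sampled at the Gauss points for a given value of tau.

#include <functional>
#include <vector>

// Finite-difference approximation of du_h/dtau at the Gauss points, as in (4.7).
std::vector<double> solutionDerivativeWrtTau(
    const std::function<std::vector<double>(double)>& solve,  // tau -> u_h(.; tau) at Gauss points
    double tau,
    double increment)                                          // the "h" of (4.7), to be tuned
{
    const std::vector<double> u_tau  = solve(tau);             // u_h(.; tau)
    const std::vector<double> u_pert = solve(tau + increment); // u_h(.; tau + h)

    std::vector<double> derivative(u_tau.size());
    for (std::size_t k = 0; k < u_tau.size(); ++k)
        derivative[k] = (u_pert[k] - u_tau[k]) / increment;    // incremental ratio
    return derivative;
}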

4.3 Expected results

In the analysis of the results we will be guided by a set of classical formulas that predict the optimal value of the SUPG parameter. We list them in order of complexity, referring to chapter 2 and to the bibliography for their theoretical framing.

• In the first place we know that, using linear polynomials for the Finite Element approximation on a uniform mesh, the following formula represents a good estimate of the optimal value of the SUPG parameter:

$$ \tau_K = \delta \frac{h}{|b(x)|} \tag{4.8} $$

with δ subject to the only constraint of being positive. The same formula can be adopted also in the case of polynomials of higher degree, provided that the following inequality is satisfied:

$$ \delta < k\, p^4 $$

where $k$ is a constant and $p$ the polynomial degree (see (Q17, chap. 13)).

• Formula (2.15), which we rewrite here for convenience,

$$ \tau_K = \frac{h}{2|b|} \left( \coth(\mathrm{Pe}) - \frac{1}{\mathrm{Pe}} \right) \tag{4.9} $$
guarantees an estimate that also takes into account the value of the diffusion coefficient, which is included in the definition of the Peclet number.

• Finally, the expression

$$ \tau_K = \frac{1}{\dfrac{2|b(x)|}{h_K} + \dfrac{4\mu(x)}{h_K^2}} \tag{4.10} $$

which is obtained by imposing a discrete version of the maximum principle in 1D, can be extended to multidimensional cases ((HEV07), (C97)) such as the problems we will solve in chapter 7. All three estimates are collected in the code sketch below.
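For reference, here is a minimal C++ sketch (not part of IsoGlib or OpenNN) of the three classical estimates (4.8)-(4.10); the function names and the default value of delta are our own choices.

#include <cmath>

// Peclet number (2.5): Pe = |b| h / (2 mu).
double peclet(double b_norm, double h, double mu) { return b_norm * h / (2.0 * mu); }

// (4.8): tau = delta * h / |b|, with delta > 0 chosen by the user.
double tauLinear(double h, double b_norm, double delta = 1.0)
{
    return delta * h / b_norm;
}

// (4.9): tau = h / (2|b|) * (coth(Pe) - 1/Pe).
double tauCoth(double h, double b_norm, double mu)
{
    const double pe = peclet(b_norm, h, mu);
    return h / (2.0 * b_norm) * (1.0 / std::tanh(pe) - 1.0 / pe);
}

// (4.10): tau = 1 / (2|b|/h + 4 mu / h^2).
double tauMaximumPrinciple(double h, double b_norm, double mu)
{
    return 1.0 / (2.0 * b_norm / h + 4.0 * mu / (h * h));
}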

4.4 Implementation aspects

4.4.1 Keras

For the implementation of the Neural Network, our first choice was Keras, a high-level neural network API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. The ease and clarity of use of Keras helped us get used to how a Neural Network works in practice: starting from some examples we experimented with different types of ANN frameworks and we started to understand how training is performed. At this point, we began to code our network in Keras, using Matlab to solve the PDE problem and connecting it to Python through a Matlab engine. Proceeding in the construction of the Neural Network, we found out that Keras does not allow the user to write the differentiation of a custom function that is part of the network. Adding the calculation of the solution as part of the loss function, we introduced an external call and, since Keras uses symbolic auto-differentiation, it is not capable of computing the derivative of compositions of functions that do not use TensorFlow symbolic operations. Of course it cannot handle a function that uses an entire program written in Matlab.

4.4.2 C++ libraries

Summing up, our aim is to implement a Neural Network in such a way that we can customize both its scheme and its differentiation process. Once we understood this was not possible using Keras, we decided to switch to a couple of C++ libraries, which we will describe in depth in the next chapter.

Chapter 5

IsoGlib and OpenNN

We chose the library IsoGlib as the PDE solver, while we exploited OpenNN to build the Artificial Neural Network. Both libraries are written in C++. In the following sections we will present the structure of these libraries, focusing on the features we have exploited most in this project.

5.1 Structure of IsoGlib

IsoGlib is a library for isogeometric analysis which takes care of defining and solving different kinds of PDE problems. We will describe this library adopting the perspective of a user who needs to solve a specific advection-diffusion-reaction problem, with given geometry, boundary conditions and advection-diffusion-reaction parameters. In the next chapter, after this overview of the existing features of the IsoGlib library, we will show how we easily implemented the SUPG stabilization. The process of solving a PDE with IsoGlib can be outlined as follows:

1. generation of a file containing information about geometry;

2. definition of a class specifying the values of the forcing term and ADR parameters;

3. definition of a class which states how to assemble the local stiffness matrix;

4. creation of a problem and setup using the previously defined structures;

5. solution and export of the results.

5.1.1 Definition of the problem

The first ingredient for the setup is the geometry of the domain, which has to be defined in external files. Those files (referred to as meshload data files from now on) are created using Matlab scripts that employ the NURBS (NUR) package to handle B-spline and NURBS geometries. To generate the meshload.dat file we used the script Matlab Tools/createMeshes.m. It calls mesh.m where, together with the geometry data, the user can store the connectivity information, fix the type of boundary conditions and, in the case of Dirichlet ones, impose the value of the solution on the boundary. The forcing term and the coefficients can be set by the user by overriding the methods contained in the IsoGlib class data_class_interface, defined in Core/data.hpp. The data class can also be used to set information about the exact solution and its gradient, allowing error computation and convergence order estimates. The way data are used to assemble the stiffness matrix is ruled by another class that inherits from LocalMatrixFast. Essentially, this class specifies how the integration on each element is performed to construct the local stiffness matrix entries. For example, an adr_local_matrix object can be instantiated to solve advection-diffusion problems, but this choice can lead, as we already pointed out, to


inaccurate solutions in the case of advection dominated flows. In section 6.1.2 we will describe how we introduced supg_local_matrix. A class named Problem represents the implementation of the PDE. Besides containing two members that point at data_class_interface and LocalMatrixBase (an alias for local_matrix_class) objects, instances of this class also own pointers to TimeAdvancing and Solver objects.

5.1.2 Main steps in solving process

Despite its name recalling time dependent problems, a TimeAdvancing object also implements the routines needed to solve stationary PDEs. Its core method is

[public] virtual void TimeAdvancing::computeSteadyStep(Solver* solver)

which takes as its unique argument a pointer to the abstract class Solver. A DefSolver class inheriting from the abstract one is available; it implements the method

[public] void DefSolver::solve(bool assembleRHSOnly = false) override

and owns a member named m_solver of type solve_tangent_class. This last object implements the method

[public] int solve_tangent_class::assemble(solution_class& solution, local_matrix_class& localMatrix, bool inUnsteadySolve)

to perform the assembly and, once the linear system has been built, solves it by calling

[public] int solve_tangent_class::solveSteady(solution_class& solution)

After this call the computed solution is available inside a solution_class object, pointed to by a SubProblem object which in turn is owned by the Problem one. To access it, it is enough to write

problem_instance.getSubProblem()->getSolution()->sol_r

since the sol_r member of solution_class is defined as public.

5.1.3 Export and visualization of the results

The class VTKExporter provides the required interface to export the solution in vtu format, so that it can be analyzed with an application for scientific visualization, for example ParaView. Note that, by default, the exporter produces a file containing the evaluation of the solution at more points than the number of DOFs, interpolating the numerical solution thanks to the knowledge of the basis of the discrete space.

5.2 Structure of OpenNN

OpenNN is a library which implements Artificial Neural Networks, from the building of the network itself to its training and use. We chose OpenNN because it allows a high degree of customization: in this sense, besides the choice of the hyperparameters, we noticed that the default classes can be extended by the user without particular limitations. As described in chapter 3, we want to customize our network by adding a new loss function, capable of computing the stabilized solution of the problem once the optimal value of the stabilization parameter has been predicted by the Neural Network. In the following sections we will describe some general features of the library before exploring the training routine in detail. Section 5.2.8 will allow us to reach the key point of our work in the following chapter.

5.2.1 Uses of the library

OpenNN is really versatile and allows one to deal with different kinds of problems in the framework of machine learning. It can be used, for example, to achieve the following goals: feature selection, classification and pattern recognition, clustering,

regression. Only the last task is involved in our project. The library comes with some already implemented examples which make use of benchmark datasets in the field of machine learning. For example, pima_indians_diabetes and iris_plant are two prototypes of pattern recognition tasks, while airfoil_self_noise and yacht_hydrodynamics_design exemplify function regression problems.

5.2.2 Vectors and matrices

The whole library relies upon two class templates: Vector and Matrix.

Vector class template It’s a class template derived from the vector class in Standard Template Library, so every known method is still available and the common operations (such as con- struction, access, comparison ecc.) can be performed as usual. No new members are introduced. Some ad hoc methods are defined to perform recurrent tasks in Neural Networks algorithms: for example, calculate L1 norm computes the norm of a vector of doubles; dot performs the dot product between two vectors of the same length; some specific methods allow to save and load a Vector from file.

Matrix class template
This class template inherits from Vector and adds three (private) members: rows_number, columns_number and header, the latter containing the names of the columns to mimic a data frame. The easiest way to initialize a Matrix object is through a Vector list. The reference operator is implemented by means of round brackets, so that m(0,1) = 2 can be interpreted as: assign the value 2 to the element in the first row and second column (indexing follows the usual rules of C++). Again, there are methods implementing frequently needed tasks, such as calculate_columns_minimums_maximums (which is used to scale inputs before feeding them into the network) or calculate_sum_squared_error (which is called to compare outputs and targets and compute the value of the loss function).
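As a quick illustration, here is a minimal sketch of the two class templates in use, assuming the constructors and the methods named above behave as just described (this is not taken verbatim from the OpenNN examples):

#include "opennn.h"   // umbrella header of the OpenNN library (actual path depends on the build setup)

using namespace OpenNN;

void vector_matrix_demo()
{
    // A vector of three doubles, its norm and a dot product, using the methods mentioned above.
    Vector<double> v(3, 1.0);                    // three entries, all equal to 1.0
    const double l1_norm = v.calculate_L1_norm();
    const double product = v.dot(v);

    // A 2x3 matrix accessed through the round-bracket reference operator.
    Matrix<double> m(2, 3, 0.0);                 // two rows, three columns, filled with 0.0
    m(0, 1) = 2.0;                               // first row, second column

    (void)l1_norm; (void)product;                // silence unused-variable warnings
}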

5.2.3 DataSet class

The data set contains all the information needed to build the model. Every column represents a particular variable and each row corresponds to one sample. Data sets can be easily loaded from a data file through a DataSet object, specifying their name and the type of separator in the main program (methods set_data_file_name and set_separator) and then calling load_data. By default the last column is set as the output and the remaining ones as inputs. To modify this behavior a pointer to the subclass Variables is needed. Through the method Instances::split_random_indices it is also possible to modify the default subdivision of the data set into training, validation and testing sets (60%, 20% and 20% respectively, with randomly chosen instances). Finally, this class implements some useful preprocessing methods (e.g. scale_inputs_minimum_maximum) to scale inputs and outputs into the range of unity. Here is an example of the loading of a data set and the setup of an input and a target variable:

// Data loading from file
DataSet data_set;
data_set.set_data_file_name("data/sample_dataset.txt");
data_set.set_separator("Tab");
data_set.load_data();

// Variable characterization
Variables* variables_pointer = data_set.get_variables_pointer();
variables_pointer->set_name(0, "mu");
variables_pointer->set_use(0, Variables::Input);
variables_pointer->set_name(1, "dof_1");
variables_pointer->set_use(1, Variables::Target);

// Data scaling
const Vector< Statistics<double> > inputs_statistics = data_set.scale_inputs_minimum_maximum();
const Vector< Statistics<double> > targets_statistics = data_set.scale_targets_minimum_maximum();

5.2.4 NeuralNetwork class

The Artificial Neural Network we are interested in is a simple feed forward ANN. It is composed of an input layer, an output layer and one or more hidden layers of neurons in between. The simplest constructor of a NeuralNetwork object takes as arguments three integers: the number of inputs, the number of neurons in the (unique) hidden layer, and the number of outputs. A MultilayerPerceptron object is built and the relative pointer inside the NeuralNetwork is updated. Scaling and unscaling layers are added, respectively, before the first layer of neurons and after the last one, making use of the methods construct_scaling_layer and construct_unscaling_layer. Probabilistic layers (which rescale outputs so that they can be interpreted as probabilities) are available too, but are used only in pattern recognition problems, which are outside the scope of our project. Here is an example of the declaration of a simple NeuralNetwork, after a DataSet object has been instantiated:

// Neural Network initialization
const size_t inputs_number = variables_pointer->get_inputs_number();
const size_t hidden_perceptrons_number = 12;
const size_t outputs_number = variables_pointer->get_targets_number();
NeuralNetwork neural_network(inputs_number, hidden_perceptrons_number, outputs_number);

// Scaling and unscaling layers addition
neural_network.construct_scaling_layer();
ScalingLayer* scaling_layer_pointer = neural_network.get_scaling_layer_pointer();
scaling_layer_pointer->set_statistics(inputs_statistics);
neural_network.construct_unscaling_layer();
UnscalingLayer* unscaling_layer_pointer = neural_network.get_unscaling_layer_pointer();
unscaling_layer_pointer->set_statistics(targets_statistics);

5.2.5 TrainingStrategy class

This class implements all the concepts needed for the training of the ANN. The main ones are encapsulated into two abstract classes: LossIndex and OptimizationAlgorithm. Before exploring their properties in detail, notice that the construction of a TrainingStrategy object is very simple: it just takes as arguments two pointers, one to the NeuralNetwork and one to the DataSet.

5.2.6 LossIndex class

The choice of a proper loss function is one of the critical issues in the design of an ANN. This abstract class provides all the virtual methods that can be specialized depending on the desired type of comparison between outputs and targets. These methods (properly overridden inside the derived classes) are called repeatedly during the training phase. When a TrainingStrategy object is constructed, the loss method member is set by default to NORMALIZED_SQUARED_ERROR; to change it there is a dedicated setter, which also takes care of updating the corresponding pointer, by means of which the object derived from LossIndex becomes accessible to the user. Many types of loss functions are already available and can be adapted to different needs. For example, CROSS_ENTROPY_ERROR can be adopted for binary classification problems and MINKOWSKI_ERROR is preferable when dealing with outliers, since it is less sensitive to extreme values than the standard MEAN_SQUARED_ERROR.

5.2.7 OptimizationAlgorithm class

The second fundamental step is the choice of the algorithm used for training. Several alternatives are available: GRADIENT_DESCENT, CONJUGATE_GRADIENT, QUASI_NEWTON_METHOD and many others. The user can set many properties of the class which affect the training routine, such as the maximum number of epochs, the goal for the loss value, its minimum decrease between two iterations and the maximum duration of a run. Here is an example of declaration, setup and use of a TrainingStrategy object:

// Training strategy initialization
TrainingStrategy training_strategy(&neural_network, &data_set);
training_strategy.set_training_method(TrainingStrategy::GRADIENT_DESCENT);

// Optimization algorithm setup
GradientDescent* gradient_descent_method_pointer = training_strategy.get_gradient_descent_pointer();
gradient_descent_method_pointer->set_maximum_epochs_number(10);
gradient_descent_method_pointer->set_minimum_loss_decrease(1.0e-6);

// Loss function setup
training_strategy.set_loss_method(TrainingStrategy::MEAN_SQUARED_ERROR);

// ANN training
TrainingStrategy::Results results = training_strategy.perform_training();

5.2.8 Backpropagation

During the training process backpropagation takes place. We will briefly describe the main methods involved in the differentiation of the loss function, taking GradientDescent::perform_training() as benchmark routine and focusing on the methods we need for our work. Moreover, NORMALIZED_SQUARED_ERROR is taken as reference loss function; this choice is motivated by the fact that our implementation starts from this class and overrides some of its methods. For the theoretical description of backpropagation we refer to section 3.3. The most important variables defined inside perform_training() are:

• [double] training_loss: value of the loss function during training;
• [Vector<double>] gradient: derivative of the loss with respect to the parameters;
• [double] training_rate: rate of decrease of the loss function.

training_loss (in epoch 0): During the first scan of the data set the value of the loss (depending on the randomly initialized parameters) is computed by calling the purely virtual method

[public] virtual double LossIndex::calculate_training_error() const = 0

Its implementation contains a loop in which the error is computed exploiting the subdivision of the data set into batches. Inputs are read and, through a pointer to the MultilayerPerceptron object, the method

[public] Matrix<double> MultilayerPerceptron::calculate_outputs(const Matrix<double>& inputs) const

is called to obtain the corresponding outputs. The comparison between outputs and targets is then implemented depending on the chosen loss function. With NORMALIZED_SQUARED_ERROR a summation of the squared differences between outputs and targets is performed.
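As a reference, the following self-contained toy sketch reproduces the kind of per-batch computation just described. It is not OpenNN code: we assume here that the normalization coefficient is the sum of the squared deviations of the targets from their mean, and OpenNN's actual implementation may differ in details.

#include <cstddef>
#include <numeric>
#include <vector>

// Toy normalized squared error over one batch: sum of squared differences
// between outputs and targets, divided by an assumed normalization coefficient.
double normalized_squared_error(const std::vector<double>& outputs,
                                const std::vector<double>& targets)
{
    const double mean = std::accumulate(targets.begin(), targets.end(), 0.0) / targets.size();

    double sum_squared_error = 0.0;
    double normalization     = 0.0;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        sum_squared_error += (outputs[i] - targets[i]) * (outputs[i] - targets[i]);
        normalization     += (targets[i] - mean) * (targets[i] - mean);
    }
    return sum_squared_error / normalization;
}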

gradient: Now the gradient can be computed and used to find the local steepest descent direction. The method which overrides

[public] virtual Vector<double> LossIndex::calculate_training_error_gradient() const = 0

updates its local variable training_error_gradient with the gradient of the batches. To do so, four fundamental calls are necessary:

1. [public] FirstOrderForwardPropagation MultilayerPerceptron::calculate_first_order_forward_propagation(const Matrix<double>&) const

Computes the value of all the activation functions and their derivatives for all the input instances contained in the current batch. Values are saved in the members of a FirstOrderForwardPropagation struct, named

[Vector<Matrix<double>>] layers_activations
[Vector<Matrix<double>>] layers_activation_derivatives

both of dimensions (layers_number) x (batch_size) x (neurons_number in the layer).

2. The layers_activations of the last layer are the outputs of the ANN and are passed as first argument to

[public] Matrix<double> NormalizedSquaredError::calculate_output_gradient(const Matrix<double>& outputs, const Matrix<double>& targets) const

which returns the gradient of the loss function with respect to the outputs. In this specific case, being the loss a discrete sum of squared errors, the return value is just 2*(outputs-targets). The returned value is saved in the local variable

[Matrix<double>] output_gradient

of dimensions (batch_size) x (outputs_number).

3. output_gradient is passed as second argument to

[public] Vector<Matrix<double>> LossIndex::calculate_layers_delta(const Vector<Matrix<double>>& layers_activation_derivative, const Matrix<double>& output_gradient) const

which finalizes the computation of the derivatives via chain rule.

4. Finally

[public] Vector<double> LossIndex::calculate_error_gradient(const Matrix<double>& inputs, const Vector<Matrix<double>>& layers_activations, const Vector<Matrix<double>>& layers_delta) const

is called, passing the indicated arguments. This method is in charge of filling the vector containing the gradient, performing the last products between the computed derivatives and the values obtained during the forward propagation. The method Vector::tuck_in(const size_t&, const Vector<double>&) allows building the gradient layer by layer.
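To make the data flow of these four steps explicit, here is a self-contained toy version for a network with a single hidden layer, sigmoid hidden activations, a linear output neuron and the loss E = (a - target)^2, computed for one sample. It mirrors the structure of the calls above but uses plain std::vector containers instead of OpenNN types; it is only an illustration, not the library's implementation.

#include <cmath>
#include <cstddef>
#include <vector>

struct ToyGradient { std::vector<double> dW1, dW2; };

ToyGradient toy_backprop(const std::vector<double>& x,    // inputs
                         const std::vector<double>& W1,   // hidden weights, size n_h * n_in
                         const std::vector<double>& W2,   // output weights, size n_h
                         double target)
{
    const std::size_t n_in = x.size();
    const std::size_t n_h  = W2.size();
    auto sigma = [](double z) { return 1.0 / (1.0 + std::exp(-z)); };

    // 1. forward propagation: store activations and activation derivatives
    std::vector<double> a1(n_h), d1(n_h);
    double a2 = 0.0;                                   // linear output neuron
    for (std::size_t i = 0; i < n_h; ++i) {
        double z1 = 0.0;
        for (std::size_t j = 0; j < n_in; ++j) z1 += W1[i * n_in + j] * x[j];
        a1[i] = sigma(z1);
        d1[i] = a1[i] * (1.0 - a1[i]);                 // sigma'(z1)
        a2 += W2[i] * a1[i];
    }

    // 2. gradient of the loss E = (a2 - target)^2 with respect to the output
    const double output_gradient = 2.0 * (a2 - target);

    // 3. layer deltas via the chain rule
    const double delta2 = output_gradient;             // linear output activation
    std::vector<double> delta1(n_h);
    for (std::size_t i = 0; i < n_h; ++i) delta1[i] = delta2 * W2[i] * d1[i];

    // 4. assemble the gradient with respect to the weights
    ToyGradient g{std::vector<double>(n_h * n_in), std::vector<double>(n_h)};
    for (std::size_t i = 0; i < n_h; ++i) {
        g.dW2[i] = delta2 * a1[i];
        for (std::size_t j = 0; j < n_in; ++j) g.dW1[i * n_in + j] = delta1[i] * x[j];
    }
    return g;
}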

It may be useful to point out the correspondences between the OpenNN variables and the theoretical notation adopted in section 3.3. The indices have the following meaning: i indicates the local neuron, k the layer and b the batch index, which is typical of the implementation and does not have a theoretical counterpart. We recall that indexing in C++ starts from 0.

• [Vector<double>] training_error_gradient[i-1] $= (\nabla \mathcal{L})_i$

• [Matrix<double>] targets(b,i-1) $= \hat{y}_i$

• [Matrix<double>] outputs(b,i-1) $= a_i^{(L)}$

• [Vector<Matrix<double>>] layers_activations[k-1](b,i-1) $= \sigma(z_i^{(k)}) = a_i^{(k)}$

• [Vector<Matrix<double>>] layers_activation_derivatives[k-1](b,i-1) $= \sigma'(z_i^{(k)})$

• [Matrix<double>] output_gradient(b,i-1) $= \dfrac{\partial E_i}{\partial a_i^{(L)}}$

• [Vector<Matrix<double>>] layers_delta[L-1](b,i-1) $= \dfrac{\partial E_i}{\partial a_i^{(L)}} \cdot \dfrac{\partial a_i^{(L)}}{\partial z_i^{(L)}}$

training_rate and training_loss (in epoch > 0): The training direction is then deduced and used to establish the magnitude of the step to be taken in the parameter space. A learning rate algorithm object, member of the selected OptimizationAlgorithm, is used to estimate the best training rate and, meanwhile, to compute the updated value of the loss function. The method responsible for these computations is

Vector<double> LearningRateAlgorithm::calculate_directional_point(const double&, const Vector<double>&, const double&) const

The default algorithm employed is Brent's method. Golden section search can be chosen as an alternative; otherwise the training rate can be set to a constant value.
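As a reminder of what such a line search does, here is a self-contained sketch of a golden-section search minimizing the loss along the training direction. This is a generic textbook version, not OpenNN's implementation.

#include <cmath>
#include <functional>

// Golden-section line search: minimize loss(step) for step in [a, b].
double golden_section_training_rate(const std::function<double(double)>& loss,
                                    double a, double b, double tol = 1e-6)
{
    const double phi = (std::sqrt(5.0) - 1.0) / 2.0;   // golden ratio conjugate, ~0.618
    double c = b - phi * (b - a);
    double d = a + phi * (b - a);
    while (b - a > tol) {
        // keep the subinterval that contains the smaller loss value
        if (loss(c) < loss(d)) { b = d; } else { a = c; }
        c = b - phi * (b - a);
        d = a + phi * (b - a);
    }
    return 0.5 * (a + b);
}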

Chapter 6

Implementation: interface of OpenNN and IsoGlib

In this chapter we will describe the portion of the code we developed, the way it is integrated into the OpenNN library and how it makes use of the SUPG stabilization we added to IsoGlib to solve PDEs.

6.1 SUPG solver in IsoGlib

Using the library IsoGlib we coded a solver for an Advection-Diffusion-Reaction problem stabilized using the SUPG method.

6.1.1 Our test class: SUPGdata

This class contains the definition of all the parameters and data involved in our problem. It inherits from the IsoGlib class data_class_interface, which already contains all the methods needed to load the geometry and the boundary condition data from the mesh file. Each of the following functions computes the value of a parameter at the point (xx,yy,zz) at time t. All the methods are const and they override the base class ones; a minimal sketch of one such override is reported after the list.

void diff_coeff(Real *outValues, Real xx, Real yy, Real zz, Real t)
returns the diffusion coefficient

void beta_coeff(Real *outValues, Real xx, Real yy, Real zz, Real t)
returns the advection coefficient

void gamma_coeff(Real *outValues, Real xx, Real yy, Real zz, Real t)
returns the reaction coefficient (in our tests it is always set to 0)

void source_term(Real *outValues, Real xx, Real yy, Real zz, Real t)
returns the value of the right-hand side of the PDE problem

void sol_ex(Real *outValues, Real xx, Real yy, Real zz, Real tt)
returns the exact solution (if available) of the PDE problem

void grad_sol_ex(Real *outValues, Real xx, Real yy, Real zz, Real tt)
returns the gradient of the exact solution (if available) of the PDE problem

void lapl_sol_ex(Real *outValues, Real xx, Real yy, Real zz, Real tt)
returns the Laplacian of the exact solution (if available) of the PDE problem
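For a problem with constant diffusion and advection coefficients, two of these overrides could look as in the following sketch. It is only an illustration: SUPGdataSketch and mu_value are illustrative names, Real is assumed to be an alias for double, and the real class inherits from data_class_interface.

#include <cstddef>

typedef double Real;   // assumption: IsoGlib's Real maps to double

// Reduced stand-in for our test class, with illustrative member names.
struct SUPGdataSketch
{
    Real mu_value = 1.0e-4;   // constant diffusion coefficient of the test

    void diff_coeff(Real* outValues, Real, Real, Real, Real) const
    {
        outValues[0] = mu_value;           // constant diffusion coefficient
    }

    void beta_coeff(Real* outValues, Real, Real, Real, Real) const
    {
        outValues[0] = 1.0;                // constant advection field b = (1, 1)
        outValues[1] = 1.0;
    }
};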

6.1.2 SUPGLocalMatrix class

This class, which inherits from the IsoGlib class LocalMatrixFast, defines how the stiffness matrix of the Finite Element solution of the problem is assembled. The parameters of the problem are set using the data_class_interface object (in our case it will be of type SUPGdata). The main methods of this class are integrate_on_gauss_point and rhs_on_gauss_point, both used in the assembly phase, for the stiffness matrix and the right-hand side respectively. In these functions we added the implementation of the SUPG stabilization, coding the stabilization parameter as a private member of the class that can be set by the user. An example of how our new classes interact and of how a solution to our problem is produced can be found in the folder isoglib/Tests/GuidoVidulisADRExactSol. This test has been the model from which we coded the solution of the PDE in the interface class described next.
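To fix ideas, the following self-contained sketch shows the kind of contribution added on a single Gauss point. It is not IsoGlib's actual code (containers and names are simplified) and, since we use first-order elements, the second-order term of the SUPG residual is omitted.

#include <array>
#include <cstddef>
#include <vector>

// Illustrative sketch: accumulate the Galerkin and SUPG contributions of one
// Gauss point into a dense local matrix and right-hand side. phi and gradPhi
// hold the basis function values and gradients at the point, b is the (constant)
// advection field, f the forcing term and weight the quadrature weight there.
void add_gauss_point_contribution(std::vector<std::vector<double>>& localMatrix,
                                  std::vector<double>& localRhs,
                                  const std::vector<double>& phi,
                                  const std::vector<std::array<double, 2>>& gradPhi,
                                  const std::array<double, 2>& b,
                                  double mu, double tau, double f, double weight)
{
    const std::size_t n = phi.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double bGradPhi_i = b[0] * gradPhi[i][0] + b[1] * gradPhi[i][1];
        for (std::size_t j = 0; j < n; ++j) {
            const double bGradPhi_j = b[0] * gradPhi[j][0] + b[1] * gradPhi[j][1];
            // standard Galerkin terms: diffusion + advection
            localMatrix[i][j] += weight * (mu * (gradPhi[i][0] * gradPhi[j][0]
                                               + gradPhi[i][1] * gradPhi[j][1])
                                           + bGradPhi_j * phi[i]);
            // SUPG term: tau * (b . grad phi_j)(b . grad phi_i)
            localMatrix[i][j] += weight * tau * bGradPhi_i * bGradPhi_j;
        }
        // Galerkin and SUPG contributions to the right-hand side
        localRhs[i] += weight * (phi[i] + tau * bGradPhi_i) * f;
    }
}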

6.2 New OpenNN classes

The main program of our work is built as an example for the OpenNN library, in order to construct and train our network. In the following sections we will describe the new classes we added to the OpenNN library that are involved in the training of the ANN.

6.2.1 IsoglibInterface class This class represents the interface between the libraries IsoGlib and OpenNN.

Members

In its private members are stored all the data and information needed by IsoGlib to solve our specific SUPG problem.

• [const char *] director_name is the name of the directory where the program can find the mesh data file.
• [unsigned] nDof is the number of points in which IsoGlib calculates the solution.
• [unsigned] nElems is the number of elements of the mesh.
• [unsigned] nGaussPoints is the number of Gauss points used for the numerical integration on each element.
• [double] h is the mesh refinement parameter.
• [double] tau_scaling is the scaling value for the dimensionless stabilization parameter predicted by the network.
• [SUPGdataBase *] data_pointer is a pointer to the IsoGlib object that contains the numerical setting of the advection-diffusion problem under consideration. We provide two test cases, hence two different classes that inherit from SUPGdataBase.
• [Problem] pde_prob is the IsoGlib object used to solve the PDE.
• [TimeAdvancing] timeAdvancing is the IsoGlib object used to distinguish a time-dependent problem from a steady one.
• [SUPGLocalMatrix *] localMatrix_pointer points to the stiffness matrix used by IsoGlib to solve the PDE problem.

Public methods

The IsoglibInterface constructor takes as input the name of the folder in which meshload.dat is located (the results of the calls to IsoGlib will be saved there too) and the SUPGdataBase pointer; moreover, it calls the private method set_problem_resolution, which immediately loads the mesh and initializes the pde_prob member. The numbers of degrees of freedom, mesh elements and Gauss points have to be set by the user by means of the method set_nDof. This function takes care of defining the mesh refinement and the scaling parameter for τ, too. Indeed, the network works with dimensionless values, due to the scaling of the inputs, but the stabilization parameter needed to solve the PDE is a dimensional quantity. For this reason the scaling is performed using the characteristic values of the considered problem and exploiting formula 4.8, which is the simplest way to obtain an estimate of the order of magnitude of τ. The main public method of this class,

[public] Vector<double> IsoglibInterface::calculate_solution(double tau, double mu)

takes the input µ and the output τ of the network and returns a vector containing the corresponding PDE solution, computed at the Gauss points of the mesh. First, the value of tau predicted by the network is scaled using the member tau_scaling. After that, the private method solveSteady is called and the solution of the problem is stored inside the pde_prob member. Through an IsoglibInterface object we are thus able to obtain the values needed to calculate our loss function, comparing the expected and the current solution. In doing this, we experimentally found that performing a discrete L2 comparison between the nodal values of the IsoGlib solution and the exact solution evaluated at the same nodes does not lead to a meaningful measurement of the quality of the approximation: it happened that the predicted solution was far from the exact one while this type of loss indicated an optimal value. To solve this problem we employ the most natural error measure in the field of PDEs, the L2 norm, computed through Gaussian numerical integration as described in section 4.2. For this reason, the second part of calculate_solution changes the points involved in the comparison, computing the value of the solution at the Gauss points through a loop over all the elements of the mesh.

[public] Vector<double> IsoglibInterface::calculate_solution(double tau, double mu)
{
    double tau_for_EDP = tau * tau_scaling;
    solveSteady(tau_for_EDP, mu);

    // used to access sol_r
    solution_class *solution_pointer = pde_prob.getSubProblem()->getSolution();

    // get elements of this process
    const int numMyElements = pde_prob.getSubProblem()->getMesh()->getNumMyElements();

    // solution
    Vector<double> solution_on_gauss_points( numMyElements * nGaussPoints, 0 );

    // compute value of the solution on gauss points for each element
    for( int locE = 0; locE < numMyElements; locE++ )
    {
        // element
        const Element &element = pde_prob.getSubProblem()->getMesh()->
            getGeometryMapParam().getLocalElement( locE );
        const vect_int &funcs = element.functions;

        // for each Gauss point
        int numGauss = pde_prob.getSubProblem()->getMesh()->
            getGeometryMapParam().getNumGaussPoints( locE );
        for( int locG = 0; locG < numGauss; ++locG )
        {
            // pointer to basis function values
            const GaussPoint *point;
            const ShapeValues *basisValues;
            pde_prob.getSubProblem()->getMesh()->getGeometryMap().getBasisValues(
                locE, locG, &point, nullptr, &basisValues, nullptr );

            // Gauss point
            const Real xx_gp = point->physCoords[ 0 ];
            const Real yy_gp = point->physCoords[ 1 ];
            const Real zz_gp = point->physCoords[ 2 ];
            const Real gWt   = point->physWeight;

            const int lfuncs = (int) funcs.size();
            for( int k = 0; k < lfuncs; k++ )
            {
                const int dof = pde_prob.getSubProblem()->
                    getDofManager()->getDofMapper().getDof2( funcs[ k ], 0 );
                const Real sol = solution_pointer->sol_r[ dof ];
                solution_on_gauss_points[ nGaussPoints * locE + locG ] +=
                    basisValues->R[ k ] * sol;
            }

        } // end iterate gauss points

    } // end iterate elements

    return solution_on_gauss_points;
}

Private methods

The private methods of this class take care of the setting and solution of the PDE problem using IsoGlib.

1. void setProblem( data_class_interface *data, TestCase::ProblemFunc setupProblem );

It fills the Problem member using the functional data stored in the SUPGdata object and the geometry data, loaded from the mesh file meshload.dat.

2. void solveSteady( double tau, double mu );

It assembles the local SUPGLocalMatrix using the values of mu and tau given in input and then solves the problem, saving the current solution inside the pde_prob member.

6.2.2 Customized loss function: OutputFunction class

In this class we coded the loss function described in section 4.2. We chose NORMALIZED_SQUARED_ERROR as reference loss function: our customized class OutputFunction inherits from the class NormalizedSquaredError.

Members

• [IsoglibInterface *] isoglib_interface_pointer is the pointer to an IsoglibInterface object.

• [Matrix<double>] unscaled_inputs stores the original (unscaled) inputs of the network. They are used by the IsoglibInterface to compute the corresponding solution.
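Putting members and methods together, the class can be summarized by the following skeleton. This is a simplified sketch: only the parts discussed in this section are shown, and the exact signatures (in particular the type of the batch index vector) follow the descriptions below rather than the actual header.

class OutputFunction : public NormalizedSquaredError
{
public:
    OutputFunction(const std::string& meshload_folder, SUPGdataBase* data_pointer);

    // overridden training methods (see below): they replace the direct
    // output-target comparison with a comparison between PDE solutions and targets
    virtual double calculate_training_error() override;
    double calculate_selection_error() const;
    Vector<double> calculate_training_error_gradient();

private:
    // solve the PDE for every instance of the current batch
    Matrix<double> calculate_PDE_solution(const Matrix<double>& tau_values,
                                          const Vector<size_t>& batch_indices) const;
    // derivative of the composed loss with respect to the predicted tau
    Matrix<double> calculate_loss_derivative(const Matrix<double>& tau_values,
                                             const Vector<size_t>& batch_indices) const;
    // finite-difference derivative of the PDE solution with respect to tau
    Matrix<double> calculate_PDE_solution_derivative(const Matrix<double>& tau_values,
                                                     const Vector<size_t>& batch_indices) const;

    IsoglibInterface* isoglib_interface_pointer;   // bridge towards IsoGlib
    Matrix<double>    unscaled_inputs;             // original (dimensional) inputs
};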

Private methods

1. [private] Matrix<double> calculate_PDE_solution(const Matrix<double>& tau_values, const Vector<size_t>& batch_indices) const;

Calculates the solution of the PDE given the stabilization parameter predicted by the neural network. Here batch_indices contains the indices of the instances of the current batch; they are used to select the corresponding inputs. Its implementation contains a loop over the current batch in which the solution is computed as

Vector<double> temp_solution = isoglib_interface_pointer->calculate_solution(tau, mu);

where mu is picked from the unscaled_inputs and tau from tau_values, the outputs of the network.

2. [private] Matrix<double> calculate_loss_derivative(const Matrix<double>& tau_values, const Vector<size_t>& batch_indices) const;

It computes the gradient of our loss function with respect to the stabilization parameters. This method is involved in our customized backpropagation process: the loss function is now the composition of two functions, and we have to differentiate both of them. First, the derivative of the normalized squared loss function with respect to the PDE solution values is computed:

Matrix<double> simple_loss_derivatives = PDE_solutions - targets;

Secondly, the derivative of the PDE solution with respect to the stabilization parameter is computed:

Matrix<double> PDE_solutions_derivatives = calculate_PDE_solution_derivative(tau_values, batch_indices);

This is done through the private method calculate_PDE_solution_derivative described below. Finally, the implementation of this function contains a loop in which the composition of the two derivatives is computed, exploiting the subdivision of the data set into batches.

3. [private] Matrix<double> calculate_PDE_solution_derivative(const Matrix<double>& tau_values, const Vector<size_t>& batch_indices) const;

This method calculates the derivative of the PDE solution with respect to the stabilization parameter. Since the function relating these two quantities involves calls to many IsoGlib functions, this derivative cannot be computed exactly, but has to be approximated. As explained in section 4.2, we chose to approximate it by numerical differentiation. In this method the approximation is obtained through two sequential calls to the function that computes the PDE solution:

Vector<double> y = isoglib_interface_pointer->calculate_solution(tau, mu);
Vector<double> y_forward = isoglib_interface_pointer->calculate_solution(tau + h, mu);
Vector<double> derivative = (y_forward - y) / h;

Public methods that perform override

The following methods are directly involved in the training process and they override the versions implemented inside the parent class NormalizedSquaredError. We will describe them focusing on the differences with respect to the description given in section 5.2.6. The idea is, given the stabilization parameter as output of the network, to obtain the corresponding PDE solution and then compare it with the target, using the "standard" normalized squared loss function.

1. [public] virtual double OutputFunction::calculate_training_error() override

In this method, which computes the loss for the training data set, we added a call to the function that, given the outputs of the network, calculates the corresponding PDE solutions:

Matrix<double> PDE_solutions = calculate_PDE_solution(outputs, training_batches[static_cast<size_t>(i)]);

At this point, the normalized squared loss function is used to compare these solutions to the given targets:

const double batch_error = PDE_solutions.calculate_sum_squared_error(targets);

2. The procedure just described is repeated in two other overridden functions,

[public] double OutputFunction::calculate_selection_error() const

[public] double OutputFunction::calculate_training_error(const Vector<double>& parameters) const

whose purpose is the same, but which are used at different points of the training process.

3. [public] Vector<double> OutputFunction::calculate_training_error_gradient()

This function is crucial in the backpropagation process; its parent class version has been described in depth in section 5.2.8. The following call makes the difference with respect to the NormalizedSquaredError version:

const Matrix<double> output_gradient = calculate_loss_derivative(first_order_forward_propagation.layers_activations[layers_number-1], training_batches[static_cast<size_t>(i)]);

Indeed, we have substituted the call to calculate_output_gradient, which computes the derivative of the loss function with respect to the outputs of the network, with a call to the new private method calculate_loss_derivative.

This function still calculates the gradient of the loss function with respect to the outputs (the stabilization parameters) but, since we have enriched our loss function, it is now computed in a different way, as previously described.

Implementation of the private methods

1. /// \param tau_values Stabilization parameters [current_batch_size x 1]
/// \param batch_indices Indices of the current batch, used to select the inputs [current_batch_size]
/// \return PDE solutions [current_batch_size x nDof]

Matrix<double> OutputFunction::calculate_PDE_solution(const Matrix<double>& tau_values,
                                                      const Vector<size_t>& batch_indices) const
{
    unsigned elements_number = isoglib_interface_pointer->get_nElems();
    unsigned gauss_points_number = isoglib_interface_pointer->get_nGaussPoints();
    unsigned current_batch_size = batch_indices.size();

    Matrix<double> solutions(current_batch_size, elements_number * gauss_points_number);

    for( size_t i = 0; i < current_batch_size; i++ )
    {
        Real mu  = unscaled_inputs(batch_indices[i], 0);
        Real tau = tau_values(i, 0);

        Vector<double> temp_solution = isoglib_interface_pointer->calculate_solution(tau, mu);

        solutions.set_row(i, temp_solution);
    }

    return solutions;
}

2. /// \param tau_values Stabilization parameters [current_batch_size x 1]
/// \param batch_indices Indices of the current batch, used to select the inputs [current_batch_size]
/// \return Derivative of the loss function wrt the stabilization parameter [current_batch_size x 1]

Matrix<double> OutputFunction::calculate_loss_derivative(const Matrix<double>& tau_values,
                                                         const Vector<size_t>& batch_indices) const
{
    unsigned current_batch_size = batch_indices.size();

    Matrix<double> PDE_solutions = calculate_PDE_solution(tau_values, batch_indices);

    const Matrix<double> targets = data_set_pointer->get_targets(batch_indices);

    Matrix<double> simple_loss_derivatives = PDE_solutions - targets;

    Matrix<double> PDE_solutions_derivatives =
        calculate_PDE_solution_derivative(tau_values, batch_indices);

    Matrix<double> composed_loss_derivatives(current_batch_size, 1);

    for( size_t i = 0; i < current_batch_size; i++ )
    {
        composed_loss_derivatives(i, 0) = simple_loss_derivatives.get_row(i).
            dot(PDE_solutions_derivatives.get_row(i));
    }

    return composed_loss_derivatives;
}

3. /// \param tau_values Stabilization parameters [current_batch_size x 1]
/// \return Derivative of the PDE solution wrt the stabilization parameter [current_batch_size x nDof]

Matrix<double> OutputFunction::calculate_PDE_solution_derivative(const Matrix<double>& tau_values,
                                                                 const Vector<size_t>& batch_indices) const
{
    unsigned elements_number = isoglib_interface_pointer->get_nElems();
    unsigned gauss_points_number = isoglib_interface_pointer->get_nGaussPoints();
    unsigned current_batch_size = batch_indices.size();

    Matrix<double> solutions_derivatives(current_batch_size, elements_number * gauss_points_number);

    for( size_t i = 0; i < current_batch_size; i++ )
    {
        double mu  = unscaled_inputs(batch_indices[i], 0);
        double tau = tau_values(i, 0);

        double h = 0.1;
        Vector<double> y = isoglib_interface_pointer->calculate_solution(tau, mu);
        Vector<double> y_forward = isoglib_interface_pointer->calculate_solution(tau + h, mu);
        Vector<double> derivative = (y_forward - y) / h;

        solutions_derivatives.set_row(i, derivative);
    }

    return solutions_derivatives;
}

6.3 SUPG example in OpenNN

In this customized example (OpenNN/examples/SUPG) we actually implemented the construction, training and testing of our neural network. Here we exploited all the described features of the OpenNN library and all the new classes we wrote, to customize our network and carry out our idea.

6.3.1 Data of the problem

In the first part, the number of the test problem is set, together with ref_number, which indicates the mesh refinement. This information is enough to identify the folders where the data file meshload.dat and the dataset SUPG.txt can be found.

unsigned test_number = 1;
unsigned ref_number  = 3;
string dataset_name = "data/Test" + to_string(test_number) + "/ref" + to_string(ref_number) + "/SUPG.txt";
string meshload_folder = "data/Test" + to_string(test_number) + "/ref" + to_string(ref_number);

Then, loading the data from this txt file, a DataSet object is created, and the numbers of inputs and targets for the network are set.
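This step follows the pattern of section 5.2.3; a possible sketch of it is reported below. It is only an assumption about the setup (tab-separated file, µ in the first column, target solution values in the remaining ones), and the helper get_variables_number is used here as a plausible accessor of the Variables class.

// Data set creation for the SUPG example (sketch, following section 5.2.3)
DataSet data_set;
data_set.set_data_file_name(dataset_name);
data_set.set_separator("Tab");
data_set.load_data();

// one input (mu); all remaining columns are marked as targets
Variables* variables_pointer = data_set.get_variables_pointer();
variables_pointer->set_name(0, "mu");
variables_pointer->set_use(0, Variables::Input);
for( size_t j = 1; j < variables_pointer->get_variables_number(); j++ )
    variables_pointer->set_use(j, Variables::Target);

// scale inputs and targets to dimensionless ranges
const Vector<Statistics<double>> inputs_statistics  = data_set.scale_inputs_minimum_maximum();
const Vector<Statistics<double>> targets_statistics = data_set.scale_targets_minimum_maximum();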

6.3.2 Training

The NeuralNetwork and TrainingStrategy objects are created as described in sections 5.2.4 and 5.2.5. In particular, for our network, we customized the TrainingStrategy object in the following way:

• We set GRADIENT_DESCENT as the algorithm used for the training:

training_strategy.set_training_method("GRADIENT_DESCENT");

• We set the loss method so that the training process uses our customized loss function OutputFunction. The OutputFunction pointer is created giving in input the name of the folder where the mesh data file is stored and the data class (SUPGdata[number of test]) created according to the test number. Some information about the PDE problem (number of Gauss points, of elements and of dofs) is then set in the isoglib_interface_pointer stored in the output_function_pointer.

SUPGdata1 data1;
SUPGdata2 data2;
SUPGdataBase* data_pointer;
if( test_number == 1 )
    data_pointer = &data1;
else if( test_number == 2 )
    data_pointer = &data2;

OutputFunction* output_function_pointer = new OutputFunction(meshload_folder, data_pointer);

training_strategy.set_loss_index_pointer(output_function_pointer);

To perform the training, once all the data have been set, the method perform_training() is called:

const TrainingStrategy::Results training_strategy_results = training_strategy.perform_training();

and all the crucial results are saved in a TrainingStrategy::Results object. At the end of the training phase, we save the history of the training and validation errors, which we use (see Chapter 7) to analyze our numerical tests.

Vector<double> loss_history = training_strategy_results.gradient_descent_results_pointer->loss_history;
Vector<double> selection_history = training_strategy_results.gradient_descent_results_pointer->selection_error_history;
size_t loss_size = loss_history.size();

6.3.3 Testing

For the testing phase of our network we used an object of type TestingAnalysis. In particular, through the function calculate_target_outputs it is possible to compute the outputs of the network corresponding to the testing inputs of the data set, instances that have never been used in the training phase and are therefore unknown to the Neural Network.

TestingAnalysis testing_analysis(&neural_network, &data_set);

Vector<Matrix<double>> results = testing_analysis.calculate_target_outputs();

To visualize the results, we rescale the inputs using the inputs_statistics data, and the stabilization parameters (the outputs) using the scaling coefficients stored in the IsoglibInterface pointer.

Chapter 7

Numerical Tests

We now present some numerical results obtained using Isogeometric elements of first order to solve a two-dimensional advection-diffusion problem. For the testing phase of our network we chose a problem for which the exact solution is known. This choice simplified the creation of the dataset, since the targets can be obtained by a simple function evaluation at the Gauss nodes. However, this hypothesis can be removed: it is then necessary to solve the problem once on a very fine mesh (see section 4.1). The problem belongs to the family of PDEs 2.1, with the restriction of µ and b constant in space. We fixed the domain Ω, the forcing term f, the mesh size h and the advection coefficient b, leaving only µ as a variable. Its influence on the best value of the stabilization parameter is the object of our interest. The problem is defined given:

• $\Omega = (0,1)^2$
• $b = (1,1)$
• $f(x,y) = \sin(2\pi x)\left[\mu\left(2 + 4\pi^2(y - y^2)\right) + (1 - 2y)\right] + 2\pi\cos(2\pi x)(y - y^2)$
• $g = 0$

so that the exact solution (represented in figure 7.1) is given by:

$u_{ex} = \sin(2\pi x)\,(y - y^2)$
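For completeness, this is how the exact solution above could be coded in the sol_ex override of our test class (a sketch only: it is shown as a free function for brevity, and Real is assumed to be an alias for double in IsoGlib).

#include <cmath>

typedef double Real;   // assumption: IsoGlib's Real maps to double

// Exact solution u_ex = sin(2*pi*x) * (y - y^2) of the test case above,
// written in the form expected by the sol_ex override of section 6.1.1.
void sol_ex(Real* outValues, Real xx, Real yy, Real /*zz*/, Real /*tt*/)
{
    const Real pi = 3.14159265358979323846;
    outValues[0] = std::sin(2.0 * pi * xx) * (yy - yy * yy);
}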

As we can see in Figure 7.2, the numerical solution obtained for $\mu = 10^{-4}$ presents numerical oscillations.

Figure 7.1: Exact solution in the case $\mu = 10^{-4}$


Figure 7.2: Numerical oscillations in the case $\mu = 10^{-4}$

7.1 Training of the network

For this test case we created the SUPGdata1 class, where we stored all the terms defining the PDE problem (see section 6.1.1 for more details). We tested our network on different datasets, generated with a Matlab script. The script takes care of evaluating the exact solution at the correct points (the Gauss nodes), respecting the order in which the pointwise comparison is performed inside OpenNN during the training of the ANN. The first problem we faced was the choice of the training set. We started by considering data sets with about 20 instances, without obtaining satisfactory results: as we can see in Figure 7.3, the magnitude of the predicted τ increases together with that of µ, a behaviour that is not aligned with the theory.

Figure 7.3: Predicted τ values after training on a dataset of 20 instances

Then we tried distributing the input values logarithmically, but the predictions were even worse. When we started using larger data sets, the results kept improving as the training set grew, suggesting that the number of instances is the main factor influencing the effectiveness of the learning. We therefore decided to train our ANN with data sets of 200-300 instances.

The most relevant indicators of the success of the learning process are the trends of the training and validation errors (see section 3.3.1). We report below the results obtained from these data, corresponding to different data sets. They show a good fit of the data: at first the validation error is greater than the training one, but as the training progresses it stabilizes only slightly above it.

Figure 7.4: Error trends during the learning phase with $\mu \in (10^{-4}, 10^{-2})$

7.2 Prediction of the SUPG parameter

After the training phase, we tested our network on the test set. Looking at the results, we tried to understand whether our model was able to reproduce the theoretical formula 2.15. First of all, we observed how the output of the Neural Network changed as the input values changed. As we can see in figure 7.5, the SUPG parameter predicted by the ANN increases as µ decreases, indicating the need for a stronger stabilization as the regime of the problem becomes more advection-dominated.

Figure 7.5: Stabilization parameter predicted by the ANN for $\mu \in (10^{-4}, 10^{-2})$

The trend of the graph is reasonable. However, comparing the predicted values of τ with the theoretical ones, we found a noticeable discrepancy (see figure 7.6): our ANN predictions are almost one order of magnitude smaller than what formula 2.15 states. To understand whether our code was performing correctly we needed to analyze more carefully the norm we chose to measure the error during the training.

Figure 7.6: Predicted vs theoretical τ for $\mu \in (10^{-4}, 10^{-2})$

7.3 Analysis of the results

First of all, we verified that the numerical solutions computed using the stabilization parameter predicted by the network do not present numerical oscillations. In figure 7.7 we show the numerical solution stabilized by the network, corresponding to $\mu = 10^{-4}$. Analyzing figure 7.6, we can deduce that the theoretical estimate is pessimistic: it forecasts a stabilization parameter greater than the one needed to eliminate the oscillations.

Figure 7.7: Stabilized solution in the case $\mu = 10^{-4}$

Since the aim of our Neural Network is to minimize the L2 norm of the difference between the stabilized and the exact solution, we asked ourselves whether the value of the parameter obtained with this procedure is coherent with the theoretical one. A priori, we have no guarantee that the best SUPG parameter from the point of view, for example, of the respect of the Maximum Principle (see section 2.2.1) is the same that minimizes the L2 error between the exact and the numerical solution. We used IsoGlib to solve the PDE problem with a fixed µ, modifying the magnitude of τ in order to find the value of the stabilization parameter that minimizes the L2 error. Exploiting the classes in charge of the error computation already implemented inside IsoGlib, we found that an L2-optimal SUPG parameter exists in all the tested cases (see figure 7.8), but it does not coincide with the theoretical one.
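Conceptually, the search for the L2-optimal τ is a simple parameter sweep; a self-contained sketch of the procedure follows. The callable solve_and_measure is a stand-in for "call IsoglibInterface::calculate_solution and integrate the squared difference with the exact solution by Gauss quadrature", and the bounds and growth factor are illustrative.

#include <functional>
#include <limits>

// Sweep tau on a geometrically spaced grid for a fixed mu and return the value
// that minimizes the L2 error reported by solve_and_measure.
double find_L2_optimal_tau(const std::function<double(double)>& solve_and_measure,
                           double tau_min, double tau_max, double factor = 1.25)
{
    double best_tau = tau_min;
    double best_err = std::numeric_limits<double>::max();
    for (double tau = tau_min; tau <= tau_max; tau *= factor) {
        const double err = solve_and_measure(tau);
        if (err < best_err) {
            best_err = err;
            best_tau = tau;
        }
    }
    return best_tau;
}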

Figure 7.8: The L2 error for $\mu = 5 \cdot 10^{-4}$ computed for different values of τ using IsoGlib. The function has a well-defined minimum; the corresponding value of τ is what we define as the optimal stabilization parameter.

Fixing different values of µ and looking for the minimum of the L2 error with respect to τ, we obtained the results shown in Table 7.1:

µ               theoretical τ      L2-optimal τ
1 · 10^{-5}     4.42 · 10^{-2}     9.56 · 10^{-3}
5 · 10^{-5}     4.42 · 10^{-2}     9.52 · 10^{-3}
1 · 10^{-4}     4.41 · 10^{-2}     9.46 · 10^{-3}
5 · 10^{-4}     4.39 · 10^{-2}     9.00 · 10^{-3}
1 · 10^{-3}     4.37 · 10^{-2}     8.43 · 10^{-3}
5 · 10^{-3}     4.17 · 10^{-2}     4.02 · 10^{-3}

Table 7.1: Comparison between the values of τ given by formula 2.15 and the L2-optimal ones

As we can see, the two values do not coincide. Since for the training of the ANN we used a metric analogous to the one adopted by IsoGlib, we can expect the prediction coming from a well-working ANN to be closer to the value we measured experimentally. Comparing the predictions made by the ANN on the test set with the L2-optimal values just computed, we can see the perfect agreement of the two. In figure 7.10 we show the predictions after the complete training of the Neural Network, while in figure 7.9 we report the τ predictions after a training of just 2 epochs: it is clear how the forecast of the value of τ improves as the training proceeds. This result suggests that the Neural Network is actually performing well, learning from the data set how to minimize the L2 error between the exact and the numerical solution. The discrepancy from the theoretical expectations can then be ascribed to the choice of the norm adopted in the loss function.

Figure 7.9: L2-optimal and predicted τ values, training with 2 epochs

Figure 7.10: L2-optimal and predicted τ values, training with 9 epochs

7.4 L2 error trend in a different PDE

Once we understood that the ANN was working correctly, we changed the test problem. We fixed:

• $\Omega = (0,1)^2$
• $b = (1,1)$
• $g = 0$

and f(x) such that the exact solution is:

$u_{ex} = \dfrac{1}{10}\, e^{4y}(y - 1)(x - 1)\,x$

We expected, as formula 2.15 suggests, that the best SUPG parameter depends only on the value of the diffusion coefficient (once the other parameters have been fixed). Performing the same kind of analysis described in the previous section, we found that the best values of τ (with respect to the L2 norm) corresponding to the same values of µ were now different. In figure 7.11 this difference can be clearly seen.

Figure 7.11: L2-optimal τ for two different problems

As a consequence of this behavior, any attempt at approximating the relation between µ and τ has to be tied to the specific problem under analysis. In practice, at the present state our ANN cannot be used in general, since training and deployment must happen on the same PDE. Future developments will be discussed in the conclusions.

Chapter 8

Conclusions

In this work we have integrated Artificial Neural Networks with the numerical solution of PDEs. Our effort has been both theoretical (we needed to formalize the problem, unifying the notation of the world of ANNs with that of PDEs) and practical, in the sense of developing original sections of code integrated into pre-existing C++ libraries. Our research started with the aim of predicting values of the SUPG stabilization parameter for numerical solvers that employ a high polynomial degree, since an exact formula for these cases does not exist.

The implementation we provided aims to lay the foundations of this method. Indeed, we considered polynomials of order one and we implemented a Neural Network that predicts the value of the SUPG stabilization parameter coherently with the already existing formulas: the amount of stabilization increases as the problem becomes more advection-dominated and the orders of magnitude of the two predictions are comparable. At first, we have seen that the Neural Network approach guarantees a good prediction of τ if training and testing happen on the same problem, which is possible only if the exact solution is already known for a large number of values of the diffusion coefficient. We compared the predicted values of the stabilization parameter with the theoretical ones, and we found that the results were not exactly aligned. The problem was that the predictions of the ANN were not valid independently of the considered problem, a characteristic typical of the known formulas. Lacking this fundamental step, we postponed the study of the dependency of τ on the polynomial degree.

We computed the optimal values of τ with respect to the L2 norm using the IsoGlib library, and we made sure that the Neural Network was actually learning what we expected. Despite great performance in learning, the discrepancy from the theoretical results had an important consequence: difficulties in generalizing the procedure, i.e. in using an ANN trained on a known problem to predict the stabilization of a PDE that the Network has never seen. Indeed, training on different tests, we found that the optimal value of τ, once the diffusion coefficient, the mesh size and the polynomial degree are fixed, depends on the specific PDE we are studying, differently from what the theoretical formulas suggest. We concluded that this fact is not a problem of the implementation of the network, but depends on the norm chosen for the loss: using the ANN we are really obtaining the optimal value with respect to the L2 norm, but this does not coincide with what we expected theoretically.

Our results are still limited, since the number of simplifying assumptions we made makes it difficult to use our code in applications of practical interest. Nevertheless, we think that many of the hypotheses can be dropped without changing the general perspective we have introduced here. For example, the availability of an analytical solution for the test used to train the ANN can be bypassed by solving the problem once on a grid fine enough to capture the boundary layers; the dependency on the polynomial degree of the Isogeometric elements can be considered as an input of the ANN simply by acting on the IsoGlib setup; and the dependency on the mesh size can be studied in depth by generating a sequence of nested grids.

