Máster Universitario en Modelización e Investigación Matemática, Estadística y Computación 2019/2020

Trabajo Fin de Máster (Master's Thesis): On the use of Neural Networks to solve Differential Equations

Alberto García Molina

Advisor: Carlos Gorria Corres

Expected date of presentation: 12 October 2020

Abstract

English.

Artificial neural networks are parametric models, generally adjusted to solve regression and classification problems. For a long time, a question has lingered regarding the possibility of using these types of models to approximate the solutions of initial and boundary value problems, as a means of numerical integration. Recent improvements in deep-learning have made this approach much more attainable, and integration methods based on training (fitting) artificial neural networks have begun to appear, motivated mostly by their mesh-free nature and their scalability to high dimensions. In this work, we go all the way from the most basic elements, such as the definition of artificial neural networks and the well-posedness of the problems, to solving several linear and quasi-linear PDEs using this approach. Throughout this work we explain the general theory concerning artificial neural networks, including topics such as vanishing gradients, non-convex optimization or regularization techniques, and we adapt them to better suit the nature of initial and boundary value problems. Some of the original contributions in this work include: an analysis of the vanishing gradient problem with respect to the input derivatives, a custom regularization technique based on the derivatives of the network's parameters, and a method to rescale the subgradients of the multi-objective loss function used to optimize the network.

Spanish.

Las redes neuronales son modelos paramétricos generalmente usados para resolver problemas de regresión y clasificación. Durante bastante tiempo ha rondado la pregunta de si es posible usar este tipo de modelos para aproximar soluciones de problemas de valores iniciales y de contorno, como un medio de integración numérica. Los avances recientes en deep-learning han hecho este enfoque más viable, y han empezado a surgir métodos basados en entrenar (ajustar) redes neuronales, motivados por no necesitar un mallado y por su buena escalabilidad a altas dimensiones. En este trabajo, vamos desde los elementos más básicos, como la definición de una red neuronal o la buena definición de los problemas, hasta ser capaces de resolver diversas EDPs lineales y casi-lineales. A lo largo del trabajo explicamos la teoría general relacionada con redes neuronales, que incluye temas como el desvanecimiento de gradientes (vanishing gradient), la optimización no convexa y las técnicas de regularización, y los adaptamos a la naturaleza de los problemas de valores iniciales y de contorno. Algunas de las contribuciones originales de este trabajo incluyen: un análisis del desvanecimiento de gradientes con respecto a las variables de entrada, una técnica de regularización personalizada basada en las derivadas de los parámetros de la red neuronal, y un método para reescalar los subgradientes de la función de coste multiobjetivo usada para optimizar la red neuronal.

Acknowledgements

To my advisor Carlos Gorria Corres, for his advice, and to my family and friends who have given me their support in all these months.

Preamble

The structure of this work is divided into 5 chapters and 2 annexes.

Chapter 0 starts by giving an initial pragmatic overview of multi-linear algebra. Its purpose is to give anyone foreign to this subject a working knowledge of tensors: defining their notation and how to operate with them. Tensors will be extensively used throughout Chapter 2 when describing artificial neural networks.

Chapter 1 contains the introduction to the problem at hand. Here we will explore the motivations for using artificial neural networks to numerically integrate initial/boundary value problems. On top of this, we will also list the differential operators that will be used, describe the general conditions under which well-posedness will be guaranteed, and examine the state of the art.

Chapter 2 lays out the theoretical framework of artificial neural networks. It covers everything necessary to define and train a deep learning model from the ground up. The topics covered in this chapter include: definition and design choices, the establishment of an objective (loss) function, non-convex optimization, and the use of regularization techniques. Although these topics are general to deep-learning, throughout this whole chapter we have adapted them, where necessary, to fit the subject of this work.

Chapter 3 is the experimental part of this work. The first three sections contain the discussion of some practical issues, namely: the programming, the approximating capacities of artificial neural networks, and training with multi-objective loss functions. After these sections lie the experiments and simulations of this work. Here we put into practice all the previous knowledge that we have built up, to numerically integrate some instances of initial/boundary value problems. On each instance, we benchmark and discuss the results for several set-ups, based on the different architectures and training options seen up to this point.

Chapter 4 contains the final conclusions of this work. An analysis of the limitations and the advantages of this technique with respect to others, as a way to approximate solutions of differential equations, is made. Also, based on the experience from this work, we suggest possible lines of work and open related questions, which can be considered for further work.

Annexes A & B include: a linear algebra perspective of some expressions in Chapter 2 for further clarity, and the code, respectively.

Contents

Abstract

Preamble

List of Figures

Table Index

0 Overview of Multi-linear Algebra
    0.1 What is a tensor?
    0.2 Tensor Operations and Summation Convention
    0.3 Linear Algebra as Multi-linear Algebra
    0.4 Derivatives of Vector Functions and Tensors
    0.5 The Chain Rule in Tensor Notation

1 Introduction
    1.1 Posing the Problem
    1.2 Relevant Literature

2 Artificial Neural Networks Framework
    2.1 What are Artificial Neural Networks?
    2.2 From Numerical Integration to Deep-Learning
    2.3 Derivatives: Back Propagation and Gradient Issues
        2.3.1 Derivatives Behaviour (Vanishing and Exploding Gradients)
    2.4 Optimizers
        2.4.1 First Order Methods
        2.4.2 Second Order Methods
    2.5 Activation Functions and Parameter Initialization
        2.5.1 Parameter Initialization
    2.6 Regularization
        2.6.1 Noise-based Regularizations
        2.6.2 Restriction-based Regularizations
        2.6.3 Other Regularizations

3 Case Studies and Simulations
    3.1 Coding Artificial Neural Networks
    3.2 Approximating a Function
    3.3 Training with Multi-Objective Loss Functions
    3.4 Model Simulation
        3.4.1 Model 1: The 1D Divergence Operator
        3.4.2 Model 2: The 2D Divergence Operator
        3.4.3 Model 3: The 2D Laplacian Operator
        3.4.4 Model 4: The 1D Advection Operator
        3.4.5 Model 5: The 2D Clairaut Operator
        3.4.6 Model 6: The 2D Burgers Operator

4 Conclusions
    4.1 Author's Final Thoughts
    4.2 Further Work

A Linear Algebra Formulation of 2.3.1

B The Code
    B.1 imports Cell
    B.2 auxiliryPlotting Class
    B.3 myDataSets Class
    B.4 problemInstance Class
    B.5 secondOrderOptimizers Class
    B.6 myLayer Class
    B.7 myModel Class
    B.8 execution Cell

Bibliography

List of Figures

2.1 Perceptron scheme.
2.2 A directed graph as a possible representation of the architecture of an artificial neural network. Nodes are artificial neurons and edges indicate which neurons feed into each other.
2.3 General scheme of a perceptron based fully-connected feed-forward artificial neural network.
2.4 Computational graph of example (2.6).
2.5 Computational graph (derivatives) of example (2.6). In green, the flow of nodes required to compute $\partial f(x, y) / \partial x$.
2.6 Example model: a 2-3-4-2 artificial neural network.
2.7 Main activation functions and their first order derivatives.
2.8 Combination of sigmoid functions.
2.9 Secondary activation functions and their first order derivatives.
2.10 Example of overfitting of a model.
2.11 Example of a model adding noisy input.

3.1 Comparison of different activation functions' training performance for a [3,4,1]-ANN, with Adam, $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$. Log10 scale.
3.2 Comparison of different first order optimizers' training performance for a [3,4,1]-ANN, with sigmoid activations. Lower image in log10 scale.
3.3 Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1), using BFGS and L-BFGS.
3.4 Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1), using Adam with $\eta = 0.01$.
3.5 Example of possible multi-objective functions. Component and total representation.
3.6 Example of possible multi-objective functions. Adjusted factors.
3.7 Training performance of 3 models trained for a [1,5,5,1]-ANN scheme, with no regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000 epochs. (3.5)
3.8 Final results. Best performing trained model (tanh) for (3.7) against the exact solution.
3.9 Result of a [1,10,10,1]-ANN model with tanh activations, trained with no regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 12000 epochs. Left plot: model against exact solution. Right plot: MSE error of the model, for each point in the domain.
3.10 Comparison of different regularization techniques in the training performance of 3 models trained for a [1,10,10,1]-ANN scheme, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs. (3.10)
3.11 Comparison of different regularization techniques in the training performance of 3 models trained for a [1,40,40,1]-ANN scheme, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs. (3.10)
3.12 Final results of the best performing trained model ([1,40,40,1]-ANN, trained with the custom regularization (2.58)) for (3.7) against the exact solution.
3.13 Results and performance of the model trained for (3.11).
3.14 Positive and negative sign solutions of (3.13).
3.15 Results and performance of the model trained for (3.16).
3.16 Results and performance of the model trained for (3.17).
3.17 Results and performance of the model trained for (3.18).

Table Index

1.1 Comparison between FEMs and the Artificial Neural Network Methods.
1.2 List of differential operators used in Chapter 3.

2.1 List of main activation functions.
2.2 List of secondary activation functions.

3.1 Results of 3 models trained for a [1,5,5,1]-ANN scheme, with no regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000 epochs. (3.5)
3.2 Results of 6 models with different architectures, trained for (3.10), using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs and different regularization techniques.

Chapter 0 Overview of Multi-linear Algebra

Generally, when working in the context of artificial neural networks, the framework of linear algebra is more than enough to describe the elements and operations taking place. Even when dealing with convolutional networks, which may involve operations on 3 dimensional arrays of objects, one can decompose everything into vectors, matrices, matrix multiplications and element-wise products. Thus, many times, in the context of artificial neural networks, any explicit reference to multi-linear algebra or to the tensor nature of such objects is disregarded.

In this work, however, we will be taking the multi-linear algebra approach. There are two main reasons for doing this, i.e. for generalizing vectors and matrices to tensors:

– First, the tensor notation is very powerful. This notation serves two purposes: it allows us to represent operations between tensors in a very compact way, and it also helps to keep track of dimensions at any time.

– Second, multi-linear algebra provides a simple and natural framework to characterize high order derivatives of multidimensional objects such as vectors or matrices, which is a particularity of this work. In this framework, derivatives and the chain rule are really easy to interpret, as they visually take the form of the one dimensional case.

For the next part of this chapter we will be covering the basics of tensors. However, since the objective of this work is not to discuss multi-linear algebra, and the only purpose of this chapter is to serve as an entry point to the concepts and the notation of tensors, we will be taking a hands-on, informal approach. This means that there will be no formal definitions and every concept will be explained through an example. For a proper introduction with due rigour, one can refer to chapters 2 to 4 in [1].

0.1 What is a tensor?

Perhaps the simplest way to define a tensor is as an element of a tensor space, which is nothing else than a direct product of vector spaces and dual vector spaces. So, for example, let us imagine a tensor $T$ in the following tensor space:

$$T \in \mathbb{R}^{4*} \otimes \mathbb{R}^{2*} \otimes \mathbb{R}^{3}, \qquad (1)$$

then $T$ is of the form $T = v \otimes w \otimes z$, where $v \in \mathbb{R}^{4*}$, $w \in \mathbb{R}^{2*}$ and $z \in \mathbb{R}^{3}$. Observe that $T$ is uniquely defined in the tensor space by $4 \times 2 \times 3 = 24$ scalar components (the individual coordinates of $v$, $w$ and $z$, having fixed a base in each (dual) vector space).

The previous is essentially a definition of a tensor, but in practice we want to describe a tensor not by a direct product of vectors, but by a set of scalar coordinates, the same way we do with a vector space. This is achieved by defining a tensor base. So, given a base for each of the (dual) vector spaces in the tensor space, for the previous example $\{e_1, e_2, e_3, e_4\}_{\mathbb{R}^{4*}}$, $\{\hat{e}_1, \hat{e}_2\}_{\mathbb{R}^{2*}}$ and $\{\tilde{e}^1, \tilde{e}^2, \tilde{e}^3\}_{\mathbb{R}^{3}}$, we can intuitively build a base for the tensor space as follows:

$$\{\, e_i \otimes \hat{e}_j \otimes \tilde{e}^k \;|\; i = 1, 2, 3, 4,\ j = 1, 2,\ k = 1, 2, 3 \,\}_{\mathbb{R}^{4*} \otimes \mathbb{R}^{2*} \otimes \mathbb{R}^{3}}. \qquad (2)$$

Now we can describe any tensor in the tensor space through its coordinates in the tensor base, the same way we do with vectors in vector spaces. Thus, $T$ can be represented through its 24 scalar coordinates in the tensor base as:

$$T \equiv \sum_{i,j,k} T^{i,j}_{k}\, e_i \otimes \hat{e}_j \otimes \tilde{e}^k, \qquad (3)$$

where the $T^{i,j}_{k}$'s are the coordinates. This latest characterization of a tensor, as coordinates in a tensor base, is the main way of defining tensors formally, and the one that can be found in almost every source, including [1]; any other characterization is equivalent to this one.

From (3) we can derive the abstract notation for tensors. Looking back at (3), we realize that, once the tensor base has been selected, the explicit reference to the summation over all indices and to the tensor base can be implicitly assumed, so we can omit it. In the end we write the tensor $T$ as $T^{i,j}_{k}$, where $i$, $j$ and $k$ become free indexes: general place-holders that can take values in $i = 1, 2, 3, 4$, $j = 1, 2$ and $k = 1, 2, 3$. It is important to note that, while $T^{i,j}_{k}$ denotes a tensor in abstract notation, once we replace the free indexes with actual values we denote a component, for example $T^{2,1}_{1} \in \mathbb{R}$. In fact, every time we replace a free index with an actual value, we fix the base element associated with that index, hence we obtain a new tensor in a lower dimensional tensor space. For instance:

$$T^{2,j}_{k} \equiv \sum_{j,k} T^{2,j}_{k}\, e_2 \otimes \hat{e}_j \otimes \tilde{e}^k \in \mathbb{R}^{2*} \otimes \mathbb{R}^{3}. \qquad (4)$$

Just for the sake of completeness, we will give some important facts and ideas about tensor spaces that will help understand tensors better:

– In principle, since the tensor space is a direct product, the order in which the (dual) vector spaces appear in the product does not affect the nature of the space. The only convention is that the dual vector spaces must appear before the vector spaces; so in practice $\mathbb{R}^{4*} \otimes \mathbb{R}^{2*} \otimes \mathbb{R}^{3}$ can describe the same tensors as $\mathbb{R}^{2*} \otimes \mathbb{R}^{4*} \otimes \mathbb{R}^{3}$. However, in terms of notation the order is important, as, by convention, the indexes must appear in the same order as the (dual) vector spaces in the direct product. Given the previous example, it is said that the tensors $T^{i,j}_{k}$ in the first tensor space and the tensors $T^{j,i}_{k}$ in the second tensor space are related by index juggling.

– A tensor space is said to be of dimension $(r, s)$ if it is the direct product of $r$ dual vector spaces and $s$ vector spaces. Hence, in the example (1), the tensor space is of dimension $(2, 1)$. Note that one of the perks of the abstract notation is that we can easily know a tensor's dimension by simply counting the number of free super-indexes (which yields $r$) and the number of free sub-indexes (which yields $s$).
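
In computational terms (an aside of ours, not part of the thesis), once the bases are fixed, a tensor is just a multidimensional array of its scalar coordinates, and fixing a free index as in (4) is plain indexing:

```python
import numpy as np

# Our own computational aside: with the bases fixed, the (2,1) dimensional
# tensor T of (1) is just an array of 4 x 2 x 3 = 24 scalar coordinates,
# and fixing a free index as in (4) amounts to indexing into the array.

T = np.random.rand(4, 2, 3)   # T^{i,j}_k with i = 1..4, j = 1..2, k = 1..3
T2 = T[1]                     # T^{2,j}_k (Python is 0-based): a (2,1-1) tensor
print(T.shape, T2.shape)      # (4, 2, 3) (2, 3)
```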

0.2 Tensor Operations and Summation Convention

There are a few operations that can be performed on tensors. For example, if two tensors exist in the same tensor space, $a^{i,j}_{k}, b^{i,j}_{k} \in \mathbb{T}$, we can add them by simply adding the coordinates that correspond to the same element of the base. This is:

$$c^{i,j}_{k} = a^{i,j}_{k} + b^{i,j}_{k} \equiv \sum_{i,j,k} \left( a^{i,j}_{k} + b^{i,j}_{k} \right) e_i \otimes \hat{e}_j \otimes \tilde{e}^k. \qquad (5)$$

Two tensors $a^{i}_{k} \in \mathbb{T}$ and $b^{i',j'}_{k'} \in \mathbb{T}'$, which may exist in very different tensor spaces, can always be multiplied in what is known as the tensor product. The resulting tensor exists in the tensor space formed by the direct product combination of the original tensor spaces, $\mathbb{T} \otimes \mathbb{T}'$; the new base becomes the direct product of the original tensor spaces' bases, and the new components are the products of the originals. For this example:

$$c^{i,i',j'}_{k,k'} = a^{i}_{k} \cdot b^{i',j'}_{k'} \equiv \sum_{i,k} \sum_{i',j',k'} \left( a^{i}_{k} \cdot b^{i',j'}_{k'} \right) e_i \otimes \tilde{e}^{k} \otimes e_{i'} \otimes \hat{e}_{j'} \otimes \tilde{e}^{k'} \in \mathbb{T} \otimes \mathbb{T}'. \qquad (6)$$

There is another “way” in which we can multiply tensors. Let us imagine that we have two tensors $a^{i}_{j} \in \mathbb{R}^{2*} \otimes \mathbb{R}^{3}$ and $b^{j'}_{k'} \in \mathbb{R}^{3*} \otimes \mathbb{R}^{4}$. Observe that in this particular case the first tensor space contains the vector space $\mathbb{R}^{3}$, and the second tensor space contains its dual counterpart, $\mathbb{R}^{3*}$. Since $\mathbb{R}^{3*}$ is the space of linear applications $\mathbb{R}^{3} \to \mathbb{R}$, instead of taking the direct product $\tilde{e}^{j} \otimes e_{j'}$ we can apply one to the other, $e_{j'}(\tilde{e}^{j}) = \delta^{j}_{j'}$, where $\delta^{j}_{j'}$ is the Kronecker delta, and we are assuming the bases are orthonormal. The rest of the elements of the bases behave as in the tensor product. This “way” of multiplying tensors is called a contraction, and under these rules the multiplication looks like:

$$c^{i}_{k'} = a^{i}_{j} \cdot b^{j'}_{k'} \equiv \sum_{i} \sum_{k'} \sum_{j,j'} e_{j'}(\tilde{e}^{j}) \left( a^{i}_{j} \cdot b^{j'}_{k'} \right) e_i \otimes \tilde{e}^{k'} = \sum_{i} \sum_{k'} \sum_{j,j'} \delta^{j}_{j'} \left( a^{i}_{j} \cdot b^{j'}_{k'} \right) e_i \otimes \tilde{e}^{k'} = \sum_{i} \sum_{k'} \sum_{j} \left( a^{i}_{j} \cdot b^{j}_{k'} \right) e_i \otimes \tilde{e}^{k'} \in \mathbb{R}^{2*} \otimes \mathbb{R}^{4}. \qquad (7)$$

Up to this point we have defined how to sum tensors, and two ways to multiply them: the tensor product and the contraction (when compatible). However, it is confusing that we have been using the same notation to denote both the tensor product and the contraction. This is where the Einstein summation convention comes in. Note that in (7) we have denoted the contraction operation by $a^{i}_{j} \cdot b^{j'}_{k'}$, which contains the free indexes $j$ and $j'$, but looking at the final expression, because we are applying the Kronecker delta, one of the indexes has become irrelevant. Hence, in the product of two tensors, when the same free index is repeated (once as a super-index and once as a sub-index), the elements of the base for that index get contracted (Einstein summation convention). This means that, from now on, $a^{i}_{j} \cdot b^{j'}_{k'}$ will be a tensor product as in (6), and $a^{i}_{j} \cdot b^{j}_{k'}$ will be a contraction on the $j$-th index, defined as:

$$c^{i}_{k'} = a^{i}_{j} \cdot b^{j}_{k'} \equiv \sum_{i} \sum_{k'} \sum_{j} \left( a^{i}_{j} \cdot b^{j}_{k'} \right) e_i \otimes \tilde{e}^{k'}, \qquad (8)$$

and we will call both of them tensor multiplications. In a practical sense, the Einstein summation convention provides some kind of cancellation intuition: we can imagine that repeated indexes, one up and one down, cancel each other out, so that only the indexes $i$ and $k'$ survive in the resulting tensor $c^{i}_{k'} = a^{i}_{j} \cdot b^{j}_{k'}$.
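
As a minimal numerical illustration of our own (not from the thesis), the tensor product (6) and the contraction (8) can both be expressed with NumPy's einsum, whose subscript strings mirror the abstract index notation:

```python
import numpy as np

# Minimal sketch (ours): tensor product vs. contraction via numpy.einsum.

a = np.random.rand(2, 3)   # a^i_j   with i = 1..2, j = 1..3
b = np.random.rand(3, 4)   # b^j'_k' with j' = 1..3, k' = 1..4

# Tensor product a^i_j . b^{j'}_{k'}: no index is repeated, so all four
# indexes survive; the result lives in a (2, 3, 3, 4) array.
t_prod = np.einsum('ij,pk->ijpk', a, b)      # p plays the role of j'
print(t_prod.shape)                          # (2, 3, 3, 4)

# Contraction a^i_j . b^j_k': the repeated index j is summed away and only
# i and k' survive, which for matrices is just the matrix product.
t_contr = np.einsum('ij,jk->ik', a, b)
print(np.allclose(t_contr, a @ b))           # True
```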

As a final note, we can also define the transposition of a tensor as $(a^{i,j}_{k})^{\top} = a^{k}_{i,j}$. In essence, the transposition operator acts on each of the vector spaces that compose the tensor space.

0.3 Linear Algebra as Multi-linear Algebra

Multi-linear algebra is a generalization of linear algebra. Thus, we can express all objects in linear algebra through tensors, and all linear algebra operations as tensor operations. In this section we will draw a parallel between one and the other for the most basic concepts.

The simplest objects that can be described as tensors are vectors and dual vectors. A vector in a vector space (a.k.a. a contravariant vector), which is represented in linear algebra as a column array, is a $(0,1)$ dimensional tensor. Similarly, a dual vector in a dual space (a.k.a. covariant vector, covector or one-form), which is represented in linear algebra as a row array, is a $(1,0)$ dimensional tensor. Next is an example of a vector and a dual vector, respectively, written on the left in tensor form and on the right in their linear algebra representation:

$$v^{i} \equiv \begin{pmatrix} v^{1} \\ v^{2} \\ v^{3} \end{pmatrix}, \qquad v_{j} \equiv \begin{pmatrix} v_{1} & v_{2} & v_{3} \end{pmatrix}. \qquad (9)$$

A matrix in its tensor form is simply a $(1,1)$ dimensional tensor. If we think of a matrix as a linear application, i.e. a vector of one-forms (a vector of applications from $\mathbb{R}^{n} \to \mathbb{R}$), it is natural that its tensor representation is $a^{i}_{j}$. Note that $a^{i,j}$ and $a_{i,j}$ would also be able to describe the same number of elements, but they define objects of a different nature than a matrix. An example of a matrix $a^{i}_{j} \in \mathbb{R}^{3*} \otimes \mathbb{R}^{3}$ in tensor and linear algebra representation would be the following:

$$a^{i}_{j} = \begin{pmatrix} a^{1}_{j} \\ a^{2}_{j} \\ a^{3}_{j} \end{pmatrix} \equiv \begin{pmatrix} (a^{1}_{1}\ a^{1}_{2}\ a^{1}_{3}) \\ (a^{2}_{1}\ a^{2}_{2}\ a^{2}_{3}) \\ (a^{3}_{1}\ a^{3}_{2}\ a^{3}_{3}) \end{pmatrix} = \begin{pmatrix} a^{1}_{1} & a^{1}_{2} & a^{1}_{3} \\ a^{2}_{1} & a^{2}_{2} & a^{2}_{3} \\ a^{3}_{1} & a^{3}_{2} & a^{3}_{3} \end{pmatrix}. \qquad (10)$$

A special case is the identity matrix $I^{i}_{j}$. Observe that its components satisfy $I^{i}_{j} = 1$ when $i = j$, and $I^{i}_{j} = 0$ when $i \neq j$. This is exactly the definition of the Kronecker delta, and indeed the Kronecker delta is the tensor generalization of the identity and, as a consequence, of the identity matrix in linear algebra. Hence, from now on, $I^{i}_{j} = \delta^{i}_{j}$.

Finally, regarding operations, we can represent: the scalar product of two vectors as a transposition and a contraction, $w^{i} \cdot (v^{i})^{\top}$; the outer product of two vectors as a transposition and a tensor product, $w^{i} \cdot (v^{j})^{\top}$; a matrix-vector multiplication as a contraction, $a^{i}_{j} \cdot v^{j}$; and a matrix-matrix multiplication as a contraction, $a^{i}_{j} \cdot b^{j}_{k}$.
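
To make the correspondence concrete, here is a small NumPy sketch of our own (not part of the thesis) writing each of these four operations with einsum subscripts that mirror the index notation:

```python
import numpy as np

# Our own sketch: the four basic linear algebra operations above written
# as tensor operations; subscripts mirror the abstract index notation.

v = np.random.rand(3)        # v^i
w = np.random.rand(3)        # w^i
A = np.random.rand(3, 3)     # a^i_j
B = np.random.rand(3, 3)     # b^j_k

scalar = np.einsum('i,i->', w, v)       # w^i (v^i)^T : scalar product
outer  = np.einsum('i,j->ij', w, v)     # w^i (v^j)^T : outer product
matvec = np.einsum('ij,j->i', A, v)     # a^i_j v^j   : matrix-vector
matmat = np.einsum('ij,jk->ik', A, B)   # a^i_j b^j_k : matrix-matrix

print(np.allclose(scalar, w @ v), np.allclose(outer, np.outer(w, v)),
      np.allclose(matvec, A @ v), np.allclose(matmat, A @ B))  # all True
```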

0.4 Derivatives of Vector Functions and Tensors

In this section we will discuss, through an example, the need for tensors as a tool to deal with high order derivatives of vector functions. Let us suppose we have a vector function $f$ which takes as input a vector in $\mathbb{R}^{3}$ and outputs a vector in $\mathbb{R}^{2}$; in linear algebra representation this would be:

$$f : \mathbb{R}^{3} \longrightarrow \mathbb{R}^{2}, \qquad x = \begin{pmatrix} x^{1} \\ x^{2} \\ x^{3} \end{pmatrix} \longrightarrow \begin{pmatrix} f^{1}(x) \\ f^{2}(x) \end{pmatrix}. \qquad (11)$$

Note that the application does not have to be linear, as the components $f^{1}$ and $f^{2}$ can be non-linear.

Now we would like to get the partial derivatives with respect to the components $x^{1}, x^{2}, x^{3}$ of the input vector. These partial derivatives define the Jacobian matrix, which is well known in vector calculus:

$$J = \begin{pmatrix} \dfrac{\partial f^{1}(x)}{\partial x^{1}} & \dfrac{\partial f^{1}(x)}{\partial x^{2}} & \dfrac{\partial f^{1}(x)}{\partial x^{3}} \\[2ex] \dfrac{\partial f^{2}(x)}{\partial x^{1}} & \dfrac{\partial f^{2}(x)}{\partial x^{2}} & \dfrac{\partial f^{2}(x)}{\partial x^{3}} \end{pmatrix}. \qquad (12)$$

Taking further higher order derivatives into account is already a problem. The function (11) is a one dimensional object, and its first order derivatives (12) already require a two dimensional object to be represented. Therefore, its second order derivatives will require a three dimensional object (a matrix whose elements are dual vectors) to be represented. This is impractical, and it is where tensors become really useful.

It is possible to describe the vector function as a tensor, $f^{k}(x^{i}) \in \mathbb{R}^{2}$, and take advantage of multi-linear algebra to represent its partial derivatives. However, first we need to know how to differentiate in the context of tensors. This is part of what is known as Ricci calculus or tensor calculus (a subdomain of differential geometry), which has several ways of defining derivatives on tensors. Nevertheless, in practice, since we will only make use of partial derivatives (which are taken with respect to a single component, ignoring the rest), the only thing we need to do is to multiply by the following dual vector (one-form) through a tensor product:

$$\frac{\partial}{\partial x^{i}} = \begin{pmatrix} \dfrac{\partial}{\partial x^{1}} & \dfrac{\partial}{\partial x^{2}} & \dfrac{\partial}{\partial x^{3}} \end{pmatrix} \in \mathbb{R}^{3*}. \qquad (13)$$

Applying the operator to $f^{k}(x^{i})$ once, we obtain:

$$\frac{\partial}{\partial x^{i'}} \cdot f^{k}(x^{i}) \in \mathbb{R}^{3*} \otimes \mathbb{R}^{2}, \qquad (14)$$

which has, in fact, the dimensions of a matrix, as shown in the previous section. Indeed, this is the Jacobian matrix, and if we were to replace the free indices $k$ and $i'$ to obtain every component, and write them in matrix form, we would exactly recover (12).

Applying the operator to $f^{k}(x^{i})$ multiple times, we obtain its higher order derivatives as:

$$\frac{\partial^{2}}{\partial x^{i'} \partial x^{i''}} \cdot f^{k}(x^{i}) \equiv \frac{\partial}{\partial x^{i'}} \cdot \frac{\partial}{\partial x^{i''}} \cdot f^{k}(x^{i}) \in \mathbb{R}^{3*} \otimes \mathbb{R}^{3*} \otimes \mathbb{R}^{2}, \qquad (15)$$

$$\frac{\partial^{3}}{\partial x^{i'} \partial x^{i''} \partial x^{i'''}} \cdot f^{k}(x^{i}) \equiv \frac{\partial}{\partial x^{i'}} \cdot \frac{\partial}{\partial x^{i''}} \cdot \frac{\partial}{\partial x^{i'''}} \cdot f^{k}(x^{i}) \in \mathbb{R}^{3*} \otimes \mathbb{R}^{3*} \otimes \mathbb{R}^{3*} \otimes \mathbb{R}^{2}. \qquad (16)$$
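
As a quick computational sanity check of our own (not from the thesis), automatic differentiation libraries return exactly these tensor shapes when a vector function is differentiated repeatedly. A sketch with JAX, for an arbitrary example function (note that JAX places the output index first, while (15)-(16) list the dual factors first):

```python
import jax
import jax.numpy as jnp

# Our own sketch: repeated differentiation of a vector function
# f: R^3 -> R^2 yields arrays with the shapes of (12), (15) and (16).

def f(x):
    # an arbitrary non-linear f with components f^1, f^2
    return jnp.array([x[0] * jnp.sin(x[1]), x[2] ** 2 + x[0]])

x = jnp.array([1.0, 2.0, 3.0])

jac = jax.jacfwd(f)(x)                           # shape (2, 3),    as (14)
hes = jax.jacfwd(jax.jacfwd(f))(x)               # shape (2, 3, 3), as (15)
thi = jax.jacfwd(jax.jacfwd(jax.jacfwd(f)))(x)   # shape (2, 3, 3, 3), as (16)
print(jac.shape, hes.shape, thi.shape)
```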

0.5 The Chain Rule in Tensor Notation

As we have seen in (14-16), derivatives using the Einstein summation convention take a really simple form, which resembles in appearance the one variable calculus framework. Another case in which this happens is when applying the chain rule. For instance, given $(f^{k} \circ g^{l})(x^{i}) = f^{k}(g^{l}(x^{i})) = f^{k}(y^{l}) : \mathbb{R}^{2} \to \mathbb{R}^{3} \to \mathbb{R}^{2}$, where $f^{k} : \mathbb{R}^{3} \to \mathbb{R}^{2}$ and $g^{l} : \mathbb{R}^{2} \to \mathbb{R}^{3}$, the chain rule is applied just by making use of contractions, as follows:

$$\frac{\partial}{\partial x^{i'}} \cdot f^{k}(g^{l}(x^{i})) = \frac{\partial f^{k}(g^{l}(x^{i}))}{\partial g^{l}(x^{i})} \cdot \frac{\partial g^{l}(x^{i})}{\partial x^{i'}} = \frac{\partial f^{k}}{\partial y^{l}} \cdot \frac{\partial y^{l}}{\partial x^{i'}} \in \mathbb{R}^{2*} \otimes \mathbb{R}^{2}. \qquad (17)$$
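
To see (17) in action, here is a hedged numerical check of our own (using JAX; the functions f and g are arbitrary choices of ours): the contraction of the two Jacobians coincides with the Jacobian of the composition.

```python
import jax
import jax.numpy as jnp

# Our own check of the tensor chain rule (17): contracting the Jacobians
# of f and g on the index l reproduces the Jacobian of f o g.

def g(x):   # g: R^2 -> R^3
    return jnp.array([x[0] * x[1], jnp.sin(x[0]), x[1] ** 2])

def f(y):   # f: R^3 -> R^2
    return jnp.array([y[0] + y[1] * y[2], jnp.cos(y[0])])

x = jnp.array([0.5, -1.0])

J_f = jax.jacfwd(f)(g(x))                   # df^k/dy^l,  shape (2, 3)
J_g = jax.jacfwd(g)(x)                      # dy^l/dx^i', shape (3, 2)
chain = jnp.einsum('kl,li->ki', J_f, J_g)   # contraction on l

direct = jax.jacfwd(lambda x: f(g(x)))(x)   # Jacobian of the composition
print(jnp.allclose(chain, direct))          # True
```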

Chapter 1 Introduction

Many phenomena in physics, finance and other branches of science are modelled through equations or systems of equations involving rates of change, i.e. derivatives. These systems are generally classified as: systems of ordinary differential equations, or ODEs, when every equation depends on the derivatives with respect to a single variable (usually called time); or systems of partial differential equations, or PDEs, when some equation in the system depends on the derivatives with respect to more than one variable.

Finding the solutions to these systems of differential equations is of great importance to make predictions based on these models. Nevertheless, there are very few cases in which one can find an analytic expression for these solutions. Usually this involves having a very specific type of equation and making some kind of assumption on the form of the solution. This is the case of methods such as the integrating factor, to solve linear ODEs; or the method of characteristics, to solve certain types of PDEs, whereby we assume that there is a parametric form for the solutions of the system. Therefore, in many situations, the only way to gain insights from the solutions of these systems of differential equations is to compute them numerically (known as numerical integration).

There are a wide variety of methods to numerically integrate ODEs and PDEs. In general, the main methods follow these ideas:

– In the case of a system of ODEs, we can think of its solutions as functions of the time variable, or simply as trajectories in time. Hence, starting at an initial point (the initial condition), the idea is to numerically compute the trajectory by moving along the tangent in discrete intervals (steps) of time. The two principal methods that use this idea are the Runge-Kutta and the Taylor series methods; the first "averages" the tangents of intermediate substeps of time, and the second uses Taylor's theorem locally, up to some order, to get better approximations of the tangent for larger steps of time.

– When it comes to PDEs, the approach of moving along the trajectory becomes useless, as the solutions are no longer one dimensional curves, but surfaces of dimension two or higher, for which there is no unique way to move along them starting at any given point. For these types of systems, the most widely used methods are the ones in the Galerkin projection family, whose general (oversimplified) idea consists in using a base of functionals $\{\phi_k\}_{k=0}^{\infty}$ in an adequate function space (a Sobolev space) to approximate the solution via a linear combination of base elements, $u(x) = \sum_{k=0}^{\infty} c_k \phi_k(x)$. Replacing the expression for $u(x)$ into the differential equations and projecting onto the function space, $\langle \cdot, \phi_k \rangle$, a linear system of equations is obtained, which has to be solved to obtain the coefficients $c_k$. Actually, in this process the main computational expense usually comes from the resolution of the linear system, which has complexity $O(m^3)$, where $m$ is the number of variables, i.e. the number of coefficients $c_k$. This is why selecting a good base is very important, since a good choice can lead to a sparser matrix for the linear system after the projection, which can greatly reduce the number of operations.

The two principal methods in the Galerkin family are the spectral method and the finite element method (FEM), whose main difference lies in the choice of the base. While the first uses a truncated base of functions which are globally defined on the solution's domain, like a truncated Fourier base $\{e^{2\pi i k x / L}\}_{k=0}^{N}$; the second uses a finite base $\{\phi_k\}_{k=0}^{N}$ of piece-wise, locally defined, "tent-like" functions, each associated to a node of a mesh of the solution's domain $\{x_k\}_{k=0}^{N}$, in a way that $\phi_k(x_j) = \delta^{k}_{j}$. This last type of method, which uses a mesh approach, is referred to as a collocation method. Each of these methods has its own pros and cons. The spectral methods are typically used on very specific equations where a particular base is known to work well, for example the Fourier base on the Laplace equation, or in general, where solutions are known to be periodic. One good quality of these methods is that they are very stable and enjoy very good error properties; however, they are difficult to generalize. Applying an ill-thought choice of base to an equation can lead to projections $\langle \cdot, \phi_k \rangle = \int_{\Omega} \cdot\, \phi_k(x)\, dx$ with very hard integrals to compute, and a dense linear system matrix to solve for the $c_k$. In contrast, FEMs are very generalizable, as they use piece-wise polynomial and locally defined functions as a base, which makes the projection integrals $\langle \cdot, \phi_k \rangle$ relatively easy to compute; and the linear system matrix to obtain the $c_k$'s is very sparse, given that there is very little overlap in the base functions' compact supports. As a downside, their error and stability are strongly tied to the number of nodes (usually large) in the mesh, which cannot be trivially generated either, and carries some computational cost of its own. (A toy numerical illustration of the Galerkin idea follows below.)
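
As a toy illustration of the Galerkin idea (our own sketch, not part of the thesis), consider $-u''(x) = f(x)$ on $[0, \pi]$ with $u(0) = u(\pi) = 0$ and the sine base $\phi_k(x) = \sin(kx)$. Since $-\phi_k'' = k^2 \phi_k$ and the base is orthogonal, the projected linear system is diagonal and each coefficient is obtained directly:

```python
import numpy as np

# Toy spectral Galerkin sketch (ours): solve -u'' = f on [0, pi] with
# u(0) = u(pi) = 0 using phi_k(x) = sin(k x). The projected system is
# diagonal: c_k = <f, phi_k> / (k^2 <phi_k, phi_k>).

f = lambda x: x * (np.pi - x)          # a simple polynomial forcing term
N = 20                                  # truncation order of the base
x = np.linspace(0.0, np.pi, 2001)       # quadrature grid for projections
dx = x[1] - x[0]

c = np.zeros(N + 1)
for k in range(1, N + 1):
    phi_k = np.sin(k * x)
    proj = np.sum(f(x) * phi_k) * dx    # <f, phi_k>     (crude quadrature)
    norm = np.sum(phi_k * phi_k) * dx   # <phi_k, phi_k> (= pi/2)
    c[k] = proj / (k**2 * norm)

u = sum(c[k] * np.sin(k * x) for k in range(1, N + 1))

# check against the exact solution of -u'' = x(pi - x) with these BCs
u_exact = x**4 / 12 - np.pi * x**3 / 6 + np.pi**3 * x / 12
print(np.max(np.abs(u - u_exact)))      # small; limited by N and quadrature
```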

These are some of the most popular methods, but there are many more. For PDEs in particular, some other notable mentions are: the finite differences methods, which make use of differences between nodes to approximate derivatives; the pseudo-spectral methods, which are similar to the spectral ones, but use quadrature formulas for the projection integrals; and the Monte Carlo methods, which consist in simulating and averaging copies of the system.

In general, there is no single best method; each has its own trade-offs. Nonetheless, in most instances for PDEs (especially in engineering), the preferred method of numerical integration is the FEM, due to its versatility. It avoids any complications of having to choose a particular base of functions, the mesh can be adapted easily to irregular solution domains, the projection integrals are easy to compute, the linear system matrix to obtain the coefficients $c_k$ is sparse, the stability is good enough for many equations, and the error behaviour is well known and can be reduced by refining (adding points to) the mesh. However, as good as this sounds, this method incurs what is known as the curse of dimensionality (impracticality when scaling to larger dimensions). For instance, suppose we want to solve some PDE within a certain error threshold in various dimensions. Solving the 1D version on a segment would require an $n$ node mesh; the 2D version on a square would require an $n^2$ node mesh; the 3D version on a cube an $n^3$ node mesh; and so on. Hence, maintaining the same error threshold in a system with $d$ dimensions requires a mesh with $n^d$ nodes. If we recall, solving the linear system for the coefficients $c_k$ had computational complexity $O(m^3)$, with $m$ being the number of nodes in the mesh (the same as the number of base functions in the FEM). Putting all this together means that, in order to keep the error under a threshold as the dimension increases, the complexity scales as $O((n^d)^3) \equiv O(n^{3d})$. Thus, the conclusion is that, to keep the error within a certain threshold, the computational complexity of the FEM grows at least exponentially with the dimension of the equations, without even taking into account the increase in computations for the mesh or the projections. Already a system in 4 dimensions starts to be impractical to solve with FEMs. (From now on, we will call 4 or more dimensions high-dimensional.)

High dimensional systems are not very common in physics, but arise in many fields such as sociology and economics. For example, if we were to consider option pricing in finance, assuming the market parameters constant (so as not to incur a stochastic problem), the system would be modelled by a PDE which has at least as many variables and dimensions as stocks in the portfolio, plus time, which is generally a large number [2]. In cases such as the one we have just exposed, FEMs are impractical and Monte Carlo methods are used [3], but these still have some stability limitations. For this reason, in recent times, with the many improvements in artificial neural networks, new machine learning methods have resurged as potential candidates to deal with these kinds of high dimensional problems. The main idea is based on using the good qualities of artificial neural networks as function approximators.

An artificial neural network is just a complex parametrized function, $\mathcal{N}(x; \theta)$, which uses a modular architecture based on the concept of neurons, has a structure optimized for computer processing, and makes use of non-linear optimization algorithms to train its parameters to fit some model (we will cover this in Chapter 2). Based on this simplified definition, the deep-learning approach should be straightforward. Simply put, the method will approximate the exact solution by taking an artificial neural network, replacing it into the differential equation, and using an optimization algorithm to train its parameters so that the equation is satisfied; all while making use of deep-learning strategies to speed up the process. In [4] this type of method is referred to as a "Deep Galerkin Method", the reason being that both methodologies revolve around approximating the exact solutions of a differential equation via a parametrized function, either a linear combination of base functions or an artificial neural network; and both involve replacing this approximation into the differential equation and solving an inverse problem to find its coefficients or parameters. However, the artificial neural network strategy differs much in nature, and lacks many of the elements of the methods in the Galerkin family: it does not take into account the idea of weak formulation (which we have not explained here for simplicity); it does not use linear combinations of base functions and projections; and the resulting inverse problem does not lead to solving a linear system of equations, all in favour of a pure non-convex optimization. In fact, it is because of these differences that this machine learning approach should, in theory, be able to scale well with dimension: in using non-convex optimization, all dimensions are trained at the same time, which should not increase the computational cost much. On the other hand, one of the main problems is that the error is not bounded by an order, and is unpredictably subject to optimization and training particularities. The following table summarizes all the above:

Galerkin Methods (FEM) | Artificial Neural Network Methods
Approximates the solution with a linear combination of base functions. | Approximates the solution with an artificial neural network.
Requires computing some integrals (or quadratures) and solving a linear system. | Requires solving a non-convex optimization problem.
Error order and stability properties known. | Error and stability unknown; they depend on the specifics of the optimization.
The complexity scales exponentially with the dimension. | Generalizes well to higher dimensions with just a few more neurons.

Table 1.1: Comparison between FEMs and the Artificial Neural Network Methods.

In this work, we will be using deep-learning techniques and methodologies to try and solve some instances of differential equations. The objective will be to analyse the viability and capabilities of these methods. Although the main interest for these methods is in integrating PDEs (as the existing ODE integration methods are already very efficient), we will start in a progressive way, by studying their application to ODEs (which can be seen as a particular case of PDEs). Then we will scale up the complexity of the operators until we are able to solve some low-dimensional PDEs. Higher dimensional equations will be out of scope, since the aim is to illustrate the feasibility and the strengths and weaknesses of this strategy, for which two dimensions will be enough.

1.1 Posing the Problem

In its most general form, a system of differential equations with solution in the real space may be represented as follows:

$$\mathcal{L}[u(x)] = f(x), \quad x \in \Omega \subseteq \mathbb{R}^{n}, \qquad (1.1)$$

where $\Omega \subseteq \mathbb{R}^{n}$ is a compact manifold, $\mathcal{L}[\cdot]$ is a differential operator, $f(x) : \Omega \to \mathbb{R}^{m}$ is the external force, and $u : \Omega \to \mathbb{R}^{m}$ is a solution of the system. Note that (1.1) may represent either a system of ODEs or PDEs, depending on the differential operator. The operators which we will be solving in Chapter 3 are listed below:

Name | Expression
Identity Operator | $\mathcal{L}[u(x)] = u(x)$
1D Divergence Operator | $\mathcal{L}[u(x)] = \dfrac{\partial u(x)}{\partial x}$
2D Divergence Operator | $\mathcal{L}[u(x, y)] = \dfrac{\partial u(x, y)}{\partial x} + \dfrac{\partial u(x, y)}{\partial y}$
2D Laplacian Operator | $\mathcal{L}[u(x, y)] = \dfrac{\partial^2 u(x, y)}{\partial x^2} + \dfrac{\partial^2 u(x, y)}{\partial y^2}$
1D Advection Operator | $\mathcal{L}[u(x)] = u(x) \cdot \dfrac{\partial u(x)}{\partial x}$
2D Clairaut Operator | $\mathcal{L}[u(x, y)] = x \cdot \dfrac{\partial u(x, y)}{\partial x} + y \cdot \dfrac{\partial u(x, y)}{\partial y}$
2D Burgers Operator | $\mathcal{L}[\mathbf{u}(x, y)] = \mathcal{L}[(u_x(x, y), u_y(x, y))] = \left( u_x \dfrac{\partial u_x}{\partial x} + u_y \dfrac{\partial u_x}{\partial y},\; u_x \dfrac{\partial u_y}{\partial x} + u_y \dfrac{\partial u_y}{\partial y} \right)$

Table 1.2: List of differential operators used in Chapter 3.

Recall that at the start of this section, in defining (1.1), we indicated that $u$ was "a" solution to the system of differential equations. In fact, there are usually many solutions, or none may even exist. To ensure existence and uniqueness of the solution we need to impose some additional conditions on (1.1), namely initial conditions for ODEs and boundary conditions for PDEs. The most common sets of these conditions are:

Cauchy (ODE): $u(x_0) = u_0$ (1.2)

Dirichlet (PDE): $u(x) = g(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.3)

Neumann (PDE): $\dfrac{\partial u(x)}{\partial n(x)} = g(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.4)

Cauchy (PDE): $u(x) = g_1(x) \;\wedge\; \dfrac{\partial u(x)}{\partial n(x)} = g_2(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.5)

where $\Gamma$ or $\partial\Omega$ (depending on the convention) is the border of the domain $\Omega$, and $n(x)$ is the normal vector at the point $x \in \Gamma$. Descriptively: Cauchy initial conditions fix the solution value at a certain point; Dirichlet border conditions fix the solution values at the border of the domain; Neumann border conditions fix the flow coming in and out of the domain; and finally, Cauchy border conditions are a mix of Dirichlet and Neumann conditions [5].

Before proceeding, one observation has to be made about ODEs. Given a single ODE of $n$-th order (the highest derivative in the equation has order $n$) with $n > 1$, it is common practice to transform the equation into a system of first order ODEs, by simply introducing the following set of $n - 1$ equations, $u_1 = u',\ u_2 = u_1',\ \dots,\ u_{n-1} = u_{n-2}' = u^{(n-1)}$, and using them to replace any derivatives of order higher than one in the original equation (for example, $u'' = -u$ becomes the system $u' = u_1$, $u_1' = -u$). This means that any given $n$-th order ODE is equivalent to a system of $n$ first order ODEs; thus finding a solution to the original $n$-th order equation, $u(x)$, is equivalent to finding an extended manifold solution of the corresponding system of first order equations, $\mathbf{u}(x) = (u, u_1, \dots, u_{n-1})(x)$, which includes its derivatives. The definition (1.2) of Cauchy initial conditions is based on this last paradigm, where we consider systems of first order ODEs. Hence, when considering an $n$-th order ODE in its original form, the equivalent of fixing an initial point on the manifold solution, $\mathbf{u}(x_0) = (u_0, u_{1,0}, \dots, u_{n-1,0})$, is to fix the value of the solution and its first $n - 1$ derivatives, and on those premises these conditions should be written as $u(x_0) = u_0,\ \dots,\ u^{(n-1)}(x_0) = u_{n-1,0}$.

Summarizing, we will be considering systems of differential equations (1.1) in combination with some initial/boundary conditions (1.2-1.5), mainly Cauchy conditions, to formulate what are known as initial/boundary value problems. The main objective is to formulate a "well-posed" problem: a set of basic properties required to apply any numerical integration successfully. A system of differential equations is said to be well-posed in the Hadamard sense [5] if it satisfies the following three conditions:

– A solution exists.

– The solution is unique.

– The solution is stable, i.e. it changes continuously with small variations of its initial conditions, boundary conditions and external force.

Proving that a given problem is well-posed is a really tricky matter. There are very few general results, and many of the proofs are case specific: they may apply only to certain types of differential operators (for example linear or Poisson operators), require a certain type of boundary conditions, and impose several degrees of regularity.

As shown in Table 1.2, in this work we will be using very simple differential operators, all of them linear or quasi-linear. Also, the external force terms will always be analytic functions (actually polynomials), and the initial/boundary conditions will be, for the most part, of Cauchy type. Under these specific conditions, the Cauchy-Kovalevskaya theorem guarantees the existence of a unique analytic solution in both the ODE and PDE cases.

Nonetheless, this theorem has its limitations:

– First, it is a local theorem. This can be remedied if all the terms are analytic everywhere: a global version can be formed by "stitching" together the local solutions on several local neighbourhoods that form a cover of the domain, building a global solution. Since the solution has to be unique in the intersections of the neighbourhoods, the global solution has to be unique.

– Second, the proof is very dependent on the analyticity of the coefficients in the operator and the external force, as it relies on the method of majorants. The sketch of this proof goes as follows [6, 7]: first, we assume that the solution can be written as a power series in some neighbourhood $U \subseteq \Omega$, whose coefficients are obtained from the initial conditions and from replacing the power series into the differential equation. Then we attempt to find some power series that majorates the solution power series, where $\sum_k b_k x^k$ majorates $\sum_k a_k x^k$ iff $|a_k| \le b_k$. Finally, we use the property that states that if a series is majorated by a convergent series, then it converges too. If the majorating series is adequately chosen and converges, so does the solution power series, which then converges to the unique local analytic solution.

When the differential equation is linear or quasi-linear, for every $x_0 \in U \subseteq \Omega$ there is always a systematic change of variables $h : U \to V$ such that $h(x_0) = 0$. Then, on this new domain $V$, we can always construct a power series, centred at 0 and with radius of convergence $\rho = 1$, that majorates the power series solution $u(h(x)) : V \to \mathbb{R}^{n}$. This makes the proof independent of the differential operator, as long as it is linear or quasi-linear. Note that this proof is constructive, as the power series solution satisfies the differential operator and the initial/boundary conditions, and it converges on a neighbourhood $h^{-1}(D_0(1))$ of $x_0$. Therefore, it is a valid local solution, and likewise its uniqueness is proven by a similar argument. The assumption of analyticity and the constructiveness of the proof imply that the theorem proves that there is a unique analytic solution. This is much different from claiming that the only solution is analytic. Hence, the initial/boundary value problem could still have other, non-analytic, solutions. (Actually, in the case of ODEs, the Picard-Lindelöf theorem guarantees general uniqueness over other solutions, so the analytic one is the only one; but there are no similar results for PDEs.)

Despite the two potential limitations in applying the Cauchy-Kovalevskaya theorem that we have just seen, this will be enough for the artificial neural network to approximate the analytic solution of the problem. The reason for this assumption is that the artificial neural networks will be composed of (almost everywhere) analytic functions, thus we expect them to preferentially fit that solution. From now on, there will be no further discussion about the well-posedness of the initial/boundary value problems that we will attempt to solve in this work; the Cauchy-Kovalevskaya theorem will always apply.

1.2 Relevant Literature

The approach of using neural networks to solve systems of differential equations is relatively "old"; at the very least, we have found an article [8] dating back to 1994. Although this article uses a graph-like structure acknowledged as an artificial neural network to solve ODEs, it applies FEM-type "tent-like" activation functions, and it does not rely on "training" in the modern sense, i.e. defining a non-convex optimization problem, opting instead for some kind of Galerkin method hybrid. Throughout this article there are references to papers which use some kind of mean square error and non-convex optimization (the most standard approach nowadays), but the author regards them as computationally expensive. This shows that the state of the field of deep-learning back then did not allow for these strategies to be viable candidates to integrate differential equations.

Moving on to more recent times, articles explicitly integrating ODEs with artificial neural networks are hard to come by since, as explained before, there already are very efficient methods to integrate ODEs, and the main interest is in PDEs. A related case that we found very interesting and worth mentioning is [9], which uses a reverse approach: instead of training an artificial neural network to integrate an ODE, it uses ODE numerical integrators to train artificial neural networks.

With regards to PDEs, [4] is a very complete work. It defines a loss by discretizing, over a random collocation of points, the error of the artificial neural network with respect to the boundary value problem (the same idea we will be using to define a loss in 2.2). Then, it goes on to solve very high dimensional free boundary PDEs (for American options) and boundary problems (for the Hamilton-Jacobi-Bellman equation). Two interesting features of this work are: what the article calls a Monte Carlo method for fast computation of second derivatives, which is a type of synthetic gradient; and the proof of a restricted version of a universal approximation theorem for the solutions of PDEs. A synthetic gradient [10] is usually used for very large networks or very large amounts of data. Instead of computing the exact derivatives of the loss function with respect to the parameters, required to minimize the loss, the derivatives are drawn from a distribution which is updated at every step of the training. This technique trades off not having the exact derivatives for a lower computational cost and the possibility of asynchronous training. The authors of [4] use this Monte Carlo method to avoid the expensive cost of second order automatic differentiation in very high dimensions. In this work, we will not be considering this technique since, unlike [4], which integrates PDEs of up to 200 dimensions, we only integrate PDEs of up to 2 dimensions (not really high dimensions), like almost all of the other papers that we will review next. However, the use of a synthetic gradient is a recurrent theme in papers dealing with very high dimensional systems.

Other papers focused on high dimensions are [11] and [12]. Both are similar in that they do not consider deterministic PDEs but BSDEs (Backward Stochastic Differential Equations), such as the Allen-Cahn (physics) or Black-Scholes (economics) equations, and they consider systems of up to 100 dimensions.

Closer to the line of work of this project are [13, 14, 15, 16, 17, 18]. The outlines of these works are quite similar: they simulate PDEs of up to 2 dimensions and do some kind of error analysis. Among the characteristics of [15], it analyses the effect of the mesh and of the number of hidden nodes on the error; and in [16, 17] the method is compared to a standard FEM. On the more interesting side of things lie [13] and [18].

[18] follows up on the architecture of [4], which uses a special kind of feed-forward artificial neural network. In a regular feed-forward artificial neural network the neurons are divided into sequential layers, and the outputs of a layer strictly get fed as inputs to the next layer (we will see this in section 2.1). However, [4] used an architecture where the outputs of a layer feed all of its successive layers. This seems to yield good results, although there is no comparison to other types of architectures.

[13] focuses on using the mesh-free nature of the artificial neural network approach to integrate over irregular domains, instead of exploiting its capability to integrate systems in high dimensions. In this paper, artificial neural networks are trained to fit the advection and diffusion operators over collocations on very irregular domains. It also applies a very original idea, which is to consider the approximated solution as $\hat{u}(x) = g(x) + D(x) \cdot \tilde{u}(x)$, where $g(x)$ is the boundary condition, $D(x)$ is a distance function to the border such that $D(x) = 0$ iff $x \in \Gamma$, and $\tilde{u}(x)$ is a regular artificial neural network. This way, $\hat{u}(x)$ always satisfies the boundary conditions by construction, and it is only required to train the model to fit the differential equation; thus one can pre-compute the distances from the collocation points to the border, since they do not change throughout the training, and focus on a single objective loss function. For this work, we reckoned that this idea would only work well with Neumann or Dirichlet boundary conditions, but not with Cauchy boundary conditions, which include both at the same time.

Other papers that relate artificial neural networks to PDEs are: [19], which is a version of [9] relating PDEs to the dynamics of non-convex optimization in convolutional networks; and [20], which draws the same relation between the dynamics of the optimization of general artificial neural networks and PDEs, using statistical physics techniques. Also, [21, 22, 23] are a series of papers by the same author which train artificial neural networks with experimental data from physics, to learn the underlying behaviours modelled by PDEs.

Finally, we want to remark that the derivatives of artificial neural networks with respect to their inputs, which might seem to be something exclusive to this topic since they do not appear in simple regression or classification problems, have been used in other contexts. [24, 25] are examples of this; both make use of information about the derivatives as a regularization technique and to speed up training, in problems with no imperative need for them.

Chapter 2 Artificial Neural Networks Framework

In this chapter we will cover from scratch everything about artificial neural networks that we will be using to solve differential equations in the next chapter. We will start by defining what an artificial neuron and an artificial neural network are; explain the architecture of a fully-connected feed-forward neural network, their possible activation functions and initializations; show how to assign a loss function to train the model to fit the initial/boundary value problem; discuss the pros and cons of the main optimizer options available to train the model; and examine diverse regularization techniques which help improve training results.

2.1 What are Artificial Neural Networks?

Artificial neural networks are ensembles of units called artificial neurons. There aremany designs for these artificial neurons, and by combining and arranging them in different ways we can create networks with very different behaviours and results.

In this work, the only type of artificial neuron that we will be using is known as the perceptron, which is probably the simplest and the most widely used. Figure 2.1 shows the basic scheme of a perceptron. A perceptron takes in $n$ inputs, which we can view as the coordinates of a vector $x_n$; it combines them linearly, multiplying by weights and adding a bias; and then applies a (mostly) non-linear function $a$ to the linear combination.

Figure 2.1: Perceptron scheme.
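
The following is a minimal sketch of our own of the computation a perceptron performs, $y = a(w \cdot x + b)$ (the thesis' actual code lives in Annex B; the sigmoid activation here is an arbitrary choice):

```python
import numpy as np

# Minimal perceptron sketch (ours, not the Annex B implementation): the
# neuron linearly combines its n inputs with weights w and a bias b, then
# applies a non-linear activation function a.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, a=sigmoid):
    z = np.dot(w, x) + b      # linear combination z = w . x + b
    return a(z)               # output y = a(z)

x = np.array([0.5, -1.0, 2.0])   # n = 3 inputs
w = np.array([0.1, 0.4, -0.3])   # weights
print(perceptron(x, w, b=0.2))
```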

Some other popular designs are convolutional neurons, which are used in image recognition; and memory cells, which are used for data with time dependencies, such as video processing.

The way in which we combine artificial neurons to form artificial neural networks is such that the outputs of a group of neurons become the inputs of another group of neurons. In this light, one can think of an artificial neural network as a directed graph with entry and exit edges, where each of the nodes corresponds to a neuron, and the directed edges indicate which neurons' outputs feed another neuron as inputs. This is why, many times, the neurons in a network are referred to as nodes.

Figure 2.2: A directed graph as a possible representation of the architecture of an artificial neural network. Nodes are artificial neurons and edges indicate which neurons feed into each other.

One approach to organizing these neurons is using a feed-forward scheme. In this scheme, we divide the neurons into sequential layers (groups); then the outputs of a layer can only become inputs of the next layer. Using the graph characterization, this corresponds to artificial neural networks defined by directed graphs without cycles. The main advantage of this scheme is that it allows for the use of the standard back-propagation algorithm (which we will explain in the next sections) to train the parameters of the neurons to fit a certain model.

A particular case of feed-forward artificial neural networks is the fully-connected one. This happens when all the neurons in a layer are connected to all the neurons in the next layer.

Figure 2.3: General scheme of a perceptron based fully-connected feed-forward artificial neural network.

Throughout this work we will exclusively use perceptron based fully-connected feed-forward artificial neural networks, with different numbers of layers and different numbers of nodes per layer, to approximate the solutions of differential equations. Figure 2.3 shows the standard graph representation of the structure of such neural networks, based on the concepts explained up to this point. Note that the edges are not directed, as it is unnecessary: by standard convention, the flow of the neurons goes from left to right.

An artificial neural network provides a frame to define complex parametric functions in a modular way, as a composition of simpler operations encapsulated in artificial neurons (we will insist on this point in the next sections).

The following observations are a way to better understand artificial neural networks:

– A single neuron can form a neural network. In that case, if the activation function is the identity, and we adjust the parameters of the neuron so the outputs fit a continuous dataset, we perform a linear regression; indeed, $w_i$ and $b$ are simply the slope and the intercept. Similarly, if the activation function is a sigmoid, and we adjust the parameters of the neuron so the outputs fit a binary dataset, we perform a logistic regression.

– In more complex deep neural networks, we expand on the same ideas as in the single neuron. In general, what happens in regression problems is that we fit the parameters to make the hyper-surface defined by the network structure get as close as possible to the sample data; and in classification problems, we fit the parameters to match, as much as possible, the underlying marginal distribution of each category with the network structure.

Finally we will end this introductory section with some nomenclature for the rest of the work:

– A feed-forward neural network is said to be deep when it contains more than one layer.

– In deep neural networks, layers are classified as input, hidden and output layers. The input layer is the one that simply takes in the inputs and performs no transformations; the output layer is the last layer of the sequence and its outputs are the outputs of the whole network; and the hidden layers are all those that lie between the input and output layers. Based on Figure 2.3, the input vector corresponds to the input layer (layer 0, even though it is not explicitly referenced as such), layers 1 to $l-1$ would be the hidden layers and layer $l$ would be the output layer.

– A feed-forward fully-connected network is defined by its number of layers, the number of neurons (nodes) in each layer, and its types of neurons. A neuron is defined by its weights, bias and activation function. Therefore, from now on we will use the following standard nomenclature, which contains all the elements that we need to completely define our networks:

$\ell$ : layer index,

$n_\ell, m_\ell$ : neuron indexes for layer $\ell$,

$w^{[\ell]\,m_{\ell-1}}_{n_\ell}$ : weights of the neuron $n_\ell$ in the layer $\ell$ (parameters),

$b^{[\ell]}_{n_\ell}$ : bias of the neuron $n_\ell$ in the layer $\ell$ (parameters),

$a^{[\ell]}_{n_\ell}$ : activation function of the neuron $n_\ell$ in the layer $\ell$,

$z^{[\ell]}_{n_\ell}$ : result of the linear combination of the neuron $n_\ell$ in the layer $\ell$,

$y^{[\ell]}_{n_\ell}$ : result of applying the activation function of the neuron $n_\ell$ in the layer $\ell$.

With this notation we consider the input vector as layer 0, for which no transformations are performed; therefore $x_i = z^{[0]}_i = y^{[0]}_i$. Also, the output layer (the $l$-th layer) never has an activation function; therefore $\hat u_i = z^{[l]}_i = y^{[l]}_i$.

2.2 From Numerical Integration to Deep-Learning

The intended use of the artificial neural networks in this work is for them to approximate the solution of some initial/boundary value problem. In order to achieve this, the parameters of the network should be tweaked to minimize some measure representing how well the network satisfies the differential operator and initial/boundary conditions.

Given the artificial neural network approximation of the solution, $\hat u(x; w, b)$, which depends parametrically on the set of all weights $w$ and all biases $b$, we can define a non-negative loss or cost function, $L(w, b)$, that quantifies the degree of satisfaction of the network to the problem, in the following terms:

$$L(w,b) = L_1(w,b) + L_2(w,b) + R(w,b), \tag{2.1}$$
where $L_1(w,b)$ is the loss term measuring how well the neural network approximates the differential operator (1.1), $L_2(w,b)$ is the loss term measuring how well the neural network approximates the initial/boundary conditions (1.2-1.5), and $R(w,b)$ is the regularization term of the loss, which helps to stabilize and improve convergence in the optimization (these terms will be covered in a later section). In particular, using Cauchy boundary conditions, which are the most complex of all, the terms would be:

$$L_1(w,b) = \big\lVert \mathcal{L}[\hat u(x;w,b)] - f(x) \big\rVert^2_{\Omega,2} = \int_\Omega \big(\mathcal{L}[\hat u(x;w,b)] - f(x)\big)^2\, dx, \tag{2.2}$$

$$\begin{aligned}
L_2(w,b) &= \big\lVert \hat u(x;w,b) - g_1(x) \big\rVert^2_{\Gamma,2} + \Big\lVert \frac{\partial \hat u(x;w,b)}{\partial n(x)} - g_2(x) \Big\rVert^2_{\Gamma,2} \\
&= \int_\Gamma \big(\hat u(x;w,b) - g_1(x)\big)^2\, dx + \int_\Gamma \Big(\frac{\partial \hat u(x;w,b)}{\partial n(x)} - g_2(x)\Big)^2\, dx,
\end{aligned} \tag{2.3}$$
with $\lVert\cdot\rVert_{\Omega,2}$ the norm of the $L^2(\Omega)$ Hilbert space (the space of square integrable functions in $\Omega$), and the same concept applying to $\lVert\cdot\rVert_{\Gamma,2}$. However, in practice, as the integrals in (2.2-2.3) are virtually impractical to compute, instead of using the $\lVert\cdot\rVert_{\Omega,2}$ and $\lVert\cdot\rVert_{\Gamma,2}$ norms, a discrete approximation is used. This is achieved by taking a random collocation of $N_\Omega$ points in $\Omega$ and $N_\Gamma$ points in $\Gamma$, which can be obtained using a Monte Carlo hit-and-miss approach, and discretizing as $\int_\Omega \to \frac{1}{N_\Omega}\sum_{N_\Omega}$ and $\int_\Gamma \to \frac{1}{N_\Gamma}\sum_{N_\Gamma}$. Thus the actual loss terms become:

$$L_1(w,b) \approx \frac{1}{N_\Omega} \sum_{i\in N_\Omega} \big(\mathcal{L}[\hat u(x_i;w,b)] - f(x_i)\big)^2, \tag{2.4}$$
$$L_2(w,b) \approx \frac{1}{N_\Gamma} \sum_{i\in N_\Gamma} \big(\hat u(x_i;w,b) - g_1(x_i)\big)^2 + \frac{1}{N_\Gamma} \sum_{i\in N_\Gamma} \Big(\frac{\partial \hat u(x_i;w,b)}{\partial n(x_i)} - g_2(x_i)\Big)^2. \tag{2.5}$$
Observe that we have transformed the continuous loss functions into the MSE (mean squared error) on a random collocation of points of the domain and the border. Now, on these premises, the problem has changed in nature, from a numerical integration problem to an almost purely deep-learning regression type of problem. Other discretizations using the absolute error or the Huber error would have yielded equally valid approximations.
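To make the discretization concrete, the following minimal sketch assembles losses of the form (2.4)-(2.5) in TensorFlow for a hypothetical 1D Poisson-type problem $\hat u''(x) = f(x)$ with a Dirichlet condition $\hat u = g_1$ on the boundary; the names (`model`, `loss_fn`, the callables `f` and `g1`) are illustrative placeholders, not part of the formulation above:

```python
import tensorflow as tf

def loss_fn(model, x_omega, x_gamma, f, g1):
    """Discretized loss in the spirit of (2.4)-(2.5) for u'' = f (1D sketch)."""
    # L1: residual of the differential operator at interior collocation points.
    with tf.GradientTape() as t2:
        t2.watch(x_omega)
        with tf.GradientTape() as t1:
            t1.watch(x_omega)
            u = model(x_omega)
        du = t1.gradient(u, x_omega)    # first derivative w.r.t. the input x
    d2u = t2.gradient(du, x_omega)      # second derivative w.r.t. the input x
    L1 = tf.reduce_mean(tf.square(d2u - f(x_omega)))
    # L2: mismatch with the boundary condition at boundary collocation points.
    L2 = tf.reduce_mean(tf.square(model(x_gamma) - g1(x_gamma)))
    return L1 + L2
```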

The next steps are straightforward: train the artificial neural network as in any other deep-learning regression type of problem. This entails taking the loss $L(w,b)$, which defines a hyper-surface in the parameter space, and, given a certain initialization of parameters $(w_0, b_0)$, using a gradient-based optimization technique to minimize the loss function, which is equivalent to minimizing the error of the approximation.

2.3 Derivatives: Back Propagation and Gradient Issues

As we have just stated, we would like to use some kind of gradient-based optimization technique to optimize $L(w,b)$. The idea of these techniques is to start at some point and move that point in steps along the direction given by the gradient until we reach the minimum or a sufficiently low value of the loss function. This strategy implies that at every step we are required to compute the gradient $\nabla_{(w,b)}L$ at the point we are in.

Using a feed-forward scheme allows the application of “standard” back-propagation algorithms to compute the derivatives of these gradients, which other schemes do not allow, or allow only after many modifications. In actuality, back-propagation is the name that a backward accumulation automatic differentiation algorithm (or autodiff) receives in deep-learning; this is a type of algorithm used to compute derivatives of composed functions efficiently in computers. [26]

We will illustrate how a backward autodiff works with the following example. Imagine we wanted to calculate in a computer the derivative at $(x_0, y_0)$ of:
$$f(x,y) = \sin(e^x + x e^y). \tag{2.6}$$
What an autodiff algorithm does is decompose the function progressively into primitive operations (sums, subtractions, multiplications, divisions and basic functions: exponentials, sines, cosines...). A computer can take any of these primitive operations and compute their values and the values of their derivatives at any point with arithmetic precision. For (2.6), the decomposition would be:

$$\begin{aligned}
z_1 &= \sin(z_2) &\longrightarrow\quad& \frac{\partial z_1}{\partial z_2} = \cos(z_2), \\
z_2 &= z_3 + z_4 &\longrightarrow\quad& \frac{\partial z_2}{\partial z_3} = 1,\ \ \frac{\partial z_2}{\partial z_4} = 1, \\
z_3 &= e^{z_5} &\longrightarrow\quad& \frac{\partial z_3}{\partial z_5} = e^{z_5}, \\
z_4 &= z_6 \cdot z_7 &\longrightarrow\quad& \frac{\partial z_4}{\partial z_6} = z_7,\ \ \frac{\partial z_4}{\partial z_7} = z_6, \\
z_5 &= x &\longrightarrow\quad& \frac{\partial z_5}{\partial x} = 1, \\
z_6 &= x &\longrightarrow\quad& \frac{\partial z_6}{\partial x} = 1, \\
z_7 &= e^{z_8} &\longrightarrow\quad& \frac{\partial z_7}{\partial z_8} = e^{z_8}, \\
z_8 &= y &\longrightarrow\quad& \frac{\partial z_8}{\partial y} = 1.
\end{aligned} \tag{2.7}$$

From an implementation standpoint, this decomposition is used to build what is known as a computational graph, which is a directed graph such that its nodes store the computed numerical values of the primitive operations $z_i$ and their derivatives at the point of calculation $(x_0, y_0)$, and its directed edges store the precedence of all operations. Figures (2.4-2.5) show the computational graph of (2.6). In TensorFlow v2.3, which is the deep-learning framework that we will be using in the next chapter to build artificial neural networks, computational graphs are natively implemented in a class called GradientTape, which automatically builds these graphs.

Figure 2.4: Computational graph of example (2.6).

Figure 2.5: Computational graph (derivatives) of example (2.6). In green the flow of nodes required to compute 휕푓(푥, 푦)/휕푥

The computation of the derivatives at the point $(x_0, y_0)$ is done by multiplying and adding (when more than one edge enters a node) the values of the derivatives (actually, the Jacobians) at the nodes, following the directions of the edges. This algorithm in its backward version is exactly the same as successively applying the chain rule in a systematic way, from the most composed operation (the output) back towards the most fundamental ones (the inputs). Its forward version applies the chain rule in the opposite order. For (2.6) the algorithm would be the same as using the chain rule as follows:

$$\begin{aligned}
\frac{\partial f(x,y)}{\partial (x,y)} &= \Big(\Big(\frac{\partial z_1}{\partial z_2}\cdot\frac{\partial z_2}{\partial z_3}\Big)\cdot\frac{\partial z_3}{\partial z_5}\Big)\cdot\frac{\partial z_5}{\partial (x,y)} \\
&\quad + \Big(\Big(\frac{\partial z_1}{\partial z_2}\cdot\frac{\partial z_2}{\partial z_4}\Big)\cdot\frac{\partial z_4}{\partial z_6}\Big)\cdot\frac{\partial z_6}{\partial (x,y)} \\
&\quad + \Big(\Big(\Big(\frac{\partial z_1}{\partial z_2}\cdot\frac{\partial z_2}{\partial z_4}\Big)\cdot\frac{\partial z_4}{\partial z_7}\Big)\cdot\frac{\partial z_7}{\partial z_8}\Big)\cdot\frac{\partial z_8}{\partial (x,y)}.
\end{aligned} \tag{2.8}$$
The dynamic programming element of this algorithm is very helpful, as it allows reusing the same computational graph of a given point to compute, at no additional cost, the derivatives with respect to intermediate operations and parameters (which will be extremely useful). It also provides better error behaviour and stability than numerical differentiation, which approximates derivatives by differences. This is also different from symbolic differentiation. [27]
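As a quick illustration, the following minimal sketch reproduces example (2.6) with TensorFlow's GradientTape; the evaluation point $(x_0, y_0) = (1.0, 0.5)$ is an arbitrary choice for the illustration:

```python
import tensorflow as tf

x = tf.Variable(1.0)   # x0
y = tf.Variable(0.5)   # y0

# The tape records the primitive operations (the computational graph).
with tf.GradientTape() as tape:
    f = tf.sin(tf.exp(x) + x * tf.exp(y))

# Backward accumulation: derivatives are multiplied and added along the edges.
df_dx, df_dy = tape.gradient(f, [x, y])
# Hand-derived check via (2.8):
#   df/dx = cos(e^x + x e^y) * (e^x + e^y),  df/dy = cos(e^x + x e^y) * x e^y
```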

2.3.1 Derivatives Behaviour (Vanishing and Exploding Gradients)

In this subsection we will compute the derivatives of an example artificial neural network and look at how they behave. Although by using the back-propagation algorithm we are able to numerically compute derivatives up to any order with great precision, by successively applying the algorithm to its outputted derivatives, this does not imply that the derivatives we compute are (in a structural sense) well suited to train the model. Differences in magnitude and sensitivity to variations of the parameters can make training almost impossible.

When computing the gradient of the loss function with respect to the parameters to train the artificial neural network, we obtain the following:

$$\nabla_{(w,b)}L(w,b) := \frac{\partial L(w,b)}{\partial (w,b)} = \frac{\partial L_1(w,b)}{\partial (w,b)} + \frac{\partial L_2(w,b)}{\partial (w,b)} + \frac{\partial R(w,b)}{\partial (w,b)}, \tag{2.9}$$
where, in particular,
$$\frac{\partial L_1(w,b)}{\partial (w,b)} = \frac{\partial \mathcal{L}[\hat u(x;w,b)]}{\partial (w,b)} - \frac{\partial f(x)}{\partial (w,b)}, \tag{2.10}$$
and thus we observe, from the term in $L_1$, that in order to obtain the gradient we will require derivatives of up to first order in $(w,b)$, and up to the highest order of the differential operator $\mathcal{L}$ in $x$, which in this work can be of up to order 2, coming from the Laplacian operator.

Next, we will use an example artificial neural network consisting of 4 layers: an input layer of 2 neurons, two hidden layers of 3 and 4 neurons respectively, and an output layer of 2 neurons (see Figure 2.6). Using this example for greater clarity, we will first characterize its elements in detail, which will serve as a consolidation example of the first section of this chapter. Then, we will be applying the chain rule to express some derivatives of up to order 2 in variables and order 1 in parameters, and we will discuss their potential issues when used in gradient based training, namely the issues with vanishing and exploding gradients. This whole subsection will be written in the tensor formulation introduced in Chapter 0. In Appendix A, we reproduce the results in this subsection that make sense in linear algebra notation, for didactic reasons.

Figure 2.6: Example model: A 2-3-4-2 artificial neural network.

Given our example artificial neural network (Figure 2.6), the elements that characterize it (variables, parameters and activation functions), in the most general way for perceptron type neurons, are:

$$\begin{gathered}
n_0, m_0 = 1,2, \qquad n_1, m_1 = 1,2,3, \qquad n_2, m_2 = 1,2,3,4, \qquad n_3, m_3 = 1,2, \\
w^{[1]\,m_0}_{n_1},\quad w^{[2]\,m_1}_{n_2},\quad w^{[3]\,m_2}_{n_3}, \qquad\qquad b^{[1]}_{n_1},\quad b^{[2]}_{n_2},\quad b^{[3]}_{n_3}, \\
z^{[0]}_{n_0} = x_{n_0},\quad z^{[1]}_{n_1},\quad z^{[2]}_{n_2},\quad z^{[3]}_{n_3} = \hat u_{n_3}, \qquad y^{[0]}_{n_0} = x_{n_0},\quad y^{[1]}_{n_1},\quad y^{[2]}_{n_2},\quad y^{[3]}_{n_3} = \hat u_{n_3}, \\
a^{[1]}_{n_1},\ a^{[2]}_{n_2} : \mathbb{R} \longrightarrow \mathbb{R}.
\end{gathered} \tag{2.11}$$
Recall that each of these elements is a tensor in itself.

In particular, for each of the layers $\ell$, the output variables $z^{[\ell]}_{n_\ell}$ and $y^{[\ell]}_{n_\ell}$ of each neuron are (where, remember, from now on everything is written in Einstein convention notation):

$$\begin{aligned}
z^{[1]}_{n_1} &= b^{[1]}_{n_1} + w^{[1]\,n_0}_{n_1}\cdot y^{[0]}_{n_0}, \qquad & y^{[1]}_{n_1} &= a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big), \\
z^{[2]}_{n_2} &= b^{[2]}_{n_2} + w^{[2]\,n_1}_{n_2}\cdot y^{[1]}_{n_1}, \qquad & y^{[2]}_{n_2} &= a^{[2]}_{n_2}\big(z^{[2]}_{n_2}\big), \\
z^{[3]}_{n_3} &= b^{[3]}_{n_3} + w^{[3]\,n_2}_{n_3}\cdot y^{[2]}_{n_2}, \qquad & y^{[3]}_{n_3} &= z^{[3]}_{n_3}.
\end{aligned} \tag{2.12}$$
If we compose all these output variables, we can see that this instance of an artificial neural network actually represents the composite function:

$$y^{[3]}_{n_3} = b^{[3]}_{n_3} + w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\Big(b^{[2]}_{n_2} + w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(b^{[1]}_{n_1} + w^{[1]\,n_0}_{n_1}\cdot y^{[0]}_{n_0}\big)\Big), \tag{2.13}$$
or equivalently,
$$\hat u_{n_3} = b^{[3]}_{n_3} + w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\Big(b^{[2]}_{n_2} + w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(b^{[1]}_{n_1} + w^{[1]\,n_0}_{n_1}\cdot x_{n_0}\big)\Big). \tag{2.14}$$
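As a sanity check of (2.14), the following sketch evaluates the 2-3-4-2 example network with `tf.einsum`, so that the code mirrors the Einstein-convention contractions; the random initialization and the choice of tanh activations are assumptions made only for the illustration:

```python
import tensorflow as tf

sizes = [2, 3, 4, 2]   # the 2-3-4-2 example network of Figure 2.6
w = [tf.Variable(tf.random.normal((sizes[l + 1], sizes[l]))) for l in range(3)]
b = [tf.Variable(tf.zeros(sizes[l + 1])) for l in range(3)]

def forward(x, act=tf.tanh):
    """Evaluates (2.14); the output layer applies no activation function."""
    y = x
    for l in range(3):
        z = b[l] + tf.einsum('nm,m->n', w[l], y)  # z^[l+1]_n = b_n + w_n^m y_m
        y = act(z) if l < 2 else z
    return y

u_hat = forward(tf.constant([0.3, -0.7]))  # a 2-component output, as expected
```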

Now, we will compute some derivatives applying the chain rule:

Derivatives of order 1 in 푥:

$$\begin{aligned}
\frac{\partial \hat u_{n_3}}{\partial x_{m_0}} &:= \frac{\partial y^{[3]}_{n_3}}{\partial z^{[0]}_{m_0}} \\
&= \frac{\partial y^{[3]}_{n_3}}{\partial z^{[3]}_{m_3}}\cdot\frac{\partial z^{[3]}_{m_3}}{\partial y^{[2]}_{n_2}}\cdot\frac{\partial y^{[2]}_{n_2}}{\partial z^{[2]}_{m_2}}\cdot\frac{\partial z^{[2]}_{m_2}}{\partial y^{[1]}_{n_1}}\cdot\frac{\partial y^{[1]}_{n_1}}{\partial z^{[1]}_{m_1}}\cdot\frac{\partial z^{[1]}_{m_1}}{\partial y^{[0]}_{n_0}}\cdot\frac{\partial y^{[0]}_{n_0}}{\partial z^{[0]}_{m_0}} \\
&= \delta^{m_3}_{n_3}\cdot w^{[3]\,n_2}_{m_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \delta^{m_2}_{n_2}\cdot w^{[2]\,n_1}_{m_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}\cdot w^{[1]\,n_0}_{m_1}\cdot \delta^{m_0}_{n_0} \\
&= w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1}.
\end{aligned} \tag{2.15}$$

Derivatives of order 2 in $x$:

$$\begin{aligned}
\frac{\partial^2 \hat u_{n_3}}{\partial x_{m_0}\,\partial x_{m_0}} &:= \frac{\partial}{\partial x_{m_0}}\Big(\frac{\partial \hat u_{n_3}}{\partial x_{m_0}}\Big) \\
&= w^{[3]\,n_2}_{n_3}\cdot \frac{\partial}{\partial z^{[0]}_{m_0}}\Big(D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\Big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1} \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot \frac{\partial}{\partial z^{[0]}_{m_0}}\Big(D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\Big)\cdot w^{[1]\,m_0}_{n_1} \\
&= w^{[3]\,n_2}_{n_3}\cdot D^{n_2,n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1}\Big)\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1}\Big) \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1,n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1}\cdot w^{[1]\,m_0}_{n_1}.
\end{aligned} \tag{2.16}$$

Derivatives of order 0 in $x$ and order 1 in $w$:

$$\begin{aligned}
\frac{\partial \hat u_{n_3}}{\partial w^{[3]\,n_2}_{m_3}} &= \delta^{m_3}_{n_3}\cdot y^{[2]}_{n_2}, \\
\frac{\partial \hat u_{n_3}}{\partial w^{[2]\,n_1}_{m_2}} &= w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \delta^{m_2}_{n_2}\cdot y^{[1]}_{n_1}, \\
\frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}} &:= \frac{\partial y^{[3]}_{n_3}}{\partial w^{[1]\,n_0}_{m_1}} = \frac{\partial y^{[3]}_{n_3}}{\partial z^{[3]}_{m_3}}\cdot\frac{\partial z^{[3]}_{m_3}}{\partial y^{[2]}_{n_2}}\cdot\frac{\partial y^{[2]}_{n_2}}{\partial z^{[2]}_{m_2}}\cdot\frac{\partial z^{[2]}_{m_2}}{\partial y^{[1]}_{n_1}}\cdot\frac{\partial y^{[1]}_{n_1}}{\partial z^{[1]}_{m_1}}\cdot\frac{\partial z^{[1]}_{m_1}}{\partial w^{[1]\,n_0}_{m_1}} \\
&= \delta^{m_3}_{n_3}\cdot w^{[3]\,n_2}_{m_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \delta^{m_2}_{n_2}\cdot w^{[2]\,n_1}_{m_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0} \\
&= w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}.
\end{aligned} \tag{2.17}$$

Derivatives of order 0 in $x$ and order 1 in $b$:

$$\begin{aligned}
\frac{\partial \hat u_{n_3}}{\partial b^{[3]}_{m_3}} &= \delta^{m_3}_{n_3}, \\
\frac{\partial \hat u_{n_3}}{\partial b^{[2]}_{m_2}} &= w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \delta^{m_2}_{n_2}, \\
\frac{\partial \hat u_{n_3}}{\partial b^{[1]}_{m_1}} &:= \frac{\partial y^{[3]}_{n_3}}{\partial b^{[1]}_{m_1}} = \frac{\partial y^{[3]}_{n_3}}{\partial z^{[3]}_{m_3}}\cdot\frac{\partial z^{[3]}_{m_3}}{\partial y^{[2]}_{n_2}}\cdot\frac{\partial y^{[2]}_{n_2}}{\partial z^{[2]}_{m_2}}\cdot\frac{\partial z^{[2]}_{m_2}}{\partial y^{[1]}_{n_1}}\cdot\frac{\partial y^{[1]}_{n_1}}{\partial z^{[1]}_{m_1}}\cdot\frac{\partial z^{[1]}_{m_1}}{\partial b^{[1]}_{m_1}} \\
&= \delta^{m_3}_{n_3}\cdot w^{[3]\,n_2}_{m_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \delta^{m_2}_{n_2}\cdot w^{[2]\,n_1}_{m_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1} \\
&= w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}.
\end{aligned} \tag{2.18}$$

Derivatives of order 1 in 푥 and order 1 in 푤[1]:

$$\begin{aligned}
\frac{\partial^2 \hat u_{n_3}}{\partial x_{m_0}\,\partial w^{[1]\,n_0}_{m_1}} &:= \frac{\partial}{\partial x_{m_0}}\Big(\frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\Big) \\
&= w^{[3]\,n_2}_{n_3}\cdot D^{n_2,n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}\Big)\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot w^{[1]\,m_0}_{n_1}\Big) \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1,n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}\cdot w^{[1]\,m_0}_{n_1} \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot D^{n_2}\big(a^{[2]}_{n_2}(z^{[2]}_{n_2})\big)\cdot w^{[2]\,n_1}_{n_2}\cdot D^{n_1}\big(a^{[1]}_{n_1}(z^{[1]}_{n_1})\big)\cdot \delta^{m_0}_{n_0}\cdot \delta^{m_1}_{n_1}.
\end{aligned} \tag{2.19}$$

The vanishing and exploding gradient problems are a recurrent issue in the deep-learning field; in fact, they are the main reason why, although the concept of artificial neural networks had been well known since the 1950s, it was not until the early 2000s, when the effects of these issues were “mitigated” with new techniques, that the field of deep-learning exploded and became what it is today. The solution up until now has been to design improved optimizers, use regularization techniques, increase computer power and make use of parallel computation, and create new architectures that preserve data structures (such as convolutional networks in image processing). In the following sections we will see the ideas behind the typical optimizers and regularizations that are used to avoid the “classical” vanishing and exploding gradients.

From the derivatives with respect to $w$ and $b$ in (2.17) and (2.18), these gradient problems can be explained straightforwardly. Observe that, in the derivatives of the parameters, as we go from the shallowest layers (closest to the output) to the deepest layer (closest to the input) of the network, the number of factors involving weights and derivatives of the activation functions increases. Thus, when the weights $w$ become small (or large) with respect to 1, or the activation function has a small (or large) slope in the range of the linear combinations $z$ of the neurons, both of which are extremely common, then, as we consider the derivatives of deeper parameters, these become the product of an increasing number of small (or large) terms. In conclusion, as we consider deeper layers, the derivatives used to train the parameters become vanishingly small or explosively large with respect to those of the shallower layers, making the optimization of these deeper parameters almost impossible. For our problem (only in $w$, but equivalently in $b$):

$$\text{“Classic” Vanishing Gradient:}\quad \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[3]\,n_2}_{m_3}}\Big| \gg \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[2]\,n_1}_{m_2}}\Big| \gg \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\Big|.$$
$$\text{“Classic” Exploding Gradient:}\quad \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[3]\,n_2}_{m_3}}\Big| \ll \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[2]\,n_1}_{m_2}}\Big| \ll \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\Big|. \tag{2.20}$$

At this point, we will refer to the vanishing and exploding gradient problems that we have just explained as “classical”, because they are the ones that always take place in any deep-learning classification or regression problem. However, by the very nature of our problem, being that the loss function contains derivatives of the artificial neural network in $x$, we incur a second type of vanishing and exploding gradient problem which is harder to see. If the classical vanishing/exploding type implies that the derivatives of the artificial neural network $\hat u$ with respect to the parameters $w$ and $b$ decrease/increase as we consider parameters in deeper layers, in this second type, for higher order derivatives of $\hat u$ in $x$, the derivatives with respect to the same parameter of the same layer also vanish or explode as we consider higher derivatives in $x$. For our problem (only in $w$, but equivalently in $b$):

$$\text{“Second type” Vanishing Gradient:}\quad \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[\ell]\,n_{\ell-1}}_{m_\ell}}\Big| \gg \Big|\frac{\partial^2 \hat u_{n_3}}{\partial w^{[\ell]\,n_{\ell-1}}_{m_\ell}\,\partial x_{m_0}}\Big| \gg \dots$$
$$\text{“Second type” Exploding Gradient:}\quad \Big|\frac{\partial \hat u_{n_3}}{\partial w^{[\ell]\,n_{\ell-1}}_{m_\ell}}\Big| \ll \Big|\frac{\partial^2 \hat u_{n_3}}{\partial w^{[\ell]\,n_{\ell-1}}_{m_\ell}\,\partial x_{m_0}}\Big| \ll \dots \tag{2.21}$$

This second type of gradient problem is a bit less obvious than the classic one. To be able to give an intuition of the problem, we will consider the case in which all the activation functions are exponential, $a^{[\ell]}_{n_\ell}(x) = e^x$. In such a case the derivatives of the activation function are simply:

$$\begin{aligned}
D^{n_\ell}\big(a^{[\ell]}_{n_\ell}(z^{[\ell]}_{n_\ell})\big) &= a^{[\ell]}_{n_\ell}\big(z^{[\ell]}_{n_\ell}\big)\cdot \mathbb{1}^{n_\ell}, \\
D^{n_\ell,n_\ell}\big(a^{[\ell]}_{n_\ell}(z^{[\ell]}_{n_\ell})\big) &= a^{[\ell]}_{n_\ell}\big(z^{[\ell]}_{n_\ell}\big)\cdot \mathbb{1}^{n_\ell,n_\ell},
\end{aligned} \tag{2.22}$$
where $\mathbb{1}^{n_\ell}$ is the tensor whose every component is 1, and $\mathbb{1}^{n_\ell,n_\ell} = \mathbb{1}^{n_\ell}\cdot\mathbb{1}^{n_\ell}$. Then, for an exponential activation function, the derivatives (2.17) and (2.19) of the weights $w^{[1]}$ in the first layer become:

$$\frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}} = w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\big(z^{[2]}_{n_2}\big)\cdot \mathbb{1}^{n_2}\cdot w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}, \tag{2.23}$$
$$\begin{aligned}
\frac{\partial^2 \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}\,\partial x_{m_0}} &= w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\big(z^{[2]}_{n_2}\big)\cdot \mathbb{1}^{n_2,n_2} \\
&\quad\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}\Big)\cdot \Big(w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot w^{[1]\,m_0}_{n_1}\Big) \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\big(z^{[2]}_{n_2}\big)\cdot \mathbb{1}^{n_2}\cdot w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1,n_1}\cdot \delta^{m_1}_{n_1}\cdot y^{[0]}_{n_0}\cdot w^{[1]\,m_0}_{n_1} \\
&\quad + w^{[3]\,n_2}_{n_3}\cdot a^{[2]}_{n_2}\big(z^{[2]}_{n_2}\big)\cdot \mathbb{1}^{n_2}\cdot w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot \delta^{m_0}_{n_0}\cdot \delta^{m_1}_{n_1}.
\end{aligned} \tag{2.24}$$
Since the derivative of the exponential is essentially itself, and the activation function tensor is made of copies of exponentials for which differentiating means multiplying by $\mathbb{1}^{n_\ell}$ tensors, both expressions (2.23) and (2.24) are written in the same terms, and hence are easy to compare. As the tensor operations (sums and products) are commutative, and given a tensor $T^{i,j}_k$, which let us say has non-zero components for simplicity, we can define a “pseudo-inverse” to the contraction, $(T^{i,j}_k)^{-1} = [T^{-1}]^{k}_{i,j}$ (element-wise), such that it holds $T^{i,j}_k \cdot [T^{-1}]^{k}_{i,j} = i\cdot j\cdot k$ (if there were zero values, we would have to fix an inverse element for the zero components and discount the zeroes from the count $i\cdot j\cdot k$; here we will assume there are no zeroes for clarity); then we can easily replace (2.23) in (2.24), yielding:

$$\begin{aligned}
\frac{\partial^2 \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}\,\partial x_{m_0}} &= \frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\cdot \Big(\mathbb{1}^{n_2}\cdot w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot w^{[1]\,m_0}_{n_1}\Big) \\
&\quad + \frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\cdot \mathbb{1}^{n_1}\cdot w^{[1]\,m_0}_{n_1} + \frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}\cdot \delta^{m_0}_{n_0}\cdot (n_0)^{-1}\cdot \big(y^{[0]}_{n_0}\big)^{-1},
\end{aligned} \tag{2.25}$$
and grouping,
$$\frac{\partial^2 \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}\,\partial x_{m_0}} = \Big(\mathbb{1}^{n_2}\cdot w^{[2]\,n_1}_{n_2}\cdot a^{[1]}_{n_1}\big(z^{[1]}_{n_1}\big)\cdot \mathbb{1}^{n_1}\cdot w^{[1]\,m_0}_{n_1} + \mathbb{1}^{n_1}\cdot w^{[1]\,m_0}_{n_1} + \delta^{m_0}_{n_0}\cdot (n_0)^{-1}\cdot \big(y^{[0]}_{n_0}\big)^{-1}\Big)\cdot \frac{\partial \hat u_{n_3}}{\partial w^{[1]\,n_0}_{m_1}}. \tag{2.26}$$

From (2.26) we see that, in the case of exponential activation functions, we can write the derivative with respect to a given weight and the input in terms of a lower order derivative with respect to the input (this can be done for any order in the input). Now, regarding the factor in parentheses in (2.26), assuming that the input is normalized and we do not shuffle the data: the third term behaves as a very large constant ($\gg 1$) which increases with the input dimension; when the weights are small, the second term is the largest, and thus, since the weights are small, the whole factor is small ($\ll 1$); and when the weights are large, it is the first term that dominates, making the whole factor very large ($\gg 1$). This creates the vanishing/exploding gradient effect described previously in (2.21). This same analysis can also be carried out with respect to any other weight or bias parameter with a few changes, but in the case of other activation functions, although the end conclusion is the same and can be visualized empirically, the analytic study becomes much harder.

This issue is important when considering operators that contain derivatives of different orders or products of derivatives, and it is always ignored. For example, the Laplacian operator, which contains only sums of second order derivatives, does not have this problem, as the effect of no derivative vastly dominates that of another when differentiating with respect to the parameters; Burgers' operator, however, does present it.

2.4 Optimizers

Recall that in section 2.2 we transformed the problem of approximating the solution of an initial/boundary value problem into a non-convex optimization problem, whereby we had to find the minimum (or a sufficiently small value) of a loss function $L(w,b)$, defined by (2.1), (2.4) and (2.5). In that section we anticipated that, in order to do that, we would use a gradient-based optimization technique (optimizer), which requires the computation of the loss derivatives with respect to the parameters, i.e. the gradient $\nabla_{(w,b)}L(w,b)$. This led to section 2.3, where we explained the back-propagation algorithm used to compute such derivatives, and discussed their potential problems, namely the vanishing/exploding gradients. Now everything is set up and it is finally time to get into the process of actually making the parameters of the artificial neural network approximate the solution (optimizing the loss function), which in deep-learning jargon is known as the training process.

The following discussion will be devoted to explaining the design of some of the most important gradient-based optimizers used in deep-learning, which are the ones we will be using in this work. These optimizers are the so-called methods of steepest descent, or gradient descent methods, a family of methods used to solve general non-linear (convex or non-convex) unrestricted optimization problems. Intuitively, the idea behind these methods relies on thinking of the loss function as a hyper-surface $L(w,b): \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, where $w \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. Then, starting at some initial point $(w_0, b_0)$, the method goes on to calculate new points which should reduce the loss function value, by moving at a certain rate, $\eta$, named the learning rate, in the direction given by $\nabla_{(w,b)}L(w,b)$. The typical analogy for this idea is thinking of it as having a ball (initial point) and letting it roll downhill along the slope (the direction of the gradient) until it reaches the bottom.

As simple as these methods look conceptually, in practice it is not that easy to reach the minimum. If we were to apply one of these methods to a linear or quadratic bowl loss function ($(ax+b)^2$, $a > 0$), we would be guaranteed that the gradient at any point always points in the direction of the only existing minimum; thus, given adequate learning rates, these methods would have perfect convergence. However, with almost every other loss function, the direction of steepest descent (gradient) does not necessarily point to the global minimum. Moreover, if the problem is non-convex, as all the ones we will be considering in this work are, we are almost guaranteed that there are many local minima, and the direction of steepest descent may lead the method to a local minimum and not the global one.

Another tricky issue for these methods is the presence of saddle or “saddle-like” regions of the loss function. These are regions in which the gradient has very small derivatives in certain directions, and very large ones in others. Visualizing these regions in the loss function, they resemble, and thus are often called, “valleys”. What happens in these areas is that, in the ball analogy, the ball starts oscillating up and down along the valley's walls (directions with large derivatives) but is unable to make any progress across the valley (directions with small derivatives). When using these steepest descent methods, this “saddle-like” region effect, as well as the effect of not being able to escape a local minimum, is often reflected in the method when the point and the loss function start oscillating between the same two very similar values.

These are the main three problems with steepest descent: the gradient not pointing in the direction of the global minimum; getting trapped in a local minimum; and stagnating when passing through “saddle-like” regions of the loss function. In order to avoid or mitigate these issues as much as possible, there are also three measures that can be applied: choosing a good initialization (starting point); applying some regularization technique, which somewhat has the effect of smoothing the loss function; and adjusting the learning rate “properly” at each step. In the next sections we will look at the initialization (which is tightly related to the selection of the activation function) and the regularization techniques. For the rest of this section we will see different designs of steepest descent methods, which adjust the learning rates at every step based on different ideas. We will divide these designs into first order, if they require only the gradient, and second order, if they also require estimates of the curvature.

For an extensive qualitative survey on gradient-based methods, [28] has good coverage; in particular, in its Table I and Table II there is a very complete comparison among first and higher order methods respectively. Other, non gradient-based, methods are quite rare; for instance, in [29] a bio-inspired approach is used: a population of artificial neural networks is generated using different weights and architectures (hyper-parameters); the networks get tested and ranked by complexity and performance; then, a new population is generated based on the best performing networks with small alterations; and the process gets repeated.

2.4.1 First Order Methods

As we have already explained, these methods only depend on the gradient. The idea behind there being so many variations is to have the method correct its learning rate by keeping some kind of memory of the gradients at previous points (steps) to improve convergence [30]. Next, we will discuss these methods grouping them in the following categories, from least to most refined:
– Vanilla (No Learning Rate Correction)
– Momentum Learning Rate Correction
– Component Learning Rate Adaptation
– Momentum + Component Learning Rate Adaptation

Vanilla (No Learning Rate Correction)

This group is the simplest and easiest to implement. It is actually the plain idea we have just explained; thus, at every new step $t+1$ we update the previous point with the formula:

$$(w_{t+1}, b_{t+1}) \longrightarrow (w_t, b_t) - \eta\,\nabla_{(w,b)}L(w_t, b_t). \tag{2.27}$$
Generally, if the batch of input data (here, the random collocation of points in $\Omega$) is large, computing $\nabla_{(w,b)}L(w_t, b_t)$ at every step can be computationally very expensive. Recall that the factors in the loss function are of the form $\frac{1}{N}\sum_{i=1}^{N}(...)$, which means that if $N$ is large, at every step we have to compute a large sum of sub-gradients, $\frac{1}{N}\sum_{i=1}^{N}\nabla_{(w,b)}(...)$, which can be costly. The fix is to take what is known as a stochastic or on-line approach, that is, to divide the input data into partitions, $\{x_i\}_{i\in M_1}, \dots, \{x_i\}_{i\in M_k}$, with $M_1 + \dots + M_k = N$, and compute the gradient at every step just for the data of one of those partitions. If the input data is well randomized into the partitions, and the partitions are rotated consistently at every step, the differences in the gradient from not using the whole batch should be evened out throughout the many steps. When partitions of more than one data input are used the method is called mini-batch gradient descent, when the partitions contain a single data input the method is called stochastic gradient descent (SGD), and when the full batch is used the method is simply called gradient descent (GD). Oftentimes, no distinction is made between mini-batch and stochastic, and both get referred to as stochastic gradient descent.
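As an illustration, a minimal full-batch gradient descent step (2.27) could look as follows in TensorFlow; `variables` and the closure `loss_fn` are illustrative placeholders for the network parameters $(w,b)$ and the loss (2.1):

```python
import tensorflow as tf

def gd_step(variables, loss_fn, eta=0.01):
    """One full-batch gradient descent step, eq. (2.27)."""
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, variables)
    for v, g in zip(variables, grads):
        v.assign_sub(eta * g)   # (w,b) <- (w,b) - eta * gradient
    return loss
```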

This stochastic approach can be applied to all the variations that we will be seeing next. In this work, however, we will not be sampling very large input data batches and the artificial neural networks will not be very large either, so we will always take full-batch approaches.

In terms of usage, one would think that this version, being the least refined, would also be the least used, but that is far from the truth. It is true that the learning rate, $\eta$, has to be manually adapted at every step, which requires much trial-and-error experimentation. This is done by setting a learning schedule, which is the set of instructions on how to vary the learning rate (for example, one could be as follows: for the first 1000 steps use $\eta = 0.001$, then every 1000 steps reduce $\eta$ by a factor of 10). Nonetheless, in recent times, and especially for artificial neural networks with large amounts of parameters (something that happens in very deep perceptron neural networks, or in convolutional networks by design), there have been many papers claiming that plain SGD (or at most the momentum variant we will be seeing next) can outclass any other variation that we will see here. The strategy in these papers is to use an unusually large learning rate, which means moving too far in the direction of the gradient and straying from the optimal path of minimal loss value, in order to create an annealing effect [31]. This annealing effect is a direct parallel of its homonym in metallurgy. Using these very “long jumps” allows for greater mobility of the current point of the method, giving it the capacity to get over “walls” and explore the loss hyper-surface to get into a better region, before switching to the regular small learning rate strategy used to achieve convergence. This works the same way as heating a metal to allow for greater mobility of its molecules, and then letting them settle by cooling the metal. Adding noise to the gradient has long been an extremely successful regularization technique in deep-learning problems, following the same annealing principle of adding some exploration component. However, this concept takes it further, the objective being to achieve superconvergence, which happens when the method enters a very good region where it experiences a drastic drop in its loss value, and convergence can be obtained many orders of magnitude faster than with a standard approach. In [32] successive cycles of short and long learning rates are used to obtain superconvergence, and [33] develops an adapted version called SGD with Entropy following these same ideas.

As we will see, the idea behind the next variations is to speed up the method by auto-adjusting the initial learning rate at every step. This implies less tweaking of the learning rates, as the method will natively reduce it when the loss is worsening, to stay on the right track, and increase it when the loss is improving, to go faster. This also makes these variations incompatible with superconvergence, as at the first step where the loss worsens, the method will immediately damp the learning rate.

Momentum Learning Rate Correction

Adding momentum to correct the learning rate in GD is very old and one of the first improvements on GD, the idea being based on keeping the inertia of previous steps. In the ball analogy, if a ball located at a certain point was carrying some velocity in some direction, at that point it would not stop cold and resume its movement following the steepest descent: it would combine its previous inertia with the movement defined by the slope it is on. This is the idea behind classical momentum (CM), where the effective change $v_{t+1}$ at the step $t+1$ is not only given by the gradient at that point, but also by a certain proportion $\mu$ of the effective change of the previous step, $v_t$:

$$\begin{aligned}
v_{t+1} &\longrightarrow \mu v_t - \eta\,\nabla_{(w,b)}L(w_t, b_t), \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) + v_{t+1}.
\end{aligned} \tag{2.28}$$
Intuitively, if we are moving in a very consistent direction through the loss hyper-surface, the inertial term from the previous step $v_t$ adds to the gradient, making larger jumps in that direction. Conversely, if the direction suddenly changes, $v_t$ dampens the jump, as we might have overstepped into a bad area by taking too large a jump in the previous step. A second variation of this idea is Nesterov's accelerated gradient (NAG), which tends to yield better results than CM. The difference is that in NAG we look at the gradient not at the point we are in, but at a point projected ahead, as if we had done a second jump in the previous step:
$$\begin{aligned}
v_{t+1} &\longrightarrow \mu v_t - \eta\,\nabla_{(w,b)}L\big((w_t, b_t) + \mu v_t\big), \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) + v_{t+1}.
\end{aligned} \tag{2.29}$$
This subsection is based on the article [34].
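A sketch of both updates, with the parameters flattened into a single NumPy vector `theta` and `grad_fn` standing for $\nabla_{(w,b)}L$ (both illustrative names):

```python
import numpy as np

def cm_step(theta, v, grad_fn, eta=0.01, mu=0.9):
    """Classical momentum, eq. (2.28)."""
    v = mu * v - eta * grad_fn(theta)
    return theta + v, v

def nag_step(theta, v, grad_fn, eta=0.01, mu=0.9):
    """Nesterov's accelerated gradient, eq. (2.29): look-ahead gradient."""
    v = mu * v - eta * grad_fn(theta + mu * v)
    return theta + v, v
```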

Component Learning Rate Adaptation

While momentum uses the information of the previous gradient to speed up or slow down the method when there is consistent or changing behaviour, it does little to move forward in saddle regions. Recall that this kind of region occurs when some derivatives are several orders of magnitude larger than others, i.e. some components of the gradient are much larger than others, which can be caused by vanishing or exploding gradient problems. In these cases we cannot increase the global learning rate to take longer jumps in the flatter directions, because this would also make the jumps longer in the steeper directions, which require shorter steps so as not to stray from the convergence path. Momentum cannot help either, as it only adds up on the previous gradient, which is still small in the flatter directions. The solution is to rescale the learning rate for each component of the gradient individually, based on previous gradients. So, if there have been directions with consistently small derivatives, we want to take larger jumps just in those directions; and conversely, for directions with consistently large derivatives, we want to make smaller jumps so as not to overstep out of the convergence path.

The first method that we are going to review is the Adaptive Gradient Algorithm (AdaGrad). In its original paper [35], the method is presented as follows:

$$\begin{aligned}
G_t &= \sum_{\tau=1}^{t} \big(\nabla_{(w,b)}L(w_\tau, b_\tau)\big)\cdot\big(\nabla_{(w,b)}L(w_\tau, b_\tau)\big)^\intercal \in \mathbb{R}^{(n+m)\times(n+m)}, \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) - \eta\,\big(diag(G_t) + \varepsilon\, Id\big)^{-1/2}\,\nabla_{(w,b)}L(w_t, b_t),
\end{aligned} \tag{2.30}$$
where $G_t$ is the cumulative matrix of products of the past gradients, $diag(G_t)$ is the diagonal of such matrix, $Id$ corresponds to the identity matrix, and $\varepsilon$ is a small constant to avoid dividing by zero. As the matrix $G_t$ can be computed accumulatively and only its diagonal elements are used, we suggest rewriting the method in the following vectorized way:

$$\begin{aligned}
\mathcal{G}_t &\longrightarrow \mathcal{G}_{t-1} + \nabla_{(w,b)}L(w_t, b_t) \odot \nabla_{(w,b)}L(w_t, b_t) \in \mathbb{R}^{n+m}, \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) - \eta\,\big(\mathcal{G}_t + \varepsilon\,\mathbb{1}\big)^{-1/2} \odot \nabla_{(w,b)}L(w_t, b_t),
\end{aligned} \tag{2.31}$$
where the products, inverses and roots are all element-wise, and $\mathbb{1}$ is the vector consisting of all ones. Observe that if some direction of the gradient has consistently small derivatives, the cumulative value of $\mathcal{G}_t$ will be small, and thus dividing by the square root of that value will increase the learning rate of that direction (the inverse happens for components with large derivatives). This is sort of approximating the curvature in the principal directions by the values of the past gradients. However, this cumulative nature is this method's main problem: as we are constantly accumulating positive values, $\mathcal{G}_t$ becomes increasingly large at each step, and since we are constantly dividing the learning rate by it, the method halts its progress and is unable to escape local minima as time passes.
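The vectorized update (2.31) is only a few lines of NumPy; as before, `theta` and `grad_fn` are illustrative placeholders:

```python
import numpy as np

def adagrad_step(theta, G, grad_fn, eta=0.01, eps=1e-8):
    """Vectorized AdaGrad, eq. (2.31): per-component learning rates."""
    g = grad_fn(theta)
    G = G + g * g                               # accumulate squared gradients
    theta = theta - eta * g / np.sqrt(G + eps)  # element-wise rescaled step
    return theta, G
```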

An improvement of AdaGrad came with AdaDelta [36], which mitigates the effect of AdaGrad's strong decay in learning rates. Instead of using the accumulated information of all the squared previous gradients, it uses an exponentially decaying moving average of the squared values of the gradient. That is, instead of $\mathcal{G}_t$, it uses $E[(\nabla_{(w,b)}L(w_t, b_t))^2]$:

$$E[(\nabla_{(w,b)}L(w_t, b_t))^2] = \rho\, E[(\nabla_{(w,b)}L(w_{t-1}, b_{t-1}))^2] + (1-\rho)\, \nabla_{(w,b)}L(w_t, b_t) \odot \nabla_{(w,b)}L(w_t, b_t), \tag{2.32}$$
where $\rho$ is the decay rate of the moving average. On top of this change, the method also adds a second idea. Since dividing by the square root of (2.32) is a rather crude approximation of dividing by the local curvature, in an attempt to resemble a second order method, a term approximating the slope is multiplied in. This term is a moving average of the squares of the previous increments, $E[(\Delta(w_t, b_t))^2]$, which uses the same decay rate as before:

$$E[(\Delta(w_t, b_t))^2] = \rho\, E[(\Delta(w_{t-1}, b_{t-1}))^2] + (1-\rho)\, \Delta(w_t, b_t) \odot \Delta(w_t, b_t), \tag{2.33}$$
then the final algorithm at each step $t$ works as:

$$\begin{aligned}
&\text{Compute (2.32),} \\
&\Delta(w_t, b_t) \longrightarrow -\,\frac{\sqrt{E[(\Delta(w_{t-1}, b_{t-1}))^2] + \varepsilon\,\mathbb{1}}}{\sqrt{E[(\nabla_{(w,b)}L(w_t, b_t))^2] + \varepsilon\,\mathbb{1}}} \odot \nabla_{(w,b)}L(w_t, b_t), \\
&\text{Compute (2.33),} \\
&(w_{t+1}, b_{t+1}) \longrightarrow (w_t, b_t) + \Delta(w_t, b_t).
\end{aligned} \tag{2.34}$$

As a general note on the equations posed for this method, all the operations in (2.32-2.34) are element-wise. Finally, observe that the increment $\Delta(w_t, b_t)$ in (2.34) has in its numerator a term that approximates the slope and in its denominator a term that approximates the curvature, which tries to replicate the structure $H(f)^{-1}\cdot\nabla f$ of a Newton method.

A popular and much simpler method than AdaDelta, developed in parallel, that also solves the fast damping of AdaGrad is RMSProp. This is an unpublished method proposed by Geoffrey Hinton in a Coursera course, in lecture 6.5 [37]. This method was conceived as an adaptation of RProp, a method originally designed for full batches, to be able to account for mini-batches. The RProp method does not take into account the magnitude of the derivatives in the gradient; instead, it only takes into consideration the sign of the derivatives. Each direction's learning rate is increased slightly every time the sign of its corresponding derivative is preserved, and drastically decreased whenever the sign of the derivative changes, everything within a certain threshold. When working with mini-batches this method can have many problems, as some sub-gradient may change the sign of some derivative due to the characteristics of that specific mini-batch, and not because the method has entered a region with a different behaviour. For instance, if the last 9 out of 10 derivatives in a direction have been positive and only one has been negative, we do not want to drastically reduce its learning rate. To add this resilience, RMSProp uses the following moving average:

$$\begin{aligned}
E[(\nabla_{(w,b)}L(w_t, b_t))^2] &= 0.9\, E[(\nabla_{(w,b)}L(w_{t-1}, b_{t-1}))^2] + 0.1\, \nabla_{(w,b)}L(w_t, b_t) \odot \nabla_{(w,b)}L(w_t, b_t), \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) - \eta\, \big(E[(\nabla_{(w,b)}L(w_t, b_t))^2] + \epsilon\,\mathbb{1}\big)^{-1/2} \odot \nabla_{(w,b)}L(w_t, b_t),
\end{aligned} \tag{2.35}$$
where again all the operations are element-wise. Note that the directional adaptation of the gradient at the current step $t$ is introduced with a factor of 0.1. This gives robustness to the method, as it requires persistence in the change of a sign through several steps to change the behaviour of the method. RMSProp also works better than RProp with full batches, due to this robustness.
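In code, (2.35) differs from AdaGrad only in replacing the accumulator by a fixed-decay moving average (again with illustrative names):

```python
import numpy as np

def rmsprop_step(theta, Eg2, grad_fn, eta=0.001, eps=1e-8):
    """RMSProp, eq. (2.35): moving average of squared gradients, decay 0.9."""
    g = grad_fn(theta)
    Eg2 = 0.9 * Eg2 + 0.1 * g * g
    theta = theta - eta * g / np.sqrt(Eg2 + eps)
    return theta, Eg2
```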

Momentum + Component Learning Rate Adaptation

This last type of method combines the ideas of momentum and component learning rate adaptation. Recall that momentum introduced information about the slope by preserving some of the gradient of the last step, and component adaptation rescaled the components of the gradient in each direction by dividing by the square root of the squares of the gradients, which is some kind of approximation of the curvature in the principal directions, corresponding to the elements in the diagonal of the Hessian, and tells us about the variation of the slope and the directions in which we can go faster. Combining slope and curvature to get some sort of first order Newton method had already been done in AdaDelta; however, as its information about the slope came from previous increments (already corrected gradients) and not strictly from previous gradients (the definition of momentum), we refrained from including it in this section.

Setting aside AdaDelta (2012), the first method that embraced this approach is mainly Adam [38] (2014). Adam at the present time (2020) is one of the best-performing first order methods, and it has become the almost de facto optimizer in deep-learning applications. It combines the versatility of pure classical momentum (to escape local minima) and component learning rate adaptation (to escape saddle-points), and it is quite fast.

The method design is as follows:

$$\begin{aligned}
m_t &\longrightarrow \frac{1}{1-(\beta_1)^t}\,\big(\beta_1\, m_{t-1} + (1-\beta_1)\, \nabla_{(w,b)}L(w_t, b_t)\big), \\
v_t &\longrightarrow \frac{1}{1-(\beta_2)^t}\,\big(\beta_2\, v_{t-1} + (1-\beta_2)\, \nabla_{(w,b)}L(w_t, b_t) \odot \nabla_{(w,b)}L(w_t, b_t)\big), \\
(w_{t+1}, b_{t+1}) &\longrightarrow (w_t, b_t) - \eta\,\frac{m_t}{\sqrt{v_t} + \varepsilon},
\end{aligned} \tag{2.36}$$

where all operations (sums, products, roots and inverses) are element wise, 푚푡 is called the first order momentum in 푡 and 훽1 is its decay rate, and 푣푡 is called the second order momentum in 푡 and 훽2 is its decay rate.

Observe that inside the big parenthesis of $m_t$ we have a decaying average of the gradient, which resembles the classical momentum as defined in (2.28), and the big parenthesis of $v_t$ is exactly the same as the decaying-average principal-direction curvature approximation of AdaDelta (2.32). Each of the momenta is given an exponential rescale factor in the form of the term $1/(1-(\beta)^t)$, which tends to 1 as $t$ increases. Hence, since typically $0 < \beta_1 < \beta_2 < 1$, in the beginning $m_t$ adapts faster than $v_t$, giving some sort of annealing effect by prioritizing momentum over curvature in the first steps. Finally, the steps are updated as in AdaDelta, following a Newton-like approach.
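A sketch of (2.36) in NumPy, written in the more common formulation where the $1/(1-\beta^t)$ bias corrections are applied as a separate step; names and default values are illustrative:

```python
import numpy as np

def adam_step(theta, m, v, t, grad_fn, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam, eq. (2.36), with the bias corrections applied separately."""
    g = grad_fn(theta)
    m = b1 * m + (1 - b1) * g              # first order momentum
    v = b2 * v + (1 - b2) * g * g          # second order momentum
    m_hat = m / (1 - b1 ** t)              # the 1/(1 - beta^t) rescale factors
    v_hat = v / (1 - b2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```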

Other notable variations of Adam are: AMSGrad [39], which tries to fix a convergence problem of Adam in some instances (however, it is argued that the very specific instances that AMSGrad fixes do not really occur in real problems, and thus it is sometimes regarded as a more complex and noisier version of Adam); Nadam [40], which uses Nesterov's momentum instead of classical momentum (from which the N in its name comes); and AdamW, which tries to incorporate a Tikhonov regularization (which we will see in the Regularization section) inside the optimizer, instead of adding it to the loss function. As an additional comment, very recently a new type of more sophisticated first order method has appeared which does not even require specifying a learning rate, yielding apparently better results than Adam and its variations; one such method is YellowFin [41].

2.4.2 Second Order Methods

The previous first order methods yield good results in relatively small artificial neural networks (a few layers deep). They are not very computationally intensive and have linear convergence (which is oftentimes enough), all at the expense of fixing a hyper-parameter, namely the learning rate. In particular, AdaDelta and Adam have proven to work really well against vanishing/exploding gradient and sparse gradient problems. Sparse gradients are a “kind” of vanishing gradients which happens when the dataset is sparse, i.e. there are rare features that occur in very few data points. Hence, if we recall that, given the loss function form, the gradient is actually a sum of sub-gradients, $\frac{1}{N}\sum_{i=1}^{N}\nabla_{(w,b)}(...)$, each associated to an individual data point, then the contributions to the gradient that fit these rare features are small in comparison to more common features, as there are fewer points and sub-gradients that can add to the sum. In that case it is said that there is a weak signal for that feature, and in practice this means that the parameters associated with that feature have smaller derivatives, creating saddle regions just as the vanishing gradient problem does.

Nevertheless, as well as these first order methods work in many small problems with a somewhat homogeneous dataset, there are two related motives, occurring in more complex problems, that may require the consideration of higher order methods:
– The first motive is computational cost: as we consider larger artificial neural networks, the number of parameters scales up, and the smaller number of steps required by the quadratic (or almost quadratic) convergence of second order methods starts to become a computational advantage over the simpler but more numerous steps required by the linear convergence of first order methods.
– The second motive, very related to the first, is the high slope variation: as the number of parameters increases or the input dataset becomes more noisy, the loss function hyper-surface starts becoming more “bumpy”, meaning that, when using first order methods, the strides in the direction of the gradient have to be shorter to account for this variation, i.e. the learning rate has to be reduced. Recall that AdaDelta and Adam corrected the learning rate based on some sort of approximation of the diagonal of the Hessian. Therefore, when the Hessian grows (which happens when the number of parameters increases), the aggregated effect of the elements outside the diagonal becomes more relevant, and the methods lose part of their effectiveness.

Out of all the second order methods, the classic optimization Newton method is the principal one. This method relies on the Taylor expansion up to second order to approximate the function to be optimized by a quadratic function, in a local neighbourhood or confidence region of a point $(w_0, b_0)$. In our case, the loss function can be approximated by:
$$L\big((w_0, b_0) + p\big) \approx L(w_0, b_0) + \nabla_{(w,b)}L(w_0, b_0)^\intercal\, p + \frac{1}{2}\, p^\intercal\, H(w_0, b_0)\, p, \tag{2.37}$$
where $p$ is an increment within the confidence region and $H(w_0, b_0)$ is the Hessian matrix at $(w_0, b_0)$. Then, as (2.37) is a quadratic function of $p$, it should have a unique minimum $p_0$; thus, differentiating the expression (2.37) with respect to $p \in \mathbb{R}^{n+m}$, the minimum $p_0$ must satisfy:
$$\begin{aligned}
\frac{\partial}{\partial p}\big(L((w_0, b_0) + p)\big) &\approx \frac{\partial}{\partial p}\Big(L(w_0, b_0) + \nabla_{(w,b)}L(w_0, b_0)^\intercal\, p + \frac{1}{2}\, p^\intercal\, H(w_0, b_0)\, p\Big), \\
\frac{\partial}{\partial p}\big(L((w_0, b_0) + p)\big) &\approx \nabla_{(w,b)}L(w_0, b_0) + H(w_0, b_0)\, p, \\
0 = \frac{\partial}{\partial p}\big(L((w_0, b_0) + p_0)\big) &\approx \nabla_{(w,b)}L(w_0, b_0) + H(w_0, b_0)\, p_0, \\
p_0 &\approx -H^{-1}(w_0, b_0)\, \nabla_{(w,b)}L(w_0, b_0).
\end{aligned} \tag{2.38}$$

Therefore the optimization Newton method, being at (푤푡, 푏푡) in step 푡, computes a new step point by approximating the original function in a confidence region around (푤푡, 푏푡) by a quadratic function using Taylor’s theorem, then finds the increment 푝푡 that minimizes that quadratic approximation of the original function, and moves using that increment. In summary, the optimization Newton method update rule is:

$$\begin{aligned}
p_t &\to -H^{-1}(w_t, b_t)\, \nabla_{(w,b)}L(w_t, b_t), \\
(w_{t+1}, b_{t+1}) &\to (w_t, b_t) + p_t.
\end{aligned} \tag{2.39}$$

The increment $p_t$ is known as the search direction in these types of methods.

Note that no hyper-parameters are required, and second order convergence is guaranteed by Taylor's theorem. As a major drawback, the method involves computing the Hessian matrix and inverting it, which is an extremely impractical and computationally expensive task, even if the number of parameters is just moderately large. Thus, there is a series of methods that modify this optimization Newton method to use approximations instead of the whole inverse of the Hessian, while preserving many of the good properties of the original. As a trade-off for their decrease in computational complexity, these methods lose their quadratic convergence, but they still get a much better than linear convergence, usually referred to as super-linear convergence, which outclasses any first order method's convergence.

Quasi-Newton Method

In the Quasi-Newton family each step update uses the same idea as in the Newton method, with a small variation. Instead of computing and using the Hessian matrix $H(w_t, b_t)$, we use an approximation matrix $B_t$ which we have to update at every step. This means that, in essence, all the reasoning and derivation for the update rule are completely analogous to those of (2.37-2.38), with the only difference being writing $B_t$ instead of $H(w_t, b_t)$. The final update rule will in fact have the same blueprint as Newton's,

$$\begin{aligned}
p_t &\to -B_t^{-1}\, \nabla_{(w,b)}L(w_t, b_t), \\
(w_{t+1}, b_{t+1}) &\to (w_t, b_t) + p_t,
\end{aligned} \tag{2.40}$$
with a few additions (two in particular). The first is that, at every step, $H(w_t, b_t)$ is being replaced by $B_t$, which is a matrix that changes with the curvature, but does not necessarily have to be an exact approximation of the Hessian matrix (for instance, it could be a scaled down or displaced version). This means that we can trust the search direction $p_t$ for its direction but not for its magnitude; thus, we will require a learning rate $\alpha_t$ to scale $p_t$ at every step. There are two ways to compute $\alpha_t$ at each step, namely inexact line searches and trust regions. Coincidentally, the Quasi-Newton methods that we will be seeing next use inexact line searches, and the truncated Newton methods of the next subsection use trust regions. In particular, the inexact line search that we will use is the satisfaction of the Wolfe conditions, which are given by the following set of inequalities:

$$\begin{aligned}
L\big((w_t, b_t) + \alpha_t p_t\big) &\leq L(w_t, b_t) + c_1 \alpha_t\, \nabla L(w_t, b_t)^\intercal p_t, \\
\nabla L\big((w_t, b_t) + \alpha_t p_t\big)^\intercal p_t &\geq c_2\, \nabla L(w_t, b_t)^\intercal p_t,
\end{aligned} \tag{2.41}$$
with $0 < c_1 < c_2 < 1$. Using the Wolfe conditions, $\alpha_t$ is progressively decreased until the inequalities (2.41) are satisfied. This guarantees that the learning rate holds the sufficient decrease and curvature conditions.

The second issue is how to compute the matrices $B_t$ at every step. In principle, two general requirements are demanded that help calculate the matrix: it has to be symmetric, like the Hessian, and it must satisfy the secant equation (or Quasi-Newton equation). This secant equation can be obtained by differentiating the quadratic approximation (2.37, with $B_t$) in terms of the increment variable $p$, for a given step $t$:

$$\begin{aligned}
\frac{\partial}{\partial p}\big(L((w_t, b_t) + p)\big) &\approx \frac{\partial}{\partial p}\Big(L(w_t, b_t) + \nabla_{(w,b)}L(w_t, b_t)^\intercal\, p + \frac{1}{2}\, p^\intercal\, B_t\, p\Big), \\
\frac{\partial L((w_t, b_t) + p)}{\partial ((w_t, b_t) + p)}\cdot\frac{\partial ((w_t, b_t) + p)}{\partial p} &\approx \nabla_{(w,b)}L(w_t, b_t) + B_t\, p, \\
\nabla_{(w,b)}L\big((w_t, b_t) + p\big)\cdot(0 + Id) &\approx \nabla_{(w,b)}L(w_t, b_t) + B_t\, p,
\end{aligned} \tag{2.42}$$
then, if we take the value of $p$ to be the actual increment $s_t = \alpha_t p_t$, we expect $B_t \approx B_{t+1}$, which yields:

$$\begin{aligned}
\nabla_{(w,b)}L\big((w_t, b_t) + s_t\big) &\approx \nabla_{(w,b)}L(w_t, b_t) + B_{t+1}\, s_t, \\
\nabla_{(w,b)}L(w_{t+1}, b_{t+1}) - \nabla_{(w,b)}L(w_t, b_t) &\approx B_{t+1}\, s_t, \\
y_t &\approx B_{t+1}\, s_t,
\end{aligned} \tag{2.43}$$
where $y_t = \nabla_{(w,b)}L(w_{t+1}, b_{t+1}) - \nabla_{(w,b)}L(w_t, b_t)$. The last expression of (2.43) is what is known as the secant equation.

Therefore the general update rule of a Quasi-Newton method would be:

1. Obtain the search direction by solving $p_t = -B_t^{-1}\, \nabla_{(w,b)}L(w_t, b_t)$.

2. Use the Wolfe conditions (or any other approach) to find $\alpha_t$.

3. Compute the next gradient $\nabla_{(w,b)}L\big((w_t, b_t) + \alpha_t p_t\big)$.

4. Compute the next matrix $B_{t+1}$, imposing symmetry and satisfying the secant equation.

Every Quasi-Newton method varies in its approach to compute $\alpha_t$, and especially $B_t$, as there are many ways to construct symmetric matrices that satisfy the secant equation. Ideally, we want to use an expression for $B_t$ which is easy to compute and easy to invert. The first of these methods was the Davidon–Fletcher–Powell method (DFP), but in current times the Broyden–Fletcher–Goldfarb–Shanno method (BFGS) and its Limited Memory version (which requires less storage memory) are the most widely used methods in this family.

BFGS imposes the extra condition that $B_t$ has to be positive definite, on top of the symmetry and the satisfaction of the secant equation, and uses the Wolfe conditions. The following expression is the one that satisfies the three conditions:

$$B_{t+1} = B_t - \frac{B_t s_t s_t^\intercal B_t}{s_t^\intercal B_t s_t} + \frac{y_t y_t^\intercal}{y_t^\intercal s_t}, \tag{2.44}$$
which allows for the Sherman–Morrison formula to invert the matrix recurrently:

$$B_{t+1}^{-1} = \Big(Id - \frac{s_t y_t^\intercal}{y_t^\intercal s_t}\Big)\, B_t^{-1}\, \Big(Id - \frac{y_t s_t^\intercal}{y_t^\intercal s_t}\Big) + \frac{s_t s_t^\intercal}{y_t^\intercal s_t}. \tag{2.45}$$
This section is based on [42, 43]. [44] has a really recommendable coverage of the Wolfe conditions, trust regions and Quasi-Newton methods, far beyond this work. Also, [45] proposes a mini-batch adaptation of the L-BFGS method.
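In practice one rarely implements BFGS by hand; as a usage illustration, SciPy's minimize exposes a limited-memory variant through method='L-BFGS-B' (the quadratic toy loss below is a stand-in for the flattened network loss $L(w,b)$):

```python
import numpy as np
from scipy.optimize import minimize

A = np.diag([1.0, 10.0, 100.0])       # toy ill-conditioned quadratic bowl

def L(theta):
    return 0.5 * theta @ A @ theta

def grad_L(theta):
    return A @ theta

res = minimize(L, x0=np.ones(3), jac=grad_L, method='L-BFGS-B')
print(res.x)  # close to the minimum at the origin
```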

Truncated Newton Method

While Quasi-Newton methods such as BFGS have been adapted and used successfully in large deep-learning applications, not many other types of second order methods have historically been used. In 2010, though, Martens [46] proposed an adaptation of a type of method called the truncated Newton methods or Hessian-Free methods, which has been fairly replicated. The set-up for this method also revolves around trying to fix the main problem of the optimization Newton method. Hence, it tries to find the search direction $p_t$ by solving the system in (2.38) without having to invert the Hessian matrix.

Rearranging the last expression of (2.38), the system to find the search direction at step $t$ can be written as:
$$H(w_t, b_t)\, p_t \approx -\nabla_{(w,b)}L(w_t, b_t). \tag{2.46}$$
Here, the point of this method is that we do not have to compute the Hessian or any approximation matrix at all. Given any vector $v$, its product by the Hessian matrix can be thought of as a directional derivative of the gradient as a function. Thus, we can compute the product $H(w_t, b_t)\, v$ by evaluating an extra derivative:

$$H(w_t, b_t)\, v = \lim_{\varepsilon \to 0} \frac{\nabla_{(w,b)}L\big((w_t, b_t) + \varepsilon v\big) - \nabla_{(w,b)}L(w_t, b_t)}{\varepsilon}, \tag{2.47}$$
where the limit can actually be replaced by simply taking a small $\varepsilon$, or by any other method. Then, we can use the Conjugate Gradient (CG) algorithm to solve the linear system (2.46), combined with (2.47), to obtain the search direction $p_t$. In summary, the steps at step $t$ would be:

1. Randomly initialize a search direction $p_{t,0}$.

2. Apply the CG algorithm to solve the linear system of (2.46). Since, with regards to the Hessian (which is the matrix of the linear system), CG only requires the products $p_{t,i}^\intercal\, H(w_t, b_t)\, p_{t,i}$, use the finite differences of the gradient from (2.47) to compute this product without needing the Hessian.

3. Once the CG has converged (or is close enough) to the solution $p_t$, update the point $(w_{t+1}, b_{t+1}) = (w_t, b_t) + p_t$ and repeat the process.

This last method is the only one in this section that we have not used, as its implementation required much more technical work than the rest, since there is no package from which to adapt it. However, we still wanted it explained, as a promising second order alternative to BFGS and L-BFGS.
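A minimal sketch of the Hessian-free ingredients: the finite-difference Hessian-vector product (2.47), and a plain CG loop that uses it to solve a system of the form of (2.46); `grad_fn` is an illustrative placeholder for $\nabla_{(w,b)}L$:

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-5):
    """Hessian-vector product via finite differences of the gradient, eq. (2.47)."""
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

def cg_solve(grad_fn, theta, rhs, n_iters=50, tol=1e-10):
    """Conjugate Gradient for H p = rhs, using only Hessian-vector products."""
    p = np.zeros_like(rhs)
    r = rhs - hvp(grad_fn, theta, p)   # initial residual
    d = r.copy()
    for _ in range(n_iters):
        Hd = hvp(grad_fn, theta, d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        if r_new @ r_new < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p
```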

2.5 Activation Functions and Parameter Initialization

In this section we will introduce the most commonly used activation functions in deep-learning. We have avoided this issue up to this point because the choice of activation functions is crucial in mitigating the gradient problems and speeding up the optimization process, which we have now covered. Also, in practice, all the activation functions of an artificial neural network are chosen to be the same (except for the output layer), so from now on we will assume this is the case.

Although theoretically any function can do, as long as it is continuous and at least once piecewise differentiable, there are some properties that can prove beneficial. For example, using bounded or dissipative functions can add a component of localized training, making the output of the activation function of certain neurons mostly dominant with respect to others for some range of input values; thus, the parameters of those neurons become descriptive of the characteristics of that range of input values. Likewise, functions with derivatives close in magnitude to their primitives, or with derivatives that have a close to linear behaviour in some region, are desirable, as this helps mitigate the gradient problems and improves convergence through the initialization of the parameters (we will look at this last part in the next subsection).

The following table shows the main classic activation functions used in deep-learning (holding many of the properties stated above), in order from most used (top) to least used (bottom).

Name | Value | First Derivative
Sigmoid | $a(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$ | $a'(x) = \sigma(x)(1 - \sigma(x))$
ReLU | $a(x) = \max(x, 0)$ | $a'(x) = \Theta(x)$
Hyperbolic Tangent | $a(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $a'(x) = \frac{1}{\cosh^2(x)} = \frac{4}{(e^x + e^{-x})^2}$
Exponential | $a(x) = e^x$ | $a'(x) = e^x$

Table 2.1: List of main activation functions.

About the nomenclature in the table above: the sigmoid function is almost always represented as $\sigma(x)$, and $\Theta(x)$ stands for the Heaviside step function, which is 0 if the input is negative and 1 if the input is positive. Also, due to the specifics of this work, where the loss function contains derivatives of the artificial neural network, we would like to impose the following reasonable condition: if $n$ is the degree of the differential system linear operator (this is, the highest order derivative of the operator), then at least the first $n + 1$ derivatives of the activation function should be different from zero. This condition makes sense as we would like to have some contribution (or signal) to model the higher order derivatives and not to have them vanish. Therefore, from the previous list the ReLU (Rectified Linear Unit) function is mostly unsuitable for this work, since its second order derivative is strictly zero. We can see this effect clearly from (2.16), where we expressed the Hessian tensor (second order derivatives tensor) of our example network. If we consider the tensor arrangement of activation functions with ReLUs, then $D^{n_\ell, n_\ell}(a^{[\ell]}(z^{[\ell]}_{n_\ell})) = 0^{n_\ell, n_\ell}$ is the tensor of zeros, and since these terms appear as factors in every term of (2.16), all the second order derivatives of the artificial neural network with respect to the inputs are zero.
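This vanishing can be checked directly with automatic differentiation. The following TensorFlow sketch (variable names are illustrative) shows that the second derivative of the ReLU with respect to its input is identically zero; swapping in the sigmoid gives non-zero values.

    import tensorflow as tf

    x = tf.Variable([[-1.0], [0.5], [2.0]])
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            y = tf.nn.relu(x)            # replace with tf.math.sigmoid to compare
        dy_dx = inner.gradient(y, x)     # Theta(x): piecewise constant
    d2y_dx2 = outer.gradient(
        dy_dx, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
    print(d2y_dx2)                       # all zeros for ReLU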

Figure 2.7: Main activation functions and their first order derivatives.

Observe that all the classic activation functions are monotonically increasing and bounded in some direction: the exponential and ReLU to positive values (bounded below by 0), the sigmoid between 0 and 1, and the hyperbolic tangent between -1 and 1. In general, using activation functions with these two characteristics allows for some sort of localized pulse-like behaviour, which can be helpful in modelling traits over very specific regions. For instance, let us assume we have two neurons that use sigmoids, both sharing the same single input $x$, with weights and biases $w_1 = -3$, $w_2 = 2$, $b_1 = -2$, $b_2 = 10$. Figure 2.8 shows the result of feeding $x$ to the neurons and adding them up in the next layer. Note that the end result is some kind of continuous version of a square pulse function, which in this case differentiates between the region (-2.5, 7.5) and the rest of the real line.
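The combination is straightforward to reproduce; a small NumPy sketch of the sum plotted in Figure 2.8 below (the exact window of the plateau depends on the sign convention used for the affine map inside each sigmoid):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-10.0, 15.0, 500)
    w1, w2, b1, b2 = -3.0, 2.0, -2.0, 10.0
    # Each sigmoid contributes a smooth step; their sum forms a plateau
    # (a continuous square pulse) between the two transition points.
    pulse = sigmoid(w1 * x + b1) + sigmoid(w2 * x + b2)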

Figure 2.8: Combination of sigmoid functions.

Over time, many secondary variations of these functions have appeared, trying to fix some of their inconveniences. Two popular ones that we will consider are the Swish function [47], which also appears under the name of Sigmoid-Weighted Linear Unit [48], and the Softplus function [49]. The first offers a smooth continuous variation of the ReLU function using the sigmoid in its formula (and thus has no vanishing higher order derivatives), and the second fixes the acute growth of the exponential function.

Name | Value | First Derivative
Swish | $a(x) = x \cdot \sigma(x)$ | $a'(x) = \sigma(x) + x\,\sigma(x)(1 - \sigma(x))$
Softplus | $a(x) = \log(e^x + 1)$ | $a'(x) = \frac{e^x}{e^x + 1}$

Table 2.2: List of secondary activation functions.

Figure 2.9: Secondary activation functions and their first order derivatives.

2.5.1 Parameter Initialization

An issue which depends directly on the choice of activation function is the parameter initialization, necessary as the starting point of the iterative optimization methods we saw in the previous section. In fact, the optimization process is very sensitive to the initial point, and slight variations can lead to very different results.

Each activation function has its own set of initialization arrangements (many named after the authors that proposed them). However, all of them share the same basic principle, which is to set the parameters so that, at least for the first inputs, the activation functions behave in their linear regime. For example, if we were to choose exponentials as activation functions, $a(x) = e^x \approx 1 + x + x^2/2 + \dots$, we would like the parameters to be close to 0, so that the inputs to each function are also close to 0 (or small) and each activation function behaves like $1 + x$ in the first steps. The reason behind this idea (with a few refinements that we will cover next) is that, in practice, solving the non-convex optimization problem exactly is virtually impossible, and many times even finding a decent enough result becomes a matter of luck with trial and error (using different methods and initialization points). By starting at a point in a region where activation functions behave linearly, we guarantee that the slopes (the loss derivatives with respect to the parameters) will not vanish or explode abruptly from the start, and convergence will be smooth (at least in the beginning).

There are two extra considerations to add to the previous idea of starting in the linear regime.

– First, we have to break the symmetry of the parameters. This means that we cannot initialize all the parameters in a layer with the same values. Recall the derivatives computed in section 2.3.1. Giving the same values to the parameters in a layer implies that the derivatives with respect to the parameters of that layer are all the same; thus, at every step of the optimization the parameters also increment by the same amount, and stay equal along the whole process. To avoid this symmetry, which effectively impedes the ability of the parameters of a layer to adapt to different traits of the dataset, we add some variance by drawing the parameters from a distribution, usually a uniform or a normal. For instance, in the exponential case, instead of initializing the parameters at 0 (which is actually the only value for which the parameters cannot be trained), we would draw them from something like $\mathcal{N}(0, 0.1)$ or $\mathcal{U}(-0.1, 0.1)$.

– Second, we have to balance the variance of the weights in every neuron, so that the neuron outputs all stay within a certain magnitude. The two conditions to impose are for the linear combinations of the neurons during evaluation to have similar variance (2.48), and for each loss derivative with respect to the parameters of a neuron during back-propagation to also be similar in variance (2.49).

$$\forall \ell, \ell', n, n': \quad Var(z^{[\ell]}_n) = Var(z^{[\ell']}_{n'}), \qquad (2.48)$$
$$\forall \ell, \ell', n, n': \quad Var\left(\frac{\partial L}{\partial w^{[\ell]}_n}\right) = Var\left(\frac{\partial L}{\partial w^{[\ell']}_{n'}}\right). \qquad (2.49)$$

Assuming independence with respect to the inputs and zero mean, (2.48)-(2.49) become:

$$\forall \ell, \ell', n, n': \quad n_\ell\, Var(w^{[\ell]}_n) = n_{\ell'}\, Var(w^{[\ell']}_{n'}), \qquad (2.50)$$
$$\forall \ell, \ell', n, n': \quad n_{\ell+1}\, Var(w^{[\ell]}_n) = n_{\ell'+1}\, Var(w^{[\ell']}_{n'}). \qquad (2.51)$$

Averaging the previous conditions yields:

$$\forall \ell, n: \quad Var(w^{[\ell]}_n) = \frac{2}{n_\ell + n_{\ell+1}}, \qquad (2.52)$$

where $n_\ell$ is the number of inputs of the neuron (called the fan-in), and $n_{\ell+1}$ is the number of outputs of the neuron (called the fan-out), which in fully-connected artificial neural networks are equivalent to the number of neurons in the current and the next layer respectively. Note that in practice we are simply scaling by the size of the network. Putting these three ideas together (linear regime, variation and fan-in/out scaling) leads to the Xavier/Glorot [50] and He [51] initializations, which are the main ones we will be using throughout this work. The former uses (2.52) directly, and the latter makes a correction for linear units (activation functions which drop or almost drop their negative values, for example the ReLU) by dropping the fan-out term. These initializations, in their uniform and normal distribution versions, are therefore:

Xavier Uniform: $\mathcal{U}\left(-\sqrt{\frac{6}{n_\ell + n_{\ell+1}}},\ \sqrt{\frac{6}{n_\ell + n_{\ell+1}}}\right)$, (2.53)

Xavier Normal: $\mathcal{N}\left(0,\ \sqrt{\frac{2}{n_\ell + n_{\ell+1}}}\right)$, (2.54)

He Uniform: $\mathcal{U}\left(-\sqrt{\frac{6}{n_\ell}},\ \sqrt{\frac{6}{n_\ell}}\right)$, (2.55)

He Normal: $\mathcal{N}\left(0,\ \sqrt{\frac{2}{n_\ell}}\right)$. (2.56)

For the sigmoid and hyperbolic tangent activation functions we will use the Xavier/Glorot initialization, and for the ReLU, exponential, swish and softplus activation functions we will use the He initialization.
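As a minimal sketch of (2.53)-(2.56), assuming the weight matrix of a layer has shape (fan_out, fan_in), these initializations can be written directly in NumPy (Keras also ships them as built-in initializers, e.g. the string identifiers 'glorot_uniform', 'glorot_normal', 'he_uniform' and 'he_normal'):

    import numpy as np

    def xavier_uniform(fan_in, fan_out):
        # (2.53): symmetric uniform with limit sqrt(6 / (fan_in + fan_out)).
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, size=(fan_out, fan_in))

    def xavier_normal(fan_in, fan_out):
        # (2.54): zero-mean normal, standard deviation sqrt(2 / (fan_in + fan_out)).
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0.0, std, size=(fan_out, fan_in))

    def he_uniform(fan_in, fan_out):
        # (2.55): the fan-out term is dropped for linear units such as the ReLU.
        limit = np.sqrt(6.0 / fan_in)
        return np.random.uniform(-limit, limit, size=(fan_out, fan_in))

    def he_normal(fan_in, fan_out):
        # (2.56): zero-mean normal, standard deviation sqrt(2 / fan_in).
        return np.random.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))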

2.6 Regularization

At this point we have all the tools necessary to build an artificial neural network and to apply an optimization method to train it. However, there is still the issue of selecting and tuning the hyper-parameters of our inverse problem (recall that these are problems in which we have sample data and we want to fit the parameters of a certain model to the data), which is hard. The issue here is that, given any model with certain hyper-parameters defining it, in our case the number of layers and neurons, there are two opposite situations that may arise: either the hyper-parameters are too few and the model is too simple to fit the data (underfitting), or the hyper-parameters are too many and the model has extra degrees of freedom that do not necessarily fit the data (overfitting).

Underfitting is a problem which is relatively simple to spot: it is impossible to accurately fit the model to the data no matter how much we train it, and its only solution is to increase the number of hyper-parameters, or the complexity of the model. For overfitting, detecting and solving the problem is not as easy. To mitigate overfitting, as well as sometimes to fix the ill-posedness of a problem, regularization techniques are commonly used.

An overfitted model can be harder to train. This is because, in the optimization process, the gradient that updates the parameters contains contributions from the extra degrees of freedom of the model, which can make the optimization much noisier and susceptible to over-representing outliers. Also, overfitted models tend to generalize poorly, i.e. their ability to predict on new data decreases, as any data not strictly in the training set would not simply be the extrapolation of the relations among the training points, but would also contain the contributions of the extra degrees of freedom. Regularization techniques help reduce the number of degrees of freedom of the models.

Figure 2.10: Example of overfitting of a model.

Figure 2.10 shows an example of an overfitted model. Both the blue line and the orange line are polynomial models that fit the training data (red points). The orange model (which has more coefficients than the blue model) is overfitted, though, since, most prominently at the extreme points, it generates a bumpy behaviour completely unrelated to that of the training data, caused by the unnecessary extra parameters that we have added.

In practice, finding the exact right number of hyper-parameters required in a model is an almost impossible task, given the sheer number of possibilities. Besides, a trial and error strategy would be impractical for comparing models, as training is a computationally costly process sensitive to the optimizer and initial conditions. Effectively, since there is no general rule that can be followed, the hyper-parameters are often chosen within a reasonable rough margin (mostly based on the results of the first few steps of the optimization), guaranteeing some overfitting. Regularization techniques are then used to clamp down on the extra degrees of freedom of the model. This is a much more viable approach than training an almost exponentially increasing number of models with different numbers of hyper-parameters to narrow down the right number which neither underfits nor overfits the data.

Each of the following subsections is dedicated to a different regularization technique. We will introduce two informal categories to group these techniques based on the main general principles behind them, namely noise-based regularizations and restriction-based regularizations. In this work we will only be using restriction-based regularizations, though.

2.6.1 Noise-based Regularizations

Behind noise-based regularizations lies the idea of adding a stochastic component throughout the training process so as to introduce some (small) variance into the model. Too much variance can lead to a chaotic model (undesirable), but adding a small variance during training can be very beneficial, as it can somewhat be seen as employing the extra degrees of freedom to account for the extra variability of the model.

Figure 2.11: Example of a model adding noisy input.

Intuitively, the concept in its most trivial form can be seen in Figure 2.11, which is nothing more than Figure 2.10 with some extra noisy points added (the original points plus some noise). It now becomes apparent that, when fitting the model, the end result would be closer to the well-defined model (blue line) than to the overfitted model (orange line), since the model now also has to account for the green dots, for which the blue line has a smaller error, especially at the extremes. Thus, if the variance is small, the extra degrees of freedom are spent ensuring that small variations in the data do not yield overwhelmingly large changes in the model.

Now that we have explained how noise works, we will look at how to introduce it into the model during the training phase. The most obvious way to introduce variance into the model is the direct approach, that is, adding noise to the input data, for example $x_i + \mathcal{N}(\mu, \sigma)$. A much smarter way to apply this concept is to make use of some invariant to generate new data, in what is called data augmentation. This happens a lot in object recognition, whereby a car in a picture is a car regardless of the image being rotated 90º or the car appearing in the center or a corner of the picture; thus we can rotate or shift the pictures to generate new inputs.

There are much more sophisticated ways to introduce variance into the model. One of these is using noisy neurons, which implies that, only during the training phase, we add noise to the output of each neuron, this is $y^{[\ell]}_{n_\ell} + \mathcal{N}(\mu, \sigma)$. Another one is the dropout technique [52], which, only during the training phase, uses a probability to suppress the output of a given neuron. Hence, for example, for every neuron at each step we would draw a number from a uniform distribution $p \sim \mathcal{U}(0, 1)$, and if $0.9 < p < 1$ we would set its output to 0 (this would be a dropout of 10%). Lastly, we recall that the annealing effect explained in section 2.4.1 for the vanilla stochastic gradient descent can also be considered some sort of noise-based regularization technique.
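A minimal sketch of dropout as just described, applied to the outputs of one layer (note that standard implementations, such as tf.nn.dropout, additionally rescale the surviving outputs by 1/(1 - rate) so the expected magnitude is preserved):

    import tensorflow as tf

    def dropout(y, rate=0.1, training=True):
        # Suppress each neuron output with probability `rate`, training only.
        if not training:
            return y
        keep_mask = tf.cast(tf.random.uniform(tf.shape(y)) >= rate, y.dtype)
        return y * keep_mask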

One of the main issues with noise is that, while we want to introduce some small variance throughout the model during training, we want this variance to remain controlled and small along the process, so that the end result is not a chaotic model fitting only noise. By this we mean that we do not want the effect of the variance introduced in the early layers (the ones closest to the input) to explode in the following layers; we want the variance contribution to remain small as it gets fed into the next layers. The principal idea to assure this, which also provides other very good properties, are the normalization techniques, which have become a staple in deep-learning, namely batch and layer [53] normalization. They consist in normalizing either the input batch or the outputs of the neurons of every layer, respectively.

In general, small models (especially in their number of layers) with well initialized parameters do not require normalization, because in those dimensions the noise will most likely not scale up. One inconvenience of normalization is that it correlates the gradients. Recall from (2.4-2.5) that the loss function is a summation over the individual losses at the points of the collocation, which makes the derivatives of the loss with respect to the parameters a sum of uncorrelated derivatives. Normalization entangles these derivatives, which makes the update gradient for the optimizer correlated with respect to the collocation points; thus, the gradient computation at every step of the training becomes less sparse and more computationally expensive. Additionally, normalization does not work well with dropout.

The reason we have disregarded noise-based in favour of restriction-based regularizations, although both are mutually compatible, is that understanding the behaviour of noise in a context where we are considering the derivatives of the artificial neural network is very risky and becomes exponentially more complex with the order of the derivatives. Also, in the case of normalization (which we have tested for this work), where we do not actually introduce variance but limit its effects, the extra increase in computational cost, caused by the correlation of the gradient, builds on the already unavoidable computation of higher order derivatives of the neural network required in this work, making the training process many times slower and impractical. On the contrary, the techniques that we have categorized as restriction-based are mostly (soft or hard) binds on the parameters. Thus, their application does not interfere with the computation of $L_1$ and $L_2$, and their effect is applied afterwards.

2.6.2 Restriction-based Regularizations

On the other side of the spectrum lie what we have named restriction-based regularization techniques. These are a set of soft or hard constraints on the parameters, either added to the loss function or applied independently. Hence, any extra degrees of freedom in the model may be invested in fulfilling these constraints.

The most common of these are the weight penalties (more accurately, parameter penalties). These regularization techniques rely on an extra term, added to the loss function, which imposes some preference on the parameters (this corresponds to the placeholder term $R$ introduced in (2.1)). The most notable of these regularizations is the popular Tikhonov regularization, which consists of adding the following term:

$$R = \lambda\left(\|w\|_2^2 + \|b\|_2^2\right) := \lambda \sum_{\ell, n_\ell, m_{\ell-1}} \left(\left\|w^{[\ell]\, m_{\ell-1}}_{n_\ell}\right\|_2^2 + \left\|b^{[\ell]}_{n_\ell}\right\|_2^2\right), \qquad (2.57)$$

where $\lambda$ is the regularization scale factor. Technically, this is the same Tikhonov regularization as the one used in the least squares method for linear regression. One possible interpretation for it is that we impose a "minimum energy" model, i.e. we are looking for a model where the biases and especially the slopes are the smallest possible (yielding a "flatter" model), which happens naturally as we minimize the term $R$ in the loss function. Another common reading is that the additive contribution of $\nabla_{(w,b)} R$ to the total update gradient introduces some sort of dampening force which penalizes the optimization method when moving in directions where the loss function, $L(w, b)$, is less smooth (which happens for larger values of $w$ and $b$). Going back to Figure 2.10, the well-defined model would best satisfy the term (2.57).
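As a minimal TensorFlow sketch (assuming `model` is a Keras-style model whose trainable variables are the weights and biases), the term (2.57) can be written as:

    import tensorflow as tf

    def tikhonov_penalty(model, lam):
        # R = lambda * (||w||_2^2 + ||b||_2^2), summed over every layer, eq. (2.57).
        return lam * tf.add_n(
            [tf.reduce_sum(tf.square(v)) for v in model.trainable_variables])

    # Usage inside a training step: loss = L1 + L2 + tikhonov_penalty(model, 1e-4)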

Furthermore, the reason why these are called weight penalties and not parameter penalties is that in most cases this regularization only affects the slopes (weights), and $\|b\|_2^2$, the bias terms in (2.57), are dropped. For this work, though, we are considering the bias term, as we believe it provides some sort of centring effect on the neuron outputs which (in the particular problem instances chosen for this work) helps speed up the training. However, in a general context, this could prove tricky and is generally undesirable, especially in instances where the solution to the initial/boundary problem we want to approximate has many different localized features, i.e. the solution is very bumpy (which will not be the case in this work).

The reason for this is that biases, although not strictly necessary in an artificial neural network (a weights-only network is perfectly functional), help, when used, to optimize the differentiation among neurons in the same layer. For example, consider a neuron using a sigmoid activation function, and suppose that combining two inputs from two different regions with the weights yields the values 0.5 and 1.5. Then, computing the activation with $b = 0$ yields a difference in output between the two inputs of $\sim 0.19$, while for $b = 2$ it yields a difference of only $\sim 0.05$. This is apparent from Figure 2.7, as we see that the maximal slope is centred around zero. Hence, using a bias to shift the product of inputs and weights in a neuron can increase or decrease the difference in outputs among values in different regions, an effect that a Tikhonov term pushing $b \to 0$ can even negate since, contrary to the multiplicative contribution of the weights in the neuron, biases have an additive one, requiring much larger values to have a significant effect.

A second interpretation can be drawn from the explanation given for Figure 2.8, whereby we argued that two increasing functions could be combined to form a sort of dissipative square pulse function. By reducing the biases, we reduce the amplitude of the plateaus (width of the windows) of these arrangements, which reduces some of the specificity that can be achieved for certain regions of the model.

Finally, one could still argue that the loss of some localized specialization in the neurons due to the term $\|b\|_2^2$ should not pose a problem for the adaptability of the model, as it would simply make the contributions of the neurons in a layer for a region more overlapping; why should this be a concern and undesirable? Although this last statement is true, a generally desired feature for a good artificial neural network is for it to be sparse, i.e. that for any input given to the network almost all of the signal is contributed by just a few neurons (or in other words, any input only requires passing through a few relevant neurons, and not all of them have to be active at the same time). Given the objective of this work, which is proving that it is possible to approximate solutions of initial/boundary problems by artificial neural networks, we would rather have the extra regularization effects of the term and obtain a sparser network.

Another custom weight penalty that we have devised, and which seems to work rather well in this work, is the following:

$$R = \lambda\left(\left\|\frac{\partial \hat u(x)}{\partial w} - \frac{\partial}{\partial w}\frac{\partial \hat u(x)}{\partial x}\right\|_2^2 + \left\|\frac{\partial \hat u(x)}{\partial b} - \frac{\partial}{\partial b}\frac{\partial \hat u(x)}{\partial x}\right\|_2^2\right) := \lambda \sum_{\ell, n_\ell, m_{\ell-1}} \left(\left\|\frac{\partial \hat u(x)}{\partial w^{[\ell]\, m_{\ell-1}}_{n_\ell}} - \frac{\partial}{\partial w^{[\ell]\, m_{\ell-1}}_{n_\ell}}\frac{\partial \hat u(x)}{\partial x}\right\|_2^2 + \left\|\frac{\partial \hat u(x)}{\partial b^{[\ell]}_{n_\ell}} - \frac{\partial}{\partial b^{[\ell]}_{n_\ell}}\frac{\partial \hat u(x)}{\partial x}\right\|_2^2\right), \qquad (2.58)$$

with its idea being, instead of using the extra degrees of freedom to obtain the solution with minimal slopes, to find a solution whose derivatives with respect to the parameters, and with respect to parameters and inputs, are similar in magnitude. This turns out to work quite well, as we will see in the next section, and in fact it requires no extra computations, as all the derivatives involved in (2.58) are automatically calculated when computing $\nabla_{(w,b)} L_1$ (the loss of the differential operator).
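A possible TensorFlow sketch of (2.58), under the simplifying assumption that the per-point derivatives are aggregated by the tape (tape.gradient of a batched output returns the gradient of its sum); `model` and the collocation batch `x` are placeholders:

    import tensorflow as tf

    def custom_penalty(model, x, lam):
        # Derivatives of u_hat, and of du_hat/dx, with respect to every parameter.
        with tf.GradientTape(persistent=True) as param_tape:
            with tf.GradientTape() as x_tape:
                x_tape.watch(x)
                u = model(x)
            du_dx = x_tape.gradient(u, x)       # du_hat/dx
        du_dp = param_tape.gradient(u, model.trainable_variables)
        d2u_dpdx = param_tape.gradient(du_dx, model.trainable_variables)
        del param_tape                          # free the persistent tape
        # Sum of squared differences between the two derivative families, eq. (2.58).
        return lam * tf.add_n(
            [tf.reduce_sum(tf.square(a - b)) for a, b in zip(du_dp, d2u_dpdx)])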

A topic that we have not covered yet is the setting of the regularization coefficient $\lambda$. This has to be done manually, and it has to be revised and updated during the training process. We want the model to prioritize minimizing its errors $L_1 + L_2$ (main objective) over fitting the term $R$ (secondary objective). Typically, to establish priority, as with any other multi-objective function, we use the coefficient $\lambda$ to limit the magnitude in which the regularization term $R$ contributes to the total loss without becoming irrelevant. One reasonable demand would be to ask for the regularization term magnitude to stay between 10% and 20% of the main term magnitude, or in other words $0.1 \cdot (|L_1| + |L_2|) \lessapprox |R| \lessapprox 0.2 \cdot (|L_1| + |L_2|)$, and adjust $\lambda$ every time the criterion is not met. A seemingly more logical approach at first sight would be to adapt $\lambda$ as a function of the magnitudes of the terms; for example, in the Tikhonov case, we could always set $\lambda = 0.1\,(|L_1| + |L_2|) / \left| \|w\|_2^2 + \|b\|_2^2 \right|$, so that $R$ is always 10% of the other two terms. However, bear in mind that every time we modify $\lambda$ we are in fact changing the total loss function $L(w, b)$, which is detrimental to the robustness and convergence of the optimization process. Thus, it is better to use a threshold that allows the optimizer to train on a fixed hyper-surface for many steps until a correction has to be made, than to have the optimizer move on a hyper-surface that changes at every step, more so considering that all optimizers but vanilla SGD use some kind of memory of previous gradients, which becomes irrelevant if the hyper-surface has changed. Nevertheless, in section 3.3 of the next chapter, we will propose an alternative approach to deal with this issue as part of a larger framework for multi-objective loss functions, which seems to perform much better than this classical threshold strategy and achieves faster convergence.
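As an illustration, a minimal sketch of the threshold strategy just described; the band limits, the multiplicative step and the call frequency are illustrative choices, not values prescribed in this work:

    def adjust_lambda(lam, L_main, R_val, low=0.1, high=0.2, step=2.0):
        # Keep |R| within [low, high] * |L1 + L2|; only touch lambda when the
        # band is violated, so the loss hyper-surface stays fixed for many steps.
        target = abs(L_main)  # L_main = |L1| + |L2| at the current step
        if abs(R_val) < low * target:
            lam *= step
        elif abs(R_val) > high * target:
            lam /= step
        return lam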

An alternative to weight penalties are the weight constraints (or parameter constraints), which are sets of inequalities applied to the parameters, either by component value, like $a < |w^{[\ell]\, m_{\ell-1}}_{n_\ell}| < b$, or by node or layer norm, $a < \|w^{[\ell]}_{n_\ell}\|_2^2 < b$. The way to apply these inequalities is usually by clipping; this is, for example, if we update a parameter in a given training step and surpass the upper bound $b$, then the parameter is set to $b$. This effectively stalls the training of parameters that have become too large (usually very dominant effects), or prevents parameters that have become too small (usually very negligible effects) from vanishing, forcing a more even distribution in the relevance of the parameters. A more common clipping practice is gradient clipping, which applies an upper bound to the gradients with respect to the parameters used in the updates. This limits the effect of any exploding derivative, as any extremely large derivative, which would mean an extremely large update, is instantly reduced to a maximum reasonable range. In the training of all the models in this work we have implemented a component-wise upper bound clipping of $10^3$ on weights and biases, which is reasonable enough for the small scale of the models, and an upper bound gradient clipping of 1 by norm of all the parameters in each layer.
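Both clippings are one-liners in TensorFlow; a sketch assuming `grads` is the list of per-variable gradients produced by a tape, paired with `model.trainable_variables`:

    import tensorflow as tf

    # Gradient clipping: rescale each layer's gradient to norm <= 1 before the update.
    grads = [tf.clip_by_norm(g, 1.0) for g in grads]

    # Parameter clipping: after the optimizer step, bound each component at 1e3.
    for v in model.trainable_variables:
        v.assign(tf.clip_by_value(v, -1.0e3, 1.0e3))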

Finally, we propose and will implement the following idea for a regularization, which can be drawn from the context of this work. Since we want to train artificial neural networks to satisfy differential equations, and thus approximate their unique exact solutions, any conservation law satisfied by the exact solutions must also be approximately satisfied by the artificial neural network. Hence, we can add conservation laws to the loss function the same way we did with the other weight penalty regularizations, by simply substituting the conservation laws into the placeholder $R$ (and optionally adding some regularization constant). Actually, we could argue that, since the network must satisfy the conservation law as closely as the exact solution does, the term is not actually a regularization but a legitimate extra term which speeds up training, and not an extra condition which helps select some specific model among the many that approximate the solution (essentially a regularization).

Not many differential equations have known conservation laws, though, and thus these are hard to come by. Besides, in cases where conservation laws are known, adding an external force to the equation (something we will be doing in this work) usually invalidates such laws. Sometimes, although it is rare, this can be accounted for by re-deriving the conservation law with the external force, which can lead to the original law with some extra terms, like an integral over the domain of the external force if the domain is bounded.

2.6.3 Other Regularizations

In this subsection we will look at two very common practices that might as well fall into the category of regularization. The first is known as pre-training, which consists of, instead of initializing a new artificial neural network to approximate an initial/boundary problem, using an already existing one as a starting point, with the hope that this network is already closer to the desired outcome. All the attempts in this work to use pre-training, with artificial neural networks trained to only fit the initial/border data, to only fit the domain, or to fit the differential equation dropping any of its terms, have either had the same performance as using no pre-training, or performed worse. The most plausible explanation might be that, the loss function being multi-objective, it is best to keep a balanced agreement between the two parts from the start, rather than starting by fitting either $L_1$ or $L_2$, as the region of the parameter space we can fall into during these one-term optimizations might be useless, or even detrimental, to the other term, thus worsening the combined optimization.

The second is early stopping, which is not only always applicable, but useful in many ways. Data used to fit artificial neural networks (or any model, for that matter) is usually split into two groups: the training data and the validation data. The training data is used to fit the model (it is the data fed into the loss function during the optimization), and the validation data is used as a control mechanism to prevent overfitting of the model. Every certain number of iterations of the process (for example, 1000 iterations), we evaluate the loss with respect to the validation set, and if this validation loss has worsened with respect to its previous evaluation, then we stop the training (this is early stopping). The principle here is that the training process is blind to the validation data (it is not used in the updates); however, the model should still fit this data as part of its ability to generalize beyond the training points. Similarly to the workings of noise, if the validation loss gets worse, the behaviour beyond the training points is becoming undesirable, and thus the model is overfitting. When this happens we can simply stop the training completely; or it might be a sign that the learning rate of the optimizer is too large and we have to reduce it; or we can try to introduce some extra regularization, or adapt the regularization coefficient $\lambda$, to correct the model; or we can generate a new batch (which is also a regularization technique) and resume the training. Hence, early stopping is very useful not only as a regularization technique, but because it gives a cue to rectify the training of the model when the process is stalled. Throughout this work we use validation intervals (we check the validation loss) every 1000 steps of training.

As a final note, we want to address the reason why we do not check the validation loss at every step. The first motive is that first order optimizers are not always smooth, i.e. the loss function can be decreasing but in an oscillating (conjugate) manner (especially around valleys); thus, over a very short span, early stopping could confuse one of these fluctuations, where the loss is at a local maximum, with a stop criterion. As for second order methods which use line-search, which guarantees that there is always a decrease in loss (or they simply stop), the reason is that it is computationally more expensive to evaluate an extra loss at each step, and a few more steps past the early stopping criterion will not substantially change the model.

Chapter 3 Case Studies and Simulations

In this chapter we will finally be training artificial neural networks to approximate the solutions of some instances of initial/boundary problems, starting with the simplest case and building up to more complex operators.

The layout of this chapter will be fairly consistent. Besides the first three sections, dedicated to general implementation topics practical to this work (the coding framework, the function approximation capabilities of artificial neural networks, and the adaptation to multi-objective function training), each of the remaining sections follows the same structure of posing a problem instance, training, and analysing the results, with different operators. All of the operators used in this chapter have already been detailed in Table 1.2, and, as mentioned in the introduction, we will be using only Cauchy initial/boundary conditions. With regard to the external forces, we will select them ad-hoc in every problem so that the solution is a simple known polynomial. This way we can easily benchmark the artificial neural network results against the exact solution.

3.1 Coding Artificial Neural Networks

From an implementation standpoint, deep-learning model training requires the computation of many operations, especially linear combinations (tensor operations). Recall that an artificial neural network neuron is composed of a linear combination of the outputs of the previous layer, and the application of a non-linear activation function. In terms of activation functions, little can be done to improve performance, but the many sums and products of the linear combinations are susceptible to high parallelization, as they are mostly independent among neurons, low in computational cost and high in number. Therefore, in order to speed up learning, instead of using CPUs, which at the time of this work have up to 8/16 cores (i.e. processing units, the maximum number of operations that can be performed in parallel), we can make use of already existing GPU hardware. GPUs are optimized for image processing, a process which relies heavily on matrix multiplication. Thus, contrary to CPUs, composed of a few powerful cores, GPUs are built using a large number of lower end cores, which at the time of this work is on average around 120 cores. By parallelizing the linear combination operations across the many cores of a GPU, we can reduce the training time of an artificial neural network manifold, especially for larger networks. Nowadays, a new piece of hardware specially designed for deep-learning training has emerged, called TPUs (Tensor Processing Units). This hardware contains an even larger number of cores, and its architecture is designed ad-hoc to parallelize tensor operations, improving on the capabilities of GPUs.

In order to manage and distribute the flow of operations to make the most use of GPUs and TPUs, several software solutions have been developed. We will briefly give a basic understanding of the most prominent high-, medium- and low-level options.

On the lowest level, almost exclusively, lies the API named CUDA, developed by GPU maker NVidia. This API allows for direct control and distribution of the operations over the cores of a GPU/TPU. However, from a practical perspective, unless we really want to customize and micromanage the resources of our GPU/TPUs, this level of control is too much. Thus, there are several middle-level libraries used in deep-learning that handle these tasks automatically, the most popular ones being TensorFlow, developed by Google, and PyTorch, which is open source (both running on CUDA). The way these libraries work is by implementing their own class for multi-dimensional objects (like arrays), and every time tensor operations are performed on these objects, they use CUDA under the hood to distribute the operations onto the GPU and/or TPU cores automatically. This simplifies the work by allowing us to concentrate on programming the mathematical framework of the models without having to deal with the management of the parallelization tasks. Lastly, on top of these libraries, there are also higher level ones, such as the Keras library, which hinges on TensorFlow. These build on the multi-dimensional class to further implement classes for layers, models, optimizers, training, and more, with many options, creating a structure that allows one to build and train a model in a very simple and encapsulated manner.

The code for this work has been written using Python's version of TensorFlow 2.3. Some of the principal classes of the Keras library, implementing the layers, models and optimizers, have been imported, but they only serve as a structure: since they were not applicable to the special formulation of this work, they had to be completely overwritten. Moreover, the execution has been done through Jupyter Notebooks in the Google Colab cloud environment, which offers a free Nvidia K80/T4 GPU. For the code, refer to Appendix B.

3.2 Approximating a Function

Here we will study the capabilities of an artificial neural network to approximate a function. In the context of this work, this can be considered the simplest possible case of a differential equation, the trivial case of the identity operator, whereby the artificial neural network should be adjusted to satisfy:

$$\mathcal{L}[\hat u(x)] = f(x) \;\Rightarrow\; \hat u(x) = f(x), \qquad (3.1)$$

which is equivalent to simply having the artificial neural network model the external force function. This operator being of order zero, initial/boundary conditions are irrelevant, and thus the loss function to optimize is:

$$L(w, b) = L_1(w, b) + R = \frac{1}{N_\Omega} \sum_{i \in N_\Omega} \left(\hat u(x_i; w, b) - f(x_i)\right)^2 + R. \qquad (3.2)$$

The (external force) function that we will approximate in this section is the polynomial $f(x) = x(x - 1)$. Using this instance as an example, we will compare how well different optimizers and activation functions work at the task of modelling functions, as well as explain some of the behaviours of training. Here, the basic metric to assess performance is the relation between loss function value and iterations, which represents how well the model fits the solution at every step. As the loss function can have very steep decreases in value, we will use logarithmic scales for better representation. Moreover, for every model we will show a plot of the end result compared to the real solution, and in future evaluations, where it actually applies, we will also decompose the total loss into its components $L_1$ and $L_2$.
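A minimal TensorFlow sketch of this set-up, using the [3,4,1]-ANN architecture and Adam settings described below, with a uniform collocation sample on [0, 1]; the remaining names are illustrative:

    import tensorflow as tf

    f = lambda x: x * (x - 1.0)                     # target external force

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(1,)),
        tf.keras.layers.Dense(4, activation="sigmoid"),
        tf.keras.layers.Dense(1),
    ])
    opt = tf.keras.optimizers.Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999)
    x = tf.random.uniform((10000, 1))               # collocation points on [0, 1]

    @tf.function
    def train_step():
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - f(x)))   # eq. (3.2), R = 0
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
        return loss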

Finally, we will be using an artificial neural network with an input layer with 1 neuron, two hidden layers with 3 and 4 neurons respectively, and an output layer with 1 neuron, which we will call a [3,4,1]-ANN, to approximate (3.1). First, we will compare how the same network layout performs with different choices of activation functions. For this purpose we will use an Adam optimizer fixed to $\eta = 0.01$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and we will observe the performance for the first 10000 iterations without any adjustments. The only regularization applied will be a parameter upper bound of $10^3$ and a gradient clipping by layer norm of 1, which, as explained in the regularization section of this work, will be the standard. Initializations from here on are done as detailed in 2.5.1, using the normal distribution versions.

Figure 3.1: Comparison of training performance for different activation functions, for a [3,4,1]-ANN with Adam $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$. Log10 scale.

From Figure 3.1 we can observe that, by the end of the training, for many of these activation functions the loss value stagnates and an oscillating behaviour starts to appear. This is in some sources called saturation, meaning that the model is unable to learn more. Fundamentally, this is intrinsic to the model, because we are approximating functions which may have (and actually have) a very different analytic structure from the parametric model we are using. Hence, the same way it happens with a Taylor series expansion, where we have to truncate at some order to obtain a finite model and thus incur an error, here we will also have an intrinsic minimal error of the model. However, this being a non-convex optimization problem, we do not know whether these saturations correspond to reaching the intrinsic error of the model (global minimum), or to a local minimum or a valley. When this happens, if we have implemented early stopping in the training loop, the process will stop (which happened for the exponential and softplus activations in Figure 3.1). Then, we can choose to strengthen the regularization (not very effective), or to reduce the learning rate of the optimizer, or to use a second order one, in the hope that it is a valley and we can escape it. If we use a second order method (in this work L-BFGS) and the method stops, we can be almost completely sure that the loss is at some minimum and we will not be able to escape it. This is because the stop criterion with line search is not finding any ratio in the gradient direction that can actually decrease the loss function (and line search looks for this ratio with exponential decay), thus almost guaranteeing we are at a minimum. Here is where the luck of non-convex optimization comes into play, as a different initialization, or another instance of the same initialization, or a simpler or more complex network layout, or an apparently worse performing optimizer, can lead to a different optimization path through the loss hyper-surface, leading to a better fitting model.

In this benchmark we have used quite a minimal model, to ensure it is not too overfitted (regularization can only fix some overfitting) and the loss function is quite smooth, and we have used a fairly robust optimizer. Therefore, we can assume with some confidence that the saturation corresponds at least to some minimum close to the global one. This leads us to extrapolate, as a general criterion, that the activation functions that saturate the latest and at the lowest values, i.e. sigmoid, hyperbolic tangent and swish, are preferable to the exponential or softplus activation functions, and thus we will prioritize them in the upcoming simulations (which does not mean that in some particular instance an exponential or softplus activation could not outperform the others).

As a second part of this benchmarking, we will compare the possibilities for the other crucial choice in training: the optimizer. We will be using the same set-up as before, but this time, instead of fixing the optimizer, we will fix the activation functions to be sigmoids.

Figure 3.2: Comparison of training performance for different first order optimizers, for a [3,4,1]-ANN with sigmoid activations. Lower image in log10 scale.

From these first 10000 iterations, all using a learning rate of $\eta = 0.01$ (the rest of the hyper-parameters of the optimizers can be drawn from the legend in Figure 3.2), we can see many of the behaviours expected from section 2.4.1. Vanilla, classic momentum and Nesterov momentum SGD all had a very similar performance, reaching an almost-saturation around the same loss value, which means that at that point we should have manually reduced the learning rate. Although it is hard to discern from Figure 3.2, Nesterov momentum had the fastest initial decrease until reaching the state of almost-saturation, followed by classic momentum and vanilla SGD, as expected.

As for the optimizers that adjust the learning rate component-wise, AdaGrad and AdaDelta performed poorly. The AdaGrad case, which completely stopped learning at iteration 6696, matches well with what we explained in the previous chapter: the exponential accumulation of previous gradients in $G(t)$ from (2.30) made the update gradient vanish. In the AdaDelta case, the method is steady but starts slowly, which occurs because of the quotient in (2.34): since the moving average of the gradient and the increment $\Delta(w_t, b_t)$ are similar, the quotient that yields the actual increment is $\sim 1$, making the initial convergence slow.

On the other side of the component-wise adapting optimizers lie the RMSProp and Adam methods. These will be our preferred methods, especially Adam, since they yield very good results and have great autonomy (they require little adjusting of the learning rate).

Concluding this discussion on optimizers, in Figure 3.3 we show the performance of BFGS and L-BFGS, the second order optimizers used in this work. As expected, the behaviours are exactly the same, since the limited memory version and the original method are the same method; the only change is that the original method keeps the required matrices and vectors in memory, while the limited memory version stores only vectors, by keeping the already multiplied matrix-vector products. The line search will have 10 attempts to reduce the coefficient $\alpha$. Comparing the performance of these methods to the first order ones, we see that the gains are much smaller (the figure does not use a logarithmic scale) and each step takes much more time. While this is true, it is also true that the method is much steadier than the first order ones, and can still make gains in regions where first order methods get stuck. Hence, we will use these methods, especially L-BFGS, to unblock the training when first order methods get stuck (mostly because of valleys).

Figure 3.3: Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1), using BFGS and L-BFGS.

Finally, we show the end result of fitting the [3,4,1]-ANN model with sigmoids, using the Adam optimizer in the same conditions as before, running for 15000 iterations, with a training set of 10000 points randomly collocated using a uniform distribution on the [0,1] segment. The next figure represents the model against the exact solution, and the loss function over the iterations of training. We can see that the model and the exact solution overlap very well, with a final total loss of $L = L_{sol} = 4.331806 \times 10^{-7}$.

Figure 3.4: Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1), using Adam with $\eta = 0.01$.

3.3 Training with Multi-Objective Loss Functions

In this section we will discuss some of the issues that arise when training with multi-objective loss functions, and we will propose a method to adjust the gradients that stabilizes the training process.

One of the main inconveniences in this work is that our target loss function (2.1) is composed of two very different components: $L_1$, satisfying the differential operator, and $L_2$, satisfying the initial/boundary conditions. However, in reality, what we actually want is to approximate the exact unique analytic solution to the initial/boundary problem, $u(x)$. Therefore, ideally we would want to be minimizing:

$$L_{sol}(w, b) \approx \frac{1}{N_\Omega + N_\Gamma} \sum_{x_i \in N_\Omega \cup N_\Gamma} \left(u(x_i) - \hat u(x_i; w, b)\right)^2. \qquad (3.3)$$

Since the exact solution, $u(x)$, is generally unknown (this being the whole purpose of using a numerical integration method), $L_{sol}(w, b)$ cannot be computed. The question thus is: how well do the target loss $L(w, b)$ (2.1) and its terms correlate with the ideal loss $L_{sol}(w, b)$ (3.3)?

Throughout this work we have introduced ad-hoc initial/boundary conditions, as well as external forces, to modify the models so that the unique solution is known and has a simple form, which means that $L_{sol}(w, b)$ could always be computed. Observing how the models in the next sections behave, we can conclude that the relation between the target and ideal losses is not strictly correlational. Indeed, although $L(w, b)$ and $L_{sol}(w, b)$ have their only global minimum at 0, which corresponds to the model being the exact solution $\hat u = u$ (the desired end result, and the one we would surely obtain optimizing either if the problem were purely convex), both losses describe completely different hyper-surfaces. In fact, we could argue that $L(w, b)$ is a poor choice of loss function, since $L_1(w, b) = 0$ for any solution of the differential operator satisfying any other set of initial/boundary conditions, and $L_2(w, b) = 0$ for any function that satisfies the initial/boundary conditions. This implies that there is an infinite number of potential local minima into which the optimization can fall, or at least, by the standards of this work, as we expect some continuity with respect to the operators and conditions, extremely flat valleys.

Imagine we want to approximate the solution of $du/dx = 2x - 1$ with $u(0) = 0$, which is nothing more than $u(x) = x^2 - x$, with a model of the type $\hat u = ax^2 + bx + c$. The losses corresponding to the parameter values $a = 1.5$, $b = -2$ and $c = 0.05$ take, for the point 0.5 of the domain, the values $L_1 = 0.25$ and $L_{sol} = 0.105625$, and for a second point 0.1 of the domain, the values $L_1 = 0.81$ and $L_{sol} = 0.002025$. Through this basic example we see that, not even in this simplified model, the loss functions at two different points are positively correlated, as in the first case $L_1 \downarrow$, $L_{sol} \uparrow$ and in the second case $L_1 \uparrow$, $L_{sol} \downarrow$. On top of this, we see that the magnitudes of both losses are different, and furthermore this proportion does not remain constant during training. This means that the scenario where, in the first stages of training, a decrease from $10^{-1}$ to $10^{-4}$ in $L_1$ represents a $10^{-1}$ to $10^{-2}$ decrease in $L_{sol}$ for the domain points, while later on in training a decrease from $10^{-4}$ to $10^{-6}$ in $L_1$ represents a $10^{-1}$ to $10^{-5}$ decrease in $L_{sol}$, is very plausible. This poses one of the main problems of this method for integrating differential equations compared to others, such as FEM, which is: in a general situation where the exact solution is unknown, it is not possible to know the real error of the approximated artificial neural network solution, $\hat u$. (The same arguments are valid between $L_2$ and $L_{sol}$, and between $L_1$ and $L_2$.)

The fact that the terms $L_1$ and $L_2$ behave so differently, both between them and during training, makes simultaneous training of these two objectives quite hard. For this work, we always choose the domain in which we want to approximate the solutions to be $\Omega = [0, 1]$ in the one dimensional case, and $\Omega = [0, 1] \times [0, 1]$ in the two dimensional case. Under these regimes, the two main behaviours of the losses after the first steps of training, once the optimization has stabilized, are that $L \ll L_{sol}$, and that $\nabla_{(w,b)} L_2(w, b) \gg \nabla_{(w,b)} L_1(w, b)$ always (at least in this work). Both are expected characteristics of the training: the first is related to the explanations of the previous paragraph, as having similar information about the derivatives is not the same as being the same function; and the second occurs due to the initial/boundary conditions always having information of one derivative less than the differential operator, thus the vanishing gradient problem (2.21) with respect to input derivatives explained in 2.3.1.

Having $\nabla_{(w,b)} L_2(w, b) \gg \nabla_{(w,b)} L_1(w, b)$ is a real problem. In fact, if we were to optimize the model with only the tools we have discussed until now, we would not be able to model anything past the divergence operator, as the artificial neural networks would fit the initial/border conditions with great precision and nothing else. At that point the optimization gets stuck in a very flat valley where the gradient is so weak and noisy that not even L-BFGS can find a minimal loss improvement in that direction. We will illustrate this problem with the example shown in Figure 3.5.

Figure 3.5: Example of possible multi-objective functions. Component and total representation.

Suppose that we wanted to find simultaneously the minimum of the two individual loss functions $L_1$ and $L_2$ shown in Figure 3.5 (left), and thus we consider the global loss function composed by their addition, represented in Figure 3.5 (right). The actual expressions of the functions can be drawn from the figure legends, but are irrelevant, as we are using them for conceptual reasons. Observe that the term $L_1$ is much flatter (has smaller derivatives) than the term $L_2$; hence, when adding both functions together, the topology of the end result becomes mostly that of $L_2$. The term $L_1$ acts as though it were little more than a constant moving the global loss up or down, and the whole variation is dominated by $L_2$. Therefore, minimizing the global loss $L$ effectively becomes minimizing $L_2$ with complete disregard for $L_1$. For example, if we were to optimize the global loss starting at the green point of the figure (around 0.1), we would be moving to the right and most likely get stuck in the local minimum of the second term (around 0.7), which is the same that would happen optimizing only $L_2$. This might be acceptable if we were only minimizing $L_2$, since we might accept this local minimum as a good enough result. However, at around 0.7 the term $L_1$ still has a very clear slope pointing to the right, and quite a strong one by $L_1$ standards, although many orders of magnitude weaker than its counterpart in $L_2$. Hence, this slope of $L_1$ is completely ignored, meaning that we are not minimizing $L_1$ at all. Finally, the question is: how do we balance the influence of both terms, $L_1$ and $L_2$?

Figure 3.6: Example of possible multi-objective functions. Adjusted factors.

In Figure 3.6 we show what happens when we scale one term relative to the other (here by a factor of 3). Note that, when multiplying a term by a factor, we also multiply its slopes, and the variance it contributes to the global loss, by that same factor. Hence, the final global loss function becomes a much more balanced agreement between the two terms. Here, minimizing the global loss starting at the green point and reaching the region at 0.7 does not represent such a strong sink for a gradient based approach to overcome. The region is still a good local minimum for $L_2$, but the slope of $L_1$, which disagrees, showing a strong direction of decrease for $L_1$ to the right, now has enough influence over the global loss to have an effect. Balancing the terms makes it so that the minimum found for the global loss is at least a minimum (or close enough) for each of the individual terms $L_1$ and $L_2$.

Going back to the problem of $\nabla_{(w,b)} L_2(w, b) \gg \nabla_{(w,b)} L_1(w, b)$ in the regime we are dealing with, i.e. the differential operator loss $L_1$ being much flatter than the initial/border loss $L_2$, we would like to find a consistent way to balance the loss terms for any instance. The approach that we take in this work to deal with this issue does not use a scale factor, as in the example of Figures 3.5 and 3.6, but makes use of exactly the same underlying ideas.

We will still compute the global loss as $L(w, b) = L_1(w, b) + L_2(w, b) + R(w, b)$ (2.1), and we will still compute each individual gradient with respect to the components, $\nabla_{(w,b)} L_1(w, b)$, $\nabla_{(w,b)} L_2(w, b)$, $\nabla_{(w,b)} R(w, b)$, which can be done separately since the global loss is additive. However, the difference here is that the gradient of the global loss will not be calculated as $\nabla_{(w,b)} L(w, b) = \nabla_{(w,b)} L_1(w, b) + \nabla_{(w,b)} L_2(w, b) + \nabla_{(w,b)} R(w, b)$, but using:

$$\nabla_{(w^{[\ell]}, b^{[\ell]})} L \;\to\; \frac{\nabla_{(w^{[\ell]}, b^{[\ell]})} L_1}{\left\|\nabla_{(w^{[\ell]}, b^{[\ell]})} L_1\right\|_2}\, m_\ell \;+\; \frac{\nabla_{(w^{[\ell]}, b^{[\ell]})} L_2}{\left\|\nabla_{(w^{[\ell]}, b^{[\ell]})} L_2\right\|_2}\, m_\ell \;+\; \lambda \cdot \frac{\nabla_{(w^{[\ell]}, b^{[\ell]})} R}{\left\|\nabla_{(w^{[\ell]}, b^{[\ell]})} R\right\|_2}\, m_\ell, \qquad (3.4)$$

$$\text{with} \quad m_\ell = \min\left(\left\|\nabla_{(w^{[\ell]}, b^{[\ell]})} L_1\right\|_2,\ \left\|\nabla_{(w^{[\ell]}, b^{[\ell]})} L_2\right\|_2\right),$$

where all the gradients are evaluated at $(w^{[\ell]}, b^{[\ell]})$. This means that, before adding up the gradients of the individual losses (and the regularization), we renormalise the sub-gradient of each layer to the minimum between those of $L_1$ and $L_2$. Geometrically, we are adjusting the slopes corresponding to one layer of $L_1$, $L_2$ and $R$ (corrected by $\lambda$) to be of the same magnitude, and thus making each of the terms equally influential on the global loss (the regularization corrected by $\lambda$), which was the original intention.
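A minimal TensorFlow sketch of (3.4), assuming g1, g2 and gR are the per-variable gradient lists of $L_1$, $L_2$ and $R$ obtained from separate tape.gradient calls (one entry per layer variable; the epsilon guards against division by zero):

    import tensorflow as tf

    def rescaled_gradient(g1, g2, gR, lam=1.0, eps=1e-12):
        total = []
        for a, b, r in zip(g1, g2, gR):             # one entry per layer variable
            m = tf.minimum(tf.norm(a), tf.norm(b))  # smaller of the two sub-gradient norms
            g = a * (m / (tf.norm(a) + eps)) + b * (m / (tf.norm(b) + eps))
            if r is not None:                       # regularization term, scaled by lambda
                g += lam * r * (m / (tf.norm(r) + eps))
            total.append(g)
        return total                                # feed this to optimizer.apply_gradients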

This fixes the vanishing gradient problem with respect to the input derivatives really well, and the reason is, in some sense, analogous to why the first order methods that we called component learning rate adapting (Adam et al.) deal so well with the classic vanishing gradient problem. In those methods the gradients are adjusted component-wise, i.e. each parameter derivative is modified individually based on the changes of the previous values of that same parameter. Hence, although the gradient gets deformed (we are no longer moving in the maximal slope direction), this is much preferable, as it allows for a much more equilibrated learning where all the parameters learn at similar rates, naturally increasing smaller parameter derivatives when the slope in that direction is steady. Similarly, what component learning rate adapting optimizers do for parameters, (3.4) does among the global loss terms, $L_1$, $L_2$ and $R$, allowing to rescale in a balanced way the naturally smaller sub-gradients, which otherwise would have no contribution.

Correcting the gradient before feeding it to the optimizer is much more convenient than altering the terms of the global loss function $L(w, b)$ with coefficients (recall that we already gave some arguments when discussing weight penalties in section 2.6.2). First, we only affect the step we take during the optimization; thus $L(w, b)$ and its terms remain unaltered, and a consistent metric of the performance of the model throughout the whole training. Second, it is really the gradient (the slope of the hyper-surface), and not the actual value of the loss (which can be naturally high or low), that conditions the optimization process.

Finally, note that there is a second multi-objective behaviour, due to the loss terms effectively being sums over a collocation of points in the domain and on the initial/boundary conditions. A possible fix for this is to increase the power of the norm, $\|\cdot\|_2^2 \rightarrow \|\cdot\|_n^n$, which is known to make the problem more sensitive to outliers. Since we are not using noise, and the expected outputs at the collocation points are deterministic, as a result of satisfying the initial/boundary value problem, making the model more sensitive to fitting the regions where there are more discrepancies between the model and the expected outputs can be very beneficial.
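As a sketch, for the domain term this amounts to a one-line change in how the residuals are aggregated; here residual stands for the precomputed tensor of values of the differential operator minus the external force at the collocation points, and n = 4 is an illustrative choice:

n = 4                                              # illustrative power, n > 2
loss_n2 = tf.reduce_mean(tf.square(residual))      # the usual mean squared residual
loss_n  = tf.reduce_mean(tf.abs(residual) ** n)    # weighs the worst points more heavily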

3.4 Model Simulation

At last, in this section we will numerically integrate initial/boundary value problems with the operators already introduced in Table 1.2 (except for the identity operator, which was already covered in section 3.2). As stated before, we will add external forces and initial/boundary conditions so that the exact solution, u(x), has a simple polynomial form and the exact loss L_sol (3.3) can be computed.

The general layout for all instances will consist of several distributions of layers and neurons, activation functions, optimization methods, and regularizations, which we will vary and compare among each other. In terms of the sampling, we will always approximate the solutions in the domain Ω = [0, 1] for 1D, and Ω = [0, 1] × [0, 1] for 2D; hence we will always draw samples of 10000 and 1000 points for the domain and initial/boundary training sets, and for the validation sets, respectively, from U(0, 1) distributions (a sketch of this sampling is given below). While a very small collocation sample may lead to a very poorly fitted model, there is a point where increasing the size of the sample does not improve the result of the model, which has to be located by trial and error. For most of the instances in this work it is probably around 1000 points, thus we have increased it tenfold for safety reasons. Also, implicitly applied through all the instances, there will be upper bound parameter constraints of 10^3, gradient clipping by layer with norm 1, early stop evaluation every 500 training steps (or epochs), and the gradient adjustment (3.4).
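For reference, a minimal sketch of the sampling just described in the 2D case (sizes as stated above; the variable names are illustrative, the actual implementation being the myDataSets class of Appendix B):

# 10000 interior training points drawn from U(0,1) x U(0,1).
domain_pts = tf.random.uniform(shape=[10000, 2], minval=0., maxval=1.)

# 1000 border points: a quarter of the sample on each side of [0,1] x [0,1].
sides = []
for value, axis in [(0., 1), (1., 1), (0., 0), (1., 0)]:   # y=0, y=1, x=0, x=1
    t = tf.random.uniform(shape=[250], minval=0., maxval=1.)
    c = tf.fill([250], value)
    sides.append(tf.stack([t, c], axis=1) if axis == 1 else tf.stack([c, t], axis=1))
border_pts = tf.concat(sides, axis=0)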

Each of the instances will increase in complexity. Furthermore, each instance may contain optional comparisons, for benchmarking reasons, of some of the aspects that we have already discussed.

3.4.1 Model 1: The 1D Divergence Operator

For the first proper initial/boundary problem we will be approximating the 1D divergence operator (derivative equal to an external force), which is one of the simplest models that we can generate. The exact instance of this ODE will be:

$$
\nabla_{(x)} \cdot u(x) = \frac{\partial u(x)}{\partial x} = 2x - 1, \qquad u(0) = 0,
\tag{3.5}
$$

which has exact solution u(x) = x² − x. The solution can easily be obtained through the separation method, by integrating both sides:

$$
\int_0^x \frac{\partial u(x)}{\partial x}\, dx = \int_0^x 2x - 1\, dx,
$$
$$
u(x)\Big|_0^x = x^2 - x\Big|_0^x,
$$
$$
u(x) - u(0) = (x^2 - x) - (0^2 - 0),
\tag{3.6}
$$
$$
u(x) = x^2 - x.
$$

Here the loss function we are optimizing is:

$$
L(w,b) = \frac{1}{N_\Omega}\sum_{1\le i\le N_\Omega}\left(\frac{\partial \hat{u}(x_i; w, b)}{\partial x} - (2x_i - 1)\right)^2 + \big(\hat{u}(0; w, b) - 0\big)^2 + R(w,b).
\tag{3.7}
$$
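A minimal sketch of how (3.7) can be evaluated with TensorFlow's auto-differentiation, assuming u_hat is the network as a callable and x_domain the collocation sample of shape [N, 1] (illustrative names; the regularization term is omitted):

def loss_model_1(u_hat, x_domain):
    # Residual of u'(x) = 2x - 1 over the collocation points.
    with tf.GradientTape() as tape:
        tape.watch(x_domain)
        u = u_hat(x_domain)
    du_dx = tape.gradient(u, x_domain)
    L1 = tf.reduce_mean(tf.square(du_dx - (2. * x_domain - 1.)))
    # Initial condition u(0) = 0.
    L2 = tf.square(u_hat(tf.zeros([1, 1]))[0, 0] - 0.)
    return L1 + L2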

For this instance we will focus on the effect on the models of varying the activation functions. We will be training 3 models, with sigmoid, hyperbolic tangent and swish activation functions respectively. All models will be trained for 3000 iterations (epochs), using a [1,5,5,1]-ANN, with no regularization, and using Adam with η = 0.01, β1 = 0.9 and β2 = 0.999. Table 3.1 shows the final losses, and Figure 3.7 plots the performance of the training (the losses in the figure have been broken into each of their components).

Activation    L              L_sol
Sigmoid       1.66 · 10^-4   1.71 · 10^-6
Tanh          1.20 · 10^-4   4.02 · 10^-7
Swish         6.41 · 10^-5   1.75 · 10^-6

Table 3.1: Results of 3 models trained for a [1,5,5,1]-ANN scheme, with no regularization, using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 3000 epochs. (3.5)

Figure 3.7: Training performance of 3 models trained for a [1,5,5,1]-ANN scheme, with no regularization, using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 3000 epochs. (3.5)

From the previous results we can see some interesting behaviours. First of all, the worst performing activation with regards to the solution loss L_sol (the swish) was actually the best in terms of the objective loss L; and the best performing activation with regards to the solution loss L_sol (the tanh) was not the best in terms of the objective loss L. Also, we see that there is correlation between L (3rd plot of Figure 3.7) and L_sol (4th plot of Figure 3.7), and that L1 dominates over L2, meaning L2 is natively much smaller than L1. As well as this, we observe that the swish model had a much later initial decay than the other two, but the three of them start saturating at the same time. All these are expected behaviours that we have explained before.

Finally, in the next figure we plot the output of the best performing model (the one with tanh activations) against the exact solution. Note that in just 3000 epochs (2 min) the match is almost perfect.

Figure 3.8: Final results. Best performing trained model (tanh) for (3.7) against the exact solution.

3.4.2 Model 2: The 2D Divergence Operator

Here we will take the previous model to the next level by adding a dimension, and in doing so we will consider our first PDE. Still, this will be a very simple problem. The instance we will be considering first is:

$$
\nabla_{(x,y)} \cdot u(x,y) \cdot \mathbf{1} = \frac{\partial u(x,y)}{\partial x} + \frac{\partial u(x,y)}{\partial y} = (2x-1)\cdot(y^2-y) + (x^2-x)\cdot(2y-1), \qquad u(x,0) = 0, \quad x \in (-\infty, \infty),
\tag{3.8}
$$

which has exact solution u(x, y) = (x² − x) · (y² − y). The loss function for (3.8) would be:

$$
L(w,b) = \frac{1}{N_\Omega}\sum_{1\le i\le N_\Omega}\left(\frac{\partial \hat{u}(x_i, y_i; w, b)}{\partial x} + \frac{\partial \hat{u}(x_i, y_i; w, b)}{\partial y} - (2x_i-1)(y_i^2-y_i) - (x_i^2-x_i)(2y_i-1)\right)^2 + \frac{1}{N_\Gamma}\sum_{1\le i\le N_\Gamma}\big(\hat{u}(x_i, 0; w, b) - 0\big)^2 + R(w,b).
\tag{3.9}
$$

However, there is an issue when using the loss function (3.9). The border conditions are described by a curve for x ∈ (−∞, ∞), but effectively we cannot draw samples from such a wide range. Since we are limiting ourselves to approximating the solution in the domain Ω = [0, 1] × [0, 1] for practical reasons, we will sample x from (−10, 10) for the L2 term.

The following Figure 3.9 shows the results of training a model under the previous assumptions (specifics in the caption). Observe in the left plot that the solution approximated by the model has two separate regions: one approximating the exact solution really well, and another one missing it by a large margin. If we turn to the right plot, we see that the MSE of the individual points with respect to the differential operator/external force is very even, meaning every point is equally well fitted.

Figure 3.9: Result of a [1,10,10,1]-ANN model with tanh activations, trained with no regularization, using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 12000 epochs. Left plot: model against exact solution. Right plot: MSE of the model for each point in the domain.

This occurs because, in practice, when we draw a sample of points for the border conditions, we are limiting ourselves to x ∈ (−10, 10). Hence, for all purposes we are solving (3.8) with boundary conditions u(x, 0) = 0, x ∈ (−10, 10), which are no longer Cauchy conditions and do not guarantee uniqueness. The solution we want to find is also a solution of the problem we are fitting in practice, but there are many more. In fact, what we see in Figure 3.9 is the artificial neural network overlapping two different solutions of the problem (the one closest to y = 0 is the one we would want). Thus, this is a good example of what happens when integrating a problem which is not well-posed.

In order to fix this issue we will change the Cauchy “border” conditions, which for infinite domains would simply be an open curve, to their finite-domain version, which requires information over the whole border. This means, for Ω = [0, 1] × [0, 1], changing (3.8) to:

$$
\nabla_{(x,y)} \cdot u(x,y) \cdot \mathbf{1} = \frac{\partial u(x,y)}{\partial x} + \frac{\partial u(x,y)}{\partial y} = (2x-1)\cdot(y^2-y) + (x^2-x)\cdot(2y-1),
$$
$$
u(x,0) = 0, \quad u(x,1) = 0, \quad x \in (0,1),
$$
$$
u(0,y) = 0, \quad u(1,y) = 0, \quad y \in (0,1),
\tag{3.10}
$$

with its respective change in the loss function (3.9). The solution for this problem is the same as before.

In the actual experiments for (3.10) we will compare more interesting features than the simple activation functions of the previous instance. Here we will analyse the effects of the size of the artificial neural network and of the regularization. When choosing an artificial neural network architecture, the general rule is that deeper neural networks are able to learn more complex functions, although at a greater training cost [54]. Furthermore, papers such as [55], focused on learning polynomials with artificial neural networks, suggest that a fully-connected network with a single hidden layer, with a number of nodes equal to the degree of the polynomial, would be enough to learn a polynomial (this is a rough and imprecise extraction of what [55] states, but it holds for the most part). In this work, though, we have been using two hidden layers so far (and will keep using them), and a much larger number of neurons than the theoretical minimum suggested for the underlying solutions we want to approximate. The reason for doing this is to better account for the information of the derivatives during training, and to make use of regularization techniques, in order to obtain better minima.

For this instance (3.10) we will be training 6 models, all using hyperbolic tangent activations and trained with Adam with η = 0.001, β1 = 0.9, β2 = 0.999, on 8000 epochs. The models will either have a [1,10,10,1]-ANN structure or a [1,40,40,1]-ANN structure, and be trained using no regularization, the custom regularization (2.58) with λ = 0.1, or a Tikhonov regularization with λ = 0.1, which makes for a total of 6 combinations. Moreover, all the models with the same architecture have been initialized with exactly the same parameters. This has been done to root out the possible effect of luck from starting at a slightly better point for the optimization, and to ensure that the differences in training are caused by the regularization.

In Table 3.2 we show the final results of the models, and in Figures 3.10 and 3.11 we show the performance of the training. First, we observe that for this kind of problem Tikhonov regularization does not work well, and its training incurs early stopping. For the custom regularization (2.58) we see that, in the smaller [1,10,10,1]-ANN model, the training is actually hindered and yields awful results, but used with the larger [1,40,40,1]-ANN model it outperforms any other set-up. This is due to what we have already explained in section 2.6: regularizations clamp down on the extra degrees of freedom overfitting the model. Hence, for the smaller model, which is adequately parametrized, the regularization becomes an extra condition drawing resources from the model, while for the larger model it narrows the parameters down to fit the model. Furthermore, not only does the larger model with regularization outperform the smaller one without it, but if we compare their performances in Figures 3.10 and 3.11, we note that by the end of the training the smaller model has saturated (stagnated), while the larger one is still steadily decreasing (and thus has more room for improvement). This shows that it is much preferable to have a larger model with regularization than simply a well-adjusted one.

Architecture - Regularization Technique                  L              L_sol
[1,10,10,1]-ANN - No Regularization                      1.04 · 10^-4   7.52 · 10^-6
[1,10,10,1]-ANN - (2.58) Regularization with λ = 0.1     7.18 · 10^-3   2.84 · 10^-4
[1,10,10,1]-ANN - Tikhonov Regularization with λ = 0.1   6.33 · 10^-4   4.19 · 10^-5
[1,40,40,1]-ANN - No Regularization                      6.36 · 10^-4   1.95 · 10^-5
[1,40,40,1]-ANN - (2.58) Regularization with λ = 0.1     2.93 · 10^-4   2.32 · 10^-6
[1,40,40,1]-ANN - Tikhonov Regularization with λ = 0.1   3.88 · 10^-3   7.91 · 10^-5

Table 3.2: Results of 6 models with different architectures, trained for (3.10), using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 8000 epochs, and different regularization techniques.

Figure 3.10: Comparison of different regularization techniques in the training performance of 3 models trained for a [1,10,10,1]-ANN scheme, using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 8000 epochs. (3.10)

Figure 3.11: Comparison of different regularization techniques in the training performance of 3 models trained for a [1,40,40,1]-ANN scheme, using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, on 8000 epochs. (3.10)

Figure 3.12: Final results of the best performing trained model ([1,40,40,1]-ANN, trained with the custom regularization (2.58)) for (3.10) against the exact solution.

3.4.3 Model 3: The 2D Laplacian Operator

At this point we will complicate the differential operator a bit more by considering second order derivatives. Thus, we will consider the following boundary value problem for the Laplacian operator in 2 dimensions:

$$
\Delta u(x,y) = \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 2\cdot(y^2-y) + 2\cdot(x^2-x),
$$

$$
u(\Gamma) = g_1(\Gamma):\ \begin{cases} u(x,0) = 0, \quad u(x,1) = 0, & x \in (0,1),\\ u(0,y) = 0, \quad u(1,y) = 0, & y \in (0,1), \end{cases}
$$

$$
\frac{\partial u(\Gamma)}{\partial(x,y)} \cdot n(\Gamma) = g_2(\Gamma):\ \begin{cases} \dfrac{\partial u(x,0)}{\partial y} = -(x^2-x), \quad \dfrac{\partial u(x,1)}{\partial y} = (x^2-x), & x \in (0,1),\\[2mm] \dfrac{\partial u(0,y)}{\partial x} = -(y^2-y), \quad \dfrac{\partial u(1,y)}{\partial x} = (y^2-y), & y \in (0,1), \end{cases}
\tag{3.11}
$$

which has exact solution u(x, y) = (x² − x) · (y² − y), as with the previous problem. Problem (3.11) in its general form, for any dimension and external force, constitutes what is called the Poisson equation, which is important throughout physics, as it is the interpretation of Gauss's law in terms of potentials.

Before training an artificial neural network to fit this model, we would like to make a brief note regarding the coding of higher order derivatives in TensorFlow. Looking at the official documentation of TensorFlow, the method given to obtain higher order derivatives in one variable is to nest auto-differentiation calls. However, note that TensorFlow is used in a context of training artificial neural networks, thus when auto-differentiating twice we obtain:

$$
\nabla_{(x)} f(x_1, \ldots, x_n) = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right),
$$
$$
\nabla^2_{(x)} f(x_1, \ldots, x_n) = \left(\frac{\partial}{\partial x_1}\sum_{m=1}^{n} \frac{\partial f}{\partial x_m}, \ldots, \frac{\partial}{\partial x_n}\sum_{m=1}^{n} \frac{\partial f}{\partial x_m}\right),
\tag{3.12}
$$

which is not the Laplacian. There are two ways to overcome this issue: either use the unstack and stack functions to decouple the inputs and compute the gradients tracking only an individual variable (the option we have used in the code), or use the hessian function to compute the Hessian matrix and then take its trace, which is highly inefficient as we only require the elements on the diagonal. Without [56], where this observation is pointed out, we would not have been able to carry out this simulation.
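The following is a minimal sketch of the first option (illustrative names, assuming u_hat maps an [N, 2] tensor to an [N, 1] tensor): by unstacking the coordinates, each nested tape tracks a single variable, so we obtain the pure second derivatives instead of (3.12).

def laplacian(u_hat, xy):
    # Decouple the coordinates so each one can be tracked individually.
    x, y = tf.unstack(xy, axis=1)
    with tf.GradientTape(persistent=True) as outer:
        outer.watch([x, y])
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([x, y])
            u = u_hat(tf.stack([x, y], axis=1))
        u_x = inner.gradient(u, x)    # first derivatives, one variable each
        u_y = inner.gradient(u, y)
    u_xx = outer.gradient(u_x, x)     # unmixed second derivatives
    u_yy = outer.gradient(u_y, y)
    del inner, outer
    return u_xx + u_yy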

At this point we have already experimented with all the principal options and hyper-parameter choices covered in this work, and we have studied their performance. So, from now on, we will be dropping the comparisons and limit ourselves to simply solving the next models with the best possible set-up based on what we have discussed.

The artificial neural network model trained for (3.11) has achieved a final global loss of L = 1.23 · 10^-3 and a final loss with respect to the solution of L_sol = 4.25 · 10^-6. This model consisted of a [1,40,40,1]-ANN with tanh activations, trained for 6000 epochs (when the early stop triggered), using Adam with η = 0.001, β1 = 0.9, β2 = 0.999, and the custom regularization (2.58) with λ = 0.1. The results can be seen in the following Figure 3.13.

Figure 3.13: Results and performance of the model trained for (3.11).

3.4.4 Model 4: The 1D Advection Operator

For this simulation we step down from the 2D PDE cases to go back to an ODE. The reason for this downgrade is to explain a certain issue occurring for this operator. This issue also happens in the final case of this section, the 2D Burgers operator, and since the advection operator we are proposing coincides with the Burgers operator in 1D, we see this as a much simpler example with which to introduce the discussion.

The initial value problem we want to consider is:

$$
u(x) \cdot \nabla_{(x)} \cdot u(x) = u(x)\frac{\partial u(x)}{\partial x} = 2x^3 - 3x^2 + x, \qquad u(0) = 0,
\tag{3.13}
$$

which has exact solution u(x) = x² − x, the same as in the 1D divergence case. This problem looks like it falls under the Cauchy-Kovalevskaya conditions, so existence and uniqueness should be guaranteed. However, there is a subtlety hidden here. If we write the equation in its canonical form (isolating the highest derivative), which is required to apply the Cauchy-Kovalevskaya theorem,

$$
\frac{\partial u(x)}{\partial x} = \frac{2x^3 - 3x^2 + x}{u(x)},
\tag{3.14}
$$

we note that the equation is quasi-linear and its terms are analytic everywhere except at the zeroes of u(x). Hence we have local existence and uniqueness almost everywhere, but since they can fail at some points, we cannot build a unique global solution using the theorem. This can be verified easily in this case, as the differential equation is separable and can be solved by the separation of variables method:

$$
\int_0^x u(x)\frac{\partial u(x)}{\partial x}\, dx = \int_0^x 2x^3 - 3x^2 + x\, dx,
$$
$$
\frac{1}{2}\big(u(x)\big)^2\Big|_0^x = \frac{1}{2}x^4 - x^3 + \frac{1}{2}x^2\Big|_0^x,
$$
$$
\frac{1}{2}\big(u(x)\big)^2 - 0 = \frac{1}{2}x^4 - x^3 + \frac{1}{2}x^2 - 0,
\tag{3.15}
$$
$$
u(x) = \pm\sqrt{x^4 - 2x^3 + x^2} = \pm(x^2 - x).
$$

Looking at Figure 3.14 we observe that the solutions intersect (hence, are not unique) at the roots of u(x).

Figure 3.14: Positive and negative sign solutions of (3.13).

To fix this issue and single out a solution, it is enough to provide information about one extra derivative, of one order more than required by the Cauchy conditions. Therefore, the well-posed problem that we will consider will be:

$$
u(x) \cdot \nabla_{(x)} \cdot u(x) = u(x)\frac{\partial u(x)}{\partial x} = 2x^3 - 3x^2 + x, \qquad u(0) = 0, \quad u'(0) = -1.
\tag{3.16}
$$
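In the loss function, the extra condition of (3.16) simply contributes one more squared term at x = 0 (a sketch with the same illustrative u_hat as before):

with tf.GradientTape() as tape:
    x0 = tf.zeros([1, 1])
    tape.watch(x0)
    u0 = u_hat(x0)
du0 = tape.gradient(u0, x0)
# u(0) = 0 and u'(0) = -1 together pin down the negative sign solution.
L2 = tf.square(u0[0, 0] - 0.) + tf.square(du0[0, 0] - (-1.))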

The artificial neural network model trained for (3.16) has achieved a final global loss of L = 1.05 · 10^-3 and a final loss with respect to the solution of L_sol = 2.74 · 10^-7. This model consisted of a [1,20,20,1]-ANN with sigmoid activations, trained for 3000 epochs using Adam with η = 0.01, β1 = 0.9, β2 = 0.999, and the custom regularization (2.58) with λ = 0.1. The results can be seen in the following Figure 3.15.

Figure 3.15: Results and performance of the model trained for (3.16).

3.4.5 Model 5: The 2D Clairaut Operator

The Clairaut operator can be seen as a 2D upgrade of the advection case. It may not be much more complicated than what we have seen before, but it is the first PDE with non-constant coefficients that we integrate in this work. We pose its boundary problem as:

$$
(x,y) \cdot \nabla_{(x,y)} u(x,y) = x\frac{\partial u(x,y)}{\partial x} + y\frac{\partial u(x,y)}{\partial y} = x\cdot(2x-1)\cdot(y^2-y) + (x^2-x)\cdot y\cdot(2y-1),
$$
$$
u(x,0) = 0, \quad u(x,1) = 0, \quad x \in (0,1),
$$
$$
u(0,y) = 0, \quad u(1,y) = 0, \quad y \in (0,1),
\tag{3.17}
$$

with solution u(x, y) = (x² − x) · (y² − y), as always.

The artificial neural network model trained for (3.17) has achieved a final global loss of L = 3.04 · 10^-6 and a final loss with respect to the solution of L_sol = 3.96 · 10^-6. This model consisted of a [1,40,40,1]-ANN with tanh activations, trained for 8000 epochs, using Adam with η = 0.001, β1 = 0.9, β2 = 0.999, and the custom regularization (2.58) with λ = 0.1. The results can be seen in the following Figure 3.16.

Figure 3.16: Results and performance of the model trained for (3.17).

3.4.6 Model 6: The 2D Burgers Operator

Finally, we will numerically integrate the last, and most complex, boundary problem of this work. This is the 2D Burgers operator, and it can be regarded as the multi-dimensional case of the advection operator. While the advection operator is applied to scalar functions, the Burgers operator is applied to vector fields.

The boundary value problem requires, as explained in the 1D advection case, information on one more derivative than in the Cauchy conditions, and thus will be defined by:

$$
\mathbf{u}(x,y) \cdot \nabla_{(x,y)}\mathbf{u}(x,y) = \big(u_x(x,y),\ u_y(x,y)\big)\begin{pmatrix} \dfrac{\partial u_x(x,y)}{\partial x} & \dfrac{\partial u_y(x,y)}{\partial x} \\[2mm] \dfrac{\partial u_x(x,y)}{\partial y} & \dfrac{\partial u_y(x,y)}{\partial y} \end{pmatrix} = \begin{pmatrix} (x^2-x)(y^2-y)\big[(2x-1)(y^2-y) + (x^2-x)(2y-1)\big] \\ (x^2-x)(y^2-y)\big[(2x-1)(y^2-y) + (x^2-x)(2y-1)\big] \end{pmatrix},
$$

$$
\mathbf{u}(\Gamma) = \mathbf{g}_1(\Gamma):\ \begin{cases} u_x(x,0) = u_y(x,0) = 0, \quad u_x(x,1) = u_y(x,1) = 0, & x \in (0,1),\\ u_x(0,y) = u_y(0,y) = 0, \quad u_x(1,y) = u_y(1,y) = 0, & y \in (0,1), \end{cases}
$$

$$
\frac{\partial \mathbf{u}(\Gamma)}{\partial(x,y)} \cdot \mathbf{n}(\Gamma) = \mathbf{g}_2(\Gamma):\ \begin{cases} \dfrac{\partial u_x(x,0)}{\partial y} = -(x^2-x), \quad \dfrac{\partial u_x(x,1)}{\partial y} = (x^2-x), & x \in (0,1),\\[2mm] \dfrac{\partial u_x(0,y)}{\partial x} = -(y^2-y), \quad \dfrac{\partial u_x(1,y)}{\partial x} = (y^2-y), & y \in (0,1),\\[2mm] \dfrac{\partial u_y(x,0)}{\partial y} = -(x^2-x), \quad \dfrac{\partial u_y(x,1)}{\partial y} = (x^2-x), & x \in (0,1),\\[2mm] \dfrac{\partial u_y(0,y)}{\partial x} = -(y^2-y), \quad \dfrac{\partial u_y(1,y)}{\partial x} = (y^2-y), & y \in (0,1), \end{cases}
\tag{3.18}
$$

with the vector field u(x, y) = ((x² − x) · (y² − y), (x² − x) · (y² − y)) as its solution.

We have called the differential operator in (3.18) the Burgers operator (although it is not the real Burgers operator) because the most prominent place where we can find this operator is the inviscid Burgers equation, where it appears along with an extra ∂/∂t term. Hence, we can think of (3.18) as the system that solves for the steady-state solutions (u_t = 0) of an inviscid Burgers equation.

All the attempts to obtain the two components of the solution as an artificial neural network with 2 outputs resulted in failure. The models always seemed to get stuck in very bad local minima and the process was slow. We assume that the reason for this lies in the fully-connected nature of the architecture we are using, since we were trying to adjust 2 functions on the same parameters (despite them being the same in this case). Hence, we have taken the much more efficient approach of approximating each component of the solution with a separate neural network, and obtained much better results. We will only show the results for the first component, since the operator and external force are symmetric, and therefore the behaviours of both components are almost the same.
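A sketch of this component-wise set-up, where make_ann, f_x, f_y and the training loop are illustrative assumptions rather than the code of Appendix B: each component gets its own single-output network, and the Burgers residual is evaluated with both.

net_ux = make_ann(input_dim=2, hidden=[40, 40], activation='tanh')
net_uy = make_ann(input_dim=2, hidden=[40, 40], activation='tanh')

def burgers_residual(xy):
    # u . grad(u) - f, with each component given by its own network.
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xy)
        ux, uy = net_ux(xy), net_uy(xy)
    dux = tape.gradient(ux, xy)   # columns: d(u_x)/dx, d(u_x)/dy
    duy = tape.gradient(uy, xy)
    del tape
    rx = ux * dux[:, :1] + uy * dux[:, 1:] - f_x(xy)   # assumed external forces
    ry = ux * duy[:, :1] + uy * duy[:, 1:] - f_y(xy)
    return rx, ry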

The artificial neural network model trained for (3.18) has achieved a final global loss of L = 6.21 · 10^-6 and a final loss with respect to the solution of L_sol = 8.75 · 10^-5. This model consisted of a [1,40,40,1]-ANN with tanh activations, trained for 8000 epochs, using Adam with η = 0.001, β1 = 0.9, β2 = 0.999, and the custom regularization (2.58) with λ = 0.1. The results can be seen in the following Figure 3.17.

Figure 3.17: Results and performance of the model trained for (3.18).

Chapter 4 Conclusions

Regarding the method, in this work we have covered, theoretically and in practice, all the elements necessary to build a functional artificial neural network method to solve initial/boundary problems. Based on what we have seen as a whole, we can conclude that this approach offers a solid option for integrating PDEs. Summarizing all of the pros and cons discussed so far:

• Pros:

– Mesh-free algorithm: we do not need a mesh; a less costly random collocation of points is enough.
– Higher dimensions of the same operator do not increase much the number of iterations necessary.
– Good interpolation capacities for points outside of the collocation.

• Cons:

– Unknown error behaviour: we do not know the precision we are getting and cannot use an objective stopping criterion.
– Unknown stability: we are subject to solving a non-convex optimization, which means that we can run into many potential traps, such as local minima and valleys, yielding bad results.
– Costly derivative evaluation: we are required to evaluate the derivatives of the artificial neural network at each step of the training; thus the cost of an increase in complexity in the equations, by having many different derivatives, accumulates exponentially through the training.

The first and the last cons can be somewhat mitigated. For the last, we can try to use synthetic derivatives (i.e. draw them from a distribution or an approximation). This idea is not foreign, as we have already seen in this work how the second order quasi-Newton methods (BFGS and L-BFGS) and truncated Newton methods (Hessian-free) use approximations of the Hessian to avoid computing the whole matrix.

In the case of the error, looking back at the simulations we can observe that, in the cases where we took the same operator in different dimensions, like with the divergence operator, the relation between the objective loss L and the exact solution loss L_sol behaved in a very similar way. This means that perhaps we could roughly use the results of a smaller problem in 1D to approximate the relation between L and L_sol.

To sum up, we would recommend this kind of method for dealing with simple differential equations (without many different types of derivatives) in high dimensions and irregular domains. The strengths and weaknesses are close to those of Monte Carlo methods, so with further research it could become an alternative to those methods.

4.1 Author’s Final Thoughts

This work contains a wide coverage of the topics general to the subject of deep learning: auto-differentiation, activation functions, optimizers, regularizations, problems like the vanishing and exploding gradient..., all of them applicable to contexts other than this project. The perspective of artificial neural networks through the view of tensors has also been very helpful.

As well as this, there has been an important part of programming involved, using TensorFlow, and many bits of practical knowledge were gained through the experience of the many simulations. By experimenting on the cases we were able to identify problems like the vanishing and exploding gradient with respect to the inputs, and to offer solutions such as the custom regularization (2.58) proposed in this work, and the mechanism (3.4) to balance the gradients.

Finally, from a personal standpoint, I am really satisfied with this work, since I have learned many things about differential equations and artificial neural networks, and I believe I have gained much practical expertise with which to understand and work confidently in those fields.

4.2 Further Work

As content for further work, it would be interesting to study the effect of different architectures, and the relations between the objective loss and the solution loss, in order to obtain a measure of the real error. An interesting architecture for PDEs could be the convolutional one, since it is known to account for the spatial relations of its inputs.

Appendix A Linear Algebra Formulation of 2.3.1

Here we will describe the elements and derivatives of the example model in Figure 2.6 using linear algebra notation, for clarity purposes. We will limit ourselves to tensors for which the linear algebra representation is practical, i.e. up to (1,1)-tensors, or vectors and matrices.

First, we will start by writing the variables and parameters (2.11) of the artificial neural network in linear algebra form: 푦[2] 푦[1] 1 1 ⎜⎛ [2] ⎟⎞ [3] [ℓ] ⎛ [1] ⎞ ⎜푦 2⎟ 푦 1 푦 푛 ≡ ⎜푦 ⎟ , ⎜ ⎟ ,⟮ ⟯, ℓ ⎜ 2⎟ ⎜ [2] ⎟ [3] [3] ⎜푦 3⎟ 푦 2 ⎝푦 3⎠ [2] ⎝푦 4⎠

푧[2] 푧[1] 1 1 ⎜⎛ [2] ⎟⎞ [3] [ℓ] ⎛ [1] ⎞ ⎜푧 2⎟ 푧 1 푧 푛 ≡ ⎜푧 ⎟ , ⎜ ⎟ ,⟮ ⟯, ℓ ⎜ 2⎟ ⎜ [2] ⎟ [3] [3] ⎜푧 3⎟ 푧 2 ⎝푧 3⎠ [2] ⎝푧 4⎠ (A.1) 푤[2]1 푤[2]2 푤[2]3 푤[1]1 푤[1]2 1 1 1 1 1 ⎜⎛ [2]1 [2]2 [2]3⎟⎞ [3]1 [3]2 [3]3 [3]4 [ℓ] 푚ℓ ⎛ [1]1 [1]2⎞ ⎜푤 2 푤 2 푤 2⎟ 푤 1 푤 1 푤 1 푤 1 푤 푛 ≡ ⎜푤 푤 ⎟ , ⎜ ⎟ ,⟮ ⟯, ℓ ⎜ 2 2⎟ ⎜ [2]1 [2]2 [2]3⎟ [3]1 [3]2 [3]3 [3]4 [1]1 [1]2 ⎜푤 3 푤 3 푤 3⎟ 푤 2 푤 2 푤 2 푤 2 ⎝푤 3 푤 3⎠ [2]1 [2]2 [2]3 ⎝푤 4 푤 4 푤 4⎠

푏[2] 푏[1] 1 1 ⎜⎛ [2] ⎟⎞ [3] [ℓ] ⎛ [1] ⎞ ⎜푏 2⎟ 푏 1 푏 푛 ≡ ⎜푏 ⎟ , ⎜ ⎟ ,⟮ ⟯, ℓ ⎜ 2⎟ ⎜ [2] ⎟ [3] [3] ⎜푏 3⎟ 푏 2 ⎝푏 3⎠ [2] ⎝푏 4⎠

The operations in each neuron (2.12), without Einstein notation, are:

$$
z^{[1]}_{n_1} = b^{[1]}_{n_1} + \sum_{k=1}^{2} w^{[1]k}_{n_1}\, x_k, \qquad y^{[1]}_{n_1} = a^{[1]}_{n_1}\!\big(z^{[1]}_{n_1}\big),
$$
$$
z^{[2]}_{n_2} = b^{[2]}_{n_2} + \sum_{k=1}^{3} w^{[2]k}_{n_2}\, y^{[1]}_k, \qquad y^{[2]}_{n_2} = a^{[2]}_{n_2}\!\big(z^{[2]}_{n_2}\big),
\tag{A.2}
$$
$$
z^{[3]}_{n_3} = b^{[3]}_{n_3} + \sum_{k=1}^{4} w^{[3]k}_{n_3}\, y^{[2]}_k, \qquad u_{n_3} = y^{[3]}_{n_3} = z^{[3]}_{n_3}.
$$

The total composed output (2.14) of the artificial neural network, without Einstein notation, is:

$$
u_{n_3} = b^{[3]}_{n_3} + \sum_{k_1=1}^{4} w^{[3]k_1}_{n_3}\, a^{[2]}_{k_1}\!\left(b^{[2]}_{k_1} + \sum_{k_2=1}^{3} w^{[2]k_2}_{k_1}\, a^{[1]}_{k_2}\!\left(b^{[1]}_{k_2} + \sum_{k_3=1}^{2} w^{[1]k_3}_{k_2}\, x_{k_3}\right)\right)
\tag{A.3}
$$

An example of the chain rule would be the following (where underneath we have specified the dimensions of the resulting matrices):

$$
\frac{\partial u_{n_3}}{\partial x_{n_0'}} = \frac{\partial y^{[3]}_{n_3}}{\partial z^{[0]}_{n_0'}} = \underbrace{\frac{\partial y^{[3]}_{n_3}}{\partial z^{[3]}_{n_3}}}_{2\times 2} \cdot \underbrace{\frac{\partial z^{[3]}_{n_3}}{\partial y^{[2]}_{n_2}}}_{2\times 4} \cdot \underbrace{\frac{\partial y^{[2]}_{n_2}}{\partial z^{[2]}_{n_2}}}_{4\times 4} \cdot \underbrace{\frac{\partial z^{[2]}_{n_2}}{\partial y^{[1]}_{n_1}}}_{4\times 3} \cdot \underbrace{\frac{\partial y^{[1]}_{n_1}}{\partial z^{[1]}_{n_1}}}_{3\times 3} \cdot \underbrace{\frac{\partial z^{[1]}_{n_1}}{\partial y^{[0]}_{n_0}}}_{3\times 2} \cdot \underbrace{\frac{\partial y^{[0]}_{n_0}}{\partial z^{[0]}_{n_0'}}}_{2\times 2}
\tag{A.4}
$$

Finally, the partial derivatives used in the chain rules for (2.15-2.18), in their matrix form, look like:

$$
\frac{\partial y^{[3]}_{n_3'}}{\partial z^{[3]}_{n_3}} = \begin{pmatrix} \dfrac{\partial y^{[3]}_1}{\partial z^{[3]}_1} & \dfrac{\partial y^{[3]}_1}{\partial z^{[3]}_2} \\[2mm] \dfrac{\partial y^{[3]}_2}{\partial z^{[3]}_1} & \dfrac{\partial y^{[3]}_2}{\partial z^{[3]}_2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\tag{A.5}
$$

$$
\frac{\partial y^{[2]}_{n_2}}{\partial z^{[2]}_{n_2}} = \begin{pmatrix} \frac{\partial y^{[2]}_1}{\partial z^{[2]}_1} & \cdots & \frac{\partial y^{[2]}_1}{\partial z^{[2]}_4} \\ \vdots & \ddots & \vdots \\ \frac{\partial y^{[2]}_4}{\partial z^{[2]}_1} & \cdots & \frac{\partial y^{[2]}_4}{\partial z^{[2]}_4} \end{pmatrix} = \begin{pmatrix} \frac{\partial a^{[2]}_1(z^{[2]}_1)}{\partial z^{[2]}_1} & 0 & 0 & 0 \\ 0 & \frac{\partial a^{[2]}_2(z^{[2]}_2)}{\partial z^{[2]}_2} & 0 & 0 \\ 0 & 0 & \frac{\partial a^{[2]}_3(z^{[2]}_3)}{\partial z^{[2]}_3} & 0 \\ 0 & 0 & 0 & \frac{\partial a^{[2]}_4(z^{[2]}_4)}{\partial z^{[2]}_4} \end{pmatrix}
\tag{A.6}
$$

$$
\frac{\partial y^{[1]}_{n_1}}{\partial z^{[1]}_{n_1}} = \begin{pmatrix} \frac{\partial y^{[1]}_1}{\partial z^{[1]}_1} & \frac{\partial y^{[1]}_1}{\partial z^{[1]}_2} & \frac{\partial y^{[1]}_1}{\partial z^{[1]}_3} \\ \frac{\partial y^{[1]}_2}{\partial z^{[1]}_1} & \frac{\partial y^{[1]}_2}{\partial z^{[1]}_2} & \frac{\partial y^{[1]}_2}{\partial z^{[1]}_3} \\ \frac{\partial y^{[1]}_3}{\partial z^{[1]}_1} & \frac{\partial y^{[1]}_3}{\partial z^{[1]}_2} & \frac{\partial y^{[1]}_3}{\partial z^{[1]}_3} \end{pmatrix} = \begin{pmatrix} \frac{\partial a^{[1]}_1(z^{[1]}_1)}{\partial z^{[1]}_1} & 0 & 0 \\ 0 & \frac{\partial a^{[1]}_2(z^{[1]}_2)}{\partial z^{[1]}_2} & 0 \\ 0 & 0 & \frac{\partial a^{[1]}_3(z^{[1]}_3)}{\partial z^{[1]}_3} \end{pmatrix}
\tag{A.7}
$$

$$
\frac{\partial y^{[0]}_{n_0}}{\partial z^{[0]}_{n_0'}} = \begin{pmatrix} \dfrac{\partial y^{[0]}_1}{\partial z^{[0]}_1} & \dfrac{\partial y^{[0]}_1}{\partial z^{[0]}_2} \\[2mm] \dfrac{\partial y^{[0]}_2}{\partial z^{[0]}_1} & \dfrac{\partial y^{[0]}_2}{\partial z^{[0]}_2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\tag{A.8}
$$

The derivatives of $z^{[\ell]}_{n_\ell}$ with respect to $y^{[\ell-1]}_{n_{\ell-1}}$ are simply the weight matrices in (A.1).

Appendix B The Code

As already introduced in section 3.1, the code has been implemented in Python, using TensorFlow 2.3. The code was implemented in a Google Colab notebook, hence each class was encapsulated in a cell. Next, there is a brief, simplified description of what each cell/class contains:

– imports Cell: Imports the main libraries, which include TensorFlow for tensor manipulation, Time to get the time stamp, Pickle to save the models, and MatPlotLib to plot the models, among many others. It also suppresses warnings.

– auxiliryPlotting Class: Encapsulates the methods for plotting results. It contains functions to: plot the distribution of dataset collocation points, plot the output of the model along the exact solution, plot the loss vs epoch graph of the training of the model, plot the errors of individual points in the training set, and plot the loss vs epoch of multiple models in the same graph.

– myDataSets Class: Used to create instances of myDataSets. Each of these instances mainly generates, for different options, and contains, the collocation of points for the training and validation sets.

– problemInstance Class: Encapsulates the methods for the specifics of the diverse instances of the initial/boundary problems. It contains functions that, given the training or validation set, and the artificial neural network output and derivatives, return the values of: the differential operator, the external force, the initial/boundary conditions lhs and rhs, and the exact solution of the problem.

– secondOrderOptimizers Class: Used to create instances of secondOrderOptimizers, implementing the BFGS and L-BFGS optimizers. Since Keras only contains first order optimizers, this custom class uses the implementation in the tensorflow_probability library, which is generic, and adapts it to input artificial neural network models.

– myLayer Class: Overrides the keras.Layer class and is used to create object instances of myLayer. These objects contain the parameters and composition of the neurons in an artificial neural network layer, and the feed method, which processes an input to obtain the corresponding layer output.

– myModel Class: Overrides the keras.Model class to create instances of myModel, which implements the artificial neural network models. These objects are based on collections of myLayer instances, and contain either a first order optimizer instance from Keras, or a second order optimizer instance from secondOrderOptimizers, which can be accessed and changed at any moment. Through the methods in these objects, and given a myDataSets instance, one can obtain the model output, or train the model for a problem set-up, which calls on problemInstance for its specifics. Historical information about the loss performance during training is stored in the object. Also, there are methods to save and load models in *.pickle files, for later use.

– execution Cell: These are the snippets of code that call on the previous classes to perform the experiments. One of these calls usually consists of: a call to instance a myDataSets and a myModel, with some options; a call to the fit method in myModel, to train the artificial neural network; a call to one of the auxiliryPlotting methods to plot the results; and, optionally, saving the model. B.8 has an example showing in comments all of the variations that can be used.

There are more functionalities implemented throughout these classes. For more details, read the comments through the code. (The code font has been reduced to preserve indentation).

B.1 imports Cell

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 import math 7 from math import log 8 import numpy as np 9 10 import time 11 import matplotlib.pyplot as plt 12 from pylab import rcParams 13 from mpl_toolkits.mplot3d import Axes3D 14 15 import pickle 16 from google.colab import files # Only for the colab environment. 17 18 import tensorflow as tf 19 import tensorflow_probability as tfp 20 from tensorflow import keras 21 from tensorflow.keras import layers 22 23 import logging , os 24 25 # Supress Warnings. 26 27 logging.disable(logging.WARNING) 28 os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3" 29 30 # Optional code to check if there is a GPU available. 31 32 #%tensorflow_version 2.x 33 #device_name = tf.test.gpu_device_name() 34 #if device_name != '/device:GPU:0': 35 # raise SystemError('GPU device not found ') 36 #print('Found GPU at: {}'.format(device_name)) B.2 auxiliryPlotting Class

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 class auxiliryPlotting: 7 8 #################### 9 # Plots the generated sets (Only 2D). 10 #################### 11 def plot_datasets (data_set): 12 13 %matplotlib inline 14 training_set, border_training_set, validation_set = data_set.get_sets() 15 _ , _, _, input_dim, output_dim = data_set.get_set_dimensions() 16 17 if (input_dim == 2): 18 plt.scatter(training_set[:,0], training_set[:,1], s=0.1) 19 plt.title('Training Set') 20 plt.show() 21 22 plt.scatter(border_training_set[0][:,0], border_training_set[0][:,1], s=0.1) 23 plt.title('Border Training Set') 24 plt.show() 25 26 plt.scatter(validation_set[:,0], validation_set[:,1], s=0.1) 27 plt.title('Validation Set') 28 plt.show() 29 else: 30 print('Invalid dimensions for plot.')

    ####################
    # Plot loss (Only for training set).
    ####################
    def plot_loss_function(model,
                           init_range = 0,
                           end_range = -1,
                           subdivide_losses = False,
                           use_log_scale = False):

        %matplotlib inline

        # Sets the x range of the plot.
        plot_real_sol_loss = True
        if (end_range < 0):
            end_range = len(model._losses)

        # Plots the real loss.
        min_loss = min(model._losses[init_range:end_range])
        max_loss = max(model._losses[init_range:end_range])

        if (use_log_scale == True):
            min_loss = min(loss for loss in model._losses[init_range:end_range] if loss > 0)
            plt.plot(range(init_range, end_range),
                     [log(y,10) if y != 0 else None
                      for y in model._losses[init_range:end_range]],
                     label='Loss')
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title('Loss (log) - Epoch')
        else:
            plt.plot(range(init_range, end_range),
                     model._losses[init_range:end_range],
                     label='Loss')
            plt.ylim(min_loss, max_loss)
            plt.title('Loss - Epoch')

        plt.xlim(init_range, end_range)
        plt.legend()
        plt.show()
        print('Minimum Loss at:', str(min_loss))

        # Plots the loss wrt the real solution.
        min_loss_wrt_solution = min(model._losses_solution[init_range:end_range])
        max_loss_wrt_solution = max(model._losses_solution[init_range:end_range])

        if (use_log_scale == True):
            min_loss_wrt_solution = min(loss for loss in model._losses_solution[init_range:end_range] if loss > 0)
            plt.plot(range(init_range, end_range),
                     [log(y,10) if y != 0 else None
                      for y in model._losses_solution[init_range:end_range]],
                     label='Loss wrt Solution')
            plt.ylim(log(min_loss_wrt_solution,10), log(max_loss_wrt_solution,10))
            plt.title('Loss wrt Exact Sol (log) - Epoch')
        else:
            plt.plot(range(init_range, end_range),
                     model._losses_solution[init_range:end_range],
                     label='Loss')
            plt.ylim(min_loss_wrt_solution, max_loss_wrt_solution)
            plt.title('Loss wrt Exact Sol - Epoch')
        plt.xlim(init_range, end_range)
        plt.legend()
        plt.show()
        print('Minimum Loss wrt Solution at:', str(min_loss_wrt_solution))

        # Plots the subdivision of the loss by its components.
        if (subdivide_losses == True):

            # Domain Component
            min_domain_loss = min(model._losses_domain[init_range:end_range])
            max_domain_loss = max(model._losses_domain[init_range:end_range])

            if (use_log_scale == True):
                min_domain_loss = min(loss for loss in model._losses_domain[init_range:end_range] if loss > 0)
                plt.plot(range(init_range, end_range),
                         [log(y,10) if y != 0 else None
                          for y in model._losses_domain[init_range:end_range]],
                         label='Loss')
                plt.ylim(log(min_domain_loss,10), log(max_domain_loss,10))
                plt.title('Domain Loss (log) - Epoch')
            else:
                plt.plot(range(init_range, end_range),
                         model._losses_domain[init_range:end_range],
                         label='Loss')
                plt.ylim(min_domain_loss, max_domain_loss)
                plt.title('Domain Loss - Epoch')

            plt.xlim(init_range, end_range)
            plt.legend()
            plt.show()
            print('Minimum Domain Loss at:', str(min_domain_loss))

            # Border Component
            min_border_loss = min(model._losses_border[init_range:end_range])
            max_border_loss = max(model._losses_border[init_range:end_range])

            if (use_log_scale == True):
                min_border_loss = min(loss for loss in model._losses_border[init_range:end_range] if loss > 0)
                plt.plot(range(init_range, end_range),
                         [log(y,10) if y != 0 else None
                          for y in model._losses_border[init_range:end_range]],
                         label='Loss')
                plt.ylim(log(min_border_loss,10), log(max_border_loss,10))

                plt.title('Border Loss (log) - Epoch')
            else:
                plt.plot(range(init_range, end_range),
                         model._losses_border[init_range:end_range],
                         label='Loss')
                plt.ylim(min_border_loss, max_border_loss)
                plt.title('Border Loss - Epoch')

            plt.xlim(init_range, end_range)
            plt.legend()
            plt.show()
            print('Minimum Border Loss at:', str(min_border_loss))

            # Regularization Component
            if (model._regularization != None):
                min_reg_loss = min(model._losses_regularization[init_range:end_range])
                max_reg_loss = max(model._losses_regularization[init_range:end_range])

                if (use_log_scale == True):
                    min_reg_loss = min(loss for loss in model._losses_regularization[init_range:end_range] if loss > 0)
                    plt.plot(range(init_range, end_range),
                             [log(y,10) if y != 0 else None
                              for y in model._losses_regularization[init_range:end_range]],
                             label='Loss')
                    plt.ylim(log(min_reg_loss,10), log(max_reg_loss,10))
                    plt.title('Regularization Loss (log) - Epoch')
                else:
                    plt.plot(range(init_range, end_range),
                             model._losses_regularization[init_range:end_range],
                             label='Loss')
                    plt.ylim(min_reg_loss, max_reg_loss)
                    plt.title('Regularization Loss - Epoch')
                plt.xlim(init_range, end_range)
                plt.legend()
                plt.show()
                print('Minimum Regularization Loss at:', str(min_reg_loss))

    ####################
    # Plots the model (Uses the validation set).
    ####################
    def plot_model(data_set,
                   model,
                   plot_real_sol = False):

        if (model._input_dim == 1):
            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 1,
                                                           output_dim = 1)
                plt.scatter(data_set._validation_set, outputs, s=0.1, label='Model')
                plt.scatter(data_set._validation_set, exact_sol, s=0.1, label='Exact Solution')
                plt.xlabel('x')
                plt.ylabel(r'$\hat{u}(x)$')
                plt.legend()
                plt.title('Model')
            else:
                print('Invalid dimensions for plot.')

        elif (model._input_dim == 2):
            x = data_set._validation_set[:,0]
            y = data_set._validation_set[:,1]

            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 1)
                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [8,8]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u(x,y)')
                ax.scatter3D(x, y, outputs, cmap='Greens', s=1, label='Model')
                fig = plt.figure()

                ax.scatter3D(x, y, exact_sol, cmap='Greens', s=1, label='Exact Solution')
                fig = plt.figure()
                ax.set_title('Model vs Exact Solution')
                ax.legend()
                # Optional
                ax.view_init(20, 230)

            elif (model._output_dim == 2):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 2)
                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))

                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_x(x,y)')
                ax.scatter3D(x, y, outputs[:,0], cmap='Greens', s=0.2)
                fig = plt.figure()

                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Exact Solution')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_x(x,y)')
                ax.scatter3D(x, y, exact_sol[:,0], cmap='Greens', s=0.2)
                fig = plt.figure()
                plt.legend()

                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_y(x,y)')
                ax.scatter3D(x, y, outputs[:,1], cmap='Greens', s=0.2)
                fig = plt.figure()

                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Exact Solution')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_y(x,y)')
                ax.scatter3D(x, y, exact_sol[:,1], cmap='Greens', s=0.2)
                fig = plt.figure()
                plt.legend()

            else:
                print('Invalid dimensions for plot.')

    ####################
    # Plots the squared error (Prototype).
    ####################
    def plot_error(data_set,
                   model):

        if (model._input_dim == 1):
            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 1,
                                                           output_dim = 1)
                plt.scatter(data_set._validation_set, tf.square(outputs-exact_sol), s=0.1, label='Square Error')
                plt.legend()
                plt.title('Model')
            else:
                print('Invalid dimensions for plot.')

        elif (model._input_dim == 2):
            x = data_set._validation_set[:,0]
            y = data_set._validation_set[:,1]

            if (model._output_dim == 1):
                # Loss wrt to the operator and force
                domain_ind_loss = tf.reduce_sum(
                    tf.square(
                        problemInstance.differential_operator(
                            inputs = data_set._training_set,
                            outputs = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[0],
                            outputs_derivatives = model.predict(data_set._training_set,
                                                                model._required_derivative_order)[1],
                            differential_operator = model._differential_operator,
                            input_dim = 2,
                            output_dim = 1)
                        - problemInstance.external_force(inputs = data_set._training_set,
                                                         external_force = model._external_force,
                                                         input_dim = 2,
                                                         output_dim = 1)),
                    axis = 1,
                    keepdims = True)

                #border_ind_loss = tf.reduce_sum(
                #    tf.square(data_set._border_training_set[1]
                #              - model.predict(data_set._border_training_set[0])[0],
                #              model._required_derivative_order),
                #    axis = 1,
                #    keepdims = True)

                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_xlim(0, 1)

                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel(r'$L_{1}(x;w,b)$')
                ax.scatter3D(#tf.concat([data_set._training_set[:,0], data_set._border_training_set[0][:,0]], axis=0),
                             #tf.concat([data_set._training_set[:,1], data_set._border_training_set[0][:,1]], axis=0),
                             #tf.square(tf.concat([domain_ind_loss, border_ind_loss], axis=0)),
                             tf.concat([data_set._training_set[:,0]], axis=0),
                             tf.concat([data_set._training_set[:,1]], axis=0),
                             tf.square(tf.concat([domain_ind_loss], axis=0)),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title(r'MSE for the individual domain points: $||\mathcal{L}[\hat{u}(x,y)]-f(x,y)||^{2}_{2}$')
                # Optional
                ax.view_init(30, 40)

                # Loss wrt to the real sol.
                #input_set = tf.concat([data_set._training_set[:], data_set._border_training_set[0][:]], axis=0)
                input_set = tf.concat([data_set._training_set[:]], axis=0)
                outputs, _ = model.predict(input_set)
                exact_sol = problemInstance.exact_solution(inputs = input_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 1)
                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(0, 1)
                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('Square Error')
                ax.scatter3D(input_set[:,0],
                             input_set[:,1],
                             tf.square(outputs-exact_sol),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title('Square Error of the Real Sol')

            if (model._output_dim == 2):
                # Loss wrt to the operator and force
                domain_ind_loss = tf.reduce_sum(
                    tf.square(
                        problemInstance.differential_operator(
                            inputs = data_set._training_set,
                            outputs = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[0],
                            outputs_derivatives = model.predict(data_set._training_set,
                                                                model._required_derivative_order)[1],
                            differential_operator = model._differential_operator,
                            input_dim = 2,
                            output_dim = 2)
                        - problemInstance.external_force(inputs = data_set._training_set,
                                                         external_force = model._external_force,
                                                         input_dim = 2,
                                                         output_dim = 2)),
                    axis = 1,
                    keepdims = True)

                # (Fixed: the derivative order argument belongs inside predict.)
                border_ind_loss = tf.reduce_sum(
                    tf.square(data_set._border_training_set[1]
                              - model.predict(data_set._border_training_set[0],
                                              model._required_derivative_order)[0]),
                    axis = 1,
                    keepdims = True)

                # Adjust for the real limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(0, 1)
                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('Square Error')
                ax.scatter3D(tf.concat([data_set._training_set[:,0], data_set._border_training_set[0][:,0]], axis=0),
                             tf.concat([data_set._training_set[:,1], data_set._border_training_set[0][:,1]], axis=0),
                             tf.square(tf.concat([domain_ind_loss, border_ind_loss], axis=0)),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title('Square Error of the Loss Formula')

        else:
            print('Invalid dimensions for plot.')

    def plot_loss_comparison(models,
                             names,
                             title,
                             init_range = 0,
                             end_range = -1,
                             subdivide_losses = False,
                             use_log_scale = False):

        %matplotlib inline
        #rcParams['figure.figsize'] = 15, 5
        rcParams['figure.figsize'] = 20, 4

        # Sets the x range of the plot.
        if (end_range < 0):
            end_range = 0
            for model in models:
                end_range_var = len(model._losses)
                if (end_range < end_range_var):
                    end_range = end_range_var

        # Plots the global loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses)),
                         [log(y,10) if y != 0 else None
                          for y in models[model_ind]._losses[init_range:len(models[model_ind]._losses)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title(r'Global Loss Logarithm, $log_{10}(L(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses)),
                         models[model_ind]._losses[init_range:len(models[model_ind]._losses)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Global Loss, L(w,b) vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the loss wrt the real solution.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_solution[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_solution[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_solution)),
                         [log(y,10) if y != 0 else None
                          for y in models[model_ind]._losses_solution[init_range:len(models[model_ind]._losses_solution)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title(r'Exact Solution Loss Logarithm, $log_{10}(L_{sol}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_solution)),
                         models[model_ind]._losses_solution[init_range:len(models[model_ind]._losses_solution)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Exact Solution Loss, $L_{sol}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the border loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_border[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_border[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_border)),
                         [log(y,10) if y != 0 else None
                          for y in models[model_ind]._losses_border[init_range:len(models[model_ind]._losses_border)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title(r'Initial Condition Loss Logarithm, $log_{10}(L_{2}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_border)),
                         models[model_ind]._losses_border[init_range:len(models[model_ind]._losses_border)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)

            plt.title(r'Initial Condition Loss, $L_{2}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the domain loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_domain[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_domain[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_domain)),
                         [log(y,10) if y != 0 else None
                          for y in models[model_ind]._losses_domain[init_range:len(models[model_ind]._losses_domain)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title(r'Domain Loss Logarithm, $log_{10}(L_{1}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_domain)),
                         models[model_ind]._losses_domain[init_range:len(models[model_ind]._losses_domain)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Domain Loss, $L_{1}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

B.3 myDataSets Class

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 class myDataSets: 7 8 # Initialize myDataSets object. 9 def __init__ (self , 10 training_batch_size = 2000, 11 border_training_batch_size = 20, 12 validation_batch_size = 1000, 13 input_dim = 1, 14 method = 'uniform-hit-collocation', 15 domain = 'hypercube-0-1', 16 border = 'side-x_1-y_0', 17 seed = None): 18 19 self._training_batch_size = training_batch_size 20 self._border_training_batch_size = border_training_batch_size 21 self._validation_batch_size = validation_batch_size 22 self._input_dim = input_dim 23 24 self.method = method 25 self.domain = domain 26 self.border = border 27 28 seed_1 = None 29 seed_2 = None 30 seed_3 = None 31 if (seed != None): 32 seed_1 = seed 33 seed_2 = 2*seed 34 seed_3 = 3*seed 35 36 self._training_set = self.generate_domain_set (training_batch_size, input_dim, 37 method, domain, seed_1) 38 39 self._border_training_set = self.generate_border_set (border_training_batch_size, 40 input_dim, method, border, seed_2) 41 42 self._validation_set = self.generate_domain_set(validation_batch_size, input_dim, 43 method, domain, seed_3) 44 45 # Generates a distribution of points inside the solving domain. 46 def generate_domain_set (self , 47 batch_size = 2000, 48 input_dim = 1, 49 method = 'uniform-hit-collocation', 50 domain = 'hypercube-0-1', 51 seed = None): 52 53 if (seed != None): 54 tf.random.set_seed(seed)

        if (method == 'uniform-hit-collocation'):
            if (domain == 'hypercube-0-1'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=1., dtype=tf.float32)
            elif (domain == 'quarter-hypercube-0-1'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=0.5, dtype=tf.float32)
            elif (domain == 'hypercube-0-2'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=2., dtype=tf.float32)

        return domain_set

    # Generates a distribution of points on the border of the solving domain.
    def generate_border_set(self,
                            batch_size = 2,
                            input_dim = 1,
                            method = 'uniform-hit-collocation',
                            border = 'hypercube-0-1',
                            seed = None):

        if (seed != None):
            tf.random.set_seed(seed)

        if (method == 'uniform-hit-collocation'):
            if (border == 'hypercube-0-1'):
                if (input_dim == 1):
                    x1 = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                    x2 = tf.constant(1., shape=[1, input_dim], dtype=tf.float32)
                    border_set = tf.concat([x1, x2], axis=0)

                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    border_set_1 = tf.stack([x1, y1], axis=1)  # y=0

                    x2 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    y2 = tf.constant(1.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    border_set_2 = tf.stack([x2, y2], axis=1)  # y=1

                    x3 = tf.constant(0.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    y3 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    border_set_3 = tf.stack([x3, y3], axis=1)  # x=0

                    x4 = tf.constant(1.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    y4 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    border_set_4 = tf.stack([x4, y4], axis=1)  # x=1

                    border_set = tf.concat([border_set_1, border_set_2, border_set_3, border_set_4],
                                           axis=0)

            elif (border == 'side-x_1-y_0'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size],
                                           minval=-1.,
                                           maxval=2.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size],
                                     dtype=tf.float32)
                    border_set = tf.stack([x1, y1], axis=1)

            elif (border == 'side-x_1-y_0_expanded'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size],
                                           minval=-1.,
                                           maxval=2.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size],
                                     dtype=tf.float32)
                    border_set = tf.stack([x1, y1], axis=1)

            elif (border == 'two_sides-x_0-y_0'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):

80 157 x1 = tf.random.uniform(shape=[batch_size//2], 158 minval=0., 159 maxval=1., 160 dtype=tf.float32) 161 y1 = tf.constant(0., 162 shape=[batch_size//2], 163 dtype=tf.float32) 164 165 x2 = tf.constant(0., 166 shape=[batch_size//2], 167 dtype=tf.float32) 168 y2 = tf.random.uniform(shape=[batch_size//2], 169 minval=0., 170 maxval=1., 171 dtype=tf.float32) 172 173 border_set_1 = tf.stack([x1, y1], axis=1) # y=0 174 border_set_2 = tf.stack([x2, y2], axis=1) # x=0 175 border_set = tf.concat([border_set_1, border_set_2], 176 axis=0) 177 178 return border_set 179 180 # Returns the sets stored in this object. 181 def get_sets(self): 182 return self._training_set, self._border_training_set, self._validation_set 183 184 # Returns the metadata of the sets stored in this object. 185 def get_set_metadata(self): 186 return self._training_batch_size, self._border_training_batch_size, self._validation_batch_size,\ 187 self._input_dim, self.method , self.domain , self.border 188 189 # Drops the values which have negative loss. 190 def drop_negative_loss (data_set, 191 model): 192 193 #TBD 194 domain_ind_dif = tf.reduce_sum( 195 problemInstance.differential_operator( 196 inputs = data_set._training_set, 197 outputs = model.predict(data_set._training_set, 198 model._required_derivative_order)[0], 199 outputs_derivatives = model.predict(data_set._training_set, 200 model._required_derivative_order)[1], 201 differential_operator = model._differential_operator, 202 input_dim = model._input_dim, 203 output_dim = model._output_dim) 204 - problemInstance.external_force(inputs = data_set._training_set, 205 external_force = model._external_force, 206 input_dim = model._input_dim, 207 output_dim = model._output_dim), 208 axis = 1, 209 keepdims = False) 210 211 #TBD 212 border_ind_dif = tf.reduce_sum( 213 data_set._border_training_set[1] 214 - model.predict(data_set._border_training_set[0],0)[0], 215 axis = 1, 216 keepdims = False) 217 218 # Mask 219 filtered_training_set = tf.boolean_mask(tensor = data_set._training_set, 220 mask = domain_ind_dif > 0, 221 axis = 0) 222 filtered_border_training_set_0 = tf.boolean_mask(tensor = data_set._border_training_set[0], 223 mask = border_ind_dif > 0, 224 axis = 0) 225 filtered_border_training_set_1 = tf.boolean_mask(tensor = data_set._border_training_set[1], 226 mask = border_ind_dif > 0, 227 axis = 0) 228 229 # Replace the Dataset 230 if (filtered_training_set.shape[0] != 0): 231 data_set._training_set = filtered_training_set 232 data_set._training_batch_size = filtered_training_set.shape[0] 233 if (filtered_border_training_set_0.shape[0] != 0): 234 data_set._border_training_set[0] = filtered_border_training_set_0 235 data_set._border_training_set[1] = filtered_border_training_set_1 236 data_set._border_training_batch_size = filtered_border_training_set_0.shape[0] 237 238 # Drops the values which have negative loss. 
239 def drop_best_loss (data_set, 240 model): 241 242 #TBD 243 domain_ind_dif = tf.reduce_sum(tf.square( 244 problemInstance.differential_operator( 245 inputs = data_set._training_set, 246 outputs = model.predict(data_set._training_set, 247 model._required_derivative_order)[0], 248 outputs_derivatives = model.predict(data_set._training_set, 249 model._required_derivative_order)[1], 250 differential_operator = model._differential_operator, 251 input_dim = model._input_dim, 252 output_dim = model._output_dim) 253 - problemInstance.external_force(inputs = data_set._training_set, 254 external_force = model._external_force, 255 input_dim = model._input_dim, 256 output_dim = model._output_dim)), 257 axis = 1, 258 keepdims = False)

81 259 260 #TBD 261 border_ind_dif = tf.reduce_sum(tf.square( 262 data_set._border_training_set[1] 263 - model.predict(data_set._border_training_set[0],0)[0]), 264 axis = 1, 265 keepdims = False) 266 267 #TBD 268 domain_best = tf.math.reduce_mean( 269 domain_ind_dif) 270 271 border_best = tf.math.reduce_mean( 272 border_ind_dif) 273 274 # Mask 275 filtered_training_set = tf.boolean_mask(tensor = data_set._training_set, 276 mask = domain_ind_dif > domain_best, 277 axis = 0) 278 filtered_border_training_set_0 = tf.boolean_mask(tensor = data_set._border_training_set[0], 279 mask = border_ind_dif > border_ind_dif, 280 axis = 0) 281 filtered_border_training_set_1 = tf.boolean_mask(tensor = data_set._border_training_set[1], 282 mask = border_ind_dif > border_ind_dif, 283 axis = 0) 284 285 # Replace the Dataset 286 if (filtered_training_set.shape[0] != 0): 287 data_set._training_set = filtered_training_set 288 data_set._training_batch_size = filtered_training_set.shape[0] 289 if (filtered_border_training_set_0.shape[0] != 0): 290 data_set._border_training_set[0] = filtered_border_training_set_0 291 data_set._border_training_set[1] = filtered_border_training_set_1 292 data_set._border_training_batch_size = filtered_border_training_set_0.shape[0] B.4 problemInstance Class
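As a usage reference, the following is a minimal sketch (not part of the original listings) of how this class is instantiated; the option values are illustrative choices among those implemented above:

# Illustrative: collocation sets for a 2D problem on the unit square,
# sampling the border on all four sides.
data_set = myDataSets(training_batch_size = 2000,
                      border_training_batch_size = 400,
                      validation_batch_size = 1000,
                      input_dim = 2,
                      method = 'uniform-hit-collocation',
                      domain = 'hypercube-0-1',
                      border = 'hypercube-0-1',
                      seed = 42)
training_set, border_training_set, validation_set = data_set.get_sets()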

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 7 class problemInstance(): 8 9 #################### 10 # Validates if the problem instance of a model is implemented. 11 #################### 12 def instance_exists (differential_operator, 13 external_force, 14 exact_solution): 15 16 if (differential_operator == 'Constant'): 17 required_derivative_order = 0 18 elif (differential_operator == 'Divergence'): 19 required_derivative_order = 1 20 elif (differential_operator == 'Advection'): 21 required_derivative_order = 1 22 elif (differential_operator == 'Laplacian'): 23 required_derivative_order = 2 24 elif (differential_operator == 'Clairaut'): 25 required_derivative_order = 1 26 elif (differential_operator == 'Korteweg-deVries'): 27 required_derivative_order = 3 28 else: 29 raise Exception("Invalid differential operator option. Please recompile with a valid name.") 30 31 if (external_force == 'Force_Constant' 32 or external_force == 'Force_Divergence' 33 or external_force == 'Force_Laplacian' 34 or external_force == 'Force_Advection' 35 or external_force == 'Force_Clairaut' 36 or external_force == 'Force_Korteweg-deVries'): 37 pass 38 else: 39 raise Exception("Invalid external force option. Please recompile with a valid name.") 40 41 if (exact_solution == 'Sol_Polynomial_2Deg_1D_1D' 42 or exact_solution == 'Sol_Polynomial_2Deg_2D_1D' 43 or exact_solution == 'Sol_Polynomial_2Deg_2D_2D' 44 or exact_solution == 'Sol_Polynomial_4Deg_1D_1D'): 45 pass 46 else: 47 raise Exception("Invalid exact solution option. Please recompile with a valid name.") 48 49 return required_derivative_order 50 51 #################### 52 # Computes the differential operator value. 53 #################### 54 #@tf.function(experimental_relax_shapes = True) 55 def differential_operator (inputs, 56 outputs , 57 outputs_derivatives, 58 differential_operator, 59 input_dim , 60 output_dim):

82 61 62 #batch_size = inputs.shape[0] 63 # tf.print(batch_size) 64 65 if (differential_operator == 'Constant'): 66 if (input_dim == 1): 67 dif_operator_output = outputs 68 elif (input_dim == 2): 69 dif_operator_output = outputs 70 else: 71 raise Exception("Incompatible dimension (Diff Operator).") 72 73 if (differential_operator == 'Divergence'): 74 if (input_dim == 1): 75 dif_operator_output = outputs_derivatives[0] 76 elif (input_dim == 2): 77 dif_operator_output = tf.reduce_sum(outputs_derivatives[0], axis=1, keepdims=True) 78 else: 79 raise Exception("Incompatible dimension (Diff Operator).") 80 81 if (differential_operator == 'Laplacian'): 82 if (input_dim == 2): 83 dif_operator_output = tf.reduce_sum(outputs_derivatives[1], 84 axis=1, 85 keepdims=True) 86 else: 87 raise Exception("Incompatible dimension (Diff Operator).") 88 89 if (differential_operator == 'Advection'): 90 if (input_dim == 1): 91 dif_operator_output = outputs*outputs_derivatives[0] 92 elif (input_dim == 2): 93 dif_operator_output = tf.linalg.matvec(outputs_derivatives[0], outputs) # transpose_a = True 94 #outputs = tf.reshape(outputs,[batch_size,1,output_dim]) 95 #outputs_derivatives[0] = tf.transpose(outputs_derivatives[0], perm=[0, 2, 1]) 96 #dif_operator_output = tf.matmul(outputs, outputs_derivatives[0]) 97 #dif_operator_output = tf.reshape(dif_operator_output, 98 # [batch_size, output_dim]) 99 else: 100 raise Exception("Incompatible dimension (Diff Operator).") 101 102 if (differential_operator == 'Clairaut'): 103 if (input_dim == 2): 104 dif_operator_output = tf.reduce_sum(inputs*outputs_derivatives[0], 105 axis = 1, 106 keepdims = True) 107 else: 108 raise Exception("Incompatible dimension (Diff Operator).") 109 110 if (differential_operator == 'Korteweg-deVries'): 111 if (input_dim == 1): 112 dif_operator_output = outputs_derivatives[2]-outputs_derivatives[0] 113 else: 114 raise Exception("Incompatible dimension (Diff Operator).") 115 116 return dif_operator_output 117 118 #################### 119 # Computes the external force value. 120 #################### 121 #@tf.function(experimental_relax_shapes = True) 122 def external_force(inputs, 123 external_force, 124 input_dim , 125 output_dim): 126 127 batch_size = inputs.shape[0] 128 # tf.print(batch_size) 129 130 if (external_force == 'Force_Constant'): 131 if (input_dim == 1 and output_dim == 1): 132 x = inputs 133 ext_force_output = x*(x-1) 134 else: 135 raise Exception("Incompatible dimension (External force).") 136 137 if (external_force == 'Force_Divergence'): 138 if (input_dim == 1 and output_dim == 1): 139 x = inputs 140 ext_force_output = 2*x-1 141 elif (input_dim == 2 and output_dim == 1): 142 x = inputs[:,0] 143 y = inputs[:,1] 144 ext_force = (2*x-1)*y*(y-1)+x*(x-1)*(2*y-1) 145 ext_force_output = tf.reshape(ext_force, 146 [batch_size, output_dim]) 147 else: 148 raise Exception("Incompatible dimension (External force).") 149 150 if (external_force == 'Force_Laplacian'): 151 if (input_dim == 2 and output_dim == 1): 152 x = inputs[:,0] 153 y = inputs[:,1] 154 ext_force = 2*y*(y-1)+2*x*(x-1) 155 ext_force_output = tf.reshape(ext_force, 156 [batch_size, output_dim]) 157 else: 158 raise Exception("Incompatible dimension (External force).") 159 160 if (external_force == 'Force_Advection'): 161 if (input_dim == 1 and output_dim == 1): 162 x = inputs

83 163 ext_force_output = x*(x-1)*(2*x-1) 164 elif (input_dim == 2 and output_dim == 2): 165 x = inputs[:,0] 166 y = inputs[:,1] 167 x_ext_force = x*(x-1)*y*(y-1)*((2*x-1)*y*(y-1)+x*(x-1)*(2*y-1)) 168 y_ext_force = x*(x-1)*y*(y-1)*((2*x-1)*y*(y-1)+x*(x-1)*(2*y-1)) 169 ext_force_output = tf.stack([x_ext_force, y_ext_force], axis=1) 170 else: 171 raise Exception("Incompatible dimension (External force).") 172 173 if (external_force == 'Force_Clairaut'): 174 if (input_dim == 2 and output_dim == 1): 175 x = inputs[:,0] 176 y = inputs[:,1] 177 ext_force = x*(2*x-1)*y*(y-1)+y*x*(x-1)*(2*y-1) 178 ext_force_output = tf.reshape(ext_force, 179 [batch_size, output_dim]) 180 else: 181 raise Exception("Incompatible dimension (External force).") 182 183 if (external_force == 'Force_Korteweg-deVries'): 184 if (input_dim == 1 and output_dim == 1): 185 x = inputs 186 ext_force_output = -4*x*x*x+3*x*x+32*x-10 187 else: 188 raise Exception("Incompatible dimension (External force).") 189 190 return ext_force_output 191 192 #################### 193 # Computes the rhs border functions of the boundary conditions. 194 # Returns list with [border_inputs, rhs_function_1, rhs_funtion_2, ...] 195 #################### 196 def border_data_prep(border_inputs, 197 border_type, 198 border_batch_size, 199 external_force, 200 required_derivative_order, 201 input_dim , 202 output_dim): 203 204 border_data = None 205 if (required_derivative_order != 0): 206 border_data = [border_inputs] 207 208 if (required_derivative_order >= 1): 209 210 # Border is always zero in the perimeter. 211 if (input_dim == 1): 212 x1 = border_inputs[:,0] 213 214 if (output_dim == 1): 215 g1 = 0.*x1 216 border_set_outputs = tf.stack([g1], axis=1) 217 218 elif (output_dim == 2): 219 g1 = 0.*x1 220 g2 = 0.*x1 221 border_set_outputs = tf.stack([g1, g2], axis=1) 222 223 elif (input_dim == 2): 224 x1 = border_inputs[:,0] 225 y1 = border_inputs[:,1] 226 227 if (output_dim == 1): 228 g1 = 0.*x1*y1 229 border_set_outputs = tf.stack([g1], axis=1) 230 231 elif (output_dim == 2): 232 g1 = 0.*x1*y1 233 g2 = 0.*x1*y1 234 border_set_outputs = tf.stack([g1, g2], axis=1) 235 236 border_data.append(border_set_outputs) 237 238 if (required_derivative_order >= 2 or external_force == 'Force_Advection'): 239 240 if (external_force == 'Force_Laplacian' or external_force == 'Force_Advection'): 241 if (border_type == 'side-x_1-y_0' 242 or border_type == 'side-x_1-y_0_expanded'): 243 244 if (input_dim == 1): 245 x1 = border_inputs[:,0] 246 247 if (output_dim == 1): 248 g1 = 2.*x1-1 249 border_set_outputs = tf.stack([g1], axis=1) 250 251 elif (input_dim == 2): 252 x1 = border_inputs[:,0] 253 y1 = border_inputs[:,1] 254 255 if (output_dim == 1): 256 g1 = (2.*y1-1)*x1*(x1-1) 257 border_set_outputs = tf.stack([g1], axis=1) 258 259 elif (border_type == 'two_sides-x_0-y_0'): 260 if (input_dim == 1): 261 x1 = border_inputs[:,0] 262 263 if (output_dim == 1): 264 g1 = 2.*x1-1

84 265 border_set_outputs = tf.stack([g1], axis=1) 266 267 elif (input_dim == 2): 268 sub_batch_size = border_batch_size//2 269 x1 = border_inputs[0*sub_batch_size:1*sub_batch_size,0] 270 y1 = border_inputs[0*sub_batch_size:1*sub_batch_size,1] 271 x2 = border_inputs[1*sub_batch_size:2*sub_batch_size,0] 272 y2 = border_inputs[1*sub_batch_size:2*sub_batch_size,1] 273 274 if (output_dim == 1): 275 g1 = x1*(x1-1)*(2.*y1-1) # First y=0 276 g2 = (2.*x2-1)*y2*(y2-1) # Second x=0 277 border_set_outputs = tf.stack([g1,g2], axis=1) 278 279 elif (border_type == 'hypercube-0-1'): 280 if (input_dim == 1): 281 x1 = border_inputs[:,0] 282 283 if (output_dim == 1): 284 g1 = 2.*x1-1 285 border_set_outputs = tf.stack([g1], axis=1) 286 287 elif (input_dim == 2): 288 sub_batch_size = border_batch_size//4 289 x1 = border_inputs[0*sub_batch_size:1*sub_batch_size,0] 290 y1 = border_inputs[0*sub_batch_size:1*sub_batch_size,1] 291 x2 = border_inputs[1*sub_batch_size:2*sub_batch_size,0] 292 y2 = border_inputs[1*sub_batch_size:2*sub_batch_size,1] 293 x3 = border_inputs[2*sub_batch_size:3*sub_batch_size,0] 294 y3 = border_inputs[2*sub_batch_size:3*sub_batch_size,1] 295 x4 = border_inputs[3*sub_batch_size:4*sub_batch_size,0] 296 y4 = border_inputs[3*sub_batch_size:4*sub_batch_size,1] 297 298 if (output_dim == 1): 299 g1 = x1*(x1-1)*(2.*y1-1) # First y=0 300 g2 = x2*(x2-1)*(2.*y2-1) # First y=1 301 g3 = (2.*x3-1)*y3*(y3-1) # Second x=0 302 g4 = (2.*x4-1)*y4*(y4-1) # Second x=1 303 304 border_set_outputs = tf.stack([g1,g2,g3,g4], axis=1) 305 306 border_data.append(border_set_outputs) 307 308 return border_data 309 310 #################### 311 # Computes the boundary conditions (lhs). 312 #################### 313 def lhs_boundary_condtions (inputs, 314 outputs , 315 outputs_derivatives, 316 border_type, 317 border_batch_size, 318 external_force, 319 required_derivative_order, 320 input_dim , 321 output_dim): 322 323 boundary_cond = [] 324 325 if (required_derivative_order >= 1): 326 boundary_cond.append(outputs) 327 328 if (required_derivative_order >= 2 or external_force == 'Force_Advection'): 329 if (external_force == 'Force_Laplacian' or external_force == 'Force_Advection'): 330 if (border_type == 'side-x_1-y_0' 331 or border_type == 'side-x_1-y_0_expanded'): 332 if (input_dim == 1): 333 if (output_dim == 1): 334 boundary_cond.append(outputs_derivatives[0]) 335 336 elif (input_dim == 2): 337 if (output_dim == 1): 338 boundary_cond_val = outputs_derivatives[0][:,1] 339 boundary_cond_val_resized = tf.stack([boundary_cond_val], axis=1) 340 boundary_cond.append(boundary_cond_val_resized) 341 342 elif (border_type == 'two_sides-x_0-y_0'): 343 if (input_dim == 1): 344 if (output_dim == 1): 345 boundary_cond.append(outputs_derivatives[0]) 346 347 elif (input_dim == 2): 348 if (output_dim == 1): 349 sub_batch_size = border_batch_size//2 350 boundary_cond_val_1 = outputs_derivatives[0][0*sub_batch_size:1*sub_batch_size,1] # First y=0 351 boundary_cond_val_2 = outputs_derivatives[0][1*sub_batch_size:2*sub_batch_size,0] # Second x=0 352 boundary_cond_val_comb = tf.stack([boundary_cond_val_1,boundary_cond_val_2], axis=1) 353 boundary_cond.append(boundary_cond_val_comb) 354 355 elif (border_type == 'hypercube-0-1'): 356 if (input_dim == 1): 357 if (output_dim == 1): 358 boundary_cond.append(outputs_derivatives[0]) 359 360 elif (input_dim == 2): 361 if (output_dim == 1): 362 sub_batch_size = border_batch_size//4 363 boundary_cond_val_1 = outputs_derivatives[0][0*sub_batch_size:1*sub_batch_size,1] # y=0 364 boundary_cond_val_2 = 
outputs_derivatives[0][1*sub_batch_size:2*sub_batch_size,1] # y=1 365 boundary_cond_val_3 = outputs_derivatives[0][2*sub_batch_size:3*sub_batch_size,0] # x=0 366 boundary_cond_val_4 = outputs_derivatives[0][3*sub_batch_size:4*sub_batch_size,0] # x=1

85 367 boundary_cond_val_comb = tf.stack([boundary_cond_val_1,boundary_cond_val_2,boundary_cond_val_3, boundary_cond_val_4], axis=1) 368 boundary_cond.append(boundary_cond_val_comb) 369 370 return boundary_cond 371 372 #################### 373 # Computes the exact solution value. 374 #################### 375 #@tf.function(experimental_relax_shapes = True) 376 def exact_solution (inputs, 377 exact_solution, 378 input_dim , 379 output_dim): 380 381 batch_size = inputs.shape[0] 382 # tf.print(batch_size) 383 384 if (exact_solution == 'Sol_Polynomial_2Deg_1D_1D'): 385 if (input_dim == 1 and output_dim == 1): 386 x = inputs 387 exact_sol_output = x*(x-1) 388 else: 389 raise Exception("Incompatible dimension (External force).") 390 391 if (exact_solution == 'Sol_Polynomial_4Deg_1D_1D'): 392 if (input_dim == 1 and output_dim == 1): 393 x = inputs 394 exact_sol_output = x*(x-1)*(x*x-4) 395 else: 396 raise Exception("Incompatible dimension (External force).") 397 398 if (exact_solution == 'Sol_Polynomial_2Deg_2D_1D'): 399 if (input_dim == 2 and output_dim == 1): 400 x = inputs[:,0] 401 y = inputs[:,1] 402 exact_sol = x*(x-1)*y*(y-1) 403 exact_sol_output = tf.reshape(exact_sol, 404 [batch_size, output_dim]) 405 else: 406 raise Exception("Incompatible dimension (External force).") 407 408 if (exact_solution == 'Sol_Polynomial_2Deg_2D_2D'): 409 if (input_dim == 2 and output_dim == 2): 410 x = inputs[:,0] 411 y = inputs[:,1] 412 x_exact_sol = x*(x-1)*y*(y-1) 413 y_exact_sol = x*(x-1)*y*(y-1) 414 exact_sol_output = tf.stack([x_exact_sol, y_exact_sol], 415 axis=1) 416 else: 417 raise Exception("Incompatible dimension (External force).") 418 419 return exact_sol_output B.5 secondOrderOptimizers Class
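A minimal call sketch for these methods (the option names are among those validated in instance_exists; training_set denotes the tensor generated in the previous sketch):

# Illustrative: validate a Laplacian problem and evaluate its external force.
required_order = problemInstance.instance_exists(
    differential_operator = 'Laplacian',
    external_force = 'Force_Laplacian',
    exact_solution = 'Sol_Polynomial_2Deg_2D_1D')
force_values = problemInstance.external_force(inputs = training_set,
                                              external_force = 'Force_Laplacian',
                                              input_dim = 2,
                                              output_dim = 1)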

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 class secondOrderOptimizers(): 7 8 def __init__ (self , 9 name, 10 model): 11 12 self._name = name 13 self._model = model 14 15 def optimizer_train_data (self , 16 inputs , 17 border_data, 18 split_gradient = False , 19 display_gradient_norm = False , 20 normalize_gradient = False , 21 train_only_domain = False , 22 train_only_border = False): 23 24 self._inputs = inputs 25 self._border_data = border_data 26 self._split_gradient = split_gradient 27 self._display_gradient_norm = display_gradient_norm 28 self._normalize_gradient = normalize_gradient 29 self._train_only_domain = train_only_domain 30 self._train_only_border = train_only_border 31 32 def flatten_tensor_list (self , 33 tensor_list): 34 35 lineraized_params = tf.constant(0., shape=[1,]) 36 for param_ind in range(0, len(tensor_list)//2): 37 for param_sub_ind in range(0, tensor_list[2*param_ind].shape[0]): 38 param = tensor_list[2*param_ind][param_sub_ind] 39 lineraized_params = tf.concat([lineraized_params, param], axis=0) 40 param = tensor_list[2*param_ind+1]

86 41 lineraized_params = tf.concat([lineraized_params, param], axis=0) 42 lineraized_params = tf.reshape(lineraized_params, [1,-1]) 43 return lineraized_params 44 45 def update_parameters (self , 46 lineraized_params): 47 48 init_pos = 1 49 final_pos = 1 50 for param_ind in range(0, len(self._model._trainable_weights)//2): 51 for param_sub_ind in range(0, self._model._trainable_weights[2*param_ind].shape[0]): 52 final_pos = init_pos + self._model._trainable_weights[2*param_ind][param_sub_ind].shape[0] 53 self._model._trainable_weights[2*param_ind][param_sub_ind].assign(lineraized_params[0,init_pos:final_pos]) 54 init_pos = final_pos 55 final_pos = init_pos + self._model._trainable_weights[2*param_ind+1].shape[0] 56 self._model._trainable_weights[2*param_ind+1].assign(lineraized_params[0,init_pos:final_pos]) 57 init_pos = final_pos 58 59 def parametric_model (self , 60 new_parameters): 61 62 self.update_parameters(new_parameters) 63 64 loss, _, _, _ = self._model.loss_function (inputs_domain = self._inputs , 65 border_data = self._border_data, 66 is_training = False , 67 use_only_domain = self._train_only_domain, 68 use_only_border = self._train_only_border) 69 70 gradient_update = self._model.back_propagation(inputs = self._inputs , 71 border_data = self._border_data, 72 is_training = False , 73 split_gradient = True , 74 display_gradient_norm = False , 75 normalize_gradient = self._normalize_gradient, 76 train_only_domain = self._train_only_domain, 77 train_only_border = self._train_only_border) 78 79 gradient_update = self.flatten_tensor_list(gradient_update) 80 loss = tf.reshape(loss, shape=[1]) 81 return loss, gradient_update 82 83 def apply_gradients (self): 84 85 initial_params = self.flatten_tensor_list(self._model._trainable_weights) 86 if (self._name == 'L-BFGS'): 87 optimization_results = tfp.optimizer.lbfgs_minimize(self.parametric_model, 88 initial_position = initial_params, 89 num_correction_pairs = 10, 90 tolerance = 1e-32, 91 max_iterations=1, 92 parallel_iterations=1, 93 stopping_condition=None , 94 max_line_search_iterations = 10) 95 elif (self._name == 'BFGS'): 96 optimization_results = tfp.optimizer.bfgs_minimize(self.parametric_model, 97 initial_position = initial_params, 98 tolerance = 1e-32, 99 max_iterations=1, 100 parallel_iterations=1, 101 stopping_condition=None , 102 max_line_search_iterations = 10) 103 optimized_params = optimization_results.position 104 self.update_parameters(optimized_params) 105 106 _ = self._model.back_propagation(inputs = self._inputs , 107 border_data = self._border_data, 108 is_training = True , 109 split_gradient = self._split_gradient, 110 display_gradient_norm = self._display_gradient_norm, 111 normalize_gradient = self._normalize_gradient, 112 train_only_domain = self._train_only_domain, 113 train_only_border = self._train_only_border) B.6 myLayer Class
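A sketch of the intended call pattern, assuming model is an already built and compiled myModel instance (Section B.7) and border_data has been prepared with problemInstance.border_data_prep:

# Illustrative: one L-BFGS step over the current collocation sets.
optimizer = secondOrderOptimizers(name = 'L-BFGS', model = model)
optimizer.optimizer_train_data(inputs = training_set,
                               border_data = border_data)
optimizer.apply_gradients()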

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 class myLayer(keras.layers.Layer): 7 8 #################### 9 # Initialize the Layer object. 10 #################### 11 def __init__ (self , 12 name = 'newLayer'): 13 14 super(myLayer, self).__init__() 15 self._name = name 16 17 #################### 18 # Constructs the Layer 19 #################### 20 def build (self , 21 input_dim = 2,

87 22 output_dim = 2, 23 activation = 'sigmoid', 24 weight_initializer = 'xavier', 25 bias_initializer = 'xavier', 26 seed = None , 27 batch_normalization = False , 28 supress_bias = False , 29 epsilon = 1e-12): 30 31 self._input_dim = input_dim 32 self._output_dim = output_dim 33 self._activation = activation 34 self._weight_initializer = weight_initializer 35 self._bias_initializer = bias_initializer 36 self._batch_normalization = batch_normalization 37 self._has_bias = not supress_bias 38 self._epsilon = epsilon 39 40 if (weight_initializer == 'zeros'): 41 wInit = tf.keras.initializers.Zeros() 42 elif (weight_initializer == 'ones'): 43 wInit = tf.keras.initializers.Ones() 44 elif (weight_initializer == 'normal_0_1'): 45 wInit = RandomNormal(mean=0., stddev=1., seed=seed) 46 elif (weight_initializer == 'uniform_-1_1'): 47 wInit = tf.keras.initializers.RandomUniform(minval=-1., maxval=1., seed=seed) 48 elif (weight_initializer == 'xavier'): 49 wInit = tf.keras.initializers.GlorotNormal(seed=seed) 50 elif (weight_initializer == 'he'): 51 wInit = tf.keras.initializers.he_normal(seed=seed) 52 53 if (bias_initializer == 'zeros'): 54 bInit = tf.keras.initializers.Zeros() 55 elif (bias_initializer == 'ones'): 56 bInit = tf.keras.initializers.Ones() 57 elif (bias_initializer == 'normal_0_1'): 58 bInit = RandomNormal(mean=0.,stddev=1.,seed=seed) 59 elif (bias_initializer == 'uniform_-1_1'): 60 bInit = tf.keras.initializers.RandomUniform(minval=-1., maxval=1., seed=seed) 61 elif (bias_initializer == 'xavier'): 62 bInit = tf.keras.initializers.GlorotNormal(seed=seed) 63 elif (bias_initializer == 'he'): 64 bInit = tf.keras.initializers.he_normal(seed=seed) 65 66 self.w = self.add_weight( 67 name = self._name + ' W', 68 shape = (self._input_dim, self._output_dim), 69 initializer = wInit, 70 trainable = True) 71 tf.cast(self.w, tf.float32) 72 73 if (self._has_bias == True): 74 self.b = self.add_weight( 75 name = self._name + ' b', 76 shape = (self._output_dim,), 77 initializer = bInit, 78 trainable = True) 79 else: 80 bInit = tf.keras.initializers.Zeros() 81 self.b = self.add_weight( 82 name = self._name + ' b', 83 shape = (self._output_dim,), 84 initializer = bInit, 85 trainable = True) 86 tf.cast(self.b, tf.float32) 87 88 #################### 89 # Feeds the input into the layer. 90 #################### 91 def feed(self , 92 inputs = None): 93 94 if (self._has_bias == True): 95 outputs = tf.matmul(inputs, self.w) + self.b 96 else: 97 outputs = tf.matmul(inputs, self.w) 98 99 if (self._activation == 'sigmoid'): 100 outputs = tf.nn.sigmoid(outputs) 101 if (self._activation == 'tanh'): 102 outputs = tf.keras.activations.tanh(outputs) 103 if (self._activation == 'relu'): 104 outputs = tf.nn.relu(outputs) 105 if (self._activation == 'exponential'): 106 outputs = tf.keras.activations.exponential(outputs) 107 if (self._activation == 'elu'): 108 outputs = tf.keras.activations.elu(outputs, alpha=1.0) 109 if (self._activation == 'swish'): 110 outputs = tf.keras.activations.swish(outputs) 111 if (self._activation == 'softplus'): 112 outputs = tf.nn.softplus(outputs) 113 114 115 if (self._batch_normalization == True): 116 mean, var = tf.nn.moments(outputs, axes=0, keepdims=True) 117 outputs = (outputs-mean)/(tf.math.sqrt(var + self._epsilon)) 118 119 return outputs
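A minimal sketch of building and feeding a single layer in isolation (the shapes and option values are illustrative):

# Illustrative: a 2-input, 5-output sigmoid layer with Xavier initialization.
layer = myLayer('Example_Layer')
layer.build(input_dim = 2, output_dim = 5,
            activation = 'sigmoid',
            weight_initializer = 'xavier',
            bias_initializer = 'xavier',
            seed = 42)
outputs = layer.feed(tf.random.uniform(shape=[10, 2]))  # batch of 10 points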

B.7 myModel Class

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 class myModel(tf.keras.Model): 7 8 #################### 9 # Initializes the model instance. 10 #################### 11 def __init__ (self , 12 name = 'myModel'): 13 14 super(myModel, self).__init__() 15 16 # Initializes the name and flags for the model. 17 self._name = name 18 self.built = False 19 self._is_compiled = False 20 self._has_dataset = False 21 22 # Initializes the historical training variables of the model. 23 self._num_epochs_trained = 0 24 self._losses = [] 25 self._losses_domain = [] 26 self._losses_border = [] 27 self._losses_regularization = [] 28 self._losses_solution = [] 29 self._losses_validation = [] 30 31 #################### 32 # Builds the layers of the model. 33 #################### 34 def build (self , 35 input_dim = 2, 36 hidden_dim = [5,5], 37 output_dim = 2, 38 activations = 'sigmoid', 39 weight_initializers = 'xavier', 40 bias_initializers = 'xavier', 41 batch_normalization = False , 42 supress_bias = False , 43 seed = None , 44 epsilon = 1e-12): 45 46 # Sets up the basic characteristics of the layers in the model. 47 self._input_dim = input_dim 48 self._hidden_dim = hidden_dim 49 self._output_dim = output_dim 50 self._num_hidden_layers = len(hidden_dim)-1 51 self._activations = activations 52 self._weight_initializers = weight_initializers 53 self._bias_initializers = bias_initializers 54 self._batch_normalization = batch_normalization 55 self._has_bias = not supress_bias 56 57 self._layers = [] 58 59 # Constructs the input layer. 60 layer = myLayer('Input_Layer') 61 layer.build(input_dim = self._input_dim, 62 output_dim = self._hidden_dim[0], 63 activation = self._activations, 64 weight_initializer = self._weight_initializers, 65 bias_initializer = self._bias_initializers, 66 seed = seed, 67 batch_normalization = self._batch_normalization, 68 supress_bias = supress_bias, 69 epsilon = epsilon) 70 self._layers.append(layer) 71 self._trainable_weights.append(layer.variables[0]) 72 self._trainable_weights.append(layer.variables[1]) 73 74 # Constructs the hidden layers. 75 for layer_num in range(1,self._num_hidden_layers+1): 76 layer = myLayer('Hidden_Layer_'+str(layer_num)) 77 layer.build(input_dim = self._hidden_dim[layer_num-1], 78 output_dim = self._hidden_dim[layer_num], 79 activation = self._activations, 80 weight_initializer = self._weight_initializers, 81 bias_initializer = self._bias_initializers, 82 seed = seed, 83 batch_normalization = self._batch_normalization, 84 supress_bias = supress_bias, 85 epsilon = epsilon) 86 self._layers.append(layer) 87 self._trainable_weights.append(layer.variables[0]) 88 self._trainable_weights.append(layer.variables[1]) 89 90 # Constructs the output layer. 91 layer = myLayer('Output_Layer') 92 layer.build(input_dim = self._hidden_dim[-1], 93 output_dim = self._output_dim, 94 activation = None , 95 weight_initializer = self._weight_initializers, 96 bias_initializer = self._bias_initializers,

89 97 seed = seed, 98 batch_normalization = False , 99 supress_bias = supress_bias, 100 epsilon = None) 101 self._layers.append(layer) 102 self._trainable_weights.append(layer.variables[0]) 103 self._trainable_weights.append(layer.variables[1]) 104 105 # Raise flag if the neural network has been built successfuly. 106 self.built = True 107 108 #################### 109 # Sets up the optimizer. 110 #################### 111 def set_up_optimizer (self , 112 optimizer_selection, 113 learning_rate = 1e-03, 114 epsilon = 1e-07): 115 116 # Sets up the information of the optimizer. 117 self._optimizer_selection = optimizer_selection 118 self._learning_rate = learning_rate 119 self._epsilon = epsilon 120 121 # Adam Optimizer (1st Order) 122 if (self._optimizer_selection == 'Adam'): 123 self._optimizer1 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate, 124 epsilon = self._epsilon , 125 amsgrad = False) 126 self._optimizer2 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate, 127 epsilon = self._epsilon , 128 amsgrad = False) 129 # AMSGrad Optimizer (1st Order) 130 elif (self._optimizer_selection == 'AMSGrad'): 131 self._optimizer1 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate, 132 epsilon = self._epsilon , 133 amsgrad = True) 134 self._optimizer2 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate, 135 epsilon = self._epsilon , 136 amsgrad = True) 137 # Nadam Optimizer (1st Order) 138 elif (self._optimizer_selection == 'Nadam'): 139 self._optimizer1 = tf.keras.optimizers.Nadam(learning_rate = self._learning_rate, 140 epsilon = self._epsilon) 141 self._optimizer2 = tf.keras.optimizers.Nadam(learning_rate = self._learning_rate, 142 epsilon = self._epsilon) 143 # AdaGrad Optimizer (1st Order) 144 elif (self._optimizer_selection == 'AdaGrad'): 145 self._optimizer1 = tf.keras.optimizers.Adagrad(learning_rate = self._learning_rate, 146 epsilon = self._epsilon) 147 self._optimizer2 = tf.keras.optimizers.Adagrad(learning_rate = self._learning_rate, 148 epsilon = self._epsilon) 149 # AdaDelta Optimizer (1st Order) 150 elif (self._optimizer_selection == 'AdaDelta'): 151 self._optimizer1 = tf.keras.optimizers.Adadelta(learning_rate = self._learning_rate, 152 rho = 0.95, 153 epsilon = self._epsilon) 154 self._optimizer2 = tf.keras.optimizers.Adadelta(learning_rate = self._learning_rate, 155 rho = 0.95, 156 epsilon = self._epsilon) 157 # RMSProp Optimizer (1st Order) 158 elif (self._optimizer_selection == 'RMSProp'): 159 self._optimizer1 = tf.keras.optimizers.RMSprop(learning_rate=self._learning_rate, 160 epsilon = self._epsilon) 161 self._optimizer2 = tf.keras.optimizers.RMSprop(learning_rate=self._learning_rate, 162 epsilon = self._epsilon) 163 # Vanilla SDG Optimizer (1st Order) 164 elif (self._optimizer_selection == 'Vanilla_SGD'): 165 self._optimizer1 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate, 166 nesterov = False) 167 self._optimizer2 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate, 168 nesterov = False) 169 # SGD with Momentum Optimizer (1st Order) 170 elif (self._optimizer_selection == 'Momentum_SGD'): 171 self._optimizer1 = keras.optimizers.SGD(learning_rate = self._learning_rate, 172 momentum = 0.9, 173 nesterov = False) 174 self._optimizer2 = keras.optimizers.SGD(learning_rate = self._learning_rate, 175 momentum = 0.9, 176 nesterov = False) 177 # SGD with Nesterov Momentum Optimizer (1st Order) 178 elif (self._optimizer_selection == 'Nesterov_SGD'): 179 self._optimizer1 = keras.optimizers.SGD(learning_rate = 
self._learning_rate, 180 momentum = 0.9, 181 nesterov = True) 182 self._optimizer2 = keras.optimizers.SGD(learning_rate = self._learning_rate, 183 momentum = 0.9, 184 nesterov = True) 185 # BFGS Optimizer (2st Order) 186 elif (self._optimizer_selection == 'BFGS'): 187 self._optimizer = secondOrderOptimizers(name = 'BFGS', 188 model = self) 189 # L-BFGS Optimizer (2st Order) 190 elif (self._optimizer_selection == 'L-BFGS'): 191 self._optimizer = secondOrderOptimizers(name = 'L-BFGS', 192 model = self) 193 else: 194 self._is_compiled = False 195 raise Exception("Invalid optimizer.") 196 197 #################### 198 # Builds the problem instance and training set up.

90 199 #################### 200 def compile (self , 201 differential_operator = None , 202 external_force = None , 203 exact_solution = None , 204 optimizer_selection = None , 205 learning_rate = 1e-03, 206 epsilon = 1e-07, 207 scale_factor = 1, 208 loss_fuction = 'square_L2_error', 209 regularization = None , 210 regularization_coef = 0, 211 clip_gradient = 'global '): 212 213 # Sets up he problem solved by the model. 214 self._differential_operator = differential_operator 215 self._external_force = external_force 216 self._exact_solution = exact_solution 217 218 # Sets up the regularization and loss options. 219 self._scale_factor = scale_factor 220 self._loss_fuction = loss_fuction 221 self._regularization = regularization 222 self._regularization_coef = regularization_coef 223 self._clip_gradient = clip_gradient 224 225 # Constructs the optimizer and validates the instances. 226 if (regularization == None): 227 print('No regularization introduced, using default None ') 228 self._required_derivative_order = problemInstance.\ 229 instance_exists(differential_operator = self._differential_operator, 230 external_force = self._external_force, 231 exact_solution = self._exact_solution) 232 self.set_up_optimizer(optimizer_selection = optimizer_selection, 233 learning_rate = learning_rate, 234 epsilon = epsilon) 235 236 # Raise flag if the problem and training instance has been built successfuly. 237 self._is_compiled = True 238 239 #################### 240 # Feed forward of the neural network, returning also the gradient wrt inputs. 241 #################### 242 def predict (self , 243 inputs , 244 return_derivative_order = 0): 245 246 if (self.built == False): 247 raise Exception("Cannot feed forward, the model is not built.") 248 249 outputs_derivatives = [] 250 if (return_derivative_order in (0,1,2,3)): 251 252 # Output with 0 order derivative. 253 if (return_derivative_order == 0): 254 outputs = self._layers[0].feed(inputs) 255 for layer_ind in range(1, self._num_hidden_layers+2): 256 outputs = self._layers[layer_ind].feed(outputs) 257 tf.debugging.check_numerics(outputs, message = 'NaN occurred in network output.') 258 259 # Output with 1st order derivatives. 260 if (return_derivative_order == 1): 261 with tf.GradientTape(persistent = False) as tape_ord1: 262 tape_ord1.watch(inputs) 263 outputs = self._layers[0].feed(inputs) 264 for layer_ind in range(1, self._num_hidden_layers+2): 265 outputs = self._layers[layer_ind].feed(outputs) 266 # Adaptation for multi-valued functions in one derivative (Burgers Operator). 267 if (self._output_dim < 2): 268 outputs_1st_der = tape_ord1.gradient(outputs, 269 inputs) 270 else: 271 outputs_1st_der = tape_ord1.batch_jacobian(outputs, 272 inputs) 273 outputs_derivatives.append(outputs_1st_der) 274 del tape_ord1 275 tf.debugging.check_numerics(outputs, 276 message = 'NaN occurred in network output.') 277 tf.debugging.check_numerics(outputs_1st_der, 278 message = 'NaN occurred in network 1st derivative output.') 279 280 # Output with 2nd order derivatives. 
281 elif (return_derivative_order == 2): 282 outputs_1st_der = [] 283 outputs_2nd_der = [] 284 input_component_list = tf.unstack(inputs, axis = 1) 285 for dim in range(inputs.shape[1]): 286 input_component_list[dim] = tf.expand_dims(input_component_list[dim], axis = 1) 287 for dim in range(inputs.shape[1]): 288 with tf.GradientTape(persistent = True) as tape_ord2: 289 with tf.GradientTape(persistent = True) as tape_ord1: 290 tape_ord2.watch(input_component_list[dim]) 291 tape_ord1.watch(input_component_list[dim]) 292 reconst_inputs = tf.squeeze(tf.stack(input_component_list, axis = 1), axis = 2) 293 outputs = self._layers[0].feed(reconst_inputs) 294 for layer_ind in range(1, self._num_hidden_layers+2): 295 outputs = self._layers[layer_ind].feed(outputs) 296 outputs_1st_der_var = tape_ord1.gradient(outputs, 297 input_component_list[dim]) 298 outputs_2nd_der_var = tape_ord2.gradient(outputs_1st_der_var, 299 input_component_list[dim]) 300 outputs_1st_der.append(outputs_1st_der_var)

91 301 outputs_2nd_der.append(outputs_2nd_der_var) 302 del tape_ord1 303 del tape_ord2 304 305 outputs_derivatives.append(tf.squeeze(tf.stack(outputs_1st_der, axis = 1), axis = 2)) 306 outputs_derivatives.append(tf.squeeze(tf.stack(outputs_2nd_der, axis = 1), axis = 2)) 307 308 tf.debugging.check_numerics(outputs, 309 message = 'NaN occurred in network output.') 310 tf.debugging.check_numerics(outputs_derivatives[0], 311 message = 'NaN occurred in network 1st derivative output.') 312 tf.debugging.check_numerics(outputs_derivatives[1], 313 message = 'NaN occurred in network 2nd derivative output.') 314 315 # Output with 3rd order derivatives. (CORRECT FOR THE THIRD ORDER DERIVATIVE RIGHT) 316 #elif (return_derivative_order == 3): 317 # with tf.GradientTape(persistent = False) as tape_ord3: 318 # tape_ord3.watch(inputs) 319 # with tf.GradientTape(persistent = False) as tape_ord2: 320 # tape_ord2.watch(inputs) 321 # with tf.GradientTape(persistent = False) as tape_ord1: 322 # tape_ord1.watch(inputs) 323 # outputs = self._layers[0].feed(inputs) 324 # for layer_ind in range(1, self._num_hidden_layers+2): 325 # outputs = self._layers[layer_ind].feed(outputs) 326 # outputs_1st_der = tape_ord1.gradient(outputs, 327 # inputs) 328 # outputs_derivatives.append(outputs_1st_der) 329 # del tape_ord1 330 # outputs_2nd_der = tape_ord2.gradient(outputs_1st_der, 331 # inputs) 332 # outputs_derivatives.append(outputs_2nd_der) 333 # del tape_ord2 334 # outputs_3rd_der = tape_ord3.gradient(outputs_2nd_der, 335 # inputs) 336 # outputs_derivatives.append(outputs_3rd_der) 337 # del tape_ord3 338 # tf.debugging.check_numerics(outputs, 339 # message = 'NaN occurred in network output.') 340 # tf.debugging.check_numerics(outputs_1st_der, 341 # message = 'NaN occurred in network 1st derivative output.') 342 # tf.debugging.check_numerics(outputs_2nd_der, 343 # message = 'NaN occurred in network 2nd derivative output.') 344 # tf.debugging.check_numerics(outputs_3rd_der, 345 # message = 'NaN occurred in network 3rd derivative output.') 346 347 else: 348 raise Exception("Invalid order for network derivative computation.") 349 350 return outputs, outputs_derivatives 351 352 #################### 353 # Calculates the loss function. 354 #################### 355 def loss_function (self , 356 inputs_domain, 357 border_data = None , 358 is_training = False , 359 use_only_domain = False , 360 use_only_border = False): 361 362 # Initializes the losses variables. 363 loss_domain = tf.constant(0.) 364 loss_border = tf.constant(0.) 365 loss_regularization = tf.constant(0.) 366 loss_solution = tf.constant(0.) 367 368 # Evaluates the left-hand-side and the right-hand-side of the differential equation and solution. 369 370 # Use the gradient balancing regularization. 
371 if (self._regularization != 'Gradient_Type'): 372 outputs, outputs_derivatives = self.predict(inputs = inputs_domain, 373 return_derivative_order = self._required_derivative_order) 374 else: 375 with tf.GradientTape(persistent = True) as tape_reg: 376 tape_reg.watch(self._trainable_weights) 377 outputs, outputs_derivatives = self.predict(inputs = inputs_domain, 378 return_derivative_order = self._required_derivative_order) 379 outputs_param_der = tape_reg.gradient(outputs, 380 self._trainable_weights, 381 unconnected_gradients = tf.UnconnectedGradients.ZERO) 382 outputs_derivatives_param_der = tape_reg.gradient(outputs_derivatives[0], 383 self._trainable_weights, 384 unconnected_gradients = tf.UnconnectedGradients.ZERO) 385 for weight_ind in range(self._num_hidden_layers+2): 386 loss_regularization += tf.reduce_mean(tf.square(outputs_derivatives_param_der[2*weight_ind] 387 - outputs_param_der[2*weight_ind])) 388 loss_regularization += tf.reduce_mean(tf.square(outputs_derivatives_param_der[2*weight_ind+1] 389 - outputs_param_der[2*weight_ind+1])) 390 del tape_reg 391 392 diff_op_output = problemInstance.\ 393 differential_operator(inputs = inputs_domain, 394 outputs = outputs, 395 outputs_derivatives = outputs_derivatives, 396 differential_operator = self._differential_operator, 397 input_dim = self._input_dim, 398 output_dim = self._output_dim) 399 ext_force_output = problemInstance.\ 400 external_force(inputs = inputs_domain, 401 external_force = self._external_force, 402 input_dim = self._input_dim,

92 403 output_dim = self._output_dim) 404 405 exact_sol_output = problemInstance.\ 406 exact_solution(inputs = inputs_domain, 407 exact_solution = self._exact_solution, 408 input_dim = self._input_dim, 409 output_dim = self._output_dim) 410 411 # Evaluates the left-hand-side of the border conditions. 412 # Right-hand-side already computed in border_data[1:n]. 413 if (border_data != None): 414 outputs_border, outputs_derivatives_border = self.predict(inputs = border_data[0], 415 return_derivative_order = self._required_derivative_order) 416 lhs_border = problemInstance.\ 417 lhs_boundary_condtions (inputs = border_data[0], 418 outputs = outputs_border, 419 outputs_derivatives = outputs_derivatives_border, 420 border_type = self._border_type, 421 border_batch_size = self._border_training_batch_size, 422 external_force = self._external_force, 423 required_derivative_order = self._required_derivative_order, 424 input_dim = self._input_dim, 425 output_dim = self._output_dim) 426 427 # Computes the loss function for the L2 Error. 428 if (self._loss_fuction == 'L2_error'): 429 loss_domain = tf.reduce_mean( 430 tf.norm(diff_op_output-ext_force_output, 431 ord = 'euclidean', 432 axis = 1)) 433 if (border_data != None): 434 for ind in range(len(border_data)-1): 435 loss_border += tf.reduce_mean( 436 tf.norm(lhs_border[ind] - border_data[ind+1], 437 ord='euclidean', 438 axis=1)) 439 if (self._regularization == 'Tikhonov'): 440 for weight_ind in range(self._num_hidden_layers+2): 441 loss_regularization += tf.reduce_mean( 442 tf.norm(self._trainable_weights[2*weight_ind], 443 ord='euclidean', 444 axis = 1)) 445 loss_regularization += tf.reduce_mean( 446 tf.norm(self._trainable_weights[2*weight_ind+1], 447 ord='euclidean', 448 axis = 0)) 449 elif (self._regularization == None 450 or self._regularization == 'Gradient_Type' 451 or self._regularization == 'Quadratic_Balance'): 452 pass 453 else: 454 print('Invalid regularization option, defaulting to none.') 455 self._regularization = None 456 loss_solution = tf.reduce_mean( 457 tf.norm(outputs-exact_sol_output, 458 ord='euclidean', 459 axis=1)) 460 461 # Computes the loss function for the Square L2 Error (MSE). 462 elif (self._loss_fuction == 'square_L2_error'): 463 loss_domain = tf.reduce_mean( 464 tf.reduce_sum(tf.square(diff_op_output-ext_force_output), 465 axis = 1, 466 keepdims = True)) 467 if (border_data != None): 468 for ind in range(len(border_data)-1): 469 loss_border += tf.reduce_mean( 470 tf.reduce_sum(tf.square(lhs_border[ind] - border_data[ind+1]), 471 axis = 1, 472 keepdims = True)) 473 if (self._regularization == 'Tikhonov'): 474 for weight_ind in range(self._num_hidden_layers+2): 475 loss_regularization += tf.reduce_mean( 476 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]), 477 axis = 1, 478 keepdims = True)) 479 loss_regularization += tf.reduce_mean( 480 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]), 481 axis = 0, 482 keepdims = True)) 483 elif (self._regularization == None 484 or self._regularization == 'Gradient_Type' 485 or self._regularization == 'Quadratic_Balance'): 486 pass 487 else: 488 print('Invalid regularization option, defaulting to none.') 489 self._regularization = None 490 loss_solution = tf.reduce_mean( 491 tf.reduce_sum(tf.square(outputs-exact_sol_output), 492 axis = 1, 493 keepdims = True)) 494 495 # Computes the loss function for the Absolute Error (L1). 
496 elif (self._loss_fuction == 'absolute_error'): 497 loss_domain = tf.reduce_mean( 498 tf.reduce_sum(tf.abs(diff_op_output-ext_force_output), 499 axis = 1, 500 keepdims = True)) 501 if (border_data != None): 502 for ind in range(len(border_data)-1): 503 loss_border += tf.reduce_mean( 504 tf.reduce_sum(tf.abs(lhs_border[ind] - border_data[ind+1]),

93 505 axis = 1, 506 keepdims = True)) 507 if (self._regularization == 'Tikhonov'): 508 for weight_ind in range(self._num_hidden_layers+2): 509 loss_regularization += tf.math.reduce_mean( 510 tf.reduce_mean(tf.abs(self._trainable_weights[2*weight_ind]), 511 axis = 1, 512 keepdims = True)) 513 loss_regularization += tf.math.reduce_mean( 514 tf.reduce_mean(tf.abs(self._trainable_weights[2*weight_ind+1]), 515 axis = 0, 516 keepdims = True)) 517 elif (self._regularization == None 518 or self._regularization == 'Gradient_Type' 519 or self._regularization == 'Quadratic_Balance'): 520 pass 521 else: 522 print('Invalid regularization option, defaulting to none.') 523 self._regularization = None 524 loss_solution = tf.reduce_mean( 525 tf.reduce_sum(tf.abs(outputs-exact_sol_output), 526 axis = 1, 527 keepdims = True)) 528 529 # Experimental: Computes the loss (MSE) proportional to the external force. 530 elif (self._loss_fuction == 'force_proportional_error'): 531 loss_domain = tf.reduce_mean( 532 tf.square((diff_op_output-ext_force_output) 533 *(ext_force_output+1e-12))) 534 if (border_data != None): 535 for ind in range(len(border_data)-1): 536 loss_border += tf.reduce_mean( 537 tf.square(lhs_border[ind] - border_data[ind+1])) 538 if (self._regularization == 'Tikhonov'): 539 for weight_ind in range(self._num_hidden_layers+2): 540 loss_regularization += tf.reduce_mean( 541 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]), 542 axis = 1, 543 keepdims = True)) 544 loss_regularization += tf.reduce_mean( 545 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]), 546 axis = 0, 547 keepdims = True)) 548 elif (self._regularization == None 549 or self._regularization == 'Gradient_Type' 550 or self._regularization == 'Quadratic_Balance'): 551 pass 552 else: 553 print('Invalid regularization option, defaulting to none.') 554 self._regularization = None 555 loss_solution = tf.reduce_mean( 556 tf.square(outputs-exact_sol_output)) 557 558 # Computes the square of the MSE, i.e. the ||.||^{4}_{2} error . 559 elif (self._loss_fuction == 'square_MSE'): 560 loss_domain = tf.reduce_mean( 561 tf.reduce_sum(tf.square(tf.square(diff_op_output-ext_force_output)), 562 axis = 1, 563 keepdims = True)) 564 if (border_data != None): 565 for ind in range(len(border_data)-1): 566 loss_border += tf.reduce_mean( 567 tf.reduce_sum(tf.square(tf.square(lhs_border[ind] - border_data[ind+1])), 568 axis = 1, 569 keepdims = True)) 570 if (self._regularization == 'Tikhonov'): 571 for weight_ind in range(self._num_hidden_layers+2): 572 loss_regularization += tf.reduce_mean( 573 tf.reduce_mean(tf.square(tf.square(self._trainable_weights[2*weight_ind])), 574 axis = 1, 575 keepdims = True)) 576 loss_regularization += tf.reduce_mean( 577 tf.reduce_mean(tf.square(tf.square(self._trainable_weights[2*weight_ind+1])), 578 axis = 0, 579 keepdims = True)) 580 elif (self._regularization == None 581 or self._regularization == 'Gradient_Type' 582 or self._regularization == 'Quadratic_Balance'): 583 pass 584 else: 585 print('Invalid regularization option, defaulting to none.') 586 self._regularization = None 587 loss_solution = tf.reduce_mean( 588 tf.reduce_sum(tf.square(outputs-exact_sol_output), 589 axis = 1, 590 keepdims = True)) 591 592 # Experimental: Computes the loss (MSE) proportional to the square of the inputs. 
593 elif (self._loss_fuction == 'input_proportional_error'): 594 loss_domain = tf.reduce_mean( 595 tf.square((diff_op_output-ext_force_output) 596 *inputs_domain*inputs_domain)) 597 if (border_data != None): 598 for ind in range(len(border_data)-1): 599 loss_border += tf.reduce_mean( 600 tf.square(lhs_border[ind] - border_data[ind+1])) 601 if (self._regularization == 'Tikhonov'): 602 for weight_ind in range(self._num_hidden_layers+2): 603 loss_regularization += tf.reduce_mean( 604 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]), 605 axis = 1, 606 keepdims = True))

94 607 loss_regularization += tf.reduce_mean( 608 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]), 609 axis = 0, 610 keepdims = True)) 611 elif (self._regularization == None 612 or self._regularization == 'Gradient_Type' 613 or self._regularization == 'Quadratic_Balance'): 614 pass 615 else: 616 print('Invalid regularization option, defaulting to none.') 617 self._regularization = None 618 loss_solution = tf.reduce_mean( 619 tf.square(outputs-exact_sol_output)) 620 621 # Computes the loss (MSE) with respect to one component. 622 elif (self._loss_fuction == 'square_L2_error_1st_comp'): 623 loss_domain = tf.reduce_mean(tf.square(diff_op_output-ext_force_output)[:,0]) 624 if (border_data != None): 625 for ind in range(len(border_data)-1): 626 loss_border += tf.reduce_mean( 627 tf.reduce_sum(tf.square(lhs_border[ind] - border_data[ind+1]), 628 axis = 1, 629 keepdims = True)) 630 if (self._regularization == 'Tikhonov'): 631 for weight_ind in range(self._num_hidden_layers+2): 632 loss_regularization += tf.reduce_mean( 633 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]), 634 axis = 1, 635 keepdims = True)) 636 loss_regularization += tf.reduce_mean( 637 tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]), 638 axis = 0, 639 keepdims = True)) 640 elif (self._regularization == None 641 or self._regularization == 'Gradient_Type' 642 or self._regularization == 'Quadratic_Balance'): 643 pass 644 else: 645 print('Invalid regularization option, defaulting to none.') 646 self._regularization = None 647 loss_solution = tf.reduce_mean(tf.square(outputs-exact_sol_output)[0]) 648 649 # Error for invalid loss option. 650 else: 651 raise Exception("Invalid loss option. Please recompile with a valid name.") 652 653 # Experimental: Implement the quadratic loss balance regularization. 654 if (self._regularization == 'Quadratic_Balance'): 655 if (inputs_border != None and expected_outputs_border != None): 656 loss_regularization += tf.sqrt(tf.square(loss_domain-loss_border)) 657 658 # Implements the train only domain or border options 659 coef_domain = 1 660 coef_border = 1 661 if (use_only_domain == True): 662 coef_border = 0. 663 if (use_only_border == True): 664 coef_domain = 0. 665 666 # Computes the total loss and checks for explosions. 667 loss = coef_domain*loss_domain + coef_border*loss_border 668 tf.debugging.check_numerics(loss_domain, message='NaN occurred in domain loss funtion.') 669 tf.debugging.check_numerics(loss_border, message='NaN occurred in border loss funtion.') 670 tf.debugging.check_numerics(loss_regularization, message='NaN occurred in regularization loss funtion.') 671 tf.debugging.check_numerics(loss, message='NaN occurred in total loss funtion.') 672 673 # If the method is set to training mode, the losses are saved on the historical training variables. 674 if (is_training == True): 675 self._losses_domain.append(loss_domain.numpy()) 676 self._losses_border.append(loss_border.numpy()) 677 self._losses_regularization.append(loss_regularization.numpy()) 678 self._losses.append(loss.numpy()) 679 self._losses_solution.append(loss_solution.numpy()) 680 681 return loss, loss_domain, loss_border, loss_regularization 682 683 #################### 684 # Computes the gradient wrt the weights. 
685 #################### 686 def back_propagation (self , 687 inputs , 688 border_data, 689 is_training, 690 split_gradient = False , 691 display_gradient_norm = False , 692 normalize_gradient = False , 693 train_only_domain = False , 694 train_only_border = False): 695 696 # Executes the back propagation 697 with tf.GradientTape(persistent = True) as tape_bp: 698 tape_bp.watch(self._trainable_weights) 699 loss, loss_domain, loss_border,\ 700 loss_regularization = self.loss_function(inputs_domain = inputs, 701 border_data = border_data, 702 is_training = is_training, 703 use_only_domain = train_only_domain, 704 use_only_border = train_only_border) 705 706 total_loss_f = loss+self._regularization_coef*loss_regularization 707 708 gradient_update = tape_bp.gradient(total_loss_f,

95 709 self._trainable_weights, 710 unconnected_gradients = tf.UnconnectedGradients.ZERO) 711 712 # Splits the gradient wrt to each individual loss component. 713 if (split_gradient == True): 714 gradient_update_domain = tape_bp.gradient(loss_domain, 715 self._trainable_weights, 716 unconnected_gradients = tf.UnconnectedGradients.ZERO) 717 gradient_update_border = tape_bp.gradient(loss_border, 718 self._trainable_weights, 719 unconnected_gradients = tf.UnconnectedGradients.ZERO) 720 gradient_update_regularization = tape_bp.gradient(loss_regularization, 721 self._trainable_weights, 722 unconnected_gradients = tf.UnconnectedGradients.ZERO) 723 del tape_bp 724 725 # Avoids NAN propagation by rightfully setting them to 0. 726 gradient_update = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g) 727 for g in gradient_update] 728 if (split_gradient == True): 729 gradient_update_domain = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g) 730 for g in gradient_update_domain] 731 gradient_update_border = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g) 732 for g in gradient_update_border] 733 gradient_update_regularization = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g) 734 for g in gradient_update_regularization] 735 736 # Applies clipping regularization to bound the gradients. 737 if (self._clip_gradient == 'global '): 738 gradient_update = tf.clip_by_global_norm(gradient_update, 1e+1)[0] 739 gradient_update_domain = tf.clip_by_global_norm(gradient_update_domain, 1e+1)[0] 740 gradient_update_border = tf.clip_by_global_norm(gradient_update_border, 1e+1)[0] 741 gradient_update_regularization = tf.clip_by_global_norm(gradient_update_regularization, 1e+1)[0] 742 elif (self._clip_gradient == 'value'): 743 gradient_update = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1) 744 for g in gradient_update] 745 gradient_update_domain = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1) 746 for g in gradient_update_domain] 747 gradient_update_border = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1) 748 for g in gradient_update_border] 749 gradient_update_regularization = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1) 750 for g in gradient_update_regularization] 751 elif (self._clip_gradient == 'norm'): 752 gradient_update = [tf.clip_by_norm(g, 1e+1) for g in gradient_update] 753 gradient_update_domain = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_domain] 754 gradient_update_border = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_border] 755 gradient_update_regularization = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_regularization] 756 elif (self._clip_gradient == None): 757 pass 758 else: 759 print('Invalid clipping option, defaulting to global.') 760 self._clip_gradient = 'global ' 761 762 # Applies gradient normalization regularization. 
763 if (normalize_gradient == True): 764 norm = tf.linalg.global_norm(gradient_update) 765 gradient_update = [g/norm for g in gradient_update] 766 767 # Rescale Gradient Regularization (Always On) 768 if (border_data != None): 769 for layer_num in range(self._num_hidden_layers+2): 770 weight_norm_domain = tf.norm(gradient_update_domain[2*layer_num], 771 ord = 'euclidean', 772 axis = 0) 773 bias_norm_domain = tf.norm(gradient_update_domain[2*layer_num+1], 774 ord = 'euclidean', 775 axis = 0) 776 weight_norm_border = tf.norm(gradient_update_border[2*layer_num], 777 ord = 'euclidean', 778 axis = 0) 779 bias_norm_border = tf.norm(gradient_update_border[2*layer_num+1], 780 ord = 'euclidean', 781 axis = 0) 782 weight_norm_regularization = tf.norm(gradient_update_regularization[2*layer_num], 783 ord = 'euclidean', 784 axis = 0) 785 bias_norm_regularization = tf.norm(gradient_update_regularization[2*layer_num+1], 786 ord = 'euclidean', 787 axis = 0) 788 789 weight_norm = tf.minimum(weight_norm_domain, weight_norm_border) 790 bias_norm = tf.minimum(bias_norm_domain, bias_norm_border) 791 792 gradient_update_domain[2*layer_num] = gradient_update_domain[2*layer_num]*weight_norm/(weight_norm_domain+1e-31) 793 gradient_update_domain[2*layer_num+1] = gradient_update_domain[2*layer_num+1]*bias_norm/(bias_norm_domain+1e-31) 794 795 gradient_update_border[2*layer_num] = gradient_update_border[2*layer_num]*weight_norm/(weight_norm_border+1e-31) 796 gradient_update_border[2*layer_num+1] = gradient_update_border[2*layer_num+1]*bias_norm/(bias_norm_border+1e-31) 797 798 gradient_update_regularization[2*layer_num] = gradient_update_regularization[2*layer_num]*weight_norm/( weight_norm_regularization+1e-31) 799 gradient_update_regularization[2*layer_num+1] = gradient_update_regularization[2*layer_num+1]*weight_norm/( bias_norm_regularization+1e-31) 800 801 gradient_update[2*layer_num] = (self._scale_factor*gradient_update_domain[2*layer_num] 802 + gradient_update_border[2*layer_num] 803 + self._regularization_coef*gradient_update_regularization[2*layer_num]) 804 gradient_update[2*layer_num+1] = (self._scale_factor*gradient_update_domain[2*layer_num+1] 805 + gradient_update_border[2*layer_num+1] 806 + self._regularization_coef*gradient_update_regularization[2*layer_num+1]) 807 808 # Displays the gradient(s) if the option is selected.

        # Note: displaying the split norms assumes split_gradient == True.
        if (display_gradient_norm == True):
            total_norm = tf.linalg.global_norm(gradient_update)
            domain_norm = tf.linalg.global_norm(gradient_update_domain)
            border_norm = tf.linalg.global_norm(gradient_update_border)
            regularization_norm = tf.linalg.global_norm(gradient_update_regularization)
            print(' Total Gradient Norm', str(total_norm.numpy()))
            print(' Domain Gradient Norm', str(domain_norm.numpy()))
            print(' Border Gradient Norm', str(border_norm.numpy()))
            print(' Regularization Gradient Norm', str(regularization_norm.numpy()))
            for layer_num in range(self._num_hidden_layers+2):
                print(' ', self._layers[layer_num]._name, ' Total Weight Gradient Norm: ',
                      str(tf.norm(gradient_update[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Domain Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_domain[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Border Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_border[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Regularization Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_regularization[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Total Bias Gradient Norm: ',
                      str(tf.norm(gradient_update[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Domain Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_domain[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Border Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_border[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Regularization Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_regularization[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))

        return gradient_update
    ####################
    # Applies an optimization step.
    ####################
    def apply_training_step(self,
                            inputs,
                            border_data,
                            split_gradient = False,
                            display_gradient_norm = False,
                            normalize_gradient = False,
                            train_only_domain = False,
                            train_only_border = False):

        if (self._optimizer_selection != 'L-BFGS' and self._optimizer_selection != 'BFGS'):
            gradient_update = self.back_propagation(inputs = inputs,
                                                    border_data = border_data,
                                                    is_training = True,
                                                    split_gradient = split_gradient,
                                                    display_gradient_norm = display_gradient_norm,
                                                    normalize_gradient = normalize_gradient,
                                                    train_only_domain = train_only_domain,
                                                    train_only_border = train_only_border)
            if (train_only_domain == True):
                self._optimizer1.apply_gradients(zip(gradient_update, self._trainable_weights))
            elif (train_only_border == True):
                self._optimizer2.apply_gradients(zip(gradient_update, self._trainable_weights))
            else:
                self._optimizer1.apply_gradients(zip(gradient_update, self._trainable_weights))
        else:
            self._optimizer.apply_gradients()

        # Applies clipping regularization to bound the weights.
        for layer_num in range(self._num_hidden_layers+2):
            self._trainable_weights[2*layer_num].assign(tf.clip_by_value(self._trainable_weights[2*layer_num],
                                                                         clip_value_min = -1e+5,
                                                                         clip_value_max = +1e+5))
            self._trainable_weights[2*layer_num+1].assign(tf.clip_by_value(self._trainable_weights[2*layer_num+1],
                                                                           clip_value_min = -1e+5,
                                                                           clip_value_max = +1e+5))

            # Clip by magnitude (kept for reference, currently disabled):
            #weight_tensor = self._trainable_weights[2*layer_num]
            #weight_sign = tf.math.sign(self._trainable_weights[2*layer_num])
            #clipped_weight_tensor = tf.clip_by_value(tf.abs(weight_tensor),
            #                                         clip_value_min = 1e-3,
            #                                         clip_value_max = 1e+2)
            #self._trainable_weights[2*layer_num].assign(weight_sign * clipped_weight_tensor)
            #
            #bias_tensor = self._trainable_weights[2*layer_num+1]
            #bias_sign = tf.math.sign(self._trainable_weights[2*layer_num+1])
            #clipped_bias_tensor = tf.clip_by_value(tf.abs(bias_tensor),
            #                                       clip_value_min = 1e-3,
            #                                       clip_value_max = 1e+2)
            #self._trainable_weights[2*layer_num+1].assign(bias_sign * clipped_bias_tensor)

    ####################
    # Loads the training sets to train the network.
    ####################
    def use_training_sets(self,
                          data_set):

        if (data_set is None):
            raise Exception("No data set loaded.")

        training_batch_size, border_training_batch_size, validation_batch_size, \
        input_dim, method, domain, border = data_set.get_set_metadata()

        if (self._input_dim != input_dim):
            raise Exception("Data set input dimension incompatible with neural network.")

        training_set, border_training_set, validation_set = data_set.get_sets()
        print('Data set loaded.')

        return training_batch_size, border_training_batch_size, validation_batch_size,\
               training_set, border_training_set, validation_set,\
               method, domain, border

    ####################
    # Trains the model.
    ####################
    def fit(self,
            data_set = None,
            num_epochs = 1000,
            plot_interval = 500,
            validation_interval = 5000,
            stagnation_stop_tol = 1e-16,
            split_gradient = False,
            display_gradient_norm = False,
            normalize_gradient = False,
            train_only_domain = False,
            train_only_border = False,
            dual_training = False,
            auto_batch_rotation = 0):

        # Loads the data set.
        training_batch_size, border_training_batch_size, validation_batch_size,\
        training_set, border_training_set, validation_set,\
        method, domain_type, border_type = self.use_training_sets(data_set = data_set)

        self._border_type = border_type
        self._border_training_batch_size = border_training_batch_size

        border_data = problemInstance.\
            border_data_prep(border_inputs = border_training_set,
                             border_type = border_type,
                             border_batch_size = self._border_training_batch_size,
                             external_force = self._external_force,
                             required_derivative_order = self._required_derivative_order,
                             input_dim = self._input_dim,
                             output_dim = self._output_dim)

        # Loads the data into the optimizer if BFGS or L-BFGS is selected.
        if (self._optimizer_selection == 'L-BFGS' or self._optimizer_selection == 'BFGS'):
            self._optimizer.optimizer_train_data(inputs = training_set,
                                                 border_data = border_data,
                                                 split_gradient = split_gradient,
                                                 display_gradient_norm = display_gradient_norm,
                                                 normalize_gradient = normalize_gradient,
                                                 train_only_domain = train_only_domain,
                                                 train_only_border = train_only_border)

        # First iteration.
        start_time = time.perf_counter()
        target_total_num_epochs = self._num_epochs_trained + num_epochs - 1
        if (self._num_epochs_trained == 0):
            loss_validation, _, _, _ = self.loss_function(inputs_domain = validation_set,
                                                          border_data = border_data,
                                                          is_training = False,
                                                          use_only_domain = False,
                                                          use_only_border = False)
            self._losses_validation.append(loss_validation.numpy())

        self._num_epochs_trained += 1
        self.apply_training_step(inputs = training_set,
                                 border_data = border_data,
                                 split_gradient = split_gradient,
                                 display_gradient_norm = display_gradient_norm,
                                 normalize_gradient = normalize_gradient,
                                 train_only_domain = train_only_domain,
                                 train_only_border = train_only_border)

        print('Epoch:', str(self._num_epochs_trained), 'Training Loss:', str(self._losses[-1]),
              'Training Loss wrt Sol:', str(self._losses_solution[-1]))
        print('Domain Loss:', str(self._losses_domain[-1]),
              'Border Loss:', str(self._losses_border[-1]),
              'Regularization Loss:', str(self._losses_regularization[-1]))

        # Training loop. With dual_training the optimizer alternates between
        # domain-only updates (even epochs) and border-only updates (odd epochs).
        while (self._num_epochs_trained <= target_total_num_epochs):
            self._num_epochs_trained += 1
            if (dual_training == True):
                if (self._num_epochs_trained % 2 == 0):
                    train_only_domain = True
                    train_only_border = False
                else:
                    train_only_domain = False
                    train_only_border = True

            self.apply_training_step(inputs = training_set,
                                     border_data = border_data,
                                     split_gradient = split_gradient,
                                     display_gradient_norm = display_gradient_norm,
                                     normalize_gradient = normalize_gradient,
                                     train_only_domain = train_only_domain,
                                     train_only_border = train_only_border)

            print('Epoch:', str(self._num_epochs_trained), 'Training Loss:', str(self._losses[-1]),
                  'Training Loss wrt Sol:', str(self._losses_solution[-1]))
            print('Domain Loss:', str(self._losses_domain[-1]),
                  'Border Loss:', str(self._losses_border[-1]),
                  'Regularization Loss:', str(self._losses_regularization[-1]))

            # Stagnation stop.
            if (abs(self._losses_domain[-1]-self._losses_domain[-2]) <= stagnation_stop_tol
                    and abs(self._losses_border[-1]-self._losses_border[-2]) <= stagnation_stop_tol
                    and abs(self._losses_regularization[-1]-self._losses_regularization[-2]) <= stagnation_stop_tol):
                break

            # Early stop (validation loss evaluation).
            if (self._num_epochs_trained % validation_interval == 0):
                loss_validation, _, _, _ = self.loss_function(inputs_domain = validation_set,
                                                              border_data = border_data,
                                                              is_training = False,
                                                              use_only_domain = False,
                                                              use_only_border = False)
                self._losses_validation.append(loss_validation.numpy())
                print('Evaluation Loss on Epoch', str(self._num_epochs_trained), ':',
                      str(self._losses_validation[-1]))
                if (self._losses_validation[-1] > self._losses_validation[-2]):
                    print('Early stop - Loss wrt validation set worsened.')
                    break

                # Autoregulates the regularization coefficient if only the regularization term improves.
                if (self._losses_domain[-1] > self._losses_domain[-validation_interval]
                        or self._losses_border[-1] > self._losses_border[-validation_interval]):
                    print('Regularization is too strong, reducing its coefficient by a factor of 5.')
                    self._regularization_coef = self._regularization_coef/5

            # Rotates batches: draws a fresh sample with the same set-up
            # (note that border_data is kept from the initial sample).
            if (auto_batch_rotation != 0):
                new_sample = myDataSets(training_batch_size = training_batch_size,
                                        border_training_batch_size = border_training_batch_size,
                                        validation_batch_size = validation_batch_size,
                                        input_dim = self._input_dim,
                                        method = method,
                                        domain = domain_type,
                                        border = border_type,
                                        seed = None)

                training_batch_size, border_training_batch_size, validation_batch_size,\
                training_set, border_training_set, validation_set,\
                method, domain_type, border_type = self.use_training_sets(data_set = new_sample)

            # Plot loss every specified number of epochs while training.
            if (self._num_epochs_trained % plot_interval == 0):
                auxiliryPlotting.plot_loss_function(model = self,
                                                    init_range = self._num_epochs_trained-plot_interval+1,
                                                    end_range = -1,
                                                    subdivide_losses = False,
                                                    use_log_scale = False)

        end_time = time.perf_counter()
        print("Execution Time:", end_time - start_time)

    ####################
    # TO REVIEW
    ####################

    def save(self, filepath=''):

        config_list = []

        config_list.append(self._name)
        config_list.append(self._num_epochs_trained)
        config_list.append(self._losses_domain)
        config_list.append(self._losses_border)
        config_list.append(self._losses_regularization)
        config_list.append(self._losses)
        config_list.append(self._losses_solution)
        config_list.append(self._losses_validation)

        config_list.append(self._input_dim)
        config_list.append(self._hidden_dim)
        config_list.append(self._output_dim)
        config_list.append(self._activation)
        config_list.append(self._weight_initializers)
        config_list.append(self._bias_initializers)
        config_list.append(self._batch_normalization)

        config_list.append(self._trainable_weights)

        config_list.append(self._learning_rate)
        config_list.append(self._scale_factor)
        config_list.append(self._regularization_coef)

        config_list.append(self._differential_operator)
        config_list.append(self._external_force)
        config_list.append(self._exact_solution)
        config_list.append(self._loss_fuction)
        config_list.append(self._regularization)
        config_list.append(self._clip_gradient)
        config_list.append(self._optimizer_selection)

        pickle_out = open(filepath + self._name + ".pickle", "wb")
        pickle.dump(config_list, pickle_out)
        pickle_out.close()
        files.download(self._name + ".pickle")  # Google Colab download helper.

    def load(filepath='', filename=''):
        # Note: load builds and returns a new model, so it is meant to be
        # called on the class (myModel.load(...)), not on an instance.

        pickle_in = open(filepath + filename + ".pickle", "rb")
        config_list = pickle.load(pickle_in)

        ld_name = config_list[0]
        model = myModel(ld_name)
        model._num_epochs_trained = config_list[1]
        model._losses_domain = config_list[2]
        model._losses_border = config_list[3]
        model._losses_regularization = config_list[4]
        model._losses = config_list[5]
        model._losses_solution = config_list[6]
        model._losses_validation = config_list[7]

        ld_input_dim = config_list[8]
        ld_hidden_dim = config_list[9]
        ld_output_dim = config_list[10]
        ld_activation = config_list[11]
        ld_weight_initializers = config_list[12]
        ld_bias_initializers = config_list[13]
        ld_batch_normalization = config_list[14]
        model.build(ld_input_dim, ld_hidden_dim, ld_output_dim,
                    ld_activation, ld_weight_initializers, ld_bias_initializers,
                    ld_batch_normalization, None)

        layer_copy = model._layers.copy()
        model._trainable_weights = config_list[15]
        model._layers = layer_copy

        # Rewires each layer so that it points to the loaded weights.
        for layer_num in range(0, model._num_hidden_layers+2):
            model._layers[layer_num]._trainable_weights = model._trainable_weights[2*layer_num:2*(layer_num+1)]
            model._layers[layer_num].w = model._layers[layer_num]._trainable_weights[0]
            model._layers[layer_num].b = model._layers[layer_num]._trainable_weights[1]

        ld_learning_rate = config_list[16]
        ld_scale_factor = config_list[17]
        ld_regularization_coef = config_list[18]
        ld_differential_operator = config_list[19]
        ld_external_force = config_list[20]
        ld_exact_solution = config_list[21]
        ld_loss_fuction = config_list[22]
        ld_regularization = config_list[23]
        ld_clip_gradient = config_list[24]
        ld_optimizer_selection = config_list[25]

        model.compile(learning_rate = ld_learning_rate, scale_factor = ld_scale_factor,
                      differential_operator = ld_differential_operator,
                      external_force = ld_external_force,
                      exact_solution = ld_exact_solution, loss_fuction = ld_loss_fuction,
                      regularization = ld_regularization, regularization_coef = ld_regularization_coef,
                      clip_gradient = ld_clip_gradient, optimizer_selection = ld_optimizer_selection)

        return model

    def save_weights(self, filepath=''):

        pickle_out = open(filepath + self._name + "_weights.pickle", "wb")
        pickle.dump(self._trainable_weights, pickle_out)
        pickle_out.close()

    def set_weights(self, filepath='', filename=''):

        pickle_in = open(filepath + filename + ".pickle", "rb")
        self._trainable_weights = pickle.load(pickle_in)
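For readability, the per-layer rescaling that back_propagation applies to the subgradients of the multi-objective loss can be isolated from the rest of the listing. The sketch below is illustrative only: the tensors g_domain and g_border are hypothetical stand-ins for one layer's weight gradients, and TensorFlow 2.x is assumed.

import tensorflow as tf

# Hypothetical per-layer gradients of two loss components (illustrative values).
g_domain = tf.constant([[3.0, 4.0], [0.0, 1.0]])
g_border = tf.constant([[0.3, 0.4], [0.0, 0.1]])

# Column-wise Euclidean norms, as in back_propagation (axis = 0).
n_domain = tf.norm(g_domain, ord = 'euclidean', axis = 0)
n_border = tf.norm(g_border, ord = 'euclidean', axis = 0)

# Both components are rescaled to the smaller of the two norms, so neither
# objective dominates the update direction; 1e-31 avoids division by zero.
n_min = tf.minimum(n_domain, n_border)
g_domain_rescaled = g_domain * n_min / (n_domain + 1e-31)
g_border_rescaled = g_border * n_min / (n_border + 1e-31)

print(tf.norm(g_domain_rescaled, axis = 0).numpy())  # matches n_min
print(tf.norm(g_border_rescaled, axis = 0).numpy())  # matches n_min

Both rescaled components end up with the same per-column norm, the smaller of the two, which is the property the full listing relies on when it also matches the regularization component against that same minimum.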

1 """ 2 @author: Alberto Garcia Molina 3 @latest_update: 12/10/2020 4 """ 5 6 # Creates the training/validation set. 7 sample = myDataSets(training_batch_size = 10000, 8 border_training_batch_size = 10000, 9 validation_batch_size = 1000, 10 input_dim = 1, 11 method = 'uniform-hit-collocation', 12 domain = 'hypercube-0-1', #'quarter-hypercube-0-1' #'hypercube-0-2' 13 border = 'hypercube-0-1', #'side-x_1-y_0' #'side-x_1-y_0_expanded' #'two_sides-x_0-y_0' 14 seed = 1993) 15 16 #Initializes the model. 17 model = myModel('model_name') 18 model.build(input_dim = 1, 19 hidden_dim = [3,4,3], 20 output_dim = 1, 21 activations = 'sigmoid', #'tanh' #'swish' #'exponential' #'softplus' #'relu' #'elu' 22 weight_initializers = 'xavier', #'he' #'normal_0_1' #'uniform_-1_1' #'zeros' #'ones' 23 bias_initializers = 'xavier', #'he' #'normal_0_1' #'uniform_-1_1' #'zeros' #'ones' 24 batch_normalization = False , # Actually layer normalization. 25 supress_bias = False , # Make the neurons have no bias terms. 26 seed = 1993, 27 epsilon = 1e-12) 28 model.compile(differential_operator = 'Constant', #'Divergence' #'Advection' #'Laplacian' #'Clairaut' 29 external_force = 'Force_Constant', #'Force_Divergence' #'Force_Advection' #'Force_Laplacian' #'Force_Laplacian' 30 exact_solution = 'Sol_Polynomial_2Deg_1D_1D', #'Sol_Polynomial_2Deg_1D_1D' #'Sol_Polynomial_2Deg_2D_1D' #'

100 Sol_Polynomial_2Deg_2D_2D' 31 optimizer_selection = 'Adam', #'AMSGrad' #'Nadam' #'AdaGrad' #'AdaDelta' #'RMSProp' #'Vanilla_SGD' #'Momentum_SGD' #'Nesterov_SGD' #'BFGS' #'L-BFGS' 32 learning_rate = 1e-02, 33 epsilon = 1e-31, 34 scale_factor = 1, 35 loss_fuction = 'square_L2_error', #'L2_error' #'square_L2_error' #'absolute_error' #'square_MSE' #' square_L2_error_1st_comp' 36 regularization = None , #'Gradient_Type' #'Tikhonov' 37 regularization_coef = 0.1, 38 clip_gradient = 'global ' #'value' 'norm' 39 ) 40 41 # Trains the model. 42 model.fit (data_set = sample, 43 num_epochs = 15000, 44 validation_interval = 1000, 45 plot_interval = 500, 46 stagnation_stop_tol = 1e-16, 47 split_gradient = True , # Has to always be set to True 48 display_gradient_norm = False , 49 normalize_gradient = False , 50 train_only_domain = False , 51 train_only_border = False , 52 dual_training = False , # Experimental, do not use. Set to False. 53 auto_batch_rotation = False # Automatically generates new collocations with the same set-up as sample every plot_interval iterations. 54 ) 55 56 # Optional Plots: 57 auxiliryPlotting.plot_loss_function(model = model, 58 init_range = 0, 59 end_range = -1, 60 subdivide_losses = True , 61 use_log_scale = True) 62 63 auxiliryPlotting.plot_model(data_set = sample, 64 model = model, 65 plot_real_sol = True) 66 67 auxiliryPlotting.plot_error (data_set = sample, 68 model = model) 69 70 auxiliryPlotting.plot_loss_comparison (models = [model1, model2, model3], 71 names = [model1_name, model3_name, model3_name], 72 title = ['loss of model 1', 'loss of model 2', 'loss of model 3'], 73 init_range = 0, 74 end_range = -1, 75 subdivide_losses = True , 76 use_log_scale = True) 77 78 # save/load options: 79 80 model.save(filepath='C//...') 81 model.load(filepath='C//...')
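The clip_gradient option used in the cell above bounds the magnitude of each update. The following minimal sketch shows the 'global' mode together with the NaN masking performed in back_propagation; the gradient values are hypothetical and TensorFlow 2.x is assumed.

import tensorflow as tf

# Hypothetical gradient list with a NaN entry, as may arise from singular
# PDE residual terms during training.
grads = [tf.constant([1e+3, float('nan')]), tf.constant([2.0, -3.0])]

# NaNs are zeroed out before the update, as in back_propagation.
grads = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g) for g in grads]

# 'global' clipping bounds the joint norm of all gradients by 10.
grads, global_norm = tf.clip_by_global_norm(grads, 1e+1)
print(global_norm.numpy())          # norm before clipping
print([g.numpy() for g in grads])   # rescaled so the joint norm is <= 10

Unlike 'value' and 'norm', which clip each tensor independently, tf.clip_by_global_norm rescales the whole list jointly, so the direction of the combined update is preserved.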

Bibliography

[1] R. L. Bishop and S. I. Goldberg, Tensor Analysis on Manifolds. Dover, Dec. 1980.
[2] C. Reisinger and G. Wittum, "Efficient hierarchical approximation of high-dimensional option pricing problems," SIAM J. Sci. Comput., vol. 29, pp. 440–458, 2007.
[3] L. Grzelak, J. Witteveen, M. Suárez-Taboada, and C. Oosterlee, "The stochastic collocation Monte Carlo sampler: Highly efficient sampling from 'expensive' distributions," Quantitative Finance, vol. 19, pp. 339–356, 2019.
[4] J. Sirignano and K. Spiliopoulos, "DGM: A deep learning algorithm for solving partial differential equations," Journal of Computational Physics, vol. 375, pp. 1339–1364, Dec. 2018.
[5] W. Hughes, J. H. Merkin, and R. Sturman, "Analytic solutions of partial differential equations," MATH3414 lecture notes, School of Mathematics, University of Leeds, 2003/4.
[6] S. Kepley and T. Zhang, "A constructive proof of the Cauchy-Kovalevskaya theorem for ordinary differential equations," arXiv: Classical Analysis and ODEs, 2019.
[7] D. G. Gaidashev, "Chapter 2: The Cauchy-Kovalevskaya theorem." http://www2.math.uu.se/~gaidash/1MA216/CK.pdf. Accessed: 2020-09-01.
[8] A. Meade and A. Fernandez, "The numerical solution of linear ordinary differential equations by feedforward neural networks," Mathematical and Computer Modelling, vol. 19, no. 12, pp. 1–25, 1994.
[9] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, "Neural ordinary differential equations," ArXiv, vol. abs/1806.07366, 2018.
[10] M. Jaderberg, W. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver, and K. Kavukcuoglu, "Decoupled neural interfaces using synthetic gradients," ArXiv, vol. abs/1608.05343, 2017.
[11] W. E, J. Han, and A. Jentzen, "Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations," Communications in Mathematics and Statistics, vol. 5, pp. 349–380, Nov. 2017.
[12] C. Beck, E. Weinan, and A. Jentzen, "Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations," Journal of Nonlinear Science, vol. 29, pp. 1563–1619, Aug. 2019.
[13] J. Berg and K. Nyström, "A unified deep artificial neural network approach to partial differential equations in complex geometries," Neurocomputing, vol. 317, pp. 28–41, 2018.
[14] T. Dockhorn, "A discussion on solving partial differential equations using neural networks," ArXiv, vol. abs/1904.07200, 2019.
[15] S. H. Kolluru, "A neural network based method to solve boundary value problems," ArXiv, vol. abs/1909.11082, 2019.
[16] I. Lagaris, A. Likas, and D. Papageorgiou, "Neural-network methods for boundary value problems with irregular boundaries," IEEE Transactions on Neural Networks, vol. 11, no. 5, pp. 1041–1049, 2000.
[17] I. Lagaris, A. Likas, and D. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Transactions on Neural Networks, vol. 9, no. 5, pp. 987–1000, 1998.
[18] A. Al-Aradi, A. Correia, D. Naiff, G. Jardim, and Y. F. Saporito, "Solving nonlinear and high-dimensional partial differential equations via deep learning," arXiv: Computational Finance, 2018.
[19] L. Ruthotto and E. Haber, "Deep neural networks motivated by partial differential equations," Journal of Mathematical Imaging and Vision, vol. 62, pp. 352–364, 2019.
[20] P. Chaudhari, A. M. Oberman, S. Osher, S. Soatto, and G. Carlier, "Deep relaxation: partial differential equations for optimizing deep neural networks," Research in the Mathematical Sciences, vol. 5, pp. 1–30, 2017.
[21] M. Raissi and G. Karniadakis, "Hidden physics models: Machine learning of nonlinear partial differential equations," ArXiv, vol. abs/1708.00588, 2018.
[22] M. Raissi, P. Perdikaris, and G. Karniadakis, "Physics informed deep learning (Part I): Data-driven solutions of nonlinear partial differential equations," ArXiv, vol. abs/1711.10561, 2017.
[23] M. Raissi, P. Perdikaris, and G. Karniadakis, "Physics informed deep learning (Part II): Data-driven discovery of nonlinear partial differential equations," ArXiv, vol. abs/1711.10566, 2017.
[24] P. Simard, B. Victorri, Y. LeCun, and J. Denker, "Tangent prop - a formalism for specifying selected invariances in an adaptive network," in NIPS, 1991.
[25] W. Czarnecki, S. Osindero, M. Jaderberg, G. Swirszcz, and R. Pascanu, "Sobolev training for neural networks," in NIPS, 2017.
[26] R. Grosse, "CSC321 Lecture 10: Automatic differentiation." https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf. Accessed: 2020-09-01.
[27] A. G. Baydin, B. Pearlmutter, A. Radul, and J. Siskind, "Automatic differentiation in machine learning: a survey," ArXiv, vol. abs/1502.05767, 2017.
[28] S. Sun, Z. Cao, H. Zhu, and J. Zhao, "A survey of optimization methods from a machine learning perspective," IEEE Transactions on Cybernetics, vol. 50, pp. 3668–3681, 2020.
[29] A. Gaier and D. R. Ha, "Weight agnostic neural networks," in NeurIPS, 2019.
[30] S. Ruder, "An overview of gradient descent optimization algorithms," ArXiv, vol. abs/1609.04747, 2016.
[31] Y. Li, C. Wei, and T. Ma, "Towards explaining the regularization effect of initial large learning rate in training neural networks," ArXiv, vol. abs/1907.04595, 2019.
[32] L. N. Smith and N. Topin, "Super-convergence: Very fast training of neural networks using large learning rates," ArXiv, vol. abs/1708.07120, 2017.
[33] P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina, "Entropy-SGD: Biasing gradient descent into wide valleys," ArXiv, vol. abs/1611.01838, 2016.
[34] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (S. Dasgupta and D. McAllester, eds.), vol. 28 of Proceedings of Machine Learning Research, Atlanta, Georgia, USA, pp. 1139–1147, PMLR, 17–19 Jun 2013.
[35] J. C. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, 2011.
[36] M. D. Zeiler, "ADADELTA: An adaptive learning rate method," ArXiv, vol. abs/1212.5701, 2012.
[37] G. Hinton, "Lecture 6a: Overview of mini-batch gradient descent." https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf. Accessed: 2020-09-01.
[38] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980, 2014. Published as a conference paper at the 3rd International Conference on Learning Representations, San Diego, 2015.
[39] S. J. Reddi, S. Kale, and S. Kumar, "On the convergence of Adam and beyond," in ICLR, OpenReview.net, 2018.
[40] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.
[41] J. Zhang, I. Mitliagkas, and C. Ré, "YellowFin and the art of momentum tuning," ArXiv, vol. abs/1706.03471, 2019.
[42] J. Rafati and R. F. Marcia, "Quasi-Newton optimization methods for deep learning applications," CoRR, vol. abs/1909.01994, 2019.
[43] J. Martínez, "Practical quasi-Newton methods for solving nonlinear systems," Journal of Computational and Applied Mathematics, vol. 124, pp. 97–121, 2000.
[44] J. Nocedal and S. J. Wright, Numerical Optimization (Springer Series in Operations Research and Financial Engineering). Springer, 2000.
[45] R. Bollapragada, D. Mudigere, J. Nocedal, H.-J. M. Shi, and P. Tang, "A progressive batching L-BFGS method for machine learning," in ICML, 2018.
[46] J. Martens, "Deep learning via Hessian-free optimization," in ICML (J. Fürnkranz and T. Joachims, eds.), pp. 735–742, Omnipress, 2010.
[47] P. Ramachandran, B. Zoph, and Q. V. Le, "Searching for activation functions," ArXiv, vol. abs/1710.05941, 2018.
[48] S. Elfwing, E. Uchibe, and K. Doya, "Sigmoid-weighted linear units for neural network function approximation in reinforcement learning," Neural Networks, vol. 107, pp. 3–11, 2018.
[49] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (J. Fürnkranz and T. Joachims, eds.), pp. 807–814, 2010.
[50] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," vol. 9 of Proceedings of Machine Learning Research, Chia Laguna Resort, Sardinia, Italy, pp. 249–256, JMLR Workshop and Conference Proceedings, 13–15 May 2010.
[51] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[52] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.
[53] J. Ba, J. Kiros, and G. E. Hinton, "Layer normalization," ArXiv, vol. abs/1607.06450, 2016.
[54] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in NIPS, 2006.
[55] A. Andoni, R. Panigrahy, G. Valiant, and L. Zhang, "Learning polynomials with neural networks," in ICML, 2014.
[56] Z.-Q. J. Xu, "A note of using Tensorflow to code Laplacian operator in high dimension." https://ins.sjtu.edu.cn/people/xuzhiqin/pub/laplaciancode.pdf. Accessed: 2020-09-01.
