Neural Networks and Differential Equations

Nathan Zaldivar
October 2019

1 Introduction

Artificial neural networks are computing systems designed to "learn" to perform tasks by considering examples, generally without being programmed with task-specific rules. An artificial neural network is built from a collection of neurons that communicate with each other. The idea of a mechanical approximation of a neuron was first proposed in 1943 by Warren McCulloch and Walter Pitts [3], but a neural network was not applied successfully until 1959, when Widrow and Hoff developed MADALINE, which filtered echoes from phone lines [6]. Advancements in neural networks have recently led to dramatic improvements in classification performance in various applications in speech and natural language processing and computer vision [5].

Central to each of these problems is the idea of approximating some function of a d-dimensional variable, $x \in \mathbb{R}^d$. It has been shown that these functions may be approximated by the sum

\[
f(x) = \sum_{j=1}^{n} w_j^{(2)} \sigma\left( {w_j^{(1)}}^{\top} x + b_j^{(1)} \right), \tag{1}
\]

where $w^{(2)}, b^{(1)} \in \mathbb{R}^n$ and $w_j^{(1)} \in \mathbb{R}^d$ for each $j$ are fixed constants. The most important result about equation (1) is that it can uniformly approximate an arbitrary continuous function on the unit hypercube when $\sigma$ is continuous and discriminatory [1].

The general form of equation (1) has allowed neural networks to be applied to a wide range of problems; recently, researchers have used neural networks as an alternative technique for approximating solutions to both ordinary and partial differential equations [4]. A main advantage of applying neural networks to ODEs and PDEs is that they can provide an arbitrarily close approximation to solutions of equations in higher dimensions, which are harder to solve with other numerical methods. PDEs are of paramount importance in financial mathematics; however, they are seldom used because finding solutions to nontrivial models with traditional methods is inefficient.
We would like to address this problem using neural networks. Ultimately, our goal for this semester is to approximate solutions to nontrivial PDEs with applications to finance using neural network methods.

2 Background

We define a general second-order differential equation as follows:

\[
\begin{cases}
H(\vec{x}, \Psi(\vec{x}), \nabla \Psi(\vec{x}), \nabla^2 \Psi(\vec{x})) = 0, & \forall \vec{x} \in D, \\
B(\Psi(\vec{x}), \nabla \Psi(\vec{x})) = g, & \forall \vec{x} \in \partial D,
\end{cases}
\tag{2}
\]

where $B$ denotes the boundary condition, $\vec{x} \in \mathbb{R}^d$, $D \subset \mathbb{R}^d$ is a compact domain, and $\Psi(\vec{x})$ is the solution to (2). We claim that it is possible to approximate $\Psi(\vec{x})$ using a multilayer perceptron with one hidden layer.

First, we show that it is possible to approximate an arbitrary solution to a general differential equation. The Kolmogorov-Arnold representation theorem states that any continuous function of $k$ variables has an exact representation in terms of compositions and superpositions of a finite number of functions of one variable [2]. Because (2) involves derivatives of $\Psi$ up to second order, a classical solution $\Psi$ is continuous on $D$, and therefore it must have such an exact representation.

We then show that we may construct an approximation of $\Psi$ that is within a given tolerance of $\Psi$. Let $C(I^d)$ denote the space of continuous functions on the unit hypercube $I^d = [0, 1]^d$. Sums $G(x)$ of a suitable form are dense in $C(I^d)$, so any function $F \in C(I^d)$ can be approximated by such a sum [1]. We claim that the sum may be defined as

\[
G(x) = \sum_{j=1}^{n} w_j^{(2)} \sigma\left( {w_j^{(1)}}^{\top} x + b_j^{(1)} \right). \tag{3}
\]

An outline of the argument that this sum accurately approximates $F(x)$ follows. A function $\sigma$ is defined to be discriminatory if, for a measure $\mu$ in the space of signed Borel measures on $I^d$,

\[
\int_{I^d} \sigma(w^{\top} x + b) \, d\mu(x) = 0 \tag{4}
\]

for all $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ implies $\mu = 0$. Then, given any continuous discriminatory function $\sigma$, any function $F \in C(I^d)$, and $\varepsilon > 0$, there exists a sum $G(x)$ of the form (3) such that

\[
|G(x) - F(x)| < \varepsilon \quad \forall x \in I^d. \tag{5}
\]

This result is extremely important because it demonstrates that we can use a finite sum to approximate any continuous function on $I^d$. This is the result proven by Cybenko in [1].
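To make the approximation statement (5) concrete, the following is a minimal sketch for the one-dimensional case $d = 1$. It is not the construction used in Cybenko's proof, which is non-constructive; it simply picks the weights of a sum of the form (3) by hand so that $G$ becomes a smoothed staircase following $F$. The logistic sigmoid, the target function $\cos(2\pi x)$, and the helper names `staircase_weights` and `max_error` are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def G(x, w2, w1, b):
    """Evaluate a sum of the form (3) for scalar x (the case d = 1)."""
    return sum(w2j * sigmoid(w1j * x + bj) for w2j, w1j, bj in zip(w2, w1, b))


def staircase_weights(F, steps, sharpness=50.0):
    """Hand-picked (not trained) weights that make G a smoothed staircase of F.

    Hypothetical helper for illustration: one steep sigmoid per subinterval
    of [0, 1], plus one constant unit that contributes F(0).
    """
    k = sharpness * steps                     # slope of each steep sigmoid
    w2, w1, b = [2.0 * F(0.0)], [0.0], [0.0]  # 2 * F(0) * sigmoid(0) = F(0)
    for j in range(1, steps + 1):
        left, right = (j - 1) / steps, j / steps
        w2.append(F(right) - F(left))         # height of the j-th step
        w1.append(k)
        b.append(-k * 0.5 * (left + right))   # step centered at the midpoint
    return w2, w1, b


def max_error(F, steps, grid=1000):
    """Largest |G(x) - F(x)| over an evenly spaced grid on [0, 1]."""
    w2, w1, b = staircase_weights(F, steps)
    return max(abs(G(i / grid, w2, w1, b) - F(i / grid)) for i in range(grid + 1))


def target(x):
    """Example continuous function on [0, 1] (an arbitrary choice)."""
    return math.cos(2.0 * math.pi * x)


if __name__ == "__main__":
    # More hidden units (steps) should give a smaller worst-case error.
    for steps in (10, 100):
        print(steps, max_error(target, steps))
```

Increasing `steps` corresponds to increasing $n$ in (3); the worst-case error on the grid then plays the role of $\varepsilon$ in (5). In practice the weights would be found by training rather than chosen by hand.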
We now define a loss function

\[
L := \left\| H(x, G(x), \nabla G(x), \nabla^2 G(x)) \right\|_{\ell_2(D)} + \left\| B(G(x), \nabla G(x)) - g \right\|_{\ell_2(\partial D)}, \tag{6}
\]

that will serve as a measure of how accurate our approximation is. Using $L$ as a measure of accuracy, $G$ is chosen to be $\arg\min_G L(G)$. We can then use this strategy to approximate solutions to general differential equations by employing a numerical optimization method such as gradient descent.

3 Proposed Methodology

We will begin by studying the conditions under which a neural network can accurately approximate a function. We will do this by closely reading Cybenko's "Approximation by Superpositions of a Sigmoidal Function" and working to understand the proof of the universal approximation theorem. We will then create our own neural network that approximates any continuous function on the unit interval. We will train our neural network using stochastic gradient descent on randomly generated training data. Our programming and visualization will be done in Python. Next, we will learn how to use TensorFlow to train neural networks and apply these skills to solve differential equations. We will finish the semester by using TensorFlow to solve partial differential equations pertaining to financial mathematics.

References

[1] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 5 (1992), p. 455.
[2] A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, Seventeen Papers on Analysis, American Mathematical Society Translations: Series 2, (1963), p. 55-59.
[3] G. Palm, Warren McCulloch and Walter Pitts: A logical calculus of the ideas immanent in nervous activity, Brain Theory, (1986), p. 229-230.
[4] L. S. Tan, Z. Zainuddin, and P. Ong, Solving ordinary differential equations using neural networks, (2018).
[5] R. Vidal, J. Bruna, R. Giryes, and S. Soatto, Mathematics of deep learning, arXiv preprint arXiv:1712.04741, (2017).
[6] Winter and Widrow, MADALINE Rule II: a training algorithm for neural networks, IEEE 1988 International Conference on Neural Networks, (1988).
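
As a rough illustration of how the loss (6) from Section 2 might be minimized by gradient descent, the sketch below treats a hypothetical test problem: $\Psi'(x) + \Psi(x) = 0$ on $[0, 1]$ with $\Psi(0) = 1$, a first-order special case of (2). Several choices here are assumptions not taken from the text: the norms in (6) are replaced by squared, discretized versions on evenly spaced collocation points, $\sigma$ is the logistic sigmoid, the gradients are derived by hand, and a step-halving safeguard is added to plain gradient descent so that the loss never increases. It is written in plain Python rather than TensorFlow only to stay self-contained, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def G(x, theta):
    """Trial solution of the form (3) for scalar x; theta = (w2, w1, b)."""
    w2, w1, b = theta
    return sum(w2[j] * sigmoid(w1[j] * x + b[j]) for j in range(len(w2)))


def loss_and_grad(theta, xs):
    """Squared, discretized stand-in for (6) on G'(x) + G(x) = 0, G(0) = 1.

    Returns the loss and its gradient with respect to (w2, w1, b).
    """
    w2, w1, b = theta
    n, m = len(w2), len(xs)
    g2, g1, gb = [0.0] * n, [0.0] * n, [0.0] * n
    loss = 0.0
    # Interior term: mean squared residual r(x) = G'(x) + G(x).
    for x in xs:
        s = [sigmoid(w1[j] * x + b[j]) for j in range(n)]
        ds = [v * (1.0 - v) for v in s]                        # sigma'
        dds = [ds[j] * (1.0 - 2.0 * s[j]) for j in range(n)]   # sigma''
        r = sum(w2[j] * (w1[j] * ds[j] + s[j]) for j in range(n))
        loss += r * r / m
        c = 2.0 * r / m
        for j in range(n):
            g2[j] += c * (w1[j] * ds[j] + s[j])
            g1[j] += c * w2[j] * (ds[j] + w1[j] * dds[j] * x + ds[j] * x)
            gb[j] += c * w2[j] * (w1[j] * dds[j] + ds[j])
    # Boundary term: (G(0) - 1)^2, which does not depend on w1.
    s0 = [sigmoid(bj) for bj in b]
    e0 = sum(w2[j] * s0[j] for j in range(n)) - 1.0
    loss += e0 * e0
    for j in range(n):
        g2[j] += 2.0 * e0 * s0[j]
        gb[j] += 2.0 * e0 * w2[j] * s0[j] * (1.0 - s0[j])
    return loss, (g2, g1, gb)


def train(n=5, m=21, steps=500, lr=0.1, seed=0):
    """Gradient descent with step halving; returns weights and loss history."""
    rng = random.Random(seed)
    theta = tuple([rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(3))
    xs = [i / (m - 1) for i in range(m)]       # collocation points in [0, 1]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(theta, xs)
        history.append(loss)
        step = lr
        while step > 1e-10:                    # halve the step until loss drops
            trial = tuple([p - step * g for p, g in zip(ps, gs)]
                          for ps, gs in zip(theta, grad))
            if loss_and_grad(trial, xs)[0] < loss:
                theta = trial
                break
            step *= 0.5
    history.append(loss_and_grad(theta, xs)[0])
    return theta, history


if __name__ == "__main__":
    theta, history = train()
    print("initial loss:", history[0], "final loss:", history[-1])
    # Compare the trial solution with the exact solution exp(-x).
    print("G(0.5):", G(0.5, theta), "exact:", math.exp(-0.5))
```

A framework with automatic differentiation, such as TensorFlow, would replace the hand-derived gradients in `loss_and_grad`; the structure of the loss (an interior residual term plus a boundary term) would stay the same.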
