
Universal Approximation Theory of Neural Networks

by

Simon Odense B.Sc., University of Victoria, 2013

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

in the Department of Mathematics and Statistics

© Simon Odense, 2015

University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.

Supervisory Committee

Universal Approximation Theory of Neural Networks

by

Simon Odense B.Sc., University of Victoria, 2013

Dr. Roderick Edwards, Supervisor (Department of Mathematics and Statistics)

Dr. Farouk Nathoo, Departmental Member (Department of Mathematics and Statistics)

ABSTRACT

Supervisory Committee

Dr. Roderick Edwards, Supervisor (Department of Mathematics and Statistics)

Dr. Farouk Nathoo, Departmental Member (Department of Mathematics and Statistics)

Historically, artificial neural networks have been loosely defined as biologically inspired computational models. When deciding what sort of network to use for a given task there are two things that need to be considered. The first is the representational power of the given network, that is, what class of problems can be solved by the network? Given a set of problems to be solved by neural networks, a network that can solve any of these problems is called a universal approximator. The second is the ability to find a desired network given an initial network via a learning rule. Here we are interested in the question of universal approximation. A general definition of artificial neural networks is provided along with definitions for different kinds of universal approximation. We then prove that the recurrent temporal restricted Boltzmann machine (RTRBM) satisfies a general type of universal approximation for stochastic processes, an extension of previous results for the simple RBM. We conclude by examining the potential use of such temporal artificial neural networks in the biological process of perception.

Table of Contents

Supervisory Committee

Abstract

Table of Contents

List of Figures

Acknowledgements

1 Introduction

2 Artificial Neural Networks
   2.1 Dynamics and Stability
   2.2 The Hopfield Network
   2.3 Feed-Forward Networks
   2.4 Kullback-Leibler Divergence and the Neuromanifold
   2.5 The Boltzmann Machine

3 Universal Approximation
   3.1 Universal Approximation for Globally Stable Networks
   3.2 Turing Completeness of Neural Networks
   3.3 Universal Approximation of Stochastic Processes

4 Universal Approximation of Stochastic Processes by the TRBM and RTRBM
   4.1 The TRBM
   4.2 The Generalized TRBM
   4.3 The Recurrent Temporal Restricted Boltzmann Machines

5 Experimental Results
   5.1 Experimental Results
   5.2 The Reconstructed Distribution
   5.3 Simulations

6 Conclusions
   6.1 The TRBM vs the RTRBM
   6.2 Artificial Neural Networks and the Human Brain

Bibliography

List of Figures

Figure 2.1 An example of a restricted Boltzmann machine with 3 visible units and 4 hidden units.

Figure 4.1 Interactions between sets of nodes within and between time steps: each $h_{c,v}$ turns off every $H_{v'}$, $v' \neq v$, in the subsequent time step. $h_{c,v}$ will only be on if $v$ is observed at time $t$, and collectively the control nodes $H_c$ have negligible effect on the visible distribution. $H_0$ is an additional set of hidden units that will be used to model the initial distribution.

Figure 4.2 The temporal connections of a control node. Each $v^{(i)\prime}$ is a configuration not equal to $v$.

Figure 4.3 The initial chain. $H$ is the machine defined in the first part of the proof. Each connection from an $l_i$ to $H$ or $H_k$ only goes to non-control nodes in the set (control nodes include the chains $g(v,1), \ldots, g(v,m)$). The $l_i$'s have no visible connections.

Figure 5.1 The reconstruction process: starting with a sequence of input vectors, $v_0^T$, the hidden sequence $h'^T$ is produced after scaling the parameters of the RTRBM by $\alpha$ and $\beta$. From there the reconstructed sequence, $v^T$, is produced either by sampling or mean-field approximation. Note that using a mean-field approximation to encode data in the hidden units amounts to using $h^T$ to reconstruct $v^T$.

Figure 5.2 A comparison of two sample frames from reconstructions of the bouncing balls under scaling factors (1,0) (on the left) and (0,1) (on the right). Under (1,0) one can see distinct balls; however, the balls stay mostly in the corners and exhibit very erratic movement. Under (0,1) distinct balls are no longer visible but the motion is smooth.

ACKNOWLEDGEMENTS

I would like to thank:

My friends and family for all their support.

Roderick Edwards, for mentoring, support, encouragement, and patience.

Sue Whitesides, for her role as external examiner and her helpful feedback.

The University of Victoria, for funding and providing a great learning environment.

Chapter 1

Introduction

The mammalian nervous system represents an incredibly intricate and complicated dynamical system. The first step to understanding such a system is to break it down into its simplest parts. In our case, the basic unit of the nervous system is the neuron. The dynamic behaviour of each neuron was originally described as a four-dimensional ordinary differential equation [13]. Since then many alternative models have been developed either to reduce complexity, increase biological realism, or model the behaviour of different variations on the basic neuron [17]. A single nervous system may be composed of billions of neurons that interact via biological structures known as synapses. The nervous system is thus seen as a large network of neurons with some connectivity pattern. To make matters worse, a neural network is by nature adaptive. The synaptic efficacy changes in response to the dynamics of the network through the biological processes of long-term potentiation and long-term depression (increasing and lowering efficacy, respectively). Describing a biological neural network in full detail is obviously intractable. However, an alternate approach has yielded some insight as well as countless practical applications. This approach is that of the artificial neural network (ANN). An artificial neural network does not seek to produce a biologically accurate model of a neural network, but instead to extract its fundamental dynamic properties and study the resulting object in an abstract setting.

In order to do this one must examine the dynamic properties of a neuron. The state of a single neuron can roughly be described by two phases, resting and spiking. When a neuron is resting it maintains a constant voltage across its membrane. When the neuron is spiking the voltage gradient is increased and an electrical signal propagates down the axon towards a synapse. Upon receiving this signal, chemicals called neurotransmitters are released, causing a destabilization in the membrane potential of the neighbouring neuron. If the total signal from neighbouring neurons crosses a certain threshold, the behaviour of a neuron transitions from resting to spiking. This is the result of a Hopf bifurcation [17].

In the most basic abstraction, neurons are viewed as simple linear threshold units. At each time step, we determine the state of a neuron by taking a weighted sum of the states of its neighbouring neurons and passing this value to a transfer function. These models were originally studied abstractly as modelling logical statements [20]. Since then countless variations of the basic artificial neural network have been studied [9]. Despite there being no single definition for an artificial neural network, they all share the same basic principles: a set of neurons with simple dynamic behaviour that communicate locally via a set of weights. Artificial neural networks have proved to be powerful tools in artificial intelligence and machine learning, providing state-of-the-art results in image and voice recognition, among other tasks [12][11][18].
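To make this abstraction concrete, the following short Python sketch (an illustrative example, not part of the original text) computes the state of a single linear threshold unit from the states of its neighbours; the example weights and bias realizing a logical AND are hypothetical choices.

import numpy as np

def threshold_unit(neighbour_states, weights, bias):
    """State of one linear threshold neuron: a weighted sum of the states of its
    neighbouring neurons passed through a step-function transfer function."""
    return 1 if weights @ neighbour_states + bias >= 0 else 0

# Example: a unit computing a logical AND of two binary inputs.
print(threshold_unit(np.array([1, 1]), np.array([1.0, 1.0]), bias=-1.5))  # 1
print(threshold_unit(np.array([1, 0]), np.array([1.0, 1.0]), bias=-1.5))  # 0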

Here we provide a definition which encompasses the most prominent kinds of networks, allowing us to study several important features of artificial neural networks under a common framework. Most importantly we are concerned with the computational and dynamic capabilities of artificial neural networks as a class of computational models. Many results regarding the universal approximation of specific neural networks have been around for decades. We go over some of these results for various models before proving several new universal approximation results for related classes of probabilistic artificial neural networks. We conclude with a set of experiments that highlight the potential of artificial neural networks to provide insight into the biological process behind perception.

Chapter 2

Artificial Neural Networks

2.1 Dynamics and Stability

A neural network is by nature a highly parallel system in which relatively simple computational units interact locally via a set of connections. In order to properly define an artificial neural network so that every network discussed here will fit, it is necessary to identify the essential properties shared by all such networks. Each network that will be discussed consists of a finite number of nodes, loosely called neurons, which, at each time step, take on real values determined from the previous state of the network along with the network's parameters. In general a network is probabilistic, and in full generality a network can have long-term time dependence. Thus we see an artificial neural network as representing a Markov kernel from the measure space of a finite number of previous configurations to the space of current configurations. To be precise, given that $X = M^k$ for some $k$ with $M \subset \mathbb{R}$, a Markov kernel is a map $\kappa : X^m \times \mathcal{B} \to [0, 1]$ with source $(X^m, \mathcal{A})$ for some $m$ and target $(X, \mathcal{B})$, where $\mathcal{A}$ and $\mathcal{B}$ are $\sigma$-algebras on $X^m$ and $X$ respectively, such that for any $B \in \mathcal{B}$ the map $x^m \mapsto \kappa(x^m, B)$ is measurable and for every $x^m \in X^m$ the map $B \mapsto \kappa(x^m, B)$ is a probability measure on $(X, \mathcal{B})$. For our purposes $\mathcal{B}$ will always be taken as either the discrete or Borel $\sigma$-algebra and $\mathcal{A}$ will be the product $\sigma$-algebra of $\mathcal{B}$. Intuitively, $k$ is the number of neurons and $m$ is the number of previous states the network depends on. The final property shared by the networks discussed is one of locality. Information passes between neurons only via a set of weighted connections. In other words, if there is no connection between two neurons then the state of one neuron cannot affect the state of the other neuron at the next time step. We thus define a neural network as follows.

Def: A Neural Network is defined as a triple $(X, \theta, \kappa_\theta)$ consisting of the following:

i. a set of neurons (or nodes) $X = X_1 \times \cdots \times X_n$ with $X_i \subset \mathbb{R}^{k_i}$, $k := \sum_{i=1}^{n} k_i$, where $X_i$ is equipped with the discrete $\sigma$-algebra if it is finite and the Borel $\sigma$-algebra otherwise, and $X$ has the product $\sigma$-algebra;

ii. parameters $\theta = (W^{(0)}, \ldots, W^{(m)}, b) \in (M_{k,k}(\mathbb{R})^{m+1}, \mathbb{R}^k)$, where $i$ is called the time dependence of $W^{(i)}$ and $m$ is the time dependence of the network; and

iii. a Markov kernel $\kappa_\theta$ with source $(X^m, \mathcal{A})$ and target $(X, \mathcal{B})$, where $\mathcal{B}$ is the $\sigma$-algebra on $X$ and $\mathcal{A}$ is the product $\sigma$-algebra, satisfying the property that if $w^{(l)}_{i,j} = 0$ and $x^{(l)}$ and $x^{(l)\prime}$ are two vectors that differ only in the $i$th component, then the marginal distributions over the $j$th component of $\kappa_\theta(x^{(1)}, \ldots, x^{(l)}, \ldots, x^{(m)}, \cdot)$ and $\kappa_\theta(x^{(1)}, \ldots, x^{(l)\prime}, \ldots, x^{(m)}, \cdot)$ are equal.

These distributions represent the transition probabilities of the network given a sequence of previous configurations $x^{(1)}, \ldots, x^{(m)}$. We will denote these distributions by $\Omega_{\theta, x^{(1)}, \ldots, x^{(m)}}$.

The last part of the third condition is the locality condition. It states that no information can pass between nodes if the connection between them is 0. This is really only one half of the locality condition as weights between two nodes should have no immediate effect on a distinct third node. In a single neural network the weights are constant so to formalize this property we must include a restriction on sets of neural networks (which we call families). This will be defined later.

Each neural network has a family of stochastic processes associated with it called the discrete neural dynamics. The discrete neural dynamics of a neural network is a family of stochastic processes with each process, $(P^{(t)})_{t=m}^{\infty}$, defined by $P^{(t)} : X \to [0, 1]$, $P^{(t)}(x^{(t)} \mid x^{(t-1)}, \ldots, x^{(t-m)}) = \Omega_{\theta, x^{(t-1)}, \ldots, x^{(t-m)}}(x^{(t)})$ for $t \geq m$. $P_0(x^{(m-1)}, \ldots, x^{(0)})$ is some initial distribution on $X^m$. Then

$$P^{(t)}(x^{(t)}) = \sum_{x^{t-1}} \left( \prod_{i=m}^{t} P^{(i)}(x^{(i)} \mid x^{(i-1)}, \ldots, x^{(i-m)}) \right) P_0(x^{(m-1)}, \ldots, x^{(0)}).$$

Here $x^{t-1}$ refers to a sequence of vectors in $X$ of length $t - 1$. In the case that $X$ is continuous the above sum is replaced with an integral.

Although one can study a single neural network in isolation, more often we are interested in the behaviour of a set of neural networks somehow defined by a parameter space, $\Theta$. We define a family of neural networks to be a set of neural networks with the same number of neurons and time dependence, so a set of the form $\{(X, \theta, \Omega_\theta)\}_{\theta \in \Theta}$ with $\Theta \subseteq (M_{k,k}(\mathbb{R})^{m+1}, \mathbb{R}^k)$, satisfying the condition that for all $1 \leq l \leq m$, if $\theta, \theta' \in \Theta$ with $\theta$ and $\theta'$ differing only in $w^{(l)}_{i,j}$, then the marginal distributions of $\Omega_{\theta, x^{(1)}, \ldots, x^{(m)}}$ and $\Omega_{\theta', x^{(1)}, \ldots, x^{(m)}}$ over $x_k$ are equal for all $k \neq i, j$.

The locality condition on the Markov kernel of a neural network states that information cannot directly pass between unconnected neurons. The locality condition on a family of neural networks states that the state of a neuron cannot directly depend on a weight it’s not connected to. According to this definition, two members of a family of neural networks can be vastly different even if they share similar parameters. This is highly undesirable when it comes to learning and thus one usually discusses families of neural networks that vary smoothly with their parameters. The most natural condition will be discussed in Section 2.4.

When using neural networks in practice, the fundamental object of interest is not a sin- gle neural network but a family of neural networks. Often we do not know a priori how many neurons a neural network will need. Or, we may be interested in a group of neural networks sharing a similar definition but without the same number of neurons. For these purposes we discuss a class of neural networks as a union of families. Most often, a class will contain neural networks that are defined in a similar manner.

In the case that the parameter space is a manifold, we can see that a family of neural networks is really a sort of manifold in which every point is a Markov kernel. This type of manifold is referred to as a neuromanifold and it comes naturally equipped with a Riemann metric in the form of the Fisher information. The geometry of the neuromanifold can yield useful information when it comes to learning [2][3][4]. This will also be discussed in Section 2.4.

A Deterministic Neural Network is a neural network for which $\Omega_\theta$ consists entirely of point-mass functions. In other words the Markov kernel defining the network only maps to values of 1 or 0. This is sometimes called a deterministic Markov kernel. For the case of a deterministic neural network we will generally choose $P_0$ to be a point-mass distribution for some initial state $x^{(0)}, \ldots, x^{(m-1)}$. In this case we see that the neural dynamics reduce to a dynamical system with flow $f^t(x) = x^{(t)}$, where $x^{(t)}$ is the unique vector with $P^{(t)}(x^{(t)}) = 1$ in the neural dynamics with initial distribution $P_0(x^{(m-1)}, \ldots, x^{(0)}) = 1$. For a deterministic neural network we define the transfer function as the function $f : X^m \to X$ with $f(x_1, \ldots, x_m) = x$, where $x$ is the unique vector with probability one in the distribution $\Omega_{\theta, x_1, \ldots, x_m}$.

Given a family of neural networks we are often trying to find a particular neural network in the family that can accomplish a task to a certain precision. To facilitate this, a learning rule is used. A learning rule moves through the parameter space $\Theta$ of a family of networks in an attempt to converge to an optimal solution for our problem. A learning rule will use a variety of information, including sets of training vectors, loss functions, and the geometry of the parameter space, to select a certain value for the parameters at each time step. A learning rule, $G$, can be formalized as a set of probability density functions (PDFs) on $\Theta$, $\{G_{\theta, x^{(1)}, \ldots, x^{(n)}}\}_{\theta, x^{(1)}, \ldots, x^{(n)}}$. Given a set of training vectors $V \subset X$ and a sequence of training sequences $\{v_{1,i}, \ldots, v_{n,i}\}_i$, $v_{j,i} \in V$, we define the learning dynamics as a stochastic process $(G^{(t)})_{t=0}^{\infty}$ with $G^{(t)}(\theta^{(t)} \mid \theta^{(t-1)}) = G_{\theta^{(t-1)}, v_{1,t-1}, \ldots, v_{n,t-1}}(\theta^{(t)})$. $G^{(0)}$ is defined as the point-mass distribution for some initial network $\theta$. Despite the possibility of using a formal definition, it is usually more appropriate to think of a learning rule informally as an algorithm that takes in a certain amount of data, along with a starting network, and outputs some other network in the family whose behaviour is, one hopes, closer to the desired behaviour. Often a learning rule will compute the neural dynamics for a finite $t$ in order to get more information. Learning that utilizes the neural dynamics in this way is called online learning.

The ability of a neural network to solve a given problem then depends on two things. First, it depends on the existence of a network that can solve the problem. Second, it depends on the existence of a learning rule that can find this network given an appropriate initial condition. Note that the phrase “appropriate initial condition” is somewhat ambiguous, as what is deemed appropriate depends on the situation. It is certainly true that given the existence of a neural network, $N$, approximating a solution to our problem, we could merely define a trivial learning rule that at each time step assigns a probability of 1 to the PDF of $N$ and 0 to every other distribution. Such a procedure would be of little practical use. Ideally we would like the learning rule to converge to an optimal solution for any initial condition, although this may not always be feasible.

Often a neural network will lack the computational power to implement some desired behaviour. One possible way of getting around this is to add hidden neurons. These are neurons whose sole purpose is to add computational complexity to a neural network. Furthermore, often we have distinct input neurons and output neurons. In this case we want to partition $X$ into input, output and hidden neurons. The idea is that we are only interested in the initial distribution of the input neurons and the following distributions of the output neurons. The hidden neurons are merely there to increase computational power. In the case where we have input and output neurons we examine the stochastic process marginalized over the output neurons.

One of the most important traits a neural network can have is that of stability. Stability of a network roughly refers to the convergence of the neural dynamics to a stationary distribution. In general, however, this may depend on the choice of initial distribution $P_0$. One can easily see that, in the case of a deterministic neural network, $P_0$ corresponds to the initial state of the machine and the convergence of the neural dynamics corresponds to convergence to a point $y$. With this in mind we introduce the following definition.

Def: A Neural Network on $X$ is stable on a set of distributions $\mathcal{P}_0$ if there exists a set of distributions $\mathcal{P}$ such that for all $P_0 \in \mathcal{P}_0$ there exists $P \in \mathcal{P}$ such that the neural dynamics with initial distribution $P_0$ satisfy $\lim_{t \to \infty} P^{(t)}(x) = P(x)$ for all $x \in X$. A network is globally stable if it is stable on the entire set of distributions on $X$.

This is a fairly general definition but we will exclusively be interested in two cases. In the case of a probabilistic network, we will be interested in the case that $\mathcal{P}$ consists of a single distribution. In the case of a deterministic network we will want $\mathcal{P}$ to consist of point-mass distributions indexed by $X$. In this case, if the network is globally stable, $\mathcal{P}$ defines a mapping $F : X \to X$ by $F(x) = y$ iff, given an initial point-mass distribution $P_0$ with $P_0(x) = 1$, the discrete neural dynamics have a limiting point-mass distribution $P_x \in \mathcal{P}$ with $P_x(y) = 1$. We might consider stable deterministic networks in which $F$ is defined only on a subset of $X$, in which case the neural dynamics have a stable basin of attraction. According to our definition this is sufficient for a network to be stable but not globally stable. In both cases the set $\mathcal{P}$ gives a single function $F$ or a single distribution $P$. A network that is not stable is called unstable.

To make the above definition more clear we will define some classic neural networks within this formalism.

2.2 The Hopfield Network

The Hopfield network is a class of deterministic neural networks with one-step time dependence, $X = M^n$, $M = \{-1, 1\}$, and $f_i(x) = g\left(\sum_j w^{(1)}_{i,j} x_j + b_i\right)$. Here $f_i$ denotes the $i$th component of the transfer function and $w^{(1)}_{i,j}$ is the $i,j$ component of $W^{(1)}$. The function $g : \mathbb{R} \to M$ is the step function defined by $g(x) = 1$ if $x \geq 0$ and $g(x) = -1$ if $x < 0$. One may consider a continuous-state variation of the Hopfield network by replacing $g$ with any bounded, continuous, non-linear, increasing function. The parameter space of each family of Hopfield networks is required to satisfy the conditions $\forall i,j : w^{(1)}_{i,j} = w^{(1)}_{j,i}$, $w^{(1)}_{i,i} = 0$. In the original formulation of the Hopfield network we take $g$ to be the unit step function with threshold 0 and $M = \{-1, 1\}$. Often the Hopfield network is updated asynchronously, i.e. one unit at a time is chosen to update its value. This can be defined by choosing a probability distribution, $\omega$, over the number of nodes. The transition probabilities of the network are then defined as $x_i \leftarrow f_i(x)$ with probability $\omega(i)$ and $x_i \leftarrow x_i$ with probability $1 - \omega(i)$.
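As an illustration, a minimal Python sketch of the asynchronous update is given below, assuming $\omega$ is the uniform distribution over units; the code and its parameter names are illustrative only, not taken from the thesis.

import numpy as np

def hopfield_async_step(x, W, b, rng):
    """One asynchronous Hopfield update with states in {-1, +1}: choose a unit i
    (uniformly at random here) and set x_i = g(sum_j w_ij x_j + b_i),
    where g is the step function with threshold 0."""
    x = x.copy()
    i = rng.integers(len(x))
    x[i] = 1 if W[i] @ x + b[i] >= 0 else -1
    return x

rng = np.random.default_rng(0)
W = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric weights with zero diagonal
b = np.zeros(2)
x = np.array([1, -1])
for _ in range(10):
    x = hopfield_async_step(x, W, b, rng)
print(x)   # settles into an energy minimum, here (1, 1) or (-1, -1)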

The Hopfield network was originally formulated as a model of an associative memory [14]. An associative memory is one in which memory is accessed based on its content rather than an address, the idea being that given a fragment of a particular input the associative memory will retrieve a specific desired memory. In this context it becomes clear that an associative memory can be seen as a map $F : \mathbb{R}^n \to \mathbb{R}^n$ with a countable range and ideally the property that $F(x) = y \iff x \in A_y$ for some $A_y \subset \mathbb{R}^n$. One can see how the Hopfield network might be constructed in a way to give an associative memory by choosing appropriate parameters so the memory $y$ has a basin of attraction consisting of similar patterns. In order to show that the asynchronous (where only a single unit is updated at a time) Hopfield network forms an associative memory we must first ensure that it is stable.

Theorem 2.1: The asynchronous Hopfield network is stable.

Proof: The proof is given by showing that the energy function

$$E(x) = -\left( \frac{1}{2} \sum_{i,j} w^{(1)}_{i,j} x_i x_j + \sum_i b_i x_i \right)$$

is a Lyapunov function. That is, we must show that the above function is bounded below and that $E(f(x)) \leq E(x)$ for all $x \in X$. That $E$ is bounded comes immediately from the fact that $M$ is finite. Now consider $E(f(x)) = -\left( \frac{1}{2} \sum_{i,j} w^{(1)}_{i,j} f_i(x) f_j(x) + \sum_i b_i f_i(x) \right)$. If $i$ is the unit that is updated, this can be written

$$E(f(x)) = -\left( \left( b_i + \sum_j w^{(1)}_{i,j} x_j \right) f_i(x) + \frac{1}{2} \sum_{l,j :\, l,j \neq i} w^{(1)}_{l,j} x_l x_j + \sum_{j \neq i} b_j x_j \right).$$

But $f_i(x) = -1$ if $b_i + \sum_j w^{(1)}_{i,j} x_j < 0$ and $f_i(x) = 1$ if $b_i + \sum_j w^{(1)}_{i,j} x_j \geq 0$, so in either case $-\left( b_i + \sum_j w^{(1)}_{i,j} x_j \right) f_i(x)$ remains the same or decreases. Assuming that $\omega(i) \neq 0$ for all $i$, given enough time this function will reach a local minimum.

Learning in the Hopfield network is Hebbian, meaning that neurons whose values correlate have the connection between them strengthened and those whose values do not correlate have the connection between them weakened [9]. Given a number of patterns $x^{(1)}, \ldots, x^{(n)}$ that we want to be the fixed points of our associative memory, we can define a deterministic learning rule as follows: when pattern $x^{(k)}$ is presented, $\Delta w^{(1)}_{i,j} = x^{(k)}_i x^{(k)}_j$. Starting with $w^{(1)}_{i,j} = 0$, after $n$ steps we end up with $w^{(1)}_{i,j} = \sum_{k=1}^{n} x^{(k)}_i x^{(k)}_j$. The learning rule works by making the vectors $x^{(1)}, \ldots, x^{(n)}$ minima of the energy function. Unfortunately there are several drawbacks with this procedure. First is the existence of spurious minima, that is, local minima other than one of $x^{(1)}, \ldots, x^{(n)}$. The second is one of memory capacity, that is, the number of patterns it can store. A Hopfield network with a fixed number of neurons only has a limited number of vectors that can exist as stable states in the network [1][9]. For these reasons, the basic Hopfield network is rarely used in practice today; however, countless variations of the Hopfield network have been developed to address the deficiencies of the basic model.
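A minimal Python sketch of this storage rule (illustrative code, not from the thesis; the example patterns are arbitrary) is:

import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule for a Hopfield network: starting from W = 0, accumulate
    w_ij <- w_ij + x_i x_j for each stored pattern x, keeping a zero diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for x in patterns:
        W += np.outer(x, x)
    np.fill_diagonal(W, 0.0)   # the Hopfield parameter space requires w_ii = 0
    return W

patterns = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
W = hebbian_weights(patterns)   # both patterns become minima of the energy function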

2.3 Feed-Forward Networks

One key aspect of the Hopfield network is that it is recurrent. A recurrent network is a network that has a cycle in at least one of its weight graphs. In contrast to recurrent networks, feed-forward networks contain no cycles. A feed-forward network can thus be divided into layers in which all connections go from one layer to a higher layer. More precisely, a feed-forward network is a network with neurons $X = X_1, \ldots, X_n$ with the weight matrices, $W^{(l)}$, for each time dependence of the network, $l$, satisfying the following two properties: (1) for $x_i \in X_k$ with $k > 1$, $w^{(l)}_{i,j} = 0$ if $x_j \in X_m$ with $m \leq k$, and for $x_i \in X_1$, $w^{(l)}_{i,i} \neq 0$ and $w^{(l)}_{i,j} = 0$ if $x_i, x_j \in X_1$ with $x_i \neq x_j$; and (2) we require the transition distributions of a feed-forward network to satisfy $\Omega_{\theta, x^{(1)}, \ldots, x^{(m)}}(x^{(t)}_i = x^{(1)}_i) = 1$ for $x_i \in X_1$. Property 1 states that the state of the nodes in every layer other than the first depends only on the state of the nodes in previous layers. Property 2 requires that the nodes in the first layer remain in the initial state for all time. If there were no connections between nodes in the first layer they would be unable to communicate and it would be impossible to ensure the condition on the transition distributions. This is why we require a non-zero $w^{(l)}_{i,i}$ for $x_i \in X_1$.

Observe that every feed-forward network is stable. This can be seen by considering the locality condition on the distributions of a neural network. The first layer is constant for all t ≥ 1. Then if m is the time dependence of the network, the distribution of the second layer will be stationary for t ≥ m since it only depends on the conditional distribution given the state of the first layer, which is constant. The distribution of subsequent layers is then given as a product of conditional distributions. Since there are a finite number of layers, eventually after enough time this distribution will be stationary.

In some cases we may wish the input layer to be given a stationary distribution rather than a constant input. The same definition can be used for this case if the condition $w^{(l)}_{i,i} \neq 0$ for $x_i \in X_1$ is removed. In the deterministic case, we can immediately see a feed-forward network as representing a function, with the first layer being the input layer and the final layer being the output layer.

The canonical example of a feed-forward network is the perceptron. The perceptron is a deterministic feed-forward network with $n$ layers and 1-step time dependence. The first layer is the set of input neurons, the last layer is the set of output neurons, and the remaining layers are the hidden neurons. There are many variants of the perceptron using various transfer functions and state spaces. A common example of an $n$-layer perceptron is given by $X = X_1 \times \cdots \times X_n$, $X_i = [-1, 1]^{k_i}$, where the weight matrices satisfy the feed-forward condition with $w_{i,i} = 1$ for $x_i \in X_1$, the transfer function for nodes $x_j$ in a hidden layer $l$ is given by $f_j(x) = \tanh\left(b_j + \sum_{i : x_i \in X_{l-1}} w_{i,j} x_i\right)$, and the transfer function for nodes $x_j$ in the first and final layer is linear. Since $w_{i,i} = 1$, by setting the biases to 0 for each node in the first layer, the value of the first layer remains constant for all time, giving us an obvious input/output mapping from the first to final layer. The original use of the perceptron was for binary classification. That is, given a function mapping $X_1$ to $\{0, 1\}$, we wish to find a perceptron implementing this function. Originally a 2-layer perceptron was used, although this turns out to be insufficient to implement an arbitrary binary classifier, hence the use of non-linear hidden units [21].

Given a function $g : I \to O$ we wish to find a perceptron with input nodes $I$ and output nodes $O$ that approximates $g$. To do this we use a learning rule called backpropagation. Given a loss function, backpropagation calculates the gradient of the loss function with respect to the parameters of the network. The basic loss function used here is defined as $l : X \times Y \times \Theta \to [0, \infty)$, $l(x, y, \theta) = \frac{1}{2} \sum_i (y_i - y'_i)^2$, where $y_i$ is the value of the $i$th output node of the perceptron with input $x$ and parameters $\theta$, and $y'_i$ is the $i$th component of $g(x)$. Then by minimizing $L(\theta) = \sum_{(x, g(x))} l(x, g(x), \theta)$ the function given by the perceptron becomes close to the desired function pointwise. The learning rule is then defined by $\Delta w_{i,j} = -\alpha \frac{\partial L(\theta)}{\partial w_{i,j}}$, where $\alpha$ is a positive constant called the learning rate. The $\Delta b_i$ are defined in the same way. In backpropagation we begin by running a forward pass on an input and then using the values of the network to calculate the gradient of the loss function with respect to the weights going to the output layer. From these values we pass back to the second-to-last layer and compute the gradient of the loss function with respect to the weights going into this layer. We continue on in this fashion until we get to the first layer. A similar procedure works for the biases. The full details of the algorithm including the calculations involved can be found in [26]. Backpropagation can be summarized as follows:

1: procedure BACKPROPAGATION
2:   Begin with training vector $x$
3:   Compute the state of the output nodes, $y$, using $x$
4:   Compute the error $\frac{1}{2}(g(x) - y)^2$
5:   Use this to compute $\Delta w_{i,j}$ for each $w_{i,j}$ one layer at a time, moving backwards starting at the second-to-last layer
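To make the procedure concrete, here is a minimal Python sketch of one backpropagation step for a 3-layer perceptron with a tanh hidden layer, a linear output layer, and the squared-error loss; biases are omitted for brevity, and the function and variable names are illustrative assumptions rather than part of the thesis.

import numpy as np

def backprop_step(x, target, W1, W2, alpha=0.1):
    """One backpropagation update for a 3-layer perceptron."""
    h = np.tanh(W1 @ x)                     # forward pass: hidden layer
    y = W2 @ h                              # forward pass: linear output layer
    err = y - target                        # gradient of (1/2)||y - g(x)||^2 w.r.t. y
    dW2 = np.outer(err, h)                  # gradient for the output-layer weights
    dh = W2.T @ err                         # pass the error back to the hidden layer
    dW1 = np.outer(dh * (1.0 - h ** 2), x)  # through the tanh transfer function
    return W1 - alpha * dW1, W2 - alpha * dW2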

The fact that the perceptron is stable makes modeling data with time dependence a little unnatural. Perceptrons are best suited to classification problems, whereas more often when dealing with time series we are interested in being able to predict the next vector in a sequence. Despite this, backpropagation is a useful way of calculating the gradient of a loss function and we would like to be able to use a similar algorithm in the unstable

context. Backpropagation has been adapted to a dynamic setting independently by various researchers in various contexts. See [26] for example. The modified algorithm is called backpropagation through time. Backpropagation through time “unfolds” a recurrent network by making $t$ copies of the network, where $t$ is the number of time steps in the sequences we wish to model. The connections of the recurrent network become feed-forward, where each layer represents the state of the network at time $t$. Given a desired sequence of vectors, possibly along with an input sequence, to do a forward pass we simply run the network starting at the first layer with the first input vector and then move to the second layer, whose input is given by the neurons in the first layer along with, possibly, the second input vector. Continuing on this way we end up with a sequence of neural states which we can compare to our desired output. Then, starting at the last time step, we perform backpropagation on the whole network using the desired output vector at the last time step before moving on to the second-last time step and performing backpropagation on the last $t - 1$ layers of the network using the desired output at time $t - 1$. This gives an efficient way to train unstable networks over finite time sequences. Given that backpropagation can be adapted in a wide variety of cases, we hope to be able to develop some general learning algorithms for neural networks. This is done by looking at the geometric structure of the parameter space of a class along with a loss (or energy) function to be optimized.
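A minimal sketch of backpropagation through time for a small tanh recurrent network is given below; the squared-error loss on the hidden state at every time step is an assumption made for this example, and the code is illustrative rather than the procedure used later in the thesis.

import numpy as np

def bptt_step(W_in, W_rec, inputs, targets, alpha=0.01):
    """Unfold the recurrent network over len(inputs) time steps, run a forward pass,
    then accumulate gradients backwards from the last time step to the first."""
    n = W_rec.shape[0]
    hs = [np.zeros(n)]
    for x in inputs:                                   # forward pass through the unfolded copies
        hs.append(np.tanh(W_in @ x + W_rec @ hs[-1]))
    dW_in, dW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
    carried = np.zeros(n)
    for t in reversed(range(len(inputs))):             # backward pass through time
        dh = (hs[t + 1] - targets[t]) + carried        # loss gradient plus gradient from later steps
        dpre = dh * (1.0 - hs[t + 1] ** 2)             # through the tanh transfer function
        dW_in += np.outer(dpre, inputs[t])
        dW_rec += np.outer(dpre, hs[t])
        carried = W_rec.T @ dpre                       # gradient passed to the previous time step
    return W_in - alpha * dW_in, W_rec - alpha * dW_rec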

2.4 Kullback-Leibler Divergence and the Neuromanifold

Suppose that for a globally stable network on $X$ with a single limiting distribution, $P$, there is some distribution $R$ on $X$ that we wish the limiting distribution of our network to approximate. We would like to have a way of measuring how well $P$ approximates $R$. This situation arises commonly in information theory, which gives us the appropriate way of measuring how good our approximation is. This is the Kullback-Leibler (KL) divergence, defined as follows:

$$KL(R \| P) = \sum_x R(x) \log \frac{R(x)}{P(x)}.$$

Note that the KL-divergence is not a metric: it is not symmetric and it does not satisfy the triangle inequality. Intuitively, the KL-divergence measures the information lost when $P$ is used to approximate $R$. The KL-divergence is only defined if $R(x) = 0$ whenever $P(x) = 0$. In this context we define $0 \log 0 = 0$. For the case that $X$ is continuous, the above sum is replaced with an integral. In order to minimize this, we would like to use gradient descent. This can be done assuming that, for a family of networks, the function

$\theta \to P(x)$, where $P$ is the limiting ($t \to \infty$) distribution of the network given by $\theta$, is differentiable. We can extend this to the more general case where we have a finite number of limiting distributions, $P_i$, along with a set of corresponding desired distributions, $R_i$. Then summing up the KL-divergences, $\sum_{i=1}^{k} KL(R_i \| P_i)$, gives us a suitable quantity to attempt to minimize. Performing gradient descent on the KL-divergence gives us a very general learning rule which should in theory work on any globally stable network with a finite number of limiting distributions that satisfy the above smoothness condition and with

$P_i(x) = 0$ only if $R_i(x) = 0$. The last condition is to ensure that the KL-divergence is defined. It is important to note that there is a large class of networks that will not satisfy this condition, namely deterministic neural networks.
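For finite state spaces the KL-divergence is straightforward to compute; the following Python sketch (illustrative only, not from the thesis) uses the convention $0 \log 0 = 0$ and reports an error when the divergence is undefined.

import numpy as np

def kl_divergence(R, P):
    """KL(R || P) = sum_x R(x) log(R(x)/P(x)); only defined when P(x) = 0 implies R(x) = 0."""
    R, P = np.asarray(R, dtype=float), np.asarray(P, dtype=float)
    support = R > 0
    if np.any(P[support] == 0):
        raise ValueError("KL(R||P) undefined: P vanishes where R does not")
    return float(np.sum(R[support] * np.log(R[support] / P[support])))

print(kl_divergence([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # information lost using P to approximate R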

There are other reasons that this approach will not work in practice. For example, we do not always have a known distribution $R(x)$ that we want to approximate. More often we have a series of training vectors and (possibly) a corresponding series of desired outputs. A more general approach is to define a loss function as in the previous section. We define a loss function as a differentiable function $l : X \times \Theta \to [0, \infty)$ which takes as input a training vector $x$, along with the parameters of a network $\theta$, computes $P(x)$ and compares it to a desired output $y$ in some way. Then, defining $L(\theta) = \sum_{(x,y)} l(x, y, \theta)$, the objective becomes to minimize this function, which can again be done with gradient descent. The learning is called supervised if we are provided with the desired output $y$ and unsupervised if we are not.

We have already seen loss functions for deterministic networks in the form of the energy function of a Hopfield network and the error of the perceptron. Although the energy function of the Hopfield network can take negative values, it is bounded below and hence behaves the same way as a loss function. The perceptron was an example of supervised learning whereas the Hopfield network was an example of unsupervised learning. For a probabilistic network, if we are given a distribution $R$ we wish to approximate, then we may use supervised learning by minimizing the KL-divergence; otherwise we define an alternate loss function. The most common loss function is the negative log-likelihood, $l(x, \theta) = -\log P(x)$, where $P$ is the limiting distribution of the network with parameters $\theta$.

Gradient descent works when the parameter space of a network is Euclidean (in our case just $\mathbb{R}^n$). However, in many examples, including the previous two, the parameter space is

actually a manifold embedded in Euclidean space. In this case the smoothness condition on $\theta \to P(x)$ is in fact a smoothness condition $\xi \to P(x)$, where $\xi \in \Xi$ and $\Xi$ is a local coordinate system for $\Theta$. This is the smoothness condition alluded to earlier. For globally stable networks with a single limiting distribution the neuromanifold can be seen as a statistical manifold, in other words a set of distributions $\{P(x; \theta)\}_{\theta \in \Theta}$ parameterized by $\theta$, where $P$ is the limiting distribution of the network given by $\theta$. A manifold, $M$, is called a Riemann manifold if it is a smooth manifold equipped with a set of inner products, $g(p)$, $p \in M$, defined on the tangent space of $M$ at $p$. Here $g(p)$ is required to vary smoothly in $p$; in other words, if $X, Y$ are vector fields on $M$ then the map $p \to g(p)(X(p), Y(p))$ is smooth. The metric can be defined as a matrix, $G(p)$, with $g(p)(X, Y) = X^\top G_p Y$. A statistical manifold can be given a Riemann metric called the Fisher information, defined by

$$G_{i,j}(\theta) = \sum_x \frac{\partial \log P(x; \theta)}{\partial \theta_i} \frac{\partial \log P(x; \theta)}{\partial \theta_j} P(x; \theta).$$

Again the sum is replaced by an integral in the continuous case. In this case gradient descent does not actually give the steepest direction. Instead one must make use of the Fisher information [2]. A technique called natural gradient descent modifies the standard algorithm in order to take into account the Fisher information. The natural gradient is calculated as follows: $-\tilde{\nabla} L(\theta) = -G^{-1}(\theta) \nabla L(\theta)$, where $-\tilde{\nabla} L(\theta)$ is the direction of steepest descent of $L$ with respect to the parameters $\theta$, $G^{-1}(\theta)$ is the inverse of the Fisher information matrix, and $\nabla L(\theta)$ is the conventional gradient.
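A minimal sketch of a single natural-gradient update is shown below; the small ridge term added before solving the linear system is an implementation convenience assumed for numerical stability, not part of the formula above.

import numpy as np

def natural_gradient_step(theta, grad_L, fisher, alpha=0.1):
    """theta <- theta - alpha * G(theta)^{-1} grad L(theta)."""
    ridge = 1e-8 * np.eye(len(theta))              # keeps the solve stable if G is near-singular
    nat_grad = np.linalg.solve(fisher + ridge, grad_L)
    return theta - alpha * nat_grad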

Unstable networks have no limiting distribution on which to define the Fisher information. Instead they have a set of conditional distributions forming a product of statistical manifolds. This structure was examined by [19]. An alternative approach is to consider the distribution of an unstable network over a finite number of time steps for a given initial distribution. This then becomes equivalent to the stable case and will be our preferred approach when we train unstable networks later on.

2.5 The Boltzmann Machine

So far the examples we have looked at have been deterministic. Now we will examine the Boltzmann machine, which can essentially be seen as the probabilistic analog to the Hopfield network. We define the state space of the Boltzmann machine to be $X = \{0,1\}^n$ with the parameter space satisfying the same conditions as the Hopfield network. The transition probabilities for the Boltzmann machine are defined as follows: $\Omega_{(W^{(1)}, b), x'}(x_i = 1) = \sigma\left(\sum_j w^{(1)}_{i,j} x'_j + b_i\right)$, where $x_i$ is the $i$th component of $X$ and $\sigma : \mathbb{R} \to [0,1]$ is the logistic function $\sigma(z) = \frac{1}{1 + e^{-z}}$. The Boltzmann machine is partitioned into hidden and visible units. This is an input/hidden/output partition where the input and output units are the same and labeled visible units. The sets of hidden and visible units are denoted $H$ and $V$ respectively. Like the Hopfield network, the Boltzmann machine admits the energy function $E(x) = -\left(\frac{1}{2} \sum_{i,j} x_i w^{(1)}_{i,j} x_j + \sum_i b_i x_i\right)$. From this energy function we can define a Boltzmann distribution by $P(x) = e^{-E(x)}/Z$, where $Z$ is the normalization factor defined by $Z = \sum_x e^{-E(x)}$. A Boltzmann distribution is a type of probability distribution arising from statistical mechanics describing the probability that a system is in a certain state given a function describing the energy of each state of the system. In our context we wish to show that our Boltzmann distribution is the limiting distribution of the Boltzmann machine. In other words the Boltzmann machine is stable with $\mathcal{P} = \{P\}$ where $P$ is the Boltzmann distribution.
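The transition rule can be sketched in Python as follows (illustrative code, not from the thesis); units are resampled one at a time in a sweep, which matches the single-node transitions used in the proof of Theorem 2.2 below.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boltzmann_sweep(x, W, b, rng):
    """Resample each unit with P(x_i = 1 | x) = sigma(sum_j w_ij x_j + b_i).
    W is symmetric with zero diagonal; states are in {0, 1}."""
    x = x.copy()
    for i in range(len(x)):
        x[i] = 1 if rng.random() < sigmoid(W[i] @ x + b[i]) else 0
    return x

# Running many sweeps from any initial state produces samples from the Boltzmann distribution.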

Theorem 2.2: The Boltzmann machine is globally stable with its Boltzmann distribution as its limiting distribution.

Proof: First we will show that the Boltzmann distribution is a stationary distribution for the Markov chain defined by the neural dynamics of the Boltzmann machine, and then we will prove this Markov chain is ergodic and hence must converge to its stationary distribution. To see that the Boltzmann distribution is stationary it is enough to see that the Boltzmann distribution satisfies detailed balance. That is, given a Markov chain with transition probabilities $M_{x,x'}$ (this being the transition probability from $x$ to $x'$) and a distribution $P$, we have that

$$M_{x',x} P(x') = M_{x,x'} P(x).$$

Or equivalently

$$\frac{M_{x',x}}{M_{x,x'}} = \frac{P(x)}{P(x')}.$$

To see that this implies that $P$ is stationary, simply sum over all states $x'$. We get

$$\sum_{x'} M_{x',x} P(x') = \sum_{x'} M_{x,x'} P(x) = P(x) \sum_{x'} M_{x,x'} = P(x),$$

which is the definition of a stationary distribution. In our case we have that $\frac{P(x)}{P(x')} = \exp(E(x') - E(x))$. Note that since each node is sampled independently of the others and since $\exp(x - x'') = \exp(x - x') \exp(x' - x'')$, it suffices to prove this for the case that $x$ and $x'$ differ in only a single node. Suppose we have that $x_i = 1$ and $x'_i = 0$; then $M_{x',x} = \sigma\left(\sum_j w^{(1)}_{i,j} x'_j + b_i\right) = \sigma(E(x') - E(x))$. Then, using the fact that $1 - \sigma(z) = \sigma(-z)$, we calculate that

$$\frac{M_{x',x}}{M_{x,x'}} = \frac{1 + \exp(E(x') - E(x))}{1 + \exp(E(x) - E(x'))} = \exp(E(x') - E(x)) \, \frac{\frac{1}{\exp(E(x') - E(x))} + 1}{1 + \exp(E(x) - E(x'))} = \exp(E(x') - E(x)),$$

so the Boltzmann distribution satisfies detailed balance and is hence stationary under the network.

A finite state Markov chain is ergodic if it is irreducible and has an aperiodic state. A Boltzmann machine with $n$ nodes has $2^n$ states so it is a finite state Markov chain. The fact that it is irreducible and aperiodic comes from the fact that $\sigma\left(\sum_j w^{(1)}_{i,j} x'_j + b_i\right) > 0$. This implies that the transition probabilities from any state of the Boltzmann machine to any other state are non-zero and hence the machine is irreducible and aperiodic. For any initial distribution, the Boltzmann machine will converge to a unique stationary distribution, which as we have seen must be the Boltzmann distribution. $\blacksquare$

The Boltzmann machine is most often used to model a probability distribution with its Boltzmann distribution. Given a specific distribution $R$ on the visible units that we wish to model, the problem becomes how to find weights so that the marginal distribution, $P_V$, over the visible units of $P$ is close to $R$. Following the discussion in the previous section, we wish to minimize $KL(R \| P_V)$. To do this we calculate the gradient with respect to the

parameters. Computing this explicitly we get

$$\frac{\partial KL(R \| P_V)}{\partial w^{(1)}_{i,j}} = -\left( E[x_i x_j]_{\text{clamped}} - E[x_i x_j]_{\text{free}} \right).$$

The derivation of this formula can be found in Hertz et al. (1991). $E[x_i x_j]_{\text{clamped}}$ is the expected value of the product $x_i x_j$ where the visible units are drawn from the distribution $R$, and $E[x_i x_j]_{\text{free}}$ is the expected value of $x_i x_j$ in the Boltzmann distribution. With this computation we can use gradient descent to minimize the KL-divergence. Given parameters $\theta'$, at each step the learning rule sets $\theta = \theta' - \alpha \frac{\partial KL(R \| P_V)}{\partial \theta'}$. In other words, we define a deterministic learning rule on a family of Boltzmann machines by $G_{\theta'}(\theta) = 1$ if $\theta = \theta' - \alpha \frac{\partial KL(R \| P_V)}{\partial \theta'}$ and 0 otherwise. Here $\alpha$ is some small positive constant called the learning rate. This learning rule turns out to be computationally difficult in practice since in order to compute $P(x)$ one must sum over the total number of states of the machine, which is exponential in the number of nodes. Note that the expectation of the clamped distribution requires one to compute $P(h|v)$ instead of $P(v, h)$ (since the clamped distribution by definition is $R(v)P(h|v)$ whereas the free distribution is simply $P(v, h)$). This is still difficult in a general Boltzmann machine as one still must sum over every hidden state. However, the computations of $P(h|v)$ and $P(v|h)$ can be simplified greatly if the hidden units are conditionally independent given the visible units and the visible units are conditionally independent given the hidden units. This condition can be imposed by adding an additional restriction to the parameter space. This more restricted class of neural networks is called the restricted Boltzmann machine (RBM). A restricted Boltzmann machine is simply a Boltzmann machine with the additional condition $w^{(1)}_{i,j} = 0$ if $x_i, x_j \in H$ or $x_i, x_j \in V$ (see Fig 2.1). This allows $E[x_i x_j]_{\text{clamped}}$ to be computed much more efficiently as the distributions on the hidden units are now conditionally independent. However, $E[x_i x_j]_{\text{free}}$ is still difficult to compute.

In general we do not have a specific distribution in mind but instead have a set of training vectors to which we want the machine to assign a high probability. This is done by assigning as a loss function the negative log-likelihood of a training vector under the model. Given a training vector $v$ we wish to calculate the gradient of the negative log probability of observing $v$ in our Boltzmann machine. To do this we first define $F(v) = -\log \sum_h e^{-E(v,h)}$. Then we have $P(v) = \frac{e^{-F(v)}}{Z}$ and $-\log P(v) = F(v) + \log Z$. Calculating the gradient


Figure 2.1: An example of a restricted Boltzmann machine with 3 visible units and 4 hidden units.

gives us

$$-\frac{\partial \log P(v)}{\partial w^{(1)}_{i,j}} = \frac{\partial F(v)}{\partial w^{(1)}_{i,j}} + \frac{\partial}{\partial w^{(1)}_{i,j}} \log\left( \sum_{v,h} \exp(-E(v,h)) \right) = \frac{\partial F(v)}{\partial w^{(1)}_{i,j}} - \frac{1}{Z} \sum_{v,h} \exp(-E(v,h)) \frac{\partial E(v,h)}{\partial w^{(1)}_{i,j}} = \frac{\partial F(v)}{\partial w^{(1)}_{i,j}} - \sum_v P(v) \frac{\partial F(v)}{\partial w^{(1)}_{i,j}}.$$

To calculate the gradient of $F(v)$ we note that

$$F(v) = -\log\left( \sum_h \exp(c^\top v) \exp\left( \sum_{i,j} w^{(1)}_{i,j} v_i h_j + \sum_j b_j h_j \right) \right)$$

$$= -c^\top v - \log\left( \sum_h \exp\left( \sum_j \left( \sum_i w^{(1)}_{i,j} v_i h_j + b_j h_j \right) \right) \right) = -c^\top v - \log\left( \sum_h \prod_j \exp\left( \sum_i w^{(1)}_{i,j} v_i h_j + b_j h_j \right) \right).$$

We can write $\sum_h$ as $\sum_{l=0}^{1} \sum_{h : h_j = l}$, in which case we can factor out the term depending on $j$. Repeating this for each hidden node gives us

$$= -c^\top v - \log\left( \prod_j \left( 1 + \exp\left( \sum_i w^{(1)}_{i,j} v_i + b_j \right) \right) \right) = -c^\top v - \sum_j \log\left( 1 + \exp(b_j + W^{(1)}_j v) \right).$$

Here $c$ is the vector of visible biases, $b$ is the vector of hidden biases, and $W^{(1)}_j v$ is shorthand for $\sum_i w^{(1)}_{i,j} v_i$.

$$\frac{\partial F(v)}{\partial w^{(1)}_{i,j}} = -\frac{\exp(b_i + W^{(1)}_i v)}{1 + \exp(b_i + W^{(1)}_i v)} v_j = -\sigma(b_i + W^{(1)}_i v) v_j = -P(h_i = 1 \mid v) v_j.$$

Given a set of training vectors $V = \{v_1, \ldots, v_n\}$ we form a sequence $\{v_i\}_{i=1}^{\infty}$ with $v_i \in V$. Then we define the learning rule as $G_{\theta', v}(\theta) = 1$ if $\theta = \theta' + \alpha \frac{\partial \log p(v)}{\partial \theta'}$ and $G_{\theta', v}(\theta) = 0$ otherwise. Again $\alpha$ is the learning rate. The first term in the gradient is called the positive phase and the second the negative phase. This terminology comes from the fact that the positive phase raises the probability of observing $v$ whereas the negative phase decreases the probability of samples from the Boltzmann machine. In a general Boltzmann machine computation of the positive phase is difficult as it requires one to compute $P(h|v)$, which is intractable. However, in the restricted Boltzmann machine the hidden units are conditionally independent given the visible units and hence $P(h|v)$ becomes easily computable, which allows for the positive phase to be computed in a straightforward manner. Unfortunately the negative phase is still difficult to compute in practice as it requires computation of $P(v)$. To get around this, we will approximate the gradient by sampling from the Boltzmann distribution.
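The following Python sketch (hypothetical helper code, with a weight matrix W of shape hidden × visible, hidden biases b, and visible biases c) collects the quantities derived above: the free energy F(v) and the conditional distributions that make the positive phase tractable.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy(v, W, b, c):
    """F(v) = -c.v - sum_j log(1 + exp(b_j + W_j . v)) for a binary RBM."""
    return -c @ v - np.sum(np.logaddexp(0.0, b + W @ v))

def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) = sigma(b_j + W_j . v); the hidden units are conditionally independent."""
    return sigmoid(b + W @ v)

def p_visible_given_hidden(h, W, c):
    """P(v_i = 1 | h) = sigma(c_i + (W^T h)_i); the visible units are conditionally independent."""
    return sigmoid(c + W.T @ h)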

Although calculating $P(v, h)$ explicitly is difficult, one can sample from the Boltzmann distribution using Markov chain Monte Carlo; that is, starting the Boltzmann machine in an initial state, we run the machine step by step until it converges to its Boltzmann distribution (as guaranteed by Theorem 2.2). In a restricted Boltzmann machine the visible nodes are conditionally independent given the hidden nodes, and vice versa. This makes it easy to implement Gibbs sampling. The full procedure is outlined as follows. Starting with a vector $v$ from the training data, we sample from $P(h^{(t)} \mid v^{(t-1)} = v)$, which is easily done as $P(h^{(t)}_i = 1 \mid v) = \sigma\left(\sum_j w^{(1)}_{i,j} v_j + b_i\right)$ and all hidden units are conditionally independent. We then sample from $P(v^{(t+1)} \mid h^{(t)})$ and continue for $k$ steps until we have a vector $v^k$. Running this procedure for $k \to \infty$ we end up with a sample from our Boltzmann distribution. However, in practice we run for only a finite number of steps. Using this technique we can approximate the gradient of the log-likelihood by sampling from the Boltzmann distribution rather than explicitly computing $P(v)$. This method of learning is called Contrastive Divergence

[10]. Rather than define the learning rule as an explicit probability distribution on the parameter space, it is simpler to describe contrastive divergence as an algorithm.

1: procedure CONTRASTIVE DIVERGENCE
2:   Begin with training vector $v^1$
3:   Sample $h^1 \leftarrow P_H(\cdot \mid v^1)$
4:   Sample $v^2 \leftarrow P_V(\cdot \mid h^1)$
5:   Repeat $k$ times
6:   Set $\Delta w^{(1)}_{i,j}$ to $h^1_i v^1_j - h^k_i v^k_j$

Although contrastive divergence does not follow the gradient of $KL(R \| P_V)$, it does follow the gradient of $KL(R \| P_V) - KL(P_{n,V} \| P_V)$, where $P_{n,V}$ is the distribution of the Boltzmann machine after running it for $n$ steps [10]. Contrastive divergence has proven to be effective in practice and will be used later to train Boltzmann machines. Note the above learning rules can be adapted to model the likelihood of several training vectors at a time. Rather than calculating the gradient of the likelihood of a single training vector, we instead define the likelihood of a set of training vectors, $\mathcal{V}$, and weights $\theta$, as $L(\theta | \mathcal{V}) = \prod_{v \in \mathcal{V}} P(v)$. Taking the negative logarithm of this yields the marginalized loss function $L = \sum_v l(v, \theta)$ as discussed in the previous section. Here $l(v, \theta)$ is the negative log-likelihood of $v$ under the Boltzmann machine given by parameters $\theta$. Although we only showed the calculations for the gradient with respect to the weights explicitly, learning rules for the biases are developed in the same fashion.
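Putting the pieces together, a sketch of one CD-k update for a binary RBM is given below; the explicit learning rate and the use of sampled (rather than mean-field) states throughout are assumptions of this illustration, not prescriptions from the thesis.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_step(v1, W, b, c, k, alpha, rng):
    """One contrastive-divergence update for a binary RBM.
    W has shape (hidden, visible); b and c are the hidden and visible biases."""
    h1 = (rng.random(len(b)) < sigmoid(b + W @ v1)).astype(float)   # positive-phase sample
    v, h = v1, h1
    for _ in range(k):                                              # k steps of Gibbs sampling
        v = (rng.random(len(c)) < sigmoid(c + W.T @ h)).astype(float)
        h = (rng.random(len(b)) < sigmoid(b + W @ v)).astype(float)
    W = W + alpha * (np.outer(h1, v1) - np.outer(h, v))             # positive minus negative phase
    b = b + alpha * (h1 - h)
    c = c + alpha * (v1 - v)
    return W, b, c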

Neural networks are closely related to probabilistic models called directed graphical models. Directed graphical models are diagrams that show the conditional dependence of random variables. Given a joint probability distribution over a set of nodes $X$, a directed graphical model is a graph consisting of a single node for every variable in $X$ and a directed edge from $x$ to $y$ if and only if $y$ depends on $x$. A deep belief network is a directed graphical model in which the nodes are divided into layers and every pair of nodes in a single layer is independent. In the undirected case, one can view a deep belief network as a stack of RBMs where each pair of layers of the network consists of a single RBM. With this view, one can train deep belief networks greedily by training an RBM and then using the hidden units of the RBM as the visible units of the next RBM and training a second one “on top of” the existing machine [12].

Now that we have seen several examples of artificial neural networks and their learning rules, we turn to the question of the representational power of a class of neural networks. That is, we ask what is the set of functions or distributions that correspond to a network from our class.

Chapter 3

Universal Approximation

3.1 Universal Approximation for Globally Stable Networks

For globally stable deterministic networks, we can identify each network with a function $F : X \to X$ by associating each point $x \in X$ with the point-mass distribution on $X$ and mapping it to the associated limiting point-mass distribution of the network defined by a single point $y$. $F$ is the mapping $F(x) = y$. The question now is whether or not the set of functions given by a class of globally stable networks is dense in the entire set of distributions/functions on $X$. This gives us our first notion of universal approximation. When we say a set of functions $\mathcal{F}$ on $X$ is dense, we mean that for any $G : X \to X$, $\exists \{F_i\}_{i=1}^{\infty} \subset \mathcal{F}$ such that for all $x$, $\lim_{i \to \infty} F_i(x) = G(x)$. A class of globally stable deterministic neural networks is then a universal approximator if the set of associated functions is dense. Often we may talk of universal approximation for smaller sets of functions. For example, rather than being able to approximate every function on $X$, a class of neural networks may have functions that are dense in the set of continuous or bounded functions on $X$. Note that if the functions are bounded, the definition is equivalent to saying that for all $G : X \to X$ and all $\epsilon > 0$ there exists a function $F$ associated to a neural network in our class such that $\|G - F\| < \epsilon$, where $\| \cdot \|$ is the uniform norm. However, we do not prefer this definition as we do not require in general that $G$ and $F$ are bounded. Often a family of neural networks on $X$ is not a universal approximator but we can add hidden units to increase computational power. In the case that we have an input/output/hidden partition we are looking to approximate functions from the input set to the output set. Here the neurons of the family of neural networks with state space $X$ will be partitioned into input, hidden, and output neurons with state space $I$, $H$, and $O$, respectively. In this case we look at a family of globally stable neural networks on $X$ whose functions satisfy $\pi_o F(x) = \pi_o F(y)$ if $\pi_i x = \pi_i y$, where $\pi_o$ is projection onto the output nodes and $\pi_i$ is projection onto the input nodes. In this case, given the associated function to a network, $F$, we have a well-defined function $F : I \to O$ and can use the same definitions to describe universal approximation of functions $G : I \to O$.

For probabilistic networks stable on a set of distributions $\mathcal{R}$, we are interested in the case where the set of limiting distributions consists of a single distribution, $P$. Our notion of density for a set of distributions comes from the Kullback-Leibler divergence. We say that a set of distributions $\mathcal{P}$ is dense in a set of distributions, $\mathcal{R}$, if for each $R \in \mathcal{R}$ and all $\epsilon > 0$, there exists a $P \in \mathcal{P}$ such that $KL(R \| P) < \epsilon$. We then say that a family of stable probabilistic neural networks on $X$ is a universal approximator if the set of associated distributions is dense in the set of distributions on $X$. Similar to deterministic networks, we will often add hidden nodes to increase computational power. In this case we look to approximate distributions over the output nodes by the marginal distributions over the output nodes.

An important question when dealing with universal approximation is the number of hidden nodes required for a class of networks to be a universal approximator. Ideally we would like a universally approximating class of networks to have some finite (preferably small) upper bound for the number of hidden nodes. A family of neural networks with an input/output/hidden partition with $k$ hidden nodes will be called a $k$-family. Given a particular state space $X$ and a particular $k$, the question then becomes: is there a $k$-family that is a universal approximator for functions/distributions on $X$? It is easy to construct examples of types of neural networks with no $k$-family that is a universal approximator. For example, the single-layer perceptron is unable to implement the XOR function and is therefore not a universal approximator. Thus the class of perceptrons contains no 0-family that is a universal approximator.

It turns out that we have already seen examples of universal approximators. For the deterministic case, although the perceptron with no hidden units is not a universal approximator, the perceptron with a single hidden layer is a universal approximator for continuous functions on a compact set [15]. Probabilistically, both the Boltzmann machine and the restricted Boltzmann machine are universal approximators. In fact the RBM has a $k$-family that is a universal approximator for distributions on $X$ where $X = \{0, 1\}^n$. This is stated

precisely as follows.

Theorem 3.1: The $2^n$-family of restricted Boltzmann machines is a universal approximator for distributions on $\{0, 1\}^n$.

Proof: We will provide a sketch of the proof here. The full proof can be found in [24]. We begin the proof by showing that we can add a hidden node that will increase the probability of a certain vector in the Boltzmann distribution while uniformly decreasing the probability of every other vector by some multiplicative factor. We start with an RBM whose weights and biases are set to 0 and a target distribution R; we order the vectors with non-zero probability v_1, ..., v_m in R so that R(v_1) ≤ ... ≤ R(v_m). For any RBM with distribution P, it can be shown that for each vector v_i with non-zero probability, we can add a hidden node with parameters so that

\[ \tilde{P}(v_i) = \frac{(1+\exp(\lambda))P(v_i)}{1+\exp(\lambda)P(v_i)} \quad\text{and}\quad \tilde{P}(v_k) = \frac{P(v_k)}{1+\exp(\lambda)P(v_i)} \ \text{ for } v_k \neq v_i, \]

where λ is arbitrary and P̃ is the distribution resulting from adding the hidden node. With this in mind, we begin with the Boltzmann machine with no hidden units, which gives a uniform distribution over the visible units. Then, starting at v_1, we add a hidden node with parameters so that P(v_1) is close to 1 and the remaining vectors v_2, ..., v_m have uniform probability. This can be done in the above way so that the probability of each v_i is determined entirely by some λ_1. We then add an additional node in the same manner, determined by some λ_2. Since we can use this procedure to increase P(v_2) arbitrarily close to 1 (and smoothly), we can choose λ_2 in such a way that P(v_2)/P(v_1) = R(v_2)/R(v_1). We continue to add hidden nodes at each step with parameters defined so that P(v_k)/P(v_{k-1}) = R(v_k)/R(v_{k-1}). Note that at each step j, if i < j then P(v_i)/P(v_{i-1}) is unchanged, since P(v_i) and P(v_{i-1}) are changed by the same multiplicative factor. After k steps we can then have P(v_i)/P(v_{i-1}) = R(v_i)/R(v_{i-1}) for 1 < i ≤ k and P uniform on the remaining vectors. This allows one to calculate P(v_i) explicitly in terms of R and λ. Then calculating KL(R||P) we see that KL(R||P) → 0 as λ → ∞. If R is a distribution on n nodes then there are at most 2^n visible configurations with non-zero probability, so this construction adds at most 2^n hidden nodes. Furthermore, one can add redundant hidden nodes to a Boltzmann machine with no effect on the distribution by setting their parameters to 0; thus the stated result holds. ∎
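The construction rests on the distribution-level update above. The following minimal Python sketch applies that update to a probability vector and checks that it boosts the target configuration while rescaling every other configuration by the same factor and preserving normalization; it works purely at the level of visible distributions and does not construct the underlying RBM weights.

```python
import numpy as np

def add_hidden_node(P, i, lam):
    """Effect on the visible distribution of adding one hidden node targeted at
    configuration i: P~(v_i) = (1 + e^lam) P(v_i) / (1 + e^lam P(v_i)) and
    P~(v_k) = P(v_k) / (1 + e^lam P(v_i)) for k != i."""
    P = np.asarray(P, dtype=float)
    denom = 1.0 + np.exp(lam) * P[i]
    Q = P / denom                       # every other vector shrinks by the same factor
    Q[i] = (1.0 + np.exp(lam)) * P[i] / denom
    return Q

P = np.full(8, 1.0 / 8)                 # RBM with no hidden units: uniform on {0,1}^3
Q = add_hidden_node(P, i=5, lam=6.0)
print(Q[5], Q.sum())                    # target probability near 1, still normalized
print(Q[0] / Q[1])                      # ratios among the other vectors are unchanged
```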

Smaller families of RBMs have been shown to be universal approximators, but the construction used in Le Roux and Bengio will be useful later, in particular because the construction has all visible biases set to 0. This allows one to "add" two RBMs together if they have the same number of visible nodes, simply by including all hidden nodes of each RBM, with their respective parameters, in a new RBM.

With these two examples one might wonder whether it is of interest to consider other networks at all, except for the possibility of more efficient learning algorithms. One answer lies in the difficulty that stable networks have in modelling systems with time dependence. Although for any finite amount of time we can find a stable network to model such a system, in general the number of hidden nodes required for this model depends on the number of time steps we want to simulate. We would like to be able to model a stochastic process more naturally; that is, we would like a network to be able to continue to accurately model a process when the number of time steps increases without having to train it all over again, and for that we must turn to unstable networks. In doing so we need to come up with an alternative definition of universal approximation. Before discussing this we take a detour to mention the Turing completeness of neural networks.

3.2 Turing Completeness of Neural Networks

Viewing neural networks as computational models requires one to compare neural networks to more traditional machines, the most important one being the Turing machine. We say that something is Turing complete if it is capable of solving any problem a Turing machine can solve, or, to be more precise, capable of implementing the Turing-computable functions. Turing completeness of artificial neural networks was first conjectured at their conception [20]. An exact implementation of a Turing machine by an ANN was eventually given [5]. The notion of a computable function is a general one which does not make reference to any specific model of computation. A function is computable if it is computable by some computational model. The Church-Turing thesis states that the computable functions are exactly the Turing-computable functions. In our definition of neural networks we require there to be a finite number of nodes, but it has been shown that neural networks with an infinite number of hidden nodes are capable of hypercomputation. That is, they are able to implement non-Turing-computable functions. This was done by showing that a neural network with an infinite number of hidden units can solve the halting problem [6].

3.3 Universal Approximation of Stochastic Processes

Although stable networks are good at approximating functions and distributions, often we are interested in systems with dynamic behaviour. We may try to use neural networks for tasks that have an inherent time dependence, in which case the dynamics of stable networks are insufficient as they converge to a stationary distribution. In this case we must use unstable networks to achieve our goal. For unstable networks we would like to be able to say something about universal approximation that is similar to what we can say for stable networks. The most obvious definition would be as follows: given a stochastic process {R^(t)}_{t=0}^∞, we want a class of networks such that for every ε > 0, there exists a network in our class whose neural dynamics have the property KL(R^(t)||P^(t)) < ε for all t ≥ 0. This definition has the disadvantage that it allows for the case that the KL-divergence of the infinite sequences diverges. A stronger condition would be to form the distributions R and P on infinite sequences and to say that a class is a universal approximator if for every R formed in this way and each ε, there is a distribution P coming from the neural dynamics of some network in our class such that KL(R||P) < ε. In practice this can be a difficult condition to satisfy. However, we may be able to take the first k time steps of a stochastic process for arbitrary k and approximate that distribution arbitrarily well by neural networks in our class. Thus we form the joint distribution of the stochastic process over the first k steps, denoted by R_k. This will be our definition for universal approximation of stochastic processes.

Def: A class of neural networks with output nodes V is a universal approximator for a stochastic process {R^(i)}_{i=0}^∞ on V if for each integer k > 0 and all ε > 0 there exists a network with neural dynamics {P^(t)}_{t=0}^∞ such that KL(R_k||P_k) < ε.

Note that the neural dynamics depend on a choice of initial distribution, and the definition requires only that the statement is satisfied for some initial distribution; however, in practice we may have a specific one in mind. As mentioned before, we are really only interested in the case where the number of hidden nodes required by the network for a given approximation does not depend on the number of time steps for which we want to simulate R. We want our networks to approximate each R_k uniformly with respect to the number of hidden nodes. In other words, we are interested in a class of networks that is an m-family for some m. A class of neural networks is a universal approximator for a set of stochastic processes if it is a universal approximator for every process in

the set. Often we will be interested in approximating processes that all satisfy a certain condition. The most common requirement one could place on a stochastic process is that it satisfies the Markov property, that is, it is time-homogeneous and has one-step time dependence. In other words, given any k,

\[ R(v^{(0)}, \ldots, v^{(k)}) = \left( \prod_{i=1}^{k} R_1(v^{(i)} \mid v^{(i-1)}) \right) R_0(v^{(0)}). \]

Here R_1 and R_0 are distributions called the transition and initial distribution, respectively. One may weaken the requirement on the time dependence by allowing for any finite time dependence, in which case R satisfies

\[ R(v^T) = \left( \prod_{k=m}^{T-1} R_1(v^{(k)} \mid v^{(k-1)}, \ldots, v^{(k-m)}) \right) R_0(v^{(0)}, \ldots, v^{(m-1)}). \]

If we want to find classes of networks that approximate the set of distributions with a certain finite time dependence in the way defined above, one approach might be to factor the distribution P into its marginals P_t and compare each to either R_1 or R_0. To make sure this approach works we prove the following lemma.
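To make the factorization concrete, the following minimal Python sketch builds the joint distribution of the first k+1 steps of a Markov chain on a finite state space from its transition and initial distributions, and compares two such joints with the KL-divergence used in the definition above; the representation (states as integers, a column-stochastic transition matrix) is chosen purely for exposition.

```python
import itertools
import numpy as np

def markov_joint(R0, R1, k):
    """Joint R_k(v^(0),...,v^(k)) = R0(v^(0)) * prod_i R1(v^(i) | v^(i-1)),
    returned as a dict mapping state sequences to probabilities."""
    states = range(len(R0))
    joint = {}
    for seq in itertools.product(states, repeat=k + 1):
        p = R0[seq[0]]
        for prev, nxt in zip(seq, seq[1:]):
            p *= R1[nxt, prev]          # R1[next, previous] = R1(next | previous)
        joint[seq] = p
    return joint

def kl_joint(R, P):
    return sum(r * np.log(r / P[s]) for s, r in R.items() if r > 0)

# toy chain on one visible unit (two states), illustrative numbers only
R0 = np.array([0.6, 0.4])
R1 = np.array([[0.9, 0.2],              # column j: distribution of the next state given state j
               [0.1, 0.8]])
P1 = np.array([[0.85, 0.25],
               [0.15, 0.75]])
print(kl_joint(markov_joint(R0, R1, 3), markov_joint(R0, P1, 3)))
```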

Lemma 3.2: Let R be a distribution on a finite sequence of length T of n-dimensional binary vectors that is time-homogeneous with finite time dependence m. Given a set of distributions P on the same sequences, if for every ε > 0 we can find a distribution P ∈ P such that for every v^T,

KL(R_1(·|v^(t-1), ..., v^(t-m)) || P_t(·|v^(t-1), ..., v^(0))) < ε for m ≤ t < T,

KL(R_{0,t}(·|v^(t-1), ..., v^(0)) || P_t(·|v^(t-1), ..., v^(0))) < ε for 0 < t < m,

and KL(R_{0,0}(·) || P_0(·)) < ε, then we can find distributions P ∈ P that approximate R to arbitrary precision ε.

Proof: For an arbitrary ε, we need to find a P ∈ P such that KL(R||P) < ε. Writing out the KL-divergence,

\[ KL(R||P) = \sum_{v^T} R(v^T) \log \frac{R(v^T)}{P(v^T)}. \]

We can write P(v^T) as

\[ \left( \prod_{t=1}^{T-1} P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)}) \right) P(v^{(0)}) \]

and by assumption

\[ R(v^T) = \left( \prod_{t=m}^{T-1} R_1(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(t-m)}) \right) \left( \prod_{i=1}^{m-1} R_0(v^{(i)} \mid v^{(i-1)}, \ldots, v^{(0)}) \right) R_0(v^{(0)}). \]

Then expanding out the log in the KL-divergence gives us

\[ KL(R||P) = \sum_{v^T} \sum_{t=m}^{T-1} R(v^T) \log \frac{R_1(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(t-m)})}{P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})} + \sum_{v^T} \sum_{t=1}^{m-1} R(v^T) \log \frac{R_0(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})}{P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})} + \sum_{v^T} R(v^T) \log \frac{R_0(v^{(0)})}{P(v^{(0)})}. \]

We can decompose R(v^T) into R(v^(T-1), ..., v^(t+1) | v^(t)) R(v^(t) | v^(t-1), ..., v^(0)) R(v^(t-1), ..., v^(0)). Note that R(v^(T-1), ..., v^(t+1) | v^(t)) R(v^(t-1), ..., v^(0)) < 1, so for a given t we can write

\[ \sum_{v^T} R(v^T) \log \frac{R_1(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(t-m)})}{P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})} = \sum_{v^{(T-1)}, \ldots, v^{(t+1)}, v^{(t-1)}, \ldots, v^{(0)}} R(v^{(T-1)}, \ldots, v^{(t+1)} \mid v^{(t)}) \, R(v^{(t-1)}, \ldots, v^{(0)}) \]
\[ \qquad \times \sum_{v^{(t)}} R(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)}) \log \frac{R_1(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(t-m)})}{P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})}. \]

The first factor is less than 1 for all v^T. Since R is time-homogeneous with time dependence m, R(v^(t) | v^(t-1), ..., v^(0)) = R_1(v^(t) | v^(t-1), ..., v^(t-m)), so the second factor is merely the KL-divergence of R_1(·|v^(t-1), ..., v^(t-m)) and P_t(·|v^(t-1), ..., v^(0)), which is nonnegative. Finally, this KL-divergence is independent of the vectors at time steps k > t, giving us

\[ \sum_{v^T} R(v^T) \log \frac{R_1(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(t-m)})}{P(v^{(t)} \mid v^{(t-1)}, \ldots, v^{(0)})} \leq 2^{(T-(t+1))n} \sum_{v^t} KL(R_1(\cdot \mid v^{(t-1)}, \ldots, v^{(t-m)}) \,||\, P_t(\cdot \mid v^{(t-1)}, \ldots, v^{(0)})). \]

The same logic applies for the cases with t < m.

By hypothesis there exists P ∈ P such that for every v^T and every ε′,

KL(R_1(·|v^(t-1), ..., v^(t-m)) || P_t(·|v^(t-1), ..., v^(0))) < ε′ for t ≥ m,

KL(R_{0,t}(·|v^(t-1), ..., v^(0)) || P_t(·|v^(t-1), ..., v^(0))) < ε′ for every 0 < t < m, and

KL(R_0||P_0) < ε′. This gives us

\[ KL(R||P) \leq \sum_{t=m}^{T-1} 2^{(T-(t+1))n} \sum_{v^t} KL(R_1(\cdot \mid v^{(t-1)}, \ldots, v^{(t-m)}) \,||\, P_t(\cdot \mid v^{(t-1)}, \ldots, v^{(0)})) \]
\[ \qquad + \sum_{t=1}^{m-1} 2^{(T-(t+1))n} \sum_{v^t} KL(R_{0,t}(\cdot \mid v^{(t-1)}, \ldots, v^{(0)}) \,||\, P_t(\cdot \mid v^{(t-1)}, \ldots, v^{(0)})) + 2^{(T-1)n} KL(R_0 \,||\, P_0) \]
\[ < \sum_{t=m}^{T-1} 2^{(T-1)n} \varepsilon' + \sum_{t=0}^{m-1} 2^{(T-1)n} \varepsilon'. \]

Then we merely choose ε′ so that this expression is less than ε and choose a corresponding P ∈ P. ∎

Chapter 4

Universal Approximation of Stochastic Processes by the TRBM and RTRBM

4.1 The TRBM

Although the Boltzmann machine is a universal approximator, in general there is no k-family of Boltzmann machines that is a universal approximator for a stochastic process, since the Boltzmann machine is stable. Despite this, the Boltzmann machine has several nice learning algorithms that we do not want to abandon. Thus, to deal with stochastic processes, we look for modifications of the Boltzmann machine that allow us to apply the learning rules we have. The key to the learning rules of the Boltzmann machine lies in performing gradient descent on the KL-divergence. Rather than start with a Boltzmann machine, which will eventually converge to a Boltzmann distribution, we define a network by taking a Boltzmann distribution at each time step. This network is obviously stable since it is independently drawing samples from a constant distribution at each time step. We want to modify this machine to make it unstable, and the easiest way to do that is to make the parameters of the Boltzmann distribution at the current time step depend on the state of the network at the previous time step. This is how we define the Temporal Restricted Boltzmann Machine, or TRBM. The TRBM is a class of neural networks on X = {0,1}^n, with nodes partitioned into visible and hidden units (V, H), with one-step time dependence defined as follows:

\[ \Omega_{\theta,(v',h')}(v, h) = \frac{\exp\!\left( \tfrac{1}{2} \sum_{i,j} x_i w^{(0)}_{i,j} x_j + \sum_i b_i x_i + \sum_{i,j} h_i w^{(1)}_{i,j} h'_j \right)}{Z(h')}. \]

The restrictions on the weight matrix w^(0)_{i,j} are the same as the restrictions on the weight matrix of the RBM, that is, there are symmetric connections between visible and hidden units and no connections between hidden units or between visible units. We will thus treat W^(0) as a matrix representing the connections between visible and hidden units. The expression v^⊤ W h will be shorthand for (1/2) Σ_{i,j} v_i w^(0)_{i,j} h_j. The biases will be partitioned into visible and hidden biases (b, c). The matrix W^(1) contains no connections from visible nodes or to visible nodes and can be thought of as a matrix W′ of connections between hidden nodes. Thus we will use h′^⊤ W′ h as shorthand for Σ_{i,j} h_i w^(1)_{i,j} h′_j. Note there are no restrictions on the parameters w^(1)_{i,j}. Z(h′) is the normalizing factor of this distribution. We

will consider neural dynamics in which the initial distribution P0 is taken as the Boltzmann distribution with identical parameters except for the hidden biases which are taken as an

additional vector b_init. Our current goal is to show that the TRBM is a universal approximator for Markov chains. As discussed in the previous section, we not only want universal approximation but universal approximation by a k-family for some fixed k. To do this we fix a number of time steps T and show that for any Markov chain {R^(t)}_{t=0}^∞ and any ε > 0 there is a TRBM with neural dynamics {P^(t)}_{t=0}^∞ (in particular we take P_0 to be as described above) such that KL(R_T||P_T) < ε. We then show that the number of hidden nodes required to make this approximation is bounded by some function of the number of

visible nodes and independent of T . For simplicity we will refer to the distributions RT

and PT merely as R and P . The statement of our main theorem is then as follows

Theorem 4.1: There is a k-family of TRBMs on n visible nodes that is a universal approximator for the set of Markov chains on n visible nodes.

Proof: By the previous lemma we will be looking for a TRBM that can approximate the transition probabilities of R along with its initial distribution. The proof will rely on the universal approximation properties of RBMs. The idea is that given one of the 2^n configurations of the visible units, v, there is an RBM with distribution P_v approximating R_1(·|v^(t-1) = v) to a certain precision. The universal approximation results for RBMs tell us that this approximation can be made arbitrarily precise for an RBM with enough hidden units. Furthermore, this approximation can be done with visible biases set to 0 [24]. We thus set all visible biases of our TRBM to 0 and include each of the approximating RBMs without difficulty. We label these RBMs H_1, ..., H_{2^n}. Given a specific configuration of the visible nodes v, H_v refers to the RBM chosen to approximate R_1(·|v^(t-1) = v). In other words, H_v is the set H_i where v is the i-th configuration of the visible units.

The challenge then is to signal to the TRBM which of the 2^n RBMs should be active at the current time step. To do this we include 2^n additional hidden nodes, which we will call control nodes, each corresponding to a particular configuration of the visible units. Thus we add 2^n control nodes, h_{c,1}, ..., h_{c,2^n}, corresponding to the hidden sets H_1, ..., H_{2^n}. Again, given a particular visible configuration v, we denote the corresponding control node by h_{c,v}. The set of all control nodes will be denoted H_c. Note that (c, v) is the label of h_{c,v}, and weights connecting to h_{c,v} will be denoted w_{(c,v),i} or w′_{(c,v),j}. The control nodes will signal which of the H_i's should be active at the next time step. To accomplish this we will choose parameters such that when v is observed at time t − 1, h_{c,v} will be on at time t − 1 with probability close to 1 and every other control node will be off at time t − 1 with probability close to 1. For convenience this configuration of control nodes will be referred to as H_{c,v} = 1 and other configurations as H_{c,v} ≠ 1. Each h_{c,v} will have strong negative temporal connections to every H_{v′} with v′ ≠ v, in essence turning off every RBM corresponding to R_1(·|v^(t-1) = v′) and leaving the RBM corresponding to R_1(·|v^(t-1) = v) active (see Fig. 4.1). We will break the proof down into four parts, each corresponding to a condition to be satisfied. We must choose parameters so that each of these conditions is satisfied simultaneously. We then show that by satisfying these conditions the statement of the theorem holds.

First, we must be able to choose parameters so that given v^(t-1) = v, the probability that H^(t-1)_{c,v} = 1 can be made arbitrarily close to 1. Second, we must have that the control nodes have no impact on the visible distribution at the same time step, so the transition probabilities of the TRBM will still approximate R_1(·|v^(t-1) = v). Third, we must be able to choose parameters so that given H^(t-1)_{c,v} = 1, the probability that any nodes in H_{v′} are on at time t can be made arbitrarily close to 0 for v′ ≠ v. Finally, we must be able to approximate the initial distribution R_0.

To choose temporal connections, define w′_{(c,v),j} to be −α if h_j ∈ H_{v′} where v′ ≠ v, and 0 otherwise. Let every other w′_{i,j} = 0. In particular, remembering that W′ is not necessarily symmetric, we have w′_{j,(c,v)} = 0 for all h_j. The only parameters left to define are the biases and visible-hidden connections of H_c. Let b_{(c,v)} = −(k − 0.5)β where k is the number of visible nodes on in v. Finally, define the connections from the visible units to h_{c,v} by w_{i,(c,v)} = β if v_i is on in the configuration v and −β otherwise. Here the parameters

of the control nodes are completely determined by α and β. We will proceed by showing that all necessary conditions are satisfied when α and β are large enough.


Figure 4.1: Interactions between sets of nodes within and between time steps: each h_{c,v} turns off every H_{v′} with v′ ≠ v in the subsequent time step. h_{c,v} will only be on if v is observed at time t, and collectively the control nodes H_c have negligible effect on the visible distribution. H_0 is an additional set of hidden units that will be used to model the initial distribution.

Step 1:
For this step we show that as β → ∞ we have P(H^(t)_{c,v^(t)} = 1 | v^(t), ..., v^(0)) → 1. Note that given the visible state at time t, the state of the control nodes at time t is conditionally independent of all other previous states. With this in mind, we can write the probability of a control node being on at time t as

\[ P(h^{(t)}_{c,v} = 1 \mid v^{(t)}, \ldots, v^{(0)}) = \sigma\!\left( \sum_i v^{(t)}_i w_{i,(c,v)} + b_{(c,v)} \right), \]

where σ is the logistic function. Note that for all v^(t) ∈ {0,1}^n and v ∈ {0,1}^n, we have Σ_i v^(t)_i w_{i,(c,v)} = aβ − bβ, where a is the number of nodes on in both v^(t) and v, and b is the number of nodes on in v^(t) but off in v. Since b_{(c,v)} = −(k − 0.5)β, if v ≠ v^(t) then either a < k, in which case Σ_i v^(t)_i w_{i,(c,v)} + b_{(c,v)} ≤ −0.5β, or a = k and b ≥ 1, which again implies Σ_i v^(t)_i w_{i,(c,v)} + b_{(c,v)} ≤ −0.5β. If v = v^(t) then aβ − bβ + b_{(c,v)} = kβ − (k − 0.5)β = 0.5β. Thus if v = v^(t),

\[ \sigma\!\left( \sum_i v^{(t)}_i w_{i,(c,v)} + b_{(c,v)} \right) = \sigma(0.5\beta). \]

Otherwise,

\[ \sigma\!\left( \sum_i v^{(t)}_i w_{i,(c,v)} + b_{(c,v)} \right) \leq \sigma(-0.5\beta). \]

So as β → ∞, P(H^(t)_{c,v^(t)} = 1 | v^(t), ..., v^(0)) → 1.
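This separation can be checked numerically. The following minimal Python sketch builds the control-node weights and bias for a fixed configuration v and verifies that the pre-activation is 0.5β when the observed configuration equals v and at most −0.5β otherwise; the helper name is chosen purely for exposition.

```python
import itertools
import numpy as np

def control_preactivation(v, v_obs, beta):
    """Pre-activation of the control node h_{c,v} when v_obs is observed:
    weights are +beta on units that are on in v and -beta otherwise,
    and the bias is -(k - 0.5) * beta with k = number of on units in v."""
    v, v_obs = np.asarray(v), np.asarray(v_obs)
    w = np.where(v == 1, beta, -beta)
    b = -(v.sum() - 0.5) * beta
    return float(v_obs @ w + b)

n, beta = 4, 10.0
for v in itertools.product([0, 1], repeat=n):
    for v_obs in itertools.product([0, 1], repeat=n):
        pre = control_preactivation(v, v_obs, beta)
        if v_obs == v:
            assert np.isclose(pre, 0.5 * beta)     # sigma(0.5*beta) -> 1 for large beta
        else:
            assert pre <= -0.5 * beta + 1e-9       # sigma(-0.5*beta) -> 0 for large beta
print("control-node separation verified for n =", n)
```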

Step 2:
Here we show that by making β large enough we can make the effect of the control nodes on the visible distribution negligible. We have

\[ P(v^{(t)} \mid h^{(t-1)}) = \frac{\sum_{h^{(t)}} \exp\!\left( \sum_{i,j} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_j b_j h^{(t)}_j + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)}{\sum_{v^{(t)}} \sum_{h^{(t)}} \exp\!\left( \sum_{i,j} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_j b_j h^{(t)}_j + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)}. \]

In the previous step we showed that by taking β large enough, P(H^(t)_{c,v^(t)} = 1 | v^(t), ..., v^(0)) becomes arbitrarily close to 1. So in the limit we have, for all v^(t),

\[ P(v^{(t)} \mid h^{(t-1)}) = \frac{\sum_{h^{(t)}: H_{c,v^{(t)}}=1} \exp\!\left( \sum_{i,j: h_j \in H \setminus H_c} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H \setminus H_c} b_j h^{(t)}_j + 0.5\beta + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)}{\sum_{v^{(t)}} \sum_{h^{(t)}: H_{c,v^{(t)}}=1} \exp\!\left( \sum_{i,j: h_j \in H \setminus H_c} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H \setminus H_c} b_j h^{(t)}_j + 0.5\beta + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)} \]

\[ = \frac{\sum_{h^{(t)}: H_{c,v^{(t)}}=1} \exp\!\left( \sum_{i,j: h_j \in H \setminus H_c} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H \setminus H_c} b_j h^{(t)}_j + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)}{\sum_{v^{(t)}} \sum_{h^{(t)}: H_{c,v^{(t)}}=1} \exp\!\left( \sum_{i,j: h_j \in H \setminus H_c} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H \setminus H_c} b_j h^{(t)}_j + \sum_{i,j} h^{(t)}_i h^{(t-1)}_j w'_{i,j} \right)}. \]

But this is just the distribution of the machine without the control nodes. Thus by making β large enough, the effect of the control nodes on the visible distribution at the current time step becomes negligible.

Step 3:

In this step, remembering that P_v is the distribution of the RBM corresponding to the distribution R_1(·|v^(t-1) = v), we show that as α and β are increased to infinity, if H^(t-1)_{c,v} = 1 then P(v^(t)|h^(t-1)) → P_v(v^(t)) for all v^(t). Consider some configuration h^(t-1) with H^(t-1)_{c,v} = 1. Take a hidden unit h_j ∈ H_{v′} with v′ ≠ v. Since v′ ≠ v and H^(t-1)_{c,v} = 1, we have h^(t-1)_{c,v} = 1 and w′_{(c,v),j} = −α, so for all v^(t)

\[ P(h^{(t)}_j = 1 \mid v^{(t)}, h^{(t-1)}) = \sigma\!\left( \sum_i v^{(t)}_i w_{i,j} + b_j - \alpha \right). \]

Since h_j is not a control node, w_{i,j} is fixed for all v_i. Thus as α → ∞, P(h^(t)_j = 1 | v^(t), h^(t-1)) → 0 for h_j ∈ H_{v′} with v′ ≠ v. As β → ∞ the effect of the control nodes on the visible distribution at time t vanishes. Thus in the limit as α, β → ∞ we can ignore the effect of the control nodes, and the probability that any hidden node in H_{v′} is on goes to 0. This leaves us with

\[ P(v^{(t)} \mid h^{(t-1)}) \to \frac{\sum_{H^{(t)}_v} \exp\!\left( \sum_{i,j: h_j \in H_v} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H_v} b_j h^{(t)}_j \right)}{\sum_{v^{(t)}} \sum_{H^{(t)}_v} \exp\!\left( \sum_{i,j: h_j \in H_v} v^{(t)}_i h^{(t)}_j w_{i,j} + \sum_{j: h_j \in H_v} b_j h^{(t)}_j \right)} = P_v(v^{(t)}). \]

Here H^(t)_v is a configuration of H_v.

Step 4:

Finally, we must also be able to approximate the initial distribution R_0 to arbitrary precision. We know there is an RBM H_0 with visible biases 0 whose Boltzmann distribution can approximate R_0 to a certain, arbitrary, precision. Include this machine in our TRBM. Now we define the initial biases. Let b_{i,init} = γ for every h_i ∈ H_0 and b_{(c,v),init} = 0 for every control node h_{c,v}. Set b_{i,init} = −γ for all other hidden nodes. Add −γ to the biases of H_0. Regardless of the parameters chosen for any hidden node in some H_v, by making γ large, every hidden node not in H_0 or H_c will be turned off with probability close to 1 in the first time step. By making β large, Step 1 tells us that the effect of H_c on the initial distribution is made arbitrarily small, so P_0 can be made arbitrarily close to the Boltzmann distribution of H_0, which by construction approximates R_0. At subsequent time steps, by making γ large, every node in H_0 will be off with probability close to 1, leaving us with the machine described in the first three steps. Note that this construction allows the control nodes to be active in the first time step and to transmit temporal data without disturbing the initial distribution.

Now we put the four steps together. Given an arbitrary 0 < t < T , we can write each P (v(t)|v(t−1), ..., v(0)) as

\[ \sum_{h^{(t-1)}} P(v^{(t)}, h^{(t-1)} \mid v^{(t-1)}, \ldots, v^{(0)}) = \sum_{h^{(t-1)}} P(v^{(t)} \mid h^{(t-1)}, v^{(t-1)}, \ldots, v^{(0)}) P(h^{(t-1)} \mid v^{(t-1)}, \ldots, v^{(0)}) \]
\[ = \sum_{h^{(t-1)}} P(v^{(t)} \mid h^{(t-1)}) P(h^{(t-1)} \mid v^{(t-1)}, \ldots, v^{(0)}). \]

Step 1 tells us that as β → ∞

\[ \sum_{h^{(t-1)}} P(v^{(t)} \mid h^{(t-1)}) P(h^{(t-1)} \mid v^{(t-1)}, \ldots, v^{(0)}) \to \sum_{h^{(t-1)}: H^{(t-1)}_{c,v^{(t-1)}}=1} P(v^{(t)} \mid h^{(t-1)}) P(h^{(t-1)} \mid v^{(t-1)}, \ldots, v^{(0)}) \]

for all v^(0), ..., v^(t-1). Steps 2 and 3 tell us that as α, β → ∞ we get that P(v^(t)|h^(t-1)) → P_{v^(t-1)}(v^(t)) for all h^(t-1) with H^(t-1)_{c,v^(t-1)} = 1. Using the fact that as β goes to infinity, Σ_{h^(t-1): H_{c,v^(t-1)}=1} P(h^(t-1)|v^(t-1), ..., v^(0)) → 1, we have that Σ_{h^(t-1)} P(v^(t)|h^(t-1)) P(h^(t-1)|v^(t-1), ..., v^(0)) can be made arbitrarily close to P_{v^(t-1)}(v^(t)) for all v^(0), ..., v^(t), in terms of Euclidean distance. By construction, KL(R_1(·|v^(t-1)) || P_{v^(t-1)}) < ε′ for some arbitrary ε′ > 0. Taking the limit as α, β → ∞ we get that KL(R_1(·|v^(t-1)) || P(·|v^(t-1), ..., v^(0))) = KL(R_1(·|v^(t-1)) || P_{v^(t-1)}) < ε′. Finally, Step 4 tells us that in the limit as γ → ∞, KL(R_0||P_0) < ε′. So by Lemma 3.2 the result holds. ∎
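In the limiting regime used in the proof, the constructed TRBM behaves like an idealized sampler: the visible configuration at time t − 1 selects which approximating RBM generates the visible configuration at time t. The following minimal Python sketch simulates this idealized behaviour at the level of visible distributions; it is a caricature of the construction, not an implementation of the TRBM itself.

```python
import numpy as np

def idealized_trbm_rollout(P0, P_v, T, rng):
    """Idealized behaviour as alpha, beta, gamma -> infinity: v(0) is drawn from
    the initial RBM's distribution P0, and v(t) is drawn from P_v[v(t-1)], the
    RBM selected by the control node active at time t-1.  Distributions are
    probability vectors over the visible configurations."""
    v = rng.choice(len(P0), p=P0)
    seq = [int(v)]
    for _ in range(T - 1):
        v = rng.choice(len(P_v[int(v)]), p=P_v[int(v)])
        seq.append(int(v))
    return seq

rng = np.random.default_rng(0)
P0 = np.array([0.7, 0.3])                        # toy chain on one visible unit
P_v = {0: np.array([0.9, 0.1]), 1: np.array([0.2, 0.8])}
print(idealized_trbm_rollout(P0, P_v, T=10, rng=rng))
```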

4.2 The Generalized TRBM

The TRBM used in the previous section is a restricted instance of a more generalized model described by Sutskever et al. (2006). In the full model we allow explicit long-term hidden-hidden connections as well as long-term visible-visible connections. In this paper we will not consider models with visible-visible temporal interaction. The generalized TRBM is given by

\[ \Omega_{\theta, v^{(t-1)}, \ldots, v^{(t-m)}, h^{(t-1)}, \ldots, h^{(t-m)}}(v^{(t)}, h^{(t)}) = \frac{\exp\!\left( v^{(t)\top} W^{(0)} h^{(t)} + c^{\top} v^{(t)} + b^{\top} h^{(t)} + h^{(t)\top} W^{(1)} h^{(t-1)} + \cdots + h^{(t)\top} W^{(m)} h^{(t-m)} \right)}{Z(h^{(t-1)}, \ldots, h^{(t-m)})}. \]

This is a slight abuse of notation, as the matrices W^(i) represent connections between the entire set of neurons rather than just hidden-to-hidden or visible-to-visible. However, as before, we have w^(0)_{i,j} = 0 if x_i and x_j are both in V or both in H, and w^(i)_{i,j} = 0 if either x_i or x_j is in V. For t < m we define the distribution in the following way: for each l with l > t, we define Ω_{θ, v^(t-1), ..., v^(0), h^(t-1), ..., h^(0)}(v^(t), h^(t)) by replacing each corresponding W^(l) h^(t-l) with the initial bias b^(l)_init. If we drop the restriction that R be a Markov chain, we can generalize the previous theorem so that R is any distribution homogeneous in time with a finite time dependence.

Theorem 4.2: There is a k-family of generalized TRBMs on n visible nodes that is a universal approximator for the set of stochastic processes with finite time dependence on n visible nodes.

Proof: The proof is similar to the proof of Theorem 4.1. Let m be the time dependence of R. Then for each visible sequence v^(t-1), ..., v^(t-m) we construct a TRBM P by adding sets of hidden units H_{v^(t-1),...,v^(t-m)} with parameters chosen to approximate R_1(·|v^(t-1), ..., v^(t-m)). For each visible configuration v we add a control unit h_{c,v} with the same bias and visible-hidden connections (determined by a parameter β) as in the construction for Theorem 4.1. For i ≤ m, define the i-step temporal connections as w^(i)_{(c,v),j} = −α if h_j ∈ H_{v^(t-1),...,v^(t-i),...,v^(t-m)} with v^(t-i) ≠ v, and 0 otherwise. All other temporal connections are set to 0. Then by repeating Step 1, Step 2, and Step 3 in the proof of Theorem 4.1, and by making α and β sufficiently large, we can make

KL(R_1(·|v^(t-1), ..., v^(t-m)) || P(·|v^(t-1), ..., v^(0)))

arbitrarily small for all v^T.

To finish the proof we must modify the machine to approximate the m initial distributions as well. To do this we add sets of hidden units H_(i) for each 0 ≤ i < m with parameters that approximate R_{0,i}(·|v^(i-1), ..., v^(0)) to a certain precision for all sequences v^i. We know that this can be done using the construction in the previous step. Now we want every non-control node in H_(i) to be inactive at all times except t = i. To do this we will give every non-control node in H_(i) a strong negative bias. We add a single additional control node h_c with i-step temporal connections to counteract the bias added to H_(i). We thus add −γ to the biases in H_(i), excluding the control nodes of H_(i). Define the bias of h_c to be −γ, and the i-step temporal connections from h_c to be γ for each node in H_(i) excluding the control nodes, and −γ for every other non-control node. Set b^(1)_{i,init} to −γ for every non-control hidden unit that is not in H_(0) and let b^(k)_{init} be 0 for all k > 1. Let b^(1)_{c,init} be 2γ and b^(1)_{i,init} be 0 for the remaining nodes (H_(0) and the control nodes). Let h_c have no connections to visible units. By making γ large, h_c will be on with probability close to 1 and every non-control hidden unit not in H_(0) will be off with probability close to 1 in the first time step. In subsequent time steps i < m, h_c will be off with probability close to 1, and the temporal connections from h_c will be −γ to nodes in H_{v^(t-1),...,v^(t-m)} for all v^(t-1), ..., v^(t-m) and γ to nodes in H_(i). By making γ large, the temporal connections turn off every non-control node in H_{v^(t-1),...,v^(t-m)} and the biases turn off every non-control node in H_(k) with k ≠ i. At time i, the i-step temporal connections from h_c to H_(i) cancel the effect of the bias terms, leaving H_(i) the only set of nodes active at time i. Thus, at each time step 0 ≤ i < m, P(·|v^(i), ..., v^(0)) can be made arbitrarily close to the distribution given by H_(i). For t ≥ m, h_c and H_(i) will be off with probability close to 1 for all i, leaving us with the machine described in the first step. ∎

4.3 The Recurrent Temporal Restricted Boltzmann Machines

The TRBM gives a nice way of using a Boltzmann machine to define a probability distribution that captures time dependence in data, but it turns out to be difficult to train in practice [30]. To fix this, a slight variation of the model, the RTRBM, was introduced. The key difference between the TRBM and the RTRBM is the use of deterministic real-valued parameters denoted h^t. We will denote the probabilistic binary hidden units at time t by h′^(t). The distribution defined by an RTRBM, Q, is

\[ \Omega_{\theta, v^{(t-1)}, h^{(t-1)}}(v^{(t)}, h'^{(t)}) = \frac{\exp\!\left( v^{(t)\top} W h'^{(t)} + c^{\top} v^{(t)} + b^{\top} h'^{(t)} + h'^{(t)\top} W' h^{(t-1)} \right)}{Z(h^{(t-1)})}, \]

where W and W′ are defined in the same way as for the TRBM and h^(t) is defined by

\[ h^{(t)} = \sigma(W v^{(t)} + W' h^{(t-1)} + b), \tag{4.1} \]
\[ h^{(0)} = \sigma(W v^{(0)} + b_{init} + b), \]

where σ is the logistic function and b_init is an initial bias. P_0 is once again defined as a Boltzmann distribution with bias b + b_init. The difference between the RTRBM and the TRBM is the use of the sequence of real-valued vectors h^T for the temporal connections. At each time step, each hidden node h_i takes on two values, a deterministic h^(t)_i and a probabilistic h′^(t)_i. The fact that the temporal parameters are calculated deterministically makes learning more tractable in these machines [30].
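The deterministic recursion (4.1) can be written out directly. The following minimal Python sketch rolls the real-valued hidden state forward over a visible sequence; the shapes and parameter names are chosen for exposition, and sampling of the binary units h′^(t) from the conditional Boltzmann distribution is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rtrbm_hidden_sequence(V, W, Wp, b, b_init):
    """Deterministic RTRBM hidden states: h(0) = sigma(W v(0) + b_init + b),
    h(t) = sigma(W v(t) + W' h(t-1) + b).  V has shape (T, n_visible);
    W maps visible to hidden and Wp maps hidden to hidden."""
    H = []
    h = sigmoid(W @ V[0] + b_init + b)
    H.append(h)
    for v in V[1:]:
        h = sigmoid(W @ v + Wp @ h + b)
        H.append(h)
    return np.stack(H)

# toy shapes with illustrative parameter values
rng = np.random.default_rng(0)
T, n_vis, n_hid = 5, 3, 4
V = rng.integers(0, 2, size=(T, n_vis)).astype(float)
W = rng.normal(size=(n_hid, n_vis))
Wp = rng.normal(size=(n_hid, n_hid))
b, b_init = np.zeros(n_hid), np.zeros(n_hid)
print(rtrbm_hidden_sequence(V, W, Wp, b, b_init).shape)  # (5, 4)
```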

Theorem 4.3: Let R be a distribution over a sequence of length T of binary vectors of length n that is time-homogeneous and has finite time dependence. For all ε > 0 there exists an RTRBM, Q, such that KL(R||Q) < ε.

Proof: As in the proof of Theorem 4.1, for each configuration v^(t-1), ..., v^(t-m), include hidden units H_{v^(t-1),...,v^(t-m)} with parameters so that the KL distance between the visible distribution of the Boltzmann machine given by H_{v^(t-1),...,v^(t-m)} with these parameters and R_1(·|v^(t-1), ..., v^(t-m)) is less than ε′. Now for each possible visible configuration v add the control node h_{c,v} with the same biases and visible-hidden weights as in the proofs of Theorems 4.1 and 4.2 (determined entirely by the parameter β). In the proof of Theorem 4.2, h_{c,v} had i-step temporal connections to every H_{v^(t-1),...,v^(t-m)} with v^(t-i) ≠ v. The proof will proceed by showing that each of these i-step temporal connections can instead be given by a chain of nodes in the RTRBM. We wish to show that we can add i hidden units connecting h_{c,v} to every hidden node in H_{v^(t-1),...,v^(t-m)} such that if h_{c,v} is on at time t − i, it will have the same effect on the distribution of a node in H_{v^(t-1),...,v^(t-m)} as it does in the generalized TRBM, and the i additional hidden units do not affect the visible distribution. If we can achieve this then the same proof will hold for the RTRBM. This will be done as follows.

For each h_{c,v} and each 1 ≤ k < m, add an additional hidden unit with 0 bias and no visible-hidden connections. For each h_{c,v}, label these m − 1 hidden units g_{(v,1)}, ..., g_{(v,m−1)}. Since these nodes have no visible-hidden connections they have no effect on the visible distribution at the current time step. For 1 ≤ k < m − 1, let w′_{(v,k),(v,k+1)} = 1. Let w′_{(c,v),(v,1)} = 1 and w′_{(v,k),j} = −α if h_j ∈ H_{v^(t-1),...,v^(t-k-1),...,v^(t-m)} with v^(t-k-1) ≠ v, and w′_{(v,k),j} = 0 otherwise (see Fig. 4.2). Given a sequence v^(t-1), ..., v^(t-m), consider the probability that h′^(t)_j = 1 for some hidden unit h_j ∈ H_{v^(t-1)′,...,v^(t-m)′} where v^(t-k)′ ≠ v^(t-k) for some k. Then by construction there is some hidden node g_{k−1} with w′_{(v,k−1),j} = −α. The value of g_{k−1} is calculated recursively by g^(t-1)_{k−1} = σ(g^(t-2)_{k−2}) = σ^{k−1}(h^(t-k)_{c,v^(t-k)}) = σ^k(0.5β). Since k is bounded, by making β arbitrarily large we make g_{k−1} arbitrarily close to 1, and thus make h′^(t)_j w′_{(v,k−1),j} g^(t-1)_{k−1} arbitrarily close to −α.

Now suppose we have h′^(t)_j = 1 for some hidden unit in H_{v^(t-1),...,v^(t-m)}. Then every temporal connection into h_j is −α, and g^(t-1)_{(v,k)} = σ^k(−0.5β) for every such g_{(v,k)}, so again by making β arbitrarily large we make the temporal terms arbitrarily close to 0. Thus as β → ∞, the temporal terms from h^(t-i)_{c,v} are either −α or 0, as needed.


Figure 4.2: The temporal connections of a control node. Each v^(i)′ is a configuration not equal to v.

To finish the proof, note that our machine must also be able to approximate the initial m distributions. As before we will use the construction of Theorem 4.2 and replace the long-term temporal connections with a chain. To begin, add each H_(i) described in the proof of Theorem 4.2. Add hidden units l_1, ..., l_{m−1} along with h_c, where the temporal connections have weight 2γ from l_i to every non-control node in H_(i) and −2γ to every non-control node in H_{v^(t-1),...,v^(t-m)} for all v^(t-1), ..., v^(t-m). Now set the initial biases for every l_i to be 0 except for l_1. Set this initial bias to be 2δ. Define the initial bias for every other non-control node to be −δ, with the exception of H_(0), whose initial bias is 0 (see Fig. 4.3). Now for the first time step, by making δ large we make the initial distribution arbitrarily close to that of H_(0), allowing us to approximate the distribution for the first time step. Now consider l^(i)_i for 0 < i ≤ m. By equation (4.1), l^(2)_2 = σ(δσ(δ) − δ). By making δ large we make l^(2)_2 ≈ 0.5. Now l^(3)_3 ≈ σ(0.5δ − 0.5δ) = 0.5. Then by induction l^(i)_i ≈ 0.5. For every other l^(i)_j we have l^(i)_j ≈ σ(−0.5δ). By making δ large we have each l^(i)_j ≈ 0 to arbitrary precision for

i ≠ j. The temporal terms given by l_i are then equal to −γ for nodes in H_{v^(t-1),...,v^(t-m)} and γ for nodes in H_(i), making the construction behave the same way as that in Theorem 4.2. ∎


Figure 4.3: The initial chain. H is the machine defined in the first part of the proof. Each connection from an l_i to H or H_(k) only goes to non-control nodes in the set (control nodes include the chains g_{(v,1)}, ..., g_{(v,m)}). The l_i's have no visible connections.

Chapter 5

Experimental Results

5.1 Experimental Results

Signals in the human brain are propagated via neurotransmitters. Each neurotransmitter has a variety of receptors that it binds to and some it does not bind to. The human brain contains a large number of different neurotransmitters and their corresponding receptors. This gives rise to the view of the brain as a series of interacting signalling pathways, each given by a specific neurotransmitter. Selective alterations in the function of a specific neurotransmitter can have drastic effects on both perception and cognition. Imbalances in levels of neurotransmitters in the brain have been implicated in a wide variety of mental disorders, from depression to schizophrenia [27][16]. The computational power of ANNs (and in particular RTRBMs) raises the question of what kind of experiments can be done on ANNs to provide insight into the sensory aspects of these disorders.

Varying effectiveness of neurotransmitters can be seen as a perturbation in the signalling pathways of the brain. We would like to recreate this type of perturbation using ANNs. The most obvious way to replicate this effect in an ANN is by scaling sets of parameters by some multiplicative factor. Although one can partition the parameters of an ANN arbitrarily, this procedure would presumably be more effective when parameters are grouped by functional significance. As discussed before, in most ANNs the weights are functionally equivalent before training. After training, one may utilize methods discussed in the previous section to identify distinct functional groups of weights, for example, edge detectors.

Some neural networks have different sets of parameters with pre-defined roles. That makes the process of parameter scaling simpler, as the network already comes equipped with distinct sets of parameters. The RTRBM comes pre-defined with two distinct sets of parameters: those used to communicate within a time step and those used to communicate between time steps.

5.2 The Reconstructed Distribution

For trained probabilistic networks we are able to directly observe how well the network models the training data by simply sampling from the distribution and comparing the visible units to examples from the training data. In a trained network with hidden nodes, we can view the hidden nodes as encoding some information about the visible distribution. This can be seen as an encoding/decoding procedure. Given a visible sequence we sample from the conditional distribution of the hidden nodes to produce a corresponding hidden sequence. From that point we can "decode" the sequence by sampling from the conditional distribution of the visible units. In the case where we are trying to approximate a certain distribution, this procedure forms what we call the reconstructed distribution. Formally, given a neural network with hidden nodes H whose neural dynamics have marginal distribution over the first k time steps P_k, and target distribution R_k, we define the reconstructed distribution as

\[ P_{(r,k)}(v^k) = \sum_{h^k, v_0^k} P_k(v^k \mid h^k)\, P(h^k \mid v_0^k)\, R_k(v_0^k) \]

(see Fig. 5.1).

Given the similarities in the hierarchical nature of the deep belief network and the visual cortex, one might wonder what experiments on the Boltzmann machine can tell us about the visual system. The analogy between the two becomes even stronger in the case of the temporal Boltzmann machine. Tweaking the parameters of a trained (R)TRBM will lead to distinct qualitative changes in the reconstructed distribution. By changing the weights and observing the corresponding changes in the reconstructed distribution, one can get an idea how those weights are used by the machine to relay information. By changing sets of weights that are somehow similar one can get some indication of how those sets of weights are used. In the (R)TRBM there are two distinct kinds of weights, temporal and visible-hidden. This sets the stage for a type of in-silico experiment on perception. This is done by first training an (R)TRBM to model some form of sensory data, then scaling the visible-hidden and temporal connections via some multiplicative factors α and β, respectively, and finally by comparing the original data to the reconstructed distribution. The procedure of scaling related sets of weights can be seen as analogous to a shift in the connectivity of the cortex, either via neurological or exogenous factors.
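The encode/decode procedure with scaled parameter groups can be sketched as follows in Python; this minimal sketch uses the mean-field hidden sequence for the encoding step (as noted in the caption of Fig. 5.1) and mean-field visible activations for the decoding step, with α scaling the visible-hidden weights and β scaling the temporal weights. The function and variable names are chosen purely for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(V0, W, Wp, b, c, b_init, alpha=1.0, beta=1.0):
    """Reconstruct a visible sequence V0 (shape (T, n_visible)) through the hidden
    units of an RTRBM after scaling the visible-hidden weights W by alpha and the
    temporal weights Wp by beta.  W has shape (n_hidden, n_visible)."""
    Ws, Wps = alpha * W, beta * Wp
    hs, h = [], sigmoid(Ws @ V0[0] + b_init + b)   # mean-field encoding, cf. eq. (4.1)
    hs.append(h)
    for v in V0[1:]:
        h = sigmoid(Ws @ v + Wps @ h + b)
        hs.append(h)
    # mean-field decoding: visible activations given each hidden vector
    return np.stack([sigmoid(Ws.T @ h + c) for h in hs])
```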


Figure 5.1: The reconstruction process: starting with a sequence of input vectors, v_0^T, the hidden sequence h′^T is produced after scaling the parameters of the RTRBM by α and β. From there the reconstructed sequence, v^T, is produced either by sampling or by mean-field approximation. Note that using a mean-field approximation to encode data in the hidden units amounts to using h^T to reconstruct v^T.

5.3 Simulations

Throughout this section α will refer to the scaling of the visible-hidden connections and β will refer to the scaling of the hidden-hidden connections. The experimental results reported in Sutskever et al. (2008) were replicated; a single-layer RTRBM with 400 hidden nodes was trained for 100,000 iterations on sequences of bouncing balls (represented as sequences of vectors with values ranging from 0 to 1) using gradient descent. The gradient was approximated using backpropagation through time along with contrastive divergence. When tasked with trying to predict the next frame, the trained network has a mean-squared error per visible unit per timestep of approximately 0.0066 (using a mean-field approximation for the hidden units). This network was then given a sample sequence to reconstruct under various scaling factors ((α, β) = (1, 1), (0.5, 1), (0, 1), (1, 0.5), (1, 0), (0.5, 0.5)). The machine was then trained on the motion capture (MOCAP) data found in [31]. MOCAP data consists of positional vectors corresponding to sensors attached to the human body. The data were real-valued with zero mean and unit variance, but with maximum 21.6 and minimum −14. The machine was trained for 100,000 iterations with 200 hidden units, resulting in a mean-squared error per visible unit per timestep of approximately 0.429 (again using a mean-field approximation). A two-layer RTRBM was then tested on the MOCAP data with 100 hidden units trained for 50,000 iterations for both layers. The two-layer model was trained greedily using the binary activations of the first layer as input to the second. When evaluated empirically, the two-layer model performed marginally better using the same metric but with binary sampling instead of the mean-field approximation (1.4 for the two-layer compared to 1.43 for the single-layer). With mean-field approximation the two-layer architecture performed identically or even slightly worse using this metric. This is most likely due to the fact that the second layer was trained on binary samples of the hidden units rather than their mean-field approximation.
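For reference, the error metric quoted above, mean-squared error per visible unit per timestep, amounts to averaging the squared prediction error over both the time and visible-unit axes, as in the following minimal sketch.

```python
import numpy as np

def mse_per_unit_per_timestep(predicted, actual):
    """Mean-squared error per visible unit per timestep for two sequences of
    shape (T, n_visible): the squared error averaged over both axes."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.mean((predicted - actual) ** 2))
```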

For the video data of bouncing balls, the reconstructions with scaled-down temporal weights and unchanged visible weights show the balls very clearly, but the position of the balls is erratic and in the extreme case does not correspond to the position of the balls in the input sequence at all. Interestingly, the position of the balls under (1, 0) seems to tend to the corners. When the temporal weights are unchanged and the visible weights scaled down, the balls become indistinct and fuzzy, although the motion of the balls is smooth. In the extreme case it becomes impossible to distinguish individual balls (see Fig. 5.2). The reconstructions under (0.5, 0.5) show a mixture of the two effects.

Using the MOCAP dataset, the reconstructions failed to produce distinct qualitative shifts under the various parameter scalings. The reconstructions produced under (1, 0.5), (1, 0) and under (0.5, 1), (0, 1) both show deficiencies in modelling the trajectory and the movement of the raw data. This suggests that the usefulness of this procedure depends highly on the nature of the dataset and on the ability of an observer to interpret the results.

Figure 5.2: A comparison of two sample frames from reconstructions of the bouncing balls under scaling factors (1,0) (on the left) and (0,1) (on the right). Under (1,0) one can see distinct balls. However, the balls stay mostly in the corners and exhibit very erratic movement. Under (0,1) distinct balls are no longer visible but the motion is smooth.

Chapter 6

Conclusions

The idea of an artificial neural network has been around in some form or another since the 1940s. ANNs are appealing because of their simplicity and their biological foundation. Traditionally, the mathematical analysis of ANNs tends to focus on a particular model rather than discussing many different classes of models. This, combined with the search for models that produce desirable results, has led to a great deal of ambiguity as to what actually constitutes an ANN. By developing a common framework for the most relevant models we are able to capture concepts relevant to a wide variety of ANNs. From here we make precise several notions of universal approximation, allowing us to address the question of representational power in a way that is appropriate for different classes of networks. It should be noted that universal approximation of a stochastic process is in a sense a more general type of universal approximation than the one defined for globally stable probabilistic networks. This is because we may choose our stochastic process to be stationary, in which case the ability of a class of networks to approximate the process becomes equivalent to the ability of a class of networks to approximate a single distribution. Many popular neural networks were already known to satisfy the weaker definition of universal approximation, most notably the restricted Boltzmann machine. This makes the RTRBM and TRBM good candidates for universal approximators of stochastic processes. Indeed, the intuition behind the proof of their universal approximation is quite straightforward, amounting to extending the universal RBM construction to a universal (R)TRBM construction.

6.1 The TRBM vs the RTRBM

The proofs above have shown that generalized TRBMs and RTRBMs both satisfy the same universal approximation condition. In the proof of universal approximation for the RTRBM we take the weights large enough so that the real-valued hidden sequence becomes approximately binary. This suggests that the same proof could be adapted to the basic TRBM. However, the TRBM seems to have difficulty modeling distributions with long-term time dependence. After an RTRBM was trained on videos of bouncing balls, the machine was able to model the movement correctly and the hidden units contained chains of length two as described in the above proof [30]. On the other hand, the TRBM did not have this structure and modeled the motion of balls as a random walk, which is what one might expect for a machine that is unable to use velocity data by modeling two-step time dependencies. Given the likely equivalence in representational power of the two models, this discrepancy of results is best explained by the efficiency of the learning algorithm for the RTRBM in comparison to the TRBM.

6.2 Artificial Neural Networks and the Human Brain

Although inspired by biological neural networks, the study of ANNs quickly diverged from models concerned with biological realism. Despite this, universal approximation suggests that ANNs should at least, in theory, be able to do anything that could be accomplished with a more biologically accurate model. With this in mind, we can consider the potential use of ANNs in providing insight into the workings of the human brain. The similarities between biological neural networks and ANNs are for the most part superficial; however, there is an analogy that can be made between signalling pathways in the brain and distinct groups of parameters in ANNs. This allows one to carry out experiments on perception with ANNs by the method of parameter variation. Realistic modelling of video sequences is still at this point computationally difficult. However, the universal approximation results show that this is not an intrinsic limitation of current ANNs but rather the result of the learning algorithms being computationally time-consuming, or even in some cases potentially inaccurate. This may change in the future with the development of more efficient and reliable learning algorithms, or even simply by using more computational resources for the task of training ANNs on video data. The potential usefulness of experiments on ANNs via parameter variation is thus demonstrated through toy examples of bouncing balls and MOCAP data.

Bibliography

[1] Y. Abu-Mostafa and J.-M. St. Jacques. Information capacity of the Hopfield model. IEEE Transactions on Information Theory, 31:461–464, 1985.

[2] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251–257, 1998.

[3] S. Amari, H. Park, and H. Ozeki. Geometrical singularities in the neuromanifold of multilayer perceptrons. Advances in Neural Information Processing Systems, 14:343 – 350, 2003.

[4] N. Ay, G. Montufar, and J. Rauh. Selection criteria for neuromanifolds of stochastic dynamics. Advances in Cognitive Neurodynamics, 3:147–154, 2013.

[5] S.P. Franklin and M. Garzon. Neural computability. Progress in Neural Networks, 1:128–144, 1989.

[6] S.P. Franklin and M. Garzon. Neural computability II. International Joint Conference on Neural Networks, pages 631–637, 1990.

[7] Y. Freund and D. Haussler. Unsupervised learning of distributions of binary vectors using two-layer networks. Advances in Neural Information Processing Systems, 4:912–919, 1991.

[8] G. Goodhill and M. Carreira-Perpiñán. Cortical Columns. Macmillan, 1st edition, 2006.

[9] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural Computation. Westview Press, new edition, 1991.

[10] G. Hinton. Training a product of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.

[11] G. Hinton, L. Deng, D. Yu, and G.E. Dahl. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29:82–97, 2012.

[12] G.E. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1553, 2006.

[13] A. Hodgkin and A.F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117:500–544, 1952.

[14] J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79:2554–2558, 1982.

[15] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4:251–257, 1991.

[16] O. Howes and S. Kapur. The dopamine hypothesis of schizophrenia: Version III – the final common pathway. Schizophrenia Bulletin, 35, 2009.

[17] E.M. Izhikevich. Dynamical Systems in Neuroscience. MIT Press, 2010.

[18] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.

[19] G. Lebanon. Axiomatic geometry of conditional models. IEEE Transactions on Information Theory, 51:1283–1294, 2005.

[20] W. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115–133, 1943.

[21] M. Minsky and S. Papert. Perceptrons. MIT Press, 1969.

[22] G. Montufar and N. Ay. Refinements of universal approximation results for deep belief networks and restricted Boltzmann machines. Neural Computation, 23:1306–1319, 2011.

[23] G. Montufar, N. Ay, and K. Ghazi-Zahedi. Geometry and expressive power of conditional restricted Boltzmann machines. Journal of Machine Learning Research (preprint), 2014.

[24] N. Le Roux and Y. Bengio. Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20:1631–1649, 2008.

[25] N. Le Roux and Y. Bengio. Deep belief networks are compact universal approxima- tors. Neural Computation, 22:2192 – 2207, 2010.

[26] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.

[27] M. Spies, G.M. Knudsen, R. Lanzenberger, and S. Kasper. The serotonin transporter in psychiatric disorders: Insights from PET imaging. The Lancet Psychiatry, 2:743–755, 2015.

[28] I. Sutskever and G. Hinton. Learning multilevel distributed representations for high-dimensional sequences. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pages 544–551, 2006.

[29] I. Sutskever and G. Hinton. Deep, narrow sigmoid belief networks are universal ap- proximators. Neural Computation, 20:2192 – 2207, 2010.

[30] I. Sutskever, G. Hinton, and G. Taylor. The recurrent temporal restricted Boltzmann machine. Advances in Neural Information Processing Systems, 21:1601–1608, 2008.

[31] G. Taylor, G. Hinton, and S. Roweis. Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems, 19:1345–1352, 2006.

[32] L. Younes. Synchronous Boltzmann machines can be universal approximators. Applied Mathematics Letters, 9:109–113, 1996.