THE PENNSYLVANIA STATE UNIVERSITY
SCHREYER HONORS COLLEGE

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

NONLINEAR NEURON DISCRIMINANT FUNCTIONS FOR ALTERNATE DEEP LEARNING TRAINING ALGORITHMS

STEVEN PETRONE
SPRING 2020

A thesis submitted in partial fulfillment of the requirements for baccalaureate degrees in Computer Engineering with honors in Computer Engineering.

Reviewed and approved* by the following:

Christopher Griffin
Associate Research Professor, Applied Research Laboratory & Department of Mathematics
Thesis Supervisor

Vijay Narayanan
Professor of Computer Science and Engineering
Honors Adviser

*Signatures are on file in the Schreyer Honors College.

Abstract

In this work we present a novel neuron discriminant function that allows for alternate training algorithms for deep learning. The new neuron type, which we call a posynomial neuron, can be combined with linear neurons to represent functions that are exponentials when inferencing new data, but are only polynomials of the network weights. We show that these networks can be resistant to the vanishing gradient problem. We also formulate training the network as a geometric programming problem and discuss the benefits this can have over training a network with gradient descent, such as data set analysis and network interpretability. We provide a C++ library that implements both posynomial and sigmoidal networks while providing flexibility for additional novel layer types. We also provide a tensor library that has applications beyond deep learning.

Table of Contents

List of Figures
List of Tables
Acknowledgements
1 Introduction
   1.1 Deep Learning
   1.2 Contribution
   1.3 Organization
2 Literature Review
   2.1 Neural Networks
       2.1.1 Artificial Neurons from Biology
       2.1.2 Logistic Regression and Nonlinearities
       2.1.3 Hidden Layers and XOR problem
       2.1.4 Derivative Backpropagation as an Engine for Gradient Descent
       2.1.5 Survey of Advancements in Neural Networks
   2.2 Posynomials
       2.2.1 Monomials, Polynomials, Posynomials
       2.2.2 Geometric Programming
3 Theory
   3.1 Posynomial Neural Network
       3.1.1 Monomial Based Discriminant Function
       3.1.2 Transforming Input Data Before a Posynomial Network
       3.1.3 Derivatives for gradient descent
   3.2 Geometric and Convex Programming
       3.2.1 Formulation
4 Implementation
   4.1 Data Structures
       4.1.1 Tree
       4.1.2 List
       4.1.3 Tensor
       4.1.4 Layer
       4.1.5 Network
   4.2 Algorithms
       4.2.1 Inference
       4.2.2 Train/Learn
   4.3 Verification and Bug Finding
       4.3.1 Eradicating Memory Leaks with Valgrind
       4.3.2 Input Fuzzing with American Fuzzy Lop
5 Experimental Results
   5.1 XOR With 3 Layer Network
   5.2 XOR with 2 Layer Network
   5.3 Training XOR using an objective function and a nonlinear solver
   5.4 Run time Analysis
       5.4.1 Run time of Dot Product
6 Conclusion
   6.1 Discussion
   6.2 Future Directions
       6.2.1 Alternative to Gradient Descent
       6.2.2 Develop theory and training algorithm for complex value neural networks
       6.2.3 Layer analogous to the convolutional layer
       6.2.4 Library improvements
Bibliography

List of Figures

2.1 Diagram of single neuron
3.1 Rotation of XOR data points
3.2 Posynomial learnable saddle point
3.3 Posynomial learnable saddle point contour plot
3.4 Saddle point that computes XOR
3.5 Saddle point contour plot that computes XOR
4.1 UML class diagram
4.2 Valgrind verifying absence of memory leaks
4.3 AFL Fuzzing to identify bugs
5.1 Diagram of 3 layer XOR network
5.2 Initial weights of 3 layer XOR network
5.3 3 layer XOR network showing correct outputs
5.4 Near perfect outputs for 3 layer XOR network
5.5 Transformation of inputs by first layer in 3 layer XOR network
5.6 Posynomial layers of 3 layer XOR network
5.7 Contour plot of posynomial layers of 3 layer XOR network
5.8 Plot of 3 layer XOR network
5.9 Contour plot of 3 layer XOR network
5.10 Diagram of 2 layer XOR network
5.11 Initial weights of 2 layer XOR network
5.12 Correct outputs for 2 layer XOR network
5.13 Perfect outputs for 2 layer XOR network
5.14 Plot of 2 layer XOR network
5.15 Contour plot of 2 layer XOR network
5.16 Posynomial XOR trained with nonlinear solver
5.17 Contour plot of posynomial XOR trained with nonlinear solver
5.18 Run time of dot product implementations for tensor library

List of Tables

2.1 Definition of XOR

Acknowledgements

I would like to thank Christopher Griffin for his guidance with this work.

Chapter 1
Introduction

1.1 Deep Learning

Deep learning is a common cross-disciplinary research topic. While commonly thought of as a subfield of computer science and engineering, deep learning and the study of neural networks were inspired by attempts to study the mammalian brain. Now that it is a well-established and popular field, deep learning has found applications in a surprisingly diverse range of disciplines. Deep learning attempts to solve problems such as classification, regression, and even content generation by training a network of artificial neurons using large data sets. Individual neurons ‘fire’ based on a computation involving their inputs, the weighted connections, and an internal bias. It is the process of tuning these weights that motivates the term ‘learning’, and the multilayered structure of the neurons that motivates the term ‘deep’.

1.2 Contribution

The main contribution of this work is the introduction of a novel neural network layer type that is multiplicative in nature rather than linear. We show that this layer acts as a log transform, making it different from a linear layer with no activation function. When combined with a linear layer, a two-layer network can resemble functions that compute multiplicative exponentials of the input when inferencing. However, when the output is considered as a function of the network weights, it is a simple polynomial objective function, which allows for many novel optimization algorithms in addition to traditional gradient descent. We introduce some of these algorithms and their possible benefits to training speed, network interpretability, and security. We also discuss interesting properties of posynomial neural networks, such as their resistance to the vanishing gradient problem. To train these networks we created a lightweight C++ library. The library includes many specialized data structures, including a generalized Tensor library that has applications beyond deep learning. Also included are traditional neural network data structures, such as layers and networks, and the necessary algorithms to train them.
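The exact discriminant function, and the sense in which the network output is only a polynomial of the weights, are developed in Chapter 3. As a rough, illustrative sketch of the underlying algebra (the symbols g, f, w, a, and c below are ours, not necessarily the thesis' notation), Section 2.2.1 reviews the standard monomial and posynomial forms:

```latex
% Illustrative sketch only; see Chapter 3 for the discriminant actually used.
% A monomial in positive variables w_1, ..., w_n with real exponents a_i:
\[
  g(w) \;=\; c \, w_1^{a_1} w_2^{a_2} \cdots w_n^{a_n}, \qquad c > 0.
\]
% A posynomial is a sum of such monomials with positive coefficients:
\[
  f(w) \;=\; \sum_{k=1}^{K} c_k \, w_1^{a_{1k}} w_2^{a_{2k}} \cdots w_n^{a_{nk}}, \qquad c_k > 0.
\]
```

If the exponents are supplied by fixed training data and the w_i are network weights, then f is exponential as a function of the data but a posynomial as a function of the weights; this is the property, stated in the abstract, that makes a geometric programming formulation of training possible (Section 2.2.2 and Chapter 3).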
1.3 Organization

In Chapter 2 we will review the literature to provide an overview of the history and development of deep learning and of using neural networks to learn from data, as well as a survey of more recent advancements in the field. We will also discuss posynomials and signomials as classes of functions, and Geometric Programming, a specialized optimization technique for these classes. In Chapter 3 we will introduce the theory of our contribution by describing our novel neuron types and the calculus required to train them with gradient descent. Additionally, we will formulate an objective function to train a neural network with Geometric Programming. In Chapter 4 we will discuss the implementation details of the libraries we created as part of this work and the steps we took to validate their functionality, stability, and security. Chapter 5 provides our experimental results, where we discuss training multiple posynomial networks on the classic XOR problem and provide graphical representations of the networks that offer an intuitive way of understanding how posynomial networks work. We conclude our work in Chapter 6, where we provide a discussion along with future directions.

Chapter 2
Literature Review

2.1 Neural Networks

Deep learning has had several periods of prolific research since its inception, with the most recent resurgence occurring in the last several years. With recent advancements and discoveries, deep learning is likely to forever change academic, industrial, government, and consumer technologies [1]. Deep learning is being deployed everywhere from embedded and IoT devices [2, 3], for computer vision, speech recognition, and language processing, to ultra-large-scale data centers [4, 5] for training models on terabytes or more of data. Countless fields have discovered applications for deep learning, including medical image processing [6], time series forecasting [7], autonomous vehicle driving [8], art generation [9], drug discovery [10, 11], language processing [12], and translation [13, 14]. Neural networks and deep learning have even been used to beat chess and Go champions. Deep learning involves training a network of artificial neurons with large data sets in an attempt to mimic how biological organisms learn from experience. In fact, much of the early research that led to the field of deep learning was an attempt to understand the brain.

2.1.1 Artificial Neurons from Biology

Many ideas that led to the creation of artificial neural networks came from attempts to understand the brain. The 1943 paper by Walter Pitts and Warren McCulloch [15] was one of the first attempts at describing a brain’s mental activity in terms of neural networks that can be quantified with propositional logic. A neural network can be thought of as a mathematical graph where vertices emulate neurons.
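Concretely, a single artificial neuron combines its inputs through weighted connections, adds an internal bias, and passes the result through a nonlinearity such as the logistic sigmoid (Section 2.1.2). The sketch below illustrates that computation; the function names and types are hypothetical and are not taken from the C++ library described in Chapter 4.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid nonlinearity applied to the neuron's pre-activation.
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// A single artificial neuron: weighted sum of the inputs plus an internal bias,
// squashed by the sigmoid. Illustrative only; not the thesis' library API.
double neuron_output(const std::vector<double>& inputs,
                     const std::vector<double>& weights,
                     double bias) {
    double z = bias;
    for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i) {
        z += weights[i] * inputs[i];
    }
    return sigmoid(z);
}
```

By contrast, the monomial-based discriminant introduced in Chapter 3 combines its inputs and weights multiplicatively rather than additively.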