
Regularization Methods in Neural Networks

By Jacob Kasche and Fredrik Nordström

Department of Statistics, Uppsala University

Supervisors: Johan Lyhagen and Andreas Östling

2020

Abstract

Overfitting is a common problem in neural networks. This report uses a simple neural network to run simulations relevant to the field of image recognition. Four common regularization methods for dealing with overfitting are evaluated. The methods L1, L2, Early stopping and Dropout are first tested on the MNIST data set and then on the CIFAR-10 data set. All methods are compared to a baseline without regularization at sample sizes ranging from 500 to 50 000 images. The simulations show that all four methods display consistent patterns throughout the study and that Dropout is consistently superior to the other three methods as well as to the baseline.

Table of Contents

1 Introduction
2 History of artificial intelligence, machine learning and neural networks
2.1 Brief introduction to neural networks
3 Data
4 Method
4.1 Base network
4.2 Metrics
4.3 Overfitting and regularizations
4.4 Bias-variance tradeoff
4.5 Simulation method
5 Results
5.1 MNIST
5.2 CIFAR-10
6 Discussion
7 Conclusion
Appendix A

1 Introduction

In recent years neural networks have reached new heights with human-like results in perceptual problems.1 Skills earlier seen as near impossible for machines, such as hearing and seeing, can now be performed to a high degree by algorithms2, providing aid and new solutions in vastly different fields such as autonomous transportation, medical imaging, and language translation.3

Neural networks can be described as “…a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples.”4 A common problem with neural networks is overfitting, which occurs when the network models random noise in the data set. This happens frequently when a network has too many hidden nodes, or when the data set is small or of poor quality. Through the years, different regularization methods have emerged and proved effective at mitigating overfitting in neural networks. Some of the most common are L1, L2, Early stopping and Dropout.5 Previous research has focused on the validity of these specific methods on specific data sets but has not gone further in comparing the methods side by side to understand their behavioral patterns. This report therefore aims to expand the research concerning these four methods. This will be done by testing and evaluating the four regularization methods L1, L2, Early stopping and Dropout, first on the MNIST data set and thereafter on the more complex CIFAR-10 data set. The four regularization methods will then be evaluated by comparing their performances, thus expanding the research field's understanding of the methods' behavioral patterns.

1 Chollet, François, and Joseph J. Allaire, 'Deep Learning with R' (1st edn, Shelter Island, NY, Manning Publications Co, 2018), Section 1.1.6. 2 Chollet et al., 'Deep Learning with R', Section 1.1.6. 3 Synced. https://syncedreview.com/2019/10/31/google-introduces-huge-universal-language-translation-model-103-languages-trained-on-over-25-billion-examples/ Accessed (2020-01-13) 4 Hardesty. http://news.mit.edu/2017/explained-neural-networks-deep-learning-0414 Accessed (2020-01-13) 5 Brownlee. https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/ Accessed (2020-01-13)

The first part of this report gives a short description of the background of neural networks and provides an example of a neural network to give the reader a fundamental understanding of the research field. The second part presents a more detailed description of the two neural networks used in this report as well as the methods involved. The third part presents the results, followed by a discussion of the findings. The research question for this report is: how do the four methods perform at different sample sizes with the MNIST and CIFAR-10 data sets, and what does a comparison between them say about their specific behaviors?

2 History of artificial intelligence, machine learning and neural networks

Artificial intelligence, “the effort to automate intellectual tasks normally performed by humans”, was born in the 1950s.6 The idea seemed very promising at first, but as the tasks became more complex, the problem of creating enough instructions for the algorithms to perform the desired task became apparent. Out of this problem the idea of machine learning arose. Could a machine learn how to perform a task or solve a problem by itself, given enough data? And could the rules it learned then be used on new data?7

Figure 1. A comparison between classical programming and machine learning.8

6 Chollet et al., 'Deep Learning with R', Section 1.1.1. 7 Chollet et al., 'Deep Learning with R', Section 1.1.2. 8 Chollet et al., 'Deep Learning with R', Section 1.1.2.

This brings us to neural networks, a concept in machine learning. The idea of neural networks comes from neurobiology and the human brain. Even though the inspiration for neural networks comes from some of our understanding of how the human brain works, there is no evidence of any resemblance between how the actual model in a neural network works and how the brain works.9

Machine learning can be argued to have been born in the 1950s, but an earlier idea was of great importance for the invention of the field. In 1943 Warren McCulloch and Walter Pitts published A Logical Calculus of the Ideas Immanent in Nervous Activity, the first published mathematical model of a neural network. Fifteen years later, in 1958, Frank Rosenblatt published the idea of a perceptron in the paper The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain. Shortly thereafter the “Mark 1 Perceptron” machine was built based on the algorithm and used for image recognition. However, it was quickly shown that this single-layer perceptron was only capable of learning linearly separable patterns and hence unable to solve more complex problems with many classes, a point made clear by the book Perceptrons: An Introduction to Computational Geometry by Marvin Minsky and Seymour Papert in 1969. Further advancements were made, but the initial hype had caused unsustainable expectations and the field lost most of its momentum. This led to a lack of funding and overall stagnation in the 1970s.

In the 1980s machine learning started to gain popularity again. One reason was a US-Japan joint conference where Japan announced efforts for further advancement in the field, which quickly led to more funding from the US as well. In the new millennium, with the internet further establishing itself, computational power increasing and big data becoming more available, neural networks have further established their utility and value.10

9 Chollet et al., 'Deep Learning with R', Section 1.1.4. 10 Strachnyi. https://medium.com/analytics-vidhya/brief-history-of-neural-networks-44c2bf72eec Accessed (2019-12-15)

2.1 Brief introduction to neural networks

Figure 2. An example of a neural network.11

The data set is first processed so that the information can be assigned to as many nodes as needed in the input layer to represent the information at hand. With images this often translates to one node for each pixel, with a value for each node corresponding to the color of the pixel. This information is then sent to the hidden layer which, through a set of weights, one for each input node, decides how the previous combination is represented, and which value it takes, in its nodes. This then continues to the output layer nodes where it is transformed into an output in order to make a prediction of what information was provided in the input layer. This makes up the forward feed of the model. The predictions are a direct result of how well the hidden layer managed to transform the information from the input layer. The prediction is then measured and the error is quantified so that the weights can be updated. This happens repeatedly through backpropagation followed by a new forward feed. As the model gets more data to train on, its weights will eventually be updated to a point where the number of inaccurate predictions is

11 DataFlair. https://data-flair.training/blogs/artificial-neural-networks-for-machine-learning/ Accessed (2019-12-15)

minimized. Once trained, the network can be used to make predictions on other, similar data sets in the future.12
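To make the forward feed concrete, the sketch below computes the output of a single hidden node: a weighted sum of the input node values plus a bias, passed through the ReLU activation. The weights, bias and input values are made up for illustration and are not taken from the thesis.

relu <- function(z) max(0, z)          # rectified linear unit

x <- c(0.0, 0.5, 1.0)                  # input node values (e.g. pixel intensities)
w <- c(0.2, -0.4, 0.7)                 # one weight per input node
b <- 0.1                               # bias term

hidden_value <- relu(sum(w * x) + b)   # weighted sum, then activation
hidden_value                           # 0.6

During training, backpropagation adjusts w and b so that the predictions built from many such nodes produce fewer errors.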

3 Data

The data used comes from two different data sets. The first is the MNIST handwritten digits data set, an acronym for Modified National Institute of Standards and Technology. The data set contains images of handwritten digits and is divided into two parts, 60 000 training images and 10 000 test images. The images in MNIST have been standardized in the sense that they are in greyscale and that the digits are centered in the pictures. Every picture is 28x28 pixels. Figure 3 shows a cropped version of how the network initially handles a specific picture, here with the number 5. The second data set is CIFAR-10, a more complex data set with 60 000 photographed pictures of real-life objects. CIFAR-10 is an acronym for Canadian Institute for Advanced Research, with the 10 referring to its 10 different classes. The images in CIFAR-10 are less standardized than the MNIST pictures, which results in greater variation within each class and often weaker correlations between the objects in each class. Every picture in CIFAR-10 is 32x32 pixels and uses the red, green and blue (RGB) color model to create a specific color for each pixel.

Figure 3. A screenshot of an R-matrix from MNIST, representing the number 5.
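As an illustration of how these data sets can be brought into the shape the networks expect, the sketch below loads MNIST and CIFAR-10 with the keras package for R, flattens each image to a vector of pixel values (784 for MNIST, 3072 for CIFAR-10), scales the values to the 0-1 range and one-hot encodes the labels. The thesis does not publish its source code, so the exact preprocessing steps shown here are assumptions.

library(keras)

# MNIST: 28x28 greyscale images -> vectors of 784 values in [0, 1]
mnist   <- dataset_mnist()
x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 28 * 28)) / 255
y_train <- to_categorical(mnist$train$y, 10)     # one output node per class

# The test images are reshaped the same way (shown for MNIST)
x_test <- array_reshape(mnist$test$x, c(nrow(mnist$test$x), 28 * 28)) / 255
y_test <- to_categorical(mnist$test$y, 10)

# CIFAR-10: 32x32 RGB images -> vectors of 3072 values in [0, 1]
cifar         <- dataset_cifar10()
x_train_cifar <- array_reshape(cifar$train$x, c(nrow(cifar$train$x), 32 * 32 * 3)) / 255
y_train_cifar <- to_categorical(cifar$train$y, 10)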

12 Nicholson. https://pathmind.com/wiki/neural-network Accessed (2019-12-15)

4 Method

This paper evaluates four different approaches for mitigating overfitting in neural networks. This is first done with a fixed model on the MNIST data set. Later in the report this is expanded to the CIFAR-10 data set, where additional nodes are added to the hidden layers to ensure a valid neural network for the new data set. Below, the exact details of these structures are described; throughout the report they are referred to as the Base Network.

4.1 Base network

The network has four layers in total: one input layer, two hidden layers and one output layer. The number of nodes in the hidden layers is essential when training a neural network. The problem primarily discussed in this paper is overfitting, but an equally troublesome problem is underfitting. When a network underfits it misses out on valuable information. One approach when deciding the number of hidden nodes is therefore to choose a high number and then use regularization methods to deal with the expected overfitting. That way it is easier to know that too much potential information has not been missed. A common suggestion for the number of hidden nodes is 5 to 100, but as discussed above the data's underlying function should be highly regarded when choosing the content of the hidden layers.13

The Base Network for MNIST contains two hidden layers with 64 and 32 nodes respectively, both using the rectified linear unit (ReLU) as activation function. Lastly, the softmax function is used so that the output nodes take values between 0 and 1 that sum to 1, meeting the criteria for a probability distribution. The input layer contains 784 nodes, one for every pixel in each image (28x28x1), each holding the pixel's greyscale value.

The redefined Base Network for CIFAR-10 contains two hidden layers with 128 and 64 nodes respectively. The input layer contains 3072 nodes, one for every pixel and color channel in each image (32x32x3); using the RGB model, each pixel contributes one value per color channel. The rest of the network is kept unaltered.
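A minimal sketch of how the MNIST version of the Base Network could be defined with the keras package for R is shown below; the CIFAR-10 version would use an input shape of 3072 and hidden layers of 128 and 64 nodes. Since the thesis does not publish its source code, the exact layer calls are assumptions.

library(keras)

# Base Network for MNIST: 784 -> 64 (ReLU) -> 32 (ReLU) -> 10 (softmax)
base_network <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")   # class probabilities summing to 1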

13 Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction' (2nd edn, New York, Springer, 2009), 358.

In the last layer the output is a set of probabilities which, summed up, must equal 1. These predictions from the network of which category the image belongs to can be compared to the true values, which are 0 for nine categories and 1 for the correct category. This is done for all images and then summed together. How the output probabilities, $f_k(x_i)$, and the true values, $y_{ik}$, are compared and summed is defined by the so-called loss function, $f(y, x, w)$. The loss function used for the Base Network is categorical cross-entropy,

$$f(y, x, w) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i)$$

which is suitable when working with classification problems and therefore is used for the Base Network.14 This loss function is used when the weights of the network are updated. The weights should be changed to give better predictions, which results in a lower loss. To minimize the loss function, the gradient descent method is used for the Base Network. The gradient is the derivative of the loss function. If the weights are updated optimally, the gradient decreases to 0 and a minimum of the loss function has been reached, assuming that a global minimum has been found and not a local one.15 Exactly how the weights are updated using the gradient is decided by the optimizer; for the Base Network RMSprop is used.

4.2 Metrics

As mentioned, both MNIST and CIFAR-10 contain 10 different classes: digits in the former and ordinary objects, such as different kinds of animals and transport vehicles, in the latter. Because of this, the neural networks in this paper only handle classification problems, and a suitable evaluation metric is therefore accuracy. The number of correct predictions is counted in all three data sets: train, validation and test. The accuracy on the test data set is the primary evaluation measure when the different regularization methods are compared to the baseline network without regularization.
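Putting the loss function, optimizer and evaluation metric from the two sections above together, the Base Network could be compiled roughly as follows. This is a sketch assuming the keras package for R; the thesis's exact calls are not published.

base_network %>% compile(
  loss      = "categorical_crossentropy",   # the loss f(y, x, w) defined above
  optimizer = optimizer_rmsprop(),          # gradient-based weight updates via RMSprop
  metrics   = c("accuracy")                 # evaluation metric used in this section
)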

14 Hastie et al., 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction', 353. 15 Chollet et al., 'Deep Learning with R', Section 2.4.3.

4.3 Overfitting and regularizations

As mentioned, overfitting is a common problem in machine learning and therefore of interest. Overfitting can be defined as “The production of an analysis which corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably.”16 For a neural network this means that the network is overfitted when it fits the training data to too high a degree and therefore loses its ability to predict on new data. The problem of overfitting in neural networks can be handled in many ways. This report evaluates four of them:

● Early stopping
● L1
● L2
● Dropout

Other ways of dealing with overfitting involve changes in the structure of the model, such as the number of layers and the number of nodes per layer. As mentioned, this report looks at different methods for mitigating overfitting when the model is fixed.

Early stopping was invented in the early days of neural networks. It has a straightforward approach: the training of the network is stopped before it overfits to the training data. Early stopping can be regarded as straightforward in the sense that the basic idea is easy to understand; the training should be stopped when generalization decreases. Exactly when this occurs can be clear in some cases but less so in others. A problem with this method can therefore be to decide an appropriate stopping criterion, one that takes both generalization and training time into account. The goal is to find a criterion which secures a generalized network but stops when the increase in generalization is no longer substantial, so that the training time can be considered effective. A common stopping criterion is to stop training when the

16 Oxford University Press. https://www.lexico.com/en/definition/overfitting Accessed (2020-01-08)

validation loss has not reached a new minimum in the last 5 epochs. Hence, continuous improvements do not trigger Early stopping.17
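In the keras package for R this stopping criterion could be expressed as a callback that monitors the validation loss with a patience of 5 epochs. This is a sketch of one possible configuration, not the thesis's exact code:

# Stop training when the validation loss has not reached a new minimum
# in the last 5 epochs.
stop_early <- callback_early_stopping(monitor = "val_loss", patience = 5)

# Passed to fit() via the callbacks argument, e.g.:
# base_network %>% fit(x_train, y_train, epochs = 100,
#                      validation_split = 0.3, callbacks = list(stop_early))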

Another method for reducing overfitting in a neural network is weight decay, also known as weight regularization. Weight decay penalizes the network for having large weights by adding a cost to the training loss, $f(y, x, w)$. The size of the added cost is set by the hyperparameter lambda, $\lambda$, which is usually small.18 Therefore lambda will be tested at the values 0.01 and 0.005. With this change in the loss function, the network will, to some extent, avoid large weights.19 There are different ways of penalizing the network with weight decay. Two common ones, L1 and L2, are defined below as new loss functions: decay-1, $d_1(y, x, w)$, and decay-2, $d_2(y, x, w)$.

L1

$$d_1(y, x, w) = f(y, x, w) + \lambda \sum_{i=1}^{n} |w_i|$$

L2

$$d_2(y, x, w) = f(y, x, w) + \lambda \sum_{i=1}^{n} w_i^2$$
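In the keras package for R, these penalties can be attached per layer through the kernel_regularizer argument, with lambda set to one of the values tested in the report (0.01 or 0.005). The layer sizes below mirror the MNIST Base Network; as before, this is an assumed sketch rather than the thesis's published code.

lambda <- 0.01   # hyperparameter tested at 0.01 and 0.005 in the report

l2_network <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784),
              kernel_regularizer = regularizer_l2(l = lambda)) %>%
  layer_dense(units = 32, activation = "relu",
              kernel_regularizer = regularizer_l2(l = lambda)) %>%
  layer_dense(units = 10, activation = "softmax")

# The L1 variant swaps in regularizer_l1(l = lambda) instead.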

In this report, the last method for counteracting an overfitted network is Dropout. As the name hints, this method randomly drops nodes in the network's hidden layers. This is done by forcing the weights of the randomly selected nodes to zero and then scaling up the remaining weights. The scaling factor follows directly from the Dropout rate. For

17 Prechelt L. “Early Stopping — But When?”. In: Montavon G., Orr G.B., Müller KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg, 2012. 18 Chollet et al., 'Deep Learning with R', Section 4.4.2. 19 Chollet et al., 'Deep Learning with R', Section 4.4.2.

example, if 1/5 of the nodes are dropped, the remaining weights should be multiplied by 5/4. A commonly used Dropout rate is between 20% and 50%. Because of the randomness this method can seem arbitrary, but it is frequently used for overfitted neural networks and is regarded as effective. The idea of the technique is to make the network more uncertain when it updates the weights, so that it only picks up the more significant patterns. The network otherwise risks picking up random noise in the training data, but is now less likely to do so because Dropout has added more noise.20
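Under the same assumptions, a Dropout version of the Base Network could insert a dropout layer after each hidden layer, with a rate in the 20-50% range mentioned above:

dropout_network <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.3) %>%   # randomly zero 30% of the previous layer's outputs during training
  layer_dense(units = 32, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")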

4.4 Bias-variance tradeoff

The interest in mitigating overfitting in neural networks comes from the wish to make better predictions when the network is used on new, never seen data. In all four methods above this is done by affecting the training loss in different ways. L1 and L2 do it directly through a change in the loss function. Early stopping and Dropout can be said to do it indirectly, through different changes in the training process. The training loss is targeted in the fight against overfitting because it is an estimator of what really matters for predictions: the test loss. In basic statistics courses, finding an unbiased estimator is always the first goal, followed by a low variance. This approach gives a correct picture of the population on average, but can often be far off if the variance is high. This uncertainty can be addressed using the bias-variance tradeoff, an approach that relaxes the restriction of an unbiased estimator in exchange for a lower variance. In a neural network this happens when the network is not fully optimized on the training data: a bias is accepted in exchange for a lower variance in the test loss, and the network can then be said to be more generalized. The bias-variance tradeoff must, however, be used carefully. Otherwise the network risks underfitting, which would cause the prediction performance to drop again.21

20 Chollet et al., 'Deep Learning with R', Section 4.4.3. 21 Hastie et al., 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction', 38.

4.5 Simulation method

The results in this paper are mainly from simulations done in the programming language R. Requests for the source code may be directed to the authors of this thesis. To stabilize the results, the simulation for every sample size is performed several times and an average of the evaluation metric is then calculated. For the small sample sizes, up to 1 400 images, 10 replications are done per sample size. For the rest of the sample sizes, 3 replications are performed. It is the random weights, set at the beginning of training, that are the unstable factor. The randomness should have a bigger impact when small samples are used, because the training time is shorter and the network then has less time to correct possibly bad starting weights. This is one reason why more replications are done for the smaller samples. The second reason is the time factor: the bigger sample sizes take longer to train and therefore, given this paper's time plan, more replications were not possible. With more time and unlimited resources, the replications could have been run up to 30 times to secure even more stable results.

Furthermore, the primary evaluation of the study is performed by looking at the network's accuracy on the 10 000 image test data set. The whole test set is used for all sample sizes, which at first may seem strange: setting aside 10 000 images for testing and then training with 1 400. The reason the whole test set is used is to stabilize the evaluation measure, and the test images are not seen as part of the original sample. Instead, each sample consists of training images, 70%, and validation images, 30%. The training and validation sets are randomly selected, but the same random seeds are set for all replications at the same sample size. This randomness may lead to especially bad or good images being chosen, so another random sample could give a different accuracy. This should, however, affect all four regularization methods similarly, and the comparison can therefore still be seen as valid.
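A condensed sketch of how one such replication could look is shown below: fix a seed, draw a sample, split it 70/30 into training and validation data, train the Base Network and record the test accuracy. It assumes the preprocessed x_train, y_train, x_test and y_test objects from the Data section sketch and the keras package for R; the helper name run_replication and the exact fit() arguments are illustrative assumptions, not the thesis's own simulation code.

run_replication <- function(sample_size, seed) {
  set.seed(seed)                                  # fixes the sample/split; weight initialization stays random
  idx   <- sample(nrow(x_train), sample_size)     # draw the sample from the training pool
  split <- floor(0.7 * sample_size)               # 70% training, 30% validation

  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 10, activation = "softmax")
  model %>% compile(loss = "categorical_crossentropy",
                    optimizer = optimizer_rmsprop(), metrics = c("accuracy"))

  model %>% fit(x_train[idx[1:split], ], y_train[idx[1:split], ],
                validation_data = list(x_train[idx[-(1:split)], ],
                                       y_train[idx[-(1:split)], ]),
                epochs = 100, verbose = 0)

  scores <- model %>% evaluate(x_test, y_test, verbose = 0)
  scores[["accuracy"]]                            # metric name may be "acc" in older keras versions
}

# e.g. 10 replications at sample size 1 400, averaged as in the report
mean(sapply(1:10, function(r) run_replication(1400, seed = r)))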

In statistics, an area of focus is the sample size and how it affects the result. “Is there enough data for the result to be statistically significant?” is a reasonable question, and the matter of how small a data set can be while still being useful is important. In comparison, machine learning is often used when large amounts of data are available, but how much data is really needed for accurate

predictions? And can machine learning also be used for smaller sample sizes? In the context of these questions, this paper does not hold the sample size constant; the results instead focus on how the different regularization methods perform when the sample size varies. The accuracy in this report is measured in three groups of sample sizes, referred to as the low, mid and high groups. The low group measures at 500, 650, 800, 950, 1 100, 1 250 and 1 400 images. The mid group measures at 2 000, 4 000, 6 000, 8 000 and 10 000. The high group measures at 10 000, 20 000, 30 000, 40 000 and 50 000.

All four regularization methods need an assigned hyperparameter value. There are recommended ranges of values, as previously mentioned in the report. The first tests in this report examined which of the values performed best at the low sample sizes. The values that performed best for MNIST and CIFAR-10 respectively were used for the low sample sizes. Because of this, CIFAR-10 is not shown with the same regularization values as MNIST; the MNIST values were tested as well but performed worse than the ones shown. For the mid and high sample sizes the same regularization values performed best and are therefore used and shown. The tests used to decide all hyperparameter values in the final results can be found in Appendix A.

5 Results

5.1 MNIST

The data is first run through the Base Network with no regularization in order to get a baseline. The result can be seen in Figure 4, where two different sample sizes are used. For both the smaller and the bigger sample the network can be seen to overfit; this is noticeable in the validation loss, which first decreases and then trends upward for the later part of the training. In the MNIST data set the accuracy is relatively high and stable, and the effect of overfitting is therefore rather small and harder to illustrate.

Figure 4. Loss and accuracy for the validation and training sets when sample sizes of 1 400 (left) and 10 000 (right) are used to train the Base Network for 100 epochs.


Figure 5. Two graphs showing training done without regularization (left) and training with Dropout (right). The Base Network is used with a sample size of 400 images from MNIST.

In Figure 5 we can see a comparison between training done without regularization and training done with Dropout added to the hidden layers. The validation loss still has a rising trend in the later part of training with Dropout, although the rise is slower and smaller in total.


Figure 6. Test accuracy for all different sample sizes from MNIST when the Base Network is trained without regularization.

In Figure 6, all sample sizes used in the study are shown at the same time, measured without regularization. Test accuracy can be seen to increase with sample size. The relationship does not appear to be linear; instead the gains from additional images diminish as the sample size grows.


Figure 7. Comparison of the regularization methods with the baseline for the low sample sizes from MNIST.

The results for the best hyperparameter values of the four regularization methods are summarized in Figure 7. The only regularization method that stands out positively from the baseline is Dropout, with roughly 90% accuracy. Early stopping performs just below no regularization, while L1 and L2 perform worse by a greater margin.


Figure 8. Comparison of the regularization methods with the baseline for the mid sample sizes from MNIST. L1 excluded.

This trend continues as the sample size increases. In Figure 8 Dropout has reached an accuracy of 96%, with Early stopping, no regularization and L2 just below. L1 has been excluded from the graph because it performs well below the other four, which would make the results above harder to see; graphs including L1 can be found in Appendix A. Occasionally, methods other than Dropout perform better than the Base Network without regularization; L2, for instance, performs slightly better for a stretch in the middle of this range.


Figure 9. Comparison of the regularization methods with the baseline for the high sample sizes from MNIST. L1 excluded.

As the tests with MNIST reach the final sample sizes, Dropout continues to perform better than no regularization and finishes at an accuracy of 97.5%. Early stopping is on par with no regularization while L2 performs slightly worse. L1 performs much worse; graphs including L1 can be found in Appendix A.

5.2 CIFAR-10

CIFAR-10 will now be used instead of MNIST to provide additional results for further evaluating the four methods.

Figure 10 shows that the test accuracy is consistently below 0.1, meaning the model is no better than random guessing in its current state. Therefore, more nodes are added to the hidden layers to better handle the increased complexity of the new data set.

Figure 10. Comparison of the regularization methods with the baseline, using the originally defined Base Network, for the low sample sizes from CIFAR-10.


Figure 11. Loss and accuracy for the validation and training sets when sample sizes of 1 400 (left) and 10 000 (right) are used to train the redefined Base Network for 100 epochs.

The redefined Base Network is now trained without regularization on the CIFAR-10 data set, and the run with 10 000 sample images is shown in Figure 11. In contrast to the MNIST version, the overfitting can now be seen in the measure of main interest, the accuracy. Figure 11 clearly shows that the difference between training accuracy and validation accuracy increases with the number of epochs, especially for the curves representing a sample size of 10 000. For the lower sample size displayed, 1 400 images, the pattern is less clear; the training and validation accuracy do not separate as much over the 100 epochs.


Figure 12. Comparison of the regularization methods with the baseline for the low sample sizes from CIFAR-10.

Moving on to the comparison of the four regularization methods: in Figure 12, where the small samples are displayed, it is hard to distinguish any positive effect from the tested regularization methods. Dropout sometimes reaches above the baseline, but the trend does not seem stable. As seen in Figure 12, the test accuracy increases with the additional nodes added to the hidden layers in the new model. Dropout and no regularization perform at around 30% accuracy while Early stopping and L2 are closer to 25%. L1, on the other hand, shows no improvement with the additional nodes in the network.


Figure 13. Comparison of the regularization methods with the baseline for the mid sample sizes from CIFAR-10. L1 excluded.

This uncertainty continues up to a sample size of 6 000, as can be seen in Figure 13. Above this point, the Dropout method seems to have a positive impact when training the network. Early stopping also seems to perform better as the sample size grows, even though it mainly stays below the baseline at these sample sizes. In Figure 13, where the sample size has been increased to the 2 000-10 000 range, Dropout and no regularization perform well consistently, with Dropout performing best at the largest sizes, now with roughly 42% accuracy. L2 consistently performs worse than the other methods shown, and Early stopping starts with low accuracy but reaches the same level as no regularization and Dropout in the middle of the range, finishing around 40% test accuracy. L1 has been dropped because it shows no improvement in accuracy, which skews the graph; graphs including L1 can be found in Appendix A.


Figure 14. Comparison of the regularization methods with the baseline for the high sample sizes from CIFAR-10. L1 excluded.

Lastly, in Figure 14, the simulations are run on sample sizes from 10 000 to 50 000. Here Dropout has the best performance of all five simulations, with Early stopping and no regularization closely resembling each other a couple of percentage points below. Dropout finishes the test at an accuracy of 50%. L2 follows a similar pattern but performs worse than the other three methods throughout the simulations. L1 is once again omitted.

6 Discussion

The MNIST data set is simpler than CIFAR-10 in many ways. The way the numbers are centered, and the fact that they can be expected to be of roughly the same size, should result in a higher correlation among the different representations of each number. In CIFAR-10, the pictures of the objects can be expected to have been taken from different distances, causing big differences in size, as well as under different lighting, and objects such as cars, dogs and trucks can have more than one specific color and share color patterns. The problem of overfitting to random noise can therefore be expected to clear up faster and be less severe with MNIST. To account for the expected increase in image complexity in the CIFAR-10 data set, more nodes were added to the hidden layers. This made it possible for the network to account for weaker but valuable correlations which otherwise could have been left out. It should also be added that while a simple neural network can produce a very high accuracy for MNIST, the results for CIFAR-10 are less accurate. This is to be expected, and in order to keep the networks as similar as possible the lower accuracy is acceptable.

At the low sample sizes for both MNIST and CIFAR-10, it seems that the penalty added by L1 and L2 is greater than needed, making the network disregard weak but valuable patterns among the images, which causes a lower accuracy than the baseline. This raises the question of whether those lambda values should initially have been even lower in order to improve the results. Due to the time limitations of this report, the work on deciding the initial values for the regularization methods had to be cut somewhat short. The values used were based on previous research, and some generalization and extrapolation was needed, which may have led to suboptimal values. More research on this would be both valuable and needed.

Since overfitting is not a huge problem for the MNIST data set, the room for improvement ought to be rather small, and it is therefore expected that none of the regularization methods outperform the baseline by much. The good performance of Dropout was therefore interesting to see, as it showed the method's value even on this data set. With the added complexity of the CIFAR-10 data set, the differences between the regularization methods were bigger. Since the overfitting

problem seemed to be bigger with this data set, it makes sense that the regularization methods have a greater possibility for improvement. With CIFAR-10, Dropout continues to perform well and clearly better than no regularization as soon as the training data set reaches roughly 5 000 images. The Dropout technique of forcing the network to create alternative ways of explanation, and therefore be slightly less certain in its predictions, seems to be of value. The performance of Early stopping comes as a bit of a surprise overall. Given the criterion it follows, stopping the model before it overfits too much, it seems reasonable that it should be one of the best performers. But the results here rarely show much difference compared to the baseline, which further indicates that overfitting may at times be a less severe problem. A matter that can be discussed is therefore the need for regularization for these two data sets. It must be considered that building an advanced enough network can in some cases be of greater interest than choosing the right amount of regularization. Still, both are necessary in order to optimize accuracy.

7 Conclusion

At the beginning of this report it was stated that the aim is to expand the research concerning the four regularization methods L1, L2, Early stopping and Dropout. This was done by testing and evaluating the four methods, first on the MNIST data set and thereafter on the more complex CIFAR-10 data set. The four methods were then evaluated by comparing their performances, in an attempt to expand the research field's understanding of the methods' behavioral patterns. The research question was: how do the four methods perform at different sample sizes with the MNIST and CIFAR-10 data sets, and what does a comparison between them say about their specific behaviors?

The report showed that Dropout consistently performs better than both Early stopping and no regularization, which show similar accuracies. L2 consistently performed slightly below the baseline, and L1 performed significantly worse than all other regularization methods. These performance patterns repeat themselves in both data sets and at most sample sizes, with a reservation for the smallest sample sizes, where randomness plays a bigger role. When interpreting the results, it is important to consider that all four regularization methods were run with specific hyperparameter values, which is the main driver of their behavior. All methods can be used with different values, as mentioned previously in their respective sections. One ought therefore to be cautious about extrapolating these findings beyond these two data sets and the values used for the specific regularization methods. More research is needed in order to better understand how different hyperparameter values change the behavior of the specific regularization methods.

References

Brownlee, Jason. How to Avoid Overfitting in Deep Learning Neural Networks. 2019. https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/ Accessed (2020-01-13)

Chollet, François, and Joseph J. Allaire. 'Deep Learning with R' (1st edn, Shelter Island, NY, Manning Publications Co, 2018).

DataFlair. Introduction to Artificial Neural Networks. 2018. https://data-flair.training/blogs/artificial-neural-networks-for-machine-learning/ Accessed (2019-12-15)

Hardesty, Larry. Explained: Neural networks. 2017. http://news.mit.edu/2017/explained-neural-networks-deep-learning-0414 Accessed (2020-01-13)

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction' (2nd edn, New York, Springer, 2009).

Nicholson, Chris. A Beginner's Guide to Neural Networks and Deep Learning. 2019. https://pathmind.com/wiki/neural-network Accessed (2019-12-15)

Oxford University Press. Lexico.com, 2019. https://www.lexico.com/en/definition/overfitting Accessed (2020-01-08)

Prechelt L. “Early Stopping — But When?”. In: Montavon G., Orr G.B., Müller KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg, 2012.

Sarle, Warren. “Stopped Training and Other Remedies for Overfitting.” 1995.

Strachnyi, Kate. Brief History of Neural Networks. 2019. https://medium.com/analytics-vidhya/brief-history-of-neural-networks-44c2bf72eec Accessed (2019-12-15)

Synced. 2019. https://syncedreview.com/2019/10/31/google-introduces-huge-universal-language-translation-model-103-languages-trained-on-over-25-billion-examples/ Accessed (2020-01-13)

Appendix A

MNIST

Figure 15. Comparing different hyperparameter values for the regularization methods with MNIST. Here Early stopping and Dropout are tested.

Figure 16. Comparing different hyperparameter values for the regularization methods with MNIST. Here L1 and L2 are tested.


Figure 17. Comparison of the regularization methods with the baseline for the mid sample sizes from MNIST. L1 included.

Figure 18. Comparison of the regularization methods with the baseline for the high sample sizes from MNIST. L1 included.

CIFAR-10

Figure 19. Comparing different hyperparameter values for the regularization methods with CIFAR-10. Here Early stopping and Dropout are tested.

Figure 20. Comparing different hyperparameter values for the regularization methods with CIFAR-10. Here L1 and L2 are tested.


Figure 21. Comparison of the regularization methods with the baseline for the mid sample sizes from CIFAR-10. L1 included.

Figure 22. Comparison of the regularization methods with the baseline for the high sample sizes from CIFAR-10. L1 included.
