IMPROVING PERFORMANCE OF DEEP NEURAL NETWORKS BY DEVELOPING NOVEL ACTIVATION FUNCTIONS – PhD Thesis
IOSUD – Universitatea Politehnica Timişoara, Şcoala Doctorală de Studii Inginereşti

IMPROVING PERFORMANCE OF DEEP NEURAL NETWORKS BY DEVELOPING NOVEL ACTIVATION FUNCTIONS

PhD Thesis – Abstract

To obtain a PhD degree at Politehnica University Timișoara, in Computers and Information Technology

Author: ing. Marina Adriana MERCIONI
Thesis supervisor: Prof.univ.dr.ing. Ștefan HOLBAN

The thesis entitled "Improving performance of deep neural networks by developing novel activation functions" presents a varied series of activation functions developed over time in the Deep Learning [1] domain. The thesis is structured in five parts with a total of eight chapters, each divided into subchapters (Fig.1. and Fig.2.), as follows:

1. Introduction
2. Analysis of the current stage in Deep Learning
3. Optimization techniques
4. Architectures of neural networks
5. Deep Learning for text and sequence
6. Proposed activation functions
7. Experimental results
8. Conclusions
Bibliography

Fig.1. The architecture of the thesis
Fig.2. The architecture of the paper by subchapters

The thesis structure: the first part contains Chapter 1, which presents the motivation and the objective of the research treated in the paper. The second part contains the analysis of the current state in the field of activation functions. The third part consists of Chapters 3, 4 and 5, which present the optimization techniques used in the paper, the different neural network architectures on which I based my experiments, and a short presentation of Deep Learning for text and sequences. The fourth part is the main part of the paper; it contains Chapters 6 and 7, with the proposed activation functions and the experimental results. The last part contains Chapter 8, with the final conclusions on the issues presented in the paper as well as my personal contributions. The paper closes with future research directions, publications and the bibliography.

The aim of the paper is to analyze in detail and to develop new activation functions in the field of Deep Learning that are able to increase performance on tasks in both the Computer Vision [2] and Natural Language Processing [3] fields, but also on other types of tasks such as anomaly detection [4] and time series prediction. Interest in analyzing Artificial Neural Networks through the activation functions they use has grown rapidly in recent years, making this a highly studied field of research, as evidenced by the existing articles in this direction. That is why in this thesis I want to emphasize the importance of the activation function in the analysis of several datasets for different tasks, to see how the activation function impacts the training process. In what follows I present, very succinctly, the essential aspects that are the key points of each chapter of the thesis.

The first chapter presents the motivation for choosing the topic of the thesis, the objective pursued and the structure of the thesis. My motivation starts from the fact that without activation functions a neural network can only learn basic tasks; by introducing the activation function, the network becomes able to learn more complex tasks. The activation function is therefore a key point in defining the architecture of the neural network, and at the same time it is one of the most important parameters that we must choose in order to obtain better performance from an artificial neural network.
Starting from the role it plays in a neural network, my focus was on the study and analysis of how the activation function works, and on the advantages and disadvantages of existing activation functions, in order to find the function that best maps to the type of task and brings the best performance.

A decisive property that determines the performance of an activation function is whether or not it is smooth [5], a property that I also analyze in this thesis for functions such as the hyperbolic tangent (tanh) activation function [6], the ReLU (rectified linear unit) function [7], the Swish activation function [8], and so on. Smoothness gives the function continuous derivatives up to a certain specified order; in particular, a smooth function is continuously differentiable, in other words its first derivative exists everywhere and is continuous (a short sketch at the end of this chapter overview illustrates the property). Also, because this field is constantly evolving, another direction was the development of novel activation functions that would bring improvements to the architecture of artificial neural networks.

This thesis aims to investigate the use of the different activation functions that are an integral part of artificial neural networks in the field of Deep Learning, and to develop novel activation functions that bring improvements during network training. This field has been very successful and widely studied in recent years thanks to the availability of hardware, namely graphics processing units (GPUs), and to the increasing volumes of data (Big Data). To achieve this goal, I have defined the following tasks:

● analysis of the current state of the field and identification of new research directions in the context of activation functions, which have developed over time in a constantly evolving research area, including how to correlate activation functions with the requirements of the task, the architecture type and the data type;

● identification of activation functions that bring substantial improvements and lead to rapid convergence of artificial neural networks; this task is based on the analysis of the activation functions and the selection of the most appropriate function for the data, which also constitutes the exploitation and impact potential;

● implementation of several types of models and their evaluation according to specific metrics such as precision, accuracy, cost function, mean absolute error, etc. (see the second sketch below).

In the second chapter, which corresponds to the second part, I focus on the state of the art and the current situation in the field, the Multilayer Perceptron architecture, activation functions and the Backpropagation algorithm. Artificial Intelligence (AI) is the key point of technological development, allowing computers to model the real world to a very high degree and to obtain results comparable to reality. Significant progress has been made in the field of neural networks - enough to draw my attention in this direction. To achieve this, we need a large amount of information about everything around us, information that must be stored in the computer. Basically, this data is given a certain form that computers can use to answer questions, and the answers improve as the model accumulates a growing volume of information. To generalize this context, many researchers have turned to learning algorithms that can store a large quantity of information in a short time.
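As an illustration of the smoothness property discussed above, here is a minimal NumPy sketch (mine, not code from the thesis; the function definitions are the standard ones for tanh, ReLU and Swish). It compares the numerical derivative of each function on the two sides of zero: tanh and Swish have a continuous first derivative there, while ReLU's derivative jumps from 0 to 1.

```python
import numpy as np

def tanh(x):
    # Hyperbolic tangent: smooth and bounded in (-1, 1).
    return np.tanh(x)

def relu(x):
    # ReLU: max(0, x); continuous, but its first derivative jumps
    # from 0 to 1 at x = 0, so the function is not smooth there.
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); smooth everywhere.
    return x / (1.0 + np.exp(-beta * x))

def num_deriv(f, x, h=1e-6):
    # Central finite difference, good enough to visualize continuity.
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Derivatives just left and right of zero: for tanh and Swish the two
# values agree (continuous first derivative); for ReLU they do not.
for f in (tanh, relu, swish):
    print(f"{f.__name__:5s}  f'(0-) ~ {num_deriv(f, -1e-4):.3f}"
          f"  f'(0+) ~ {num_deriv(f, 1e-4):.3f}")
```

Running it prints matching left and right derivatives for tanh (~1.0) and Swish (~0.5), and the 0/1 jump for ReLU.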
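As a second sketch, the evaluation metrics named in the last task above can be computed, for instance, with scikit-learn's metrics module; the arrays below are made-up examples for illustration, not results or code from the thesis.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             log_loss, mean_absolute_error)

# Made-up classification example: true labels, hard predictions,
# and predicted probabilities for the positive class.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.7])

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("log loss :", log_loss(y_true, y_prob))  # a common cost function

# Made-up regression example for the mean absolute error.
t_true = np.array([3.0, 5.0, 2.5])
t_pred = np.array([2.5, 5.0, 3.0])
print("MAE      :", mean_absolute_error(t_true, t_pred))
```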
Lots of progress has been made in understanding and improving these learning algorithms, but Artificial Intelligence still remains a challenge.

Fig.3. Artificial Intelligence, Machine Learning and Deep Learning - Venn diagram [9]

The history of Deep Learning began in 1943, when a model based on the neural networks of the human brain was created. Since then, Deep Learning has constantly evolved; there were only two significant breaks in its development, known in the literature as the winters of Artificial Intelligence. In 1960, the basic elements of a continuous Backpropagation model were defined [10].

Although so many algorithms have been developed over time, activation functions have not been ignored either. Starting from the role of the activation function in filtering information, I want functions with equally specific properties, since the activation function underlies the updating of the weights in the neural network. Based on this consideration, several activation functions have been developed that can be used in different ways and that help neural networks converge faster, or give them the ability to use fewer layers in their architecture. With fewer layers in the architecture we have fewer parameters in the network, so this is a good way to optimize the network.

The main purpose of the activation function is to introduce nonlinearity into the output of a neuron. The activation function, also known as the transfer function, decides whether or not a neuron should be activated based on a weighted sum of its inputs to which a bias is added. A neural network without activation functions is just a linear regression model; the activation function is what gives the network the ability to learn more complex tasks. Another important aspect is that the choice of the activation function and of the weight initialization determines the space in which the optimization algorithm takes its values; this algorithm is the means of training the neural network, which amounts to a non-convex optimization problem.

Selecting an activation function is an important issue because it affects how the input data is transformed. It has been shown that a novel type of activation function, known as the Rectified Linear Unit (ReLU), improves the performance of the neural network in terms of time efficiency and spatial complexity. The impact of using this nonlinear behavior instead of the sigmoid function or the hyperbolic tangent (tanh) function has also been studied, using a controller to prevent possible numerical problems with unbounded activations. Also in this chapter I presented the Perceptron and Multi-layer Perceptron architectures. Because the activation function is a fundamental