Advanced Deterministic Optimization Algorithm for Deep Learning Artificial Neural Networks
Jamilu Auwalu Adamu
Mathematics Programme, 118 National Mathematical Centre, 904105, FCT-Abuja, Nigeria
Correspondence: Mathematics Programme Building, 118 National Mathematical Centre, Small Sheda, Kwali, FCT-Abuja, Nigeria. Tel: +2348038679094. E-mail: [email protected]
Received: November 26, 2019    Accepted: December 10, 2019    Online Published: XX, 2019

Abstract
The existing choices of Activation Functions for a deep learning neural network are based largely on personal human judgment, bias, experience, and little quantitative skill; they are generated neither from the training data nor the testing data, nor do they emanate from a referenced AI-ML-Purified Data Set. In my previous paper, Jameel's ANNAF Stochastic Criterion and Lemma were proposed for selecting stochastic activation functions. The objective of this paper is to propose definite rules, not trial and error, called "Jameel's ANNAF Deterministic Criterion and Lemma" for the choice of advanced optimized Activation Functions. This is the first paper to apply the proposed Jameel's ANNAF Deterministic Criterion to about two thousand two hundred and twenty-four (2,224) advanced Activation Functions (mostly deterministic) emanating from our AI sample data for the successful conduct of a Deep Learning Artificial Neural Network. Three of these were rated excellent Activation Functions for the "Temperature vs Conductance" Deep Learning Artificial Neural Network; one can still find more candidates among the remaining 2,221 Activation Functions using the proposed criterion. The bottom line is that advanced Deep Learning Artificial Neural Networks will depend on the AI data, time change, and the area of application.

Keywords: Jameel's ANNAF Deterministic Criterion, AI-ML-Purified Data, Activation Functions, TableCurve 2D, Derivative Calculator, Criterion

1. Introduction
Casper Hansen (2019) says, "Better optimized neural network; choose the right activation function, and your neural network can perform vastly better". The artist Hans Hofmann wrote, "The ability to simplify means to eliminate the unnecessary so that the necessary may speak." Taking a close look at the existing set of Activation Functions and the structure of a Deep Learning Neural Network, it is a system made up of both probabilistic and non-probabilistic (deterministic) functions (see https://en.wikipedia.org/wiki/Activation_function).
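As a brief illustration of the non-probabilistic side of that split, the sketch below (my own minimal example, not taken from the paper or the cited page) shows three of the standard activation functions as deterministic maps: the same input always yields the same output.

```python
# A minimal sketch (assumption: numpy only) showing that common activation
# functions are deterministic - repeated calls on the same input agree.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # logistic, range (0, 1)

def tanh(x):
    return np.tanh(x)                 # "tansig", range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit

x = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu):
    assert np.array_equal(f(x), f(x))  # deterministic: no randomness involved
    print(f.__name__, f(x))
```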
According to current beliefs and practice among academics, decision-makers, and professionals, one can use both probabilistic and deterministic Activation Functions in a Neural Network system, as shown by the differing opinions of ResearchGate members as of 6th June, 2019. The question was: "Right now I am using sigmoidal function as an activation function for the last layer and it is giving me output in the range of 0 to 1 which is obvious. So my question is whether I should use another function as an activation function in the last layer?". Responses included: "the most appropriate activation function for the output neuron(s) of a feedforward neural network used for regression problems (as in your application) is a linear activation, even if you first normalize your data."; "Yes you can use a linear function as activation function of the last layer"; "The most exact and accurate prediction of neural networks is made using tan-sigmoid function for hidden layer neurons and purelin function for output layer"; "You should normalize your dataset in [-1,1] range first. Then, for function approximation (as in your case) I agree with Ali Naderi and you better use tansig (for hidden layers) and purelin (for output layer). However, for classification tasks you better use tansig everywhere (for hidden as well as output layers)"; "regarding the activation function of hidden layer it is any sigmoid function except in the case of some bottleneck neurons in which case a hidden layer neuron has a linear function... thanks and regards..."; "You should use purelin Linear transformation function on the last layer of your network."; "It is better to use sigmoid activation function both for the hidden layer and last layer neuron in order to get accurate results"; "if the input and output mapping is nonlinear, then use logistic function at the output layer, and you can still use linear activation function or logistic function at the hidden layer"; "This depends on the task, regression or classification, tansig or sigmoid"; "All the answers are great"; "In terms of using NNs for prediction, you have to use linear activation function for (only) the output layer. When you normalize your data into [0, 1] and then use sigmoid function, the accuracy may decrease".

Also: "Now with the above transformations a ReLU activation function should never be able to fit a x² curve. It can approximate, but as the input grows the error of that approximated function will also grow exponentially, right? Now x² is a simple curve. How can ReLU perform better for real data which will be way more complicated than x²?" (Professionals discussion forum, Data Science Stack Exchange (2018)). In the discussion above, the ReLU activation function was assumed to fit the curve x², a non-linear deterministic (non-probabilistic) function, in a Neural Network. The dilemma is which scientific method or criterion should be applied to fit x² with a ReLU activation function.

Currently, artificial intelligence neural network algorithms, and Activation Functions in particular, are accused of lacking transparency, regulation, and supervision; of operating in secrecy as a black box; and of being difficult to explain, carrying many human biases, so that their final rankings are questionable and full of bad recommendations. They have also been accused of being usable to determine the next US President and of exposing children to unsolicited sexual video content.
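The Stack Exchange point quoted above can be demonstrated directly. The sketch below (my own illustration; the network sizes, training range, and the use of scikit-learn's MLPRegressor are assumptions, not anything prescribed by the paper) trains a ReLU network on x² over a bounded interval: a ReLU network is piecewise linear, so it fits well inside the training range, while the error grows rapidly once the input leaves it.

```python
# A minimal sketch: a ReLU network fits x^2 inside its training range
# [-2, 2] but, being piecewise linear, extrapolates badly outside it.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=(2000, 1))   # training inputs
y_train = (x_train ** 2).ravel()                   # target curve: x^2

net = MLPRegressor(hidden_layer_sizes=(64, 64), activation='relu',
                   max_iter=2000, random_state=0)
net.fit(x_train, y_train)

for x in (1.0, 2.0, 5.0, 10.0):                    # last two lie outside [-2, 2]
    pred = net.predict([[x]])[0]
    print(f"x={x:5.1f}  true={x**2:7.1f}  predicted={pred:7.1f}")
```

Inside [-2, 2] the predictions track x² closely; at x = 5 and x = 10 the piecewise-linear extrapolation diverges from the true curve, which is exactly the behavior the forum question describes.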
The objective of this paper is to propose definite rules, not trial and error, called "Jameel's ANNAF Deterministic Criterion and Lemma" for the choice of advanced optimized Activation Functions of a Deep Learning Artificial Neural Network. The paper starts with the Introduction and the Materials and Methods; new research findings and a Lemma are then proposed, and the paper is crowned with concluding remarks.

2. Materials and Methods
2.1 Materials
2.1.1 Basic Definitions
Deterministic (non-probabilistic): A deterministic model is one in which every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables; therefore, a deterministic model always performs the same way for a given set of initial conditions.

Probabilistic (stochastic): In a stochastic model, randomness is present, and variable states are not described by unique values, but rather by probability distributions.

Stochastic Neural Network: Stochastic neural networks are a type of artificial neural network built by introducing random variations into the network, either by giving the network's neurons stochastic activation functions or by giving them stochastic weights. An example of a neural network using stochastic transfer functions is a Boltzmann machine. Each neuron is binary valued, and the chance of it firing depends on the other neurons in the network (a minimal sketch of such a neuron is given below). Stochastic neural networks have found applications in risk management, oncology, bioinformatics, and other similar fields.

Deterministic Neural Network: A deterministic system is a non-probabilistic system. A. M. Abdallah (2018) defined a Deterministic Neural Network as follows: "If the activation value exceeds the threshold, there is a probability associated with firing. That is, there is a probability of the neuron not firing even if it exceeds the threshold. If the probability is one then that update is Deterministic".

Curve fitting: Curve fitting is one of the most powerful and most widely used analysis tools in Origin. Curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the goal of defining a "best fit" model of the relationship. Origin provides tools for linear, polynomial, and nonlinear curve fitting along with validation and goodness-of-fit tests. You can summarize and present your results with customized fitting reports.

Rank Models Tool: The Rank Models tool lets you fit multiple functions to a dataset and then reports the best-fitting model. Results are ranked by Akaike and Bayesian Information Criterion scores (a minimal analogue is sketched below).

2.1.2 Software and Online Resource Materials
Two of the fundamental pillars
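The first sketch referenced in 2.1.1 follows. It is my own minimal illustration (an assumption, not the paper's code) of the stochastic binary neuron described there: the neuron fires with a probability given by a sigmoid of its activation, as in a Boltzmann machine, while setting the firing probability to one whenever the threshold is exceeded recovers the deterministic update quoted from A. M. Abdallah (2018).

```python
# A minimal sketch of a stochastic vs a deterministic binary neuron.
import numpy as np

rng = np.random.default_rng(42)

def stochastic_fire(activation):
    p = 1.0 / (1.0 + np.exp(-activation))  # firing probability via sigmoid
    return int(rng.random() < p)           # binary output: 1 = fire, 0 = silent

def deterministic_fire(activation, threshold=0.0):
    return int(activation > threshold)     # fires with probability one

a = 0.8
print("stochastic:   ", [stochastic_fire(a) for _ in range(10)])     # varies run to run
print("deterministic:", [deterministic_fire(a) for _ in range(10)])  # always the same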
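The second sketch is a minimal analogue of the Rank Models tool described in 2.1.1; it is an assumption of mine, not Origin's or TableCurve 2D's actual implementation. Several candidate functions are fitted to the same data with scipy, then ranked by the Akaike Information Criterion, computed here as AIC = n ln(RSS/n) + 2k, where n is the sample size, RSS the residual sum of squares, and k the number of fitted parameters.

```python
# A minimal analogue of a "Rank Models" tool: fit several candidate models
# and rank them by AIC (lower is better).
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):        return a * x + b
def quadratic(x, a, b, c):  return a * x**2 + b * x + c
def exponential(x, a, b):   return a * np.exp(b * x)

rng = np.random.default_rng(1)
x = np.linspace(0.1, 3.0, 50)
y = 2.0 * x**2 + 1.0 + rng.normal(0.0, 0.3, x.size)  # noisy quadratic data

results = []
for model in (linear, quadratic, exponential):
    params, _ = curve_fit(model, x, y, maxfev=10000)
    rss = np.sum((y - model(x, *params)) ** 2)       # residual sum of squares
    k = len(params)                                  # number of fitted parameters
    aic = x.size * np.log(rss / x.size) + 2 * k
    results.append((aic, model.__name__))

for aic, name in sorted(results):                    # best (lowest AIC) first
    print(f"{name:12s} AIC = {aic:8.2f}")
```

On this synthetic data the quadratic model ranks first, mirroring the behavior the Rank Models tool description attributes to Origin.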