Multilayer Perceptron (MLP) Application Guidelines

Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Multilayer Perceptron (MLP) Application Guidelines Paulo Cortez [email protected] http://www.dsi.uminho.pt/~pcortez Departamento de Sistemas de Informa¸cão University of Minho - Campus de Azurém 4800-058 Guimarães - Portugal Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning 1 Introduction 2 How to use MLPs 3 NN Design 4 Case Study I: Classification 5 Case Study II: Regression 6 Case Study III: Reinforcement Learning Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Node Activation Further information in [Bishop, 1995] and [Sarle, 2003]. Each node outputs an activation function applied over the P weighted sum of its inputs: si = f (wi,0 + j∈I wi,j × sj ) x = +1 0 w = Bias x i0 1 w i1 x Node 2 w i2 i n Inputs ... Σ f ui si w in xn Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Activation functions Linear: y = x ; Tanh: y = tanh(x); 1 Logistic (or sigmoid): y = 1+e−x ; 1 0 −4 −2 0 2 4 Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Only feedforward connections exist; Nodes are organized in layers; Input Hidden Hidden Output Layer Layer Layer Layer 3 shortcut 7 1 4 6 2 bias 8 shortcut 5 Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Why MLPs? Popularity - the most used type of NN; Universal Approximators - general-purpose models, with a huge number of applications; Nonlinearity - capable of modeling complex functions; Robustness - good at ignoring irrelevant inputs and noise; Adaptability - can adapt its weights and/or topology in response to environment changes; Easy of Use - black-box point of view, can be used with few knowledge about the relationship of the function to be modeled. Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Who is interested in MLPs? Computer Scientists : study properties of non-symbolic information processing; Statisticians: perform flexible data analysis; Engineers: exploit MLP capabilities in several areas; Cognitive Scientists: describe models of thinking; Biologists: interpret DNA sequences; ... Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Applications [Cortez et al., 2002] Supervised Learning (Input/Output Mapping): Classification (discrete outputs): Diabetes in Pima Indians; Sonar: Rocks vs Mines; Regression (numeric outputs): Prognostic Breast Cancer; Pumadyn Robot Arm; Reinforcement Learning (Output is not perfectly known): Control (e.g., autonomous driving); Game Playing (e.g., checkers); Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Software [Sarle, 2003] Commercial: SAS Enterprise Miner Software; MATLAB Neural Network Toolbox; STATISTICA: Neural Networks; Clementine http://www.spss.com/spssbi/clementine/index.htm; Free: Stuttgart Neural Network Simulator (SNNS) (C code source); Joone: Java Object Oriented Neural Engine (code source); R (statistical tool) http://www.r-project.org/; WEKA (data mining) http://www.cs.waikato.ac.nz/~ml/weka/; Built Your Own: Better control, but take caution! Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Classification Metrics The confusion matrix [Kohavi & Provost, 1998] matches the predicted and actual values; Table: The 2 × 2 confusion matrix. ↓ actual \ predicted → negative positive negative TN FP positive FN TP Three accuracy measures can be defined: TP the Sensitivity (Type II Error) = FN+TP × 100 (%) ; TN the Specificity (Type I Error) ; = TN+FP × 100 (%) TN+TP the Accuracy = TN+FP+FN+TP × 100 (%) ; Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Regression The error e is given by: e = d − db where d denotes the desired value and the db estimated value (given by the model); Given a dataset with the function pairs x1 → d1, ··· , xN → dN , we can compute: PN i=1|ei | Mean Absolute Deviation (MAD): MAD = N PN 2 Sum Squared Error (SSE): SSE = i=1 ei Mean Squared Error (MSE): MSE = SSE N √ Root Mean Squared Error (RMSE): RMSE = MSE Correlation metrics (scale independent): Person [-1, 1]; P 2 2 (y−by) Press r squared: R = 1 − P(y−y)2 Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Data Collection and Preprocessing Learning samples must be representative, hundreds or thousands of cases are required; Only numeric values can be fed (discretization); Treat missing values (delete or substitute); Rescaling: inputs: x−(max+min)/2 y = (max−min)/2 (maximum–mininum: range [-1,1]) x−x y = s (Mahalanobis scaling: mean 0 and std 1) numeric outputs: within [0,1] if logistic and [-1,1] if tanh. (x−min) y = max−min (maximum–mininum: range [0, 1]) Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Nominal variables (with 2 or more non-ordered classes): 1-of-C coding: use one binary variable per class; Color example: Blue - 1 0 0, Red - 0 1 0, Green - 0 0 1. x Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Consult [Flexer, 1996] for further details. Holdout, split the data into two exclusive sets, using random sampling: training: used to set the MLP weights (2/3); test: used to infer the MLP performance (1/3). K-fold, works as above but uses rotation: data is split into K exclusive folds of equal size; each part is used as a test set, being the rest used for training; the overall performance is measured by averaging the K runs results; 10-fold most used if hundreds of examples; leave-one-out (N-fold) used if less than 100 or 200 examples; Third extra set is needed if parameter tunning; Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Training Algorithms Gradient-descent (O(n)) [Riedmiller, 1994]: Backpropagation (BP) - most used, yet slow; Backpropagation with Momentum - faster than BP, requires additional parameter tunning; QuickProp - faster than BP with Momentum; RPROP - faster than QuickProp and stable in terms of its internal parameters; Levenberg-Marquardt One of the most accurate algorithms; O(n2) in terms of computer (memory, speed) requirements (use with small networks); Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning Overfitting Output B A Training Data Test Data Input If possible use large datasets: N p (weights); Model Selection - apply several models and then choose the one with the best generalization estimate; Regularization - use learning penalties or restrictions: Early stopping - stop when the estimate arises; Weight decay - in each epoch slightly decrease the weights; Jitter - training with added noise. Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning MLP Capabilities Linear learning when: there are no hidden layers; or only linear activation functions are used. Nonlinear learning: any continuous function mapping can be learned with one hidden layer; discontinuous complex functions can be learned with more hidden layers. It is better to perform only one classification/regression task per network; i.e., use C/1 output node(s). Activation Functions: Hidden Nodes: use the logistic (or tanh); Output Nodes: if outputs bounded, apply the same function; else use the linear function; Paulo Cortez Multilayer Perceptron (MLP)Application Guidelines Contents Introduction How to use MLPs NN Design Case Study I: Classification Case Study II: Regression Case Study III: Reinforcement Learning

Load more