Structural Learning of Neural Networks Phd Defense

Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Structural Learning of Neural Networks PhD Defense Pierre Wolinski TAU Team, LRI & Inria March 6, 2020 1/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Articial Intelligence Articial Intelligence: solve a given task, like image recognition/classication or winning at the go game... plane bird dog horse Classication task AI AlphaGo Credits: Alex Krizhevsky Credits: Google DeepMind Machine Learning: instead of implementing human methods into programs, let programs nd the best method to solve given tasks. 2/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion How to Do Machine Learning Recipe of Machine Learning Example Fix a task to be performed Task: recognize handwritten digits. Build a training dataset Dataset: database of labeled images of handwritten digits. Build a model to perform the task Model: Convolutional Neural Network (usually a program with initially unde- termined parameters) Design an algorithm to train the model Training Algorithm: Stochastic Gradi- to perform the task (i.e., which modies ent Descent with backpropagation of the its parameters to improve it) gradient 3/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion How to Do Machine Learning Recipe of Machine Learning Example Fix a task to be performed Task: recognize handwritten digits. Build a training dataset Dataset: database of labeled images of handwritten digits. Build a model to perform the task Model: Convolutional Neural Network (usually a program with initially unde- termined parameters) Design an algorithm to train the model Training Algorithm: Stochastic Gradi- to perform the task (i.e., which modies ent Descent with backpropagation of the its parameters to improve it) gradient 3/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion How to Do Machine Learning Recipe of Machine Learning Example Fix a task to be performed Task: recognize handwritten digits. Build a training dataset Dataset: database of labeled images of handwritten digits. Build a model to perform the task Model: Convolutional Neural Network (usually a program with initially unde- termined parameters) Design an algorithm to train the model Training Algorithm: Stochastic Gradi- to perform the task (i.e., which modies ent Descent with backpropagation of the its parameters to improve it) gradient 3/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion How to Do Machine Learning Recipe of Machine Learning Example Fix a task to be performed Task: recognize handwritten digits. Build a training dataset Dataset: database of labeled images of handwritten digits. Build a model to perform the task Model: Convolutional Neural Network (usually a program with initially unde- termined parameters) Design an algorithm to train the model Training Algorithm: Stochastic Gradi- to perform the task (i.e., which modies ent Descent with backpropagation of the its parameters to improve it) gradient 3/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion How to Do Machine Learning Recipe of Machine Learning Example Fix a task to be performed Task: recognize handwritten digits. Build a training dataset Dataset: database of labeled images of handwritten digits. Build a model to perform the task Model: Convolutional Neural Network (usually a program with initially unde- termined parameters) Design an algorithm to train the model Training Algorithm: Stochastic Gradi- to perform the task (i.e., which modies ent Descent with backpropagation of the its parameters to improve it) gradient 3/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Neural Networks Approach of Machine Learning: Neural Networks. Very exible class of models, trainable with a generic algorithm. trainable parameters 0.1 -0.5 2.3 output -1.2 inputs neuron neural network The articial neuron is the elementary brick of every neural network, innite ways to combine them ) choice of the model, training hyperparameters. 4/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Problem: Hyperparameters Hyperparameter search: architecture of the neural network; learning rate η: w w − ηrwL; penalty , : w Pn 2 w . λ r L( ; D) = i=1 kyi − Mw(xi )k2 + λ r( ) Main issues: ecient architectures are usually very large: how to reduce their size? ) pruning. optimal η and λ depend strongly on the neural network architecture: nd rules to get rid of η or nd default values for η and λ. 5/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Learning with Random Learning Rates Interpreting a Penalty as the Inuence of a Bayesian Prior Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Summary Goal: Build theoretically well-founded methods to help xing the hyperparameters η and λ, and the architecture. 1 Learning with Random Learning Rates, get rid of the learning rate η; 2 Interpreting a Penalty as the Inuence of a Bayesian Prior, bridge between empirical penalties and the Bayesian framework, default value for λ; 3 Asymmetrical Scaling Layer for Stable Network Pruning, reparameterization of neural networks to nd a default value for η, pruning technique. 6/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Robustness of Training Methods Learning with Random Learning Rates Presentation of Alrao Interpreting a Penalty as the Inuence of a Bayesian Prior Experimental Results Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Conclusion 1 Introduction 2 Learning with Random Learning Rates Robustness of Training Methods Presentation of Alrao Experimental Results Conclusion 3 Interpreting a Penalty as the Inuence of a Bayesian Prior 4 Asymmetrical Scaling Layers for Stable Network Pruning 5 Conclusion 7/28 Pierre Wolinski Structural Learning of Neural Networks Idea of Alrao: instead of searching the optimal learning rate, add diversity in the learning rates: we attribute randomly a learning rate to each neuron, sampled from a distribution spanning many orders of magnitude; neurons with a wrong learning rates will be useless and ignored; emergence of an ecient sub-network. Introduction Robustness of Training Methods Learning with Random Learning Rates Presentation of Alrao Interpreting a Penalty as the Inuence of a Bayesian Prior Experimental Results Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Conclusion Robustness of Training Methods When training a neural network, we want robustness: during training, between runs, between architectures and datasets, robust to hyperparameter change. Sensitive hyperparameter: learning rate. 8/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Robustness of Training Methods Learning with Random Learning Rates Presentation of Alrao Interpreting a Penalty as the Inuence of a Bayesian Prior Experimental Results Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Conclusion Robustness of Training Methods When training a neural network, we want robustness: during training, between runs, between architectures and datasets, robust to hyperparameter change. Sensitive hyperparameter: learning rate. Idea of Alrao: instead of searching the optimal learning rate, add diversity in the learning rates: we attribute randomly a learning rate to each neuron, sampled from a distribution spanning many orders of magnitude; neurons with a wrong learning rates will be useless and ignored; emergence of an ecient sub-network. 8/28 Pierre Wolinski Structural Learning of Neural Networks Introduction Robustness of Training Methods Learning with Random Learning Rates Presentation of Alrao Interpreting a Penalty as the Inuence of a Bayesian Prior Experimental Results Asymmetrical Scaling Layers for Stable Network Pruning Conclusion Conclusion Alrao: in the Internal Layers Setting: classication task performed by a feedforward network: Network = internal layers + classication layer Output Softmax Learning rates sampling: we sample a learning rate Classifier ηneuron per neuron from the log-uniform distribution on the interval (ηmin; ηmax): ... log min max η ∼ U(η ; η ): ... Pre-classifier

Structural Learning of Neural Networks Phd Defense

Training Autoencoders by Alternating Minimization

Q-Learning in Continuous State and Action Spaces

Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inﬂow Forecasting

Training Deep Networks Without Learning Rates Through Coin Betting

The Perceptron

Neural Networks a Simple Problem (Linear Regression)

Learning-Rate Annealing Methods for Deep Neural Networks

Arxiv:1912.08795V2 [Cs.LG] 16 Jun 2020 the Teacher and Student Network Logits

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

How Does Learning Rate Decay Help Modern Neural Networks?

Natural Language Grammatical Inference with Recurrent Neural Networks

Perceptron.Pdf