Structural Learning of Neural Networks: PhD Defense

Pierre Wolinski
TAU Team, LRI & Inria
March 6, 2020

Artificial Intelligence

Artificial Intelligence: solve a given task, like image recognition/classification or winning at the game of Go.

[Figure: an image-classification task (plane, bird, dog, horse; credits: Alex Krizhevsky) and the AlphaGo AI (credits: Google DeepMind).]

Machine Learning: instead of implementing human methods into programs, let programs find the best method to solve given tasks.

How to Do Machine Learning

Recipe of Machine Learning, with an example:
- Fix a task to be performed. Task: recognize handwritten digits.
- Build a training dataset. Dataset: database of labeled images of handwritten digits.
- Build a model to perform the task (usually a program with initially undetermined parameters). Model: Convolutional Neural Network.
- Design an algorithm to train the model to perform the task (i.e., which modifies its parameters to improve it). Training algorithm: Stochastic Gradient Descent with backpropagation of the gradient.

Neural Networks

Approach of Machine Learning: Neural Networks, a very flexible class of models, trainable with a generic algorithm.

[Figure: an artificial neuron with inputs, trainable parameters (0.1, -0.5, 2.3, -1.2) and an output, next to a neural network built from such neurons.]

The artificial neuron is the elementary brick of every neural network; there are infinitely many ways to combine them, hence a choice of model and of training hyperparameters.
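The artificial neuron shown in the figure can be sketched in a few lines. This is a minimal illustration, not the slides' code: the ReLU activation and the input values are assumptions, and only the weights (0.1, -0.5, 2.3, -1.2) are borrowed from the figure.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then ReLU."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, pre_activation)  # ReLU activation

# Example with the weights from the slide's figure and all-ones inputs:
out = neuron([1.0, 1.0, 1.0, 1.0], [0.1, -0.5, 2.3, -1.2], bias=0.0)
# 0.1 - 0.5 + 2.3 - 1.2 = 0.7
```

A neural network is then obtained by wiring many such neurons together, each neuron's output feeding the inputs of neurons in the next layer.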
Problem: Hyperparameters

Hyperparameter search:
- architecture of the neural network;
- learning rate η: w ← w − η ∇_w L;
- penalty λ, r: L(w; D) = Σ_{i=1}^{n} ‖y_i − M_w(x_i)‖₂² + λ r(w).

Main issues:
- efficient architectures are usually very large: how to reduce their size? ⇒ pruning;
- the optimal η and λ depend strongly on the neural network architecture: find rules to get rid of η, or find default values for η and λ.

Summary

Goal: build theoretically well-founded methods to help fix the hyperparameters η and λ, and the architecture.
1. Learning with Random Learning Rates: get rid of the learning rate η.
2. Interpreting a Penalty as the Influence of a Bayesian Prior: a bridge between empirical penalties and the Bayesian framework; a default value for λ.
3. Asymmetrical Scaling Layers for Stable Network Pruning: a reparameterization of neural networks to find a default value for η; a pruning technique.
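The penalized loss and the gradient-descent update above can be made concrete with a toy example. This is a minimal sketch assuming a scalar linear model M_w(x) = w·x and an L2 penalty r(w) = w², neither of which is specified on the slide:

```python
def loss_grad(w, data, lam):
    """Gradient of L(w; D) = sum_i (y_i - w*x_i)^2 + lam * w^2 with respect to w."""
    grad = sum(-2.0 * (y - w * x) * x for x, y in data)  # data-fit term
    grad += lam * 2.0 * w                                # penalty term
    return grad

def gd_step(w, data, lam, eta):
    """One gradient-descent update: w <- w - eta * grad_w L."""
    return w - eta * loss_grad(w, data, lam)

data = [(1.0, 2.0), (2.0, 4.0)]  # points generated by y = 2x
w = 0.0
for _ in range(200):
    w = gd_step(w, data, lam=0.0, eta=0.05)
# without a penalty, w converges to the data-generating slope 2
```

With λ > 0 the same loop converges to a smaller weight, illustrating how the penalty term biases training; how η and λ should be chosen is exactly the hyperparameter problem the slide raises.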
Outline

1. Introduction
2. Learning with Random Learning Rates: Robustness of Training Methods; Presentation of Alrao; Experimental Results; Conclusion
3. Interpreting a Penalty as the Influence of a Bayesian Prior
4. Asymmetrical Scaling Layers for Stable Network Pruning
5. Conclusion

Robustness of Training Methods

When training a neural network, we want robustness: during training, between runs, between architectures and datasets, and robustness to hyperparameter change. A particularly sensitive hyperparameter: the learning rate.
Idea of Alrao: instead of searching for the optimal learning rate, add diversity in the learning rates:
- randomly attribute a learning rate to each neuron, sampled from a distribution spanning many orders of magnitude;
- neurons with a wrong learning rate will be useless and ignored;
- an efficient sub-network emerges among the remaining neurons.

Alrao: in the Internal Layers

Setting: a classification task performed by a feedforward network:
Network = internal layers (the pre-classifier) + classification layer (the classifier, with a softmax output).
Learning-rate sampling: we sample a learning rate η_neuron per neuron from the log-uniform distribution on the interval (η_min, η_max):
log η ∼ U(log η_min, log η_max).
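The log-uniform sampling above can be sketched as follows. The interval bounds η_min = 10⁻⁵ and η_max = 10 are illustrative assumptions, not values taken from the slide:

```python
import math
import random

def sample_learning_rates(n_neurons, eta_min=1e-5, eta_max=10.0, rng=random):
    """One learning rate per neuron: log eta ~ U(log eta_min, log eta_max),
    i.e. eta is log-uniform on (eta_min, eta_max)."""
    lo, hi = math.log(eta_min), math.log(eta_max)
    return [math.exp(rng.uniform(lo, hi)) for _ in range(n_neurons)]

etas = sample_learning_rates(1000)
# the sampled rates span many orders of magnitude, from ~1e-5 up to ~10
```

Each neuron then performs its own SGD update with its own η_neuron, so whatever the task, some neurons receive a usable learning rate.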
