Backpropagation Learning: Simple vs. Multilayer Perceptron

MIT Department of Brain and Cognitive Sciences
9.641J, Spring 2005 - Introduction to Neural Networks
Instructor: Professor Sebastian Seung

Backpropagation learning: Simple vs. multilayer perceptron

Hidden layer problem
• Radical change for the supervised learning problem.
• No desired values for the hidden layer.
• The network must find its own hidden layer activations.

Generalized delta rule
• The delta rule only works for the output layer.
• Backpropagation, or the generalized delta rule, is a way of creating desired values for the hidden layers.

Outline
• The algorithm
• Derivation as a gradient algorithm
• Sensitivity lemma

Multilayer perceptron
• L layers of weights and biases
• L+1 layers of neurons
$$ x^0 \xrightarrow{W^1,\, b^1} x^1 \xrightarrow{W^2,\, b^2} \cdots \xrightarrow{W^L,\, b^L} x^L $$
$$ x_i^l = f\!\left( \sum_{j=1}^{n_{l-1}} W_{ij}^l\, x_j^{l-1} + b_i^l \right) $$

Reward function
• Depends on the activity of the output layer only: $R(x^L)$.
• Maximize the reward with respect to the weights and biases.

Example: squared error
• Square of desired minus actual output, with a minus sign:
$$ R(x^L) = -\frac{1}{2} \sum_{i=1}^{n_L} \left( d_i - x_i^L \right)^2 $$

Forward pass
For $l = 1$ to $L$:
$$ u_i^l = \sum_{j=1}^{n_{l-1}} W_{ij}^l\, x_j^{l-1} + b_i^l, \qquad x_i^l = f(u_i^l) $$

Sensitivity computation
• The sensitivity is also called "delta."
$$ s_i^L = f'(u_i^L)\, \frac{\partial R}{\partial x_i^L} = f'(u_i^L)\, (d_i - x_i^L) $$

Backward pass
For $l = L$ down to $2$:
$$ s_j^{l-1} = f'(u_j^{l-1}) \sum_{i=1}^{n_l} s_i^l\, W_{ij}^l $$

Learning update
• In any order:
$$ \Delta W_{ij}^l = \eta\, s_i^l\, x_j^{l-1}, \qquad \Delta b_i^l = \eta\, s_i^l $$

Backprop is a gradient update
• Consider R as a function of the weights and biases.
$$ \frac{\partial R}{\partial W_{ij}^l} = s_i^l\, x_j^{l-1}, \qquad \Delta W_{ij}^l = \eta\, \frac{\partial R}{\partial W_{ij}^l} $$
$$ \frac{\partial R}{\partial b_i^l} = s_i^l, \qquad \Delta b_i^l = \eta\, \frac{\partial R}{\partial b_i^l} $$

Sensitivity lemma
• Sensitivity matrix = outer product of the sensitivity vector and the activity vector:
$$ \frac{\partial R}{\partial W_{ij}^l} = \frac{\partial R}{\partial b_i^l}\, x_j^{l-1} $$
• The sensitivity vector is sufficient.
• Generalization of "delta."

Coordinate transformation
$$ u_i^l = \sum_{j=1}^{n_{l-1}} W_{ij}^l\, f(u_j^{l-1}) + b_i^l $$
$$ \frac{\partial u_i^l}{\partial u_j^{l-1}} = W_{ij}^l\, f'(u_j^{l-1}), \qquad \frac{\partial R}{\partial u_i^l} = \frac{\partial R}{\partial b_i^l} $$

Output layer
$$ x_i^L = f(u_i^L), \qquad u_i^L = \sum_j W_{ij}^L\, x_j^{L-1} + b_i^L $$
$$ \frac{\partial R}{\partial b_i^L} = f'(u_i^L)\, \frac{\partial R}{\partial x_i^L} $$

Chain rule
• Composition of two functions: the map $u^{l-1} \to R$ factors as $u^{l-1} \to u^l \to R$.
$$ \frac{\partial R}{\partial u_j^{l-1}} = \sum_i \frac{\partial R}{\partial u_i^l}\, \frac{\partial u_i^l}{\partial u_j^{l-1}} $$
$$ \frac{\partial R}{\partial b_j^{l-1}} = \sum_i \frac{\partial R}{\partial b_i^l}\, W_{ij}^l\, f'(u_j^{l-1}) $$

Computational complexity
• Naïve estimate:
  – network output: order N
  – each component of the gradient: order N
  – N components: order N²
• With backprop: order N

Biological plausibility
• Local: pre- and postsynaptic variables
$$ x_j^{l-1} \xrightarrow{\;W_{ij}^l\;} x_i^l, \qquad s_j^{l-1} \xleftarrow{\;W_{ij}^l\;} s_i^l $$
• Forward and backward passes use the same weights.
• Extra set of variables.

Backprop for brain modeling
• Backprop may not be a plausible account of learning in the brain.
• But perhaps the networks created by it are similar to biological neural networks.
• Zipser and Andersen:
  – train a network
  – compare its hidden neurons with those found in the brain.

LeNet
• Weight-sharing
• Sigmoidal neurons
• Learn binary outputs

Machine learning revolution
• Gradient following, or other hill-climbing methods
• Empirical error minimization
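To make the algorithm above concrete, here is a minimal NumPy sketch of one backprop step on a single training pair, using the squared-error reward from the slides. The logistic choice of $f$, the layer sizes, the learning rate, and all function names are illustrative assumptions, not part of the original lecture.

```python
import numpy as np

# Assumed nonlinearity: logistic sigmoid (the lecture only requires a differentiable f).
def f(u):
    return 1.0 / (1.0 + np.exp(-u))

def fprime(u):
    y = f(u)
    return y * (1.0 - y)

def forward(x0, W, b):
    """Forward pass: u^l = W^l x^{l-1} + b^l and x^l = f(u^l) for l = 1..L."""
    x, u = [x0], []
    for Wl, bl in zip(W, b):
        u.append(Wl @ x[-1] + bl)
        x.append(f(u[-1]))
    return x, u          # x[l] = x^l (l = 0..L), u[l-1] = u^l (l = 1..L)

def backward(x, u, W, d):
    """Backward pass: sensitivities s^L..s^1 for the squared-error reward."""
    L = len(W)
    s = [None] * L                                        # s[l-1] = s^l
    s[L - 1] = fprime(u[L - 1]) * (d - x[L])              # s^L = f'(u^L)(d - x^L)
    for l in range(L - 1, 0, -1):                         # the slides' l = L..2
        s[l - 1] = fprime(u[l - 1]) * (W[l].T @ s[l])     # s^{l-1} = f'(u^{l-1}) W^l^T s^l
    return s

def update(W, b, x, s, eta):
    """Learning update (gradient ascent on R): dW^l = eta s^l (x^{l-1})^T, db^l = eta s^l."""
    for l in range(len(W)):
        W[l] += eta * np.outer(s[l], x[l])
        b[l] += eta * s[l]

# Tiny illustrative network: 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(n) for n in sizes[1:]]
x0 = rng.standard_normal(sizes[0])      # input pattern
d = np.array([0.0, 1.0])                # desired output

x, u = forward(x0, W, b)
s = backward(x, u, W, d)
update(W, b, x, s, eta=0.1)             # one learning step
```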
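As a check on the "Backprop is a gradient update" and sensitivity-lemma slides, the sketch below (reusing forward, backward, and f from the block above; the chosen indices and step size are arbitrary) compares the analytic derivative $\partial R/\partial W_{ij}^l = s_i^l x_j^{l-1}$ with a finite-difference estimate. It also illustrates the complexity slide: estimating every derivative this way costs one extra forward pass per parameter (order N² overall), while a single backward pass delivers all of them at order N.

```python
# Reward from the slides: R(x^L) = -1/2 * sum_i (d_i - x_i^L)^2.
def reward(x0, W, b, d):
    xs, _ = forward(x0, W, b)
    return -0.5 * np.sum((d - xs[-1]) ** 2)

# Analytic gradient from the sensitivity lemma: dR/dW_ij^l = s_i^l x_j^{l-1}.
x, u = forward(x0, W, b)
s = backward(x, u, W, d)
l, i, j = 0, 1, 2                        # pick one weight (0-based layer index)
analytic = s[l][i] * x[l][j]

# Central finite-difference estimate of the same derivative.
eps = 1e-6
W_plus = [w.copy() for w in W];  W_plus[l][i, j] += eps
W_minus = [w.copy() for w in W]; W_minus[l][i, j] -= eps
numeric = (reward(x0, W_plus, b, d) - reward(x0, W_minus, b, d)) / (2 * eps)

print(analytic, numeric)                 # the two values should agree closely
```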
