Backpropagation and Gradient Descent

Backpropagation and Gradient Descent

Backpropagation and Gradient Descent Brian Carignan, Dec 5 2016 Overview ▪ Notation/background | Neural networks | Activation functions | Vectorization | Cost functions ▪ Introduction ▪ Algorithm Overview ▪ Four fundamental equations | Definitions (all 4) and proofs (1 and 2) ▪ Example from thesis related work 2 Neural Networks 1 3 Neural Networks 2 ▪ a – Activation of a neuron is related to the activations in the previous layer ▪ b – bias of a neuron 4 Activation Functions ▪ Similar to an ON/ OFF switch ▪ Required properties | Nonlinear | Continuously differentiable 5 Vectorization ▪ Represent each layer as a vector | Simplifies notation | Leads to faster computation by exploiting vector math ▪ z – weighted input vector 6 Cost Function ▪ Objective Function ▪ Example: ▪ Optimization Problem ▪ Assumptions | Can average over Cx | Function of the outputs ▪ x – individual training examples (fixed) 7 Introduction ▪ Backpropagation | Backward propagation of errors | Calculate gradients | One way to train neural networks ▪ Gradient Descent | Optimization method | Finds a local minimum | Takes steps proportional to -gradient at current point 8 Algorithm Overview 9 Equation 1 ▪ Definition of error: 10 Equation 2 ▪ Key difference | Transpose of weight matrix ▪ Pushes error backwards 11 Equation 3 ▪ Note that previous equations computed error 12 Equation 4 ▪ Describes learning rate ▪ General insights | Slow learning when: | Input activation approaches 0 | Output activation approaches 0 or 1 (from derivative of sigmoid) 13 Proof – Equation 1 ▪ Steps 1. Definition of error 2. Chain rule 3. k=j 4. BP1 (components) 14 Proof – Equation 2 ▪ Steps 1. Definition of error 2. Chain rule 3. Substitute definition of error 4. Derivative of weighted input vector 5. BP2 (components) ▪ Recall: 15 Example – Thesis Related Work 16 References ▪ Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, (2015) ▪ Bordes et al. “Translating embeddings for modeling multi-relational data”, NIPS'13, (2013) 17.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    17 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us