
Deep Learning Basics Lecture 1: Feedforward
Princeton University COS 495
Instructor: Yingyu Liang

Motivation I: representation learning

Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class $\mathcal{H}$ and loss function $l$
• Optimization: minimize the empirical loss

Features
• Extract features $\phi(x)$ from the raw input $x$ (e.g., a color histogram with red, green, and blue bins), then build the hypothesis $y = w^T \phi(x)$.
• Features are part of the model: $y = w^T \phi(x)$ is a linear model in $\phi(x)$ but a nonlinear model in $x$.
• Example: polynomial kernel SVM, $y = \mathrm{sign}(w^T \phi(x) + b)$ with a fixed feature map $\phi(x)$.

Motivation: representation learning
• Why don't we also learn $\phi(x)$? Learn both $\phi(x)$ and $w$ in $y = w^T \phi(x)$.

Feedforward networks
• View each dimension of $\phi(x)$ as something to be learned in $y = w^T \phi(x)$.
• Linear functions $\phi_i(x) = \theta_i^T x$ don't work: the composition is still linear, so we need some nonlinearity.
• Typically, set $\phi_i(x) = r(\theta_i^T x)$ where $r(\cdot)$ is some nonlinear function.

Feedforward deep networks
• What if we go deeper? Stack layers $x \to h^1 \to h^2 \to \cdots \to h^L \to y$.
[Figure from Deep Learning, by Goodfellow, Bengio, Courville. Dark boxes are things to be learned.]

Motivation II: neurons
[Figure from Wikipedia: a biological neuron.]

Motivation: abstract neuron model
• A neuron is activated when the correlation between the input and a pattern $\theta$ exceeds some threshold $b$: $y = \mathrm{threshold}(\theta^T x - b)$, or more generally $y = r(\theta^T x - b)$.
• $r(\cdot)$ is called the activation function.

Motivation: artificial neural networks
• Put neurons into layers: feedforward deep networks $x \to h^1 \to h^2 \to \cdots \to h^L \to y$.

Components in feedforward networks
• Representations: input, hidden variables
• Layers/weights: hidden layers, output layer
• The input $x$ feeds the first layer, the hidden variables are $h^1, h^2, \ldots, h^L$, and the output layer produces $y$.

Input
• Represented as a vector
• Sometimes requires some preprocessing, e.g.:
  • Subtract the mean
  • Normalize to [-1, 1]

Output layers
• Regression: $y = w^T h + b$; linear units, no nonlinearity.
• Multi-dimensional regression: $y = W^T h + b$; linear units, no nonlinearity.
• Binary classification: $y = \sigma(w^T h + b)$; corresponds to using logistic regression on $h$.
• Multi-class classification: $y = \mathrm{softmax}(z)$ where $z = W^T h + b$; corresponds to using multi-class logistic regression on $h$.

Hidden layers
• Each neuron takes a weighted linear combination of the previous layer $h^i$, so it can be thought of as outputting one value for the next layer $h^{i+1}$.
• A single unit computes $y = r(w^T x + b)$ on its input $x$. Typical activation functions $r$:
  • Threshold: $t(z) = \mathbb{I}[z \ge 0]$
  • Sigmoid: $\sigma(z) = 1/(1 + \exp(-z))$
  • Tanh: $\tanh(z) = 2\sigma(2z) - 1$
• Problem: saturation. In the flat regions of sigmoid/tanh the gradient is too small.
[Figure borrowed from Pattern Recognition and Machine Learning, Bishop.]
• Activation function ReLU (rectified linear unit): $\mathrm{ReLU}(z) = \max\{z, 0\}$. The gradient is 1 for $z > 0$ and 0 for $z < 0$.
[Figure from Deep Learning, by Goodfellow, Bengio, Courville.]
• Generalizations of ReLU: $\mathrm{gReLU}(z) = \max\{z, 0\} + \alpha \min\{z, 0\}$
  • Leaky-ReLU: $\mathrm{LeakyReLU}(z) = \max\{z, 0\} + 0.01 \min\{z, 0\}$
  • Parametric-ReLU: $\alpha$ is learnable
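To make the output-layer equations concrete, here is a minimal NumPy sketch of the three heads above: regression, binary classification through a sigmoid, and multi-class classification through a softmax. The function names and the max-shift inside the softmax are my own additions for illustration, not part of the lecture.

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Shift by the max for numerical stability; softmax is invariant to this shift
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def regression_head(h, w, b):
    # y = w^T h + b: linear units, no nonlinearity
    return h @ w + b

def binary_head(h, w, b):
    # y = sigma(w^T h + b): logistic regression on the learned representation h
    return sigmoid(h @ w + b)

def multiclass_head(h, W, b):
    # y = softmax(W^T h + b): multi-class logistic regression on h
    return softmax(h @ W + b)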
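The hidden-layer activation functions from the slides, written out in NumPy. The generalized ReLU covers both Leaky-ReLU (fixed alpha = 0.01) and Parametric-ReLU (alpha treated as a learnable parameter); the function names are mine.

import numpy as np

def threshold(z):
    # t(z) = 1[z >= 0]
    return (z >= 0).astype(float)

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)); saturates for large |z|, giving tiny gradients
    return 1.0 / (1.0 + np.exp(-z))

def tanh_act(z):
    # tanh(z) = 2 * sigma(2z) - 1
    return 2.0 * sigmoid(2.0 * z) - 1.0

def relu(z):
    # ReLU(z) = max{z, 0}; gradient 1 for z > 0, gradient 0 for z < 0
    return np.maximum(z, 0.0)

def generalized_relu(z, alpha):
    # gReLU(z) = max{z, 0} + alpha * min{z, 0}
    # alpha = 0.01 gives Leaky-ReLU; a learnable alpha gives Parametric-ReLU
    return np.maximum(z, 0.0) + alpha * np.minimum(z, 0.0)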
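Putting the pieces together, here is a sketch of a full forward pass $x \to h^1 \to \cdots \to h^L \to y$ with ReLU hidden units and a softmax output head. The layer sizes, the random initialization scheme, and the helper names are illustrative assumptions, not from the lecture; relu and softmax are repeated so the sketch runs on its own.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def init_layer(n_in, n_out):
    # Small random weights and zero biases; this initialization is illustrative only
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, hidden_layers, W_out, b_out):
    # Hidden layers: h^{i+1} = r(W^T h^i + b) with r = ReLU
    h = x
    for W, b in hidden_layers:
        h = relu(h @ W + b)
    # Output layer: y = softmax(W_out^T h + b_out)
    return softmax(h @ W_out + b_out)

# Example: 4-dim input, two hidden layers of width 8, 3 output classes
hidden_layers = [init_layer(4, 8), init_layer(8, 8)]
W_out, b_out = init_layer(8, 3)
x = rng.normal(size=4)
print(forward(x, hidden_layers, W_out, b_out))  # 3 probabilities summing to 1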