
Machine Learning
Chenhao Tan, University of Colorado Boulder
Lecture 6
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

Logistics
• HW1 turned in
• HW2 released
• Office hour
• Group formation signup

Overview
• Feature engineering
• Revisiting Logistic Regression
• Feed Forward Networks
• Layers for Structured Data

Feature Engineering

"Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended..."
(From Chris Harrison's WikiViz)

Brainstorming: what features are useful for sentiment analysis?
• Unigrams
• Bigrams
• Normalizing options
• Part-of-speech tagging
• Parse-tree related features
• Negation-related features
• Additional resources

Sarcasm detection
"Trees died for this book?" (book)
• Find high-frequency words and content words
• Replace content words with "CW"
• Extract patterns, e.g., "does not CW much about CW" [Tsur et al., 2010]
(A code sketch of this pattern trick appears at the end of this part of the lecture.)

More examples: which one will be retweeted more? [Tan et al., 2014]
https://chenhaot.com/papers/wording-for-propagation.html

Revisiting Logistic Regression

$$P(Y = 0 \mid x; \beta) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$P(Y = 1 \mid x; \beta) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$\mathcal{L} = -\sum_j \log P\left(y^{(j)} \mid x^{(j)}; \beta\right)$$

• Transformation on x (we map class labels from {0, 1} to {1, 2}):
$$l_i = \beta_i^\top x, \quad i = 1, 2$$
$$o_i = \frac{\exp l_i}{\sum_{c \in \{1, 2\}} \exp l_c}, \quad i = 1, 2$$
• Objective function (using cross entropy $-\sum_i p_i \log q_i$):
$$\mathcal{L}(Y, \hat{Y}) = -\sum_j \left[ P(y^{(j)} = 1) \log P(\hat{y}^{(j)} = 1 \mid x^{(j)}; \beta) + P(y^{(j)} = 0) \log P(\hat{y}^{(j)} = 0 \mid x^{(j)}; \beta) \right]$$

Logistic Regression as a Single-layer Neural Network
[Diagram: inputs x_1, ..., x_d feed a linear layer producing l_1, l_2, followed by a softmax layer producing o_1, o_2]
[Diagram: the same network collapsed into a single layer mapping x_1, ..., x_d directly to o_1, o_2]
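Seen as a single-layer network, the model above is only a few lines of NumPy. Below is a minimal sketch of the forward pass and the cross-entropy objective, assuming a feature matrix `X` (one row per example) and integer class labels `y`; the variable names and shapes are illustrative, not from the slides.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtract the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W, b):
    """Linear scores l = XW + b, then softmax outputs o (one row per example)."""
    l = X @ W + b          # shape (n, 2): one score per class
    return softmax(l)

def cross_entropy(o, y):
    """-sum_j log P(correct class); y holds class indices in {0, 1}."""
    n = y.shape[0]
    return -np.log(o[np.arange(n), y]).sum()

# Tiny usage example with random data (d = 3 features, n = 4 examples).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
y = np.array([0, 1, 1, 0])
W = rng.normal(size=(3, 2))
b = np.zeros(2)
print(cross_entropy(forward(X, W, b), y))
```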
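And, as promised in the feature-engineering discussion, a sketch of unigram/bigram extraction plus the content-word ("CW") replacement behind the sarcasm patterns. The toy documents, tokenization, and frequency threshold are assumptions for illustration, not the actual setup of Tsur et al. [2010].

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cw_pattern(tokens, vocab_counts, min_freq=2):
    """Keep high-frequency words, replace low-frequency (content) words
    with 'CW' to form patterns like 'does not CW much about CW'."""
    return " ".join(t if vocab_counts[t] >= min_freq else "CW" for t in tokens)

# Toy corpus (illustrative); min_freq would be much larger in practice.
docs = [["does", "not", "say", "much", "about", "plot"],
        ["does", "not", "care", "much", "about", "style"]]
counts = Counter(t for doc in docs for t in doc)

unigrams = Counter(ngrams(docs[0], 1))
bigrams = Counter(ngrams(docs[0], 2))
print(cw_pattern(docs[1], counts))  # -> "does not CW much about CW"
```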
Feed Forward Networks

Deep Neural Networks
A two-layer example (one hidden layer):
[Diagram: inputs x_1, ..., x_d, one hidden layer, outputs o_1, o_2]

More layers:
[Diagram: inputs x_1, ..., x_d, hidden layers 1-3, outputs o_1, o_2]

Forward propagation algorithm
How do we make predictions based on a multi-layer neural network? Store the biases for layer $l$ in $b^l$ and the weight matrix in $W^l$.
[Diagram: a four-layer network with parameters $(W^1, b^1), \ldots, (W^4, b^4)$ labeling successive layers]

Suppose your network has $L$ layers. To make a prediction on a test point $x$:
1: Initialize $a^0 = x$
2: for $l = 1$ to $L$ do
3:   $z^l = W^l a^{l-1} + b^l$
4:   $a^l = g(z^l)$
5: end for
6: The prediction $\hat{y}$ is simply $a^L$
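The pseudocode translates directly to NumPy. A minimal sketch, assuming fully connected layers, ReLU for the hidden activations, and softmax at the output (the slides leave $g$ abstract); the layer sizes in the usage example are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    """Forward propagation: a^0 = x; a^l = g(W^l a^{l-1} + b^l) for l = 1..L."""
    a = x
    L = len(weights)
    for l in range(L):
        z = weights[l] @ a + biases[l]
        a = relu(z) if l < L - 1 else softmax(z)  # output layer uses softmax
    return a  # the prediction y_hat = a^L

# Usage: d = 4 inputs, one hidden layer of 5 units, 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)), rng.normal(size=(2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(forward(rng.normal(size=4), weights, biases))
```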
Nonlinearity
What happens if there is no nonlinearity? Linear combinations of linear combinations are still linear combinations, so a stack of purely linear layers collapses into a single linear model.

Neural networks in a nutshell
• Training data $S_{\text{train}} = \{(x, y)\}$
• Network architecture (model): $\hat{y} = f_w(x)$
• Loss function (objective function): $\mathcal{L}(y, \hat{y})$
• Learning (next lecture)

Nonlinearity Options
• Sigmoid: $f(x) = \frac{1}{1 + \exp(-x)}$
• tanh: $f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
• ReLU (rectified linear unit): $f(x) = \max(0, x)$
• softmax: $f(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$
https://keras.io/activations/
[Figure: curves of the activation functions]
(Code sketches of these appear at the end of the lecture.)

Loss Function Options
• $\ell_2$ loss: $\sum_i (y_i - \hat{y}_i)^2$
• $\ell_1$ loss: $\sum_i |y_i - \hat{y}_i|$
• Cross entropy: $-\sum_i y_i \log \hat{y}_i$
• Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$
https://keras.io/losses/
(These are also sketched in code at the end of the lecture.)

A Perceptron Example
$x = (x_1, x_2)$, $y = f(x_1, x_2)$
[Diagram: inputs x_1, x_2 and a bias b feeding a single output unit o_1]
We consider a simple activation function:
$$f(z) = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$

Simple example: can we learn OR?

x1              0  1  0  1
x2              0  0  1  1
y = x1 ∨ x2     0  1  1  1

Yes: $w = (1, 1)$, $b = -0.5$.

Can we learn AND?

x1              0  1  0  1
x2              0  0  1  1
y = x1 ∧ x2     0  0  0  1

Yes: $w = (1, 1)$, $b = -1.5$.

Can we learn NAND?

x1              0  1  0  1
x2              0  0  1  1
y = ¬(x1 ∧ x2)  1  1  1  0

Yes: $w = (-1, -1)$, $b = 1.5$.

Can we learn XOR?

x1              0  1  0  1
x2              0  0  1  1
x1 XOR x2       0  1  1  0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable; XOR is not. How can we fix this?
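To answer the question above: add a hidden layer. A minimal sketch that first checks the single-perceptron gates from the slides, then composes them into a two-layer network for XOR via the identity XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)); these particular weights are one standard choice, not the only one.

```python
import numpy as np

def perceptron(x, w, b):
    """Threshold unit: f(z) = 1 if z >= 0 else 0, with z = w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else 0

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Single-layer gates from the slides.
for name, w, b in [("OR", (1, 1), -0.5),
                   ("AND", (1, 1), -1.5),
                   ("NAND", (-1, -1), 1.5)]:
    print(name, [perceptron(x, w, b) for x in inputs])

# XOR needs a hidden layer: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).
def xor(x):
    h = (perceptron(x, (1, 1), -0.5),    # hidden unit 1: OR
         perceptron(x, (-1, -1), 1.5))   # hidden unit 2: NAND
    return perceptron(h, (1, 1), -1.5)   # output unit: AND

print("XOR", [xor(x) for x in inputs])   # -> [0, 1, 1, 0]
```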
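As referenced earlier, the nonlinearity options as NumPy one-liners (a minimal sketch; `x` is a vector):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)  # equivalently (exp(x) - exp(-x)) / (exp(x) + exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()
```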
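Likewise the loss options (a sketch; `y` and `y_hat` are NumPy arrays, and the hinge loss is written assuming labels in {-1, +1} and a raw classifier score):

```python
import numpy as np

def l2_loss(y, y_hat):
    return ((y - y_hat) ** 2).sum()

def l1_loss(y, y_hat):
    return np.abs(y - y_hat).sum()

def cross_entropy(y, y_hat):
    return -(y * np.log(y_hat)).sum()

def hinge(y, score):
    return np.maximum(0.0, 1.0 - y * score)
```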