LECTURE 6
Chenhao Tan, University of Colorado Boulder

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

• HW1 turned in
• HW2 released
• Office hour
• Group formation signup

Overview

Feature engineering

Revisiting Logistic Regression

Feed Forward Networks

Layers for Structured Data
Feature engineering

Feature Engineering

Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended...


(From Chris Harrison's WikiViz)


Brainstorming: What are features useful for sentiment analysis?

What are features useful for sentiment analysis?
• Unigram
• Bigram
• Normalizing options
• Part-of-speech tagging
• Parse-tree related features
• Negation related features
• Additional resources
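As a rough sketch of the first two items, unigram and bigram counts can be extracted with scikit-learn's CountVectorizer (the toy reviews and settings below are assumptions for illustration, not part of the lecture):

```python
# Minimal sketch: unigram + bigram count features for sentiment analysis.
# The toy "reviews" are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["this movie was great", "this movie was not great at all"]

# ngram_range=(1, 2) extracts unigrams and bigrams; lowercasing is one
# simple "normalizing option".
vectorizer = CountVectorizer(ngram_range=(1, 2), lowercase=True)
X = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())  # learned unigram/bigram vocabulary
print(X.toarray())                         # one count vector per review
```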

Sarcasm detection

"Trees died for this book?" (book)

• find high-frequency words and content words
• replace content words with "CW"
• extract patterns, e.g., "does not CW much about CW" [Tsur et al., 2010]
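A small sketch of the idea (not the exact procedure of Tsur et al. [2010]): keep high-frequency words and mask the rest with "CW", so repeated templates become visible. The word list below is a made-up stand-in.

```python
# Illustrative sketch: replace content words with "CW" to expose patterns.
# HIGH_FREQ is a made-up stand-in for words found to be high-frequency.
HIGH_FREQ = {"does", "not", "much", "about", "this", "for", "a", "the", "is"}

def to_pattern(sentence):
    tokens = sentence.lower().rstrip("?.!").split()
    return " ".join(t if t in HIGH_FREQ else "CW" for t in tokens)

print(to_pattern("does not say much about the plot"))  # -> does not CW much about the CW
print(to_pattern("Trees died for this book?"))         # -> CW CW for this CW
```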


More examples: Which one will be retweeted more?

[Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html

Revisiting Logistic Regression


P(Y = 0 | x, β) = 1 / (1 + exp[β_0 + Σ_i β_i x_i])

P(Y = 1 | x, β) = exp[β_0 + Σ_i β_i x_i] / (1 + exp[β_0 + Σ_i β_i x_i])

L = −Σ_j log P(y^(j) | x^(j), β)
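A small numerical sketch of these formulas in numpy (the weights and data are made up; x is assumed not to include the bias term):

```python
import numpy as np

# Made-up parameters and data for illustration.
beta0 = -0.5
beta = np.array([1.0, -2.0])
X = np.array([[0.3, 0.1],
              [1.2, 0.4]])                 # two examples x^(1), x^(2)
y = np.array([1, 0])                       # their labels

z = beta0 + X @ beta                       # beta_0 + sum_i beta_i x_i
p1 = np.exp(z) / (1 + np.exp(z))           # P(Y = 1 | x, beta)
p0 = 1 - p1                                # P(Y = 0 | x, beta)

# Negative log-likelihood: L = - sum_j log P(y^(j) | x^(j), beta)
L = -np.sum(np.where(y == 1, np.log(p1), np.log(p0)))
print(p1, L)
```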


• Transformation on x (we map class labels from {0, 1} to {1, 2}):

l_i = β_i^T x,  i = 1, 2

o_i = exp(l_i) / Σ_{c∈{1,2}} exp(l_c),  i = 1, 2

• Objective function (using cross entropy −Σ_i p_i log q_i):

L(Y, Ŷ) = −Σ_j [ P(y^(j) = 1) log P(ŷ^(j) = 1 | x^(j), β) + P(y^(j) = 0) log P(ŷ^(j) = 0 | x^(j), β) ]
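The same prediction written in the two-logit softmax form, with the cross-entropy objective for a single example (the weight matrix is an arbitrary placeholder; the bias is folded in by appending a constant 1 to x):

```python
import numpy as np

x = np.array([0.3, 0.1, 1.0])              # input with a constant 1 appended for the bias
B = np.array([[ 0.2, -0.4],
              [ 0.5,  0.1],
              [-0.3,  0.6]])               # column i holds beta_i, so l_i = beta_i^T x

l = x @ B                                  # logits l_1, l_2
o = np.exp(l) / np.exp(l).sum()            # softmax: o_i = exp(l_i) / sum_c exp(l_c)

p = np.array([1.0, 0.0])                   # one-hot "true" distribution (class 1)
loss = -np.sum(p * np.log(o))              # cross entropy  - sum_i p_i log q_i
print(o, loss)
```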


Logistic Regression as a Single-layer Neural Network

[Diagram: inputs x1, ..., xd feed into a linear layer producing l1, l2, followed by a softmax layer producing o1, o2.]

[Diagram: the same network drawn as a single layer mapping inputs x1, ..., xd directly to outputs o1, o2.]

Feed Forward Networks


Deep Neural networks

A two-layer example (one hidden layer):

[Diagram: inputs x1, ..., xd feed into a hidden layer, which feeds into outputs o1, o2.]


More layers:

[Diagram: inputs x1, ..., xd pass through Hidden 1, Hidden 2, and Hidden 3 before the output layer producing o1, o2.]


Forward propagation algorithm

How do we make predictions based on a multi-layer neural network?

Store the biases for layer l in b^l and the weight matrix in W^l.

[Diagram: a four-layer network with parameters (W^1, b^1), (W^2, b^2), (W^3, b^3), (W^4, b^4) mapping inputs x1, ..., xd to outputs o1, o2.]


Suppose your network has L layers. To make a prediction on a test point x:

1: Initialize a^0 = x
2: for l = 1 to L do
3:     z^l = W^l a^(l−1) + b^l
4:     a^l = g(z^l)
5: end for
6: The prediction ŷ is simply a^L
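A direct numpy translation of this loop (a sketch: the layer sizes and weights are random placeholders, and g is taken to be ReLU):

```python
import numpy as np

def g(z):
    return np.maximum(0, z)                 # nonlinearity g (ReLU here)

def forward(x, weights, biases):
    """a^0 = x; a^l = g(W^l a^(l-1) + b^l) for l = 1..L; prediction is a^L."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = g(z)
    return a

# Tiny made-up network: 4 inputs -> 3 hidden -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
print(forward(rng.normal(size=4), weights, biases))
```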

Nonlinearity

What happens if there is no nonlinearity? Linear combinations of linear combinations are still linear combinations, so stacking linear layers without a nonlinearity gives no more expressive power than a single linear layer.

Neural networks in a nutshell

• Training data: S_train = {(x, y)}
• Network architecture (model): ŷ = f_w(x)
• Loss function (objective function): L(y, ŷ)
• Learning (next lecture)


Nonlinearity Options

• Sigmoid: f(x) = 1 / (1 + exp(−x))
• tanh: f(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x))
• ReLU (rectified linear unit): f(x) = max(0, x)
• Softmax: softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

https://keras.io/activations/
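These options written out in numpy (a sketch; shifting by the max inside softmax is only for numerical stability):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # equals np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))               # subtract max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```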


Loss Function Options

• ℓ2 loss: Σ_i (y_i − ŷ_i)^2
• ℓ1 loss: Σ_i |y_i − ŷ_i|
• Cross entropy: −Σ_i y_i log ŷ_i
• Hinge loss (more on this during SVM): max(0, 1 − y ŷ)

https://keras.io/losses/
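The same loss options in numpy (a sketch with placeholder targets and predictions; hinge loss assumes labels in {−1, +1} in practice):

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])               # placeholder targets
yhat = np.array([0.9, 0.2, 0.6])            # placeholder predictions

l2 = np.sum((y - yhat) ** 2)                # l2 loss
l1 = np.sum(np.abs(y - yhat))               # l1 loss
xent = -np.sum(y * np.log(yhat))            # cross entropy
hinge = np.maximum(0, 1 - y * yhat)         # hinge loss (per example)
print(l2, l1, xent, hinge)
```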

A Perceptron Example

x = (x1, x2), y = f(x1, x2)

[Diagram: a single perceptron with inputs x1, x2, bias b, and output o1.]

We consider a simple step activation function:

f(z) = 1 if z ≥ 0,  0 if z < 0

Simple Example: Can we learn OR?

x1:        0 1 0 1
x2:        0 0 1 1
x1 ∨ x2:   0 1 1 1

w = (1, 1), b = −0.5

Simple Example: Can we learn AND?

x1:        0 1 0 1
x2:        0 0 1 1
x1 ∧ x2:   0 0 0 1

w = (1, 1), b = −1.5

Simple Example: Can we learn NAND?

x1:          0 1 0 1
x2:          0 0 1 1
¬(x1 ∧ x2):  1 0 0 0

w = (−1, −1), b = 0.5
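A quick check of these three weight settings using the step activation defined above (a sketch):

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)              # f(z) = 1 if z >= 0, else 0

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

for name, w, b in [("OR",   np.array([1, 1]),   -0.5),
                   ("AND",  np.array([1, 1]),   -1.5),
                   ("NAND", np.array([-1, -1]),  0.5)]:
    print(name, step(X @ w + b))  # OR: [0 1 1 1], AND: [0 0 0 1], NAND: [1 0 0 0]
```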

Simple Example: Can we learn XOR?

x1:         0 1 0 1
x2:         0 0 1 1
x1 XOR x2:  0 1 1 0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?


Increase the number of layers.

[Diagram: a two-layer network; inputs x1, x2 feed hidden units h1, h2, which feed the output o1.]

W1 = [[1, 1], [−1, −1]],  b1 = (−0.5, 1.5)
W2 = (1, 1),  b2 = −1.5
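Plugging these weights into the forward pass (with the same step activation) confirms that the two-layer network computes XOR:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

W1 = np.array([[ 1,  1],
               [-1, -1]])
b1 = np.array([-0.5, 1.5])
W2 = np.array([1, 1])
b2 = -1.5

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)           # h1 = OR(x1, x2), h2 = NAND(x1, x2)
    o = step(W2 @ h + b2)                     # o1 = AND(h1, h2) = XOR(x1, x2)
    print(x, int(o))                          # 0, 1, 1, 0
```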


General Expressiveness of Neural Networks

Neural networks with a single hidden layer can approximate any measurable function [Hornik et al., 1989, Cybenko, 1989].

Layers for Structured Data


Structured data

Spatial information

https://www.reddit.com/r/aww/comments/6ip2la/before_and_after_she_was_told_she_was_a_good_girl/


Convolutional Layers

Sharing parameters across patches.

[Diagram: an input image (or input feature map) is convolved to produce output feature maps.]

https://github.com/davidstutz/latex-resources/blob/master/tikz-convolutional-layer/convolutional-layer.tex
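A minimal sketch of parameter sharing across patches: one small kernel is applied at every position of a single-channel input (no padding, stride 1; the input and kernel values are made up):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the same kernel over every patch of the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # same weights reused for every patch
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 input feature map
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # made-up 2x2 filter
print(conv2d_valid(image, kernel))                 # 4x4 output feature map
```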

Structured data

Sequential information: "My words fly up, my thoughts remain below: Words without thoughts never to heaven go." —Hamlet

• language
• activity history

x = (x_1, ..., x_T)

Recurrent Layers

Sharing parameters along a sequence

h_t = f(x_t, h_{t−1})
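One common instantiation of f is a vanilla tanh RNN cell; a sketch with made-up sizes and random weights (this is not yet the LSTM mentioned below):

```python
import numpy as np

# Vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b), one common choice of f.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                   # made-up input / hidden sizes
W_x = rng.normal(size=(d_h, d_in))
W_h = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)   # same parameters reused at every step t

h = np.zeros(d_h)                                  # h_0
for x_t in rng.normal(size=(5, d_in)):             # a length-5 sequence x_1, ..., x_T
    h = rnn_step(x_t, h)
print(h)
```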

One such recurrent layer: long short-term memory (LSTM).


What is missing?

• How to find good weights?
• How to make the model work (regularization, architecture, etc.)?


References

George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of ACL, 2014.

Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Proceedings of ICWSM, 2010.
