Hidden Markov Models + Conditional Random Fields

10-715 Advanced Intro. to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Hidden Markov Models + Conditional Random Fields
Matt Gormley, Guest Lecture 2, Oct. 31, 2018

Overview
1. Data: D = { x^(n) }_{n=1}^N
2. Model: p(x | θ) = (1 / Z(θ)) ∏_{C ∈ 𝒞} ψ_C(x_C)
3. Objective: ℓ(θ; D) = Σ_{n=1}^N log p(x^(n) | θ)
4. Learning: θ* = argmax_θ ℓ(θ; D)
5. Inference:
   1. Marginal inference: p(x_C) = Σ_{x' : x'_C = x_C} p(x' | θ)
   2. Partition function: Z(θ) = Σ_x ∏_{C ∈ 𝒞} ψ_C(x_C)
   3. MAP inference: x̂ = argmax_x p(x | θ)
[Figure: example tagged sentences (e.g. "n v p d n / time flies like an arrow") and a graphical model over tags T_1, …, T_5 and words W_1, …, W_5.]

HIDDEN MARKOV MODEL (HMM)

Dataset for Supervised Part-of-Speech (POS) Tagging
Data: D = { (x^(n), y^(n)) }_{n=1}^N
Sample 1: y^(1) = n v p d n, x^(1) = time flies like an arrow
Sample 2: y^(2) = n n v d n, x^(2) = time flies like an arrow
Sample 3: y^(3) = n v p n n, x^(3) = flies fly with their wings
Sample 4: y^(4) = p n n v v, x^(4) = with time you will see

Naïve Bayes for Time Series Data
We could treat each word-tag pair (i.e. token) as independent. This corresponds to a Naïve Bayes model with a single feature (the word).
p(n, v, p, d, n, time, flies, like, an, arrow) = (.3 * .8 * .1 * .5 * …)
[Figure: each tag is drawn independently (e.g. v .1, n .8, p .2, d .2) and each word is drawn given its tag:
            time  flies  like
       v     .2    .5     .2
       n     .3    .4     .2
       p     .1    .1     .3
       d     .1    .2     .1 ]

Hidden Markov Model
A Hidden Markov Model (HMM) provides a joint distribution over the sentence/tags with an assumption of dependence between adjacent tags.
p(n, v, p, d, n, time, flies, like, an, arrow) = (.3 * .8 * .2 * .5 * …)
[Figure: the tag sequence starts from <START>; transition probabilities p(y_t | y_{t-1}) and emission probabilities p(x_t | y_t):
  transitions   v    n    p    d       emissions  time  flies  like
       v       .1   .4   .2   .3           v       .2    .5     .2
       n       .8   .1   .1    0           n       .3    .4     .2
       p       .2   .3   .2   .3           p       .1    .1     .3
       d       .2   .8    0    0           d       .1    .2     .1 ]

From Mixture Model to HMM
[Figure: "Naïve Bayes" has independent pairs (Y_1, X_1), …, (Y_5, X_5); the HMM adds a chain Y_0 → Y_1 → … → Y_5 over the tags, with each Y_t emitting X_t.]

SUPERVISED LEARNING FOR HMMS

Hidden Markov Model
HMM Parameters:
[Figure: example with hidden states {O, S, C} and observations {1min, 2min, 3min}; initial probabilities (O .8, S .1, C .1), transition probabilities (O: .9 .08 .02; S: .2 .7 .1; C: .9 0 .1), and emission probabilities (O: .1 .2 .3; S: .01 .02 .03; C: 0 0 0).]

Hidden Markov Model
HMM Parameters and Generative Story
Assumption: y_0 = START. For notational convenience, we fold the initial probabilities C into the transition matrix B by our assumption.

Hidden Markov Model
Joint Distribution (with y_0 = START):
p(x, y) = ∏_{t=1}^T p(x_t | y_t) p(y_t | y_{t-1})

Training HMMs
Whiteboard:
– (Supervised) Likelihood for an HMM
– Maximum Likelihood Estimation (MLE) for an HMM

Supervised Learning for HMMs
Learning an HMM decomposes into solving two (independent) mixture models: one over transitions (Y_t → Y_{t+1}) and one over emissions (Y_t → X_t). See the sketches below.
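As a concrete illustration of the joint distribution above, here is a minimal sketch (not from the lecture) that multiplies transition and emission probabilities along a tagged sentence. The dictionaries reuse the values from the tables above where they appear; the <START> row and the entries for "an" and "arrow" are made-up assumptions for the example.

```python
# Minimal sketch: HMM joint probability p(x, y) = prod_t p(y_t | y_{t-1}) * p(x_t | y_t).
# Tables are illustrative; the <START> row and the "an"/"arrow" columns are invented.

transition = {  # p(y_t | y_{t-1}); "<START>" plays the role of y_0
    "<START>": {"n": 0.5, "v": 0.3, "p": 0.1, "d": 0.1},
    "v": {"v": 0.1, "n": 0.4, "p": 0.2, "d": 0.3},
    "n": {"v": 0.8, "n": 0.1, "p": 0.1, "d": 0.0},
    "p": {"v": 0.2, "n": 0.3, "p": 0.2, "d": 0.3},
    "d": {"v": 0.2, "n": 0.8, "p": 0.0, "d": 0.0},
}
emission = {  # p(x_t | y_t)
    "v": {"time": 0.2, "flies": 0.5, "like": 0.2, "an": 0.0, "arrow": 0.1},
    "n": {"time": 0.3, "flies": 0.4, "like": 0.2, "an": 0.0, "arrow": 0.1},
    "p": {"time": 0.1, "flies": 0.1, "like": 0.3, "an": 0.0, "arrow": 0.5},
    "d": {"time": 0.1, "flies": 0.2, "like": 0.1, "an": 0.5, "arrow": 0.1},
}

def joint_prob(words, tags, start="<START>"):
    """p(x, y) with y_0 = <START>: multiply one transition and one emission per position."""
    prob = 1.0
    prev = start
    for word, tag in zip(words, tags):
        prob *= transition[prev][tag] * emission[tag][word]
        prev = tag
    return prob

print(joint_prob("time flies like an arrow".split(), ["n", "v", "p", "d", "n"]))
```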
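Because the fully supervised likelihood decomposes as described above, MLE for an HMM reduces to counting and normalizing transitions and emissions separately. Below is a minimal sketch under that assumption; the function and variable names are illustrative, not the lecture's notation.

```python
# Minimal sketch: supervised MLE for an HMM by count-and-normalize.
# The transition and emission tables are estimated independently, mirroring the
# decomposition into two mixture-model-style estimation problems.

from collections import Counter, defaultdict

def mle_hmm(tagged_sentences, start="<START>"):
    """tagged_sentences: list of sentences, each a list of (word, tag) pairs."""
    trans_counts = defaultdict(Counter)   # counts of y_{t-1} -> y_t
    emit_counts = defaultdict(Counter)    # counts of y_t -> x_t
    for sent in tagged_sentences:
        prev = start
        for word, tag in sent:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
    # Normalize the counts into conditional probability tables.
    transition = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
                  for p, cs in trans_counts.items()}
    emission = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
                for t, cs in emit_counts.items()}
    return transition, emission

data = [list(zip("time flies like an arrow".split(), ["n", "v", "p", "d", "n"])),
        list(zip("time flies like an arrow".split(), ["n", "n", "v", "d", "n"]))]
transition, emission = mle_hmm(data)
print(transition["<START>"])   # both sentences start with tag n, so {'n': 1.0}
print(emission["n"])
```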
HMMs: History
• Markov chains: Andrey Markov (1906)
  – Random walks and Brownian motion
• Used in Shannon's work on information theory (1948)
• Baum-Welch learning algorithm: late 60's, early 70's
  – Used mainly for speech in the 60s-70s
• Late 80's and 90's: David Haussler (a major player in learning theory in the 80's) began to use HMMs for modeling biological sequences
• Mid-late 1990's: Dayne Freitag / Andrew McCallum
  – Freitag thesis with Tom Mitchell on IE from the Web using logic programs, grammar induction, etc.
  – McCallum: multinomial Naïve Bayes for text
  – With McCallum, IE using HMMs on CORA
• …
Slide from William Cohen

Higher-order HMMs
• 1st-order HMM (i.e. bigram HMM): each tag depends on the previous tag
• 2nd-order HMM (i.e. trigram HMM): each tag depends on the previous two tags
• 3rd-order HMM: each tag depends on the previous three tags
[Figure: chains from <START> through Y_1, …, Y_5 with emissions X_1, …, X_5 and progressively longer tag-to-tag dependencies.]

BACKGROUND: MESSAGE PASSING

Great Ideas in ML: Message Passing
Count the soldiers
[Figure: soldiers standing in a line; each knows "there's 1 of me" and receives a count from in front (1, 2, 3, 4, 5 before you) and from behind (5, 4, 3, 2, 1 behind you).]
• Belief of a soldier in the middle: "2 before you" + "1 of me" + "3 behind you", so there must be 2 + 1 + 3 = 6 of us, computed only from the incoming messages.
• Belief of the next soldier: "1 before you" + "1 of me" + "4 behind you", so there must be 1 + 1 + 4 = 6 of us, the same total.
adapted from MacKay (2003) textbook

Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of the tree
[Figure: soldiers arranged in a tree; a soldier who hears "3 here", "7 here", and "3 here" from its branches and counts "1 of me" believes there must be 14 of us; intermediate reports such as "11 here (= 7 + 3 + 1)" and "7 here (= 3 + 3 + 1)" are passed along the remaining branch.]
adapted from MacKay (2003) textbook

THE FORWARD-BACKWARD ALGORITHM

Inference for HMMs
Whiteboard: Three Inference Problems for an HMM
1. Evaluation: compute the probability of a given sequence of observations
2. Decoding: find the most-likely sequence of hidden states, given a sequence of observations
3. Marginals: compute the marginal distribution for a hidden state, given a sequence of observations
(Sketches of standard solutions to all three problems appear below.)
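One standard solution to the evaluation problem is the forward algorithm, which sums over all tag sequences by dynamic programming instead of brute force. Below is a minimal sketch assuming the same nested-dictionary table format as the earlier sketches; it folds START into the transitions and omits an explicit END state.

```python
# Minimal sketch: forward algorithm for evaluation, p(x) = sum_y p(x, y), in O(T * K^2).

def forward(words, tags, transition, emission, start="<START>"):
    """Return p(x) and the table alpha, where alpha[t][y] = p(x_1..x_{t+1}, Y = y) for 0-based t."""
    alpha = [{y: transition[start].get(y, 0.0) * emission[y].get(words[0], 0.0)
              for y in tags}]
    for t in range(1, len(words)):
        alpha.append({y: emission[y].get(words[t], 0.0) *
                         sum(alpha[t - 1][yp] * transition[yp].get(y, 0.0) for yp in tags)
                      for y in tags})
    return sum(alpha[-1].values()), alpha

# Tiny usage example with made-up numbers:
transition = {"<START>": {"n": 0.6, "v": 0.4}, "n": {"n": 0.3, "v": 0.7}, "v": {"n": 0.8, "v": 0.2}}
emission = {"n": {"time": 0.7, "flies": 0.3}, "v": {"time": 0.2, "flies": 0.8}}
p_x, alpha = forward(["time", "flies"], ["n", "v"], transition, emission)
print(p_x)
```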
Forward-Backward Algorithm
[Figure: a three-word example with tags Y_1, Y_2, Y_3 and words X_1 = find, X_2 = preferred, X_3 = tags; "find" could be a verb or noun, "preferred" could be an adjective or verb, "tags" could be a noun or verb.]
• Let's show the possible values for each variable: each Y_t ranges over {v, n, a}, with START and END at the boundaries.
• One possible assignment,
• and what the 7 transition / emission factors think of it (4 transition factors, including the ones touching START and END, plus 3 emission factors).
[Figure: the trellis over "find preferred tags" with candidate tags v/n/a at each position, and example factor tables scoring each transition and emission.]
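For the marginals problem, the forward-backward algorithm combines a forward pass (repeated here so the sketch is self-contained) with a backward pass, using p(y_t | x) proportional to alpha_t(y) * beta_t(y). Below is a minimal sketch in the same illustrative table format as the earlier sketches; an explicit END state is again omitted.

```python
# Minimal sketch: forward-backward for per-position tag marginals p(y_t | x).

def forward_backward(words, tags, transition, emission, start="<START>"):
    T = len(words)
    # Forward pass: alpha[t][y] accumulates probability of the prefix ending in tag y.
    alpha = [{y: transition[start].get(y, 0.0) * emission[y].get(words[0], 0.0) for y in tags}]
    for t in range(1, T):
        alpha.append({y: emission[y].get(words[t], 0.0) *
                         sum(alpha[t - 1][yp] * transition[yp].get(y, 0.0) for yp in tags)
                      for y in tags})
    # Backward pass: beta[t][y] accumulates probability of the suffix given tag y at t.
    beta = [dict.fromkeys(tags, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {y: sum(transition[y].get(yn, 0.0) *
                          emission[yn].get(words[t + 1], 0.0) * beta[t + 1][yn]
                          for yn in tags)
                   for y in tags}
    # Combine and normalize at each position to get p(y_t | x).
    marginals = []
    for t in range(T):
        scores = {y: alpha[t][y] * beta[t][y] for y in tags}
        z = sum(scores.values())
        marginals.append({y: s / z for y, s in scores.items()})
    return marginals

transition = {"<START>": {"n": 0.6, "v": 0.4}, "n": {"n": 0.3, "v": 0.7}, "v": {"n": 0.8, "v": 0.2}}
emission = {"n": {"time": 0.7, "flies": 0.3}, "v": {"time": 0.2, "flies": 0.8}}
print(forward_backward(["time", "flies"], ["n", "v"], transition, emission))
```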
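The decoding problem from the list above is typically solved with the Viterbi algorithm, which this excerpt does not spell out; the sketch below replaces the forward algorithm's sums with maxes and keeps backpointers, using the same illustrative table format.

```python
# Minimal sketch: Viterbi decoding, argmax_y p(y | x), by max-product dynamic programming.

def viterbi(words, tags, transition, emission, start="<START>"):
    T = len(words)
    delta = [{y: transition[start].get(y, 0.0) * emission[y].get(words[0], 0.0) for y in tags}]
    backptr = [{}]
    for t in range(1, T):
        delta.append({})
        backptr.append({})
        for y in tags:
            # Best previous tag for reaching tag y at position t.
            best_prev = max(tags, key=lambda yp: delta[t - 1][yp] * transition[yp].get(y, 0.0))
            delta[t][y] = (delta[t - 1][best_prev] * transition[best_prev].get(y, 0.0) *
                           emission[y].get(words[t], 0.0))
            backptr[t][y] = best_prev
    # Trace back the highest-scoring tag sequence.
    best_last = max(tags, key=lambda y: delta[T - 1][y])
    path = [best_last]
    for t in range(T - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return list(reversed(path))

transition = {"<START>": {"n": 0.6, "v": 0.4}, "n": {"n": 0.3, "v": 0.7}, "v": {"n": 0.8, "v": 0.2}}
emission = {"n": {"time": 0.7, "flies": 0.3}, "v": {"time": 0.2, "flies": 0.8}}
print(viterbi(["time", "flies"], ["n", "v"], transition, emission))  # e.g. ['n', 'v']
```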
