Markov Chains and Hidden Markov Models

COMP 182: Algorithmic Thinking
Luay Nakhleh, Computer Science, Rice University

❖ What is p(01110000)?
❖ Assume: 8 independent Bernoulli trials with success probability α.
❖ Answer: (1−α)^5 α^3.
❖ However, what if the assumption of independence doesn't hold?
❖ That is, what if the outcome of a Bernoulli trial depends on the outcomes of the trials that preceded it?

Markov Chains

❖ Given a sequence of observations X_1, X_2, …, X_T.
❖ The basic idea behind a Markov chain (or Markov model) is to assume that X_t captures all the relevant information for predicting the future.
❖ In this case:

p(X_1 X_2 \cdots X_T) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_2) \cdots p(X_T \mid X_{T-1}) = p(X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1})

Markov Chains

❖ When X_t is discrete, so X_t ∈ {1, …, K}, the conditional distribution p(X_t | X_{t−1}) can be written as a K×K matrix, known as the transition matrix A, where A_{ij} = p(X_t = j | X_{t−1} = i) is the probability of going from state i to state j.
❖ Each row of the matrix sums to one, so this is called a stochastic matrix.

Markov Chains

❖ A finite-state Markov chain is equivalent to a stochastic automaton.
❖ One way to represent a finite-state Markov chain is through a state transition diagram: a directed graph in which nodes represent states and arrows represent legal transitions (i.e., non-zero elements of A), with each arc weighted by its transition probability.

[Figure 17.1: State transition diagrams for two simple Markov chains. Left: a 2-state chain. Right: a 3-state left-to-right chain.]

❖ For example, the 2-state chain

A = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}

is shown on the left of Figure 17.1, and the 3-state chain

A = \begin{pmatrix} A_{11} & A_{12} & 0 \\ 0 & A_{22} & A_{23} \\ 0 & 0 & 1 \end{pmatrix}

is shown on the right. The latter is called a left-to-right transition matrix and is commonly used in speech recognition.

Markov Chains

❖ The A_{ij} element of the transition matrix specifies the probability of getting from i to j in one step.
❖ The n-step transition matrix A(n) is defined as

A_{ij}(n) = p(X_{t+n} = j \mid X_t = i),

which is the probability of getting from i to j in exactly n steps.

Markov Chains

❖ Obviously, A(1) = A.
❖ The Chapman-Kolmogorov equations state that

A_{ij}(m+n) = \sum_{k=1}^{K} A_{ik}(m) A_{kj}(n).

❖ In words, the probability of getting from i to j in m+n steps is the probability of getting from i to k in m steps, and then from k to j in n steps, summed up over all k.

Markov Chains

❖ Therefore, we can write the above as a matrix multiplication: A(m+n) = A(m)A(n).
❖ Hence, A(n) = A·A(n−1) = A·A·A(n−2) = … = A^n.
❖ Thus, we can simulate multiple steps of a Markov chain by "powering up" the transition matrix.
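To make these ideas concrete, here is a minimal Python sketch (not from the slides; the values α = 0.3, β = 0.4, the uniform initial distribution, and the function names are made up for illustration). It scores the string 01110000 from the opening example both under independent Bernoulli trials and under the 2-state chain above, then computes an n-step transition probability by powering up A.

```python
import numpy as np

def bernoulli_prob(bits, alpha):
    """p(bits) under len(bits) independent Bernoulli(alpha) trials."""
    ones = sum(bits)
    return alpha ** ones * (1 - alpha) ** (len(bits) - ones)

def markov_prob(bits, pi, A):
    """p(bits) under a first-order chain: p(x_1) * prod_t A[x_{t-1}, x_t]."""
    p = pi[bits[0]]
    for prev, cur in zip(bits, bits[1:]):
        p *= A[prev, cur]
    return p

alpha, beta = 0.3, 0.4                      # hypothetical parameter values
pi = np.array([0.5, 0.5])                   # hypothetical initial distribution
A = np.array([[1 - alpha, alpha],           # the 2-state chain from the slides
              [beta, 1 - beta]])

bits = [0, 1, 1, 1, 0, 0, 0, 0]
print(bernoulli_prob(bits, alpha))          # (1 - alpha)^5 * alpha^3
print(markov_prob(bits, pi, A))             # differs once trials are dependent

# n-step transition matrix A(n) = A^n, per the Chapman-Kolmogorov equations
A5 = np.linalg.matrix_power(A, 5)
print(A5[0, 1])                             # p(X_{t+5} = 1 | X_t = 0)
```

Note that each row of A^n still sums to one: a product of stochastic matrices is itself stochastic.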
Language Modeling

❖ One important application of Markov models is to make statistical language models, which are probability distributions over sequences of words.
❖ We define the state space to be all the words in English (or the language of interest).
❖ The probabilities p(X_t = k) are called unigram statistics.
❖ If we use a first-order Markov model, then p(X_t = k | X_{t−1} = j) is called a bigram model.

[Figure 17.2: Unigram and bigram counts from Darwin's On The Origin Of Species, over the 27-symbol alphabet {_, a, …, z}. The 2D picture on the right is a Hinton diagram of the joint distribution, in which the size of each white square is proportional to the value of the corresponding vector/matrix entry. Based on (MacKay 2003, p. 22).]

MLE for Markov language models

❖ We can estimate the transition matrix from training data by maximum likelihood. The probability of any particular sequence of length T is

p(x_{1:T} \mid \theta) = \pi(x_1) A(x_1, x_2) \cdots A(x_{T-1}, x_T) = \prod_{j=1}^{K} (\pi_j)^{\mathbb{I}(x_1 = j)} \prod_{t=2}^{T} \prod_{j=1}^{K} \prod_{k=1}^{K} (A_{jk})^{\mathbb{I}(x_t = k,\, x_{t-1} = j)}

❖ Hence the log-likelihood of a set of sequences D = (x_1, …, x_N), where x_i = (x_{i1}, …, x_{i,T_i}) is a sequence of length T_i, is given by

\log p(D \mid \theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_j N_j^1 \log \pi_j + \sum_j \sum_k N_{jk} \log A_{jk},

where we define the counts N_j^1 \triangleq \sum_{i=1}^{N} \mathbb{I}(x_{i1} = j) and N_{jk} \triangleq \sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \mathbb{I}(x_{i,t} = j,\, x_{i,t+1} = k).

Parameter Estimation for Markov Chains

❖ The parameters of a Markov chain, denoted by θ, consist of the transition matrix (A) and the distribution on the initial states (π).
❖ We want to estimate these parameters from a training data set.
❖ Such a data set consists of sequences X^1, X^2, …, X^m, where sequence X^i has length L_i.

Parameter Estimation for Markov Chains

❖ The maximum likelihood estimate (MLE) of the parameters is easy to obtain from the data:

N_j^1 = \sum_{i=1}^{m} \mathbb{I}(X_1^i = j) \qquad N_{jk} = \sum_{i=1}^{m} \sum_{t=1}^{L_i - 1} \mathbb{I}(X_t^i = j,\, X_{t+1}^i = k)

\hat{A}_{jk} = \frac{N_{jk}}{\sum_{k'} N_{jk'}} \qquad \hat{\pi}_j = \frac{N_j^1}{\sum_{i} N_i^1}

❖ Indicator function: \mathbb{I}(e) = 1 if e is true, and 0 if e is false.

Parameter Estimation for Markov Chains

❖ It is very important to handle zero counts properly (this is called smoothing).
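As a concrete illustration of these estimates, here is a minimal Python sketch (mine, not from the slides; the function name, the toy data, and the choice of add-one smoothing are all assumptions for illustration, since the slides do not prescribe a specific smoothing scheme). It computes the counts N_j^1 and N_{jk} from a small set of character sequences and normalizes them into π̂ and Â.

```python
def estimate_markov(sequences, states, pseudocount=0.0):
    """MLE of (pi, A) from a list of state sequences.

    pi_hat[j]   = N_j^1 / sum_i N_i^1
    A_hat[j][k] = N_jk  / sum_k' N_jk'
    pseudocount > 0 smooths zero counts (1.0 gives add-one smoothing);
    with pseudocount = 0, a state never seen as a source leaves its
    row of A_hat undefined (division by zero).
    """
    n1 = {j: pseudocount for j in states}                        # N_j^1
    njk = {j: {k: pseudocount for k in states} for j in states}  # N_jk
    for seq in sequences:
        n1[seq[0]] += 1
        for j, k in zip(seq, seq[1:]):
            njk[j][k] += 1
    total1 = sum(n1.values())
    pi_hat = {j: n1[j] / total1 for j in states}
    A_hat = {j: {k: njk[j][k] / sum(njk[j].values()) for k in states}
             for j in states}
    return pi_hat, A_hat

# toy training data: each string is one observed sequence over states {a, b, c}
data = ["abab", "aabc", "bbca"]
pi_hat, A_hat = estimate_markov(data, states="abc", pseudocount=1.0)
print(pi_hat)
print(A_hat["a"])   # estimated row p(X_{t+1} = . | X_t = 'a')
```

With pseudocount=0 this is exactly the pure MLE above, which assigns probability zero to any transition never seen in training; that is precisely the problem smoothing addresses.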
Hidden Markov Models

❖ A hidden Markov model, or HMM, consists of a discrete-time, discrete-state Markov chain with hidden states Z_t ∈ {1, …, K}, plus an observation model p(X_t | Z_t) (the emission probabilities).
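To illustrate the generative process this definition describes, here is a minimal Python sketch (the two-state parameters, state names, and function names are hypothetical, chosen only for illustration). It samples a sequence of hidden states from the Markov chain and draws an observation from the emission distribution at each step.

```python
import random

def sample_hmm(pi, A, B, T, rng=random.Random(0)):
    """Sample (hidden states, observations) of length T from an HMM.

    pi[z]    : initial distribution over hidden states
    A[z][z2] : hidden-state transition probabilities p(Z_{t+1} = z2 | Z_t = z)
    B[z][x]  : emission probabilities p(X_t = x | Z_t = z)
    """
    def draw(dist):
        # draw a key with probability proportional to its value
        r, acc = rng.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome  # guard against floating-point round-off

    states, obs = [], []
    z = draw(pi)
    for _ in range(T):
        states.append(z)          # hidden state at time t
        obs.append(draw(B[z]))    # emission X_t ~ p(. | Z_t = z)
        z = draw(A[z])            # hidden-state transition
    return states, obs

# hypothetical 2-state HMM: hidden states "H"/"C" emit symbols 0/1
pi = {"H": 0.6, "C": 0.4}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {0: 0.8, 1: 0.2}, "C": {0: 0.3, 1: 0.7}}
print(sample_hmm(pi, A, B, T=8))
```

An observer sees only the second list (the emissions); the state sequence that generated it remains hidden.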
