Applications of Hidden Markov Models

Reasoning about Probabilities of Transitions and Observations: Forward Backward Algorithm

Spelling recognition and the spelling model: The diagram below depicts the transition probabilities of a spelling model describing how a particular person P may type the 3-character word abc, as depicted in our class handout on HMMs for spelling recognition. The information provided by the diagram corresponds to the information encoded in the vector π and the matrix A in Rabiner’s tutorial paper on HMMs, where Rabiner describes an HMM in terms of (π, A, B).

Spelling recognition and the keyboard model: In the following, let’s use a simplified 1-dimensional keyboard of only 4 keys a, b, c, d, as depicted in the simplified example in our class handout on HMMs for spelling recognition. The information provided by such a keyboard model corresponds to the information encoded in the matrix B in Rabiner’s tutorial paper on HMMs, where Rabiner describes an HMM in terms of (π, A, B).

Questions: Consider the situation where we observe the character string bbd (i.e., we see the observation sequence O = ReadyToType b b d EndOfWord) as the result of the person trying to type the word abc. We don’t know the underlying state sequence Q = q1 q2 q3 q4 q5 except that q1 = I and q5 = F, but we would like to reason about the most likely state for q2, the most likely state for q3, and the most likely state for q4. What are the most likely states for q2, q3, and q4, respectively?

The forward-backward algorithm: According to Equations 26-29 on page 263 of Rabiner’s tutorial paper, to determine the most likely state for qt (the state at time point t, for example t = 2, 3, or 4 in our case), we should first determine the probability Pr(qt=Si | O, λ) for every state Si.

The most likely state for qt is simply the state that maximizes Pr(qt=Si | O, λ) among all the possible states Si. Applying this at t = 2, t = 3, and t = 4 gives the most likely state for q2, for q3, and for q4.

According to Equation 29 on page 263, the state that maximizes Pr(qt=Si | O, λ) is simply the state that maximizes αt(i)*βt(i) among all the possible states Si. Here αt(i) corresponds to the entry in column t for state Si in the table you got from Homework #2B by the forward algorithm, while βt(i) is a slightly modified variant of the entry in column t for state Si in a table very similar to the one you got from Homework #3A by the backward algorithm. In the following, we show how to calculate the table of βt(i)’s.
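For reference, the quantities cited from Rabiner’s Equations 26-29 can be written compactly as below. Here N is the number of states and T is the length of the observation sequence (N = 5 and T = 5 in our example); the summation index j is just a dummy index over states.

```latex
\gamma_t(i) = \Pr(q_t = S_i \mid O, \lambda)
            = \frac{\alpha_t(i)\,\beta_t(i)}{\Pr(O \mid \lambda)}
            = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)},
\qquad
\sum_{i=1}^{N} \gamma_t(i) = 1,
\qquad
q_t = \arg\max_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T.
```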

Calculate βt(i)’s using a variant of the backward algorithm:

Incrementally determine the following βt(i)’s, working backward from t = 5 to t = 1. In each case the state i ranges over state I, state a, state b, state c, and state F:

determine the β5(i)’s by setting them all to 1,

determine the β4(i)’s as the probabilities of starting in state i, then visiting the last state and seeing EndOfWord,

determine the β3(i)’s as the probabilities of starting in state i, then visiting the last 2 states and seeing d EndOfWord,

determine the β2(i)’s as the probabilities of starting in state i, then visiting the last 3 states and seeing b d EndOfWord,

determine the β1(i)’s as the probabilities of starting in state i, then visiting the last 4 states and seeing b b d EndOfWord.

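Step 1: determine the β5(i)’s for state I, state a, state b, state c, and state F by setting them all to 1:

| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | | | | | 1 |
| State a | | | | | 1 |
| State b | | | | | 1 |
| State c | | | | | 1 |
| State F | | | | | 1 |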

Step 2: determine the β4(i)’s for state I, state a, state b, state c, and state F as the probabilities of starting in state I, state a, state b, state c, or state F respectively, then visiting the last state and seeing EndOfWord. Below, a_{i,j} is the transition probability from state i to state j, b_j(o) is the probability of seeing o in state j, and the leading factor 1 in each term is β5(j):

| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | | | | 1*a_{I,I}*b_I(EndOfWord) + 1*a_{I,a}*b_a(EndOfWord) + 1*a_{I,b}*b_b(EndOfWord) + 1*a_{I,c}*b_c(EndOfWord) + 1*a_{I,F}*b_F(EndOfWord) | 1 |
| State a | | | | 1*a_{a,I}*b_I(EndOfWord) + 1*a_{a,a}*b_a(EndOfWord) + 1*a_{a,b}*b_b(EndOfWord) + 1*a_{a,c}*b_c(EndOfWord) + 1*a_{a,F}*b_F(EndOfWord) | 1 |
| State b | | | | 1*a_{b,I}*b_I(EndOfWord) + 1*a_{b,a}*b_a(EndOfWord) + 1*a_{b,b}*b_b(EndOfWord) + 1*a_{b,c}*b_c(EndOfWord) + 1*a_{b,F}*b_F(EndOfWord) | 1 |
| State c | | | | 1*a_{c,I}*b_I(EndOfWord) + 1*a_{c,a}*b_a(EndOfWord) + 1*a_{c,b}*b_b(EndOfWord) + 1*a_{c,c}*b_c(EndOfWord) + 1*a_{c,F}*b_F(EndOfWord) | 1 |
| State F | | | | 1*a_{F,I}*b_I(EndOfWord) + 1*a_{F,a}*b_a(EndOfWord) + 1*a_{F,b}*b_b(EndOfWord) + 1*a_{F,c}*b_c(EndOfWord) + 1*a_{F,F}*b_F(EndOfWord) | 1 |

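Substituting the keyboard-model values b_j(EndOfWord), which are 0 for states I, a, b, and c and 1 for state F:

| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | | | | 1*a_{I,I}*0 + 1*a_{I,a}*0 + 1*a_{I,b}*0 + 1*a_{I,c}*0 + 1*a_{I,F}*1 | 1 |
| State a | | | | 1*a_{a,I}*0 + 1*a_{a,a}*0 + 1*a_{a,b}*0 + 1*a_{a,c}*0 + 1*a_{a,F}*1 | 1 |
| State b | | | | 1*a_{b,I}*0 + 1*a_{b,a}*0 + 1*a_{b,b}*0 + 1*a_{b,c}*0 + 1*a_{b,F}*1 | 1 |
| State c | | | | 1*a_{c,I}*0 + 1*a_{c,a}*0 + 1*a_{c,b}*0 + 1*a_{c,c}*0 + 1*a_{c,F}*1 | 1 |
| State F | | | | 1*a_{F,I}*0 + 1*a_{F,a}*0 + 1*a_{F,b}*0 + 1*a_{F,c}*0 + 1*a_{F,F}*1 | 1 |

Substituting the transition probabilities into state F then gives:

| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | | | | 0 | 1 |
| State a | | | | 0.11 | 1 |
| State b | | | | 0.27 | 1 |
| State c | | | | 0.8 | 1 |
| State F | | | | 1 | 1 |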
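The arithmetic of Step 2 can be checked with a few lines of Python. The sketch below uses only numbers that appear in the Step 2 results: the transition probabilities into state F (0, 0.11, 0.27, 0.8, 1) and the fact that only state F can produce EndOfWord. Because b_j(EndOfWord) = 0 for every j other than F, only the a_{i,F} term of each sum survives, so the other transition probabilities are not needed here. The variable names are illustrative.

```python
states = ["I", "a", "b", "c", "F"]

# Transition probabilities a_{i,F} into state F, as used in the Step 2 results.
a_to_F = {"I": 0.0, "a": 0.11, "b": 0.27, "c": 0.8, "F": 1.0}

# Keyboard-model value b_F(EndOfWord); every other state has b_j(EndOfWord) = 0.
b_F_end = 1.0

# Step 1: beta_5(i) = 1 for every state.
beta5 = {s: 1.0 for s in states}

# Step 2: beta_4(i) = sum_j beta_5(j) * a_{i,j} * b_j(EndOfWord).
# All terms with j != F are multiplied by 0, leaving only the j = F term.
beta4 = {i: beta5["F"] * a_to_F[i] * b_F_end for i in states}

print(beta4)
```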

Step 3: determine the β3(i)’s for state I, state a, state b, state c, and state F as the probabilities of starting in state I, state a, state b, state c, or state F respectively, then visiting the last 2 states and seeing d EndOfWord:

| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | | | 0*a_{I,I}*b_I(d) + 0.11*a_{I,a}*b_a(d) + 0.27*a_{I,b}*b_b(d) + 0.8*a_{I,c}*b_c(d) + 1*a_{I,F}*b_F(d) | 0 | 1 |
| State a | | | 0*a_{a,I}*b_I(d) + 0.11*a_{a,a}*b_a(d) + 0.27*a_{a,b}*b_b(d) + 0.8*a_{a,c}*b_c(d) + 1*a_{a,F}*b_F(d) | 0.11 | 1 |
| State b | | | 0*a_{b,I}*b_I(d) + 0.11*a_{b,a}*b_a(d) + 0.27*a_{b,b}*b_b(d) + 0.8*a_{b,c}*b_c(d) + 1*a_{b,F}*b_F(d) | 0.27 | 1 |
| State c | | | 0*a_{c,I}*b_I(d) + 0.11*a_{c,a}*b_a(d) + 0.27*a_{c,b}*b_b(d) + 0.8*a_{c,c}*b_c(d) + 1*a_{c,F}*b_F(d) | 0.8 | 1 |
| State F | | | 0*a_{F,I}*b_I(d) + 0.11*a_{F,a}*b_a(d) + 0.27*a_{F,b}*b_b(d) + 0.8*a_{F,c}*b_c(d) + 1*a_{F,F}*b_F(d) | 1 | 1 |
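Every row of Step 3 is the same sum with a different starting state. Written out for a generic state i, with the β4 values from Step 2 substituted, it reads as below; the term for j = I drops out because β4(I) = 0. The remaining factors a_{ij} and b_j(d) come from the spelling model and the keyboard model in the class handout and are left symbolic here.

```latex
\beta_3(i) = \sum_{j \in \{I,a,b,c,F\}} \beta_4(j)\, a_{ij}\, b_j(d)
           = 0.11\, a_{ia}\, b_a(d) + 0.27\, a_{ib}\, b_b(d)
             + 0.8\, a_{ic}\, b_c(d) + 1 \cdot a_{iF}\, b_F(d)
```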

…

Step 4: determine the β2(i)’s for state I, state a, state b, state c, and state F as the probabilities of starting in state I, state a, state b, state c, or state F respectively, then visiting the last 3 states and seeing b d EndOfWord:

…

Step 5: determine the β1(i)’s for state I, state a, state b, state c, and state F as the probabilities of starting in state I, state a, state b, state c, or state F respectively, then visiting the last 4 states and seeing b b d EndOfWord:

….

*********************************************************************

Calculate γt(i)’s for t from 1 to 5 for all states using αt(i)’s and βt(i)’s.

Step 6:

Note that the table from Step 5 above records all the βt(i)’s for t from 1 to 5 for all states. Similarly, the final table you got from Homework #2B records all the αt(i)’s for t from 1 to 5 for all states, which is shown below for convenience:

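| | ReadyToType | b | b | d | EndOfWord |
|---|---|---|---|---|---|
| State I | 1 | 0 | 0 | 0 | 0 |
| State a | 0 | 0.0228 | 0.0001824 | 0.0000014592 | 0 |
| State b | 0 | 0.261 | 0.0564192 | 0.00022735488 | 0 |
| State c | 0 | 0.0056 | 0.00578776 | 0.0012440672 | 0 |
| State F | 0 | 0 | 0 | 0 | 0.00105680008 |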
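As a sanity check on this table, the last column can be recomputed from the d column. The sketch below assumes that the forward recursion from Homework #2B has the form α5(F) = ∑j α4(j)*a_{j,F}*b_F(EndOfWord), and it reads the wrapped entries of the d column as 0.0000014592, 0.00022735488, and 0.0012440672. The transition probabilities into state F and b_F(EndOfWord) = 1 are the ones used in Step 2; the variable names are illustrative.

```python
# alpha_4(j): the d column of the forward table.
alpha4 = {
    "I": 0.0,
    "a": 0.0000014592,
    "b": 0.00022735488,
    "c": 0.0012440672,
    "F": 0.0,
}

# Transition probabilities a_{j,F} into state F, as used in Step 2.
a_to_F = {"I": 0.0, "a": 0.11, "b": 0.27, "c": 0.8, "F": 1.0}

# Keyboard-model value b_F(EndOfWord).
b_F_end = 1.0

# One forward step into state F at t = 5.
alpha5_F = sum(alpha4[j] * a_to_F[j] for j in alpha4) * b_F_end

print(alpha5_F)
```

The result agrees with the table entry α5(F) = 0.00105680008 up to rounding in the last printed digit, which supports the way the wrapped entries were rejoined.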

For each row i and each column t, multiply all the corresponding αt(i) and βt(i) entries in these two tables together to get a table of new values and use it for Step 7 below.

Step 7: Use the table from Step 6 above to determine the most likely state for q2, the most likely state for q3, and the most likely state for q4. According to Equation 29 on page 263, for any given time t the most likely underlying state Si is the one that maximizes γt(i) = Pr(qt=Si | O, λ), which is simply the state that maximizes αt(i)*βt(i) among all the possible states Si [1].

[1] For each column t, the value in row i of the new table is αt(i)*βt(i) = ct*γt(i), where ct is the constant ∑i αt(i)*βt(i), since γt(i) = ( αt(i)*βt(i) ) / ∑i αt(i)*βt(i) according to Equation 27 on page 263 of Rabiner’s paper. Note that ct is a fixed constant for each column t. Consequently, the state Si that maximizes αt(i)*βt(i) among all the possible states Si is also the state that maximizes γt(i).
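Steps 6 and 7 can be illustrated for the one column where both factors are written out in this handout, namely t = 4: α4(i) comes from the Homework #2B table (with its wrapped entries rejoined) and β4(i) comes from Step 2. The columns for t = 2 and t = 3 need the β values from Steps 4 and 5, which are left as exercises above, so they are not computed here. This is a rough sketch; the function name `gamma_column` and the variable names are illustrative.

```python
def gamma_column(alpha_col, beta_col):
    """Return (products, gamma) for one column t.

    products[s] = alpha_t(s) * beta_t(s)
    gamma[s]    = products[s] / c_t, where c_t is the sum of the products.
    """
    products = {s: alpha_col[s] * beta_col[s] for s in alpha_col}
    c_t = sum(products.values())
    gamma = {s: p / c_t for s, p in products.items()}
    return products, gamma


# Column t = 4 of the forward table and of the backward table.
alpha4 = {"I": 0.0, "a": 0.0000014592, "b": 0.00022735488, "c": 0.0012440672, "F": 0.0}
beta4 = {"I": 0.0, "a": 0.11, "b": 0.27, "c": 0.8, "F": 1.0}

products4, gamma4 = gamma_column(alpha4, beta4)

# The maximizing state is the same whether we compare products or gammas,
# because dividing by the positive constant c_t does not change the ordering.
best_by_product = max(products4, key=products4.get)
best_by_gamma = max(gamma4, key=gamma4.get)

print(best_by_product, best_by_gamma)
print(gamma4)
```

Dividing by ct is only needed if the actual posterior probabilities γt(i) are wanted; for picking the most likely state, comparing the products in each column is enough.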