THREE TECHNIQUES FOR STATE ORDER ESTIMATION OF HIDDEN MARKOV MODELS

Predrag Pucar and Mille Millnert
Department of Electrical Engineering
Linkoping University
Linkoping, Sweden
Email: predrag@isy.liu.se

ABSTRACT

In this contribution three examples of techniques that can be used for state order estimation of hidden Markov models are given. The methods are also exemplified using real laser range data, and the computational burden of the three methods is discussed. Two techniques, Minimum Description Length and Maximum a Posteriori Estimate, are shown to be very similar under certain circumstances. The third technique, Predictive Least Squares, is novel in this context.

Figure 1: Left: White noise sequence. Right: Resulting segmentation of the white noise sequence using two states.

INTRODUCTION

A phenomenon that often o ccurs in the segmentation

mulation and a motivation for lo oking into this kind

pro cess is the spurious jumping in the state estimate

of issues is given Then the three algortihms are pre

of a hidden Markovmodel HMM when more states

sented and nally a example including data from a

than needed are used The reason for that is that the

laser range radar system is presented

algorithms use all available degrees of freedom ie

the algorithms actually segment the signalimage into

PROBLEM FORMULATION

M segments if the signal mo dels underlying Markov

We will rst intro duce the concept of hidden Markov

chain has M states There is obviously a need for es

models HMM

timation of the number of states b efore applying the

segmentation routine

Denition An HMM is a doubly

Example Assume that a white noise sequence de

with one underlying process that is not observable but

pictedinFigisgiven The natural choice for the number

can only be observedthrough another set of output pro

of states to model the white noise sequence is sincethere

cesses that produce the observed data The underlying

are no jumps in the signal If we however choose a two

hidden process is a

state Markov chain and apply the BaumWelch algorithm to

segment the signal into two segments the result is the one

The HMMs are b eing used extensively in a variet yof

found in Fig

areas The standard issue is howto estimate the pa

rameters in the mo del pro ducing the output and how

to estimate the unobserved Markov chain sequence The pap er is organized as follows First a problem for

parameters in the chosen mo del is denoted by d and Thereisavast literature on the ab ovementioned topic

M is the numb er of states the following expression is see for example An often circumvented problem

obtained is how to decide on how many states to use in the as

sumed hidden Markovchain In practice when one is

N

X

log N

confronted with eg a segmentation problem that





V log e d M M

 i

N N

kind of information is seldom known However it is

i

crucial for the result of the applied algorithm

The expression ab ove has to b e calculated for dierent
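To make the definition concrete, the doubly stochastic structure can be sketched as a small sampler; the two-state transition matrix Q, the state-dependent means, and the noise level below are illustrative values chosen for this sketch, not taken from the paper.

```python
import random

# Illustrative two-state HMM: hidden chain z_t, observed y_t = means[z_t] + noise.
Q = [[0.95, 0.05],   # Q[i][j] = P(z_{t+1} = j | z_t = i)
     [0.10, 0.90]]
means = [0.0, 2.0]   # output mean for each hidden state
sigma = 0.5          # output standard deviation

def sample_hmm(n, seed=0):
    """Draw n observations from the doubly stochastic process:
    the hidden Markov chain evolves via Q, the output process adds noise."""
    rng = random.Random(seed)
    z, ys = 0, []
    for _ in range(n):
        ys.append(rng.gauss(means[z], sigma))
        z = 0 if rng.random() < Q[z][0] else 1
    return ys

ys = sample_hmm(200)
print(len(ys))  # 200 observations; only ys (not z) would be available in practice
```

Only the output sequence `ys` is observable; recovering the hidden chain, its parameters, and in particular its number of states is exactly the problem treated below.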

THREE ALGORITHMS

In this section the three proposed algorithms are presented. The complete derivations of the expressions are left out of this paper; for a complete version of the paper, see [2]. The hidden Markov sequence is denoted by z_{t0}^t, meaning the sequence from time instant t0 to t. The subscript is suppressed when t0 = 1, and the superscript is suppressed when t = t0. The observed process is denoted by y_t.

Minimum Description Length

Assume that a sequence y^N is given, and that we know it has been generated by a finite state source, but we do not know the number of states M*. In the sequel M* will denote the true value of the model state order, M is the auxiliary variable denoting the model state order which is tested by the algorithm, and M^ is the estimate of the model state order. Usually a criterion is calculated for different values of M, and an M is then chosen as the estimate. The desired result is of course that M^ = M*.

Assume that for every M we have a code. A code can be described as a mapping from the source symbols to a series of bits; the mapping takes into account the distribution of the symbols. All the information theoretic techniques boil down to finding an appropriate code for coding the sequence y^N, calculating the code length for different codes, and then picking the M for which the code gives the shortest code length when coding y^N. We have chosen the minimum description length (MDL) [3] as coding principle. Shortly, the MDL principle can be summarized as choosing the model that minimizes the number of bits it takes to describe the given sequence. Note that not only the data are encoded using the model, but also the model itself, i.e., the real-valued parameters in the model. How does this apply in the HMM case? The overall number of bits will be the sum of the number of bits for describing the data and the model. If the number of parameters in the chosen model is denoted by d, and M is the number of states, the following expression is obtained:

    V(M) = (1/N) Σ_{i=1}^N log e_i + (dM + M^2) (log N)/(2N)

The expression above has to be calculated for different M, and the state order estimate is the M which gives minimal V(M).
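As a rough illustration of the principle (not code from the paper), the criterion above can be evaluated for a few candidate orders and the minimizing M picked; the per-sample log errors and the parameter count d used here are placeholder inputs.

```python
import math

def mdl_score(log_errors, d, M):
    """MDL-style criterion sketched in the text: a data-fit term plus a
    (d*M + M^2) * log(N) / (2N) model-cost term for an M-state model
    with d output parameters per state (illustrative form)."""
    N = len(log_errors)
    fit = sum(log_errors) / N
    penalty = (d * M + M ** 2) * math.log(N) / (2 * N)
    return fit + penalty

# With identical fit terms the penalty alone decides: the smallest M wins.
scores = {M: mdl_score([0.1] * 500, d=2, M=M) for M in (1, 2, 3)}
best = min(scores, key=scores.get)
print(best)  # 1
```

In a real run the fit term would come from the encoding of the residuals under each fitted M-state model, so richer models lower the fit term while the penalty grows.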

Predictive Least Squares

The predictive least squares (PLS) idea originates from [4]. We start with a basic regression problem. Assume that two sets of observations, y^N and x^N(i), i = 1, ..., M, are given. The usual procedure when applying least squares is to introduce a model class and to pick a predictor for y_t. The predictor is denoted by ŷ_t = ŷ_t(θ). The ideal predictor should then minimize

    E (y_t − ŷ_t(θ))^2.                                   (1)

If the expectation in (1) is replaced with a sample mean, the following estimate is obtained:

    θ^N = arg min_θ (1/N) Σ_{t=1}^N (y_t − ŷ_t(θ))^2.     (2)

Note that the estimate of θ is based on the whole data set. The PLS approach is to change the predictor to ŷ_t(θ^{t−1}), i.e., at every time instant the parameter vector minimizing criterion (2) is calculated using past data only. The parameter vector estimate will vary in time, since the number of data on which it is based grows. If all the prediction errors are then accumulated, the following criterion is obtained:

    V_PLS(M) = (1/N) Σ_{t=1}^N (y_t − ŷ_t(θ^{t−1}))^2,    (3)

where M is the number of regressors x(i) included.
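The honest-prediction idea behind the PLS criterion can be sketched with the simplest possible predictor, ŷ_t(θ) = θ, where θ is re-estimated from past data only at each step; this toy predictor is our illustration, not the paper's model.

```python
def pls_criterion(ys):
    """Predictive least squares for the constant predictor y_hat = theta:
    theta is the sample mean of y_1..y_{t-1} (past data only), and the
    'honest' one-step squared prediction errors are accumulated."""
    total, running_sum = 0.0, 0.0
    for t, y in enumerate(ys):
        theta = running_sum / t if t > 0 else 0.0  # estimate from past data
        total += (y - theta) ** 2
        running_sum += y
    return total / len(ys)

print(round(pls_criterion([1.0, 1.0, 1.0, 1.0]), 3))  # 0.25
```

The first sample is predicted with no data (here as 0.0), which is why PLS penalizes model complexity implicitly: extra parameters stay poorly estimated, and thus predict badly, for longer.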

In the HMM case we first have to calculate the one-step predictor, and then go through the PLS procedure for the HMM case. We also point out difficulties in proving consistency, although in simulations the method shows good results. The procedure is to use the EM algorithm, see for example [1], to estimate the state sequence and the probabilities α_i(t) = P(z_t = i | Y^t), where z_t is the state of the Markov chain at time instant t, and Y^t is the data sequence up to and including time instant t. The prediction ŷ_t = E{y_t | y^{t−1}} can be calculated as follows:

    P(y_t | y^{t−1}) = Σ_i P(y_t | y^{t−1}, z_{t−1} = i) P(z_{t−1} = i | y^{t−1})
                     = Σ_j Σ_i P(y_t | z_t = j, y^{t−1}, z_{t−1} = i) P(z_t = j | z_{t−1} = i, y^{t−1}) P(z_{t−1} = i | y^{t−1})
                     = Σ_j Σ_i P(y_t | z_t = j, y^{t−1}) q_{ij} α_i(t−1),

where q_{ij} are the transition probabilities of the hidden Markov chain. Taking the expectation of the variable {y_t | y^{t−1}} results in

    ŷ_t = Σ_j Σ_i E{y_t | z_t = j, y^{t−1}} q_{ij} α_i(t−1),

where the expectation usually is straightforward to calculate.

Since we do not know anything about the behavior of the PLS criterion as a function of M, we have to adopt an ad hoc rule when actually searching for the minimum of the criterion. The procedure of calculating the PLS criterion for different model state orders M is rather computationally costly: for every M a new EM algorithm has to be run.
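A minimal sketch of the one-step predictor above, assuming Gaussian outputs so that E{y_t | z_t = j, y^{t−1}} reduces to the state mean; the transition matrix and means below are hypothetical values, and α would come from an EM/forward pass in practice.

```python
def one_step_prediction(alpha_prev, Q, means):
    """One-step HMM predictor from the text:
    y_hat_t = sum_j sum_i E[y_t | z_t = j] * q_ij * alpha_i(t-1),
    with Gaussian outputs so E[y_t | z_t = j] = means[j]."""
    y_hat = 0.0
    for j, mean_j in enumerate(means):
        for i, a_i in enumerate(alpha_prev):
            y_hat += mean_j * Q[i][j] * a_i
    return y_hat

Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical transition probabilities q_ij
alpha = [1.0, 0.0]             # chain known to be in state 1 at time t-1
print(one_step_prediction(alpha, Q, means=[0.0, 5.0]))  # 0.1 * 5.0 = 0.5
```

The predictor is a probability-weighted mixture of the state means: here a 10% chance of jumping to the mean-5.0 state contributes 0.5 to the prediction.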

The procedure when using the EM algorithm and PLS is the following:

1. Decide which model state orders are to be tested.
2. Decide what search strategy to use when testing different numbers of states.
3. Run the EM algorithms in accordance with the decided strategy, testing the different state orders.
4. Sum the honest prediction errors.
5. Choose the state order that gives the lowest accumulated cost.

In step two, with the word strategy we mean the order in which the EM algorithms for the different model state orders should be tested.

In steps four and five, at time instant t the EM algorithms are run on the data up to t, and y_{t+1} is predicted one step ahead. The squared errors (y_t − ŷ_t(M))^2, M = 1, 2, ..., are summed up, and finally, when the row is completely processed, we choose the number of states to equal the number of states of the model which minimized the PLS criterion.

How to choose the number of states to test is an intricate question. In our simulations we have chosen an ad hoc solution: we simply start from one state, and then increase the number of states by one until the PLS criterion stops decreasing and starts to increase. The usual behavior of the PLS criterion for different M is a rapid drop when we increase M, and then, when M passes the right number of states, i.e. M = M*, the PLS criterion starts to increase slowly. As the estimate we simply choose the value M−1 if the PLS criterion starts to increase for M. The drawback of this procedure is that some a priori knowledge about the number of states is needed to avoid numerous tests. In our application we know that usually the number of states is one or two; it is very unlikely that we will need more than four states. This knowledge of course influences our testing strategy: start with one state and then increase the number by one. General advice is difficult to give.

Example 2: In this example the PLS method for model state order estimation is applied to a synthetic signal. We first generate a sequence of states from a three-state Markov chain. Noise is then added according to the following relation:

    y_t = z_t + e_t,

where e_t is zero-mean Gaussian white noise. If the previously described PLS procedure is applied to estimate the number of states of the Markov chain, we obtain the accumulated error shown in Fig. 2.

The behavior of the PLS criterion in Example 2 is typical: a quick drop when increasing the model state order towards the true one. After the true model state order is passed, the trend is not so obvious. Depending on the realization, and if short data sets are used, the model state order can be overestimated.

Consistency of the Estimate: One important question regarding the estimate is of course its convergence when the number of data tends to infinity. This question proves to be very difficult to answer, and we have not arrived at a satisfactory treatment of that matter.
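The ad hoc stopping rule described above (increase M until the criterion turns upward, then take the previous M) can be sketched as follows; the PLS values are hypothetical stand-ins for what the per-M EM runs would produce.

```python
def estimate_state_order(pls_values):
    """Ad hoc search rule from the text: test M = 1, 2, ... and stop as
    soon as the PLS criterion stops decreasing; the estimate is the
    previous M. pls_values[M-1] holds the criterion for order M."""
    best_M = 1
    for M in range(2, len(pls_values) + 1):
        if pls_values[M - 1] >= pls_values[M - 2]:  # criterion turned upward
            break
        best_M = M
    return best_M

# Typical shape: rapid drop until the true order, then a slow increase.
print(estimate_state_order([9.0, 4.0, 2.0, 2.2, 2.5]))  # 3
```

Because each candidate M requires a full EM run, stopping at the first increase keeps the cost proportional to the estimated order rather than to the largest order considered.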

Figure 2: Resulting accumulated error obtained when using PLS and an increasing number of states of the hidden Markov chain. The minimum is obtained for three states, which is in accordance with the problem statement.

Maximum a Posteriori Estimate

The last approach is based on using a bank of Kalman filters when estimating the model parameters of the model behind the observed data, and the state sequence. The Kalman filters also give the distribution of the estimates, so, for example, the distribution of the data assuming no underlying Markov chain is given by the following expression:

    P(y^N) = [Π_{t=1}^N (2π det S_t)^{-1/2}] e^{-(N/2) V_N},

where S_t is given by the following equations:

    S_t = φ_t^T P_{t−1} φ_t,
    P_t = P_{t−1} − P_{t−1} φ_t S_t^{−1} φ_t^T P_{t−1}.

Here V_N is the normalized sum of prediction errors, and φ is a known vector.

When a Markov chain with M states is introduced, after some calculations the following expression for the likelihood of the data is obtained:

    −(2/N) log P(y^N | z^N, M) ≈ V_N + (1/N) Σ_{i=1}^M d(i) log N(i).

In the expression above, d(i) denotes the number of parameters of the output process model corresponding to the different Markov chain states, T_i denotes the set of time instants where the Markov chain is in state i, and N(i) denotes the number of elements in T_i. The result is striking in its similarity with Rissanen's MDL criterion [3]. If we have prior knowledge of the transition matrix Q, or maybe have it as a design parameter, we can calculate the a posteriori probability for the states in a straightforward way.
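A scalar sketch of the recursions above, written in an assumed recursive-least-squares form with a measurement-noise term r added for numerical sanity; this is our illustration, not the paper's exact filter bank.

```python
def rls_prediction_stats(ys, phis, p0=100.0, r=1.0):
    """Scalar sketch of the quoted recursions (assumed RLS form):
    S_t = phi_t * P_{t-1} * phi_t + r,
    P_t = P_{t-1} - P_{t-1} * phi_t * (1/S_t) * phi_t * P_{t-1},
    with V_N the normalized sum of squared one-step prediction errors."""
    theta, P, V = 0.0, p0, 0.0
    for y, phi in zip(ys, phis):
        S = phi * P * phi + r
        e = y - phi * theta              # honest prediction error
        theta += P * phi / S * e         # parameter update
        P -= P * phi * (1.0 / S) * phi * P
        V += e * e
    return theta, P, V / len(ys)

theta, P, V_N = rls_prediction_stats([2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
print(round(theta, 2))  # close to the true value 2.0
```

The quantities S_t and the accumulated error V_N produced by each filter in the bank are exactly what the likelihood expression above consumes.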

EXAMPLE

In this section the MDL approach is tested using an image obtained by a laser range radar. The pixel values are the distances to the terrain measured by a laser. The objective of the segmentation algorithm, in this case the EM algorithm, is to find objects in the image that differ from the background; in other words, a first step towards object recognition. The test image that is used here shows a shield in the middle of the image, and in the upper right corner there are some bushes. The way to interpret the segmented image is to look at connected areas with the same segment label, and then do further investigation by taking the estimated parameters of the observed model (variance of the residuals, etc.) into account. The problem we are stressing here is that usually the user has to pick the number of hidden states of the Markov chain for each row, or fix one for all rows, since the image is segmented row by row. Here we used the above proposed MDL loss function. Similar results are obtained by using the MAP loss function; the estimation routine, however, is different in that case. In Fig. 3 the original laser image and the resulting segmentation are shown. Note that in the area in front of the shield only one hidden state is used, and in that way spurious jumping as in Fig. 1 is avoided.

CONCLUSIONS

Three different algorithms for state order estimation of hidden Markov models are compared. The performance and computational complexity of each algorithm is investigated. In the paper it is shown under what circumstances MDL and the MAP estimate coincide.

Figure 3 (test image #1, without drop-outs): Left: The raw data obtained from the laser system; the z-axis is the distance to the terrain. Right: Resulting segmentation of the laser range radar image using EM and the MDL strategy.

References

[1] B.H. Juang and L.R. Rabiner. Mixture autoregressive hidden Markov models for speech signals. IEEE Trans. on ASSP, December 1985.

[2] P. Pucar. Segmentation of Laser Range Radar Images Using Hidden Markov Field Models. Linkoping Studies in Science and Technology, thesis, Department of Electrical Engineering, Linkoping University, Sweden.

[3] J. Rissanen. Modeling by shortest data description. Automatica, 1978.

[4] J. Rissanen. A predictive least-squares principle. IMA Journal of Mathematical Control and Information, 1986.

[5] R.G. Whiting. Quality Monitoring in Manufacturing Systems: A Partially Observed Markov Chain Approach. PhD thesis, University of Toronto, Canada.