Three Techniques for State Order Estimation of Hidden

THREE TECHNIQUES FOR STATE ORDER ESTIMATION OF HIDDEN MARKOV MODELS Predrag Pucar and Mille Millnert Department of Electrical Engineering Linkoping University S Linkoping Sweden Emailpredragisyliuse White noise sequence Segmented white noise sequence 3 2.5 ABSTRACT 2 2 In this contribution three examples of techniques that 1 can be used for state order estimation of hidden 0 1.5 Markov mo dels are given The metho ds are also exem using real laser range data and the computa plied -1 1 Two tional burden of the three metho ds is discussed -2 techniques Maximum Description Length and Maxi -3 0.5 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 mum a Posteriori Estimate are shown to b e very sim ilar under certain circumstances The third technique Figure Left White noise sequence with variance Predictive Least Squares is novel in this context Right Resulting segmentation of the white noise signal using two states INTRODUCTION A phenomenon that often o ccurs in the segmentation mulation and a motivation for lo oking into this kind pro cess is the spurious jumping in the state estimate of issues is given Then the three algortihms are pre of a hidden Markovmodel HMM when more states sented and nally a example including data from a than needed are used The reason for that is that the laser range radar system is presented algorithms use all available degrees of freedom ie the algorithms actually segment the signalimage into PROBLEM FORMULATION M segments if the signal mo dels underlying Markov We will rst intro duce the concept of hidden Markov chain has M states There is obviously a need for es models HMM timation of the number of states b efore applying the segmentation routine Denition An HMM is a doubly stochastic process Example Assume that a white noise sequence de with one underlying process that is not observable but pictedinFigisgiven The natural choice for the number can only be observedthrough another set of output pro of states to model the white noise sequence is sincethere cesses that produce the observed data The underlying are no jumps in the signal If we however choose a two hidden process is a Markov chain state Markov chain and apply the BaumWelch algorithm to segment the signal into two segments the result is the one The HMMs are b eing used extensively in a variet yof found in Fig areas The standard issue is howto estimate the pa rameters in the mo del pro ducing the output and how to estimate the unobserved Markov chain sequence The pap er is organized as follows First a problem for parameters in the chosen mo del is denoted by d and Thereisavast literature on the ab ovementioned topic M is the numb er of states the following expression is see for example An often circumvented problem obtained is how to decide on how many states to use in the as sumed hidden Markovchain In practice when one is N X log N confronted with eg a segmentation problem that V log e d M M i N N kind of information is seldom known However it is i crucial for the result of the applied algorithm The expression ab ove has to b e calculated for dierent THREE ALGORITHMS M and the state order estimate is the M whichgives minimal V In this section the three prop osed algorithms are pre sented The complete derivation of the expressions will Predictive Least Squares b e left out in this pap er For a complete version of the The predictive least squares PLS idea originates from pap er see The hidden Markov sequence will b e de t We start with a basic regression problem As meaning the sequence from time instant noted by z t N N sume two sets of observations y and x i where t to t The subscript is suppressed when t i M are given The usual pro cedure when ap and the sup erscript is suppressed when t t The t plying least squares is to intro duce a mo del class and observed pro cess is denoted by y t to pick a predictor for y The predictor is denoted by t t Minimum Description Length y y The ideal predictor should then minimize t N t Assume a sequence y is given and we know it has E y y y t t b een generated by a nite state source but wedonot If the exp ectation in is replaced with a sample mean knowthenumb er of states M In the sequel M will the following estimate is obtained denote the true value of the mo del state order M is the auxiliary variable denoting the mo del state order N X which is tested by the algorithm and M the estimate of arg min y y t t the mo del state order Usually a criterion is calculated N t for dierent values of M and then an M is chosen as an estimate The desired result is of course that M Note that the estimate of is based on the whole data M set The PLS approachis to change the predictor to t Assume that for every M wehave a co de Acode y y ieatevery time instant the parameter M t t can b e describ ed as a mapping from the source symbols vector minimizing the criterion is calculated using to a series of bits The mapping takes into accountthe past data only The parameter vector estimate will distribution of the symbols All the information the vary in time since the number of data on which it oretic techniques b oil down to nding an appropriate is based grows If then all the prediction errors are N co de forcodingthe sequence y calculating the accumulated we the following criterion is obtained M co de length for dierent co des and then picking the M N X for which the co de gives the shortest co de length M t y y y V M N t t t PLS when co ding y We have chosen the minimum de t scription length MDL as co ding principle Shortly the MDL principle can b e summarized as cho osing the where M is the numb er of regressors x included mo del that minimizes the numb er of bits it takes to de scrib e the given sequence Note that not only the data In the HMM case wersthave to calculate the one step are enco ded using the mo del but also the mo del it predictor and then go through the PLS pro cedure for self ie the realvalued parameters in the mo del How the HMM case We also p oint out diculties in prov do es this apply in the HMM case The overall num ing consistency although in simulations the metho d ber of bits will be the sum of the number of bits for shows good results The pro cedure is to use the EM describing the data and the mo del If the n algorithm see for example to estimate the state se umber of t quence and the probabilities t P z jY where i t In step four and ve at time instant t the EM al z is the state of the Markovchain at time instant t t t gorithms are run on the data up to t and y is pre and Y is the data sequence up to and including time t t dicted according to The squared errors M instant t The prediction y E fy jy g and can t t t y y M are summed up and nally when the b e calculated as follows t t row is completely pro cessed wecho ose the number of X t t t P y jy P y jy z iP z ijy t t t t states to equal the numb er of states of the mo del which i minimized the PLS criterion X X t P y jz j y z i t t t Howtocho ose the numb er of states to test is an intri j i cate question In our simulations wehavechosen an ad t t P z j jz i y P z ijy t t t hoc solution we simply start from one state and then X X t P y jz j y q t increase the numb er of states byoneuntil the PLS cri t t ij i j i terion stops to decrease and starts to increase The usual b ehavior of the PLS criterion for dierent M is where q is the transition probabilities for the hid ij a rapid drop when we increase M and then when M den Markovchain Taking exp ectation of the variable passes the right numb er of states ie M M the t fy jy g results in t PLS criterion starts to increase slowly As the estimate X X wesimplycho ose the value of M if the PLS criterion t y E fy jz j y gq t t t t ij i starts to increase for M The drawback of this j i pro cedure is that some a priori knowledge ab out the numb er of states is needed to avoid numerous testings where the exp ectation usually is straightforward to cal In our application we know that usually the number culate of states are one or two It is very unlikely that we Since we do not know anything ab out the behavior will need more than four states This knowledge of of the PLScriterion as a function of M we have to course inuences our testing strategy start with one adopt an ad hoc rule when actually searching for the state and then increase the number by one General minimum of the criterion The pro cedure of calculating advice is dicult to give the PLScriterion for dierent mo del state orders M is Example In this example the PLSmethod for model rather computationally costly For every M anewEM state order estimation is applied to a synthetic signal We algorithm has to b e run rst generate a sequence of states from a threestate Markov chain Noise is then addedaccording to the fol lowing rela The pro cedure when using the EM algorithm and PLS tion is the following y z e t t t where e is zeromean and Gaussian white noise with vari t ance Decide which mo del state orders that are to be tested If then the previously described PLS procedureisappliedin state order to estimate the number of states of the Markov Decide what searchstrategyto use when testing chain we obtain the fol lowing accumulated error shown in dierentnumb er of states Fig Run the EM algorithms in accordance to the de The behavior of the PLScriterion in Example is cided strategy testing the dierent state orders typical The quick drop when increasing the mo del state order towards the true one After the true mo del

Three Techniques for State Order Estimation of Hidden

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support