Hidden Markov Model: Tutorial

Total Page:16

File Type:pdf, Size:1020Kb

Hidden Markov Model: Tutorial Hidden Markov Model: Tutorial Benyamin Ghojogh [email protected] Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada Fakhri Karray [email protected] Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, ON, Canada Mark Crowley [email protected] Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada Abstract emitted observation symbols (Rabiner, 1989). HMM mod- This is a tutorial paper for Hidden Markov Model els the the probability density of a sequence of observed (HMM). First, we briefly review the background symbols (Roweis, 2003). on Expectation Maximization (EM), Lagrange This paper is a technical tutorial for evaluation, estima- multiplier, factor graph, the sum-product algo- tion, and training in HMM. We also explain some applica- rithm, the max-product algorithm, and belief tions for HMM. There exist many different applications for propagation by the forward-backward procedure. HMM. One of the most famous applications is in speech Then, we introduce probabilistic graphical mod- recognition (Rabiner, 1989; Gales et al., 2008) where an els including Markov random field and Bayesian HMM is trained for every word in the speech (Rabiner & network. Markov property and Discrete Time Juang, 1986). Another application of HMM is in action Markov Chain (DTMC) are also introduced. We, recognition (Yamato et al., 1992) because an action can then, explain likelihood estimation and EM in be seen as a sequence or times series of poses (Ghojogh HMM in technical details. We explain evalua- et al., 2017; Mokari et al., 2018). An interesting applica- tion in HMM where direct calculation and the tion of HMM, which is not trivial in the first glance, is in forward-backward belief propagation are both face recognition (Samaria, 1994). The face can be seen as explained. Afterwards, estimation in HMM is a sequence of organs being modeled by HMM (Nefian & covered where both the greedy approach and the Hayes, 1998). Usage of HMM in DNA analysis is another Viterbi algorithm are detailed. Then, we explain application of HMM (Eddy, 2004). how to train HMM using EM and the Baum- The remainder of paper is as follows. In Section2, we re- Welch algorithm. We also explain how to use view some necessary background for the paper. Section HMM in some applications such as speech and 3 introduces probabilistic graphical models. Expectation action recognition. maximization in HMM is explained in Section4. after- wards, evaluation, estimation, and training in HMM are explain in Sections5,6, and7, respectively. Some applica- 1. Introduction tions are introduced in more details in Section8. Finally, Assume we have a time series of hidden random variables. Section9 concludes the paper. We name the hidden random variables as states (Ghahra- mani, 2001). Every hidden random variable can generate 2. Background (emit) an observation symbol or output. Therefore, we have Some background required for better understanding of the a set of states and a set of observation symbols. A Hidden HMM algorithms is mentioned in this Section. Markov Model (HMM) has some hidden states and some 2.1. Expectation Maximization This section is taken from our previous paper (Ghojogh et al., 2019). Sometimes, the data are not fully observ- Hidden Markov Model: Tutorial 2 able. For example, the data are known to be whether zero or greater than zero. As an illustration, assume the data are collected for a particular disease but for convenience of the patients participated in the survey, the severity of the dis- ease is not recorded but only the existence or non-existence of the disease is reported. So, the data are not giving us complete information as Xi > 0 is not obvious whether is Xi = 2 or Xi = 1000. In this case, MLE cannot be directly applied as we do not have access to complete information and some data are missing. In this case, Expectation Maximization (EM) (Moon, 1996) is useful. The main idea of EM can be sum- marized in this short friendly conversation: – What shall we do? The data are missing! The log- likelihood is not known completely so MLE cannot be used. – Mmm, probably we can replace the missing data with Figure 1. (a) An example factor graph, and (b) its representation something... as a bipartite graph. – Aha! Let us replace it with its mean. – You are right! We can take the mean of log-likelihood over the possible values of the missing data. Then every- For solving this problem, we can introduce a new variable thing in the log-likelihood will be known, and then... η which is called “Lagrange multiplier”. Also, a new func- – And then we can do MLE! tion L(Θ1;:::; ΘK ; η), called “Lagrangian” is introduced: (obs) (miss) Assume D and D denote the observed data (Xi’s = 0 X > 0 in the above example) and the missing data ( i’s L(Θ1;:::; ΘK ; η) = Q(Θ1;:::; ΘK ) in the above example). The EM algorithm includes two (4) − ηP (Θ ;:::; Θ ) − c: main steps, i.e., E-step and M-step. 1 K Let Θ be the parameter which is the variable for likelihood Maximizing (or minimizing) this Lagrangian function estimation. In the E-step, the log-likelihood `(Θ), is taken gives us the solution to the optimization problem (Boyd & expectation with respect to the missing data D(miss) in or- Vandenberghe, 2004): der to have a mean estimation of it. Let Q(Θ) denote the expectation of the likelihood with respect to D(miss): set rΘ1;:::;ΘK ,ηL = 0; (5) Q(Θ) := (miss) (obs) [`(Θ)]: (1) ED jD ;Θ which gives us: Note that in the above expectation, the D(obs) and Θ are r L =set 0 =) r Q = ηr P; conditioned on, so they are treated as constants and not ran- Θ1;:::;ΘK Θ1;:::;ΘK Θ1;:::;ΘK dom variables. set rηL = 0 =) P (Θ1;:::; ΘK ) = c: In the M-step, the MLE approach is used where the log- likelihood is replaced with its expectation, i.e., Q(Θ); 2.3. Factor Graph, The Sum-Product and therefore: Max-Product Algorithms, and The Forward-Backward Procedure Θb = arg max Q(Θ): (2) Θ 2.3.1. FACTOR GRAPH These two steps are iteratively repeated until convergence A factor graph is a bipartite graph where the two partitions of the estimated parameters Θ. b of graph include functions (or factors) fj; 8j and the vari- ables x ; 8i (Kschischang et al., 2001; Loeliger, 2004). Ev- 2.2. Lagrange Multiplier i ery factor is a function of two or more variables. The vari- Suppose we have a multi-variate function Q(Θ1;:::; ΘK ) ables of a factor are connected by edges to it. Hence, we (called “objective function”) and we want to maximize (or have a bipartite graph. Figure1 shows an example factor minimize) it. However, this optimization is constrained and graphs where the factor and variable nodes are represented its constraint is equality P (Θ1;:::; ΘK ) = c where c is a by circles and squares, respectively. constant. So, the constrained optimization problem is: 2.3.2. THE SUM-PRODUCT ALGORITHM maximize Q(Θ1;:::; ΘK ); Θ1;:::;ΘK (3) Assume we have a factor graph with some variable nodes subject to P (Θ1;:::; ΘK ) = c: and factor nodes. For this, one of the nodes is considered as Hidden Markov Model: Tutorial 3 the root of tree. The result of algorithm does not depend on > 1 Initialize the messages to [1; 1;:::; 1] which node to be taken as the root (Bishop, 2006). Usually, 2 for time t from 1 to τ do the first variable or the last one are considered as the root. 3 Forward pass: Do sum-product algorithm Then, some messages from the leaf (or leaves) are sent up 4 Backward pass: Do sum-product algorithm on the tree to the root (Bishop, 2006). The messages can be 5 belief = forward message × backward initialized to any fixed constant values (see Eq. (8)). The message message from the variable node x to its neighbor factor i 6 if max(belief change) then node f is: j 7 break the loop Y mxi!fj = mf!xi ; (6) 8 Return beliefs f2N (xi)nfj Algorithm 1: The forward-backward algorithm where f 2 N (xi) n fj means all the neighbor factor nodes using the sum-product sub-algorithm to the variable xi except fj. Also, mxi!fj and mf!xi denote the message from the variable xi to the factor fj and the message from the factor f to the variable xi, re- 2.3.3. THE MAX-PRODUCT ALGORITHM spectively. The message from a the factor node fj to its The max-product algorithm (Weiss & Freeman, 2001; neighbor variable xi is: Pearl, 2014) is similar to the sum-product algorithm where X Y the summation operator is replaced by the maximum op- m = f m : (7) fj !xi j x!fj erator. In this algorithm, the messages from the variable values of x2N (fj )nxi x2N (fj )nxi nodes to the factor nodes and vice versa are: The Eq. (6) includes both sum and product. This is Y why this procedure is named the sum-product algorithm mxi!fj = mf!xi ; (10) (Kschischang et al., 2001; Bishop, 2006). In the factor f2N (xi)nfj Y graph, if the variable xi has degree one, the message is: mfj !xi = max fj mx!fj : (11) values of x2N (fj )nxi > k x2N (fj )nxi mxi!fj = [1; 1;:::; 1] 2 R ; (8) where k is the number of possible values that the variables 2.3.4. BELIEF PROPAGATION WITH can get.
Recommended publications
  • Conditioning and Markov Properties
    Conditioning and Markov properties Anders Rønn-Nielsen Ernst Hansen Department of Mathematical Sciences University of Copenhagen Department of Mathematical Sciences University of Copenhagen Universitetsparken 5 DK-2100 Copenhagen Copyright 2014 Anders Rønn-Nielsen & Ernst Hansen ISBN 978-87-7078-980-6 Contents Preface v 1 Conditional distributions 1 1.1 Markov kernels . 1 1.2 Integration of Markov kernels . 3 1.3 Properties for the integration measure . 6 1.4 Conditional distributions . 10 1.5 Existence of conditional distributions . 16 1.6 Exercises . 23 2 Conditional distributions: Transformations and moments 27 2.1 Transformations of conditional distributions . 27 2.2 Conditional moments . 35 2.3 Exercises . 41 3 Conditional independence 51 3.1 Conditional probabilities given a σ{algebra . 52 3.2 Conditionally independent events . 53 3.3 Conditionally independent σ-algebras . 55 3.4 Shifting information around . 59 3.5 Conditionally independent random variables . 61 3.6 Exercises . 68 4 Markov chains 71 4.1 The fundamental Markov property . 71 4.2 The strong Markov property . 84 4.3 Homogeneity . 90 4.4 An integration formula for a homogeneous Markov chain . 99 4.5 The Chapmann-Kolmogorov equations . 100 iv CONTENTS 4.6 Stationary distributions . 103 4.7 Exercises . 104 5 Ergodic theory for Markov chains on general state spaces 111 5.1 Convergence of transition probabilities . 113 5.2 Transition probabilities with densities . 115 5.3 Asymptotic stability . 117 5.4 Minorisation . 122 5.5 The drift criterion . 127 5.6 Exercises . 131 6 An introduction to Bayesian networks 141 6.1 Introduction . 141 6.2 Directed graphs .
    [Show full text]
  • Multivariate Poisson Hidden Markov Models for Analysis of Spatial Counts
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Saskatchewan's Research Archive MULTIVARIATE POISSON HIDDEN MARKOV MODELS FOR ANALYSIS OF SPATIAL COUNTS A Thesis Submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Department of Mathematics and Statistics University of Saskatchewan, Saskatoon, SK, Canada by Chandima Piyadharshani Karunanayake @Copyright Chandima Piyadharshani Karunanayake, June 2007. All rights Reserved. PERMISSION TO USE The author has agreed that the libraries of this University may provide the thesis freely available for inspection. Moreover, the author has agreed that permission for copying of the thesis in any manner, entirely or in part, for scholarly purposes may be granted by the Professor or Professors who supervised my thesis work or in their absence, by the Head of the Department of Mathematics and Statistics or the Dean of the College in which the thesis work was done. It is understood that any copying or publication or use of the thesis or parts thereof for finanancial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to the author and to the University of Saskatchewan in any scholarly use which may be made of any material in this thesis. Requests for permission to copy or to make other use of any material in the thesis should be addressed to: Head Department of Mathematics and Statistics University of Saskatchewan 106, Wiggins Road Saskatoon, Saskatchewan Canada, S7N 5E6 i ABSTRACT Multivariate count data are found in a variety of fields.
    [Show full text]
  • Regime Heteroskedasticity in Bitcoin: a Comparison of Markov Switching Models
    Munich Personal RePEc Archive Regime heteroskedasticity in Bitcoin: A comparison of Markov switching models Chappell, Daniel Birkbeck College, University of London 28 September 2018 Online at https://mpra.ub.uni-muenchen.de/90682/ MPRA Paper No. 90682, posted 24 Dec 2018 06:38 UTC Regime heteroskedasticity in Bitcoin: A comparison of Markov switching models Daniel R. Chappell Department of Economics, Mathematics and Statistics Birkbeck College, University of London [email protected] 28th September 2018 Abstract Markov regime-switching (MRS) models, also known as hidden Markov models (HMM), are used extensively to account for regime heteroskedasticity within the returns of financial assets. However, we believe this paper to be one of the first to apply such methodology to the time series of cryptocurrencies. In light of Moln´arand Thies (2018) demonstrating that the price data of Bitcoin contained seven distinct volatility regimes, we will fit a sample of Bitcoin returns with six m-state MRS estimations, with m ∈ {2,..., 7}. Our aim is to identify the optimal number of states for modelling the regime heteroskedasticity in the price data of Bitcoin. Goodness-of-fit will be judged using three information criteria, namely: Bayesian (BIC); Hannan-Quinn (HQ); and Akaike (AIC). We determined that the restricted 5-state model generated the optimal estima- tion for the sample. In addition, we found evidence of volatility clustering, volatility jumps and asymmetric volatility transitions whilst also inferring the persistence of shocks in the price data of Bitcoin. Keywords Bitcoin; Markov regime-switching; regime heteroskedasticity; volatility transitions. 1 2 List of Tables Table 1. Summary statistics for Bitcoin (23rd April 2014 to 31st May 2018) .
    [Show full text]
  • 12 : Conditional Random Fields 1 Hidden Markov Model
    10-708: Probabilistic Graphical Models 10-708, Spring 2014 12 : Conditional Random Fields Lecturer: Eric P. Xing Scribes: Qin Gao, Siheng Chen 1 Hidden Markov Model 1.1 General parametric form In hidden Markov model (HMM), we have three sets of parameters, j i transition probability matrix A : p(yt = 1jyt−1 = 1) = ai;j; initialprobabilities : p(y1) ∼ Multinomial(π1; π2; :::; πM ); i emission probabilities : p(xtjyt) ∼ Multinomial(bi;1; bi;2; :::; bi;K ): 1.2 Inference k k The inference can be done with forward algorithm which computes αt ≡ µt−1!t(k) = P (x1; :::; xt−1; xt; yt = 1) recursively by k k X i αt = p(xtjyt = 1) αt−1ai;k; (1) i k k and the backward algorithm which computes βt ≡ µt t+1(k) = P (xt+1; :::; xT jyt = 1) recursively by k X i i βt = ak;ip(xt+1jyt+1 = 1)βt+1: (2) i Another key quantity is the conditional probability of any hidden state given the entire sequence, which can be computed by the dot product of forward message and backward message by, i i i i X i;j γt = p(yt = 1jx1:T ) / αtβt = ξt ; (3) j where we define, i;j i j ξt = p(yt = 1; yt−1 = 1; x1:T ); i j / µt−1!t(yt = 1)µt t+1(yt+1 = 1)p(xt+1jyt+1)p(yt+1jyt); i j i = αtβt+1ai;jp(xt+1jyt+1 = 1): The implementation in Matlab can be vectorized by using, i Bt(i) = p(xtjyt = 1); j i A(i; j) = p(yt+1 = 1jyt = 1): 1 2 12 : Conditional Random Fields The relation of those quantities can be simply written in pseudocode as, T αt = (A αt−1): ∗ Bt; βt = A(βt+1: ∗ Bt+1); T ξt = (αt(βt+1: ∗ Bt+1) ): ∗ A; γt = αt: ∗ βt: 1.3 Learning 1.3.1 Supervised Learning The supervised learning is trivial if only we know the true state path.
    [Show full text]
  • Modeling Dependence in Data: Options Pricing and Random Walks
    UNIVERSITY OF CALIFORNIA, MERCED PH.D. DISSERTATION Modeling Dependence in Data: Options Pricing and Random Walks Nitesh Kumar A dissertation submitted in partial fulfillment of the requirements for the degree Doctor of Philosophy in Applied Mathematics March, 2013 UNIVERSITY OF CALIFORNIA, MERCED Graduate Division This is to certify that I have examined a copy of a dissertation by Nitesh Kumar and found it satisfactory in all respects, and that any and all revisions required by the examining committee have been made. Faculty Advisor: Harish S. Bhat Committee Members: Arnold D. Kim Roummel F. Marcia Applied Mathematics Graduate Studies Chair: Boaz Ilan Arnold D. Kim Date Contents 1 Introduction 2 1.1 Brief Review of the Option Pricing Problem and Models . ......... 2 2 Markov Tree: Discrete Model 6 2.1 Introduction.................................... 6 2.2 Motivation...................................... 7 2.3 PastWork....................................... 8 2.4 Order Estimation: Methodology . ...... 9 2.5 OrderEstimation:Results. ..... 13 2.6 MarkovTreeModel:Theory . 14 2.6.1 NoArbitrage.................................. 17 2.6.2 Implementation Notes. 18 2.7 TreeModel:Results................................ 18 2.7.1 Comparison of Model and Market Prices. 19 2.7.2 Comparison of Volatilities. 20 2.8 Conclusion ...................................... 21 3 Markov Tree: Continuous Model 25 3.1 Introduction.................................... 25 3.2 Markov Tree Generation and Computational Tractability . ............. 26 3.2.1 Persistentrandomwalk. 27 3.2.2 Number of states in a tree of fixed depth . ..... 28 3.2.3 Markov tree probability mass function . ....... 28 3.3 Continuous Approximation of the Markov Tree . ........ 30 3.3.1 Recursion................................... 30 3.3.2 Exact solution in Fourier space .
    [Show full text]
  • Estimation of Viterbi Path in Bayesian Hidden Markov Models
    Estimation of Viterbi path in Bayesian hidden Markov models Jüri Lember1,∗ Dario Gasbarra2, Alexey Koloydenko3 and Kristi Kuljus1 May 14, 2019 1University of Tartu, Estonia; 2University of Helsinki, Finland; 3Royal Holloway, University of London, UK Abstract The article studies different methods for estimating the Viterbi path in the Bayesian framework. The Viterbi path is an estimate of the underlying state path in hidden Markov models (HMMs), which has a maximum joint posterior probability. Hence it is also called the maximum a posteriori (MAP) path. For an HMM with given parameters, the Viterbi path can be easily found with the Viterbi algorithm. In the Bayesian framework the Viterbi algorithm is not applicable and several iterative methods can be used instead. We introduce a new EM-type algorithm for finding the MAP path and compare it with various other methods for finding the MAP path, including the variational Bayes approach and MCMC methods. Examples with simulated data are used to compare the performance of the methods. The main focus is on non-stochastic iterative methods and our results show that the best of those methods work as well or better than the best MCMC methods. Our results demonstrate that when the primary goal is segmentation, then it is more reasonable to perform segmentation directly by considering the transition and emission parameters as nuisance parameters. Keywords: HMM, Bayes inference, MAP path, Viterbi algorithm, segmentation, EM, variational Bayes, simulated annealing 1 Introduction and preliminaries arXiv:1802.01630v2 [stat.CO] 11 May 2019 Hidden Markov models (HMMs) are widely used in application areas including speech recognition, com- putational linguistics, computational molecular biology, and many more.
    [Show full text]
  • A Predictive Model Using the Markov Property
    A Predictive Model using the Markov Property Robert A. Murphy, Ph.D. e-mail: [email protected] Abstract: Given a data set of numerical values which are sampled from some unknown probability distribution, we will show how to check if the data set exhibits the Markov property and we will show how to use the Markov property to predict future values from the same distribution, with probability 1. Keywords and phrases: markov property. 1. The Problem 1.1. Problem Statement Given a data set consisting of numerical values which are sampled from some unknown probability distribution, we want to show how to easily check if the data set exhibits the Markov property, which is stated as a sequence of dependent observations from a distribution such that each suc- cessive observation only depends upon the most recent previous one. In doing so, we will present a method for predicting bounds on future values from the same distribution, with probability 1. 1.2. Markov Property Let I R be any subset of the real numbers and let T I consist of times at which a numerical ⊆ ⊆ distribution of data is randomly sampled. Denote the random samples by a sequence of random variables Xt t∈T taking values in R. Fix t T and define T = t T : t>t to be the subset { } 0 ∈ 0 { ∈ 0} of times in T that are greater than t . Let t T . 0 1 ∈ 0 Definition 1 The sequence Xt t∈T is said to exhibit the Markov Property, if there exists a { } measureable function Yt1 such that Xt1 = Yt1 (Xt0 ) (1) for all sequential times t0,t1 T such that t1 T0.
    [Show full text]
  • Local Conditioning in Dawson–Watanabe Superprocesses
    The Annals of Probability 2013, Vol. 41, No. 1, 385–443 DOI: 10.1214/11-AOP702 c Institute of Mathematical Statistics, 2013 LOCAL CONDITIONING IN DAWSON–WATANABE SUPERPROCESSES By Olav Kallenberg Auburn University Consider a locally finite Dawson–Watanabe superprocess ξ =(ξt) in Rd with d ≥ 2. Our main results include some recursive formulas for the moment measures of ξ, with connections to the uniform Brown- ian tree, a Brownian snake representation of Palm measures, continu- ity properties of conditional moment densities, leading by duality to strongly continuous versions of the multivariate Palm distributions, and a local approximation of ξt by a stationary clusterη ˜ with nice continuity and scaling properties. This all leads up to an asymptotic description of the conditional distribution of ξt for a fixed t> 0, given d that ξt charges the ε-neighborhoods of some points x1,...,xn ∈ R . In the limit as ε → 0, the restrictions to those sets are conditionally in- dependent and given by the pseudo-random measures ξ˜ orη ˜, whereas the contribution to the exterior is given by the Palm distribution of ξt at x1,...,xn. Our proofs are based on the Cox cluster representa- tions of the historical process and involve some delicate estimates of moment densities. 1. Introduction. This paper may be regarded as a continuation of [19], where we considered some local properties of a Dawson–Watanabe super- process (henceforth referred to as a DW-process) at a fixed time t> 0. Recall that a DW-process ξ = (ξt) is a vaguely continuous, measure-valued diffu- d ξtf µvt sion process in R with Laplace functionals Eµe− = e− for suitable functions f 0, where v = (vt) is the unique solution to the evolution equa- 1 ≥ 2 tion v˙ = 2 ∆v v with initial condition v0 = f.
    [Show full text]
  • Ergodicity, Decisions, and Partial Information
    Ergodicity, Decisions, and Partial Information Ramon van Handel Abstract In the simplest sequential decision problem for an ergodic stochastic pro- cess X, at each time n a decision un is made as a function of past observations X0,...,Xn 1, and a loss l(un,Xn) is incurred. In this setting, it is known that one may choose− (under a mild integrability assumption) a decision strategy whose path- wise time-average loss is asymptotically smaller than that of any other strategy. The corresponding problem in the case of partial information proves to be much more delicate, however: if the process X is not observable, but decisions must be based on the observation of a different process Y, the existence of pathwise optimal strategies is not guaranteed. The aim of this paper is to exhibit connections between pathwise optimal strategies and notions from ergodic theory. The sequential decision problem is developed in the general setting of an ergodic dynamical system (Ω,B,P,T) with partial information Y B. The existence of pathwise optimal strategies grounded in ⊆ two basic properties: the conditional ergodic theory of the dynamical system, and the complexity of the loss function. When the loss function is not too complex, a gen- eral sufficient condition for the existence of pathwise optimal strategies is that the dynamical system is a conditional K-automorphism relative to the past observations n n 0 T Y. If the conditional ergodicity assumption is strengthened, the complexity assumption≥ can be weakened. Several examples demonstrate the interplay between complexity and ergodicity, which does not arise in the case of full information.
    [Show full text]
  • A Study of Hidden Markov Model
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2004 A Study of Hidden Markov Model Yang Liu University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Mathematics Commons Recommended Citation Liu, Yang, "A Study of Hidden Markov Model. " Master's Thesis, University of Tennessee, 2004. https://trace.tennessee.edu/utk_gradthes/2326 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Yang Liu entitled "A Study of Hidden Markov Model." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Master of Science, with a major in Mathematics. Jan Rosinski, Major Professor We have read this thesis and recommend its acceptance: Xia Chen, Balram Rajput Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Yang Liu entitled “A Study of Hidden Markov Model.” I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Mathematics.
    [Show full text]
  • Particle Gibbs for Infinite Hidden Markov Models
    Particle Gibbs for Infinite Hidden Markov Models Nilesh Tripuraneni* Shixiang Gu* Hong Ge University of Cambridge University of Cambridge University of Cambridge [email protected] MPI for Intelligent Systems [email protected] [email protected] Zoubin Ghahramani University of Cambridge [email protected] Abstract Infinite Hidden Markov Models (iHMM’s) are an attractive, nonparametric gener- alization of the classical Hidden Markov Model which can automatically infer the number of hidden states in the system. However, due to the infinite-dimensional nature of the transition dynamics, performing inference in the iHMM is difficult. In this paper, we present an infinite-state Particle Gibbs (PG) algorithm to re- sample state trajectories for the iHMM. The proposed algorithm uses an efficient proposal optimized for iHMMs and leverages ancestor sampling to improve the mixing of the standard PG algorithm. Our algorithm demonstrates significant con- vergence improvements on synthetic and real world data sets. 1 Introduction Hidden Markov Models (HMM’s) are among the most widely adopted latent-variable models used to model time-series datasets in the statistics and machine learning communities. They have also been successfully applied in a variety of domains including genomics, language, and finance where sequential data naturally arises [Rabiner, 1989; Bishop, 2006]. One possible disadvantage of the finite-state space HMM framework is that one must a-priori spec- ify the number of latent states K. Standard model selection techniques can be applied to the fi- nite state-space HMM but bear a high computational overhead since they require the repetitive trainingexploration of many HMM’s of different sizes.
    [Show full text]
  • Strong Memoryless Times and Rare Events in Markov Renewal Point
    The Annals of Probability 2004, Vol. 32, No. 3B, 2446–2462 DOI: 10.1214/009117904000000054 c Institute of Mathematical Statistics, 2004 STRONG MEMORYLESS TIMES AND RARE EVENTS IN MARKOV RENEWAL POINT PROCESSES By Torkel Erhardsson Royal Institute of Technology Let W be the number of points in (0,t] of a stationary finite- state Markov renewal point process. We derive a bound for the total variation distance between the distribution of W and a compound Poisson distribution. For any nonnegative random variable ζ, we con- struct a “strong memoryless time” ζˆ such that ζ − t is exponentially distributed conditional on {ζˆ ≤ t,ζ>t}, for each t. This is used to embed the Markov renewal point process into another such process whose state space contains a frequently observed state which repre- sents loss of memory in the original process. We then write W as the accumulated reward of an embedded renewal reward process, and use a compound Poisson approximation error bound for this quantity by Erhardsson. For a renewal process, the bound depends in a simple way on the first two moments of the interrenewal time distribution, and on two constants obtained from the Radon–Nikodym derivative of the interrenewal time distribution with respect to an exponential distribution. For a Poisson process, the bound is 0. 1. Introduction. In this paper, we are concerned with rare events in stationary finite-state Markov renewal point processes (MRPPs). An MRPP is a marked point process on R or Z (continuous or discrete time). Each point of an MRPP has an associated mark, or state.
    [Show full text]