MARKOV CHAIN AND HIDDEN MARKOV MODEL


JIAN ZHANG
[email protected]

The Markov chain and the hidden Markov model are probably the simplest models that can be used to model sequential data, i.e. data samples which are not independent of each other.

Markov Chain

Let $I$ be a countable set. Each $i \in I$ is called a state and $I$ is called the state-space. Without loss of generality we assume $I = \{1, 2, \ldots\}$; in most cases $I$ is a finite set and we use the notation $I = \{1, 2, \ldots, k\}$ or $I = \{S_1, S_2, \ldots, S_k\}$. We say $\lambda$ is a distribution on $I$ if $0 \le \lambda_i < \infty$ and $\sum_{i \in I} \lambda_i = 1$.

Definition 1.1. A matrix $T \in \mathbb{R}^{k \times k}$ is stochastic if each row of $T$ is a probability distribution.

One example of a stochastic matrix is

$$T = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$$

with $\alpha, \beta \in [0, 1]$. Figure 1 shows another example of a transition matrix on $I = \{S_1, S_2, S_3\}$, drawn as a finite state machine.

Figure 1. Finite state machine for a Markov chain $X_0 \to X_1 \to X_2 \to \cdots \to X_n$, where the random variables $X_i$ take values in $I = \{S_1, S_2, S_3\}$. The numbers $T(i,j)$ on the arrows are the transition probabilities $T_{ij} = P(X_{t+1} = S_j \mid X_t = S_i)$.

Definition 1.2. We say that $(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $T$ if (i) $X_0$ has distribution $\lambda$; and (ii) for $n \ge 0$, conditional on $X_n = i$, $X_{n+1}$ has distribution $(T_{ij} : j \in I)$ and is independent of $X_0, \ldots, X_{n-1}$.

By the Markov property we have

(1) $P(X_0, \ldots, X_n) = P(X_0) P(X_1 \mid X_0) \cdots P(X_n \mid X_0, \ldots, X_{n-1})$

(2) $\phantom{P(X_0, \ldots, X_n)} = P(X_0) \prod_{t=1}^{n} P(X_t \mid X_{t-1}),$

which greatly simplifies the joint distribution of $X_0, \ldots, X_n$. Note also that in our definition the process is homogeneous, i.e. $P(X_t = S_j \mid X_{t-1} = S_i) = T_{ij}$ does not depend on $t$. Assuming that $X$ takes values in $\mathcal{X} = \{S_1, \ldots, S_k\}$, the behavior of the process can be described by a transition matrix $T \in \mathbb{R}^{k \times k}$ with $T_{ij} = P(X_t = S_j \mid X_{t-1} = S_i)$. The set of parameters for a Markov chain is therefore $\Theta = \{\lambda, T\}$.

Graphical Model for Markov Chain. The Markov chain $X_0, \ldots, X_n$ can be represented as a graphical model, where each node represents a random variable and the edges indicate the conditional dependence structure. Graphical models are a very useful tool for visualizing probabilistic models as well as for designing efficient inference algorithms.

Figure 2. Graphical Model for Markov Chain

Random Walk on Graphs. The behavior of a Markov chain can also be described as a random walk on the graph shown in Figure 1. Initially a vertex is chosen according to the initial distribution $\lambda$ and is denoted $S_{X_0}$; at time $t$ the current position is $S_{X_t}$, and the next vertex is chosen according to $T_{X_t,\cdot}$, the $X_t$-th row of the transition matrix $T$. Many properties of a Markov chain can be identified by studying $\lambda$ and $T$. For example, the distribution of $X_0$ is determined by $\lambda$, while the distribution of $X_1$ is determined by $\lambda T$, etc. (Here we take $\lambda \in \mathbb{R}^{1 \times k}$ to be a row vector.)
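This random-walk description translates directly into a sampling procedure. Below is a minimal Python sketch of it using numpy; the function name and the particular values chosen for $\lambda$, $\alpha$ and $\beta$ are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def sample_markov_chain(lam, T, n, seed=0):
    """Draw a path X_0, ..., X_n of a Markov chain with initial
    distribution lam (a row vector) and transition matrix T."""
    rng = np.random.default_rng(seed)
    k = len(lam)
    x = [rng.choice(k, p=lam)]               # X_0 ~ lambda
    for _ in range(n):
        x.append(rng.choice(k, p=T[x[-1]]))  # X_{t+1} ~ T_{X_t, .}
    return np.array(x)

# The two-state stochastic matrix from Definition 1.1 (alpha, beta assumed):
alpha, beta = 0.3, 0.5
lam = np.array([0.5, 0.5])
T = np.array([[1 - alpha, alpha],
              [beta,      1 - beta]])
print(sample_markov_chain(lam, T, n=10))
print(lam @ T)  # the distribution of X_1 is the row vector lambda T
```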
Hidden Markov Model

A hidden Markov model (HMM) is an extension of a Markov chain which is able to capture the sequential relations among hidden variables. Formally we have $Z_t = (X_t, Y_t)$ for $t = 0, 1, \ldots, n$, with $X_t \in I$ and $Y_t \in O = \{O_1, \ldots, O_l\}$, such that the joint probability of $Z_0, \ldots, Z_n$ factorizes as

(3) $P(Z_0, \ldots, Z_n) = [P(X_0) P(Y_0 \mid X_0)] \prod_{t=1}^{n} [P(X_t \mid X_{t-1}) P(Y_t \mid X_t)]$

(4) $\phantom{P(Z_0, \ldots, Z_n)} = \left[ P(X_0) \prod_{t=1}^{n} P(X_t \mid X_{t-1}) \right] \left[ \prod_{t=0}^{n} P(Y_t \mid X_t) \right].$

In other words, $X_0, \ldots, X_n$ is a Markov chain and $Y_t$ is independent of all other variables given $X_t$. The set of parameters for an HMM is $\Theta = \{\lambda, T, \Gamma\}$, where $\Gamma \in \mathbb{R}^{k \times l}$ is defined as $\Gamma_{ij} = P(Y_t = O_j \mid X_t = S_i)$. If $P(Y_t \mid X_t)$ is assumed to be a multinomial distribution, then the total number of free parameters for an HMM is $(k-1) + k(k-1) + k(l-1)$, for $\lambda$, $T$ and $\Gamma$ respectively. Figure 3 shows the graphical model for the HMM, from which we can easily read off the conditional independence structure of the variables $(X_0, Y_0), \ldots, (X_n, Y_n)$.

Figure 3. Graphical Model for Hidden Markov Model

The HMM is suitable for situations where the observed sequence $Y_0, \ldots, Y_n$ is influenced by a hidden Markov chain $X_0, \ldots, X_n$. For example, in speech recognition we observe the phoneme sequence $Y_0, \ldots, Y_n$, which can be thought of as noisy observations of the underlying words $X_0, \ldots, X_n$. In this case, we would like to infer the unknown words from the observation sequence $Y_0, \ldots, Y_n$.

Three Fundamental Problems in HMM

There are three basic problems of interest for the hidden Markov model:

• Problem 1: Given an observation sequence $y_0 y_1 \ldots y_n$ and the model parameters $\Theta = \{\lambda, T, \Gamma\}$, how do we efficiently compute $P(Y = y \mid \Theta) = P(Y_0 = y_0, \ldots, Y_n = y_n \mid \Theta)$, the probability of the observation sequence given the model?
• Problem 2: Given an observation sequence $y_0 y_1 \ldots y_n$ and the model parameters $\Theta = \{\lambda, T, \Gamma\}$, how do we find the optimal sequence of states $x_0 x_1 \ldots x_n$, in the sense of maximizing $P(X = x \mid \Theta, Y = y) = P(X_0 = x_0, \ldots, X_n = x_n \mid \Theta, Y_0 = y_0, \ldots, Y_n = y_n)$?
• Problem 3: How do we estimate the model parameters $\Theta = \{\lambda, T, \Gamma\}$ by maximizing $P(Y = y \mid \Theta)$?

Forward-Backward Algorithm. The solution of Problem 1 can be computed as

(5) $P(Y = y \mid \Theta) = \sum_x P(X = x \mid \Theta) P(Y = y \mid \Theta, X = x) = \sum_{x_0} \sum_{x_1} \cdots \sum_{x_n} \left[ P(X_0 = x_0) \prod_{t=1}^{n} P(X_t = x_t \mid X_{t-1} = x_{t-1}) \prod_{t=0}^{n} P(Y_t = y_t \mid X_t = x_t) \right].$

However, the total number of possible hidden sequences $x$ is large ($k^{n+1}$ for $n+1$ time steps and $k$ states), and thus direct computation is very expensive. Intuitively, we want to move some of the sums inside the product to reduce the computation. The basic idea of the forward algorithm is as follows. First, the forward variable

(6) $\alpha_t(i) = P(y_0, \ldots, y_t, X_t = S_i)$

is defined as the probability of observing the partial sequence $y_0, \ldots, y_t$ and ending up in state $S_i$. We have

(7) $\alpha_{t+1}(i) = P(y_0, \ldots, y_{t+1}, X_{t+1} = S_i)$

(8) $\phantom{\alpha_{t+1}(i)} = P(X_{t+1} = S_i)\, P(y_0, \ldots, y_{t+1} \mid X_{t+1} = S_i)$

(9) $\phantom{\alpha_{t+1}(i)} = P(X_{t+1} = S_i)\, P(y_{t+1} \mid X_{t+1} = S_i)\, P(y_0, \ldots, y_t \mid X_{t+1} = S_i)$

(10) $\phantom{\alpha_{t+1}(i)} = P(y_{t+1} \mid X_{t+1} = S_i)\, P(y_0, \ldots, y_t, X_{t+1} = S_i)$

(11) $\phantom{\alpha_{t+1}(i)} = P(y_{t+1} \mid X_{t+1} = S_i) \sum_{x_t} P(y_0, \ldots, y_t, X_t = x_t, X_{t+1} = S_i)$

(12) $\phantom{\alpha_{t+1}(i)} = P(y_{t+1} \mid X_{t+1} = S_i) \sum_{x_t} P(X_{t+1} = S_i \mid X_t = x_t)\, P(y_0, \ldots, y_t, X_t = x_t)$

(13) $\phantom{\alpha_{t+1}(i)} = \Gamma_{i, y_{t+1}} \sum_{j=1}^{k} T_{j,i}\, \alpha_t(j),$

where step (9) uses the fact that $Y_{t+1}$ is independent of $Y_0, \ldots, Y_t$ given $X_{t+1}$. Initially we have $\alpha_0(i) = \lambda_i \Gamma_{i, y_0}$, and the final solution is

(14) $P(Y = y \mid \Theta) = \sum_{i=1}^{k} \alpha_n(i).$

The backward algorithm can be constructed similarly by defining the backward variable $\beta_t(i) = P(y_{t+1}, \ldots, y_n \mid X_t = S_i)$.
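Computed this way, the recursion (13) costs $O(k^2)$ per time step, i.e. $O(nk^2)$ overall, instead of summing over all $k^{n+1}$ hidden sequences. Below is a minimal Python sketch of the forward pass; the vectorized numpy interface and names are illustrative assumptions.

```python
import numpy as np

def forward(lam, T, Gamma, y):
    """Forward algorithm for Problem 1.
    lam: (k,) initial distribution; T: (k,k) transition matrix;
    Gamma: (k,l) emission matrix; y: observed symbol indices y_0..y_n.
    Returns the forward variables alpha and P(Y=y|Theta)."""
    n_steps, k = len(y), len(lam)
    alpha = np.zeros((n_steps, k))
    alpha[0] = lam * Gamma[:, y[0]]   # alpha_0(i) = lambda_i Gamma_{i,y_0}
    for t in range(n_steps - 1):
        # eq. (13): alpha_{t+1}(i) = Gamma_{i,y_{t+1}} sum_j T_{j,i} alpha_t(j)
        alpha[t + 1] = Gamma[:, y[t + 1]] * (alpha[t] @ T)
    return alpha, alpha[-1].sum()     # eq. (14)
```

Note that for long sequences the $\alpha_t(i)$ underflow numerically, so practical implementations rescale each $\alpha_t$ to sum to one (accumulating the log scale factors) or work in log space.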
Viterbi Algorithm. The solution of Problem 2 can be written as

(15) $x^* = \arg\max_x P(X = x \mid Y = y, \Theta)$

(16) $\phantom{x^*} = \arg\max_x P(X = x, Y = y \mid \Theta),$

since the two objectives differ only by the factor $P(Y = y \mid \Theta)$, which does not depend on $x$. A technique for finding the best state sequence $x^*$ based on dynamic programming is known as the Viterbi algorithm. Define the quantity

(17) $\delta_t(i) = \max_{x_0, \ldots, x_{t-1}} P(x_0, \ldots, x_{t-1}, X_t = S_i, y_0, \ldots, y_t \mid \Theta),$

which is the highest probability along a single path at time $t$ ending at state $S_i$. We have

(18) $\delta_{t+1}(j) = \max_i \{\delta_t(i)\, P(X_{t+1} = S_j \mid X_t = S_i)\, P(Y_{t+1} = y_{t+1} \mid X_{t+1} = S_j)\}$

(19) $\phantom{\delta_{t+1}(j)} = \max_i \{\delta_t(i)\, T_{ij}\}\, \Gamma_{j, y_{t+1}}.$

Initially we have $\delta_0(i) = \lambda_i \Gamma_{i, y_0}$, and the final highest probability is $P^* = \max_{S_i \in I} \delta_n(i)$. To recover the optimal sequence $x^*$ we define auxiliary variables $\psi_{t+1}(j)$ which store the optimal path:

(20) $\psi_{t+1}(j) = \arg\max_i \{\delta_t(i)\, T_{ij}\, \Gamma_{j, y_{t+1}}\} = \arg\max_i \{\delta_t(i)\, T_{ij}\},$

for $t = 0, 1, \ldots, n-1$. The optimal path is then traced back via $x_n^* = \arg\max_i \delta_n(i)$ and $x_t^* = \psi_{t+1}(x_{t+1}^*)$ for $t = n-1, \ldots, 0$ (a code sketch is given at the end of these notes).

Baum-Welch Algorithm. Let $\Theta = \{\lambda, T, \Gamma\}$ represent all of the parameters of the HMM. Given $m$ observation sequences $y^1, \ldots, y^m$, the parameters can be estimated by maximizing the (log-)likelihood:

(21) $\hat{\Theta} = \arg\max_\Theta \prod_{l=1}^{m} p(Y = y^l \mid \Theta)$

(22) $\phantom{\hat{\Theta}} = \arg\max_\Theta \sum_{l=1}^{m} \log p(Y = y^l \mid \Theta)$

(23) $\phantom{\hat{\Theta}} = \arg\max_\Theta \sum_{l=1}^{m} \log \sum_{x_0} \cdots \sum_{x_{n_l}} \lambda_{x_0} \prod_{t=1}^{n_l} T_{x_{t-1}, x_t} \prod_{t=0}^{n_l} \Gamma_{x_t, y_t^l}.$

In principle, the above objective can be maximized using standard numerical optimization methods to find $\hat{\Theta}$. In practice, the estimation is usually carried out by the well-known Baum-Welch algorithm, a special case of the Expectation-Maximization (EM) algorithm. Details will be discussed after we introduce the EM algorithm.

Learning with (x, y). There are often cases where we know both the state sequences and the observation sequences.
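As promised above, here is a minimal Python sketch of the Viterbi recursion (17)-(20) with backtrace; as with the forward pass, a production implementation would run the recursion in log space to avoid underflow.

```python
import numpy as np

def viterbi(lam, T, Gamma, y):
    """Most probable state path x* (Problem 2) for observations y,
    given lam (k,), T (k,k) and Gamma (k,l)."""
    n_steps, k = len(y), len(lam)
    delta = np.zeros((n_steps, k))           # delta_t(i), eq. (17)
    psi = np.zeros((n_steps, k), dtype=int)  # psi_t(j),   eq. (20)
    delta[0] = lam * Gamma[:, y[0]]          # delta_0(i) = lambda_i Gamma_{i,y_0}
    for t in range(n_steps - 1):
        scores = delta[t][:, None] * T       # scores[i, j] = delta_t(i) T_{ij}
        psi[t + 1] = scores.argmax(axis=0)   # eq. (20)
        delta[t + 1] = scores.max(axis=0) * Gamma[:, y[t + 1]]  # eq. (19)
    x = np.zeros(n_steps, dtype=int)         # backtrace
    x[-1] = delta[-1].argmax()               # x*_n = argmax_i delta_n(i)
    for t in range(n_steps - 2, -1, -1):
        x[t] = psi[t + 1][x[t + 1]]          # x*_t = psi_{t+1}(x*_{t+1})
    return x, delta[-1].max()                # optimal path and P*
```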