Introduction to Hidden Markov Models

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Hidden Markov Models Introduction to Hidden Markov Models Alperen Degirmenci This document contains derivations and algorithms for im- Z Z Z Z plementing Hidden Markov Models. The content presented 1 2 t N here is a collection of my notes and personal insights from two seminal papers on HMMs by Rabiner in 1989 [2] and Ghahramani in 2001 [1], and also from Kevin Murphy’s book X1 X2 Xt XN [3]. This is an excerpt from my project report for the MIT 6.867 Machine Learning class taught in Fall 2014. Fig. 1. A Bayesian network representing a first-order HMM. The hidden states are shaded in gray. I. HIDDEN MARKOV MODELS (HMMS) HMMs have been widely used in many applications, such 1) Number of states in the model, K: This is the number as speech recognition, activity recognition from video, gene of states that the underlying hidden Markov process has. finding, gesture tracking. In this section, we will explain what The states often have some relation to the phenomena being HMMs are, how they are used for machine learning, their modeled. For example, if a HMM is being used for gesture advantages and disadvantages, and how we implemented our recognition, each state may be a different gesture, or a part own HMM algorithm. of the gesture. States are represented as integers 1;:::;K. A. Definition We will encode the state Zt at time t as a K × 1 vector of A hidden Markov model is a tool for representing prob- binary numbers, where the only non-zero element is the k-th ability distributions over sequences of observations [1]. In element (i.e. Ztk = 1), corresponding to state k 2 K at time this model, an observation Xt at time t is produced by a t. While this may seem contrived, it will later on help us in stochastic process, but the state Zt of this process cannot be our computations. (Note that [2] uses N instead of K). directly observed, i.e. it is hidden [2]. This hidden process 2) Number of distinct observations, Ω: Observations are is assumed to satisfy the Markov property, where state Zt at represented as integers 1;:::; Ω. We will encode the observa- time t depends only on the previous state, Zt−1 at time t−1. tion Xt at time t as a Ω×1 vector of binary numbers, where This is, in fact, called the first-order Markov model. The nth- the only non-zero element is the l-th element (i.e. Xtl = 1), order Markov model depends on the n previous states. Fig. 1 corresponding to state l 2 Ω at time t. While this may seem shows a Bayesian network representing the first-order HMM, contrived, it will later on help us in our computations. (Note where the hidden states are shaded in gray. that [2] uses M instead of Ω, and [1] uses D. We decided We should note that even though we talk about “time” to use Ω since this agrees with the lecture notes). to indicate that observations occur at discrete “time steps”, 3) State transition model, A: Also called the state transi- “time” could also refer to locations within a sequence [3]. tion probability distribution [2] or the transition matrix [3], The joint distribution of a sequence of states and observa- this is a K × K matrix whose elements Aij describe the tions for the first-order HMM can be written as, probability of transitioning from state Zt−1;i to Zt;j in one N time step where i; j 2 f1;:::;Kg. This can be written as, Y P (Z1:N ;X1:N ) = P (Z1)P (X1jZ1) P (ZtjZt−1)P (XtjZt) Aij = P (Zt;j = 1jZt−1;i = 1): (3) t=2 (1) P Each row of A sums to 1, j Aij = 1, and therefore it is where the notation Z1:N is used as a shorthand for called a stochastic matrix. If any state can reach any other Z1;:::;ZN . Notice that Eq. 1 can be also written as, state in a single step (fully-connected), then Aij > 0 for N N Y Y P (X1:N ;Z1:N ) = P (Z1) P (ZtjZt−1) P (XtjZt) t=2 t=1 1-α 1-β A A A (2) 11 22 33 A A which is same as the expression given in the lecture notes. α 12 23 There are five elements that characterize a hidden Markov 1 2 1 2 3 model: β A21 A32 The author is with the School of Engineering and Applied (a) (b) Sciences at Harvard University, Cambridge, MA 02138 USA. ([email protected]). This document is an excerpt Fig. 2. A state transition diagram for (a) a 2-state, and (b) a 3-state ergodic from a project report for the MIT 6.867 Machine Learning class taught in Markov chain. For a chain to be ergodic, any state should be reachable from Fall 2014. any other state in a finite amount of time. 1 c 2014 Alperen Degirmenci μΦ →Z μZ →Φ μΦ →Z Zt-1 ,Zt t t Zt ,Zt+1 Zt ,Zt+1 t+1 all i; j; otherwise A will have some zero-valued elements. Φ Φ Fig. 2 shows two state transition diagrams for a 2-state and Zt-1,Zt Zt Zt,Zt+1 Zt+1 μZ →Φ μΦ →Z μZ →Φ 3-state first-order Markov chain. For these diagrams, the state t Zt-1 ,Zt Zt ,Zt+1 t t+1 Zt-1 ,Zt μZ →Φ μΦ →Z transition models are, t Xt ,Zt Xt ,Zt t ΦX ,Z 2A A 0 3 t t 1 − α α 11 12 A = ;A = A A A : (4) μ μ (a) (b) 21 22 23 ΦX ,Z →X X →ΦX ,Z β 1 − β 4 5 t t t t t t 0 A32 A33 Xt The conditional probability can be written as K K Fig. 3. Factor graph for a slice of the HMM at time t. Y Y Zt−1;iZt;j P (ZtjZt−1) = Aij : (5) i=1 j=1 • Problem 2: Given observations X1;:::;XN and the Taking the logarithm, we can write this as model λ, how do we find the “correct” hidden state K K Z ;:::;Z X X sequence 1 N that best “explains” the observa- logP (ZtjZt−1) = Zt−1;iZt;j log Aij (6) tions? This corresponds to finding the most probable i=1 j=1 sequence of hidden states from the lecture notes, and > = Zt log (A)Zt−1: (7) can be solved using the Viterbi algorithm. A related problem is calculating the probability of being in state 4) Observation model, B: Also called the emission prob- Ztk given the observations, P (Zt = kjX1:N ), which abilities, B is a Ω × K matrix whose elements Bkj describe can be calculated using the forward-backward algo- the probability of making observation Xt;k given state Zt;j. rithm. This can be written as, • Problem 3: How do we adjust the model parameters λ = (A; B; π) to maximize P (X jλ)? This corresponds Bkj = P (Xt = kjZt = j): (8) 1:N to the learning problem presented in the lecture notes, The conditional probability can be written as and can be solved using the Expectation-Maximization K Ω (EM) algorithm (in the case of HMMs, this is called the Y Y Zt;j Xt;k Baum-Welch algorithm). P (XtjZt) = Bkj : (9) j=1 k=1 C. The Forward-Backward Algorithm Taking the logarithm, we can write this as K Ω The forward-backward algorithm is a dynamic program- X X ming algorithm that makes use of message passing (be- logP (XtjZt) = Zt;jXt;k log Bkj (10) lief propagation). It allows us to compute the filtered and j=1 k=1 smoothed marginals, which can be then used to perform = X> log (B)Z : (11) t t inference, MAP estimation, sequence classification, anomaly 5) Initial state distribution, π: This is a K × 1 vector detection, and model-based clustering. We will follow the of probabilities πi = P (Z1i=1). The conditional probability derivation presented in Murphy [3]. can be written as, 1) The Forward Algorithm: In this part, we compute K the filtered marginals, P (ZtjX1:T ) using the predict-update Y Z1i cycle. The prediction step calculates the one-step-ahead P (Z1jπ) = πi : (12) i=1 predictive density, Given these five parameters presented above, an HMM P (Zt =jjX1:t−1) = ··· can be completely specified. In literature, this often gets K X abbreviated as = P (Zt = jjZt−1 = i)P (Zt−1 = ijX1:t−1) λ = (A; B; π): (13) i=1 (14) B. Three Problems of Interest which acts as the new prior for time t. In the update state, In [2] Rabiner states that for the HMM to be useful in the observed data from time t is absorbed using Bayes rule: real-world applications, the following three problems must be solved: αt(j) , P (Zt = jjX1:t) • Problem 1: Given observations X ;:::;X and a 1 N = P (Zt = jjXt;X1:t−1) model λ = (A; B; π), how do we efficiently compute P (XtjZt = j; X1:t−1 )P (Zt = jjX1:t−1) P (X jλ), the probability of the observations given = 1:N P P (X jZ = j; X)P (Z = jjX ) the model? This is a part of the exact inference problem j t t 1:t−1 t 1:t−1 presented in the lecture notes, and can be solved using 1 = P (XtjZt = j)P (Zt = jjX1:t−1) (15) forward filtering.
Recommended publications
  • Conditioning and Markov Properties
    Conditioning and Markov properties Anders Rønn-Nielsen Ernst Hansen Department of Mathematical Sciences University of Copenhagen Department of Mathematical Sciences University of Copenhagen Universitetsparken 5 DK-2100 Copenhagen Copyright 2014 Anders Rønn-Nielsen & Ernst Hansen ISBN 978-87-7078-980-6 Contents Preface v 1 Conditional distributions 1 1.1 Markov kernels . 1 1.2 Integration of Markov kernels . 3 1.3 Properties for the integration measure . 6 1.4 Conditional distributions . 10 1.5 Existence of conditional distributions . 16 1.6 Exercises . 23 2 Conditional distributions: Transformations and moments 27 2.1 Transformations of conditional distributions . 27 2.2 Conditional moments . 35 2.3 Exercises . 41 3 Conditional independence 51 3.1 Conditional probabilities given a σ{algebra . 52 3.2 Conditionally independent events . 53 3.3 Conditionally independent σ-algebras . 55 3.4 Shifting information around . 59 3.5 Conditionally independent random variables . 61 3.6 Exercises . 68 4 Markov chains 71 4.1 The fundamental Markov property . 71 4.2 The strong Markov property . 84 4.3 Homogeneity . 90 4.4 An integration formula for a homogeneous Markov chain . 99 4.5 The Chapmann-Kolmogorov equations . 100 iv CONTENTS 4.6 Stationary distributions . 103 4.7 Exercises . 104 5 Ergodic theory for Markov chains on general state spaces 111 5.1 Convergence of transition probabilities . 113 5.2 Transition probabilities with densities . 115 5.3 Asymptotic stability . 117 5.4 Minorisation . 122 5.5 The drift criterion . 127 5.6 Exercises . 131 6 An introduction to Bayesian networks 141 6.1 Introduction . 141 6.2 Directed graphs .
    [Show full text]
  • Multivariate Poisson Hidden Markov Models for Analysis of Spatial Counts
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Saskatchewan's Research Archive MULTIVARIATE POISSON HIDDEN MARKOV MODELS FOR ANALYSIS OF SPATIAL COUNTS A Thesis Submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Department of Mathematics and Statistics University of Saskatchewan, Saskatoon, SK, Canada by Chandima Piyadharshani Karunanayake @Copyright Chandima Piyadharshani Karunanayake, June 2007. All rights Reserved. PERMISSION TO USE The author has agreed that the libraries of this University may provide the thesis freely available for inspection. Moreover, the author has agreed that permission for copying of the thesis in any manner, entirely or in part, for scholarly purposes may be granted by the Professor or Professors who supervised my thesis work or in their absence, by the Head of the Department of Mathematics and Statistics or the Dean of the College in which the thesis work was done. It is understood that any copying or publication or use of the thesis or parts thereof for finanancial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to the author and to the University of Saskatchewan in any scholarly use which may be made of any material in this thesis. Requests for permission to copy or to make other use of any material in the thesis should be addressed to: Head Department of Mathematics and Statistics University of Saskatchewan 106, Wiggins Road Saskatoon, Saskatchewan Canada, S7N 5E6 i ABSTRACT Multivariate count data are found in a variety of fields.
    [Show full text]
  • Regime Heteroskedasticity in Bitcoin: a Comparison of Markov Switching Models
    Munich Personal RePEc Archive Regime heteroskedasticity in Bitcoin: A comparison of Markov switching models Chappell, Daniel Birkbeck College, University of London 28 September 2018 Online at https://mpra.ub.uni-muenchen.de/90682/ MPRA Paper No. 90682, posted 24 Dec 2018 06:38 UTC Regime heteroskedasticity in Bitcoin: A comparison of Markov switching models Daniel R. Chappell Department of Economics, Mathematics and Statistics Birkbeck College, University of London [email protected] 28th September 2018 Abstract Markov regime-switching (MRS) models, also known as hidden Markov models (HMM), are used extensively to account for regime heteroskedasticity within the returns of financial assets. However, we believe this paper to be one of the first to apply such methodology to the time series of cryptocurrencies. In light of Moln´arand Thies (2018) demonstrating that the price data of Bitcoin contained seven distinct volatility regimes, we will fit a sample of Bitcoin returns with six m-state MRS estimations, with m ∈ {2,..., 7}. Our aim is to identify the optimal number of states for modelling the regime heteroskedasticity in the price data of Bitcoin. Goodness-of-fit will be judged using three information criteria, namely: Bayesian (BIC); Hannan-Quinn (HQ); and Akaike (AIC). We determined that the restricted 5-state model generated the optimal estima- tion for the sample. In addition, we found evidence of volatility clustering, volatility jumps and asymmetric volatility transitions whilst also inferring the persistence of shocks in the price data of Bitcoin. Keywords Bitcoin; Markov regime-switching; regime heteroskedasticity; volatility transitions. 1 2 List of Tables Table 1. Summary statistics for Bitcoin (23rd April 2014 to 31st May 2018) .
    [Show full text]
  • Poisson Processes Stochastic Processes
    Poisson Processes Stochastic Processes UC3M Feb. 2012 Exponential random variables A random variable T has exponential distribution with rate λ > 0 if its probability density function can been written as −λt f (t) = λe 1(0;+1)(t) We summarize the above by T ∼ exp(λ): The cumulative distribution function of a exponential random variable is −λt F (t) = P(T ≤ t) = 1 − e 1(0;+1)(t) And the tail, expectation and variance are P(T > t) = e−λt ; E[T ] = λ−1; and Var(T ) = E[T ] = λ−2 The exponential random variable has the lack of memory property P(T > t + sjT > t) = P(T > s) Exponencial races In what follows, T1;:::; Tn are independent r.v., with Ti ∼ exp(λi ). P1: min(T1;:::; Tn) ∼ exp(λ1 + ··· + λn) . P2 λ1 P(T1 < T2) = λ1 + λ2 P3: λi P(Ti = min(T1;:::; Tn)) = λ1 + ··· + λn P4: If λi = λ and Sn = T1 + ··· + Tn ∼ Γ(n; λ). That is, Sn has probability density function (λs)n−1 f (s) = λe−λs 1 (s) Sn (n − 1)! (0;+1) The Poisson Process as a renewal process Let T1; T2;::: be a sequence of i.i.d. nonnegative r.v. (interarrival times). Define the arrival times Sn = T1 + ··· + Tn if n ≥ 1 and S0 = 0: The process N(t) = maxfn : Sn ≤ tg; is called Renewal Process. If the common distribution of the times is the exponential distribution with rate λ then process is called Poisson Process of with rate λ. Lemma. N(t) ∼ Poisson(λt) and N(t + s) − N(s); t ≥ 0; is a Poisson process independent of N(s); t ≥ 0 The Poisson Process as a L´evy Process A stochastic process fX (t); t ≥ 0g is a L´evyProcess if it verifies the following properties: 1.
    [Show full text]
  • 12 : Conditional Random Fields 1 Hidden Markov Model
    10-708: Probabilistic Graphical Models 10-708, Spring 2014 12 : Conditional Random Fields Lecturer: Eric P. Xing Scribes: Qin Gao, Siheng Chen 1 Hidden Markov Model 1.1 General parametric form In hidden Markov model (HMM), we have three sets of parameters, j i transition probability matrix A : p(yt = 1jyt−1 = 1) = ai;j; initialprobabilities : p(y1) ∼ Multinomial(π1; π2; :::; πM ); i emission probabilities : p(xtjyt) ∼ Multinomial(bi;1; bi;2; :::; bi;K ): 1.2 Inference k k The inference can be done with forward algorithm which computes αt ≡ µt−1!t(k) = P (x1; :::; xt−1; xt; yt = 1) recursively by k k X i αt = p(xtjyt = 1) αt−1ai;k; (1) i k k and the backward algorithm which computes βt ≡ µt t+1(k) = P (xt+1; :::; xT jyt = 1) recursively by k X i i βt = ak;ip(xt+1jyt+1 = 1)βt+1: (2) i Another key quantity is the conditional probability of any hidden state given the entire sequence, which can be computed by the dot product of forward message and backward message by, i i i i X i;j γt = p(yt = 1jx1:T ) / αtβt = ξt ; (3) j where we define, i;j i j ξt = p(yt = 1; yt−1 = 1; x1:T ); i j / µt−1!t(yt = 1)µt t+1(yt+1 = 1)p(xt+1jyt+1)p(yt+1jyt); i j i = αtβt+1ai;jp(xt+1jyt+1 = 1): The implementation in Matlab can be vectorized by using, i Bt(i) = p(xtjyt = 1); j i A(i; j) = p(yt+1 = 1jyt = 1): 1 2 12 : Conditional Random Fields The relation of those quantities can be simply written in pseudocode as, T αt = (A αt−1): ∗ Bt; βt = A(βt+1: ∗ Bt+1); T ξt = (αt(βt+1: ∗ Bt+1) ): ∗ A; γt = αt: ∗ βt: 1.3 Learning 1.3.1 Supervised Learning The supervised learning is trivial if only we know the true state path.
    [Show full text]
  • Modeling Dependence in Data: Options Pricing and Random Walks
    UNIVERSITY OF CALIFORNIA, MERCED PH.D. DISSERTATION Modeling Dependence in Data: Options Pricing and Random Walks Nitesh Kumar A dissertation submitted in partial fulfillment of the requirements for the degree Doctor of Philosophy in Applied Mathematics March, 2013 UNIVERSITY OF CALIFORNIA, MERCED Graduate Division This is to certify that I have examined a copy of a dissertation by Nitesh Kumar and found it satisfactory in all respects, and that any and all revisions required by the examining committee have been made. Faculty Advisor: Harish S. Bhat Committee Members: Arnold D. Kim Roummel F. Marcia Applied Mathematics Graduate Studies Chair: Boaz Ilan Arnold D. Kim Date Contents 1 Introduction 2 1.1 Brief Review of the Option Pricing Problem and Models . ......... 2 2 Markov Tree: Discrete Model 6 2.1 Introduction.................................... 6 2.2 Motivation...................................... 7 2.3 PastWork....................................... 8 2.4 Order Estimation: Methodology . ...... 9 2.5 OrderEstimation:Results. ..... 13 2.6 MarkovTreeModel:Theory . 14 2.6.1 NoArbitrage.................................. 17 2.6.2 Implementation Notes. 18 2.7 TreeModel:Results................................ 18 2.7.1 Comparison of Model and Market Prices. 19 2.7.2 Comparison of Volatilities. 20 2.8 Conclusion ...................................... 21 3 Markov Tree: Continuous Model 25 3.1 Introduction.................................... 25 3.2 Markov Tree Generation and Computational Tractability . ............. 26 3.2.1 Persistentrandomwalk. 27 3.2.2 Number of states in a tree of fixed depth . ..... 28 3.2.3 Markov tree probability mass function . ....... 28 3.3 Continuous Approximation of the Markov Tree . ........ 30 3.3.1 Recursion................................... 30 3.3.2 Exact solution in Fourier space .
    [Show full text]
  • A Predictive Model Using the Markov Property
    A Predictive Model using the Markov Property Robert A. Murphy, Ph.D. e-mail: [email protected] Abstract: Given a data set of numerical values which are sampled from some unknown probability distribution, we will show how to check if the data set exhibits the Markov property and we will show how to use the Markov property to predict future values from the same distribution, with probability 1. Keywords and phrases: markov property. 1. The Problem 1.1. Problem Statement Given a data set consisting of numerical values which are sampled from some unknown probability distribution, we want to show how to easily check if the data set exhibits the Markov property, which is stated as a sequence of dependent observations from a distribution such that each suc- cessive observation only depends upon the most recent previous one. In doing so, we will present a method for predicting bounds on future values from the same distribution, with probability 1. 1.2. Markov Property Let I R be any subset of the real numbers and let T I consist of times at which a numerical ⊆ ⊆ distribution of data is randomly sampled. Denote the random samples by a sequence of random variables Xt t∈T taking values in R. Fix t T and define T = t T : t>t to be the subset { } 0 ∈ 0 { ∈ 0} of times in T that are greater than t . Let t T . 0 1 ∈ 0 Definition 1 The sequence Xt t∈T is said to exhibit the Markov Property, if there exists a { } measureable function Yt1 such that Xt1 = Yt1 (Xt0 ) (1) for all sequential times t0,t1 T such that t1 T0.
    [Show full text]
  • A Stochastic Processes and Martingales
    A Stochastic Processes and Martingales A.1 Stochastic Processes Let I be either IINorIR+.Astochastic process on I with state space E is a family of E-valued random variables X = {Xt : t ∈ I}. We only consider examples where E is a Polish space. Suppose for the moment that I =IR+. A stochastic process is called cadlag if its paths t → Xt are right-continuous (a.s.) and its left limits exist at all points. In this book we assume that every stochastic process is cadlag. We say a process is continuous if its paths are continuous. The above conditions are meant to hold with probability 1 and not to hold pathwise. A.2 Filtration and Stopping Times The information available at time t is expressed by a σ-subalgebra Ft ⊂F.An {F ∈ } increasing family of σ-algebras t : t I is called a filtration.IfI =IR+, F F F we call a filtration right-continuous if t+ := s>t s = t. If not stated otherwise, we assume that all filtrations in this book are right-continuous. In many books it is also assumed that the filtration is complete, i.e., F0 contains all IIP-null sets. We do not assume this here because we want to be able to change the measure in Chapter 4. Because the changed measure and IIP will be singular, it would not be possible to extend the new measure to the whole σ-algebra F. A stochastic process X is called Ft-adapted if Xt is Ft-measurable for all t. If it is clear which filtration is used, we just call the process adapted.The {F X } natural filtration t is the smallest right-continuous filtration such that X is adapted.
    [Show full text]
  • Local Conditioning in Dawson–Watanabe Superprocesses
    The Annals of Probability 2013, Vol. 41, No. 1, 385–443 DOI: 10.1214/11-AOP702 c Institute of Mathematical Statistics, 2013 LOCAL CONDITIONING IN DAWSON–WATANABE SUPERPROCESSES By Olav Kallenberg Auburn University Consider a locally finite Dawson–Watanabe superprocess ξ =(ξt) in Rd with d ≥ 2. Our main results include some recursive formulas for the moment measures of ξ, with connections to the uniform Brown- ian tree, a Brownian snake representation of Palm measures, continu- ity properties of conditional moment densities, leading by duality to strongly continuous versions of the multivariate Palm distributions, and a local approximation of ξt by a stationary clusterη ˜ with nice continuity and scaling properties. This all leads up to an asymptotic description of the conditional distribution of ξt for a fixed t> 0, given d that ξt charges the ε-neighborhoods of some points x1,...,xn ∈ R . In the limit as ε → 0, the restrictions to those sets are conditionally in- dependent and given by the pseudo-random measures ξ˜ orη ˜, whereas the contribution to the exterior is given by the Palm distribution of ξt at x1,...,xn. Our proofs are based on the Cox cluster representa- tions of the historical process and involve some delicate estimates of moment densities. 1. Introduction. This paper may be regarded as a continuation of [19], where we considered some local properties of a Dawson–Watanabe super- process (henceforth referred to as a DW-process) at a fixed time t> 0. Recall that a DW-process ξ = (ξt) is a vaguely continuous, measure-valued diffu- d ξtf µvt sion process in R with Laplace functionals Eµe− = e− for suitable functions f 0, where v = (vt) is the unique solution to the evolution equa- 1 ≥ 2 tion v˙ = 2 ∆v v with initial condition v0 = f.
    [Show full text]
  • Ergodicity, Decisions, and Partial Information
    Ergodicity, Decisions, and Partial Information Ramon van Handel Abstract In the simplest sequential decision problem for an ergodic stochastic pro- cess X, at each time n a decision un is made as a function of past observations X0,...,Xn 1, and a loss l(un,Xn) is incurred. In this setting, it is known that one may choose− (under a mild integrability assumption) a decision strategy whose path- wise time-average loss is asymptotically smaller than that of any other strategy. The corresponding problem in the case of partial information proves to be much more delicate, however: if the process X is not observable, but decisions must be based on the observation of a different process Y, the existence of pathwise optimal strategies is not guaranteed. The aim of this paper is to exhibit connections between pathwise optimal strategies and notions from ergodic theory. The sequential decision problem is developed in the general setting of an ergodic dynamical system (Ω,B,P,T) with partial information Y B. The existence of pathwise optimal strategies grounded in ⊆ two basic properties: the conditional ergodic theory of the dynamical system, and the complexity of the loss function. When the loss function is not too complex, a gen- eral sufficient condition for the existence of pathwise optimal strategies is that the dynamical system is a conditional K-automorphism relative to the past observations n n 0 T Y. If the conditional ergodicity assumption is strengthened, the complexity assumption≥ can be weakened. Several examples demonstrate the interplay between complexity and ergodicity, which does not arise in the case of full information.
    [Show full text]
  • A Study of Hidden Markov Model
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2004 A Study of Hidden Markov Model Yang Liu University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Mathematics Commons Recommended Citation Liu, Yang, "A Study of Hidden Markov Model. " Master's Thesis, University of Tennessee, 2004. https://trace.tennessee.edu/utk_gradthes/2326 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Yang Liu entitled "A Study of Hidden Markov Model." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Master of Science, with a major in Mathematics. Jan Rosinski, Major Professor We have read this thesis and recommend its acceptance: Xia Chen, Balram Rajput Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Yang Liu entitled “A Study of Hidden Markov Model.” I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Mathematics.
    [Show full text]
  • Introduction to Stochastic Processes - Lecture Notes (With 33 Illustrations)
    Introduction to Stochastic Processes - Lecture Notes (with 33 illustrations) Gordan Žitković Department of Mathematics The University of Texas at Austin Contents 1 Probability review 4 1.1 Random variables . 4 1.2 Countable sets . 5 1.3 Discrete random variables . 5 1.4 Expectation . 7 1.5 Events and probability . 8 1.6 Dependence and independence . 9 1.7 Conditional probability . 10 1.8 Examples . 12 2 Mathematica in 15 min 15 2.1 Basic Syntax . 15 2.2 Numerical Approximation . 16 2.3 Expression Manipulation . 16 2.4 Lists and Functions . 17 2.5 Linear Algebra . 19 2.6 Predefined Constants . 20 2.7 Calculus . 20 2.8 Solving Equations . 22 2.9 Graphics . 22 2.10 Probability Distributions and Simulation . 23 2.11 Help Commands . 24 2.12 Common Mistakes . 25 3 Stochastic Processes 26 3.1 The canonical probability space . 27 3.2 Constructing the Random Walk . 28 3.3 Simulation . 29 3.3.1 Random number generation . 29 3.3.2 Simulation of Random Variables . 30 3.4 Monte Carlo Integration . 33 4 The Simple Random Walk 35 4.1 Construction . 35 4.2 The maximum . 36 1 CONTENTS 5 Generating functions 40 5.1 Definition and first properties . 40 5.2 Convolution and moments . 42 5.3 Random sums and Wald’s identity . 44 6 Random walks - advanced methods 48 6.1 Stopping times . 48 6.2 Wald’s identity II . 50 6.3 The distribution of the first hitting time T1 .......................... 52 6.3.1 A recursive formula . 52 6.3.2 Generating-function approach .
    [Show full text]