Hidden Gibbs Models: Theory and Applications
– DRAFT –
Evgeny Verbitskiy
Mathematical Institute, Leiden University, The Netherlands
Johann Bernoulli Institute, Groningen University, The Netherlands
November 21, 2015

Contents
1 Introduction
  1.1 Hidden Models / Hidden Processes
    1.1.1 Equivalence of "random" and "deterministic" settings
    1.1.2 Deterministic chaos
  1.2 Hidden Gibbs Models
    1.2.1 How realistic is the Gibbs assumption?

I Theory

2 Gibbs states
  2.1 Gibbs-Boltzmann Ansatz
  2.2 Gibbs states for Lattice Systems
    2.2.1 Translation Invariant Gibbs states
    2.2.2 Regularity of Gibbs states
  2.3 Gibbs and other regular stochastic processes
    2.3.1 Markov Chains
    2.3.2 k-step Markov chains
    2.3.3 Variable Length Markov Chains
    2.3.4 Countable Mixtures of Markov Chains (CMMC's)
    2.3.5 g-measures (chains with complete connections)
  2.4 Gibbs measures in Dynamical Systems
    2.4.1 Equilibrium states

3 Functions of Markov processes
  3.1 Markov Chains (recap)
    3.1.1 Regular measures on Subshifts of Finite Type
  3.2 Functions of Markov Processes
    3.2.1 Functions of Markov Chains in Dynamical systems
  3.3 Hidden Markov Models
  3.4 Functions of Markov Chains from the Thermodynamic point of view

4 Hidden Gibbs Processes
  4.1 Functions of Gibbs Processes
  4.2 Yayama
  4.3 Review: renormalization of g-measures
  4.4 Cluster Expansions
  4.5 Cluster expansion

5 Hidden Gibbs Fields
  5.1 Renormalization Transformations in Statistical Mechanics
  5.2 Examples of pathologies under renormalization
  5.3 General Properties of Renormalised Gibbs Fields
  5.4 Dobrushin's reconstruction program
    5.4.1 Generalized Gibbs states / "Classification" of singularities
    5.4.2 Variational Principles
  5.5 Criteria for Preservation of Gibbs property under renormalization

II Applications

6 Hidden Markov Models
  6.1 Three basic problems for HMM's
    6.1.1 The Evaluation Problem
    6.1.2 The Decoding or State Estimation Problem
  6.2 Gibbs property of Hidden Markov Chains

7 Denoising

A Prerequisites
  A.1 Notation
  A.2 Measure theory
  A.3 Stochastic processes
  A.4 Ergodic theory
  A.5 Entropy
    A.5.1 Shannon's entropy rate per symbol
    A.5.2 Kolmogorov-Sinai entropy of measure-preserving systems

Chapter 1
Introduction
Last modified on April 3, 2015

Very often the true dynamics of a physical system is hidden from us: we are able to observe only a certain measurement of the state of the underlying system. For example, air temperature and the speed of wind are easily observable functions of the climate dynamics, while an electroencephalogram provides insight into the functioning of the brain.
[Schematic: Partial Observability – Noise – Coarse Graining.]
Observing only partial information naturally limits our ability to describe or model the underlying system, detect changes in dynamics, make predictions, etc. Nevertheless, these problems must be addressed in practical situations, and they are extremely challenging from the mathematical point of view. Mathematics has proved to be extremely useful in dealing with imperfect data. There are a number of methods developed within various mathematical disciplines, such as
• Dynamical Systems
• Control Theory
• Decision Theory
• Information Theory
to address particular instances of this general problem – dealing with imperfect knowledge. Remarkably, many of these problems have a common nature. For example, correcting signals corrupted by noisy channels during transmission, analyzing neuronal spikes, analyzing genomic sequences, and validating the so-called renormalization group methods of theoretical physics all have the same underlying mathematical structure: a hidden Gibbs model. In this model, the observable process is a function (either deterministic or random) of a process whose underlying probabilistic law is Gibbs. Gibbs models were introduced in Statistical Mechanics, and have been highly successful in modeling physical systems (ferromagnets, dilute gases, polymer chains). Nowadays Gibbs models are ubiquitous and are used by researchers from different fields, often implicitly, i.e., without the knowledge that a given model belongs to a much wider class. The well-established Gibbs theory provides an excellent starting point for developing a Hidden Gibbs theory. A subclass of Hidden Gibbs models is formed by the well-known Hidden Markov Models. Several examples have been considered in Statistical Mechanics and the theory of Dynamical Systems. However, many basic questions are still unanswered. Moreover, the relevance of Hidden Gibbs models to other areas, such as Information Theory, Bioinformatics, and Neuronal Dynamics, has never been made explicit and exploited. Let us now formalise the problem, first by describing the mathematical paradigm to model partial observability, effects of noise, and coarse-graining, and then by introducing the class of Hidden Gibbs Models.
1.1 Hidden Models / Hidden Processes
Let us start with the following basic model, which we will specify further in the subsequent chapters.
The observable process {Y_t} is a function of the hidden time-dependent process {X_t}.
We allow the process {Xt} to be either
• a stationary random (stochastic) process,
• a deterministic dynamical process
Xt+1 = f (Xt).
For simplicity, we will only consider processes with discrete time, i.e., t ∈ Z or Z_+. However, much of what we discuss applies to continuous-time processes as well.
The process {Yt} is a function of {Xt}. The function can be
• deterministic Yt = f (Xt) for all t,
• or random: Y_t ∼ P_{X_t}(·), i.e., Y_t is chosen according to some probability distribution which depends on X_t.
Remark 1.1. If not stated otherwise, we will implicitly assume that in case Yt is a random function of the underlying hidden process Xt, then Yt is chosen independently for every t. For example,
Y_t = X_t + Z_t, where the "noise" {Z_t} is a sequence of independent identically distributed random variables. In many practical situations discussed below one can easily allow for "dependence", e.g., {Z_t} being a Markov process. Moreover, as we will see later, the case of a random function can be reduced to the deterministic case; hence, any stationary process {Z_t} can be used to model noise. However, despite the fact that the models are equivalent (see the next subsection), it is often convenient to keep the implicit dependence on the noise parameters.
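As a toy illustration of this setup, the following sketch (with entirely hypothetical transition and noise parameters) simulates a two-state hidden chain {X_t} together with the noisy observation Y_t = X_t + Z_t, where {Z_t} is an i.i.d. Bernoulli sequence:

```python
import random

def hidden_process(n, seed=0):
    """Simulate a two-state hidden Markov chain X_t (hypothetical
    transition probabilities) and the noisy observation Y_t = X_t + Z_t,
    with Z_t i.i.d. Bernoulli noise, independent for every t."""
    rng = random.Random(seed)
    x, xs, ys = 0, [], []
    for _ in range(n):
        # hidden dynamics: stay with probability 0.9, switch with 0.1
        if rng.random() < 0.1:
            x = 1 - x
        z = 1 if rng.random() < 0.2 else 0   # Bernoulli(0.2) noise Z_t
        xs.append(x)
        ys.append(x + z)                     # observable Y_t = X_t + Z_t
    return xs, ys

xs, ys = hidden_process(1000)
```

The observer sees only `ys`; recovering properties of `xs` from it is exactly the problem discussed in this chapter.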
1.1.1 Equivalence of “random” and “deterministic” settings.
There is a one-to-one correspondence between stationary processes and measure-preserving dynamical systems.
Proposition 1.2. Let (Ω, A, µ, T) be a measure-preserving dynamical system. Then for any measurable ϕ : Ω → R,
Y_t(ω) = ϕ(T^t ω), ω ∈ Ω, t ∈ Z_+ or Z, (1.1.1)
is a stationary process. In the opposite direction, any real-valued stationary process {Y_t} gives rise to a measure-preserving dynamical system (Ω, A, µ, T) and a measurable function ϕ : Ω → R such that (1.1.1) holds.
Exercise 1.3. Prove Proposition 1.2.
1.1.2 Deterministic chaos
Chaotic systems are typically defined as systems where trajectories depend sensitively on the initial conditions, that is, small differences in initial conditions result, over time, in substantial differences in the states: small causes can produce large effects. Simple observables (functions) {Y_t} of trajectories of chaotic dynamical systems {X_t}, X_{t+1} = f(X_t), can be indistinguishable from "random" processes.
Example 1.4. Consider a piece-wise expanding map f : [0, 1] → [0, 1],
f (x) = 2x mod 1,
and the following observable ϕ:

ϕ(x) = 0 if x ∈ [0, 1/2] =: I_0,  ϕ(x) = 1 if x ∈ (1/2, 1] =: I_1.
Since the map f is expanding – for x, x′ close,

d(f(x), f(x′)) = 2 d(x, x′)

– the dynamical system is chaotic. Furthermore, one can easily show that the Lebesgue measure on [0, 1] is f-invariant, and the process Y_n = ϕ(f^n(x)) is actually a Bernoulli process.
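This can be checked numerically. The sketch below uses exact rational arithmetic (so no floating-point artifacts) with an arbitrary "generic" initial point, and confirms that the observations Y_n = ϕ(f^n(x_0)) read off exactly the binary digits of x_0 – which, for Lebesgue-a.e. x_0, form a Bernoulli(1/2, 1/2) sequence:

```python
from fractions import Fraction

def f(x):
    """Doubling map f(x) = 2x mod 1, in exact rational arithmetic."""
    y = 2 * x
    return y - int(y)

def phi(x):
    """Observable: 0 on I_0 = [0, 1/2], 1 on I_1 = (1/2, 1]."""
    return 0 if x <= Fraction(1, 2) else 1

# odd denominator guarantees the orbit never hits 1/2 exactly
x0 = Fraction(123456789, 2**30 + 1)

x, orbit = x0, []
for _ in range(40):
    orbit.append(phi(x))     # Y_n = phi(f^n(x0))
    x = f(x)

# the binary digits of x0, extracted directly:
y, digits = x0, []
for _ in range(40):
    y *= 2
    d = int(y)               # next binary digit of x0
    digits.append(d)
    y -= d

assert orbit == digits       # the observable reads off the binary expansion
```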
Theorem 1.5 (Sinai's Theorem on Bernoulli factors).
At the same time, often a time series looks "completely random" while it has a very simple underlying dynamical structure. In the figure below, the Gaussian white noise is compared visually with the so-called deterministic Gaussian white noise. The reconstruction techniques (cf. Chapter ??) easily allow one to distinguish the time series, as well as to identify the underlying dynamics.
Figure 1.1: (a) Time series of the Gaussian and deterministic Gaussian white noise. (b) Corresponding probability densities of single observations. (c) Reconstruction plots Xn+1 vs Xn.
Many vivid examples of close resemblance between "random" and "deterministic" processes can be found in the literature. For example, the following figure, taken from Daniel T. Kaplan and Leon Glass, Physica D 64, 431–453 (1993), demonstrates visual similarity between several natural and surrogate models.
1.2 Hidden Gibbs Models
The key assumption of the model is that the observable process is a function, either deterministic or random, of the underlying process, whose governing probability distribution belongs to the class of Gibbs distributions.
(i) The most famous example covered by the proposed dynamic probabilistic models is the class of so-called Hidden Markov Models (HMM). Suppose {X_n} is a Markov process, assuming values in a finite state space A. In HMM's, the observable output process Y_n, with values in B, depends only on X_n, and the value Y_n ∈ B is chosen according to the so-called emission distributions:

Π_{ij} = P(Y_n = j | X_n = i), i ∈ A, j ∈ B, (1.2.1)

for each n independently. Hidden Markov models can be found in a broad range of applications: from speech and handwriting recognition to bioinformatics and signal processing.
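A minimal sampler for this setup, with hypothetical transition matrix P and emission matrix Π, might look as follows; it draws X_n as a Markov chain and, independently for each n, draws Y_n from the row of Π indexed by X_n, as in (1.2.1):

```python
import random

def sample_hmm(P, Pi, n, seed=1):
    """Sample (X_0..X_{n-1}, Y_0..Y_{n-1}): X is a Markov chain with
    transition matrix P; each Y_n is emitted from the distribution
    Pi[X_n] = (P(Y_n = j | X_n = i))_j, independently for each n."""
    rng = random.Random(seed)

    def draw(dist):
        # sample an index from a finite probability vector
        u, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if u < acc:
                return i
        return len(dist) - 1

    x, xs, ys = 0, [], []
    for _ in range(n):
        xs.append(x)
        ys.append(draw(Pi[x]))   # emission step (1.2.1)
        x = draw(P[x])           # hidden Markov step
    return xs, ys

# hypothetical two-state chain with two output symbols
P  = [[0.9, 0.1], [0.2, 0.8]]
Pi = [[0.95, 0.05], [0.1, 0.9]]
xs, ys = sample_hmm(P, Pi, 500)
```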
(ii) A Binary Symmetric Channel (BSC) is a widely used model of a communication channel in coding theory and information theory. In this model, a transmitter sends a bit (a zero or a one), and the receiver receives a bit. Typically, the bit is transmitted correctly, but with a small probability e the bit will be inverted, or flipped. [Diagram: 0 → 0 and 1 → 1 with probability 1 − e; crossovers 0 → 1 and 1 → 0 with probability e.] We can represent the action of the channel as
Y_n = X_n ⊕ Z_n, where ⊕ is the exclusive OR operation, and Z_n is the state of the channel: 0 – transmit "as is", 1 – flip. The channel can be memoryless – in this case, {Z_n} is a sequence of independent Bernoulli random variables with P(Z_n = 1) = e – or the channel can have memory, e.g., if {Z_n} is Markov (the Gilbert-Elliott channel, suitable for modeling burst errors). If {X_n} is a binary Markov process, then this model is a particular example of a Hidden Markov model.
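For the memoryless case, the action of the channel is essentially one line of code; the sketch below passes a bit string through a BSC with crossover probability e = 0.1:

```python
import random

def bsc(bits, eps, seed=2):
    """Memoryless binary symmetric channel: Y_n = X_n XOR Z_n, where the
    Z_n are independent Bernoulli(eps) flips."""
    rng = random.Random(seed)
    return [x ^ (1 if rng.random() < eps else 0) for x in bits]

msg = [0, 1, 1, 0, 1] * 200
out = bsc(msg, eps=0.1)
err_rate = sum(x != y for x, y in zip(msg, out)) / len(msg)  # ≈ 0.1
```

Replacing the i.i.d. flips by a two-state Markov chain {Z_n} would give the Gilbert-Elliott channel mentioned above.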
(iii) Neuronal spikes are sharp pronounced deviations from the baseline in recordings of neuronal activity. Often, the electrical neuronal activity is transformed into 0/1 processes, indicating the presence or absence of spikes. These temporal binary processes – spike trains – are clearly functions of the underlying neuronal dynamics. In the field of Neural Coding/Decoding, the relationship between the stimulus and the resulting spike responses is investigated. Another, particularly interesting, type of spike processes originates from the so-called interictal (between the seizures) spikes. The presence of interictal spikes is used diagnostically as a sign of epilepsy.
Figure 1.2: (a) The spike trains emitted by 30 neurons in the visual area of a monkey brain are shown for a period of 4 seconds. The spike train emitted by each of the neurons is recorded in a separate row. (Source: W. Maass, http://www.igi.tugraz.at/maass/.) (b) EEG recordings of a patient with Rolandic epilepsy provided by the Epilepsy Centre Kempenhaeghe. Epileptic interictal spikes are clearly visible in the last 3 epochs.
(iv) Decimation is a classical example of a renormalization transformation in Statistical Mechanics. Consider a d-dimensional lattice system Ω = A^{Z^d}, with single-spin space A. The image of a spin configuration x ∈ Ω under the decimation transformation with spacing b is again a spin configuration y ∈ Ω, y = T_b(x), given by

y_n = x_{bn} for all n ∈ Z^d.

Under the decimation transformation, only every b^d-th spin survives. The figure on the right depicts the action of the decimation transformation T_2 on the two-dimensional integer lattice Z^2.
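On a finite patch of a configuration, decimation is just a strided subsampling; a minimal two-dimensional sketch:

```python
def decimate(config, b):
    """Decimation T_b on a finite patch of a 2-d spin configuration:
    keep only the spins at sites whose coordinates are multiples of b,
    i.e. y_n = x_{bn}; only every b^2-th spin survives."""
    return [row[::b] for row in config[::b]]

# a 6x6 checkerboard of 0/1 spins
x = [[(i + j) % 2 for j in range(6)] for i in range(6)]
y = decimate(x, 2)   # 3x3 patch: all sites (even, even), hence all 0
```

Note that the decimated checkerboard is constant: locally distinct configurations can become indistinguishable, which is one source of the renormalization pathologies discussed in Chapter 5.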
1.2.1 How realistic is the Gibbs assumption?
ADVANTAGES OF GIBBS ASSUMPTION

Part I
Theory
Chapter 2
Gibbs states
Last modified on September 23, 2015

The existing literature on the rigorous mathematical theory of Gibbs states is quite extensive. The classical references are the books of Ruelle [18] and Georgii [7, 8]. In the context of dynamical systems, the books of Bowen [11, 12] provide a nice introduction to the subject.
2.1 Gibbs-Boltzmann Ansatz
The Gibbs distribution prescribes the probability of the event that a system with finite state space S is in state x ∈ S to be equal to

P(x) = (1/Z) exp(−H(x)/(kT)), (2.1.1)

where H(x) is the energy of the state x, k is the Boltzmann constant, T is the temperature of the system, and

Z = Σ_{y∈S} exp(−H(y)/(kT))

is a normalising factor, known as the partition function.

Exercise. Show that the probability measure P on S given by (2.1.1) is precisely the unique probability measure on S which maximizes the following functional, defined on probability measures on S by
P ↦ −Σ_{x∈S} P(x) log P(x) − β Σ_{x∈S} H(x) P(x) = Entropy − β · Energy,

where β = 1/(kT) is the inverse temperature.
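A quick numerical check of the exercise: the sketch below computes the Gibbs distribution for an arbitrary finite energy vector and verifies that the functional Entropy − β·Energy, evaluated at the Gibbs distribution, equals log Z (its maximal value) and dominates its value at the uniform distribution:

```python
import math

def gibbs(H, beta):
    """Gibbs-Boltzmann distribution P(x) = exp(-beta * H[x]) / Z on a
    finite state space given by a list of energies H."""
    w = [math.exp(-beta * h) for h in H]
    Z = sum(w)
    return [wi / Z for wi in w]

def functional(P, H, beta):
    """Entropy - beta * (mean energy), the functional of the exercise."""
    entropy = -sum(p * math.log(p) for p in P if p > 0)
    energy = sum(p * h for p, h in zip(P, H))
    return entropy - beta * energy

H, beta = [0.0, 1.0, 2.5], 1.3      # arbitrary energies and temperature
P = gibbs(H, beta)
logZ = math.log(sum(math.exp(-beta * h) for h in H))

# at the Gibbs distribution the functional attains its maximum log Z:
assert abs(functional(P, H, beta) - logZ) < 1e-12
assert functional(P, H, beta) >= functional([1/3, 1/3, 1/3], H, beta)
```

The identity "maximum = log Z" follows by substituting log P(x) = −βH(x) − log Z into the entropy term.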
2.2 Gibbs states for Lattice Systems
Equation (2.1.1) – known as the Gibbs-Boltzmann Ansatz – makes perfect sense for systems where the number of possible states is finite. Typically, for systems with infinitely many states, each state occurs with probability zero, and a new interpretation of (2.1.1) is required. This problem was successfully solved in the late 1960's by R.L. Dobrushin [2], and O. Lanford and D. Ruelle [3]. The key is to identify Gibbs states as those probability measures on the infinite state space which have very specific finite-volume conditional probabilities, parametrized by boundary conditions, and given by expressions similar to (2.1.1).

The definition of Gibbs measures is applicable to rather general lattice systems Ω = A^L, where A is a finite or a compact spin space, and L is a lattice. For simplicity, let us assume that A is finite, and L is the integer lattice Z^d, d ≥ 1. Elements ω of Ω are functions on Z^d with values in A; equivalently, at each site n in Z^d "sits" a spin ω_n with a value in A. If ω ∈ Ω and Λ is a subset of Z^d, ω_Λ will denote the restriction of ω to Λ, i.e., ω_Λ ∈ A^Λ. Fix a finite volume Λ in Z^d and some configuration ω_{Λ^c} outside of Λ; we will refer to ω_{Λ^c} as the boundary conditions. Gibbs measures are defined by prescribing the conditional probability of observing the configuration σ_Λ in the finite volume Λ, given the boundary conditions ω_{Λ^c}:

P(σ_Λ | ω_{Λ^c}) = (1/Z(ω_{Λ^c})) e^{−βH_Λ(σ_Λ ω_{Λ^c})} =: γ_Λ(σ_Λ | ω_{Λ^c}), (2.2.1)

where β = 1/(kT) is the inverse temperature, H_Λ is the corresponding energy, and the normalising factor – the partition function Z – now depends on the boundary conditions ω_{Λ^c}. The analogy with (2.1.1) is clear.
Naturally, the family of conditional probabilities Γ = {γ_Λ(·|σ_{Λ^c})}, called a specification, indexed by finite sets Λ and boundary conditions σ_{Λ^c}, cannot be arbitrary, and must satisfy some consistency conditions. In order to ensure the consistency of the family of probability kernels {γ_Λ}, the energies H_Λ must be of a very special form, namely:
H_Λ(σ_Λ ω_{Λ^c}) = Σ_{V∩Λ≠∅} Ψ(V, σ_Λ ω_{Λ^c}), (2.2.2)

where the sum is taken over all finite subsets V of Z^d, and the collection of functions

Ψ = {Ψ(V, ·) : Ω → R | V ⊂ Z^d, |V| < ∞},

called the interaction, has the following properties:
• (locality): for all η ∈ Ω, Ψ(V, η) = Ψ(V, ηV), meaning that the con- tribution of volume V to the total energy only depends on the spins in V.
• (uniform absolute convergence):
|||Ψ||| = sup_{n∈Z^d} Σ_{V∋n} sup_{η∈Ω} |Ψ(V, η_V)| < +∞.
Definition. A probability measure P on Ω = A^{Z^d} is called a Gibbs state for the interaction Ψ = {Ψ(Λ, ·) | Λ ⋐ Z^d} (denoted by P ∈ G(Ψ)) if (2.2.1) holds P-a.s., equivalently, if the so-called Dobrushin-Lanford-Ruelle equations hold for all f ∈ L^1(Ω, P) (f ∈ C(Ω, R)) and all Λ ⋐ Z^d:

∫_Ω f(ω) P(dω) = ∫_Ω (γ_Λ f)(ω) P(dω),

where

(γ_Λ f)(ω) = Σ_{σ_Λ∈A^Λ} γ_Λ(σ_Λ | ω_{Λ^c}) f(σ_Λ ω_{Λ^c}).

This is equivalent to saying that
E_P(f | F_{Λ^c}) = γ_Λ f,

where F_{Λ^c} is the σ-algebra generated by the spins in Λ^c.
Theorem 2.1. If Ψ is a uniformly absolutely convergent interaction, then the set of Gibbs measures for Ψ is not empty.
Thus the Dobrushin-Lanford-Ruelle (DLR) theory guarantees the existence of at least one Gibbs state for any interaction Ψ as above, i.e., a Gibbs state consistent with the corresponding specification. It is possible that multiple Gibbs states are consistent with the same specification; in this case, one speaks of a phase transition.
2.2.1 Translation Invariant Gibbs states
The lattice (group) Z^d acts on A^{Z^d} by translations:

Z^d ∋ k ↦ S^k,

where S^k(ω) = ω̃ with

ω̃_n = ω_{n+k} for all n ∈ Z^d,
i.e., the configuration ω is shifted by k. Recall that a measure P on Ω = A^{Z^d} is called translation invariant if

P(S^{−k} A) = P(A)
for every Borel set A. When are Gibbs states translation invariant? What are sufficient conditions for existence of translation invariant Gibbs states?
Definition 2.2. An interaction Φ = {φ(Λ, ·) | Λ ⋐ Z^d} is called translation invariant if

φ_{Λ+k}(ω) = φ_Λ(S^k ω)  (i.e., φ_{Λ+k} = φ_Λ ∘ S^k)

for all Λ ⋐ Z^d, k ∈ Z^d, and every ω ∈ Ω = A^{Z^d}.

Example 2.3 (Ising Model of interacting spins). The alphabet A = {−1, 1} represents the two possible spin values "up" and "down", and the interaction is defined as follows:

φ_Λ(ω) = −hω_i if Λ = {i};  −Jω_iω_j if Λ = {i, j} and i ∼ j;  0 otherwise.
The notation i ∼ j indicates that the sites i and j are nearest neighbours, i.e., ‖i − j‖_1 = 1. The parameter h is the strength of the external magnetic field, and the coupling constant J determines whether the interaction is ferromagnetic (J > 0) or anti-ferromagnetic (J < 0). In the ferromagnetic case, spins "prefer" to be aligned, i.e., aligned configurations have higher probabilities. The spins also prefer to line up with the external field.
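For a finite volume with free boundary conditions, the Ising energy of Example 2.3 can be computed directly; the sketch below confirms that, for J > 0, the aligned configuration indeed has lower energy than the alternating one:

```python
def ising_energy(config, J, h):
    """Energy of a finite 2-d patch with free boundary: field terms
    -h*s_i over all sites and pair terms -J*s_i*s_j over nearest-
    neighbour bonds inside the patch (the interaction of Example 2.3)."""
    n, m = len(config), len(config[0])
    E = 0.0
    for i in range(n):
        for j in range(m):
            s = config[i][j]
            E -= h * s                           # single-site term
            if i + 1 < n:
                E -= J * s * config[i + 1][j]    # vertical bond
            if j + 1 < m:
                E -= J * s * config[i][j + 1]    # horizontal bond
    return E

aligned     = [[1, 1], [1, 1]]
alternating = [[1, -1], [-1, 1]]
E_aligned = ising_energy(aligned, J=1.0, h=0.0)      # 4 bonds -> -4.0
```

Lower energy means higher Gibbs probability, so for J > 0 the aligned patch is preferred, as stated above.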
If φ is translation invariant, then so is the specification
k k γΛ(S σ|S ω) = γΛ+k(σ|ω).
Theorem 2.4. If γ = {γ_Λ(·|·)} is a translation invariant specification, then the set of translation invariant Gibbs measures for γ is not empty.
2.2.2 Regularity of Gibbs states
If the interaction Ψ is as above, then the corresponding specification Γ = {γΛ(·|·)} has the following important regularity properties:
(i) Γ is uniformly non-null or finite energy: the probability kernels γΛ(·|·) are uniformly bounded away from 0 and 1, i.e., for every finite set Λ, there exist positive constants αΛ, βΛ such that
0 < α_Λ ≤ γ_Λ(σ_Λ | ω_{Λ^c}) ≤ β_Λ < 1

for all σ_Λ ω_{Λ^c} ∈ Ω.

(ii) Γ is quasi-local: for any Λ, the kernels are continuous in the product topology, i.e., as V ↗ Z^d,

sup_{ω,η,ζ,χ} |γ_Λ(ω_Λ | η_{V∖Λ} ζ_{V^c}) − γ_Λ(ω_Λ | η_{V∖Λ} χ_{V^c})| → 0;

in other words, changing remote spins (spins in V^c) has a diminishing effect on the distribution of the spins in Λ.
Consider now the inverse problem: suppose P is a probability measure on Ω. Can we decide whether it is Gibbs or, equivalently, whether it is consistent with a Gibbs specification Γ = Γ^Ψ for some Ψ? Naturally, for a measure to be Gibbs, it must at least be consistent with some uniformly non-null quasi-local specification Γ = {γ_Λ(·|·)}_Λ:
P(σΛ | ωΛc ) = γΛ(σΛ|ωΛc ) (2.2.3) for P-a.a. ω. The necessary and sufficient conditions for (2.2.3) to hold are given by the following theorem.
Theorem 2.5. A measure P on Ω = A^{Z^d} with full support is consistent with some uniformly non-null quasi-local specification if and only if for every Λ ⋐ Z^d and all σ_Λ, the conditional probabilities P(σ_Λ | ω_{V∖Λ}) converge uniformly in ω as V ↗ Z^d.
Suppose now that the probability measure P has passed the test of Theorem 2.5, and is thus consistent with some uniformly non-null quasi-local specification Γ = {γ_Λ(·|·)}_Λ. Is it possible to find an interaction Ψ such that Γ = Γ^Ψ, i.e.,

γ(σ_Λ | ω_{Λ^c}) = (1/Z(ω_{Λ^c})) e^{−H^Ψ_Λ(σ_Λ ω_{Λ^c})} =: γ^Ψ(σ_Λ | ω_{Λ^c}),

with

H^Ψ_Λ(σ_Λ ω_{Λ^c}) = Σ_{V∩Λ≠∅} Ψ(V, σ_Λ ω_{Λ^c})?

Depending on the answer, the measure P will or will not be Gibbs. It turns out that the answer, given by Kozlov [5] and Sullivan [6], is always positive.

Theorem 2.6. If Γ = {γ(·|·)} is a uniformly non-null quasi-local specification, then there exists a uniformly absolutely convergent interaction Ψ such that
Ψ γ(σΛ|ωΛc ) = γ (σΛ|ωΛc ).
Equivalently, if P is consistent with some uniformly non-null quasi-local speci- fication, then there exists a uniformly absolutely convergent interaction Ψ, such that P is a Gibbs state for Ψ.
However, even when the uniformly non-null quasi-local specification is translation invariant, the corresponding interaction is not necessarily translation invariant. There is a procedure to derive a translation invariant interaction consistent with a given translation invariant specification, but the resulting interaction is not necessarily uniformly absolutely convergent; it does, however, satisfy a weaker summability condition. In fact, the following is an open problem in the theory of Gibbs states:

Open problem: Does every translation invariant uniformly non-null quasi-local specification admit a Gibbs representation with some translation invariant uniformly absolutely convergent interaction?

The theory of Gibbs states is extremely rich [7, 18]. In particular, there is a dual possibility to characterize Gibbs states locally (via specifications) or globally (via variational principles). A variety of probabilistic techniques is available for Gibbs distributions: limit theorems, coupling techniques, cluster expansions, large deviations.
2.3 Gibbs and other regular stochastic processes
Gibbs probability distributions have been highly useful in describing physical systems (ferromagnets, dilute gases, polymers). However, the intrinsically multi-dimensional theory of Gibbs random fields is of course applicable to one-dimensional random processes as well. To define a Gibbs random process {X_n} we have to prescribe the two-sided conditional distributions

P(X_0 = a_0 | X_{−∞}^{−1} = a_{−∞}^{−1}, X_1^{+∞} = a_1^{+∞}), a_i ∈ A, (2.3.1)

i.e., the conditional distribution of the present given the complete past and the complete future. Defining processes via the one-sided conditional probabilities

P(X_0 = a_0 | X_{−∞}^{−1} = a_{−∞}^{−1}),

using only past values, is probably more popular. In fact, the full Gibbsian description has rarely been used outside the realm of Statistical Physics. Nevertheless, two-sided descriptions have definite advantages in some applications, as we will demonstrate in subsequent chapters. On the other hand, the idea of using, in some sense, regular conditional probabilities to define a random process is very natural and has been employed on multiple occasions in various fields. As we have seen in the preceding sections, Gibbs measures are characterized by regular (quasi-local and non-null) specifications: two-sided conditional probabilities. At the same time, many probabilistic models have been introduced in the past 100 years which require some form of regularity of one-sided conditional probabilities. In many cases, it turns out that one-sided regularity also implies two-sided regularity, and hence these processes are also Gibbs.¹ Let us now review some of the most popular regular one-sided models.
2.3.1 Markov Chains
Markov chains were introduced by Markov in the early 1900's.
Definition 2.7. The process {Xn}n≥0 with values in A, |A| < ∞, is Markov if it has the so-called Markov property:
P(X_n = a_n | X_{n−1} = a_{n−1}, X_{n−2} = a_{n−2}, ...) = P(X_n = a_n | X_{n−1} = a_{n−1}) (2.3.2)

for all n ≥ 1 and all a_n, a_{n−1}, · · · ∈ A, i.e., only the immediate past has influence on the present.
Clearly, (2.3.2) is a very strong form of regularity of conditional probabili- ties.
¹From a practical point of view, studying functions of "one-sided regular" processes is also important. In fact, the basic question is: what happens to functions of regular processes?
We will mainly be interested in stationary Markov processes, or equivalently, in time-homogeneous chains:
P(Xn = an|Xn−1 = an−1) = pan−1,an
depends only on the symbols a_{n−1}, a_n, and not on n. Time-homogeneous Markov chains are parametrized by their transition probability matrices: non-negative stochastic matrices

P = (p_{a,a′})_{a,a′∈A}, p_{a,a′} ≥ 0, Σ_{a′∈A} p_{a,a′} = 1.
Hence,

P(X_0 = a_0, X_1 = a_1, ..., X_n = a_n) = P(X_0 = a_0) · ∏_{i=0}^{n−1} P(X_{i+1} = a_{i+1} | X_i = a_i)

(telescoping).
The law P of the Markov process (chain) {X_n} is shift-invariant if the initial distribution

π = (P(X_0 = a_1), ..., P(X_0 = a_m)), A = {a_1, ..., a_m},
satisfies πP = π.
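Both the telescoping formula and the stationarity condition πP = π are easy to check numerically; in the sketch below the stationary law is approximated by power iteration (for the chosen 2×2 matrix the exact answer is π = (3/4, 1/4)):

```python
def path_probability(P, pi, path):
    """P(X_0=a_0, ..., X_n=a_n) by telescoping: the initial law times
    the one-step transition probabilities along the path."""
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

def stationary(P, iters=2000):
    """Approximate the stationary law pi, with pi P = pi, by iterating
    the transition matrix on the uniform initial distribution."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = [sum(pi[a] * P[a][b] for a in range(len(P))) for b in range(len(P))]
    return pi

P = [[0.9, 0.1], [0.3, 0.7]]
pi = stationary(P)                       # ≈ (0.75, 0.25)
p = path_probability(P, pi, [0, 0, 1])   # = 0.75 * 0.9 * 0.1
```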
Remark 2.8. The law P of a stationary (shift-invariant) Markov chain gives us a measure on Ω = A^{Z_+} = {ω = (ω_0, ω_1, ...) : ω_i ∈ A}. Any translation invariant measure on Ω = A^{Z_+} can be extended in a unique way to a measure on A^Z (by translation invariance we can determine the measure of any cylindric event).
Exercise 2.9. Suppose P = (p_{a,a′})_{a,a′∈A} is a strictly positive stochastic matrix. Then there exists a unique initial distribution π which gives rise to a stationary Markov chain. Moreover, the corresponding stationary law P, when extended to Ω = A^Z, is Gibbs. Identify a Gibbs interaction Ψ of P.
2.3.2 k-step Markov chains
Sometimes Markov chains can be inadequate as models of processes of interest, e.g., when the process has longer "memory". For such situations, Markov chains with longer memory — k-step Markov chains — have been introduced. They are characterized by the k-step Markov property:

P(X_n = a_n | X_{n−1} = a_{n−1}, X_{n−2} = a_{n−2}, ...) = P(X_n = a_n | X_{n−1} = a_{n−1}, ..., X_{n−k} = a_{n−k})

for all n ≥ k and all a_n, a_{n−1}, · · · ∈ A. One can show that any stationary process can be well approximated by k-step Markov chains as k → ∞ (Markov measures are dense among all shift-invariant measures). The main drawback of using k-step Markov chains in applications is the necessity to estimate a large number (= |A|^k · (|A| − 1)) of parameters (= the elements of P = (p_{~a,a′})_{~a∈A^k, a′∈A}).
2.3.3 Variable Length Markov Chains
Variable Length Markov Chains (VLMC's), also known as probabilistic tree sources in Information Theory, are characterized by the property that the conditional probabilities

P(X_n = a_n | X_{n−1} = a_{n−1}, X_{n−2} = a_{n−2}, ...)

depend on finitely many preceding symbols, but the number of such symbols varies, depending on the context. More specifically, there exists a function

ℓ : A^{−N} → Z_+ (or {0, 1, ..., N})

such that

P(X_n = a_n | X_{n−1} = a_{n−1}, X_{n−2} = a_{n−2}, ...) = P(X_n = a_n | X_{n−1}^{n−ℓ} = a_{n−1}^{n−ℓ}), ℓ = ℓ(a_{n−1}, a_{n−2}, ...); (2.3.3)

if ℓ(~a) = 0, then P(X_n = a_n | X_{n−1} = a_{n−1}, ...) = P(X_n = a_n) does not depend on the past at all. The conditional probabilities (2.3.3) can be visualized with the help of a probabilistic tree: each terminal node carries (is assigned) a probability distribution on {0, 1}, and for a given context (a_{−1}, a_{−2}, ...) the actual distribution is found by performing the corresponding walk on the tree, following the prescribed edges until a terminal node is reached.

2.3.4 Countable Mixtures of Markov Chains (CMMC's)

We have

P(a_0 | a_{<0}) = λ_0 P^{(0)}(a_0) + Σ_{k=1}^∞ λ_k P^{(k)}(a_0 | a_{−k}^{−1}), (2.3.4)

where P^{(k)} is an order-k transition probability and Σ_{k≥0} λ_k = 1. In such chains (2.3.4), the next symbol can be generated using a combination of two independent random steps. Firstly, with probability λ_k, the k-th Markov chain is selected, and then the next symbol is chosen according to P^{(k)}. Thus, each transition depends on a finite, but random, number of preceding symbols. Note that the conditional probabilities (2.3.4) are continuous in the following sense:

|P(X_0 = a_0 | X_{−k}^{−1} = a_{−k}^{−1}, X_{−∞}^{−k−1} = b_{−∞}^{−k−1}) − P(X_0 = a_0 | X_{−k}^{−1} = a_{−k}^{−1}, X_{−∞}^{−k−1} = c_{−∞}^{−k−1})| ≤ Σ_{m>k} λ_m → 0 as k → ∞.

2.3.5 g-measures (chains with complete connections)

The class of g-measures was introduced by M. Keane in 1972 [17]. These measures are characterized by regular one-sided conditional probabilities. In fact, similar definitions had already been given by various authors under a variety of names: chains with complete connections (chaînes à liaisons complètes) by Onicescu and Mihoc (1935), for which Doeblin and Fortet [14] established the first uniqueness results, and chains of infinite order by Harris (1955). In our presentation we use notation inspired by dynamical systems; it might seem strange from a probabilistic point of view, but it is completely equivalent to the more familiar probabilistic way of defining processes given the past, discussed in the previous sections.
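Returning to the context trees of Section 2.3.3 above: the walk from the root to a terminal node can be sketched as a dictionary lookup. The encoding below – terminal contexts as tuples, most recent symbol first – is a hypothetical illustration, not a standard format:

```python
def vlmc_next_distribution(tree, past):
    """Walk a probabilistic context tree: extend the context symbol by
    symbol along the past (most recent symbol first) until a terminal
    node is reached, and return the distribution stored there."""
    ctx = ()
    while ctx not in tree:
        ctx = ctx + (past[len(ctx)],)
    return tree[ctx]

# a binary tree with contexts of variable length 1 or 2:
tree = {
    (1,):   [0.5, 0.5],   # after a 1, memory length 1 suffices
    (0, 1): [0.9, 0.1],   # after ...1, 0 the distribution differs
    (0, 0): [0.2, 0.8],
}

d = vlmc_next_distribution(tree, (0, 1, 1, 0))  # context (0, 1) is used
```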
Let A be a finite set, and put

Ω = A^{Z_+} = {ω = (ω_0, ω_1, ...) : ω_n ∈ A for all n ∈ Z_+}.

Denote by G = G_Ω the class of continuous (in the product topology), positive functions g : Ω → [0, 1] which are normalized: for all ω ∈ A^{Z_+},

Σ_{a∈A} g(aω) = 1,

where aω ∈ Ω is the sequence (a, ω_0, ω_1, ...).

Definition 2.11. A probability measure P on Ω = A^{Z_+} is called a g-measure if

P(ω_0 | ω_1, ω_2, ...) = g(ω_0 ω_1 ···) (P-a.s.), (2.3.5)

equivalently, if for any f ∈ C(Ω, R) one has

∫_Ω f(ω) P(dω) = ∫_Ω [Σ_{a_0} f(a_0 ω) g(a_0 ω)] P(dω). (2.3.6)

We will refer to (2.3.6) as the one-sided Dobrushin-Lanford-Ruelle equations. The function g can be viewed as a one-sided quasi-local non-null specification. One can show [9] that the expression in square brackets in (2.3.6) has the following probabilistic interpretation: namely, it is the conditional expectation of f with respect to a certain σ-algebra. If A is the σ-algebra of Borel subsets of Ω, and F_1 = S^{−1}A = σ({ω_k}_{k≥1}), then

E_P(f | F_1)(ω) = Σ_{a_0∈A} f(a_0 ω_1 ω_2 ...) g(a_0 ω_1 ω_2 ...)

for all continuous f, and, more generally, for all f ∈ L^1(Ω, P).

All the previous examples we considered – Markov chains, k-step Markov chains, VLMC's with bounded memory, CMMC's – fall into the class of g-measures. It is easy to see that the VLMC with infinite memory in Example 2.10:

P(1 | 0^k 1∗) = p_k, k ≥ 0, (2.3.7)
P(1 | 0^∞) = p_∞, (2.3.8)

is also a g-measure provided p_∞ = lim_{k→∞} p_k.

Theorem 2.12. For any g ∈ G_Ω, there exists at least one translation invariant g-measure.

Proof. Consider the transfer operator L_g acting on continuous functions C(Ω), defined as

(L_g f)(ω) = Σ_{a∈A} g(aω) f(aω).

The operator L_g maps C(Ω) into C(Ω), and hence has a well-defined dual operator L_g^* acting on M(Ω) – the set of probability measures on Ω – given by

∫_Ω (L_g f) dP = ∫_Ω f d(L_g^* P). (2.3.9)

Moreover, L_g^* maps the set of translation invariant measures M_S(Ω) into itself and is continuous.
The Schauder-Tychonoff fixed point theorem states that any continuous map of a compact convex subset of a locally convex space into itself has a fixed point. Since the set of translation invariant probability measures M_S(Ω) is compact and convex, the operator L_g^* has a fixed point, say P; hence (2.3.6) follows from (2.3.9) since P = L_g^* P, and therefore P is a g-measure.

Exercise 2.13. Validate that L_g^* indeed preserves M_S(Ω).

Theorem 2.14 (Walters). If g ∈ G_Ω, then any g-measure P has full support. Moreover, if g_1, g_2 ∈ G_Ω are such that there is a measure P which is a g_1-measure as well as a g_2-measure, then g_1 = g_2.

Proof. It is sufficient to prove that any g-measure P assigns positive probability to any cylinder set [c_0^{n−1}]. Since g is continuous and positive,

κ = inf_{ω∈Ω} g(ω) > 0.

Since P is a g-measure, for any continuous function f : Ω → R one has

∫_Ω f(ω) P(dω) = ∫_Ω Σ_{a_0∈A} f(a_0 ω) g(a_0 ω) P(dω),

and iterating the previous identity one obtains

∫_Ω f(ω) P(dω) = ∫_Ω Σ_{~a=(a_0^{n−1})∈A^n} f(a_0^{n−1} ω) [∏_{i=0}^{n−1} g(a_i^{n−1} ω)] P(dω).

Consider

f(ω) = I_{[c_0^{n−1}]}(ω) = 1 if ω_i = c_i, i = 0, ..., n−1, and 0 otherwise.

Therefore,

P([c_0^{n−1}]) = ∫_Ω I_{[c_0^{n−1}]}(ω) P(dω) = ∫_Ω ∏_{i=0}^{n−1} g(c_i^{n−1} ω) P(dω) ≥ κ^n ∫_Ω P(dω) = κ^n > 0.

The second claim follows from the fact that if g_1 and g_2 have a common g-measure, then g_1 = g_2 (P-a.s.), and hence everywhere on Ω.

Theorem 2.15. A fully supported measure P on Ω = A^{Z_+} is a g-measure for some g ∈ G_Ω if and only if the sequence

P(ω_0 | ω_1 ... ω_n) = P([ω_0 ... ω_n]) / P([ω_1 ... ω_n])

converges uniformly in ω as n → ∞.

Proof. Suppose P is a g-measure. Using the expression for probabilities of cylinders obtained in the proof of the previous theorem, one has

P([c_0^{n−1}]) / P([c_1^{n−1}]) = ∫_Ω (∏_{i=0}^{n−1} g(c_i^{n−1} ω)) P(dω) / ∫_Ω (∏_{i=1}^{n−1} g(c_i^{n−1} ω)) P(dω).

Thus,

P(c_0 | c_1^{n−1}) ∈ [inf_ω g(c_0^{n−1} ω), sup_ω g(c_0^{n−1} ω)].
Since $g$ is continuous on the compact space $\Omega = A^{\mathbb{Z}_+}$, and hence uniformly continuous, the $n$-th variation of $g$,
\[
\operatorname{var}_n(g) = \sup_{c_0^{n-1}} \sup_{\omega, \tilde\omega \in \Omega} \bigl| g(c_0^{n-1}\omega) - g(c_0^{n-1}\tilde\omega) \bigr| \to 0
\]
as $n \to \infty$, and thus
\[
\mathbb{P}(c_0 \mid c_1^{n-1}) / g(c) \in [1 - \delta_n, 1 + \delta_n], \quad \text{where } \delta_n \to 0.
\]
In the opposite direction, the functions $g_n(c) = \mathbb{P}(c_0 \mid c_1^n)$ are local, since they depend only on finitely many coordinates, and thus the $g_n$'s are also continuous. This sequence of continuous functions is assumed to converge uniformly on the compact metric space $\Omega$. Therefore, the limit function $g$ is also continuous. By the Martingale Convergence Theorem the limit must be equal to
\[
\mathbb{P}(c_0 \mid c_1, c_2, \dots) = g(c_0, c_1, \dots) \quad \mathbb{P}\text{-a.e.},
\]
and hence $\mathbb{P}$ is a $g$-measure. $\square$

As we have seen, since $g$ is continuous,
\[
\operatorname{var}_n(g) = \sup_{c, \omega, \tilde\omega} \bigl| g(c_0^{n-1}\omega) - g(c_0^{n-1}\tilde\omega) \bigr| \to 0
\]
as $n \to \infty$. Many properties of $g$-measures depend on how quickly $\operatorname{var}_n(g) \to 0$. For example, if $g$ has exponentially decaying variation:
\[
\operatorname{var}_n(g) \le c\theta^n \quad \text{for some } c > 0,\ \theta \in (0,1),
\]
or, more generally, if $g$ has summable variation:
\[
\sum_{n=1}^\infty \operatorname{var}_n(g) < \infty,
\]
then there exists a unique $g$-measure, with good mixing properties.

Theorem 2.16. Suppose $g \in \mathcal{G}_\Omega$ has summable variation. Then $g$ admits a unique $g$-measure $\mathbb{P}$. Moreover, $\mathbb{P}$ (viewed as a measure on $A^{\mathbb{Z}}$) is Gibbs for some uniformly absolutely summable interaction $\psi$.

Proof. For any $f \in C(\Omega)$ and every $n \ge 1$, one has
\[
L_g^n f(\omega) = \sum_{a_0^{n-1} \in A^n} \prod_{i=0}^{n-1} g(a_i^{n-1}\omega)\, f(a_0^{n-1}\omega).
\]
Consider arbitrary $\omega, \tilde\omega \in \Omega$ such that $\omega_0^{k-1} = \tilde\omega_0^{k-1}$, $k \in \mathbb{N}$. Then
\[
\bigl| L_g^n f(\omega) - L_g^n f(\tilde\omega) \bigr|
\le \sum_{a_0^{n-1} \in A^n} \Bigl| \prod_{i=0}^{n-1} g(a_i^{n-1}\omega) - \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega) \Bigr| \cdot \bigl| f(a_0^{n-1}\omega) \bigr|
+ \sum_{a_0^{n-1} \in A^n} \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega) \cdot \bigl| f(a_0^{n-1}\omega) - f(a_0^{n-1}\tilde\omega) \bigr|
\]
\[
\le \|f\|_{C(\Omega)} \sum_{a_0^{n-1} \in A^n} \Bigl| \prod_{i=0}^{n-1} g(a_i^{n-1}\omega) - \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega) \Bigr| + \operatorname{var}_{n+k}(f),
\]
since
\[
\sum_{a_0^{n-1} \in A^n} \prod_{i=0}^{n-1} g(a_i^{n-1}\omega) = 1
\]
for all $n \ge 1$ and every $\omega$, because $g$ is normalized.
Let us now derive a uniform estimate of
\[
C_n(a_0^{n-1}, \omega, \tilde\omega) := \Bigl| \prod_{i=0}^{n-1} g(a_i^{n-1}\omega) - \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega) \Bigr|
= \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega) \cdot \Bigl| \frac{\prod_{i=0}^{n-1} g(a_i^{n-1}\omega)}{\prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega)} - 1 \Bigr|.
\]
One can continue as follows:
\[
\frac{\prod_{i=0}^{n-1} g(a_i^{n-1}\omega)}{\prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega)}
= \exp\Bigl[ \sum_{i=0}^{n-1} \bigl( \log g(a_i^{n-1}\omega) - \log g(a_i^{n-1}\tilde\omega) \bigr) \Bigr] \in [e^{-D_k}, e^{D_k}],
\]
where, taking into account that $\omega_0^{k-1} = \tilde\omega_0^{k-1}$,
\[
D_k = \sum_{j=k+1}^\infty \operatorname{var}_j(\log g) \to 0 \quad \text{as } k \to \infty,
\]
because if $g$ has summable variation, then so does $\log g$ (see Exercise 2.17 below). Therefore, for $D_k' = \max(e^{D_k} - 1,\ 1 - e^{-D_k}) \to 0$, one has
\[
C_n(a_0^{n-1}, \omega, \tilde\omega) \le D_k' \prod_{i=0}^{n-1} g(a_i^{n-1}\tilde\omega),
\]
and hence
\[
\bigl| L_g^n f(\omega) - L_g^n f(\tilde\omega) \bigr| \le D_k' \|f\|_{C(\Omega)} + \operatorname{var}_k(f).
\]
The sequence of functions $f_n = L_g^n f$ is thus equicontinuous: for every $\epsilon > 0$, there exists $k \in \mathbb{N}$ such that
\[
\bigl| L_g^n f(\omega) - L_g^n f(\tilde\omega) \bigr| < \epsilon
\]
for all $n$ and every $\omega, \tilde\omega \in \Omega$ such that $\omega_0^{k-1} = \tilde\omega_0^{k-1}$, i.e., $\rho(\omega, \tilde\omega) \le 2^{-k}$. Moreover, the sequence $\{f_n\}$ is uniformly bounded:
\[
\| L_g^n f \|_{C(\Omega)} \le \| f \|_{C(\Omega)}.
\]
Hence, by the Arzelà-Ascoli theorem, the sequence $\{f_n\}$ has a uniformly convergent subsequence $\{f_{n_j}\}$:
\[
L_g^{n_j} f \rightrightarrows f_* \in C(\Omega).
\]
Now we are going to show that in fact $f_*$ is constant, and $L_g^n f \to f_*$ as $n \to \infty$. For any continuous function $h$, let
\[
\alpha(h) = \min_{\omega \in \Omega} h(\omega).
\]
It is easy to see that for any $f \in C(\Omega)$ one has
\[
\alpha(f) \le \alpha(L_g f) \le \dots \le \alpha(L_g^n f) \le \dots
\]
Moreover, taking into account that $L_g^{n_j} f \rightrightarrows f_*$ and the monotonicity of $\alpha(L_g^n f)$, we conclude that
\[
\alpha(f_*) = \lim_n \alpha(L_g^n f) = \alpha(L_g f_*).
\]
Furthermore, suppose $\omega \in \Omega$ is such that $\alpha(f_*) = \alpha(L_g f_*) = L_g f_*(\omega)$. But then
\[
L_g f_*(\omega) = \sum_{c_0 \in A} g(c_0\omega) f_*(c_0\omega) = \inf_{\tilde\omega \in \Omega} f_*(\tilde\omega),
\]
and hence, for each $c_0$,
\[
f_*(c_0\omega) = \inf_{\tilde\omega \in \Omega} f_*(\tilde\omega) = \alpha(f_*).
\]
Repeating the argument, we conclude that for any $c_0^{n-1} \in A^n$,
\[
f_*(c_0^{n-1}\omega) = \alpha(f_*),
\]
i.e., $f_*$ attains its minimal value on every cylinder. Since $f_*$ is continuous, $f_*$ must therefore be constant, and then $L_g^n f \rightrightarrows f_*$ as $n \to \infty$. In conclusion, for every $f \in C(\Omega)$, $L_g^n f \rightrightarrows f_* = \mathrm{const}$.
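Both the convergence $L_g^n f \rightrightarrows \mathrm{const}$ and the monotonicity of $\alpha(L_g^n f)$ are easy to observe numerically when $g$ has finite memory. In the sketch below (Python; the two-symbol Markov $g$-function is a hypothetical example of our choosing), $L_g$ preserves the functions of the coordinate $\omega_0$, on which it acts as a small matrix.

```python
# A Markov g-function on A = {0, 1}: g[(a, b)] = g(a omega) when omega_0 = b.
# For f depending on omega_0 only, (L_g f)(b) = sum_a g(a, b) f(a).
g = {(0, 0): 0.7, (1, 0): 0.3,
     (0, 1): 0.4, (1, 1): 0.6}
A = [0, 1]

def L(f):
    return [sum(g[(a, b)] * f[a] for a in A) for b in A]

f = [1.0, 0.0]                     # indicator of the cylinder [omega_0 = 0]
mins = [min(f)]
for _ in range(60):
    f = L(f)
    mins.append(min(f))

# alpha(L_g^n f) = min L_g^n f is nondecreasing (up to rounding), and
# L_g^n f converges to a constant, namely P([0]) = 4/7 for this g.
assert all(b >= a - 1e-12 for a, b in zip(mins, mins[1:]))
print(f)                           # both entries close to 4/7
```

The constant limit is exactly $\int f\,d\mathbb{P}$ for the unique $g$-measure of this short-memory $g$, here the measure of the cylinder $[0]$.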
In other words, we have a linear functional $V : C(\Omega) \to \mathbb{R}$ defined as $V(f) = f_*$, which is non-negative ($f \ge 0$ implies $V(f) \ge 0$) and normalized ($V1 = 1$). By the Riesz representation theorem, there exists a probability measure $\mathbb{P}$ on $\Omega$ such that
\[
V(f) = \int_\Omega f\, d\mathbb{P}.
\]
Since obviously $V(f) = V(L_g f)$, we have $\mathbb{P} = L_g^* \mathbb{P}$, and hence $\mathbb{P}$ is a $g$-measure. If $\tilde{\mathbb{P}}$ is another $g$-measure, then integrating
\[
L_g^n f \to \int_\Omega f\, d\mathbb{P}
\]
with respect to $\tilde{\mathbb{P}}$, we conclude that
\[
\int_\Omega f\, d\tilde{\mathbb{P}} = \int_\Omega f\, d\mathbb{P}
\]
for every continuous $f$, and hence $\tilde{\mathbb{P}} = \mathbb{P}$. $\square$

Exercise 2.17. Validate that indeed if $g \in \mathcal{G}_\Omega$ has summable variation, then $\log g$ has summable variation as well.

Remark 2.18. (i) Various uniqueness conditions have been obtained in the past; see [19] for a review. Major progress was achieved by Johansson and Öberg (2002), who proved uniqueness of $g$-measures for the class of $g$-functions with square-summable variation:
\[
\sum_{n \ge 1} \operatorname{var}_n(g)^2 < \infty;
\]
see also [16] for a recent result, strengthening and subsuming the previous results.

(ii) Bramson and Kalikow [13] constructed the first example of a continuous positive normalized $g$-function with multiple $g$-measures. Berger et al. [10] showed that $(2+\epsilon)$-summability of variations is not sufficient for uniqueness of $g$-measures.

(iii) One can try to generalize the theory of $g$-measures by considering functions $g$ which are not necessarily continuous, but the available theory is still quite limited [9, 15].

2.4 Gibbs measures in Dynamical Systems

Among all Gibbs measures on a lattice system $\Omega = A^{\mathbb{Z}^d}$, a special subclass is formed by the translation invariant measures. The lattice group $\mathbb{Z}^d$ acts on $\Omega$ by shifts: $(S^n\omega)_k = \omega_{k+n}$ for all $\omega$ and all $k, n \in \mathbb{Z}^d$. A Gibbs measure $\mathbb{P}$ is invariant under this action if $\mathbb{P}(S^{-n}C) = \mathbb{P}(C)$ for any event $C$ and all $n$.
Shortly after the founding work of Dobrushin, Lanford, and Ruelle, Sinai [4] extended the notion of invariant Gibbs measures to more general phase spaces such as manifolds, i.e., spaces lacking the product structure of lattice systems, thus making the notion of Gibbs states applicable to invariant measures of dynamical systems. In subsequent works it was shown that the important class of natural invariant measures (now known as the Sinai-Ruelle-Bowen measures) of a large class of chaotic dynamical systems are Gibbs. That sparked great interest in these objects in Dynamical Systems and Ergodic Theory.

In the Dynamical Systems literature one can find several definitions (not always equivalent) of Gibbs states. For example, the following definition due to Bowen [11] is very popular. Suppose $f$ is a homeomorphism of a compact metric space $(X, d)$, and $\mathbb{P}$ is an $f$-invariant Borel probability measure on $X$. Then $\mathbb{P}$ is called Gibbs in the Bowen sense (abbreviated to Bowen-Gibbs) for a continuous potential $\psi : X \to \mathbb{R}$ if for some $P \in \mathbb{R}$ and every sufficiently small $\epsilon > 0$ there exists $C = C(\epsilon) > 1$ such that the inequalities
\[
\frac{1}{C} \le \frac{\mathbb{P}\bigl( \{ y \in X : d(f^j(x), f^j(y)) < \epsilon\ \ \forall j = 0, \dots, n-1 \} \bigr)}{\exp\bigl( -nP + \sum_{j=0}^{n-1} \psi(f^j(x)) \bigr)} \le C
\]
hold for all $x \in X$ and every $n \in \mathbb{N}$. In other words, the set of points staying $\epsilon$-close to the orbit of $x$ for $n$ units of time has measure roughly $e^{-H_n(x)}$, with $H_n(x) = nP - \sum_{j=0}^{n-1} \psi(f^j(x))$. Capocaccia [1] gave the most general definition of Gibbs invariant measures for multidimensional actions (dynamical systems) on arbitrary spaces, hence extending the Dobrushin-Lanford-Ruelle (DLR) construction substantially.

Remark 2.19.
For symbolic systems $\Omega = A^{\mathbb{Z}}$ or $A^{\mathbb{Z}_+}$ with the dynamics given by the left shift $S : \Omega \to \Omega$, the above definition can be simplified: a translation invariant measure $\mathbb{P}$ is called Bowen-Gibbs for a potential $\psi \in C(\Omega, \mathbb{R})$ if there exist constants $C > 1$ and $P$ such that for any $\omega \in \Omega$ and all $n \in \mathbb{N}$, one has
\[
\frac{1}{C} \le \frac{\mathbb{P}\bigl( \{ \tilde\omega \in \Omega : \tilde\omega_j = \omega_j\ \ \forall j = 0, \dots, n-1 \} \bigr)}{\exp\bigl( -nP + \sum_{j=0}^{n-1} \psi(S^j\omega) \bigr)} \le C. \tag{2.4.1}
\]

2.4.1 Equilibrium states

Suppose $\psi : \Omega \to \mathbb{R}$ is a continuous function; again, for simplicity, assume that $\Omega = A^{\mathbb{Z}^d}$ or $A^{\mathbb{Z}_+^d}$. Define the topological pressure of $\psi$ as
\[
P(\psi) = \sup_{\mathbb{P} \in M_1(\Omega, S)} \Bigl[ h(\mathbb{P}) + \int \psi\, d\mathbb{P} \Bigr],
\]
where the supremum is taken over the set of translation invariant probability measures on $\Omega$. A measure $\mathbb{P} \in M_1(\Omega, S)$ is called an equilibrium state for the potential $\psi \in C(\Omega)$ if
\[
P(\psi) = h(\mathbb{P}) + \int \psi\, d\mathbb{P}.
\]
For a given potential $\psi$, denote the set of equilibrium states by $ES_\psi(\Omega, S)$. For symbolic systems, $ES_\psi(\Omega, S) \ne \emptyset$ for any continuous $\psi$.

Theorem 2.20 (Variational Principles). The following holds:

• Suppose $\Omega = A^{\mathbb{Z}^d}$ and $\Psi$ is a translation invariant uniformly absolutely convergent interaction. Then a translation invariant measure $\mathbb{P}$ is Gibbs for $\Psi$ if and only if $\mathbb{P}$ is an equilibrium state for
\[
\psi(\omega) = - \sum_{0 \in V \Subset \mathbb{Z}^d} \frac{1}{|V|} \Psi(V, \omega).
\]
• Suppose $\Omega = A^{\mathbb{Z}_+}$ and $g$ is a positive continuous normalized function on $\Omega$, i.e.,
\[
\sum_{a_0 \in A} g(a_0\omega) = 1
\]
for all $\omega \in A^{\mathbb{Z}_+}$. Then a translation invariant measure $\mathbb{P}$ is a $g$-measure if and only if $\mathbb{P}$ is an equilibrium state for $\psi = \log g$.
• Suppose $\Omega = A^{\mathbb{Z}_+}$ or $\Omega = A^{\mathbb{Z}}$; then a translation invariant measure $\mathbb{P}$ which is Bowen-Gibbs for $\psi$ is also an equilibrium state for $\psi$.

In the last case, it is not true that for an arbitrary continuous function $\psi$ the corresponding equilibrium states are Bowen-Gibbs. However, under various additional assumptions on the "smoothness" of $\psi$, equilibrium states are Bowen-Gibbs.

Theorem 2.21 (effectively Sinai (1972); see also the corresponding chapter of Ruelle's book).
Suppose $\mathbb{P}$ is a translation invariant $g$-measure for a function $g \in \mathcal{G}(A^{\mathbb{Z}_+})$ with $\operatorname{var}_n(g) \le C\theta^n$ for some $C > 0$, $\theta \in (0,1)$, and all $n \in \mathbb{N}$. Then $\mathbb{P}$ is Gibbs for some translation invariant uniformly absolutely convergent interaction $\phi = \{\phi_\Lambda(\cdot)\}_{\Lambda \Subset \mathbb{Z}}$.

Idea: Consider $\varphi = -\log g \in C(A^{\mathbb{Z}})$; note that $\varphi$ depends only on $(\omega_0, \omega_1, \dots) \in A^{\mathbb{Z}_+}$. Let us try to find a UAC interaction $\phi_{\Lambda_n}(\cdot)$, $\Lambda_n = [0, n] \cap \mathbb{Z}_+$ (with $\phi_\Lambda \equiv 0$ if $\Lambda$ is not an interval), such that
\[
\varphi(\omega) = - \sum_{n=0}^\infty \phi([0, n], \omega_0^n).
\]
If we can, then $\mathbb{P}$ would be Gibbs for $\phi$. Why? By the Variational Principle, any Gibbs measure for $\phi$ is also an equilibrium state for $\psi = -\sum_{V \ni 0} \frac{1}{|V|} \phi(V, \cdot)$, and vice versa.

Claim: $\psi = -\sum_{V \ni 0} \frac{1}{|V|} \phi(V, \cdot)$ and $\varphi = -\sum_{n=0}^\infty \phi([0, n], \cdot) = -\sum_{0 = \min V} \phi(V, \cdot)$ have equal sets of equilibrium states, the reason being that
\[
\int \psi\, d\mathbb{P} = \int \varphi\, d\mathbb{P}
\]
for every translation invariant measure $\mathbb{P}$. Indeed,
\[
\int \psi\, d\mathbb{P} = - \sum_{n=0}^\infty \frac{1}{n+1} \sum_{i=-n}^{0} \int \phi([0, n], \omega_i^{n+i})\, d\mathbb{P}
= - \sum_{n=0}^\infty \frac{1}{n+1} (n+1) \int \phi([0, n], \omega_0^n)\, d\mathbb{P} = \int \varphi\, d\mathbb{P}.
\]
An easy fact to check: $\operatorname{var}_n(g) \le C\theta^n$ implies that $\operatorname{var}_n(\varphi) \le \tilde{C}\theta^n$ as well. Let $\tilde\varphi = -\varphi$. Put
\[
\phi_{\{0\}}(\omega_0) = \tilde\varphi(\omega_0, \vec 0),
\]
\[
\phi_{\{0,1\}}(\omega_0^1) = \tilde\varphi(\omega_0^1, \vec 0) - \tilde\varphi(\omega_0, \vec 0),
\]
\[
\vdots
\]
\[
\phi_{\{0,1,\dots,n\}}(\omega_0^n) = \tilde\varphi(\omega_0^n, \vec 0) - \tilde\varphi(\omega_0^{n-1}, \vec 0).
\]
Here $\vec 0$ is any fixed sequence. Clearly
\[
\sum_{n=0}^N \phi_{\{0,\dots,n\}}(\omega_0^n) = \tilde\varphi(\omega_0^N, \vec 0) \xrightarrow{N \to \infty} \tilde\varphi(\omega) \quad (\forall\, \omega \in A^{\mathbb{Z}_+}).
\]
Moreover,
\[
\|\phi_{\{0,1,\dots,n\}}(\cdot)\|_\infty \le \operatorname{var}_{n-1}(\tilde\varphi) \le \tilde{C}\theta^{n-1},
\]
and hence
\[
\sum_{V \ni 0} \|\phi_V(\cdot)\|_\infty \le \sum_{n=0}^\infty (n+1) \|\phi_{\{0,1,\dots,n\}}(\cdot)\|_\infty < \infty
\]
as well.

Chapter 3

Functions of Markov processes

Last modified on October 26, 2015

3.1 Markov Chains (recap)

We will consider Markov chains $\{X_n\}$ with $n \in \mathbb{Z}_+$ (or $n \in \mathbb{Z}$), taking values in a finite set $A$. Time-homogeneous Markov chains are defined/characterized by probability transition matrices
\[
P = (p_{ij})_{i,j \in A}, \qquad p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i),
\]
where $P$ is a stochastic matrix:
• $p_{ij} \ge 0$ for all $i, j \in A$,
• $\sum_{j \in A} p_{ij} = 1$ for all $i \in A$.
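As a minimal numerical companion to the recap (Python; the matrix is a toy example of our choosing), one can check stochasticity and compute $n$-step transition probabilities, which are simply the entries of the matrix power $P^n$ (the Chapman-Kolmogorov equations).

```python
# Toy transition matrix on A = {0, 1}; each row must sum to 1.
P = [[0.9, 0.1],
     [0.2, 0.8]]
A = [0, 1]

# Stochasticity check: p_ij >= 0 and every row sums to 1.
assert all(P[i][j] >= 0 for i in A for j in A)
assert all(abs(sum(P[i]) - 1.0) < 1e-12 for i in A)

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in A) for j in A] for i in A]

# n-step transitions: P(X_{m+n} = j | X_m = i) = (P^n)_ij; here n = 2.
Pn = [[float(i == j) for j in A] for i in A]   # identity = P^0
for _ in range(2):
    Pn = mat_mul(Pn, P)

# Chapman-Kolmogorov by hand: sum over the intermediate state.
p2_00 = P[0][0] * P[0][0] + P[0][1] * P[1][0]
print(Pn[0][0], p2_00)
```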
If an initial probability distribution $\pi = (\pi_i)_{i \in A}$, with $\pi_i \ge 0$ for all $i \in A$ and $\sum_{i \in A} \pi_i = 1$, satisfies $\pi P = \pi$, then the process $\{X_n\}_{n \ge 0}$ defined by

(a) choosing $X_0 \in A$ according to $\pi$,

(b) choosing $X_{n+1} \in A$ according to $p_{X_n, \cdot}$: if $X_n = i$, then $X_{n+1} = j$ with probability $p_{ij}$, independently of all previous choices,

is a stationary process. Its translation invariant law $\mathbb{P}$ on $A^{\mathbb{Z}_+}$ is given by
\[
\mathbb{P}(X_n = a_n, X_{n+1} = a_{n+1}, \dots, X_{n+k} = a_{n+k}) = \pi(a_n) \prod_{i=0}^{k-1} p_{a_{n+i}, a_{n+i+1}}
\]
for all $n \ge 0$ and $(a_n, \dots, a_{n+k}) \in A^{k+1}$.

If $P$ is a strictly positive matrix, i.e., $p_{ij} > 0$ for all $i, j$, then there exists a unique invariant distribution $\pi = \pi P$. The corresponding measure $\mathbb{P}$ on $A^{\mathbb{Z}_+}$ has full support: every cylinder event $\{X_n = a_n, \dots, X_{n+k} = a_{n+k}\}$ has positive probability, which follows from the fact that $\pi > 0$ as well.
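The stationary law and its full support can likewise be checked numerically. The sketch below (Python; the strictly positive two-state matrix is a toy example) finds the invariant $\pi$ by power iteration and verifies that the cylinder probabilities $\pi(a_0)\prod_i p_{a_i a_{i+1}}$ are positive and consistent.

```python
from itertools import product

# Toy strictly positive stochastic matrix on A = {0, 1}.
P = [[0.9, 0.1],
     [0.2, 0.8]]
A = [0, 1]

# Invariant distribution pi = pi P, found by power iteration.
pi = [0.5, 0.5]
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in A) for j in A]

def cyl(word):
    """P(X_0 = a_0, ..., X_k = a_k) for the stationary chain."""
    p = pi[word[0]]
    for i in range(len(word) - 1):
        p *= P[word[i]][word[i + 1]]
    return p

# Consistency: the cylinder probabilities of length 3 sum to 1;
# full support: every cylinder has positive probability.
total = sum(cyl(w) for w in product(A, repeat=3))
assert min(cyl(w) for w in product(A, repeat=3)) > 0
print(pi, total)       # pi close to [2/3, 1/3], total close to 1
```

For this matrix the balance equation $0.1\,\pi_0 = 0.2\,\pi_1$ gives $\pi = (2/3, 1/3)$, which the iteration recovers.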