Markov Chain Monte Carlo (MCMC) Methods
These notes utilize a few sources: some insights are taken from Profs. Vardeman's and Carriquiry's lecture notes, and some from a great book on Monte Carlo strategies in scientific computing by J. S. Liu.

EE 527, Detection and Estimation Theory, # 4c

Markov Chains: Basic Theory

Definition 1. A (discrete-time/discrete-state-space) Markov chain (MC) is a sequence of random quantities {X_k}, each taking values in a (finite or) countable set X, with the property that

    P{X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}} = P{X_n = x_n | X_{n-1} = x_{n-1}}.

Definition 2. A Markov chain is stationary if

    P{X_n = x | X_{n-1} = x'}

is independent of n.

Without loss of generality, we will henceforth name the elements of X with the integers 1, 2, 3, ... and call them "states."

Definition 3. With

    p_{ii'} ≜ P{X_n = j | X_{n-1} = i},   written p_ij,

the square matrix

    P ≜ [p_ij]

is called the transition matrix for a stationary Markov chain, and the p_ij are called transition probabilities. Note that a transition matrix has nonnegative entries and its rows sum to 1. Such matrices are called stochastic matrices.

More notation for a stationary MC: define

    p_ij^(k) ≜ P{X_{n+k} = j | X_n = i}

and

    f_ij^(k) ≜ P{X_{n+k} = j, X_{n+k-1} ≠ j, ..., X_{n+1} ≠ j | X_n = i}.

These are, respectively, the probability of moving from i to j in k steps and the probability of first reaching j from i in exactly k steps.

Definition 4. We say that an MC is irreducible if, for each i and j, there exists a finite k (possibly depending upon i and j) such that p_ij^(k) > 0. In words, a chain is irreducible if it is possible to eventually get from any state i to any other state j in a finite number of steps.

Example: The chain with transition matrix

    P = [  0    1    0
          1/2   0   1/2
           0    1    0  ]

is irreducible. But the chain with transition matrix

    P = [ 1  0  0
          0  1  0
          0  0  1 ]

is reducible.
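As a quick sanity check on Definition 4, irreducibility of a finite-state chain can be tested mechanically: state j is reachable from state i iff there is a directed path through strictly positive entries of P. A minimal sketch (the helper name `is_irreducible` is ours, not from the notes), applied to the two example matrices:

```python
# Sketch: a finite chain is irreducible iff every state can reach every
# other state in the directed graph whose edges are the strictly
# positive entries of the transition matrix P.

def is_irreducible(P):
    """Check irreducibility of a stochastic matrix P (list of rows)."""
    n = len(P)
    for start in range(n):
        seen = {start}
        frontier = [start]
        while frontier:                      # graph search from `start`
            i = frontier.pop()
            for j in range(n):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    frontier.append(j)
        if len(seen) < n:                    # some state is unreachable
            return False
    return True

# The two examples from the notes:
P_irreducible = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
P_reducible = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity matrix
```

Running `is_irreducible` on the first matrix returns True, and on the identity matrix returns False.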
Consider this block structure for the transition matrix:

    P = [ P1   0
           0  P2 ],    P1, P2 are 2 × 2 matrices,

where the overall chain is reducible, but its pieces (sub-chains) P1 and P2 could be irreducible.

Definition 5. We say that the ith state of an MC is transient if

    \sum_{k=0}^{\infty} f_ii^(k) < 1

and that this state is persistent (or recurrent, a term used as well to describe this phenomenon) if

    \sum_{k=0}^{\infty} f_ii^(k) = 1.

(Note that f_ii^(0) = 0.) A chain is called persistent if all of its states are persistent. In words, a state is transient if, once in it, there is some possibility that the chain will never return. A state is persistent (recurrent) if, once in it, the chain will with certainty be in it again.

Definition 6. We say that state i of an MC has period t if p_ii^(k) = 0 unless k = νt (where ν is an integer, i.e. k is an integer multiple of t) and t is the largest integer with this property. The state is aperiodic if no such t > 1 exists. An MC is called aperiodic if all of its states are aperiodic.

Many sources (including Chapter 15 of the 3rd edition of Feller, vol. 1) present a number of useful simple results about MCs. Among them are the following.

Theorem 1. All states of an irreducible MC are of the same type with regard to persistence (recurrence) and periodicity.

Theorem 2. A finite-state-space irreducible MC is persistent (recurrent).

Theorem 3. A state i is persistent (recurrent) iff

    \sum_{k=0}^{\infty} p_ii^(k) = ∞.

This is an important theorem!

Proof. Interestingly, we can prove this theorem using the z transform (which, in this context, is known as the probability generating function). Note that p_ii^(0) = 1 and, for n ≥ 1, we have

    p_ii^(n) = P{X_n = i | X_0 = i}
             = P{X_1 = i | X_0 = i} P{X_{n-1} = i | X_0 = i}
               + \sum_{k=2}^{n} P{X_k = i, X_{k-1} ≠ i, ..., X_1 ≠ i | X_0 = i} P{X_{n-k} = i | X_0 = i}
             = f_ii^(1) p_ii^(n-1) + \sum_{k=2}^{n} f_ii^(k) p_ii^(n-k).

Note also that f_ii^(0) = 0.
Combining the above facts, we can write a general formula that holds for all n ≥ 0:

    p_ii^(n) = δ_{n,0} + \sum_{k=0}^{n} f_ii^(k) p_ii^(n-k).

Now, take the z transform of the above expression:

    P_ii(z) = 1 + F_ii(z) P_ii(z)

and, consequently,

    P_ii(z) = 1 / (1 − F_ii(z)).

State i is persistent (recurrent) if \sum_{k=0}^{\infty} f_ii^(k) = 1, which is equivalent to

    F_ii(z)|_{z=1^-} = 1,

which further implies

    P_ii(z)|_{z=1^-} = +∞.

Conversely, state i is transient if \sum_{k=0}^{\infty} f_ii^(k) < 1, which is equivalent to

    F_ii(z)|_{z=1^-} < 1,

which further implies

    P_ii(z)|_{z=1^-} < +∞.    □

Note: We can interpret P_ii(z)|_{z=1^-} as the expected number of visits to state i, starting from state i. Indeed, this expected number of visits is

    E[\sum_{k=0}^{\infty} δ_{X_k,i} | X_0 = i] = \sum_{k=0}^{\infty} p_ii^(k) = P_ii(z)|_{z=1^-}.

Clearly, if state i is persistent (recurrent), we will keep visiting it over and over again, and the expected number of visits is infinite!

Definition 7. We define the time of first "hitting" state i_0 as

    T_{i_0} = min{n : n ≥ 1, X_n = i_0}.

Comment: Clearly, state i_0 is persistent (recurrent) if

    P{T_{i_0} < ∞ | X_0 = i_0} = 1.

Definition 8. A persistent (recurrent) state i_0 is positive persistent (recurrent) if

    E[T_{i_0} | X_0 = i_0] = \sum_{n=0}^{\infty} n f_{i_0 i_0}^(n) < ∞    (the mean recurrence time)

and it is null recurrent if E[T_{i_0} | X_0 = i_0] = ∞. So, the expected number of steps that it takes for our chain to revisit a positive-recurrent state is finite.

Lemma 1. Suppose that state i is persistent (recurrent). Then, this state is positive persistent (recurrent) if and only if

    π_i = lim_{z→1^-} (1 − z) P_ii(z) > 0,

and then

    π_i = 1 / \sum_{n=0}^{\infty} n f_ii^(n) = 1 / E[T_i | X_0 = i].

Proof. From the proof of Theorem 3, we know

    P_ii(z) = 1 / (1 − F_ii(z))

and, therefore,

    (1 − F_ii(z)) / (1 − z) = 1 / ((1 − z) P_ii(z)).

Now,

    lim_{z→1^-} (1 − F_ii(z)) / (1 − z) = F_ii'(z)|_{z=1^-} = \sum_{n=0}^{\infty} n f_ii^(n) = E[T_i | X_0 = i].    □
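Lemma 1 ties the stationary weight of a state to its mean recurrence time. As a numerical illustration (not from the notes; the helper name `mean_return_time` and the first-return recursion are our own), we can compute E[T_i | X_0 = i] = \sum_n n f_ii^(n) by propagating the chain's distribution while treating state i as a "taboo" state, removing at each step the mass that has just returned. For the irreducible three-state example earlier, the first state has mean recurrence time 4, consistent with stationary probability 1/4:

```python
def mean_return_time(P, i, max_k=200):
    """E[T_i | X_0 = i] = sum_k k * f_ii^{(k)}, where f_ii^{(k)} is the
    first-return probability, computed by a taboo recursion: mass that
    re-enters state i is counted and then removed."""
    n = len(P)
    dist = list(P[i])             # distribution after one step
    expected = 1 * dist[i]        # first return at step 1 (self-loop)
    dist[i] = 0.0                 # remove returned mass (taboo)
    for k in range(2, max_k + 1):
        new = [sum(dist[a] * P[a][b] for a in range(n)) for b in range(n)]
        expected += k * new[i]    # first return at step k
        new[i] = 0.0
        dist = new
    return expected

# Example chain from earlier; E[T_0 | X_0 = 0] should be 4 = 1/pi_0.
P = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
```

Here f_00^(2m) = (1/2)^m, so the truncation at max_k = 200 leaves a negligible tail and `mean_return_time(P, 0)` is 4 to within numerical precision.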
Note: This lemma almost proves the convergence of the averages of the transition probabilities, namely

    (1/(n+1)) \sum_{k=0}^{n} p_ii^(k) → 1 / E[T_i | X_0 = i]    as n → +∞.

Consider

    (1 − z) P_ii(z) = \sum_{k=0}^{\infty} p_ii^(k) z^k / \sum_{k=0}^{\infty} z^k = lim_{n→∞} ( \sum_{k=0}^{n} p_ii^(k) z^k / \sum_{k=0}^{n} z^k )

for any z such that |z| < 1. If we could take the limit as z → 1 under the summations, we would get, using the above lemma,

    π_i = lim_{n→∞} (1/(n+1)) \sum_{k=0}^{n} p_ii^(k) = 1 / E[T_i | X_0 = i].

(This proof is not complete, since we did not verify that we can take the limit as z → 1 under the summations, but the result holds.)

The following theorems formalize the key results, parts of whose proofs we have hinted at above.

Theorem 4. Suppose that an MC is irreducible, aperiodic, and persistent (recurrent). Suppose further that, for each state i, the mean recurrence time is finite, i.e.

    E[T_i | X_0 = i] = \sum_{k=0}^{\infty} k f_ii^(k) < ∞,

implying that the entire chain is positive recurrent. Then an invariant/stationary distribution for the MC exists, i.e. there exist π_j with π_j > 0 and \sum_j π_j = 1 such that

    π_j = \sum_i π_i p_ij    (equation of full balance)

or, in matrix notation,

    π^T P = π^T,    π = [π_1, π_2, ...]^T.

In words, if the chain is started with initial distribution {π_j} (over the states), then, after one transition, it is in states 1, 2, 3, ... with probabilities {π_1, π_2, π_3, ...}. Further, this distribution {π_j} satisfies

    π_j = lim_{n→∞} (1/(n+1)) \sum_{k=0}^{n} p_ij^(k),    ∀i,

and

    π_j = 1 / \sum_{k=0}^{\infty} k f_jj^(k) = 1 / E[T_j | X_0 = j].

Comments: (A little) verification that, for stationary distributions, after one transition the chain is in states 1, 2, 3, ... with probabilities {π_1, π_2, π_3, ...}:

    P{X_1 = j} = \sum_i P{X_1 = j, X_0 = i} = \sum_i P{X_1 = j | X_0 = i} P{X_0 = i} = \sum_i p_ij π_i = π_j.

Here is how we can compute conditional expectations under the chain:

    E[g(X_1) | X_0 = i] = \sum_j p_ij g(j),

where g(·) is a real function.
Or, in matrix form (for a finite-state-space MC):

    [ E[g(X_1) | X_0 = 1] ]       [ g(1) ]
    [ E[g(X_1) | X_0 = 2] ]  =  P [ g(2) ]
    [         ...         ]       [ ...  ]
    [ E[g(X_1) | X_0 = N] ]       [ g(N) ]

Converse to Theorem 4:

Theorem 5. An irreducible, aperiodic MC for which there exists {π_j} with π_j > 0 and \sum_j π_j = 1 such that

    π_j = lim_{k→∞} p_ij^(k),    ∀i,

must be persistent (recurrent), with

    π_j = 1 / E[T_j | X_0 = j].
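Theorems 4 and 5 suggest a direct numerical check of the stationary distribution: average the distribution of X_k over many steps, as in the Cesàro formula π_j = lim (1/(n+1)) Σ_k p_ij^(k). The averaging matters because, for a periodic chain, p_ij^(k) itself does not converge. A sketch (the helper name `stationary_cesaro` is ours), applied to the three-state example chain, whose exact stationary distribution is (1/4, 1/2, 1/4):

```python
def stationary_cesaro(P, start=0, n_steps=4000):
    """Approximate pi_j = lim (1/(n+1)) sum_{k=0}^{n} p_{start,j}^{(k)}
    by averaging the distribution of X_k over n_steps steps."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0                          # X_0 = start
    avg = dist[:]                              # running sum of distributions
    for _ in range(n_steps):
        dist = [sum(dist[a] * P[a][b] for a in range(n)) for b in range(n)]
        avg = [avg[j] + dist[j] for j in range(n)]
    return [a / (n_steps + 1) for a in avg]

# Periodic (period 2) but irreducible example chain from earlier:
P = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
pi = stationary_cesaro(P)
```

The result `pi` is close to (0.25, 0.5, 0.25) and approximately satisfies the full-balance equation π_j = Σ_i π_i p_ij.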
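Returning to the conditional-mean formula E[g(X_1) | X_0 = i] = Σ_j p_ij g(j): in code it is a single matrix-vector product. A minimal sketch (the helper name and the two-state matrix are ours, chosen for illustration):

```python
def conditional_mean(P, g):
    """E[g(X_1) | X_0 = i] for every state i: the product P g."""
    n = len(g)
    return [sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]

# Two-state illustration: g indicates state 1, so each entry of P g is
# simply the probability of transitioning into state 1.
P = [[0.5, 0.5],
     [0.25, 0.75]]
g = [0.0, 1.0]
```

Here `conditional_mean(P, g)` gives [0.5, 0.75], the second column of P, as expected for an indicator function.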