
Department of Mathematics

Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017

Lecture 15: Markov Chains and Martingales

This material is not covered in the textbooks. These notes are still in development.

15.1 ⋆ Markov chains

Most authors, e.g., Samuel Karlin [9, p. 27] or Joseph Doob [6, p. 170], define a Markov chain to be a discrete-time stochastic process where the random variables are discrete, and where the following Markov property holds:

For every $t_1 < t_2 < \cdots < t_n < t_{n+1}$, the conditional distribution of $X_{t_{n+1}}$ given $X_{t_n}, \dots, X_{t_1}$ is the same as that given $X_{t_n}$. That is,
\[
P\bigl(X_{t_{n+1}} = x_{n+1} \bigm| X_{t_n} = x_n,\, X_{t_{n-1}} = x_{n-1}, \dots, X_{t_1} = x_1\bigr)
  = P\bigl(X_{t_{n+1}} = x_{n+1} \bigm| X_{t_n} = x_n\bigr).
\]

That is, the future depends on the past only through the present, not the entire history.

The value of a random variable is usually referred to as the state of the chain. There are many examples where the numerical magnitude of $X_t$ is not really of interest; it is just an ID number for the state.

For many purposes we simply number the states 1, 2, 3,... , and the value of Xt is interpreted as the number of the state.

Here are some examples of Markov chains:

• A deck of n cards is in one of n! states, each state being an order of the deck. Shuffling is a random experiment that changes the state. Assign each order an ID number, let $X_0$ denote the original state, and let $X_t$ denote the state after t shuffles. Clearly this is a Markov chain, where the numerical magnitude of $X_t$ is not of interest. If you are interested in the details of card shuffling, I highly recommend the paper by Dave Bayer and Persi Diaconis [1] and its references. Among other things they argue that it takes at least 7 riffle shuffles to get an acceptable degree of randomness.

• A random walk (on a lattice) is a Markov chain.

• Let $X_t$ denote the fortune (wealth) of a gambler after t $1 bets. If the bets are independent, then this is a Markov chain. (A minimal simulation sketch appears after this list.)

• The branching process: Suppose an organism lives one period and produces a random number X of progeny during that period, each of whom then reproduces the next period, etc. The population $X_n$ after n generations is a Markov chain.

• Queueing: Customers arrive for service each period according to a probability distribution, and are served in order, which takes one period. The state of the system is the length of the queue, which is a Markov chain.

• In information theory, see, e.g., Thomas Cover and Joy Thomas [4, p. 34], the term Markov chain can refer to a sequence of just three random variables, X, Y, Z, if the joint probability can be written as $p(x \mid y, z) = p(x \mid y)$.
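To make the gambler's-fortune example concrete, here is a minimal simulation sketch in Python. The function name, initial wealth, and win probability are illustrative assumptions of mine, not part of the notes; the point is only that the next state depends on the past only through the current fortune.

```python
import random

def gambler_fortune_path(initial_wealth=10, p_win=0.5, steps=20, seed=0):
    """Simulate the gambler's-fortune chain: each period the fortune moves
    up or down by $1, independently of everything that happened before."""
    rng = random.Random(seed)
    fortune = initial_wealth
    path = [fortune]
    for _ in range(steps):
        step = 1 if rng.random() < p_win else -1  # outcome of one independent $1 bet
        fortune += step                           # next state depends only on the current state
        path.append(fortune)
    return path

print(gambler_fortune_path())   # e.g. [10, 11, 10, 9, ...]
```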


15.2 ⋆ Markov Chains and Transition Matrices

A Markov chain is time-invariant, or stationary, if the distribution of $X_{t+s}$ given $X_t$ does not depend on t. (Some authors, e.g., Kemeny and Snell [10, Definition 2.1.3, p. 25] or Norris [13, p. 2], make time-invariance part of the definition of a Markov chain. Others, such as Karlin [9, pp. 19–20, 27], do not.) When $X_t = i$ and $X_{t+1} = j$, we say that the chain makes a transition from state i to state j at time t + 1, and we often use the notation

\[
i \to j
\]
to indicate a transition event. This may be a little confusing, since $i \to j$ does not indicate which t we are talking about. But for a time-invariant Markov chain, t doesn't matter in that the probability of this event does not depend on t. In this case we can define the transition probability
\[
p(i, j) = P\bigl(X_{t+1} = j \bigm| X_t = i\bigr),
\]
which is independent of t. We may also write $p_{ij}$ or $p(i \to j)$ for $p(i, j)$. For a time-invariant m-state Markov chain, the $m \times m$ matrix P of transition probabilities
\[
P = \bigl[\, p(i, j) \,\bigr] = \bigl[\, p_{ij} \,\bigr]
\]
is called the transition matrix for the chain. (It's even possible to consider infinite matrices, but we won't do that here.) For each row i of the transition matrix P, the row sum $\sum_{j=1}^{m} p_{ij}$ must be equal to one. A nonnegative square matrix with this property is called a stochastic matrix, and there is a vast literature on such matrices.
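As a quick illustration of these definitions, here is a small NumPy sketch that represents a transition matrix and checks the stochastic-matrix property. The 3-state matrix is an invented example, not one from the notes.

```python
import numpy as np

# An invented 3-state transition matrix, for illustration only.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
])

def is_stochastic(P, tol=1e-12):
    """A stochastic matrix is square, nonnegative, and has unit row sums."""
    P = np.asarray(P, dtype=float)
    return (P.ndim == 2
            and P.shape[0] == P.shape[1]
            and (P >= 0).all()
            and np.allclose(P.sum(axis=1), 1.0, atol=tol))

print(is_stochastic(P))   # True
```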

15.2.1 Two-step transition probabilities

The transition matrix tells everything about the evolution of the m-state time-invariant Markov chain from its initial state $X_0$. If $p_{ij}$ is the probability of transitioning from state i to state j in one step, what is the probability of transitioning from i to j in exactly two steps? That is, what is
\[
p^{(2)}_{ij} = P\bigl(X_{t+2} = j \bigm| X_t = i\bigr)?
\]
By definition this is just
\[
P\bigl(X_{t+2} = j \bigm| X_t = i\bigr) = \frac{P(X_{t+2} = j \ \&\ X_t = i)}{P(X_t = i)}. \tag{1}
\]

The intermediate state $X_{t+1}$ must take on one of the values $k = 1, \dots, m$. So the event $(X_{t+2} = j \ \&\ X_t = i)$ is the disjoint union
\[
\bigcup_{k=1}^{m} (X_t = i \ \&\ X_{t+1} = k \ \&\ X_{t+2} = j).
\]
Thus we may write
\[
P\bigl(X_{t+2} = j \bigm| X_t = i\bigr)
  = \frac{\sum_{k=1}^{m} P(X_t = i \ \&\ X_{t+1} = k \ \&\ X_{t+2} = j)}{P(X_t = i)}. \tag{1$'$}
\]
By the multiplication rule (Section 4.6), for each k,
\[
P(X_t = i \ \&\ X_{t+1} = k \ \&\ X_{t+2} = j)
  = P(X_t = i)\, P\bigl(X_{t+1} = k \bigm| X_t = i\bigr)\, P\bigl(X_{t+2} = j \bigm| X_{t+1} = k \ \&\ X_t = i\bigr). \tag{2}
\]

By the Markov property
\[
P\bigl(X_{t+2} = j \bigm| X_{t+1} = k \ \&\ X_t = i\bigr) = P\bigl(X_{t+2} = j \bigm| X_{t+1} = k\bigr). \tag{3}
\]

Combining (1$'$), (2), and (3) gives
\[
p^{(2)}_{ij} = \sum_{k=1}^{m} p_{ik} p_{kj},
\]
but this is just the $(i, j)$ entry of the matrix $P^2$.

Similarly, the probability $p^{(n)}_{ij}$ of transitioning from i to j in n steps is the $(i, j)$ entry of the matrix $P^n$. That is, calculating the distribution of future states is just an exercise in matrix multiplication.
\[
P\bigl(X_{t+n} = j \bigm| X_t = i\bigr) \text{ is the } (i, j) \text{ entry of the matrix } P^n.
\]

This provides a powerful tool for studying the behavior of a Markov chain. I recommend ACM/EE 116. Introduction to Stochastic Processes and Modeling if you want to learn more about this, and CS/EE 147. Network Performance Analysis for applications.
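Here is a sketch of the n-step computation in NumPy, reusing the invented 3-state matrix from above; it also checks the two-step formula $p^{(2)}_{ij} = \sum_k p_{ik} p_{kj}$ against an explicit sum.

```python
import numpy as np

P = np.array([          # same invented 3-state chain as before
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
])

n = 2
Pn = np.linalg.matrix_power(P, n)   # (i, j) entry is P(X_{t+n} = j | X_t = i)

# Check the two-step formula directly for i = 0, j = 2 (0-based indices).
i, j = 0, 2
direct = sum(P[i, k] * P[k, j] for k in range(P.shape[0]))
print(Pn[i, j], direct)             # the two numbers agree
```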

15.3 ⋆ Markov chains and graphs

From now on we will consider only time-invariant Markov chains.

The nature of reachability can be visualized by considering the set of states to be a directed graph, where the set of nodes or vertexes is the set of states, and there is a directed edge from i to j if $P(i \to j) = p_{ij} > 0$. An arrow from node i to node j is used to indicate that the transition $i \to j$ can occur (with nonzero probability) in one step. A loop at a node i indicates that the transition $i \to i$ (remaining in state i) has nonzero probability. The edges of the graph are labeled with the probability of the transition. The state j is reachable from i if there is a path in the graph from i to j. For instance, the transition matrix P of Example 15.4.1 corresponds to the graph in Figure 15.1.


Figure 15.1. The graph of a 5-state Markov chain.


15.4 ⋆ Irreducible Markov chains

We say that state j is reachable from i if $p^{(n)}_{ij} > 0$ for some n. If states i and j are mutually reachable, then we say they communicate, denoted $i \leftrightarrow j$. The relation $\leftrightarrow$ is an equivalence relation and partitions the states. When every state communicates with every other state, the chain is called irreducible or indecomposable.
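These definitions can be checked mechanically from the transition matrix. The sketch below (my own illustration, using NumPy) computes the reachability relation from the directed graph of the chain and tests whether every pair of states communicates.

```python
import numpy as np

def reachable(P):
    """Boolean matrix R with R[i, j] = True iff j is reachable from i,
    i.e. p_ij^(n) > 0 for some n >= 1."""
    A = (np.asarray(P) > 0).astype(int)   # adjacency matrix: edge i -> j iff p_ij > 0
    m = A.shape[0]
    R = A.copy()
    for _ in range(m):                    # extend paths one step at a time; m rounds suffice
        R = ((R + R @ A) > 0).astype(int)
    return R > 0

def is_irreducible(P):
    """Irreducible: every pair of states is mutually reachable (communicates)."""
    R = reachable(P)
    return bool((R & R.T).all())

print(is_irreducible(np.array([[0.0, 1.0], [1.0, 0.0]])))  # True: the two states communicate
print(is_irreducible(np.eye(2)))                           # False: they never communicate
```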

15.4.1 Example Here is an example of an irreducible 5-state transition matrix. Its graph is given in Figure 15.1.
\[
P = \begin{bmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac13 & 0 & \tfrac23 & 0 & 0 \\
0 & \tfrac13 & 0 & \tfrac23 & 0 \\
0 & 0 & \tfrac23 & 0 & \tfrac13 \\
0 & 0 & 0 & \tfrac12 & \tfrac12
\end{bmatrix}
\]
And here are a few successive powers (n-step transitions):
\[
P^2 = \begin{bmatrix}
\tfrac{5}{12} & \tfrac14 & \tfrac13 & 0 & 0 \\
\tfrac16 & \tfrac{7}{18} & 0 & \tfrac49 & 0 \\
\tfrac19 & 0 & \tfrac23 & 0 & \tfrac29 \\
0 & \tfrac29 & 0 & \tfrac{11}{18} & \tfrac16 \\
0 & 0 & \tfrac13 & \tfrac14 & \tfrac{5}{12}
\end{bmatrix}
\qquad
P^3 = \begin{bmatrix}
\tfrac{7}{24} & \tfrac{23}{72} & \tfrac16 & \tfrac29 & 0 \\
\tfrac{23}{108} & \tfrac{1}{12} & \tfrac59 & 0 & \tfrac{4}{27} \\
\tfrac{1}{18} & \tfrac{5}{18} & 0 & \tfrac59 & \tfrac19 \\
\tfrac{2}{27} & 0 & \tfrac59 & \tfrac{1}{12} & \tfrac{31}{108} \\
0 & \tfrac19 & \tfrac16 & \tfrac{31}{72} & \tfrac{7}{24}
\end{bmatrix}
\]
\[
P^{100} = \begin{bmatrix}
0.0952381 & 0.142857 & 0.285715 & 0.285714 & 0.190476 \\
0.0952380 & 0.142858 & 0.285713 & 0.285715 & 0.190476 \\
0.0952382 & 0.142857 & 0.285715 & 0.285713 & 0.190476 \\
0.0952380 & 0.142858 & 0.285713 & 0.285715 & 0.190476 \\
0.0952381 & 0.142857 & 0.285715 & 0.285714 & 0.190476
\end{bmatrix}
\]
\[
P^{200} = \begin{bmatrix}
0.0952381 & 0.142857 & 0.285714 & 0.285714 & 0.190476 \\
0.0952381 & 0.142857 & 0.285714 & 0.285714 & 0.190476 \\
0.0952381 & 0.142857 & 0.285714 & 0.285714 & 0.190476 \\
0.0952381 & 0.142857 & 0.285714 & 0.285714 & 0.190476 \\
0.0952381 & 0.142857 & 0.285714 & 0.285714 & 0.190476
\end{bmatrix}
\]
This last matrix has the following interesting property: for any $i$, $i'$, $j$, we have

\[
p^{(200)}_{ij} \approx p^{(200)}_{i'j}.
\]
In other words, the initial state has no effect on the long-run distribution of states. □

In the example above, it looks as though the powers of P are converging to a limiting matrix. Indeed they are. In fact, you can express each $p^{(n)}(i, j)$ as a linear combination of nth powers of the characteristic roots of P. All the characteristic roots of a stochastic matrix are ⩽ 1 in absolute value, and every stochastic matrix has an eigenvalue equal to 1 (corresponding to the vector [1, …, 1]). If 1 is the only characteristic root of absolute value 1 and its eigenspace has dimension 1, then $P^n$ necessarily has a limit. For details, see, e.g., Debreu and Herstein [5].
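Here is a sketch that reproduces the computation above in NumPy. Reading the decimals, the common row appears to be $[2/21, 1/7, 2/7, 2/7, 4/21]$; that identification is mine, not stated in the notes, but the last line checks that this vector is indeed invariant.

```python
import numpy as np

P = np.array([                      # the transition matrix of Example 15.4.1
    [1/2, 1/2, 0,   0,   0  ],
    [1/3, 0,   2/3, 0,   0  ],
    [0,   1/3, 0,   2/3, 0  ],
    [0,   0,   2/3, 0,   1/3],
    [0,   0,   0,   1/2, 1/2],
])

P200 = np.linalg.matrix_power(P, 200)
print(P200.round(6))                # every row is (essentially) the same distribution

x = np.array([2, 3, 6, 6, 4]) / 21  # candidate limiting row, read off from the decimals
print(np.allclose(P200, np.tile(x, (5, 1))))   # True: all rows agree with x
print(np.allclose(x @ P, x))                   # True: x is invariant, xP = x
```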


15.5 ⋆ Invariant distributions

Suppose I choose the initial state ($X_0$) according to some probability vector $x = [x_1, \dots, x_m]$ on m states. (That is, $x \in \mathbf{R}^m$, each $x_i \geqslant 0$, and $\sum_{i=1}^{m} x_i = 1$. We shall assume that x is given as a row vector, or $1 \times m$ matrix.) Then xP gives the probability distribution on states at time t = 1,
\[
P(X_1 = j) = \sum_{k=1}^{m} P\bigl(X_1 = j \bigm| X_0 = k\bigr) P(X_0 = k) = \sum_{k=1}^{m} p_{kj} x_k = (xP)_j.
\]
Likewise $xP^2$ is the distribution of states at time t = 2, etc. A probability distribution x on states is invariant if

xP = x.

That is, x is a (left) eigenvector of P corresponding to the eigenvalue 1. Does every m-state Markov chain have an invariant distribution? The answer is Yes, but the proof is beyond the scope of this course.¹ The main theorem in this regard is known as the Perron–Frobenius Theorem, see, e.g., Debreu and Herstein [5] or Wielandt [15]. Is the invariant distribution unique? Not necessarily. For the two-state transition matrix
\[
P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

every distribution is invariant.

N.B. The above result used the fact that there are only finitely many states. For infinite-state Markov chains, there may be no invariant distribution. For instance, if the set of states is $\mathbf{N} = \{1, 2, 3, \dots\}$, the transition probabilities
\[
p(i, j) = \begin{cases} 1 & \text{if } j = i + 1 \\ 0 & \text{otherwise} \end{cases}
\]

do not admit an invariant distribution—for each n, after n steps the probability of being in state n is zero. The conditions for the existence of an invariant distribution in the general case are beyond the scope of this course. But if you want a good starting point, try the book by Leo Breiman [3].
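For a finite chain, an invariant distribution can be computed numerically as a left eigenvector of P for the eigenvalue 1 (equivalently, a right eigenvector of the transpose). The sketch below is my own illustration of this; if the eigenspace has dimension greater than one, it simply returns one invariant distribution among many.

```python
import numpy as np

def invariant_distribution(P):
    """Return one invariant distribution of the stochastic matrix P, as a
    right eigenvector of P.T for the eigenvalue 1, rescaled to sum to one."""
    eigvals, eigvecs = np.linalg.eig(np.asarray(P, dtype=float).T)
    k = np.argmin(np.abs(eigvals - 1.0))    # eigenvalue (numerically) closest to 1
    x = np.real(eigvecs[:, k])
    x = np.abs(x)                           # the Perron eigenvector can be taken nonnegative
    return x / x.sum()

P = np.array([                              # the 5-state chain of Example 15.4.1
    [1/2, 1/2, 0,   0,   0  ],
    [1/3, 0,   2/3, 0,   0  ],
    [0,   1/3, 0,   2/3, 0  ],
    [0,   0,   2/3, 0,   1/3],
    [0,   0,   0,   1/2, 1/2],
])
x = invariant_distribution(P)
print(x)                        # approx [0.095238, 0.142857, 0.285714, 0.285714, 0.190476]
print(np.allclose(x @ P, x))    # True
```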

15.6 ⋆ Invariant distributions as limits

Consider the transition matrix
\[
P = \begin{bmatrix}
0 & \tfrac12 & \tfrac12 \\
\tfrac12 & 0 & \tfrac12 \\
\tfrac12 & \tfrac12 & 0
\end{bmatrix}.
\]
It has the invariant distribution $x = [1/3, 1/3, 1/3]$. Let y be the distribution that gives state 1 for sure, $y = [1, 0, 0]$. Now consider the sequence $yP, yP^2, yP^3, \dots$. Some of the terms are reproduced here:
\[
yP = [0,\ 0.5,\ 0.5], \quad
yP^2 = [0.5,\ 0.25,\ 0.25], \quad
yP^3 = [0.25,\ 0.375,\ 0.375], \quad \dots, \quad
yP^{20} = [0.33333,\ 0.33333,\ 0.33333].
\]

¹ Here's my favorite proof, taken from Debreu and Herstein [5]. Let ∆ denote the set of probability vectors in $\mathbf{R}^m$. Note that it is a closed, bounded, and convex set. If x is a probability vector, then so is xP. Thus the mapping $x \mapsto xP$ maps ∆ into itself. It is also a continuous function. The Brouwer Fixed Point Theorem says that whenever a continuous function maps a closed bounded convex subset of $\mathbf{R}^m$ into itself, it must have a fixed point. That is, there is an $\bar{x}$ satisfying $\bar{x}P = \bar{x}$. (For a simple proof of the Brouwer Theorem, I'm partial to Border [2], but Franklin [7] and Milnor [11] provide alternative proofs that you may prefer.)


This sequence is indeed converging to the invariant distribution. But this does not happen for every transition matrix. For instance,
\[
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]
has the unique invariant distribution $[1/2, 1/2]$, but setting $y = [1, 0]$ gives $yP^k = [0, 1]$ for k odd, and $yP^k = [1, 0]$ for k even. This sequence oscillates rather than converges. The problem is that this Markov chain is periodic.

The period of a state i is the greatest common divisor of $\{n : p^{(n)}_{ii} > 0\}$. The state is aperiodic if this gcd is 1. This is equivalent to saying that for all n large enough, $p^{(n)}_{ii} > 0$. If the chain is irreducible and has an aperiodic state, then every state is aperiodic [13, Lemma 1.8.2, p. 41]. If every state is aperiodic, we say that the chain is aperiodic. You may find the following theorem in Breiman [3, Theorem 6.20, p. 172] or Norris [13, Theorem 1.8.3, pp. 41–42].

15.6.1 Theorem For a finite-state Markov chain with transition matrix P, if the chain is irreducible and aperiodic, then the invariant distribution x is unique, and for any initial distribution y, the sequence $yP^n$ converges to x.

At times it seems that the ratio of definitions to theorems regarding Markov chains is unusually high. Some accessible resources are Breiman [3, Chapter 6], Karlin [9], or Norris [13].
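A quick numerical contrast between the two examples above (an illustration of mine, not from the notes): the aperiodic 3-state chain settles down to the invariant distribution, while the periodic 2-state chain keeps oscillating.

```python
import numpy as np

P_aper = np.array([[0.0, 0.5, 0.5],   # the 3-state chain of Section 15.6: irreducible and aperiodic
                   [0.5, 0.0, 0.5],
                   [0.5, 0.5, 0.0]])
P_per = np.array([[0.0, 1.0],         # the "flip" chain: irreducible but periodic with period 2
                  [1.0, 0.0]])

y3 = np.array([1.0, 0.0, 0.0])
y2 = np.array([1.0, 0.0])

for n in (1, 2, 3, 20, 21):
    print(n,
          y3 @ np.linalg.matrix_power(P_aper, n),   # approaches [1/3, 1/3, 1/3]
          y2 @ np.linalg.matrix_power(P_per, n))    # alternates between [0, 1] and [1, 0]
```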

15.7 ⋆ Martingales

15.7.1 Definition A martingale is a stochastic process $\{X_t : t \in T\}$ such that
\[
E\,|X_t| < \infty \quad \text{for all } t \in T,
\]
and for every $t_1 < t_2 < \cdots < t_n < t_{n+1}$ we have
\[
E\bigl(X_{t_{n+1}} \bigm| X_{t_n}, \dots, X_{t_1}\bigr) = X_{t_n}.
\]

That is, the expectation in the future conditioned on any past values is simply the current value.

Aside: The definition I gave is not as general as the standard definition. In the standard definition, there is also a collection $\{\mathcal{E}_t : t \in T\}$ of σ-algebras of events such that if $s < t$, then $\mathcal{E}_s \subset \mathcal{E}_t$, that is, every event in $\mathcal{E}_s$ is also an event in $\mathcal{E}_t$. Such a collection is frequently called a filtration.² The random variables $X_t$ are required to be adapted to the filtration, meaning that each event $(X_t \in [a, b])$ belongs to $\mathcal{E}_t$. Further, conditional expectations are defined with respect to the σ-algebra, a topic we shall not get into.³ My definition restricts $\mathcal{E}_t$ to be $\sigma(\{X_s : s \leqslant t\})$.

For example, if Yi, i = 0, 1, 2,... are independent mean-zero random variables, then

\[
X_t = \sum_{n=0}^{t} Y_n
\]

² The term filtration comes from a movement by 20th century French mathematicians to name mathematical objects with ordinary French words. They used the term filtre (meaning filter, but think of a funnel-like coffee filter) to indicate a family indexed by a particular kind of partial order.

³ If X is a random variable and $\mathcal{E}'$ is a sub-σ-algebra of $\mathcal{E}$, then $E(X \mid \mathcal{E}')$ is a random variable adapted to $\mathcal{E}'$ such that for every event E in $\mathcal{E}'$, $E\bigl[E(X \mid \mathcal{E}')\mathbf{1}_E\bigr] = E[X \mathbf{1}_E]$.

defines a martingale. Thus the Random Walk is a martingale, and so is the wealth of a gambler during a sequence of fair bets. Another example of a martingale pops up in learning models. Let Z and $Y_i$, $i = 0, 1, 2, \dots$ be random variables with finite means (not necessarily independent or mean-zero). Then

\[
X_n = E(Z \mid Y_n, \dots, Y_0)
\]
defines a martingale.

A submartingale (or semi-martingale) is a stochastic process $\{X_t : t \in T\}$ such that

\[
E\,|X_t| < \infty \quad \text{for all } t \in T,
\]
and for every $t_1 < t_2 < \cdots < t_n < t_{n+1}$ we have

\[
E\bigl(X_{t_{n+1}} \bigm| X_{t_n}, \dots, X_{t_1}\bigr) \geqslant X_{t_n}.
\]

That is, the expected value in the future is at least the current value. A supermartingale reverses the inequality. (You might think that, since the study of these processes resulted from studying the wealth of a gambler, the definitions ought to be reversed, but the early probabilists worked for the casino.)

The Markov property says that the entire distribution of Xt+s depends on the past only through Xt.

In a martingale, it is only the expectation of $X_{t+s}$ that is pinned down, and it depends on the past only through $X_t$, in a very special way: it equals $X_t$.

15.8 ⋆ Martingale Convergence Theorem

One of the most important results on martingales is this.

15.8.1 Martingale Convergence Theorem (Cf. Doob [6, Theorem 4.1, p. 319].) Let $\{X_n : n = 0, 1, 2, \dots\}$ be a martingale. If $\lim_{n \to \infty} E\,|X_n| = M < \infty$, then there is a random variable $X_\infty$ with $E\,|X_\infty| \leqslant M$ such that
\[
X_n \xrightarrow[n \to \infty]{\text{a.s.}} X_\infty.
\]
Moreover, if $X_n \geqslant 0$ for all n, or if $X_n \leqslant 0$ for all n, then $M < \infty$ is satisfied, so the conclusion above follows.

Finally, if for some $q > 1$ we have $\lim_{n \to \infty} E\,|X_n|^q < \infty$, then we may append $\infty$ to T, so that $\{X_n : n = 1, 2, \dots, \infty\}$ is a martingale, $E\,|X_\infty|^q < \infty$, and $X_n \xrightarrow[n \to \infty]{L^q} X_\infty$.

15.8.1 On the terminology

If you look up the word martingale in a dictionary, you will find that it may come from the Portuguese martengau, meaning an inhabitant of Martigues (in Provence), or perhaps it comes from French via Spanish from the Arabic al mirta‘ah, meaning rein or check. The first definition should refer to some kind of harness. Later definitions refer to a betting system. For example, my office dictionary [12] gives definition 1 as “A strap fastened to a horse’s girth, passing between his forelegs, and fastened to the bit, or now more commonly ending in two rings, through which the reins pass. It is intended to hold down the head of the horse, and prevent him from rearing.” It gives definition 3 as “Any system of betting which, in a series of bets, determines the amount to be wagered after each win or loss. The term is usually applied to

dividing a specified amount desired to be won at one session into smaller unequal parts, arranged in a vertical column. By adding together the top and bottom figures after a loss, canceling them after a win, when all are crossed off, the desired amount is gained.”

J. M. Hammersley [8], in a paper on a multidimensional generalization of martingales, offers this explanation of why the equestrian terminology may have been used to describe a stochastic process:

The idea behind the terminology is the following. In gaming, a martingale is a fair gambling system, and this is probably the immediate source of the stochastic sense of “martingale.” But in turn, the gaming term seems to have its origin in the equestrian sense of the word “martingale.” In that sense, a martingale is a strap that prevents a horse from throwing up his head. If the horse is proceeding in the positive sense of the parameter i, and his mouth and breast are at heights $y_i$ and $z_i$, respectively, above the ground at time i, then he will be moving in a steady horizontal fashion when his breast is now at the same height as his mouth was at the previous moment; thus, $y_i = z_i = y_{i-1}$ in conformity with (2.1). Since the strap checks upward but not downward movements of the head, it comes closer to what a mathematician would call a submartingale. If there are constraints from several different directions, as in (2.4), we may imagine them caused by several different straps, or by a harness. The history of martingales goes back a long way, and there are elaborate reliefs in the British Museum depicting martingales in the reigns of Tiglath-Pileser III (745-727 B.C.), Sennacherib (705-681 B.C.), and Assurbanipal (668-626 B.C.). Anderson [1] writes of the Assyrians: “They manage their horses with bit and bridle, and later reliefs show a remarkable anticipation of the modern martingale (not used as far as I know by any other ancient people). The reins are attached to a large tassel hanging below the horse’s neck, which continues to provide a certain check on the horse’s mouth. The rider is thus enabled to use both hands for his weapons, and can shoot the bow at full gallop.” Müseler [8] and Hitchcock [4] give information about the various types of modern martingale (the standing, running, and Irish martingales), and the latter author has a colorful passage in which he says: “The standing martingale, which is used as a check to prevent the horse from throwing up his head and hitting the rider in the face, or carrying it too high, is a good remedy for stargazing, or for horses which have ewe-necks. ... This type of martingale is used universally on the polo ground.”

[Note: The reference numbers in the above passage refer to Hammersley’s bibliography, not this one.]

15.9 ⋆ Bayesian updated beliefs as a martingale

Urn 1 has 2 Black balls and 1 White ball. Urn 2 has 1 Black ball and 4 White balls. I believe that Urn 1 has been chosen with probability $p_0$, where $0 < p_0 < 1$. What do I expect now (before any ball has been chosen) my belief $p_1$ to be after one sample is drawn from the Urn? There are two possible outcomes of the draw: B and W, and
\[
P(B) = P(B \mid 1)P(1) + P(B \mid 2)P(2) = \tfrac23 p_0 + \tfrac15 (1 - p_0) = \tfrac15 + \tfrac{7}{15} p_0
\]
and
\[
P(W) = P(W \mid 1)P(1) + P(W \mid 2)P(2) = \tfrac13 p_0 + \tfrac45 (1 - p_0) = \tfrac45 - \tfrac{7}{15} p_0.
\]
If B is the outcome, my posterior belief, applying Bayes' Law, is

\[
P(1 \mid B) = P(B \mid 1)\,\frac{P(1)}{P(B)} = \frac{\tfrac23 p_0}{\tfrac15 + \tfrac{7}{15} p_0} = \frac{10 p_0}{3 + 7 p_0},
\]
and if W is the outcome, my posterior belief, applying Bayes' Law, is
\[
P(1 \mid W) = P(W \mid 1)\,\frac{P(1)}{P(W)} = \frac{\tfrac13 p_0}{\tfrac45 - \tfrac{7}{15} p_0} = \frac{5 p_0}{12 - 7 p_0}.
\]

So
\[
E\bigl(p_1 \bigm| p_0\bigr) = P(1 \mid B)P(B) + P(1 \mid W)P(W)
= \frac{10 p_0}{3 + 7 p_0}\Bigl(\tfrac15 + \tfrac{7}{15} p_0\Bigr)
+ \frac{5 p_0}{12 - 7 p_0}\Bigl(\tfrac45 - \tfrac{7}{15} p_0\Bigr)
= \tfrac23 p_0 + \tfrac13 p_0 = p_0.
\]

The same argument applies into the future. That is, the probability $p_t$ that I attach to urn 1 after t independent samples (with replacement) is a martingale: $E\bigl(p_{t+s} \bigm| p_t\bigr) = p_t$. Since probabilities are bounded, the Martingale Convergence Theorem implies that my beliefs will converge with probability one as $t \to \infty$.
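Here is a simulation sketch of this urn scheme (the function and the Monte Carlo check are my own illustration). It applies the Bayes' Law updates derived above, verifies $E(p_1 \mid p_0) = p_0$ numerically, and shows a long belief path settling down, as the Martingale Convergence Theorem predicts.

```python
import numpy as np

def belief_path(p0=0.5, steps=200, seed=None):
    """Beliefs p_t that Urn 1 (2 Black, 1 White) was chosen, when the urn is
    drawn with P(Urn 1) = p0 and balls are sampled with replacement."""
    rng = np.random.default_rng(seed)
    urn_is_1 = rng.random() < p0
    p_black = 2/3 if urn_is_1 else 1/5            # chance of Black from the true urn
    p = p0
    path = [p]
    for _ in range(steps):
        if rng.random() < p_black:                # Black is drawn
            p = (2/3) * p / ((2/3) * p + (1/5) * (1 - p))
        else:                                     # White is drawn
            p = (1/3) * p / ((1/3) * p + (4/5) * (1 - p))
        path.append(p)
    return path

p0 = 0.3
# Martingale property E(p_1 | p_0) = p_0, checked by Monte Carlo (expect about 0.3):
print(np.mean([belief_path(p0, steps=1, seed=s)[-1] for s in range(10_000)]))
# A single long path: the beliefs converge (here toward 0 or 1):
print(belief_path(p0, steps=500, seed=1)[-1])
```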

15.10 ⋆ Stopping times

Given a discrete-time stochastic process $X_1, \dots, X_n, \dots$, a stopping time is an integer-valued random variable N such that $P(N < \infty) = 1$, and the event $(N = k)$ belongs to $\sigma(X_1, X_2, \dots, X_k)$.

This means that the indicator function $\mathbf{1}_{(N=k)}$ can be written as some function h of $X_1, \dots, X_k$. In other words, you can't "peek ahead" to decide whether to stop.

15.11 ⋆ Stopped martingales

If $Z_1, Z_2, \dots$ is a martingale and N is a stopping time for this martingale, then the stopped martingale is
\[
\bar{Z}_n = Z_{\min(N, n)}.
\]

15.11.1 Theorem The stopped martingale is a martingale.

The proof is taken from Sheldon Ross and Erol Peköz [14, Lemma 3.13, p. 88].

Proof: Start with the identity

\[
\bar{Z}_n = \bar{Z}_{n-1} + \mathbf{1}_{N \geqslant n}\,(Z_n - Z_{n-1}).
\]
(Verify this.) Note that the event $(N \geqslant n)$ is the complement of $(N \leqslant n - 1)$, so its indicator is a function of $X_1, \dots, X_{n-1}$. Since conditional expectation is a positive linear operator,
\[
E\bigl(\bar{Z}_n \bigm| X_1, \dots, X_{n-1}\bigr)
= E\bigl(\bar{Z}_{n-1} \bigm| X_1, \dots, X_{n-1}\bigr)
+ E\bigl(\mathbf{1}_{N \geqslant n}\,(Z_n - Z_{n-1}) \bigm| X_1, \dots, X_{n-1}\bigr)
\]
\[
= \bar{Z}_{n-1} + \mathbf{1}_{N \geqslant n}\, E\bigl(Z_n - Z_{n-1} \bigm| X_1, \dots, X_{n-1}\bigr)
\]
\[
= \bar{Z}_{n-1},
\]
since the martingale property makes the last conditional expectation zero.

A proof of the following theorem may be found in Ross and Peköz [14, Theorem 3.14, pp. 88–89].

15.11.2 Theorem (Martingale Stopping Theorem)

\[
E\,Z_N = E\,Z_1
\]
if any one of the following sufficient conditions holds:


1. The $\bar{Z}_n$ are uniformly bounded.

2. N is bounded.

3. $E\,N < \infty$ and there is some $M < \infty$ such that for all n,
\[
E\bigl(|Z_{n+1} - Z_n| \bigm| Z_n\bigr) < M.
\]

Some of the consequences of this theorem are:

• There are no gambling "systems."

• No parental stopping rule can account for Sen's missing women.
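As a sanity check on the theorem (and on the "no gambling systems" moral), here is a small simulation of my own: a fair ±1 random walk started at 0 is a martingale, and stopping it when it first hits an upper or lower barrier is a stopping time whose stopped process is uniformly bounded, so condition 1 applies and $E\,Z_N$ should equal $E\,Z_1 = 0$.

```python
import numpy as np

def stopped_value(target=5, floor=-5, max_steps=100_000, seed=None):
    """Fair +/-1 random walk Z_n from 0, stopped the first time it hits
    `target` or `floor`.  The stopped process is bounded between the barriers."""
    rng = np.random.default_rng(seed)
    z = 0
    for _ in range(max_steps):
        z += 1 if rng.random() < 0.5 else -1
        if z >= target or z <= floor:
            return z
    return z    # safety cutoff; essentially never reached for these barriers

values = [stopped_value(seed=s) for s in range(20_000)]
print(np.mean(values))   # close to 0 = E Z_1, as the Martingale Stopping Theorem predicts
```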

Bibliography

[1] D. Bayer and P. Diaconis. 1992. Trailing the dovetail shuffle to its lair. Annals of Applied Probability 2(2):294–313. http://www.jstor.org/stable/2959752

[2] K. C. Border. 1985. Fixed point theorems with applications to economics and game theory. New York: Cambridge University Press.

[3] L. Breiman. 1986. Probability and stochastic processes: With a view toward applications, 2d. ed. Palo Alto, California: Scientific Press.

[4] T. M. Cover and J. A. Thomas. 2006. Elements of information theory, 2d. ed. Hoboken, New Jersey: Wiley–Interscience.

[5] G. Debreu and I. N. Herstein. 1953. Nonnegative square matrices. Econometrica 21(4):597–607. http://www.jstor.org/stable/1907925

[6] J. L. Doob. 1953. Stochastic processes. New York: Wiley.

[7] J. Franklin. 1980. Methods of mathematical economics. Undergraduate Texts in Mathematics. New York: Springer–Verlag.

[8] J. M. Hammersley. 1967. Harnesses. In L. M. Le Cam and J. Neyman, eds., Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 3: Physical Sciences, pages 89–117. Berkeley and Los Angeles: University of California Press. http://projecteuclid.org/euclid.bsmsp/1200513623

[9] S. Karlin. 1969. A first course in stochastic processes. New York & London: Academic Press.

[10] J. G. Kemeny and J. L. Snell. 1960. Finite Markov chains. The University Series in Undergraduate Mathematics. Princeton, New Jersey: D. Van Nostrand.

[11] J. W. Milnor. 1969. Topology from the differentiable viewpoint, corrected second printing ed. Charlottesville: University Press of Virginia. Based on notes by David W. Weaver.

[12] W. A. Neilson, T. A. Knott, and P. W. Carhart, eds. 1944. Webster's new international dictionary of the English language, second unabridged ed. Springfield, Massachusetts: G. & C. Merriam Company.

[13] J. R. Norris. 1998. Markov chains. Number 2 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.

[14] S. M. Ross and E. A. Peköz. 2007. A second course in probability. Boston: ProbabilityBookstore.com.

[15] H. Wielandt. 1950. Unzerlegbare, nicht negative Matrizen. Mathematische Zeitschrift 52:642–648. DOI: 10.1007/BF02230720