Lecture 5: Random Walks and Markov Chains

Spectral Graph Theory and Applications WS 2011/2012
Lecturer: Thomas Sauerwald & He Sun

1 Introduction to Markov Chains

Definition 5.1. A sequence of random variables $X_0, X_1, \ldots \subseteq \Omega$ is a Markov chain with state space $\Omega$ if for all possible states $x_{t+1}, x_t, \ldots, x_0 \in \Omega$ we have
\[
\Pr[\, X_{t+1} = x_{t+1} \mid X_t = x_t \wedge X_{t-1} = x_{t-1} \wedge \cdots \wedge X_0 = x_0 \,] = \Pr[\, X_{t+1} = x_{t+1} \mid X_t = x_t \,].
\]
The Markov chain is called time-homogeneous if the latter probability is independent of $t$. In this course, we will only consider finite Markov chains which are also time-homogeneous. In this case, it is convenient to store all information about the transitions of the Markov chain in the so-called transition matrix $P$, which is a $|\Omega| \times |\Omega|$ matrix defined by
\[
P_{x,y} := \Pr[\, X_1 = y \mid X_0 = x \,].
\]
Properties of the transition matrix $P$:

  • all entries in $P$ are in $[0,1]$,
  • every row sum equals $1$.

\[
P = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 \\
1/3 & 0 & 1/3 & 1/3 & 0 \\
1/4 & 1/4 & 0 & 1/4 & 1/4 \\
0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}
\]

Figure 1: Example of a Markov chain corresponding to a random walk on a graph $G$ with 5 vertices.

A very important special case is the Markov chain that corresponds to a random walk on an undirected, unweighted graph. Here, in each step the random walk picks a neighbor of the current vertex uniformly at random and moves to that neighbor. Hence, the transition matrix is $P = D^{-1} \cdot A$, where $A$ is the adjacency matrix of $G$ and $D$ is the diagonal matrix with $D_{i,i} = \deg(i)$. The next lemma shows how to compute the transition probabilities for multiple steps.

Lemma 5.2. For any two states $x, y \in \Omega$ and any round $t \in \mathbb{N}$,
\[
\Pr[\, X_t = y \mid X_0 = x \,] = P^t_{x,y},
\]
where $P^0$ is the identity matrix.

Proof. Let $\mu^0$ be the (row) unit vector which has $1$ at component $x$ and $0$ elsewhere (the starting distribution), and let $\mu^t$ be the distribution of the random walk at step $t$. Clearly,
\[
\mu^1_y = 1 \cdot P_{x,y} = \sum_{z \in \Omega} \mu^0_z \cdot P_{z,y},
\]
and therefore $\mu^1 = \mu^0 \cdot P$, and more generally,
\[
\mu^t = \mu^{t-1} \cdot P = \mu^0 \cdot P^t.
\]

Let us now return to the above example and see what the powers of $P$ look like.
\[
P = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 \\
1/3 & 0 & 1/3 & 1/3 & 0 \\
1/4 & 1/4 & 0 & 1/4 & 1/4 \\
0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}
\]
\[
P^{10} \approx \begin{pmatrix}
0.17 & 0.24 & 0.32 & 0.17 & 0.09 \\
0.16 & 0.25 & 0.34 & 0.16 & 0.08 \\
0.16 & 0.26 & 0.35 & 0.16 & 0.08 \\
0.17 & 0.24 & 0.32 & 0.17 & 0.09 \\
0.18 & 0.24 & 0.31 & 0.18 & 0.09
\end{pmatrix}
\]
\[
P^{\infty} = \lim_{t \to \infty} P^t = \begin{pmatrix}
1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\
1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\
1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\
1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\
1/6 & 1/4 & 1/3 & 1/6 & 1/12
\end{pmatrix}
\]

Another important special case of a Markov chain is the random walk on an edge-weighted graph. Here, every (undirected) edge $\{x,y\} \in E$ has a non-negative weight $w_{\{x,y\}}$. The transition matrix $P$ is defined as
\[
P_{x,y} = \frac{w_{\{x,y\}}}{w_x}, \qquad \text{where } w_x = \sum_{z \in N(x)} w_{\{x,z\}}.
\]

Definition 5.3. A distribution $\pi$ is called stationary if $\pi P = \pi$. A distribution $\pi$ is (time-)reversible w.r.t. $P$ if it satisfies for all $x, y \in \Omega$,
\[
\pi_x P_{x,y} = \pi_y P_{y,x}.
\]

Lemma 5.4. Let $\pi$ be a probability distribution that is time-reversible w.r.t. $P$. Then $\pi$ is a stationary distribution for $P$.

Proof. Summing the reversibility condition over all states $x \in \Omega$, and using that the row sums of $P$ equal $1$, yields
\[
\sum_{x \in \Omega} \pi_x P_{x,y} = \sum_{x \in \Omega} \pi_y P_{y,x} = \pi_y,
\]
i.e., $\pi P = \pi$.

Hence, if $\pi$ is time-reversible w.r.t. $P$, then once the distribution $\pi$ is attained, the chain moves with the same frequency from $x$ to $y$ as from $y$ to $x$. Random walks on graphs and random walks on edge-weighted graphs are always reversible. (A simple example of a non-reversible Markov chain is one with two states $x, y$ for which $P_{x,y} > 0$ but $P_{y,x} = 0$.)
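Lemma 5.2 and the convergence of the powers of $P$ can be reproduced numerically. The following sketch (an illustration added here, not part of the original notes; it assumes numpy) builds $P = D^{-1}A$ for the graph of Figure 1 and recovers the matrices $P^{10}$ and $P^{\infty}$ shown above.

```python
import numpy as np

# Adjacency matrix of the 5-vertex graph from Figure 1
# (edges {1,2}, {1,3}, {2,3}, {2,4}, {3,4}, {3,5}; vertices 0-indexed here).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)

deg = A.sum(axis=1)          # degree of each vertex
P = A / deg[:, None]         # P = D^{-1} A, every row sums to 1

# Lemma 5.2: Pr[X_t = y | X_0 = x] = (P^t)_{x,y}.
print(np.round(np.linalg.matrix_power(P, 10), 2))   # ~ the P^10 shown above

# The rows of P^t converge to pi with pi_x = deg(x) / (2|E|).
pi = deg / deg.sum()         # = (1/6, 1/4, 1/3, 1/6, 1/12)
print(np.allclose(np.linalg.matrix_power(P, 200), np.tile(pi, (5, 1))))  # True
```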
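In the same setting, the detailed-balance condition of Definition 5.3 and the conclusion of Lemma 5.4 can be checked mechanically, continuing the snippet above:

```python
# Verify time-reversibility (Definition 5.3) and that it implies
# stationarity (Lemma 5.4) for the random walk P and pi defined above.
balance = pi[:, None] * P               # entry (x, y) equals pi_x * P_{x,y}
print(np.allclose(balance, balance.T))  # True: pi_x P_{x,y} = pi_y P_{y,x}
print(np.allclose(pi @ P, pi))          # True: pi P = pi, so pi is stationary
```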
Let us now compute the stationary distribution for three important examples of Markov chains:

  • For a random walk on an (unweighted) graph, $\pi_x = \frac{\deg(x)}{2|E|}$:
    \[
    \pi_x = \sum_{y \in N(x)} \frac{\deg(y)}{2|E|} \cdot P_{y,x} = \sum_{y \in N(x)} \frac{\deg(y)}{2|E|} \cdot \frac{1}{\deg(y)} = \frac{\deg(x)}{2|E|}.
    \]
    An even easier way to verify this is to use Lemma 5.4.

  • If the transition matrix $P$ is symmetric, then $\pi = (1/n, \ldots, 1/n)$ is the uniform distribution.

  • For a random walk with non-negative edge weights $w_{\{u,v\}}$, $\{u,v\} \in E$, let
    \[
    w_G = \sum_{x \in V} w_x.
    \]
    Then the Markov chain is reversible with $\pi_x = w_x / w_G$ (and hence $\pi$ is the stationary distribution):
    \[
    \pi_x P_{x,y} = \frac{w_x}{w_G} \cdot \frac{w_{\{x,y\}}}{w_x} = \frac{w_{\{x,y\}}}{w_G} = \frac{w_{\{y,x\}}}{w_G} = \frac{w_y}{w_G} \cdot \frac{w_{\{y,x\}}}{w_y} = \pi_y P_{y,x}.
    \]

Figure 2: The Markov chain representing the weather in Saarbrücken (drawn in the notes both with transition probabilities and as an edge-weighted graph with edge weights $w_{\{R,S\}} = 1$, $w_{\{R,F\}} = 2$, $w_{\{F,C\}} = 1$, $w_{\{C,S\}} = 3$ and loop weights $w_{\{R,R\}} = 5$, $w_{\{C,C\}} = 2$).

The transition matrix for the weather chain (the states are ordered R, F, C, S):
\[
P = \begin{pmatrix}
5/8 & 1/4 & 0 & 1/8 \\
2/3 & 0 & 1/3 & 0 \\
0 & 1/6 & 1/3 & 1/2 \\
1/4 & 0 & 3/4 & 0
\end{pmatrix}
\]
By using the above formula for edge-weighted random walks, we can compute the stationary distribution:
\[
\pi = (8/21,\; 3/21,\; 6/21,\; 4/21)
\]
(note that $21$ is the sum of all vertex weights, where every edge weight is counted twice, but every loop weight only once).

The naive approach to compute $P^t$ numerically involves $\approx \log_2(t)$ matrix multiplications by repeated squaring (the fastest known matrix-multiplication algorithm runs in time $O(n^{2.376})$). Using reversibility, we now show how to do it using only a constant number of matrix multiplications (assuming that we can compute the eigenvectors efficiently; for very small $t$ it might actually be more efficient to compute $P^t$ directly). Moreover, the approach gives a precise formula for any entry $P^t_{x,y}$ as a function of $t$; a numerical version of this computation is sketched after Definition 5.6 below.

Since the matrix $P$ is reversible w.r.t. $\pi$, it follows that the matrix
\[
N_{x,y} := \pi_x^{1/2} P_{x,y} \pi_y^{-1/2}
\]
is symmetric. Let us rewrite this using diagonal matrices:
\[
N = D_\pi^{1/2} P D_\pi^{-1/2}.
\]
Hence the matrix $N$ can be diagonalized, i.e., there is an orthogonal matrix $R$, whose rows are eigenvectors of $N$, so that $R N R^{-1} = D$, where $D$ is the diagonal matrix with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ of $N$. Therefore, we can compute powers of the matrix $P$ in the following way:
\[
P^t = \left( D_\pi^{-1/2} N D_\pi^{1/2} \right)^t = D_\pi^{-1/2} N^t D_\pi^{1/2} = D_\pi^{-1/2} R^{-1} D^t R D_\pi^{1/2}.
\]

Definition 5.5. A Markov chain is called aperiodic if for all states $x \in \Omega$:
\[
\gcd\{\, t \in \mathbb{N} : P^t_{x,x} > 0 \,\} = 1.
\]

\[
P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \end{pmatrix},
\quad
P^2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\quad
P^3 = \begin{pmatrix} 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \end{pmatrix},
\quad
P^4 = P.
\]

Figure 3: Example of a Markov chain with period 3. The set $\{\, t \in \mathbb{N} : P^t_{1,1} > 0 \,\}$ contains only multiples of 3.

Definition 5.6. A Markov chain is irreducible if for any two states $x, y \in \Omega$ there exists an integer $t$ such that $P^t_{x,y} > 0$.

Intuitively, a Markov chain is irreducible if it is connected when viewed as a graph. The following Markov chain corresponds to a random walk on the (disconnected) graph which consists of two disjoint cycles of length 2. Hence the transition matrix $P$ below is not irreducible.
\[
P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\]
Note that $P$ has two stationary distributions: $\pi = (1/2, 1/2, 0, 0)$ and $\pi = (0, 0, 1/2, 1/2)$.
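Here is the numerical sketch promised above (an illustration with numpy, not part of the original notes): it forms $N = D_\pi^{1/2} P D_\pi^{-1/2}$ for the weather chain, diagonalizes it, and reconstructs $P^t$ from the eigenvalue powers.

```python
import numpy as np

# Weather chain (states ordered R, F, C, S) and its stationary distribution.
W = np.array([[5/8, 1/4, 0,   1/8],
              [2/3, 0,   1/3, 0  ],
              [0,   1/6, 1/3, 1/2],
              [1/4, 0,   3/4, 0  ]])
pi = np.array([8, 3, 6, 4]) / 21

# N = D_pi^{1/2} W D_pi^{-1/2} is symmetric because the chain is reversible.
d = np.sqrt(pi)
N = d[:, None] * W / d[None, :]
assert np.allclose(N, N.T)

# Symmetric matrices are orthogonally diagonalizable: N = U diag(lam) U^T.
# numpy returns eigenvectors as the *columns* of U, so U plays the role of
# R^{-1} = R^T from the derivation above.
lam, U = np.linalg.eigh(N)

# P^t = D_pi^{-1/2} U diag(lam^t) U^T D_pi^{1/2}, one formula for every t.
t = 10
Nt = (U * lam**t) @ U.T                  # N^t via eigenvalue powers
Pt = Nt / d[:, None] * d[None, :]
print(np.allclose(Pt, np.linalg.matrix_power(W, t)))   # True
```

Since $D_\pi^{\pm 1/2}$ and $D^t$ are diagonal, evaluating the formula for a new $t$ costs only a constant number of matrix multiplications, which is the point of the derivation above.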
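Definitions 5.5 and 5.6 can likewise be tested mechanically on the two examples above; a small sketch (again illustrative, assuming numpy):

```python
import numpy as np
from math import gcd
from functools import reduce

# Period-3 chain from Figure 3: the gcd of the return times of state 1
# (index 0 here) should be 3.
Q = np.array([[0,   1, 0, 0  ],
              [0,   0, 1, 0  ],
              [1/2, 0, 0, 1/2],
              [0,   1, 0, 0  ]])
returns = [t for t in range(1, 31)
           if np.linalg.matrix_power(Q, t)[0, 0] > 0]
print(returns[:4], reduce(gcd, returns))    # [3, 6, 9, 12] 3

# Two disjoint 2-cycles: not irreducible, since e.g. state 3 is never
# reachable from state 1.
C = np.array([[0., 1, 0, 0],
              [1,  0, 0, 0],
              [0,  0, 0, 1],
              [0,  0, 1, 0]])
reach = sum(np.linalg.matrix_power(C, t) for t in range(1, 5))
print((reach > 0).all())                    # False: not irreducible
```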
Theorem 5.7 (Fundamental Theorem for Markov Chains). If a Markov chain is aperiodic and irreducible, then it holds for all states $x, y \in \Omega$ that:

  • there exists an integer $T = T(x,y) \in \mathbb{N}$ so that for all $t \geq T$, $P^t_{x,y} > 0$.
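For the weather chain, which is irreducible and aperiodic (it has loops), this claim can be observed directly; a quick illustrative check, reusing the matrix $W$ from the spectral sketch above:

```python
# Find the smallest t for which every entry of W^t is positive, as
# guaranteed by Theorem 5.7 for aperiodic, irreducible chains.
Pt = np.eye(4)
for t in range(1, 11):
    Pt = Pt @ W                          # Pt now equals W^t
    if (Pt > 0).all():
        print("first t with W^t entrywise positive:", t)   # prints t = 2
        break
```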
