Lecture 5: Random Walks and Markov Chain

Spectral Graph Theory and Applications WS 2011/2012 Lecture 5: Random Walks and Markov Chain Lecturer: Thomas Sauerwald & He Sun 1 Introduction to Markov Chains Definition 5.1. A sequence of random variables X0;X1;::: ⊆ Ω is a Markov chain with state space Ω if for all possible states xt+1; xt; : : : ; x1 2 Ω we have Pr [ Xt+1 = xt+1 j Xt = xt ^ Xt−1 = xt−1 ^ · · · ^ X0 = x0 ] = Pr [ Xt+1 = xt+1 j Xt = xt ] : The Markov chain is called time-homogenous if the latter probability is independent of t. In this course, we will only consider finite Markov chains which are also time-homogenous. In this case, it is convenient to store all information about the transition in a Markov chain in the so-called transition matrix P which is a jΩj × jΩj matrix defined by: Px;y := Pr [ X1 = y j X0 = x ] : Properties of the transition matrix P: • all entries in P are in [0; 1], • every row sum equals 1. 0 0 1=2 1=2 0 0 1 B1=3 0 1=3 1=3 0 C B C P = B1=4 1=4 0 1=4 1=4C B C @ 0 1=2 0 0 1=2A 0 0 0 1 0 1 4 2 3 5 Figure 1: Example of a Markov chain corresponding to a random walk on a graph G with 5 vertices. A very important special case is the Markov chain that corresponds to a random walk on an undirected, unweighted graph. Here, the random walk picks each step a neighbor chosen uniformly at random and moves to that neighbor. Hence, the transition matrix is P = D−1 · A, 1 where D is the diagonal matrix with Di;i = deg(i) . The next lemma shows how to compute the probability for multiple steps. 1 Lecture 5: Random Walks and Markov Chain 2 Lemma 5.2. For any two states x; y 2 Ω and any round t 2 N, t Pr [ Xt = y j X0 = x ] = Px;y; where P0 is the identity matrix. Proof. Let µ0 be the (row-)unit-vector which has 1 at component x and 0 otherwise (the starting distribution). Then let µt be the distribution of the random walk at step t. Clearly, 1 X 0 µy = 1 · Px;y = µz · Pz;y; z2Ω and therefore, µ1 = µ0 · P; and more generally, µt = µt−1 · P = µ0 · Pt: Let us now return to the above example and see how the powers of P look like. 0 0 1=2 1=2 0 0 1 B1=3 0 1=3 1=3 0 C B C P = B1=4 1=4 0 1=4 1=4C B C @ 0 1=2 0 0 1=2A 0 0 0 1 0 00:17 0:24 0:32 0:17 0:091 B0:16 0:25 0:34 0:16 0:08C 10 B C P = B0:16 0:26 0:35 0:16 0:08C B C @0:17 0:24 0:32 0:17 0:09A 0:18 0:24 0:31 0:18 0:09 01=6 1=4 1=3 1=6 1=121 B1=6 1=4 1=3 1=6 1=12C 1 B C P = B1=6 1=4 1=3 1=6 1=12C B C @1=6 1=4 1=3 1=6 1=12A 1=6 1=4 1=3 1=6 1=12 Another important special case of a Markov chain is the random walk on edge-weighted graphs. Here, every (undirected) edge fx; yg 2 E has a non-negative weight wfx;yg. The transition matrix P is defined as wfx;yg Pfx;yg = ; wx where X wx = wfx;zg: z2N(x) Definition 5.3. A distribution π is called stationary, if πP = π. A distribution π is (time)-reversible w.r.t. P, if it satisfies for all x; y 2 Ω, πxPx;y = πyPy;x: Lemma 5.4. Let π be a probability distribution that is time-reversible w.r.t. P. Then π is a stationary distribution for P. Lecture 5: Random Walks and Markov Chain 3 Proof. Summing over all states y 2 Ω yields X X πxPx;y = πyPy;x = πy: x2Ω x2Ω Hence, if π is time-reversible w.r.t. P, then once the distribution π is attained, the chain moves with the same frequency from x to y than from y to x. Random walks on graphs and random walks on edge-weighted graphs are always reversible. (A simple example for a non-reversible Markov chain is a Markov chain for which there are two states with Px;y > 0 but Px;y = 0.) Let us now compute the stationary distribution for three important examples of Markov chains: deg(x) • For a random walk on an (unweighted) graph, πx = 2jEj : X deg(y) X deg(y) 1 deg(x) π = · P = · = : x 2jEj y;x 2jEj deg(y) 2jEj y2N(x) y2N(x) An even more easier way to verify this is to use Lemma 5.4. • If the transition matrix P is symmetric, then π = (1=n; : : : ; 1=n) is uniform. • For a random walk with non-negative edge-weights wfu;vg; fu; vg 2 E, let X wG = wx: x2V Then the Markov chain is reversible with πx = wx=wG (and hence π is the stationary distribution): wx wfx;yg wfx;yg wfy;xg wy wfy;xg πxPx;y = · = = = · = πyPy;x: wG wx wG wG wG wy Lecture 5: Random Walks and Markov Chain 4 1=8 5=8 rainy sunny 1 1=4 5 rainy sunny 2=3 1=4 1=2 3=4 1=3 2 3 foggy cloudy 1=3 foggy cloudy 2 1 1=6 Figure 2: The Markov chain representing the weather in Saarbrücken. The transition matrix for the weather chain (the states are ordered R; F; C; S). 05=8 1=4 0 1=81 B2=3 0 1=3 0 C P = B C @ 0 1=6 1=3 1=2A 1=4 0 3=4 0 By using the above formula for edge-weighted random walks, we can compute the stationary distribution: π = (8=21; 3=21; 6=21; 4=21) (note that 21 is the sum of all weights of the vertices, where every edge weight is counted twice, but every loop weight is counted once) t The naive approach to compute P numerically involves ≈ log2(t) matrix multiplications (fastest algorithm O(n2:376)). Using the reversibility, we now show how to do it using only a constant number of matrix multiplications (assuming that we can efficiently compute the eigenvectors { for very small t it might be actually more efficient to compute Pt directly). t Moreover, the approach will give us a precise formula for any entry Px;y as a function of t. Since the matrix P is reversible w.r.t. to π, it follows that the matrix 1=2 −1=2 Nx;y := πx Px;yπy ; is symmetric. Let us rewrite this using diagonal matrices: 1=2 −1=2 N = Dπ PDπ : Hence the matrix N can be diagonalized, i.e., there is an orthogonal matrix R consisting of the eigenvalues of N so that RNR−1 = D, where D is the diagonal matrix with eigenvalues λ1 > λ2 > ··· > λn of N. Therefore, we can compute powers of the matrix P in the following way: t t −1=2 +1=2 P = Dπ NDπ t −1=2 −1 +1=2 = Dπ R DRDπ −1=2 −1 t +1=2 1 = Dπ R · D · Dπ R Definition 5.5. A Markov chain is called aperiodic if for all states x 2 Ω: t gcd( t 2 N: Px;x > 0 ) = 1: Lecture 5: Random Walks and Markov Chain 5 0 0 1 0 0 1 B 0 0 1 0 C P = B C @1=2 0 0 1=2A 0 1 0 0 0 0 0 1 0 1 2 B1=2 0 0 1=2C P = B C @ 0 1 0 0 A 0 0 1 0 01=2 0 0 1=21 3 B 0 1 0 0 C P = B C @ 0 0 1 0 A 1=2 0 0 1=2 0 0 1 0 0 1 4 1 B 0 0 1 0 C P = P = B C @1=2 0 0 1=2A 0 1 0 0 1 4 1=2 1=2 2 3 t Figure 3: Example of a Markov chain with period 3. The set ft 2 N: P1;1 > 0g contains only multiples of 3. Definition 5.6. A Markov chain is irreducible if for any two states x; y 2 Ω there exists an t integer t such that Px;y > 0. Intuitively, a Markov chain is called irreducible if it is connected viewed as a graph. The following Markov chain corresponds to a random walk on the (unconnected) graph which consists of two disjoint cycles of length 2. Hence the transition matrix P below is not irreducible. 00 1 0 01 B1 0 0 0C P = B C @0 0 0 1A 0 0 1 0 Note that P has two stationary distributions: π = (1=2; 1=2; 0; 0) and π = (0; 0; 1=2; 1=2). Theorem 5.7 (Fundamental Theorem for Markov Chains). If a Markov chain is aperiodic and ireducible, then it holds for all states x; y 2 Ω that: t • there exists an integer T = T (x; y) 2 N so that for all t > T , Px;y > 0.

Lecture 5: Random Walks and Markov Chain

Superprocesses and Mckean-Vlasov Equations with Creation of Mass

A Class of Measure-Valued Markov Chains and Bayesian Nonparametrics

Markov Process Duality

Lecture 19: Polymer Models: Persistence and Self-Avoidance

1 Introduction Branching Mechanism in a Superprocess from a Branching

8 Convergence of Loop-Erased Random Walk to SLE2 8.1 Comments on the Paper

1 Markov Chain Notation for a Continuous State Space

Simulation of Markov Chains

Chapter 8: Markov Chains

Markov Random Fields and Stochastic Image Models

Random Walk in Random Scenery (RWRS)

Particle Filtering Within Adaptive Metropolis Hastings Sampling