
EE365: Markov Chains

Markov chains

Transition Matrices

Distribution Propagation

Other Models

Markov chains

▶ a model for dynamical systems with possibly uncertain transitions
▶ very widely used, in many application areas
▶ one of a handful of core effective mathematical and computational tools
▶ often used to model systems that are not random; e.g., language

Representation

x_{t+1} = f(x_t, w_t),   t = 0, 1, ...

▶ x_0, w_0, w_1, ... are independent random variables
▶ state transitions are nondeterministic, uncertain
▶ combines system x_{t+1} = f(x_t, u_t, w_t) with policy u_t = µ_t(x_t)

Example: Inventory model with ordering policy

x_{t+1} = [x_t − d_t + u_t]_{[0,C]},   u_t = µ(x_t)

▶ x_t ∈ X = {0, ..., C} is inventory level at time t
▶ C > 0 is capacity
▶ d_t ∈ {0, 1, ...} is demand that arrives just after time t
▶ u_t ∈ {0, 1, ...} is new stock added to inventory
▶ µ : X → {0, 1, ..., C} is ordering policy
▶ [z]_{[0,C]} = min(max(z, 0), C) (z clipped to interval [0, C])
▶ assumption: d_0, d_1, ... are independent

Example: Inventory model with ordering policy

x_{t+1} = [x_t − d_t + u_t]_{[0,C]},   u_t = µ(x_t)

▶ capacity C = 6
▶ demand distribution p_i = Prob(d_t = i), with p = (0.7, 0.2, 0.1)
▶ policy: refill if x_t ≤ 1, i.e., µ(x) = C − x if x ≤ 1, and µ(x) = 0 otherwise
▶ start with full inventory, x_0 = C

Simulation

[plot: a sample trajectory of the inventory level x_t (0 to 6) for t = 0, ..., 140]

▶ given initial state x_0
▶ for t = 0, ..., T − 1:
  ▶ simulate process noise w_t := sample(p)
  ▶ update the state x_{t+1} = f(x_t, w_t)
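A minimal sketch of this loop for the inventory example, using numpy (the variable names and the seeded generator are our choices; the model parameters match the slide above):

```python
import numpy as np

rng = np.random.default_rng(0)

C = 6                             # capacity
p = np.array([0.7, 0.2, 0.1])     # demand distribution: p[i] = Prob(d_t = i)

def mu(x):
    # ordering policy: refill to capacity when inventory is low
    return C - x if x <= 1 else 0

def f(x, d):
    # inventory update, clipped to the interval [0, C]
    return min(max(x - d + mu(x), 0), C)

T = 140
x = np.empty(T + 1, dtype=int)
x[0] = C                          # start with full inventory
for t in range(T):
    d = rng.choice(len(p), p=p)   # simulate process noise d_t ~ p
    x[t + 1] = f(x[t], d)
```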

Simulation

▶ called a particle simulation
▶ gives a sample of the joint distribution of (x_0, ..., x_T)
▶ useful
  ▶ to approximately evaluate probabilities and expectations via Monte Carlo simulation
  ▶ to get a feel for what trajectories look like (e.g., for model validation)

Graph representation

[transition graph for the inventory example: states 0, ..., 6, with edges labeled by the demand probabilities p_0, p_1, p_2]

▶ called the transition graph
▶ each vertex (or node) corresponds to a state
▶ edge i → j is labeled with the transition probability Prob(x_{t+1} = j | x_t = i)

The tree

[tree diagram: paths starting from x_0 = 2, with edges labeled by transition probabilities]

▶ each vertex of the tree corresponds to a path
▶ e.g., Prob(x_3 = 1 | x_0 = 2) = 0.5 × 0.5 × 0.4 + 0.5 × 0.6 × 0.4

The birth-death chain

[transition graph: birth-death chain on states 1, ..., 6, with right-transition probability 0.6, left-transition probability 0.3, and self-loops]

self-loops can be omitted, since they can be figured out from the outgoing probabilities

Example: Metastability

[transition graph on states 1, ..., 8 (transition probabilities 0.65, 0.35, 0.01, 0.99), and a sample trajectory for t = 0, ..., 200 showing long sojourns in each half of the chain]

Transition Matrices

Transition representation

we define the transition matrix P ∈ R^{n×n} (n = |X|) by

P_{ij} = Prob(x_{t+1} = j | x_t = i)

▶ P1 = 1 (1 denotes the all-ones vector) and P ≥ 0 elementwise
▶ a matrix with these two properties is called a stochastic matrix
▶ if P and Q are stochastic, then so is PQ

Transition matrix representation

[transition graph for the inventory example, as on the previous graph slide]

transition matrix

P = [ 0    0    0    0    0.1  0.2  0.7
      0    0    0    0    0.1  0.2  0.7
      0.1  0.2  0.7  0    0    0    0
      0    0.1  0.2  0.7  0    0    0
      0    0    0.1  0.2  0.7  0    0
      0    0    0    0.1  0.2  0.7  0
      0    0    0    0    0.1  0.2  0.7 ]
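This matrix can also be built programmatically from the model; a short sketch reusing C, p, mu, and f from the simulation sketch above, with a check of the two stochastic-matrix properties:

```python
import numpy as np

n = C + 1                             # states 0, ..., C
P = np.zeros((n, n))
for i in range(n):
    for d, pd in enumerate(p):
        P[i, f(i, d)] += pd           # accumulate Prob(x_{t+1} = f(i, d) | x_t = i)

assert np.allclose(P.sum(axis=1), 1)  # P 1 = 1
assert (P >= 0).all()                 # P >= 0 elementwise
```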

Particle simulation given the transition matrix

Given

▶ the distribution of initial states d ∈ R^{1×n}
▶ the transition matrix P ∈ R^{n×n}, with rows p_1, ..., p_n

Algorithm:

▶ choose initial state x_0 := sample(d)
▶ for t = 0, ..., T − 1:
  ▶ find the distribution of the next state d_next := p_{x_t}
  ▶ sample the next state x_{t+1} := sample(d_next)
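A direct numpy transcription of this algorithm (a sketch; the function name and the seeding are ours):

```python
import numpy as np

def particle_sim(d, P, T, seed=0):
    """Sample a trajectory x_0, ..., x_T given initial distribution d and transition matrix P."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1, dtype=int)
    x[0] = rng.choice(len(d), p=d)                 # x_0 := sample(d)
    for t in range(T):
        x[t + 1] = rng.choice(len(d), p=P[x[t]])   # d_next := p_{x_t}; x_{t+1} := sample(d_next)
    return x
```

For the inventory chain, particle_sim(np.eye(7)[6], P, 140) produces the kind of trajectory plotted earlier.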

Definition of a Markov chain

a sequence of random variables x_t : Ω → X is a Markov chain if, for all s_0, s_1, ... and all t,

Prob(x_{t+1} = s_{t+1} | x_t = s_t, ..., x_0 = s_0) = Prob(x_{t+1} = s_{t+1} | x_t = s_t)

▶ called the Markov property
▶ means that the system is memoryless
▶ x_t is called the state at time t; X is called the state space
▶ if you know the current state, then knowing past states doesn't give additional knowledge about the next state (or any future state)

The joint distribution

the joint distribution of x_0, x_1, ..., x_t factorizes according to

Prob(x_0 = a, x_1 = b, x_2 = c, ..., x_t = q) = d_a P_{ab} P_{bc} ··· P_{pq}

because the chain rule gives

Prob(x_t = s_t, x_{t−1} = s_{t−1}, ..., x_0 = s_0)
= Prob(x_t = s_t | x_{t−1} = s_{t−1}, ..., x_0 = s_0) Prob(x_{t−1} = s_{t−1}, ..., x_0 = s_0)

abbreviating x_t = s_t to just x_t, we can repeatedly apply this to give

Prob(x_t, x_{t−1}, ..., x_0)
= Prob(x_t | x_{t−1}, ..., x_0) Prob(x_{t−1} | x_{t−2}, ..., x_0) ··· Prob(x_1 | x_0) Prob(x_0)
= Prob(x_t | x_{t−1}) Prob(x_{t−1} | x_{t−2}) ··· Prob(x_1 | x_0) Prob(x_0)

where the last step uses the Markov property

The joint distribution

▶ the joint distribution completely specifies the process; for example

  E f(x_0, x_1, x_2, x_3) = Σ_{a,b,c,d ∈ X} f(a, b, c, d) d_a P_{ab} P_{bc} P_{cd}

▶ in principle we can compute the probability of any event and the expected value of any function, but this requires a sum over n^T terms
▶ we can compute some expected values far more efficiently (more later)
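For intuition, the brute-force sum can be written directly; a sketch (names ours) that is feasible only for small n and T:

```python
import itertools

def brute_force_expectation(f, d, P, T):
    """E f(x_0, ..., x_T) by summing f times the path probability over all paths."""
    n = len(d)
    total = 0.0
    for path in itertools.product(range(n), repeat=T + 1):
        prob = d[path[0]]                 # d_a
        for a, b in zip(path, path[1:]):
            prob *= P[a, b]               # ... P_ab P_bc ...
        total += prob * f(*path)
    return total
```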

Distribution Propagation

Distribution propagation

π_{t+1} = π_t P

▶ here π_t denotes the distribution of x_t (as a row vector), so

  (π_t)_i = Prob(x_t = i)

▶ can compute the marginals π_1, ..., π_T in T n^2 operations
▶ a useful type of simulation . . .
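A sketch of distribution propagation in numpy (the function name is ours):

```python
import numpy as np

def propagate(pi0, P, T):
    """Return the marginals pi_0, ..., pi_T as the rows of a (T+1) x n array."""
    pis = np.empty((T + 1, len(pi0)))
    pis[0] = pi0
    for t in range(T):
        pis[t + 1] = pis[t] @ P    # pi_{t+1} = pi_t P
    return pis
```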

Example: Distribution propagation

[transition graph as above; bar plots of the distribution π_t over states 0, ..., 6 at t = 0, 1, 2, 3, 4, 5, 9, 10, and 20]

Example: Reordering

▶ what is the probability we reorder at time t?
▶ compute π_t by distribution propagation, and use

  Prob(x_t ∈ {0, 1}) = π_t (1 1 0 0 0 0 0)^T

  (a code sketch follows the plot below)

[plot: E g(x_t) = Prob(x_t ∈ {0, 1}) versus t, for t = 0, ..., 50]
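Using the propagate function and the inventory transition matrix P from the sketches above, the curve can be reproduced in a few lines:

```python
import numpy as np

pi0 = np.eye(7)[6]                   # pi_0: start at x_0 = 6 with probability 1
pis = propagate(pi0, P, 50)
g = np.array([1, 1, 0, 0, 0, 0, 0])  # indicator of the reorder states {0, 1}
reorder_prob = pis @ g               # Prob(x_t in {0, 1}) for t = 0, ..., 50
```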

Evaluating expectation of separable function

▶ suppose f : X^{T+1} → R is separable, i.e.,

  f(x_0, ..., x_T) = f_0(x_0) + ··· + f_T(x_T)

  where f_t : X → R

▶ then we have E f(x) = π_0 f_0 + ··· + π_T f_T (with each f_t viewed as a vector in R^n)
▶ using distribution propagation, we can compute this exactly with T n^2 operations
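A sketch that fuses the propagation and the sum, assuming each f_t is supplied as a length-n vector (function name ours):

```python
import numpy as np

def separable_expectation(pi0, P, fs):
    """E [f_0(x_0) + ... + f_T(x_T)], where fs[t] holds the values of f_t on the n states."""
    total, pi = 0.0, np.asarray(pi0, dtype=float)
    for f_t in fs:
        total += pi @ f_t    # add E f_t(x_t) = pi_t f_t
        pi = pi @ P          # propagate pi_{t+1} = pi_t P
    return total
```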

Powers of the transition matrix

as an example of the use of the joint distribution, let's show

(P^k)_{ij} = Prob(x_{t+k} = j | x_t = i)

this holds because

Prob(x_{t+k} = s_{t+k} | x_t = s_t)
= Prob(x_{t+k} = s_{t+k} and x_t = s_t) / Prob(x_t = s_t)
= ( Σ_{s_0, ..., s_{t−1}, s_{t+1}, ..., s_{t+k−1}} d_{s_0} P_{s_0 s_1} P_{s_1 s_2} ··· P_{s_{t+k−1} s_{t+k}} ) / Prob(x_t = s_t)
= ( Σ_{s_0, ..., s_{t−1}} d_{s_0} P_{s_0 s_1} ··· P_{s_{t−1} s_t} ) ( Σ_{s_{t+1}, ..., s_{t+k−1}} P_{s_t s_{t+1}} ··· P_{s_{t+k−1} s_{t+k}} ) / Prob(x_t = s_t)

the first factor is exactly Prob(x_t = s_t), which cancels, and the remaining sum is (P^k)_{s_t s_{t+k}}

Example: k-step transition probabilities

two ways to compute Prob(x_t = j | x_0 = i)

▶ direct method: sum products of probabilities over n^t state sequences
▶ using matrix powers: Prob(x_t = j | x_0 = i) = (P^t)_{ij}
▶ we can compute this in
  ▶ O(n^3 t) arithmetic operations (by multiplying matrices to get P^t)
  ▶ O(n^3) arithmetic operations (by first diagonalizing P)
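In numpy the matrix-power route is a single call; a sketch using the inventory matrix P from above:

```python
import numpy as np

# Prob(x_t = j | x_0 = i) = (P^t)_{ij}
Pt = np.linalg.matrix_power(P, 20)
print(Pt[6, 0])    # Prob(x_20 = 0 | x_0 = 6)
```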

Other Models

Time-varying Markov chains

▶ we may have a time-varying Markov chain, with one transition matrix for each time:

  (P_t)_{ij} = Prob(x_{t+1} = j | x_t = i)

▶ suppose Prob(x_t = a) ≠ 0 for all a ∈ X and t; then the factorization property, that there exist stochastic matrices P_0, P_1, ... and a distribution d such that

  Prob(x_0 = a, x_1 = b, x_2 = c, ..., x_t = q) = d_a (P_0)_{ab} (P_1)_{bc} ··· (P_{t−1})_{pq},

  is equivalent to the Markov property

▶ the theory of factorization properties of distributions is called Bayesian networks or graphical models (with Markov chains the simplest case)

Dynamical system representation revisited

x_{t+1} = f(x_t, w_t)

if x_0, w_0, w_1, ... are independent, then x_0, x_1, ... is Markov

Dynamical system representation

To see that x_0, x_1, ... is Markov, notice that

Prob(x_{t+1} = s_{t+1} | x_0 = s_0, ..., x_t = s_t)
= Prob(f(s_t, w_t) = s_{t+1} | x_0 = s_0, ..., x_t = s_t)
= Prob(f(s_t, w_t) = s_{t+1} | φ_t(x_0, w_0, ..., w_{t−1}) = (s_0, ..., s_t))
= Prob(f(s_t, w_t) = s_{t+1})

▶ (x_0, ..., x_t) = φ_t(x_0, w_0, ..., w_{t−1}) follows from x_{t+1} = f(x_t, w_t)
▶ the last line holds since w_t ⊥⊥ (x_0, w_0, ..., w_{t−1}) and x ⊥⊥ y ⟹ f(x) ⊥⊥ g(y)

x_0, x_1, ... is Markov because the same argument also gives

Prob(x_{t+1} = s_{t+1} | x_t = s_t) = Prob(f(s_t, w_t) = s_{t+1})

State augmentation

we often have models

x_{t+1} = f(x_t, w_t)

where w_0, w_1, ... is Markov

▶ w_0, w_1, ... satisfy w_{t+1} = g(w_t, v_t), where v_0, v_1, ... are independent
▶ construct an equivalent Markov chain, with state z_t = (x_t, w_t)
▶ dynamics z_{t+1} = h(z_t, v_t), where h(z, v) = (f(z_1, z_2), g(z_2, v))
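A minimal sketch of the construction, with hypothetical f and g standing in for a concrete model (neither is from the slides):

```python
def f(x, w):
    # hypothetical system dynamics driven by the Markov noise w
    return min(max(x + w, 0), 6)

def g(w, v):
    # hypothetical Markov noise: w flips sign when v == 1
    return -w if v == 1 else w

def h(z, v):
    # augmented dynamics on z_t = (x_t, w_t): h(z, v) = (f(z_1, z_2), g(z_2, v))
    x, w = z
    return (f(x, w), g(w, v))

z = (3, 1)       # z_0 = (x_0, w_0)
z = h(z, 0)      # one step of the augmented chain
```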

Markov-k models

x_{t+1} = f(x_t, x_{t−1}, ..., x_{t−k+1}, w_t)

▶ called a Markov-k model
▶ assume (x_0, x_1, ..., x_{k−1}), w_0, w_1, w_2, ... are independent
▶ initial distribution on (x_0, x_1, ..., x_{k−1})
▶ Markov-k property: for all s_0, s_1, ...,

  Prob(x_{t+1} = s_{t+1} | x_t = s_t, ..., x_0 = s_0)
  = Prob(x_{t+1} = s_{t+1} | x_t = s_t, x_{t−1} = s_{t−1}, ..., x_{t−k+1} = s_{t−k+1})

Markov-k models

x_{t+1} = f(x_t, x_{t−1}, ..., x_{t−k+1}, w_t)

▶ equivalent Markov system, with state z_t ∈ Z = X^k; notice that |Z| = |X|^k
▶ z_t = (x_t, x_{t−1}, ..., x_{t−k+1})
▶ dynamics z_{t+1} = g(z_t, w_t), where g(z, w) = (f(z_1, z_2, ..., z_k, w), z_1, ..., z_{k−1})
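The same shift construction in code, with a hypothetical Markov-2 dynamics function for concreteness (the function f here is ours, not from the slides):

```python
def f(x1, x2, w):
    # hypothetical Markov-2 dynamics: the next state depends on the last two states
    return (x1 + x2 + w) % 7

def g(z, w):
    # equivalent Markov-1 dynamics on z_t = (x_t, x_{t-1}):
    # prepend the new state and drop the oldest
    return (f(*z, w),) + z[:-1]

z = (3, 5)       # z_1 = (x_1, x_0)
z = g(z, 1)      # one step: z becomes (x_2, x_1)
```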

Markov language models

▶ given text, construct a Markov-k model by counting the frequency of word sequences (see the sketch after this list)
▶ applications:
  ▶ run particle simulation to generate new text
  ▶ suggest correct spellings when spell checking
  ▶ optical character recognition
  ▶ finding similar documents in information retrieval
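A minimal sketch for k = 1, assuming the text has already been lower-cased, stripped of punctuation, and split into a list of words (function names ours):

```python
import random
from collections import Counter, defaultdict

def fit_markov1(words):
    """Count word-bigram frequencies and normalize them into transition distributions."""
    counts = defaultdict(Counter)
    for w, w_next in zip(words, words[1:]):
        counts[w][w_next] += 1
    return {w: (list(c), [v / sum(c.values()) for v in c.values()])
            for w, c in counts.items()}

def generate(model, start, length, seed=0):
    """Particle simulation of the fitted chain: sample new text of the given length."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and out[-1] in model:
        successors, probs = model[out[-1]]
        out.append(rng.choices(successors, weights=probs)[0])
    return " ".join(out)
```

Larger k imitates the source text more closely, but the state space X^k grows quickly, so much more text is needed to estimate the transition probabilities.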

Example: Markov language models

▶ example: approximately 78,000 words, 8,200 unique

He wasnt true and disappointment that wouldnt be very old school song at once it on the grayish white stone well have charlies work much to keep your wand still had a mouse into a very scratchy whiskery kiss. Then yes yes. I hope im pretty but there had never seen me what had to vacuum anymore because its all right were when flint had shot up while uncle vernon around the sun rose up to. Bill used to. Ron who to what he teaches potions one. I suppose they couldnt think about it harry hed have found out that didnt trust me all sorts of a letter which one by older that was angrier about what they would you see here albus dumbledore it was very well get a rounded bottle of everyone follows me an awful lot his face the minute. Well send you get out and see professor snape. But it must have his sleeve. I am i think. Fred and hermione and dangerous said finally. Uncle vernon opened wide open. As usual everyones celebrating all hed just have caught twice. Yehll get out. Well get the snitch yet no way up his feet.

Markov language models

▶ another example

Here on this paper. Seven bottles three are poison two are wine one will get us through the corridors. Even better professor flitwick could i harry said the boy miserably. Well i mean its the girls through one of them hardly daring to take rons mind off his rocker said ron loudly. Shut up said ron sounding both shocked and impressed. Id have managed it before and you should have done so because a second later hermione granger telling a downright lie to a baked potato when professor quirrell too the nervous young man made his way along the floor in front of the year the dursleys were his only chance to beat you to get past fluffy i told her you didnt say another word on the wall wiping sweat off his nerves about tomorrow. Why should he yet harry couldnt help trusting him. This is goyle said the giant turning his back legs in anger. For the food faded from the chamber theyd just chained up. Oh no it isnt the same thing. Harry who was quite clear now and gathering speed. The air became colder and colder as they jostled their way through a crowd.
