
EE365: Markov Chains

Outline:
- Markov chains
- Transition Matrices
- Distribution Propagation
- Other Models

Markov chains

- a model for dynamical systems with possibly uncertain transitions
- very widely used, in many application areas
- one of a handful of core effective mathematical and computational tools
- often used to model systems that are not random, e.g., language

Stochastic dynamical system representation

$$x_{t+1} = f(x_t, w_t), \qquad t = 0, 1, \dots$$

- $x_0, w_0, w_1, \dots$ are independent random variables
- state transitions are nondeterministic, uncertain
- to get this form, combine the system $x_{t+1} = f(x_t, u_t, w_t)$ with the policy $u_t = \mu_t(x_t)$

Example: Inventory model with ordering policy

$$x_{t+1} = [x_t - d_t + u_t]_{[0,C]}, \qquad u_t = \mu(x_t)$$

- $x_t \in \mathcal{X} = \{0, \dots, C\}$ is the inventory level at time $t$
- $C > 0$ is the capacity
- $d_t \in \{0, 1, \dots\}$ is the demand that arrives just after time $t$
- $u_t \in \{0, 1, \dots\}$ is the new stock added to the inventory
- $\mu : \mathcal{X} \to \{0, 1, \dots, C\}$ is the ordering policy
- $[z]_{[0,C]} = \min(\max(z, 0), C)$ ($z$ clipped to the interval $[0, C]$)
- assumption: $d_0, d_1, \dots$ are independent

Example: Inventory model with ordering policy

- capacity $C = 6$
- demand distribution $p_i = \mathrm{Prob}(d_t = i)$, with $p = (0.7, 0.2, 0.1)$
- policy: refill if $x_t \le 1$, i.e.,
$$\mu(x) = \begin{cases} C - x & x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
- start with full inventory, $x_0 = C$

Simulation

[Figure: a simulated inventory trajectory $x_t$ for $t = 0, \dots, 140$]

- given initial state $x_0$
- for $t = 0, \dots, T - 1$:
  - simulate the process noise, $w_t := \mathrm{sample}(p)$
  - update the state, $x_{t+1} = f(x_t, w_t)$

Simulation

- called a particle simulation
- gives a sample of the joint distribution of $(x_0, \dots, x_T)$
- useful
  - to approximately evaluate probabilities and expectations via Monte Carlo simulation
  - to get a feel for what trajectories look like (e.g., for model validation)
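A minimal Python sketch of this particle simulation for the inventory example (not from the slides; NumPy is assumed, and the helper names `mu`, `f`, and `simulate` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

C = 6                          # capacity
p = np.array([0.7, 0.2, 0.1])  # demand distribution: Prob(d_t = i) = p[i]

def mu(x):
    # ordering policy: refill to capacity when inventory is low (x <= 1)
    return C - x if x <= 1 else 0

def f(x, d):
    # inventory dynamics, clipped to the interval [0, C]
    return min(max(x - d + mu(x), 0), C)

def simulate(T, x0=C):
    # particle simulation: draw one sample path (x_0, ..., x_T)
    x = [x0]
    for t in range(T):
        d = rng.choice(len(p), p=p)  # process noise d_t ~ p
        x.append(f(x[-1], d))
    return x

print(simulate(20))
```

Each call to `simulate` gives one sample of the joint distribution of $(x_0, \dots, x_T)$; averaging a function of the path over many calls gives a Monte Carlo estimate of its expectation.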
Graph representation

[Figure: transition graph of the inventory chain, with nodes $0, \dots, 6$ and edges labeled $p_0$, $p_1$, $p_2$]

- called the transition graph
- each vertex (or node) corresponds to a state
- the edge $i \to j$ is labeled with the transition probability $\mathrm{Prob}(x_{t+1} = j \mid x_t = i)$

The tree

[Figure: tree of sample paths starting from $x_0 = 2$, with edges labeled by transition probabilities]

- each vertex of the tree corresponds to a path
- e.g., $\mathrm{Prob}(x_3 = 1 \mid x_0 = 2) = 0.5 \times 0.5 \times 0.4 + 0.5 \times 0.6 \times 0.4$

The birth-death chain

[Figure: transition graph of a birth-death chain on states $1, \dots, 6$]

- self-loops can be omitted, since they can be figured out from the outgoing probabilities

Example: Metastability

[Figure: chain on states $1, \dots, 8$ in which transitions between the two halves have probability $0.01$, together with a sample trajectory that stays in one half for long stretches]

Transition Matrices

Transition matrix representation

we define the transition matrix $P \in \mathbf{R}^{n \times n}$ by
$$P_{ij} = \mathrm{Prob}(x_{t+1} = j \mid x_t = i)$$

- $P\mathbf{1} = \mathbf{1}$ and $P \ge 0$ elementwise
- a matrix with these two properties is called a stochastic matrix
- if $P$ and $Q$ are stochastic, then so is $PQ$

Transition matrix representation

[Figure: transition graph of the inventory chain, as above]

transition matrix
$$P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0.1 & 0.2 & 0.7 \\
0 & 0 & 0 & 0 & 0.1 & 0.2 & 0.7 \\
0.1 & 0.2 & 0.7 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.2 & 0.7 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.2 & 0.7 & 0 & 0 \\
0 & 0 & 0 & 0.1 & 0.2 & 0.7 & 0 \\
0 & 0 & 0 & 0 & 0.1 & 0.2 & 0.7
\end{bmatrix}$$

Particle simulation given the transition matrix

given
- the distribution of initial states $d \in \mathbf{R}^{1 \times n}$
- the transition matrix $P \in \mathbf{R}^{n \times n}$, with rows $p_1, \dots, p_n$

algorithm:
- choose the initial state $x_0 := \mathrm{sample}(d)$
- for $t = 0, \dots, T - 1$:
  - find the distribution of the next state, $d_{\text{next}} := p_{x_t}$
  - sample the next state, $x_{t+1} := \mathrm{sample}(d_{\text{next}})$

Definition of a Markov chain

a sequence of random variables $x_t : \Omega \to \mathcal{X}$ is a Markov chain if, for all $s_0, s_1, \dots$ and all $t$,
$$\mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_t = s_t, \dots, x_0 = s_0) = \mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_t = s_t)$$

- called the Markov property
- means that the system is memoryless
- $x_t$ is called the state at time $t$; $\mathcal{X}$ is called the state space
- if you know the current state, then knowing past states gives no additional knowledge about the next state (or any future state)

The joint distribution

the joint distribution of $x_0, x_1, \dots, x_t$ factorizes according to
$$\mathrm{Prob}(x_0 = a, x_1 = b, x_2 = c, \dots, x_t = q) = d_a P_{ab} P_{bc} \cdots P_{pq}$$
because the chain rule gives
$$\mathrm{Prob}(x_t = s_t, x_{t-1} = s_{t-1}, \dots, x_0 = s_0) = \mathrm{Prob}(x_t = s_t \mid x_{t-1} = s_{t-1}, \dots, x_0 = s_0)\, \mathrm{Prob}(x_{t-1} = s_{t-1}, \dots, x_0 = s_0)$$
abbreviating $x_t = s_t$ to just $x_t$, we can repeatedly apply this to give
$$\begin{aligned}
\mathrm{Prob}(x_t, x_{t-1}, \dots, x_0) &= \mathrm{Prob}(x_t \mid x_{t-1}, \dots, x_0)\, \mathrm{Prob}(x_{t-1} \mid x_{t-2}, \dots, x_0) \cdots \mathrm{Prob}(x_1 \mid x_0)\, \mathrm{Prob}(x_0) \\
&= \mathrm{Prob}(x_t \mid x_{t-1})\, \mathrm{Prob}(x_{t-1} \mid x_{t-2}) \cdots \mathrm{Prob}(x_1 \mid x_0)\, \mathrm{Prob}(x_0)
\end{aligned}$$

The joint distribution

- the joint distribution completely specifies the process; for example
$$\mathbf{E}\, f(x_0, x_1, x_2, x_3) = \sum_{a,b,c,d \in \mathcal{X}} f(a, b, c, d)\, d_a P_{ab} P_{bc} P_{cd}$$
- in principle we can compute the probability of any event and the expected value of any function, but this requires a sum over $n^T$ terms
- we can compute some expected values far more efficiently (more later)

Distribution Propagation

Distribution propagation

$$\pi_{t+1} = \pi_t P$$

- here $\pi_t$ denotes the distribution of $x_t$, so $(\pi_t)_i = \mathrm{Prob}(x_t = i)$
- can compute the marginals $\pi_1, \dots, \pi_T$ in $Tn^2$ operations
- a useful type of simulation
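A minimal Python sketch of distribution propagation for the inventory chain (not from the slides; it uses the transition matrix above, and `propagate` is an illustrative name):

```python
import numpy as np

# transition matrix of the inventory chain, states 0..6
P = np.array([
    [0,   0,   0,   0,   0.1, 0.2, 0.7],
    [0,   0,   0,   0,   0.1, 0.2, 0.7],
    [0.1, 0.2, 0.7, 0,   0,   0,   0  ],
    [0,   0.1, 0.2, 0.7, 0,   0,   0  ],
    [0,   0,   0.1, 0.2, 0.7, 0,   0  ],
    [0,   0,   0,   0.1, 0.2, 0.7, 0  ],
    [0,   0,   0,   0,   0.1, 0.2, 0.7],
])

def propagate(pi0, P, T):
    # distribution propagation: returns the marginals [pi_0, pi_1, ..., pi_T]
    pis = [pi0]
    for t in range(T):
        pis.append(pis[-1] @ P)  # pi_{t+1} = pi_t P (pi_t is a row vector)
    return np.array(pis)

pi0 = np.zeros(7)
pi0[6] = 1.0                     # start with full inventory, x_0 = C = 6
pis = propagate(pi0, P, 20)
print(pis[20])                   # distribution of x_20
```

Each step costs one vector-matrix multiply ($n^2$ operations), so all $T$ marginals cost $Tn^2$, versus the $n^T$ terms in the raw joint distribution.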
Example: Distribution propagation

[Figure: bar plots of the distributions $\pi_t$ of the inventory chain over states $0, \dots, 6$, at $t = 0, 1, 2, 3, 4, 5, 9, 10, 20$]

Example: Reordering

- what is the probability we reorder at time $t$?
- compute $\pi_t$ by distribution propagation, and use
$$\mathrm{Prob}(x_t \in \{0, 1\}) = \pi_t \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T$$

[Figure: $\mathbf{E}\, g(x_t)$, the reorder probability, versus $t$ for $t = 0, \dots, 50$]

Evaluating expectation of separable function

- suppose $f : \mathcal{X}^{T+1} \to \mathbf{R}$ is separable, i.e.,
$$f(x_0, \dots, x_T) = f_0(x_0) + \cdots + f_T(x_T)$$
where $f_t : \mathcal{X} \to \mathbf{R}$
- then we have $\mathbf{E}\, f(x) = \pi_0 f_0 + \cdots + \pi_T f_T$
- using distribution propagation, we can compute this exactly with $Tn^2$ operations

Powers of the transition matrix

as an example of the use of the joint distribution, let's show
$$(P^k)_{ij} = \mathrm{Prob}(x_{t+k} = j \mid x_t = i)$$
this holds because
$$\begin{aligned}
\mathrm{Prob}(x_{t+k} = s_{t+k} \mid x_t = s_t) &= \frac{\mathrm{Prob}(x_{t+k} = s_{t+k},\, x_t = s_t)}{\mathrm{Prob}(x_t = s_t)} \\
&= \frac{\displaystyle\sum_{s_0, \dots, s_{t-1},\, s_{t+1}, \dots, s_{t+k-1}} d_{s_0} P_{s_0 s_1} P_{s_1 s_2} \cdots P_{s_{t+k-1} s_{t+k}}}{\mathrm{Prob}(x_t = s_t)} \\
&= \frac{\displaystyle\Bigl(\sum_{s_0, \dots, s_{t-1}} d_{s_0} P_{s_0 s_1} \cdots P_{s_{t-1} s_t}\Bigr) \Bigl(\sum_{s_{t+1}, \dots, s_{t+k-1}} P_{s_t s_{t+1}} \cdots P_{s_{t+k-1} s_{t+k}}\Bigr)}{\mathrm{Prob}(x_t = s_t)}
\end{aligned}$$
the first factor in the numerator is $\mathrm{Prob}(x_t = s_t)$, which cancels the denominator, and the remaining sum is $(P^k)_{s_t s_{t+k}}$

Example: k-step transition probabilities

two ways to compute $\mathrm{Prob}(x_t = j \mid x_0 = i)$:
- direct method: sum products of probabilities over $n^t$ state sequences
- using matrix powers: $\mathrm{Prob}(x_t = j \mid x_0 = i) = (P^t)_{ij}$
- we can compute this in
  - $O(n^3 t)$ arithmetic operations (by multiplying matrices to get $P^t$)
  - $O(n^3)$ arithmetic operations (by first diagonalizing $P$)
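A short Python sketch of both computations (not from the slides; it reuses the matrix `P` above and assumes $P$ is diagonalizable):

```python
import numpy as np

t = 20

# method 1: repeated matrix multiplication, O(n^3 t)
Pt = np.linalg.matrix_power(P, t)
print(Pt[6, 1])                  # Prob(x_20 = 1 | x_0 = 6)

# method 2: diagonalize P = V diag(lam) V^{-1}, so P^t = V diag(lam^t) V^{-1};
# after the one-time O(n^3) factorization, each power is cheap
lam, V = np.linalg.eig(P)        # columns of V are right eigenvectors
Pt_diag = (V * lam**t) @ np.linalg.inv(V)
print(Pt_diag.real[6, 1])        # same value, up to rounding
```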
Other Models

Time-varying Markov chains

- we may have a time-varying Markov chain, with one transition matrix for each time:
$$(P_t)_{ij} = \mathrm{Prob}(x_{t+1} = j \mid x_t = i)$$
- suppose $\mathrm{Prob}(x_t = a) \ne 0$ for all $a \in \mathcal{X}$ and all $t$; then the factorization property, that there exist stochastic matrices $P_0, P_1, \dots$ and a distribution $d$ such that
$$\mathrm{Prob}(x_0 = a, x_1 = b, x_2 = c, \dots, x_t = q) = d_a (P_0)_{ab} (P_1)_{bc} \cdots (P_{t-1})_{pq},$$
is equivalent to the Markov property
- the theory of factorization properties of distributions is called Bayesian networks or graphical models (with Markov chains the simplest case)

Dynamical system representation revisited

$$x_{t+1} = f(x_t, w_t)$$

if $x_0, w_0, w_1, \dots$ are independent, then $x_0, x_1, \dots$ is Markov

Dynamical system representation

to see that $x_0, x_1, \dots$ is Markov, notice that
$$\begin{aligned}
\mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_0 = s_0, \dots, x_t = s_t) &= \mathrm{Prob}\bigl(f(s_t, w_t) = s_{t+1} \mid x_0 = s_0, \dots, x_t = s_t\bigr) \\
&= \mathrm{Prob}\bigl(f(s_t, w_t) = s_{t+1} \mid \phi_t(x_0, w_0, \dots, w_{t-1}) = (s_0, \dots, s_t)\bigr) \\
&= \mathrm{Prob}\bigl(f(s_t, w_t) = s_{t+1}\bigr)
\end{aligned}$$

- $(x_0, \dots, x_t) = \phi_t(x_0, w_0, \dots, w_{t-1})$ follows from $x_{t+1} = f(x_t, w_t)$
- the last line holds since $w_t \perp\!\!\!\perp (x_0, w_0, \dots, w_{t-1})$, and $x \perp\!\!\!\perp y \implies f(x) \perp\!\!\!\perp g(y)$

$x_0, x_1, \dots$ is Markov because the same argument also gives
$$\mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_t = s_t) = \mathrm{Prob}\bigl(f(s_t, w_t) = s_{t+1}\bigr)$$

State augmentation

we often have models $x_{t+1} = f(x_t, w_t)$ where $w_0, w_1, \dots$ is Markov

- $w_0, w_1, \dots$ satisfy $w_{t+1} = g(w_t, v_t)$, where $v_0, v_1, \dots$ are independent
- construct an equivalent Markov chain with state $z_t = (x_t, w_t)$
- dynamics $z_{t+1} = h(z_t, v_t)$, where
$$h(z, v) = \begin{bmatrix} f(z_1, z_2) \\ g(z_2, v) \end{bmatrix}$$

Markov-k

$$x_{t+1} = f(x_t, x_{t-1}, \dots, x_{t-k+1}, w_t)$$

- called a Markov-k model
- assume $(x_0, x_1, \dots, x_{k-1}), w_0, w_1, w_2, \dots$ are independent
- initial distribution on $(x_0, x_1, \dots, x_{k-1})$
- Markov-k property: for all $s_0, s_1, \dots$,
$$\mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_t = s_t, \dots, x_0 = s_0) = \mathrm{Prob}(x_{t+1} = s_{t+1} \mid x_t = s_t, x_{t-1} = s_{t-1}, \dots, x_{t-k+1} = s_{t-k+1})$$

Markov-k

$$x_{t+1} = f(x_t, x_{t-1}, \dots, x_{t-k+1}, w_t)$$

- equivalent Markov system, with state $z_t \in \mathcal{Z} = \mathcal{X}^k$; notice that $|\mathcal{Z}| = |\mathcal{X}|^k$
$$z_t = \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-k+1} \end{bmatrix}$$
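A minimal Python sketch of this reduction for $k = 2$ (not from the slides; the dynamics `f2` and the state space are hypothetical, chosen only to illustrate the augmented state):

```python
import numpy as np

rng = np.random.default_rng(0)

def f2(x_t, x_tm1, w):
    # hypothetical Markov-2 dynamics on X = {0, 1, 2}
    return (x_t + x_tm1 + w) % 3

def step(z, w):
    # augmented state z = (x_t, x_{t-1}) in Z = X^2: the new value enters the
    # first slot, and the old x_t shifts into the second slot
    x_t, x_tm1 = z
    return (f2(x_t, x_tm1, w), x_t)

z = (0, 0)                   # initial state (x_1, x_0)
for t in range(10):
    w = rng.integers(0, 2)   # i.i.d. noise w_t in {0, 1}
    z = step(z, w)
    print(z[0], end=" ")
```

The augmented chain is an ordinary Markov chain, at the cost of a state space of size $|\mathcal{X}|^k$.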