Small Time Approximations for a Markov Process


Frank Massey

1. Background. A Markov process is a probabilistic model of a system that evolves with time. Markov processes are used to model such things as inventories, waiting lines, epidemics, population shifts and radioactivity. For background information on Markov processes see Ross [4, ch. 5] and Cinlar [1, ch. 8].

At any time the system can be in any of a certain collection of states. The states are denoted either by X1, …, Xn or simply by the integers 1, 2, …, n. As time passes the system makes transitions from one state to another. These transitions have a probabilistic aspect and many questions regarding the behavior of the system can be answered by means of the transition probabilities pij(t). For a particular i and j the transition probability pij(t) is the probability that the state is Xj at time t given that the state is Xi at time t = 0. For more on the pij(t) see Ross [4, section 6.4].

The transition probabilities are somewhat complicated functions of time. For modeling purposes it is easier to describe a Markov process by means of the transition rates qij. If i ≠ j then the rate qij has the property that if the state is Xi at time t then the probability that the state is Xj at time t + h is approximately qij h for small h. One has qij = 0 if a direct transition from Xi to Xj is not possible. If the state is Xi at time t then the probability that the state is also Xi at time t + h is approximately 1 − qi h for small h, where $q_i = \sum_{j \neq i} q_{ij}$ is the transition rate out of state i. For more on the qij see Ross [4, section 6.4]. Some examples of Markov processes that we shall use for illustration are the following.

Example 1 (Machine condition). (See Ross [4, pp. 364-366].) At any time a machine is either busy or idle. Let the state X1 represent the machine being busy and X2 represent the machine being idle. Suppose λ represents the transition rate from X1 to X2 and μ represents the transition rate from X2 to X1, i.e. q12 = q1 = λ and q21 = q2 = μ. This can be represented by the following diagram.

$$X_1 \;\underset{\mu}{\overset{\lambda}{\rightleftarrows}}\; X_2$$

For example, if λ = 0.01 (transitions per machine per minute) and μ = 0.05 (transitions per machine per minute) then this would correspond to the situation where if one were given a large number N of busy machines then approximately 0.01N would become idle in the next minute, and if one were given N idle machines then approximately 0.05N would become busy in the next minute.
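
As a quick numerical illustration (a sketch of mine, not from the paper), the following Python snippet simulates a large number of busy machines and checks that roughly a fraction λ of them go idle within one minute; the sample size and random seed are arbitrary.

```python
import numpy as np

# With rate lam = 0.01 per machine per minute, the time a busy machine
# stays busy is exponential with rate lam, so the fraction turning idle
# within one minute is 1 - exp(-lam), which is approximately lam.
rng = np.random.default_rng(0)
lam, N = 0.01, 100_000
busy_times = rng.exponential(1 / lam, size=N)   # minutes until each busy machine goes idle
frac_idle = np.mean(busy_times < 1.0)           # fraction idle after one minute
print(frac_idle, 1 - np.exp(-lam), lam)         # all approximately 0.01
```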

Example 2 (Birth processes). This is a Markov process where the only transitions are from each state Xi to the next state Xi+1. This situation can be described by qij = 0 if j ≠ i + 1 and qi,i+1 = qi. Diagrammatically we have

$$X_1 \xrightarrow{\;q_1\;} X_2 \xrightarrow{\;q_2\;} \cdots \xrightarrow{\;q_{n-1}\;} X_n$$

An important example is a radioactive decay chain $X_1 \to X_2 \to \cdots \to X_n$ where each Xj is a radioactive nucleus that decays into Xj+1 with decay rate qj.

2. Basic properties of a Markov process. The transition matrix

P = P(t) = {pij(t): 1 ≤ i, j ≤ n} is the matrix whose elements are the transition probabilities pij(t). The pij(t) satisfy the Kolmogorov forward differential equations

$$\frac{dp_{ij}}{dt} = -q_j p_{ij} + \sum_{k \neq j} q_{kj}\, p_{ik} \qquad \text{for } j = 1, 2, \ldots, n; \tag{2.1}$$

see Ross [4, p. 367]. This system can be written as

$$\frac{dP}{dt} = PQ \tag{2.2}$$

where Q is the generator matrix Q = {qij: 1 ≤ i, j ≤ n} with qii = −qi; see Cinlar [1, pp. 254-256]. Since pii(0) = 1 and pij(0) = 0 for j ≠ i, one has P(0) = I.

The solution to the equation (2.2) along with P(0) = I is

$$P = e^{tQ} = \sum_{m=0}^{\infty} \frac{t^m Q^m}{m!} \tag{2.3}$$

where the right-hand equality is the definition of $e^{tQ}$; see Ross [4, pp. 388-389] and Cinlar [1, p. 255]. It follows from (2.3) that PQ = QP. Combining this with (2.2) gives

$$\frac{dP}{dt} = QP \tag{2.4}$$

Writing this out gives the Kolmogorov backward differential equations:

$$\frac{dp_{ij}}{dt} = -q_i p_{ij} + \sum_{k \neq i} q_{ik}\, p_{kj} \qquad \text{for } i = 1, 2, \ldots, n. \tag{2.5}$$
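
As a numerical check (an illustration, not from the paper), one can verify (2.2), (2.4) and P(0) = I for the generator of Example 1 using SciPy's matrix exponential; the rate values 0.3 and 0.7 below are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

# Build the two-state generator and check both Kolmogorov equations,
# dP/dt = PQ = QP, using a centered finite difference at a test time.
lam, mu = 0.3, 0.7
Q = np.array([[-lam, lam],
              [mu, -mu]])
t, h = 1.5, 1e-6
P = lambda s: expm(s * Q)                    # P(t) = e^{tQ} as in (2.3)
dP = (P(t + h) - P(t - h)) / (2 * h)         # numerical derivative of P
print(np.allclose(dP, P(t) @ Q, atol=1e-6))  # forward equation (2.2)
print(np.allclose(dP, Q @ P(t), atol=1e-6))  # backward equation (2.4)
print(np.allclose(P(0), np.eye(2)))          # P(0) = I
```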

To get a more explicit formula, let the eigenvalues of Q be −λ1, −λ2, …, −λn and let v1, v2, …, vn be the corresponding eigenvectors. If the λr are distinct then

$$e^{tQ} = S e^{tD} S^{-1} = \sum_{r=1}^{n} e^{-\lambda_r t}\, T_r$$

where $e^{tD}$ is the diagonal matrix with the exponentials $e^{-\lambda_r t}$ on the diagonal, S is the matrix whose columns are the vr, and $T_r = v_r u_r$ where the ur are the rows of $S^{-1}$; see Strang [5, pp. 275-277]. If an eigenvalue −λr is repeated then there may be a power of t multiplying $T_r e^{-\lambda_r t}$ in the formula for $e^{tQ}$; see Coddington [2, pp. 76-77].

Some properties of the eigenvalues and eigenvectors of the generator matrix Q of a Markov process are well known. Let v be the column vector with n components, all of which are one. Since each row of Q sums to zero ($q_{ii} = -q_i = -\sum_{j \neq i} q_{ij}$), one has Qv = 0. Therefore zero is an eigenvalue of Q with eigenvector v, so we can take λ1 = 0 and v1 = v. Therefore

$$P(t) = e^{tQ} = T_1 + \sum_{r=2}^{n} e^{-\lambda_r t}\, T_r \tag{2.6}$$

where the matrix T1 = vu1 has the property that all of its rows are equal to u1. It follows that

(2.7) pij = j +

where cijr is the ij-th element of Tr and u1 = (π1, …, πn).

Example 1 (continued). In this example

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$

In that case λ1 = 0, λ2 = λ + μ and

$$S = \begin{pmatrix} 1 & \lambda \\ 1 & -\mu \end{pmatrix}.$$

From this it is not hard to show that (2.6) becomes

$$e^{tQ} = \frac{1}{\lambda+\mu}\begin{pmatrix} \mu & \lambda \\ \mu & \lambda \end{pmatrix} + \frac{e^{-(\lambda+\mu)t}}{\lambda+\mu}\begin{pmatrix} \lambda & -\lambda \\ -\mu & \mu \end{pmatrix}.$$

Therefore (2.7) becomes

$$\begin{aligned}
p_{11} &= \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t} &\qquad p_{12} &= \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t} \\
p_{21} &= \frac{\mu}{\lambda+\mu} - \frac{\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t} &\qquad p_{22} &= \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t}
\end{aligned} \tag{2.8}$$

Example 2 (continued). In this case the generator matrix is

$$G = \begin{pmatrix}
-q_1 & q_1 & & & \\
 & -q_2 & q_2 & & \\
 & & \ddots & \ddots & \\
 & & & -q_{n-1} & q_{n-1} \\
 & & & & 0
\end{pmatrix} \tag{2.9}$$

Since G is upper triangular, the eigenvalues of G are its diagonal elements, i.e. zero and the −qi. Then (2.1) becomes

$$\frac{dp_{11}}{dt} = -q_1 p_{11}$$

$$\frac{dp_{1j}}{dt} = -q_j p_{1j} + q_{j-1} p_{1,j-1} \qquad \text{for } j = 2, 3, \ldots, n.$$

So $p_{11} = e^{-q_1 t}$. Also, $p_{1j} = e^{-q_j t} * q_{j-1} p_{1,j-1}$, where $*$ denotes convolution. Therefore $p_{1j} = q_1 \cdots q_{j-1}\; e^{-q_1 t} * e^{-q_2 t} * \cdots * e^{-q_j t}$. Similarly, $p_{ij} = q_i \cdots q_{j-1}\; e^{-q_i t} * e^{-q_{i+1} t} * \cdots * e^{-q_j t}$ if i ≤ j, and pij = 0 if i > j. For the case i = 1 this can be written as p1n(t) = En(t)/qn where

$$E_n(t) = q_1 q_2 \cdots q_n \sum_{r=1}^{n} c_r\, e^{-q_r t} \qquad\text{with}\qquad c_r = \prod_{\substack{s=1 \\ s \neq r}}^{n} \frac{1}{q_s - q_r}. \tag{2.10}$$
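
As a numerical sanity check of the convolution solution (a sketch of mine, built on the partial-fraction expansion underlying (2.10); the rates are arbitrary distinct test values), the following compares $p_{1j}(t) = q_1 \cdots q_{j-1} \sum_{r \le j} e^{-q_r t} \prod_{s \le j,\, s \neq r} (q_s - q_r)^{-1}$ with the first row of $e^{tG}$.

```python
import numpy as np
from scipy.linalg import expm

q = np.array([1.0, 0.4, 0.15])              # q_1, q_2, q_3, distinct; X_4 absorbing
n = len(q) + 1
Q = np.zeros((n, n))
for i, qi in enumerate(q):                  # generator (2.9), last row zero
    Q[i, i], Q[i, i + 1] = -qi, qi

def p1j(t, j):
    """p_{1j}(t) for 1 <= j <= n-1 from the partial-fraction formula."""
    total = 0.0
    for r in range(j):
        denom = np.prod([q[s] - q[r] for s in range(j) if s != r])
        total += np.exp(-q[r] * t) / denom
    return np.prod(q[:j - 1]) * total       # prefactor q_1 ... q_{j-1}

t = 2.0
P = expm(t * Q)
print([float(np.round(p1j(t, j), 8)) for j in (1, 2, 3)])
print([float(np.round(P[0, j - 1], 8)) for j in (1, 2, 3)])   # should agree
```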

Gershgorin's theorem says the eigenvalues of an n×n matrix A = {aij: 1 ≤ i, j ≤ n} all lie in the union of the circles C1, …, Cn, where Ci is the circle with center aii and radius $r_i = \sum_{j \neq i} |a_{ij}|$; see Strang [5, p. 366]. When applied to the generator matrix Q of a Markov process this theorem says the eigenvalues of Q all lie in the circle with center −qmax and radius qmax, where qmax is the largest of the qi. In particular all the eigenvalues of Q have negative real part or are zero. Thus the numbers λr have positive real part or are zero.
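
This is easy to check numerically; the sketch below (an illustration of mine) builds a random generator with rows summing to zero and verifies that every eigenvalue lies in the disk centered at −qmax of radius qmax.

```python
import numpy as np

# Random generator: nonnegative off-diagonal rates, q_ii = -q_i.
rng = np.random.default_rng(1)
n = 5
Q = rng.uniform(0, 2, (n, n))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))       # q_ii = -(sum of off-diagonal row entries)
qmax = -Q.diagonal().min()
eig = np.linalg.eigvals(Q)
print(np.all(np.abs(eig + qmax) <= qmax + 1e-9))   # inside the Gershgorin disk
print(np.all(eig.real <= 1e-9))                    # negative or zero real part
```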

It is known that the pij(t) tend to limits πij as t → ∞; see Ross [4, pp. 368-370] and Cinlar [1, pp. 261-265]. In the case that 0 is not a repeated eigenvalue this is evident from (2.7) and the fact that the eigenvalues of Q other than zero have negative real part. In this case πij = πj depends only on j and not on i. These limits πij are called the steady state probabilities and represent the long run probability that the system is in state Xj given that it starts in state Xi.

Let t* be the largest of the values 1/Re[λr], where −λr varies over the non-zero eigenvalues of Q. It takes an amount of time on the order of t* for pij(t) to approach πij.

The behavior of pij(t) for times up to the order of t* is called the transient behavior. For short range questions the transient behavior is the behavior of interest. As time goes from zero to infinity, pij(t) goes from zero or one (depending on whether j ≠ i or j = i) to πij. Other than that, it is hard to determine much about the transient behavior of pij(t) from the formula (2.7) without actually calculating the cijr and plotting pij(t). We are interested in approximations to the pij(t) that would make it easier to determine this behavior.

3. Graphs, matrix powers and matrix exponentials. In this section let Q = {qij: 1 ≤ i, j ≤ n} be an arbitrary square matrix, not necessarily one that is the generator matrix of a Markov process. Let P = P(t) = {pij(t): 1 ≤ i, j ≤ n} be defined from Q by (2.3), so that (2.1), (2.2), (2.4) and (2.5) hold. Also, if f(t) is a function of t, let $f^{(r)}(t) = d^r f / dt^r$.

The weighted graph associated with Q has vertices 1, 2, …, n. There is an edge e = (i, j) = ij from i to j with weight we = qij if qij ≠ 0. If we have a Markov process, the weighted graph associated with it is the weighted graph associated with its generator matrix Q. The vertices of the graph are the states of the Markov process. If i ≠ j there is an edge from i to j if a direct transition from i to j is possible. There is an edge from i to i if a direct transition from i to some other state is possible.

A path  from i to j is a sequence  = {i = i0, i1, ..., im = j} of states such that there is an edge from each state in the sequence to the next. We write i → j if there is a path from i to j. Let m = m be the length of the path and   w = wioi1wi1i2 wim-1im = qioi1qi1i2 qim-1im be the product of the weights of the edges in the path. The path is simple if no vertex is repeated. If  = {j = im, im+1, ..., ir = k} is a path from j to k then  = {i = i0, i1, ..., im, im+1, ..., ir = k} is the concatenation of  and  which is a path from i to k. Note that m = m + m and w = ww . The distance, mij, from i to j is the length of a shortest path from i to j. Such a path must be simple. We put mii = 0 and mij = ∞ if there is no path from i to j. There is an edge from i to j if and only if mij =

1. It is not hard to see that mik  mij + mjk, and equality holds if and only if j is along a shortest path from i to k. If there is an edge from j to k, then mij  mik - 1 for any i.
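
Since the distances mij depend only on the edge set, they can be computed by breadth-first search. The following sketch (an illustration, not part of the paper; the helper name `distances` is mine) does this for the birth chain of Example 2 with n = 4, using 0-indexed states.

```python
from collections import deque

def distances(edges, n):
    """m[i][j] = length of a shortest path i -> j, inf if there is none."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
    m = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 0                      # the convention m_ii = 0
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if m[i][v] == float("inf"):
                    m[i][v] = m[i][u] + 1
                    queue.append(v)
    return m

# Edges (k,k) and (k,k+1) of a birth chain on 4 states (0-indexed).
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
print(distances(edges, 4))   # m_ij = j - i above the diagonal, inf below
```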

Let

$$S_{mij} = \{\sigma : \sigma \text{ is a path from } i \text{ to } j \text{ of length } m\}$$

with Smij = ∅ if there is no path from i to j of length m. Let

$$w_{mij} = \sum_{\sigma \in S_{mij}} w_\sigma \qquad\text{and}\qquad W_m = \{w_{mij} : 1 \le i, j \le n\}$$

with wmij = 0 if there is no path from i to j of length m. Note that

$$S_{m+1,ij} = \bigcup_{\substack{k:\; S_{mik} \neq \emptyset \text{ and there is} \\ \text{an edge from } k \text{ to } j}} \{\sigma(k,j) : \sigma \in S_{mik}\} \tag{3.1}$$

$$w_{m+1,ij} = \sum_k w_{mik}\, q_{kj} \tag{3.2}$$

where σ(k,j) represents the concatenation of σ and the edge (k,j). Let $S_{ij} = S_{m_{ij},ij}$ be the set of shortest paths from i to j. If mij = 1 then Sij contains only the edge (i, j). If mij > 1 then

$$S_{ij} = \bigcup_{\substack{k:\; m_{ik} = m_{ij}-1 \text{ and there is} \\ \text{an edge from } k \text{ to } j}} \{\sigma(k,j) : \sigma \in S_{ik}\} \tag{3.3}$$

The following proposition relates the wmij to $Q^m$.

Proposition 1. $W_m = Q^m$, i.e. $Q^m = \{w_{mij} : 1 \le i, j \le n\}$.

Proof. It is clearly true for m = 1. Suppose it is true for m. Since $Q^{m+1} = Q^m Q$,

$$(Q^{m+1})_{ij} = \sum_k (Q^m)_{ik}\, q_{kj} = \sum_k w_{mik}\, q_{kj} = w_{m+1,ij}$$

where we have used the induction hypothesis and (3.2). //
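
Proposition 1 can be checked by brute force; the sketch below (my illustration, with arbitrary rates) enumerates every vertex sequence of length m on the two-state graph of Example 1 and sums the products of edge weights. A sequence that uses a missing edge picks up a factor Q[a, b] = 0 and drops out, so the total runs exactly over Smij.

```python
import numpy as np
from itertools import product

lam, mu = 0.3, 0.7
Q = np.array([[-lam, lam], [mu, -mu]])
n, m = 2, 4

W = np.zeros((n, n))
for seq in product(range(n), repeat=m + 1):   # candidate paths i_0, ..., i_m
    w = 1.0
    for a, b in zip(seq, seq[1:]):
        w *= Q[a, b]                           # zero kills non-paths
    W[seq[0], seq[-1]] += w                    # accumulate w_sigma into w_mij
print(np.allclose(W, np.linalg.matrix_power(Q, m)))   # True: W_m = Q^m
```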

Example 1 (continued). The edges of the associated graph are (1, 1), (1, 2), (2, 1) and (2, 2).

$$X_1 \;\underset{\mu}{\overset{\lambda}{\rightleftarrows}}\; X_2$$

A path from i to j has the form σ = {i = i0, i1, ..., im = j} where ik is 1 or 2 for 1 ≤ k ≤ m−1. Note that mij = 1 if i ≠ j.

First consider the case i ≠ j. In this case $w_\sigma = (-1)^{m-1} q_{i_0} q_{i_1} \cdots q_{i_{m-1}}$, where q1 = λ and q2 = μ. The reason for the factor of (−1)^{m−1} in front is as follows. All together there are m edges in the path σ. One of the edges is (i, j), and after that each edge (j, i), if any, is paired with another edge (i, j). Suppose there are a of these pairs. This leaves m − 2a − 1 edges that are (i, i) or (j, j). These edges have weights −q1 or −q2. So the factor in front is (−1)^{m−2a−1}, which is the same as (−1)^{m−1}.

Next consider the case i = j. In this case $w_\sigma = (-1)^m q_{i_0} q_{i_1} \cdots q_{i_{m-1}}$. The reason for the factor of (−1)^m in front is as follows. Again there are m edges in the path σ. Each edge (i, k) with k ≠ i, if any, is paired with another edge (k, i). Suppose there are a of these pairs. This leaves m − 2a edges that are (i, i) or (j, j). These edges have weights −q1 or −q2. So the factor in front is (−1)^{m−2a}, which is the same as (−1)^m.

In the case i = 1 and j = 2 one has

S1,12 = { {1, 2} }

S2,12 = { {1, 1, 2}, {1, 2, 2} }

S3,12 = { {1, 1, 1, 2}, {1, 1, 2, 2}, {1, 2, 1, 2}, {1, 2, 2, 2} }

S1+r,12 = { {1, i1, ..., i1+r = 2}: ik is 1 or 2 for 1 ≤ k ≤ r }

So

w1,12 = λ

w2,12 = −λ(λ + μ)

w3,12 = λ(λ² + 2λμ + μ²) = λ(λ + μ)²

$$w_{1+r,12} = (-1)^r \lambda \sum_{i_1, \ldots, i_r} q_{i_1} \cdots q_{i_r} = (-1)^r \lambda (\lambda + \mu)^r$$

In the case i = j = 1 one has

S1,11 = { {1, 1} }

S2,11 = { {1, 1, 1}, {1, 2, 1} }

S3,11 = { {1, 1, 1, 1}, {1, 1, 2, 1}, {1, 2, 1, 1}, {1, 2, 2, 1} }

S1+r,11 = { {1, i1, ..., i1+r = 1}: ik is 1 or 2 for 1 ≤ k ≤ r }

So

w1,11 = −λ

w2,11 = λ(λ + μ)

w3,11 = −λ(λ² + 2λμ + μ²) = −λ(λ + μ)²

$$w_{1+r,11} = (-1)^{r-1} \lambda \sum_{i_1, \ldots, i_r} q_{i_1} \cdots q_{i_r} = (-1)^{r-1} \lambda (\lambda + \mu)^r$$

Example 2 (continued). The only edges are (k, k) and (k, k+1) for k = 1, 2, …, n−1.

$$X_1 \xrightarrow{\;q_1\;} X_2 \xrightarrow{\;q_2\;} \cdots \xrightarrow{\;q_{n-1}\;} X_n$$

If i ≤ j then a path from i to j has the form σ = {i = i0, i1, ..., im = j} where each ik is either ik−1 or ik−1 + 1 and im−1 ≤ n − 1. If we let gk + 1 be the number of times k is repeated in σ then

$$w_\sigma = (-1)^{m-(j-i)}\; q_i q_{i+1} \cdots q_{j-1}\; q_i^{g_i} q_{i+1}^{g_{i+1}} \cdots q_j^{g_j}.$$

If i > j there is no path from i to j. Thus mij = j − i if i ≤ j and mij = ∞ if i > j.

Suppose we fix i < j ≤ n-1 and let μ = mij = j-i. Then

Sμij = { {i, i+1, …, j-1, j} }

Sμ+1,ij = { {i, i+1, …, k-1, k, k, k+1, … , j-1, j}: i ≤ k ≤ j}

Sμ+r,ij = { {i = i0, i1, ..., iμ+r = j}: ik-1 ≤ ik ≤ ik-1 + 1}

So

$$w_{\mu,ij} = q_i q_{i+1} \cdots q_{j-1}$$

$$w_{\mu+1,ij} = -\,q_i q_{i+1} \cdots q_{j-1}\,(q_i + q_{i+1} + \cdots + q_j)$$

$$w_{\mu+r,ij} = (-1)^r\, q_i q_{i+1} \cdots q_{j-1} \sum_{i \le k_1 \le k_2 \le \cdots \le k_r \le j} q_{k_1} q_{k_2} \cdots q_{k_r}$$

The following proposition relates the wmij to P(t).

Proposition 2. (i) $P^{(r)}(t) = P(t)Q^r = Q^r P(t)$ and $P^{(r)}(0) = Q^r$.

(ii) $p_{ij}^{(r)}(t) = \sum_k p_{ik}(t)\, w_{rkj} = \sum_k w_{rik}\, p_{kj}(t)$

(iii) $p_{ij}^{(r)}(0) = w_{rij}$

(iv) Fix i and j and let μ = mij. If j ≠ i then

$$p_{ij}^{(r)}(0) = 0 \qquad \text{for } 0 \le r \le \mu - 1 \tag{3.4}$$

$$p_{ij}^{(\mu)}(0) = w_{\mu,ij} \tag{3.5}$$

$$p_{ij}(t) = a\,\frac{t^\mu}{\mu!} = w_{\mu,ij}\,\frac{t^\mu}{\mu!} + b\,\frac{t^{\mu+1}}{(\mu+1)!} \tag{3.6}$$

where

$$a = \sum_k p_{ik}(\xi)\, w_{\mu,kj} = w_{\mu,ij} + \sum_k p_{ik}(\xi)\big[w_{\mu,kj} - w_{\mu,ij}\big] - \Big[1 - \sum_k p_{ik}(\xi)\Big] w_{\mu,ij} = \sum_k w_{\mu,ik}\, p_{kj}(\xi)$$

and

$$b = \sum_k p_{ik}(\eta)\, w_{\mu+1,kj} = w_{\mu+1,ij} + \sum_k p_{ik}(\eta)\big[w_{\mu+1,kj} - w_{\mu+1,ij}\big] - \Big[1 - \sum_k p_{ik}(\eta)\Big] w_{\mu+1,ij} = \sum_k w_{\mu+1,ik}\, p_{kj}(\eta)$$

where ξ and η are between 0 and t and only states k on some path from i to j contribute nonzero terms to these sums.

(v) $\displaystyle p_{ij}(t) = \sum_{m = m_{ij}}^{\infty} w_{mij}\, \frac{t^m}{m!}$

Proof. (i) follows from (2.2), (2.4) and the fact that P(0) = I. (ii) and (iii) follow from (i) and Proposition 1. (3.4) and (3.5) follow from (iii) and the definition of mij. In order to prove (3.6) we use Taylor's theorem, which says

$$f(t) = \sum_{r=0}^{s-1} f^{(r)}(0)\, \frac{t^r}{r!} + f^{(s)}(\theta)\, \frac{t^s}{s!}$$

where θ is between 0 and t. If we apply this with f(t) = pij(t), first with s = μ and then with s = μ + 1, and use (3.4) and (3.5), we get

$$p_{ij}(t) = p_{ij}^{(\mu)}(\xi)\, \frac{t^\mu}{\mu!} = w_{\mu,ij}\, \frac{t^\mu}{\mu!} + p_{ij}^{(\mu+1)}(\eta)\, \frac{t^{\mu+1}}{(\mu+1)!}$$

where ξ and η are between 0 and t. Combining this with (ii) gives (3.6). (v) follows from (2.3), (iii) and (3.4). //
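
Part (v) is just the entrywise form of the series (2.3), so it can be checked by truncation; the sketch below (mine, with arbitrary rates and truncation length) compares the partial sum with the matrix exponential.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

# p_ij(t) = sum_{m >= m_ij} w_mij t^m/m!, with w_mij = (Q^m)_ij by
# Proposition 1; the leading terms vanish by (3.4), so summing from m = 0
# gives the same matrix. Truncate at M terms.
lam, mu = 0.3, 0.7
Q = np.array([[-lam, lam], [mu, -mu]])
t, M = 0.8, 30
series = sum(np.linalg.matrix_power(Q, m) * t**m / factorial(m) for m in range(M))
print(np.allclose(series, expm(t * Q)))   # True for modest t*||Q||
```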

Example 1 (continued). If we apply Proposition 2 (iv) with i = 1 and j = 2, so that μ = m12 = 1, we have w1,12 = λ and w1,22 = −μ. So

$$a = w_{1,12} + p_{12}(\xi)\big[w_{1,22} - w_{1,12}\big] = \lambda - p_{12}(\xi)(\lambda + \mu) < \lambda$$

$$b = w_{2,12} + p_{12}(\eta)\big[w_{2,22} - w_{2,12}\big] = -\lambda(\lambda + \mu) + p_{12}(\eta)(\lambda + \mu)^2 > -\lambda(\lambda + \mu)$$

Therefore (3.6) gives

$$\lambda t - \lambda(\lambda + \mu)\,\frac{t^2}{2} < p_{12}(t) < \lambda t.$$

4. Small t. One situation in which it is easy to approximate the pij(t) is the case when t is small compared to the numbers 1/qij. We use the graph associated with the Markov process described in section 3.

Proposition 3. Fix i and j, let μ = mij, and let c− and c+ be given by

$$c_- = \max\Big\{ \min\{ w_{\mu+1,kj} : i \to k \},\; -\sum_k \big[w_{\mu+1,ik}\big]^- \Big\} \tag{4.1}$$

$$c_+ = \min\Big\{ \max\{ w_{\mu+1,kj} : i \to k \},\; \sum_k \big[w_{\mu+1,ik}\big]^+ \Big\} \tag{4.2}$$

and let c = max{|c−|, |c+|}. Here x+ = max{x, 0} and x− = max{−x, 0}. If j ≠ i then

$$c_-\,\frac{t^{\mu+1}}{(\mu+1)!} \;\le\; p_{ij}(t) - w_{\mu,ij}\,\frac{t^\mu}{\mu!} \;\le\; c_+\,\frac{t^{\mu+1}}{(\mu+1)!} \tag{4.3}$$

$$\Big|\, p_{ij}(t) - w_{\mu,ij}\,\frac{t^\mu}{\mu!} \,\Big| \;\le\; c\,\frac{t^{\mu+1}}{(\mu+1)!} \tag{4.4}$$

Remarks: (4.4) says $p_{ij}(t) \approx w_{\mu,ij}\, t^\mu/\mu!$ with a relative error of no more than $ct/\big[(\mu+1)\, w_{\mu,ij}\big]$, so the approximation is good for small t.

Proof. Note that Proposition 2 (iv) says

$$p_{ij}(t) - w_{\mu,ij}\,\frac{t^\mu}{\mu!} = b\,\frac{t^{\mu+1}}{(\mu+1)!}$$

where

$$b = \sum_k p_{ik}(\eta)\, w_{\mu+1,kj} = \sum_k w_{\mu+1,ik}\, p_{kj}(\eta).$$

If we use the facts that $\sum_k p_{ik}(\eta) = 1$ and $0 \le p_{kj}(\eta) \le 1$ then

$$\min\{ w_{\mu+1,kj} : i \to k \} \;\le\; b \;\le\; \max\{ w_{\mu+1,kj} : i \to k \}$$

$$-\sum_k \big[w_{\mu+1,ik}\big]^- \;\le\; b \;\le\; \sum_k \big[w_{\mu+1,ik}\big]^+$$

This shows that the numbers c− and c+ bound b from below and above, which proves (4.3). The inequality (4.4) follows from (4.3). //
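
To see Proposition 3 in action, the sketch below (mine, not the paper's) computes c− and c+ for Example 1 with i = 1, j = 2 (0-indexed in the code) and the rates λ = 0.01, μ = 0.05 from section 1, then verifies the bounds (4.3) at a few times t.

```python
import numpy as np
from scipy.linalg import expm

# Here mu = m_12 = 1, so (4.3) reads
#   c_minus * t^2/2 <= p_12(t) - w_{1,12} * t <= c_plus * t^2/2.
lam, mu_rate = 0.01, 0.05
Q = np.array([[-lam, lam], [mu_rate, -mu_rate]])
i, j = 0, 1

W1, W2 = Q, Q @ Q                      # W_m = Q^m by Proposition 1
reach = [0, 1]                         # states k with i -> k
c_minus = max(min(W2[k, j] for k in reach), -np.maximum(-W2[i], 0.0).sum())
c_plus = min(max(W2[k, j] for k in reach), np.maximum(W2[i], 0.0).sum())

for t in (0.1, 1.0, 5.0):
    p = expm(t * Q)[i, j]
    dev = p - W1[i, j] * t             # p_12(t) - w_{1,12} t
    print(c_minus * t**2 / 2 <= dev <= c_plus * t**2 / 2)   # True
```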

References.

[1] Cinlar, Erhan. Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, New Jersey, 1975.

[2] Coddington, Earl A. and Norman Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, New York, 1955.

[3] Massey, F. J. and J. Prentis. "Approximations for Radioactive Decay Series." 2007.

[4] Ross, Sheldon M. Introduction to Probability Models, eighth edition. Academic Press, San Diego, 2003.

[5] Strang, Gilbert. Linear Algebra and Its Applications, third edition. Harcourt Brace Jovanovich, Orlando, 1988.
