
MAS programmes: Stochastic Models and Forecasting

1 Distribution theory and the Poisson process

1.1 Lifetime distributions

Let the random variable (r.v.) T represent a lifetime, which might, for example, be the length of life of some animal or the time to failure of some machine component. It follows that T is necessarily a non-negative r.v., that is, a r.v. that can take only non-negative values. The distribution of such a r.v. T may be referred to as its lifetime distribution. Assume further that T is a continuous r.v. with p.d.f. f. Because T is a non-negative r.v., we shall need to specify f(t) for t ≥ 0 only, since f(t) = 0 for t < 0.

In general, the (cumulative) distribution function (d.f.) F of a continuous r.v. T with p.d.f. f is defined by

F(t) = P(T ≤ t) = ∫_{−∞}^t f(u) du,  t ∈ ℝ.

When T represents a lifetime, and hence is non-negative,

F(t) = 0 for t < 0,  and  F(t) = ∫_0^t f(u) du for t ≥ 0.

The survivor function Q(t) of the lifetime distribution is defined by

Q(t) = P(T > t) = ∫_t^∞ f(u) du,  t ≥ 0.

Thus Q(t) is the probability that an individual survives beyond age t. Note that Q(0) = 1,

Q(t) = 1 − F(t),  t ≥ 0,

and

f(t) = F′(t) = −Q′(t),  t ≥ 0.  (1)

Consider an individual who has survived to age t. What is the probability that he dies in the next interval of time of length δt?

P(t < T < t + δt | T > t) = P(t < T < t + δt, T > t) / P(T > t)
                          = P(t < T < t + δt) / P(T > t)
                          = (f(t)δt + o(δt)) / Q(t).

The hazard function r(t) (alternatively the mortality rate, force of mortality, age-specific death rate, age-specific failure rate or failure rate function, h(t) or λ(t)) is the death rate at age t. More precisely, we define r(t) by

r(t) = lim_{δt→0} P(t < T < t + δt | T > t) / δt,  t ≥ 0.

Thus

r(t) = f(t) / Q(t),  t ≥ 0.  (2)

From Equations (1) and (2),

r(t) = −Q′(t) / Q(t) = −(d/dt) ln Q(t),  t ≥ 0,  (3)

which gives the hazard function in terms of the survivor function. Integrating Equation (3) and using the condition that Q(0) = 1,

Q(t) = exp(−∫_0^t r(u) du),  t ≥ 0,  (4)

which gives the survivor function in terms of the hazard function.
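Relation (4) is easy to check numerically. The following is a minimal Python sketch (assuming NumPy and SciPy are available); the hazard r(t) = 2t is a hypothetical choice, made only because Equation (4) then gives the closed form Q(t) = exp(−t²) to compare against.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical hazard chosen for illustration: r(t) = 2t,
# for which Equation (4) gives Q(t) = exp(-t**2) in closed form.
def hazard(u):
    return 2.0 * u

def survivor(t):
    # Equation (4): Q(t) = exp( - integral of r(u) over (0, t) )
    integral, _ = quad(hazard, 0.0, t)
    return np.exp(-integral)

for t in (0.5, 1.0, 2.0):
    print(t, survivor(t), np.exp(-t**2))  # the last two columns should agree
```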

Example: the exponential distributions. If T has an exponential distribution with parameter λ then

f(t) = λe^{−λt},  t > 0,

Q(t) = ∫_t^∞ λe^{−λu} du = e^{−λt},  t > 0,

and

r(t) = λe^{−λt} / e^{−λt} = λ,  t > 0.

The exponential distributions are the only distributions that have a constant hazard function. This fact expresses the “memoryless property” of the exponential distributions. No matter what the age of an item currently in use, the probability that it fails in the next interval of time of length δt is the same, λδt + o(δt).

• Exponential distributions may be used to model, at least approximately, the lengths of telephone calls and service times in some queueing systems. However, they would seem to be inappropriate for modelling human lifetime distributions (as far as we can tell in the current state of affairs)!

The memoryless property of exponential distributions is also expressed by the fact that if T has an exponential distribution then

P(T > t + s | T > t) = P(T > s) ≡ Q(s),  s, t > 0.  (5)

Equivalently, if the age of the item currently in use is t then the probability that it survives a further length of time of at least s is Q(s), which does not depend upon t. The proof of Equation (5) is as follows:

P(T > t + s | T > t) = P(T > t + s, T > t) / P(T > t)
                     = P(T > t + s) / P(T > t) = Q(t + s) / Q(t)
                     = e^{−λ(t+s)} / e^{−λt} = e^{−λs} = Q(s).
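A quick Monte Carlo check of Equation (5) is sketched below (assuming NumPy); the values of λ, t and s are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, s = 0.7, 1.5, 2.0          # illustrative parameter choices

T = rng.exponential(scale=1.0 / lam, size=1_000_000)

survivors = T[T > t]               # condition on the event {T > t}
lhs = np.mean(survivors > t + s)   # estimate of P(T > t + s | T > t)
rhs = np.exp(-lam * s)             # Q(s) = e^{-lambda s}
print(lhs, rhs)                    # agreement to about 3 decimal places
```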

Expectation. The mean of the lifetime distribution, the expected length of the lifetime T, is given by

µ ≡ E(T) = ∫_0^∞ t f(t) dt = lim_{x→∞} ∫_0^x t f(t) dt.

It will always be the case that µ > 0. It will usually be the case that µ is finite, but it is possible that µ = +∞. We now find an alternative expression for µ that often turns out to be useful in applications that involve lifetimes.

Lemma 1.1.1 (weighted tail behaviour of distribution) If µ is finite then xQ(x) → 0 as x → ∞.

Proof.

0 ≤ xQ(x) = ∫_x^∞ x f(t) dt ≤ ∫_x^∞ t f(t) dt → 0

as x → ∞, since µ is finite. □

Theorem 1.1.2 (mean in terms of survivor function) Whether or not µ is finite,

µ = ∫_0^∞ Q(t) dt.

Proof. Integrating by parts,

∫_0^x Q(t) dt = [t Q(t)]_0^x + ∫_0^x t f(t) dt
             = x Q(x) + ∫_0^x t f(t) dt.

Letting x → ∞, and using the result of Lemma 1.1.1 if µ is finite, and interchanging the order of integration on the L.H.S. otherwise, we obtain the result of Theorem 1.1.2. □
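Theorem 1.1.2 can be verified numerically for any particular lifetime distribution. A sketch, assuming SciPy, using a gamma-distributed lifetime with shape 2 and rate 3 (an arbitrary choice, for which µ = 2/3):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

# Gamma-distributed lifetime with shape 2 and rate 3, so mu = E(T) = 2/3.
dist = gamma(a=2, scale=1.0 / 3.0)

mu_from_Q, _ = quad(dist.sf, 0, np.inf)  # integrate the survivor function Q
print(mu_from_Q, dist.mean())            # both should be 0.666...
```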

Definition 1.1.3 (Maxima and minima of independent lifetimes) Let T1,T2,...,Tn be independently distributed lifetimes, with Fi the distribution function and Qi the survivor function of Ti, i = 1, 2, . . . , n. Define the r.v.s V and W by

V = max{T1,T2,...,Tn} and W = min{T1,T2,...,Tn}.

• If, in some system, components with lifetimes T1,T2,...,Tn are placed in parallel, so that the system functions as long as at least one of the components is still functioning, then what is critical for the functioning of the system is the maximum lifetime V . If the components are placed in series, so that the system fails as soon as one of the components fails, then what is critical for the functioning of the system is the minimum lifetime W .

Now

P(V ≤ v) = P(T1 ≤ v, T2 ≤ v, . . . , Tn ≤ v)

= P(T1 ≤ v) P(T2 ≤ v) ... P(Tn ≤ v),

since T1,T2,...,Tn are independently distributed. Thus the distribution function FV of V is given by

FV(v) = ∏_{i=1}^n Fi(v),  v ≥ 0.

Similarly,

P(W > w) = P(T1 > w, T2 > w, . . . , Tn > w)

= P(T1 > w)P(T2 > w) ... P(Tn > w).

Thus the survivor function QW of W is given by

QW(w) = ∏_{i=1}^n Qi(w),  w ≥ 0.

Equivalently, the distribution function FW of W is given by

FW(w) = 1 − ∏_{i=1}^n (1 − Fi(w)),  w ≥ 0.

Example: the exponential distributions. If T has an exponential distribution with parameter λ then

E(T) = ∫_0^∞ Q(t) dt = ∫_0^∞ e^{−λt} dt = 1/λ.

Let T1,T2,...,Tn be independently distributed lifetimes such that Ti has an exponential distribution with parameter λi, i = 1, . . . , n, and let

W = min{T1,T2,...,Tn}.

The survivor function QW of W is given by

QW(w) = ∏_{i=1}^n exp(−λi w) = exp(−(∑_{i=1}^n λi) w),  w > 0.

Thus W has an exponential distribution with parameter ∑_{i=1}^n λi.

If there are n identical components, functioning simultaneously and each having an exponential lifetime distribution with parameter λ, the length of time until the first failure of one of the components has an exponential distribution with parameter nλ.
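This last statement is easy to check by simulation; a sketch assuming NumPy, with the illustrative choices n = 5 and λ = 0.4, so that E(W) = 1/(nλ) = 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 5, 0.4                    # illustrative choices

# 200,000 replications of n i.i.d. exponential lifetimes; W is the first failure.
T = rng.exponential(scale=1.0 / lam, size=(200_000, n))
W = T.min(axis=1)

print(W.mean(), 1.0 / (n * lam))             # E(W) should be close to 0.5
w = 1.0
print(np.mean(W > w), np.exp(-n * lam * w))  # survivor function of W at w
```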

1.2 Counting variables and probability generating functions

Let N be a non-negative discrete random variable. The distribution of such a r.v. is specified by its probability mass function, the sequence (pn : n ≥ 0), where

pn = P(N = n) n ≥ 0.

Such a r.v. N commonly represents a count, e.g., the number of accidents that have occurred in a given period of time, the number of individuals in some population at a given point of time, or the number of customers who have joined a queue by a given time.

1.2.1 Probability generating functions — a review of their properties

The probability generating function (p.g.f.) G(z) of N, or of (pn), is defined by

G(z) = ∑_{n=0}^∞ pn z^n.

Equivalently,

G(z) = E(z^N).

Because ∑_{n=0}^∞ pn = 1, the p.g.f. is always defined for at least |z| < 1, and G(1) = 1. Assuming that µ ≡ E(N) and σ² ≡ var(N) are both finite,

µ = dG(z)/dz |_{z=1}

and

σ² = d²G(z)/dz² |_{z=1} + µ − µ².
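These moment formulas can be verified symbolically. A minimal SymPy sketch using the Poisson p.g.f. G(z) = e^{µ(z−1)} (given in Section 1.2.2 below), for which both the mean and the variance should reduce to µ:

```python
import sympy as sp

z, mu = sp.symbols('z mu', positive=True)
G = sp.exp(mu * (z - 1))                 # Poisson p.g.f. (Section 1.2.2)

mean = sp.diff(G, z).subs(z, 1)          # G'(1)
var = sp.diff(G, z, 2).subs(z, 1) + mean - mean**2
print(sp.simplify(mean), sp.simplify(var))   # both reduce to mu
```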

If N1,N2,...,Nr are independently distributed r.v.s with p.g.f.s G1,G2,...,Gr, respectively, and Sr = N1 + N2 + ... + Nr, then the p.g.f. GSr of Sr is given by

GSr(z) = ∏_{i=1}^r Gi(z).

1.2.2 Examples of p.g.f.s

The Poisson distribution with parameter µ, so that

pn = e^{−µ} µ^n / n!,  n ≥ 0,

has G(z) = e^{µ(z−1)}.

The binomial distribution with parameters K and p, so that

pn = C(K, n) p^n q^{K−n},  n = 0, 1, ..., K,

where q = 1 − p, has G(z) = (pz + q)^K.

The shifted geometric distribution with parameter p, so that

pn = q^n p,  n ≥ 0,

where q = 1 − p, has

G(z) = p / (1 − qz).
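As a check on the last formula, expanding G(z) = p/(1 − qz) as a power series should recover the probabilities pn = q^n p. A SymPy sketch (the value p = 1/3 is arbitrary):

```python
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(1, 3)                        # arbitrary illustrative value
q = 1 - p

G = p / (1 - q * z)                          # p.g.f. of the shifted geometric
series = sp.series(G, z, 0, 6).removeO()
for n in range(6):
    print(n, series.coeff(z, n), q**n * p)   # coefficients match q^n p
```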

1.2.3 Expectation revisited

The mean, the expectation, of the r.v. N is given by

µ ≡ E(N) = ∑_{n=1}^∞ n pn = lim_{r→∞} ∑_{n=1}^r n pn.

Note that 0 ≤ µ ≤ +∞. We now find an alternative expression for µ that is the analogue for positive discrete r.v.s of the expression that we found earlier for continuous lifetime distributions.

Lemma 1.2.1 If µ is finite then rP(N ≥ r) → 0 as r → ∞.

Proof.

0 ≤ rP(N ≥ r) = ∑_{n=r}^∞ r pn ≤ ∑_{n=r}^∞ n pn → 0

as r → ∞, since µ is finite. □

Theorem 1.2.2 Whether or not µ is finite,

µ = ∑_{n=1}^∞ P(N ≥ n).

Proof.

∑_{n=1}^r n pn = ∑_{n=1}^r n [P(N ≥ n) − P(N ≥ n + 1)]
             = ∑_{n=1}^r n P(N ≥ n) − ∑_{n=2}^{r+1} (n − 1) P(N ≥ n)
             = ∑_{n=1}^r P(N ≥ n) − r P(N ≥ r + 1).

Letting r → ∞, and using the result of Lemma 1.2.1 if µ is finite, and interchanging the order of summation on the L.H.S. otherwise, we obtain the result of Theorem 1.2.2. □
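A numerical illustration of Theorem 1.2.2, assuming SciPy, with a Poisson(µ) count and the infinite sum truncated where the tail is negligible (µ = 2.5 is an arbitrary choice):

```python
from scipy.stats import poisson

mu = 2.5                                   # arbitrary illustrative value
N = poisson(mu)

# P(N >= n) = sf(n - 1) in SciPy's convention; truncate where the tail is tiny.
tail_sum = sum(N.sf(n - 1) for n in range(1, 200))
print(tail_sum, N.mean())                  # both approximately 2.5
```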

1.3 The Poisson process

In this and later sections we shall be considering families of r.v.s {N(t) : t ≥ 0}, which are known as stochastic processes (or random processes) in continuous time. The r.v. N(t) will typically represent the size of a population at time t, or the number of individuals in a queue at time t, or the number of events of a certain type that have occurred by time t. Thus, for each t ≥ 0, N(t) is a non-negative discrete r.v. — a r.v. of the kind discussed in Section 1.2. The continuous time variable t will take non-negative real values. In the present section, let N(t) represent the number of arrivals (or events) that occur during the time interval (0, t], where the arrivals are assumed to occur completely at random.

• For example, we might have in mind the modelling of the arrivals of customers to join a queue or the modelling of the occurrences of accidents of a certain kind in some industrial environment. How well such processes may be modelled by a Poisson process, as we are about to define it, is open to discussion. If a Geiger counter is placed in a location where radioactivity remains at a steady level, the radioactive particles hitting the counter and being recorded form a Poisson process.

The state space for the process {N(t) : t ≥ 0} is the set of all values that each of the r.v.s N(t) can take. In this case it is the set of all non-negative integers. Define N(0) = 0 and note that, for h > 0, N(t + h) − N(t) represents the number of arrivals in the time interval (t, t + h]. A Poisson process with rate (or intensity) λ (where λ > 0) may be defined by the following postulates:

1. The numbers of arrivals that occur in disjoint intervals of time are independently dis- tributed (the property of “independent increments”).

2. The distribution of the number of arrivals in the time interval (t, t + h] depends only on h and not on t (the property of “time stationarity”).

3. As h → 0, P(N(t + h) − N(t) = 1) = λh + o(h).

4. As h → 0, P(N(t + h) − N(t) ≥ 2) = o(h).

Postulates 3 and 4 together imply that, as h → 0,

P(N(t + h) − N(t) = 0) = 1 − λh + o(h).

Let pn(t) = P(N(t) = n) n ≥ 0. We have the initial conditions that

p0(0) = 1,

pn(0) = 0 n ≥ 1.

The forward equations are obtained by splitting up the time interval (0, t+h] into the intervals (0, t] and (t, t + h]. Firstly,

p0(t + h) = p0(t)[1 − λh + o(h)].

Rearranging this equation and letting h → 0, we obtain

dp0(t)/dt = −λ p0(t),  n = 0.  (6)

For n ≥ 1,

pn(t + h) = pn(t)[1 − λh + o(h)] + pn−1(t)[λh + o(h)] + o(h).

Letting h → 0,

dpn(t)/dt = −λ pn(t) + λ pn−1(t),  n ≥ 1.  (7)

There are a number of different approaches to solving the set of differential-difference equations (6) and (7). We shall adopt an approach based on the use of p.g.f.s: define G(z, t) to be the p.g.f. of N(t),

G(z, t) = ∑_{n=0}^∞ pn(t) z^n.

Multiplying the equations (6) and (7) by z^n and summing over n,

∑_{n=0}^∞ (dpn(t)/dt) z^n = −λ ∑_{n=0}^∞ pn(t) z^n + λ ∑_{n=1}^∞ pn−1(t) z^n,

i.e.,

∂G/∂t = −λ(1 − z) G.  (8)

The solution of Equation (8) for any given z is given by

∫ dG/G = −∫ λ(1 − z) dt.

Integrating from 0 to t,

ln G(z, t) − ln G(z, 0) = −λt(1 − z),

so that

G(z, t) = G(z, 0) e^{−λt(1−z)}.

But the initial conditions imply G(z, 0) = 1. Hence

G(z, t) = e^{−λt(1−z)}.
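It is straightforward to verify directly that this G satisfies Equation (8) together with the initial condition G(z, 0) = 1; a short SymPy sketch:

```python
import sympy as sp

z, t, lam = sp.symbols('z t lam', positive=True)
G = sp.exp(-lam * t * (1 - z))

# Equation (8): dG/dt + lam*(1 - z)*G should vanish identically.
print(sp.simplify(sp.diff(G, t) + lam * (1 - z) * G))  # 0
print(G.subs(t, 0))                                    # 1, the initial condition
```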

[Figure: a sample path of the process N(t) plotted against t. N(t) steps from 0 up to 1, 2, 3 at the arrival times S1, S2, S3, and T1, T2, T3 are the corresponding inter-arrival times.]

Thus N(t) has the Poisson distribution with parameter λt.

• This result provides one explanation for the common occurrence of the Poisson distribution in the statistical analysis of data that represent counting variables.

• If the arrival rate is λ, it makes intuitive sense that the expected number of arrivals in the time interval (0, t] is λt.
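The distributional result can also be checked by simulation, building the process from i.i.d. exponential inter-arrival times (anticipating the characterization given below); a sketch assuming NumPy and SciPy, with arbitrary illustrative values of λ and t.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
lam, t, reps = 1.2, 4.0, 100_000           # illustrative choices

# Generate far more inter-arrival times per replication than are likely to be
# needed, then count how many arrival times S_n fall in (0, t].
T = rng.exponential(scale=1.0 / lam, size=(reps, 50))
S = T.cumsum(axis=1)
counts = (S <= t).sum(axis=1)

# Empirical distribution of N(t) against the Poisson(lambda * t) p.m.f.
for n in range(4):
    print(n, np.mean(counts == n), poisson.pmf(n, lam * t))
```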

Define the sequence of r.v.s (Sn) by

Sn = inf{t ≥ 0 : N(t) = n} n ≥ 0.

Trivially, S0 = 0; but, for n ≥ 1, Sn represents the time at which the n-th arrival occurs. Define the sequence of r.v.s (Tn) by

Tn = Sn − Sn−1 n ≥ 1.

The r.v. T1 ≡ S1 represents the length of time to the first arrival; for n ≥ 2 the r.v. Tn represents the length of time between the (n − 1)-th and the n-th arrival. The Tn are positive continuous r.v.s of the type studied in Section 1.1. Note that

Sn = T1 + T2 + ... + Tn n ≥ 1. Note also the equivalence that

{Sn > t} = {N(t) ≤ n − 1} n ≥ 1. In particular, {T1 > t} = {N(t) = 0}.

It follows that the survivor function Q(t) for T1 is given by

Q(t) = P(T1 > t) = P(N(t) = 0) = e^{−λt},  t ≥ 0,

the zero term of the Poisson distribution with parameter λt. Hence T1 has the exponential distribution with parameter λ. By the independent increments and time stationarity properties of the Poisson process, given any time point t, the length of time until the next arrival has an exponential distribution with parameter λ. Furthermore, it follows that the sequence (Tn) is a sequence of i.i.d. r.v.s, each having an exponential distribution with parameter λ. This property may be taken as an alternative characterization of a Poisson process.

Definition 1.3.1 (Poisson process) A Poisson process with rate λ is a process of arrivals in which inter-arrival times are inde- pendently and identically distributed, each having an exponential distribution with parameter λ.

The r.v. Sn, a sum of n i.i.d. exponentially distributed r.v.s, has a gamma distribution with p.d.f.

λ^n t^{n−1} e^{−λt} / (n − 1)!,  t ≥ 0.
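This gamma distribution for Sn can be checked by simulation; a sketch assuming NumPy and SciPy, comparing empirical quantiles of Sn with those of the Gamma(n, rate = λ) distribution (λ = 2 and n = 3 are arbitrary choices).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
lam, n = 2.0, 3                            # illustrative choices

# S_n as the sum of n i.i.d. exponential(lambda) inter-arrival times.
S = rng.exponential(scale=1.0 / lam, size=(500_000, n)).sum(axis=1)

# Compare empirical quantiles with those of the Gamma(n, rate = lambda) law.
for q in (0.25, 0.5, 0.75):
    print(q, np.quantile(S, q), gamma.ppf(q, a=n, scale=1.0 / lam))
```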

Appendix A: The Gamma Function

Definition. The gamma function, denoted by Γ(·), is given by

Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt  for α > 0.

Proposition (Generalization of the factorial function to non-integer values)

Γ(α + 1) = αΓ(α),  α > 0.

Proof. Use integration by parts:

Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt = [t^α e^{−t} / α]_0^∞ − ∫_0^∞ (t^α / α)(−e^{−t}) dt
     = 0 + (1/α) ∫_0^∞ t^{(α+1)−1} e^{−t} dt = (1/α) Γ(α + 1)

⇒ Γ(α + 1) = αΓ(α).

(Note that t^α e^{−t} → 0 as t → ∞ for α > 0.) □

Lemma (More useful results)

(i) Γ(1) = 1.
(ii) Γ(n + 1) = n! for n ∈ ℤ⁺.
(iii) Γ(1/2) = √π.

Proof.

(i) Γ(1) = ∫_0^∞ e^{−t} dt = 1.

(ii) Set α = n in the earlier proposition. Then

Γ(n + 1) = nΓ(n) = n(n − 1)Γ(n − 1) = ... = n!Γ(1) = n!

(iii)

Γ(1/2) = ∫_0^∞ t^{−1/2} e^{−t} dt.

Put x = √(2t). Then

Γ(1/2) = ∫_0^∞ (√2 / x) e^{−x²/2} x dx = √2 ∫_0^∞ e^{−x²/2} dx = (√2/2) ∫_{−∞}^∞ e^{−x²/2} dx
       = (√2/2) × √(2π) × ∫_{−∞}^∞ (1/√(2π)) e^{−x²/2} dx = √π × 1 = √π,

where the final integral equals 1 because the integrand is the p.d.f. of N(0, 1). □

Appendix B: Some Standard Discrete Distributions

For each distribution below: probability mass function PX(k), mean E[X], variance var(X) and p.g.f. GX(z).

Bernoulli(θ), θ ∈ (0, 1):
  PX(k) = θ^k (1 − θ)^{1−k}, k = 0 or 1;
  E[X] = θ;  var(X) = θ(1 − θ);  GX(z) = (1 − θ) + θz.

Binomial(n, θ), n ∈ ℤ⁺, θ ∈ (0, 1):
  PX(k) = C(n, k) θ^k (1 − θ)^{n−k}, k = 0, 1, . . . , n;
  E[X] = nθ;  var(X) = nθ(1 − θ);  GX(z) = [(1 − θ) + θz]^n.

Poisson(λ), λ > 0:
  PX(k) = e^{−λ} λ^k / k!, k = 0, 1, . . . ;
  E[X] = λ;  var(X) = λ;  GX(z) = e^{λ(z−1)}.

Geometric(θ), θ ∈ (0, 1):
  PX(k) = θ(1 − θ)^{k−1}, k = 1, 2, . . . ;
  E[X] = 1/θ;  var(X) = (1 − θ)/θ²;  GX(z) = θz / (1 − (1 − θ)z).

Shifted Geometric(θ), θ ∈ (0, 1):
  PX(k) = θ(1 − θ)^k, k = 0, 1, 2, . . . ;
  E[X] = (1 − θ)/θ;  var(X) = (1 − θ)/θ²;  GX(z) = θ / (1 − (1 − θ)z).

Negative Binomial(n, θ), n ∈ ℤ⁺, θ ∈ (0, 1):
  PX(k) = C(k − 1, n − 1) θ^n (1 − θ)^{k−n}, k = n, n + 1, . . . ;
  E[X] = n/θ;  var(X) = n(1 − θ)/θ²;  GX(z) = [θz / (1 − (1 − θ)z)]^n.

Discrete Uniform(n), n ∈ ℤ⁺:
  PX(k) = 1/n, k = 1, 2, . . . , n;
  E[X] = (n + 1)/2;  var(X) = (n² − 1)/12;  GX(z) = z(1 − z^n) / (n(1 − z)).

Appendix C: Some Standard Continuous Distributions

For each distribution below: probability density function fX(x), mean E[X], variance var(X) and m.g.f. MX(t).

Uniform(a, b), −∞ < a < b < ∞:
  fX(x) = 1/(b − a), x ∈ (a, b];
  E[X] = (a + b)/2;  var(X) = (b − a)²/12;  MX(t) = (e^{bt} − e^{at}) / ((b − a)t).

Exponential or Exp(β), β > 0:
  fX(x) = βe^{−βx}, x > 0;
  E[X] = 1/β;  var(X) = 1/β²;  MX(t) = β/(β − t).

Gamma(α, β), α > 0, β > 0:
  fX(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}, 0 < x < ∞;
  E[X] = α/β;  var(X) = α/β²;  MX(t) = (β/(β − t))^α.

Normal N(µ, σ²), µ ∈ ℝ, σ² > 0:
  fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞;
  E[X] = µ;  var(X) = σ²;  MX(t) = e^{µt + σ²t²/2}.

Standard Normal N(0, 1):
  fX(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞;
  E[X] = 0;  var(X) = 1;  MX(t) = e^{t²/2}.

Beta(α, β), α > 0, β > 0:
  fX(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}, 0 < x < 1;
  E[X] = α/(α + β);  var(X) = αβ / ((α + β + 1)(α + β)²);  MX(t): no simple/useful form.

Cauchy(α, β), α ∈ ℝ, β > 0:
  fX(x) = 1 / (πβ[1 + (x − α)²/β²]), −∞ < x < ∞;
  E[X], var(X) and MX(t) do not exist.
