Math 345 - Stochastic processes - Spring 2020

2 Poisson processes

2.1 Arrival processes

Poisson processes are used to model continuous-time arrivals into a system (customers, orders, signals, packets), and can be thought of as continuous-time analogs of Bernoulli processes. In a Bernoulli process the arrival times are fixed and discrete, while for general arrival processes the arrival times themselves are random variables, and form a random process.

Definition 2.1. An arrival process is a sequence of increasing rv's 0 < S_1 < S_2 < ..., where the inequality S_j < S_{j+1} is meant in the sense that the difference S_{j+1} − S_j is a positive rv X (i.e. F_X(0) = 0). The rv's S_j, j = 1, 2, ..., are called arrival epochs, and they give the times of the first, second, etc. arrivals.

One can use an arrival process to model any random repeating phenomenon, where the epochs S_j represent the times when the phenomenon occurs. Notice that the process starts at time t = 0, and simultaneous arrivals are ruled out by the strict inequalities between the epochs. However, one can assign an additional sequence of random variables to the arrival epochs to track multiple arrivals. For a continuous arrival process the arrival epochs are continuous rv's, and hence the probability of an arrival at any fixed time is zero. To fully describe the arrival process one needs to specify the joint distribution of S_1, S_2, ....

Instead of defining the arrival process via the arrival epochs, one can also give the interarrival times, just as in the case of the Bernoulli process. For example, the first interarrival time is the time until the first arrival occurs, or X_1 = S_1; the second interarrival time is the time between the first and second arrivals, or X_2 = S_2 − S_1; and in general X_j = S_j − S_{j−1}. Thus we have the following relation between the interarrival times and the arrival epochs:

X_1 = S_1, \quad X_2 = S_2 - S_1, \quad \ldots, \quad X_j = S_j - S_{j-1}, \qquad\text{and}\qquad S_n = \sum_{j=1}^{n} X_j.

In practice, the interarrival times are IID rv's, and it's easier to specify an arrival process via the interarrival times than via the joint distribution of the arrival epochs.

An alternative way of characterizing an arrival process is through the aggregate number of arrivals, which in the case of continuous arrival processes will be a continuum family of discrete rv's, denoted by {N(t); t > 0}, which tracks the total number of arrivals up to and including time t. Thus N(t) = 0 means that no arrival occurred in the interval (0, t]; N(t) = 1 means that one arrival occurred in the same interval, etc. We will refer to the family {N(t)} as a counting process. We will take N(0) = 0 with probability 1, since the probability of an arrival occurring at t = 0 is nil. Clearly the counting process {N(t); t > 0} satisfies the property

N(\tau) \geq N(t) \quad \text{for } \tau \geq t > 0,

which is again understood in the sense that the difference is a nonnegative rv. We also have the following equality of events,

\{S_n \leq t\} = \{N(t) \geq n\},

since the event on the left means that the nth arrival happened at some epoch τ ≤ t, while the event on the right means that at least n arrivals occurred in the interval (0, t] (for a rigorous proof, show that the set on the left is a subset of the set on the right and vice versa). Thus the joint CDF of the arrival epochs can be found from the joint CDF of the counting process, and vice versa. So an arrival process can be characterized by either the arrival epochs, the counting process of aggregate arrivals, or the interarrival times. This gives the freedom to use whichever description is most convenient for the problem at hand.
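To see concretely how the three descriptions fit together, here is a minimal numpy sketch (the uniform interarrival law and all parameter values are arbitrary choices): the epochs are cumulative sums of the interarrival times, N(t) counts the epochs in (0, t], and the events {S_n ≤ t} and {N(t) ≥ n} coincide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Interarrival times: any positive IID law works; uniform is an arbitrary choice.
X = rng.uniform(0.1, 2.0, size=20)
S = np.cumsum(X)                       # arrival epochs S_1 < S_2 < ...

def N(t):
    """Counting process: number of epochs in (0, t]."""
    return np.searchsorted(S, t, side="right")

# The events {S_n <= t} and {N(t) >= n} agree at every time checked.
n = 5
for t in np.linspace(0.0, S[-1], 50):
    assert (S[n - 1] <= t) == (N(t) >= n)
```

A particular class of arrival processes are renewal processes.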

Definition 2.2. An arrival process is called a renewal process, if the interarrival times are positive IID rv’s.

Poisson processes form a special subclass of renewal processes, which we discuss in detail next.

2.2 Poisson processes

Definition 2.3. A Poisson process is a renewal process where the interarrival times are IID exponential rv's; that is, for some λ > 0, called the rate of the process, the interarrival times are represented by IID rv's {X_j; j = 1, 2, ...} with the common density

f_X(t) = \lambda e^{-\lambda t}, \quad t \geq 0.

We will shortly see that for a time interval of length t the expected number of arrivals is λt, justifying the use of the term rate for λ; the simulation sketch below illustrates this. We next discuss an important property of exponential rv's which will be crucial for understanding the significance of Poisson processes in modeling various processes (such as in queuing theory).
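First, though, the promised sanity check of the rate interpretation: a minimal numpy sketch (rate, horizon, and sample size are arbitrary) that simulates the process from its exponential interarrival times and compares the average of N(t) with λt.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t, trials = 2.0, 10.0, 100_000    # arbitrary rate, horizon, sample size

# Enough interarrival times per trial that the epochs almost surely pass t.
m = int(3 * lam * t) + 20
X = rng.exponential(scale=1 / lam, size=(trials, m))  # IID Exp(lam)
S = np.cumsum(X, axis=1)                              # arrival epochs
N_t = (S <= t).sum(axis=1)                            # N(t) for each trial

print(N_t.mean())   # close to lam * t = 20
```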

Definition 2.4. A rv X has the memoryless property, if X is a positive rv for which

Pr({X > t + x}) = Pr({X > x})Pr({X > t}), for all x, t ≥ 0. (1)

The memoryless property can be expressed via the complementary CDF of the rv, and takes the form

F^c_X(t + x) = F^c_X(x)\, F^c_X(t).

We observe that if (1) holds, then

\Pr(\{X > t + x \mid X > t\}) = \frac{\Pr(\{X > t + x \text{ and } X > t\})}{\Pr(\{X > t\})} = \frac{\Pr(\{X > t + x\})}{\Pr(\{X > t\})} = \Pr(\{X > x\}).

This can be interpreted as follows: If X measures the waiting time (say in minutes) until an arrival (say, of a bus), then the knowledge that an arrival hasn't happened in the first t minutes doesn't influence the probability of waiting an additional x units of time; i.e., the probability that one has to wait at least x more minutes is the same whether one has already waited t minutes or not. Hence the name of the property: memoryless. Notice also that the argument t + x on the left of (1) is just a number; its particular decomposition into t and x plays no role, so any way of splitting the sum gives the same identity. In other words, for a memoryless rv one will have, for example,

Pr({X > 7}) = Pr({X > 4})Pr({X > 3}) = Pr({X > 5})Pr({X > 2}) = Pr({X > 6})Pr({X > 1}) = ...
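Numerically, the memoryless property of the exponential distribution can be checked by estimating both sides of (1) from samples; a minimal sketch (parameter values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, t, x = 1.5, 2.0, 0.7                   # arbitrary parameters
X = rng.exponential(scale=1 / lam, size=1_000_000)

lhs = (X > t + x).mean() / (X > t).mean()   # estimates Pr(X > t+x | X > t)
rhs = (X > x).mean()                        # estimates Pr(X > x)
print(lhs, rhs, np.exp(-lam * x))           # all three nearly agree
```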

We next observe that an exponential rv is memoryless, as the complementary CDF of an exponential rv is

\Pr(\{X > x\}) = F^c_X(x) = \int_x^\infty \lambda e^{-\lambda t}\, dt = e^{-\lambda x},

which obviously satisfies the memoryless property. Conversely, if X is a memoryless continuous rv, then it has to have an exponential PDF. Indeed, if we denote h(x) = ln(Pr({X > x})), then the memoryless property implies that

h(x + t) = h(x) + h(t).

It’s then easy to see that h(x) has to be a linear function over positive integers, since

h(k) = \underbrace{h(1) + \cdots + h(1)}_{k \text{ times}} = k\,h(1) \quad \text{for any } k = 1, 2, \ldots.

This can be extended first to the fractions 1/n, since h(1) = h(1/n + ··· + 1/n) = n h(1/n), so h(1/n) = h(1)/n, and then to all rationals m/n, since h(m/n) = m h(1/n) = (m/n) h(1). To extend this property to irrationals as well, one can use the density of the rationals on the real line and the additional property that h(x) is nonincreasing (since Pr({X > x}) is), that is, h(y) ≤ h(x) for y > x. The monotonicity then implies that nothing crazy can happen over the irrationals, which are limit points of the set of rationals. Thus h must be linear with zero vertical intercept, since h(0) = 0. Then we must have h(x) = −λx for some λ > 0, as h must be zero or negative, being a logarithm of numbers less than or equal to 1. But then this shows that if X is memoryless, then there is a λ > 0 such that

F^c_X(x) = \Pr(\{X > x\}) = e^{-\lambda x} > 0, \quad \text{for all } x \geq 0.

But then the CDF of X is F_X(x) = 1 − e^{−λx}, and differentiating the last expression one obtains the exponential PDF. We also observe that if a memoryless process is discrete (think of restricting arrival times to integer values, for example), then it has to have the geometric PMF, that is, P_X(k) = p(1 − p)^{k−1}. One can see that a geometric rv is memoryless by computing its complementary CDF,

F^c_X(m) = \Pr(\{X > m\}) = \sum_{j=m+1}^{\infty} p(1-p)^{j-1} = \frac{p(1-p)^m}{1-(1-p)} = (1-p)^m,

where we used the formula for the sum of a geometric progression. But then it's clear that F^c_X(m) = e^{m\ln(1−p)}, where ln(1 − p) plays the role of the −λ of the exponential rv. So, following the same ideas as in the previous proof, one can show that a memoryless discrete rv must be geometric. We will shortly see that the Bernoulli process, which has IID geometrically distributed interarrival times, can be thought of as a discrete-time analog of the Poisson process, which has IID exponential rv's as the interarrival times. Hence it's not surprising that in both cases the interarrival times are memoryless. The memoryless property can be generalized to the entire Poisson process, and we explore this by looking at interarrival times starting from some cutoff time t.
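Before stating this as a theorem, here is a quick numerical check of the geometric memorylessness just derived (a minimal numpy sketch; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p, t, x = 0.2, 5, 3                         # arbitrary parameters
# rng.geometric takes values 1, 2, ... with PMF p(1-p)^(k-1).
X = rng.geometric(p, size=1_000_000)

lhs = (X > t + x).mean() / (X > t).mean()   # estimates Pr(X > t+x | X > t)
print(lhs, (1 - p) ** x)                    # both equal (1-p)^x
```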

Theorem 2.5. For a Poisson process of rate λ, and any given time t > 0, the length of the interval from t until the first arrival after t is a positive rv Z with CDF 1 − e^{−λz} for z ≥ 0. Furthermore, Z is independent of both N(t) and the n arrival epochs before time t, where n = N(t). It is also independent of the set of rv's {N(τ); 0 < τ ≤ t}.

To prove the theorem, we will compute the complementary CDF F^c_Z(z) = Pr({Z > z}) by conditioning on N(t) and the previous arrival epochs. We first condition on the event {N(t) = 0}; since this means that no arrival has occurred before time t, the next arrival after time t will be the first arrival of the entire Poisson process, and its time will be given by the first interarrival time X_1:

\Pr(\{Z > z \mid N(t) = 0\}) = \Pr(\{X_1 > z + t \mid N(t) = 0\}) = \Pr(\{X_1 > z + t \mid X_1 > t\}) = \Pr(\{X_1 > z\}) = e^{-\lambda z},

where we used the fact that the events {N(t) = 0} and {X_1 > t} are the same, and that X_1 is an exponential rv, hence memoryless. Notice that we used the conditioning to relate the calculation of the complementary CDF of Z to the complementary CDF of X_1, and used the memoryless property of the latter. Next, we condition on {N(t) = n} and {S_n = τ}. Then the first arrival after t is the first arrival after τ = S_n, so Z > z will mean X_{n+1} > z + (t − τ). Using this, we have

\Pr(\{Z > z \mid N(t) = n, S_n = \tau\}) = \Pr(\{X_{n+1} > z + (t-\tau) \mid N(t) = n, S_n = \tau\})
= \Pr(\{X_{n+1} > z + (t-\tau) \mid X_{n+1} > t-\tau, S_n = \tau\})
= \frac{\Pr(\{X_{n+1} > z + (t-\tau)\})}{\Pr(\{X_{n+1} > t-\tau\})} = \Pr(\{X_{n+1} > z\}) = e^{-\lambda z},

where we used the equality of the events {N(t) = n, S_n = τ} and {S_n = τ, X_{n+1} > t − τ}, as both indicate that the nth arrival happens at τ and no subsequent arrival happens until after time t; the independence of the (n + 1)th interarrival time from the nth arrival epoch; and the exponential distribution, hence memorylessness, of X_{n+1}. Clearly, conditioning on {N(τ) = n; 0 < τ ≤ t} gives the same result, and proves the independence from the set of rv's N(τ) for 0 < τ ≤ t. We remark that the proof once again used conditioning to relate the complementary CDF of Z to the complementary CDF of an interarrival time, making use of the latter's memorylessness. Thus, we see that the CDF of Z is independent of N(t) and of the arrival epochs before time t, and is exponential.

We next define the subsequent interarrival times as follows: the time until the first arrival after time t will be renamed Z_1 = Z; Z_2 will denote the time between the first and second arrivals after time t, and so on. From Z_2 on, these interarrival times coincide with the original interarrival times. More specifically, given N(t) = n and S_n = τ, we have Z_m = X_{m+n} for m = 2, 3, ..., and using the fact that Z_1 = X_{n+1} − (t − τ) and the memorylessness of X_{n+1}, we see that conditional on {N(t) = n, S_n = τ} the rv's Z_1, Z_2, ... are IID exponential rv's. But as this conditional distribution is independent of n and τ, as we saw above, the rv's Z_1, Z_2, ... are unconditionally IID.

Poisson processes were defined via the interarrival times being IID exponential rv's, and we saw above that the interarrival times after some cutoff time t are also IID exponential rv's with the same rate λ. Thus the process restarted at any time t is a probabilistic replica of the original process started at t = 0, which is indicative of the memoryless property. We can also relate the memoryless property of the Poisson process to its counting process. Denoting the number of arrivals in the interval (t, t′] by Ñ(t, t′) = N(t′) − N(t), the above discussion shows that Ñ(t, t′) has the same distribution as N(t′ − t); a simulation check of Theorem 2.5 is sketched below.
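In this minimal numpy sketch (rate, cutoff, and sample sizes are arbitrary), the waiting time Z from t to the next arrival has mean 1/λ, and its conditional mean given N(t) = n does not visibly depend on n.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, t, trials = 1.0, 5.0, 200_000     # arbitrary rate, cutoff, sample size
m = 40                                 # epochs per trial; S_m >> t w.h.p.

X = rng.exponential(scale=1 / lam, size=(trials, m))
S = np.cumsum(X, axis=1)               # arrival epochs per trial

N_t = (S <= t).sum(axis=1)             # arrivals in (0, t]
Z = S[np.arange(trials), N_t] - t      # S_{N(t)+1} - t: wait after time t

print(Z.mean())                        # close to 1/lam = 1.0
for n in range(3):                     # conditional means don't depend on n
    print(n, Z[N_t == n].mean())
```

We define the above distributional property of the increments as a standalone property of counting processes.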

Definition 2.6. A counting process {N(t); t > 0} has the stationary increment property if Ñ(t, t′) = N(t′) − N(t) has the same CDF as N(t′ − t) for all t′ > t > 0.

Given the stationary increment property, it's clear that for a Poisson process the distribution of the number of arrivals in an interval depends only on the size of the interval, and not on its starting point.

Definition 2.7. A counting process {N(t); t > 0} has the independent increment property if for every positive integer k and every k-tuple of times 0 < t_1 < t_2 < ··· < t_k, the k rv's

N(t_1), Ñ(t_1, t_2), ..., Ñ(t_{k−1}, t_k) are independent.

Using the memoryless property of the Poisson process and an induction argument, it's not hard to see that Poisson counting processes have the independent increment property. Indeed, let the independence of arrival increments hold for some k, and let t = t_k be the last of the times in the k-tuple. Then the rv Z for the next arrival time after t = t_k is independent of all N(τ) for 0 < τ ≤ t_k, and hence also of the rv's N(t_1), Ñ(t_1, t_2), ..., Ñ(t_{k−1}, t_k). But the subsequent interarrival times are also independent of these, hence so is the aggregate number of arrivals after time t_k, and consequently so are the subsequent arrival increments. Thus Ñ(t_k, t_{k+1}) will be independent of the previous arrival increments for any t_{k+1} > t_k, which finishes the induction argument. We summarize the above in the following theorem.

Theorem 2.8. Poisson counting processes have both the independent and stationary increment properties.

If one considers only integer times, then it's clear that Bernoulli processes have the same properties.
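Theorem 2.8 can also be probed by simulation; in this minimal sketch (all parameter values arbitrary) the count N(3) and the increment Ñ(3, 7) are uncorrelated, and Ñ(3, 7) has the mean and variance of N(4), i.e. of a Poisson rv with mean 4λ. (Zero correlation is of course only consistent with, not a proof of, independence.)

```python
import numpy as np

rng = np.random.default_rng(5)
lam, trials, m = 2.0, 200_000, 100     # arbitrary rate and sample sizes
X = rng.exponential(scale=1 / lam, size=(trials, m))
S = np.cumsum(X, axis=1)               # arrival epochs per trial

def count(a, b):
    """Number of arrivals in (a, b] for each trial."""
    return ((S > a) & (S <= b)).sum(axis=1)

N1, inc = count(0.0, 3.0), count(3.0, 7.0)   # N(3) and the increment on (3, 7]

print(np.corrcoef(N1, inc)[0, 1])      # near 0 (consistent with independence)
print(inc.mean(), inc.var())           # both near 4 * lam = 8, like N(4)
```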

2.3 Probability distribution of arrival epochs

Poisson processes were defined through the interarrival times, which are IID exponential rv's. To be able to answer practical questions, such as the expected number of arrivals or the probability of an arrival occurring in a given interval, we will need the joint probability distribution of the arrival epochs {S_j; j = 1, 2, ...} and of the counting process {N(t); t > 0}. As the arrival epochs are sums of the interarrival times, whose probability distributions are known, we will first calculate the joint densities of the S_j's, and using these, will then calculate the mass functions of the rv's N(t) via the relation of the appropriate events. Recall that

S_1 = X_1,
S_2 = X_1 + X_2,
S_3 = X_1 + X_2 + X_3 = S_2 + X_3,
\quad\vdots

So the density of S_1 is f_{S_1}(t) = f_{X_1}(t) = \lambda e^{-\lambda t}. The density of S_2 can be calculated using a convolution as follows:

f_{S_2}(t) = (f_{X_1} * f_{X_2})(t) = \int_0^t f_X(y)\, f_X(t-y)\, dy = \int_0^t \lambda e^{-\lambda y}\, \lambda e^{-\lambda(t-y)}\, dy = \int_0^t \lambda^2 e^{-\lambda t}\, dy = \lambda^2 t\, e^{-\lambda t}.

Similarly, using S_3 = S_2 + X_3, we will have

f_{S_3}(t) = (f_{S_2} * f_{X_3})(t) = \int_0^t f_{S_2}(y)\, f_X(t-y)\, dy = \int_0^t \lambda^2 y e^{-\lambda y}\, \lambda e^{-\lambda(t-y)}\, dy = \lambda^3 e^{-\lambda t} \int_0^t y\, dy = \frac{\lambda^3 t^2}{2}\, e^{-\lambda t}.

Using an inductive argument and similar convolution calculations, one can show that

f_{S_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!},

which is Erlang's density function. As the arrival epochs are not independent, we need their joint distribution functions in order to have a complete description of the probability distribution of the arrivals in the Poisson process. We compute the joint density functions by conditioning on the previous arrival epochs. For the joint density of S_1 = X_1 and S_2 = X_1 + X_2 we have

f_{S_1 S_2}(s_1, s_2) = f_{S_2 \mid S_1}(s_2 \mid s_1)\, f_{S_1}(s_1) = f_{X_2}(s_2 - s_1)\, f_{X_1}(s_1) = \lambda e^{-\lambda(s_2 - s_1)}\, \lambda e^{-\lambda s_1} = \lambda^2 e^{-\lambda s_2},

which holds for 0 ≤ s_1 ≤ s_2. Notice that the expression for the joint density of S_1 and S_2 is independent of s_1, which is accounted for only through the restriction 0 ≤ s_1 ≤ s_2. Thus, given S_2 = s_2, the first arrival epoch S_1 is uniformly distributed in the interval (0, s_2] (all arrival times S_1 = s_1 are equiprobable, or rather all intervals of the same length are equiprobable, as S_1 is a continuous rv and the probability of S_1 = s_1 is zero for any s_1). We can compute the marginal density of S_2 by integrating f_{S_1 S_2}(s_1, s_2) with respect to s_1 over the interval [0, s_2], which, given that the joint density doesn't depend on s_1, amounts to multiplying it by the length s_2 of the interval of integration. This again results in the Erlang density for S_2. In the same way we can compute the joint density of S_1, ..., S_n by conditioning S_n on all the previous arrival epochs, using the fact S_n = S_{n−1} + X_n along with an inductive argument.

Theorem 2.9. Let X_1, X_2, ... be IID rv's with the exponential density f_X(x) = \lambda e^{-\lambda x} for x ≥ 0. Let S_n = X_1 + ··· + X_n for any n ≥ 1. Then for any n ≥ 2 the joint density function of the rv's {S_j}_{j=1}^n is given by

f_{S_1 S_2 \ldots S_n}(s_1, s_2, \ldots, s_n) = \lambda^n e^{-\lambda s_n}, \quad 0 \leq s_1 \leq s_2 \leq \cdots \leq s_n. \qquad (2)

Indeed, the case n = 2 was shown above, and assuming the formula for the density holds for n − 1, we will have

f_{S_1 S_2 \ldots S_n}(s_1, s_2, \ldots, s_n) = f_{S_n \mid S_1 S_2 \ldots S_{n-1}}(s_n \mid s_1, s_2, \ldots, s_{n-1})\, f_{S_1 S_2 \ldots S_{n-1}}(s_1, s_2, \ldots, s_{n-1})
= f_{X_n}(s_n - s_{n-1})\, \lambda^{n-1} e^{-\lambda s_{n-1}} = \lambda e^{-\lambda(s_n - s_{n-1})}\, \lambda^{n-1} e^{-\lambda s_{n-1}} = \lambda^n e^{-\lambda s_n}.

Similar to the joint density of S_1 and S_2, we see that the expression for the joint density of the arrival epochs is independent of all the arrival times except the last one. Thus, given S_n = s_n, all the previous arrival epochs are uniformly distributed over the region satisfying the constraint 0 ≤ s_1 ≤ s_2 ≤ ··· ≤ s_n. Successive integration with respect to s_1, s_2, ..., s_{n−1} then recovers the marginal density of S_n, which, of course, exactly coincides with the Erlang density computed previously by repeated convolutions of exponential densities.
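Both conclusions of this subsection are easy to probe by simulation. In the following minimal sketch (rate, n, and sample size arbitrary), S_4 has the Erlang mean n/λ and variance n/λ², and conditionally on S_4 falling in a thin slab around s, the first epoch S_1 behaves like the minimum of n − 1 uniform points on (0, s), with mean s/n.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, trials = 1.0, 4, 500_000        # arbitrary rate, n, sample size
X = rng.exponential(scale=1 / lam, size=(trials, n))
S = np.cumsum(X, axis=1)                # epochs S_1, ..., S_n per trial

# Erlang(n, lam): mean n/lam = 4, variance n/lam^2 = 4.
print(S[:, -1].mean(), S[:, -1].var())

# Conditional uniformity: given S_n near 4, S_1 has mean s_n / n = 1.
slab = (S[:, -1] > 3.9) & (S[:, -1] < 4.1)
print(S[slab, 0].mean())                # close to 1.0
```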

2.4 Probability distribution of the counting process

To find the probability distribution of the counting process {N(t); t > 0}, we will rely on the equality of the events

\{N(t) \leq n\} = \{S_{n+1} > t\}. \qquad (3)

The event on the left means that at most n arrivals occurred up to time t, while the one on the right means that the (n + 1)th arrival occurred after time t. Using (3), we can relate the distribution function of N(t) to the complementary CDF of S_{n+1}. The PMF of N(t) will then be calculated from the jumps of the CDF of N(t). As N(t) is a nonnegative rv, we have

P_{N(t)}(0) = F_{N(t)}(0) = \Pr(\{N(t) \leq 0\}) = \Pr(\{S_1 > t\}) = \int_t^\infty \lambda e^{-\lambda y}\, dy = e^{-\lambda t}.

Next, we have

P_{N(t)}(1) = F(1) - F(0) = \Pr(\{N(t) \leq 1\}) - e^{-\lambda t} = \Pr(\{S_2 > t\}) - e^{-\lambda t}
= \int_t^\infty \frac{\lambda^2 y e^{-\lambda y}}{1!}\, dy - e^{-\lambda t}
= \left[-\frac{\lambda^2 y e^{-\lambda y}}{\lambda}\right]_{y=t}^{\infty} + \int_t^\infty \frac{\lambda^2 e^{-\lambda y}}{\lambda}\, dy - e^{-\lambda t}
= \lambda t e^{-\lambda t} + e^{-\lambda t} - e^{-\lambda t} = \lambda t e^{-\lambda t},

where we used integration by parts and the fact that \lim_{t \to \infty} Q(t) e^{-rt} = 0 for any polynomial Q(t) and any r > 0, in order to get rid of the boundary term at infinity. We can similarly compute P_{N(t)}(2), in order to gain further insight into the PMF of N(t):

P_{N(t)}(2) = F(2) - F(1) = \Pr(\{N(t) \leq 2\}) - F(1) = \Pr(\{S_3 > t\}) - F(1)
= \int_t^\infty \frac{\lambda^3 y^2 e^{-\lambda y}}{2!}\, dy - F(1)
= \left[-\frac{\lambda^3 y^2 e^{-\lambda y}}{2!\,\lambda}\right]_{y=t}^{\infty} + \int_t^\infty \frac{\lambda^3 (2y) e^{-\lambda y}}{2!\,\lambda}\, dy - F(1)
= \frac{(\lambda t)^2 e^{-\lambda t}}{2!} + F(1) - F(1) = \frac{(\lambda t)^2 e^{-\lambda t}}{2!},

where we once again used integration by parts to express the complementary CDF of S_3 as a boundary term plus the complementary CDF of S_2. Using an inductive argument and repeating the same type of integration by parts, we can successively find all values of the PMF of N(t).

Theorem 2.10. For a Poisson process of rate λ and any t > 0, the PMF for the rv N(t), the aggregate number of arrivals in the interval (0, t], is given by the Poisson PMF

P_{N(t)}(n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}. \qquad (4)
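The theorem can be verified empirically by simulating the process from its interarrival times and comparing the empirical PMF of N(t) with (4); a minimal sketch (parameters arbitrary):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(7)
lam, t, trials, m = 1.0, 3.0, 500_000, 40   # arbitrary parameters
X = rng.exponential(scale=1 / lam, size=(trials, m))
N_t = (np.cumsum(X, axis=1) <= t).sum(axis=1)

for n in range(6):
    empirical = (N_t == n).mean()
    theory = (lam * t) ** n * exp(-lam * t) / factorial(n)
    print(n, round(empirical, 4), round(theory, 4))   # the columns agree
```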

2.5 Alternative definitions of Poisson processes

Poisson processes were defined as renewal processes with exponentially distributed interarrival times {X_j}_{j=1}^\infty. An alternative way to describe a Poisson process would be via the arrival epochs

{S_j}_{j=1}^\infty with the joint density given by (2). Yet a third description would be via the counting process {N(t); t > 0}, which for a fixed t has the Poisson mass function (4), and also satisfies the stationary and independent increment properties.

Theorem 2.11. An arrival process is Poisson, iff its counting process {N(t); t > 0} has the Poisson PMF (4) and the stationary and independent increment properties.

Although stated as a theorem, this alternative description can be adopted as a new definition of Poisson processes. We already established the implication in one direction, namely that the counting process for a Poisson process has the Poisson PMF and the stationary and independent increment properties. The opposite implication, showing that a Poisson counting process with the stationary and independent increment properties implies IID exponential interarrival times, is left as a homework exercise. We note that the Poisson PMF for {N(t); t > 0} by itself is not enough to imply that the associated process is Poisson, since one can change the joint distribution of the N(t)'s, and hence also of {S_j}_{j=1}^\infty, without changing the marginal distributions, which may lead to interdependent interarrival times. We saw a similar effect for Bernoulli processes, where the binomial PMF for the aggregate arrivals by itself wasn't enough to make the process Bernoulli. Another alternative description of the Poisson process is offered by the increment properties of N(t) over infinitesimal time-increments. We first observe that for small δ > 0,

\Pr(\{\tilde N(t, t + \delta) = 0\}) = \Pr(\{N(\delta) = 0\}) = e^{-\lambda\delta} = 1 - \lambda\delta + o(\delta),
\Pr(\{\tilde N(t, t + \delta) = 1\}) = \Pr(\{N(\delta) = 1\}) = \lambda\delta e^{-\lambda\delta} = \lambda\delta + o(\delta),
\Pr(\{\tilde N(t, t + \delta) \geq 2\}) = \Pr(\{N(\delta) \geq 2\}) = o(\delta),

where we used linear approximations, and the terms o(δ) (pronounced little-o of δ) stand for terms that are at least quadratic in the small parameter δ (formally, o(δ) is any quantity such that o(δ)/δ → 0 as δ → 0). The above approximations for the probabilities of arrival increments imply that the probability of having an arrival in an interval (t, t + δ] is λδ + o(δ), or approximately proportional to the size δ of the interval. One can use these approximations to show that dF^c_X(t)/dt = −λF^c_X(t) for the complementary CDF of the interarrival times, which yields the exponential density. On the other hand, the stationary and independent increment properties can be used to show that the interarrival times are IID. Hence, the above approximations, along with the stationary and independent increment properties, are enough to make the process Poisson, and can be used as an alternative definition of Poisson processes.

Theorem 2.12. An arrival process is Poisson, iff its counting process {N(t); t > 0} has the stationary and independent increment properties and further satisfies the following approximations for any t > 0 and small δ > 0,

\Pr(\{\tilde N(t, t + \delta) = 0\}) = 1 - \lambda\delta + o(\delta),
\Pr(\{\tilde N(t, t + \delta) = 1\}) = \lambda\delta + o(\delta), \qquad (5)
\Pr(\{\tilde N(t, t + \delta) \geq 2\}) = o(\delta).

One difficulty with adopting the above as an alternative definition of the Poisson process is that it's not obvious that the conditions (5) are consistent, and that such a process even exists. Establishing this consistency without relying on the original definition of the Poisson process requires substantial effort. However, using the original definition we derived the above conditions, and can now freely use them.
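The little-o statements in (5) can be illustrated directly from the exact Poisson probabilities: dividing each approximation error by δ and letting δ shrink, the ratios tend to 0. A minimal sketch (the rate is arbitrary):

```python
import numpy as np

lam = 2.0                                    # arbitrary rate
for delta in [0.1, 0.01, 0.001]:
    p0 = np.exp(-lam * delta)                # Pr(no arrival in (t, t+delta])
    p1 = lam * delta * np.exp(-lam * delta)  # Pr(exactly one arrival)
    p2 = 1 - p0 - p1                         # Pr(two or more arrivals)
    # Each error divided by delta shrinks as delta does:
    print(delta,
          abs(p0 - (1 - lam * delta)) / delta,
          abs(p1 - lam * delta) / delta,
          p2 / delta)
```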


2.6 Poisson process as a limit of shrinking Bernoulli processes

Previously we remarked that Bernoulli processes are discrete-time analogs of Poisson processes. Here we show how one can start with a Bernoulli process and, via an appropriate shrinking of the time slots, arrive at a Poisson process in the limit. Let Y_1, Y_2, ... be IID binary rv's with P_Y(1) = p and P_Y(0) = 1 − p. These rv's describe arrivals at the discrete time points t = 1, 2, ..., with Y_j = 1 meaning an arrival occurs at time t = j, and Y_j = 0 meaning no arrival occurs at t = j. The time-increment between two successive possible arrival times is ∆t = j − (j − 1) = 1. We can shrink this to ∆t = 2^{−j}, and look at a Bernoulli process with the discrete arrival time points

t = 2^{-j},\ 2 \cdot 2^{-j},\ 3 \cdot 2^{-j},\ \ldots,\ k \cdot 2^{-j}, \ldots.

We will consider a sequence of such Bernoulli processes, indexed by j, with shrinking time intervals. To keep the arrival rate constant, we let p = λ2^{−j} for the jth process. Then in a unit time there will be 2^j arrival points, and the expected number of arrivals in the jth process per unit time is

\sum_{k=1}^{2^j} 1 \cdot \Pr(\{Y_k = 1\}) = \sum_{k=1}^{2^j} \lambda 2^{-j} = 2^j \cdot \lambda 2^{-j} = \lambda.

One can visualize this shrinking procedure as dividing the arrival slots of the preceding Bernoulli process into two equal slots, and assigning each of these half-slots an arrival probability equal to half that of the previous bigger slot. Taking δ = 2^{−j}, we see that the probability of one arrival in a time-increment of size δ is λδ, and the probability of no arrival is 1 − λδ. So the Bernoulli processes satisfy (5) exactly, without the quadratic error terms. As the goal is to let j go to infinity, we will need to consider arbitrary increment sizes δ. Clearly, disjoint time-increments will have independent arrivals. However, the arrival increments will no longer be stationary, since an interval of size smaller than 2^{−j} may or may not contain an arrival point of the form t = k2^{−j} of the jth Bernoulli process, depending on how it is placed relative to these arrival points. Nevertheless, for a fixed δ the number of arrival points t = k2^{−j}, k = 1, 2, ..., in a time-increment of size δ is either ⌊δ2^j⌋ or 1 + ⌊δ2^j⌋, and in the limit j → ∞ the increments become both independent and stationary. Here we made use of the floor function f(x) = ⌊x⌋, the greatest integer not exceeding x (for positive numbers this coincides with the integer part of the number). The counting process N_j(t) for the jth Bernoulli process is then

N_j(t) = \sum_{i=1}^{\lfloor t2^j \rfloor} Y_i,

where the summation index runs over i = 1, 2, ..., ⌊t2^j⌋, since the arrival points in the interval (0, t] are 2^{−j}, 2·2^{−j}, 3·2^{−j}, ..., ⌊t2^j⌋·2^{−j}.

For each fixed t > 0, Nj(t) is a discrete rv describing the aggregate number of arrivals up to time t, and as we have shown previously, has the binomial PMF

P_{N_j(t)}(n) = \binom{\lfloor t2^j \rfloor}{n}\, p^n (1-p)^{\lfloor t2^j \rfloor - n}, \quad \text{with } p = \lambda 2^{-j}.

This binomial PMF converges to the Poisson PMF as j → ∞; the numerical sketch below illustrates the convergence.
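In this minimal sketch (rate, time, and arrival count are arbitrary), the binomial probabilities with k = ⌊t2^j⌋ slots and per-slot probability p = λ2^{−j} approach the Poisson value as j grows:

```python
from math import comb, exp, factorial

lam, t, n = 1.0, 2.0, 3                  # arbitrary rate, time, arrival count
for j in [4, 8, 12, 16]:
    k = int(t * 2**j)                    # number of slots in (0, t]
    p = lam * 2.0**-j                    # per-slot arrival probability
    print(j, comb(k, n) * p**n * (1 - p) ** (k - n))

print("Poisson:", (lam * t) ** n * exp(-lam * t) / factorial(n))
```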

Theorem 2.13 (Poisson's theorem). Consider the sequence of shrinking Bernoulli processes with arrival probability λ2^{−j} and arrival slots of size 2^{−j}. Then for any fixed time t > 0 and number of arrivals n, the counting PMF P_{N_j(t)}(n) approaches the Poisson PMF with the same rate λ as j → ∞, that is,

\lim_{j\to\infty} P_{N_j(t)}(n) = P_{N(t)}(n).

Moreover, for any integer k > 0 and any k-tuple of times 0 < t_1 < t_2 < ··· < t_k, the joint CDF of N_j(t_1), N_j(t_2), ..., N_j(t_k) converges to the joint CDF of N(t_1), N(t_2), ..., N(t_k) as j → ∞.

The above theorem states that the shrinking Bernoulli processes converge to the Poisson process in the sense of the joint distributions of the associated counting processes. The proof of the theorem is a straightforward limit calculation. Indeed,

\lim_{j\to\infty} P_{N_j(t)}(n) = \lim_{j\to\infty} \binom{\lfloor t2^j \rfloor}{n}\, p^n (1-p)^{\lfloor t2^j \rfloor - n}
= \lim_{j\to\infty} \binom{\lfloor t2^j \rfloor}{n}\, (\lambda 2^{-j})^n (1 - \lambda 2^{-j})^{\lfloor t2^j \rfloor - n}
= \lim_{j\to\infty} \binom{\lfloor t2^j \rfloor}{n} \left(\frac{\lambda 2^{-j}}{1 - \lambda 2^{-j}}\right)^{n} e^{\ln(1 - \lambda 2^{-j}) \cdot \lfloor t2^j \rfloor}
= \lim_{j\to\infty} \binom{\lfloor t2^j \rfloor}{n} \left(\frac{\lambda 2^{-j}}{1 - \lambda 2^{-j}}\right)^{n} e^{-\lambda t},

where we used the facts that \ln(1 - \lambda 2^{-j}) = -\lambda 2^{-j} + o(2^{-j}) and e^{o(2^{-j})} = 1 + o(2^{-j}) to simplify the exponential term. Writing out the terms of the binomial coefficient above, and using the fact that

\lim_{j\to\infty} \lfloor t2^j - i \rfloor \cdot \frac{\lambda 2^{-j}}{1 - \lambda 2^{-j}} = \lambda t, \quad \text{for } i = 0, 1, 2, \ldots, n-1,

we will have

\lim_{j\to\infty} P_{N_j(t)}(n) = \lim_{j\to\infty} \frac{\lfloor t2^j \rfloor \cdot \lfloor t2^j - 1 \rfloor \cdots \lfloor t2^j - n + 1 \rfloor}{n!} \left(\frac{\lambda 2^{-j}}{1 - \lambda 2^{-j}}\right)^{n} e^{-\lambda t} = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}.

The proof of the convergence of the joint CDF's follows from the above and the stationary and independent increment properties of the Bernoulli and Poisson processes. We first observe that the joint PMF of N_j(t_1), N_j(t_2), ..., N_j(t_k) is

P_{N_j(t_1) N_j(t_2) \ldots N_j(t_k)}(n_1, n_2, \ldots, n_k) = P_{N_j(t_1)\,\tilde N_j(t_1,t_2)\,\ldots\,\tilde N_j(t_{k-1},t_k)}(n_1, n_2 - n_1, \ldots, n_k - n_{k-1}) = P_{N_j(t_1)}(n_1) \prod_{l=2}^{k} P_{\tilde N_j(t_{l-1},t_l)}(n_l - n_{l-1}).

Similarly, the joint distribution for the Poisson process is

P_{N(t_1) N(t_2) \ldots N(t_k)}(n_1, n_2, \ldots, n_k) = P_{N(t_1)\,\tilde N(t_1,t_2)\,\ldots\,\tilde N(t_{k-1},t_k)}(n_1, n_2 - n_1, \ldots, n_k - n_{k-1}) = P_{N(t_1)}(n_1) \prod_{l=2}^{k} P_{\tilde N(t_{l-1},t_l)}(n_l - n_{l-1}).

But using the convergence of the marginal PMF's, we also have the convergence of the PMF's of the arrival increments Ñ_j(t_{l−1}, t_l) to those of Ñ(t_{l−1}, t_l) for l = 2, ..., k, and the convergence of the joint PMF's, and hence also of the joint CDF's, follows.

The convergence of the shrinking Bernoulli processes to the Poisson process gives an intuitive way of thinking about the continuous-time arrival process as a limit of shrinking discrete-time arrival processes. This convergence, however, is rather cumbersome to exploit, and since we have all the probability distributions of the continuous-time Poisson process, it is easier to analyze the Poisson process directly.
