
Math 345 - Random processes - Spring 2020

1 Bernoulli processes

1.1 Random processes

Definition 1.1. A random (or stochastic) process is an infinite collection of rv's defined on a common probability model. If the process contains countably many rv's, then they can be indexed by the positive integers, $X_1, X_2, \dots$, and the process is called a discrete-time random process. If there are continuum many rv's, then they can be indexed by a real number $t \ge 0$, $\{X_t;\ t \ge 0\}$, and the process is called a continuous-time random process.

A discrete-time random process assigns a sequence of numbers to every $\omega \in \Omega$,
$$\omega \mapsto (X_1(\omega), X_2(\omega), \dots, X_n(\omega), \dots),$$
while a continuous-time random process assigns a function defined on the half-line $[0, +\infty)$,
$$\omega \mapsto X_t(\omega) : [0, \infty) \to \mathbb{R}.$$

Of course, the sequence can be thought of as a function defined on the set of natural numbers $\mathbb{N}$. The value of the random process at a particular outcome $\omega$ is called a sample path of the process. When studying a random process one may choose to pass to a probability model in which the outcomes are the sample paths, and events are sets of sample paths. Some examples of phenomena that can be modeled by random processes are repeated experiments, arrivals or departures (of customers, orders, signals, packets, etc.), and random walks (on a line, in a plane, in 3D space).

1.2 Bernoulli processes

One can make a simple nontrivial random process by considering a sequence of IID binary rv's.

Definition 1.2. A Bernoulli process is a sequence $Z_1, Z_2, \dots$ of IID binary rv's. The independence here is understood in the sense that for any $n > 0$, the rv's $\{Z_1, Z_2, \dots, Z_n\}$ are independent. Let $p = \Pr(\{Z = 1\})$ and $q = 1 - p = \Pr(\{Z = 0\})$.

One can think of the Bernoulli process as a model describing the arrival of customers at discrete times $n = 1, 2, \dots$. Then at a specific time $n = j$ a customer will arrive with probability $p$ ($Z_j = 1$), and no customer will arrive with probability $q$ ($Z_j = 0$). Here we are assuming that at most one arrival occurs at each discrete time instance. Instead of tracking the arrivals of the customers, one can track the interarrival times in this process. The first interarrival time $X_1$ will be the time it takes the first customer to arrive. So
$$X_1 = \begin{cases}
1 & \text{if } Z_1 = 1 \quad (\text{prob } p)\\
2 & \text{if } Z_1 = 0,\ Z_2 = 1 \quad (\text{prob } p(1-p))\\
3 & \text{if } Z_1 = 0,\ Z_2 = 0,\ Z_3 = 1 \quad (\text{prob } p(1-p)^2)\\
\ \vdots\\
m & \text{if } Z_1 = 0, \dots, Z_{m-1} = 0,\ Z_m = 1 \quad (\text{prob } p(1-p)^{m-1})\\
\ \vdots
\end{cases}$$

As we see, $X_1$ has the geometric PMF
$$P_{X_1}(m) = p(1-p)^{m-1}, \qquad m \ge 1.$$
The second interarrival time $X_2$ is the time between the first and second arrivals, and similarly we define $X_j$ to be the interarrival time between the $(j-1)$th and $j$th arrivals. As the arrival rv's are IID, clearly $X_2$ will have the same distribution as $X_1$, since the entire process can be restarted at the first arrival time, and the second interarrival time can be computed from there as above. Using induction to generalize this argument, we see that the interarrival times $X_1, X_2, \dots$ are IID geometric rv's, and the Bernoulli process can be characterized by this sequence instead of the sequence of binary IID arrival rv's $\{Z_j\}_{j=1}^{\infty}$. An example of a sample path of the Bernoulli process in terms of the arrival rv's and the interarrival times is given below:

$$\{Z_j\}:\ 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, \dots \qquad\qquad (1)$$
$$\{X_j\}:\ 2, 1, 3, 2, 4, 1, 1, \dots$$
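To make the correspondence between the two descriptions concrete, here is a minimal Python sketch (the helper name `interarrival_times` is ours, not from the notes) that recovers the interarrival times $X_j$ from a 0/1 sample path and checks the geometric PMF empirically:

```python
import numpy as np

def interarrival_times(z):
    """Given a 0/1 arrival sequence z, return the successive interarrival times."""
    arrivals = np.flatnonzero(np.asarray(z)) + 1   # 1-based arrival times
    return np.diff(arrivals, prepend=0)            # X_1 is the first arrival time

# The sample path from (1):
z = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]
print(interarrival_times(z))                       # -> [2 1 3 2 4 1 1]

# Empirically, the X_j of a simulated process follow the geometric PMF:
rng = np.random.default_rng(0)
p = 0.3
x = interarrival_times(rng.random(200_000) < p)
for m in range(1, 5):
    print(m, round((x == m).mean(), 4), round(p * (1 - p)**(m - 1), 4))
```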

In addition to the arrival rv's $Z_j$ and the interarrival time rv's $X_j$, one may also be interested in tracking the aggregate number of arrivals up to time $n$. These aggregate numbers of arrivals will be given by the rv's

$$S_n = Z_1 + Z_2 + \cdots + Z_n, \qquad n \ge 1.$$

$S_n$ takes nonnegative integer values, and $S_n = k$ means that in the first $n$ discrete time instances arrivals occurred in exactly $k$ of them. So the probability of $S_n = k$ will be
$$\Pr(\{S_n = k\}) = P_{S_n}(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n,$$

where the binomial coefficient $\binom{n}{k}$ gives the number of ways in which $k$ time instances can be chosen from $n$, $p^k$ is the probability that in those $k$ instances arrivals occur, while $(1-p)^{n-k}$ is the probability that in the remaining $n-k$ instances no arrival occurs. Here we are, of course, relying on the independence of the arrival rv's $Z_j$, $j = 1, 2, \dots$. So the aggregate number of arrivals has the binomial distribution.

The rv's $\{S_n,\ n \ge 1\}$ having binomial distributions is not enough for the process with these aggregate arrivals to be a Bernoulli process, since the arrival rv's $\{Z_j,\ j \ge 1\}$ may not be independent. Indeed, one can for example give a joint PMF for $\{Z_1, Z_2, Z_3\}$ that leaves them dependent, while making $S_j$, $j = 1, 2, 3$, binomial. Then taking $\{Z_j,\ j \ge 4\}$ to be IID will guarantee that the rest of the aggregate arrival rv's are binomial as well, but the independence of the $Z_j$'s will no longer hold. For a particular example, let $P_Z(1) = P_Z(0) = 1/2$, and define the joint PMF $P_{Z_1Z_2Z_3}$ as follows:

$$P_{Z_1Z_2Z_3}(0,0,0) = P_{Z_1Z_2Z_3}(1,1,1) = P_{Z_1Z_2Z_3}(0,0,1) = P_{Z_1Z_2Z_3}(1,1,0) = 1/8,$$
$$P_{Z_1Z_2Z_3}(0,1,0) = P_{Z_1Z_2Z_3}(1,0,1) = 1/4,$$
$$P_{Z_1Z_2Z_3}(1,0,0) = P_{Z_1Z_2Z_3}(0,1,1) = 0.$$

Then one can compute the PMFs of $S_j$, $j = 1, 2, 3$, to see that they are binomial, but obviously $\{Z_1, Z_2, Z_3\}$ will not be independent (e.g. $P_{Z_1Z_2Z_3}(1,0,0) = 0 \ne 1/8 = P_{Z_1}(1) P_{Z_2}(0) P_{Z_3}(0)$).
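As a sanity check, the following short Python enumeration (illustrative only, not part of the notes) verifies that with this joint PMF the sums $S_1, S_2, S_3$ are binomial while $Z_1, Z_2, Z_3$ fail independence:

```python
from math import comb

# Joint PMF of (Z1, Z2, Z3) from the counterexample above
P = {(0,0,0): 1/8, (1,1,1): 1/8, (0,0,1): 1/8, (1,1,0): 1/8,
     (0,1,0): 1/4, (1,0,1): 1/4, (1,0,0): 0.0, (0,1,1): 0.0}

for j in (1, 2, 3):
    for k in range(j + 1):
        pmf = sum(pr for z, pr in P.items() if sum(z[:j]) == k)
        assert abs(pmf - comb(j, k) / 2**j) < 1e-12   # S_j ~ binomial(j, 1/2)

# Independence fails: Pr(Z = (1,0,0)) = 0, but the product of marginals is 1/8
print(P[(1,0,0)], 0.5**3)
```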

1.3 Asymptotics of the binomial distribution

As we saw above, the sum of IID binary rv's has a binomial PMF. We can use this known PMF of a sum of IID rv's as an illustrative example of the Chernoff bound that we discussed earlier. But first, we give the asymptotics of the binomial PMF based on Stirling's bounds for the factorial $n!$,
$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n \le n! \le \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}.$$

To state the asymptotics for the binomial PMF, we first denote $\tilde p = k/n$, which will be interpreted as the relative frequency of 1's in the $n$-tuple $Z_1, Z_2, \dots, Z_n$. We also define the Kullback-Leibler divergence (relative entropy) by

$$D(\tilde p \,\|\, p) = \tilde p \ln\frac{\tilde p}{p} + (1 - \tilde p) \ln\frac{1 - \tilde p}{1 - p}.$$

If we treat $D(\tilde p \,\|\, p)$ as a function of $\tilde p$ with $p$ fixed, then observe that
$$D(p \,\|\, p) = 0 \quad\text{and}\quad \frac{d}{d\tilde p} D(\tilde p \,\|\, p)\Big|_{\tilde p = p} = 0,$$
while
$$\frac{d^2}{d\tilde p^2} D(\tilde p \,\|\, p) > 0.$$
So $D(\tilde p \,\|\, p)$ is a concave up (convex) function that takes its minimum value at $\tilde p = p$; thus $D(\tilde p \,\|\, p) \ge 0$, and the equality holds only when $\tilde p = p$. The Kullback-Leibler divergence can be thought of as a measure of how different $p$ and $\tilde p$ are. With this notation, we have the following bounds.
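Before stating them, the properties of $D$ just listed are easy to confirm numerically; a minimal sketch (the helper name `kl` is ours, not from the notes):

```python
import numpy as np

def kl(pt, p):
    """D(pt || p) for binary distributions, in nats."""
    return pt * np.log(pt / p) + (1 - pt) * np.log((1 - pt) / (1 - p))

p = 0.3
pt = np.linspace(0.01, 0.99, 99)
d = kl(pt, p)
print(d.min(), pt[d.argmin()])   # minimum ~0, attained near pt = 0.3
assert np.all(d >= -1e-12)       # D(pt || p) >= 0 everywhere
```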

Theorem 1.3. Let $P_{S_n}(k)$ be the PMF of the binomial distribution for an underlying binary PMF $P_Z(1) = p > 0$, $P_Z(0) = 1 - p > 0$. Then for each integer $\tilde p n$, $1 \le \tilde p n \le n - 1$,
$$P_{S_n}(\tilde p n) < \sqrt{\frac{1}{2\pi n \tilde p (1 - \tilde p)}}\, e^{-n D(\tilde p \,\|\, p)},$$
$$P_{S_n}(\tilde p n) > \left(1 - \frac{1}{12 n \tilde p (1 - \tilde p)}\right) \sqrt{\frac{1}{2\pi n \tilde p (1 - \tilde p)}}\, e^{-n D(\tilde p \,\|\, p)}.$$

Notice that for fixed $p, \tilde p$ the lower and upper bounds in the theorem are asymptotically the same as $n \to \infty$, so the bounds are asymptotically tight, and we have
$$P_{S_n}(\tilde p n) \sim \sqrt{\frac{1}{2\pi n \tilde p (1 - \tilde p)}}\, e^{-n D(\tilde p \,\|\, p)} \quad\text{as } n \to \infty, \text{ uniformly for } 0 < \tilde p < 1. \qquad (2)$$
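One can watch the asymptotic (2) take hold numerically; the sketch below (assuming SciPy is available for the exact binomial PMF) prints the ratio of the exact PMF to the right-hand side of (2), which tends to 1:

```python
import numpy as np
from scipy.stats import binom

p, pt = 0.3, 0.4
for n in (10, 100, 1000, 10000):
    k = int(pt * n)                      # pt*n must be an integer
    exact = binom.pmf(k, n, p)
    D = pt*np.log(pt/p) + (1-pt)*np.log((1-pt)/(1-p))
    approx = np.sqrt(1/(2*np.pi*n*pt*(1-pt))) * np.exp(-n*D)
    print(n, exact / approx)             # ratio -> 1 as n grows
```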

The uniformity of the bounds is important here, since one cannot choose an arbitrary $\tilde p$ for a given $n$, but needs to make sure $\tilde p n$ is an integer, since $S_n$ is integer-valued. The bound itself says that if $\tilde p \ne p$, then the probability of having a relative frequency $\tilde p$ of arrivals within $n$ discrete time instances (or, equivalently, having $k = \tilde p n$ arrivals) decreases exponentially in $n$.

Let us now apply the Chernoff bound to the binomial rv as the sum of $n$ IID binary rv's. Notice that for the IID binary rv's $Z_1, Z_2, \dots$ with the PMF $P_Z(1) = p$, $P_Z(0) = q = 1 - p$ defined above, the mean is $E[Z] = p$ and the MGF is

$$g_Z(r) = E[e^{Zr}] = p e^{1 \cdot r} + q e^{0 \cdot r} = q + p e^r, \qquad -\infty < r < \infty.$$

The semi-invariant MGF will then be $\gamma_X(r) = \ln(q + p e^r)$. If we take $a = \tilde p$ in the Chernoff bound, then the optimal exponent for this value will be

$$\mu_X(\tilde p) = \inf_{r \ge 0}\,[\gamma_X(r) - \tilde p r].$$

But the minimum of the expression $\gamma_X(r) - \tilde p r$ will be attained when
$$\frac{d}{dr}\gamma_X(r) = \tilde p.$$
This will happen if
$$\frac{p e^r}{q + p e^r} = \tilde p, \quad\text{or}\quad e^r = \frac{\tilde p q}{p \tilde q}, \quad\text{where } \tilde q = 1 - \tilde p.$$

Observe that for $\tilde p > p$, for which we expect the Chernoff bound to hold, one has for the optimal $r$,
$$e^r = \frac{\tilde p q}{p \tilde q} = \frac{\tilde p (1 - p)}{p(1 - \tilde p)} > 1, \quad\text{hence } r > 0,$$
as expected. Substituting this value of $e^r$ back into $\gamma_X(r) - \tilde p r$ (note that $q + p e^r = q + \tilde p q/\tilde q = q/\tilde q$) and performing some algebraic simplifications, one will get
$$\mu_X(\tilde p) = \tilde p \ln\frac{p}{\tilde p} + \tilde q \ln\frac{q}{\tilde q} = -D(\tilde p \,\|\, p).$$
Substituting this into the Chernoff bound, we get

$$\Pr(\{S_n \ge n\tilde p\}) \le e^{n \mu_X(\tilde p)} = e^{-n D(\tilde p \,\|\, p)}.$$

But $\Pr(\{S_n \ge n\tilde p\}) \ge \Pr(\{S_n = n\tilde p\})$, and comparing the Chernoff bound to the above asymptotic bounds for the binomial PMF, we see that $\Pr(\{S_n \ge n\tilde p\})$ is bounded above and below by the same exponentially decaying expression. We can record this fact as

$$\lim_{n \to \infty} \frac{\ln \Pr(\{S_n \ge n\tilde p\})}{n} = -D(\tilde p \,\|\, p), \qquad \tilde p > p.$$
So the optimized Chernoff bound (over $r \in I(X)$, $r > 0$) is exponentially tight for sums of binary IID rv's, in the sense that no better (faster decaying) exponential bound can exist. This turns out to be true for general IID rv's as well.
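The exponential tightness can also be seen numerically; a small sketch (again assuming SciPy) comparing $\ln \Pr(\{S_n \ge n\tilde p\})/n$ with $-D(\tilde p \,\|\, p)$:

```python
import numpy as np
from scipy.stats import binom

p, pt = 0.3, 0.5
D = pt*np.log(pt/p) + (1-pt)*np.log((1-pt)/(1-p))
for n in (100, 1000, 10000):
    log_tail = binom.logsf(pt*n - 1, n, p)   # ln Pr(S_n >= n*pt)
    print(n, log_tail / n, -D)               # the two columns converge
```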

1.4 The CLT for binary IID rv's

Using the asymptotic bounds discussed in the previous section one can prove the Central Limit Theorem (CLT) in the case of binary IID rv's. Recall that the CLT says that the renormalized sample average of IID rv's converges to the Gaussian rv in distribution. In the case of binary IID rv's one can restate the CLT in the following form.

Theorem 1.4. Let $\{Z_j,\ j \ge 1\}$ be a sequence of binary IID rv's with $p = P_Z(1) > 0$ and $q = 1 - p = P_Z(0) > 0$. Let $S_n = Z_1 + \cdots + Z_n$ for any $n \ge 1$, and let $\alpha$ be a fixed constant satisfying $\frac12 < \alpha < \frac23$. Then there are constants $C, n_0$ such that for any $k$ satisfying $|k - np| \le n^{\alpha}$ one has

$$P_{S_n}(k) = \frac{1 \pm C n^{3\alpha - 2}}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}, \qquad n \ge n_0,$$
where the equality is to be interpreted as an upper/lower bound depending on the sign in the $\pm$. Note that $n^{3\alpha - 2} \to 0$ as $n \to \infty$ for $\alpha < \frac23$, so the ratio of the upper and lower bounds approaches 1 at rate $n^{3\alpha - 2}$, uniformly in $k$ in the range $|k - np| \le n^{\alpha}$.

The above theorem is for the PMF of $S_n$ and not the CDF, as the CLT requires, but using the bounds established by the theorem one can show via a Riemann sum approximation to the integral that
$$\lim_{n \to \infty} \Pr\left(z' \le \frac{S_n - np}{\sqrt{pq}\,\sqrt{n}} \le z\right) = \int_{z'}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\, dy,$$
which finishes the proof of the CLT for binary IID rv's.
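Finally, the local approximation of Theorem 1.4 can be checked numerically as well; in the sketch below (SciPy assumed, with the illustrative choice $\alpha = 0.55$) the ratios of the exact PMF to the Gaussian expression stay close to 1 for $k$ near $np$:

```python
import numpy as np
from scipy.stats import binom

p, n = 0.3, 10_000
q = 1 - p
# For alpha = 0.55, n^alpha ~ 158, so k must lie within ~158 of np = 3000
for k in (2900, 2950, 3000, 3050, 3100):
    exact = binom.pmf(k, n, p)
    gauss = np.exp(-(k - n*p)**2 / (2*n*p*q)) / np.sqrt(2*np.pi*n*p*q)
    print(k, exact / gauss)              # ratios close to 1
```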
