
12. Probability generating functions

For a non-negative discrete random variable $X$, the probability generating function contains all of the information about the distribution of $X$ and is remarkably useful for easily deriving key properties of $X$.

Definition 12.1 (Probability generating function). Let $X \geq 0$ be a discrete random variable on $\{0, 1, 2, \ldots\}$ and let $p_k := P\{X = k\}$, $k = 0, 1, 2, \ldots$. The probability generating function (pgf) of $X$ is
$$(21) \qquad G_X(z) := E z^X := \sum_{k=0}^{\infty} z^k p_k.$$
For example, for $X$ taking values 1 with probability 1/2, 2 with probability 1/3, and 3 with probability 1/6, we have
$$G_X(z) = \frac{1}{2}z + \frac{1}{3}z^2 + \frac{1}{6}z^3.$$
Paraphrasing Herb Wilf, we say that the probability generating function "hangs the distribution of $X$ on a clothesline." Notice that $G_X(1) = 1$ and $G_X(0) = p_0$ for any random variable $X$. We summarize further properties of probability generating functions in the following theorem.
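As a quick illustration of the clothesline picture, the example pgf can be built and inspected symbolically. The following is a minimal sympy sketch (the distribution is the one from the example above):

```python
# Building the example pgf and reading the distribution off the clothesline.
import sympy as sp

z = sp.symbols('z')
pmf = {1: sp.Rational(1, 2), 2: sp.Rational(1, 3), 3: sp.Rational(1, 6)}
G_X = sum(p * z**k for k, p in pmf.items())

assert G_X.subs(z, 1) == 1    # G_X(1) = 1 always
assert G_X.subs(z, 0) == 0    # G_X(0) = p_0, which is 0 for this X

# Each probability hangs on the coefficient of the matching power of z.
for k, p in pmf.items():
    assert sp.expand(G_X).coeff(z, k) == p
```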

Theorem 12.2 (Properties of generating functions). For non-negative discrete random variables $X, Y \geq 0$, let $G_X$ and $G_Y$ be their probability generating functions, respectively. Then

(i) If $Y = a + bX$ for scalars $a, b \in \mathbb{R}$, then $G_Y(z) = z^a G_X(z^b)$.
(ii) If $X$ and $Y$ are independent, then $G_{X+Y}(z) = G_X(z) G_Y(z)$.
(iii) $P\{X = n\} = G_X^{(n)}(0)/n!$ for every $n \in \{0, 1, 2, \ldots\}$, where $G^{(n)}(z) := d^n G / dz^n$.
(iv) $G_X^{(n)}(1) = E X^{\downarrow n}$ for every $n \in \{0, 1, 2, \ldots\}$. In particular, $G_X'(1) = EX$.

Proof. These properties all follow easily from the properties of expectations. For (i), $a, b \in \mathbb{R}$ and so $z^{a+bX} = z^a z^{bX}$; whence,
$$E z^a z^{bX} = z^a E (z^b)^X = z^a G_X(z^b).$$
For (ii), if $X$ and $Y$ are independent, then so are $z^X$ and $z^Y$; therefore,
$$G_{X+Y}(z) := E z^{X+Y} = E z^X z^Y = E z^X E z^Y = G_X(z) G_Y(z).$$
For (iii), we see that if $n = 0, 1, 2, \ldots$, then the terms containing $p_0, \ldots, p_{n-1}$ vanish after taking the $n$th derivative, because they are the coefficients of monomials in $z$ of degree less than $n$. The $n$th derivative of $G_X(z)$ is then
$$G_X^{(n)}(z) = \sum_{k=n}^{\infty} p_k k^{\downarrow n} z^{k-n},$$
where $k^{\downarrow n} := k(k-1)\cdots(k-n+1)$. Only the first term $p_n n^{\downarrow n} = p_n n!$ does not depend on $z$, and so
$$G_X^{(n)}(0) = n! \, p_n.$$
From the above expression, it is also apparent that $G_X^{(n)}(z)$ evaluated at $z = 1$ gives the $n$th factorial moment $E X^{\downarrow n}$ of $X$, establishing (iv). □

The next theorem hints at why generating functions are so important.
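All four properties can be checked mechanically on the example distribution. A sympy sketch, where the values $a = 2$ and $b = 3$ in part (i) are arbitrary illustrative choices:

```python
# Checking Theorem 12.2 (i)-(iv) on the example pgf.
import sympy as sp

z = sp.symbols('z')
pmf = {1: sp.Rational(1, 2), 2: sp.Rational(1, 3), 3: sp.Rational(1, 6)}
G_X = sum(p * z**k for k, p in pmf.items())

# (i) Y = a + bX has pgf z^a G_X(z^b); here a = 2, b = 3.
a, b = 2, 3
G_Y = sum(p * z**(a + b*k) for k, p in pmf.items())
assert sp.simplify(G_Y - z**a * G_X.subs(z, z**b)) == 0

# (ii) For X, X' i.i.d., the pgf of X + X' is G_X^2; multiplying the
# polynomials convolves the pmfs, e.g. P{X + X' = 2} = (1/2)^2.
assert sp.expand(G_X**2).coeff(z, 2) == sp.Rational(1, 4)

# (iii) p_n = G^{(n)}(0)/n! and (iv) EX = G'(1) = 1/2 + 2/3 + 1/2 = 5/3.
for n in range(4):
    assert sp.diff(G_X, z, n).subs(z, 0) / sp.factorial(n) == pmf.get(n, 0)
assert sp.diff(G_X, z).subs(z, 1) == sp.Rational(5, 3)
```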

Theorem 12.3 (Uniqueness theorem). Let $X, Y \geq 0$ be discrete random variables. Then $X$ and $Y$ have the same distribution if and only if $G_X = G_Y$.

12.1. Generating functions of some common distributions. We now catalog the generating functions of some common distributions, from which properties that we already know (and sometimes spent some effort deriving) become obvious. To begin with the most trivial distribution: if $X = k$ with probability one for some $k = 0, 1, 2, \ldots$, then $G_X(z) = z^k$.

12.1.1. Bernoulli and Binomial distributions. For $0 < p < 1$, let $X$ have the Bernoulli distribution with parameter $p$. Then $p_0 = 1 - p$ and $p_1 = p$, and
$$G_X(z) := E z^X = (1-p) z^0 + p z^1 = (1-p) + pz.$$
For a Binomial random variable $Y$ with parameter $(n, p)$, we can express $Y = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are i.i.d. Bernoulli($p$). Therefore, by Theorem 12.2(ii), we have
$$G_Y(z) = G_{X_1 + \cdots + X_n}(z) = \prod_{j=1}^{n} G_{X_j}(z) = (1 - p + pz)^n.$$
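One can confirm that expanding $(1 - p + pz)^n$ really does hang the Binomial($n, p$) pmf on the clothesline. A sympy sketch, with $n = 4$ an arbitrary choice:

```python
# The coefficient of z^k in (1 - p + pz)^n should be C(n,k) p^k (1-p)^(n-k).
import sympy as sp

z, p = sp.symbols('z p')
n = 4  # arbitrary illustrative choice

G_Y = sp.expand((1 - p + p*z)**n)
for k in range(n + 1):
    pmf_k = sp.binomial(n, k) * p**k * (1 - p)**(n - k)
    assert sp.simplify(G_Y.coeff(z, k) - pmf_k) == 0
```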

For example, taking the first derivative of $G_Y(z)$, we get
$$G_Y'(z) = n(1 - p + pz)^{n-1} p,$$
which has $G_Y'(0) = n(1-p)^{n-1} p = p_1$ (the probability of exactly one success) and $G_Y'(1) = n(1 - p + p)^{n-1} p = np = EY$.

12.1.2. Poisson distribution. Let $X$ have the Poisson distribution with parameter $\lambda > 0$. Then $p_k := \lambda^k e^{-\lambda}/k!$ and
$$G_X(z) = \sum_{k=0}^{\infty} \lambda^k e^{-\lambda} z^k / k! = e^{-\lambda + \lambda z} \sum_{k=0}^{\infty} (\lambda z)^k e^{-\lambda z} / k! = e^{-\lambda(1-z)},$$
since the final sum is the total mass of the Poisson($\lambda z$) distribution and so equals one. This is a very nice generating function, because we can easily express the $n$th derivative of $G_X(z)$ by
$$G_X^{(n)}(z) = \lambda^n e^{-\lambda(1-z)};$$
and, therefore,
$$P\{X = n\} = G_X^{(n)}(0)/n! = \lambda^n e^{-\lambda(1-0)}/n! = \lambda^n e^{-\lambda}/n!$$
and
$$E X^{\downarrow n} = G_X^{(n)}(1) = \lambda^n e^{-\lambda(1-1)} = \lambda^n.$$
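Both the closed form and the factorial moments are easy to confirm symbolically; a sympy sketch:

```python
# Summing the Poisson series and checking G^{(n)}(1) = lambda^n for small n.
import sympy as sp

z = sp.symbols('z')
lam = sp.symbols('lambda', positive=True)
k = sp.symbols('k', integer=True, nonnegative=True)

G = sp.summation(lam**k * sp.exp(-lam) * z**k / sp.factorial(k), (k, 0, sp.oo))
assert sp.simplify(G - sp.exp(-lam*(1 - z))) == 0

# Factorial moments: EX = lam, EX(X-1) = lam^2, and so on.
for n in range(4):
    assert sp.simplify(sp.diff(G, z, n).subs(z, 1) - lam**n) == 0
```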

Theorem 12.4 (Superposition and Thinning properties of Poisson RVs). The Poisson distribution has the following special properties.

• Superposition property: Let $X_1, \ldots, X_n$ be independent Poisson random variables with $X_j \sim \text{Poisson}(\lambda_j)$, for $j = 1, \ldots, n$. Then $Y \sim \text{Poisson}(\lambda_1 + \cdots + \lambda_n)$, where $Y := X_1 + \cdots + X_n$.
• Thinning property: Let $X \sim \text{Poisson}(\lambda)$ and, given $X = n$, let $B \sim \text{Binomial}(n, p)$. Then the unconditional distribution of $B$ is Poisson with parameter $\lambda p$.
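Before the proof, here is a quick Monte Carlo sanity check of both properties (a sketch using numpy; the parameter values and sample size are arbitrary choices):

```python
# Monte Carlo sanity check of the superposition and thinning properties.
import numpy as np

rng = np.random.default_rng(0)
m = 200_000                      # sample size, arbitrary
lam1, lam2, p = 2.0, 3.0, 0.4    # arbitrary illustrative parameters

# Superposition: X1 + X2 should behave like Poisson(lam1 + lam2),
# so its mean and variance should both be close to 5.
s = rng.poisson(lam1, m) + rng.poisson(lam2, m)
print(s.mean(), s.var())

# Thinning: with X ~ Poisson(lam1) and B | X = n ~ Binomial(n, p),
# B should behave like Poisson(lam1 * p): mean and variance close to 0.8.
x = rng.poisson(lam1, m)
b = rng.binomial(x, p)
print(b.mean(), b.var())
```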

Proof. The superposition property is a consequence of the exponential form of the generating function for the Poisson distribution. For $X_1, \ldots, X_n$ with parameters $\lambda_1, \ldots, \lambda_n$, respectively, we see that
$$G_{X_1 + \cdots + X_n}(z) = \prod_{j=1}^{n} e^{-\lambda_j(1-z)} = e^{-\sum_{j=1}^{n} \lambda_j (1-z)}.$$
Proof of the thinning property is left as a homework exercise. □

12.1.3. Geometric and Negative Binomial distributions. Let $X$ have the Geometric distribution with success probability $0 < p < 1$. Then $p_k := (1-p)^{k-1} p$ for $k = 1, 2, \ldots$, and
$$G_X(z) = \sum_{k=1}^{\infty} (1-p)^{k-1} p z^k = pz \sum_{k=0}^{\infty} ((1-p)z)^k = \frac{pz}{1 - (1-p)z}.$$
Furthermore, for a positive integer $r > 0$, let $Y$ have the Negative Binomial distribution with parameter $(p, r)$. Then, for $X_1, \ldots, X_r$ i.i.d. Geometric($p$) random variables,
$$G_Y(z) = G_{X_1 + \cdots + X_r}(z) = \left( \frac{pz}{1 - (1-p)z} \right)^r.$$
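The geometric series step can be checked by expanding $pz/(1 - (1-p)z)$ back into a power series; a sympy sketch:

```python
# The coefficient of z^k in pz/(1 - (1-p)z) should be (1-p)^(k-1) p.
import sympy as sp

z, p = sp.symbols('z p')
G = p*z / (1 - (1 - p)*z)

poly = sp.series(G, z, 0, 6).removeO()
for k in range(1, 6):
    assert sp.simplify(poly.coeff(z, k) - (1 - p)**(k - 1) * p) == 0
```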

12.1.4. Sum of a random number of independent random variables.

Proposition 12.5. Let $N$ be a positive integer random variable and, given $N$, let $X_1, X_2, \ldots$ be i.i.d. and independent of $N$. Then

(i) $S_N := X_1 + \cdots + X_N$ has generating function
$$G_{S_N}(z) = G_N(G_{X_1}(z)),$$

(ii) $E S_N = EN \cdot EX_1$, and
(iii) $\operatorname{Var}(S_N) = EN \operatorname{Var}(X_1) + \operatorname{Var}(N) \, [EX_1]^2$.
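Before the proof, it is worth seeing part (i) in action: taking $N \sim$ Poisson($\lambda$) and $X_i \sim$ Bernoulli($p$), the composition $G_N(G_{X_1}(z))$ collapses to the Poisson($\lambda p$) pgf, recovering the thinning property of Theorem 12.4. A sympy sketch:

```python
# Composing pgfs: N ~ Poisson(lam), X_i ~ Bernoulli(p) gives S_N ~ Poisson(lam*p).
import sympy as sp

z, p = sp.symbols('z p', positive=True)
lam = sp.symbols('lambda', positive=True)

G_X = 1 - p + p*z              # Bernoulli(p) pgf
G_N = sp.exp(-lam*(1 - z))     # Poisson(lam) pgf
G_SN = G_N.subs(z, G_X)        # part (i): G_{S_N}(z) = G_N(G_{X_1}(z))

assert sp.simplify(G_SN - sp.exp(-lam*p*(1 - z))) == 0

# Part (ii): E S_N = G'_{S_N}(1) should equal EN * EX_1 = lam * p.
assert sp.simplify(sp.diff(G_SN, z).subs(z, 1) - lam*p) == 0
```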

Proof. For (i),
$$G_{S_N}(z) = E z^{X_1 + \cdots + X_N}$$
$$= \sum_{n=0}^{\infty} E\left( z^{X_1 + \cdots + X_n} \mid N = n \right) P\{N = n\}$$
$$= \sum_{n=0}^{\infty} E z^{X_1 + \cdots + X_n} P\{N = n\}$$
$$= \sum_{n=0}^{\infty} [G_{X_1}(z)]^n P\{N = n\}$$
$$= G_N(G_{X_1}(z)).$$

For (ii), we use the properties of generating functions to obtain $E S_N = G_{S_N}'(1)$. Thus,
$$G_{S_N}'(z) = G_{X_1}'(z) \, G_N'(G_{X_1}(z)),$$
by the chain rule, and
$$G_{S_N}'(1) = G_{X_1}'(1) \, G_N'(G_{X_1}(1))$$

$$= EX_1 \, G_N'(1) = EX_1 \, EN,$$
since $G_{X_1}(1) = 1$. Item (iii) is a good exercise that you should definitely do. □

12.2. Infinitely divisible distributions. The class of infinitely divisible distributions comprises an important family of probability distributions. We say a distribution $F$ is infinitely divisible if, for every $n \in \mathbb{N}$, there exists a sequence $X_1, \ldots, X_n$ of i.i.d. random variables so that $S_n := X_1 + \cdots + X_n$ has distribution $F$. Many of the distributions we have encountered so far are infinitely divisible. The easiest way to see this is through the probability generating function. For example, the constant distribution is infinitely divisible, for if $X = c$ with probability one, we can write
$$X = \frac{c}{n} + \cdots + \frac{c}{n} \quad (n \text{ times}),$$
for every $n \geq 1$.

Less trivially, by the superposition property, we know that the Poisson distribution with parameter $\lambda$ can be obtained as the sum of $n$ i.i.d. Poisson random variables with parameter $\lambda/n$. Therefore, the Poisson distribution is infinitely divisible.

On the other hand, the Binomial distribution is not infinitely divisible. To see this, we prove that the Bernoulli distribution is not infinitely divisible.

Proposition 12.6. The Bernoulli distribution is not infinitely divisible for any $0 < p < 1$.

Proof. First of all, if $p = 0$, then $X$ is deterministic and the above argument shows that $X$ is trivially infinitely divisible. For $0 < p < 1$, let $X$ be a Bernoulli random variable with parameter $p$. We have already observed that the generating function for $X$ is $G(z) = 1 - p + pz$. Now, if $X$ is infinitely divisible, then there must be a random variable $Y$ with pgf
$$G_Y(z) := G(z)^{1/2} = (1 - p + pz)^{1/2}$$
so that $X = Y_1 + Y_2$, where $Y_1, Y_2$ are i.i.d. with the same distribution as $Y$. However, we see that
$$G_Y''(z) = \frac{1}{2}\left(-\frac{1}{2}\right) p^2 (1 - p + pz)^{-3/2};$$
and so
$$G_Y''(0) = -\frac{1}{4} p^2 (1-p)^{-3/2} < 0,$$
which is impossible because, by Theorem 12.2(iii), $G_Y''(0) = 2! \, P\{Y = 2\} \geq 0$. □

When we let the parameter $r$ in the Negative Binomial distribution take any non-negative real value, the Negative-Binomial($p, r$) distribution is infinitely divisible. To be specific, let $X$ have the Negative Binomial distribution with parameter $(p, r)$. Then $X$ is equal in distribution to $X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are i.i.d. with the Negative-Binomial($p, r/n$) distribution. We have not yet verified that the Negative Binomial distribution permits non-integer values of $r$, but you have been left with this task as a homework exercise.
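Returning to Proposition 12.6, the obstruction is easy to see numerically: the candidate pgf $(1 - p + pz)^{1/2}$ has a negative coefficient of $z^2$. A sympy sketch with the arbitrary choice $p = 1/2$ (any $0 < p < 1$ behaves the same way):

```python
# If Bernoulli(1/2) split into two i.i.d. parts, sqrt(1/2 + z/2) would be a
# pgf; its z^2 coefficient is -sqrt(2)/16 < 0, so no such pgf exists.
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(1, 2)

G_Y = sp.sqrt(1 - p + p*z)
c2 = sp.series(G_Y, z, 0, 3).removeO().coeff(z, 2)
print(c2)          # -sqrt(2)/16
assert c2 < 0      # not a valid probability, matching Proposition 12.6
```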