1.7.1 Moments and Moment Generating Functions
CHAPTER 1. ELEMENTS OF PROBABILITY DISTRIBUTION THEORY

Definition 1.12. The $n$th moment ($n \in \mathbb{N}$) of a random variable $X$ is defined as
$$\mu_n' = E X^n.$$
The $n$th central moment of $X$ is defined as
$$\mu_n = E(X - \mu)^n,$$
where $\mu = \mu_1' = E X$.

Note that the second central moment is the variance of a random variable $X$, usually denoted by $\sigma^2$.

Moments give an indication of the shape of the distribution of a random variable. Skewness and kurtosis are measured by the following functions of the third and fourth central moments, respectively: the coefficient of skewness is given by
$$\gamma_1 = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}};$$
the coefficient of kurtosis is given by
$$\gamma_2 = \frac{E(X - \mu)^4}{\sigma^4} - 3 = \frac{\mu_4}{\mu_2^2} - 3.$$

Moments can be calculated from the definition or by using the so-called moment generating function.

Definition 1.13. The moment generating function (mgf) of a random variable $X$ is a function $M_X : \mathbb{R} \to [0, \infty)$ given by
$$M_X(t) = E\, e^{tX},$$
provided that the expectation exists for $t$ in some neighborhood of zero.

More explicitly, the mgf of $X$ can be written as
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \quad \text{if } X \text{ is continuous},$$
$$M_X(t) = \sum_{x \in \mathcal{X}} e^{tx} P(X = x), \quad \text{if } X \text{ is discrete}.$$

The method to generate moments is given in the following theorem.

Theorem 1.7. If $X$ has mgf $M_X(t)$, then
$$E(X^n) = M_X^{(n)}(0),$$
where
$$M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) \Big|_{t=0}.$$
That is, the $n$th moment is equal to the $n$th derivative of the mgf evaluated at $t = 0$.

Proof. Assuming that we can differentiate under the integral sign, we may write
$$\frac{d}{dt} M_X(t) = \frac{d}{dt} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{d}{dt}\, e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} x e^{tx} f_X(x)\,dx = E(X e^{tX}).$$
Hence, evaluating the last expression at zero, we obtain
$$\frac{d}{dt} M_X(t) \Big|_{t=0} = E(X e^{tX}) \big|_{t=0} = E(X).$$
For $n = 2$ we get
$$\frac{d^2}{dt^2} M_X(t) \Big|_{t=0} = E(X^2 e^{tX}) \big|_{t=0} = E(X^2).$$
Analogously, it can be shown that for any $n \in \mathbb{N}$ we can write
$$\frac{d^n}{dt^n} M_X(t) \Big|_{t=0} = E(X^n e^{tX}) \big|_{t=0} = E(X^n).$$
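Theorem 1.7 can be checked numerically by approximating the derivatives of an mgf at $t = 0$ with finite differences and comparing them to the known moments. The sketch below is not from the text; it uses the Bernoulli($p$) distribution as an illustration, for which $M_X(t) = 1 - p + p e^t$ and $E(X^n) = p$ for every $n \ge 1$.

```python
# Numerical illustration of Theorem 1.7: moments of X recovered as
# derivatives of its mgf at t = 0, via central finite differences.
# Bernoulli(p) is an illustrative choice (not from the text):
# M_X(t) = 1 - p + p*e^t, so E(X) = E(X^2) = p.
import math

def mgf_bernoulli(t, p=0.3):
    return 1 - p + p * math.exp(t)

def first_moment(mgf, h=1e-5):
    # M'(0) ~ (M(h) - M(-h)) / (2h)
    return (mgf(h) - mgf(-h)) / (2 * h)

def second_moment(mgf, h=1e-4):
    # M''(0) ~ (M(h) - 2*M(0) + M(-h)) / h^2
    return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

print(first_moment(mgf_bernoulli))   # ~ 0.3 = E(X)
print(second_moment(mgf_bernoulli))  # ~ 0.3 = E(X^2)
```

The step sizes $h$ trade truncation error against floating-point cancellation; the values above are reasonable defaults for this smooth mgf.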
Example 1.14. Find the mgf of $X \sim \mathrm{Exp}(\lambda)$ and use the results of Theorem 1.7 to obtain the mean and variance of $X$.

By definition the mgf can be written as
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$
For the exponential distribution we have
$$f_X(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x),$$
where $\lambda \in \mathbb{R}_+$. Here we used the notation of the indicator function $I_{\mathcal{X}}(x)$, whose meaning is as follows:
$$I_{\mathcal{X}}(x) = \begin{cases} 1, & \text{if } x \in \mathcal{X}; \\ 0, & \text{otherwise}. \end{cases}$$
That is,
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \in (0, \infty); \\ 0, & \text{otherwise}. \end{cases}$$
Hence, integrating by the method of substitution, we get
$$M_X(t) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{(t - \lambda)x}\,dx = \frac{\lambda}{\lambda - t}, \quad \text{provided that } |t| < \lambda.$$
Now, using Theorem 1.7 we obtain the first and the second moments, respectively:
$$E(X) = M_X'(0) = \frac{\lambda}{(\lambda - t)^2} \Big|_{t=0} = \frac{1}{\lambda},$$
$$E(X^2) = M_X^{(2)}(0) = \frac{2\lambda}{(\lambda - t)^3} \Big|_{t=0} = \frac{2}{\lambda^2}.$$
Hence, the variance of $X$ is
$$\operatorname{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$

Exercise 1.10. Calculate the mgf for the Binomial and Poisson distributions.

Moment generating functions provide methods for comparing distributions or finding their limiting forms. The following two theorems give us the tools.

Theorem 1.8. Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist. Then

1. If $F_X$ and $F_Y$ have bounded support, then $F_X(u) = F_Y(u)$ for all $u$ iff $E(X^n) = E(Y^n)$ for all $n = 0, 1, 2, \ldots$.

2. If the mgfs of $X$ and $Y$ exist and are equal, i.e., $M_X(t) = M_Y(t)$ for all $t$ in some neighborhood of zero, then $F_X(u) = F_Y(u)$ for all $u$.

Theorem 1.9. Suppose that $\{X_1, X_2, \ldots\}$ is a sequence of random variables, each with mgf $M_{X_i}(t)$. Furthermore, suppose that
$$\lim_{i \to \infty} M_{X_i}(t) = M_X(t), \quad \text{for all } t \text{ in a neighborhood of zero},$$
and $M_X(t)$ is an mgf. Then there is a unique cdf $F_X$ whose moments are determined by $M_X(t)$ and, for all $x$ where $F_X(x)$ is continuous, we have
$$\lim_{i \to \infty} F_{X_i}(x) = F_X(x).$$
This theorem means that convergence of mgfs implies convergence of cdfs.
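As a quick sanity check of Example 1.14 (a simulation sketch, not part of the text), one can draw a large exponential sample and compare the sample mean and variance with $1/\lambda$ and $1/\lambda^2$. The value $\lambda = 2$ below is an arbitrary choice.

```python
# Monte Carlo check of Example 1.14 (illustration, not from the text):
# for X ~ Exp(lambda), the sample mean should be close to 1/lambda
# and the sample variance close to 1/lambda^2.
import random

random.seed(42)          # fixed seed so the run is reproducible
lam = 2.0                # arbitrary rate parameter for the demo
n = 200_000

sample = [random.expovariate(lam) for _ in range(n)]

mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / n

print(mean)  # close to 1/lambda  = 0.5
print(var)   # close to 1/lambda^2 = 0.25
```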
Example 1.15. We know that the Binomial distribution can be approximated by a Poisson distribution when $p$ is small and $n$ is large. Using the above theorem we can confirm this fact. The mgfs of $X_n \sim \mathrm{Bin}(n, p)$ and of $Y \sim \mathrm{Poisson}(\lambda)$ are, respectively:
$$M_{X_n}(t) = [p e^t + (1 - p)]^n, \qquad M_Y(t) = e^{\lambda(e^t - 1)}.$$
We will show that the mgf of $X_n$ tends to the mgf of $Y$, where $\lambda = np$. We will need the following useful result given in the lemma:

Lemma 1.1. Let $a_1, a_2, \ldots$ be a sequence of numbers converging to $a$, that is, $\lim_{n \to \infty} a_n = a$. Then
$$\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.$$

Now, we can write
$$M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n = \left[1 + \frac{1}{n}\, np(e^t - 1)\right]^n = \left[1 + \frac{\lambda(e^t - 1)}{n}\right]^n \xrightarrow[n \to \infty]{} e^{\lambda(e^t - 1)} = M_Y(t).$$
Hence, by Theorem 1.9 the Binomial distribution converges to a Poisson distribution.

1.8 Functions of Random Variables

If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $g(X) = Y$, is also a random variable. The question then is "what is the distribution of $Y$?"

The function $y = g(x)$ is a mapping from the induced sample space of the random variable $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, of the random variable $Y$, that is,
$$g(x) : \mathcal{X} \to \mathcal{Y}.$$
The inverse mapping $g^{-1}$ acts from $\mathcal{Y}$ to $\mathcal{X}$ and we can write
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}, \quad \text{where } A \subset \mathcal{Y}.$$
Then, we have
$$P(Y \in A) = P(g(X) \in A) = P(\{x \in \mathcal{X} : g(x) \in A\}) = P\big(X \in g^{-1}(A)\big).$$

The following theorem relates the cumulative distribution functions of $X$ and $Y = g(X)$.

Theorem 1.10. Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let the domain and codomain of $g(X)$, respectively, be
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad \text{and} \quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}.$$

(a) If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.

(b) If $g$ is a decreasing function on $\mathcal{X}$, then $F_Y(y) = 1 - F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.

Proof. The cdf of $Y = g(X)$ can be written as
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} : g(x) \le y\}} f_X(x)\,dx.$$

(a) If $g$ is increasing, then
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \le g^{-1}(y)\} = \{x \in \mathcal{X} : x \le g^{-1}(y)\}.$$
So, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} : g(x) \le y\}} f_X(x)\,dx = \int_{\{x \in \mathcal{X} : x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X\big(g^{-1}(y)\big).$$

(b) Now, if $g$ is decreasing, then
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \ge g^{-1}(y)\} = \{x \in \mathcal{X} : x \ge g^{-1}(y)\}.$$
So, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} : x \ge g^{-1}(y)\}} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X\big(g^{-1}(y)\big).$$

Example 1.16. Find the distribution of $Y = g(X) = -\log X$, where $X \sim U([0, 1])$.

The cdf of $X$ is
$$F_X(x) = \begin{cases} 0, & \text{for } x < 0; \\ x, & \text{for } 0 \le x \le 1; \\ 1, & \text{for } x > 1. \end{cases}$$
For $x \in [0, 1]$ the function $g(x) = -\log x$ is decreasing, with $\mathcal{Y} = (0, \infty)$.

For $y > 0$, $y = -\log x$ implies that $x = e^{-y}$, i.e., $g^{-1}(y) = e^{-y}$, and
$$F_Y(y) = 1 - F_X\big(g^{-1}(y)\big) = 1 - F_X\big(e^{-y}\big) = 1 - e^{-y}.$$
Hence we may write
$$F_Y(y) = \big(1 - e^{-y}\big)\, I_{(0,\infty)}(y).$$
This is the exponential distribution function with $\lambda = 1$.

For continuous rvs we have the following result.

Theorem 1.11. Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Suppose that $f_X(x)$ is continuous on its support $\mathcal{X} = \{x : f_X(x) > 0\}$ and that $g^{-1}(y)$ has a continuous derivative on the support $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$. Then the pdf of $Y$ is given by
$$f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right| I_{\mathcal{Y}}(y).$$

Proof.
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} \frac{d}{dy} F_X\big(g^{-1}(y)\big), & \text{if } g \text{ is increasing}; \\[4pt] \frac{d}{dy} \left[1 - F_X\big(g^{-1}(y)\big)\right], & \text{if } g \text{ is decreasing} \end{cases} = \begin{cases} f_X\big(g^{-1}(y)\big) \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is increasing}; \\[4pt] -f_X\big(g^{-1}(y)\big) \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is decreasing}. \end{cases}$$
Since $\frac{d}{dy} g^{-1}(y)$ is positive when $g$ is increasing and negative when $g$ is decreasing, both cases can be written as $f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} g^{-1}(y) \right|$ for $y \in \mathcal{Y}$, as claimed.
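The result of Example 1.16 can be verified by simulation (a sketch, not part of the text): if $U \sim U([0, 1])$, then $Y = -\log U$ should have cdf $1 - e^{-y}$, so the empirical cdf of simulated values should track that curve.

```python
# Monte Carlo check of Example 1.16 (illustration, not from the text):
# Y = -log(U) for U ~ Uniform(0, 1) should follow Exp(1), so the
# empirical cdf of Y at y should be close to 1 - e^{-y}.
import math
import random

random.seed(0)           # fixed seed so the run is reproducible
n = 100_000

# 1 - random.random() lies in (0, 1], so log never sees 0
ys = [-math.log(1.0 - random.random()) for _ in range(n)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(1 for v in ys if v <= y) / n
    theoretical = 1 - math.exp(-y)
    print(y, empirical, theoretical)  # the two values should nearly agree
```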