Moments and the moment generating function
Math 217
Prof. D. Joyce, Fall 2014

There are various reasons for studying moments and the moment generating functions. One of them is that the moment generating function can be used to prove the central limit theorem.

Moments, central moments, skewness, and kurtosis. The kth moment of a random variable X is defined as μ_k = E(X^k). Thus, the mean is the first moment, μ = μ_1, and the variance can be found from the first and second moments, σ² = μ_2 − μ_1².

The kth central moment is defined as E((X − μ)^k). Thus, the variance is the second central moment. The higher moments have more obscure meanings as k grows.

The third central moment of the standardized random variable X* = (X − μ)/σ,

    β_3 = E((X*)³) = E((X − μ)³)/σ³,

is called the skewness of X. A distribution that's symmetric about its mean has 0 skewness. (In fact all the odd central moments are 0 for a symmetric distribution.) But if it has a long tail to the right and a short one to the left, then it has a positive skewness, and a negative skewness in the opposite situation.

The fourth central moment of X*,

    β_4 = E((X*)⁴) = E((X − μ)⁴)/σ⁴,

is called kurtosis. A fairly flat distribution with long tails has a high kurtosis, while a short-tailed distribution has a low kurtosis. A bimodal distribution has a very high kurtosis. A normal distribution has a kurtosis of 3. (The word kurtosis was made up in the early 19th century from the Greek word for curvature.) It is not a particularly important concept, but I mention it here for completeness.

It turns out that the whole distribution for X is determined by all the moments, that is, different distributions can't have identical moments. That's what makes moments important.

The moment generating function. There is a clever way of organizing all the moments into one mathematical object, and that object is called the moment generating function. It's a function m(t) of a new variable t defined by

    m(t) = E(e^{tX}).

Since the exponential function e^t has the power series

    e^t = Σ_{k=0}^∞ t^k/k! = 1 + t + t²/2! + ··· + t^k/k! + ···,

we can rewrite m(t) as follows:

    m(t) = E(e^{tX})
         = E(1 + tX + (tX)²/2! + ··· + (tX)^k/k! + ···)
         = 1 + tE(X) + t²E(X²)/2! + ··· + t^k E(X^k)/k! + ···
         = 1 + μ_1 t + (μ_2/2!) t² + ··· + (μ_k/k!) t^k + ···

Theorem. The kth derivative of m(t) evaluated at t = 0 is the kth moment μ_k of X.

In other words, the moment generating function generates the moments of X by differentiation.

The primary use of moment generating functions is to develop the theory of probability. For instance, the easiest way to prove the central limit theorem is to use moment generating functions.

For discrete distributions, we can also compute the moment generating function directly in terms
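The skewness and kurtosis defined above are easy to estimate from data. Here is a small numerical sketch (my own illustration, not part of the notes; the helper names are mine): it averages powers of the standardized values of a sample. For a large normal sample the skewness should come out near 0 and the kurtosis near 3, as claimed above.

```python
import math
import random

def central_moment(xs, k):
    """kth central moment: average of (x - mean)^k over the sample."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)

def skewness(xs):
    """beta_3 = E((X*)^3) = E((X - mu)^3) / sigma^3."""
    sigma = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, 3) / sigma ** 3

def kurtosis(xs):
    """beta_4 = E((X*)^4) = E((X - mu)^4) / sigma^4."""
    sigma = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, 4) / sigma ** 4

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(200_000)]
print(skewness(sample))   # near 0: the normal distribution is symmetric
print(kurtosis(sample))   # near 3: the kurtosis of a normal distribution
```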

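The theorem that derivatives of m(t) at 0 give the moments can also be seen numerically. In this sketch (my own check, not from the notes) m(t) = E(e^{tX}) is computed exactly from the probability mass function of a fair six-sided die, and central difference quotients approximate m′(0) and m″(0); they should match μ_1 = 3.5 and μ_2 = 91/6.

```python
import math

def mgf(t):
    """m(t) = E(e^{tX}) for a fair die: average of e^{tk} over k = 1..6."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)            # approximates m'(0)  = mu_1 = 3.5
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # approximates m''(0) = mu_2 = 91/6
print(m1, m2)
```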
of the probability mass function f(x) = P(X = x) as

    m(t) = E(e^{tX}) = Σ_x e^{tx} f(x).

For continuous distributions, the moment generating function can be expressed in terms of the probability density function f_X as

    m(t) = E(e^{tX}) = ∫_{−∞}^∞ e^{tx} f_X(x) dx.

Properties of moment generating functions.

Translation. If Y = X + a, then

    m_Y(t) = e^{at} m_X(t).

Proof: m_Y(t) = E(e^{Yt}) = E(e^{(X+a)t}) = E(e^{Xt} e^{at}) = e^{at} E(e^{Xt}) = e^{at} m_X(t).  q.e.d.

Scaling. If Y = bX, then

    m_Y(t) = m_X(bt).

Proof: m_Y(t) = E(e^{Yt}) = E(e^{(bX)t}) = E(e^{X(bt)}) = m_X(bt).  q.e.d.

Standardizing. From the last two properties, if

    X* = (X − μ)/σ

is the standardized random variable for X, then

    m_{X*}(t) = e^{−μt/σ} m_X(t/σ).

Proof: First translate by −μ to get

    m_{X−μ}(t) = e^{−μt} m_X(t).

Then scale that by a factor of 1/σ to get

    m_{(X−μ)/σ}(t) = m_{X−μ}(t/σ) = e^{−μt/σ} m_X(t/σ).  q.e.d.

Convolution. If X and Y are independent variables, and Z = X + Y, then

    m_Z(t) = m_X(t) m_Y(t).

Proof: m_Z(t) = E(e^{Zt}) = E(e^{(X+Y)t}) = E(e^{Xt} e^{Yt}). Now, since X and Y are independent, so are e^{Xt} and e^{Yt}. Therefore, E(e^{Xt} e^{Yt}) = E(e^{Xt}) E(e^{Yt}) = m_X(t) m_Y(t).  q.e.d.

Note that this convolution property of moment generating functions implies that for a sample sum S_n = X_1 + X_2 + ··· + X_n, the moment generating function is

    m_{S_n}(t) = (m_X(t))^n.

We can couple that with the standardizing property to determine the moment generating function for the standardized sum

    S_n* = (S_n − nμ)/(σ√n).

Since the mean of S_n is nμ and its standard deviation is σ√n, when it's standardized we get

    m_{S_n*}(t) = e^{−nμt/(σ√n)} m_{S_n}(t/(σ√n))
                = e^{−√n μt/σ} m_{S_n}(t/(σ√n))
                = e^{−√n μt/σ} (m_X(t/(σ√n)))^n.

We'll use this result when we prove the central limit theorem.

Some examples. The same definitions for these apply to discrete distributions and continuous ones. That is, if X is any random variable, then its nth moment is

    μ_n = E(X^n)

and its moment generating function is

    m_X(t) = E(e^{tX}) = Σ_{k=0}^∞ (μ_k/k!) t^k.

The only difference is that when you compute them, you use sums for discrete distributions but integrals for continuous ones. That's because expectation is defined in terms of sums or integrals in the two cases. Thus, in the continuous case

    μ_n = E(X^n) = ∫_{−∞}^∞ x^n f_X(x) dx
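The convolution property can be checked by exact enumeration for two small independent discrete variables. In this sketch (my own example, with my own helper names), Z is the sum of a fair die and a fair 0/1 coin; its mgf computed from the convolved pmf should equal the product m_X(t) m_Y(t) at any t.

```python
import math
from itertools import product

# pmfs as {value: probability}: a fair die and a fair coin counting 0 or 1
die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}

def mgf(pmf, t):
    """m(t) = E(e^{tX}) computed exactly from a finite pmf."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """Exact pmf of Z = X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(pmf1.items(), pmf2.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out

z = convolve(die, coin)
for t in (-1.0, -0.3, 0.0, 0.5, 1.0):
    assert math.isclose(mgf(z, t), mgf(die, t) * mgf(coin, t))
print("m_Z(t) = m_X(t) m_Y(t) checked at several values of t")
```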

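The sample-sum formula m_{S_n}(t) = (m_X(t))^n can also be verified directly. For a sum of n independent Bernoulli(p) trials, S_n is binomial, and a single trial has mgf q + pe^t (a standard fact, not derived in these notes). This sketch (my own) computes the binomial mgf from its pmf and compares it with the nth power.

```python
import math

def binom_mgf(n, p, t):
    """m_{S_n}(t) from the binomial pmf: sum of C(n,k) p^k q^(n-k) e^{tk}."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) * math.exp(t * k)
               for k in range(n + 1))

n, p = 12, 0.3
for t in (-1.0, 0.0, 0.4, 1.5):
    single = (1 - p) + p * math.exp(t)   # m_X(t) for one Bernoulli trial
    assert math.isclose(binom_mgf(n, p, t), single ** n)
print("m_{S_n}(t) = (m_X(t))^n verified for sums of Bernoulli trials")
```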
and

    m(t) = E(e^{tX}) = ∫_{−∞}^∞ e^{tx} f_X(x) dx.

We'll look at one example of a moment generating function for a discrete distribution and three for continuous distributions.

The moment generating function for a geometric distribution. Recall that when independent Bernoulli trials are repeated, each with probability p of success, the number of trials X it takes to get the first success has a geometric distribution:

    P(X = j) = q^{j−1} p, for j = 1, 2, ....

Let's compute the moment generating function for the geometric distribution.

    m(t) = Σ_{j=1}^∞ e^{tj} q^{j−1} p = (p/q) Σ_{j=1}^∞ e^{tj} q^j

The series Σ_{j=1}^∞ e^{tj} q^j = Σ_{j=1}^∞ (e^t q)^j is a geometric series with sum e^t q/(1 − e^t q). Therefore,

    m(t) = (p/q) · e^t q/(1 − e^t q) = p e^t/(1 − q e^t).

From this generating function, we can find the moments. For instance, μ_1 = m′(0). The derivative of m is

    m′(t) = p e^t/(1 − q e^t)²,

so μ_1 = m′(0) = p/(1 − q)² = 1/p. This agrees with what we already know, that the mean of the geometric distribution is 1/p.

The moment generating function for a uniform distribution on [0, 1]. Let X be uniform on [0, 1] so that the probability density function f_X has the value 1 on [0, 1] and 0 outside this interval.

Let's first compute the moments.

    μ_n = E(X^n) = ∫_{−∞}^∞ x^n f_X(x) dx = ∫_0^1 x^n dx = [x^{n+1}/(n+1)]_0^1 = 1/(n+1)

Next, let's compute the moment generating function.

    m(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx = ∫_0^1 e^{tx} dx = [e^{tx}/t]_0^1 = (e^t − 1)/t

Note that the expression for m(t) does not allow t = 0 since there is a t in the denominator. Still, m(0) can be evaluated by using power series or L'Hôpital's rule.

The moment generating function for an exponential distribution with parameter λ. Recall that when events occur in a Poisson process uniformly at random over time at a rate of λ events per unit time, then the random variable X giving the time to the first event has an exponential distribution. The density function for X is

    f_X(x) = λ e^{−λx}, for x ∈ [0, ∞).
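The geometric example invites a quick numerical check (my own sketch, with my own helper names): the closed form p e^t/(1 − q e^t) should agree with the defining series Σ e^{tj} q^{j−1} p wherever q e^t < 1, and a difference quotient at 0 should recover the mean 1/p.

```python
import math

p = 0.25
q = 1 - p

def mgf_formula(t):
    """Closed form m(t) = p e^t / (1 - q e^t), valid when q e^t < 1."""
    return p * math.exp(t) / (1 - q * math.exp(t))

def mgf_series(t, terms=2000):
    """Direct expectation: sum of e^{tj} q^{j-1} p over j = 1, 2, ..."""
    return sum(math.exp(t * j) * q ** (j - 1) * p for j in range(1, terms + 1))

t = 0.1  # here q e^t is about 0.83 < 1, so the series converges
assert math.isclose(mgf_formula(t), mgf_series(t))

# m'(0) recovers the mean 1/p, here approximated by a central difference
h = 1e-6
mean = (mgf_formula(h) - mgf_formula(-h)) / (2 * h)
print(mean)  # approximately 4.0, which is 1/p
```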

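The uniform example can be checked the same way (again my own sketch): since μ_k = 1/(k + 1), the moment series Σ_k (μ_k/k!) t^k collapses to Σ_k t^k/(k + 1)!, which should agree with the closed form (e^t − 1)/t for t ≠ 0, and at t = 0 the series directly gives the limiting value 1.

```python
import math

def mgf_closed(t):
    """(e^t - 1)/t, the mgf of a uniform distribution on [0, 1], for t != 0."""
    return (math.exp(t) - 1) / t

def mgf_series(t, terms=40):
    """Moment series: sum of mu_k t^k / k! with mu_k = 1/(k+1),
    which simplifies to the sum of t^k / (k+1)!."""
    return sum(t ** k / math.factorial(k + 1) for k in range(terms))

for t in (-2.0, 0.5, 1.0, 3.0):
    assert math.isclose(mgf_closed(t), mgf_series(t))

# at t = 0 the closed form is 0/0, but the series gives the limit
print(mgf_series(0.0))  # 1.0: only the k = 0 term survives
```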
Let's compute its moment generating function.

    m(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx
         = ∫_0^∞ e^{tx} λ e^{−λx} dx
         = λ ∫_0^∞ e^{(t−λ)x} dx
         = λ [e^{(t−λ)x}/(t − λ)]_0^∞
         = lim_{x→∞} λ e^{(t−λ)x}/(t − λ) − λ/(t − λ)

Now if t < λ, then the limit in the last line is 0, so in that case

    m(t) = λ/(λ − t).

This is a minor, yet important point. The moment generating function doesn't have to be defined for all t. We only need it to be defined for t near 0 because we're only interested in its derivatives evaluated at 0.

The moment generating function for the standard normal distribution. Let Z be a random variable with a standard normal distribution. Its probability density function is

    f_Z(x) = (1/√(2π)) e^{−x²/2}.

Its moments can be computed from the definition, but it takes repeated applications of integration by parts to compute

    μ_n = ∫_{−∞}^∞ (1/√(2π)) x^n e^{−x²/2} dx.

We won't do that computation here, but it turns out that when n is odd, the integrand is an odd function, so μ_n is 0 if n is odd. On the other hand, when n is even, say n = 2m, then it turns out that

    μ_{2m} = (2m)!/(2^m m!).

From these values of all the moments, we can compute the moment generating function.

    m(t) = Σ_{n=0}^∞ (μ_n/n!) t^n
         = Σ_{m=0}^∞ (μ_{2m}/(2m)!) t^{2m}
         = Σ_{m=0}^∞ ((2m)!/(2^m m!)) · (1/(2m)!) t^{2m}
         = Σ_{m=0}^∞ t^{2m}/(2^m m!)
         = e^{t²/2}

Thus, the moment generating function for the standard normal distribution Z is

    m_Z(t) = e^{t²/2}.

More generally, if X = σZ + μ is a normal distribution with mean μ and variance σ², then the moment generating function is

    m_X(t) = exp(μt + σ²t²/2).

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
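As a final sanity check on the two continuous examples above (my own sketch, not part of the notes): the exponential mgf λ/(λ − t) is compared against a numerical integral of e^{tx} λe^{−λx}, and the normal mgf e^{t²/2} against the even-moment series Σ_m t^{2m}/(2^m m!).

```python
import math

# Exponential: compare lambda/(lambda - t) with a midpoint-rule integral,
# truncating [0, infinity) at an upper limit where the integrand is negligible
lam, t = 2.0, 0.7   # requires t < lam
n, upper = 200_000, 40.0
dx = upper / n
integral = sum(lam * math.exp((t - lam) * (i + 0.5) * dx) * dx for i in range(n))
assert abs(integral - lam / (lam - t)) < 1e-4

# Standard normal: the even-moment series sum of t^{2m}/(2^m m!) equals e^{t^2/2}
def normal_mgf_series(t, terms=60):
    return sum(t ** (2 * m) / (2 ** m * math.factorial(m)) for m in range(terms))

for s in (0.0, 0.5, 1.5, 3.0):
    assert math.isclose(normal_mgf_series(s), math.exp(s * s / 2))
print("both closed forms confirmed numerically")
```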