Moments and the moment generating function
Math 217
Prof. D. Joyce, Fall 2014

There are various reasons for studying moments and the moment generating functions. One of them is that the moment generating function can be used to prove the central limit theorem.

Moments, central moments, skewness, and kurtosis. The kth moment of a random variable X is defined as μ_k = E(X^k). Thus, the mean is the first moment, μ = μ_1, and the variance can be found from the first and second moments, σ² = μ_2 − μ_1².

The kth central moment is defined as E((X − μ)^k). Thus, the variance is the second central moment. The higher moments have more obscure meanings as k grows.

The third central moment of the standardized random variable X* = (X − μ)/σ,

    β_3 = E((X*)³) = E((X − μ)³)/σ³,

is called the skewness of X. A distribution that's symmetric about its mean has 0 skewness. (In fact all the odd central moments are 0 for a symmetric distribution.) But if it has a long tail to the right and a short one to the left, then it has a positive skewness, and a negative skewness in the opposite situation.

The fourth central moment of X*,

    β_4 = E((X*)⁴) = E((X − μ)⁴)/σ⁴,

is called kurtosis. A fairly flat distribution with long tails has a high kurtosis, while a short-tailed distribution has a low kurtosis. A bimodal distribution has a very high kurtosis. A normal distribution has a kurtosis of 3. (The word kurtosis was made up in the early 19th century from the Greek word for curvature.) It is not a particularly important concept, but I mention it here for completeness.

It turns out that the whole distribution for X is determined by all the moments, that is, different distributions can't have identical moments. That's what makes moments important.

The moment generating function. There is a clever way of organizing all the moments into one mathematical object, and that object is called the moment generating function. It's a function m(t) of a new variable t defined by

    m(t) = E(e^{tX}).

Since the exponential function e^t has the power series

    e^t = Σ_{k=0}^∞ t^k/k! = 1 + t + t²/2! + ··· + t^k/k! + ···,

we can rewrite m(t) as follows:

    m(t) = E(e^{tX})
         = E(1 + tX + (tX)²/2! + ··· + (tX)^k/k! + ···)
         = 1 + tE(X) + t²E(X²)/2! + ··· + t^k E(X^k)/k! + ···
         = 1 + μ_1 t + (μ_2/2!) t² + ··· + (μ_k/k!) t^k + ···

Theorem. The kth derivative of m(t) evaluated at t = 0 is the kth moment μ_k of X.

In other words, the moment generating function generates the moments of X by differentiation.

The primary use of moment generating functions is to develop the theory of probability. For instance, the easiest way to prove the central limit theorem is to use moment generating functions.

For discrete distributions, we can also compute the moment generating function directly in terms
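The skewness and kurtosis defined above are easy to estimate from data. Here is a small numerical sketch (my own illustration, not part of the notes; the helper names are mine): it averages powers of the standardized values of a sample. For a large normal sample the skewness should come out near 0 and the kurtosis near 3, as claimed above.

```python
import math
import random

def central_moment(xs, k):
    """kth central moment: average of (x - mean)^k over the sample."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)

def skewness(xs):
    """beta_3 = E((X*)^3) = E((X - mu)^3) / sigma^3."""
    sigma = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, 3) / sigma ** 3

def kurtosis(xs):
    """beta_4 = E((X*)^4) = E((X - mu)^4) / sigma^4."""
    sigma = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, 4) / sigma ** 4

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(200_000)]
print(skewness(sample))   # near 0: the normal distribution is symmetric
print(kurtosis(sample))   # near 3: the kurtosis of a normal distribution
```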

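The theorem that derivatives of m(t) at 0 give the moments can also be seen numerically. In this sketch (my own check, not from the notes) m(t) = E(e^{tX}) is computed exactly from the probability mass function of a fair six-sided die, and central difference quotients approximate m′(0) and m″(0); they should match μ_1 = 3.5 and μ_2 = 91/6.

```python
import math

def mgf(t):
    """m(t) = E(e^{tX}) for a fair die: average of e^{tk} over k = 1..6."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)            # approximates m'(0)  = mu_1 = 3.5
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # approximates m''(0) = mu_2 = 91/6
print(m1, m2)
```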
of the probability mass function f(x) = P(X = x) as

    m(t) = E(e^{tX}) = Σ_x e^{tx} f(x).

For continuous distributions, the moment generating function can be expressed in terms of the probability density function f_X as

    m(t) = E(e^{tX}) = ∫_{−∞}^∞ e^{tx} f_X(x) dx.

Properties of moment generating functions.

Translation. If Y = X + a, then

    m_Y(t) = e^{at} m_X(t).

Proof: m_Y(t) = E(e^{Yt}) = E(e^{(X+a)t}) = E(e^{Xt} e^{at}) = e^{at} E(e^{Xt}) = e^{at} m_X(t).  q.e.d.

Scaling. If Y = bX, then

    m_Y(t) = m_X(bt).

Proof: m_Y(t) = E(e^{Yt}) = E(e^{(bX)t}) = E(e^{X(bt)}) = m_X(bt).  q.e.d.

Standardizing. From the last two properties, if

    X* = (X − μ)/σ

is the standardized random variable for X, then

    m_{X*}(t) = e^{−μt/σ} m_X(t/σ).

Proof: First translate by −μ to get

    m_{X−μ}(t) = e^{−μt} m_X(t).

Then scale that by a factor of 1/σ to get

    m_{(X−μ)/σ}(t) = m_{X−μ}(t/σ) = e^{−μt/σ} m_X(t/σ).  q.e.d.

Convolution. If X and Y are independent variables, and Z = X + Y, then

    m_Z(t) = m_X(t) m_Y(t).

Proof: m_Z(t) = E(e^{Zt}) = E(e^{(X+Y)t}) = E(e^{Xt} e^{Yt}). Now, since X and Y are independent, so are e^{Xt} and e^{Yt}. Therefore, E(e^{Xt} e^{Yt}) = E(e^{Xt}) E(e^{Yt}) = m_X(t) m_Y(t).  q.e.d.

Note that this convolution property of moment generating functions implies that for a sample sum S_n = X_1 + X_2 + ··· + X_n, the moment generating function is

    m_{S_n}(t) = (m_X(t))^n.

We can couple that with the standardizing property to determine the moment generating function for the standardized sum

    S_n* = (S_n − nμ)/(σ√n).

Since the mean of S_n is nμ and its standard deviation is σ√n, when it's standardized we get

    m_{S_n*}(t) = e^{−nμt/(σ√n)} m_{S_n}(t/(σ√n))
                = e^{−√n μt/σ} m_{S_n}(t/(σ√n))
                = e^{−√n μt/σ} (m_X(t/(σ√n)))^n.

We'll use this result when we prove the central limit theorem.

Some examples. The same definitions for these apply to discrete distributions and continuous ones. That is, if X is any random variable, then its nth moment is

    μ_n = E(X^n)

and its moment generating function is

    m_X(t) = E(e^{tX}) = Σ_{k=0}^∞ (μ_k/k!) t^k.

The only difference is that when you compute them, you use sums for discrete distributions but integrals for continuous ones. That's because expectation is defined in terms of sums or integrals in the two cases. Thus, in the continuous case

    μ_n = E(X^n) = ∫_{−∞}^∞ x^n f_X(x) dx
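The convolution property can be checked by exact enumeration for two small independent discrete variables. In this sketch (my own example, with my own helper names), Z is the sum of a fair die and a fair 0/1 coin; its mgf computed from the convolved pmf should equal the product m_X(t) m_Y(t) at any t.

```python
import math
from itertools import product

# pmfs as {value: probability}: a fair die and a fair coin counting 0 or 1
die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}

def mgf(pmf, t):
    """m(t) = E(e^{tX}) computed exactly from a finite pmf."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """Exact pmf of Z = X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(pmf1.items(), pmf2.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out

z = convolve(die, coin)
for t in (-1.0, -0.3, 0.0, 0.5, 1.0):
    assert math.isclose(mgf(z, t), mgf(die, t) * mgf(coin, t))
print("m_Z(t) = m_X(t) m_Y(t) checked at several values of t")
```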

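The sample-sum formula m_{S_n}(t) = (m_X(t))^n can also be verified directly. For a sum of n independent Bernoulli(p) trials, S_n is binomial, and a single trial has mgf q + pe^t (a standard fact, not derived in these notes). This sketch (my own) computes the binomial mgf from its pmf and compares it with the nth power.

```python
import math

def binom_mgf(n, p, t):
    """m_{S_n}(t) from the binomial pmf: sum of C(n,k) p^k q^(n-k) e^{tk}."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) * math.exp(t * k)
               for k in range(n + 1))

n, p = 12, 0.3
for t in (-1.0, 0.0, 0.4, 1.5):
    single = (1 - p) + p * math.exp(t)   # m_X(t) for one Bernoulli trial
    assert math.isclose(binom_mgf(n, p, t), single ** n)
print("m_{S_n}(t) = (m_X(t))^n verified for sums of Bernoulli trials")
```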
and

    m(t) = E(e^{tX}) = ∫_{−∞}^∞ e^{tx} f_X(x) dx.

We'll look at one example of a moment generating function for a discrete distribution and three for continuous distributions.

The moment generating function for a geometric distribution. Recall that when independent Bernoulli trials are repeated, each with probability p of success, the number of trials X it takes to get the first success has a geometric distribution:

    P(X = j) = q^{j−1} p, for j = 1, 2, ....

Let's compute the moment generating function for the geometric distribution.

    m(t) = Σ_{j=1}^∞ e^{tj} q^{j−1} p = (p/q) Σ_{j=1}^∞ e^{tj} q^j

The series Σ_{j=1}^∞ e^{tj} q^j = Σ_{j=1}^∞ (e^t q)^j is a geometric series with sum e^t q/(1 − e^t q). Therefore,

    m(t) = (p/q) · e^t q/(1 − e^t q) = p e^t/(1 − q e^t).

From this generating function, we can find the moments. For instance, μ_1 = m′(0). The derivative of m is

    m′(t) = p e^t/(1 − q e^t)²,

so μ_1 = m′(0) = p/(1 − q)² = 1/p. This agrees with what we already know, that the mean of the geometric distribution is 1/p.

The moment generating function for a uniform distribution on [0, 1]. Let X be uniform on [0, 1] so that the probability density function f_X has the value 1 on [0, 1] and 0 outside this interval.

Let's first compute the moments.

    μ_n = E(X^n) = ∫_{−∞}^∞ x^n f_X(x) dx = ∫_0^1 x^n dx = [x^{n+1}/(n+1)]_0^1 = 1/(n+1)

Next, let's compute the moment generating function.

    m(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx = ∫_0^1 e^{tx} dx = [e^{tx}/t]_0^1 = (e^t − 1)/t

Note that the expression for m(t) does not allow t = 0 since there is a t in the denominator. Still, m(0) can be evaluated by using power series or L'Hôpital's rule.

The moment generating function for an exponential distribution with parameter λ. Recall that when events occur in a Poisson process uniformly at random over time at a rate of λ events per unit time, then the random variable X giving the time to the first event has an exponential distribution. The density function for X is

    f_X(x) = λ e^{−λx}, for x ∈ [0, ∞).
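The geometric example invites a quick numerical check (my own sketch, with my own helper names): the closed form p e^t/(1 − q e^t) should agree with the defining series Σ e^{tj} q^{j−1} p wherever q e^t < 1, and a difference quotient at 0 should recover the mean 1/p.

```python
import math

p = 0.25
q = 1 - p

def mgf_formula(t):
    """Closed form m(t) = p e^t / (1 - q e^t), valid when q e^t < 1."""
    return p * math.exp(t) / (1 - q * math.exp(t))

def mgf_series(t, terms=2000):
    """Direct expectation: sum of e^{tj} q^{j-1} p over j = 1, 2, ..."""
    return sum(math.exp(t * j) * q ** (j - 1) * p for j in range(1, terms + 1))

t = 0.1  # here q e^t is about 0.83 < 1, so the series converges
assert math.isclose(mgf_formula(t), mgf_series(t))

# m'(0) recovers the mean 1/p, here approximated by a central difference
h = 1e-6
mean = (mgf_formula(h) - mgf_formula(-h)) / (2 * h)
print(mean)  # approximately 4.0, which is 1/p
```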

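The uniform example can be checked the same way (again my own sketch): since μ_k = 1/(k + 1), the moment series Σ_k (μ_k/k!) t^k collapses to Σ_k t^k/(k + 1)!, which should agree with the closed form (e^t − 1)/t for t ≠ 0, and at t = 0 the series directly gives the limiting value 1.

```python
import math

def mgf_closed(t):
    """(e^t - 1)/t, the mgf of a uniform distribution on [0, 1], for t != 0."""
    return (math.exp(t) - 1) / t

def mgf_series(t, terms=40):
    """Moment series: sum of mu_k t^k / k! with mu_k = 1/(k+1),
    which simplifies to the sum of t^k / (k+1)!."""
    return sum(t ** k / math.factorial(k + 1) for k in range(terms))

for t in (-2.0, 0.5, 1.0, 3.0):
    assert math.isclose(mgf_closed(t), mgf_series(t))

# at t = 0 the closed form is 0/0, but the series gives the limit
print(mgf_series(0.0))  # 1.0: only the k = 0 term survives
```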
Let's compute its moment generating function.

    m(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx
         = ∫_0^∞ e^{tx} λ e^{−λx} dx
         = λ ∫_0^∞ e^{(t−λ)x} dx
         = λ [e^{(t−λ)x}/(t − λ)]_0^∞
         = lim_{x→∞} λ e^{(t−λ)x}/(t − λ) − λ/(t − λ)

Now if t < λ, then the limit in the last line is 0, so in that case

    m(t) = λ/(λ − t).

This is a minor, yet important point. The moment generating function doesn't have to be defined for all t. We only need it to be defined for t near 0 because we're only interested in its derivatives evaluated at 0.

The moment generating function for the standard normal distribution. Let Z be a random variable with a standard normal distribution. Its probability density function is

    f_Z(x) = (1/√(2π)) e^{−x²/2}.

Its moments can be computed from the definition, but it takes repeated applications of integration by parts to compute

    μ_n = ∫_{−∞}^∞ (1/√(2π)) x^n e^{−x²/2} dx.

We won't do that computation here, but it turns out that when n is odd, the integrand is an odd function, so μ_n is 0 if n is odd. On the other hand, when n is even, say n = 2m, then it turns out that

    μ_{2m} = (2m)!/(2^m m!).

From these values of all the moments, we can compute the moment generating function.

    m(t) = Σ_{n=0}^∞ (μ_n/n!) t^n
         = Σ_{m=0}^∞ (μ_{2m}/(2m)!) t^{2m}
         = Σ_{m=0}^∞ ((2m)!/(2^m m!)) · (1/(2m)!) t^{2m}
         = Σ_{m=0}^∞ t^{2m}/(2^m m!)
         = e^{t²/2}

Thus, the moment generating function for the standard normal distribution Z is

    m_Z(t) = e^{t²/2}.

More generally, if X = σZ + μ is a normal distribution with mean μ and variance σ², then the moment generating function is

    m_X(t) = exp(μt + σ²t²/2).

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
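As a final sanity check on the two continuous examples above (my own sketch, not part of the notes): the exponential mgf λ/(λ − t) is compared against a numerical integral of e^{tx} λe^{−λx}, and the normal mgf e^{t²/2} against the even-moment series Σ_m t^{2m}/(2^m m!).

```python
import math

# Exponential: compare lambda/(lambda - t) with a midpoint-rule integral,
# truncating [0, infinity) at an upper limit where the integrand is negligible
lam, t = 2.0, 0.7   # requires t < lam
n, upper = 200_000, 40.0
dx = upper / n
integral = sum(lam * math.exp((t - lam) * (i + 0.5) * dx) * dx for i in range(n))
assert abs(integral - lam / (lam - t)) < 1e-4

# Standard normal: the even-moment series sum of t^{2m}/(2^m m!) equals e^{t^2/2}
def normal_mgf_series(t, terms=60):
    return sum(t ** (2 * m) / (2 ** m * math.factorial(m)) for m in range(terms))

for s in (0.0, 0.5, 1.5, 3.0):
    assert math.isclose(normal_mgf_series(s), math.exp(s * s / 2))
print("both closed forms confirmed numerically")
```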