Econ 508B: Lecture 5 Expectation, MGF and CGF
Hongyi Liu
Washington University in St. Louis
July 31, 2017
Outline
1 Expected Values
2 Moment Generating Functions
3 Cumulant Generating Functions
Motivation: Probability vs. Expectation
To start with, people often have better intuition for an expected value than for a probability. Many problems, such as optimization and approximation problems, are naturally phrased in terms of expectations. Probabilities can in turn be seen as special cases of expectations (of indicator functions), so both are treated with uniformity and economy.
Definition 1.1
Let X be a random variable on (Ω, F, P). The expected value of X, EX, is defined as
$$EX = \int_\Omega X \, dP,$$
provided the integral is well-defined, i.e., at least one of the two quantities $\int X^+ \, dP$ and $\int X^- \, dP$ is finite.
Proposition 1.1 (Change of variable formula)
Let X be a random variable on (Ω, F, P), let $g : \mathbb{R} \to \mathbb{R}$ be Borel measurable, and set Y = g(X), which is again a random variable on (Ω, F, P). Then
$$\int_\Omega |Y| \, dP = \int_{\mathbb{R}} |g(x)| \, P_X(dx) = \int_{\mathbb{R}} |y| \, P_Y(dy).$$
If $\int_\Omega |Y| \, dP < \infty$, then
$$\int_\Omega Y \, dP = \int_{\mathbb{R}} g(x) \, P_X(dx) = \int_{\mathbb{R}} y \, P_Y(dy).$$
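For intuition, the formula can be checked on a small sketch, assuming a discrete X uniform on {−1, 0, 1} and g(x) = x²; both sides of the identity are then finite sums:

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1}, g(x) = x^2, so Y = g(X).
pX = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
g = lambda x: x * x

# Left side: integrate g(x) against the law P_X of X.
lhs = sum(g(x) * p for x, p in pX.items())

# Right side: build the pushforward law P_Y, then integrate y against it.
pY = {}
for x, p in pX.items():
    pY[g(x)] = pY.get(g(x), Fraction(0)) + p
rhs = sum(y * p for y, p in pY.items())
# Both sides equal E[g(X)] = 2/3.
```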
Moment
Definition 1.2
For any positive integer n, the $n$th moment $\mu_n$ and the $n$th central moment $\mu'_n$ of a random variable X are defined by
$$\mu_n \equiv E X^n, \qquad \mu'_n \equiv E(X - EX)^n,$$
provided the expectations are well-defined.
In particular, the variance of a random variable X is the 2nd central moment, namely $Var(X) = E(X - EX)^2$, provided $EX^2 < \infty$.
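As a minimal sketch, assuming X is a fair six-sided die, the moments come straight from the pmf, and expanding the square gives the familiar identity $Var(X) = EX^2 - (EX)^2$:

```python
# Assumed example: a fair six-sided die; moments computed from the pmf.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

mu1 = sum(x * p for x, p in zip(values, pmf))               # first moment, EX
mu2 = sum(x ** 2 * p for x, p in zip(values, pmf))          # second moment, EX^2
var = sum((x - mu1) ** 2 * p for x, p in zip(values, pmf))  # E(X - EX)^2
# Var(X) = EX^2 - (EX)^2 follows by expanding the square inside the expectation.
```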
MGF
Definition 2.1
The moment generating function (MGF) of a random variable X is
$$M_X(t) \equiv E(e^{tX}), \quad \text{for all } t \in \mathbb{R}.$$
$e^{tX}$ is always non-negative; therefore $E(e^{tX})$ is well-defined but could be infinite (why?).
The payoff of the MGF is the direct connection it gives to the moments of a random variable X, as follows.
Non-negative case
Proposition 2.1
Let X be a non-negative random variable and t > 0. Then
$$M_X(t) \equiv E(e^{tX}) = \sum_{n=0}^\infty \frac{t^n \mu_n}{n!}.$$
Proof: By the Taylor expansion, $e^{tX} = \sum_{n=0}^\infty \frac{t^n X^n}{n!}$, and since X is non-negative, the result follows from the monotone convergence theorem (M.C.T.).
Bounded case
Proposition 2.2
Let X be a random variable and let $M_X(t)$ be finite for all $|t| < \epsilon$, for some $\epsilon > 0$. Then
(1) $E|X|^n < \infty$ for all $n \ge 1$,
(2) $M_X(t) = \sum_{n=0}^\infty t^n \frac{\mu_n}{n!}$ for all $|t| < \epsilon$,
(3) $M_X(\cdot)$ is infinitely differentiable on $(-\epsilon, +\epsilon)$ and, for $r \in \mathbb{N}$, the $r$th derivative of $M_X(\cdot)$ is
$$M_X^{(r)}(t) = \sum_{n=0}^\infty \mu_{n+r} \frac{t^n}{n!} = E(e^{tX} X^r) \quad \text{for } |t| < \epsilon.$$
In particular,
$$M_X^{(r)}(0) = \mu_r = E X^r.$$
Proof
(1): Since $M_X(t)$ is finite for $|t| < \epsilon$ and $\frac{|t|^n |X|^n}{n!} \le e^{|tX|}$ for all $n \in \mathbb{N}$, we have
$$E(e^{|tX|}) \le E(e^{tX}) + E(e^{-tX}) < \infty \quad \text{for } |t| < \epsilon.$$
Therefore, choosing a $t \in (-\epsilon, +\epsilon)$, $t \ne 0$, yields (1).
(2): Notice that $\big|\sum_{j=0}^n \frac{(tx)^j}{j!}\big| \le e^{|tx|}$ for all $x \in \mathbb{R}$ and $n \in \mathbb{N}$; then the dominated convergence theorem (D.C.T.) implies (2).
(3): The derivative of $M_X(\cdot)$ can be found by term-by-term differentiation of the power series. Hence,
$$M_X^{(r)}(t) = \frac{d^r}{dt^r}\Big(\sum_{n=0}^\infty t^n \frac{\mu_n}{n!}\Big) = \sum_{n=0}^\infty \frac{d^r(t^n)}{dt^r} \frac{\mu_n}{n!} = \sum_{n=r}^\infty \mu_n \frac{t^{n-r}}{(n-r)!} = \sum_{n=0}^\infty \mu_{n+r} \frac{t^n}{n!}.$$
Remark 2.1
If $M_X(t)$ is finite in a neighborhood of the origin, then all the moments $\{\mu_n\}_{n \ge 1}$ of X are determined, and so is its probability distribution. However, in general, probability distributions are not completely determined by their moments.
Example 2.1
Let $X \sim N(0, 1)$. Then for all $t \in \mathbb{R}$,
$$M_X(t) = \int_{-\infty}^{+\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = e^{t^2/2} = \sum_{k=0}^\infty \frac{(t^2)^k}{k!} \frac{1}{2^k}.$$
Thus
$$\mu_n = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ \frac{(2k)!}{k!\,2^k} & \text{if } n = 2k,\ k = 1, 2, \dots \end{cases}$$
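The even-moment formula can be sanity-checked numerically; the sketch below approximates $E[X^n]$ for $X \sim N(0,1)$ with a midpoint Riemann sum (the truncation limits and step count are ad hoc choices):

```python
import math

def normal_moment(n, lo=-12.0, hi=12.0, steps=200_000):
    """Approximate E[X^n] for X ~ N(0, 1) by a midpoint Riemann sum."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x ** n * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

# mu_{2k} = (2k)! / (k! 2^k) gives 1 and 3 for n = 2 and 4; odd moments vanish.
mu2_exact = math.factorial(2) / (math.factorial(1) * 2)
mu4_exact = math.factorial(4) / (math.factorial(2) * 4)
```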
Intuitively speaking, if the sequence of moments does not grow too quickly, then the distribution is determined by its moments.
Example 2.2
A standard example of two distinct distributions with the same moments is based on the density of the lognormal distribution (Billingsley, Probability and Measure, Chapter 30):
$$f(x) = \frac{1}{\sqrt{2\pi}} \, \frac{1}{x} \exp(-(\log x)^2/2), \quad x > 0,$$
and its perturbed density:
$$f_a(x) = f(x)\,(1 + a \sin(2\pi \log x)).$$
They have the same moments, and the $n$th moment of each is $\exp(n^2/2)$. Proof: homework!
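The claim can be probed numerically. After substituting $y = \log x$, the difference between the $n$th moments of $f_a$ and $f$ reduces to $a \int e^{ny} \varphi(y) \sin(2\pi y)\,dy$, which vanishes for every integer n; the sketch below (with an arbitrary a and ad hoc integration limits) approximates that integral:

```python
import math

def perturbation_term(n, a=0.5, lo=-12.0, hi=16.0, steps=400_000):
    """Midpoint approximation of a * int e^{n y} phi(y) sin(2 pi y) dy,
    i.e. the gap between the n-th moments of f_a and f after y = log x."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += math.exp(n * y - y * y / 2.0) * a * math.sin(2.0 * math.pi * y)
    return total * h / math.sqrt(2.0 * math.pi)
```

Shifting $y \mapsto y + n$ turns the integrand into an odd function times $\varphi$, which is why the gap is exactly zero.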
Joint moment generating function
Definition 2.2
The joint moment generating function of a random vector $X = (X_1, \dots, X_k)$ is defined by
$$M_{X_1,\dots,X_k}(t_1, \dots, t_k) \equiv E(e^{t_1 X_1 + \cdots + t_k X_k}),$$
for all $t_1, \dots, t_k \in \mathbb{R}$. The convention here for $M_{X_1,\dots,X_k}(\cdot)$ is similar to that for $M_X(t)$: the MGF of X 'exists' if $M_{X_1,\dots,X_k}(\cdot)$ is finite in a neighborhood of the origin of $\mathbb{R}^k$, i.e., for $\|t\| < t_0$ with some $t_0 > 0$.
$$M_X(t) = 1 + \sum_{i=1}^k \kappa^i t_i + \frac{1}{2} \sum_{i,j=1}^k \kappa^{ij} t_i t_j + \cdots,$$
where $\kappa^{i_1 \cdots i_r} = E(X_{i_1} \cdots X_{i_r})$ for $i_1, \dots, i_r = 1, \dots, k$ is referred to as the moment about the origin of order r of X. The moments of order r form an array that is symmetric w.r.t. permutations of the indices.
Moreover,
$$\kappa^{i_1 \cdots i_r} = \frac{\partial^r M_X(t)}{\partial t_{i_1} \cdots \partial t_{i_r}} \Big|_{t=0}.$$
The relationship
$$M_X(t) = M_{X_1}(t_1) \times \cdots \times M_{X_k}(t_k)$$
holds if and only if the components of X are independent.
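A minimal check of this product rule, assuming a hypothetical pair of independent discrete coordinates, so the joint pmf is the product of the marginals:

```python
import math

# Assumed: X1 uniform on {0, 1} and X2 uniform on {0, 1, 2}, independent.
xs, ys = [0, 1], [0, 1, 2]
px, py = 1 / 2, 1 / 3

def joint_mgf(t1, t2):
    """E[e^{t1 X1 + t2 X2}] under the product joint pmf."""
    return sum(px * py * math.exp(t1 * x + t2 * y) for x in xs for y in ys)

def mgf_x1(t):
    return sum(px * math.exp(t * x) for x in xs)

def mgf_x2(t):
    return sum(py * math.exp(t * y) for y in ys)
# With independent components, joint_mgf(t1, t2) = mgf_x1(t1) * mgf_x2(t2).
```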
Hongyi Liu (Washington University in St. Louis)Math Camp 2017 Stats July 31, 2017 15 / 23 Alternatively, by the definition of expectation and MGF, random variable X, occurs −5 with probability 1/8, occurs 1 with probability 1/4, and occurs 7 with probability 5/8. Thus its E(Xn) is trivially n (n) 1 n 1 5 n E[X ] = MX (0) = 8 (−5) + 4 + 8 7 .
Example
Suppose $M_X(t) = \frac18 e^{-5t} + \frac14 e^{t} + \frac58 e^{7t}$. What is $E(X^n)$?
Answer:
$$M_X^{(n)}(t) = \frac18 (-5)^n e^{-5t} + \frac14 e^{t} + \frac58\, 7^n e^{7t},$$
$$E[X^n] = M_X^{(n)}(0) = \frac18 (-5)^n + \frac14 + \frac58\, 7^n.$$
Alternatively, by the definition of expectation and the MGF, the random variable X takes the value −5 with probability 1/8, the value 1 with probability 1/4, and the value 7 with probability 5/8. Thus $E(X^n)$ is directly
$$E[X^n] = M_X^{(n)}(0) = \frac18 (-5)^n + \frac14 + \frac58\, 7^n.$$
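This can be confirmed numerically: the pmf read off from the MGF reproduces $M_X$, and a central finite difference at 0 recovers the first moment $EX = 4$:

```python
import math

values = [-5, 1, 7]
probs = [1 / 8, 1 / 4, 5 / 8]

def M(t):
    """M_X(t) = E[e^{tX}] for the pmf read off from the MGF."""
    return sum(p * math.exp(t * x) for x, p in zip(values, probs))

direct_mean = sum(p * x for x, p in zip(values, probs))  # E[X] = 4.0
eps = 1e-6
mgf_mean = (M(eps) - M(-eps)) / (2 * eps)  # central difference for M'(0)
```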
Cumulant Generating Function
Definition 3.1
Let $M_X(t)$ be finite for $|t| < t_0$. The cumulant generating function of X is defined as
$$K_X(t) = \log M_X(t).$$
The CGF also completely determines the distribution of X, and it can be expanded in a power series with radius of convergence $R \ge t_0$ as follows:
$$K_X(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \cdots.$$
The coefficient $\kappa_r$ of $t^r/r!$ is referred to as the cumulant of order r of X:
$$\kappa_r = \kappa_r(X) = \frac{d^r}{dt^r} K_X(t) \Big|_{t=0}.$$
Multivariate cumulant generating function
When X = (X1, ..., Xk) is a vector, the CGF is defined as
$$K_X(t) = \log M_X(t).$$
If $M_X(t)$ exists, then the CGF admits a multivariate Taylor series expansion in a neighborhood of the origin, with coefficients corresponding to the cumulants of X.
Definition 3.2
The joint cumulant of order r is
$$\kappa^{i_1, i_2, \cdots, i_r} = \frac{\partial^r K_X(t)}{\partial t_{i_1} \cdots \partial t_{i_r}} \Big|_{t=0}.$$
Sums of I.I.D. random variables
Let $S_n = \sum_{i=1}^n X_i$ with $X_1, \dots, X_n$ i.i.d., and suppose $M_{X_i}$ exists. Then
$$M_{S_n}(t) = (M_X(t))^n, \qquad K_{S_n}(t) = n K_X(t).$$
Also, $\kappa_r(S_n) = n \kappa_r(X) = n \kappa_r$. In a word, when working with sums of i.i.d. random variables, the cumulants of the sum are simply n times the cumulants of each summand.
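The sum rule can be checked for, say, i.i.d. Bernoulli(p) summands, whose sum is Binomial(n, p); the choices n = 5, p = 0.3 below are arbitrary:

```python
import math

n, p = 5, 0.3  # arbitrary choices for the check

def bern_mgf(t):
    """MGF of a single Bernoulli(p) summand: (1 - p) + p e^t."""
    return (1 - p) + p * math.exp(t)

def binom_mgf(t):
    """MGF of S_n ~ Binomial(n, p), computed directly from its pmf."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * math.exp(t * k)
               for k in range(n + 1))
# M_{S_n}(t) = (M_X(t))^n, equivalently K_{S_n}(t) = n K_X(t).
```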
Example 3.1
Let $X \sim N(\mu, \sigma^2)$. Then
$$M_X(t) = e^{\mu t + \sigma^2 t^2/2}, \qquad K_X(t) = \mu t + \sigma^2 \frac{t^2}{2}.$$
Therefore, $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$, and $\kappa_r = 0$ for $r = 3, 4, \dots$.
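The cumulants can be recovered from $K_X$ by finite differences at 0; a sketch with assumed values $\mu = 1.5$, $\sigma = 2$:

```python
mu, sigma = 1.5, 2.0  # assumed values for the sketch

def K(t):
    """CGF of N(mu, sigma^2): K_X(t) = mu t + sigma^2 t^2 / 2."""
    return mu * t + sigma ** 2 * t ** 2 / 2.0

e = 1e-3
kappa1 = (K(e) - K(-e)) / (2 * e)                      # K'(0)  = mu
kappa2 = (K(e) - 2 * K(0.0) + K(-e)) / e ** 2          # K''(0) = sigma^2
kappa3 = (K(2 * e) - 2 * K(e) + 2 * K(-e) - K(-2 * e)) / (2 * e ** 3)  # ~ 0
```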
Cumulants of order larger than 2 are all zero if and only if X has a normal distribution.
Location Shifts
Shifting from X to X + a induces the corresponding transformations of $M_X(\cdot)$ and $K_X(\cdot)$, respectively:
$$M_{X+a}(t) = E(e^{t(X+a)}) = e^{at} M_X(t), \qquad K_{X+a}(t) = at + K_X(t).$$
Only the first cumulant is affected, i.e., κ1(X + a) = a + κ1.
Scale Changes
Rescaling X by b, with b > 0, gives X/b. It follows that
$$M_{X/b}(t) = E(e^{tX/b}) = M_X(t/b), \qquad K_{X/b}(t) = K_X(t/b), \qquad \kappa_r(X/b) = \kappa_r(X)/b^r = \kappa_r/b^r.$$
All cumulants are affected by a scale change unless b = 1.
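Both transformation rules can be verified on a small sketch, assuming X is a fair die with arbitrary shift a = 2 and scale b = 3:

```python
import math

values = [1, 2, 3, 4, 5, 6]  # a fair die
p = 1 / 6
a, b = 2.0, 3.0  # arbitrary shift and scale for the check

def mgf(vals, t):
    """MGF of a uniform discrete variable on vals."""
    return sum(p * math.exp(t * v) for v in vals)

shifted = [v + a for v in values]  # X + a
scaled = [v / b for v in values]   # X / b
# M_{X+a}(t) = e^{a t} M_X(t)  and  M_{X/b}(t) = M_X(t / b).
```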