Lecture 6: Expected Value and Moments
Sta 111
Colin Rundel
May 21, 2014

Expected Value

The expected value of a random variable X is defined as follows

Discrete Random Variable:
E[X] = Σ_{all x} x P(X = x)

Continuous Random Variable:
E[X] = ∫_{all x} x P(X = x) dx
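As a quick concrete illustration (not on the slide): for a single roll of a fair six-sided die,

E[X] = Σ_{x=1}^{6} x · (1/6) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5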

Expected Value of a function

The expected value of a function of a random variable is defined as follows

Discrete Random Variable:
E[f(X)] = Σ_{all x} f(x) P(X = x)

Continuous Random Variable:
E[f(X)] = ∫_{all x} f(x) P(X = x) dx

Properties of Expected Value

Constants - E(c) = c if c is constant
Indicators - E(I_A) = P(A) where I_A is an indicator function
Constant Factors - E(cX) = cE(X)
Addition - E(X + Y) = E(X) + E(Y)
Multiplication - E(XY) = E(X)E(Y) if X and Y are independent.
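A quick numerical sanity check of the addition and multiplication rules (not part of the slides; the distributions, sample size, and seed are arbitrary choices):

```python
# Empirically check E(X + Y) = E(X) + E(Y) and E(XY) = E(X)E(Y)
# for independent X and Y, using numpy simulation only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1_000_000)  # X with E(X) = 2
y = rng.poisson(lam=3.0, size=1_000_000)            # Y with E(Y) = 3, independent of X

print(np.mean(x + y), "vs", x.mean() + y.mean())    # addition rule
print(np.mean(x * y), "vs", x.mean() * y.mean())    # multiplication rule (independence)
```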

Variance

Another common property of random variables we are interested in is the Variance, which measures the expected squared deviation from the mean.

Var(X) = E[(X − E(X))²] = E[(X − µ)²]

One common simplification:

Var(X) = E[(X − µ)²]
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − µ²

SD(X) = √Var(X)

Properties of Variance

What is Var(aX + b) when a and b are constants?

Which gives us:

Var(aX) = a²Var(X)
Var(X + c) = Var(X)
Var(c) = 0
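One way to answer the question above (a sketch; the algebra is not worked out on the slide):

Var(aX + b) = E[(aX + b − E(aX + b))²] = E[(aX + b − aµ − b)²] = E[a²(X − µ)²] = a²Var(X)

The additive constant drops out entirely, which is exactly what the three properties above record.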


Properties of Variance, cont.

What about Var(X + Y)?

Covariance

What about when X and Y are not independent?

E(XY) ≠ E(X)E(Y)  ⇒  E(XY) − µ_x µ_y ≠ 0

This quantity is known as Covariance, and is roughly speaking a generalization of variance to two variables.

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(X − µ_x)(Y − µ_y)]
          = E[XY + µ_x µ_y − X µ_y − Y µ_x]
          = E(XY) − µ_x µ_y

Properties of Covariance

Cov(X, Y) = E[(X − µ_x)(Y − µ_y)] = E(XY) − µ_x µ_y

Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)

Properties of Variance, cont.

A general formula for the variance of a linear combination of two random variables:

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)

From which we can see that

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)
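A sketch of where the general formula comes from (this expansion is not worked out above):

Var(aX + bY) = E[(a(X − µ_x) + b(Y − µ_y))²]
             = a²E[(X − µ_x)²] + b²E[(Y − µ_y)²] + 2ab E[(X − µ_x)(Y − µ_y)]
             = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)

Taking a = b = 1 and then a = 1, b = −1 gives the two special cases above.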


Properties of Variance, cont.

A completely general formula for the variance of a linear combination of n random variables:

Var(Σ_{i=1}^n c_i X_i) = Σ_{i=1}^n Σ_{j=1}^n Cov(c_i X_i, c_j X_j)
                       = Σ_{i=1}^n c_i² Var(X_i) + Σ_{i=1}^n Σ_{j≠i} c_i c_j Cov(X_i, X_j)

Bernoulli Random Variable

Let X ∼ Bern(p), what is E(X) and Var(X)?
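A sketch of the standard answer (the work is not shown above):

E(X) = 0 · (1 − p) + 1 · p = p
E(X²) = 0² · (1 − p) + 1² · p = p
Var(X) = E(X²) − E(X)² = p − p² = p(1 − p)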

Binomial Random Variable

Let X ∼ Binom(n, p), what is E(X) and Var(X)?

We can redefine X = Σ_{i=1}^n Y_i where Y_1, ..., Y_n ∼ Bern(p), and since we are sampling with replacement all Y_i and Y_j are independent.
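A sketch of the answer using the Bernoulli results and the addition properties above (the work is not shown on the slide):

E(X) = E(Σ_{i=1}^n Y_i) = Σ_{i=1}^n E(Y_i) = np
Var(X) = Var(Σ_{i=1}^n Y_i) = Σ_{i=1}^n Var(Y_i) = np(1 − p)

where the covariance terms in the variance vanish because the Y_i are independent.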

Hypergeometric Random Variable - E(X)

Let's consider a simple case where we have an urn with m black marbles and N − m white marbles. Let B_i be an indicator variable for the ith draw being black.

B_i = 1 if the ith draw is black, 0 otherwise

In the case where N = 2 and m = 1, what is P(B_i = 1) for all i?

Ω = {BW, WB}

P(B_1) = 1/2, P(B_2) = 1/2
P(W_1) = 1/2, P(W_2) = 1/2


Hypergeometric Random Variable - E(X) - cont.

What about when N = 3 and m = 1?

Ω = {BW_1W_2, BW_2W_1, W_1BW_2, W_2BW_1, W_1W_2B, W_2W_1B}

P(B_1) = 1/3, P(B_2) = 1/3, P(B_3) = 1/3
P(W_1) = 2/3, P(W_2) = 2/3, P(W_3) = 2/3

Hypergeometric Random Variable - E(X) - cont.

Let X ∼ Hypergeo(N, m, n), then X = B_1 + B_2 + ··· + B_n

Proposition
P(B_i = 1) = m/N for all i
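Combining the proposition with the addition property of expectation gives (a sketch; the final step is not worked out above):

E(X) = E(B_1 + ··· + B_n) = Σ_{i=1}^n P(B_i = 1) = nm/N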

Hypergeometric Random Variable - E(X) - 2nd way

Let X ∼ Hypergeo(N, m, n), what is E(X)?
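The work is left blank above; a sketch of one direct calculation from the pmf (assuming this is the "2nd way" intended):

E(X) = Σ_x x · (m choose x)(N−m choose n−x) / (N choose n)
     = (nm/N) Σ_x (m−1 choose x−1)(N−m choose n−x) / (N−1 choose n−1)
     = nm/N

using x(m choose x) = m(m−1 choose x−1), (N choose n) = (N/n)(N−1 choose n−1), and the fact that the remaining sum is a Hypergeo(N−1, m−1, n−1) pmf summed over its support.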

Hypergeometric Random Variable - Var(X)

Let X ∼ Hypergeo(N, m, n), what is Var(X)?

Var(X) = Var(Σ_{i=1}^n B_i) = Σ_{i=1}^n Σ_{j=1}^n Cov(B_i, B_j)
       = Σ_{i=1}^n Var(B_i) + Σ_{i=1}^n Σ_{j≠i} Cov(B_i, B_j)

Σ_{i=1}^n Var(B_i) = Σ_{i=1}^n (m/N)(1 − m/N) = Σ_{i=1}^n m(N − m)/N² = nm(N − m)/N²


Hypergeometric Random Variable - Variance, cont.

Σ_{i=1}^n Σ_{j≠i} Cov(B_i, B_j) = Σ_{i=1}^n Σ_{j≠i} [E(B_i B_j) − E(B_i)E(B_j)]
  = Σ_{i=1}^n Σ_{j≠i} [P(B_i = 1 ∩ B_j = 1) − P(B_i = 1)P(B_j = 1)]
  = Σ_{i=1}^n Σ_{j≠i} [P(B_i = 1)P(B_j = 1 | B_i = 1) − (m/N)(m/N)]
  = Σ_{i=1}^n Σ_{j≠i} [(m/N)((m − 1)/(N − 1)) − (m/N)(m/N)]
  = Σ_{i=1}^n Σ_{j≠i} [Nm(m − 1) − m²(N − 1)] / [N²(N − 1)]
  = Σ_{i=1}^n Σ_{j≠i} [−m(N − m) / (N²(N − 1))]
  = −n(n − 1) m(N − m) / (N²(N − 1))

Hypergeometric Random Variable - Variance, cont.

Let X ∼ Hypergeo(N, m, n)

Var(X) = Σ_{i=1}^n Var(B_i) + Σ_{i=1}^n Σ_{j≠i} Cov(B_i, B_j)
       = nm(N − m)/N² − n(n − 1) m(N − m) / (N²(N − 1))
       = [nm(N − m)(N − 1) − nm(n − 1)(N − m)] / (N²(N − 1))
       = [nm(N − m)/N²] · (N − n)/(N − 1)
       = n (m/N)(1 − m/N) (N − n)/(N − 1)
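A quick numerical sanity check of the mean and variance formulas above (not part of the slides; the parameter values, seed, and sample size are arbitrary choices):

```python
# Compare simulated hypergeometric mean/variance to nm/N and
# n(m/N)(1 - m/N)(N - n)/(N - 1), using numpy only.
import numpy as np

N, m, n = 20, 7, 5                 # population size, black marbles, draws
rng = np.random.default_rng(1)
x = rng.hypergeometric(ngood=m, nbad=N - m, nsample=n, size=1_000_000)

print("simulated E(X)  :", x.mean(), "  formula:", n * m / N)
print("simulated Var(X):", x.var(), "  formula:", n * (m / N) * (1 - m / N) * (N - n) / (N - 1))
```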

Poisson Random Variable - E(X)

Let X ∼ Poisson(λ), what is E(X)?

Poisson Random Variable - Var(X)

Let X ∼ Poisson(λ), what is Var(X)?
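Neither calculation is worked out above; a sketch of the standard derivations:

E(X) = Σ_{x=0}^∞ x e^{−λ} λ^x / x! = λ e^{−λ} Σ_{x=1}^∞ λ^{x−1} / (x − 1)! = λ
E[X(X − 1)] = Σ_{x=2}^∞ x(x − 1) e^{−λ} λ^x / x! = λ² e^{−λ} Σ_{x=2}^∞ λ^{x−2} / (x − 2)! = λ²
Var(X) = E[X(X − 1)] + E(X) − E(X)² = λ² + λ − λ² = λ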


St. Petersburg Lottery

We start with $1 on the table and a coin.

At each step: Toss the coin; if it shows Heads, take the money. If it shows Tails, I double the money on the table.

Let X be the amount you win, what is E(X)?
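A sketch of the classical answer (the sum is not worked out above): the game pays 2^{k−1} dollars when the first Heads appears on toss k, which happens with probability 2^{−k}, so

E(X) = Σ_{k=1}^∞ 2^{k−1} · 2^{−k} = Σ_{k=1}^∞ 1/2 = ∞

The expected winnings are infinite even though any single play is very likely to pay out only a few dollars.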

Moments

Some definitions,

Raw moment: µ'_n = E(X^n)

Central moment: µ_n = E[(X − µ)^n]

Normalized / Standardized moment: µ_n / σ^n

Common Moments of Interest

Zeroth Moment:
µ'_0 = µ_0 = 1

First Moment:
µ'_1 = E(X) = µ
µ_1 = E(X − µ) = 0

Second Moment:
µ_2 = E[(X − µ)²] = Var(X)
µ'_2 − (µ'_1)² = Var(X)

Third Moment:
Skewness(X) = µ_3 / σ³

Fourth Moment:
Kurtosis(X) = µ_4 / σ⁴
Ex. Kurtosis(X) = µ_4 / σ⁴ − 3

Note that some moments do not exist, which is the case when E(X^n) does not converge.

Third and Fourth Central Moments

µ_3 = E[(X − µ)³] = E(X³ − 3X²µ + 3Xµ² − µ³)
    = E(X³) − 3µE(X²) + 3µ³ − µ³
    = E(X³) − 3µσ² − µ³
    = µ'_3 − 3µσ² − µ³

µ_4 = E[(X − µ)⁴] = E(X⁴ − 4X³µ + 6X²µ² − 4Xµ³ + µ⁴)
    = E(X⁴) − 4µE(X³) + 6µ²E(X²) − 4µ⁴ + µ⁴
    = E(X⁴) − 4µE(X³) + 6µ²(σ² + µ²) − 3µ⁴
    = µ'_4 − 4µµ'_3 + 6µ²σ² + 3µ⁴
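A small numerical illustration of the standardized moments (not part of the slides; the Exp(1) example, seed, and sample size are my own choices):

```python
# Estimate skewness and excess kurtosis from simulated Exp(1) data,
# which has skewness 2 and excess kurtosis 6.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

z = (x - x.mean()) / x.std()            # standardize the sample
print("skewness       :", (z ** 3).mean())      # ~ 2
print("excess kurtosis:", (z ** 4).mean() - 3)  # ~ 6
```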


Moment Generating Function

The moment generating function of a discrete random variable X is defined for all real values of t by

M_X(t) = E[e^{tX}] = Σ_x e^{tx} P(X = x)

This is called the moment generating function because we can obtain the moments of X by successively differentiating M_X(t) with respect to t and then evaluating at t = 0.

M_X(0) = E[e^0] = 1 = µ'_0

M'_X(t) = d/dt E[e^{tX}] = E[d/dt e^{tX}] = E[X e^{tX}]
M'_X(0) = E[X e^0] = E[X] = µ'_1

M''_X(t) = d/dt M'_X(t) = d/dt E[X e^{tX}] = E[d/dt (X e^{tX})] = E[X² e^{tX}]
M''_X(0) = E[X² e^0] = E[X²] = µ'_2

Moment Generating Function - Poisson

Let X ∼ Pois(λ), then
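A sketch of the standard calculation (the work is not shown above):

M_X(t) = E[e^{tX}] = Σ_{x=0}^∞ e^{tx} e^{−λ} λ^x / x! = e^{−λ} Σ_{x=0}^∞ (λe^t)^x / x! = e^{λ(e^t − 1)}

M'_X(t) = λe^t e^{λ(e^t − 1)},  so  M'_X(0) = λ = E(X)
M''_X(t) = (λe^t + (λe^t)²) e^{λ(e^t − 1)},  so  M''_X(0) = λ + λ² = E(X²)
Var(X) = E(X²) − E(X)² = λ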

Moment Generating Function - Poisson Skewness

Moment Generating Function - Poisson Kurtosis
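No work is shown for these two slides; a sketch of the standard results: differentiating M_X(t) = e^{λ(e^t − 1)} two more times gives µ'_3 = λ³ + 3λ² + λ and µ'_4 = λ⁴ + 6λ³ + 7λ² + λ, so the central moments are µ_3 = λ and µ_4 = 3λ² + λ. Therefore

Skewness(X) = µ_3 / σ³ = λ / λ^{3/2} = 1/√λ
Kurtosis(X) = µ_4 / σ⁴ = (3λ² + λ) / λ² = 3 + 1/λ,  Ex. Kurtosis(X) = 1/λ

so as λ grows the Poisson's skewness and excess kurtosis both approach 0, matching the normal.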


Moment Generating Function - Binomial

Let X ∼ Binom(n, p), then

M_X(t) = E[e^{tX}] = Σ_{k=0}^n e^{tk} (n choose k) p^k (1 − p)^{n−k}
       = Σ_{k=0}^n (n choose k) (pe^t)^k (1 − p)^{n−k}
       = [pe^t + (1 − p)]^n

M'_X(t) = npe^t (pe^t + 1 − p)^{n−1}
M'_X(0) = µ'_1 = np = E(X)

M''_X(t) = n(n − 1)(pe^t)² (pe^t + 1 − p)^{n−2} + npe^t (pe^t + 1 − p)^{n−1}
M''_X(0) = µ'_2 = n(n − 1)p² + np = E(X²)

Var(X) = µ'_2 − (µ'_1)² = n(n − 1)p² + np − n²p² = np((n − 1)p + 1 − np) = np(1 − p)

Moment Generating Function - Normal

Let Z ∼ N(0, 1) and X ∼ N(µ, σ²) where X = µ + σZ, then

M_Z(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx
       = (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2 + tx} dx
       = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x − t)²/2 + t²/2} dx
       = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(x − t)²/2} dx
       = e^{t²/2}

M_X(t) = E[e^{tX}] = E[e^{t(µ + σZ)}] = E[e^{tµ} e^{tσZ}]
       = e^{tµ} E[e^{tσZ}] = e^{tµ} M_Z(tσ)
       = e^{tµ} e^{t²σ²/2} = exp(tµ + t²σ²/2)

Moment Generating Function - Normal, cont.

M'_X(t) = exp(µt + t²σ²/2)(µ + tσ²)
M'_X(0) = µ'_1 = µ

M''_X(t) = exp(µt + t²σ²/2) σ² + exp(µt + t²σ²/2)(µ + tσ²)²
M''_X(0) = µ'_2 = µ² + σ²

Moment Generating Function - Normal, cont.

µ'_3 = M'''_X(0) = µ³ + 3µσ²
µ'_4 = M''''_X(0) = µ⁴ + 6µ²σ² + 3σ⁴

Skewness(X) = µ_3 / σ³ = (µ'_3 − 3µσ² − µ³) / σ³
            = (µ³ + 3µσ² − 3µσ² − µ³) / σ³ = 0

Kurtosis(X) = µ_4 / σ⁴ = (µ'_4 − 4µµ'_3 + 6µ²σ² + 3µ⁴) / σ⁴
            = (µ'_4 − 4µ(µ³ + 3µσ²) + 6µ²σ² + 3µ⁴) / σ⁴
            = ((µ⁴ + 6µ²σ² + 3σ⁴) − 6µ²σ² − µ⁴) / σ⁴
            = 3σ⁴ / σ⁴ = 3

Ex. Kurtosis(X) = 0
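A symbolic check of the normal raw moments above (not part of the slides; sympy and the symbol names are my own choices):

```python
# Differentiate the N(mu, sigma^2) MGF and evaluate at t = 0 to recover
# the first four raw moments used in the skewness/kurtosis calculations.
import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', positive=True)
M = sp.exp(mu * t + t**2 * sigma**2 / 2)   # MGF of N(mu, sigma^2)

raw_moments = [sp.expand(sp.diff(M, t, k).subs(t, 0)) for k in range(1, 5)]
print(raw_moments)
# expected: [mu, mu**2 + sigma**2, mu**3 + 3*mu*sigma**2,
#            mu**4 + 6*mu**2*sigma**2 + 3*sigma**4]
```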
