STAT/MTHE 353: 5 – Moment Generating Functions and Multivariate Normal Distribution

T. Linder
Queen's University
Winter 2017


Moment Generating Functions

Definition. Let $X = (X_1,\dots,X_n)^T$ be a random vector and $t = (t_1,\dots,t_n)^T \in \mathbb{R}^n$. The moment generating function (MGF) of $X$ is defined by
$$M_X(t) = E\big(e^{t^TX}\big)$$
for all $t$ for which the expectation exists (i.e., is finite).

Remarks:

- Equivalently, $M_X(t) = E\big(e^{\sum_{i=1}^n t_iX_i}\big)$.
- For $0 = (0,\dots,0)^T$ we have $M_X(0) = 1$.
- If $X$ is discrete with finitely many values, then $M_X(t) = E\big(e^{t^TX}\big)$ is finite for all $t \in \mathbb{R}^n$.
- We will always assume that the distribution of $X$ is such that $M_X(t)$ is finite for all $t \in (-t_0,t_0)^n$ for some $t_0 > 0$.


Connection with moments

Let $k_1,\dots,k_n$ be nonnegative integers and $k = k_1 + \cdots + k_n$. Then

$$\frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} M_X(t)
= \frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} E\big(e^{t_1X_1+\cdots+t_nX_n}\big)
= E\big(X_1^{k_1}\cdots X_n^{k_n}\, e^{t_1X_1+\cdots+t_nX_n}\big)$$

Setting $t = 0 = (0,\dots,0)^T$, we get

$$\frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} M_X(t)\bigg|_{t=0}
= E\big(X_1^{k_1}\cdots X_n^{k_n}\big)$$

For a (scalar) random variable $X$ we obtain the $k$th moment of $X$:

$$\frac{d^k}{dt^k} M_X(t)\bigg|_{t=0} = E(X^k)$$

The single most important property of the MGF is that it uniquely determines the distribution of a random vector:

Theorem 1. Assume $M_X(t)$ and $M_Y(t)$ are the MGFs of the random vectors $X$ and $Y$, and that $M_X(t) = M_Y(t)$ for all $t \in (-t_0,t_0)^n$. Then
$$F_X(z) = F_Y(z) \quad \text{for all } z \in \mathbb{R}^n$$
where $F_X$ and $F_Y$ are the joint cdfs of $X$ and $Y$.

Remarks:

- $F_X(z) = F_Y(z)$ for all $z \in \mathbb{R}^n$ clearly implies $M_X(t) = M_Y(t)$. Thus
  $$M_X(t) = M_Y(t) \iff F_X(z) = F_Y(z)$$
- Most often we will use the theorem for random variables instead of random vectors. In this case, $M_X(t) = M_Y(t)$ for all $t \in (-t_0,t_0)$ implies $F_X(z) = F_Y(z)$ for all $z \in \mathbb{R}$.

Theorem 2. Assume $X_1,\dots,X_m$ are independent random vectors in $\mathbb{R}^n$ and let $X = X_1 + \cdots + X_m$. Then
$$M_X(t) = \prod_{i=1}^m M_{X_i}(t)$$

Proof:

$$M_X(t) = E\big(e^{t^TX}\big) = E\big(e^{t^T(X_1+\cdots+X_m)}\big)
= E\big(e^{t^TX_1}\cdots e^{t^TX_m}\big)
= E\big(e^{t^TX_1}\big)\cdots E\big(e^{t^TX_m}\big)
= M_{X_1}(t)\cdots M_{X_m}(t) \qquad \Box$$

(The second-to-last equality uses the independence of $X_1,\dots,X_m$.)

Example: Find the MGF of $X \sim \text{Gamma}(r,\lambda)$ and of $Y = X_1 + \cdots + X_m$, where the $X_i$ are independent and $X_i \sim \text{Gamma}(r_i,\lambda)$.

Example: Find the MGF of $X \sim \text{Poisson}(\lambda)$ and of $X_1 + \cdots + X_m$, where the $X_i$ are independent and $X_i \sim \text{Poisson}(\lambda_i)$. Also, use the MGF to find $E(X)$, $E(X^2)$, and $\text{Var}(X)$.

Note: This theorem gives us a powerful tool for determining the distribution of the sum of independent random variables.
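As a quick sanity check of Theorem 2, the sketch below (assuming NumPy is available; the parameters, sample size, and evaluation point $t$ are arbitrary illustrative choices) compares the empirical MGF of a sum of independent $\text{Gamma}(r_i,\lambda)$ variables with the closed form $(\lambda/(\lambda-t))^{\sum_i r_i}$, the MGF of $\text{Gamma}(\sum_i r_i,\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent X_i ~ Gamma(r_i, lam) with a common rate lam.
# Theorem 2: M_X(t) = prod_i M_{X_i}(t) = (lam / (lam - t))^(r_1 + ... + r_m),
# which is the MGF of a Gamma(r_1 + ... + r_m, lam) random variable.
r = [1.0, 2.0, 0.5]
lam = 2.0
n = 500_000

# NumPy's gamma sampler is parameterized by shape and scale = 1/rate.
samples = sum(rng.gamma(shape=ri, scale=1.0 / lam, size=n) for ri in r)

t = 0.5  # any t < lam works
empirical_mgf = np.exp(t * samples).mean()
analytic_mgf = (lam / (lam - t)) ** sum(r)

print(empirical_mgf, analytic_mgf)
```

With a large sample the two values agree to within Monte Carlo error, consistent with the sum being $\text{Gamma}(3.5, 2)$ here.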


Theorem 3. Assume $X$ is a random vector in $\mathbb{R}^n$, $A$ is an $m \times n$ real matrix, and $b \in \mathbb{R}^m$. Then the MGF of $Y = AX + b$ is given at $t \in \mathbb{R}^m$ by
$$M_Y(t) = e^{t^Tb}\, M_X(A^Tt)$$

Proof:

$$M_Y(t) = E\big(e^{t^TY}\big) = E\big(e^{t^T(AX+b)}\big)
= e^{t^Tb}\, E\big(e^{t^TAX}\big) = e^{t^Tb}\, E\big(e^{(A^Tt)^TX}\big)
= e^{t^Tb}\, M_X(A^Tt) \qquad \Box$$

Note: In the scalar case $Y = aX + b$ we obtain
$$M_Y(t) = e^{tb}\, M_X(at)$$

Applications to the Normal Distribution

Let $X \sim N(0,1)$. Then

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x^2-2tx)}\,dx
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-t)^2 + \frac{t^2}{2}}\,dx
= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-t)^2}\,dx
= e^{t^2/2}$$

since the last integrand is the $N(t,1)$ pdf, which integrates to 1.

We obtain that for all $t \in \mathbb{R}$,
$$M_X(t) = e^{t^2/2}$$
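A minimal numerical check of $M_X(t) = e^{t^2/2}$ for a standard normal $X$, assuming NumPy; the sample size and grid of $t$ values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)  # iid N(0, 1) sample

# Compare the empirical MGF E[e^{tX}] with the closed form e^{t^2/2}.
errors = []
for t in (-1.0, 0.5, 1.5):
    empirical = np.exp(t * x).mean()
    analytic = np.exp(t * t / 2.0)
    errors.append(abs(empirical - analytic) / analytic)
    print(f"t={t:+.1f}: empirical={empirical:.4f}, analytic={analytic:.4f}")
```

The relative error shrinks as the sample grows, as expected for a Monte Carlo estimate.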

Moments of a standard normal random variable

Recall the power series expansion $e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}$, valid for all $z \in \mathbb{R}$. Applying this to $z = tX$,

$$M_X(t) = E(e^{tX}) = E\bigg(\sum_{k=0}^{\infty}\frac{(tX)^k}{k!}\bigg)
= \sum_{k=0}^{\infty} \frac{E\big[(tX)^k\big]}{k!}
= \sum_{k=0}^{\infty} \frac{E(X^k)}{k!}\, t^k$$

and to $z = t^2/2$,

$$M_X(t) = e^{t^2/2} = \sum_{i=0}^{\infty} \frac{(t^2/2)^i}{i!}$$

Matching the coefficients of $t^k$ for $k = 1,2,\dots$ in the two expansions, we obtain

$$E(X^k) = \begin{cases} \dfrac{k!}{2^{k/2}(k/2)!}, & k \text{ even}, \\[1ex] 0, & k \text{ odd}. \end{cases}$$

Sum of independent normal random variables

Recall that if $X \sim N(0,1)$ and $Y = \sigma X + \mu$, where $\sigma > 0$, then $Y \sim N(\mu,\sigma^2)$. Thus the MGF of a $N(\mu,\sigma^2)$ random variable is

$$M_Y(t) = e^{t\mu}\, M_X(t\sigma) = e^{t\mu}\, e^{(t\sigma)^2/2} = e^{t\mu + t^2\sigma^2/2}$$

Let $X_1,\dots,X_m$ be independent r.v.'s with $X_i \sim N(\mu_i,\sigma_i^2)$ and set $X = X_1 + \cdots + X_m$. Then

$$M_X(t) = \prod_{i=1}^m M_{X_i}(t) = \prod_{i=1}^m e^{t\mu_i + t^2\sigma_i^2/2}
= e^{t\sum_{i=1}^m \mu_i + \big(\sum_{i=1}^m \sigma_i^2\big)t^2/2}$$

This implies

$$X \sim N\bigg(\sum_{i=1}^m \mu_i,\ \sum_{i=1}^m \sigma_i^2\bigg)$$

i.e., $X_1 + \cdots + X_m$ is normal with $\mu_X = \sum_{i=1}^m \mu_i$ and $\sigma_X^2 = \sum_{i=1}^m \sigma_i^2$.
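The even-moment formula can be verified against numerical integration. The sketch below (NumPy assumed; the helper function names are ours) computes $E(X^k)$ by Gauss-Hermite quadrature, which is exact for polynomial integrands at this degree, and compares with $k!/(2^{k/2}(k/2)!)$:

```python
import math
import numpy as np

def std_normal_moment(k: int, deg: int = 60) -> float:
    # Gauss-Hermite quadrature approximates integral of f(u) e^{-u^2} du
    # by sum_i w_i f(u_i). Substituting x = sqrt(2) u in the N(0,1) moment
    # integral gives E[X^k] = (1/sqrt(pi)) * sum_i w_i (sqrt(2) u_i)^k.
    u, w = np.polynomial.hermite.hermgauss(deg)
    return float(np.sum(w * (np.sqrt(2.0) * u) ** k) / np.sqrt(np.pi))

def moment_formula(k: int) -> float:
    # E[X^k] = k! / (2^{k/2} (k/2)!) for even k, and 0 for odd k.
    if k % 2 == 1:
        return 0.0
    return math.factorial(k) / (2 ** (k // 2) * math.factorial(k // 2))

for k in range(1, 9):
    print(k, moment_formula(k), round(std_normal_moment(k), 6))
```

For instance, the formula gives $E(X^4) = 3$ and $E(X^6) = 15$, and the quadrature values match to floating-point accuracy.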

Multivariate Normal Distributions

Linear Algebra Review

Recall that an $n \times n$ real matrix $C$ is called nonnegative definite if it is symmetric and
$$x^TCx \ge 0 \quad \text{for all } x \in \mathbb{R}^n$$
and positive definite if it is symmetric and
$$x^TCx > 0 \quad \text{for all } x \in \mathbb{R}^n \text{ such that } x \ne 0$$

Let $A$ be an arbitrary $n \times n$ real matrix. Then $C = A^TA$ is nonnegative definite. If $A$ is nonsingular (invertible), then $C$ is positive definite.

Proof: $A^TA$ is symmetric since $(A^TA)^T = A^T(A^T)^T = A^TA$. It is nonnegative definite since
$$x^T(A^TA)x = (Ax)^T(Ax) = \|Ax\|^2 \ge 0 \qquad \Box$$

For any nonnegative definite $n \times n$ matrix $C$ the following hold:

(1) $C$ has $n$ nonnegative eigenvalues $\lambda_1,\dots,\lambda_n$ (counting multiplicities) and corresponding orthogonal unit-length eigenvectors $b_1,\dots,b_n$:
$$Cb_i = \lambda_ib_i, \quad i = 1,\dots,n$$
where $b_i^Tb_i = 1$, $i = 1,\dots,n$, and $b_i^Tb_j = 0$ if $i \ne j$.

(2) (Spectral decomposition) $C$ can be written as
$$C = BDB^T$$
where $D = \text{diag}(\lambda_1,\dots,\lambda_n)$ is the diagonal matrix of the eigenvalues of $C$, and $B$ is the orthogonal matrix whose $i$th column is $b_i$, i.e., $B = [b_1 \dots b_n]$.

(3) $C$ is positive definite $\iff$ $C$ is nonsingular $\iff$ all the eigenvalues $\lambda_i$ are positive.

(4) $C$ has a unique nonnegative definite square root $C^{1/2}$, i.e., there exists a unique nonnegative definite $A$ such that
$$C = AA$$

Proof: We only prove the existence of $A$. Let $D^{1/2} = \text{diag}(\lambda_1^{1/2},\dots,\lambda_n^{1/2})$ and note that $D^{1/2}D^{1/2} = D$. Let $A = BD^{1/2}B^T$. Then $A$ is nonnegative definite and
$$AA = (BD^{1/2}B^T)(BD^{1/2}B^T) = BD^{1/2}B^TBD^{1/2}B^T = BD^{1/2}D^{1/2}B^T = BDB^T = C \qquad \Box$$

Remarks:

- If $C$ is positive definite, then so is $A = C^{1/2}$.
- If we don't require that $A$ be nonnegative definite, then in general there are infinitely many solutions $A$ of $AA^T = C$.

Lemma 4. If $\Sigma$ is the covariance matrix of some random vector $X = (X_1,\dots,X_n)^T$, then it is nonnegative definite.

Proof: We know that $\Sigma = \text{Cov}(X)$ is symmetric. Let $b \in \mathbb{R}^n$ be arbitrary. Then
$$b^T\Sigma b = b^T\text{Cov}(X)\,b = \text{Cov}(b^TX) = \text{Var}(b^TX) \ge 0$$
so $\Sigma$ is nonnegative definite. $\Box$

Remark: It can be shown that an $n \times n$ matrix $\Sigma$ is nonnegative definite if and only if there exists a random vector $X = (X_1,\dots,X_n)^T$ such that $\text{Cov}(X) = \Sigma$.
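Property (4) translates directly into code. A sketch of the construction $A = BD^{1/2}B^T$, assuming NumPy (`sqrt_psd` is our name; `numpy.linalg.eigh` returns the spectral decomposition of a symmetric matrix):

```python
import numpy as np

def sqrt_psd(C: np.ndarray) -> np.ndarray:
    """Nonnegative definite square root via the spectral decomposition
    C = B D B^T  =>  C^{1/2} = B D^{1/2} B^T."""
    eigvals, B = np.linalg.eigh(C)         # symmetric eigendecomposition
    eigvals = np.clip(eigvals, 0.0, None)  # clip tiny negative rounding errors
    return (B * np.sqrt(eigvals)) @ B.T    # B diag(sqrt(eigvals)) B^T

# Example: C = A^T A is nonnegative definite for any real A.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
C = A.T @ A

S = sqrt_psd(C)
print(np.allclose(S @ S, C))  # S is a square root of C
print(np.allclose(S, S.T))    # and S is symmetric
```

Note that `B * np.sqrt(eigvals)` scales the columns of $B$, which is exactly the product $BD^{1/2}$.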


Defining the Multivariate Normal Distribution

Let $Z_1,\dots,Z_n$ be independent r.v.'s with $Z_i \sim N(0,1)$. The multivariate MGF of $Z = (Z_1,\dots,Z_n)^T$ is

$$M_Z(t) = E\big(e^{t^TZ}\big) = E\big(e^{\sum_{i=1}^n t_iZ_i}\big)
= \prod_{i=1}^n E\big(e^{t_iZ_i}\big)
= \prod_{i=1}^n e^{t_i^2/2} = e^{\frac{1}{2}\sum_{i=1}^n t_i^2} = e^{\frac{1}{2}t^Tt}$$

Now let $\mu \in \mathbb{R}^n$ and let $A$ be an $n \times n$ real matrix. Then the MGF of $X = AZ + \mu$ is

$$M_X(t) = e^{t^T\mu}\, M_Z(A^Tt) = e^{t^T\mu}\, e^{\frac{1}{2}(A^Tt)^T(A^Tt)}
= e^{t^T\mu}\, e^{\frac{1}{2}t^TAA^Tt} = e^{t^T\mu + \frac{1}{2}t^T\Sigma t}$$

where $\Sigma = AA^T$. Note that $\Sigma$ is nonnegative definite.

Definition. Let $\mu \in \mathbb{R}^n$ and let $\Sigma$ be an $n \times n$ nonnegative definite matrix. A random vector $X = (X_1,\dots,X_n)^T$ is said to have a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if its multivariate MGF is
$$M_X(t) = e^{t^T\mu + \frac{1}{2}t^T\Sigma t}$$

Notation: $X \sim N(\mu,\Sigma)$.

Remarks:

- If $Z = (Z_1,\dots,Z_n)^T$ with $Z_i \sim N(0,1)$, $i = 1,\dots,n$, then $Z \sim N(0,I)$, where $I$ is the $n \times n$ identity matrix.
- We saw that if $Z \sim N(0,I)$, then $X = AZ + \mu \sim N(\mu,\Sigma)$, where $\Sigma = AA^T$. One can show the following: $X \sim N(\mu,\Sigma)$ if and only if $X = AZ + \mu$ for a random $n$-vector $Z \sim N(0,I)$ and some $n \times n$ matrix $A$ with $\Sigma = AA^T$.
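The representation $X = AZ + \mu$ is also how multivariate normal samples are generated in practice. A sketch assuming NumPy, with an arbitrary $\mu$ and positive definite $\Sigma$; here $A$ is taken to be the Cholesky factor rather than the symmetric square root, since any $A$ with $AA^T = \Sigma$ works:

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])

# Any A with A A^T = Sigma works; the Cholesky factor is a convenient choice.
A = np.linalg.cholesky(Sigma)

n = 200_000
Z = rng.standard_normal((n, 3))  # rows are iid N(0, I) vectors
X = Z @ A.T + mu                 # rows are iid N(mu, Sigma) vectors

print(X.mean(axis=0))            # close to mu
print(np.cov(X, rowvar=False))   # close to Sigma
```

The sample mean and sample covariance recover $\mu$ and $\Sigma$ up to Monte Carlo error, anticipating the result $E(X) = \mu$, $\text{Cov}(X) = \Sigma$ below.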

Mean and covariance for the multivariate normal distribution

Consider first $Z \sim N(0,I)$, i.e., $Z = (Z_1,\dots,Z_n)^T$, where the $Z_i$ are independent $N(0,1)$ random variables. Then
$$E(Z) = \big(E(Z_1),\dots,E(Z_n)\big)^T = (0,\dots,0)^T$$
Also,
$$E\big[(Z_i - E(Z_i))(Z_j - E(Z_j))\big] = E(Z_iZ_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$
Thus
$$E(Z) = 0, \quad \text{Cov}(Z) = I$$

If $X \sim N(\mu,\Sigma)$, then $X = AZ + \mu$ for a random $n$-vector $Z \sim N(0,I)$ and some $n \times n$ matrix $A$ with $\Sigma = AA^T$. We have
$$E(AZ + \mu) = A\,E(Z) + \mu = \mu$$
and
$$\text{Cov}(AZ + \mu) = \text{Cov}(AZ) = A\,\text{Cov}(Z)\,A^T = AA^T = \Sigma$$
Thus
$$E(X) = \mu, \quad \text{Cov}(X) = \Sigma$$


Joint pdf for the multivariate normal distribution

Lemma 5. If a random vector $X = (X_1,\dots,X_n)^T$ has a covariance matrix $\Sigma$ that is not of full rank (i.e., singular), then $X$ does not have a joint pdf.

Proof sketch: If $\Sigma$ is singular, then there exists $b \in \mathbb{R}^n$ such that $b \ne 0$ and $\Sigma b = 0$. Consider the random variable $b^TX = \sum_{i=1}^n b_iX_i$:
$$\text{Var}(b^TX) = \text{Cov}(b^TX) = b^T\text{Cov}(X)\,b = b^T\Sigma b = 0$$
Therefore $P(b^TX = c) = 1$ for some constant $c$. If $X$ had a joint pdf $f(x)$, then for $B = \{x : b^Tx = c\}$ we should have
$$1 = P(b^TX = c) = P(X \in B) = \int\!\cdots\!\int_B f(x_1,\dots,x_n)\,dx_1\cdots dx_n$$
But this is impossible since $B$ is an $(n-1)$-dimensional hyperplane whose $n$-dimensional volume is zero, so the integral must be zero. $\Box$

Theorem 6. If $X = (X_1,\dots,X_n)^T \sim N(\mu,\Sigma)$, where $\Sigma$ is nonsingular, then it has a joint pdf given by
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\, e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}, \quad x \in \mathbb{R}^n$$

Proof: We know that $X = AZ + \mu$, where $Z = (Z_1,\dots,Z_n)^T \sim N(0,I)$ and $A$ is an $n \times n$ matrix such that $AA^T = \Sigma$. Since $\Sigma$ is nonsingular, $A$ must be nonsingular with inverse $A^{-1}$. Thus the mapping
$$h(z) = Az + \mu$$
is invertible with inverse $g(x) = A^{-1}(x - \mu)$, whose Jacobian is
$$J_g(x) = \det A^{-1}$$
By the multivariate transformation theorem,
$$f_X(x) = f_Z\big(g(x)\big)\,|J_g(x)| = f_Z\big(A^{-1}(x-\mu)\big)\,|\det A^{-1}|$$

Since $Z = (Z_1,\dots,Z_n)^T$, where the $Z_i$ are independent $N(0,1)$ random variables, we have
$$f_Z(z) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-z_i^2/2}
= \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\frac{1}{2}\sum_{i=1}^n z_i^2}
= \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\frac{1}{2}z^Tz}$$
so we get
$$f_X(x) = f_Z\big(A^{-1}(x-\mu)\big)\,|\det A^{-1}|
= \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\frac{1}{2}\big(A^{-1}(x-\mu)\big)^T\big(A^{-1}(x-\mu)\big)}\,|\det A^{-1}|$$
$$= \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\frac{1}{2}(x-\mu)^T(A^{-1})^TA^{-1}(x-\mu)}\,|\det A^{-1}|
= \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\, e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
since $|\det A^{-1}| = \frac{1}{\sqrt{\det\Sigma}}$ and $(A^{-1})^TA^{-1} = \Sigma^{-1}$ (exercise!). $\Box$

Special case: bivariate normal

For $n = 2$ we have
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$
where $\mu_i = E(X_i)$, $\sigma_i^2 = \text{Var}(X_i)$, $i = 1,2$, and
$$\rho = \rho(X_1,X_2) = \frac{\text{Cov}(X_1,X_2)}{\sigma_1\sigma_2}$$

Thus the bivariate normal distribution is determined by five scalar parameters: $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

$\Sigma$ is positive definite $\iff$ $\Sigma$ is invertible $\iff$ $\det\Sigma > 0$:
$$\det\Sigma = (1-\rho^2)\sigma_1^2\sigma_2^2 > 0 \iff |\rho| < 1 \text{ and } \sigma_1^2\sigma_2^2 > 0$$
so a bivariate normal random vector $(X_1,X_2)$ has a pdf if and only if the components $X_1$ and $X_2$ have positive variances and $|\rho| < 1$.


We have
$$\Sigma^{-1} = \frac{1}{\det\Sigma}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}
= \frac{1}{(1-\rho^2)\sigma_1^2\sigma_2^2}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}$$
and
$$(x-\mu)^T\Sigma^{-1}(x-\mu)
= \frac{1}{(1-\rho^2)\sigma_1^2\sigma_2^2}\,\big(x_1-\mu_1,\ x_2-\mu_2\big)
\begin{pmatrix} \sigma_2^2(x_1-\mu_1) - \rho\sigma_1\sigma_2(x_2-\mu_2) \\ \sigma_1^2(x_2-\mu_2) - \rho\sigma_1\sigma_2(x_1-\mu_1) \end{pmatrix}$$
$$= \frac{1}{(1-\rho^2)\sigma_1^2\sigma_2^2}\Big(\sigma_2^2(x_1-\mu_1)^2 - 2\rho\sigma_1\sigma_2(x_1-\mu_1)(x_2-\mu_2) + \sigma_1^2(x_2-\mu_2)^2\Big)$$
$$= \frac{1}{1-\rho^2}\bigg(\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\bigg)$$

Thus the joint pdf of $(X_1,X_2)^T \sim N(\mu,\Sigma)$ is
$$f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,
\exp\bigg\{-\frac{1}{2(1-\rho^2)}\bigg(\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\bigg)\bigg\}$$

Remark: If $\rho = 0$, then
$$f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\,
e^{-\frac{1}{2}\big(\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\big)}
= \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \cdot \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}
= f_{X_1}(x_1)\,f_{X_2}(x_2)$$
Therefore $X_1$ and $X_2$ are independent. It is also easy to see that $f(x_1,x_2) = f_{X_1}(x_1)f_{X_2}(x_2)$ for all $x_1$ and $x_2$ implies $\rho = 0$. Thus we obtain:

Two jointly normal random variables $X_1$ and $X_2$ are independent if and only if they are uncorrelated.
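The factorization for $\rho = 0$ can be checked pointwise, assuming SciPy is available (the parameter values and test point are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu1, mu2 = 1.0, -1.0
s1, s2 = 2.0, 0.5

def joint_pdf(x1, x2, rho):
    # Bivariate normal density with correlation rho.
    cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
    return multivariate_normal(mean=[mu1, mu2], cov=cov).pdf([x1, x2])

def marginal_product(x1, x2):
    # Product of the two normal marginal densities.
    return norm(mu1, s1).pdf(x1) * norm(mu2, s2).pdf(x2)

x1, x2 = 0.3, -0.7
print(joint_pdf(x1, x2, 0.0), marginal_product(x1, x2))  # equal when rho = 0
print(joint_pdf(x1, x2, 0.8))                            # differs when rho != 0
```

With $\rho = 0$ the joint density equals the product of the marginals at every point; with $\rho \ne 0$ it does not, matching the independence-iff-uncorrelated conclusion.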

In general, the following important facts can be proved using the multivariate MGF:

(i) If $X = (X_1,\dots,X_n)^T \sim N(\mu,\Sigma)$, then $X_1,X_2,\dots,X_n$ are independent if and only if they are uncorrelated, i.e., $\text{Cov}(X_i,X_j) = 0$ if $i \ne j$, i.e., $\Sigma$ is a diagonal matrix.

(ii) Assume $X = (X_1,\dots,X_n)^T \sim N(\mu,\Sigma)$ and let
$$\mathbf{X}_1 = (X_1,\dots,X_k)^T, \quad \mathbf{X}_2 = (X_{k+1},\dots,X_n)^T$$
Then $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if and only if $\text{Cov}(\mathbf{X}_1,\mathbf{X}_2) = 0_{k\times(n-k)}$, the $k \times (n-k)$ matrix of zeros, i.e., $\Sigma$ can be partitioned as
$$\Sigma = \begin{pmatrix} \Sigma_{11} & 0_{k\times(n-k)} \\ 0_{(n-k)\times k} & \Sigma_{22} \end{pmatrix}$$
where $\Sigma_{11} = \text{Cov}(\mathbf{X}_1)$ and $\Sigma_{22} = \text{Cov}(\mathbf{X}_2)$.

Marginals of multivariate normal distributions

Let $X = (X_1,\dots,X_n)^T \sim N(\mu,\Sigma)$. If $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$, then
$$Y = AX + b$$
is a random $m$-vector. Its MGF at $t \in \mathbb{R}^m$ is
$$M_Y(t) = e^{t^Tb}\, M_X(A^Tt)$$
Since $M_X(\tau) = e^{\tau^T\mu + \frac{1}{2}\tau^T\Sigma\tau}$ for all $\tau \in \mathbb{R}^n$, we obtain
$$M_Y(t) = e^{t^Tb}\, e^{(A^Tt)^T\mu + \frac{1}{2}(A^Tt)^T\Sigma(A^Tt)}
= e^{t^T(b+A\mu) + \frac{1}{2}t^TA\Sigma A^Tt}$$
This means that $Y \sim N(b+A\mu,\ A\Sigma A^T)$, i.e., $Y$ is multivariate normal with mean $b + A\mu$ and covariance matrix $A\Sigma A^T$.

Example: Let $a_1,\dots,a_n \in \mathbb{R}$ and determine the distribution of $Y = a_1X_1 + \cdots + a_nX_n$.
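The example can be checked numerically: with $A = a^T$ (a $1 \times n$ matrix) and $b = 0$, the result above gives $Y = a_1X_1 + \cdots + a_nX_n \sim N(a^T\mu,\ a^T\Sigma a)$. A simulation sketch assuming NumPy, with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
a = np.array([2.0, -1.0, 0.5])

# By the affine-transformation result (A = a^T, b = 0):
#   Y = a1 X1 + ... + an Xn ~ N(a^T mu, a^T Sigma a)
mean_Y = a @ mu
var_Y = a @ Sigma @ a

# Check by simulation: generate X ~ N(mu, Sigma) and form Y.
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((200_000, 3)) @ L.T + mu
Y = X @ a

print(mean_Y, Y.mean())  # predicted vs simulated mean
print(var_Y, Y.var())    # predicted vs simulated variance
```

The simulated mean and variance of $Y$ agree with $a^T\mu$ and $a^T\Sigma a$ up to Monte Carlo error.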

Note the following: for any $1 \le m \le n$ and indices $1 \le i_1 < \cdots < i_m \le n$, taking $A$ to be the $m \times n$ matrix that selects the coordinates $i_1,\dots,i_m$ (and $b = 0$) shows that $(X_{i_1},\dots,X_{i_m})^T$ is multivariate normal, with mean vector $(\mu_{i_1},\dots,\mu_{i_m})^T$ and covariance matrix given by the corresponding entries of $\Sigma$. In particular, each marginal is normal: $X_i \sim N(\mu_i, \Sigma_{ii})$.

Conditional distributions

Let $X = (X_1,\dots,X_n)^T \sim N(\mu,\Sigma)$ and for $1 \le m < n$ let
$$\mathbf{X}_1 = (X_1,\dots,X_m)^T, \quad \mathbf{X}_2 = (X_{m+1},\dots,X_n)^T$$
and partition $\mu$ and $\Sigma$ accordingly:
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
where $\Sigma_{11} = \text{Cov}(\mathbf{X}_1)$, $\Sigma_{22} = \text{Cov}(\mathbf{X}_2)$, and $\Sigma_{12} = \text{Cov}(\mathbf{X}_1,\mathbf{X}_2) = \Sigma_{21}^T$. Assume $\Sigma$ is positive definite. We want to determine the conditional distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = x_1$.

Recall that $X = AZ + \mu$ for some $Z = (Z_1,\dots,Z_n)^T$, where the $Z_i$ are independent $N(0,1)$ random variables and $AA^T = \Sigma$. Look for $A$ in block lower triangular form
$$A = \begin{pmatrix} B & 0 \\ C & D \end{pmatrix}$$
where $B$ is $m \times m$, $C$ is $(n-m) \times m$, and $D$ is $(n-m) \times (n-m)$. Writing $Z = (\mathbf{Z}_1^T, \mathbf{Z}_2^T)^T$ with $\mathbf{Z}_1 = (Z_1,\dots,Z_m)^T$, the condition $AA^T = \Sigma$ becomes
$$\begin{pmatrix} BB^T & BC^T \\ CB^T & CC^T + DD^T \end{pmatrix}
= \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

We want to solve for $B$, $C$, and $D$. First consider $BB^T = \Sigma_{11}$. We choose $B$ to be the unique positive definite square root of $\Sigma_{11}$:
$$B = \Sigma_{11}^{1/2}$$
Recall that $B$ is symmetric, and it is invertible since $\Sigma_{11}$ is. Then $\Sigma_{21} = CB^T$ implies
$$C = \Sigma_{21}(B^T)^{-1} = \Sigma_{21}B^{-1}$$
Then $\Sigma_{22} = CC^T + DD^T$ gives
$$DD^T = \Sigma_{22} - CC^T = \Sigma_{22} - \Sigma_{21}B^{-1}(B^{-1})^T\Sigma_{12}
= \Sigma_{22} - \Sigma_{21}(BB)^{-1}\Sigma_{12} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

Now note that $X = AZ + \mu$ gives
$$\mathbf{X}_1 = B\mathbf{Z}_1 + \mu_1, \quad \mathbf{X}_2 = C\mathbf{Z}_1 + D\mathbf{Z}_2 + \mu_2$$

Since $B$ is invertible, given $\mathbf{X}_1 = x_1$ we have $\mathbf{Z}_1 = B^{-1}(x_1 - \mu_1)$. So given $\mathbf{X}_1 = x_1$, the conditional distribution of $\mathbf{X}_2$ and the conditional distribution of
$$CB^{-1}(x_1 - \mu_1) + D\mathbf{Z}_2 + \mu_2$$
are the same. But $\mathbf{Z}_2$ is independent of $\mathbf{X}_1$, so given $\mathbf{X}_1 = x_1$, the conditional distribution of $CB^{-1}(x_1-\mu_1) + D\mathbf{Z}_2 + \mu_2$ is the same as its unconditional distribution.

We conclude that the conditional distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = x_1$ is multivariate normal with mean
$$E(\mathbf{X}_2 \mid \mathbf{X}_1 = x_1) = \mu_2 + CB^{-1}(x_1 - \mu_1)
= \mu_2 + \Sigma_{21}B^{-1}B^{-1}(x_1 - \mu_1)
= \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$$
and covariance matrix
$$\Sigma_{22|1} = DD^T = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

Special case: bivariate normal

Suppose $X = (X_1,X_2)^T \sim N(\mu,\Sigma)$ with
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$
We have
$$\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1-\mu_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)$$
and
$$\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
= \sigma_2^2 - \frac{\rho^2\sigma_1^2\sigma_2^2}{\sigma_1^2} = \sigma_2^2(1-\rho^2)$$

Thus the conditional distribution of $X_2$ given $X_1 = x_1$ is normal with (conditional) mean
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)$$
and variance
$$\text{Var}(X_2 \mid X_1 = x_1) = \sigma_2^2(1-\rho^2)$$
Equivalently, the conditional distribution of $X_2$ given $X_1 = x_1$ is
$$N\Big(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1),\ \sigma_2^2(1-\rho^2)\Big)$$

If $|\rho| < 1$, then the conditional pdf exists and is given by
$$f_{X_2|X_1}(x_2 \mid x_1) = \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}}\,
\exp\bigg\{-\frac{\big(x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\big)^2}{2\sigma_2^2(1-\rho^2)}\bigg\}$$

Remark: Note that $E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)$ is a linear (affine) function of $x_1$.

Example: Recall the MMSE estimation problem for $X \sim N(0,\sigma_X^2)$ from the observation $Y = X + Z$, where $Z \sim N(0,\sigma_Z^2)$ and $X$ and $Z$ are independent. Use the above to find $g^*(y) = E[X \mid Y = y]$ and compute the minimum error $E\big[(X - g^*(Y))^2\big]$.
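For the MMSE example, the pair $(X,Y)$ is jointly normal with $\text{Cov}(X,Y) = \sigma_X^2$ and $\text{Var}(Y) = \sigma_X^2 + \sigma_Z^2$, so the conditional-mean formula gives $g^*(y) = \frac{\sigma_X^2}{\sigma_X^2+\sigma_Z^2}\,y$ with minimum error $\frac{\sigma_X^2\sigma_Z^2}{\sigma_X^2+\sigma_Z^2}$. A simulation sketch of this answer, assuming NumPy (parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

sx2, sz2 = 4.0, 1.0  # sigma_X^2 and sigma_Z^2
n = 500_000

X = rng.normal(0.0, np.sqrt(sx2), n)
Z = rng.normal(0.0, np.sqrt(sz2), n)
Y = X + Z

# Conditional-mean formula applied to the jointly normal pair (X, Y):
#   g*(y) = E[X | Y = y] = (sigma_X^2 / (sigma_X^2 + sigma_Z^2)) * y
#   E[(X - g*(Y))^2]     = sigma_X^2 sigma_Z^2 / (sigma_X^2 + sigma_Z^2)
slope = sx2 / (sx2 + sz2)
min_mse = sx2 * sz2 / (sx2 + sz2)

mse = np.mean((X - slope * Y) ** 2)
print(slope, min_mse, mse)                # empirical MSE matches the formula

print(np.mean((X - 0.9 * Y) ** 2))       # any other linear estimate does worse
```

Here the empirical error of $g^*$ matches $\sigma_X^2\sigma_Z^2/(\sigma_X^2+\sigma_Z^2) = 0.8$, while a perturbed slope gives a strictly larger error.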