
6.5 Conditional Distributions

Statistics 104 (Colin Rundel), Lecture 22, April 11, 2012

General Bivariate Normal

Let Z1, Z2 ∼ N(0, 1), which we will use to build a general bivariate normal distribution.

f(z1, z2) = (1/2π) exp[−(1/2)(z1² + z2²)]

We want to transform these unit normal distributions to have the following arbitrary parameters: µX, µY, σX, σY, ρ.

X = σX Z1 + µX
Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY

General Bivariate Normal - Marginals

First, let's examine the marginal distributions of X and Y:

X = σX Z1 + µX
  = σX N(0, 1) + µX
  = N(µX, σX²)

Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY
  = σY [ρ N(0, 1) + √(1 − ρ²) N(0, 1)] + µY
  = σY [N(0, ρ²) + N(0, 1 − ρ²)] + µY
  = σY N(0, 1) + µY
  = N(µY, σY²)

General Bivariate Normal - Cov/Corr

Second, we can find Cov(X, Y) and ρ(X, Y):

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(σX Z1 + µX − µX)(σY [ρ Z1 + √(1 − ρ²) Z2] + µY − µY)]
          = E[(σX Z1)(σY [ρ Z1 + √(1 − ρ²) Z2])]
          = σX σY E[ρ Z1² + √(1 − ρ²) Z1 Z2]
          = σX σY ρ E[Z1²]
          = σX σY ρ

ρ(X, Y) = Cov(X, Y) / (σX σY) = ρ

General Bivariate Normal - RNG

Consequently, if we want to generate a bivariate normal random variable with X ∼ N(µX, σX²) and Y ∼ N(µY, σY²), where the correlation of X and Y is ρ, we can generate two independent unit normals Z1 and Z2 and use the transformation:

X = σX Z1 + µX
Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY

We can also use this result to find the joint density of the bivariate normal using a 2d change of variables.

Multivariate Change of Variables

Let X1, ..., Xn have a continuous joint distribution with pdf f defined on S. We can define n new random variables Y1, ..., Yn as follows:

Y1 = r1(X1, ..., Xn)   ···   Yn = rn(X1, ..., Xn)

If we assume that the n functions r1, ..., rn define a one-to-one differentiable transformation from S to T, then let the inverse of this transformation be

x1 = s1(y1, ..., yn)   ···   xn = sn(y1, ..., yn)

Then the joint pdf g of Y1, ..., Yn is

g(y1, ..., yn) = f(s1, ..., sn) |J|   for (y1, ..., yn) ∈ T
g(y1, ..., yn) = 0                    otherwise

where J is the determinant of the matrix of partial derivatives of the inverse transformation:

J = det [∂si/∂yj],   i, j = 1, ..., n
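As a quick numerical sketch of the RNG transformation (an illustration, not part of the original slides; numpy is assumed), we can generate draws and check the sample moments against the target parameters:

```python
import numpy as np

# Sketch: generate bivariate normal draws from two independent unit
# normals using the transformation above, then check sample moments.
rng = np.random.default_rng(0)
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.75

z1 = rng.standard_normal(200_000)
z2 = rng.standard_normal(200_000)

x = sd_x * z1 + mu_x
y = sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu_y

print(x.mean(), x.std())        # ~ mu_x, sd_x
print(y.mean(), y.std())        # ~ mu_y, sd_y
print(np.corrcoef(x, y)[0, 1])  # ~ rho
```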

General Bivariate Normal - Density

The first thing we need to find are the inverses of the transformation. If x = r1(z1, z2) and y = r2(z1, z2), we need to find functions s1 and s2 such that Z1 = s1(X, Y) and Z2 = s2(X, Y).

X = σX Z1 + µX
⇒ Z1 = (X − µX) / σX

Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY
⇒ (Y − µY) / σY = ρ (X − µX) / σX + √(1 − ρ²) Z2
⇒ Z2 = (1 / √(1 − ρ²)) [(Y − µY) / σY − ρ (X − µX) / σX]

Therefore,

s1(x, y) = (x − µX) / σX
s2(x, y) = (1 / √(1 − ρ²)) [(y − µY) / σY − ρ (x − µX) / σX]

General Bivariate Normal - Density, cont.

Next we calculate the Jacobian,

J = det [[∂s1/∂x, ∂s1/∂y], [∂s2/∂x, ∂s2/∂y]]
  = det [[1/σX, 0], [−ρ / (σX √(1 − ρ²)), 1 / (σY √(1 − ρ²))]]
  = 1 / (σX σY √(1 − ρ²))

The joint density of X and Y is then given by

f(x, y) = f(z1, z2) |J|
        = (1/2π) exp[−(1/2)(z1² + z2²)] |J|
        = (1 / (2π σX σY √(1 − ρ²))) exp[−(1/2)(((x − µX)/σX)² + (1/(1 − ρ²))((y − µY)/σY − ρ(x − µX)/σX)²)]
        = (1 / (2π σX σY (1 − ρ²)^(1/2))) exp[−(1/(2(1 − ρ²)))((x − µX)²/σX² + (y − µY)²/σY² − 2ρ(x − µX)(y − µY)/(σX σY))]
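A sanity check on the derived density (a sketch I'm adding here, not from the slides; numpy assumed): a valid pdf must integrate to 1, which we can approximate on a grid.

```python
import numpy as np

# Sketch: evaluate the derived bivariate density on a grid and confirm
# it integrates to (approximately) 1 over the plane.
mu_x, mu_y, sd_x, sd_y, rho = 0.0, 0.0, 1.0, 2.0, -0.5

def f(x, y):
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    norm = 2 * np.pi * sd_x * sd_y * np.sqrt(1 - rho**2)
    quad = (zx**2 + zy**2 - 2 * rho * zx * zy) / (1 - rho**2)
    return np.exp(-quad / 2) / norm

xs = np.linspace(-8, 8, 801)      # ~8 standard deviations in x
ys = np.linspace(-12, 12, 1201)   # ~6 standard deviations in y
X, Y = np.meshgrid(xs, ys)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
total = f(X, Y).sum() * dx * dy
print(total)   # ~ 1.0
```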

General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation:

x = [x, y]ᵀ    µ = [µX, µY]ᵀ    Σ = [[σX², ρσXσY], [ρσXσY, σY²]]

f(x) = (1/2π) (det Σ)^(−1/2) exp[−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ)]

We can confirm our results by checking the values of (det Σ)^(−1/2) and (x − µ)ᵀ Σ⁻¹ (x − µ) for the bivariate case:

(det Σ)^(−1/2) = (σX² σY² − ρ² σX² σY²)^(−1/2) = 1 / (σX σY (1 − ρ²)^(1/2))

General Bivariate Normal - Density (Matrix Notation), cont.

Recall for a 2 × 2 matrix,

A = [[a, b], [c, d]]    A⁻¹ = (1/det A) [[d, −b], [−c, a]] = (1/(ad − bc)) [[d, −b], [−c, a]]

Then,

(x − µ)ᵀ Σ⁻¹ (x − µ)
  = (1/(σX² σY² (1 − ρ²))) [x − µX, y − µY] [[σY², −ρσXσY], [−ρσXσY, σX²]] [x − µX, y − µY]ᵀ
  = (1/(σX² σY² (1 − ρ²))) [σY²(x − µX) − ρσXσY(y − µY), −ρσXσY(x − µX) + σX²(y − µY)] [x − µX, y − µY]ᵀ
  = (1/(σX² σY² (1 − ρ²))) (σY²(x − µX)² − 2ρσXσY(x − µX)(y − µY) + σX²(y − µY)²)
  = (1/(1 − ρ²)) ((x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY²)

This agrees with the density we derived directly.
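The matrix-notation density should agree with the explicit bivariate formula at any point. A numeric spot check (my sketch, not from the slides; the test point and parameters are arbitrary):

```python
import numpy as np

# Sketch: compare the matrix-notation density to the explicit
# bivariate formula at an arbitrary point.
mu = np.array([1.0, -1.0])
sd_x, sd_y, rho = 1.5, 0.5, 0.3
Sigma = np.array([[sd_x**2, rho * sd_x * sd_y],
                  [rho * sd_x * sd_y, sd_y**2]])

def f_matrix(v):
    d = v - mu
    return (np.linalg.det(Sigma) ** -0.5 / (2 * np.pi)
            * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))

def f_scalar(x, y):
    zx = (x - mu[0]) / sd_x
    zy = (y - mu[1]) / sd_y
    norm = 2 * np.pi * sd_x * sd_y * np.sqrt(1 - rho**2)
    return np.exp(-(zx**2 + zy**2 - 2*rho*zx*zy) / (2*(1 - rho**2))) / norm

pt = np.array([0.3, -0.2])
print(f_matrix(pt), f_scalar(*pt))   # the two values agree
```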


General Bivariate Normal - Examples

[Figures: bivariate normal density plots for the following parameter settings.]

X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0
X ∼ N(0, 2), Y ∼ N(0, 1), ρ = 0
X ∼ N(0, 1), Y ∼ N(0, 2), ρ = 0
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0.25
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0.5
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0.75
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.25
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.5
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.75
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.75
X ∼ N(0, 2), Y ∼ N(0, 1), ρ = −0.75
X ∼ N(0, 1), Y ∼ N(0, 2), ρ = −0.75

Multivariate Normal Distribution

Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows:

X ∼ Nk(µ, Σ)

where µ is the k × 1 column vector of means and Σ is the k × k covariance matrix with {Σ}i,j = Cov(Xi, Xj).

The density of the distribution is

f(x) = (1/(2π)^(k/2)) (det Σ)^(−1/2) exp[−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ)]

Multivariate Normal Distribution - Cholesky

In the bivariate case, we had a nice transformation such that we could generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution.

There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

L = Chol(X)

where L is a lower triangular matrix such that L Lᵀ = X.
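The decomposition is available directly in numpy; a minimal sketch (my addition, with an arbitrary positive definite matrix) showing the lower-triangular factor and the L Lᵀ = X property:

```python
import numpy as np

# Sketch: numpy's Cholesky factor is lower triangular, with L @ L.T
# recovering the original matrix.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
L = np.linalg.cholesky(Sigma)
print(L)           # lower triangular
print(L @ L.T)     # recovers Sigma
```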

Multivariate Normal Distribution - RNG

Let Z1, ..., Zk ∼ N(0, 1) and Z = (Z1, ..., Zk)ᵀ, then

µ + Chol(Σ) Z ∼ Nk(µ, Σ)

This is offered without proof in the general k-dimensional case, but we can check that it results in the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.

Cholesky and the Bivariate Transformation

We need to find the Cholesky decomposition of Σ for the general bivariate case, where

Σ = [[σX², ρσXσY], [ρσXσY, σY²]]

We need to solve the following for a, b, c:

[[a, 0], [b, c]] [[a, b], [0, c]] = [[a², ab], [ab, b² + c²]] = [[σX², ρσXσY], [ρσXσY, σY²]]

This gives us three (unique) equations and three unknowns to solve for:

a² = σX²    ab = ρσXσY    b² + c² = σY²

a = σX
b = ρσXσY / a = ρσY
c = √(σY² − b²) = σY (1 − ρ²)^(1/2)
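Both claims can be checked numerically (a sketch I'm adding, with arbitrary parameter values): the numeric Cholesky factor of the bivariate Σ matches the closed form just solved for, and µ + Chol(Σ) Z reproduces the target covariance.

```python
import numpy as np

# Sketch: check the closed-form bivariate Cholesky factor, then use
# mu + Chol(Sigma) @ Z to sample and verify the sample covariance.
sd_x, sd_y, rho = 2.0, 1.0, 0.6
Sigma = np.array([[sd_x**2, rho * sd_x * sd_y],
                  [rho * sd_x * sd_y, sd_y**2]])
L = np.linalg.cholesky(Sigma)
closed_form = np.array([[sd_x, 0.0],
                        [rho * sd_y, sd_y * np.sqrt(1 - rho**2)]])
print(np.allclose(L, closed_form))   # True

rng = np.random.default_rng(1)
Z = rng.standard_normal((2, 100_000))
mu = np.array([[3.0], [-1.0]])
samples = mu + L @ Z                 # rows are the variables X and Y
print(np.cov(samples))               # ~ Sigma
```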

Cholesky and the Bivariate Transformation, cont.

Let Z1, Z2 ∼ N(0, 1), then

[X, Y]ᵀ = µ + Chol(Σ) Z
        = [µX, µY]ᵀ + [[σX, 0], [ρσY, σY(1 − ρ²)^(1/2)]] [Z1, Z2]ᵀ
        = [µX + σX Z1, µY + ρσY Z1 + σY(1 − ρ²)^(1/2) Z2]ᵀ

X = µX + σX Z1
Y = µY + σY [ρ Z1 + (1 − ρ²)^(1/2) Z2]

Conditional Expectation of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + (1 − ρ²)^(1/2) Z2], where Z1, Z2 ∼ N(0, 1), we can find E(Y | X).

E[Y | X = x] = E[µY + σY (ρ Z1 + (1 − ρ²)^(1/2) Z2) | X = x]
             = E[µY + σY (ρ (x − µX)/σX + (1 − ρ²)^(1/2) Z2) | X = x]
             = µY + σY [ρ (x − µX)/σX + (1 − ρ²)^(1/2) E[Z2 | X = x]]
             = µY + σY ρ (x − µX)/σX

By symmetry,

E[X | Y = y] = µX + σX ρ (y − µY)/σY
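Because conditioning on X = x fixes Z1 = (x − µX)/σX in this construction, the conditional expectation can be checked by simulating only Z2 (a sketch with arbitrary parameter values, not from the slides):

```python
import numpy as np

# Sketch: conditioning on X = x fixes Z1; simulate only Z2 and compare
# the simulated mean of Y to mu_y + sd_y * rho * (x - mu_x) / sd_x.
mu_x, mu_y, sd_x, sd_y, rho = 1.0, 2.0, 2.0, 3.0, 0.5
x = 4.0

rng = np.random.default_rng(2)
z1 = (x - mu_x) / sd_x                 # fixed by the conditioning
z2 = rng.standard_normal(500_000)
y = mu_y + sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)

print(y.mean())   # ~ mu_y + sd_y * rho * (x - mu_x) / sd_x = 4.25
```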

Conditional Variance of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + (1 − ρ²)^(1/2) Z2], where Z1, Z2 ∼ N(0, 1), we can find Var(Y | X).

Var[Y | X = x] = Var[µY + σY (ρ Z1 + (1 − ρ²)^(1/2) Z2) | X = x]
               = Var[µY + σY (ρ (x − µX)/σX + (1 − ρ²)^(1/2) Z2) | X = x]
               = Var[σY (1 − ρ²)^(1/2) Z2 | X = x]
               = σY² (1 − ρ²)

By symmetry,

Var[X | Y = y] = σX² (1 − ρ²)

Example - Husbands and Wives (Example 5.10.6, deGroot)

Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that, for a randomly selected couple, the wife is taller than her husband?
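A sketch of one way to finish this example (my worked answer, not shown on the slides): the difference W − H of jointly normal heights is itself normal, with mean 66.8 − 70 and variance 2² + 2² − 2(0.68)(2)(2), so P(W > H) is a normal tail probability.

```python
from math import erf, sqrt

# Sketch: W - H ~ N(-3.2, 2.56), so P(W > H) = 1 - Phi(2).
mean_d = 66.8 - 70.0                       # -3.2
var_d = 2**2 + 2**2 - 2 * 0.68 * 2 * 2     # 2.56
sd_d = sqrt(var_d)                         # 1.6

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p = 1 - phi((0 - mean_d) / sd_d)   # P(W - H > 0)
print(p)   # ~ 0.0228
```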

Example - Conditionals

Suppose that X1 and X2 have a bivariate normal distribution where E(X1 | X2) = 3.7 − 0.15 X2, E(X2 | X1) = 0.4 − 0.6 X1, and Var(X2 | X1) = 3.64.

Find E(X1), Var(X1), E(X2), Var(X2), and ρ(X1, X2).
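One way to work this exercise (a sketch of a solution, not from the slides; numpy assumed): match the given conditional moments to the bivariate normal formulas E(X1 | X2) = µ1 + ρ(σ1/σ2)(X2 − µ2) and Var(X2 | X1) = σ2²(1 − ρ²), which pins down all five quantities.

```python
import numpy as np

# Sketch: the conditional-mean slopes are b12 = rho*s1/s2 and
# b21 = rho*s2/s1, so their product is rho^2 and their ratio is (s1/s2)^2.
b12, a12 = -0.15, 3.7     # E(X1|X2) = a12 + b12*X2
b21, a21 = -0.60, 0.4     # E(X2|X1) = a21 + b21*X1
cond_var2 = 3.64          # Var(X2|X1) = s2^2 * (1 - rho^2)

rho = -np.sqrt(b12 * b21)           # negative since both slopes are negative
var2 = cond_var2 / (1 - rho**2)     # s2^2
var1 = var2 * b12 / b21             # (s1/s2)^2 = b12/b21

# Intercepts give mu1 = a12 + b12*mu2 and mu2 = a21 + b21*mu1;
# solve the resulting 2x2 linear system for the means.
mu1, mu2 = np.linalg.solve(np.array([[1.0, -b12], [-b21, 1.0]]),
                           np.array([a12, a21]))
print(mu1, var1, mu2, var2, rho)
# mu1 ~ 4, var1 ~ 1, mu2 ~ -2, var2 ~ 4, rho ~ -0.3
```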