Lecture 22: Bivariate Normal Distribution
Statistics 104, Colin Rundel
April 11, 2012

6.5 Conditional Distributions
General Bivariate Normal

Let Z1, Z2 ∼ N(0, 1) be independent unit normals, which we will use to build a general bivariate normal distribution. Their joint density is

f(z1, z2) = (1/2π) exp(−(z1² + z2²)/2)

We want to transform these unit normals to have the following arbitrary parameters: µX, µY, σX, σY, ρ.

X = σX Z1 + µX
Y = σY [ρZ1 + √(1 − ρ²) Z2] + µY
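As a quick sanity check, this transformation can be simulated directly; a minimal sketch assuming NumPy is available (all parameter values below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target parameters: mu_x, mu_y, sigma_x, sigma_y, rho
mu_x, mu_y = 1.0, 3.0
sigma_x, sigma_y = 2.0, 1.5
rho = 0.6

n = 200_000
z1 = rng.standard_normal(n)  # Z1 ~ N(0, 1)
z2 = rng.standard_normal(n)  # Z2 ~ N(0, 1), independent of Z1

# The transformation above
x = sigma_x * z1 + mu_x
y = sigma_y * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu_y

# Sample moments should be close to the target parameters
print(x.mean(), x.std(), y.mean(), y.std())
print(np.corrcoef(x, y)[0, 1])  # close to rho = 0.6
```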
Statistics 104 (Colin Rundel) Lecture 22 April 11, 2012 1 / 22
General Bivariate Normal - Marginals

First, let's examine the marginal distributions of X and Y.

X = σX Z1 + µX
  = σX N(0, 1) + µX
  = N(µX, σX²)

Y = σY [ρZ1 + √(1 − ρ²) Z2] + µY
  = σY [ρ N(0, 1) + √(1 − ρ²) N(0, 1)] + µY
  = σY [N(0, ρ²) + N(0, 1 − ρ²)] + µY
  = σY N(0, 1) + µY
  = N(µY, σY²)

General Bivariate Normal - Cov/Corr

Second, we can find Cov(X, Y) and ρ(X, Y).

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(σX Z1 + µX − µX)(σY [ρZ1 + √(1 − ρ²) Z2] + µY − µY)]
          = E[(σX Z1)(σY [ρZ1 + √(1 − ρ²) Z2])]
          = σX σY E[ρZ1² + √(1 − ρ²) Z1 Z2]
          = σX σY ρ E[Z1²]
          = σX σY ρ

ρ(X, Y) = Cov(X, Y) / (σX σY) = ρ
General Bivariate Normal - RNG

Consequently, if we want to generate a bivariate normal random variable with X ∼ N(µX, σX²) and Y ∼ N(µY, σY²), where the correlation of X and Y is ρ, we can generate two independent unit normals Z1 and Z2 and use the transformation:

X = σX Z1 + µX
Y = σY [ρZ1 + √(1 − ρ²) Z2] + µY

We can also use this result to find the joint density of the bivariate normal using a 2d change of variables.

Multivariate Change of Variables

Let X1, ..., Xn have a continuous joint distribution with pdf f defined on S. We can define n new random variables Y1, ..., Yn as follows:

Y1 = r1(X1, ..., Xn)   ···   Yn = rn(X1, ..., Xn)

If we assume that the n functions r1, ..., rn define a one-to-one differentiable transformation from S to T, then let the inverse of this transformation be

x1 = s1(y1, ..., yn)   ···   xn = sn(y1, ..., yn)

Then the joint pdf g of Y1, ..., Yn is

g(y1, ..., yn) = f(s1, ..., sn)|J|   for (y1, ..., yn) ∈ T
               = 0                   otherwise

where J is the determinant of the matrix of partial derivatives,

J = det [ ∂s1/∂y1  ···  ∂s1/∂yn ]
        [    ⋮      ⋱      ⋮    ]
        [ ∂sn/∂y1  ···  ∂sn/∂yn ]
General Bivariate Normal - Density

The first thing we need to find are the inverses of the transformation. If x = r1(z1, z2) and y = r2(z1, z2), we need to find functions s1 and s2 such that Z1 = s1(X, Y) and Z2 = s2(X, Y).

X = σX Z1 + µX
Z1 = (X − µX) / σX

Y = σY [ρZ1 + √(1 − ρ²) Z2] + µY
(Y − µY) / σY = ρ (X − µX) / σX + √(1 − ρ²) Z2
Z2 = (1 / √(1 − ρ²)) [(Y − µY) / σY − ρ (X − µX) / σX]

Therefore,

s1(x, y) = (x − µX) / σX
s2(x, y) = (1 / √(1 − ρ²)) [(y − µY) / σY − ρ (x − µX) / σX]

Next we calculate the Jacobian,

J = det [ ∂s1/∂x  ∂s1/∂y ] = det [ 1/σX                 0                 ]
        [ ∂s2/∂x  ∂s2/∂y ]       [ −ρ/(σX √(1 − ρ²))   1/(σY √(1 − ρ²))  ]
  = 1 / (σX σY √(1 − ρ²))

The joint density of X and Y is then given by

f(x, y) = f(z1, z2)|J|
        = (1/2π) exp(−(z1² + z2²)/2) |J|
        = (1 / (2π σX σY √(1 − ρ²))) exp(−(z1² + z2²)/2)
        = (1 / (2π σX σY √(1 − ρ²))) exp( −(1/2) [ ((x − µX)/σX)² + (1/(1 − ρ²)) ((y − µY)/σY − ρ (x − µX)/σX)² ] )
        = (1 / (2π σX σY (1 − ρ²)^(1/2))) exp( −(1/(2(1 − ρ²))) [ (x − µX)²/σX² + (y − µY)²/σY² − 2ρ (x − µX)(y − µY)/(σX σY) ] )
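The expanded form of the density can be implemented directly; a small sketch (the function names are mine) that also checks the ρ = 0 case, where the joint density should factor into the product of the two marginal densities:

```python
from math import exp, pi, sqrt

def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density, using the expanded form derived above."""
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    q = (dx**2 - 2 * rho * dx * dy + dy**2) / (1 - rho**2)
    return exp(-q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho**2))

def norm_pdf(z, mu, sigma):
    """Univariate N(mu, sigma^2) density, for the factorization check."""
    return exp(-((z - mu) / sigma)**2 / 2) / (sigma * sqrt(2 * pi))

# With rho = 0 the joint density factors into the marginals
lhs = bvn_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.0)
rhs = norm_pdf(0.3, 0.0, 2.0) * norm_pdf(-1.2, 1.0, 0.5)
print(abs(lhs - rhs) < 1e-12)  # True
```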
General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation,

x = [ x ]     µ = [ µX ]     Σ = [ σX²      ρσXσY ]
    [ y ]         [ µY ]         [ ρσXσY    σY²   ]

f(x) = (1/2π) (det Σ)^(−1/2) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )

We can confirm our results by checking the value of (det Σ)^(−1/2) and (x − µ)ᵀ Σ⁻¹ (x − µ) for the bivariate case.

(det Σ)^(−1/2) = (σX² σY² − ρ² σX² σY²)^(−1/2) = 1 / (σX σY (1 − ρ²)^(1/2))

Recall for a 2 × 2 matrix,

A = [ a  b ]     A⁻¹ = (1/det A) [ d  −b ] = (1/(ad − bc)) [ d  −b ]
    [ c  d ]                     [ −c  a ]                 [ −c  a ]

Then,

(x − µ)ᵀ Σ⁻¹ (x − µ)
  = (1/(σX² σY² (1 − ρ²))) [x − µX, y − µY] [ σY²      −ρσXσY ] [ x − µX ]
                                            [ −ρσXσY   σX²    ] [ y − µY ]
  = (1/(σX² σY² (1 − ρ²))) [x − µX, y − µY] [ σY² (x − µX) − ρσXσY (y − µY)  ]
                                            [ −ρσXσY (x − µX) + σX² (y − µY) ]
  = (1/(σX² σY² (1 − ρ²))) [ σY² (x − µX)² − 2ρσXσY (x − µX)(y − µY) + σX² (y − µY)² ]
  = (1/(1 − ρ²)) [ (x − µX)²/σX² − 2ρ (x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ]
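As a numerical sanity check, the matrix form of the density can be compared with the expanded scalar form at an arbitrary point; a sketch assuming NumPy (parameter values illustrative):

```python
import numpy as np

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
mu = np.array([mu_x, mu_y])
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

point = np.array([0.4, -1.1])  # arbitrary (x, y)
d = point - mu

# det(Sigma)^(-1/2) should equal 1 / (sigma_x * sigma_y * sqrt(1 - rho^2))
det_check = np.linalg.det(Sigma)**-0.5
closed_form = 1 / (sigma_x * sigma_y * np.sqrt(1 - rho**2))

# Matrix form of the density
f_matrix = det_check * np.exp(-d @ np.linalg.solve(Sigma, d) / 2) / (2 * np.pi)

# Expanded scalar form
dx, dy = d[0] / sigma_x, d[1] / sigma_y
q = (dx**2 - 2 * rho * dx * dy + dy**2) / (1 - rho**2)
f_scalar = np.exp(-q / 2) / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))

print(np.isclose(det_check, closed_form), np.isclose(f_matrix, f_scalar))
```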
General Bivariate Normal - Examples

[Density plots, one panel per parameter setting:]
[X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0;   X ∼ N(0, 2), Y ∼ N(0, 1), ρ = 0;   X ∼ N(0, 1), Y ∼ N(0, 2), ρ = 0]
[X ∼ N(0, 1), Y ∼ N(0, 1), with ρ = 0.25, ρ = 0.5, ρ = 0.75]
[X ∼ N(0, 1), Y ∼ N(0, 1), with ρ = −0.25, ρ = −0.5, ρ = −0.75]
[X ∼ N(0, 1), Y ∼ N(0, 1);   X ∼ N(0, 2), Y ∼ N(0, 1);   X ∼ N(0, 1), Y ∼ N(0, 2), each with ρ = −0.75]
Multivariate Normal Distribution

Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows,

X ∼ Nk(µ, Σ)

where µ is the k × 1 column vector of means and Σ is the k × k covariance matrix with {Σ}ᵢ,ⱼ = Cov(Xᵢ, Xⱼ).

The density of the distribution is

f(x) = (1/(2π)^(k/2)) (det Σ)^(−1/2) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )

Multivariate Normal Distribution - Cholesky

In the bivariate case, we had a nice transformation such that we could generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution.

There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

L = Chol(X)

where L is a lower triangular matrix such that L Lᵀ = X.
Multivariate Normal Distribution - RNG

Let Z1, ..., Zk ∼ N(0, 1) and Z = (Z1, ..., Zk)ᵀ, then

µ + Chol(Σ) Z ∼ Nk(µ, Σ)

This is offered without proof in the general k-dimensional case, but we can check that it results in the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.

Cholesky and the Bivariate Transformation

We need to find the Cholesky decomposition of Σ for the general bivariate case, where

Σ = [ σX²      ρσXσY ]
    [ ρσXσY    σY²   ]

We need to solve the following for a, b, c:

[ a  0 ] [ a  b ] = [ a²    ab      ] = [ σX²      ρσXσY ]
[ b  c ] [ 0  c ]   [ ab    b² + c² ]   [ ρσXσY    σY²   ]

This gives us three (unique) equations and three unknowns to solve for,

a² = σX²     ab = ρσXσY     b² + c² = σY²

a = σX
b = ρσXσY / a = ρσY
c = (σY² − b²)^(1/2) = σY (1 − ρ²)^(1/2)
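Assuming NumPy, this closed-form factor can be checked against a numerical Cholesky decomposition (the σ and ρ values are illustrative):

```python
import numpy as np

sigma_x, sigma_y, rho = 2.0, 1.5, -0.4
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

# numpy returns the lower-triangular factor L with L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# Closed form derived above: a = sigma_x, b = rho*sigma_y, c = sigma_y*sqrt(1 - rho^2)
L_closed = np.array([[sigma_x, 0.0],
                     [rho * sigma_y, sigma_y * np.sqrt(1 - rho**2)]])

print(np.allclose(L, L_closed), np.allclose(L @ L.T, Sigma))  # True True
```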
Cholesky and the Bivariate Transformation, cont.

Let Z1, Z2 ∼ N(0, 1), then

X = µ + Chol(Σ) Z

[ X ] = [ µX ] + [ σX     0                  ] [ Z1 ]
[ Y ]   [ µY ]   [ ρσY    σY (1 − ρ²)^(1/2)  ] [ Z2 ]

      = [ µX + σX Z1                           ]
        [ µY + ρσY Z1 + σY (1 − ρ²)^(1/2) Z2  ]

X = µX + σX Z1
Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2]

Conditional Expectation of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2], where Z1, Z2 ∼ N(0, 1), we can find E(Y|X).

E[Y | X = x] = E[ µY + σY (ρZ1 + (1 − ρ²)^(1/2) Z2) | X = x ]
             = E[ µY + σY (ρ (x − µX)/σX + (1 − ρ²)^(1/2) Z2) | X = x ]
             = µY + σY ρ (x − µX)/σX + σY (1 − ρ²)^(1/2) E[Z2 | X = x]
             = µY + σY ρ (x − µX)/σX

since Z2 is independent of X, so E[Z2 | X = x] = E[Z2] = 0.

By symmetry,

E[X | Y = y] = µX + σX ρ (y − µY)/σY
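E[Y | X = x] is exactly the population least-squares line, so a simulation gives an easy check; a sketch assuming NumPy (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 1.0, 0.5

n = 500_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = mu_x + sigma_x * z1
y = mu_y + sigma_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)

# The fitted least-squares line should approximate
# E[Y|X = x] = mu_y + sigma_y * rho * (x - mu_x) / sigma_x
slope, intercept = np.polyfit(x, y, 1)
print(slope, rho * sigma_y / sigma_x)                    # both near 0.25
print(intercept, mu_y - rho * sigma_y / sigma_x * mu_x)  # both near -2.25
```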
Conditional Variance of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2], where Z1, Z2 ∼ N(0, 1), we can find Var(Y|X).

Var[Y | X = x] = Var[ µY + σY (ρZ1 + (1 − ρ²)^(1/2) Z2) | X = x ]
               = Var[ µY + σY ρ (x − µX)/σX + σY (1 − ρ²)^(1/2) Z2 | X = x ]
               = Var[ σY (1 − ρ²)^(1/2) Z2 | X = x ]
               = σY² (1 − ρ²)

By symmetry,

Var[X | Y = y] = σX² (1 − ρ²)

Example - Husbands and Wives (Example 5.10.6, deGroot)

Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that for a randomly selected couple the wife is taller than her husband?
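The slides stop at the question; a sketch of the solution, using the standard fact (consistent with the derivations above) that a linear combination of jointly normal variables is normal:

```python
from math import erf, sqrt

# W = wife's height ~ N(66.8, 2^2), H = husband's height ~ N(70, 2^2), rho = 0.68
# D = W - H is then normal with
mu_d = 66.8 - 70.0                                # -3.2
var_d = 2.0**2 + 2.0**2 - 2 * 0.68 * 2.0 * 2.0    # Var(W) + Var(H) - 2 Cov(W, H) = 2.56
sd_d = sqrt(var_d)                                # 1.6

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(wife taller) = P(D > 0) = 1 - Phi((0 - mu_d) / sd_d) = 1 - Phi(2)
p = 1 - std_normal_cdf((0 - mu_d) / sd_d)
print(round(p, 4))  # 0.0228
```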
Example - Conditionals

Suppose that X1 and X2 have a bivariate normal distribution where E(X1|X2) = 3.7 − 0.15 X2, E(X2|X1) = 0.4 − 0.6 X1, and Var(X2|X1) = 3.64.

Find E(X1), Var(X1), E(X2), Var(X2), and ρ(X1, X2).
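The continuation slide is not reproduced in this text, but the parameters can be recovered from the conditional-moment results above: the slope of E(X1|X2) is ρσ1/σ2, the slope of E(X2|X1) is ρσ2/σ1, and Var(X2|X1) = σ2²(1 − ρ²). A numerical sketch:

```python
from math import sqrt

b12 = -0.15  # slope of E(X1|X2) = rho * sigma1 / sigma2
b21 = -0.60  # slope of E(X2|X1) = rho * sigma2 / sigma1

# The product of the slopes is rho^2; rho shares the (negative) sign of the slopes
rho = -sqrt(b12 * b21)              # -0.3

# Var(X2|X1) = sigma2^2 * (1 - rho^2) = 3.64
var2 = 3.64 / (1 - rho**2)          # Var(X2) = 4
sigma2 = sqrt(var2)
sigma1 = rho * sigma2 / b21         # sigma1 = 1, so Var(X1) = 1

# The means satisfy mu1 = 3.7 - 0.15*mu2 and mu2 = 0.4 - 0.6*mu1; solving:
mu2 = (0.4 - 0.6 * 3.7) / (1 - 0.6 * 0.15)    # E(X2) = -2
mu1 = 3.7 - 0.15 * mu2                        # E(X1) = 4

print(mu1, sigma1**2, mu2, var2, rho)
```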