
Lecture 20: Covariance / Correlation & General Bivariate Normal
Sta230 / Mth 230, Colin Rundel, April 11, 2012

6.4, 6.5 Covariance and Correlation

Covariance

We have previously discussed covariance in relation to the variance of the sum of two random variables (Review Lecture 8):

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

Specifically, covariance is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

This is a generalization of variance to two random variables and generally measures the degree to which X and Y tend to be large (or small) at the same time, or the degree to which one tends to be large while the other is small.
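As a numerical aside (not from the lecture), both forms of the definition can be estimated from simulated draws and compared; the distributions chosen below are arbitrary:

```python
import random

random.seed(1)

# Simulate correlated pairs: Y = X + noise, so Cov(X, Y) should be
# close to Var(X) = 1/12 for X ~ Unif(0, 1).
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [x + 0.5 * random.random() for x in xs]

mx = sum(xs) / n
my = sum(ys) / n

# Definition: E[(X - E X)(Y - E Y)]
cov1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
# Shortcut: E(XY) - E(X)E(Y)
cov2 = sum(x * y for x, y in zip(xs, ys)) / n - mx * my

print(cov1, cov2)  # both are near Var(X) = 1/12
```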


Covariance, cont.

The magnitude of the covariance is not usually informative since it is affected by the magnitudes of both X and Y. However, the sign of the covariance tells us something useful about the relationship between X and Y.

Consider the following conditions:

If X > µX and Y > µY then (X − µX)(Y − µY) will be positive.
If X < µX and Y < µY then (X − µX)(Y − µY) will be positive.
If X > µX and Y < µY then (X − µX)(Y − µY) will be negative.
If X < µX and Y > µY then (X − µX)(Y − µY) will be negative.

Properties of Covariance

Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − µX µY
Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
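These properties can be verified exactly on a small discrete joint distribution; a minimal sketch (not from the slides), where the joint pmf is invented for illustration:

```python
# A made-up joint pmf on {0,1,2} x {0,1}; any valid pmf works here.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.25,
       (1, 1): 0.15, (2, 0): 0.2, (2, 1): 0.1}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov(f, g):
    """Cov(f, g) = E(fg) - E(f)E(g)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

X = lambda x, y: x
Y = lambda x, y: y

c = cov(X, Y)
assert abs(cov(Y, X) - c) < 1e-12                                    # symmetry
assert abs(cov(X, lambda x, y: 7) - 0) < 1e-12                       # Cov(X, c) = 0
assert abs(cov(X, X) - (E(lambda x, y: x * x) - E(X) ** 2)) < 1e-12  # Cov(X, X) = Var(X)
assert abs(cov(lambda x, y: 3 * x, lambda x, y: -2 * y) - (-6) * c) < 1e-12  # bilinearity
assert abs(cov(lambda x, y: x + 5, lambda x, y: y - 7) - c) < 1e-12  # shift invariance
print("all covariance properties hold, Cov(X, Y) =", round(c, 4))
```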

Correlation

Since Cov(X, Y) depends on the magnitudes of X and Y, we would prefer to have a measure of association that is not affected by changes in the scales of the random variables.

The most common measure of linear association is correlation, which is defined as

ρ(X, Y) = Cov(X, Y) / (σX σY),  −1 ≤ ρ(X, Y) ≤ 1

where the magnitude of the correlation measures the strength of the linear association and the sign determines whether it is a positive or negative relationship.
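A short simulation sketch (not from the slides) showing that rescaling and shifting the variables changes the covariance but leaves the correlation untouched:

```python
import random
import statistics

random.seed(2)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
# Construct Y with theoretical correlation 0.6 to X.
ys = [0.6 * x + 0.8 * random.gauss(0, 1) for x in xs]

def corr(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

r1 = corr(xs, ys)
# Rescale and shift both variables: covariance changes, correlation does not.
r2 = corr([10 * x + 3 for x in xs], [0.5 * y - 2 for y in ys])
print(round(r1, 3), round(r2, 3))
```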


Correlation and Independence

Given random variables X and Y:

X and Y are independent =⇒ Cov(X, Y) = ρ(X, Y) = 0
Cov(X, Y) = ρ(X, Y) = 0 does not imply that X and Y are independent

Cov(X, Y) = 0 is necessary but not sufficient for independence!

Example

Let X take the values {−1, 0, 1} with equal probability and Y = X². Clearly X and Y are not independent random variables, yet

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) − E(X)E(X²) = 0 − 0 · (2/3) = 0
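The counterexample above can be computed exactly; a small sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction

# Exact pmf of X on {-1, 0, 1} with equal probability; Y = X^2.
p = Fraction(1, 3)
support = [-1, 0, 1]

EX  = sum(p * x for x in support)            # 0
EY  = sum(p * x * x for x in support)        # 2/3
EXY = sum(p * x * (x * x) for x in support)  # E(X^3) = 0

cov = EXY - EX * EY
print("Cov(X, Y) =", cov)  # 0, even though Y is a function of X

# Dependence: P(Y = 1 | X = 0) = 0 differs from P(Y = 1) = 2/3,
# so X and Y are clearly not independent despite zero covariance.
```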

Example - Linear Dependence

Let X ∼ Unif(0, 1) and Y | X = aX + b for constants a and b. Find Cov(X, Y) and ρ(X, Y).

Example - Covariance of Multinomial Distribution

Let X1, X2, ..., Xk be the k random variables that reflect the number of outcomes belonging to categories 1, ..., k in n trials, with the probability of success for category i being pi:

X1, ..., Xk ∼ Multinom(n, p1, ..., pk)

P(X1 = x1, ..., Xk = xk) = f(x1, ..., xk | n, p1, ..., pk) = [n! / (x1! ··· xk!)] p1^x1 ··· pk^xk

where Σᵢ xi = n and Σᵢ pi = 1.

E(Xi) = n pi
Var(Xi) = n pi (1 − pi)
Cov(Xi, Xj) = −n pi pj
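As a sketch (not part of the slides), these moments can be checked by simulation; the sampler `rmultinom` below is a naive implementation written for illustration:

```python
import random

random.seed(3)

def rmultinom(n, ps):
    """Draw one Multinom(n, ps) outcome by categorizing n uniforms."""
    counts = [0] * len(ps)
    cum = [sum(ps[:i + 1]) for i in range(len(ps))]
    for _ in range(n):
        u = random.random()
        for i, c in enumerate(cum):
            if u <= c:
                counts[i] += 1
                break
    return counts

n, ps = 20, [0.2, 0.3, 0.5]
reps = 20_000
draws = [rmultinom(n, ps) for _ in range(reps)]

xi = [d[0] for d in draws]
xj = [d[1] for d in draws]
mi, mj = sum(xi) / reps, sum(xj) / reps
cov_hat = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / reps

print(round(cov_hat, 2), "vs theoretical", -n * ps[0] * ps[1])
```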


Example - Covariance of Multinomial Distribution, cont.

Marginal distribution of Xi: consider category i a success and all other categories a failure. Therefore in the n trials there are Xi successes and n − Xi failures, with probability of success pi and of failure 1 − pi, so Xi has a binomial distribution:

Xi ∼ Binom(n, pi)

Similarly, the distribution of Xi + Xj will also follow a binomial distribution with probability of success pi + pj:

Xi + Xj ∼ Binom(n, pi + pj)

Equating the two expressions for Var(Xi + Xj),

n(pi + pj)(1 − pi − pj) = Var(Xi) + Var(Xj) + 2 Cov(Xi, Xj)

and solving gives Cov(Xi, Xj) = −n pi pj.

6.4, 6.5 General Bivariate Normal

General Bivariate Normal

Let Z1, Z2 ∼ N(0, 1) be independent, which we will use to build a general bivariate normal distribution:

f(z1, z2) = (1 / 2π) exp[−(z1² + z2²)/2]

We want to transform these unit normal distributions to have the following parameters: µX, µY, σX, σY, ρ

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]
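A minimal simulation sketch of this transformation (the function name `rbvnorm` is my own, not from the slides):

```python
import math
import random

random.seed(4)

def rbvnorm(mu_x, mu_y, sd_x, sd_y, rho):
    """One draw from the bivariate normal via the Z1, Z2 transformation."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x, y

n = 100_000
draws = [rbvnorm(1.0, -2.0, 2.0, 0.5, 0.7) for _ in range(n)]
xs, ys = zip(*draws)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho_hat = cov / (2.0 * 0.5)  # divide by sigma_X * sigma_Y

print(round(mx, 2), round(my, 2), round(rho_hat, 2))
```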

General Bivariate Normal - Marginals

First, let's examine the marginal distributions of X and Y. X = µX + σX Z1 is a linear function of Z1 ∼ N(0, 1), so X ∼ N(µX, σX²). Similarly, Y is a linear combination of the independent normals Z1 and Z2 with mean µY and variance σY²[ρ² + (1 − ρ²)] = σY², so Y ∼ N(µY, σY²).

General Bivariate Normal - Cov/Corr

Second, we can find Cov(X, Y) and ρ(X, Y). Using E(Z1²) = 1 and E(Z1 Z2) = 0,

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[σX Z1 · σY (ρ Z1 + √(1 − ρ²) Z2)] = σX σY ρ E(Z1²) = ρ σX σY

ρ(X, Y) = Cov(X, Y) / (σX σY) = ρ


General Bivariate Normal - RNG

Consequently, if we want to generate a bivariate normal with X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) where Corr(X, Y) = ρ, we can generate two independent unit normals Z1 and Z2 and use the transformation:

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]

We can also use this result to find the joint density of the bivariate normal using a 2d change of variables.

Multivariate Change of Variables

Let X1, ..., Xn have a continuous joint distribution with pdf f defined on S. We can define n new random variables Y1, ..., Yn as follows:

Y1 = r1(X1, ..., Xn)  ···  Yn = rn(X1, ..., Xn)

If we assume that the n functions r1, ..., rn define a one-to-one differentiable transformation from S to T, then let the inverse of this transformation be

x1 = s1(y1, ..., yn)  ···  xn = sn(y1, ..., yn)

Then the joint pdf g of Y1, ..., Yn is

g(y1, ..., yn) = f(s1, ..., sn) |J|  for (y1, ..., yn) ∈ T
g(y1, ..., yn) = 0                   otherwise

where J is the determinant of the matrix of partial derivatives,

J = det [ ∂s1/∂y1 ··· ∂s1/∂yn ; ... ; ∂sn/∂y1 ··· ∂sn/∂yn ]

General Bivariate Normal - Density

The first thing we need to find are the inverses of the transformation. If X = r1(z1, z2) and Y = r2(z1, z2), we need to find functions s1 and s2 such that Z1 = s1(X, Y) and Z2 = s2(X, Y).

X = σX Z1 + µX
Z1 = (X − µX) / σX

Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY
(Y − µY) / σY = ρ (X − µX) / σX + √(1 − ρ²) Z2
Z2 = [1 / √(1 − ρ²)] [(Y − µY)/σY − ρ (X − µX)/σX]

Therefore,

s1(x, y) = (x − µX) / σX
s2(x, y) = [1 / √(1 − ρ²)] [(y − µY)/σY − ρ (x − µX)/σX]

General Bivariate Normal - Density, cont.

Next we calculate the Jacobian,

J = det [ ∂s1/∂x  ∂s1/∂y ; ∂s2/∂x  ∂s2/∂y ]
  = det [ 1/σX  0 ; −ρ/(σX √(1 − ρ²))  1/(σY √(1 − ρ²)) ]
  = 1 / (σX σY √(1 − ρ²))

The joint density of X and Y is then given by

f(x, y) = f(z1, z2) |J|
        = (1 / 2π) exp[−(z1² + z2²)/2] |J|
        = [1 / (2π σX σY √(1 − ρ²))] exp{ −(1/2) [ ((x − µX)/σX)² + [1/(1 − ρ²)] ((y − µY)/σY − ρ (x − µX)/σX)² ] }
        = [1 / (2π σX σY √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] [ (x − µX)²/σX² + (y − µY)²/σY² − 2ρ (x − µX)(y − µY)/(σX σY) ] }
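The change-of-variables route and the closed-form density can be checked against each other numerically; a sketch (not from the slides) with function names of my own choosing:

```python
import math

def f_std(z1, z2):
    """Standard bivariate normal density of independent Z1, Z2."""
    return math.exp(-(z1 ** 2 + z2 ** 2) / 2) / (2 * math.pi)

def f_xy_change_of_vars(x, y, mx, my, sx, sy, rho):
    """Density via the inverses s1, s2 and the Jacobian |J|."""
    z1 = (x - mx) / sx
    z2 = ((y - my) / sy - rho * (x - mx) / sx) / math.sqrt(1 - rho ** 2)
    J = 1 / (sx * sy * math.sqrt(1 - rho ** 2))
    return f_std(z1, z2) * J

def f_xy_closed_form(x, y, mx, my, sx, sy, rho):
    """Density via the final expanded formula."""
    u, v = (x - mx) / sx, (y - my) / sy
    q = (u ** 2 + v ** 2 - 2 * rho * u * v) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

# The two routes agree at arbitrary points.
for pt in [(0, 0), (1.3, -0.7), (-2, 4)]:
    a = f_xy_change_of_vars(*pt, 1, 2, 0.5, 3, -0.4)
    b = f_xy_closed_form(*pt, 1, 2, 0.5, 3, -0.4)
    assert abs(a - b) < 1e-12
print("change-of-variables density matches the closed form")
```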


General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation,

x = (x, y)ᵀ    µ = (µX, µY)ᵀ    Σ = [ σX²  ρσXσY ; ρσXσY  σY² ]

f(x) = (1 / 2π) (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

We can confirm our results by checking the value of (det Σ)^(−1/2) and (x − µ)ᵀ Σ⁻¹ (x − µ) for the bivariate case.

General Bivariate Normal - Density (Matrix Notation), cont.

Recall for a 2 × 2 matrix,

A = [ a  b ; c  d ]    A⁻¹ = (1 / det A) [ d  −b ; −c  a ] = (1 / (ad − bc)) [ d  −b ; −c  a ]

Then,

(det Σ)^(−1/2) = (σX² σY² − ρ² σX² σY²)^(−1/2) = 1 / (σX σY √(1 − ρ²))

(x − µ)ᵀ Σ⁻¹ (x − µ)
= [1 / (σX² σY² (1 − ρ²))] (x − µx, y − µy) [ σY²  −ρσXσY ; −ρσXσY  σX² ] (x − µx, y − µy)ᵀ
= [1 / (σX² σY² (1 − ρ²))] ( σY²(x − µX) − ρσXσY(y − µY), −ρσXσY(x − µX) + σX²(y − µY) ) (x − µx, y − µy)ᵀ
= [1 / (σX² σY² (1 − ρ²))] [ σY²(x − µX)² − 2ρσXσY(x − µX)(y − µY) + σX²(y − µY)² ]
= [1 / (1 − ρ²)] [ (x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ]
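A small sketch (not from the slides) confirming that the matrix form evaluates to the same number as the explicit bivariate formula, using the 2 × 2 inverse written out by hand:

```python
import math

def density_matrix_form(x, y, mx, my, sx, sy, rho):
    """Evaluate the density from Sigma directly (2x2 case)."""
    # Sigma = [[sx^2, rho*sx*sy], [rho*sx*sy, sy^2]]
    det = sx ** 2 * sy ** 2 * (1 - rho ** 2)
    # Inverse of a 2x2 matrix: (1/det) [[d, -b], [-c, a]]
    inv = [[sy ** 2 / det, -rho * sx * sy / det],
           [-rho * sx * sy / det, sx ** 2 / det]]
    dx, dy = x - mx, y - my
    quad = inv[0][0] * dx * dx + 2 * inv[0][1] * dx * dy + inv[1][1] * dy * dy
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

def density_explicit(x, y, mx, my, sx, sy, rho):
    """Evaluate the expanded bivariate formula."""
    u, v = (x - mx) / sx, (y - my) / sy
    q = (u * u + v * v - 2 * rho * u * v) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))

a = density_matrix_form(0.5, -1.0, 0.0, 1.0, 1.5, 2.0, 0.3)
b = density_explicit(0.5, -1.0, 0.0, 1.0, 1.5, 2.0, 0.3)
print("matrix form matches the explicit bivariate density:", abs(a - b) < 1e-12)
```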

General Bivariate Normal - Examples

[Density plots of the bivariate normal for the following parameter settings:]

X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0;  X ∼ N(0, 2), Y ∼ N(0, 1), ρ = 0;  X ∼ N(0, 1), Y ∼ N(0, 2), ρ = 0
X ∼ N(0, 1), Y ∼ N(0, 1) with ρ = 0.25, ρ = 0.5, ρ = 0.75
X ∼ N(0, 1), Y ∼ N(0, 1) with ρ = −0.25, ρ = −0.5, ρ = −0.75
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.75;  X ∼ N(0, 2), Y ∼ N(0, 1), ρ = −0.75;  X ∼ N(0, 1), Y ∼ N(0, 2), ρ = −0.75

Multivariate Normal Distribution

Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows,

X ∼ Nk(µ, Σ)

where µ is the k × 1 column vector of means and Σ is the k × k covariance matrix with {Σ}i,j = Cov(Xi, Xj).

The density of the distribution is

f(x) = [1 / (2π)^(k/2)] (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

Multivariate Normal Distribution - Cholesky

In the bivariate case, we had a nice transformation such that we can generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution. There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

L = Chol(X)

where L is a lower triangular matrix such that L Lᵀ = X.


Multivariate Normal Distribution - RNG

Let Z1, ..., Zk ∼ N(0, 1) and Z = (Z1, ..., Zk)ᵀ, then

µ + Chol(Σ) Z ∼ Nk(µ, Σ)

This is offered without proof in the general k-dimensional case, but we can check that it results in the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.

Cholesky and the Bivariate Transformation

We need to find the Cholesky decomposition of Σ for the general bivariate case, where

Σ = [ σX²  ρσXσY ; ρσXσY  σY² ]

We need to solve the following for a, b, c:

[ a  0 ; b  c ] [ a  b ; 0  c ] = [ a²  ab ; ab  b² + c² ] = [ σX²  ρσXσY ; ρσXσY  σY² ]

This gives us three (unique) equations and three unknowns to solve for:

a² = σX²    ab = ρσXσY    b² + c² = σY²

a = σX
b = ρσXσY / a = ρσY
c = √(σY² − b²) = σY √(1 − ρ²)
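A quick numeric check (not from the slides) that these closed-form entries reproduce Σ:

```python
import math

def chol_bivariate(sx, sy, rho):
    """Closed-form Cholesky factor of the bivariate covariance matrix."""
    a = sx
    b = rho * sy
    c = sy * math.sqrt(1 - rho ** 2)
    return [[a, 0.0], [b, c]]

sx, sy, rho = 2.0, 3.0, -0.4
L = chol_bivariate(sx, sy, rho)

# Multiply L by its transpose and compare with Sigma entry by entry.
LLt = [[L[0][0] * L[0][0], L[0][0] * L[1][0]],
       [L[1][0] * L[0][0], L[1][0] * L[1][0] + L[1][1] * L[1][1]]]
Sigma = [[sx ** 2, rho * sx * sy], [rho * sx * sy, sy ** 2]]

for i in range(2):
    for j in range(2):
        assert abs(LLt[i][j] - Sigma[i][j]) < 1e-12
print("L L^T reproduces Sigma")
```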

Cholesky and the Bivariate Transformation, cont.

Let Z1, Z2 ∼ N(0, 1), then

(X, Y)ᵀ = µ + Chol(Σ) Z
        = (µX, µY)ᵀ + [ σX  0 ; ρσY  σY √(1 − ρ²) ] (Z1, Z2)ᵀ
        = (µX + σX Z1,  µY + ρσY Z1 + σY √(1 − ρ²) Z2)ᵀ

which is exactly the transformation we started with:

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]

Conditional Expectation of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2] where Z1, Z2 ∼ N(0, 1), we can find E(Y|X). Given X, Z1 = (X − µX)/σX is fixed while Z2 remains independent of X with E(Z2) = 0, so

E(Y|X) = µY + σY [ρ (X − µX)/σX + √(1 − ρ²) E(Z2)] = µY + ρ (σY/σX)(X − µX)


Conditional Variance of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2] where Z1, Z2 ∼ N(0, 1), we can find Var(Y|X). Given X, the term ρ Z1 is fixed and only Z2 varies, so

Var(Y|X) = Var(σY √(1 − ρ²) Z2) = (1 − ρ²) σY²

Example - Husbands and Wives (Example 5.10.6, deGroot)

Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that for a randomly selected couple the wife is taller than her husband?
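A sketch of the computation (not from the slides): the difference W − H of two jointly normal heights is itself normal, with mean 66.8 − 70 = −3.2 and variance 2² + 2² − 2(0.68)(2)(2) = 2.56, so the answer is a single normal CDF evaluation:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_w, sd_w = 66.8, 2.0   # wives
mu_h, sd_h = 70.0, 2.0   # husbands
rho = 0.68

# D = W - H is normal with these moments:
mu_d = mu_w - mu_h                                     # -3.2
var_d = sd_w ** 2 + sd_h ** 2 - 2 * rho * sd_w * sd_h  # 2.56
sd_d = math.sqrt(var_d)                                # 1.6

p = 1 - norm_cdf((0 - mu_d) / sd_d)  # P(D > 0) = 1 - Phi(2)
print(round(p, 4))
```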

Example - Conditionals

Suppose that X1 and X2 have a bivariate normal distribution where E(X1|X2) = 3.7 − 0.15X2, E(X2|X1) = 0.4 − 0.6X1, and Var(X2|X1) = 3.64.

Find E(X1), Var(X1), E(X2), Var(X2), and ρ(X1, X2).

Based on the given information we know the following,

E(X1|X2) = µ1 − ρ(σ1/σ2)µ2 + ρ(σ1/σ2)X2 = 3.7 − 0.15X2
E(X2|X1) = µ2 − ρ(σ2/σ1)µ1 + ρ(σ2/σ1)X1 = 0.4 − 0.6X1
Var(X2|X1) = (1 − ρ²)σ2² = 3.64

From which we get the following set of equations:

(i) µ1 − ρ(σ1/σ2)µ2 = 3.7    (ii) ρ(σ1/σ2) = −0.15
(iii) µ2 − ρ(σ2/σ1)µ1 = 0.4    (iv) ρ(σ2/σ1) = −0.6
(v) (1 − ρ²)σ2² = 3.64

Example - Conditionals, cont.

From (ii) and (iv):

ρ(σ1/σ2) × ρ(σ2/σ1) = ρ² = (−0.15) × (−0.6) = 0.09  →  ρ = ±0.3 = −0.3

(the negative root, since the slopes in (ii) and (iv) are negative)

From (v): σ2² = 3.64 / (1 − ρ²) = 4

From (ii): σ1 = −0.15 σ2 / ρ = 1, so σ1² = 1

From (i) and (iii):

µ1 + 0.15µ2 = 3.7
0.6µ1 + µ2 = 0.4

µ1 = 4,  µ2 = −2
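The same solution can be scripted; a sketch (not from the slides, variable names are my own):

```python
import math

# Coefficients read off the given conditional moments:
#   E(X1|X2) = 3.7 - 0.15 X2, E(X2|X1) = 0.4 - 0.6 X1, Var(X2|X1) = 3.64
s12, s21, cvar = -0.15, -0.6, 3.64  # rho*s1/s2, rho*s2/s1, (1-rho^2)*s2^2

rho2 = s12 * s21                   # rho^2 = 0.09
rho = -math.sqrt(rho2)             # negative root, matching the slope signs
s2 = math.sqrt(cvar / (1 - rho2))  # sigma_2
s1 = s12 * s2 / rho                # sigma_1

# Intercepts give two linear equations in mu1, mu2:
#   mu1 + 0.15*mu2 = 3.7 and 0.6*mu1 + mu2 = 0.4
# Solve by substitution.
mu2 = (0.4 - 0.6 * 3.7) / (1 - 0.6 * 0.15)
mu1 = 3.7 - 0.15 * mu2

print(mu1, mu2, s1, s2, rho)
```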
