
Lecture 20: Covariance / Correlation & General Bivariate Normal
Sta230 / Mth 230, Colin Rundel, April 11, 2012

6.4, 6.5 Covariance and Correlation

Covariance

We have previously discussed covariance in relation to the variance of the sum of two random variables (Review Lecture 8):

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

Specifically, covariance is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

This is a generalization of variance to two random variables and generally measures the degree to which X and Y tend to be large (or small) at the same time, or the degree to which one tends to be large while the other is small.
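As a numerical aside (not from the lecture), both forms of the definition can be estimated from simulated draws and compared; the distributions chosen below are arbitrary:

```python
import random

random.seed(1)

# Simulate correlated pairs: Y = X + noise, so Cov(X, Y) should be
# close to Var(X) = 1/12 for X ~ Unif(0, 1).
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [x + 0.5 * random.random() for x in xs]

mx = sum(xs) / n
my = sum(ys) / n

# Definition: E[(X - E X)(Y - E Y)]
cov1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
# Shortcut: E(XY) - E(X)E(Y)
cov2 = sum(x * y for x, y in zip(xs, ys)) / n - mx * my

print(cov1, cov2)  # both are near Var(X) = 1/12
```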


Covariance, cont.

The magnitude of the covariance is not usually informative since it is affected by the magnitudes of both X and Y. However, the sign of the covariance tells us something useful about the relationship between X and Y.

Consider the following conditions:

If X > µX and Y > µY then (X − µX)(Y − µY) will be positive.
If X < µX and Y < µY then (X − µX)(Y − µY) will be positive.
If X > µX and Y < µY then (X − µX)(Y − µY) will be negative.
If X < µX and Y > µY then (X − µX)(Y − µY) will be negative.

Properties of Covariance

Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − µX µY
Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
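These properties can be verified exactly on a small discrete joint distribution; a minimal sketch (not from the slides), where the joint pmf is invented for illustration:

```python
# A made-up joint pmf on {0,1,2} x {0,1}; any valid pmf works here.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.25,
       (1, 1): 0.15, (2, 0): 0.2, (2, 1): 0.1}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov(f, g):
    """Cov(f, g) = E(fg) - E(f)E(g)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

X = lambda x, y: x
Y = lambda x, y: y

c = cov(X, Y)
assert abs(cov(Y, X) - c) < 1e-12                                    # symmetry
assert abs(cov(X, lambda x, y: 7) - 0) < 1e-12                       # Cov(X, c) = 0
assert abs(cov(X, X) - (E(lambda x, y: x * x) - E(X) ** 2)) < 1e-12  # Cov(X, X) = Var(X)
assert abs(cov(lambda x, y: 3 * x, lambda x, y: -2 * y) - (-6) * c) < 1e-12  # bilinearity
assert abs(cov(lambda x, y: x + 5, lambda x, y: y - 7) - c) < 1e-12  # shift invariance
print("all covariance properties hold, Cov(X, Y) =", round(c, 4))
```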

Correlation

Since Cov(X, Y) depends on the magnitudes of X and Y, we would prefer to have a measure of association that is not affected by changes in the scales of the random variables.

The most common measure of linear association is correlation, which is defined as

ρ(X, Y) = Cov(X, Y) / (σX σY),  −1 ≤ ρ(X, Y) ≤ 1

where the magnitude of the correlation measures the strength of the linear association and the sign determines whether it is a positive or negative relationship.
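A short simulation sketch (not from the slides) showing that rescaling and shifting the variables changes the covariance but leaves the correlation untouched:

```python
import random
import statistics

random.seed(2)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
# Construct Y with theoretical correlation 0.6 to X.
ys = [0.6 * x + 0.8 * random.gauss(0, 1) for x in xs]

def corr(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

r1 = corr(xs, ys)
# Rescale and shift both variables: covariance changes, correlation does not.
r2 = corr([10 * x + 3 for x in xs], [0.5 * y - 2 for y in ys])
print(round(r1, 3), round(r2, 3))
```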


Correlation and Independence

Given random variables X and Y:

X and Y are independent =⇒ Cov(X, Y) = ρ(X, Y) = 0
Cov(X, Y) = ρ(X, Y) = 0 does not imply that X and Y are independent

Cov(X, Y) = 0 is necessary but not sufficient for independence!

Example

Let X take the values {−1, 0, 1} with equal probability and Y = X². Clearly X and Y are not independent random variables, yet

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) − E(X)E(X²) = 0 − 0 · (2/3) = 0
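The counterexample above can be computed exactly; a small sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction

# Exact pmf of X on {-1, 0, 1} with equal probability; Y = X^2.
p = Fraction(1, 3)
support = [-1, 0, 1]

EX  = sum(p * x for x in support)            # 0
EY  = sum(p * x * x for x in support)        # 2/3
EXY = sum(p * x * (x * x) for x in support)  # E(X^3) = 0

cov = EXY - EX * EY
print("Cov(X, Y) =", cov)  # 0, even though Y is a function of X

# Dependence: P(Y = 1 | X = 0) = 0 differs from P(Y = 1) = 2/3,
# so X and Y are clearly not independent despite zero covariance.
```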

Example - Linear Dependence

Let X ∼ Unif(0, 1) and Y | X = aX + b for constants a and b. Find Cov(X, Y) and ρ(X, Y).

Example - Covariance of Multinomial Distribution

Let X1, X2, ..., Xk be the k random variables that reflect the number of outcomes belonging to categories 1, ..., k in n trials, with the probability of success for category i being pi:

X1, ..., Xk ∼ Multinom(n, p1, ..., pk)

P(X1 = x1, ..., Xk = xk) = f(x1, ..., xk | n, p1, ..., pk) = [n! / (x1! ··· xk!)] p1^x1 ··· pk^xk

where Σᵢ xi = n and Σᵢ pi = 1.

E(Xi) = n pi
Var(Xi) = n pi (1 − pi)
Cov(Xi, Xj) = −n pi pj
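As a sketch (not part of the slides), these moments can be checked by simulation; the sampler `rmultinom` below is a naive implementation written for illustration:

```python
import random

random.seed(3)

def rmultinom(n, ps):
    """Draw one Multinom(n, ps) outcome by categorizing n uniforms."""
    counts = [0] * len(ps)
    cum = [sum(ps[:i + 1]) for i in range(len(ps))]
    for _ in range(n):
        u = random.random()
        for i, c in enumerate(cum):
            if u <= c:
                counts[i] += 1
                break
    return counts

n, ps = 20, [0.2, 0.3, 0.5]
reps = 20_000
draws = [rmultinom(n, ps) for _ in range(reps)]

xi = [d[0] for d in draws]
xj = [d[1] for d in draws]
mi, mj = sum(xi) / reps, sum(xj) / reps
cov_hat = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / reps

print(round(cov_hat, 2), "vs theoretical", -n * ps[0] * ps[1])
```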


Example - Covariance of Multinomial Distribution, cont.

Marginal distribution of Xi: consider category i a success and all other categories a failure. Therefore in the n trials there are Xi successes and n − Xi failures, with probability of success pi and of failure 1 − pi, so Xi has a binomial distribution:

Xi ∼ Binom(n, pi)

Similarly, the distribution of Xi + Xj will also follow a binomial distribution with probability of success pi + pj:

Xi + Xj ∼ Binom(n, pi + pj)

Equating the two expressions for Var(Xi + Xj),

n(pi + pj)(1 − pi − pj) = Var(Xi) + Var(Xj) + 2 Cov(Xi, Xj)

and solving gives Cov(Xi, Xj) = −n pi pj.

6.4, 6.5 General Bivariate Normal

General Bivariate Normal

Let Z1, Z2 ∼ N(0, 1) be independent, which we will use to build a general bivariate normal distribution:

f(z1, z2) = (1 / 2π) exp[−(z1² + z2²)/2]

We want to transform these unit normal distributions to have the following parameters: µX, µY, σX, σY, ρ

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]
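A minimal simulation sketch of this transformation (the function name `rbvnorm` is my own, not from the slides):

```python
import math
import random

random.seed(4)

def rbvnorm(mu_x, mu_y, sd_x, sd_y, rho):
    """One draw from the bivariate normal via the Z1, Z2 transformation."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x, y

n = 100_000
draws = [rbvnorm(1.0, -2.0, 2.0, 0.5, 0.7) for _ in range(n)]
xs, ys = zip(*draws)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho_hat = cov / (2.0 * 0.5)  # divide by sigma_X * sigma_Y

print(round(mx, 2), round(my, 2), round(rho_hat, 2))
```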

General Bivariate Normal - Marginals

First, let's examine the marginal distributions of X and Y. X = µX + σX Z1 is a linear function of Z1 ∼ N(0, 1), so X ∼ N(µX, σX²). Similarly, Y is a linear combination of the independent normals Z1 and Z2 with mean µY and variance σY²[ρ² + (1 − ρ²)] = σY², so Y ∼ N(µY, σY²).

General Bivariate Normal - Cov/Corr

Second, we can find Cov(X, Y) and ρ(X, Y). Using E(Z1²) = 1 and E(Z1 Z2) = 0,

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[σX Z1 · σY (ρ Z1 + √(1 − ρ²) Z2)] = σX σY ρ E(Z1²) = ρ σX σY

ρ(X, Y) = Cov(X, Y) / (σX σY) = ρ


General Bivariate Normal - RNG

Consequently, if we want to generate a bivariate normal with X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) where Corr(X, Y) = ρ, we can generate two independent unit normals Z1 and Z2 and use the transformation:

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]

We can also use this result to find the joint density of the bivariate normal using a 2d change of variables.

Multivariate Change of Variables

Let X1, ..., Xn have a continuous joint distribution with pdf f defined on S. We can define n new random variables Y1, ..., Yn as follows:

Y1 = r1(X1, ..., Xn)  ···  Yn = rn(X1, ..., Xn)

If we assume that the n functions r1, ..., rn define a one-to-one differentiable transformation from S to T, then let the inverse of this transformation be

x1 = s1(y1, ..., yn)  ···  xn = sn(y1, ..., yn)

Then the joint pdf g of Y1, ..., Yn is

g(y1, ..., yn) = f(s1, ..., sn) |J|  for (y1, ..., yn) ∈ T
g(y1, ..., yn) = 0                   otherwise

where J is the determinant of the matrix of partial derivatives,

J = det [ ∂s1/∂y1 ··· ∂s1/∂yn ; ... ; ∂sn/∂y1 ··· ∂sn/∂yn ]

General Bivariate Normal - Density

The first thing we need to find are the inverses of the transformation. If X = r1(z1, z2) and Y = r2(z1, z2), we need to find functions s1 and s2 such that Z1 = s1(X, Y) and Z2 = s2(X, Y).

X = σX Z1 + µX
Z1 = (X − µX) / σX

Y = σY [ρ Z1 + √(1 − ρ²) Z2] + µY
(Y − µY) / σY = ρ (X − µX) / σX + √(1 − ρ²) Z2
Z2 = [1 / √(1 − ρ²)] [(Y − µY)/σY − ρ (X − µX)/σX]

Therefore,

s1(x, y) = (x − µX) / σX
s2(x, y) = [1 / √(1 − ρ²)] [(y − µY)/σY − ρ (x − µX)/σX]

General Bivariate Normal - Density, cont.

Next we calculate the Jacobian,

J = det [ ∂s1/∂x  ∂s1/∂y ; ∂s2/∂x  ∂s2/∂y ]
  = det [ 1/σX  0 ; −ρ/(σX √(1 − ρ²))  1/(σY √(1 − ρ²)) ]
  = 1 / (σX σY √(1 − ρ²))

The joint density of X and Y is then given by

f(x, y) = f(z1, z2) |J|
        = (1 / 2π) exp[−(z1² + z2²)/2] |J|
        = [1 / (2π σX σY √(1 − ρ²))] exp{ −(1/2) [ ((x − µX)/σX)² + [1/(1 − ρ²)] ((y − µY)/σY − ρ (x − µX)/σX)² ] }
        = [1 / (2π σX σY √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] [ (x − µX)²/σX² + (y − µY)²/σY² − 2ρ (x − µX)(y − µY)/(σX σY) ] }
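The change-of-variables route and the closed-form density can be checked against each other numerically; a sketch (not from the slides) with function names of my own choosing:

```python
import math

def f_std(z1, z2):
    """Standard bivariate normal density of independent Z1, Z2."""
    return math.exp(-(z1 ** 2 + z2 ** 2) / 2) / (2 * math.pi)

def f_xy_change_of_vars(x, y, mx, my, sx, sy, rho):
    """Density via the inverses s1, s2 and the Jacobian |J|."""
    z1 = (x - mx) / sx
    z2 = ((y - my) / sy - rho * (x - mx) / sx) / math.sqrt(1 - rho ** 2)
    J = 1 / (sx * sy * math.sqrt(1 - rho ** 2))
    return f_std(z1, z2) * J

def f_xy_closed_form(x, y, mx, my, sx, sy, rho):
    """Density via the final expanded formula."""
    u, v = (x - mx) / sx, (y - my) / sy
    q = (u ** 2 + v ** 2 - 2 * rho * u * v) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

# The two routes agree at arbitrary points.
for pt in [(0, 0), (1.3, -0.7), (-2, 4)]:
    a = f_xy_change_of_vars(*pt, 1, 2, 0.5, 3, -0.4)
    b = f_xy_closed_form(*pt, 1, 2, 0.5, 3, -0.4)
    assert abs(a - b) < 1e-12
print("change-of-variables density matches the closed form")
```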


General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation,

x = (x, y)ᵀ    µ = (µX, µY)ᵀ    Σ = [ σX²  ρσXσY ; ρσXσY  σY² ]

f(x) = (1 / 2π) (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

We can confirm our results by checking the value of (det Σ)^(−1/2) and (x − µ)ᵀ Σ⁻¹ (x − µ) for the bivariate case.

General Bivariate Normal - Density (Matrix Notation), cont.

Recall for a 2 × 2 matrix,

A = [ a  b ; c  d ]    A⁻¹ = (1 / det A) [ d  −b ; −c  a ] = (1 / (ad − bc)) [ d  −b ; −c  a ]

Then,

(det Σ)^(−1/2) = (σX² σY² − ρ² σX² σY²)^(−1/2) = 1 / (σX σY √(1 − ρ²))

(x − µ)ᵀ Σ⁻¹ (x − µ)
= [1 / (σX² σY² (1 − ρ²))] (x − µx, y − µy) [ σY²  −ρσXσY ; −ρσXσY  σX² ] (x − µx, y − µy)ᵀ
= [1 / (σX² σY² (1 − ρ²))] ( σY²(x − µX) − ρσXσY(y − µY), −ρσXσY(x − µX) + σX²(y − µY) ) (x − µx, y − µy)ᵀ
= [1 / (σX² σY² (1 − ρ²))] [ σY²(x − µX)² − 2ρσXσY(x − µX)(y − µY) + σX²(y − µY)² ]
= [1 / (1 − ρ²)] [ (x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ]
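A small sketch (not from the slides) confirming that the matrix form evaluates to the same number as the explicit bivariate formula, using the 2 × 2 inverse written out by hand:

```python
import math

def density_matrix_form(x, y, mx, my, sx, sy, rho):
    """Evaluate the density from Sigma directly (2x2 case)."""
    # Sigma = [[sx^2, rho*sx*sy], [rho*sx*sy, sy^2]]
    det = sx ** 2 * sy ** 2 * (1 - rho ** 2)
    # Inverse of a 2x2 matrix: (1/det) [[d, -b], [-c, a]]
    inv = [[sy ** 2 / det, -rho * sx * sy / det],
           [-rho * sx * sy / det, sx ** 2 / det]]
    dx, dy = x - mx, y - my
    quad = inv[0][0] * dx * dx + 2 * inv[0][1] * dx * dy + inv[1][1] * dy * dy
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

def density_explicit(x, y, mx, my, sx, sy, rho):
    """Evaluate the expanded bivariate formula."""
    u, v = (x - mx) / sx, (y - my) / sy
    q = (u * u + v * v - 2 * rho * u * v) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))

a = density_matrix_form(0.5, -1.0, 0.0, 1.0, 1.5, 2.0, 0.3)
b = density_explicit(0.5, -1.0, 0.0, 1.0, 1.5, 2.0, 0.3)
print("matrix form matches the explicit bivariate density:", abs(a - b) < 1e-12)
```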

General Bivariate Normal - Examples

[Density plots of the bivariate normal for the following parameter settings:]

X ∼ N(0, 1), Y ∼ N(0, 1), ρ = 0;  X ∼ N(0, 2), Y ∼ N(0, 1), ρ = 0;  X ∼ N(0, 1), Y ∼ N(0, 2), ρ = 0
X ∼ N(0, 1), Y ∼ N(0, 1) with ρ = 0.25, ρ = 0.5, ρ = 0.75
X ∼ N(0, 1), Y ∼ N(0, 1) with ρ = −0.25, ρ = −0.5, ρ = −0.75
X ∼ N(0, 1), Y ∼ N(0, 1), ρ = −0.75;  X ∼ N(0, 2), Y ∼ N(0, 1), ρ = −0.75;  X ∼ N(0, 1), Y ∼ N(0, 2), ρ = −0.75

Multivariate Normal Distribution

Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows,

X ∼ Nk(µ, Σ)

where µ is the k × 1 column vector of means and Σ is the k × k covariance matrix with {Σ}i,j = Cov(Xi, Xj).

The density of the distribution is

f(x) = [1 / (2π)^(k/2)] (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

Multivariate Normal Distribution - Cholesky

In the bivariate case, we had a nice transformation such that we can generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution. There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

L = Chol(X)

where L is a lower triangular matrix such that L Lᵀ = X.


Multivariate Normal Distribution - RNG

Let Z1, ..., Zk ∼ N(0, 1) and Z = (Z1, ..., Zk)ᵀ, then

µ + Chol(Σ) Z ∼ Nk(µ, Σ)

This is offered without proof in the general k-dimensional case, but we can check that it results in the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.

Cholesky and the Bivariate Transformation

We need to find the Cholesky decomposition of Σ for the general bivariate case, where

Σ = [ σX²  ρσXσY ; ρσXσY  σY² ]

We need to solve the following for a, b, c:

[ a  0 ; b  c ] [ a  b ; 0  c ] = [ a²  ab ; ab  b² + c² ] = [ σX²  ρσXσY ; ρσXσY  σY² ]

This gives us three (unique) equations and three unknowns to solve for:

a² = σX²    ab = ρσXσY    b² + c² = σY²

a = σX
b = ρσXσY / a = ρσY
c = √(σY² − b²) = σY √(1 − ρ²)
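A quick numeric check (not from the slides) that these closed-form entries reproduce Σ:

```python
import math

def chol_bivariate(sx, sy, rho):
    """Closed-form Cholesky factor of the bivariate covariance matrix."""
    a = sx
    b = rho * sy
    c = sy * math.sqrt(1 - rho ** 2)
    return [[a, 0.0], [b, c]]

sx, sy, rho = 2.0, 3.0, -0.4
L = chol_bivariate(sx, sy, rho)

# Multiply L by its transpose and compare with Sigma entry by entry.
LLt = [[L[0][0] * L[0][0], L[0][0] * L[1][0]],
       [L[1][0] * L[0][0], L[1][0] * L[1][0] + L[1][1] * L[1][1]]]
Sigma = [[sx ** 2, rho * sx * sy], [rho * sx * sy, sy ** 2]]

for i in range(2):
    for j in range(2):
        assert abs(LLt[i][j] - Sigma[i][j]) < 1e-12
print("L L^T reproduces Sigma")
```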

Cholesky and the Bivariate Transformation, cont.

Let Z1, Z2 ∼ N(0, 1), then

(X, Y)ᵀ = µ + Chol(Σ) Z
        = (µX, µY)ᵀ + [ σX  0 ; ρσY  σY √(1 − ρ²) ] (Z1, Z2)ᵀ
        = (µX + σX Z1,  µY + ρσY Z1 + σY √(1 − ρ²) Z2)ᵀ

which is exactly the transformation we started with:

X = µX + σX Z1
Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2]

Conditional Expectation of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2] where Z1, Z2 ∼ N(0, 1), we can find E(Y|X). Given X, Z1 = (X − µX)/σX is fixed while Z2 remains independent of X with E(Z2) = 0, so

E(Y|X) = µY + σY [ρ (X − µX)/σX + √(1 − ρ²) E(Z2)] = µY + ρ (σY/σX)(X − µX)


Conditional Variance of the Bivariate Normal

Using X = µX + σX Z1 and Y = µY + σY [ρ Z1 + √(1 − ρ²) Z2] where Z1, Z2 ∼ N(0, 1), we can find Var(Y|X). Given X, the term ρ Z1 is fixed and only Z2 varies, so

Var(Y|X) = Var(σY √(1 − ρ²) Z2) = (1 − ρ²) σY²

Example - Husbands and Wives (Example 5.10.6, deGroot)

Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that for a randomly selected couple the wife is taller than her husband?
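A sketch of the computation (not from the slides): the difference W − H of two jointly normal heights is itself normal, with mean 66.8 − 70 = −3.2 and variance 2² + 2² − 2(0.68)(2)(2) = 2.56, so the answer is a single normal CDF evaluation:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_w, sd_w = 66.8, 2.0   # wives
mu_h, sd_h = 70.0, 2.0   # husbands
rho = 0.68

# D = W - H is normal with these moments:
mu_d = mu_w - mu_h                                     # -3.2
var_d = sd_w ** 2 + sd_h ** 2 - 2 * rho * sd_w * sd_h  # 2.56
sd_d = math.sqrt(var_d)                                # 1.6

p = 1 - norm_cdf((0 - mu_d) / sd_d)  # P(D > 0) = 1 - Phi(2)
print(round(p, 4))
```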

Example - Conditionals

Suppose that X1 and X2 have a bivariate normal distribution where E(X1|X2) = 3.7 − 0.15X2, E(X2|X1) = 0.4 − 0.6X1, and Var(X2|X1) = 3.64.

Find E(X1), Var(X1), E(X2), Var(X2), and ρ(X1, X2).

Based on the given information we know the following,

E(X1|X2) = µ1 − ρ(σ1/σ2)µ2 + ρ(σ1/σ2)X2 = 3.7 − 0.15X2
E(X2|X1) = µ2 − ρ(σ2/σ1)µ1 + ρ(σ2/σ1)X1 = 0.4 − 0.6X1
Var(X2|X1) = (1 − ρ²)σ2² = 3.64

From which we get the following set of equations:

(i) µ1 − ρ(σ1/σ2)µ2 = 3.7    (ii) ρ(σ1/σ2) = −0.15
(iii) µ2 − ρ(σ2/σ1)µ1 = 0.4    (iv) ρ(σ2/σ1) = −0.6
(v) (1 − ρ²)σ2² = 3.64

Example - Conditionals, cont.

From (ii) and (iv):

ρ(σ1/σ2) × ρ(σ2/σ1) = ρ² = (−0.15) × (−0.6) = 0.09  →  ρ = ±0.3 = −0.3

(the negative root, since the slopes in (ii) and (iv) are negative)

From (v): σ2² = 3.64 / (1 − ρ²) = 4

From (ii): σ1 = −0.15 σ2 / ρ = 1, so σ1² = 1

From (i) and (iii):

µ1 + 0.15µ2 = 3.7
0.6µ1 + µ2 = 0.4

µ1 = 4,  µ2 = −2
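The same solution can be scripted; a sketch (not from the slides, variable names are my own):

```python
import math

# Coefficients read off the given conditional moments:
#   E(X1|X2) = 3.7 - 0.15 X2, E(X2|X1) = 0.4 - 0.6 X1, Var(X2|X1) = 3.64
s12, s21, cvar = -0.15, -0.6, 3.64  # rho*s1/s2, rho*s2/s1, (1-rho^2)*s2^2

rho2 = s12 * s21                   # rho^2 = 0.09
rho = -math.sqrt(rho2)             # negative root, matching the slope signs
s2 = math.sqrt(cvar / (1 - rho2))  # sigma_2
s1 = s12 * s2 / rho                # sigma_1

# Intercepts give two linear equations in mu1, mu2:
#   mu1 + 0.15*mu2 = 3.7 and 0.6*mu1 + mu2 = 0.4
# Solve by substitution.
mu2 = (0.4 - 0.6 * 3.7) / (1 - 0.6 * 0.15)
mu1 = 3.7 - 0.15 * mu2

print(mu1, mu2, s1, s2, rho)
```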
