
Multivariate Normal Distribution

Edps/Soc 584, Psych 594

Carolyn J. Anderson

Department of Educational Psychology, University of Illinois at Urbana-Champaign

© Board of Trustees, University of Illinois

Spring 2017

Outline

◮ Motivation
◮ The multivariate normal distribution
◮ The bivariate normal distribution
◮ More properties of the multivariate normal
◮ Estimation of µ and Σ
◮ Central Limit Theorem

Reading: Johnson & Wichern pages 149–176

C.J. Anderson (Illinois), Multivariate Normal Distribution, Spring 2015

Motivation

◮ To be able to make inferences about populations, we need a model for the distribution of random variables −→ we'll use the multivariate normal distribution, because...

◮ It's often a good population model: a reasonably good approximation of many phenomena. A lot of variables are approximately normal (in part because many are sums or averages of other quantities).

◮ The distributions of (test) scores are often approximately (multivariate) normal due to the central limit theorem.

◮ Due to its central importance, we need to thoroughly understand the multivariate normal and know its properties.


Introduction to the Multivariate Normal

◮ The density of the univariate normal distribution (p = 1 variable):

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left( \frac{x - \mu}{\sigma} \right)^2 \right] \quad \text{for } -\infty < x < \infty

◮ The parameters that completely characterize the distribution:
  ◮ µ = E(X) = the mean
  ◮ σ² = var(X) = the variance


Introduction to the Multivariate Normal (continued)

Area corresponds to probability: 68% of the area lies within ±σ of µ and 95% within ±1.96σ.
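As a quick numerical check of these areas (a stdlib-Python sketch, not part of the slides), the normal CDF can be written with `math.erf`:

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Univariate normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(mu - sigma < X < mu + sigma): about 0.68
p_1sd = norm_cdf(1.0) - norm_cdf(-1.0)
# P(mu - 1.96 sigma < X < mu + 1.96 sigma): about 0.95
p_196sd = norm_cdf(1.96) - norm_cdf(-1.96)
```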


Generalization to Multivariate Normal

\left( \frac{x - \mu}{\sigma} \right)^2 = (x - \mu)(\sigma^2)^{-1}(x - \mu)

A squared statistical distance between x and µ in standard units.

Generalization to p > 1 variables:

◮ We have x (p × 1), µ (p × 1), and Σ (p × p).

◮ The exponent term for the multivariate normal is

(x - \mu)'\Sigma^{-1}(x - \mu)

where −∞ < x_i < ∞ for i = 1, ..., p.

◮ This is a scalar and reduces to what's at the top for p = 1.

◮ It is a squared statistical distance from x to µ (if Σ⁻¹ exists). It takes into consideration both variability and covariability.

◮ Integrating,

\int_{x_1} \cdots \int_{x_p} \exp\left[ -\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu) \right] dx_p \cdots dx_1 = (2\pi)^{p/2} |\Sigma|^{1/2}


Proper Distribution

Since the density must integrate to 1 over all possible values, we need to divide by (2π)^{p/2}|Σ|^{1/2} to get a "proper" density function.

Multivariate normal density function:

f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) \right]

where −∞ < x_i < ∞ for i = 1, ..., p. To denote this, we use

\mathbf{X} \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma)

For p = 1, this reduces to the univariate normal p.d.f.
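The density formula can be checked numerically. A minimal pure-Python sketch for p = 2, written out with the 2×2 inverse and determinant (the function name is illustrative; Σ is the one used in the contour example later in these notes):

```python
import math

def bvn_density(x, mu, Sigma):
    """Bivariate normal density: (2*pi)^{-1} |Sigma|^{-1/2}
    exp(-0.5 (x - mu)' Sigma^{-1} (x - mu)), written out for p = 2."""
    (s11, s12), (s21, s22) = Sigma
    det = s11 * s22 - s12 * s21          # |Sigma|
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # quadratic form (x - mu)' Sigma^{-1} (x - mu) via the 2x2 inverse
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# at x = mu the exponent vanishes, so f(mu) = 1 / (2*pi*sqrt(|Sigma|))
f_at_mean = bvn_density([5.0, 10.0], [5.0, 10.0], [[9.0, 16.0], [16.0, 64.0]])
```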


Bivariate Normal: p = 2

\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad
E(\mathbf{x}) = \begin{pmatrix} E(x_1) \\ E(x_2) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \boldsymbol{\mu}, \qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}

and

\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}

If we replace σ12 by ρ12√(σ11σ22), then we get

\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}

Using this, let's look at the statistical distance of x from µ...

Bivariate Normal & Statistical Distance

The quantity in the exponent of the bivariate normal is

(x - \mu)'\Sigma^{-1}(x - \mu)
= \big( (x_1 - \mu_1), (x_2 - \mu_2) \big)
  \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
  \begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}
  \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}

= \frac{1}{1 - \rho_{12}^2}
  \left[ \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)^2
       + \left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right)^2
       - 2\rho_{12} \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)\left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right) \right]

= \frac{1}{1 - \rho_{12}^2} \left( z_1^2 + z_2^2 - 2\rho_{12} z_1 z_2 \right)

Bivariate Normal & Independence

f(\mathbf{x}) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}}
\exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)}
\left[ \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)^2
     + \left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right)^2
     - 2\rho_{12}\left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)\left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right) \right] \right\}

If σ12 = 0, or equivalently ρ12 = 0, then X1 and X2 are uncorrelated. For the bivariate normal, σ12 = 0 implies that X1 and X2 are statistically independent, because the density factors:

f(\mathbf{x}) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}}}
\exp\left\{ -\frac{1}{2}\left[ \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)^2 + \left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right)^2 \right] \right\}

= \frac{1}{\sqrt{2\pi\sigma_{11}}} \exp\left[ -\frac{1}{2}\left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)^2 \right]
\times \frac{1}{\sqrt{2\pi\sigma_{22}}} \exp\left[ -\frac{1}{2}\left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right)^2 \right]

= f_1(x_1) \times f_2(x_2)
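The factorization when ρ12 = 0 is easy to verify numerically. A small stdlib-Python sketch (the helper names and the evaluation point are illustrative):

```python
import math

def norm_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bvn_pdf(x1, x2, mu1, mu2, s11, s22, rho):
    """Bivariate normal density in the correlation parameterization."""
    z1 = (x1 - mu1) / math.sqrt(s11)
    z2 = (x2 - mu2) / math.sqrt(s22)
    quad = (z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(s11 * s22 * (1.0 - rho * rho)))

# with rho = 0 the joint density equals the product of the two marginals
joint = bvn_pdf(1.2, -0.7, 0.0, 0.0, 1.0, 4.0, 0.0)
product = norm_pdf(1.2, 0.0, 1.0) * norm_pdf(-0.7, 0.0, 4.0)
```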

Picture: µk = 0, σkk = 1, r = 0.0

[3-D surface plot of the bivariate normal density over the x-y plane]


Overhead: µk = 0, σkk = 1, r = 0.0

[Overhead (contour) view: circular contours centered at the origin]

Picture: µk = 0, σkk = 1, r = 0.75

[3-D surface plot of the bivariate normal density with correlation r = 0.75]


Overhead: µk = 0, σkk = 1, r = 0.75

[Overhead (contour) view: tilted elliptical contours centered at the origin]

Summary: Comparing r = 0.0 vs r = 0.75

For the figures shown, µ1 = µ2 = 0 and σ11 = σ22 = 1:

◮ With r = 0.0,
  ◮ Σ = diag(σ11, σ22), a diagonal matrix.
  ◮ Density is "random" in the x-y plane.
  ◮ When you take a slice parallel to the x-y plane, you get a circle.
◮ With r = 0.75,
  ◮ Σ is not a diagonal matrix.
  ◮ Density is not random in the x-y plane.
  ◮ There is a linear tilt (i.e., density is concentrated along a line).
  ◮ When you take a slice, you get an ellipse that's tilted.
  ◮ The tilt depends on the relative values of σ11 and σ22 (and the scales used in plotting).
◮ When Σ = σ²I (i.e., diagonal with equal variances), it's the "spherical normal".


Real Time Software Demo

◮ binormal.m (Peter Dunn)

◮ Graph Bivariate .R (http://www.stat.ucl.ac.be/ISpersonnel/lecoutre/stats/fichiers/˜gallery


Slices of Multivariate Normal Density

◮ For the bivariate normal, you get an ellipse whose equation is

(x - \mu)'\Sigma^{-1}(x - \mu) = c^2

which gives all (x1, x2) pairs with constant probability density.

◮ These ellipses are called contours, and all are centered around µ.

◮ Definition: A constant probability contour equals

\{ \text{all } \mathbf{x} \text{ such that } (\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c^2 \}
= \{ \text{surface of an ellipsoid centered at } \boldsymbol{\mu} \}


Probability Contours: Axes of the Ellipsoid

Important Points:

◮ (x − µ)′Σ⁻¹(x − µ) ∼ χ²_p (if |Σ| > 0)

◮ The solid ellipsoid of values x that satisfy

(x - \mu)'\Sigma^{-1}(x - \mu) \le c^2 = \chi^2_p(\alpha)

has probability (1 − α), where χ²_p(α) is the (1 − α)·100% point of the chi-square distribution with p degrees of freedom.


Example: Axes of Ellipses & Probability Contours

Back to the example where x ∼ N₂(µ, Σ) with

\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} \;\rightarrow\; \rho_{12} = .667

and we want the "95% probability contour". The upper 5% point of the chi-square distribution with 2 degrees of freedom is χ²₂(.05) = 5.9915, so c = √5.9915 = 2.4478.

Axes: µ ± c√λi ei, where (λi, ei) is the ith (i = 1, 2) eigenvalue/eigenvector pair of Σ:

\lambda_1 = 68.316, \quad \mathbf{e}_1 = (.2604, .9655)'
\lambda_2 = 4.684, \quad \mathbf{e}_2 = (.9655, -.2604)'
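These eigenvalue/eigenvector pairs, and the axis endpoints on the next slides, can be reproduced with the closed-form eigen-decomposition of a symmetric 2×2 matrix (a stdlib-Python sketch; the function name is illustrative):

```python
import math

def eig2x2_sym(a, b, d):
    """Eigen-decomposition of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4.0 * det)
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    # eigenvector for lam1 solves (a - lam1) v1 + b v2 = 0, so v = (b, lam1 - a)
    n1 = math.hypot(b, lam1 - a)
    e1 = (b / n1, (lam1 - a) / n1)
    e2 = (-e1[1], e1[0])  # orthogonal direction in 2-D
    return lam1, lam2, e1, e2

lam1, lam2, e1, e2 = eig2x2_sym(9.0, 16.0, 64.0)

# major-axis endpoints: mu +/- c * sqrt(lambda_1) * e1 with c = sqrt(chi2_2(.05))
c = math.sqrt(5.9915)
mu = (5.0, 10.0)
half = c * math.sqrt(lam1)
major = [(mu[0] + s * half * e1[0], mu[1] + s * half * e1[1]) for s in (1, -1)]
```

The endpoints agree with the slide's (10.273, 29.551) and (−.273, −9.551) up to the slide's rounding of c to 2.45.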


Major Axis

Using the largest eigenvalue and corresponding eigenvector (with c = √χ²₂(.05) ≈ 2.45):

\mu \pm c\sqrt{\lambda_1}\,\mathbf{e}_1
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 2.45\sqrt{68.316} \begin{pmatrix} .2604 \\ .9655 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 20.250 \begin{pmatrix} .2604 \\ .9655 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm \begin{pmatrix} 5.273 \\ 19.551 \end{pmatrix}
\;\rightarrow\; \begin{pmatrix} 10.273 \\ 29.551 \end{pmatrix}, \; \begin{pmatrix} -.273 \\ -9.551 \end{pmatrix}


Minor Axis

Same process, but now use λ₂ and e₂, the smallest eigenvalue and corresponding eigenvector:

\mu \pm c\sqrt{\lambda_2}\,\mathbf{e}_2
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 2.45\sqrt{4.684} \begin{pmatrix} .9655 \\ -.2604 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 5.30 \begin{pmatrix} .9655 \\ -.2604 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm \begin{pmatrix} 5.119 \\ -1.381 \end{pmatrix}
\;\rightarrow\; \begin{pmatrix} 10.119 \\ 8.619 \end{pmatrix}, \; \begin{pmatrix} -.119 \\ 11.381 \end{pmatrix}


Graph of 95% Probability Contour

[Ellipse in the (x1, x2) plane centered at µ′ = (5, 10), with axis endpoints (10.273, 29.551), (−0.273, −9.551), (10.119, 8.619), and (−0.119, 11.381)]

Example: Equation for Contour

Equation for the contour:

(x - \mu)'\Sigma^{-1}(x - \mu) \le 5.99

\big( (x_1 - 5), (x_2 - 10) \big) \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix}^{-1} \begin{pmatrix} x_1 - 5 \\ x_2 - 10 \end{pmatrix} \le 5.99

\big( (x_1 - 5), (x_2 - 10) \big) \begin{pmatrix} .200 & -.050 \\ -.050 & .028 \end{pmatrix} \begin{pmatrix} x_1 - 5 \\ x_2 - 10 \end{pmatrix} \le 5.99

.2(x_1 - 5)^2 + .028(x_2 - 10)^2 - .1(x_1 - 5)(x_2 - 10) \le 5.99

(x − µ)′Σ⁻¹(x − µ) is a quadratic form, which is the equation for an ellipse.


Points Inside or Outside?

Are the following points inside or outside the 95% probability contour?

◮ Is the point (10, 20) inside or outside the 95% probability contour?

(10, 20) \;\rightarrow\; .2(10 - 5)^2 + .028(20 - 10)^2 - .1(10 - 5)(20 - 10)
= .2(25) + .028(100) - .1(50)
= 2.8

Since 2.8 ≤ 5.99, the point (10, 20) is inside the contour.

◮ Is the point (16, 20) inside or outside the 95% probability contour?

(16, 20) \;\rightarrow\; .2(16 - 5)^2 + .028(20 - 10)^2 - .1(16 - 5)(20 - 10)
= .2(121) + .028(100) - .1(11)(10)
= 16

Since 16 > 5.99, the point (16, 20) is outside the contour.
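The same check can be scripted with the exact inverse of Σ rather than the slide's rounded .028 entry (a small Python sketch; the slight difference from 2.8 and 16 is that rounding):

```python
# quadratic form (x - mu)' Sigma^{-1} (x - mu) for mu = (5, 10),
# Sigma = [[9, 16], [16, 64]], using the exact 2x2 inverse
def stat_dist2(x1, x2):
    s11, s12, s22 = 9.0, 16.0, 64.0
    det = s11 * s22 - s12 * s12          # |Sigma| = 320
    d1, d2 = x1 - 5.0, x2 - 10.0
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

crit = 5.9915  # chi-square(2) upper 5% point
inside_10_20 = stat_dist2(10.0, 20.0) <= crit   # distance about 2.81: inside
inside_16_20 = stat_dist2(16.0, 20.0) <= crit   # distance about 16.01: outside
```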


Points Inside and Outside

[The 95% ellipse centered at µ′ = (5, 10), with the point (10, 20) plotted inside and (16, 20) plotted outside]

More Properties that we’ll Expand on

If X ∼ N_p(µ, Σ), then

◮ Linear combinations of components of X are (multivariate) normal.

◮ All sub-sets of the components of X are (multivariate) normal.

◮ Zero covariance implies that the corresponding components of X are statistically independent.

◮ The conditional distributions of the components of X are (multivariate) normal.


1: Linear Combinations

If X ∼ N_p(µ, Σ), then any linear combination

\mathbf{a}'\mathbf{X} = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p

is distributed as

\mathbf{a}'\mathbf{X} \sim \mathcal{N}_1(\mathbf{a}'\boldsymbol{\mu},\; \mathbf{a}'\Sigma\mathbf{a})

Also, if a′X ∼ N₁(a′µ, a′Σa) for all possible a, then X must be N_p(µ, Σ).

Example:

\mathbf{a}' = (3, 2), \qquad \mathbf{X} \sim \mathcal{N}_2\left( \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix} \right), \qquad Y = \mathbf{a}'\mathbf{X} = 3X_1 + 2X_2

\mu_Y = (3, 2)\begin{pmatrix} 5 \\ 10 \end{pmatrix} = 35
\qquad \text{and} \qquad
\sigma^2_Y = (3, 2)\begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = 432

Y \sim \mathcal{N}_1(35, 432)
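The mean and variance of a′X can be verified directly from the formulas a′µ and a′Σa (a minimal Python sketch with the numbers above):

```python
# linear combination Y = a'X of a bivariate normal: mean a'mu, variance a'Sigma a
a = (3.0, 2.0)
mu = (5.0, 10.0)
Sigma = ((16.0, 12.0), (12.0, 36.0))

mean_Y = a[0] * mu[0] + a[1] * mu[1]
var_Y = sum(a[i] * Sigma[i][j] * a[j] for i in range(2) for j in range(2))
```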

More Linear Combinations

If X ∼ N_p(µ, Σ), then the q linear combinations

\mathbf{Y}_{q\times 1} = \mathbf{A}_{q\times p}\mathbf{X} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}

are distributed as N_q(Aµ, AΣA′).

Also, if Y = AX + d, where d (q × 1) is a vector of constants, then

\mathbf{Y} \sim \mathcal{N}_q(\mathbf{A}\boldsymbol{\mu} + \mathbf{d},\; \mathbf{A}\Sigma\mathbf{A}')

Numerical Example with Multiple Combinations

\mathbf{X} \sim \mathcal{N}_2\left( \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix} \right)

Y₁ = X₁ + X₂ and Y₂ = X₁ − X₂, so

\mathbf{A}_{2\times 2} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

\boldsymbol{\mu}_Y = \mathbf{A}\boldsymbol{\mu} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 5 \\ 10 \end{pmatrix} = \begin{pmatrix} 15 \\ -5 \end{pmatrix}

\Sigma_Y = \mathbf{A}\Sigma\mathbf{A}' = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 76 & -20 \\ -20 & 28 \end{pmatrix}

So

\mathbf{Y} \sim \mathcal{N}_2\left( \begin{pmatrix} 15 \\ -5 \end{pmatrix}, \begin{pmatrix} 76 & -20 \\ -20 & 28 \end{pmatrix} \right)
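The matrix computations Aµ and AΣA′ can be checked with a tiny hand-rolled matrix multiply (a pure-Python sketch; `matmul` is just an illustrative helper):

```python
# Y = A X with A = [[1, 1], [1, -1]]: mu_Y = A mu, Sigma_Y = A Sigma A'
def matmul(A, B):
    """Product of two rectangular matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1.0, 1.0], [1.0, -1.0]]
Sigma = [[16.0, 12.0], [12.0, 36.0]]
mu = [[5.0], [10.0]]

mu_Y = matmul(A, mu)
At = [[A[j][i] for j in range(2)] for i in range(2)]   # transpose of A
Sigma_Y = matmul(matmul(A, Sigma), At)
```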

Multiple Regression as an Example

This example will use what we know about linear combinations and about the distribution of linear combinations.

Model:

◮ Y = response variable.

◮ Z1, Z2, ..., Zr are predictor/explanatory variables, which are considered to be fixed.

◮ The model is

Y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \ldots + \beta_r Z_r + \epsilon

◮ The error of prediction ε is viewed as a random variable.


Multiple Regression as an Example

Suppose we have n observations on Y and have values of the Z's for each observation j = 1, ..., n; that is,

Y_1 = \beta_0 + \beta_1 Z_{11} + \beta_2 Z_{12} + \ldots + \beta_r Z_{1r} + \epsilon_1
Y_2 = \beta_0 + \beta_1 Z_{21} + \beta_2 Z_{22} + \ldots + \beta_r Z_{2r} + \epsilon_2
\vdots
Y_n = \beta_0 + \beta_1 Z_{n1} + \beta_2 Z_{n2} + \ldots + \beta_r Z_{nr} + \epsilon_n

where E(ε_j) = 0, var(ε_j) = σ² (a constant), and cov(ε_j, ε_k) = 0 for j ≠ k. In terms of matrices,

\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} =
\begin{pmatrix} 1 & Z_{11} & Z_{12} & \ldots & Z_{1r} \\ 1 & Z_{21} & Z_{22} & \ldots & Z_{2r} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & Z_{n1} & Z_{n2} & \ldots & Z_{nr} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_r \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}

that is, Y = Zβ + ε, where E(ε) = 0 and cov(ε) = σ²I.

C.J. Anderson (Illinois) Multivariate Normal Distribution Spring 2015 31.1/ 56 Motivation Intro. to Multivariate Normal Bivariate Normal More Properties Estimation CLT Others

Distribution of Y

Y = Zβ + ε, where E(ε) = 0 and cov(ε) = σ²I. Here Zβ is a vector of constants and ε is random, so Y is a linear combination of a multivariate normally distributed variable, ε.

◮ Mean of Y:

\boldsymbol{\mu}_Y = E(\mathbf{Y}) = E(\mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\epsilon}) = \mathbf{Z}\boldsymbol{\beta} + E(\boldsymbol{\epsilon}) = \mathbf{Z}\boldsymbol{\beta}

◮ Covariance matrix of Y:

Σ_Y = σ²I (the same as ε).

◮ The distribution of Y is multivariate normal because ε is multivariate normal:

\mathbf{Y} \sim \mathcal{N}_n(\mathbf{Z}\boldsymbol{\beta}, \sigma^2\mathbf{I})


Least Squares Estimation

Y = Zβ + ε, where E(ε) = 0 and cov(ε) = σ²I. β and σ² are unknown parameters that need to be estimated from data.

Let y1, y2, ..., yn be a random sample with values z1, z2, ..., zr on the explanatory variables. The estimate of β is the vector b that minimizes

\sum_{j=1}^{n} (y_j - \mathbf{z}_j'\mathbf{b})^2 = \sum_{j=1}^{n} (y_j - b_0 - b_1 z_{j1} - b_2 z_{j2} - \ldots - b_r z_{jr})^2 = (\mathbf{y} - \mathbf{Z}\mathbf{b})'(\mathbf{y} - \mathbf{Z}\mathbf{b}) = \boldsymbol{\epsilon}'\boldsymbol{\epsilon}

where z_j′ is the jth row of Z and b = (b0, b1, b2, ..., br)′. If Z has full rank (i.e., the rank of Z is r + 1 ≤ n), then the least squares estimate of β is

\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}
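For a single predictor, the normal equations (Z′Z)b = Z′y can be solved in closed form by inverting the 2×2 matrix Z′Z directly. A stdlib-Python sketch (the data and function name are illustrative):

```python
# least squares for one predictor: solve (Z'Z) b = Z'y for b = (b0, b1),
# where Z has columns (1, z)
def ols_fit(z, y):
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(a * b for a, b in zip(z, y))
    # Z'Z = [[n, sz], [sz, szz]] and Z'y = (sy, szy); invert the 2x2 directly
    det = n * szz - sz * sz
    b0 = (szz * sy - sz * szy) / det
    b1 = (n * szy - sz * sy) / det
    return b0, b1

# data lying exactly on y = 1 + 2z recovers (b0, b1) = (1, 2)
z = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_fit(z, y)
```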


What's the distribution of β̂?

\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y} = \mathbf{A}\mathbf{y}

We showed that Y ∼ N_n(Zβ, σ²I).

◮ Mean of β̂:

\boldsymbol{\mu}_{\hat{\beta}} = E(\hat{\boldsymbol{\beta}}) = E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y}) = \mathbf{A}\mathbf{Z}\boldsymbol{\beta} = (\mathbf{Z}'\mathbf{Z})^{-1}(\mathbf{Z}'\mathbf{Z})\boldsymbol{\beta} = \boldsymbol{\beta}

◮ Covariance matrix of β̂:

\Sigma_{\hat{\beta}} = \mathbf{A}\Sigma_Y\mathbf{A}'
= ((\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}')(\sigma^2\mathbf{I})(\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1})
= \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}
= \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}

◮ The distribution of β̂: β̂ ∼ N(β, σ²(Z′Z)⁻¹).


The distribution of Ŷ

The "fitted values" or predicted values are

\hat{\mathbf{y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{H}\mathbf{y}

where H = Z(Z′Z)⁻¹Z′. The matrix H is the "hat" matrix.

◮ We just showed that β̂ ∼ N(β, σ²(Z′Z)⁻¹), and so ŷ is a linear combination of a vector that's multivariate normal.

◮ Mean of Ŷ:

\boldsymbol{\mu}_{\hat{Y}} = E(\mathbf{Z}\hat{\boldsymbol{\beta}}) = \mathbf{Z}E(\hat{\boldsymbol{\beta}}) = \mathbf{Z}\boldsymbol{\beta}

◮ Covariance matrix of Ŷ:

\Sigma_{\hat{Y}} = \mathbf{Z}\Sigma_{\hat{\beta}}\mathbf{Z}' = \mathbf{Z}(\sigma^2(\mathbf{Z}'\mathbf{Z})^{-1})\mathbf{Z}' = \sigma^2\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}' = \sigma^2\mathbf{H}

(note that H is idempotent but not the identity, so the covariance matrix of Ŷ is σ²H).

◮ Distribution of Ŷ: Ŷ ∼ N(Zβ, σ²H)


The distribution of ε̂

The estimated residuals are

\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}

and they contain the information necessary to estimate σ². The least squares estimate of σ² is

s^2 = \frac{\hat{\boldsymbol{\epsilon}}'\hat{\boldsymbol{\epsilon}}}{n - (r + 1)}

The estimates β̂ and ε̂ are uncorrelated.

The multivariate normality assumption ε ∼ N_n(0, σ²I), together with what we know about linear combinations of random variables, allowed us to derive the distributions of these various random variables.


The distribution of ε̂ (continued)

Last few comments on this example:

◮ The least squares estimates of β and ǫ are also the maximum likelihood estimates.

◮ The maximum likelihood estimate of σ² is σ̂² = ε̂′ε̂/n.

◮ βˆ and ˆǫ are statistically independent.


2: Sub-sets of Variables

If X ∼ N_p(µ, Σ), then all sub-sets of X are (multivariate) normally distributed. For example, let's partition X into two sub-sets:

\mathbf{X}_{p\times 1} = \begin{pmatrix} \mathbf{X}_{1(q\times 1)} \\ \mathbf{X}_{2((p-q)\times 1)} \end{pmatrix}
\qquad
\boldsymbol{\mu}_{p\times 1} = \begin{pmatrix} \boldsymbol{\mu}_{1(q\times 1)} \\ \boldsymbol{\mu}_{2((p-q)\times 1)} \end{pmatrix}

\Sigma_{p\times p} = \begin{pmatrix} \Sigma_{11(q\times q)} & \Sigma_{12(q\times (p-q))} \\ \Sigma_{21((p-q)\times q)} & \Sigma_{22((p-q)\times (p-q))} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}


Sub-sets of Variables (continued)

Then, for

\mathbf{X} = \begin{pmatrix} \mathbf{X}_{1(q\times 1)} \\ \mathbf{X}_{2((p-q)\times 1)} \end{pmatrix}

the distributions of the sub-sets are

\mathbf{X}_1 \sim \mathcal{N}_q(\boldsymbol{\mu}_1, \Sigma_{11}) \quad \text{and} \quad \mathbf{X}_2 \sim \mathcal{N}_{p-q}(\boldsymbol{\mu}_2, \Sigma_{22})

This result implies that:

◮ Each of the X_i's is univariate normal (next page).

◮ All possible sub-sets are multivariate normal.

◮ All marginal distributions are (multivariate) normal.


Little Example on Sub-sets

Suppose that

\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} \sim \mathcal{N}_3(\boldsymbol{\mu}, \Sigma)

Due to the result on sub-sets of multivariate normals,

X_1 \sim \mathcal{N}(\mu_1, \sigma_{11}), \quad X_2 \sim \mathcal{N}(\mu_2, \sigma_{22}), \quad X_3 \sim \mathcal{N}(\mu_3, \sigma_{33})

Also,

\begin{pmatrix} X_2 \\ X_3 \end{pmatrix} \sim \mathcal{N}_2\left( \begin{pmatrix} \mu_2 \\ \mu_3 \end{pmatrix}, \begin{pmatrix} \sigma_{22} & \sigma_{23} \\ \sigma_{32} & \sigma_{33} \end{pmatrix} \right)

3: Zero Covariance & Statistical Independence

There are three parts to this one:

◮ If X_1 (q_1 × 1) and X_2 (q_2 × 1) are statistically independent, then cov(X_1, X_2) = Σ_12 = 0.

◮ If

\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathcal{N}_{q_1+q_2}\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)

then X_1 and X_2 are statistically independent if and only if Σ_12 = Σ_21′ = 0.

◮ If X_1 and X_2 are statistically independent and distributed as N_{q_1}(µ_1, Σ_11) and N_{q_2}(µ_2, Σ_22), respectively, then

\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathcal{N}_{q_1+q_2}\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \mathbf{0} \\ \mathbf{0} & \Sigma_{22} \end{pmatrix} \right)

Example

\mathbf{Y}_{4\times 1} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{pmatrix} \sim \mathcal{N}_4(\boldsymbol{\mu}, \Sigma_Y)
\quad \text{with} \quad
\Sigma_Y = \begin{pmatrix} 2 & 1 & 0 & .5 \\ 1 & 3 & 0 & .5 \\ 0 & 0 & 4 & 0 \\ .5 & .5 & 0 & 1 \end{pmatrix}

Let's take X_1 = (Y_1, Y_2, Y_4)′ and X_2 = (Y_3). Then

\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathcal{N}_4\left( \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_4 \\ \mu_3 \end{pmatrix}, \begin{pmatrix} 2 & 1 & .5 & 0 \\ 1 & 3 & .5 & 0 \\ .5 & .5 & 1 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \right)

So X_1 is statistically independent of X_2.


4: Conditional Distributions

Let X = (X_1′, X_2′)′ with X_1 (q_1 × 1) and X_2 (q_2 × 1) be distributed as N_{q_1+q_2}(µ, Σ) with

\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

and |Σ| > 0 (i.e., Σ positive definite). Then the conditional distribution of X_1 given X_2 = x_2 is (multivariate) normal with mean and covariance matrix

\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2) \quad \text{and} \quad \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}

Let's look more closely at this for the simple case q_1 = q_2 = 1.


Conditional Distribution for q1 = q2 = 1

Bivariate normal distribution:

\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}_2\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \right)

f(x_1 \mid x_2) \text{ is } \mathcal{N}_1\!\left( \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2),\; \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} \right)

Notes:

◮ σ12 = ρ12 √σ11 √σ22

◮ Σ12Σ22⁻¹ = σ12/σ22 = ρ12(√σ11/√σ22)

◮ Σ11 − Σ12Σ22⁻¹Σ21 = σ11 − σ12²/σ22 = σ11(1 − ρ12²)

Alternative way to write f(x1 | x2):

f(x_1 \mid x_2) \text{ is } \mathcal{N}_1\!\left( \mu_1 + \rho_{12}\frac{\sqrt{\sigma_{11}}}{\sqrt{\sigma_{22}}}(x_2 - \mu_2),\; \sigma_{11}(1 - \rho_{12}^2) \right)
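The conditional mean and variance formulas are easy to compute. A small sketch, reusing the Σ from the linear-combination example (the conditioning value x2 = 13 is just illustrative):

```python
import math

# conditional distribution of X1 given X2 = x2 for a bivariate normal:
# mean mu1 + (s12/s22)(x2 - mu2), variance s11 - s12^2/s22
def conditional(mu1, mu2, s11, s12, s22, x2):
    mean = mu1 + (s12 / s22) * (x2 - mu2)
    var = s11 - s12 * s12 / s22
    return mean, var

m, v = conditional(5.0, 10.0, 16.0, 12.0, 36.0, x2=13.0)

# cross-check: the conditional variance also equals s11 * (1 - rho^2)
rho = 12.0 / math.sqrt(16.0 * 36.0)   # rho = 0.5 here
```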

Multiple Regression as a Conditional Distribution

Consider the case where q1 = 1 and q2 > 1.

◮ All conditional distributions are normal.

◮ The conditional covariance matrix Σ11 − Σ12Σ22⁻¹Σ21 does not depend on the values of the conditioning variables.

◮ The conditional means have the following form. Let

\Sigma_{12}\Sigma_{22}^{-1} = \boldsymbol{\beta}_{q_1 \times q_2} = \begin{pmatrix} \beta_{1,q_1+1} & \beta_{1,q_1+2} & \cdots & \beta_{1,q_1+q_2} \\ \beta_{2,q_1+1} & \beta_{2,q_1+2} & \cdots & \beta_{2,q_1+q_2} \\ \vdots & & \ddots & \vdots \\ \beta_{q_1,q_1+1} & \beta_{q_1,q_1+2} & \cdots & \beta_{q_1,q_1+q_2} \end{pmatrix}

Then the conditional means are

\mu_i + \sum_{k=q_1+1}^{q_1+q_2} \beta_{ik}(x_k - \mu_k), \quad i = 1, \ldots, q_1

Estimation of µ and Σ & Sampling Distributions

Suppose we have a p-dimensional normal distribution with mean µ and covariance matrix Σ. Take n observations x1, x2, ..., xn (these are each (p × 1) vectors):

\mathbf{X}_j \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma), \quad j = 1, 2, \ldots, n, \text{ and independent}

For p = 1, we know that the MLEs are

\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)

and

n\hat{\sigma}^2 = \sum_{j=1}^{n} (x_j - \bar{x})^2 \quad \text{with} \quad \frac{1}{\sigma^2}\sum_{j=1}^{n} (x_j - \bar{x})^2 \sim \chi^2_{(n-1)}

or

\hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2 \sim \frac{\sigma^2}{n}\chi^2_{(n-1)}

Estimation of µ and Σ: Multivariate Case

The maximum likelihood estimator of µ is

\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_j

and the ML estimator of Σ is

\hat{\Sigma} = \frac{n-1}{n}\mathbf{S} = \mathbf{S}_n = \frac{1}{n}\sum_{j=1}^{n} (\mathbf{X}_j - \hat{\boldsymbol{\mu}})(\mathbf{X}_j - \hat{\boldsymbol{\mu}})'

Sampling distribution of µ̂: the estimator is a linear combination of normal random vectors, each N_p(µ, Σ) i.i.d.:

\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}} = \frac{1}{n}\mathbf{X}_1 + \frac{1}{n}\mathbf{X}_2 + \cdots + \frac{1}{n}\mathbf{X}_n

So µ̂ also has a normal distribution: µ̂ = X̄ ∼ N_p(µ, (1/n)Σ).
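The estimators x̄ and S_n can be computed directly (a pure-Python sketch with made-up data; the function name is illustrative):

```python
# MLEs for a p-variate sample: xbar = mean vector, Sn = (1/n) * sum of
# outer products of deviations (the n-divisor covariance estimate)
def mle_mean_cov(X):
    n, p = len(X), len(X[0])
    xbar = [sum(row[k] for row in X) / n for k in range(p)]
    Sn = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in X) / n
           for j in range(p)] for i in range(p)]
    return xbar, Sn

X = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
xbar, Sn = mle_mean_cov(X)
```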

Sampling Distribution of Σ̂

\hat{\Sigma} = \frac{n-1}{n}\mathbf{S}

The matrix

(n-1)\mathbf{S} = \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'

is distributed as a Wishart random matrix with (n − 1) degrees of freedom.

Wishart distribution:

◮ A multivariate analogue of the chi-square distribution.

◮ It's defined as

W_m(\cdot \mid \Sigma) = \text{Wishart distribution with } m \text{ degrees of freedom} = \text{the distribution of } \sum_{j=1}^{m} \mathbf{Z}_j \mathbf{Z}_j'

where Z_j ∼ N_p(0, Σ) and independent.

Note: X̄ and S are independent.

Law of Large Numbers

Data are not always (multivariate) normal. The law of large numbers (for multivariate data):

Let X1, X2, ..., Xn be independent observations from a population with mean E(X) = µ. Then

\bar{\mathbf{X}} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_j

converges in probability to µ as n gets large; that is, X̄ → µ for large samples. And S (or S_n) approaches Σ for large samples.

These results hold regardless of the true distribution of the X_j's.


Central Limit Theorem

Let X1, X2, ..., Xn be independent observations from a population with mean E(X) = µ and finite (non-singular, full-rank) covariance matrix Σ. Then

\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol{\mu}) \text{ has an approximate } \mathcal{N}_p(\mathbf{0}, \Sigma) \text{ distribution}

if n ≫ p (i.e., n is "much larger than" p). So, for "large" n,

\bar{\mathbf{X}} \approx \mathcal{N}_p\!\left(\boldsymbol{\mu}, \frac{1}{n}\Sigma\right)

regardless of the underlying distribution of the X_j's.

What if Σ is unknown? If n is large "enough", S will be close to Σ, so

\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol{\mu}) \approx \mathcal{N}_p(\mathbf{0}, \mathbf{S}) \quad \text{or} \quad \bar{\mathbf{X}} \approx \mathcal{N}_p\!\left(\boldsymbol{\mu}, \frac{1}{n}\mathbf{S}\right)

Since n(X̄ − µ)′Σ⁻¹(X̄ − µ) ∼ χ²_p,

n(\bar{\mathbf{X}} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\bar{\mathbf{X}} - \boldsymbol{\mu}) \approx \chi^2_p
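The CLT claim can be probed empirically for p = 1: standardized means of decidedly non-normal draws should behave like a standard normal. A seeded stdlib-Python sketch with Uniform(0, 1) data (all the numbers here are illustrative):

```python
import random

# standardized means of Uniform(0, 1) samples should look standard normal,
# so about 95% of them should land inside (-1.96, 1.96)
random.seed(0)
n, reps = 50, 2000
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5   # mean and sd of Uniform(0, 1)
hits = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = (xbar - mu) / (sigma / n ** 0.5)
    hits += abs(z) < 1.96
coverage = hits / reps
```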

Few more comments

◮ Using S instead of Σ does not seriously affect the approximation.

◮ n must be large relative to p; that is, (n − p) should be large.

◮ The probability contours for X̄ are tighter than those for X, since X̄ has covariance matrix (1/n)Σ rather than Σ. See the next slide for an example.


Comparison of Probability Contours

Returning to our example and pretending we have n = 20. Below are contours for 99%, 95%, 90%, 75%, 50%, and 20%:

[Side-by-side plots: contours for X_j (left) and the much tighter contours for X̄ (right)]


Why So Much of a Difference with Only n = 20?

For X_j:

\Sigma = \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} \;\rightarrow\; \lambda_1 = 68.316 \text{ and } \lambda_2 = 4.684

For X̄ with n = 20:

\frac{1}{20}\begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} = \begin{pmatrix} 0.45 & 0.80 \\ 0.80 & 3.20 \end{pmatrix} \;\rightarrow\; \lambda_1 = 3.42 \text{ and } \lambda_2 = 0.23

Note that 68.316/20 = 3.42 and 4.684/20 = 0.23.
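The 1/n scaling of the eigenvalues (and hence the √n shrinkage of the contour axes) can be confirmed directly (a small Python sketch; `eigvals_2x2_sym` is an illustrative closed-form helper):

```python
import math

# eigenvalues of (1/n) * Sigma are the eigenvalues of Sigma divided by n,
# so each contour axis shrinks by a factor of sqrt(n)
def eigvals_2x2_sym(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

lam = eigvals_2x2_sym(9.0, 16.0, 64.0)
lam_scaled = eigvals_2x2_sym(9.0 / 20.0, 16.0 / 20.0, 64.0 / 20.0)
```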


Other Multivariate Distributions: Skew-Normal


Marshall-Olkin bivariate exponential


Contours for 4 different ones
