Multivariate Normal Distribution Edps/Soc 584, Psych 594
Carolyn J. Anderson
Department of Educational Psychology, University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Spring 2017

Motivation · Intro. to Multivariate Normal · Bivariate Normal · More Properties · Estimation · CLT · Others
Outline
◮ Motivation
◮ The multivariate normal distribution
◮ The bivariate normal distribution
◮ More properties of the multivariate normal
◮ Estimation of µ and Σ
◮ Central Limit Theorem
Reading: Johnson & Wichern pages 149–176
C.J. Anderson (Illinois), Multivariate Normal Distribution, Spring 2015
Motivation

◮ To be able to make inferences about populations, we need a model for the distribution of random variables. We'll use the multivariate normal distribution, because...
◮ It's often a good population model: it's a reasonably good approximation of many phenomena, and a lot of variables are approximately normal (due to the central limit theorem for sums and averages).
◮ The sampling distributions of (test) statistics are often approximately multivariate or univariate normal, due to the central limit theorem.
◮ Due to its central importance, we need to thoroughly understand it and know its properties.
Introduction to the Multivariate Normal
◮ The probability density function of the univariate normal distribution (p = 1 variable):

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \qquad -\infty < x < \infty

◮ The parameters that completely characterize the distribution:
◮ µ = E(X) = mean
◮ σ² = var(X) = variance
Introduction to the Multivariate Normal (continued)

Area corresponds to probability: 68% of the area lies between µ − σ and µ + σ, and 95% between µ − 1.96σ and µ + 1.96σ.
Generalization to Multivariate Normal
The exponent of the univariate normal density,

\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu),

is a squared statistical distance between x and µ in standard deviation units.

Generalization to p > 1 variables:
◮ We have x_{p×1} and parameters µ_{p×1} and Σ_{p×p}.
◮ The exponent term for the multivariate normal is

(x − µ)′Σ⁻¹(x − µ), where −∞ < x_i < ∞ for i = 1, ..., p.

◮ This is a scalar and reduces to the expression above for p = 1.
◮ It is a squared statistical distance from x to µ (if Σ⁻¹ exists). It takes into consideration both variability and covariability.
◮ Integrating over all values of x,

\int_{x_1}\cdots\int_{x_p} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]dx_1\cdots dx_p = (2\pi)^{p/2}|\Sigma|^{1/2}
Proper Distribution
Since the total probability over all possible values must equal 1, we divide by (2π)^{p/2}|Σ|^{1/2} to get a "proper" density function.

Multivariate normal density function:

f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]

where −∞ < x_i < ∞ for i = 1, ..., p. To denote this, we write X ∼ Np(µ, Σ). For p = 1, this reduces to the univariate normal p.d.f.
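As a quick numerical check of this density, the sketch below (plain Python; `mvn_pdf_2d` is an illustrative helper, and the values of µ and Σ are borrowed from the example used later in these notes) evaluates f(x) for p = 2 directly from the formula. At x = µ the exponent is zero, so f(µ) = 1/(2π|Σ|^{1/2}).

```python
import math

def mvn_pdf_2d(x, mu, Sigma):
    """Bivariate normal density: exp(-(x-mu)' Sigma^{-1} (x-mu)/2) / (2*pi*|Sigma|^{1/2})."""
    a, b, c, d = Sigma[0][0], Sigma[0][1], Sigma[1][0], Sigma[1][1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # closed-form 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)' Sigma^{-1} (x - mu)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# at the mean the exponent is 0, so the density equals 1 / (2*pi*sqrt(|Sigma|))
mu = (5.0, 10.0)
Sigma = [[9.0, 16.0], [16.0, 64.0]]   # |Sigma| = 9*64 - 16*16 = 320
peak = mvn_pdf_2d(mu, mu, Sigma)
```

Any point other than µ gives a strictly smaller density, since the quadratic form is positive for a positive definite Σ.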
Bivariate Normal: p = 2

\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad
E(\mathbf{x}) = \begin{pmatrix} E(x_1) \\ E(x_2) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \boldsymbol\mu, \qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}

and

\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}

If we replace σ12 by ρ12√(σ11σ22), then we get

\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}

Using this, let's look at the statistical distance of x from µ...
Bivariate Normal & Statistical Distance

The quantity in the exponent of the bivariate normal is

\begin{aligned}
(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)
&= (x_1-\mu_1,\; x_2-\mu_2)\,\frac{1}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}
\begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}
\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix} \\
&= \frac{1}{1-\rho_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2
 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right] \\
&= \frac{1}{1-\rho_{12}^2}\left(z_1^2 + z_2^2 - 2\rho_{12}z_1z_2\right)
\end{aligned}
Bivariate Normal & Independence

f(\mathbf{x}) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}
\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2
 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}

If σ12 = 0, or equivalently ρ12 = 0, then X1 and X2 are uncorrelated. For the bivariate normal, σ12 = 0 implies that X1 and X2 are statistically independent, because the density factors:

\begin{aligned}
f(\mathbf{x}) &= \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}}}
\exp\left\{-\frac{1}{2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right]\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left\{-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2\right\}
 \cdot \frac{1}{\sqrt{2\pi\sigma_{22}}}\exp\left\{-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right\} \\
&= f_1(x_1) \times f_2(x_2)
\end{aligned}
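This factorization is easy to verify numerically. A minimal sketch (plain Python; the values µ1 = µ2 = 0, σ11 = 1, σ22 = 4 are arbitrary choices for illustration): with σ12 = 0, the joint density at any point equals the product of the two univariate marginal densities.

```python
import math

def norm_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def bvn_pdf_indep(x1, x2, mu1, mu2, s11, s22):
    # bivariate normal density with sigma12 = 0 (so rho12 = 0)
    q = (x1 - mu1) ** 2 / s11 + (x2 - mu2) ** 2 / s22
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(s11 * s22))

# joint density and product of marginals agree at an arbitrary point
joint = bvn_pdf_indep(1.0, 2.0, 0.0, 0.0, 1.0, 4.0)
product = norm_pdf(1.0, 0.0, 1.0) * norm_pdf(2.0, 0.0, 4.0)
```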
[Figure: surface plot of the bivariate normal density with µk = 0, σkk = 1, r = 0.0.]
[Figure: overhead (contour) view, µk = 0, σkk = 1, r = 0.0; circular contours.]
[Figure: surface plot of the bivariate normal density with µk = 0, σkk = 1, r = 0.75.]
[Figure: overhead (contour) view, µk = 0, σkk = 1, r = 0.75; tilted elliptical contours.]
Summary: Comparing r = 0.0 vs r = 0.75

For the figures shown, µ1 = µ2 = 0 and σ11 = σ22 = 1:
◮ With r = 0.0:
◮ Σ = diag(σ11, σ22), a diagonal matrix.
◮ The density shows no directional pattern (no tilt) in the x–y plane.
◮ When you take a slice parallel to the x–y plane, you get a circle.
◮ With r = 0.75:
◮ Σ is not a diagonal matrix.
◮ The density is concentrated along a line in the x–y plane (a linear tilt).
◮ When you take a slice, you get a tilted ellipse.
◮ The tilt depends on the relative values of σ11 and σ22 (and the scale used in plotting).
◮ When Σ = σ²I (i.e., diagonal with equal variances), the distribution is called "spherical normal".
Real Time Software Demo
◮ binormal.m (Peter Dunn)
◮ Graph Bivariate .R (http://www.stat.ucl.ac.be/ISpersonnel/lecoutre/stats/fichiers/˜gallery
Slices of Multivariate Normal Density
◮ For the bivariate normal, you get an ellipse whose equation is

(x − µ)′Σ⁻¹(x − µ) = c²

which gives all (x1, x2) pairs with constant probability (density).
◮ The ellipses are called contours, and all are centered at µ.
◮ Definition: A constant probability contour equals

{all x such that (x − µ)′Σ⁻¹(x − µ) = c²} = {surface of an ellipsoid centered at µ}
Probability Contours: Axes of the Ellipsoid

Important points:

◮ (X − µ)′Σ⁻¹(X − µ) ∼ χ²_p (if |Σ| > 0)
◮ The solid ellipsoid of values x that satisfy

(x − µ)′Σ⁻¹(x − µ) ≤ c² = χ²_p(α)

has probability 1 − α, where χ²_p(α) is the (1 − α)·100th percentile of the chi-square distribution with p degrees of freedom.
Example: Axes of Ellipses & Probability Contours

Back to the example where x ∼ N2(µ, Σ) with

\boldsymbol\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix} \quad\text{and}\quad
\Sigma = \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} \;\rightarrow\; \rho_{12} = .667

and we want the "95% probability contour". The upper 5% point of the chi-square distribution with 2 degrees of freedom is χ²_2(.05) = 5.9915, so c = √5.9915 = 2.4478.

Axes: µ ± c√λi ei, where (λi, ei) is the i-th (i = 1, 2) eigenvalue/eigenvector pair of Σ:

λ1 = 68.316, e1 = (.2604, .9655)′
λ2 = 4.684, e2 = (.9655, −.2604)′
Major Axis

Using the largest eigenvalue and corresponding eigenvector, µ ± c√λ1 e1:

\begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 2.45\sqrt{68.316}\begin{pmatrix} .2604 \\ .9655 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 20.250\begin{pmatrix} .2604 \\ .9655 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm \begin{pmatrix} 5.273 \\ 19.551 \end{pmatrix}
\;\rightarrow\; \begin{pmatrix} -.273 \\ -9.551 \end{pmatrix},\; \begin{pmatrix} 10.273 \\ 29.551 \end{pmatrix}
Minor Axis

Same process, but now use λ2 and e2, the smallest eigenvalue and corresponding eigenvector, µ ± c√λ2 e2:

\begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 2.45\sqrt{4.684}\begin{pmatrix} .9655 \\ -.2604 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm 5.30\begin{pmatrix} .9655 \\ -.2604 \end{pmatrix}
= \begin{pmatrix} 5 \\ 10 \end{pmatrix} \pm \begin{pmatrix} 5.119 \\ -1.381 \end{pmatrix}
\;\rightarrow\; \begin{pmatrix} -.119 \\ 11.381 \end{pmatrix},\; \begin{pmatrix} 10.119 \\ 8.619 \end{pmatrix}
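The eigenpairs and axis endpoints above can be reproduced with a few lines of code. This sketch (plain Python; for a symmetric 2 × 2 matrix the eigenpairs have a closed form, so no linear-algebra library is needed, and `unit_eigvec` is an illustrative helper) recovers λ1 = 68.316, e1 = (.2604, .9655)′, and the major-axis endpoints; small discrepancies from the slides come from the slides rounding c to 2.45.

```python
import math

Sigma = [[9.0, 16.0], [16.0, 64.0]]
mu = (5.0, 10.0)

# eigenvalues of a 2x2 symmetric matrix from its trace and determinant
tr = Sigma[0][0] + Sigma[1][1]
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0   # 68.316 and 4.684

def unit_eigvec(lam):
    # (Sigma - lam*I) v = 0  =>  v is proportional to (sigma12, lam - sigma11)
    v = (Sigma[0][1], lam - Sigma[0][0])
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

c = math.sqrt(5.9915)                # c^2 = chi-square_2(.05) for the 95% contour
e1 = unit_eigvec(lam1)
half = c * math.sqrt(lam1)           # half-length of the major axis
end_plus = (mu[0] + half * e1[0], mu[1] + half * e1[1])
end_minus = (mu[0] - half * e1[0], mu[1] - half * e1[1])
```

The same helper applied to λ2 gives the minor-axis direction (.9655, −.2604)′ up to sign.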
Graph of 95% Probability Contour
[Figure: the 95% probability contour, an ellipse in the (x1, x2) plane centered at µ′ = (5, 10), with major-axis endpoints (−0.273, −9.551) and (10.273, 29.551) and minor-axis endpoints (−0.119, 11.381) and (10.119, 8.619).]
Example: Equation for Contour

Equation for the contour:

(x − µ)′Σ⁻¹(x − µ) ≤ 5.99

((x_1-5),\,(x_2-10))\begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix}^{-1}\begin{pmatrix} x_1-5 \\ x_2-10 \end{pmatrix} \le 5.99

((x_1-5),\,(x_2-10))\begin{pmatrix} .200 & -.050 \\ -.050 & .028 \end{pmatrix}\begin{pmatrix} x_1-5 \\ x_2-10 \end{pmatrix} \le 5.99

.2(x_1-5)^2 + .028(x_2-10)^2 - .1(x_1-5)(x_2-10) \le 5.99

(x − µ)′Σ⁻¹(x − µ) is a quadratic form, i.e., a second-degree polynomial in x1 and x2.
Points Inside or Outside?

Are the following points inside or outside the 95% probability contour?

◮ Is the point (10, 20) inside or outside?

(10, 20) → .2(10 − 5)² + .028(20 − 10)² − .1(10 − 5)(20 − 10)
= .2(25) + .028(100) − .1(50)
= 2.8 ≤ 5.99, so (10, 20) is inside the contour.

◮ Is the point (16, 20) inside or outside?

(16, 20) → .2(16 − 5)² + .028(20 − 10)² − .1(16 − 5)(20 − 10)
= .2(121) + .028(100) − .1(11)(10)
= 16 > 5.99, so (16, 20) is outside the contour.
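The same check can be scripted. A minimal sketch (plain Python; `quad_form` is an illustrative helper) uses the exact inverse entries .200, .028125, and .050 rather than the rounded .028, so the value for (10, 20) comes out 2.8125 instead of 2.8; the conclusions are the same either way.

```python
def quad_form(x1, x2):
    # (x - mu)' Sigma^{-1} (x - mu) for mu = (5, 10), Sigma = [[9, 16], [16, 64]];
    # exact inverse entries: 64/320 = .200, 9/320 = .028125, 16/320 = .050
    return (0.200 * (x1 - 5) ** 2
            + 0.028125 * (x2 - 10) ** 2
            - 0.100 * (x1 - 5) * (x2 - 10))

c2 = 5.9915                     # chi-square_2(.05): the contour is quad_form <= c2
d_inside = quad_form(10, 20)    # 2.8125 -> inside the 95% contour
d_outside = quad_form(16, 20)   # 16.0125 -> outside
```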
Points Inside and Outside
[Figure: the 95% contour in the (x1, x2) plane, centered at µ′ = (5, 10), with the point (10, 20) plotted inside the ellipse and (16, 20) outside.]
More Properties that we’ll Expand on
If X ∼ Np(µ, Σ), then:

◮ Linear combinations of components of X are (multivariate) normal.
◮ All sub-sets of the components of X are (multivariate) normal.
◮ Zero covariance implies that the corresponding components of X are statistically independent.
◮ The conditional distributions of the components of X are (multivariate) normal.
1: Linear Combinations

If X ∼ Np(µ, Σ), then any linear combination

a′X = a1X1 + a2X2 + ⋯ + apXp

is distributed as

a′X ∼ N1(a′µ, a′Σa)

Also, if a′X is normal N1(a′µ, a′Σa) for all possible a, then X must be Np(µ, Σ).

Example:

\mathbf{a}' = (3, 2), \qquad \mathbf{X} \sim \mathcal{N}_2\left(\begin{pmatrix} 5 \\ 10 \end{pmatrix}, \begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\right)

Y = a′X = 3X1 + 2X2

\mu_Y = (3, 2)\begin{pmatrix} 5 \\ 10 \end{pmatrix} = 35 \quad\text{and}\quad
\sigma^2_Y = (3, 2)\begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = 432

Y ∼ N(35, 432)
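The arithmetic for µY = a′µ = 35 and σ²Y = a′Σa = 432 can be checked directly (a plain Python sketch):

```python
a = (3.0, 2.0)
mu = (5.0, 10.0)
Sigma = [[16.0, 12.0], [12.0, 36.0]]

mean_Y = a[0] * mu[0] + a[1] * mu[1]                  # a' mu = 3*5 + 2*10
Sa = [Sigma[0][0] * a[0] + Sigma[0][1] * a[1],        # Sigma a
      Sigma[1][0] * a[0] + Sigma[1][1] * a[1]]
var_Y = a[0] * Sa[0] + a[1] * Sa[1]                   # a' Sigma a
```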
More Linear Combinations

If X ∼ Np(µ, Σ), then the q linear combinations

\mathbf{Y}_{q\times1} = \mathbf{A}_{q\times p}\mathbf{X} =
\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & & & \vdots \\ a_{q1} & a_{q2} & \dots & a_{qp} \end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}

are distributed as Nq(Aµ, AΣA′).

Also, if Y = AX + d, where d_{q×1} is a vector of constants, then

Y ∼ Nq(Aµ + d, AΣA′).
Numerical Example with Multiple Combinations

\mathbf{X} \sim \mathcal{N}_2\left(\begin{pmatrix} 5 \\ 10 \end{pmatrix}, \begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\right)

Y1 = X1 + X2 and Y2 = X1 − X2, so

\mathbf{A}_{2\times2} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

\boldsymbol\mu_Y = \mathbf{A}\boldsymbol\mu = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 5 \\ 10 \end{pmatrix} = \begin{pmatrix} 15 \\ -5 \end{pmatrix}

\Sigma_Y = \mathbf{A}\Sigma\mathbf{A}' = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 16 & 12 \\ 12 & 36 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 76 & -20 \\ -20 & 28 \end{pmatrix}

So

\mathbf{Y} \sim \mathcal{N}_2\left(\begin{pmatrix} 15 \\ -5 \end{pmatrix}, \begin{pmatrix} 76 & -20 \\ -20 & 28 \end{pmatrix}\right)
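The matrix products Aµ and AΣA′ above can be verified with a tiny amount of code. A minimal sketch (plain Python; `matmul` and `transpose` are illustrative helpers rather than library calls):

```python
A = [[1, 1], [1, -1]]
mu = [5, 10]
Sigma = [[16, 12], [12, 36]]

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

mu_Y = [sum(A[i][j] * mu[j] for j in range(2)) for i in range(2)]   # A mu
Sigma_Y = matmul(matmul(A, Sigma), transpose(A))                    # A Sigma A'
```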
Multiple Regression as an Example

This example uses what we know about linear combinations and about the distribution of linear combinations.

Linear Regression Model
◮ Y = response variable.
◮ Z1, Z2,..., Zr are predictor/explanatory variables, which are considered to be fixed. ◮ The model is
Y = βo + β1Z1 + β2Z2 + . . . + βr Zr + ǫ
◮ The error of prediction ǫ is viewed as a random variable.
Multiple Regression as an Example Suppose we have n observations on Y and have values of Zi for all i = 1,..., n; that is,
Y1 = βo + β1Z11 + β2Z12 + . . . + βr Z1r + ǫ1
Y2 = βo + β1Z21 + β2Z22 + . . . + βr Z2r + ǫ2 . . . .
Yn = βo + β1Zn1 + β2Zn2 + . . . + βr Znr + ǫn
where E(ǫj) = 0, var(ǫj) = σ² (a constant), and cov(ǫj, ǫk) = 0 for j ≠ k. In terms of matrices,

\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} =
\begin{pmatrix} 1 & Z_{11} & Z_{12} & \dots & Z_{1r} \\ 1 & Z_{21} & Z_{22} & \dots & Z_{2r} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & Z_{n1} & Z_{n2} & \dots & Z_{nr} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_r \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}

that is, Y = Zβ + ǫ, where E(ǫ) = 0 and cov(ǫ) = σ²I.
Distribution of Y
Y = Zβ + ǫ, where E(ǫ) = 0 and cov(ǫ) = σ²I. Here Zβ is a vector of constants and ǫ is random, so Y is a linear combination of a multivariate normally distributed variable, ǫ.
◮ Mean of Y:
µY = E(Y)= E(Zβ + ǫ)= Zβ + E(ǫ)= Zβ
◮ Covariance matrix of Y: ΣY = σ²I (the same as for ǫ).
◮ The distribution of Y is multivariate normal because ǫ is multivariate normal:

Y ∼ N(Zβ, σ²I)
Least Squares Estimation

Y = Zβ + ǫ, where E(ǫ) = 0 and cov(ǫ) = σ²I.

β and σ² are unknown parameters that need to be estimated from data.
Let y1, y2,..., yn be a random sample with values z1, z2,..., zr on the explanatory variables. The least squares estimate of β is the vector b that minimizes
\sum_{j=1}^{n}(y_j - \mathbf{z}_j'\mathbf{b})^2
= \sum_{j=1}^{n}(y_j - b_0 - b_1z_{j1} - b_2z_{j2} - \dots - b_rz_{jr})^2
= (\mathbf{y} - \mathbf{Z}\mathbf{b})'(\mathbf{y} - \mathbf{Z}\mathbf{b})
= \hat{\boldsymbol\epsilon}'\hat{\boldsymbol\epsilon}

where zj′ is the j-th row of Z and b = (b0, b1, b2, ..., br)′. If Z has full rank (i.e., the rank of Z is r + 1 ≤ n), then the least squares estimate of β is

\hat{\boldsymbol\beta} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}
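A minimal numerical sketch of β̂ = (Z′Z)⁻¹Z′y (plain Python; one predictor plus an intercept, so Z′Z is 2 × 2 and can be inverted in closed form; the data are hypothetical and noise-free, so the estimate recovers the generating coefficients exactly):

```python
# hypothetical noise-free data generated from y = 1 + 2 z, so the least
# squares estimate b-hat = (Z'Z)^{-1} Z'y should recover (beta0, beta1) = (1, 2)
z = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * zi for zi in z]
n = len(z)

# with Z = [1  z]:  Z'Z = [[n, sum z], [sum z, sum z^2]],  Z'y = [sum y, sum z*y]
sz = sum(z)
szz = sum(zi * zi for zi in z)
sy = sum(y)
szy = sum(zi * yi for zi, yi in zip(z, y))

# closed-form 2x2 inverse applied to Z'y
det = n * szz - sz * sz
b0 = (szz * sy - sz * szy) / det
b1 = (n * szy - sz * sy) / det
```

With noise added, (b0, b1) would scatter around (β0, β1) with covariance σ²(Z′Z)⁻¹, which is exactly the sampling distribution derived next.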
What's the distribution of β̂?

β̂ = (Z′Z)⁻¹Z′y = Ay

We showed that Y ∼ Nn(Zβ, σ²I).

◮ Mean of β̂:

\mu_{\hat\beta} = E(\hat{\boldsymbol\beta}) = E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y}) = \mathbf{A}\mathbf{Z}\boldsymbol\beta = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Z}\boldsymbol\beta = \boldsymbol\beta
◮ Covariance matrix of β̂:

\Sigma_{\hat\beta} = \mathbf{A}\Sigma_Y\mathbf{A}' = ((\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}')(\sigma^2\mathbf{I})(\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}) = \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1} = \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}

◮ The distribution of β̂: β̂ ∼ N(β, σ²(Z′Z)⁻¹).
The distribution of Ŷ

The "fitted values" or predicted values are

ŷ = Zβ̂ = Hy

where H = Z(Z′Z)⁻¹Z′. The matrix H is the "hat" matrix.
◮ We just showed that β̂ ∼ N(β, σ²(Z′Z)⁻¹), and so ŷ = Zβ̂ is a linear combination of a vector that is multivariate normal.
◮ Mean of Ŷ: E(Ŷ) = E(Zβ̂) = Z E(β̂) = Zβ
◮ Covariance matrix of Ŷ:

\Sigma_{\hat{Y}} = \mathbf{Z}\Sigma_{\hat\beta}\mathbf{Z}' = \mathbf{Z}(\sigma^2(\mathbf{Z}'\mathbf{Z})^{-1})\mathbf{Z}' = \sigma^2\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}' = \sigma^2\mathbf{H}

◮ Distribution of Ŷ: Ŷ ∼ N(Zβ, σ²H), a singular multivariate normal, since H has rank r + 1 < n.
The distribution of ǫ̂

The estimated residuals are

ǫ̂ = y − ŷ = (I − H)y

and they contain the information necessary to estimate σ². The least squares estimate of σ² is

s^2 = \frac{\hat{\boldsymbol\epsilon}'\hat{\boldsymbol\epsilon}}{n - (r+1)}

The estimates β̂ and ǫ̂ are uncorrelated. The multivariate normality assumption ǫ ∼ Nn(0, σ²I), together with what we know about linear combinations of random variables, allowed us to derive the distributions of these various random variables.
The distribution of ǫ̂ (continued)
Last few comments on this example:
◮ The least squares estimates of β and ǫ are also the maximum likelihood estimates.
◮ The maximum likelihood estimate of σ² is σ̂² = ǫ̂′ǫ̂/n.
◮ β̂ and ǫ̂ are statistically independent.
2: Sub-sets of Variables

If X ∼ Np(µ, Σ), then all sub-sets of X are (multivariate) normally distributed. For example, let's partition X into two sub-sets:
\mathbf{X}_{p\times1} = \begin{pmatrix} X_1 \\ \vdots \\ X_q \\ X_{q+1} \\ \vdots \\ X_p \end{pmatrix}
= \begin{pmatrix} \mathbf{X}_{1(q\times1)} \\ \mathbf{X}_{2((p-q)\times1)} \end{pmatrix}
\quad\text{and}\quad
\boldsymbol\mu = \begin{pmatrix} \boldsymbol\mu_{1(q\times1)} \\ \boldsymbol\mu_{2((p-q)\times1)} \end{pmatrix}

\Sigma_{p\times p} = \begin{pmatrix} \Sigma_{11(q\times q)} & \Sigma_{12(q\times(p-q))} \\ \Sigma_{21((p-q)\times q)} & \Sigma_{22((p-q)\times(p-q))} \end{pmatrix}
= \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
Sub-sets of Variables (continued)

Then for

\mathbf{X} = \begin{pmatrix} \mathbf{X}_{1(q\times1)} \\ \mathbf{X}_{2((p-q)\times1)} \end{pmatrix}

the distributions of the sub-sets are

X1 ∼ Nq(µ1, Σ11) and X2 ∼ N_{p−q}(µ2, Σ22)

This result means that:
◮ Each of the Xi's is univariate normal (next page).
◮ All possible sub-sets are multivariate normal.
◮ All marginal distributions are (multivariate) normal.
Little Example on Sub-sets

Suppose that X = (X1, X2, X3)′ ∼ N3(µ, Σ). Due to the result on sub-sets of multivariate normals,

X1 ∼ N(µ1, σ11), X2 ∼ N(µ2, σ22), X3 ∼ N(µ3, σ33)

Also,

\begin{pmatrix} X_2 \\ X_3 \end{pmatrix} \sim \mathcal{N}_2\left(\begin{pmatrix} \mu_2 \\ \mu_3 \end{pmatrix}, \begin{pmatrix} \sigma_{22} & \sigma_{23} \\ \sigma_{32} & \sigma_{33} \end{pmatrix}\right)
3: Zero Covariance & Statistical Independence

There are three parts to this one:

◮ If X1 (q1 × 1) and X2 (q2 × 1) are statistically independent, then cov(X1, X2) = Σ12 = 0.
◮ If

\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathcal{N}_{q_1+q_2}\left(\begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)

then X1 and X2 are statistically independent if and only if Σ12 = Σ21′ = 0.
◮ If X1 and X2 are statistically independent and distributed as N_{q1}(µ1, Σ11) and N_{q2}(µ2, Σ22), respectively, then

\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim \mathcal{N}_{q_1+q_2}\left(\begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \mathbf{0} \\ \mathbf{0} & \Sigma_{22} \end{pmatrix}\right)
Example
\mathbf{Y}_{4\times1} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{pmatrix}
\quad\text{and}\quad
\Sigma_Y = \begin{pmatrix} 2 & 1 & 0 & .5 \\ 1 & 3 & 0 & .5 \\ 0 & 0 & 4 & 0 \\ .5 & .5 & 0 & 1 \end{pmatrix}

and Y ∼ N4(µ, Σ). Let's take X1 = (Y1, Y2, Y4)′ and X2 = (Y3). Rearranging the rows and columns,

\begin{pmatrix} Y_1 \\ Y_2 \\ Y_4 \\ Y_3 \end{pmatrix} \sim \mathcal{N}_4\left(\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_4 \\ \mu_3 \end{pmatrix}, \begin{pmatrix} 2 & 1 & .5 & 0 \\ 1 & 3 & .5 & 0 \\ .5 & .5 & 1 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}\right)

So the set X1 is statistically independent of X2.
4: Conditional Distributions
Let X′ = (X1(q1×1)′, X2(q2×1)′) be distributed as N_{q1+q2}(µ, Σ) with

\boldsymbol\mu = \begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}
\quad\text{and}\quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

and |Σ| > 0 (i.e., Σ positive definite). Then the conditional distribution of X1 given X2 = x2 is (multivariate) normal with mean and covariance matrix

\boldsymbol\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol\mu_2)
\quad\text{and}\quad
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}

Let's look more closely at this for the simple case q1 = q2 = 1.
Conditional Distribution for q1 = q2 = 1

Bivariate normal distribution:

\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}_2\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\right)

f(x_1 \mid x_2) \text{ is } \mathcal{N}_1\left(\mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2),\; \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}\right)

Notes:
◮ σ12 = ρ12√σ11√σ22
◮ Σ12Σ22⁻¹ = σ12/σ22 = ρ12(√σ11/√σ22)
◮ Σ11 − Σ12Σ22⁻¹Σ21 = σ11 − σ12²/σ22 = σ11(1 − ρ12²)

Alternative way to write f(x1 | x2):

f(x_1 \mid x_2) \text{ is } \mathcal{N}_1\left(\mu_1 + \rho_{12}\frac{\sqrt{\sigma_{11}}}{\sqrt{\sigma_{22}}}(x_2 - \mu_2),\; \sigma_{11}(1 - \rho_{12}^2)\right)
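For the running example (µ = (5, 10)′, σ11 = 9, σ12 = 16, σ22 = 64), these formulas give a concrete regression of X1 on X2. A minimal sketch (plain Python; `cond_mean` is an illustrative helper): the conditional mean is 5 + (16/64)(x2 − 10) and the conditional variance is 9 − 16²/64 = 5, in agreement with σ11(1 − ρ12²) since ρ12² = 256/576 = 4/9.

```python
mu1, mu2 = 5.0, 10.0
s11, s12, s22 = 9.0, 16.0, 64.0

def cond_mean(x2):
    # E(X1 | X2 = x2) = mu1 + (s12 / s22) * (x2 - mu2)
    return mu1 + (s12 / s22) * (x2 - mu2)

# var(X1 | X2 = x2) = s11 - s12^2 / s22, free of x2
cond_var = s11 - s12 ** 2 / s22
rho2 = s12 ** 2 / (s11 * s22)        # rho_12^2 = 4/9
alt_var = s11 * (1.0 - rho2)         # same value, written via the correlation
```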
Multiple Regression as a Conditional Distribution

Consider the case where q1 = 1 and q2 > 1.

◮ All conditional distributions are normal.
◮ The conditional covariance matrix Σ11 − Σ12Σ22⁻¹Σ21 does not depend on the values of the conditioning variables.
◮ The conditional means have the following form:
Let

\Sigma_{12}\Sigma_{22}^{-1} = \boldsymbol\beta_{q_1\times q_2} =
\begin{pmatrix}
\beta_{1,q_1+1} & \beta_{1,q_1+2} & \cdots & \beta_{1,q_1+q_2} \\
\beta_{2,q_1+1} & \beta_{2,q_1+2} & \cdots & \beta_{2,q_1+q_2} \\
\vdots & & \ddots & \vdots \\
\beta_{q_1,q_1+1} & \beta_{q_1,q_1+2} & \cdots & \beta_{q_1,q_1+q_2}
\end{pmatrix}

Then the conditional means are

\begin{pmatrix}
\mu_1 + \sum_{i=q_1+1}^{q_1+q_2} \beta_{1i}(x_i - \mu_i) \\
\mu_2 + \sum_{i=q_1+1}^{q_1+q_2} \beta_{2i}(x_i - \mu_i) \\
\vdots \\
\mu_{q_1} + \sum_{i=q_1+1}^{q_1+q_2} \beta_{q_1 i}(x_i - \mu_i)
\end{pmatrix}
Estimation of µ and Σ & Sampling Distributions of Estimators

Suppose we have a p-dimensional normal distribution with mean µ and covariance matrix Σ.
Take n observations x1, x2, ..., xn (each a (p × 1) vector):

Xj ∼ Np(µ, Σ), j = 1, 2, ..., n, and independent.

For p = 1, we know that the MLEs are

\hat\mu = \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad \bar{x} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)

and

n\hat\sigma^2 = \sum_{j=1}^{n}(x_j - \bar{x})^2, \qquad \frac{1}{\sigma^2}\sum_{j=1}^{n}(x_j - \bar{x})^2 \sim \chi^2_{(n-1)}

or

\hat\sigma^2 = \frac{1}{n}\sum_{j=1}^{n}(x_j - \bar{x})^2 \sim \frac{\sigma^2}{n}\chi^2_{(n-1)}
Estimation of µ and Σ: Multivariate Case

The maximum likelihood estimator of µ is

\hat{\boldsymbol\mu} = \bar{\mathbf{X}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{X}_j

and the ML estimator of Σ is

\hat\Sigma = \frac{n-1}{n}\mathbf{S} = \mathbf{S}_n = \frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j - \hat{\boldsymbol\mu})(\mathbf{X}_j - \hat{\boldsymbol\mu})'

Sampling distribution of µ̂: the estimator is a linear combination of i.i.d. normal random vectors, each Np(µ, Σ):

\hat{\boldsymbol\mu} = \bar{\mathbf{X}} = \frac{1}{n}\mathbf{X}_1 + \frac{1}{n}\mathbf{X}_2 + \dots + \frac{1}{n}\mathbf{X}_n

So µ̂ also has a normal distribution: X̄ ∼ Np(µ, (1/n)Σ).
Sampling Distribution of Σ̂

\hat\Sigma = \frac{n-1}{n}\mathbf{S}

The matrix

(n-1)\mathbf{S} = \sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'

is distributed as a Wishart random matrix with (n − 1) degrees of freedom.

Wishart distribution:
◮ A multivariate analogue to the chi-square distribution. ◮ It’s defined as
W_m(\cdot \mid \Sigma) = \text{Wishart distribution with } m \text{ degrees of freedom}
= \text{the distribution of } \sum_{j=1}^{m}\mathbf{Z}_j\mathbf{Z}_j'

where Zj ∼ Np(0, Σ) and independent.

Note: X̄ and S are independent.
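The defining sum can be simulated. A minimal sketch (plain Python; Σ = I and m = 500 are arbitrary illustration choices): summing m outer products Zj Zj′ of independent N(0, I) draws gives one Wishart draw W, and since E(W) = mΣ, the scaled matrix W/m should be close to the identity.

```python
import random

random.seed(1)
m, p = 500, 2  # degrees of freedom and dimension; Sigma = I for simplicity

# accumulate W = sum_{j=1}^{m} z_j z_j' with z_j ~ N_p(0, I)
W = [[0.0] * p for _ in range(p)]
for _ in range(m):
    z = [random.gauss(0.0, 1.0) for _ in range(p)]
    for i in range(p):
        for j in range(p):
            W[i][j] += z[i] * z[j]

W_over_m = [[W[i][j] / m for j in range(p)] for i in range(p)]  # approx Sigma = I
```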
Law of Large Numbers
Data are not always (multivariate) normal. The Law of Large Numbers (for multivariate data):

Let X1, X2, ..., Xn be independent observations from a population with mean E(X) = µ. Then X̄ = (1/n)Σ_{j=1}^n Xj converges in probability to µ as n gets large; that is, X̄ → µ for large samples. Similarly, S (or Sn) approaches Σ for large samples.
These are true regardless of the true distribution of the Xj ’s.
Central Limit Theorem

Let X1, X2, ..., Xn be independent observations from a population with mean E(X) = µ and finite (non-singular, full-rank) covariance matrix Σ. Then

\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol\mu) \text{ has an approximate } \mathcal{N}(\mathbf{0}, \Sigma) \text{ distribution}

if n ≫ p (i.e., n is "much larger than" p). So, for "large" n,

\bar{\mathbf{X}} = \text{sample mean vector} \approx \mathcal{N}\left(\boldsymbol\mu, \frac{1}{n}\Sigma\right),

regardless of the underlying distribution of the Xj's.

What if Σ is unknown? If n is large "enough", S will be close to Σ, so

\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol\mu) \approx \mathcal{N}_p(\mathbf{0}, \mathbf{S})
\quad\text{or}\quad
\bar{\mathbf{X}} \approx \mathcal{N}_p\left(\boldsymbol\mu, \frac{1}{n}\mathbf{S}\right).

Since n(X̄ − µ)′Σ⁻¹(X̄ − µ) ∼ χ²_p,

n(\bar{\mathbf{X}} - \boldsymbol\mu)'\mathbf{S}^{-1}(\bar{\mathbf{X}} - \boldsymbol\mu) \approx \chi^2_p
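The CLT can be watched in action with a small simulation. A minimal sketch (plain Python; the Uniform(0,1) population, n = 50, and 2000 replications are arbitrary choices): the population is far from normal, yet the first coordinate of X̄ has mean near µ1 = 0.5 and variance near σ11/n = (1/12)/50.

```python
import random

random.seed(0)
n, reps = 50, 2000  # sample size per mean, number of replicated sample means

# non-normal population: Uniform(0, 1), with mean 0.5 and variance 1/12;
# track the first coordinate of the sample mean vector
xbar1 = []
for _ in range(reps):
    s = 0.0
    for _ in range(n):
        s += random.random()
    xbar1.append(s / n)

m1 = sum(xbar1) / reps
v1 = sum((a - m1) ** 2 for a in xbar1) / reps
# CLT: m1 should be near 0.5 and v1 near (1/12)/50, despite the non-normal population
```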
Few more comments
◮ Using S instead of Σ does not seriously affect the approximation.
◮ n must be large relative to p; that is, (n − p) must be large.
◮ The probability contours for X̄ are tighter than those for X, since X̄ has covariance (1/n)Σ rather than Σ. See the next slide for an example.
Comparison of Probability Contours Returning to our example and pretending we have n = 20. Below are contours for 99%, 95%, 90%, 75%, 50% and 20%:
[Figure: two panels of concentric probability contours, one for Xj (left, wider) and one for X̄ (right, much tighter), both centered at µ.]
Why Such a Big Difference with Only n = 20?
For Xj:

\Sigma = \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} \;\rightarrow\; \lambda_1 = 68.316 \text{ and } \lambda_2 = 4.684

For X̄ with n = 20:

\frac{1}{20}\begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} = \begin{pmatrix} 0.45 & 0.80 \\ 0.80 & 3.20 \end{pmatrix} \;\rightarrow\; \lambda_1 = 3.42 \text{ and } \lambda_2 = 0.23
Note that 68.316/20 = 3.42 and 4.684/20 = 0.23.
Other Multivariate Distributions: Skew-Normal
Marshall-Olkin bivariate exponential
Contours for 4 different ones