3. The Multivariate Normal Distribution


3.1 Introduction

• A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis.

• While real data are never exactly multivariate normal, the normal density is often a useful approximation to the "true" population distribution because of a central limit effect.

• One advantage of the multivariate normal distribution is that it is mathematically tractable, and "nice" results can be obtained.

To summarize, many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both a population model for certain natural phenomena and an approximate sampling distribution for many statistics.

3.2 The Multivariate Normal Density and Its Properties

• Recall that the univariate normal distribution, with mean μ and variance σ², has the probability density function

  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty.

• The term in the exponent,

  \left( \frac{x-\mu}{\sigma} \right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu),

  can be generalized for a p × 1 vector x of observations on several variables as

  (x - \mu)' \Sigma^{-1} (x - \mu),

  where the p × 1 vector μ represents the expected value of the random vector X, and the p × p matrix Σ is the variance-covariance matrix of X.

• A p-dimensional normal density for the random vector X = [X₁, X₂, ..., X_p]' has the form

  f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-(x-\mu)' \Sigma^{-1} (x-\mu)/2},

  where -∞ < x_i < ∞ for i = 1, 2, ..., p. We denote this p-dimensional normal density by N_p(μ, Σ).

Example 3.1 (Bivariate normal density) Let us evaluate the p = 2 variate normal density in terms of the individual parameters μ₁ = E(X₁), μ₂ = E(X₂), σ₁₁ = Var(X₁), σ₂₂ = Var(X₂), and ρ₁₂ = σ₁₂/(√σ₁₁ √σ₂₂) = Corr(X₁, X₂).

Result 3.1 If Σ is positive definite, so that Σ⁻¹ exists, then

  \Sigma e = \lambda e \quad \text{implies} \quad \Sigma^{-1} e = \frac{1}{\lambda} e,

so (λ, e) is an eigenvalue–eigenvector pair for Σ corresponding to the pair (1/λ, e) for Σ⁻¹. Also, Σ⁻¹ is positive definite.

Constant probability density contour = {all x such that (x − μ)'Σ⁻¹(x − μ) = c²} = surface of an ellipsoid centered at μ.

Contours of constant density for the p-dimensional normal distribution are ellipsoids defined by the x such that (x − μ)'Σ⁻¹(x − μ) = c². These ellipsoids are centered at μ and have axes ±c√λ_i e_i, where Σe_i = λ_i e_i for i = 1, 2, ..., p.

Example 3.2 (Contours of the bivariate normal density) Obtain the axes of constant probability density contours for a bivariate normal distribution when σ₁₁ = σ₂₂.

The solid ellipsoid of x values satisfying

  (x - \mu)' \Sigma^{-1} (x - \mu) \le \chi^2_p(\alpha)

has probability 1 − α, where χ²_p(α) is the upper (100α)th percentile of a chi-square distribution with p degrees of freedom.
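To make these formulas concrete, here is a short numerical sketch (the bivariate μ and Σ below are illustrative values, not taken from the notes). It evaluates the N_p(μ, Σ) density directly from the formula above, computes the contour half-lengths c√λ_i along the eigenvectors of Σ, and checks the 1 − α coverage of the chi-square ellipsoid by simulation.

```python
import numpy as np
from scipy import stats

# Illustrative bivariate parameters (not taken from the notes)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def mvn_density(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at x, straight from the formula above."""
    p = len(mu)
    dev = x - mu
    quad = dev @ np.linalg.inv(Sigma) @ dev            # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-quad / 2) / norm_const

x = np.array([1.5, 1.5])
print(mvn_density(x, mu, Sigma))
print(stats.multivariate_normal(mu, Sigma).pdf(x))     # agrees with the direct formula

# Axes of the constant-density contour (x - mu)' Sigma^{-1} (x - mu) = c^2:
# half-lengths c * sqrt(lambda_i) along the eigenvectors e_i of Sigma.
p = len(mu)
alpha = 0.05
c2 = stats.chi2.ppf(1 - alpha, df=p)                   # chi^2_p(alpha), the upper (100*alpha)th percentile
lam, vecs = np.linalg.eigh(Sigma)
for lam_i, e_i in zip(lam, vecs.T):
    print("half-length", np.sqrt(c2 * lam_i), "along axis", e_i)

# Simulation check: the chi-square ellipsoid should contain about 1 - alpha of the mass.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
d2 = np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(Sigma), X - mu)
print("coverage ~", np.mean(d2 <= c2))                 # close to 0.95
```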
Additional Properties of the Multivariate Normal Distribution

The following are true for a random vector X having a multivariate normal distribution:

1. Linear combinations of the components of X are normally distributed.
2. All subsets of the components of X have a (multivariate) normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are (multivariate) normal.

Result 3.2 If X is distributed as N_p(μ, Σ), then any linear combination of its variables, a'X = a₁X₁ + a₂X₂ + ··· + a_pX_p, is distributed as N(a'μ, a'Σa). Also, if a'X is distributed as N(a'μ, a'Σa) for every a, then X must be N_p(μ, Σ).

Example 3.3 (The distribution of a linear combination of the components of a normal random vector) Consider the linear combination a'X of a multivariate normal random vector determined by the choice a' = [1, 0, ..., 0].

Result 3.3 If X is distributed as N_p(μ, Σ), the q linear combinations

  A_{(q \times p)} X_{(p \times 1)} =
  \begin{bmatrix}
    a_{11} X_1 + \cdots + a_{1p} X_p \\
    a_{21} X_1 + \cdots + a_{2p} X_p \\
    \vdots \\
    a_{q1} X_1 + \cdots + a_{qp} X_p
  \end{bmatrix}

are distributed as N_q(Aμ, AΣA'). Also, X_{p×1} + d_{p×1}, where d is a vector of constants, is distributed as N_p(μ + d, Σ).

Example 3.4 (The distribution of two linear combinations of the components of a normal random vector) For X distributed as N₃(μ, Σ), find the distribution of

  \begin{bmatrix} X_1 - X_2 \\ X_2 - X_3 \end{bmatrix}
  =
  \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}
  \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
  = A X.

Result 3.4 All subsets of X are normally distributed. If we respectively partition X, its mean vector μ, and its covariance matrix Σ as

  X_{(p \times 1)} =
  \begin{bmatrix} X_1 \; (q \times 1) \\ X_2 \; ((p-q) \times 1) \end{bmatrix},
  \qquad
  \mu_{(p \times 1)} =
  \begin{bmatrix} \mu_1 \; (q \times 1) \\ \mu_2 \; ((p-q) \times 1) \end{bmatrix},

and

  \Sigma_{(p \times p)} =
  \begin{bmatrix}
    \Sigma_{11} \; (q \times q) & \Sigma_{12} \; (q \times (p-q)) \\
    \Sigma_{21} \; ((p-q) \times q) & \Sigma_{22} \; ((p-q) \times (p-q))
  \end{bmatrix},

then X₁ is distributed as N_q(μ₁, Σ₁₁).

Example 3.5 (The distribution of a subset of a normal random vector) If X is distributed as N₅(μ, Σ), find the distribution of [X₂, X₄]'.

Result 3.5
(a) If X₁ and X₂ are independent, then Cov(X₁, X₂) = 0, a q₁ × q₂ matrix of zeros, where X₁ is a q₁ × 1 random vector and X₂ is a q₂ × 1 random vector.
(b) If the stacked vector [X₁; X₂] is N_{q₁+q₂}([μ₁; μ₂], [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂]), then X₁ and X₂ are independent if and only if Σ₁₂ = Σ₂₁ = 0.
(c) If X₁ and X₂ are independent and are distributed as N_{q₁}(μ₁, Σ₁₁) and N_{q₂}(μ₂, Σ₂₂), respectively, then [X₁; X₂] has the multivariate normal distribution N_{q₁+q₂}([μ₁; μ₂], [Σ₁₁ 0; 0 Σ₂₂]).

Example 3.6 (The equivalence of zero covariance and independence for normal variables) Let X_{3×1} be N₃(μ, Σ) with

  \Sigma = \begin{bmatrix} 4 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}.

Are X₁ and X₂ independent? What about (X₁, X₂) and X₃?

Result 3.6 Let X = [X₁; X₂] be distributed as N_p(μ, Σ) with μ = [μ₁; μ₂], Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂], and |Σ₂₂| > 0. Then the conditional distribution of X₁, given that X₂ = x₂, is normal and has

  Mean = μ₁ + Σ₁₂ Σ₂₂⁻¹ (x₂ − μ₂)

and

  Covariance = Σ₁₁ − Σ₁₂ Σ₂₂⁻¹ Σ₂₁.

Note that the covariance does not depend on the value x₂ of the conditioning variable.

Example 3.7 (The conditional density of a bivariate normal distribution) Obtain the conditional density of X₁, given that X₂ = x₂, for any bivariate distribution.

Result 3.7 Let X be distributed as N_p(μ, Σ) with |Σ| > 0. Then
(a) (X − μ)'Σ⁻¹(X − μ) is distributed as χ²_p, where χ²_p denotes the chi-square distribution with p degrees of freedom.
(b) The N_p(μ, Σ) distribution assigns probability 1 − α to the solid ellipsoid {x : (x − μ)'Σ⁻¹(x − μ) ≤ χ²_p(α)}, where χ²_p(α) denotes the upper (100α)th percentile of the χ²_p distribution.

Result 3.8 Let X₁, X₂, ..., X_n be mutually independent with X_j distributed as N_p(μ_j, Σ). (Note that each X_j has the same covariance matrix Σ.) Then

  V_1 = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n

is distributed as N_p\!\left( \sum_{j=1}^{n} c_j \mu_j, \; \Big(\sum_{j=1}^{n} c_j^2\Big) \Sigma \right). Moreover, V₁ and V₂ = b₁X₁ + b₂X₂ + ··· + b_nX_n are jointly multivariate normal with covariance matrix

  \begin{bmatrix}
    \Big(\sum_{j=1}^{n} c_j^2\Big) \Sigma & (b'c)\,\Sigma \\
    (b'c)\,\Sigma & \Big(\sum_{j=1}^{n} b_j^2\Big) \Sigma
  \end{bmatrix}.

Consequently, V₁ and V₂ are independent if b'c = \sum_{j=1}^{n} c_j b_j = 0.
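As a small illustration of Results 3.3 and 3.6, the sketch below reuses the Σ of Example 3.6 together with a hypothetical mean vector (chosen only for illustration). It computes the exact mean and covariance of AX for the A of Example 3.4, and the conditional mean and covariance of X₁ given (X₂, X₃) = x₂.

```python
import numpy as np

# Sigma from Example 3.6; the mean vector is hypothetical (for illustration only)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 0.0, 2.0]])

# Result 3.3: AX is N_q(A mu, A Sigma A'), with A as in Example 3.4
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
print("mean of AX :", A @ mu)
print("cov of AX  :\n", A @ Sigma @ A.T)

# Result 3.6: conditional distribution of the first q components given the rest
q = 1
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
x2 = np.array([0.0, 3.0])                              # an arbitrary conditioning value
cond_mean = mu[:q] + S12 @ np.linalg.solve(S22, x2 - mu[q:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)       # free of x2, as noted above
print("conditional mean       :", cond_mean)
print("conditional covariance :", cond_cov)
```

Because σ₁₃ = 0 in this Σ, the value of X₃ drops out of the conditional mean of X₁, consistent with Example 3.6's point that zero covariance corresponds to independence for jointly normal components.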
Example 3.8 (Linear combinations of random vectors) Let X₁, X₂, X₃, and X₄ be independent and identically distributed 3 × 1 random vectors with

  \mu = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}
  \qquad \text{and} \qquad
  \Sigma = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}.

(a) Find the mean and variance of the linear combination a'X₁ of the three components of X₁, where a' = [a₁ a₂ a₃].

(b) Consider the two linear combinations of random vectors

  \tfrac{1}{2} X_1 + \tfrac{1}{2} X_2 + \tfrac{1}{2} X_3 + \tfrac{1}{2} X_4

and

  X_1 + X_2 + X_3 - 3 X_4.

Find the mean vector and covariance matrix for each linear combination of vectors, and also the covariance between them.

3.3 Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation

The Multivariate Normal Likelihood

• The joint density function of all n observed p × 1 random vectors X₁, X₂, ..., X_n is

  \prod_{j=1}^{n} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-(x_j - \mu)' \Sigma^{-1} (x_j - \mu)/2}
  = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}} \, e^{-\sum_{j=1}^{n} (x_j - \mu)' \Sigma^{-1} (x_j - \mu)/2}
  = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}}
    \exp\!\left\{ -\,\mathrm{tr}\!\left[ \Sigma^{-1} \left( \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)' \right) \right] \Big/ 2 \right\}.

• Likelihood: When the numerical values of the observations become available, they may be substituted for the x_j in the equation above. The resulting expression, now considered as a function of μ and Σ for the fixed set of observations x₁, x₂, ..., x_n, is called the likelihood.

• Maximum likelihood estimation: One meaning of "best" is to select the parameter values that maximize the joint density evaluated at the observations.
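The following sketch (with simulated, illustrative data) evaluates the log-likelihood via the trace identity above and cross-checks it against a direct sum of log densities. The closed-form maximizers quoted at the end, the sample mean and the sample covariance scaled by 1/n, are the standard results; their derivation is not part of this excerpt.

```python
import numpy as np
from scipy import stats

def log_likelihood(mu, Sigma, X):
    """Multivariate normal log-likelihood, written via the trace identity above.
    X is an (n, p) array holding the observations x_1, ..., x_n as rows."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_sum = (X - xbar).T @ (X - xbar)                  # sum_j (x_j - xbar)(x_j - xbar)'
    A = S_sum + n * np.outer(xbar - mu, xbar - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (-0.5 * n * p * np.log(2 * np.pi)
            - 0.5 * n * logdet
            - 0.5 * np.trace(np.linalg.solve(Sigma, A)))

# Simulated data under illustrative parameters (not from the notes)
rng = np.random.default_rng(1)
mu_true = np.array([0.0, 2.0])
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)

# Cross-check: the trace form equals the sum of the individual log densities
print(log_likelihood(mu_true, Sigma_true, X))
print(stats.multivariate_normal(mu_true, Sigma_true).logpdf(X).sum())

# Standard maximizers (quoted here, not derived in this excerpt): the sample
# mean and the sample covariance scaled by 1/n.
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)
print(log_likelihood(mu_hat, Sigma_hat, X) >= log_likelihood(mu_true, Sigma_true, X))  # True
```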