

CHAPTER 4

The Multivariate Normal Distribution

4.1 MULTIVARIATE NORMAL DENSITY FUNCTION

Many univariate tests and confidence intervals are based on the univariate normal distribution. Similarly, the vast majority of multivariate procedures have as their underpinning the multivariate normal distribution.

The following are some of the useful features of the multivariate normal distribution: (1) only means, variances, and covariances need be estimated in order to completely describe the distribution; (2) bivariate plots show linear trends; (3) if the variables are uncorrelated, they are independent; (4) linear functions of multivariate normal variables are also normal; (5) as in the univariate case, the convenient form of the density function lends itself to derivation of many properties and test statistics; and (6) even when the data are not multivariate normal, the multivariate normal may serve as a useful approximation, especially in inferences involving sample mean vectors, which are approximately normal by the central limit theorem (see Section 4.3.2).

Since the multivariate normal density is an extension of the univariate normal density and shares many of its features, we review the univariate normal density function in Section 4.1.1. We then describe the multivariate normal density in Sections 4.1.2-4.1.4.

4.1.1 Univariate Normal Density

If a random variable y, with mean μ and variance σ², is normally distributed, its density is given by

f(y) = 1/(√(2π) σ) e^{−(y−μ)²/2σ²},  −∞ < y < ∞.  (4.1)

When y has the density (4.1), we say that y is distributed as N(μ, σ²), or simply y is N(μ, σ²). This function is represented by the familiar bell-shaped curve illustrated in Figure 4.1 for μ = 10 and σ = 2.5.

Figure 4.1 The normal density curve.

4.1.2 Multivariate Normal Density

If y has a multivariate normal distribution with mean vector μ and covariance matrix Σ, the density is given by

g(y) = 1/((√(2π))^p |Σ|^{1/2}) e^{−(y−μ)'Σ^{−1}(y−μ)/2},  (4.2)

where p is the number of variables. When y has the density (4.2), we say that y is distributed as N_p(μ, Σ), or simply y is N_p(μ, Σ).

The term (y−μ)²/σ² = (y−μ)(σ²)^{−1}(y−μ) in the exponent of the univariate normal density (4.1) measures the squared distance from y to μ in standard deviation units. Similarly, the term (y−μ)'Σ^{−1}(y−μ) in the exponent of the multivariate normal density (4.2) is the squared generalized distance from y to μ, or the Mahalanobis distance,

Δ² = (y−μ)'Σ^{−1}(y−μ).  (4.3)

The characteristics of this distance between y and μ were discussed in Section 3.12.

In the coefficient of the exponential function in (4.2), |Σ|^{1/2} appears as the analogue of √(σ²) in (4.1). In the next section, we discuss the effect of |Σ| on the density.
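To make the relationship between (4.1)-(4.3) concrete, here is a minimal computational sketch (not part of the text; the mean vector, covariance matrix, and point evaluated are hypothetical). It shows that the density (4.2) is driven by the squared generalized distance (4.3):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([10.0, 5.0])              # hypothetical mean vector
Sigma = np.array([[6.25, 2.0],
                  [2.0,  4.0]])         # hypothetical covariance matrix
y = np.array([12.0, 4.0])

# Squared generalized (Mahalanobis) distance, eq. (4.3)
diff = y - mu
delta2 = diff @ np.linalg.inv(Sigma) @ diff

# Density computed directly from eq. (4.2) ...
p = len(mu)
dens = np.exp(-delta2 / 2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# ... agrees with SciPy's implementation
assert np.isclose(dens, multivariate_normal(mean=mu, cov=Sigma).pdf(y))
print(delta2, dens)
```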


4.1.3 Generalized Population Variance

In Section 3.10, we referred to |S| as a generalized sample variance. Analogously, |Σ| is a generalized population variance. If σ² is small in the univariate case, the y values are concentrated near the mean. Similarly, a small value of |Σ| in the multivariate case indicates that the y's are concentrated close to μ in p-space or that there is multicollinearity among the variables. The term multicollinearity indicates that the variables are highly intercorrelated, in which case the effective dimensionality is less than p. (See Chapter 12 for a discussion of finding a reduced number of new dimensions that represent the data.) In the presence of multicollinearity, one or more eigenvalues of Σ will be near zero, and |Σ| will be small, since |Σ| is the product of the eigenvalues by (2.99).

Figure 4.2 shows, for the bivariate case, a comparison of a distribution with small |Σ| and a distribution with larger |Σ|. An alternative way to portray the concentration of points in the bivariate normal distribution is with contour plots. Figure 4.3 shows contour plots for the two distributions in Figure 4.2. Each ellipse contains a different proportion of observation vectors y. The contours in Figure 4.3 can be found by setting the density function equal to a constant and solving for y, as illustrated in Figure 4.4. The bivariate normal density surface sliced at a constant height traces an ellipse, which contains a given proportion of the observations.

Figure 4.2 Bivariate normal densities: (a) small |Σ|; (b) large |Σ|.

Figure 4.3 Contour plots for the distributions in Figure 4.2.

Figure 4.4 Constant density contour for the bivariate normal.

In both Figures 4.2 and 4.3, small |Σ| appears on the left and large |Σ| appears on the right. In Figure 4.3a, there is a larger correlation between y₁ and y₂. In Figure 4.3b, the variances are larger (in the natural directions). In general, for any number of variables p, a decrease in intercorrelations among the variables or an increase in the variances will lead to a larger |Σ|.

4.1.4 Diversity of Applications of the Multivariate Normal

Nearly all the inferential procedures we discuss in this book are based on the multivariate normal distribution. We acknowledge that a major motivation for the widespread use of the multivariate normal is its mathematical tractability. From the multivariate normal assumption, a host of useful procedures can be derived. Practical alternatives to the multivariate normal are fewer than in the univariate case. Because it is not as simple to order (or rank) multivariate observation vectors as it is univariate observations, not as many nonparametric procedures are available for multivariate data.

While real data may not often be exactly multivariate normal, the multivariate normal will frequently serve as a useful approximation to the true distribution. Other reasons for our focus on the multivariate normal are the availability of tests and graphical procedures for assessing normality (see Sections 4.4 and 4.5) and the widespread use of procedures based on the multivariate normal in software packages. Fortunately, many of the procedures based on multivariate normality are robust to departures from normality.
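The effect of intercorrelation on |Σ| described above is easy to verify numerically. A small sketch (the two bivariate covariance matrices are hypothetical); it also checks that |Σ| equals the product of the eigenvalues:

```python
import numpy as np

# Equal variances, different correlations
Sigma_low_corr  = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma_high_corr = np.array([[1.0, 0.9], [0.9, 1.0]])

for S in (Sigma_low_corr, Sigma_high_corr):
    eigvals = np.linalg.eigvalsh(S)
    # |Sigma| equals the product of the eigenvalues, as in (2.99);
    # the higher the correlation, the smaller the generalized variance
    print(np.linalg.det(S), np.prod(eigvals))
```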

4.2 PROPERTIES OF MULTIVARIATE NORMAL RANDOM VARIABLES

We list some of the properties of a random p × 1 vector y from a multivariate normal distribution N_p(μ, Σ):

1. Normality of linear combinations of the variables in y.
(a) If a is a p × 1 vector of constants, the linear function a'y = a₁y₁ + a₂y₂ + ··· + a_p y_p is distributed as N(a'μ, a'Σa). The mean and variance of a'y were given previously in (3.64) and (3.65) as E(a'y) = a'μ and var(a'y) = a'Σa for any random vector y. We now have the additional attribute that a'y has a (univariate) normal distribution if y is N_p(μ, Σ). This is a fundamental result, since we will often deal with linear combinations.
(b) If A is a constant q × p matrix of rank q, where q ≤ p, then Ay consists of q linear combinations of the variables in y, with distribution N_q(Aμ, AΣA'). Here, again, E(Ay) = Aμ and cov(Ay) = AΣA' in general, as given in (3.68) and (3.69). We now have the additional feature that the q variables in Ay have a multivariate normal distribution.

2. Standardized variables.
If y is N_p(μ, Σ), a standardized vector z can be obtained in two ways:

z = (T')^{−1}(y − μ),  (4.4)

where Σ = T'T is factored using the Cholesky procedure in Section 2.7, or

z = (Σ^{1/2})^{−1}(y − μ),  (4.5)

where Σ^{1/2} is the symmetric square root matrix of Σ defined in (2.103) such that Σ = Σ^{1/2}Σ^{1/2}. In either (4.4) or (4.5), it follows from property 1b that z is distributed as N_p(0, I); that is, the z's are independently distributed as N(0, 1). Thus in the multivariate case, a standardized vector of random variables has all means equal to 0, all variances equal to 1, and all correlations equal to 0.

3. Chi-square distribution.
A chi-square random variable with p degrees of freedom is defined as the sum of squares of p independent standard normal random variables. Thus if z is the standardized vector defined in (4.4) or (4.5), then Σ_{i=1}^p z_i² = z'z has the χ²-distribution with p degrees of freedom, denoted χ²_p or χ²(p). From either (4.4) or (4.5) we obtain z'z = (y − μ)'Σ^{−1}(y − μ). Hence

if y is N_p(μ, Σ), then (y − μ)'Σ^{−1}(y − μ) is χ²_p.  (4.6)

4. Normality of marginal distributions.
(a) Any subset of the y's in y has a multivariate normal distribution, with mean vector consisting of the corresponding subvector of μ and covariance matrix composed of the corresponding submatrix of Σ. To illustrate, let y₁ = (y₁, y₂, ..., y_r)' denote the subvector containing the first r elements of y and y₂ = (y_{r+1}, ..., y_p)' consist of the remaining p − r elements. Thus y, μ, and Σ are partitioned as

y = (y₁', y₂')',  μ = (μ₁', μ₂')',  Σ = [[Σ₁₁, Σ₁₂], [Σ₂₁, Σ₂₂]],

where y₁ and μ₁ are r × 1 and Σ₁₁ is r × r. Then y₁ is distributed as N_r(μ₁, Σ₁₁). Here, again, E(y₁) = μ₁ and cov(y₁) = Σ₁₁ hold for any random vector partitioned in this way. But if y is p-variate normal, then y₁ is r-variate normal.
(b) As a special case of the above result, each y_j in y has the univariate normal distribution N(μ_j, σ_jj). The converse of this is not true. If the density of each y_j in y is normal, it does not necessarily follow that y is multivariate normal.

In the next three properties, let the observation vector be partitioned into two subvectors denoted by y and x. Or, alternatively, let x represent some additional variables to be considered along with those in y. Then, as in (3.43) and (3.44),

E[(y', x')'] = (μ_y', μ_x')',  cov[(y', x')'] = Σ = [[Σ_yy, Σ_yx], [Σ_xy, Σ_xx]].

5. Independence.
(a) The subvectors y and x are independent if Σ_yx = O.
(b) Two individual variables y_j and y_k are independent if σ_jk = 0. Note that this is not true for many nonnormal random variables, as illustrated in Section 3.2.1.

6. Conditional distribution.
If y and x are not independent, then Σ_yx ≠ O, and the conditional distribution of y given x, f(y|x), is multivariate normal with

E(y|x) = μ_y + Σ_yx Σ_xx^{−1}(x − μ_x)  (4.7)

and

cov(y|x) = Σ_yy − Σ_yx Σ_xx^{−1} Σ_xy.  (4.8)
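A brief simulation sketch of properties 2 and 3 (the parameters are hypothetical; numpy's lower-triangular Cholesky factor plays the role of T' in (4.4)):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.5],
                  [0.5, 1.5, 2.0]])
n, p = 100_000, 3
y = rng.multivariate_normal(mu, Sigma, size=n)

# Property 2: z = (T')^{-1}(y - mu) with Sigma = T'T.
# numpy returns lower-triangular L with Sigma = L L', so L serves as T'.
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, (y - mu).T).T      # each row is a standardized vector
print(z.mean(axis=0).round(2))            # approximately 0
print(np.cov(z.T).round(2))               # approximately I

# Property 3: (y - mu)' Sigma^{-1} (y - mu) = z'z should be chi-square(p)
d2 = (z ** 2).sum(axis=1)
print(np.mean(d2 <= chi2.ppf(0.95, df=p)))   # approximately 0.95
```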

Note that E(y|x) is a linear function of x, while cov(y|x) does not depend on x. The linear trend in (4.7) extends to any pair of variables. Thus to use (4.7) as a check on normality, one can examine bivariate scatter plots of all pairs of variables and look for any nonlinear trends. In (4.7), we also have the justification for using the covariance or correlation to measure the relationship between two bivariate normal random variables. As noted in Section 3.2.1, the covariance and correlation are good measures of relationship only for variables with linear trends and are generally unsuitable for nonnormal random variables with a curvilinear relationship. The matrix Σ_yx Σ_xx^{−1} in (4.7) is called the matrix of regression coefficients because it relates E(y|x) to x. The sample counterpart of this matrix appears in (10.37).

7. Distribution of the sum of two subvectors.
If y and x are the same size (both p × 1) and independent, then

y + x is N_p(μ_y + μ_x, Σ_yy + Σ_xx),  (4.9)

y − x is N_p(μ_y − μ_x, Σ_yy + Σ_xx).

Here, again, the mean vector and covariance matrix for y + x hold in general. But if y and x are multivariate normal, then y + x is multivariate normal.

To illustrate property 6, we discuss the conditional distribution for the bivariate normal. Let

u = (y, x)'

with

E(u) = (μ_y, μ_x)'  and  cov(u) = Σ = [[σ_y², σ_yx], [σ_yx, σ_x²]].

By definition, f(y|x) = g(y, x)/h(x), where g(y, x) is the joint density of y and x and h(x) is the density of x. Hence

g(y, x) = f(y|x) h(x),

and because the right side is a product, we seek a function of y and x that is independent of x and whose density can serve as f(y|x). Since linear functions of y and x are normal by property 1a above, we consider y − βx and seek the value of β so that y − βx and x are independent.

Since z = y − βx and x are normal and independent, cov(x, z) = 0. To find cov(x, z), we express x and z as functions of u:

x = (0, 1)u = a'u,
z = y − βx = (1, −β)u = b'u.

Now

cov(x, z) = cov(a'u, b'u) = a'Σb  [by (3.66)]
          = (0, 1)[[σ_y², σ_yx], [σ_yx, σ_x²]](1, −β)'
          = σ_yx − βσ_x².  (4.10)

Since cov(x, z) = 0, we obtain β = σ_yx/σ_x², and y − βx becomes

y − (σ_yx/σ_x²)x.

By property 1a above, the density of y − (σ_yx/σ_x²)x is normal with

E(y − (σ_yx/σ_x²)x) = μ_y − (σ_yx/σ_x²)μ_x

and

var(b'u) = b'Σb = (1, −σ_yx/σ_x²)[[σ_y², σ_yx], [σ_yx, σ_x²]](1, −σ_yx/σ_x²)' = σ_y² − σ_yx²/σ_x².

For a given value of x, y can be expressed as y = βx + (y − βx), where βx is a fixed quantity corresponding to the given value of x and y − βx is a random deviation. Then f(y|x) is normal, with

E(y|x) = βx + E(y − βx) = βx + μ_y − βμ_x = μ_y + (σ_yx/σ_x²)(x − μ_x),

var(y|x) = σ_y² − σ_yx²/σ_x².
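The conditional mean and covariance in (4.7) and (4.8) translate directly into code. A sketch with hypothetical partitioned parameters:

```python
import numpy as np

# Partitioned mu and Sigma for (y, x); all values are hypothetical.
mu_y = np.array([1.0, 2.0])
mu_x = np.array([0.0, -1.0])
S_yy = np.array([[4.0, 1.0], [1.0, 3.0]])
S_yx = np.array([[1.5, 0.5], [0.5, 1.0]])
S_xx = np.array([[2.0, 0.3], [0.3, 1.5]])

x = np.array([0.5, -0.5])
Sxx_inv = np.linalg.inv(S_xx)

# E(y|x) = mu_y + S_yx S_xx^{-1}(x - mu_x), eq. (4.7)
cond_mean = mu_y + S_yx @ Sxx_inv @ (x - mu_x)
# cov(y|x) = S_yy - S_yx S_xx^{-1} S_xy, eq. (4.8); note it does not depend on x
cond_cov = S_yy - S_yx @ Sxx_inv @ S_yx.T
print(cond_mean, cond_cov)
```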

4.3 ESTIMATION IN THE MULTIVARIATE NORMAL

4.3.1 Maximum Likelihood Estimation

When a distribution such as the multivariate normal is assumed to hold for a population, estimates of the parameters are often found by the method of maximum likelihood. This technique is conceptually simple: The observation vectors y₁, y₂, ..., y_n are considered to be known, and values of μ and Σ are sought that maximize the joint density of the y's, called the likelihood function. For the multivariate normal, the maximum likelihood estimates of μ and Σ are

μ̂ = ȳ,  (4.11)

Σ̂ = (1/n) Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)' = (1/n) W = ((n − 1)/n) S,  (4.12)

where S is the sample covariance matrix defined in (3.20) and (3.25). Since Σ̂ has divisor n instead of n − 1, it is biased [see (3.31)], and we usually use S in place of Σ̂.

We now give a justification of ȳ as the maximum likelihood estimator of μ. Because the y_i's constitute a random sample, they are independent, and the joint density is the product of the densities of the y's. The likelihood function is, therefore,

L(y₁, y₂, ..., y_n; μ, Σ) = ∏_{i=1}^n f(y_i; μ, Σ) = 1/((√(2π))^{np} |Σ|^{n/2}) e^{−Σ_{i=1}^n (y_i−μ)'Σ^{−1}(y_i−μ)/2}.  (4.13)

To see that μ̂ = ȳ maximizes the likelihood function, we write the exponent of (4.13) in a different form. By adding and subtracting ȳ, the exponent sum in (4.13) becomes

Σ_{i=1}^n (y_i − ȳ + ȳ − μ)'Σ^{−1}(y_i − ȳ + ȳ − μ).

When this is expanded in terms of y_i − ȳ and ȳ − μ, two of the four resulting terms vanish because Σ_i(y_i − ȳ) = 0, and (4.13) becomes

L = 1/((√(2π))^{np} |Σ|^{n/2}) e^{−Σ_{i=1}^n (y_i−ȳ)'Σ^{−1}(y_i−ȳ)/2} e^{−n(ȳ−μ)'Σ^{−1}(ȳ−μ)/2}.  (4.14)

Since Σ^{−1} is positive definite, we have −n(ȳ − μ)'Σ^{−1}(ȳ − μ)/2 ≤ 0 and 0 < e^{−n(ȳ−μ)'Σ^{−1}(ȳ−μ)/2} ≤ 1, with the maximum occurring when the exponent is 0. Therefore, L is maximized when μ̂ = ȳ.

The maximum likelihood estimator of the population correlation matrix is the sample correlation matrix, that is,

P̂_ρ = R.

Relationships among multivariate normal variables are linear, as can be seen in (4.7). Thus the estimators S and R serve well for the multivariate normal because they measure only linear relationships (see Sections 3.2.1 and 4.2). They are not as useful for some nonnormal distributions.
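A short sketch contrasting the maximum likelihood estimators (4.11) and (4.12) with the unbiased estimator S (simulated data, since no data set accompanies this passage):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.multivariate_normal([0, 0, 0], np.eye(3), size=50)
n = y.shape[0]

mu_hat = y.mean(axis=0)               # eq. (4.11): mu-hat = y-bar
S = np.cov(y, rowvar=False)           # unbiased S, divisor n - 1
Sigma_hat = (n - 1) / n * S           # eq. (4.12): biased ML estimator
R = np.corrcoef(y, rowvar=False)      # ML estimate of the correlation matrix
print(mu_hat, Sigma_hat, R)
```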

4.3.2 Distribution of ȳ and S

For the distribution of ȳ = Σ_{i=1}^n y_i/n, we can distinguish two cases:

(a) When ȳ is based on a random sample y₁, y₂, ..., y_n from a multivariate normal distribution N_p(μ, Σ), then ȳ is N_p(μ, Σ/n).

(b) When ȳ is based on a random sample y₁, y₂, ..., y_n from a nonnormal multivariate population with mean vector μ and covariance matrix Σ, then for large n, ȳ is approximately N_p(μ, Σ/n). More formally, this result is known as the multivariate central limit theorem: If ȳ is the mean vector of a random sample y₁, y₂, ..., y_n from a population with mean vector μ and covariance matrix Σ, then as n → ∞, the distribution of √n(ȳ − μ) approaches N_p(0, Σ).

There are p variances in S and (p choose 2) covariances, for a total of

p + (p choose 2) = p + p(p − 1)/2 = p(p + 1)/2

distinct entries. The joint distribution of these p(p + 1)/2 distinct variables in W = (n − 1)S = Σ_i(y_i − ȳ)(y_i − ȳ)' is the Wishart distribution, denoted by W_p(n − 1, Σ), where n − 1 is the degrees of freedom.

The Wishart distribution is the multivariate analogue of the χ²-distribution, and it has similar uses. As noted in property 3 of Section 4.2, a χ² random variable is defined formally as the sum of squares of independent standard normal (univariate) random variables:

Σ_{i=1}^n (y_i − μ)²/σ² = Σ_{i=1}^n z_i² is χ²_n.

If ȳ is substituted for μ, then Σ_i(y_i − ȳ)²/σ² = (n − 1)s²/σ² is χ²_{n−1}. Similarly, the formal definition of a Wishart random variable is

Σ_{i=1}^n (y_i − μ)(y_i − μ)' is W_p(n, Σ),  (4.15)

where y₁, y₂, ..., y_n are independently distributed as N_p(μ, Σ). When ȳ is substituted for μ, the distribution remains Wishart with one less degree of freedom:

(n − 1)S = Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)' is W_p(n − 1, Σ).  (4.16)

Finally, we note that when sampling from a multivariate normal distribution, ȳ and S are independent.
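The Wishart property (4.16) can be checked by simulation; SciPy also samples the distribution directly. A sketch under a hypothetical Σ (the check uses the known mean E[W] = (n − 1)Σ of a W_p(n − 1, Σ) matrix):

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.7], [0.7, 1.0]])
n, p, reps = 30, 2, 5_000

# (n-1)S from normal samples should behave like W_p(n-1, Sigma), eq. (4.16)
W_sum = np.zeros((p, p))
for _ in range(reps):
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W_sum += (n - 1) * np.cov(y, rowvar=False)
print(W_sum / reps)                       # approximately (n-1) * Sigma

# Sampling the Wishart directly gives the same mean
W = wishart(df=n - 1, scale=Sigma)
print(W.rvs(size=reps, random_state=rng).mean(axis=0))
```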

4.4 ASSESSING MULTIVARIATE NORMALITY

Many tests and graphical procedures have been suggested for evaluating whether a data set likely originated from a multivariate normal population. One possibility is to check each variable separately for univariate normality. Excellent reviews for both the univariate and multivariate cases have been given by Gnanadesikan (1977, pp. 161-195) and Seber (1984, pp. 141-155). We give a representative sample of univariate and multivariate methods in Sections 4.4.1 and 4.4.2, respectively.

4.4.1 Investigating Univariate Normality

When we have several variables, checking each for univariate normality should not be the sole approach, because (1) the variables are correlated and (2) normality of the individual variables does not guarantee joint normality. On the other hand, it is true that multivariate normality implies individual normality. Hence if even one of the separate variables is not normal, the vector is not multivariate normal. An initial check on the individual variables may therefore be useful.

A basic graphical approach for checking normality is the Q-Q plot that compares quantiles of a sample against the population quantiles of the univariate normal. If the points are close to a straight line, there is no indication of departure from normality. Deviation from a straight line indicates nonnormality (at least for a large sample). In fact, the type of nonlinear pattern may reveal the type of departure from normality. Some possibilities are illustrated in Figure 4.5.

Quantiles are similar to the more familiar percentiles, which are expressed in terms of percent; a test score at the 90th percentile, for example, is above 90% of the test scores and below 10% of them. Quantiles are expressed in terms of fractions or proportions. Thus the 90th percentile score becomes the .9 quantile.

The sample quantiles for the Q-Q plot are obtained as follows. First, we rank the observations y₁, y₂, ..., y_n and denote the ordered values by y_(1), y_(2), ..., y_(n); thus y_(1) ≤ y_(2) ≤ ··· ≤ y_(n). Then the point y_(i) is the i/n sample quantile. For example, if n = 20, y_(7) is the 7/20 = .35 quantile, because .35 of the sample is less than or equal to y_(7). The fraction i/n is often changed to (i − ½)/n as a continuity correction. If n = 20, (i − ½)/n ranges from .025 to .975 and more evenly covers the interval from 0 to 1. With this convention, y_(i) is designated as the (i − ½)/n sample quantile.

The population quantiles for the Q-Q plot are similarly defined corresponding to (i − ½)/n. If we denote these by q₁, q₂, ..., q_n, then q_i is the value below which a proportion (i − ½)/n of the observations in the population lie; that is, (i − ½)/n is the probability of getting an observation less than or equal to q_i. Formally, q_i can be found for the standard normal random variable y with distribution N(0, 1) by solving

Φ(q_i) = P(y < q_i) = (i − ½)/n,  (4.17)

which would require numerical integration or tables of the cumulative standard normal distribution, Φ(x). Another benefit of using (i − ½)/n instead of i/n is that n/n = 1 would make q_n = ∞.

The population need not have the same mean and variance as the sample, since changes in mean and variance merely change the slope and intercept of the plotted line in the Q-Q plot. Therefore, we use the standard normal distribution, and the q_i values can easily be found from a table of cumulative standard normal probabilities. We then plot the pairs (q_i, y_(i)) and examine the resulting Q-Q plot for linearity.

Special graph paper, called normal probability paper, is available that eliminates the need to look up the q_i values. We need only plot (i − ½)/n in place of q_i, that is, plot the pairs [(i − ½)/n, y_(i)], and look for linearity as before. As an even easier alternative, most general-purpose statistical software programs now routinely provide normal probability plots of the pairs (q_i, y_(i)).

Figure 4.5 Typical Q-Q plots for nonnormal data: quantiles of a distribution with heavier tails than the normal, quantiles of a distribution with thinner tails than the normal, and quantiles of a positively skewed distribution, each plotted against quantiles of the normal.

The Q-Q plots provide a good visual check on normality and are considered to be adequate for this purpose by many researchers. For those who desire a more objective procedure, several hypothesis tests are available. We give three of these that have good properties and are computationally tractable.

We discuss first a classical approach based on the following measures of skewness and kurtosis:

√b₁ = √n Σ_{i=1}^n (y_i − ȳ)³ / [Σ_{i=1}^n (y_i − ȳ)²]^{3/2},  (4.18)

b₂ = n Σ_{i=1}^n (y_i − ȳ)⁴ / [Σ_{i=1}^n (y_i − ȳ)²]².  (4.19)

These are sample estimates of the population skewness and kurtosis parameters √β₁ and β₂, respectively. When the population is normal, √β₁ = 0 and β₂ = 3. If √β₁ < 0, we have negative skewness; if √β₁ > 0, the skewness is positive. Positive skewness is illustrated in Figure 4.6. If β₂ < 3, we have negative kurtosis, and if β₂ > 3, there is positive kurtosis. A distribution with negative kurtosis is characterized by being flatter than the normal distribution, that is, less peaked, with heavier flanks and thinner tails. A distribution with positive kurtosis has a higher peak than the normal, with an excess of values near the mean and in the tails but with thinner flanks. Positive and negative kurtosis are illustrated in Figure 4.7.

Figure 4.6 A distribution with positive skewness.

Figure 4.7 Distributions with positive and negative kurtosis compared to the normal.
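A sketch implementing (4.18) and (4.19), together with Q-Q plot coordinates built from the (i − ½)/n convention described above (simulated data; the helper names are mine):

```python
import numpy as np
from scipy.stats import norm

def sqrt_b1(y):
    d = y - y.mean()
    n = len(y)
    return np.sqrt(n) * (d ** 3).sum() / (d ** 2).sum() ** 1.5   # eq. (4.18)

def b2(y):
    d = y - y.mean()
    n = len(y)
    return n * (d ** 4).sum() / (d ** 2).sum() ** 2              # eq. (4.19)

rng = np.random.default_rng(3)
y = rng.normal(size=200)
print(sqrt_b1(y), b2(y))     # near 0 and 3 for normal data

# Q-Q plot coordinates: population quantiles solve eq. (4.17)
i = np.arange(1, len(y) + 1)
q = norm.ppf((i - 0.5) / len(y))
pairs = np.column_stack([q, np.sort(y)])   # plot these and look for linearity
```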

The test of normality can be carried out using the exact percentage points for √b₁ in Table 4.1 for 4 ≤ n ≤ 25, as given by Mulholland (1977). Alternatively, for n ≥ 8 the function g(√b₁) as defined by

g(√b₁) = δ sinh^{−1}(√b₁ / λ)  (4.20)

is approximately N(0, 1), where

sinh^{−1}(x) = ln(x + √(x² + 1)).  (4.21)

Table 4.2, from D'Agostino and Pearson (1973), gives values for δ and 1/λ. To use b₂ as a test of normality, we can use Table 4.3, obtained from D'Agostino and Tietjen (1971), which gives simulated percentiles of b₂ for selected values of n in the range 7 ≤ n ≤ 50. Charts of percentiles of b₂ for 20 ≤ n ≤ 200 can be found in D'Agostino and Pearson (1973).

Our second test for normality was given by D'Agostino (1971). The observations y₁, y₂, ..., y_n are ordered as y_(1) ≤ y_(2) ≤ ··· ≤ y_(n), and we calculate

D = Σ_{i=1}^n [i − ½(n + 1)] y_(i) / (n² √(Σ_{i=1}^n (y_i − ȳ)²/n))  (4.22)

and

Y = √n [D − .28209479] / .02998598.  (4.23)

A table of percentiles for Y, given by D'Agostino (1972) for 10 ≤ n ≤ 250, is provided in Table 4.4.

The final test we report is by Lin and Mudholkar (1980). The test statistic is

z = tanh^{−1}(r) = ½ ln[(1 + r)/(1 − r)],  (4.24)

where r is the sample correlation of the n pairs (y_i, x_i), i = 1, 2, ..., n, with x_i defined as

x_i = { [Σ_{j≠i} y_j² − (Σ_{j≠i} y_j)²/(n − 1)] / n }^{1/3}.  (4.25)

If the y's are normal, z is approximately N(0, 3/n). A more accurate upper 100α percentile is given by

z_α = σ_n [u_α + (1/24)(u_α³ − 3u_α) γ_{2n}],  (4.26)

with

σ_n² = 3/n − 7.324/n² + 53.005/n³,  u_α = Φ^{−1}(α),  γ_{2n} = −11.70/n + 55.06/n².
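Both tests are short to implement. The following sketch follows (4.22)-(4.25) as reconstructed above; the constants in Y are those given in (4.23), and the helper names are mine:

```python
import numpy as np

def dagostino_D_Y(y):
    """D'Agostino's D, eq. (4.22), and its standardized form Y, eq. (4.23)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    m2 = ((y - y.mean()) ** 2).sum() / n
    D = ((np.arange(1, n + 1) - (n + 1) / 2) * y).sum() / (n ** 2 * np.sqrt(m2))
    Y = np.sqrt(n) * (D - 0.28209479) / 0.02998598
    return D, Y

def lin_mudholkar_z(y):
    """Lin-Mudholkar z, eq. (4.24), with x_i as in eq. (4.25)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s, s2 = y.sum(), (y ** 2).sum()
    # leave-one-out sum of squares about the leave-one-out mean, cube-rooted
    x = (((s2 - y ** 2) - (s - y) ** 2 / (n - 1)) / n) ** (1 / 3)
    r = np.corrcoef(y, x)[0, 1]
    return np.arctanh(r)       # approximately N(0, 3/n) under normality

rng = np.random.default_rng(4)
y = rng.normal(loc=10, scale=2, size=100)
print(dagostino_D_Y(y), lin_mudholkar_z(y))
```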

4.4.2 Investigating Multivariate Normality

A basic graphical procedure for the multivariate case is based on the squared generalized distances

D_i² = (y_i − ȳ)'S^{−1}(y_i − ȳ),  i = 1, 2, ..., n.  (4.27)

Gnanadesikan and Kettenring (1972) showed that if the y_i's are multivariate normal, then

u_i = n D_i² / (n − 1)²  (4.28)

has a beta distribution, which is related to the F. To obtain a Q-Q plot, the values u₁, u₂, ..., u_n are ranked to give u_(1) ≤ u_(2) ≤ ··· ≤ u_(n), and these are plotted against the quantiles of the beta distribution,

v_i = (i − α)/(n − α − β + 1),  (4.29)

where

α = (p − 2)/(2p)  (4.30)

and

β = (n − p − 3)/[2(n − p − 1)].  (4.31)

A nonlinear pattern in the plot of the pairs (v_i, u_(i)) would indicate a departure from multivariate normality, and points far from the line may indicate outliers.

A more formal approach uses the multivariate skewness and kurtosis defined by Mardia (1970),

β_{1,p} = E[(y − μ)'Σ^{−1}(x − μ)]³,  (4.32)

where x and y are independent and identically distributed, and

β_{2,p} = E[(y − μ)'Σ^{−1}(y − μ)]².  (4.33)

Since third-order central moments for the multivariate normal distribution are zero, β_{1,p} = 0 when y is N_p(μ, Σ). It can also be shown that for multivariate normal y,

β_{2,p} = p(p + 2).  (4.34)

If we define which is approximately N(0, l). For the lower 25Va points we have two cases: (a) when 50 < ¡¿ < 400, use

g_ij = (y_i − ȳ)'Σ̂^{−1}(y_j − ȳ),  (4.35)

where Σ̂ = Σ_i(y_i − ȳ)(y_i − ȳ)'/n is the maximum likelihood estimator (4.12), then sample estimates of β_{1,p} and β_{2,p} are given by

b_{1,p} = (1/n²) Σ_{i=1}^n Σ_{j=1}^n g_ij³,  (4.36)

b_{2,p} = (1/n) Σ_{i=1}^n g_ii².  (4.37)

With b_{1,p}, we wish to reject for large values. The statistic

z₁ = (p + 1)(n + 1)(n + 3) b_{1,p} / {6[(n + 1)(p + 1) − 6]}  (4.38)

is approximately χ² with (1/6)p(p + 1)(p + 2) degrees of freedom. Reject if z₁ is large. With b_{2,p}, on the other hand, we wish to reject for large values (distribution too peaked) or small values (distribution too flat). For the upper 2.5% points of b_{2,p} use

z₂ = [b_{2,p} − p(p + 2)] / √(8p(p + 2)/n),  (4.39)

which is approximately N(0, 1). For the lower 2.5% points we have two cases: (a) when 50 ≤ n < 400, use

z₃ = [b_{2,p} − p(p + 2)(n + p + 1)/n] / √(8p(p + 2)/(n − 1)),  (4.40)

which is approximately N(0, 1); (b) when n ≥ 400, use z₂ as given by (4.39).
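A sketch of the skewness and kurtosis statistics (4.36)-(4.37) and the approximate tests (4.38)-(4.39), computed from the g_ij matrix of (4.35) (simulated data; the helper name is mine):

```python
import numpy as np
from scipy.stats import chi2

def mardia(Y):
    """Multivariate skewness b_{1,p} (4.36) and kurtosis b_{2,p} (4.37),
    with the approximate tests z1 (4.38) and z2 (4.39)."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    d = Y - Y.mean(axis=0)
    Sig_hat = d.T @ d / n                   # ML estimator, divisor n
    G = d @ np.linalg.inv(Sig_hat) @ d.T    # G[i, j] = g_ij of eq. (4.35)
    b1p = (G ** 3).sum() / n ** 2
    b2p = (np.diag(G) ** 2).sum() / n
    z1 = (p + 1) * (n + 1) * (n + 3) * b1p / (6 * ((n + 1) * (p + 1) - 6))
    df = p * (p + 1) * (p + 2) / 6
    z2 = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return b1p, b2p, z1, chi2.sf(z1, df), z2

rng = np.random.default_rng(5)
Y = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200)
print(mardia(Y))   # b_{1,p} near 0 and b_{2,p} near p(p+2) = 15 for normal data
```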

Fortran programs for the above tests based on b_{1,p} and b_{2,p} are given by Siotani et al. (1985). They also provide programs for three additional tests for multivariate normality and several tests for univariate normality. Many of these programs are apparently unavailable elsewhere. The three multivariate tests are, briefly, as follows:

1. A test based on the third and fourth central moments,

E[(y_i − μ_i)(y_j − μ_j)(y_k − μ_k)]  (4.41)

and

E[(y_i − μ_i)(y_j − μ_j)(y_k − μ_k)(y_l − μ_l)].  (4.42)

Under normality, (4.41) is zero and (4.42) is equal to σ_ij σ_kl + σ_ik σ_jl + σ_il σ_jk. Estimates of (4.41) and (4.42) are obtained and compared to 0 and s_ij s_kl + s_ik s_jl + s_il s_jk, respectively.

2. A multivariate generalization of the Shapiro-Wilk test: Define z_i = c'y_i, i = 1, 2, ..., n, where c is a constant vector, and

W(c) = [Σ_{i=1}^n a_i z_(i)]² / Σ_{i=1}^n (z_i − z̄)²,  (4.43)

where z_(1) ≤ z_(2) ≤ ··· ≤ z_(n) are the ordered values of z₁, z₂, ..., z_n and the a_i's are coefficients tabulated in Shapiro and Wilk (1965). The hypothesis of multivariate normality is accepted if

max_c [W(c)] ≥ k,  (4.44)

where k corresponds to the desired significance level, α.

3. A directional test: This test is based on an alternative definition of the multivariate normal distribution suggested by property 1a of Section 4.2: If a'y is N(a'μ, a'Σa) for all a, then y is N_p(μ, Σ). First the data vectors y₁, y₂, ..., y_n are standardized by z_i = (S^{1/2})^{−1}(y_i − ȳ), i = 1, 2, ..., n, where S^{1/2} is the square root matrix given in (2.103). Then each z_i is multiplied by a direction vector d₀ to obtain u_i = d₀'z_i, i = 1, 2, ..., n. The u's are approximately normal if the y's are multivariate normal. Several values of d₀ are used to check for normality in different directions. Various univariate normal tests can be applied to the u's.

Table 4.1 Comparison of Number of Subsets of Sizes 2 and 3

p     (p choose 2)     (p choose 3)
6          15               20
8          28               56
10         45              120
12         66              220
15        105              455

4.5 OUTLIERS

The detection of outliers has been of concern to statisticians and other scientists for over a century. Many authors have claimed that the researcher can typically expect up to 10% of the observations to have errors in measurement or recording. Occasional stray observations from a different population than the target population are also fairly common. We do not attempt a complete summary of the vast literature covering univariate outliers, but we do review some major concepts and suggested procedures in Section 4.5.1 before moving to the multivariate case in Section 4.5.2. An alternative to detection of outliers is to use robust estimators of μ and Σ (see Rencher 1997, Section 1.10) that are less sensitive to extreme observations than are the standard estimators ȳ and S.

4.5.1 Outliers in Univariate Samples

Excellent surveys of the useful literature on outliers have been given by Beckman and Cook (1983), Hawkins (1980), and Barnett and Lewis (1978). We abstract a few highlights from Beckman and Cook. Many techniques have been proposed for detecting outliers in the residuals from regression, designed experiments, and so on. But we will be concerned only with simple random samples from the normal distribution. Outliers are also known as discordant observations or contaminants, which imply a discrepancy from what was expected and an origin from a nontarget population, respectively.

There are two principal approaches for dealing with outliers. The first is identification, which usually involves deletion of the outlier(s) but may alternatively provide important information about the model or the data. The second method involves accommodation, by modifying the method of analysis or the model. Robust methods, in which the influence of outliers is reduced, are the most familiar example of modification of the analysis. An example of a correction to the model is a mixture model that combines two normals with different variances, sometimes used to accommodate contaminants. For example, Marks and Rao (1978) accommodated a particular type of outlier due to patient fatigue by a mixture of two normal distributions.

In small or moderate sized univariate samples, visual methods of identifying outliers are the most frequently used. Tests are also available if a less subjective approach is desired.

Two types of slippage models have been proposed to account for outliers. Under the mean slippage model, all observations have the same variance, but one or more of the observations arise from a distribution with a different (population) mean. In the variance slippage model, one or more of the observations arise from a model with larger (population) variance but the same mean. Thus in the mean slippage model, the bulk of the observations arise from N(μ, σ²), while the outliers originate from N(μ + θ, σ²). For the variance slippage model, the main distribution would again be N(μ, σ²), with the outliers coming from N(μ, aσ²), where a > 1. These models have led to the development of tests for rejection of outliers. We now briefly discuss some of these tests.

For a single outlier, most tests are based on the maximum studentized residual,

max_i τ_i = max_i |(y_i − ȳ)/s|.  (4.45)

If the largest or smallest observation is rejected, one could then examine the n − 1 remaining observations for another possible outlier, and so on. This procedure is called a consecutive test. However, if there are two or more outliers, the less extreme ones will often make it difficult to detect the most extreme one, due to inflation of both mean and variance. This effect is called masking. Ferguson (1961) showed that the maximum studentized residual (4.45) is more powerful than most other techniques for detecting intermediate or large shifts in the mean and gave the following guidelines for small shifts:

1. For outliers with small positive shifts in the mean, tests based on sample skewness are best.

2. For outliers with small shifts in the mean in either direction, tests based on the sample kurtosis are best.

3. For outliers with small positive shifts in the variance, tests based on the sample kurtosis are best.

Because of the masking problem in consecutive tests, block tests have been proposed for simultaneous rejection of k > 1 outliers. These tests work well if k is known, but in practice, it is usually not known. If the value we conjecture for k is too small, we incur the risk of failing to detect any outliers because of masking. If we set k too large, there is a high risk of rejecting more outliers than there really are, an effect known as swamping.

4.5.2 Outliers in Multivariate Samples

In the case of multivariate data, the problems in detecting outliers are intensified for several reasons:

1. For p > 2 the data cannot be readily plotted to pinpoint the outliers.

2. Multivariate data cannot be ordered as can a univariate sample, where extremes show up readily on either end.

3. An observation vector may have a large recording error in one of its components or smaller errors in several components.

4. A multivariate outlier may reflect slippage in mean, variance, or correlation. This is illustrated in Figure 4.8. Observation 1 causes a small shift in means and variances of both y₁ and y₂ but has little effect on the correlation. Observation 2 has little effect on means and variances, but it reduces the correlation somewhat. Observation 3 has a major effect on means, variances, and correlation.

Figure 4.8 Bivariate sample showing three types of outliers.

Of course, as in the univariate case, one approach to outlier identification or accommodation is to use robust methods of estimation. Such methods minimize the influence of outliers in estimation or model fitting. However, an outlier sometimes furnishes valuable information, and the specific pursuit of outliers can be very worthwhile.

We present two methods of multivariate outlier identification, both of which turn out to be related to methods of assessing multivariate normality. (A third approach, based on principal components, is given in Section 12.4.) The first method, due to Wilks (1963), is designed for detection of a single outlier. The Wilks' statistic is

w = max_i |(n − 2)S_{−i}| / |(n − 1)S|,  (4.46)

where S is the usual sample covariance matrix and S_{−i} is obtained from the same sample with the ith observation deleted. It turns out that w can be expressed in terms of D_(n)² = max_i (y_i − ȳ)'S^{−1}(y_i − ȳ) as

w = 1 − n D_(n)² / (n − 1)²,  (4.47)

thus basing a test for an outlier on the distances D_i² used in Section 4.4.2 in a graphical procedure for checking multivariate normality. Table 4.6 gives the upper 5% and 1% critical values for D_(n)² from Barnett and Lewis (1978).

Yang and Lee (1987) provide an F-test of w as given by (4.47). Define

F_i = [(n − p − 1)/p] [1/(1 − nD_i²/(n − 1)²) − 1],  i = 1, 2, ..., n.  (4.48)

Then the F_i are independently and identically distributed as F_{p,n−p−1}, and a test can be constructed in terms of max_i F_i:

P(max_i F_i > f) = 1 − P(all F_i ≤ f) = 1 − [P(F_{p,n−p−1} ≤ f)]ⁿ.

In terms of Wilks' statistic w of (4.47), the maximum F value is

max_i F_i = [(n − p − 1)/p](1/w − 1).  (4.49)

The second method uses the kurtosis statistic b_{2,p} of (4.37) as an outlier test statistic for distributions with densities of the form f(y) = c|Σ|^{−1/2} g[(y − μ)'Σ^{−1}(y − μ)]. By varying the function g, distributions with shorter or longer tails than the normal can be obtained. Of course, the critical value of b_{2,p} would have to be adjusted to correspond to the distribution, but rejection for large values would be a locally best invariant test.

Example 4.5.2. We illustrate these procedures with the ramus bone data of Table 3.8. The values of D_i² as given by (4.27) are shown in Table 4.2, and the Q-Q plot of the u_i and v_i from (4.28) and (4.29) appears in Figure 4.9.

Table 4.2 Values of D_i² for the Ramus Bone Data in Table 3.8

Observation Number   D_i²      Observation Number   D_i²
1                    0.7588    11                   2.8301
⋮                    ⋮         ⋮                    ⋮

Figure 4.9 Q-Q plot of u_i and v_i for the ramus bone data of Table 3.8.

From Table 4.6, the upper 5% critical value for D_(n)² is given as 11.63. In our case, the largest value is D_(n)² = 11.03, which does not exceed the critical value. This does not surprise us, since the test was designed to detect a single outlier, and we may have at least three.

We next calculate b_{1,p} and b_{2,p} as given by (4.36) and (4.37):

b_{1,p} = 11.338,  b_{2,p} = 28.884.

In Table 4.5, the upper .01 critical value for b_{1,p} is 9.9; the upper .005 critical value for b_{2,p} is 27.1. Thus both b_{1,p} and b_{2,p} exceed their critical values, and we have significant skewness and kurtosis, apparently caused by the three observations with large values of D_i².

The bivariate scatter plots are given in Figure 4.10. The three values are clearly separate from the other observations in the plot of y₁ versus y₄. In Table 3.8, the 9th, 12th, and 20th values of y₄ are not unusual, nor are the 9th, 12th, and 20th values of y₁. However, the increase from y₁ to y₄ is exceptional in each case. If these values are not due to errors in recording the data and if this sample is representative, then we appear to have a mixture of two populations. This should be taken into account in making inferences.

Figure 4.10 Scatter plots for the ramus bone data in Table 3.8.
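The quantities used in this example are straightforward to compute. A sketch using (4.27), (4.47), (4.48), and the max-F test above (the ramus bone data are not reproduced here, so simulated data with one planted outlier stand in; the function name is mine):

```python
import numpy as np
from scipy.stats import f as f_dist

def wilks_outlier_test(Y):
    """Single-outlier statistics: D_i^2 (4.27), Wilks' w (4.47), F_i (4.48),
    and the approximate max-F p-value based on the iid F result."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    d = Y - Y.mean(axis=0)
    S = d.T @ d / (n - 1)
    D2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
    w = 1 - n * D2.max() / (n - 1) ** 2
    F = (n - p - 1) / p * (1 / (1 - n * D2 / (n - 1) ** 2) - 1)
    p_value = 1 - f_dist.cdf(F.max(), p, n - p - 1) ** n
    return D2, w, F.max(), p_value

rng = np.random.default_rng(6)
Y = rng.multivariate_normal(np.zeros(2), np.eye(2), size=25)
Y[0] = [4.5, -4.0]                 # plant an artificial outlier
D2, w, Fmax, pval = wilks_outlier_test(Y)
print(D2.argmax(), w, Fmax, pval)  # observation 0 should be flagged
```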

PROBLEMS

4.1 Consider the two covariance matrices

     [14  8  3]        [6  6  1]
Σ₁ = [ 8  5  2],  Σ₂ = [6  8  2].
     [ 3  2  1]        [1  2  1]

Show that |Σ₂| > |Σ₁| and that tr(Σ₂) < tr(Σ₁). Thus the generalized variance of population 2 is greater than the generalized variance of population 1, even though the total variance is less. Comment on why this is true in terms of the correlations.

4.2 For z = (T')^{−1}(y − μ) as in (4.4), show that E(z) = 0 and cov(z) = I.

4.3 Show that the form of the likelihood function in (4.13) follows from the previous expression.

4.4 Show that by adding and subtracting ȳ, the exponent of (4.13) has the form given in (4.14), that is,

(1/2) Σ_{i=1}^n (y_i − ȳ + ȳ − μ)'Σ^{−1}(y_i − ȳ + ȳ − μ) = (1/2) Σ_{i=1}^n (y_i − ȳ)'Σ^{−1}(y_i − ȳ) + (n/2)(ȳ − μ)'Σ^{−1}(ȳ − μ).

4.5 Show that √b₁ and b₂ as given in (4.18) and (4.19) are invariant to the transformation z_i = ay_i + b.

4.6 Show that if y is N_p(μ, Σ), then β_{2,p} = p(p + 2) as in (4.34).

4.7 Show that b_{1,p} and b_{2,p} as given by (4.36) and (4.37) are invariant under the transformation z_i = Ay_i + b, where A is nonsingular. Thus b_{1,p} and b_{2,p} do not depend on the units of measurement; the variables could even be standardized.

4.8 Show that max_i F_i = [(n − p − 1)/p](1/w − 1), as in (4.49).

4.9 Suppose y is N₃(μ, Σ), where

    [3]       [ 6   1  −2]
μ = [1],  Σ = [ 1  13   4].
    [4]       [−2   4   4]

(a) Find the distribution of z = 2y₁ − y₂ + 3y₃.
(b) Find the joint distribution of z₁ = y₁ + y₂ + y₃ and z₂ = y₁ − y₂ + 2y₃.
(c) Find the distribution of y₂.
(d) Find the joint distribution of y₁ and y₃.
(e) Find the joint distribution of y₁, y₂, and ½(y₁ + y₂).

4.10 Suppose y is N₃(μ, Σ) with μ and Σ given in the previous problem.
(a) Find a vector z such that z = (T')^{−1}(y − μ) is N₃(0, I) as in (4.4).
(b) Find a vector z such that z = (Σ^{1/2})^{−1}(y − μ) is N₃(0, I) as in (4.5).
(c) What is the distribution of (y − μ)'Σ^{−1}(y − μ)?

4.11 Suppose y is N₄(μ, Σ).
(a) Find the distribution of z = 4y₁ − 2y₂ + y₃ − 3y₄.
(b) Find the joint distribution of z₁ = y₁ + y₂ + y₃ + y₄ and z₂ = −2y₁ + 3y₂ + y₃ − 2y₄.
(c) Find the joint distribution of z₁ = 3y₁ + y₂ − 4y₃ − y₄, z₂ = −y₁ − 3y₂ + y₃ − 2y₄, and z₃ = 2y₁ + 2y₂ + 4y₃ − 5y₄.
(d) What is the distribution of y₃?
(e) What is the joint distribution of y₂ and y₄?
(f) Find the joint distribution of y₁, ½(y₁ + y₂), ⅓(y₁ + y₂ + y₃), and ¼(y₁ + y₂ + y₃ + y₄).

4.12 Suppose y is N₄(μ, Σ) with μ and Σ given in the previous problem.
(a) Find a vector z such that z = (T')^{−1}(y − μ) is N₄(0, I), as in (4.4).
(b) Find a vector z such that z = (Σ^{1/2})^{−1}(y − μ) is N₄(0, I), as in (4.5).
(c) What is the distribution of (y − μ)'Σ^{−1}(y − μ)?

4.13 Suppose y is N₃(μ, Σ). Which of the following random variables are independent?
(a) y₁ and y₂
(b) y₁ and y₃
(c) y₂ and y₃
(d) (y₁, y₂) and y₃
(e) (y₁, y₃) and y₂

4.14 Suppose y is N₄(μ, Σ). Which of the following random variables are independent?
(a) y₁ and y₂
(b) y₁ and y₃
(c) y₁ and y₄
(d) y₂ and y₃
(e) y₂ and y₄
(f) y₃ and y₄
(g) (y₁, y₂) and y₃
(h) (y₁, y₂) and y₄
(i) (y₁, y₃) and y₄
(j) y₁ and (y₂, y₄)
(k) y₁ and y₂ and y₃
(l) y₁ and y₂ and y₄
(m) (y₁, y₂) and (y₃, y₄)
(n) (y₁, y₃) and (y₂, y₄)

4.15 Assume y and x are subvectors, each 2 × 1, where (y', x')' is N₄(μ, Σ), with μ and Σ partitioned accordingly.
(a) Find E(y|x) by (4.7).
(b) Find cov(y|x) by (4.8).

4.16 Suppose y and x are subvectors, such that y is 2 × 1 and x is 3 × 1, with μ and Σ partitioned accordingly. Assume that (y', x')' is distributed as N₅(μ, Σ).
(a) Find E(y|x) by (4.7).
(b) Find cov(y|x) by (4.8).

4.17 Suppose that y₁, y₂, ..., y_n is a random sample from a nonnormal multivariate population with mean μ and covariance matrix Σ. If n is large, what is the approximate distribution of each of the following?
(a) √n(ȳ − μ)
(b) ȳ

4.18 For the ramus bone data treated in Example 4.5.2, check each of the four variables for univariate normality using the following techniques:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.19 For the calcium data in Table 3.5, check for multivariate normality and outliers using the following tests:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6.
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

4.20 For the probe word data in Table 3.7, check each of the five variables for univariate normality and outliers using the following tests:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.21 For the probe word data in Table 3.7, check for multivariate normality and outliers using the following tests:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6.
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

4.22 Six hematology variables were measured on 51 workers (Royston 1983):

y₁ = hemoglobin concentration    y₄ = lymphocyte count
y₂ = packed cell volume          y₅ = neutrophil count
y₃ = white blood cell count      y₆ = serum lead concentration

The data are given in Table 4.3. Check each of the six variables for univariate normality using the following tests:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.23 For the hematology data in Table 4.3, check for multivariate normality using the following techniques:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6 (extrapolate).
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

Table 4.3 Hematology Data

Observation
Number      y₁     y₂    y₃     y₄   y₅   y₆
1          13.4    39   4100    14   25   17
2          14.6    46   5000    15   30   20
3          13.5    42   4500    19   21   18
4          15.0    46   4600    23   16   18
5          14.6    44   5100    17   31   19
6          14.0    41   4900    20   24   19
7          16.4    49   4300    21   17   18
8          14.8    44   4400    16   26   19
9          15.2    46   4100    22   13   27
10         15.5    48   8400    34   42   36
11         15.2    47   5600    26   27   22
12         16.9    50   5100    28   17   23
13         14.8    44   4700    20   23
14         16.2    45   5600    26   25   19
15         14.7    43   4000    23   13   17
16         14.7    42   3400         22   13
17         16.5    45   5400    18   32   17
18         15.4    45   6900    28   36   21
19         15.1    45   4600    17   29   17
20         14.2    46   4200    14   25   28
21         15.9    46   5200     8   34   16
22         16.0    47   4700    25   14   18
23         17.4    50   8600    31   39   11
24         14.3    43   5500    20   31   19
25         14.8    44   4200    15   24   29
26         14.9    43   4300     9   32   17
27         15.5    45   5200    16   30   20
28         14.5    43   3900    18   18   25
29         14.4    45   6000    17   37   23
30         14.6    44   4100    23   21   27
31         15.3    45   7900    43   23   23
32         14.9    45   3400    17   15   24
33         15.8    47   6000    23   32   21
34         14.4    44   7100    31   39   23
35         14.7    46   3700         23   23
36         14.8    43   5200    25   10   22
37         15.4    45   6000    10   25   18
38         16.2    50   8100    32   38   18
39         15.0    45   4900    17   26   24
40         15.1    47   6000    22   33   16
41         16.0    46   4600    20   22   22
42         15.3    48   5500    20   23   23
43         14.5    41   6200    20   36   21
44         14.2    41   4900    26   20   20
45         15.0    45   7200    40   25   25
46         14.2    46   5800    22   31   22
47         14.9    45   8400    61   17   17
48         16.2    48   3100    12   15   18
49         14.5    45   4000    20   18   20
50         16.4    49   6900    35   22   24
51         14.7    44   7800    38   34   16

CHAPTER 5

Tests on One or Two Mean Vectors

5.1 MULTIVARIATE VERSUS UNIVARIATE TESTS

Hypothesis testing in a multivariate context is more complex than in a univariate setting. The number of parameters may be staggering. The p-variate normal distribution, for example, has p means, p variances, and (p choose 2) covariances, where (p choose 2) represents the number of pairs among the p variables. The total number of parameters is

p + p + (p choose 2) = ½ p(p + 3).

Each parameter corresponds to a hypothesis that could be formulated. Additionally, we might well be interested in testing hypotheses about subsets of these parameters or about functions of them. In some cases, we have the added dilemma of choosing among competing test statistics.

We first discuss the motivation for testing p variables multivariately rather than, or in addition to, univariately. There are at least four arguments for a multivariate approach to hypothesis testing:

1. The use of p univariate tests inflates the Type I error rate, α, whereas the multivariate test preserves the exact α level. For example, if we do p = 10 separate univariate tests at the .05 level, the probability of at least one false rejection is greater than .05. If the variables were independent (they rarely are), we would have (under H₀)

P(at least one rejection) = 1 − P(all 10 tests accept) = 1 − (.95)¹⁰ = .40.
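A one-line check of this familywise error computation (a sketch, not part of the text):

```python
# Probability of at least one false rejection among p independent tests
p, alpha = 10, 0.05
print(1 - (1 - alpha) ** p)    # 0.401..., the overall Type I error rate
```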

The resulting overall α of .40 is not an acceptable error rate. Typically,
