

CHAPTER 4

The Multivariate Normal Distribution

4.1 MULTIVARIATE NORMAL DENSITY FUNCTION

Many univariate tests and confidence intervals are based on the univariate normal distribution. Similarly, the vast majority of multivariate procedures have as their underpinning the multivariate normal distribution.

The following are some of the useful features of the multivariate normal distribution: (1) only means, variances, and covariances need be estimated in order to completely describe the distribution; (2) bivariate plots show linear trends; (3) if the variables are uncorrelated, they are independent; (4) linear functions of multivariate normal variables are also normal; (5) as in the univariate case, the convenient form of the density function lends itself to derivation of many properties and test statistics; and (6) even when the data are not multivariate normal, the multivariate normal may serve as a useful approximation, especially in inferences involving sample mean vectors, which are approximately normal by the central limit theorem (see Section 4.3.2).

Since the multivariate normal density is an extension of the univariate normal density and shares many of its features, we review the univariate normal density function in Section 4.1.1. We then describe the multivariate normal density in Sections 4.1.2-4.1.4.

4.1.1 Univariate Normal Density

If a random variable y, with mean μ and variance σ², is normally distributed, its density is given by

f(y) = 1/(√(2π) σ) e^{−(y−μ)²/2σ²},  −∞ < y < ∞.  (4.1)

When y has the density (4.1), we say that y is distributed as N(μ, σ²), or simply y is N(μ, σ²). This function is represented by the familiar bell-shaped curve illustrated in Figure 4.1 for μ = 10 and σ = 2.5.

Figure 4.1 The normal density curve.

4.1.2 Multivariate Normal Density

If y has a multivariate normal distribution with mean vector μ and covariance matrix Σ, the density is given by

g(y) = 1/((√(2π))^p |Σ|^{1/2}) e^{−(y−μ)'Σ^{−1}(y−μ)/2},  (4.2)

where p is the number of variables. When y has the density (4.2), we say that y is distributed as N_p(μ, Σ), or simply y is N_p(μ, Σ).

The term (y−μ)²/σ² = (y−μ)(σ²)^{−1}(y−μ) in the exponent of the univariate normal density (4.1) measures the squared distance from y to μ in standard deviation units. Similarly, the term (y−μ)'Σ^{−1}(y−μ) in the exponent of the multivariate normal density (4.2) is the squared generalized distance from y to μ, or the Mahalanobis distance,

Δ² = (y−μ)'Σ^{−1}(y−μ).  (4.3)

The characteristics of this distance between y and μ were discussed in Section 3.12.

In the coefficient of the exponential function in (4.2), |Σ|^{1/2} appears as the analogue of √(σ²) in (4.1). In the next section, we discuss the effect of |Σ| on the density.
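To make the relationship between (4.1)-(4.3) concrete, here is a minimal computational sketch (not part of the text; the mean vector, covariance matrix, and point evaluated are hypothetical). It shows that the density (4.2) is driven by the squared generalized distance (4.3):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([10.0, 5.0])              # hypothetical mean vector
Sigma = np.array([[6.25, 2.0],
                  [2.0,  4.0]])         # hypothetical covariance matrix
y = np.array([12.0, 4.0])

# Squared generalized (Mahalanobis) distance, eq. (4.3)
diff = y - mu
delta2 = diff @ np.linalg.inv(Sigma) @ diff

# Density computed directly from eq. (4.2) ...
p = len(mu)
dens = np.exp(-delta2 / 2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# ... agrees with SciPy's implementation
assert np.isclose(dens, multivariate_normal(mean=mu, cov=Sigma).pdf(y))
print(delta2, dens)
```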


4.1.3 Generalized Population Variance

In Section 3.10, we referred to |S| as a generalized sample variance. Analogously, |Σ| is a generalized population variance. If σ² is small in the univariate case, the y values are concentrated near the mean. Similarly, a small value of |Σ| in the multivariate case indicates that the y's are concentrated close to μ in p-space or that there is multicollinearity among the variables. The term multicollinearity indicates that the variables are highly intercorrelated, in which case the effective dimensionality is less than p. (See Chapter 12 for a discussion of finding a reduced number of new dimensions that represent the data.) In the presence of multicollinearity, one or more eigenvalues of Σ will be near zero, and |Σ| will be small, since |Σ| is the product of the eigenvalues by (2.99).

Figure 4.2 shows, for the bivariate case, a comparison of a distribution with small |Σ| and a distribution with larger |Σ|. An alternative way to portray the concentration of points in the bivariate normal distribution is with contour plots. Figure 4.3 shows contour plots for the two distributions in Figure 4.2. Each ellipse contains a different proportion of observation vectors y. The contours in Figure 4.3 can be found by setting the density function equal to a constant and solving for y, as illustrated in Figure 4.4. The bivariate normal density surface sliced at a constant height traces an ellipse, which contains a given proportion of the observations.

Figure 4.2 Bivariate normal densities: (a) small |Σ|; (b) large |Σ|.

Figure 4.3 Contour plots for the distributions in Figure 4.2.

Figure 4.4 Constant density contour for the bivariate normal.

In both Figures 4.2 and 4.3, small |Σ| appears on the left and large |Σ| appears on the right. In Figure 4.3a, there is a larger correlation between y₁ and y₂. In Figure 4.3b, the variances are larger (in the natural directions). In general, for any number of variables p, a decrease in intercorrelations among the variables or an increase in the variances will lead to a larger |Σ|.

4.1.4 Diversity of Applications of the Multivariate Normal

Nearly all the inferential procedures we discuss in this book are based on the multivariate normal distribution. We acknowledge that a major motivation for the widespread use of the multivariate normal is its mathematical tractability. From the multivariate normal assumption, a host of useful procedures can be derived. Practical alternatives to the multivariate normal are fewer than in the univariate case. Because it is not as simple to order (or rank) multivariate observation vectors as it is univariate observations, not as many nonparametric procedures are available for multivariate data.

While real data may not often be exactly multivariate normal, the multivariate normal will frequently serve as a useful approximation to the true distribution. Other reasons for our focus on the multivariate normal are the availability of tests and graphical procedures for assessing normality (see Sections 4.4 and 4.5) and the widespread use of procedures based on the multivariate normal in software packages. Fortunately, many of the procedures based on multivariate normality are robust to departures from normality.
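The effect of intercorrelation on |Σ| described above is easy to verify numerically. A small sketch (the two bivariate covariance matrices are hypothetical); it also checks that |Σ| equals the product of the eigenvalues:

```python
import numpy as np

# Equal variances, different correlations
Sigma_low_corr  = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma_high_corr = np.array([[1.0, 0.9], [0.9, 1.0]])

for S in (Sigma_low_corr, Sigma_high_corr):
    eigvals = np.linalg.eigvalsh(S)
    # |Sigma| equals the product of the eigenvalues, as in (2.99);
    # the higher the correlation, the smaller the generalized variance
    print(np.linalg.det(S), np.prod(eigvals))
```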

4.2 PROPERTIES OF MULTIVARIATE NORMAL RANDOM VARIABLES

We list some of the properties of a random p × 1 vector y from a multivariate normal distribution N_p(μ, Σ):

1. Normality of linear combinations of the variables in y.
(a) If a is a p × 1 vector of constants, the linear function a'y = a₁y₁ + a₂y₂ + ··· + a_p y_p is distributed as N(a'μ, a'Σa). The mean and variance of a'y were given previously in (3.64) and (3.65) as E(a'y) = a'μ and var(a'y) = a'Σa for any random vector y. We now have the additional attribute that a'y has a (univariate) normal distribution if y is N_p(μ, Σ). This is a fundamental result, since we will often deal with linear combinations.
(b) If A is a constant q × p matrix of rank q, where q ≤ p, then Ay consists of q linear combinations of the variables in y, with distribution N_q(Aμ, AΣA'). Here, again, E(Ay) = Aμ and cov(Ay) = AΣA' in general, as given in (3.68) and (3.69). We now have the additional feature that the q variables in Ay have a multivariate normal distribution.

2. Standardized variables.
If y is N_p(μ, Σ), a standardized vector z can be obtained in two ways:

z = (T')^{−1}(y − μ),  (4.4)

where Σ = T'T is factored using the Cholesky procedure in Section 2.7, or

z = (Σ^{1/2})^{−1}(y − μ),  (4.5)

where Σ^{1/2} is the symmetric square root matrix of Σ defined in (2.103) such that Σ = Σ^{1/2}Σ^{1/2}. In either (4.4) or (4.5), it follows from property 1b that z is distributed as N_p(0, I); that is, the z's are independently distributed as N(0, 1). Thus in the multivariate case, a standardized vector of random variables has all means equal to 0, all variances equal to 1, and all correlations equal to 0.

3. Chi-square distribution.
A chi-square random variable with p degrees of freedom is defined as the sum of squares of p independent standard normal random variables. Thus if z is the standardized vector defined in (4.4) or (4.5), then Σ_{i=1}^p z_i² = z'z has the χ²-distribution with p degrees of freedom, denoted χ²_p or χ²(p). From either (4.4) or (4.5) we obtain z'z = (y − μ)'Σ^{−1}(y − μ). Hence

if y is N_p(μ, Σ), then (y − μ)'Σ^{−1}(y − μ) is χ²_p.  (4.6)

4. Normality of marginal distributions.
(a) Any subset of the y's in y has a multivariate normal distribution, with mean vector consisting of the corresponding subvector of μ and covariance matrix composed of the corresponding submatrix of Σ. To illustrate, let y₁ = (y₁, y₂, ..., y_r)' denote the subvector containing the first r elements of y and y₂ = (y_{r+1}, ..., y_p)' consist of the remaining p − r elements. Thus y, μ, and Σ are partitioned as

y = (y₁', y₂')',  μ = (μ₁', μ₂')',  Σ = [[Σ₁₁, Σ₁₂], [Σ₂₁, Σ₂₂]],

where y₁ and μ₁ are r × 1 and Σ₁₁ is r × r. Then y₁ is distributed as N_r(μ₁, Σ₁₁). Here, again, E(y₁) = μ₁ and cov(y₁) = Σ₁₁ hold for any random vector partitioned in this way. But if y is p-variate normal, then y₁ is r-variate normal.
(b) As a special case of the above result, each y_j in y has the univariate normal distribution N(μ_j, σ_jj). The converse of this is not true. If the density of each y_j in y is normal, it does not necessarily follow that y is multivariate normal.

In the next three properties, let the observation vector be partitioned into two subvectors denoted by y and x. Or, alternatively, let x represent some additional variables to be considered along with those in y. Then, as in (3.43) and (3.44),

E[(y', x')'] = (μ_y', μ_x')',  cov[(y', x')'] = Σ = [[Σ_yy, Σ_yx], [Σ_xy, Σ_xx]].

5. Independence.
(a) The subvectors y and x are independent if Σ_yx = O.
(b) Two individual variables y_j and y_k are independent if σ_jk = 0. Note that this is not true for many nonnormal random variables, as illustrated in Section 3.2.1.

6. Conditional distribution.
If y and x are not independent, then Σ_yx ≠ O, and the conditional distribution of y given x, f(y|x), is multivariate normal with

E(y|x) = μ_y + Σ_yx Σ_xx^{−1}(x − μ_x)  (4.7)

and

cov(y|x) = Σ_yy − Σ_yx Σ_xx^{−1} Σ_xy.  (4.8)
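A brief simulation sketch of properties 2 and 3 (the parameters are hypothetical; numpy's lower-triangular Cholesky factor plays the role of T' in (4.4)):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.5],
                  [0.5, 1.5, 2.0]])
n, p = 100_000, 3
y = rng.multivariate_normal(mu, Sigma, size=n)

# Property 2: z = (T')^{-1}(y - mu) with Sigma = T'T.
# numpy returns lower-triangular L with Sigma = L L', so L serves as T'.
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, (y - mu).T).T      # each row is a standardized vector
print(z.mean(axis=0).round(2))            # approximately 0
print(np.cov(z.T).round(2))               # approximately I

# Property 3: (y - mu)' Sigma^{-1} (y - mu) = z'z should be chi-square(p)
d2 = (z ** 2).sum(axis=1)
print(np.mean(d2 <= chi2.ppf(0.95, df=p)))   # approximately 0.95
```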

Note that E(y|x) is a linear function of x, while cov(y|x) does not depend on x. The linear trend in (4.7) extends to any pair of variables. Thus to use (4.7) as a check on normality, one can examine bivariate scatter plots of all pairs of variables and look for any nonlinear trends. In (4.7), we also have the justification for using the covariance or correlation to measure the relationship between two bivariate normal random variables. As noted in Section 3.2.1, the covariance and correlation are good measures of relationship only for variables with linear trends and are generally unsuitable for nonnormal random variables with a curvilinear relationship. The matrix Σ_yx Σ_xx^{−1} in (4.7) is called the matrix of regression coefficients because it relates E(y|x) to x. The sample counterpart of this matrix appears in (10.37).

7. Distribution of the sum of two subvectors.
If y and x are the same size (both p × 1) and independent, then

y + x is N_p(μ_y + μ_x, Σ_yy + Σ_xx),  (4.9)

y − x is N_p(μ_y − μ_x, Σ_yy + Σ_xx).

Here, again, the mean vector and covariance matrix for y + x hold in general. But if y and x are multivariate normal, then y + x is multivariate normal.

To illustrate property 6, we discuss the conditional distribution for the bivariate normal. Let

u = (y, x)'

with

E(u) = (μ_y, μ_x)'  and  cov(u) = Σ = [[σ_y², σ_yx], [σ_yx, σ_x²]].

By definition, f(y|x) = g(y, x)/h(x), where g(y, x) is the joint density of y and x and h(x) is the density of x. Hence

g(y, x) = f(y|x) h(x),

and because the right side is a product, we seek a function of y and x that is independent of x and whose density can serve as f(y|x). Since linear functions of y and x are normal by property 1a above, we consider y − βx and seek the value of β so that y − βx and x are independent.

Since z = y − βx and x are normal and independent, cov(x, z) = 0. To find cov(x, z), we express x and z as functions of u:

x = (0, 1)u = a'u,
z = y − βx = (1, −β)u = b'u.

Now

cov(x, z) = cov(a'u, b'u) = a'Σb  [by (3.66)]
          = (0, 1)[[σ_y², σ_yx], [σ_yx, σ_x²]](1, −β)'
          = σ_yx − βσ_x².  (4.10)

Since cov(x, z) = 0, we obtain β = σ_yx/σ_x², and y − βx becomes

y − (σ_yx/σ_x²)x.

By property 1a above, the density of y − (σ_yx/σ_x²)x is normal with

E(y − (σ_yx/σ_x²)x) = μ_y − (σ_yx/σ_x²)μ_x

and

var(b'u) = b'Σb = (1, −σ_yx/σ_x²)[[σ_y², σ_yx], [σ_yx, σ_x²]](1, −σ_yx/σ_x²)' = σ_y² − σ_yx²/σ_x².

For a given value of x, y can be expressed as y = βx + (y − βx), where βx is a fixed quantity corresponding to the given value of x and y − βx is a random deviation. Then f(y|x) is normal, with

E(y|x) = βx + E(y − βx) = βx + μ_y − βμ_x = μ_y + (σ_yx/σ_x²)(x − μ_x),

var(y|x) = σ_y² − σ_yx²/σ_x².
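The conditional mean and covariance in (4.7) and (4.8) translate directly into code. A sketch with hypothetical partitioned parameters:

```python
import numpy as np

# Partitioned mu and Sigma for (y, x); all values are hypothetical.
mu_y = np.array([1.0, 2.0])
mu_x = np.array([0.0, -1.0])
S_yy = np.array([[4.0, 1.0], [1.0, 3.0]])
S_yx = np.array([[1.5, 0.5], [0.5, 1.0]])
S_xx = np.array([[2.0, 0.3], [0.3, 1.5]])

x = np.array([0.5, -0.5])
Sxx_inv = np.linalg.inv(S_xx)

# E(y|x) = mu_y + S_yx S_xx^{-1}(x - mu_x), eq. (4.7)
cond_mean = mu_y + S_yx @ Sxx_inv @ (x - mu_x)
# cov(y|x) = S_yy - S_yx S_xx^{-1} S_xy, eq. (4.8); note it does not depend on x
cond_cov = S_yy - S_yx @ Sxx_inv @ S_yx.T
print(cond_mean, cond_cov)
```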

4.3 ESTIMATION IN THE MULTIVARIATE NORMAL

4.3.1 Maximum Likelihood Estimation

When a distribution such as the multivariate normal is assumed to hold for a population, estimates of the parameters are often found by the method of maximum likelihood. This technique is conceptually simple: The observation vectors y₁, y₂, ..., y_n are considered to be known, and values of μ and Σ are sought that maximize the joint density of the y's, called the likelihood function. For the multivariate normal, the maximum likelihood estimates of μ and Σ are

μ̂ = ȳ,  (4.11)

Σ̂ = (1/n) Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)' = (1/n) W = ((n − 1)/n) S,  (4.12)

where S is the sample covariance matrix defined in (3.20) and (3.25). Since Σ̂ has divisor n instead of n − 1, it is biased [see (3.31)], and we usually use S in place of Σ̂.

We now give a justification of ȳ as the maximum likelihood estimator of μ. Because the y_i's constitute a random sample, they are independent, and the joint density is the product of the densities of the y's. The likelihood function is, therefore,

L(y₁, y₂, ..., y_n; μ, Σ) = ∏_{i=1}^n f(y_i; μ, Σ) = 1/((√(2π))^{np} |Σ|^{n/2}) e^{−Σ_{i=1}^n (y_i−μ)'Σ^{−1}(y_i−μ)/2}.  (4.13)

To see that μ̂ = ȳ maximizes the likelihood function, we write the exponent of (4.13) in a different form. By adding and subtracting ȳ, the exponent sum in (4.13) becomes

Σ_{i=1}^n (y_i − ȳ + ȳ − μ)'Σ^{−1}(y_i − ȳ + ȳ − μ).

When this is expanded in terms of y_i − ȳ and ȳ − μ, two of the four resulting terms vanish because Σ_i(y_i − ȳ) = 0, and (4.13) becomes

L = 1/((√(2π))^{np} |Σ|^{n/2}) e^{−Σ_{i=1}^n (y_i−ȳ)'Σ^{−1}(y_i−ȳ)/2} e^{−n(ȳ−μ)'Σ^{−1}(ȳ−μ)/2}.  (4.14)

Since Σ^{−1} is positive definite, we have −n(ȳ − μ)'Σ^{−1}(ȳ − μ)/2 ≤ 0 and 0 < e^{−n(ȳ−μ)'Σ^{−1}(ȳ−μ)/2} ≤ 1, with the maximum occurring when the exponent is 0. Therefore, L is maximized when μ̂ = ȳ.

The maximum likelihood estimator of the population correlation matrix is the sample correlation matrix, that is,

P̂_ρ = R.

Relationships among multivariate normal variables are linear, as can be seen in (4.7). Thus the estimators S and R serve well for the multivariate normal because they measure only linear relationships (see Sections 3.2.1 and 4.2). They are not as useful for some nonnormal distributions.
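A short sketch contrasting the maximum likelihood estimators (4.11) and (4.12) with the unbiased estimator S (simulated data, since no data set accompanies this passage):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.multivariate_normal([0, 0, 0], np.eye(3), size=50)
n = y.shape[0]

mu_hat = y.mean(axis=0)               # eq. (4.11): mu-hat = y-bar
S = np.cov(y, rowvar=False)           # unbiased S, divisor n - 1
Sigma_hat = (n - 1) / n * S           # eq. (4.12): biased ML estimator
R = np.corrcoef(y, rowvar=False)      # ML estimate of the correlation matrix
print(mu_hat, Sigma_hat, R)
```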

4.3.2 Distribution of ȳ and S

For the distribution of ȳ = Σ_{i=1}^n y_i/n, we can distinguish two cases:

(a) When ȳ is based on a random sample y₁, y₂, ..., y_n from a multivariate normal distribution N_p(μ, Σ), then ȳ is N_p(μ, Σ/n).

(b) When ȳ is based on a random sample y₁, y₂, ..., y_n from a nonnormal multivariate population with mean vector μ and covariance matrix Σ, then for large n, ȳ is approximately N_p(μ, Σ/n). More formally, this result is known as the multivariate central limit theorem: If ȳ is the mean vector of a random sample y₁, y₂, ..., y_n from a population with mean vector μ and covariance matrix Σ, then as n → ∞, the distribution of √n(ȳ − μ) approaches N_p(0, Σ).

There are p variances in S and (p choose 2) covariances, for a total of

p + (p choose 2) = p + p(p − 1)/2 = p(p + 1)/2

distinct entries. The joint distribution of these p(p + 1)/2 distinct variables in W = (n − 1)S = Σ_i(y_i − ȳ)(y_i − ȳ)' is the Wishart distribution, denoted by W_p(n − 1, Σ), where n − 1 is the degrees of freedom.

The Wishart distribution is the multivariate analogue of the χ²-distribution, and it has similar uses. As noted in property 3 of Section 4.2, a χ² random variable is defined formally as the sum of squares of independent standard normal (univariate) random variables:

Σ_{i=1}^n (y_i − μ)²/σ² = Σ_{i=1}^n z_i² is χ²_n.

If ȳ is substituted for μ, then Σ_i(y_i − ȳ)²/σ² = (n − 1)s²/σ² is χ²_{n−1}. Similarly, the formal definition of a Wishart random variable is

Σ_{i=1}^n (y_i − μ)(y_i − μ)' is W_p(n, Σ),  (4.15)

where y₁, y₂, ..., y_n are independently distributed as N_p(μ, Σ). When ȳ is substituted for μ, the distribution remains Wishart with one less degree of freedom:

(n − 1)S = Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)' is W_p(n − 1, Σ).  (4.16)

Finally, we note that when sampling from a multivariate normal distribution, ȳ and S are independent.
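The Wishart property (4.16) can be checked by simulation; SciPy also samples the distribution directly. A sketch under a hypothetical Σ (the check uses the known mean E[W] = (n − 1)Σ of a W_p(n − 1, Σ) matrix):

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.7], [0.7, 1.0]])
n, p, reps = 30, 2, 5_000

# (n-1)S from normal samples should behave like W_p(n-1, Sigma), eq. (4.16)
W_sum = np.zeros((p, p))
for _ in range(reps):
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W_sum += (n - 1) * np.cov(y, rowvar=False)
print(W_sum / reps)                       # approximately (n-1) * Sigma

# Sampling the Wishart directly gives the same mean
W = wishart(df=n - 1, scale=Sigma)
print(W.rvs(size=reps, random_state=rng).mean(axis=0))
```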

4.4 ASSESSING MULTIVARIATE NORMALITY

Many tests and graphical procedures have been suggested for evaluating whether a data set likely originated from a multivariate normal population. One possibility is to check each variable separately for univariate normality. Excellent reviews for both the univariate and multivariate cases have been given by Gnanadesikan (1977, pp. 161-195) and Seber (1984, pp. 141-155). We give a representative sample of univariate and multivariate methods in Sections 4.4.1 and 4.4.2, respectively.

4.4.1 Investigating Univariate Normality

When we have several variables, checking each for univariate normality should not be the sole approach, because (1) the variables are correlated and (2) normality of the individual variables does not guarantee joint normality. On the other hand, it is true that multivariate normality implies individual normality. Hence if even one of the separate variables is not normal, the vector is not multivariate normal. An initial check on the individual variables may therefore be useful.

A basic graphical approach for checking normality is the Q-Q plot that compares quantiles of a sample against the population quantiles of the univariate normal. If the points are close to a straight line, there is no indication of departure from normality. Deviation from a straight line indicates nonnormality (at least for a large sample). In fact, the type of nonlinear pattern may reveal the type of departure from normality. Some possibilities are illustrated in Figure 4.5.

Quantiles are similar to the more familiar percentiles, which are expressed in terms of percent; a test score at the 90th percentile, for example, is above 90% of the test scores and below 10% of them. Quantiles are expressed in terms of fractions or proportions. Thus the 90th percentile score becomes the .9 quantile.

The sample quantiles for the Q-Q plot are obtained as follows. First, we rank the observations y₁, y₂, ..., y_n and denote the ordered values by y_(1), y_(2), ..., y_(n); thus y_(1) ≤ y_(2) ≤ ··· ≤ y_(n). Then the point y_(i) is the i/n sample quantile. For example, if n = 20, y_(7) is the 7/20 = .35 quantile, because .35 of the sample is less than or equal to y_(7). The fraction i/n is often changed to (i − ½)/n as a continuity correction. If n = 20, (i − ½)/n ranges from .025 to .975 and more evenly covers the interval from 0 to 1. With this convention, y_(i) is designated as the (i − ½)/n sample quantile.

The population quantiles for the Q-Q plot are similarly defined corresponding to (i − ½)/n. If we denote these by q₁, q₂, ..., q_n, then q_i is the value below which a proportion (i − ½)/n of the observations in the population lie; that is, (i − ½)/n is the probability of getting an observation less than or equal to q_i. Formally, q_i can be found for the standard normal random variable y with distribution N(0, 1) by solving

Φ(q_i) = P(y < q_i) = (i − ½)/n,  (4.17)

which would require numerical integration or tables of the cumulative standard normal distribution, Φ(x). Another benefit of using (i − ½)/n instead of i/n is that n/n = 1 would make q_n = ∞.

The population need not have the same mean and variance as the sample, since changes in mean and variance merely change the slope and intercept of the plotted line in the Q-Q plot. Therefore, we use the standard normal distribution, and the q_i values can easily be found from a table of cumulative standard normal probabilities. We then plot the pairs (q_i, y_(i)) and examine the resulting Q-Q plot for linearity.

Special graph paper, called normal probability paper, is available that eliminates the need to look up the q_i values. We need only plot (i − ½)/n in place of q_i, that is, plot the pairs [(i − ½)/n, y_(i)], and look for linearity as before. As an even easier alternative, most general-purpose statistical software programs now routinely provide normal probability plots of the pairs (q_i, y_(i)).

Figure 4.5 Typical Q-Q plots for nonnormal data: quantiles of a distribution with heavier tails than the normal, quantiles of a distribution with thinner tails than the normal, and quantiles of a positively skewed distribution, each plotted against quantiles of the normal.

The Q-Q plots provide a good visual check on normality and are considered to be adequate for this purpose by many researchers. For those who desire a more objective procedure, several hypothesis tests are available. We give three of these that have good properties and are computationally tractable.

We discuss first a classical approach based on the following measures of skewness and kurtosis:

√b₁ = √n Σ_{i=1}^n (y_i − ȳ)³ / [Σ_{i=1}^n (y_i − ȳ)²]^{3/2},  (4.18)

b₂ = n Σ_{i=1}^n (y_i − ȳ)⁴ / [Σ_{i=1}^n (y_i − ȳ)²]².  (4.19)

These are sample estimates of the population skewness and kurtosis parameters √β₁ and β₂, respectively. When the population is normal, √β₁ = 0 and β₂ = 3. If √β₁ < 0, we have negative skewness; if √β₁ > 0, the skewness is positive. Positive skewness is illustrated in Figure 4.6. If β₂ < 3, we have negative kurtosis, and if β₂ > 3, there is positive kurtosis. A distribution with negative kurtosis is characterized by being flatter than the normal distribution, that is, less peaked, with heavier flanks and thinner tails. A distribution with positive kurtosis has a higher peak than the normal, with an excess of values near the mean and in the tails but with thinner flanks. Positive and negative kurtosis are illustrated in Figure 4.7.

Figure 4.6 A distribution with positive skewness.

Figure 4.7 Distributions with positive and negative kurtosis compared to the normal.
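A sketch implementing (4.18) and (4.19), together with Q-Q plot coordinates built from the (i − ½)/n convention described above (simulated data; the helper names are mine):

```python
import numpy as np
from scipy.stats import norm

def sqrt_b1(y):
    d = y - y.mean()
    n = len(y)
    return np.sqrt(n) * (d ** 3).sum() / (d ** 2).sum() ** 1.5   # eq. (4.18)

def b2(y):
    d = y - y.mean()
    n = len(y)
    return n * (d ** 4).sum() / (d ** 2).sum() ** 2              # eq. (4.19)

rng = np.random.default_rng(3)
y = rng.normal(size=200)
print(sqrt_b1(y), b2(y))     # near 0 and 3 for normal data

# Q-Q plot coordinates: population quantiles solve eq. (4.17)
i = np.arange(1, len(y) + 1)
q = norm.ppf((i - 0.5) / len(y))
pairs = np.column_stack([q, np.sort(y)])   # plot these and look for linearity
```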

The test of normality can be carried out using the exact percentage points for √b₁ in Table 4.1 for 4 ≤ n ≤ 25, as given by Mulholland (1977). Alternatively, for n ≥ 8 the function g(√b₁) as defined by

g(√b₁) = δ sinh^{−1}(√b₁ / λ)  (4.20)

is approximately N(0, 1), where

sinh^{−1}(x) = ln(x + √(x² + 1)).  (4.21)

Table 4.2, from D'Agostino and Pearson (1973), gives values for δ and 1/λ. To use b₂ as a test of normality, we can use Table 4.3, obtained from D'Agostino and Tietjen (1971), which gives simulated percentiles of b₂ for selected values of n in the range 7 ≤ n ≤ 50. Charts of percentiles of b₂ for 20 ≤ n ≤ 200 can be found in D'Agostino and Pearson (1973).

Our second test for normality was given by D'Agostino (1971). The observations y₁, y₂, ..., y_n are ordered as y_(1) ≤ y_(2) ≤ ··· ≤ y_(n), and we calculate

D = Σ_{i=1}^n [i − ½(n + 1)] y_(i) / (n² √(Σ_{i=1}^n (y_i − ȳ)²/n))  (4.22)

and

Y = √n [D − .28209479] / .02998598.  (4.23)

A table of percentiles for Y, given by D'Agostino (1972) for 10 ≤ n ≤ 250, is provided in Table 4.4.

The final test we report is by Lin and Mudholkar (1980). The test statistic is

z = tanh^{−1}(r) = ½ ln[(1 + r)/(1 − r)],  (4.24)

where r is the sample correlation of the n pairs (y_i, x_i), i = 1, 2, ..., n, with x_i defined as

x_i = { [Σ_{j≠i} y_j² − (Σ_{j≠i} y_j)²/(n − 1)] / n }^{1/3}.  (4.25)

If the y's are normal, z is approximately N(0, 3/n). A more accurate upper 100α percentile is given by

z_α = σ_n [u_α + (1/24)(u_α³ − 3u_α) γ_{2n}],  (4.26)

with

σ_n² = 3/n − 7.324/n² + 53.005/n³,  u_α = Φ^{−1}(α),  γ_{2n} = −11.70/n + 55.06/n².
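Both tests are short to implement. The following sketch follows (4.22)-(4.25) as reconstructed above; the constants in Y are those given in (4.23), and the helper names are mine:

```python
import numpy as np

def dagostino_D_Y(y):
    """D'Agostino's D, eq. (4.22), and its standardized form Y, eq. (4.23)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    m2 = ((y - y.mean()) ** 2).sum() / n
    D = ((np.arange(1, n + 1) - (n + 1) / 2) * y).sum() / (n ** 2 * np.sqrt(m2))
    Y = np.sqrt(n) * (D - 0.28209479) / 0.02998598
    return D, Y

def lin_mudholkar_z(y):
    """Lin-Mudholkar z, eq. (4.24), with x_i as in eq. (4.25)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s, s2 = y.sum(), (y ** 2).sum()
    # leave-one-out sum of squares about the leave-one-out mean, cube-rooted
    x = (((s2 - y ** 2) - (s - y) ** 2 / (n - 1)) / n) ** (1 / 3)
    r = np.corrcoef(y, x)[0, 1]
    return np.arctanh(r)       # approximately N(0, 3/n) under normality

rng = np.random.default_rng(4)
y = rng.normal(loc=10, scale=2, size=100)
print(dagostino_D_Y(y), lin_mudholkar_z(y))
```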

4.4.2 Investigating Multivariate Normality

A basic graphical procedure for the multivariate case is based on the squared generalized distances

D_i² = (y_i − ȳ)'S^{−1}(y_i − ȳ),  i = 1, 2, ..., n.  (4.27)

Gnanadesikan and Kettenring (1972) showed that if the y_i's are multivariate normal, then

u_i = n D_i² / (n − 1)²  (4.28)

has a beta distribution, which is related to the F. To obtain a Q-Q plot, the values u₁, u₂, ..., u_n are ranked to give u_(1) ≤ u_(2) ≤ ··· ≤ u_(n), and these are plotted against the quantiles of the beta distribution,

v_i = (i − α)/(n − α − β + 1),  (4.29)

where

α = (p − 2)/(2p)  (4.30)

and

β = (n − p − 3)/[2(n − p − 1)].  (4.31)

A nonlinear pattern in the plot of the pairs (v_i, u_(i)) would indicate a departure from multivariate normality, and points far from the line may indicate outliers.

A more formal approach uses the multivariate skewness and kurtosis defined by Mardia (1970),

β_{1,p} = E[(y − μ)'Σ^{−1}(x − μ)]³,  (4.32)

where x and y are independent and identically distributed, and

β_{2,p} = E[(y − μ)'Σ^{−1}(y − μ)]².  (4.33)

Since third-order central moments for the multivariate normal distribution are zero, β_{1,p} = 0 when y is N_p(μ, Σ). It can also be shown that for multivariate normal y,

β_{2,p} = p(p + 2).  (4.34)

If we define which is approximately N(0, l). For the lower 25Va points we have two cases: (a) when 50 < ¡¿ < 400, use

g_ij = (y_i − ȳ)'Σ̂^{−1}(y_j − ȳ),  (4.35)

where Σ̂ = Σ_i(y_i − ȳ)(y_i − ȳ)'/n is the maximum likelihood estimator (4.12), then sample estimates of β_{1,p} and β_{2,p} are given by

b_{1,p} = (1/n²) Σ_{i=1}^n Σ_{j=1}^n g_ij³,  (4.36)

b_{2,p} = (1/n) Σ_{i=1}^n g_ii².  (4.37)

With b_{1,p}, we wish to reject for large values. The statistic

z₁ = (p + 1)(n + 1)(n + 3) b_{1,p} / {6[(n + 1)(p + 1) − 6]}  (4.38)

is approximately χ² with (1/6)p(p + 1)(p + 2) degrees of freedom. Reject if z₁ is large. With b_{2,p}, on the other hand, we wish to reject for large values (distribution too peaked) or small values (distribution too flat). For the upper 2.5% points of b_{2,p} use

z₂ = [b_{2,p} − p(p + 2)] / √(8p(p + 2)/n),  (4.39)

which is approximately N(0, 1). For the lower 2.5% points we have two cases: (a) when 50 ≤ n < 400, use

z₃ = [b_{2,p} − p(p + 2)(n + p + 1)/n] / √(8p(p + 2)/(n − 1)),  (4.40)

which is approximately N(0, 1); (b) when n ≥ 400, use z₂ as given by (4.39).
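A sketch of the skewness and kurtosis statistics (4.36)-(4.37) and the approximate tests (4.38)-(4.39), computed from the g_ij matrix of (4.35) (simulated data; the helper name is mine):

```python
import numpy as np
from scipy.stats import chi2

def mardia(Y):
    """Multivariate skewness b_{1,p} (4.36) and kurtosis b_{2,p} (4.37),
    with the approximate tests z1 (4.38) and z2 (4.39)."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    d = Y - Y.mean(axis=0)
    Sig_hat = d.T @ d / n                   # ML estimator, divisor n
    G = d @ np.linalg.inv(Sig_hat) @ d.T    # G[i, j] = g_ij of eq. (4.35)
    b1p = (G ** 3).sum() / n ** 2
    b2p = (np.diag(G) ** 2).sum() / n
    z1 = (p + 1) * (n + 1) * (n + 3) * b1p / (6 * ((n + 1) * (p + 1) - 6))
    df = p * (p + 1) * (p + 2) / 6
    z2 = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return b1p, b2p, z1, chi2.sf(z1, df), z2

rng = np.random.default_rng(5)
Y = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200)
print(mardia(Y))   # b_{1,p} near 0 and b_{2,p} near p(p+2) = 15 for normal data
```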

Fortran programs for the above tests based on b_{1,p} and b_{2,p} are given by Siotani et al. (1985). They also provide programs for three additional tests for multivariate normality and several tests for univariate normality. Many of these programs are apparently unavailable elsewhere. The three multivariate tests are, briefly, as follows:

1. A test based on the third and fourth central moments,

E[(y_i − μ_i)(y_j − μ_j)(y_k − μ_k)]  (4.41)

and

E[(y_i − μ_i)(y_j − μ_j)(y_k − μ_k)(y_l − μ_l)].  (4.42)

Under normality, (4.41) is zero and (4.42) is equal to σ_ij σ_kl + σ_ik σ_jl + σ_il σ_jk. Estimates of (4.41) and (4.42) are obtained and compared to 0 and s_ij s_kl + s_ik s_jl + s_il s_jk, respectively.

2. A multivariate generalization of the Shapiro-Wilk test: Define z_i = c'y_i, i = 1, 2, ..., n, where c is a constant vector, and

W(c) = [Σ_{i=1}^n a_i z_(i)]² / Σ_{i=1}^n (z_i − z̄)²,  (4.43)

where z_(1) ≤ z_(2) ≤ ··· ≤ z_(n) are the ordered values of z₁, z₂, ..., z_n and the a_i's are coefficients tabulated in Shapiro and Wilk (1965). The hypothesis of multivariate normality is accepted if

max_c [W(c)] ≥ k,  (4.44)

where k corresponds to the desired significance level, α.

3. A directional test: This test is based on an alternative definition of the multivariate normal distribution suggested by property 1a of Section 4.2: If a'y is N(a'μ, a'Σa) for all a, then y is N_p(μ, Σ). First the data vectors y₁, y₂, ..., y_n are standardized by z_i = (S^{1/2})^{−1}(y_i − ȳ), i = 1, 2, ..., n, where S^{1/2} is the square root matrix given in (2.103). Then each z_i is multiplied by a direction vector d₀ to obtain u_i = d₀'z_i, i = 1, 2, ..., n. The u's are approximately normal if the y's are multivariate normal. Several values of d₀ are used to check for normality in different directions. Various univariate normal tests can be applied to the u's.

Table 4.1 Comparison of Number of Subsets of Sizes 2 and 3

p     (p choose 2)     (p choose 3)
6          15               20
8          28               56
10         45              120
12         66              220
15        105              455

4.5 OUTLIERS

The detection of outliers has been of concern to statisticians and other scientists for over a century. Many authors have claimed that the researcher can typically expect up to 10% of the observations to have errors in measurement or recording. Occasional stray observations from a different population than the target population are also fairly common. We do not attempt a complete summary of the vast literature covering univariate outliers, but we do review some major concepts and suggested procedures in Section 4.5.1 before moving to the multivariate case in Section 4.5.2. An alternative to detection of outliers is to use robust estimators of μ and Σ (see Rencher 1997, Section 1.10) that are less sensitive to extreme observations than are the standard estimators ȳ and S.

4.5.1 Outliers in Univariate Samples

Excellent surveys of the useful literature on outliers have been given by Beckman and Cook (1983), Hawkins (1980), and Barnett and Lewis (1978). We abstract a few highlights from Beckman and Cook. Many techniques have been proposed for detecting outliers in the residuals from regression, designed experiments, and so on. But we will be concerned only with simple random samples from the normal distribution. Outliers are also known as discordant observations or contaminants, which imply a discrepancy from what was expected and an origin from a nontarget population, respectively.

There are two principal approaches for dealing with outliers. The first is identification, which usually involves deletion of the outlier(s) but may alternatively provide important information about the model or the data. The second method involves accommodation, by modifying the method of analysis or the model. Robust methods, in which the influence of outliers is reduced, are the most familiar example of modification of the analysis. An example of a correction to the model is a mixture model that combines two normals with different variances, sometimes used to accommodate contaminants. For example, Marks and Rao (1978) accommodated a particular type of outlier due to patient fatigue by a mixture of two normal distributions.

In small or moderate sized univariate samples, visual methods of identifying outliers are the most frequently used. Tests are also available if a less subjective approach is desired.

Two types of slippage models have been proposed to account for outliers. Under the mean slippage model, all observations have the same variance, but one or more of the observations arise from a distribution with a different (population) mean. In the variance slippage model, one or more of the observations arise from a model with larger (population) variance but the same mean. Thus in the mean slippage model, the bulk of the observations arise from N(μ, σ²), while the outliers originate from N(μ + θ, σ²). For the variance slippage model, the main distribution would again be N(μ, σ²), with the outliers coming from N(μ, aσ²), where a > 1. These models have led to the development of tests for rejection of outliers. We now briefly discuss some of these tests.

For a single outlier, most tests are based on the maximum studentized residual,

max_i τ_i = max_i |(y_i − ȳ)/s|.  (4.45)

If the largest or smallest observation is rejected, one could then examine the n − 1 remaining observations for another possible outlier, and so on. This procedure is called a consecutive test. However, if there are two or more outliers, the less extreme ones will often make it difficult to detect the most extreme one, due to inflation of both mean and variance. This effect is called masking. Ferguson (1961) showed that the maximum studentized residual (4.45) is more powerful than most other techniques for detecting intermediate or large shifts in the mean and gave the following guidelines for small shifts:

1. For outliers with small positive shifts in the mean, tests based on sample skewness are best.

2. For outliers with small shifts in the mean in either direction, tests based on the sample kurtosis are best.

3. For outliers with small positive shifts in the variance, tests based on the sample kurtosis are best.

Because of the masking problem in consecutive tests, block tests have been proposed for simultaneous rejection of k > 1 outliers. These tests work well if k is known, but in practice, it is usually not known. If the value we conjecture for k is too small, we incur the risk of failing to detect any outliers because of masking. If we set k too large, there is a high risk of rejecting more outliers than there really are, an effect known as swamping.

4.5.2 Outliers in Multivariate Samples

In the case of multivariate data, the problems in detecting outliers are intensified for several reasons:

1. For p > 2 the data cannot be readily plotted to pinpoint the outliers.

2. Multivariate data cannot be ordered as can a univariate sample, where extremes show up readily on either end.

3. An observation vector may have a large recording error in one of its components or smaller errors in several components.

4. A multivariate outlier may reflect slippage in mean, variance, or correlation. This is illustrated in Figure 4.8. Observation 1 causes a small shift in means and variances of both y₁ and y₂ but has little effect on the correlation. Observation 2 has little effect on means and variances, but it reduces the correlation somewhat. Observation 3 has a major effect on means, variances, and correlation.

Figure 4.8 Bivariate sample showing three types of outliers.

Of course, as in the univariate case, one approach to outlier identification or accommodation is to use robust methods of estimation. Such methods minimize the influence of outliers in estimation or model fitting. However, an outlier sometimes furnishes valuable information, and the specific pursuit of outliers can be very worthwhile.

We present two methods of multivariate outlier identification, both of which turn out to be related to methods of assessing multivariate normality. (A third approach, based on principal components, is given in Section 12.4.) The first method, due to Wilks (1963), is designed for detection of a single outlier. The Wilks' statistic is

w = max_i |(n − 2)S_{−i}| / |(n − 1)S|,  (4.46)

where S is the usual sample covariance matrix and S_{−i} is obtained from the same sample with the ith observation deleted. It turns out that w can be expressed in terms of D_(n)² = max_i (y_i − ȳ)'S^{−1}(y_i − ȳ) as

w = 1 − n D_(n)² / (n − 1)²,  (4.47)

thus basing a test for an outlier on the distances D_i² used in Section 4.4.2 in a graphical procedure for checking multivariate normality. Table 4.6 gives the upper 5% and 1% critical values for D_(n)² from Barnett and Lewis (1978).

Yang and Lee (1987) provide an F-test of w as given by (4.47). Define

F_i = [(n − p − 1)/p] [1/(1 − nD_i²/(n − 1)²) − 1],  i = 1, 2, ..., n.  (4.48)

Then the F_i are independently and identically distributed as F_{p,n−p−1}, and a test can be constructed in terms of max_i F_i:

P(max_i F_i > f) = 1 − P(all F_i ≤ f) = 1 − [P(F_{p,n−p−1} ≤ f)]ⁿ.

In terms of Wilks' statistic w of (4.47), the maximum F value is

max_i F_i = [(n − p − 1)/p](1/w − 1).  (4.49)

The second method uses the kurtosis statistic b_{2,p} of (4.37) as an outlier test statistic for distributions with densities of the form f(y) = c|Σ|^{−1/2} g[(y − μ)'Σ^{−1}(y − μ)]. By varying the function g, distributions with shorter or longer tails than the normal can be obtained. Of course, the critical value of b_{2,p} would have to be adjusted to correspond to the distribution, but rejection for large values would be a locally best invariant test.

Example 4.5.2. We illustrate these procedures with the ramus bone data of Table 3.8. The values of D_i² as given by (4.27) are shown in Table 4.2, and the Q-Q plot of the u_i and v_i from (4.28) and (4.29) appears in Figure 4.9.

Table 4.2 Values of D_i² for the Ramus Bone Data in Table 3.8

Observation Number   D_i²      Observation Number   D_i²
1                    0.7588    11                   2.8301
⋮                    ⋮         ⋮                    ⋮

Figure 4.9 Q-Q plot of u_i and v_i for the ramus bone data of Table 3.8.

From Table 4.6, the upper 5% critical value for D_(n)² is given as 11.63. In our case, the largest value is D_(n)² = 11.03, which does not exceed the critical value. This does not surprise us, since the test was designed to detect a single outlier, and we may have at least three.

We next calculate b_{1,p} and b_{2,p} as given by (4.36) and (4.37):

b_{1,p} = 11.338,  b_{2,p} = 28.884.

In Table 4.5, the upper .01 critical value for b_{1,p} is 9.9; the upper .005 critical value for b_{2,p} is 27.1. Thus both b_{1,p} and b_{2,p} exceed their critical values, and we have significant skewness and kurtosis, apparently caused by the three observations with large values of D_i².

The bivariate scatter plots are given in Figure 4.10. The three values are clearly separate from the other observations in the plot of y₁ versus y₄. In Table 3.8, the 9th, 12th, and 20th values of y₄ are not unusual, nor are the 9th, 12th, and 20th values of y₁. However, the increase from y₁ to y₄ is exceptional in each case. If these values are not due to errors in recording the data and if this sample is representative, then we appear to have a mixture of two populations. This should be taken into account in making inferences.

Figure 4.10 Scatter plots for the ramus bone data in Table 3.8.
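The quantities used in this example are straightforward to compute. A sketch using (4.27), (4.47), (4.48), and the max-F test above (the ramus bone data are not reproduced here, so simulated data with one planted outlier stand in; the function name is mine):

```python
import numpy as np
from scipy.stats import f as f_dist

def wilks_outlier_test(Y):
    """Single-outlier statistics: D_i^2 (4.27), Wilks' w (4.47), F_i (4.48),
    and the approximate max-F p-value based on the iid F result."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    d = Y - Y.mean(axis=0)
    S = d.T @ d / (n - 1)
    D2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
    w = 1 - n * D2.max() / (n - 1) ** 2
    F = (n - p - 1) / p * (1 / (1 - n * D2 / (n - 1) ** 2) - 1)
    p_value = 1 - f_dist.cdf(F.max(), p, n - p - 1) ** n
    return D2, w, F.max(), p_value

rng = np.random.default_rng(6)
Y = rng.multivariate_normal(np.zeros(2), np.eye(2), size=25)
Y[0] = [4.5, -4.0]                 # plant an artificial outlier
D2, w, Fmax, pval = wilks_outlier_test(Y)
print(D2.argmax(), w, Fmax, pval)  # observation 0 should be flagged
```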

PROBLEMS

4.1 Consider the two covariance matrices

     [14  8  3]        [6  6  1]
Σ₁ = [ 8  5  2],  Σ₂ = [6  8  2].
     [ 3  2  1]        [1  2  1]

Show that |Σ₂| > |Σ₁| and that tr(Σ₂) < tr(Σ₁). Thus the generalized variance of population 2 is greater than the generalized variance of population 1, even though the total variance is less. Comment on why this is true in terms of the correlations.

4.2 For z = (T')^{−1}(y − μ) as in (4.4), show that E(z) = 0 and cov(z) = I.

4.3 Show that the form of the likelihood function in (4.13) follows from the previous expression.

4.4 Show that by adding and subtracting ȳ, the exponent of (4.13) has the form given in (4.14), that is,

(1/2) Σ_{i=1}^n (y_i − ȳ + ȳ − μ)'Σ^{−1}(y_i − ȳ + ȳ − μ) = (1/2) Σ_{i=1}^n (y_i − ȳ)'Σ^{−1}(y_i − ȳ) + (n/2)(ȳ − μ)'Σ^{−1}(ȳ − μ).

4.5 Show that √b₁ and b₂ as given in (4.18) and (4.19) are invariant to the transformation z_i = ay_i + b.

4.6 Show that if y is N_p(μ, Σ), then β_{2,p} = p(p + 2) as in (4.34).

4.7 Show that b_{1,p} and b_{2,p} as given by (4.36) and (4.37) are invariant under the transformation z_i = Ay_i + b, where A is nonsingular. Thus b_{1,p} and b_{2,p} do not depend on the units of measurement; the variables could even be standardized.

4.8 Show that max_i F_i = [(n − p − 1)/p](1/w − 1), as in (4.49).

4.9 Suppose y is N₃(μ, Σ), where

    [3]       [ 6   1  −2]
μ = [1],  Σ = [ 1  13   4].
    [4]       [−2   4   4]

(a) Find the distribution of z = 2y₁ − y₂ + 3y₃.
(b) Find the joint distribution of z₁ = y₁ + y₂ + y₃ and z₂ = y₁ − y₂ + 2y₃.
(c) Find the distribution of y₂.
(d) Find the joint distribution of y₁ and y₃.
(e) Find the joint distribution of y₁, y₂, and ½(y₁ + y₂).

4.10 Suppose y is N₃(μ, Σ) with μ and Σ given in the previous problem.
(a) Find a vector z such that z = (T')^{−1}(y − μ) is N₃(0, I) as in (4.4).
(b) Find a vector z such that z = (Σ^{1/2})^{−1}(y − μ) is N₃(0, I) as in (4.5).
(c) What is the distribution of (y − μ)'Σ^{−1}(y − μ)?

4.11 Suppose y is N₄(μ, Σ).
(a) Find the distribution of z = 4y₁ − 2y₂ + y₃ − 3y₄.
(b) Find the joint distribution of z₁ = y₁ + y₂ + y₃ + y₄ and z₂ = −2y₁ + 3y₂ + y₃ − 2y₄.
(c) Find the joint distribution of z₁ = 3y₁ + y₂ − 4y₃ − y₄, z₂ = −y₁ − 3y₂ + y₃ − 2y₄, and z₃ = 2y₁ + 2y₂ + 4y₃ − 5y₄.
(d) What is the distribution of y₃?
(e) What is the joint distribution of y₂ and y₄?
(f) Find the joint distribution of y₁, ½(y₁ + y₂), ⅓(y₁ + y₂ + y₃), and ¼(y₁ + y₂ + y₃ + y₄).

4.12 Suppose y is N₄(μ, Σ) with μ and Σ given in the previous problem.
(a) Find a vector z such that z = (T')^{−1}(y − μ) is N₄(0, I), as in (4.4).
(b) Find a vector z such that z = (Σ^{1/2})^{−1}(y − μ) is N₄(0, I), as in (4.5).
(c) What is the distribution of (y − μ)'Σ^{−1}(y − μ)?

4.13 Suppose y is N₃(μ, Σ). Which of the following random variables are independent?
(a) y₁ and y₂
(b) y₁ and y₃
(c) y₂ and y₃
(d) (y₁, y₂) and y₃
(e) (y₁, y₃) and y₂

4.14 Suppose y is N₄(μ, Σ). Which of the following random variables are independent?
(a) y₁ and y₂
(b) y₁ and y₃
(c) y₁ and y₄
(d) y₂ and y₃
(e) y₂ and y₄
(f) y₃ and y₄
(g) (y₁, y₂) and y₃
(h) (y₁, y₂) and y₄
(i) (y₁, y₃) and y₄
(j) y₁ and (y₂, y₄)
(k) y₁ and y₂ and y₃
(l) y₁ and y₂ and y₄
(m) (y₁, y₂) and (y₃, y₄)
(n) (y₁, y₃) and (y₂, y₄)

4.15 Assume y and x are subvectors, each 2 × 1, where (y', x')' is N₄(μ, Σ), with μ and Σ partitioned accordingly.
(a) Find E(y|x) by (4.7).
(b) Find cov(y|x) by (4.8).

4.16 Suppose y and x are subvectors, such that y is 2 × 1 and x is 3 × 1, with μ and Σ partitioned accordingly. Assume that (y', x')' is distributed as N₅(μ, Σ).
(a) Find E(y|x) by (4.7).
(b) Find cov(y|x) by (4.8).

4.17 Suppose that y₁, y₂, ..., y_n is a random sample from a nonnormal multivariate population with mean μ and covariance matrix Σ. If n is large, what is the approximate distribution of each of the following?
(a) √n(ȳ − μ)
(b) ȳ

4.18 For the ramus bone data treated in Example 4.5.2, check each of the four variables for univariate normality using the following techniques:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.19 For the calcium data in Table 3.5, check for multivariate normality and outliers using the following tests:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6.
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

4.20 For the probe word data in Table 3.7, check each of the five variables for univariate normality and outliers using the following tests:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.21 For the probe word data in Table 3.7, check for multivariate normality and outliers using the following tests:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6.
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

4.22 Six hematology variables were measured on 51 workers (Royston 1983):

y₁ = hemoglobin concentration    y₄ = lymphocyte count
y₂ = packed cell volume          y₅ = neutrophil count
y₃ = white blood cell count      y₆ = serum lead concentration

The data are given in Table 4.3. Check each of the six variables for univariate normality using the following tests:
(a) Q-Q plots
(b) √b₁ and b₂ as given by (4.18) and (4.19)
(c) D'Agostino's test using D and Y given in (4.22) and (4.23)
(d) The test by Lin and Mudholkar using z defined in (4.24)

4.23 For the hematology data in Table 4.3, check for multivariate normality using the following techniques:
(a) Calculate D_i² as in (4.27) for each observation.
(b) Compare the largest value of D_i² with the critical value in Table 4.6 (extrapolate).
(c) Compute u_i and v_i in (4.28) and (4.29) and plot them. Is there an indication of nonlinearity or outliers?
(d) Calculate b_{1,p} and b_{2,p} in (4.36) and (4.37) and compare them with critical values in Table 4.5.

Table 4.3 Hematology Data

Observation
Number      y₁     y₂    y₃     y₄   y₅   y₆
1          13.4    39   4100    14   25   17
2          14.6    46   5000    15   30   20
3          13.5    42   4500    19   21   18
4          15.0    46   4600    23   16   18
5          14.6    44   5100    17   31   19
6          14.0    41   4900    20   24   19
7          16.4    49   4300    21   17   18
8          14.8    44   4400    16   26   19
9          15.2    46   4100    22   13   27
10         15.5    48   8400    34   42   36
11         15.2    47   5600    26   27   22
12         16.9    50   5100    28   17   23
13         14.8    44   4700    20   23
14         16.2    45   5600    26   25   19
15         14.7    43   4000    23   13   17
16         14.7    42   3400         22   13
17         16.5    45   5400    18   32   17
18         15.4    45   6900    28   36   21
19         15.1    45   4600    17   29   17
20         14.2    46   4200    14   25   28
21         15.9    46   5200     8   34   16
22         16.0    47   4700    25   14   18
23         17.4    50   8600    31   39   11
24         14.3    43   5500    20   31   19
25         14.8    44   4200    15   24   29
26         14.9    43   4300     9   32   17
27         15.5    45   5200    16   30   20
28         14.5    43   3900    18   18   25
29         14.4    45   6000    17   37   23
30         14.6    44   4100    23   21   27
31         15.3    45   7900    43   23   23
32         14.9    45   3400    17   15   24
33         15.8    47   6000    23   32   21
34         14.4    44   7100    31   39   23
35         14.7    46   3700         23   23
36         14.8    43   5200    25   10   22
37         15.4    45   6000    10   25   18
38         16.2    50   8100    32   38   18
39         15.0    45   4900    17   26   24
40         15.1    47   6000    22   33   16
41         16.0    46   4600    20   22   22
42         15.3    48   5500    20   23   23
43         14.5    41   6200    20   36   21
44         14.2    41   4900    26   20   20
45         15.0    45   7200    40   25   25
46         14.2    46   5800    22   31   22
47         14.9    45   8400    61   17   17
48         16.2    48   3100    12   15   18
49         14.5    45   4000    20   18   20
50         16.4    49   6900    35   22   24
51         14.7    44   7800    38   34   16

CHAPTER 5

Tests on One or Two Mean Vectors

5.1 MULTIVARIATE VERSUS UNIVARIATE TESTS

Hypothesis testing in a multivariate context is more complex than in a univariate setting. The number of parameters may be staggering. The p-variate normal distribution, for example, has p means, p variances, and (p choose 2) covariances, where (p choose 2) represents the number of pairs among the p variables. The total number of parameters is

p + p + (p choose 2) = ½ p(p + 3).

Each parameter corresponds to a hypothesis that could be formulated. Additionally, we might well be interested in testing hypotheses about subsets of these parameters or about functions of them. In some cases, we have the added dilemma of choosing among competing test statistics.

We first discuss the motivation for testing p variables multivariately rather than, or in addition to, univariately. There are at least four arguments for a multivariate approach to hypothesis testing:

1. The use of p univariate tests inflates the Type I error rate, α, whereas the multivariate test preserves the exact α level. For example, if we do p = 10 separate univariate tests at the .05 level, the probability of at least one false rejection is greater than .05. If the variables were independent (they rarely are), we would have (under H₀)

P(at least one rejection) = 1 − P(all 10 tests accept) = 1 − (.95)¹⁰ = .40.
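A one-line check of this familywise error computation (a sketch, not part of the text):

```python
# Probability of at least one false rejection among p independent tests
p, alpha = 10, 0.05
print(1 - (1 - alpha) ** p)    # 0.401..., the overall Type I error rate
```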

The resulting overall α of .40 is not an acceptable error rate. Typically,
