STAT564/STAT476: Multivariate Analysis
Chapter 6: MANOVA

Multivariate analysis of variance (MANOVA) generalizes ANOVA to allow multivariate responses. We'll start by reviewing ANOVA (the balanced case), particularly to develop notation consistent with the MANOVA presentation.

STAT476/STAT576, March 6, 2015

ANOVA review

In balanced one-way ANOVA, there are k samples, one from each of k different populations, each with n observations. The populations being sampled might be individuals subjected to different treatments in a medical experiment, or crops given different fertilizer/watering regimes. If it is not an experiment, the different populations might represent different groups, such as different varieties of a crop, or different ethnicities/nationalities for people.

Values from observations within a particular group are denoted by $y_{ij}$, where $i = 1,\dots,k$ denotes the sample and $j = 1,\dots,n$ denotes the observation within the sample. Note that this is different notation from the previous chapter, where the first index represented the row (observation) and the second index represented the column (variable).

The $i$th group has mean
$$\bar{y}_{i.} = \frac{1}{n}\sum_{j=1}^n y_{ij}$$
and total
$$y_{i.} = \sum_{j=1}^n y_{ij}.$$

The ANOVA model is that each observation is due to an overall mean, a treatment (or population) mean, and an unobserved error term:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij},$$
where $i = 1,\dots,k$ and $j = 1,\dots,n$.

The null hypothesis is $H_0: \mu_1 = \cdots = \mu_k$, and the alternative is $H_1: \mu_i \neq \mu_j$ for some $i \neq j$. (Alternatively, we could express this in terms of the $\alpha_i$s instead of the $\mu_i$s.) The model assumes that all populations have the same variance $\sigma^2$. Under this assumption, we wish to test whether the means differ across the populations.
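As a concrete illustration of this notation, here is a minimal numpy sketch that simulates balanced one-way data and computes the group means, group totals, and grand mean. The group count, sample size, and effect values are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 10                      # k groups, n observations per group

# mu is the overall mean; alpha_i are hypothetical treatment effects
mu, sigma = 5.0, 2.0
alpha = np.array([0.0, 1.0, -1.0])

# y[i, j] holds y_ij: rows index groups, columns index observations
y = mu + alpha[:, None] + rng.normal(0.0, sigma, size=(k, n))

group_means = y.mean(axis=1)      # ybar_i. = (1/n) sum_j y_ij
group_totals = y.sum(axis=1)      # y_i.    = sum_j y_ij
grand_mean = y.mean()             # ybar..  = (1/(nk)) sum_ij y_ij
```

In the balanced case the grand mean equals the mean of the group means, which is why the two index conventions below are interchangeable.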
The basic idea of ANOVA is that if the null hypothesis is true, then the common variance $\sigma^2$ can be estimated either by averaging the variances of the separate samples, or by using the sample variance of the sample means.

The pooled variance is
$$s_e^2 = \frac{1}{k}\sum_{i=1}^k s_i^2 = \frac{1}{k(n-1)}\sum_{i=1}^k\sum_{j=1}^n (y_{ij} - \bar{y}_{i.})^2.$$

The sample variance of the sample means is
$$s_{\bar{y}}^2 = \frac{1}{k-1}\sum_{i=1}^k (\bar{y}_{i.} - \bar{y}_{..})^2,$$
where $\bar{y}_{..}$ is the mean of the observations over all groups (samples) and observations. You can also think of it as the mean of the sample means:
$$\bar{y}_{..} = \frac{1}{nk}\sum_{i=1}^k\sum_{j=1}^n y_{ij} = \frac{1}{k}\sum_{i=1}^k \bar{y}_{i.}.$$

Under the null hypothesis and model assumptions, both sample variances are related to $\sigma^2$:
$$E(s_e^2) = \sigma^2, \qquad E(n s_{\bar{y}}^2) = \sigma^2.$$

If the null hypothesis is false (but all populations have the same variance), then it is still the case that $E(s_e^2) = \sigma^2$. However, $n s_{\bar{y}}^2$ tends to be larger, because it reflects both the sampling variability of the sample means and the variability of the population means themselves. In this case
$$E(n s_{\bar{y}}^2) = \sigma^2 + \frac{n}{k-1}\sum_{i=1}^k \alpha_i^2.$$

The ratio of $n s_{\bar{y}}^2$ and $s_e^2$ has an $F$ distribution under the null hypothesis. It is feasible to work out the distribution because $n s_{\bar{y}}^2$ and $s_e^2$ are independent random variables (under $H_0$). This is a consequence of $\bar{y}$ and $s^2$ being independent for samples from a normal distribution, and of the assumption that each of the $k$ samples is independent. The numerator and denominator are therefore each related to $\chi^2$ random variables, and the ratio of $\chi^2$ random variables is related to the $F$ distribution.

$$F = \frac{n s_{\bar{y}}^2}{s_e^2} = \frac{SSH/(k-1)}{SSE/(k(n-1))} = \frac{MSH}{MSE}$$
has an $F_{k-1,\,k(n-1)}$ distribution.
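The F computation above can be sketched in a few lines of numpy, with scipy's `f_oneway` as a cross-check. The data here are simulated with $H_0$ true, purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, n = 4, 8
# H0 true: every group drawn from the same N(10, 2^2) population
y = rng.normal(10.0, 2.0, size=(k, n))

group_means = y.mean(axis=1)

# pooled variance s_e^2: average within-group squared deviations
s2_e = ((y - group_means[:, None]) ** 2).sum() / (k * (n - 1))

# sample variance of the k group means, s_ybar^2
s2_ybar = group_means.var(ddof=1)

F = n * s2_ybar / s2_e                         # MSH / MSE
p_value = stats.f.sf(F, k - 1, k * (n - 1))    # right-tail area

# cross-check against scipy's built-in one-way ANOVA
F_ref, p_ref = stats.f_oneway(*y)
```

In the balanced case this hand computation agrees exactly with `f_oneway`; for unbalanced data the SSH/SSE formulas generalize but $n s_{\bar{y}}^2$ does not.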
Note that the expected value of the numerator divided by the expected value of the denominator is equal to 1; however, the expected value of a ratio is typically not the ratio of the expected values, and we have
$$E(F_{k-1,\,k(n-1)}) = \frac{k(n-1)}{k(n-1)-2} = \frac{N-k}{N-k-2},$$
where $N = nk$ is the total sample size. This is close to 1 when the denominator degrees of freedom $k(n-1)$ is large. Note that the expected value of an $F$ random variable only depends on the denominator degrees of freedom.

Hypothesis testing is done as a one-sided test, only rejecting $H_0$ for sufficiently large $F$. The $F$ distribution is skewed to the right, and the p-value is the area under the curve to the right of the observed $F$ value.

MANOVA

MANOVA generalizes both the Hotelling $T^2$, which allows two populations with multiple variables on each, and ANOVA, which allows one variable but two or more populations. For the MANOVA setup, we have observation vectors $\mathbf{y}_{ij}$ from sample $i = 1,\dots,k$, with $j = 1,\dots,n$ indexing the observation. Each observation vector $\mathbf{y}_{ij}$ is a $p$-dimensional multivariate normal vector with mean vector $\boldsymbol\mu_i$ and common covariance matrix $\boldsymbol\Sigma$. The setup can be written in a way analogous to balanced one-way ANOVA, with individual observations replaced by observation vectors.

The model can be written as
$$\mathbf{y}_{ij} = \boldsymbol\mu + \boldsymbol\alpha_i + \boldsymbol\varepsilon_{ij} = \boldsymbol\mu_i + \boldsymbol\varepsilon_{ij},$$
where we assume $\mathbf{y}_{ij} \sim N_p(\boldsymbol\mu_i, \boldsymbol\Sigma)$.

We can also write the model componentwise as
$$\begin{pmatrix} y_{ij1} \\ y_{ij2} \\ \vdots \\ y_{ijp} \end{pmatrix} = \begin{pmatrix} \mu_{i1} \\ \mu_{i2} \\ \vdots \\ \mu_{ip} \end{pmatrix} + \begin{pmatrix} \varepsilon_{ij1} \\ \varepsilon_{ij2} \\ \vdots \\ \varepsilon_{ijp} \end{pmatrix},$$
so that for each variable $r = 1,\dots,p$, the model is
$$y_{ijr} = \mu_{ir} + \varepsilon_{ijr}.$$

The null and alternative hypotheses are
$$H_0: \boldsymbol\mu_1 = \cdots = \boldsymbol\mu_k, \qquad H_1: \boldsymbol\mu_i \neq \boldsymbol\mu_j \text{ for at least one pair } i \neq j,$$
i.e., the null says each population has the same mean vector, and the alternative says at least two populations have different mean vectors; equivalently, at least two populations differ in the mean of at least one variable.

The null hypothesis can also be written as $p$ sets of $k-1$ equalities:
$$\mu_{11} = \mu_{21} = \cdots = \mu_{k1}$$
$$\mu_{12} = \mu_{22} = \cdots = \mu_{k2}$$
$$\vdots$$
$$\mu_{1p} = \mu_{2p} = \cdots = \mu_{kp}$$
This is a total of $p(k-1)$ equalities, and any one of these failing is sufficient to make $H_0$ false.

Analogous to the SSH (sums of squares for hypothesis) and SSE (sums of squares for error), we have
$$\mathbf{H} = n\sum_{i=1}^k (\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})(\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})' = \frac{1}{n}\sum_{i=1}^k \mathbf{y}_{i.}\mathbf{y}_{i.}' - \frac{1}{kn}\,\mathbf{y}_{..}\mathbf{y}_{..}'$$
$$\mathbf{E} = \sum_{i=1}^k\sum_{j=1}^n (\mathbf{y}_{ij} - \bar{\mathbf{y}}_{i.})(\mathbf{y}_{ij} - \bar{\mathbf{y}}_{i.})' = \sum_{ij} \mathbf{y}_{ij}\mathbf{y}_{ij}' - \frac{1}{n}\sum_i \mathbf{y}_{i.}\mathbf{y}_{i.}'$$
where $\mathbf{y}_{i.}$ and $\mathbf{y}_{..}$ are the group and overall totals.

The $\mathbf{E}$ and $\mathbf{H}$ matrices are both $p \times p$, but not necessarily full rank. The rank of $\mathbf{H}$ is $\min(p, v_H)$, where $v_H$ is the degrees of freedom associated with the hypothesis, i.e., $k-1$. We can think of the pooled covariance matrix as
$$\mathbf{S}_{pl} = \frac{\mathbf{E}}{(n-1)k},$$
with $E(\mathbf{S}_{pl}) = \boldsymbol\Sigma$. If the sample mean vectors were equal for each population, then we would have $\mathbf{H} = \mathbf{0}$.

The $\mathbf{E}$ and $\mathbf{H}$ matrices can be used in different ways to test the null hypothesis. Wilks' test statistic is
$$\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}.$$
The null is rejected if $\Lambda < \Lambda_{\alpha,p,v_H,v_E}$, where $v_H$ is the degrees of freedom for the hypothesis, $k-1$, and $v_E$ is the degrees of freedom for error, $k(n-1)$. Critical values are in Table A9. The test statistic can instead be converted to an $F$, but there are different cases.
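The $\mathbf{H}$ and $\mathbf{E}$ matrices and Wilks' $\Lambda$ can be computed directly from their definitions. This numpy sketch uses simulated data with invented dimensions, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, p = 3, 12, 2
# y[i, j] is the p-dimensional observation vector y_ij
y = rng.normal(0.0, 1.0, size=(k, n, p))

ybar_i = y.mean(axis=1)          # k x p matrix of sample mean vectors
ybar = y.mean(axis=(0, 1))       # grand mean vector

# H: between-groups SSCP matrix, n * sum_i (ybar_i - ybar)(ybar_i - ybar)'
d = ybar_i - ybar
H = n * d.T @ d

# E: within-groups SSCP matrix, sum_ij (y_ij - ybar_i)(y_ij - ybar_i)'
r = y - ybar_i[:, None, :]
E = sum(r[i].T @ r[i] for i in range(k))

wilks = np.linalg.det(E) / np.linalg.det(E + H)   # Wilks' Lambda
S_pl = E / ((n - 1) * k)                          # pooled covariance estimate
```

Both matrices come out symmetric $p \times p$, and a useful sanity check is that $\mathbf{E} + \mathbf{H}$ equals the total SSCP matrix computed around the grand mean.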
Properties of Wilks' Λ

- We need $v_E = (n-1)k \geq p$ for the determinants to be positive.
- The degrees of freedom for error and hypothesis are the same as for univariate ANOVA.
- The distribution of $\Lambda_{p,v_H,v_E}$ is the same as that of $\Lambda_{v_H,p,\,v_E+v_H-p}$. This saves some space in the table of critical values.
- Wilks' $\Lambda$ can be written as
$$\Lambda = \prod_{i=1}^{s} \frac{1}{1+\lambda_i},$$
where $\lambda_i$ is the $i$th eigenvalue of $\mathbf{E}^{-1}\mathbf{H}$. Here $s = \min(p, v_H)$ is the rank of $\mathbf{H}$, which is also the number of nonzero eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$.
- $\Lambda$ is in the interval $[0, 1]$. If the sample mean vectors were all equal (for example, if they were all equal to their expected values under the null), then $\mathbf{H} = \mathbf{0}$ and $\Lambda = 1$.
- Increasing the number of variables $p$ decreases the critical value for $\Lambda$ needed to reject the null hypothesis. This means that it is more difficult to reject $H_0$ (since we reject for small $\Lambda$) unless the null hypothesis is false for the new variables.
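The eigenvalue form of $\Lambda$ can be checked numerically against the determinant form, since $|\mathbf{E}+\mathbf{H}|/|\mathbf{E}| = |\mathbf{I} + \mathbf{E}^{-1}\mathbf{H}| = \prod_i (1+\lambda_i)$. This sketch, with invented dimensions chosen so that $p > v_H$, also shows that only $s = \min(p, v_H)$ eigenvalues are (numerically) nonzero.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, p = 3, 10, 4               # v_H = k - 1 = 2 < p = 4
y = rng.normal(0.0, 1.0, size=(k, n, p))

ybar_i = y.mean(axis=1)
ybar = y.mean(axis=(0, 1))
d = ybar_i - ybar
H = n * d.T @ d
r = y - ybar_i[:, None, :]
E = sum(r[i].T @ r[i] for i in range(k))

# eigenvalues of E^{-1} H, computed without forming the explicit inverse
lam = np.linalg.eigvals(np.linalg.solve(E, H)).real
lam = np.sort(lam)[::-1]

wilks_det = np.linalg.det(E) / np.linalg.det(E + H)
# product over all p eigenvalues; the near-zero ones contribute factors of ~1
wilks_eig = np.prod(1.0 / (1.0 + lam))
```

Because $\mathbf{H}$ has rank at most $v_H$, the product over all $p$ eigenvalues equals the product over just the $s$ nonzero ones.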