Lecture 3. Inference About Multivariate Normal Distribution

3.1 Point and Interval Estimation

Let $X_1, \dots, X_n$ be i.i.d. $N_p(\mu, \Sigma)$. We are interested in evaluating the maximum likelihood estimates of $\mu$ and $\Sigma$. Recall that the density of $X_1$ is

$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}, \quad x \in \mathbb{R}^p.$$

The negative log-likelihood function, given observations $x_1^n = \{x_1, \dots, x_n\}$, is then

$$\begin{aligned}
\ell(\mu, \Sigma \mid x_1^n) &= \frac{n}{2} \log|\Sigma| + \frac{1}{2} \sum_{i=1}^n (x_i - \mu)' \Sigma^{-1} (x_i - \mu) + c \\
&= \frac{n}{2} \log|\Sigma| + \frac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) + \frac{1}{2} \sum_{i=1}^n (x_i - \bar{x})' \Sigma^{-1} (x_i - \bar{x}) + c,
\end{aligned}$$

where $c$ does not depend on $(\mu, \Sigma)$. To this end, denote the centered data matrix by $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_n]_{p \times n}$, where $\tilde{x}_i = x_i - \bar{x}$. Let

$$S_0 = \frac{1}{n} \tilde{X} \tilde{X}' = \frac{1}{n} \sum_{i=1}^n \tilde{x}_i \tilde{x}_i'.$$

Proposition 1. The m.l.e. of $\mu$ and $\Sigma$, which jointly minimize $\ell(\mu, \Sigma \mid x_1^n)$, are

$$\hat{\mu}^{MLE} = \bar{x}, \qquad \hat{\Sigma}^{MLE} = S_0.$$

Note that $S_0$ is a biased estimator of $\Sigma$. The sample variance-covariance matrix $S = \frac{n}{n-1} S_0$ is unbiased.

For interval estimation of $\mu$, we largely follow Section 7.1 of Härdle and Simar (2012). First note that since $\mu \in \mathbb{R}^p$, we need to generalize the notion of intervals (primarily defined for $\mathbb{R}^1$) to higher dimensions. A simple extension is a direct product of marginal intervals: for intervals $a < x < b$ and $c < y < d$, we obtain the rectangular region $\{(x, y) \in \mathbb{R}^2 : a < x < b,\ c < y < d\}$.

A confidence region $A \subset \mathbb{R}^p$ is a set determined by the (random) observations $X_1, \dots, X_n$. $A = A(X_1^n)$ is a confidence region of size $1 - \alpha \in (0, 1)$ for the parameter $\mu$ if

$$P(\mu \in A) \ge 1 - \alpha, \quad \text{for all } \mu \in \mathbb{R}^p.$$

(Elliptical confidence region) Corollary 7 in lecture 2 provides a pivot, which paves the way to construct a confidence region for $\mu$. Since $\frac{n-p}{p} (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) \sim F_{p, n-p}$ and

$$P\left( (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) < \frac{p}{n-p} F_{1-\alpha; p, n-p} \right) = 1 - \alpha,$$

the set

$$A = \left\{ \mu \in \mathbb{R}^p : (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) < \frac{p}{n-p} F_{1-\alpha; p, n-p} \right\}$$

is a confidence region of size $1 - \alpha$ for the parameter $\mu$.
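The estimators of Proposition 1 and the membership test for the elliptical region can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `mvn_mle` and `in_elliptical_region` are hypothetical, the data array stores observations in rows (n-by-p, the transpose of the p-by-n layout used above), and the quantile $F_{1-\alpha; p, n-p}$ is passed in by the caller rather than computed.

```python
import numpy as np

def mvn_mle(X):
    """Return (x_bar, S0, S) for an n-by-p array X whose rows are observations."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                  # centered observations x_i - x_bar
    S0 = Xc.T @ Xc / n              # m.l.e. of Sigma (biased)
    S = Xc.T @ Xc / (n - 1)         # unbiased version, S = n/(n-1) * S0
    return x_bar, S0, S

def in_elliptical_region(mu, X, f_crit):
    """True if (x_bar - mu)' S0^{-1} (x_bar - mu) < p/(n-p) * f_crit.

    f_crit is assumed to be the quantile F_{1-alpha; p, n-p}, supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    x_bar, S0, _ = mvn_mle(X)
    d = x_bar - np.asarray(mu, dtype=float)
    quad = d @ np.linalg.solve(S0, d)   # quadratic form without forming S0^{-1}
    return bool(quad < p / (n - p) * f_crit)
```

For instance, with n = 20 bivariate observations one would call `in_elliptical_region(mu, X, 3.55)`, where 3.55 is roughly the tabulated value of $F_{0.95; 2, 18}$; if SciPy is available, `scipy.stats.f.ppf(0.95, 2, 18)` gives the same quantile.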
(Simultaneous confidence intervals) Simultaneous confidence intervals for all linear combinations $a'\mu$ of the elements of $\mu$, for arbitrary $a \in \mathbb{R}^p$, provide confidence of size $1 - \alpha$ for all intervals covering $a'\mu$, including the marginal means $\mu_1, \dots, \mu_p$. We are interested in evaluating lower and upper bounds $L(a)$ and $U(a)$ satisfying

$$P\left( L(a) < a'\mu < U(a) \ \text{for all } a \in \mathbb{R}^p \right) \ge 1 - \alpha, \quad \text{for all } \mu \in \mathbb{R}^p.$$

First consider a single confidence interval by fixing a particular vector $a$. To evaluate a confidence interval for $a'\mu$, define new random variables $Y_i = a'X_i \sim N_1(a'\mu, a'\Sigma a)$ ($i = 1, \dots, n$), whose squared t-statistic is

$$t^2(a) = n \frac{(a'\mu - a'\bar{X})^2}{a'Sa} \sim F_{1, n-1}.$$

Thus, for any fixed $a$,

$$P\left( t^2(a) \le F_{1-\alpha; 1, n-1} \right) = 1 - \alpha. \qquad (1)$$

Next, consider many projection vectors $a_1, a_2, \dots, a_M$ ($M$ is finite only for convenience). Simultaneous confidence intervals of a type similar to (1) then require

$$P\left( \bigcap_{i=1}^M \{ t^2(a_i) \le h(\alpha) \} \right) \ge 1 - \alpha$$

for some $h(\cdot)$. Collecting the facts that

1. $\max_a t^2(a) \le h(\alpha)$ implies $t^2(a_i) \le h(\alpha)$ for all $i$,
2. $\max_a t^2(a) = n (\mu - \bar{X})' S^{-1} (\mu - \bar{X})$, and
3. Corollary 7 in lecture 2 holds,

we have, for $h(\alpha) = \frac{(n-1)p}{n-p} F_{1-\alpha; p, n-p}$,

$$P\left( \bigcap_{i=1}^M \{ t^2(a_i) \le h(\alpha) \} \right) \ge P\left( \max_a t^2(a) \le h(\alpha) \right) = 1 - \alpha.$$

Since $t^2(a) \le h(\alpha)$ is equivalent to $|a'\mu - a'\bar{X}| \le \sqrt{h(\alpha)\, a'Sa / n}$, we obtain the following.

Proposition 2. Simultaneously for all $a \in \mathbb{R}^p$, the interval

$$a'\bar{X} \pm \sqrt{\frac{h(\alpha)}{n}\, a'Sa} = a'\bar{X} \pm \sqrt{\frac{(n-1)p}{n(n-p)} F_{1-\alpha; p, n-p}\; a'Sa}$$

contains $a'\mu$ with probability $1 - \alpha$.

Example 1. From the Golub gene expression data, with dimension $d = 7129$, take the first and the 1674th variables (genes) to focus on the bivariate case ($p = 2$). There are two populations: 11 observations from AML and 27 from ALL. Figure 1 illustrates the elliptical confidence regions of size 95% and 99%. Figure 2 compares the elliptical confidence region with the simultaneous confidence intervals for $a_1 = (1, 0)'$ and $a_2 = (0, 1)'$.

Figure 1: Elliptical confidence regions of size 95% and 99%. (Plot: Golub data, ALL (red) and AML (blue); horizontal axis gene #1, vertical axis gene #1674 in units of $10^4$.)
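The simultaneous intervals of Proposition 2 can be sketched numerically as follows. This is a minimal illustration, not the code behind the figures: the function name `simultaneous_intervals` is hypothetical, observations are assumed to be stored in rows, and the quantile $F_{1-\alpha; p, n-p}$ is supplied by the caller. The half-width is written out in full as $\sqrt{\frac{(n-1)p}{n(n-p)} F_{1-\alpha; p, n-p}\, a'Sa}$.

```python
import numpy as np

def simultaneous_intervals(X, A, f_crit):
    """Simultaneous intervals for a'mu, one per row a of A.

    X      : n-by-p data array (rows are observations).
    A      : m-by-p array of projection vectors a_1, ..., a_m.
    f_crit : quantile F_{1-alpha; p, n-p}, assumed to be supplied by the caller.
    Returns an m-by-2 array of (lower, upper) bounds.
    """
    X = np.asarray(X, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    n, p = X.shape
    x_bar = X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))     # unbiased sample covariance
    scale = (n - 1) * p / (n * (n - p)) * f_crit
    center = A @ x_bar                             # a' x_bar for each a
    quad = np.einsum("ij,jk,ik->i", A, S, A)       # a' S a for each a
    half = np.sqrt(scale * quad)
    return np.column_stack([center - half, center + half])
```

Passing `A = np.eye(p)` gives intervals for the marginal means, as in the comparison with $a_1 = (1, 0)'$ and $a_2 = (0, 1)'$. These intervals are wider than one-at-a-time t-intervals because they must hold for every direction $a$ at once.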
Figure 2: Simultaneous confidence intervals of size 95%. (Plot: Golub data, ALL (red) and AML (blue); horizontal axis gene #1, vertical axis gene #1674 in units of $10^4$.)

3.2 Hypothesis testing

Consider testing a null hypothesis $H_0 : \theta \in \Omega_0$ against an alternative hypothesis $H_1 : \theta \in \Omega_1$. The principle of the likelihood ratio test is as follows. Let $L_0$ be the maximized likelihood under $H_0 : \theta \in \Omega_0$, and let $L_1$ be the maximized likelihood under $\theta \in \Omega_0 \cup \Omega_1$. The likelihood ratio statistic, sometimes called the Wilks statistic, is then

$$W = -2 \log\left( \frac{L_0}{L_1} \right) \ge 0.$$

The null hypothesis is rejected if the observed value of $W$ is large. In some cases the exact distribution of $W$ under $H_0$ can be evaluated. In other cases, Wilks' theorem states that for large $n$ (sample size),

$$W \xrightarrow{\ \mathcal{L}\ } \chi^2_\nu,$$

where $\nu$ is the number of free parameters in $H_1$ that are not in $H_0$. If the degrees of freedom in $\Omega_0$ is $q$ and the degrees of freedom in $\Omega_0 \cup \Omega_1$ is $r$, then $\nu = r - q$.

Consider testing hypotheses on $\mu$ and $\Sigma$ of a multivariate normal distribution, based on an $n$-sample $X_1, \dots, X_n$.

Case I: $H_0 : \mu = \mu_0$, $H_1 : \mu \ne \mu_0$; $\Sigma$ is known. In this case, we know the exact distribution of the likelihood ratio statistic

$$W = n (\bar{x} - \mu_0)' \Sigma^{-1} (\bar{x} - \mu_0) \sim \chi^2_p$$

under $H_0$.

Case II: $H_0 : \mu = \mu_0$, $H_1 : \mu \ne \mu_0$; $\Sigma$ is unknown. The m.l.e. under $H_1$ are $\hat{\mu} = \bar{x}$ and $\hat{\Sigma} = S_0$. The restricted m.l.e. of $\Sigma$ under $H_0$ is

$$\hat{\Sigma}_{(0)} = \frac{1}{n} \sum_{i=1}^n (x_i - \mu_0)(x_i - \mu_0)' = S_0 + \frac{1}{n} \delta\delta', \quad \text{where } \delta = \sqrt{n} (\bar{x} - \mu_0).$$

The likelihood ratio statistic is then

$$W = n \log\left| S_0 + \tfrac{1}{n} \delta\delta' \right| - n \log|S_0| = n \log\left( 1 + \frac{\delta' S^{-1} \delta}{n - 1} \right),$$

where the second equality uses $|S_0 + \tfrac{1}{n}\delta\delta'| = |S_0| (1 + \tfrac{1}{n} \delta' S_0^{-1} \delta)$ and $S_0^{-1} = \frac{n}{n-1} S^{-1}$. Hence $W$ is a monotone increasing function of

$$\delta' S^{-1} \delta = n (\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0),$$

which is Hotelling's $T^2(n-1)$ statistic.

Case III: $H_0 : \Sigma = \Sigma_0$, $H_1 : \Sigma \ne \Sigma_0$; $\mu$ is unknown. We have the likelihood ratio statistic

$$W = -n \log\left| \Sigma_0^{-1} S_0 \right| - np + n\, \mathrm{trace}(\Sigma_0^{-1} S_0).$$

This is a case where the exact distribution of $W$ is difficult to evaluate. For large $n$, use Wilks' theorem to approximate the distribution of $W$ by $\chi^2_\nu$ with degrees of freedom $\nu = p(p+1)/2$.
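The likelihood ratio statistics of cases I to III can be computed directly from their definitions, as in the sketch below. The function names are hypothetical, observations are assumed to be stored in rows, and only the statistics are returned; comparing them with a $\chi^2$ or $F$ quantile is left to the caller. In case II, $W$ is computed from the restricted and unrestricted m.l.e. of $\Sigma$, and Hotelling's statistic $n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)$ is returned alongside it.

```python
import numpy as np

def lrt_mean_known_cov(X, mu0, Sigma):
    """Case I: W = n (x_bar - mu0)' Sigma^{-1} (x_bar - mu0); chi^2_p under H0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    return float(n * (d @ np.linalg.solve(Sigma, d)))

def lrt_mean_unknown_cov(X, mu0):
    """Case II: return (W, T2) with T2 = n (x_bar - mu0)' S^{-1} (x_bar - mu0)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu0 = np.asarray(mu0, dtype=float)
    Xc = X - X.mean(axis=0)
    S0 = Xc.T @ Xc / n                     # unrestricted m.l.e. of Sigma
    R = X - mu0
    Sigma_0_hat = R.T @ R / n              # restricted m.l.e. of Sigma under H0
    # slogdet returns (sign, log|det|); index 1 is the log-determinant
    W = n * (np.linalg.slogdet(Sigma_0_hat)[1] - np.linalg.slogdet(S0)[1])
    d = X.mean(axis=0) - mu0
    S = S0 * n / (n - 1)                   # unbiased sample covariance
    T2 = n * (d @ np.linalg.solve(S, d))
    return float(W), float(T2)

def lrt_cov(X, Sigma0):
    """Case III: W = -n log|Sigma0^{-1} S0| - n p + n trace(Sigma0^{-1} S0)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S0 = Xc.T @ Xc / n
    M = np.linalg.solve(Sigma0, S0)        # Sigma0^{-1} S0
    return float(-n * np.linalg.slogdet(M)[1] - n * p + n * np.trace(M))
```

In case II the two returned values satisfy $W = n \log(1 + T^2/(n-1))$, which makes the monotone relationship between the likelihood ratio statistic and Hotelling's statistic easy to check numerically.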
Next, consider testing the equality of two mean vectors. Let $X_{11}, \dots, X_{1n_1}$ be i.i.d. $N_p(\mu_1, \Sigma)$ and $X_{21}, \dots, X_{2n_2}$ be i.i.d. $N_p(\mu_2, \Sigma)$, and let $S_P$ denote the pooled sample covariance matrix.

Case IV: $H_0 : \mu_1 = \mu_2$, $H_1 : \mu_1 \ne \mu_2$; $\Sigma$ is unknown. Since

$$\bar{X}_1 - \bar{X}_2 \sim N_p\left( \mu_1 - \mu_2, \frac{n_1 + n_2}{n_1 n_2} \Sigma \right), \qquad (n_1 + n_2 - 2) S_P \sim W_p(n_1 + n_2 - 2, \Sigma),$$

we have Hotelling's $T^2$ statistic for the two-sample problem,

$$T^2(n_1 + n_2 - 2) = \frac{n_1 n_2}{n_1 + n_2} (\bar{X}_1 - \bar{X}_2)' S_P^{-1} (\bar{X}_1 - \bar{X}_2),$$

and, by Theorem 5 in lecture 2, under $H_0$,

$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p} T^2(n_1 + n_2 - 2) \sim F_{p, n_1 + n_2 - p - 1}.$$

Similar to case II above, the likelihood ratio statistic is a monotone function of $T^2(n_1 + n_2 - 2)$.

3.3 Hypothesis testing when p > n

In the high-dimensional situation where the dimension $p$ is larger than the sample size ($p > n - 1$ or $p > n_1 + n_2 - 2$), the sample covariance $S$ is not invertible, so Hotelling's $T^2$ statistic, which is essential in the testing procedures above, cannot be computed. We survey important proposals for testing hypotheses on means in high-dimension, low-sample-size data.

A basic idea in generalizing a test procedure to the $p > n$ case is to base the test on a computable test statistic that is also an estimator of $\|\mu - \mu_0\|$ or $\|\mu_1 - \mu_2\|$.

In case II (one sample), Dempster (1960) proposed to replace $S^{-1}$ in Hotelling's statistic by $(\mathrm{trace}(S) I_p)^{-1}$. He showed that under $H_0 : \mu = 0$,

$$T_D = \frac{n \bar{X}' \bar{X}}{\mathrm{trace}(S)} \sim F_{r, (n-1)r} \quad \text{approximately},$$

for $r = \frac{(\mathrm{trace}(\Sigma))^2}{\mathrm{trace}(\Sigma^2)}$, a measure of sphericity of $\Sigma$. An estimator $\hat{r}$ of $r$ is used in testing.

Bai and Saranadasa (1996) proposed to simply replace $S^{-1}$ in Hotelling's statistic by $I_p$, yielding $T_B = n \bar{X}' \bar{X}$. However, $\bar{X}' \bar{X}$ is not an unbiased estimator of $\mu'\mu$, since $E(\bar{X}' \bar{X}) = \frac{1}{n} \mathrm{trace}(\Sigma) + \mu'\mu$. They showed that the standardized statistic

$$M_B = \frac{n \bar{X}' \bar{X} - \mathrm{trace}(S)}{\widehat{\mathrm{sd}}\left( n \bar{X}' \bar{X} - \mathrm{trace}(S) \right)} = \frac{n \bar{X}' \bar{X} - \mathrm{trace}(S)}{\sqrt{\frac{2(n-1)n}{(n-2)(n+1)} \left( \mathrm{trace}(S^2) - \frac{1}{n} (\mathrm{trace}(S))^2 \right)}}$$

has an asymptotic $N(0, 1)$ distribution as $p, n \to \infty$.
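Both one-sample statistics above remain computable when $p > n$, since they only involve $\bar{X}'\bar{X}$, $\mathrm{trace}(S)$ and $\mathrm{trace}(S^2)$. The sketch below evaluates $T_D$, $T_B$ and $M_B$ for $H_0 : \mu = 0$ exactly as written above. The function name `high_dim_mean_stats` is hypothetical and observations are assumed to be stored in rows. As an implementation choice that is not part of the original proposals, the traces are obtained from the n-by-n Gram matrix of the centered data, which avoids forming the p-by-p matrix $S$.

```python
import numpy as np

def high_dim_mean_stats(X):
    """Return (T_D, T_B, M_B) for testing H0: mu = 0 from an n-by-p array X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n < 3:
        raise ValueError("the M_B formula needs n >= 3")
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    # G = Xc Xc' / (n-1) is n-by-n and shares its nonzero eigenvalues with
    # S = Xc' Xc / (n-1), so trace(S) = trace(G) and trace(S^2) = trace(G^2).
    G = Xc @ Xc.T / (n - 1)
    tr_S = np.trace(G)
    tr_S2 = np.sum(G * G)                  # trace(G^2) for symmetric G
    T_B = n * (x_bar @ x_bar)              # Bai-Saranadasa numerator n X_bar' X_bar
    T_D = T_B / tr_S                       # Dempster's statistic
    var_hat = 2.0 * (n - 1) * n / ((n - 2) * (n + 1)) * (tr_S2 - tr_S**2 / n)
    M_B = (T_B - tr_S) / np.sqrt(var_hat)  # standardized statistic
    return float(T_D), float(T_B), float(M_B)
```

A large positive value of `M_B` relative to the standard normal distribution is evidence against $H_0$. Using `T_D` for testing additionally requires the estimator $\hat{r}$, which is not specified here and is therefore not computed.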
