Edgeworth Expansion of Wilks' Lambda Statistic

Journal of Multivariate Analysis 97 (2006) 1958–1964 www.elsevier.com/locate/jmva Edgeworth expansion of Wilks’ lambda statistic Hirofumi Wakaki Department of Mathematics, Hiroshima University, Kagamiyama 1-3-1, Higashi–Hiroshima 739-8526, Japan Received 19 May 2005 Abstract An asymptotic expansion of the null distribution of the Wilks’lambda statistic is derived when some of the parameters are large. Cornish–Fisher expansions of the upper percent points are also obtained. A monotone transformation which reduces the third and the fourth order cumulants is also derived. In order to study the accuracy of the approximation formulas, some numerical experiments are done, with comparing to the classical expansions when only the sample size tends to infinity. © 2006 Published by Elsevier Inc. AMS 1991 subject classification: primary 62H10secondary 62E20 Keywords: MANOVA; High dimension; Likelihood ratio; Monotone transformation; Edgeworth expansion 1. Introduction Consider one-way MANOVA in which the equality of q + 1 mean vectors is tested. Let Vb and Vw be the between and the within matrices of sum of squares and products, respectively. Assume normal populations with common covariance matrix . Then the likelihood ratio test is based on the statistic −1 = det{Vw(Vb + Vw) }, (1.1) and Vw ∼ Wp(n, ), Vb ∼ Wp(q, ) under the null hypothesis, where n depends on the sample sizes and is assumed to be larger than p. E-mail address: [email protected]. 0047-259X/$ - see front matter © 2006 Published by Elsevier Inc. doi:10.1016/j.jmva.2005.10.010 H. Wakaki / Journal of Multivariate Analysis 97 (2006) 1958–1964 1959 The criterion (1.1) was introduced by Wilks [9] and is called Wilks’ lambda-criterion. We denote the null distribution of as p(q, n). It is known that is distributed as the product of some independent beta random variables under the null hypothesis. So the exact null distribution function is complicated and hard to use. When n is large relative to p and q, an asymptotic expansion based on the formula of Box [2] can be used. Let = − 1 + + = 1 2 + 2 − M n 2 (p q 1), 48 pq(p q 5) (1.2) and f = pq. Then −4 Pr{−M log x}=G (x) + {G + (x) − G (x)}+O(M ). (1.3) f M2 f 4 f Distributional results for and other test statistics can be found in Muirhead [5, Chapter 10], Anderson [1, Chapter 8] and Siotani et al. [6, Chapter 6]. Numerical experiments show that the approximation based on (1.3) is poor when p is large relative to n. Tonda and Fujikoshi [7] derived an asymptotic expansion formula when q is fixed, n →∞, p →∞with p/n → c ∈ (0, 1) by using the fact that p(q, n) = q (p, n − p + q). Wakaki et al. [8] derived asymptotic expansion formulas for three test statistics, , Lawley–Hotelling’s trace and Bartlett–Nanda–Pillai trace in the same setup with Tonda and Fujikoshi [7]. Since the leading term of the expansion is normal distribution we refer there formula as “normal type”,while the formulas like (1.3) are referred as “chi-square type”. Their numerical experiments show that when p is large the normal type approximations perform better than the chi-square type. However, when q is large both normal type and chi-square approximations has poor performance. In this paper we derive an asymptotic expansion of the null distribution of when q is also large. 2. Edgeworth expansion for log From q n − p + j p q (p, n − p + q) = beta , , 2 2 j=1 the hth order moment of is given by n−p+q n+q q + h q E(h) = 2 2 , (2.1) n+q + n−p+q q 2 h q 2 where q − − q(q−1)/4 j 1 q 1 q (a) = a − , R(a) > . 2 2 j=1 Hence the cumulants generating function of − log can be expanded as ⎧ ⎫ q ⎨ n−p+j − n+j ⎬ 2 it 2 log E[exp(−it log )]= log ⎩ n+j − n−p+j ⎭ j=1 2 it 2 ∞ s (−1) − n − p + q − n + q = (s 1) − (s 1) , (2.2) s! q 2 q 2 s=1 1960 H. Wakaki / Journal of Multivariate Analysis 97 (2006) 1958–1964 where q j − 1 (s)(a) = (s) a − (s = 0, 1, ···;a>0), q 2 j=1 and (s)(a) is the polygamma function defined as ⎧ ⎪ ∞ 1 1 ⎪ −C + − (s = ) s ⎨ + + 0 (s) d k=0 1 k k a (a) = log (a) = ∞ + . (2.3) da ⎪ (−1)s 1s! ⎩ (s = 1, 2, ···) + s+1 k=0 (k a) Here C is the Euler’s constant. The expansion (2.2) and the series representation in (2.3) give the sth order cumulant (s) of − log as − n − p + q − n + q (s) = (−1)s (s 1) − (s 1) q 2 q 2 q ∞ = fs(j + 2k), (2.4) j=1 k=0 where 2 s 2 s f (x) = (s − 1)! − . s n − p + x n + x Since fs is a decreasing and convex function, j+1 k+1 fs(x + 2y)dy dx j k j+1/2 k+1/2 <fs(j + 2k) < fs(x + 2y)dy dx. j−1/2 k−1/2 Therefore, we obtain upper and lower bounds for (s) as 1 b(s)(n − p + 1,p,q)<(s) <b(s) n − p − ,p,q , 2 pq b(2)(l,p,q)= 2 log 1 + , (l + p + q)l s−1 − ! (s) = 2 (s 3) b (l,p,q) − ls 2 − − − l s 2 l s 2 l s 2 × 1 − − + (2.5) l + q l + p l + p + q for s 3. We note that b(s)(l,p,q)>0ifl,p and q are positive. H. Wakaki / Journal of Multivariate Analysis 97 (2006) 1958–1964 1961 Table 1 The orders of the standardized cumulants ˜ (s) ε − − − (i) l ∼ p ∼ q?1, p ∼ l?q ∼ 1, ∼ ε (s 2) l 1/2 p?l?q ∼ 1, p?l ∼ q?1 − − − (ii) l?p?q?1, l?p ∼ q?1 ∼ ε (s 2) (pq) 1/2 − − − (iii) l?p?q ∼ 1 ∼ ε (s 2) p 1/2 ∼ ? ? ? ? ? ∼ −(s−2) −1/2 (iv) p l q 1, p l q 1 ε (lq) −1/2 ? ? ? ∼ ? ? ∼ q −1 −(s−2) 2 q (v) p q l 1, p q l 1 log l ε l log l − − − − (vi) p?q?l ∼ 1, p ∼ q?l ∼ 1 ∼ (log q) 1ε (s 2) (log q) 1/2 (vii) p?q ∼ l ∼ 1, l?p ∼ q ∼ 1 ∼ 1 Assume at least one of l = n − p, p, q tends to infinity. We use the following notations: a b a ∼ b ⇔ and are bounded, b a b a?b ⇔ → 0. a Let the standardized cumulant ˜ (s) be defined as − ˜ (s) = (s)((2)) s/2 (s 3). (2.6) Then using (2.5) the order of the standardized cumulants can be calculated in several cases. The results are summarized in Table 1. Note that the orders in the cases that q?p are obtained by using the fact that p(q, n) = (s) q (p, n − p + q). Table 1 shows that the standardized cumulants can be expressed as ˜ = εs−2(s) with ε → 0 and (s) = O(1) in all cases except (vii). Only chi-square type approximation can be used in the second case of (vii). Neglecting the cumulants ˜ (s) (s 5), we obtain the Edgeworth expansion of the Wilks’ distribution as in the following theorem. − log − (1) Theorem 1. In each case of (i)–(vi) in Table 1, the null distribution of T = can ((2))1/2 be expanded as 1 (3) Pr {T z} = (z) − (z) ˜ h2(z) 6 1 1 + ˜ (4)h (z) + (˜ (3))2h (z) + O(˜ (5)), (2.7) 24 3 72 5 (1) (2) (s) where , and ˜ (s 3) are given by (2.4) and (2.6), and hs(z)’s are the Hermite poly- nomials given by 2 3 h1(z) = z, h2(z) = z − 1,h3(z) = z − 3z, 4 5 3 h4(z) = z − 6z + 3,h5(z) = z − 10z + 15z. Note that the order of the remainder term in each case of (i) to (iv) is O(ε3), while the order in each case of (v) and (vi) is o(ε3). 1962 H. Wakaki / Journal of Multivariate Analysis 97 (2006) 1958–1964 We note that if p or q are even number, (s)’s are simplified as ⎧ − s ⎪ q/2 1 p 2 ⎨⎪ (s − 1)! (q :even), n − p + 2j + k (s) = j=0 k=1 ⎪ q p/2−1 2 s ⎩⎪ (s − )! (p ). 1 − + + :even j=1 k=0 n p j 2k By using the above theorem, we obtain the Cornish–Fisher expansion of the upper 100% point of the distribution as in the following theory. Corollary 2. Let z be the upper 100(1 − )% point of the standard normal distribution, and let ⎧ ⎫ ⎨ ˜ (3) ˜ (3) 2 ˜ (4) ⎬ z (z) = z + (z2 − 1) − z(2z2 − 5) + z(z2 − 3) . (2.8) CF ⎩ 6 6 24 ⎭ Then (5) Pr {T zCF (z)} = 1 − + O(˜ ). 3. Monotone transformation In this section we focus to the case that the order of the remainder term in (2.7) is O(ε3) for simplicity, although the derived transformation can be also applied to the cases (v) and (vi) in Table 1. Expanding the inverse transformation of zCF (z), we obtain a cubic transformation which makes the statistic more close to normal distribution. Let g(t) = c{t + εa(t2 − 1) + ε2bt3}, 1 1 1 1 + ε2 1 (4) a =− (3),b= ((3))2 − (4),c= 8 .

Edgeworth Expansion of Wilks' Lambda Statistic

1. How Different Is the T Distribution from the Normal?

1 One Parameter Exponential Families

Chapter 5 Sections

Basic Econometrics / Statistics Statistical Distributions: Normal, T, Chi-Sq, & F

The Normal Moment Generating Function

A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess

Lecture 2 — September 24 2.1 Recap 2.2 Exponential Families

The Singly Truncated Normal Distribution: a Non-Steep Exponential Family

Distributions (3) © 2008 Winton 2 VIII

Student's T Distribution

On Curved Exponential Families

The Multivariate Normal Distribution