The Local Power of the Gradient Test

The Local Power of the Gradient Test

The local power of the gradient test Artur J. Lemonte, Silvia L. P. Ferrari Departamento de Estat´ıstica, Universidade de Sao˜ Paulo, Rua do Matao,˜ 1010, Sao˜ Paulo/SP, 05508-090, Brazil Abstract The asymptotic expansion of the distribution of the gradient test statistic is de- rived for a composite hypothesis under a sequence of Pitman alternative hypotheses converging to the null hypothesis at rate n¡1=2, n being the sample size. Compar- isons of the local powers of the gradient, likelihood ratio, Wald and score tests reveal no uniform superiority property. The power performance of all four criteria in one- parameter exponential family is examined. Key words: Asymptotic expansions; Chi-square distribution; Gradient test; Likeli- hood ratio test; Pitman alternative; Power function; Score test; Wald test. 1 Introduction The most commonly used large sample tests are the likelihood ratio (Wilks, 1938), Wald (Wald, 1943) and Rao score (Rao, 1948) tests. Recently, Terrell (2002) proposed a new test statistic that shares the same first order asymptotic properties with the likelihood ratio (LR), Wald (W ) and Rao score (SR) statistics. The new statistic, referred to as the gradient statistic (ST ), is markedly simple. In fact, Rao (2005) wrote: “The suggestion by Terrell is attractive as it is simple to compute. It would be of interest to investigate the performance of the [gradient] statistic.” The present paper goes in this direction. > Let x = (x1; : : : ; xn) be a random vector of n independent observations with prob- ability density function ¼(x j θ) that depends on a p-dimensional vector of unknown > parameters θ = (θ1; : : : ; θp) . Consider the problem of testing the composite null hy- > > > > pothesis H0 : θ2 = θ20 against H1 : θ2 6= θ20, where θ = (θ1 ; θ2 ) , θ1 = (θ1; : : : ; θq) > and θ2 = (θq+1; : : : ; θp) , θ20 representing a (p¡q)-dimensional fixed vector. Let ` be the Pn total log-likelihood function, i.e. ` = `(θ) = l=1 log ¼(xl j θ). Let U(θ) = @`=@θ = > > > (U1(θ) ; U2(θ) ) be the corresponding total score function partitioned following the partition of θ. The restricted and unrestricted maximum likelihood estimators of θ are b b> b> > e e> > > θ = (θ1 ; θ2 ) and θ = (θ1 ; θ20) , respectively. 1 The gradient statistic for testing H0 is e > b e ST = U(θ) (θ ¡ θ): e e > b Since U1(θ) = 0, ST = U2(θ) (θ2 ¡ θ20). Clearly, ST has a very simple form and does not involve knowledge of the information matrix, neither expected nor observed, and no matrices, unlike W and SR. Asymptotically, ST has a central chi-square distribution with p ¡ q degrees of freedom under H0. Terrell (2002) points out that the gradient statistic “is not transparently non-negative, even though it must be so asymptotically.” His Theorem 2 implies that if the log-likelihood function is concave and is differentiable at θe, then ST ¸ 0. In this paper we derive the asymptotic distribution of the gradient statistic for a com- posite null hypothesis under a sequence of Pitman alternatives converging to the null hypothesis at a convergence rate n¡1=2. In other words, the sequence of alternative hy- ¡1=2 > potheses is H1n : θ2 = θ20 + n ², where ² = (²q+1; : : : ; ²p) . Similar results for the likelihood ratio and Wald tests were obtained by Hayakawa (1975) and for the score test, by Harris & Peers (1980). Comparison of local power properties of the competing tests will be performed. Our results will be specialized to the case of the one-parameter exponential family. A brief discussion closes the paper. 2 Notation and preliminaries Our notation follows that of Hayakawa (1975, 1977). We introduce the following log- likelihood derivatives 2 3 ¡1=2 @` ¡1 @ ` ¡3=2 @ ` yr = n ; yrs = n ; yrst = n ; @θr @θr@θs @θr@θs@θt their arrays > y = (y1; : : : ; yp) ; Y = ((yrs)); Y::: = ((yrst)); the corresponding cumulants ·rs = E(yrs);·r;s = E(yrys); 1=2 1=2 1=2 ·rst = n E(yrst);·r;st = n E(yryst);·r;s;t = n E(yrysyt) and their arrays K = ((·r;s)); K::: = ((·rst)); K:;:: = ((·r;st)); K:;:;: = ((·r;s;t)): We make the same assumptions as in Hayakawa (1975). In particular, it is assumed that the ·’s are all O(1) and they are not functionally independent; for instance, ·r;s = ¡·rs. 2 Relations among them were first obtained by Bartlett (1953a,b). Also, it is assumed that Y is non-singular and that K is positive definite with inverse K¡1 = ((·r;s)) say. For triple-suffix quantities we use the following summation notation Xp Xp K::: ± a ± b ± c = ·rstarbsct; K:;:: ± M ± b = ·r;stmrsbt; r;s;t=1 r;s;t=1 where M is a p £ p matrix and a, b and c are p £ 1 column vectors. > > > The partition θ = (θ1 ; θ2 ) induces the corresponding partitions: " # " # " # Y Y K K K11 K12 Y = 11 12 ; K = 11 12 ; K¡1 = ; 21 22 Y21 Y22 K21 K22 K K > > > a = (a1 ; a2 ) , etc. Also, Xp Xp K2:: ± a2 ± b ± c = ·rstarbsct: r=q+1 s;t=1 Using a procedure analogous to that of Hayakawa (1975), we can write the asymptotic ¡1=2 expansion of ST for the composite hypothesis up to order n as 1 S = ¡(Zy + »)>Y (Zy + ») ¡ p K ± (Zy + ») ± Y ¡1y ± Y ¡1y T 2 n ::: 1 ¡ p K ± (Zy + ») ± (Z y ¡ ») ± (Z y ¡ ») + O (n¡1); 2 n ::: 0 0 p ¡1 where Z = Y ¡ Z0, " # " # ¡1 ¡1 Y11 0 Y11 Y12 Z0 = ; » = ²; 0 0 ¡Ip¡q Ip¡q being the identity matrix of order p ¡ q. We can now use a multivariate Edgeworth Type A series expansion of the joint density function of y and Y up to order n¡1=2 (Peers, 1971), which has the form · 1 f = f 1 + p (K ± K¡1y ± K¡1y ± K¡1y ¡ 3K ± K¡1 ± K¡1y) 1 0 6 n :;:;: :;:;: ¸ 1 ¡ p K ± K¡1y ± D + O(n¡1); n :;:: where ½ ¾ p 1 Y f = (2¼)¡p=2jKj¡1=2 exp ¡ y>K¡1y ±(y ¡ · ); 0 2 rs rs r;s=1 0 D = ((dbc)), dbc = ± (ybc ¡ ·bc)=±(ybc ¡ ·bc), with ±(¢) being the Dirac delta function (Bracewell, 1999), to obtain the moment generating function of ST , M(t) say. 3 ¡1=2 From f1 and the asymptotic expansion of ST up to order n , we arrive, after long algebra, at µ ¶ ¡ 1 (p¡q) t > M(t) = (1 ¡ 2t) 2 exp ² K ² 1 ¡ 2t 22:1 · ¸ 1 £ 1 + p (A d + A d2 + A d3) + O(n¡1); n 1 2 3 ¡1 where d = 2t=(1 ¡ 2t), K22:1 = K22 ¡ K21K11 K12, 1¡ ¢ A = ¡ K ± K¡1 ± ²¤ + 4K ± A ± ²¤ + K ± A ± ²¤ + K ± ²¤ ± ²¤ ± ²¤ ; 1 4 ::: :;:: ::: ::: 1¡ ¢ A = ¡ K ± K¡1 ± ²¤ ¡ K ± A ± ²¤ ¡ 2K ± ²¤ ± ²¤ ± ²¤ ; 2 4 ::: ::: :;:: " # " # ¡1 ¡1 1 ¤ ¤ ¤ ¤ K11 K12 K11 0 A3 = ¡ K::: ± ² ± ² ± ² ; ² = ²; A = : 12 ¡Ip¡q 0 0 ¡(p¡q)=2 > When n ! 1, M(t) ! (1¡2t) expf2t¸=(1¡2t)g, where ¸ = ² K22:1²=2, and hence the limiting distribution of ST is a non-central chi-square distribution with p ¡ q degrees of freedom and non-centrality parameter ¸. Under H0, i.e. when ² = 0, M(t) = ¡(p¡q)=2 ¡1 (1 ¡ 2t) + O(n ) and, as expected, ST has a central chi-square distribution with p ¡ q degrees of freedom up to an error of order n¡1. Also, from M(t) we may obtain the ¡1=2 first three moments of ST up to order n as 2A 8(A + A ) ¹0 (S ) = p ¡ q + ¸ + p 1 ; ¹ (S ) = 2(p ¡ q + 2¸) + 1p 2 1 T n 2 T n and 6(A + 2A + A ) ¹ (S ) = 8(p ¡ q + 3¸) + 1 p 2 3 : 3 T n 3 Main result The moment generating function of ST in a neighborhood of θ2 = θ20 can be written, after some algebra, as µ ¶ ¡ 1 (p¡q) t > y M(t) = (1 ¡ 2t) 2 exp ² K ² 1 ¡ 2t 22:1 · ¸ 1 X3 £ 1 + p a (1 ¡ 2t)¡k + O(n¡1); n k k=0 4 where 1© a = Ky ± (K¡1)y ± (²¤)y ¡ (4K + 3K )y ± Ay ± (²¤)y 1 4 ::: :;:: ::: y ¤ y ¤ y ¤ y ¡ 2(K::: + 2K:;::) ± (² ) ± (² ) ± (² ) y ¤ y ¤ yª ¡ 2(K2:: + K2;::) ± ² ± (² ) ± (² ) ; 1© (1) a = ¡ Ky ± (K¡1 ¡ A)y ± (²¤)y 2 4 ::: y ¤ y ¤ y ¤ yª ¡ (K::: + 2K:;::) ± (² ) ± (² ) ± (² ) ; 1 a = ¡ Ky ± (²¤)y ± (²¤)y ± (²¤)y; 3 12 ::: > > > and a0 = ¡(a1+a2+a3). The symbol “y” denotes evaluation at θ = (θ1 ; θ20) . Inverting M(t), we arrive at the following theorem, our main result. Theorem 1 The asymptotic expansion of the distribution of the gradient statistic for test- ing a composite hypothesis under a sequence of local alternatives converging to the null hypothesis at rate n¡1=2 is 1 X3 Pr(S · x) = G (x) + p a G (x) + O(n¡1); (2) T f;¸ n k f+2k;¸ k=0 where Gm;¸(x) is the cumulative distribution function of a non-central chi-square variate with m degrees of freedom and non-centrality parameter ¸. Here, f = p ¡ q, ¸ = > y ² K22:1²=2 and the ak’s are given in (1). If q = 0, the null hypothesis is simple, ²¤ = ¡² and A = 0. Therefore, an immediate consequence of Theorem 1 is the following corollary. Corollary 1 The asymptotic expansion of the distribution of the gradient statistic for test- ing a simple hypothesis under a sequence of local alternatives converging to the null hypothesis at rate n¡1=2 is given by (2) with f = p, ¸ = ²>Ky²=2 and 1 1© ª a = Ky ± ² ± ² ± ²; a = ¡ Ky ± (K¡1)y ± ² ¡ 2Ky ± ² ± ² ± ² ; 0 6 ::: 1 4 ::: :;:: 1© ª 1 a = Ky ± (K¡1)y ± ² ¡ (K + 2K )y ± ² ± ² ± ² ; a = Ky ± ² ± ² ± ²: 2 4 ::: ::: :;:: 3 12 ::: 4 Power comparisons between the rival tests To first order ST , LR, W and SR have the same asymptotic distributional properties under either the null or local alternative hypotheses.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    11 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us