On the Power of the F-Test for Linear Restrictions in a Regression Model

William E. Griffiths, University of Melbourne

R. Carter Hill, Louisiana State University

8 May 2020

Abstract

Consider testing a joint null hypothesis on regression coefficients where some of the restrictions in the null hypothesis are true and some are false. We show that an F-test that includes the valid restrictions frequently has greater power than the test that excludes the valid restrictions, a result that may be counterintuitive.

Keywords: noncentrality parameter; joint hypotheses; surplus restrictions

JEL codes: C12, C20

Corresponding author: William Griffiths, Department of Economics, University of Melbourne, Vic 3010, Australia. [email protected]

1. Introduction

Consider the linear regression model $E(y \mid X) = X\beta$ with covariance matrix $\sigma^2 I$. We assume $X$ is of dimension $(N \times K)$ with rank $K$, that $N > K$, and that $y$ is normally distributed, conditional on $X$.

We are concerned with the power of the F-test for a set of linear restrictions $H_0: R\beta = r$ which can be written as

$$\begin{pmatrix} R_1 \\ R_2 \end{pmatrix}\beta = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$$

where $R_1\beta \ne r_1$ and $R_2\beta = r_2$. That is, the first set of restrictions is false and the second set is true.

We assume that $R_1$ is of dimension $J_1 \times K$, $R_2$ is $J_2 \times K$, and $J_1 + J_2 = J \le K$.

We ask the question: which test has the greater power, the F-test for $H_0: R\beta = r$, or the F-test for $H_0: R_1\beta = r_1$? Since the latter test omits the superfluous valid restrictions $R_2\beta = r_2$, intuition suggests it should have the greater power. In fact, Ruud (2000, p.235) “proves” that the noncentrality parameters for the two tests are equal, and argues that the power for $H_0: R_1\beta = r_1$ will be greater because of the reduction in the numerator degrees of freedom. There is an error in Ruud’s proof, however. We show that, except under special circumstances, the noncentrality parameter for testing $H_0: R_1\beta = r_1$ is less than the noncentrality parameter for testing $H_0: R\beta = r$, and hence, whether testing $H_0: R_1\beta = r_1$ is more powerful will depend on the competing effects of changes in the noncentrality parameter, the degrees of freedom in the numerator, and the critical value of the test.

We then go on to ask: what happens if the model is re-estimated subject to the restrictions $R_2\beta = r_2$, which are known to be true, and the F-test for $H_0: R_1\beta = r_1$ is applied to this restricted model? We show that the noncentrality parameter for this F-test is identical to that for testing $H_0: R\beta = r$ in the unrestricted model. This result brings the earlier result more in line with our intuition. While it might seem counterintuitive for the noncentrality parameter to be larger when valid restrictions are included in the null hypothesis $H_0: R\beta = r$, our intuition does suggest that it should be larger when testing $H_0: R_1\beta = r_1$ in a correctly restricted model. We conclude by exploring the conditions under which the noncentrality parameter for testing $H_0: R_1\beta = r_1$ from the restricted model is equal to the noncentrality parameter for testing $H_0: R_1\beta = r_1$ from the unrestricted model.

To summarise, we are interested in comparing the noncentrality parameters for the F-test under the following three scenarios.

1. Testing $H_0: R\beta = r$ when $R_1\beta \ne r_1$ and $R_2\beta = r_2$.

2. Testing $H_0: R_1\beta = r_1$ when $R_1\beta \ne r_1$ and $R_2\beta = r_2$.

3. Testing $H_0: R_1\beta = r_1$ when $R_1\beta \ne r_1$ and the valid restrictions $R_2\beta = r_2$ have been imposed on the model.

Let the three noncentrality parameters for these cases be $\lambda_1$, $\lambda_2$ and $\lambda_3$, respectively. We show that $\lambda_1 = \lambda_3 \ge \lambda_2$, and we specify the conditions under which $\lambda_1 = \lambda_3 = \lambda_2$.

The F-statistic for the complete set of restrictions $H_0: R\beta = r$ is

$$F = \frac{(R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)}{(J_1 + J_2)\,\hat\sigma^2}$$

where $\hat\beta = (X'X)^{-1}X'y$ is the least squares estimator and $\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/(N - K)$ is the variance estimator. The noncentrality parameter is

$$\lambda_1 = \frac{(R\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\beta - r)}{\sigma^2}$$

In Section 2 we illustrate our results with some simple examples. Section 3 is devoted to proofs of the general results. Section 4 contains some numerical examples of the power differences. A concluding remark is provided in Section 5.

2. Simple Examples

Consider a model with two explanatory variables $x_1$ and $x_2$ which are expressed in terms of deviations from their means such that $\sum_{i=1}^N x_{1i} = 0$, $\sum_{i=1}^N x_{2i} = 0$ and $E(y_i \mid X) = \beta_1 x_{1i} + \beta_2 x_{2i}$.

Example 1

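For the first example we compare noncentrality parameters for the null hypotheses $H_0: \beta_1 = 0, \beta_2 = 0$ and $H_0: \beta_1 = 0$ when $\beta_1 \ne 0$ and $\beta_2 = 0$. For testing $H_0: \beta_1 = 0, \beta_2 = 0$, we have

$$R = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \left[R(X'X)^{-1}R'\right]^{-1} = X'X$$

and

$$\lambda_1 = \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 & 0 \end{pmatrix} X'X \begin{pmatrix} \beta_1 \\ 0 \end{pmatrix} = \frac{\beta_1^2 \sum_{i=1}^N x_{1i}^2}{\sigma^2}$$

For testing $H_0: \beta_1 = 0$, we have $R_1 = (1, 0)$, $r_1 = 0$, and

$$\begin{aligned}
\lambda_2 &= \frac{\beta_1^2}{\sigma^2}\left[\begin{pmatrix} 1 & 0 \end{pmatrix}(X'X)^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right]^{-1} \\
&= \frac{\beta_1^2}{\sigma^2}\left[\frac{\sum_{i=1}^N x_{2i}^2}{\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2}\right]^{-1} \\
&= \frac{\beta_1^2 \sum_{i=1}^N x_{1i}^2 \left(1 - r_{12}^2\right)}{\sigma^2}
\end{aligned}$$

where $r_{12}^2 = \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2 \Big/ \left(\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2\right)$ is the squared sample correlation between $x_1$ and $x_2$.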

Thus, we have $\lambda_1 \ge \lambda_2$, with equality holding when $x_1$ and $x_2$ are uncorrelated, a result that contradicts Ruud’s general result that $\lambda_1 = \lambda_2$.

For the third scenario, we impose the restriction $\beta_2 = 0$, leading to the model $E(y_i \mid X) = \beta_1 x_{1i}$. Then, for testing $H_0: \beta_1 = 0$, we see immediately that

$$\lambda_3 = \frac{\beta_1^2 \sum_{i=1}^N x_{1i}^2}{\sigma^2} = \lambda_1$$

While it may seem strange that $\lambda_1 \ge \lambda_2$, it is clear that we would expect $\lambda_3 \ge \lambda_2$; otherwise, we could increase the noncentrality parameter for testing $H_0: \beta_1 = 0$ by simply including irrelevant explanatory variables in the model.
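The comparison in Example 1 can be checked numerically. The following is a minimal sketch, assuming hypothetical regressor values, a hypothetical true coefficient and a hypothetical error variance (none of these numbers come from the paper). The names `lam1`, `lam2` and `lam3` stand for the noncentrality parameters in scenarios 1, 2 and 3, and exact rational arithmetic is used so that the equalities hold without rounding error.

```python
from fractions import Fraction as Fr

# Hypothetical mean-deviated regressors (each sums to zero); illustrative only.
x1 = [Fr(v) for v in (-2, -1, 0, 1, 2)]
x2 = [Fr(v) for v in (1, -2, 0, -1, 2)]
beta1, beta2 = Fr(3), Fr(0)    # assumed true coefficients: beta_1 != 0, beta_2 = 0
sigma2 = Fr(4)                 # assumed error variance

s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
r2 = s12 ** 2 / (s11 * s22)    # squared sample correlation between x1 and x2

# Scenario 1: joint test of beta_1 = 0 and beta_2 = 0, quadratic form in X'X.
lam1 = (beta1 ** 2 * s11 + 2 * beta1 * beta2 * s12 + beta2 ** 2 * s22) / sigma2

# Scenario 2: test of beta_1 = 0 alone in the two-regressor model.
v11 = s22 / (s11 * s22 - s12 ** 2)   # top-left element of (X'X)^{-1}
lam2 = beta1 ** 2 / (sigma2 * v11)

# Scenario 3: test of beta_1 = 0 after imposing beta_2 = 0 (regress y on x1 only).
lam3 = beta1 ** 2 / (sigma2 * (1 / s11))

print(lam1, lam2, lam3, r2)
```

With these assumed values the regressors are correlated, so the parameter for scenario 2 is strictly smaller than the other two; replacing `x2` with a vector orthogonal to `x1` makes all three coincide.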

Example 2

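For a second example, consider the hypothesis $H_0: \beta_1 = 0, \beta_1 = \beta_2$ where $\beta_1 \ne 0$ and $\beta_1 = \beta_2$. In this case

$$R = \begin{pmatrix} R_1 \\ R_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$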

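The noncentrality parameter for testing the joint hypothesis $H_0: \beta_1 = 0, \beta_1 = \beta_2$ is

$$\begin{aligned}
\lambda_1 &= \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 & 0 \end{pmatrix}\left[\begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}(X'X)^{-1}\begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}\right]^{-1}\begin{pmatrix} \beta_1 \\ 0 \end{pmatrix} \\
&= \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 & 0 \end{pmatrix}\left[\frac{1}{\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2}\begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} \sum_{i=1}^N x_{2i}^2 & -\sum_{i=1}^N x_{1i}x_{2i} \\ -\sum_{i=1}^N x_{1i}x_{2i} & \sum_{i=1}^N x_{1i}^2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}\right]^{-1}\begin{pmatrix} \beta_1 \\ 0 \end{pmatrix} \\
&= \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 & 0 \end{pmatrix}\left[\frac{1}{\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2}\begin{pmatrix} \sum_{i=1}^N x_{2i}^2 & \sum_{i=1}^N x_{2i}^2 + \sum_{i=1}^N x_{1i}x_{2i} \\ \sum_{i=1}^N x_{2i}^2 + \sum_{i=1}^N x_{1i}x_{2i} & \sum_{i=1}^N x_{1i}^2 + 2\sum_{i=1}^N x_{1i}x_{2i} + \sum_{i=1}^N x_{2i}^2 \end{pmatrix}\right]^{-1}\begin{pmatrix} \beta_1 \\ 0 \end{pmatrix} \\
&= \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 & 0 \end{pmatrix}\left[\frac{1}{\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2}\begin{pmatrix} \sum_{i=1}^N x_{2i}^2 & \sum_{i=1}^N z_i x_{2i} \\ \sum_{i=1}^N z_i x_{2i} & \sum_{i=1}^N z_i^2 \end{pmatrix}\right]^{-1}\begin{pmatrix} \beta_1 \\ 0 \end{pmatrix}
\end{aligned}$$

where $z_i = x_{1i} + x_{2i}$. It can be shown that $\sum_{i=1}^N x_{1i}^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N x_{1i}x_{2i}\right)^2 = \sum_{i=1}^N z_i^2 \sum_{i=1}^N x_{2i}^2 - \left(\sum_{i=1}^N z_i x_{2i}\right)^2$, and hence that

$$\lambda_1 = \frac{\beta_1^2 \sum_{i=1}^N z_i^2}{\sigma^2}$$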

Omitting $\beta_1 = \beta_2$ from the null hypothesis and testing only $H_0: \beta_1 = 0$ yields the same noncentrality parameter as in Example 1, namely

$$\lambda_2 = \frac{\beta_1^2 \sum_{i=1}^N x_{1i}^2 \left(1 - r_{12}^2\right)}{\sigma^2}$$

To test $H_0: \beta_1 = 0$ on the model with the restriction $\beta_1 = \beta_2$ imposed, we note that the restricted model is $E(y_i \mid X) = \beta_1(x_{1i} + x_{2i}) = \beta_1 z_i$ and the noncentrality parameter is

$$\lambda_3 = \frac{\beta_1^2 \sum_{i=1}^N z_i^2}{\sigma^2} = \lambda_1$$

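After some algebra, we can show that

$$\lambda_1 - \lambda_2 = \frac{\beta_1^2 \sum_{i=1}^N z_i^2}{\sigma^2} - \frac{\beta_1^2 \sum_{i=1}^N x_{1i}^2 \left(1 - r_{12}^2\right)}{\sigma^2} = \frac{\beta_1^2}{\sigma^2 \sum_{i=1}^N x_{2i}^2}\left(\sum_{i=1}^N x_{2i}^2 + \sum_{i=1}^N x_{1i}x_{2i}\right)^2 \ge 0$$

Thus, $\lambda_1 = \lambda_3 \ge \lambda_2$. Moreover, when $x_1$ and $x_2$ are uncorrelated, $\lambda_1 - \lambda_2 = \beta_1^2 \sum_{i=1}^N x_{2i}^2 / \sigma^2 > 0$, so that, in contrast to Example 1, the strict inequality holds even when the regressors are uncorrelated.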
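The algebra in Example 2 can be verified with a short numerical check. This is a sketch under assumed inputs: the regressor values, the common coefficient value and the error variance are hypothetical, and the helper functions `matmul`, `transpose` and `inv2` are illustrative names introduced here. The joint-test parameter `lam1` is computed directly from the matrix formula rather than from the closed form, so the check is not circular.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical mean-deviated regressors; illustrative values only.
x1 = [Fr(v) for v in (-2, -1, 0, 1, 2)]
x2 = [Fr(v) for v in (1, -2, 0, -1, 2)]
beta1 = Fr(3)      # assumed true value, with beta_2 = beta_1
sigma2 = Fr(4)     # assumed error variance

z = [a + b for a, b in zip(x1, x2)]
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
szz = sum(c * c for c in z)
r2 = s12 ** 2 / (s11 * s22)

# Scenario 1: joint test of beta_1 = 0 and beta_1 = beta_2 via [R (X'X)^{-1} R']^{-1}.
R = [[Fr(1), Fr(0)], [Fr(1), Fr(-1)]]
XtX = [[s11, s12], [s12, s22]]
M = inv2(matmul(matmul(R, inv2(XtX)), transpose(R)))
lam1 = beta1 ** 2 * M[0][0] / sigma2      # R beta - r = (beta_1, 0)'

# Scenario 2: test of beta_1 = 0 alone; Scenario 3: restricted model E(y) = beta_1 z.
lam2 = beta1 ** 2 * s11 * (1 - r2) / sigma2
lam3 = beta1 ** 2 * szz / sigma2

# Closed form for the gap between scenarios 1 and 2.
gap = beta1 ** 2 * (s22 + s12) ** 2 / (sigma2 * s22)
print(lam1, lam2, lam3, gap)
```

The gap vanishes only when `s22 + s12` is zero, that is, when `x2` is orthogonal to `z`; zero correlation between `x1` and `x2` is not enough.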

3. General Case

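To explore the general case of testing $H_0: R\beta = r$ when $R\beta - r$ can be partitioned as

$$R\beta - r = \begin{pmatrix} R_1\beta - r_1 \\ R_2\beta - r_2 \end{pmatrix} = \begin{pmatrix} \delta \\ 0 \end{pmatrix}$$

with $\delta \ne 0$, we adopt the notation used by Ruud (2000, p.233-235) and set $AA' = \sigma^2 R(X'X)^{-1}R'$, with $A$ being lower triangular, partitioned conformably with $R$,

$$A = \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix}$$

and with the inverse of $A$ given by

$$B = \begin{pmatrix} B_{11} & 0 \\ B_{21} & B_{22} \end{pmatrix}$$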

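It follows that

$$\sigma^2 R(X'X)^{-1}R' = \sigma^2\begin{pmatrix} R_1(X'X)^{-1}R_1' & R_1(X'X)^{-1}R_2' \\ R_2(X'X)^{-1}R_1' & R_2(X'X)^{-1}R_2' \end{pmatrix} = \begin{pmatrix} A_{11}A_{11}' & A_{11}A_{21}' \\ A_{21}A_{11}' & A_{21}A_{21}' + A_{22}A_{22}' \end{pmatrix} \qquad (1)$$

and

$$\left[\sigma^2 R(X'X)^{-1}R'\right]^{-1} = (AA')^{-1} = (A')^{-1}A^{-1} = B'B = \begin{pmatrix} B_{11}'B_{11} + B_{21}'B_{21} & B_{21}'B_{22} \\ B_{22}'B_{21} & B_{22}'B_{22} \end{pmatrix} \qquad (2)$$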

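The noncentrality parameter can be written as

$$\begin{aligned}
\lambda_1 &= \begin{pmatrix} \delta' & 0' \end{pmatrix}\left[\sigma^2 R(X'X)^{-1}R'\right]^{-1}\begin{pmatrix} \delta \\ 0 \end{pmatrix} \\
&= \begin{pmatrix} \delta' & 0' \end{pmatrix}\begin{pmatrix} B_{11}'B_{11} + B_{21}'B_{21} & B_{21}'B_{22} \\ B_{22}'B_{21} & B_{22}'B_{22} \end{pmatrix}\begin{pmatrix} \delta \\ 0 \end{pmatrix} \\
&= \delta'\left(B_{11}'B_{11} + B_{21}'B_{21}\right)\delta
\end{aligned}$$

Now, from (1), $B_{11}'B_{11} = (A_{11}A_{11}')^{-1} = \left[\sigma^2 R_1(X'X)^{-1}R_1'\right]^{-1}$, and thus,

$$\lambda_1 = \delta'\left[\sigma^2 R_1(X'X)^{-1}R_1'\right]^{-1}\delta + \delta' B_{21}'B_{21}\delta$$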

Discarding the surplus restrictions $R_2\beta = r_2$ from the null hypothesis and testing $H_0: R_1\beta = r_1$, we get the noncentrality parameter

$$\lambda_2 = \delta'\left[\sigma^2 R_1(X'X)^{-1}R_1'\right]^{-1}\delta$$

Since $B_{21}'B_{21}$ is nonnegative definite, we have $\lambda_1 \ge \lambda_2$; discarding the surplus restrictions decreases the noncentrality parameter and hence may decrease the power of the test if it is not compensated by a reduction in the degrees of freedom. We provide some insights into the conditions under which $B_{21}'B_{21} = 0$ and hence $\lambda_1 = \lambda_2$ after showing that $\lambda_3 = \lambda_1$.

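Estimating $\beta$ subject to the valid restrictions $R_2\beta = r_2$ yields the restricted estimator

$$\hat\beta_R = \hat\beta - (X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}\left(R_2\hat\beta - r_2\right)$$

with covariance matrix (see, for example, Judge et al. 1988, p.237-239)

$$V(\hat\beta_R \mid X) = \sigma^2\left\{(X'X)^{-1} - (X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}R_2(X'X)^{-1}\right\}$$

The noncentrality parameter for testing $H_0: R_1\beta = r_1$ using $\hat\beta_R$ is

$$\begin{aligned}
\lambda_3 &= \delta'\left\{\sigma^2 R_1(X'X)^{-1}R_1' - \sigma^2 R_1(X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}R_2(X'X)^{-1}R_1'\right\}^{-1}\delta \\
&= \delta'\left[A_{11}A_{11}' - A_{11}A_{21}'\left(A_{21}A_{21}' + A_{22}A_{22}'\right)^{-1}A_{21}A_{11}'\right]^{-1}\delta
\end{aligned}$$

The term $\left[A_{11}A_{11}' - A_{11}A_{21}'\left(A_{21}A_{21}' + A_{22}A_{22}'\right)^{-1}A_{21}A_{11}'\right]^{-1}$ can be recognised as the top-left block in the partitioned inverse of $AA'$. The inverse of $AA'$ is $B'B$ whose top-left block is $B_{11}'B_{11} + B_{21}'B_{21}$. Hence,

$$\lambda_3 = \delta'\left(B_{11}'B_{11} + B_{21}'B_{21}\right)\delta = \lambda_1$$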
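The general result can be illustrated on a small design. The sketch below assumes a hypothetical 6 x 3 regressor matrix, hypothetical coefficients satisfying the valid restriction, and a hypothetical error variance; the helper names (`mat`, `mm`, `tr`, `sub`, `inv`, `ncp`) are introduced here for illustration only. It computes the noncentrality parameters for the three scenarios (`lam1`, `lam2`, `lam3`) from their matrix definitions in exact rational arithmetic.

```python
from fractions import Fraction as Fr

def mat(rows):
    """Convert nested lists to a matrix of exact fractions."""
    return [[Fr(v) for v in row] for row in rows]

def mm(a, b):
    """Matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def tr(a):
    """Transpose."""
    return [list(col) for col in zip(*a)]

def sub(a, b):
    """Elementwise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv(m):
    """Gauss-Jordan inverse of a nonsingular square matrix of fractions."""
    n = len(m)
    aug = [list(row) + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = next(k for k in range(c, n) if aug[k][c] != 0)   # first usable pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for k in range(n):
            if k != c:
                f = aug[k][c]
                aug[k] = [v - f * w for v, w in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]

def ncp(R, V, beta, r, sigma2):
    """(R beta - r)' [sigma2 * R V R']^{-1} (R beta - r)."""
    d = sub(mm(R, beta), r)
    M = inv(mm(mm(R, V), tr(R)))
    return mm(mm(tr(d), M), d)[0][0] / sigma2

# Hypothetical design, coefficients and variance (not from the paper).
X = mat([[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 1, 1], [0, 2, 1], [1, 3, 2]])
beta = mat([[1], [2], [2]])
sigma2 = Fr(2)
R1, r1 = mat([[1, 0, 0]]), mat([[0]])     # invalid restriction: first coefficient = 0
R2, r2 = mat([[0, 1, -1]]), mat([[0]])    # valid restriction: second = third coefficient
R, r = R1 + R2, r1 + r2                   # stacked restrictions

V = inv(mm(tr(X), X))                     # (X'X)^{-1}
# Restricted-estimator covariance, up to the factor sigma2
V_R = sub(V, mm(mm(mm(V, tr(R2)), inv(mm(mm(R2, V), tr(R2)))), mm(R2, V)))

lam1 = ncp(R, V, beta, r, sigma2)         # joint test, unrestricted model
lam2 = ncp(R1, V, beta, r1, sigma2)       # invalid restriction only, unrestricted model
lam3 = ncp(R1, V_R, beta, r1, sigma2)     # invalid restriction only, restricted model
print(lam1, lam2, lam3)
```

Exact fractions are used instead of floating point so that the equality between the first and third scenarios can be tested with `==` rather than a tolerance.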

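To examine the conditions under which $\lambda_1 = \lambda_2 = \lambda_3$, suppose that the model can be rewritten as

$$E(y \mid X) = X_1\beta_1 + X_2\beta_2 \qquad (3)$$

where $X_1$ is $(N \times K_1)$ with $J_1 \le K_1$, $X_2$ is $(N \times K_2)$ with $J_2 \le K_2$, and $J_1 + J_2 = J \le K = K_1 + K_2$. Also suppose that the restrictions $R\beta = r$ are of the form

$$R\beta = \begin{pmatrix} R_{11} & 0 \\ 0 & R_{22} \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} \qquad (4)$$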

with dimensions conformable to the partition $X = (X_1 \;\; X_2)$. There is no overlap between the coefficients in the invalid restrictions $R_{11}\beta_1 \ne r_1$ and those in the valid restrictions $R_{22}\beta_2 = r_2$. We encountered this situation in Example 1 where we found that $\lambda_1 = \lambda_2 = \lambda_3$ if $x_1$ and $x_2$ were uncorrelated. Thus, we suspect the requirement for $\lambda_1 = \lambda_2 = \lambda_3$ in the more general set up in (3) and (4) would be that $X_1$ and $X_2$ are orthogonal.

Noting that

$$B_{11}'B_{11} + B_{21}'B_{21} = \left\{\sigma^2 R_1(X'X)^{-1}R_1' - \sigma^2 R_1(X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}R_2(X'X)^{-1}R_1'\right\}^{-1}$$

it follows that $B_{11}'B_{11} + B_{21}'B_{21} = B_{11}'B_{11} = \left[\sigma^2 R_1(X'X)^{-1}R_1'\right]^{-1}$ if

$$\sigma^2 R_1(X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}R_2(X'X)^{-1}R_1' = 0$$

Partitioning $(X'X)^{-1}$ in line with (3),

$$(X'X)^{-1} = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}$$

we find that $R_2(X'X)^{-1}R_2' = R_{22}Q_{22}R_{22}'$, $R_1(X'X)^{-1}R_2' = R_{11}Q_{12}R_{22}'$, and

$$\sigma^2 R_1(X'X)^{-1}R_2'\left[R_2(X'X)^{-1}R_2'\right]^{-1}R_2(X'X)^{-1}R_1' = \sigma^2 R_{11}Q_{12}R_{22}'\left[R_{22}Q_{22}R_{22}'\right]^{-1}R_{22}Q_{21}R_{11}'$$

This quantity will be zero if $Q_{12} = 0$ which in turn will be true if $X_1$ and $X_2$ are orthogonal. Thus, the three noncentrality parameters are equal if the valid coefficient restrictions are separate from the invalid ones, and the explanatory variables associated with the valid and invalid restrictions are orthogonal. As we saw in Example 2, it is not sufficient for $X_1$ and $X_2$ to be orthogonal; we also require the two sets of restrictions to be separable.
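The step from orthogonality to a zero off-diagonal block can be written out explicitly. The following worked equation is a sketch that assumes "orthogonal" means $X_1'X_2 = 0$ and uses only the partition of $X$ in (3):

```latex
X'X =
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
=
\begin{pmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{pmatrix}
\quad\Longrightarrow\quad
(X'X)^{-1} =
\begin{pmatrix} (X_1'X_1)^{-1} & 0 \\ 0 & (X_2'X_2)^{-1} \end{pmatrix}
```

Under that assumption the off-diagonal block of $(X'X)^{-1}$ is zero, which is the condition used above; the inverse of a block-diagonal matrix is block diagonal because $X$ has full column rank, so both diagonal blocks are invertible.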

4. Numerical Examples

Example 1

To illustrate how the power of a test that includes a valid hypothesis can be greater than the test that excludes it, consider the regression model $y_i = \beta_1 + \beta_2 x_i + e_i$, $i = 1, \ldots, 40$. Let $x_i = 10$, $i = 1, \ldots, 20$ and $x_i = 20$, $i = 21, \ldots, 40$. We have $N = 40$, $\sum_{i=1}^{40} x_i = 600$ and $\sum_{i=1}^{40} x_i^2 = 10{,}000$. Let the true parameter values be $\beta_1 = 100$, $\beta_2 = 10$ and $\sigma^2 = 2500$. First consider testing the joint hypothesis $H_0: \beta_1 = 100, \beta_2 = 9$ against the alternative $H_1: \beta_1 \ne 100$ and/or $\beta_2 \ne 9$. At the 5% level of significance we reject the joint null hypothesis if the F-test statistic is greater than the critical value $F_{(0.95,\,2,\,38)} = 3.24482$. The noncentrality parameter is

$$\lambda = \frac{1}{\sigma^2}\begin{pmatrix} \beta_1 - 100 & \beta_2 - 9 \end{pmatrix} X'X \begin{pmatrix} \beta_1 - 100 \\ \beta_2 - 9 \end{pmatrix} = \frac{1}{2500}\begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 40 & 600 \\ 600 & 10{,}000 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 4$$

The probability of rejecting the joint null hypothesis is the probability that a value from a non-central F-distribution with noncentrality parameter 4 will exceed $F_{(0.95,\,2,\,38)} = 3.24482$. This value for the test power is $\Pr\left[F_{(2,\,38,\,\lambda = 4)} > 3.24482\right] = 0.38738$.

Now we omit the true hypothesis $\beta_1 = 100$, and test $H_0: \beta_2 = 9$ against $H_1: \beta_2 \ne 9$ using an F-test. The test critical value is the 95th percentile of the F-distribution, $F_{(0.95,\,1,\,38)} = 4.09817$. The noncentrality parameter of the F-distribution for this single hypothesis is

$$\lambda = (\beta_2 - 9)^2\left[\sigma^2\begin{pmatrix} 0 & 1 \end{pmatrix}(X'X)^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right]^{-1} = \left[2500\begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 40 & 600 \\ 600 & 10{,}000 \end{pmatrix}^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right]^{-1} = 0.4$$

The probability of rejecting the null hypothesis $H_0: \beta_2 = 9$ versus $H_1: \beta_2 \ne 9$ when $\beta_2 = 10$ is $\Pr\left[F_{(1,\,38,\,\lambda = 0.4)} > 4.09817\right] = 0.09457$. Removing the true hypothesis has reduced the power of the test.
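The two noncentrality parameters and the corresponding rejection rates can be reproduced approximately with the standard library alone. This is a sketch, not the authors' computation: the noncentrality parameters are computed exactly, while the powers are estimated by a Monte Carlo simulation conditional on the fixed regressor values, using the critical values quoted in the text. The function name `simulate`, the number of replications and the seed are assumptions made for illustration.

```python
import random
from fractions import Fraction as Fr

x = [10] * 20 + [20] * 20                      # fixed regressor values from the example
n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
det = n * sxx - sx ** 2                        # determinant of X'X
beta1, beta2, sigma = 100, 10, 50              # true values; sigma^2 = 2500
d1, d2 = beta1 - 100, beta2 - 9                # deviations from the hypothesised values

# Exact noncentrality parameters for the joint and the single hypothesis.
lam_joint = Fr(n * d1 ** 2 + 2 * sx * d1 * d2 + sxx * d2 ** 2, sigma ** 2)
lam_single = Fr(d2 ** 2) / (sigma ** 2 * Fr(n, det))   # n/det is [(X'X)^{-1}]_22

def simulate(reps=4000, seed=12345):
    """Estimate rejection rates of the joint and single F-tests by simulation."""
    rng = random.Random(seed)
    rej_joint = rej_single = 0
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0, sigma) for xi in x]
        sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
        b2 = (n * sxy - sx * sy) / det         # least squares slope
        b1 = (sy - b2 * sx) / n                # least squares intercept
        s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        e1, e2 = b1 - 100, b2 - 9
        f_joint = (n * e1 ** 2 + 2 * sx * e1 * e2 + sxx * e2 ** 2) / (2 * s2)
        f_single = e2 ** 2 / (s2 * n / det)
        rej_joint += f_joint > 3.24482         # F(0.95; 2, 38) from the text
        rej_single += f_single > 4.09817       # F(0.95; 1, 38) from the text
    return rej_joint / reps, rej_single / reps

power_joint, power_single = simulate()
print(lam_joint, lam_single, power_joint, power_single)
```

The simulated rejection rates carry Monte Carlo error, so they should be close to, but not identical with, the exact powers reported above.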

Example 2

All the above results have been conditional on $X$. To investigate the effect of relaxing this assumption, we again use the model in Example 1, but generate 10,000 samples of $x$ from a $N(15, 1.6^2)$ distribution, and, correspondingly, 10,000 samples of $(y \mid x)$ from $N(100 + 10x, 50^2)$. This choice of $x$ gives values from 10.2 to 19.8 with probability over 0.99, mimicking the range of x-values in Example 1. Using the F-test, the proportion of rejections for the single hypothesis was 0.0569, and for the joint hypothesis, 0.3545. Using the chi-square test, with chi-square value equal to the F-value multiplied by the numerator degrees of freedom, the single hypothesis rejection proportion was 0.0640, and the joint test rejection proportion was 0.3911. Removing the true hypothesis reduced the simulated power of the test.
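A simulation of this kind can be sketched as follows. The code is an illustration rather than the authors' program: it assumes that each replication draws a fresh sample of 40 regressor values, it uses 4,000 replications instead of 10,000, and the function name `simulate` and the seed are hypothetical. The F critical values are those quoted in Example 1; the chi-square 5% critical values (3.841459 for one degree of freedom and 5.991465 for two) are standard table values.

```python
import random

def simulate(reps=4000, n=40, seed=2020):
    """Rejection rates of F and chi-square tests when x is redrawn each replication."""
    rng = random.Random(seed)
    counts = {"F single": 0, "F joint": 0, "chi2 single": 0, "chi2 joint": 0}
    for _ in range(reps):
        x = [rng.gauss(15, 1.6) for _ in range(n)]
        y = [100 + 10 * xi + rng.gauss(0, 50) for xi in x]
        sx, sxx = sum(x), sum(xi * xi for xi in x)
        sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
        det = n * sxx - sx ** 2                  # determinant of X'X
        b2 = (n * sxy - sx * sy) / det           # least squares slope
        b1 = (sy - b2 * sx) / n                  # least squares intercept
        s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        e1, e2 = b1 - 100, b2 - 9
        f_joint = (n * e1 ** 2 + 2 * sx * e1 * e2 + sxx * e2 ** 2) / (2 * s2)
        f_single = e2 ** 2 / (s2 * n / det)
        counts["F single"] += f_single > 4.09817
        counts["F joint"] += f_joint > 3.24482
        # Chi-square statistic = F-value times the numerator degrees of freedom.
        counts["chi2 single"] += 1 * f_single > 3.841459
        counts["chi2 joint"] += 2 * f_joint > 5.991465
    return {name: c / reps for name, c in counts.items()}

rates = simulate()
for name, rate in rates.items():
    print(name, rate)
```

Because the chi-square critical values are smaller than the corresponding scaled F critical values, the chi-square tests reject at least as often as the F-tests in every replication.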

5. Concluding Remark

We have shown that, except under special circumstances, including valid superfluous restrictions in an F-test for linear restrictions in a regression model increases the noncentrality parameter of the F-statistic, and hence can also improve its power. This apparently counterintuitive result makes sense when we observe that the noncentrality parameter from testing both sets of restrictions jointly is identical to that obtained when the F-test for the invalid restrictions is applied to a model that has been restricted in line with the valid restrictions.

References

Judge, G.G., R.C. Hill, W.E. Griffiths, H. Lütkepohl and T.-C. Lee (1988), Introduction to the Theory and Practice of Econometrics, second edition, New York: John Wiley and Sons.

Ruud, P.A. (2000), An Introduction to Classical Econometric Theory, Oxford: Oxford University Press.