
1 Hypothesis testing

A statistical test is a method of making a decision about one hypothesis (the null hypothesis) in comparison with another one (the alternative) using a sample of observations of known size. A statistical test is not a proof per se. Accepting the null hypothesis (H0) does not mean it is true, but only that the available observations are not incompatible with this hypothesis, and that there is not enough evidence to favour the alternative over the null hypothesis.

There are 3 steps:

1. First specify a Null Hypothesis, usually denoted H0, which describes a model of interest. Usually, we express H0 as a restricted version of a more general model.

2. Then, construct a test statistic, which is a random variable (because it is a function of other random variables) with two features:

(a) it has a known distribution under the Null Hypothesis (usually normal, chi-square, t or F). Its distribution is known either because we assume enough about the distribution of the model disturbances to get small-sample distributions, or we assume enough to get asymptotic distributions.

(b) this known distribution may depend on known quantities (such as the sample size), but not on unknown parameters (this is called pivotality: a test statistic is pivotal if it satisfies this condition).

3. Check whether or not the sample value of the test statistic is very far out in its distribution.

When we perform a test, we may end up rejecting the null hypothesis even though it is true. In this case we are committing the so-called Type I error. The probability of a Type I error is the significance level (or size) of the test. It is also possible that we fail to reject the null hypothesis even though it is false. In this case we are committing the so-called Type II error. One minus the probability of a Type II error is the power of the test.
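The following short simulation (an illustrative sketch, not part of the original notes; it assumes NumPy and SciPy, and all parameter values are made up) shows the two error rates in practice: the rejection rate under H0 estimates the size, and the rejection rate under an alternative estimates the power.

```python
# Minimal simulation sketch (illustrative): size and power of a two-sided
# z-test of H0: mu = 0 with known sigma = 1, at the 5% level.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha, n_sim = 50, 0.05, 5000
crit = norm.ppf(1 - alpha / 2)           # two-sided critical value

def reject_rate(true_mu):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(true_mu, 1.0, size=n)
        z = np.sqrt(n) * x.mean() / 1.0  # z-statistic with known sigma = 1
        rejections += abs(z) > crit
    return rejections / n_sim

print("size  (Type I error rate, mu = 0):   ", reject_rate(0.0))   # close to alpha
print("power (1 - Type II error, mu = 0.5): ", reject_rate(0.5))   # well above alpha
```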

2 Testing under Normality Assumption

2.1 Properties of OLS

Suppose

\[ Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0_N, \sigma^2 I_N), \]

and X is of full rank K. Then,

(i) $\hat\beta = (X'X)^{-1}X'Y \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$;

(ii) $\dfrac{e'e}{\sigma^2} \sim \chi^2(N-K)$;

(iii) $s^2 = \dfrac{e'e}{N-K}$ is an unbiased estimator of $\sigma^2$ and is independent of $\hat\beta$;

where $e = Y - X\hat\beta$.

Proof. (i) The fact that the disturbances are independent mean-zero normals, $\varepsilon \sim N(0, \sigma^2 I_N)$, implies $E[X'\varepsilon] = 0_K$ and $E[\varepsilon\varepsilon'] = \sigma^2 I_N$, so the OLS estimator is still BLUE:
\[ E[\hat\beta] = \beta, \qquad V(\hat\beta) = \sigma^2 (X'X)^{-1}. \]

Write out $\hat\beta$ as a function of $\varepsilon$ as follows: $\hat\beta = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon$, which is a linear combination of a normally distributed vector. Since, for any vector $x \sim N(\mu, \Sigma)$, $(a + Ax) \sim N(a + \mu, A\Sigma A')$ (see Kennedy's appendix), we have

\[ \hat\beta \sim N\left(\beta + 0_K,\; (X'X)^{-1}X'\,\sigma^2 I_N\, X(X'X)^{-1}\right), \quad\text{or}\quad \hat\beta \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right). \]

(ii) The residual vector can be written as

\[
\begin{aligned}
e &= Y - X\hat\beta \\
  &= Y - P_X Y \\
  &= (I_N - P_X) Y \\
  &= M_X Y \\
  &= M_X X\beta + M_X \varepsilon \\
  &= M_X \varepsilon,
\end{aligned}
\]

where $M_X$ is the "residual projection matrix" that creates the residuals from a regression of something on $X$:

\[ M_X = I - X(X'X)^{-1}X' = I - P_X. \]

The matrix $M_X$ is a projection matrix; that is, $M_X$ is symmetric ($M_X' = M_X$), idempotent ($M_X^2 = M_X$), and $\mathrm{rank}(M_X) = \mathrm{rank}(I) - \mathrm{rank}(P_X) = N - \mathrm{tr}(P_X) = N - K$.

We can then write

\[
\frac{e'e}{\sigma^2} = \frac{\varepsilon'}{\sigma} M_X' M_X \frac{\varepsilon}{\sigma} = \frac{\varepsilon'}{\sigma} M_X \frac{\varepsilon}{\sigma}.
\]
Since $\dfrac{\varepsilon}{\sigma} \sim N(0, I)$, it follows that $\dfrac{e'e}{\sigma^2} = \dfrac{\varepsilon'}{\sigma} M_X \dfrac{\varepsilon}{\sigma} \sim \chi^2(\mathrm{rank}(M_X)) = \chi^2(N-K)$.

(iii) From (ii) we have $E\left[\dfrac{e'e}{\sigma^2}\right] = N - K$, so that $E[s^2] = E\left[\dfrac{e'e}{N-K}\right] = \sigma^2$. To prove that $s^2$ and $\hat\beta$ are independent, it is sufficient to show that the jointly normal random variables $e$ and $\hat\beta$ are uncorrelated:

\[
\mathrm{cov}(e, \hat\beta) = \mathrm{cov}\left(M_X \varepsilon, (X'X)^{-1}X'Y\right) = M_X\,\mathrm{cov}(\varepsilon, Y)\,X(X'X)^{-1} = \sigma^2 M_X X (X'X)^{-1} = 0,
\]
because $M_X X = 0$.
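As an illustration of the proposition above, the following Monte Carlo sketch (not part of the original notes; it assumes NumPy, and the design and parameter values are made up) checks that the empirical covariance of $\hat\beta$ matches $\sigma^2(X'X)^{-1}$, that $s^2$ is unbiased for $\sigma^2$, and that the residuals are uncorrelated with $\hat\beta$.

```python
# Illustrative Monte Carlo sketch: with a fixed design X and normal errors,
# the OLS estimator has covariance sigma^2 (X'X)^{-1}, s^2 is unbiased for
# sigma^2, and the residuals are uncorrelated with betahat.
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma = 100, 3, 2.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])   # fixed regressors
beta = np.array([1.0, 0.5, -0.3])
XtX_inv = np.linalg.inv(X.T @ X)

betahats, s2s, e0s = [], [], []
for _ in range(5000):
    y = X @ beta + rng.normal(0.0, sigma, size=N)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    betahats.append(b)
    s2s.append(e @ e / (N - K))
    e0s.append(e[0])                     # one residual, to check uncorrelatedness

betahats = np.array(betahats)
print("empirical cov(betahat):\n", np.cov(betahats.T))
print("sigma^2 (X'X)^-1:\n", sigma**2 * XtX_inv)
print("mean of s^2:", np.mean(s2s), "(target:", sigma**2, ")")
print("corr(e_1, betahat_2):", np.corrcoef(e0s, betahats[:, 1])[0, 1])  # close to 0
```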

2.2 Test of equalities

We consider 3 types of tests of equalities: single linear, multiple linear, and general nonlinear. Tests of equalities are fully specified when you specify the Null hypothesis: the Null is either true or not true, and you don't care how exactly it isn't true, just that it isn't true.

2.2.1 Single linear tests: z-test and t-test

A single linear test could be written as
\[ R\beta - r = 0, \]
where $R$ is $1 \times K$ and $r$ is a scalar. The discrepancy vector, $\hat d$, is the sample value of the null hypothesis evaluated at the sample estimate $\hat\beta$ of $\beta$. Using the terminology above, for a linear hypothesis,

\[ \hat d = R\hat\beta - r. \]

Even if the hypothesis is true, we would not expect $\hat d$ to be exactly zero, because $\hat\beta$ is not exactly equal to $\beta$. However, if the hypothesis is true, we would expect $\hat d$ to be close to 0. In contrast, if the hypothesis is false, we would have no real prior about where we would see $\hat d$.

1. The "z-test" is performed using the "z-statistic" defined by

\[ T_z = \frac{1}{\sigma}\left[R(X'X)^{-1}R'\right]^{-1/2} \hat d \sim N(0, 1). \]
It is called a "z-test" because it follows a standard normal distribution, and standard normal variables are often denoted "z".

2. The "t-test" is performed by using $s$ in place of $\sigma$ in the above formula when $\sigma^2$ is unknown. This corresponds to multiplying $T_z$ by $\frac{\sigma}{s}$:
\[ T_t = \frac{\sigma}{s} T_z \sim \frac{\sigma}{s}\, N(0, 1). \]
So, we need to figure out the distribution of $\frac{\sigma}{s} N(0, 1)$. Recall that
\[ (N-K)\,\frac{s^2}{\sigma^2} = \frac{e'e}{\sigma^2} \sim \chi^2_{N-K}, \]
so that
\[ \frac{s}{\sigma} = \sqrt{\frac{s^2}{\sigma^2}} = \sqrt{\frac{\chi^2_{N-K}}{N-K}}, \]
the square root of a chi-square divided by its own degrees of freedom. Returning to the expression, we have
\[ T_t = \frac{\sigma}{s} T_z \sim \frac{\sigma}{s}\, N(0, 1) \sim \frac{N(0,1)}{\sqrt{\chi^2_{N-K}/(N-K)}}. \]

The distribution of a normal divided by the square root of an independent chi-square divided by its own degrees of freedom is called a "Student's t" distribution, denoted $t_{N-K}$, where $N - K$ is the number of degrees of freedom in the denominator. The test statistic is then called a "t test statistic":

\[ T_t = \frac{1}{s}\left[R(X'X)^{-1}R'\right]^{-1/2} \hat d \sim \frac{N(0,1)}{\sqrt{\chi^2_{N-K}/(N-K)}} = t_{N-K}, \]

where "$t_{N-K}$" denotes the "t distribution with $N - K$ degrees of freedom", which means "a standard normal divided by the square root of a chi-square divided by its own degrees of freedom".

3. The z-test and the t-test are related in practice. The z-test requires knowledge of $\sigma^2$, whereas the t-test uses an estimate of it. However, when the sample is very large, the estimate of $\sigma^2$ is very close to its true value, so the estimate is 'almost' the same as prior knowledge. This means that the z-test and t-test have nearly the same distribution when the sample is large.

4. Examples (a numerical sketch of the t-test follows the examples below):

(a) An exclusion restriction, e.g., that the second variable does not belong in the model, would have
\[ R = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \end{pmatrix}, \quad r = 0. \]

(b) A symmetry restriction, e.g., that the second and third variables had identical effects, would have

\[ R = \begin{pmatrix} 0 & -1 & 1 & 0 & \dots & 0 \end{pmatrix}, \quad r = 0. \]

(c) A value restriction, e.g., that the second variable's coefficient is 1, would have

\[ R = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \end{pmatrix}, \quad r = 1. \]
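The sketch below (illustrative only, not part of the original notes; it assumes NumPy/SciPy, and the data-generating process and restriction are made up) computes the t-statistic for an exclusion restriction of type (a) using the formula $T_t = \frac{1}{s}\left[R(X'X)^{-1}R'\right]^{-1/2}\hat d$.

```python
# Illustrative sketch: t-test of a single linear restriction R beta = r.
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(2)
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.0, 0.7])              # second slope truly zero
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
e = y - X @ betahat
s2 = e @ e / (N - K)

# H0: beta_2 = 0  (exclusion restriction, example (a) above)
R = np.array([0.0, 1.0, 0.0]); r = 0.0
d = R @ betahat - r
T_t = d / np.sqrt(s2 * (R @ XtX_inv @ R))     # scalar case of the formula
pval = 2 * student_t.sf(abs(T_t), df=N - K)
print("t statistic:", T_t, " p-value:", pval)
```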

2.2.2 Multiple Linear Tests: the finite-sample F-test

A multiple linear test could be written as
\[ R\beta - r = 0, \]
where $R$ is a $J \times K$ matrix of rank $J$ and $r$ is a $J$-vector.

1. Since $T_z = \frac{1}{\sigma}\left[R(X'X)^{-1}R'\right]^{-1/2} \hat d \sim N(0, I_J)$, we have

\[ \frac{1}{\sigma^2}\,\hat d'\left[R(X'X)^{-1}R'\right]^{-1} \hat d = T_z' T_z \sim \chi^2_J. \]

This provides a test statistic distributed as a $\chi^2_J$. This test requires knowledge of $\sigma^2$.

2. If we substitute $s^2$ for $\sigma^2$, then we have

\[ \frac{1}{s^2}\,\hat d'\left[R(X'X)^{-1}R'\right]^{-1} \hat d = \frac{\sigma^2}{s^2}\,\chi^2_J = \frac{\chi^2_J}{\chi^2_{N-K}/(N-K)} \]

by the same reasoning as for the single linear t-test. If we divide the numerator by $J$, we get a ratio of chi-squares, each divided by its own degrees of freedom, which follows the so-called F distribution with numerator and denominator degrees of freedom $J$ and $N-K$. This test uses the estimate $s^2$. The resulting test statistic is then called an "F-test statistic" (a numerical sketch follows the examples below):

\[ F = \frac{1}{J s^2}\,\hat d'\left[R(X'X)^{-1}R'\right]^{-1} \hat d = \frac{\chi^2_J / J}{\chi^2_{N-K}/(N-K)} \sim F_{J, N-K}. \]

3. Examples:

(a) A set of exclusion restrictions, e.g., that the second and third variables do not belong in the model, would have

\[ R = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \end{pmatrix}, \quad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

(b) A set of symmetry restrictions, e.g., that the first, second and third variables all have the same coefficients, would have

\[ R = \begin{pmatrix} 1 & -1 & 0 & \dots & 0 \\ 0 & 1 & -1 & \dots & 0 \end{pmatrix}, \quad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

(c) Given that we write the restriction as $R\beta - r = 0$ for both single and multiple linear hypotheses, you can think of the single hypothesis as a special case of the multiple hypothesis.
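The following sketch (illustrative only, not part of the original notes; it assumes NumPy/SciPy and a made-up simulated design) computes the finite-sample F-statistic for a pair of exclusion restrictions as in example (a).

```python
# Illustrative sketch: finite-sample F-test of J linear restrictions R beta = r,
# F = d'[R (X'X)^-1 R']^{-1} d / (J s^2).
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
N, K = 120, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.0, 0.0, 0.5])
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
e = y - X @ betahat
s2 = e @ e / (N - K)

# H0: beta_2 = 0 and beta_3 = 0 (example (a) above), so J = 2
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
r = np.zeros(2)
J = R.shape[0]
d = R @ betahat - r
F = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (J * s2)
pval = f_dist.sf(F, J, N - K)
print("F statistic:", F, " p-value:", pval)
```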

3 Testing without the Normality Assumption

Without the Normality assumption, we can still get the "approximate" distributions of test statistics in large samples. In this case, the law of large numbers, the central limit theorem and Slutsky's lemma can be invoked.

3.1 Properties of OLS estimators

Suppose
\[ Y = X\beta + \varepsilon, \qquad E[X'\varepsilon] = 0_K, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_N. \]

Let $\hat\beta = (X'X)^{-1}X'Y$ and $e = Y - X\hat\beta$ be the OLS estimator and residual. Then, the following properties follow from the law of large numbers, the central limit theorem and Slutsky's lemma:

(i) $\hat\beta \overset{approx}{\underset{N\to\infty}{\sim}} N\left(\beta, \sigma^2 (X'X)^{-1}\right)$;

(ii) $s^2 = \dfrac{e'e}{N-K}$ is a consistent estimator of $\sigma^2$.

Proof. (i) $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$, so $\sqrt{N}\left(\hat\beta - \beta\right) = \left(\dfrac{X'X}{N}\right)^{-1} \dfrac{X'\varepsilon}{\sqrt{N}}$.

By the central limit theorem, $\dfrac{X'\varepsilon}{\sqrt{N}} = \sqrt{N}\left(\dfrac{1}{N}\sum_{i=1}^N X_i'\varepsilon_i\right) \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0, \sigma^2 \dfrac{X'X}{N}\right)$, where $X_i$ is the $i$th row of $X$.

(ii)

\[
s^2 = \frac{e'e}{N-K} = \frac{\varepsilon' M_X \varepsilon}{N-K} = \frac{N}{N-K}\left[\frac{\varepsilon'\varepsilon}{N} - \frac{\varepsilon'X}{N}\left(\frac{X'X}{N}\right)^{-1}\frac{X'\varepsilon}{N}\right].
\]

By the law of large numbers, $\dfrac{\varepsilon'\varepsilon}{N} = \dfrac{1}{N}\sum_{i=1}^N \varepsilon_i^2 \overset{P}{\to} E[\varepsilon_i^2] = \sigma^2$, and $\dfrac{X'\varepsilon}{N} = \dfrac{1}{N}\sum_{i=1}^N X_i'\varepsilon_i \overset{P}{\to} E[X_i'\varepsilon_i] = 0_K$, as $N \to \infty$.

3.2 Wald Tests

3.2.1 Linear hypothesis

Consider a multiple linear hypothesis with possibly non-normal (but finite-variance) $\varepsilon$:

\[ Y = X\beta + \varepsilon, \qquad E[X'\varepsilon] = 0_K, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_N, \]

\[ H_0 : R\beta - r = 0. \]

Here, $\varepsilon$ may be non-normal (for example uniform) as long as $\varepsilon$ has finite variance. Since $\hat\beta \overset{approx}{\sim} N\left(\beta, \sigma^2(X'X)^{-1}\right)$, the Wald vector defined by
\[ T_{Wv} = \frac{1}{\sigma}\left[R(X'X)^{-1}R'\right]^{-1/2} \hat d \]
is asymptotically approximately a vector of standard normals $N(0_J, I_J)$. Hence its inner product, called a Wald test statistic, is asymptotically approximately a chi-square:

\[ T_W = T_{Wv}' T_{Wv} = \frac{1}{\sigma^2}\,\hat d'\left[R(X'X)^{-1}R'\right]^{-1} \hat d \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2_J. \]
Moreover, because $s^2$ is asymptotically equal to $\sigma^2$, the approximation is not affected by replacing $\sigma^2$ with $s^2$ when $\sigma^2$ is unknown. Thus, we have that

\[ T_W^s = T_{Wv}^{s\prime} T_{Wv}^s = \frac{1}{s^2}\,\hat d'\left[R(X'X)^{-1}R'\right]^{-1} \hat d \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2_J. \]
The Wald statistic approximately follows the chi-square distribution as the sample size gets large, even if one uses $s^2$. A numerical sketch is given below.
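A minimal sketch of the linear Wald test with non-normal errors (not part of the original notes; it assumes NumPy/SciPy, and the data-generating process is made up):

```python
# Illustrative sketch: Wald test of R beta = r with non-normal (uniform,
# mean-zero) errors, using s^2 in place of sigma^2.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.0, 0.3])
eps = rng.uniform(-1.0, 1.0, size=N)          # non-normal, finite variance
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
e = y - X @ betahat
s2 = e @ e / (N - K)

R = np.array([[0.0, 1.0, 0.0]]); r = np.zeros(1)   # H0: beta_2 = 0, J = 1
d = R @ betahat - r
T_W = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / s2
print("Wald statistic:", T_W, " p-value:", chi2.sf(T_W, df=R.shape[0]))
```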

3.2.2 Nonlinear hypothesis

A multiple nonlinear test could be written as

\[ c(\beta) = 0, \]
where $c$ is a $J$-vector function of $\beta$. Consider the model in which we have a set of $J$ nonlinear restrictions $c(\beta) = 0$ that we wish to test:

\[ Y = X\beta + \varepsilon, \qquad E[X'\varepsilon] = 0_K, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_N, \]

\[ H_0 : c(\beta) = 0. \]

The discrepancy vector $\hat d$ gives the distance between the sample value of the hypothesis and its hypothesized value of 0:
\[ \hat d = c\left(\hat\beta\right). \]

Since we have not assumed normality of $\varepsilon$, all we have for the finite-sample distribution of $\hat\beta$ is a mean and variance:
\[ E[\hat\beta] = \beta, \qquad V[\hat\beta] = \sigma^2 (X'X)^{-1}. \]

Application of the delta method allows us to calculate an approximate asymptotic distribution of $c\left(\hat\beta\right)$: by first-order Taylor approximation, we have

\[ c\left(\hat\beta\right) \approx c(\beta) + \nabla_{\beta'} c(\beta)\left(\hat\beta - \beta\right) = \nabla_{\beta'} c(\beta)\left(\hat\beta - \beta\right) \quad\text{(under } H_0\text{, since } c(\beta) = 0\text{),} \]

where $\nabla_{\beta'} c(\beta)$ is the $J \times K$ matrix of derivatives of the vector function $c(\beta)$ with respect to the row vector $\beta'$ (each row of $\nabla_{\beta'} c(\beta)$ gives the derivatives of an element of $c(\beta)$ with respect to $\beta$). Since $\hat\beta - \beta \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0, \sigma^2 (X'X)^{-1}\right)$, then

\[ c\left(\hat\beta\right) \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0_J,\; \sigma^2\, \nabla_{\beta'}c(\beta)\,(X'X)^{-1}\left(\nabla_{\beta'}c(\beta)\right)'\right). \]
Since $\hat\beta$ goes to $\beta$ asymptotically, we can replace $\nabla_{\beta'}c(\beta)$ with $\nabla_{\beta'}c\left(\hat\beta\right)$:

\[ c\left(\hat\beta\right) \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0_J,\; \sigma^2\, \nabla_{\beta'}c\left(\hat\beta\right)(X'X)^{-1}\left(\nabla_{\beta'}c\left(\hat\beta\right)\right)'\right). \]

Now, we use this information to create the Wald statistic. Premultiplying the sample value of the hypothesis by the minus-one-half matrix of its variance gives the Wald vector, distributed as a vector of standard normals:

\[ T_{Wv} = \frac{1}{\sigma}\left[\nabla_{\beta'}c\left(\hat\beta\right)(X'X)^{-1}\left(\nabla_{\beta'}c\left(\hat\beta\right)\right)'\right]^{-1/2} c\left(\hat\beta\right) \overset{approx}{\underset{N\to\infty}{\sim}} N(0_J, I_J). \]

Finally, we take the inner product of this to create the Wald Statistic

\[ T_W = \frac{1}{\sigma^2}\, c\left(\hat\beta\right)'\left[\nabla_{\beta'}c\left(\hat\beta\right)(X'X)^{-1}\left(\nabla_{\beta'}c\left(\hat\beta\right)\right)'\right]^{-1} c\left(\hat\beta\right) \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2_J. \]

Since this is an approximate asymptotic result, it also works with $s^2$ instead of $\sigma^2$:

\[ T_W = \frac{1}{s^2}\, c\left(\hat\beta\right)'\left[\nabla_{\beta'}c\left(\hat\beta\right)(X'X)^{-1}\left(\nabla_{\beta'}c\left(\hat\beta\right)\right)'\right]^{-1} c\left(\hat\beta\right) \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2_J. \]
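A minimal sketch of the delta-method Wald test (not part of the original notes; it assumes NumPy/SciPy, and the nonlinear restriction $c(\beta) = \beta_2\beta_3 - 1$ and its gradient are chosen only for illustration):

```python
# Illustrative sketch: delta-method Wald test of a single nonlinear
# restriction c(beta) = beta_2 * beta_3 - 1 = 0 (J = 1).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
N, K = 400, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 2.0, 0.5])              # beta_2 * beta_3 = 1, so H0 holds
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
e = y - X @ betahat
s2 = e @ e / (N - K)

c = np.array([betahat[1] * betahat[2] - 1.0])          # c(betahat), J-vector
grad = np.array([[0.0, betahat[2], betahat[1]]])       # J x K gradient at betahat
V = s2 * grad @ XtX_inv @ grad.T                       # approximate variance of c(betahat)
T_W = c @ np.linalg.solve(V, c)
print("nonlinear Wald statistic:", T_W, " p-value:", chi2.sf(T_W, df=1))
```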

4 Testing the exogeneity of regressors: the Hausman test

Suppose we have the model

\[ Y = X\beta + \varepsilon, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_N. \]

This model can be estimated by instrumental variables using the 2SLS method. However, if the regressors $X$ are exogenous, then the 2SLS estimator is less efficient (i.e., it has a larger variance) than OLS. Therefore, it is important to test for exogeneity first, in order to avoid using an IV estimator that is: (i) more computationally intensive (two stages are more difficult than one) and (ii) less efficient.

An exogeneity test for the regressors $X$ could be written as

\[ H_0 : E[X'\varepsilon] = 0_K. \]
Instrumental variable estimation requires finding instruments $Z$ such that

\[ E(Z'X) \neq 0 \quad\text{and}\quad E(Z'\varepsilon) = 0, \]
where $\mathrm{rank}(Z) = J > K$. Let $\hat\beta_{OLS}$ be the ordinary least squares estimator and $\hat\beta_{2SLS}$ the two-stage least squares (instrumental variables) estimator of $\beta$:

\[ \hat\beta_{OLS} = (X'X)^{-1}X'Y, \qquad \hat\beta_{2SLS} = (X'P_Z X)^{-1} X'P_Z Y, \]

where $P_Z = Z(Z'Z)^{-1}Z'$.

If the regressors $X$ are endogenous, then the OLS estimates should differ from the endogeneity-corrected 2SLS estimates (as long as the instruments are exogenous). An exogeneity test can therefore be based on the difference between $\hat\beta_{2SLS}$ and $\hat\beta_{OLS}$. The test of this hypothesis is called a Hausman test. By the central limit theorem, we have, under $H_0$,

\[ \hat\beta_{OLS} - \beta \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0, \sigma^2 (X'X)^{-1}\right), \qquad \hat\beta_{2SLS} - \beta \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0, \sigma^2 (X'P_Z X)^{-1}\right). \]
Hence,

\[ \hat\beta_{2SLS} - \hat\beta_{OLS} \overset{approx}{\underset{N\to\infty}{\sim}} N\left(0, \sigma^2\left[(X'P_Z X)^{-1} - (X'X)^{-1}\right]\right). \]
Now, we use this information to create a Wald statistic. Premultiplying the difference of the two estimators by the minus-one-half matrix of its variance gives the Wald vector, distributed as a vector of standard normals:

\[ T_w = \frac{1}{\sigma}\left[(X'P_Z X)^{-1} - (X'X)^{-1}\right]^{-1/2}\left(\hat\beta_{2SLS} - \hat\beta_{OLS}\right) \overset{approx}{\underset{N\to\infty}{\sim}} N(0, I_K). \]
Hence its inner product is asymptotically approximately a chi-square:

\[ T_w' T_w = \frac{1}{\sigma^2}\left(\hat\beta_{2SLS} - \hat\beta_{OLS}\right)'\left[(X'P_Z X)^{-1} - (X'X)^{-1}\right]^{-1}\left(\hat\beta_{2SLS} - \hat\beta_{OLS}\right) \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2(K). \]
Since this is an approximate asymptotic result, it also works with $s^2$ instead of $\sigma^2$. So the Hausman-Wald test statistic for the exogeneity of regressors is

\[ H_w = \frac{1}{s^2}\left(\hat\beta_{2SLS} - \hat\beta_{OLS}\right)'\left[(X'P_Z X)^{-1} - (X'X)^{-1}\right]^{-1}\left(\hat\beta_{2SLS} - \hat\beta_{OLS}\right) \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2(K). \]

Note: we are assuming here that the matrix $V = (X'P_Z X)^{-1} - (X'X)^{-1}$ is positive definite. The more general approach is to take a generalized inverse (instead of the inverse) in the formula for $H_w$. In that case the statistic $H_w$ has an asymptotic approximate chi-squared distribution with degrees of freedom equal to the rank of the matrix $V$. A numerical sketch is given below.
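A minimal sketch of the Hausman test on simulated data (not part of the original notes; it assumes NumPy/SciPy; the instruments, the degree of endogeneity and the use of a generalized inverse are illustrative choices):

```python
# Illustrative sketch: Hausman test comparing OLS and 2SLS with one endogenous
# regressor and two instruments; pinv is used in case V is (near-)singular.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
N = 1000
z = rng.normal(size=(N, 2))                      # instruments
u = rng.normal(size=N)
x_endog = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=N)
X = np.column_stack([np.ones(N), x_endog])       # K = 2
Z = np.column_stack([np.ones(N), z])             # J = 3 > K
eps = u + rng.normal(size=N)                     # correlated with x_endog
y = X @ np.array([1.0, 2.0]) + eps

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_2sls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

e = y - X @ b_2sls
s2 = e @ e / N                                   # one common choice of variance estimate
diff = b_2sls - b_ols
V = s2 * (np.linalg.inv(X.T @ PZ @ X) - np.linalg.inv(X.T @ X))
H = diff @ np.linalg.pinv(V) @ diff              # generalized inverse of V
df = np.linalg.matrix_rank(V)
print("Hausman statistic:", H, " p-value:", chi2.sf(H, df=df))
```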

5 Testing the validity of instruments: Overidentification test

This is a test that will tell you if the instruments Z are uncorrelated with the error term ε, an essential condition for the validity of instrumental variables. When this condition is not satisfied, the IV estimator could be inconsistent.

The degree of overidentification of an overidentified model is defined to be $J - K$, where $J = \mathrm{rank}(Z)$ is the number of instruments and $K = \mathrm{rank}(X)$ the number of regressors. The model we wish to test is

\[ Y = X\beta + \varepsilon, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_N, \]
\[ H_0 : E[Z'\varepsilon] = 0. \]

Denote by $e = Y - X\hat\beta_{2SLS}$ the residual of the 2SLS estimation of the model.

The test statistic is based on the vector
\[ \frac{1}{\sigma} P_Z e = \left[P_Z - P_Z X(X'P_Z X)^{-1}X'P_Z\right]\frac{\varepsilon}{\sigma}, \]
where $P_Z e$ is the projection of the residual vector $e$ onto the space spanned by the instruments $Z$. Orthogonality between $\varepsilon$ and $Z$ therefore implies that $P_Z e$ should be close to 0. An appropriate test statistic can therefore be based on the squared Euclidean distance between $P_Z e$ and 0, that is, $(P_Z e)' P_Z e$.

The rank of $P_Z$ is $J$ and the rank of $P_Z X(X'P_Z X)^{-1}X'P_Z$ is $K$, so the rank of the (idempotent, symmetric) matrix $P_Z - P_Z X(X'P_Z X)^{-1}X'P_Z$ is $J - K$. Since $\dfrac{\varepsilon}{\sigma}$ has mean 0 and variance matrix $I$,
\[ \frac{1}{\sigma^2}\, e'P_Z e = \left(\frac{1}{\sigma}P_Z e\right)'\left(\frac{1}{\sigma}P_Z e\right) \]
is the sum of squares of $J - K$ terms that have mean 0 and variance 1. By the central limit theorem, each of these terms is approximately asymptotically normal $N(0, 1)$. Hence $\dfrac{1}{\sigma^2}\, e'P_Z e$ is approximately asymptotically chi-square with $J - K$ degrees of freedom.

Since this is an approximate asymptotic result, it also works with $\hat\sigma^2$ instead of $\sigma^2$, where $\hat\sigma^2 = \dfrac{e'e}{N}$ is a consistent estimator of $\sigma^2$. So the test statistic for the validity of instruments is given by

\[ Q = \frac{1}{\hat\sigma^2}\, e'P_Z e \overset{approx}{\underset{N\to\infty}{\sim}} \chi^2(J - K). \]
The same test statistic can be obtained by considering the synthetic regression
\[ e = Z\gamma + u. \]
The estimate of $\gamma$ is $\hat\gamma = (Z'Z)^{-1}Z'e$ and the predicted value of $e$ from the synthetic regression is $\hat e = Z\hat\gamma = Z(Z'Z)^{-1}Z'e$. The sum of squares of this predicted value (the explained sum of squares) is by definition

\[
ESS = \hat e'\hat e = e'Z(Z'Z)^{-1}Z'Z(Z'Z)^{-1}Z'e = e'Z(Z'Z)^{-1}Z'e = e'P_Z e,
\]
while the total sum of squares is $TSS = e'e$. By definition, the $R^2$ of this synthetic model is $R^2 = \dfrac{ESS}{TSS} = \dfrac{e'P_Z e}{e'e}$. So
\[ Q = \frac{e'P_Z e}{\hat\sigma^2} = N\,\frac{e'P_Z e}{e'e} = N R^2, \]
where $R^2$ is from the synthetic regression of the 2SLS residuals $e$ on all exogenous factors (the instruments) in the model. A numerical sketch is given below.
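A minimal sketch of the overidentification statistic $Q = N R^2$ (not part of the original notes; it assumes NumPy/SciPy and a made-up design with $J - K = 1$):

```python
# Illustrative sketch: overidentification statistic Q = N * R^2 from projecting
# the 2SLS residuals onto the instrument space.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
N = 1000
z = rng.normal(size=(N, 2))
u = rng.normal(size=N)
x_endog = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=N)
X = np.column_stack([np.ones(N), x_endog])       # K = 2
Z = np.column_stack([np.ones(N), z])             # J = 3, so J - K = 1
y = X @ np.array([1.0, 2.0]) + u + rng.normal(size=N)   # instruments exogenous: H0 holds

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_2sls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
e = y - X @ b_2sls

Q = N * (e @ PZ @ e) / (e @ e)                   # N * (uncentered) R^2 of the synthetic regression
print("overidentification statistic:", Q, " p-value:", chi2.sf(Q, df=1))
```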

6 Testing heteroskedasticity: the White test

The model we wish to test is

\[ Y = X\beta + \varepsilon, \qquad E[X'\varepsilon] = 0_K, \]
\[ H_0 : E[\varepsilon\varepsilon'] = \sigma^2 I_N. \]

The alternative hypothesis of the White test is that the error variance is affected by any of the regressors and their squares (and possibly their cross products). It therefore tests whether or not any heteroskedasticity present causes the variance matrix of the OLS estimator to differ from its usual formula. The steps to build the test are as follows:

1. Run the OLS regression and take the vector of residuals $e = Y - X\hat\beta$.

2. Run the synthetic regression of the squared residuals on the regressors and their squares,
\[ e^2 = X\gamma + X^2\theta + u, \]
where $e^2$ denotes the vector of squared residuals (element by element) and $X^2$ is the $N \times (K-1)$ matrix of the squares of the regressors (excluding the column of constants). Now, take the coefficient of determination of this synthetic regression, $R^2$.

3. The White test statistic is defined by

\[ W = N R^2 \]

and is approximately asymptotically distributed as a chi-square with degrees of freedom equal to the number of non-constant regressors in the synthetic regression, here $2(K-1)$. A numerical sketch is given below.
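A minimal sketch of the White test (not part of the original notes; it assumes NumPy/SciPy, and the heteroskedastic data-generating process is made up for illustration):

```python
# Illustrative sketch: White test statistic N * R^2 from regressing squared OLS
# residuals on the regressors and their squares (cross products omitted, as in
# the steps above).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
N, K = 300, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.5, -0.2])
eps = (1.0 + 0.8 * np.abs(X[:, 1])) * rng.normal(size=N)   # heteroskedastic errors
y = X @ beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)
e2 = (y - X @ b) ** 2                                       # squared residuals

W_reg = np.column_stack([X, X[:, 1:] ** 2])                 # regressors and their squares
g = np.linalg.lstsq(W_reg, e2, rcond=None)[0]
fitted = W_reg @ g
R2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
W = N * R2
df = 2 * (K - 1)                                            # slope terms in the auxiliary regression
print("White statistic:", W, " p-value:", chi2.sf(W, df=df))
```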
