Hypothesis Testing and Testing under the Normality Assumption
1 Hypothesis testing

A statistical test is a method of making a decision about one hypothesis (the null hypothesis) in comparison with another one (the alternative) using a sample of observations of known size. A statistical test is not a proof per se. Accepting the null hypothesis ($H_0$) does not mean it is true, but just that the available observations are not incompatible with this hypothesis, and that there is not enough evidence to favour the alternative hypothesis over the null hypothesis.

There are 3 steps:

1. First specify a Null Hypothesis, usually denoted $H_0$, which describes a model of interest. Usually, we express $H_0$ as a restricted version of a more general model.

2. Then, construct a test statistic, which is a random variable (because it is a function of other random variables) with two features:

   (a) it has a known distribution under the Null Hypothesis (usually normal or chi-square, t or F). Its distribution is known either because we assume enough about the distribution of the model disturbances to get small-sample distributions, or we assume enough to get asymptotic distributions.

   (b) this known distribution may depend on data, but not on parameters (this is called pivotality: a test statistic is pivotal if it satisfies this condition).

3. Check whether or not the sample value of the test statistic is very far out in its sampling distribution.

When we perform a test, we may end up rejecting the null hypothesis even though it is true. In this case we are committing the so-called Type I error. The probability of a Type I error is the significance level (or size) of the test. It is also possible that we fail to reject the null hypothesis even though it is false. In this case we are committing the so-called Type II error. The power of the test is one minus the probability of a Type II error.

2 Testing under Normality Assumption

2.1 Properties of OLS estimators

Suppose
\[
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0_N, \sigma^2 I_N),
\]
and $X$ is full rank with rank $K$. Then,
\[
\text{(i)} \quad \hat{\beta} = (X'X)^{-1}X'Y \sim N\!\left(\beta, \sigma^2 (X'X)^{-1}\right),
\]
\[
\text{(ii)} \quad \frac{e'e}{\sigma^2} \sim \chi^2(N-K),
\]
\[
\text{(iii)} \quad s^2 = \frac{e'e}{N-K} \text{ is an unbiased estimator of } \sigma^2 \text{ and is independent of } \hat{\beta},
\]
where $e = Y - X\hat{\beta}$.

Proof.

(i) The fact that the disturbances are independent mean-zero normals, $\varepsilon \sim N(0, \sigma^2 I_N)$, implies $E[X'\varepsilon] = 0_K$ and $E[\varepsilon\varepsilon'] = \sigma^2 I_N$, so the OLS estimator is still BLUE:
\[
E[\hat{\beta}] = \beta, \qquad V(\hat{\beta}) = \sigma^2 (X'X)^{-1}.
\]
Write out $\hat{\beta}$ as a function of $\varepsilon$ as follows:
\[
\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon,
\]
which is a linear combination of a normally distributed vector. Since, for any vector $x \sim N(\mu, \Sigma)$, we have $a + Ax \sim N(a + A\mu, A\Sigma A')$ (see Kennedy's "All About Variances" appendix), it follows that
\[
\hat{\beta} \sim N\!\left(\beta + 0_K, (X'X)^{-1}X'\,\sigma^2 I_N\,X(X'X)^{-1}\right),
\]
or
\[
\hat{\beta} \sim N\!\left(\beta, \sigma^2 (X'X)^{-1}\right).
\]

(ii) The residual vector can be written as
\[
e = Y - X\hat{\beta} = Y - P_X Y = (I_N - P_X)Y = M_X Y = M_X X\beta + M_X\varepsilon = M_X\varepsilon,
\]
where $M_X$ is the "residual projection matrix" that creates the residuals from a regression of something on $X$:
\[
M_X = I - X(X'X)^{-1}X' = I - P_X.
\]
The matrix $M_X$ is a projection matrix; that is, $M_X$ is symmetric ($M_X' = M_X$), idempotent ($M_X^2 = M_X$), and $\mathrm{rank}(M_X) = \mathrm{rank}(I) - \mathrm{rank}(P_X) = N - \mathrm{tr}(P_X) = N - K$.

We can then write
\[
\frac{e'e}{\sigma^2} = \frac{\varepsilon'}{\sigma} M_X' M_X \frac{\varepsilon}{\sigma} = \frac{\varepsilon'}{\sigma} M_X \frac{\varepsilon}{\sigma}.
\]
Since $\varepsilon/\sigma \sim N(0, I)$, it follows that
\[
\frac{e'e}{\sigma^2} = \frac{\varepsilon'}{\sigma} M_X \frac{\varepsilon}{\sigma} \sim \chi^2(\mathrm{rank}(M_X)) = \chi^2(N-K).
\]

(iii) From (ii) we have $E\left[\frac{e'e}{\sigma^2}\right] = N - K$, so that $E[s^2] = E\left[\frac{e'e}{N-K}\right] = \sigma^2$. To prove that $s^2$ and $\hat{\beta}$ are independent, it is sufficient to show that the normal random variables $e$ and $\hat{\beta}$ are uncorrelated:
\[
\mathrm{cov}(e, \hat{\beta}) = \mathrm{cov}\!\left(M_X\varepsilon, (X'X)^{-1}X'Y\right) = M_X\,\mathrm{cov}(\varepsilon, Y)\,X(X'X)^{-1} = \sigma^2 M_X X(X'X)^{-1} = 0,
\]
because $M_X X = 0$.
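As a complement to the proof, the following is a minimal Monte Carlo sketch in Python (using numpy; the sample size, design matrix, true $\beta$, and $\sigma$ are illustrative assumptions, not part of the notes). It checks numerically that $\hat{\beta}$ is centred at $\beta$ with covariance $\sigma^2(X'X)^{-1}$ and that $s^2$ is an unbiased estimator of $\sigma^2$.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative assumptions: sample size, number of regressors, true beta and sigma.
    N, K = 100, 3
    sigma = 2.0
    beta = np.array([1.0, -0.5, 0.25])
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed full-rank design

    reps = 5000                                   # Monte Carlo replications
    betas = np.empty((reps, K))
    s2s = np.empty(reps)
    for i in range(reps):
        eps = rng.normal(scale=sigma, size=N)     # eps ~ N(0, sigma^2 I_N)
        Y = X @ beta + eps
        b_hat = np.linalg.solve(X.T @ X, X.T @ Y) # OLS: (X'X)^{-1} X'Y
        e = Y - X @ b_hat                         # residuals
        betas[i] = b_hat
        s2s[i] = e @ e / (N - K)                  # s^2 = e'e / (N - K)

    print(betas.mean(axis=0), beta)               # Monte Carlo mean of beta_hat vs true beta
    print(np.cov(betas, rowvar=False))            # Monte Carlo covariance of beta_hat ...
    print(sigma**2 * np.linalg.inv(X.T @ X))      # ... vs sigma^2 (X'X)^{-1}
    print(s2s.mean(), sigma**2)                   # mean of s^2 vs sigma^2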
2.2 Test of equalities

We consider 3 types of tests of equalities: single linear, multiple linear, and general nonlinear. Tests of equalities are fully specified when you specify the Null hypothesis: the Null is either true or not true, and you don't care how exactly it isn't true, just that it isn't true.

2.2.1 Single linear tests: z-test and t-test

A single linear test could be written as
\[
R\beta - r = 0,
\]
where $R$ is $1 \times K$ and $r$ is a scalar. The discrepancy, $\hat{d}$, is the sample value of the null hypothesis evaluated at the sample estimate of $\beta$, $\hat{\beta}$. Using the terminology above, for a linear hypothesis,
\[
\hat{d} = R\hat{\beta} - r.
\]
Even if the hypothesis is true, we would not expect $\hat{d}$ to be exactly zero, because $\hat{\beta}$ is not exactly equal to $\beta$. However, if the hypothesis is true, we would expect $\hat{d}$ to be close to 0. In contrast, if the hypothesis is false, we'd have no real prior about where we'd see $\hat{d}$.

1. The "z-test" is performed by the "z-statistic" defined by
\[
T_z = \frac{1}{\sigma}\left(R(X'X)^{-1}R'\right)^{-1/2}\hat{d} \sim N(0, 1).
\]
It is called a "z-test" because it follows a standard normal distribution, and standard normal variables are often denoted "z".

2. The "t-test" is performed by using $s$ in place of $\sigma$ in the above formula when $\sigma^2$ is unknown. This corresponds to taking $\sigma/s$ times $T_z$:
\[
T_t = \frac{\sigma}{s} T_z \sim \frac{\sigma}{s} N(0, 1).
\]
So, we need to figure out the distribution of $\frac{\sigma}{s} N(0, 1)$. Recall that
\[
(N-K)\frac{s^2}{\sigma^2} = \frac{e'e}{\sigma^2} \sim \chi^2_{N-K},
\]
then
\[
\frac{s}{\sigma} = \sqrt{\frac{s^2}{\sigma^2}} = \sqrt{\frac{\chi^2_{N-K}}{N-K}},
\]
the square root of a chi-square divided by its own degrees of freedom. Returning to the expression, we have
\[
T_t = \frac{\sigma}{s} T_z \sim \frac{N(0, 1)}{\sqrt{\chi^2_{N-K}/(N-K)}}.
\]
The distribution of a normal divided by the square root of a chi-square divided by its own degrees of freedom is called a "Student's t" distribution, denoted $t_{N-K}$, where $N - K$ is the number of degrees of freedom in the denominator. The test statistic is then called a "t test statistic",
\[
T_t = \frac{1}{s}\left(R(X'X)^{-1}R'\right)^{-1/2}\hat{d} \sim \frac{N(0, 1)}{\sqrt{\chi^2_{N-K}/(N-K)}} = t_{N-K},
\]
where "$t_{N-K}$" means "t distribution with $N - K$ degrees of freedom", which means "a standard normal divided by the square root of a chi-square divided by its own degrees of freedom". A numerical sketch of computing this statistic is given after the examples below.

3. The z-test and the t-test are related in practice. The z-test requires knowledge of $\sigma^2$, whereas the t-test uses an estimate of it. However, when the sample is very large, the estimate of $\sigma^2$ is very close to its true value, so the estimate is 'almost' the same as prior knowledge. This means that the z-test and t-test statistics have nearly the same distribution when the sample is large.

4. Examples:

   (a) An exclusion restriction, e.g., that the second variable does not belong in the model, would have
   \[
   R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad r = 0.
   \]

   (b) A symmetry restriction, e.g., that the second and third variables had identical effects, would have
   \[
   R = \begin{pmatrix} 0 & -1 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad r = 0.
   \]

   (c) A value restriction, e.g., that the second variable's coefficient is 1, would have
   \[
   R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad r = 1.
   \]
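To make the single-test mechanics concrete, here is a minimal sketch in Python (numpy and scipy; the simulated data set and the particular exclusion restriction tested are illustrative assumptions). It computes the discrepancy $\hat{d} = R\hat{\beta} - r$, the t statistic $T_t = \hat{d}\big/\big(s\sqrt{R(X'X)^{-1}R'}\big)$, and the two-sided p-value under $t_{N-K}$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Illustrative data: N observations, K = 3 regressors (constant + 2 slopes).
    N, K = 50, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
    beta_true = np.array([1.0, 0.0, 0.5])          # the second variable truly has no effect
    Y = X @ beta_true + rng.normal(scale=1.5, size=N)

    # OLS estimates and s^2
    b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b_hat
    s2 = e @ e / (N - K)

    # Single linear hypothesis R beta - r = 0: exclusion of the second variable.
    R = np.array([[0.0, 1.0, 0.0]])                # 1 x K
    r = 0.0
    d_hat = (R @ b_hat)[0] - r                     # discrepancy d_hat = R beta_hat - r
    var_term = (R @ np.linalg.inv(X.T @ X) @ R.T)[0, 0]

    t_stat = d_hat / np.sqrt(s2 * var_term)        # T_t = d_hat / (s sqrt(R(X'X)^{-1}R'))
    p_value = 2 * stats.t.sf(abs(t_stat), df=N - K)  # two-sided p-value under t_{N-K}
    print(t_stat, p_value)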
2.2.2 Multiple Linear Tests: the finite-sample F-test

A multiple linear test could be written as
\[
R\beta - r = 0,
\]
where $R$ is a $J \times K$ matrix of rank $J$ and $r$ is a $J$-vector.

1. Since $T_z = \frac{1}{\sigma}\left(R(X'X)^{-1}R'\right)^{-1/2}\hat{d} \sim N(0, I_J)$, we have
\[
\frac{1}{\sigma^2}\hat{d}'\left(R(X'X)^{-1}R'\right)^{-1}\hat{d} = T_z' T_z \sim \chi^2_J.
\]
This provides a test statistic distributed as a $\chi^2_J$, called a Wald Test. This test requires knowledge of $\sigma^2$.

2. If we substitute in $s^2$ for $\sigma^2$, then we have
\[
\frac{1}{s^2}\hat{d}'\left(R(X'X)^{-1}R'\right)^{-1}\hat{d} = \frac{\sigma^2}{s^2}\chi^2_J = \frac{\chi^2_J}{\chi^2_{N-K}/(N-K)},
\]
by the same reasoning as for the single linear t-test. If we divide the numerator by $J$, we get a ratio of chi-squares, each divided by its own degrees of freedom, which follows the so-called F distribution, with degrees of freedom given by its numerator and denominator degrees of freedom. This test uses the estimate $s^2$. The resulting test statistic is then called an "F-test statistic",
\[
F = \frac{1}{J s^2}\hat{d}'\left(R(X'X)^{-1}R'\right)^{-1}\hat{d} \sim \frac{\chi^2_J / J}{\chi^2_{N-K}/(N-K)} = F_{J, N-K}.
\]
A numerical sketch of this statistic follows the examples below.

3. Examples:

   (a) A set of exclusion restrictions, e.g., that the second and third variables do not belong in the model, would have
   \[
   R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
   \]

   (b) A set of symmetry restrictions, that the first, second and third variables all have the same coefficients, would have
   \[
   R = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
   \]

   (c) Given that we write the restriction as $R\beta - r = 0$ for both single and multiple linear hypotheses, you can think of the single hypothesis as a special case of the multiple hypothesis.
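As a closing illustration, here is a minimal sketch in Python (numpy and scipy; the simulated data and the pair of exclusion restrictions from example (a) are illustrative assumptions). It computes the finite-sample F statistic $F = \frac{1}{Js^2}\hat{d}'\left(R(X'X)^{-1}R'\right)^{-1}\hat{d}$ and its p-value under $F_{J,N-K}$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Illustrative data: N observations, K = 4 regressors (constant + 3 slopes).
    N, K = 80, 4
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
    beta_true = np.array([2.0, 0.0, 0.0, 1.0])     # 2nd and 3rd variables truly excluded
    Y = X @ beta_true + rng.normal(scale=1.0, size=N)

    # OLS estimates and s^2
    b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b_hat
    s2 = e @ e / (N - K)

    # Joint hypothesis R beta - r = 0: exclude the 2nd and 3rd variables (J = 2).
    R = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    r = np.zeros(2)
    J = R.shape[0]
    d_hat = R @ b_hat - r                          # J-vector discrepancy
    middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)

    F_stat = (d_hat @ middle @ d_hat) / (J * s2)   # F = d'[R(X'X)^{-1}R']^{-1} d / (J s^2)
    p_value = stats.f.sf(F_stat, dfn=J, dfd=N - K) # p-value under F_{J, N-K}
    print(F_stat, p_value)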