Inference in Normal Regression Model
Dr. Frank Wood

Remember

- We know that the point estimator of $\beta_1$ is
      $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- Last class we derived the sampling distribution of $b_1$, it being $N(\beta_1, \mathrm{Var}(b_1))$ (when $\sigma^2$ is known), with
      $\mathrm{Var}(b_1) = \sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$
- And we suggested that an estimate of $\mathrm{Var}(b_1)$ can be arrived at by substituting the MSE for $\sigma^2$ when $\sigma^2$ is unknown:
      $s^2\{b_1\} = \frac{\mathrm{MSE}}{\sum (X_i - \bar{X})^2} = \frac{\mathrm{SSE}/(n-2)}{\sum (X_i - \bar{X})^2}$

Sampling Distribution of $(b_1 - \beta_1)/s\{b_1\}$

- Since $b_1$ is normally distributed, $(b_1 - \beta_1)/\sigma\{b_1\}$ is a standard normal variable $N(0, 1)$.
- We don't know $\mathrm{Var}(b_1)$, so it must be estimated from data. We have already denoted its estimate $s^2\{b_1\}$.
- Using this estimate, it can be shown that
      $\frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$, where $s\{b_1\} = \sqrt{s^2\{b_1\}}$
  It is from this fact that our confidence intervals and tests will derive.

Where does this come from?

- We need to rely upon (but will not derive) the following theorem: for the normal error regression model,
      $\frac{\mathrm{SSE}}{\sigma^2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(n-2)$
  and is independent of $b_0$ and $b_1$.
- Here there are two linear constraints,
      $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \sum_i k_i Y_i$, with $k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$
      $b_0 = \bar{Y} - b_1 \bar{X}$
  imposed by the regression parameter estimation, each of which reduces the number of degrees of freedom by one (two in total).

Reminder: normal (non-regression) estimation

- Intuitively, the regression result on the previous slide follows the standard result for the sum of squared standard normal random variables. First, with $\sigma$ and $\mu$ known,
      $\sum_{i=1}^n Z_i^2 = \sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$
  and then with $\mu$ unknown,
      $S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2$
      $\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{Y_i - \bar{Y}}{\sigma}\right)^2 \sim \chi^2(n-1)$
  and $\bar{Y}$ and $S^2$ are independent.

Reminder: normal (non-regression) estimation cont.

- With both $\mu$ and $\sigma$ unknown,
      $\sqrt{n}\,\frac{\bar{Y} - \mu}{S} \sim t(n-1)$
  because
      $T = \frac{Z}{\sqrt{W/\nu}} = \frac{\sqrt{n}(\bar{Y} - \mu)/\sigma}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} = \sqrt{n}\,\frac{\bar{Y} - \mu}{S}$
  Here the numerator follows from
      $Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}}$

Another useful fact: Student-t distribution

Let $Z$ and $\chi^2(\nu)$ be independent random variables (standard normal and $\chi^2$, respectively). We then define a $t$ random variable as follows:
      $t(\nu) = \frac{Z}{\sqrt{\chi^2(\nu)/\nu}}$
This version of the $t$ distribution has one parameter, the degrees of freedom $\nu$.

Distribution of the studentized statistic

To derive the distribution of the statistic $(b_1 - \beta_1)/s\{b_1\}$, first we do the following rewrite:
      $\frac{b_1 - \beta_1}{s\{b_1\}} = \frac{(b_1 - \beta_1)/\sigma\{b_1\}}{s\{b_1\}/\sigma\{b_1\}}$, where $\frac{s\{b_1\}}{\sigma\{b_1\}} = \sqrt{\frac{s^2\{b_1\}}{\sigma^2\{b_1\}}}$

Studentized statistic cont.

And note the following:
      $\frac{s^2\{b_1\}}{\sigma^2\{b_1\}} = \frac{\mathrm{MSE}/\sum (X_i - \bar{X})^2}{\sigma^2/\sum (X_i - \bar{X})^2} = \frac{\mathrm{MSE}}{\sigma^2} = \frac{\mathrm{SSE}}{\sigma^2 (n-2)}$
where we know (by the given theorem) that the distribution of the last term is $\chi^2$ and independent of $b_1$ and $b_0$:
      $\frac{\mathrm{SSE}}{\sigma^2 (n-2)} \sim \frac{\chi^2(n-2)}{n-2}$

Studentized statistic final

But by the given definition of the $t$ distribution we have our result:
      $\frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$
because, putting everything together, we can see that
      $\frac{b_1 - \beta_1}{s\{b_1\}} \sim \frac{z}{\sqrt{\chi^2(n-2)/(n-2)}}$

Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of $b_1$ ($t$ with $n-2$ degrees of freedom), we can construct confidence intervals and hypothesis tests easily.

Confidence Interval for $\beta_1$

Since the "studentized" statistic follows a $t$ distribution, we can make the following probability statement:
      $P\left(t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2)\right) = 1 - \alpha$

Remember

- Density: $f(y) = \frac{dF(y)}{dy}$
- Distribution (CDF): $F(y) = P(Y \le y) = \int_{-\infty}^{y} f(t)\,dt$
- Inverse CDF: $F^{-1}(p) = y$ such that $\int_{-\infty}^{y} f(t)\,dt = p$
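To make these quantities concrete, here is a minimal Matlab sketch that computes $b_1$, $s\{b_1\}$, and the studentized statistic on simulated data. The sample size, true parameter values, and variable names are hypothetical choices for illustration, not part of the lecture.

    % Minimal sketch (hypothetical data): compute b1, s{b1}, and the
    % studentized statistic (b1 - beta1)/s{b1} for simple linear regression.
    n = 30;
    X = linspace(0, 10, n)';              % fixed predictor levels
    beta0 = 2; beta1 = 0.5; sigma = 1;    % assumed true parameters
    Y = beta0 + beta1*X + sigma*randn(n, 1);

    Xbar = mean(X); Ybar = mean(Y);
    Sxx  = sum((X - Xbar).^2);
    b1   = sum((X - Xbar).*(Y - Ybar)) / Sxx;  % point estimator of beta1
    b0   = Ybar - b1*Xbar;                     % point estimator of beta0

    Yhat = b0 + b1*X;                     % fitted values
    SSE  = sum((Y - Yhat).^2);
    MSE  = SSE / (n - 2);                 % estimate of sigma^2
    s_b1 = sqrt(MSE / Sxx);               % estimated standard error of b1

    tstat = (b1 - beta1) / s_b1;          % distributed t(n-2) under the model

Repeating the simulation many times and histogramming tstat would reproduce the $t(n-2)$ shape derived above.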
Interval arriving from picking $\alpha$

- Note that by symmetry $t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$.
- Rearranging terms and using this fact, we have
      $P\left(b_1 - t(1-\alpha/2;\, n-2)\, s\{b_1\} \le \beta_1 \le b_1 + t(1-\alpha/2;\, n-2)\, s\{b_1\}\right) = 1 - \alpha$
- And now we can use a table to look up and produce confidence intervals.

Using tables for Computing Intervals

- The tables in the book (Table B.2 in the appendix) give $t(1-\alpha/2;\, \nu)$, where $P\{t(\nu) \le t(1-\alpha/2;\, \nu)\} = A$.
- This provides the inverse CDF of the $t$ distribution.
- It can be arrived at computationally as well; in Matlab: tinv($1-\alpha/2$, $\nu$).

$1-\alpha$ confidence limits for $\beta_1$

- The $1-\alpha$ confidence limits for $\beta_1$ are
      $b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$
- Note that this quantity can be used to calculate confidence intervals given $n$ and $\alpha$.
- Fixing $\alpha$ can guide the choice of sample size if a particular confidence interval width is desired; given a sample size, the achievable width follows.
- It is also useful for hypothesis testing, as in the sketch after these slides.

Tests Concerning $\beta_1$

- Example 1: a two-sided test
      $H_0$: $\beta_1 = 0$
      $H_a$: $\beta_1 \ne 0$
- Test statistic:
      $t^* = \frac{b_1 - 0}{s\{b_1\}}$

Tests Concerning $\beta_1$ cont.

- We have an estimate of the sampling distribution of $b_1$ from the data.
- If the null hypothesis holds, then the $b_1$ estimate coming from the data should be within the $95\%$ confidence interval of the sampling distribution centered at 0 (in this case):
      $t^* = \frac{b_1 - 0}{s\{b_1\}}$

Decision rules

      if $|t^*| \le t(1-\alpha/2;\, n-2)$, conclude $H_0$
      if $|t^*| > t(1-\alpha/2;\, n-2)$, conclude $H_a$
The absolute values make the test two-sided.

Intuition

The p-value is the value of $\alpha$ at which the critical value coincides with the observed test statistic.

Calculating the p-value

- The p-value, or attained significance level, is the smallest level of significance $\alpha$ for which the observed data indicate that the null hypothesis should be rejected.
- It can be looked up using the CDF of the test statistic.
- In Matlab, the two-sided p-value is 2 * (1 - tcdf($|t^*|$, $\nu$)).

Inferences Concerning $\beta_0$

- Largely, inference procedures regarding $\beta_0$ can be performed in the same way as those for $\beta_1$.
- Remember the point estimator $b_0$ of $\beta_0$:
      $b_0 = \bar{Y} - b_1 \bar{X}$

Sampling distribution of $b_0$

- The sampling distribution of $b_0$ refers to the different values of $b_0$ that would be obtained with repeated sampling when the levels of the predictor variable $X$ are held constant from sample to sample.
- For the normal regression model, the sampling distribution of $b_0$ is normal.

Sampling distribution of $b_0$ cont.

- When the error variance is known:
      $E(b_0) = \beta_0$
      $\sigma^2\{b_0\} = \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$
- When the error variance is unknown:
      $s^2\{b_0\} = \mathrm{MSE} \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$

Confidence interval for $\beta_0$

The $1-\alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$:
      $b_0 \pm t(1-\alpha/2;\, n-2)\, s\{b_0\}$

Considerations on Inferences on $\beta_0$ and $\beta_1$

- Effects of departures from normality: the estimators of $\beta_0$ and $\beta_1$ have the property of asymptotic normality; their distributions approach normality as the sample size increases (under general conditions).
- Spacing of the $X$ levels: the variances of $b_0$ and $b_1$ (for a given $n$ and $\sigma^2$) depend strongly on the spacing of $X$.
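Continuing the hypothetical sketch above (reusing n, b0, b1, s_b1, MSE, Xbar, and Sxx), the confidence interval, decision rule, and p-value of the preceding slides can be computed with the tinv and tcdf calls the lecture names:

    % Minimal sketch (continues the previous one): 1-alpha confidence
    % interval and two-sided test for beta1, plus an interval for beta0.
    alpha = 0.05;
    tcrit = tinv(1 - alpha/2, n - 2);            % t(1-alpha/2; n-2)

    ci_beta1 = [b1 - tcrit*s_b1, b1 + tcrit*s_b1];

    tstar = (b1 - 0) / s_b1;                     % test statistic for H0: beta1 = 0
    pval  = 2 * (1 - tcdf(abs(tstar), n - 2));   % two-sided p-value
    rejectH0 = abs(tstar) > tcrit;               % decision rule

    s_b0 = sqrt(MSE * (1/n + Xbar^2/Sxx));       % estimated std. error of b0
    ci_beta0 = [b0 - tcrit*s_b0, b0 + tcrit*s_b0];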
Sampling distribution of point estimator of mean response

- Let $X_h$ be the level of $X$ for which we would like an estimate of the mean response (it needs to be one of the observed $X$'s).
- The mean response when $X = X_h$ is denoted by $E(Y_h)$.
- The point estimator $\hat{Y}_h$ of $E(Y_h)$ is
      $\hat{Y}_h = b_0 + b_1 X_h$
- We are interested in the sampling distribution of this quantity.

Sampling Distribution of $\hat{Y}_h$

- We have $\hat{Y}_h = b_0 + b_1 X_h$.
- Since this quantity is itself a linear combination of the $Y_i$'s, its sampling distribution is itself normal.
- The mean of the sampling distribution is
      $E\{\hat{Y}_h\} = E\{b_0\} + E\{b_1\} X_h = \beta_0 + \beta_1 X_h$
  Biased or unbiased?

Sampling Distribution of $\hat{Y}_h$ cont.

- To derive the sampling distribution variance of the mean response, we first show that $b_1$ and $(1/n)\sum Y_i$ are uncorrelated and hence, for the normal error regression model, independent.
- We start with the definitions
      $\bar{Y} = \sum \left(\frac{1}{n}\right) Y_i$
      $b_1 = \sum k_i Y_i$, with $k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$
- We want to show that the mean response and the estimate $b_1$ are uncorrelated:
      $\mathrm{Cov}(\bar{Y}, b_1) = \sigma^2\{\bar{Y}, b_1\} = 0$
- To do this we need the following result (A.32):
      $\sigma^2\left\{\sum_{i=1}^n a_i Y_i,\ \sum_{i=1}^n c_i Y_i\right\} = \sum_{i=1}^n a_i c_i \sigma^2\{Y_i\}$
  when the $Y_i$ are independent.
- Using this fact we have
      $\sigma^2\left\{\sum_{i=1}^n \frac{1}{n} Y_i,\ \sum_{i=1}^n k_i Y_i\right\} = \sum_{i=1}^n \frac{1}{n} k_i \sigma^2\{Y_i\} = \sum_{i=1}^n \frac{1}{n} k_i \sigma^2 = \frac{\sigma^2}{n} \sum_{i=1}^n k_i = 0$
  since $\sum_i k_i = 0$. So $\bar{Y}$ and $b_1$ are uncorrelated.
- This means that we can write down the variance
      $\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y} + b_1 (X_h - \bar{X})\}$
  (an alternative and equivalent form of the regression function).
- But we know that the mean of $Y$ and $b_1$ are uncorrelated, so
      $\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y}\} + \sigma^2\{b_1\} (X_h - \bar{X})^2$
- We know from the last lecture that
      $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$, with estimate $s^2\{b_1\} = \frac{\mathrm{MSE}}{\sum (X_i - \bar{X})^2}$
- And we can find
      $\sigma^2\{\bar{Y}\} = \frac{1}{n^2} \sum \sigma^2\{Y_i\} = \frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}$
- So, plugging in, we get
      $\sigma^2\{\hat{Y}_h\} = \frac{\sigma^2}{n} + \frac{\sigma^2}{\sum (X_i - \bar{X})^2} (X_h - \bar{X})^2$
  or, equivalently,
      $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$
- Since we often won't know $\sigma^2$, we can, as usual, plug in $S^2 = \mathrm{SSE}/(n-2)$, our estimate for it, to get our estimate of this sampling distribution variance:
      $s^2\{\hat{Y}_h\} = S^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

No surprise...

- The studentized point estimator of the mean response is distributed as a $t$ random variable with $n-2$ degrees of freedom:
      $\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)$
- This means that we can construct confidence intervals in the same manner as before.
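The same pattern gives an interval for the mean response. Here is a minimal sketch, again reusing the hypothetical quantities (b0, b1, MSE, n, Xbar, Sxx) from the earlier code; the choice of which observed level to use for Xh is arbitrary.

    % Minimal sketch (continues the previous ones): point estimate and
    % 1-alpha confidence interval for the mean response E(Yh) at X = Xh.
    alpha = 0.05;
    Xh    = X(15);                               % one of the observed levels of X
    Yh    = b0 + b1*Xh;                          % point estimator of E(Yh)
    s_Yh  = sqrt(MSE * (1/n + (Xh - Xbar)^2/Sxx));
    tcrit = tinv(1 - alpha/2, n - 2);
    ci_Yh = [Yh - tcrit*s_Yh, Yh + tcrit*s_Yh];

Note how the factor $(X_h - \bar{X})^2/S_{xx}$ makes the interval widest for levels of $X$ far from $\bar{X}$, matching the variance formula on the slides.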