Inference in the Normal Regression Model
Dr. Frank Wood

Remember
• We know that the point estimator of β1 is

    b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}

• Last class we derived the sampling distribution of b1: when σ² is known, b1 ~ N(β1, Var(b1)), with

    \mathrm{Var}(b_1) = \sigma^2\{b_1\} = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}

• And we suggested that an estimate of Var(b1) can be arrived at by substituting the MSE for σ² when σ² is unknown:

    s^2\{b_1\} = \frac{\mathrm{MSE}}{\sum_i (X_i - \bar{X})^2} = \frac{\mathrm{SSE}/(n-2)}{\sum_i (X_i - \bar{X})^2}

Sampling distribution of (b1 − β1)/s{b1}
• Since b1 is normally distributed, (b1 − β1)/σ{b1} is a standard normal variable N(0, 1).
• We don't know Var(b1), so it must be estimated from the data; we have already denoted its estimate s²{b1}.
• Using this estimate, it can be shown that

    \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2), \qquad s\{b_1\} = \sqrt{s^2\{b_1\}}

  It is from this fact that our confidence intervals and tests will derive.

Where does this come from?
• We need to rely upon (but will not derive) the following theorem: for the normal error regression model,

    \frac{\mathrm{SSE}}{\sigma^2} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(n-2)

  and SSE is independent of b0 and b1.
• Here there are two linear constraints,

    b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \sum_i k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}

    b_0 = \bar{Y} - b_1 \bar{X}

  imposed by the regression parameter estimation; each reduces the number of degrees of freedom by one (two in total).

Reminder: normal (non-regression) estimation
• Intuitively, the regression result on the previous slide follows the standard result for the sum of squared standard normal random variables. First, with σ and µ known,

    \sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left( \frac{Y_i - \mu}{\sigma} \right)^2 \sim \chi^2(n)

  and then, with µ unknown,

    S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{Y_i - \bar{Y}}{\sigma} \right)^2 \sim \chi^2(n-1)

  and \bar{Y} and S² are independent.

Reminder: normal (non-regression) estimation, cont.
• With both µ and σ unknown,

    \sqrt{n}\, \frac{\bar{Y} - \mu}{S} \sim t(n-1)

  because

    T = \frac{Z}{\sqrt{W/\nu}} = \frac{\sqrt{n}\,(\bar{Y} - \mu)/\sigma}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} = \sqrt{n}\, \frac{\bar{Y} - \mu}{S}

  Here the numerator follows from

    Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}}

Another useful fact: the Student-t distribution
Let Z and χ²(ν) be independent random variables (standard normal and chi-square, respectively). We then define a t random variable as follows:

    t(\nu) = \frac{Z}{\sqrt{\chi^2(\nu)/\nu}}

This version of the t distribution has one parameter, the degrees of freedom ν.

Distribution of the studentized statistic
To derive the distribution of the statistic (b1 − β1)/s{b1}, first we do the following rewrite:

    \frac{b_1 - \beta_1}{s\{b_1\}} = \frac{(b_1 - \beta_1)/\sigma\{b_1\}}{s\{b_1\}/\sigma\{b_1\}}, \qquad \frac{s\{b_1\}}{\sigma\{b_1\}} = \sqrt{\frac{s^2\{b_1\}}{\sigma^2\{b_1\}}}

Studentized statistic, cont.
And note the following:

    \frac{s^2\{b_1\}}{\sigma^2\{b_1\}} = \frac{\mathrm{MSE}/\sum_i (X_i - \bar{X})^2}{\sigma^2/\sum_i (X_i - \bar{X})^2} = \frac{\mathrm{MSE}}{\sigma^2} = \frac{\mathrm{SSE}}{\sigma^2 (n-2)}

where we know (by the given theorem) the distribution of the last term is χ² and independent of b1 and b0:

    \frac{\mathrm{SSE}}{\sigma^2 (n-2)} \sim \frac{\chi^2(n-2)}{n-2}

Studentized statistic, final
By the given definition of the t distribution we have our result,

    \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)

because, putting everything together, we can see that

    \frac{b_1 - \beta_1}{s\{b_1\}} \sim \frac{z}{\sqrt{\chi^2(n-2)/(n-2)}}

Confidence intervals and hypothesis tests
Now that we know the sampling distribution of b1 (t with n − 2 degrees of freedom), we can construct confidence intervals and hypothesis tests easily.

Confidence interval for β1
Since the "studentized" statistic follows a t distribution, we can make the following probability statement:

    P\left( t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2) \right) = 1 - \alpha

Remember
• Density: f(y) = dF(y)/dy
• Distribution (CDF): F(y) = P(Y ≤ y) = \int_{-\infty}^{y} f(t)\,dt
• Inverse CDF: F^{-1}(p) = y such that \int_{-\infty}^{y} f(t)\,dt = p
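To make the quantities reviewed so far concrete, here is a minimal Matlab sketch (not from the lecture) that computes b1, b0, the MSE, and s{b1}; the data values and variable names are illustrative assumptions, not the lecture's example.

```matlab
% Illustrative data (made up, not from the lecture)
X = [4 8 12 16 20 24 28 32]';
Y = [3.1 5.2 6.8 9.4 11.1 12.7 15.3 16.9]';
n = length(X);

Xbar = mean(X);  Ybar = mean(Y);
Sxx  = sum((X - Xbar).^2);               % sum of (Xi - Xbar)^2

b1 = sum((X - Xbar).*(Y - Ybar)) / Sxx;  % point estimator of beta1
b0 = Ybar - b1*Xbar;                     % point estimator of beta0

Yhat = b0 + b1*X;                        % fitted values
SSE  = sum((Y - Yhat).^2);               % error sum of squares
MSE  = SSE / (n - 2);                    % estimate of sigma^2

s2_b1 = MSE / Sxx;                       % estimated Var(b1)
s_b1  = sqrt(s2_b1);                     % estimated std. error of b1
```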
Interval arriving from picking α
• Note that by symmetry, t(α/2; n − 2) = −t(1 − α/2; n − 2).
• Rearranging terms and using this fact, we have

    P\big( b_1 - t(1-\alpha/2;\, n-2)\, s\{b_1\} \le \beta_1 \le b_1 + t(1-\alpha/2;\, n-2)\, s\{b_1\} \big) = 1 - \alpha

• And now we can use a table to look up and produce confidence intervals.

Using tables for computing intervals
• The tables in the book (Table B.2 in the appendix) give t(1 − α/2; ν), where P{t(ν) ≤ t(1 − α/2; ν)} = A.
• This provides the inverse CDF of the t distribution.
• It can be arrived at computationally as well; in Matlab: tinv(1 − α/2, ν).

1 − α confidence limits for β1
• The 1 − α confidence limits for β1 are

    b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}

• Note that this quantity can be used to calculate confidence intervals given n and α.
• Fixing α can guide the choice of sample size if a confidence interval of a particular width is desired; conversely, given a sample size, it determines the width of the interval obtained.
• It is also useful for hypothesis testing.

Tests concerning β1
• Example 1: a two-sided test
  H0: β1 = 0
  Ha: β1 ≠ 0
• Test statistic:

    t^* = \frac{b_1 - 0}{s\{b_1\}}

Tests concerning β1, cont.
• We have an estimate of the sampling distribution of b1 from the data.
• If the null hypothesis holds, then the b1 estimate coming from the data should lie within the 95% confidence interval of the sampling distribution centered at 0 (in this case).

Decision rules

    If |t^*| ≤ t(1 − α/2; n − 2), conclude H0.
    If |t^*| > t(1 − α/2; n − 2), conclude Ha.

The absolute values make the test two-sided.

Intuition
[Figure in the original slides: the p-value is the value of α that moves the green line to the blue line.]

Calculating the p-value
• The p-value, or attained significance level, is the smallest level of significance α for which the observed data indicate that the null hypothesis should be rejected.
• It can be looked up using the CDF of the test statistic.
• In Matlab, the two-sided p-value is 2 * (1 − tcdf(|t*|, ν)).

Inferences concerning β0
• Largely, inference procedures regarding β0 can be performed in the same way as those for β1.
• Remember the point estimator b0 of β0:

    b_0 = \bar{Y} - b_1 \bar{X}

Sampling distribution of b0
• The sampling distribution of b0 refers to the different values of b0 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample.
• For the normal regression model, the sampling distribution of b0 is normal.

Sampling distribution of b0, cont.
• When the error variance is known,

    E(b_0) = \beta_0, \qquad \sigma^2\{b_0\} = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2} \right)

• When the error variance is unknown,

    s^2\{b_0\} = \mathrm{MSE} \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2} \right)

Confidence interval for β0
The 1 − α confidence limits for β0 are obtained in the same manner as those for β1 (see the Matlab sketch below):

    b_0 \pm t(1-\alpha/2;\, n-2)\, s\{b_0\}

Considerations on inferences on β0 and β1
• Effects of departures from normality: the estimators of β0 and β1 have the property of asymptotic normality; their distributions approach normality as the sample size increases (under general conditions).
• Spacing of the X levels: the variances of b0 and b1 (for a given n and σ²) depend strongly on the spacing of X.
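Continuing the illustrative sketch above, the interval, test, and p-value machinery from these slides looks as follows in Matlab (tinv and tcdf are the inverse CDF and CDF of the t distribution, as the slides note; both require the Statistics Toolbox):

```matlab
alpha = 0.05;
tc = tinv(1 - alpha/2, n - 2);            % t(1 - alpha/2; n-2)

% 1 - alpha confidence limits for beta1
ci_b1 = [b1 - tc*s_b1, b1 + tc*s_b1];

% Two-sided test of H0: beta1 = 0
tstar  = (b1 - 0) / s_b1;                 % test statistic
reject = abs(tstar) > tc;                 % decision rule: conclude Ha if true
pval   = 2 * (1 - tcdf(abs(tstar), n - 2));

% Same machinery for beta0
s_b0  = sqrt(MSE * (1/n + Xbar^2/Sxx));   % estimated std. error of b0
ci_b0 = [b0 - tc*s_b0, b0 + tc*s_b0];
```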
Sampling distribution of the point estimator of the mean response
• Let Xh be the level of X for which we would like an estimate of the mean response; it may be one of the observed X's or any other level of X within the scope of the model.
• The mean response when X = Xh is denoted by E(Yh).
• The point estimator of E(Yh) is

    \hat{Y}_h = b_0 + b_1 X_h

  We are interested in the sampling distribution of this quantity.

Sampling distribution of Ŷh
• We have Ŷh = b0 + b1 Xh.
• Since this quantity is itself a linear combination of the Yi's, its sampling distribution is itself normal.
• The mean of the sampling distribution is

    E\{\hat{Y}_h\} = E\{b_0\} + E\{b_1\} X_h = \beta_0 + \beta_1 X_h

  Biased or unbiased? Unbiased: the mean of the sampling distribution is the true mean response.

Sampling distribution of Ŷh, cont.
• To derive the sampling-distribution variance of the mean response, we first show that b1 and (1/n) Σ Yi are uncorrelated and hence, for the normal error regression model, independent.
• We start with the definitions

    \bar{Y} = \sum_i \frac{1}{n} Y_i, \qquad b_1 = \sum_i k_i Y_i, \quad k_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}

Sampling distribution of Ŷh, cont.
• We want to show that the mean response and the estimate b1 are uncorrelated:

    \mathrm{Cov}(\bar{Y}, b_1) = \sigma^2\{\bar{Y}, b_1\} = 0

• To do this we need the following result (A.32):

    \sigma^2\left\{ \sum_{i=1}^{n} a_i Y_i,\; \sum_{i=1}^{n} c_i Y_i \right\} = \sum_{i=1}^{n} a_i c_i \, \sigma^2\{Y_i\}

  when the Yi are independent.

Sampling distribution of Ŷh, cont.
Using this fact, we have

    \sigma^2\left\{ \sum_{i=1}^{n} \frac{1}{n} Y_i,\; \sum_{i=1}^{n} k_i Y_i \right\} = \sum_{i=1}^{n} \frac{1}{n} k_i \, \sigma^2\{Y_i\} = \frac{\sigma^2}{n} \sum_{i=1}^{n} k_i = 0

since Σ ki = 0. So Ȳ and b1 are uncorrelated.

Sampling distribution of Ŷh, cont.
• This means that we can write down the variance using the alternative and equivalent form of the regression function:

    \sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y} + b_1 (X_h - \bar{X})\}

• But we know that Ȳ and b1 are uncorrelated, so

    \sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y}\} + \sigma^2\{b_1\} (X_h - \bar{X})^2

Sampling distribution of Ŷh, cont.
• We know (from last lecture)

    \sigma^2\{b_1\} = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}, \qquad s^2\{b_1\} = \frac{\mathrm{MSE}}{\sum_i (X_i - \bar{X})^2}

• And we can find

    \sigma^2\{\bar{Y}\} = \sum_i \frac{1}{n^2} \sigma^2\{Y_i\} = \frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}

Sampling distribution of Ŷh, cont.
• So, plugging in, we get

    \sigma^2\{\hat{Y}_h\} = \frac{\sigma^2}{n} + \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2} (X_h - \bar{X})^2

• Or, equivalently,

    \sigma^2\{\hat{Y}_h\} = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right)

Sampling distribution of Ŷh, cont.
Since we often won't know σ², we can, as usual, plug in S² = SSE/(n − 2), our estimate of it, to obtain our estimate of this sampling-distribution variance:

    s^2\{\hat{Y}_h\} = S^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right)

No surprise...
• The studentized point estimator of the mean response follows a t distribution with n − 2 degrees of freedom:

    \frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)

• This means that we can construct confidence intervals in the same manner as before.
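To close the illustrative sketch, here is the estimated mean response at a level Xh together with its 1 − α confidence interval; Xh = 18 is an arbitrary assumed value within the range of the made-up data, and tc is the critical value computed in the previous block.

```matlab
Xh = 18;                                   % illustrative level of X
Yh_hat = b0 + b1*Xh;                       % point estimate of E(Yh)

% Estimated variance s^2{Yh_hat} = MSE * (1/n + (Xh - Xbar)^2 / Sxx)
s2_Yh = MSE * (1/n + (Xh - Xbar)^2 / Sxx);
s_Yh  = sqrt(s2_Yh);

% 1 - alpha confidence interval for E(Yh), reusing tc = tinv(1 - alpha/2, n - 2)
ci_Yh = [Yh_hat - tc*s_Yh, Yh_hat + tc*s_Yh];
```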
