Midterm Review
Formal Statement of Simple Linear Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i$ is a random error term with mean $E(\epsilon_i) = 0$ and variance $Var(\epsilon_i) = \sigma^2$
- $i = 1, \ldots, n$

Least Squares Linear Regression

- Seek to minimize
  $Q = \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2$
- Choose $b_0$ and $b_1$ as estimators for $\beta_0$ and $\beta_1$; $b_0$ and $b_1$ will minimize the criterion $Q$ for the given sample observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.

Normal Equations

- Setting the partial derivatives of $Q$ with respect to $\beta_0$ and $\beta_1$ equal to zero gives the results of this minimization step, called the normal equations; $b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$ respectively.
  $\sum Y_i = n b_0 + b_1 \sum X_i$
  $\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$
- This is a system of two equations in two unknowns. The solution is given by...

Solution to Normal Equations

After a lot of algebra one arrives at
  $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
  $b_0 = \bar{Y} - b_1 \bar{X}$
where $\bar{X} = \frac{\sum X_i}{n}$ and $\bar{Y} = \frac{\sum Y_i}{n}$.

Properties of Solution

- The $i$th residual is defined to be $e_i = Y_i - \hat{Y}_i$
- $\sum_i e_i = 0$
- $\sum_i \hat{Y}_i = \sum_i Y_i$
- $\sum_i X_i e_i = 0$
- $\sum_i \hat{Y}_i e_i = 0$
- The regression line always goes through the point $(\bar{X}, \bar{Y})$

Alternative format of the linear regression model:
  $Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \epsilon_i$
The least squares estimator $b_1$ for $\beta_1$ remains the same as before. The least squares estimator for $\beta_0^* = \beta_0 + \beta_1 \bar{X}$ becomes
  $b_0^* = b_0 + b_1 \bar{X} = (\bar{Y} - b_1 \bar{X}) + b_1 \bar{X} = \bar{Y}$
Hence the estimated regression function is
  $\hat{Y} = \bar{Y} + b_1 (X - \bar{X})$

$s^2$ estimator for $\sigma^2$

  $s^2 = MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2}$

- MSE is an unbiased estimator of $\sigma^2$: $E(MSE) = \sigma^2$
- The sum of squares SSE has $n - 2$ "degrees of freedom" associated with it.
- Cochran's theorem (later in the course) tells us where degrees of freedom come from and how to calculate them.

Normal Error Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)$ (note this is different: now we know the distribution)
- $i = 1, \ldots, n$

Inference Concerning $\beta_1$

Tests concerning $\beta_1$ (the slope) are often of interest, particularly
  $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
The null hypothesis model $Y_i = \beta_0 + (0)X_i + \epsilon_i$ implies that there is no relationship between $Y$ and $X$. Note that the means of all the $Y_i$'s are then equal at all levels of $X_i$.

Sampling Distribution of $b_1$

- The point estimator for $\beta_1$ is
  $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- For the normal error regression model the sampling distribution of $b_1$ is normal, with mean and variance
  $E(b_1) = \beta_1$
  $Var(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

Estimated Variance of $b_1$

- When we don't know $\sigma^2$ we have to replace it with the MSE estimate.
- Let $s^2 = MSE = \frac{SSE}{n-2}$, where $SSE = \sum e_i^2$ and $e_i = Y_i - \hat{Y}_i$.
- Plugging in, we get
  $Var(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$
  $\widehat{Var}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2}$

Recap

- We now have an expression for the sampling distribution of $b_1$ when $\sigma^2$ is known:
  $b_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2}\right)$  (1)
- When $\sigma^2$ is unknown we have an unbiased point estimator $s^2$ of $\sigma^2$, giving the estimated variance
  $\widehat{Var}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2}$
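The closed-form least squares solution, the MSE estimate of $\sigma^2$, and the estimated variance of $b_1$ above can be pulled together in a few lines of code. The sketch below is not from the original slides; it is a minimal NumPy illustration in which the data arrays X and Y are made-up placeholders that any paired numeric sample could replace.

    # Minimal sketch of the closed-form least squares solution and MSE.
    # X and Y are placeholder data, not from the review.
    import numpy as np

    X = np.array([4.0, 7.0, 9.0, 12.0, 15.0, 18.0])
    Y = np.array([2.1, 3.4, 4.0, 5.6, 6.1, 7.9])
    n = len(X)

    x_bar, y_bar = X.mean(), Y.mean()
    Sxx = np.sum((X - x_bar) ** 2)                 # sum of (Xi - Xbar)^2
    b1 = np.sum((X - x_bar) * (Y - y_bar)) / Sxx   # slope estimate
    b0 = y_bar - b1 * x_bar                        # intercept estimate

    Y_hat = b0 + b1 * X          # fitted values
    e = Y - Y_hat                # residuals; these sum to (numerically) zero
    SSE = np.sum(e ** 2)
    MSE = SSE / (n - 2)          # s^2, unbiased estimator of sigma^2
    se_b1 = np.sqrt(MSE / Sxx)   # estimated standard error of b1

    print(b0, b1, MSE, se_b1)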
Sampling Distribution of $(b_1 - \beta_1)/s\{b_1\}$

- $b_1$ is normally distributed, so $(b_1 - \beta_1)/\sqrt{Var(b_1)}$ is a standard normal variable.
- We don't know $Var(b_1)$, so it must be estimated from data; we have already denoted its estimate $\widehat{Var}(b_1)$.
- Using the estimate $\widehat{Var}(b_1)$, it can be shown that
  $\frac{b_1 - \beta_1}{\hat{s}\{b_1\}} \sim t(n-2)$, where $\hat{s}\{b_1\} = \sqrt{\widehat{Var}(b_1)}$

Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of $b_1$ ($t$ with $n - 2$ degrees of freedom) we can construct confidence intervals and hypothesis tests easily.

$1 - \alpha$ confidence limits for $\beta_1$

- The $1 - \alpha$ confidence limits for $\beta_1$ are
  $b_1 \pm t(1 - \alpha/2; n-2)\, s\{b_1\}$
- Note that this quantity can be used to calculate confidence intervals given $n$ and $\alpha$.
- Fixing $\alpha$ can guide the choice of sample size if a particular confidence interval width is desired; given a sample size, vice versa.
- Also useful for hypothesis testing.

Tests Concerning $\beta_1$

- Example 1: two-sided test
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
- Test statistic
  $t^* = \frac{b_1 - 0}{s\{b_1\}}$

Tests Concerning $\beta_1$

- We have an estimate of the sampling distribution of $b_1$ from the data.
- If the null hypothesis holds, then the $b_1$ estimate coming from the data should lie within the 95% confidence interval of the sampling distribution centered at 0 (in this case):
  $t^* = \frac{b_1 - 0}{s\{b_1\}}$

Decision rules

  If $|t^*| \leq t(1 - \alpha/2; n-2)$, accept $H_0$
  If $|t^*| > t(1 - \alpha/2; n-2)$, reject $H_0$

The absolute values make the test two-sided.

Inferences Concerning $\beta_0$

- Largely, inference procedures regarding $\beta_0$ can be performed in the same way as those for $\beta_1$.
- Remember the point estimator $b_0$ for $\beta_0$:
  $b_0 = \bar{Y} - b_1 \bar{X}$

Sampling distribution of $b_0$

- When the error variance is known:
  $E(b_0) = \beta_0$
  $\sigma^2\{b_0\} = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$
- When the error variance is unknown:
  $s^2\{b_0\} = MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$

Confidence interval for $\beta_0$

The $1 - \alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$:
  $b_0 \pm t(1 - \alpha/2; n-2)\, s\{b_0\}$

Sampling Distribution of $\hat{Y}_h$

- We have $\hat{Y}_h = b_0 + b_1 X_h$.
- Since this quantity is itself a linear combination of the $Y_i$'s, its sampling distribution is itself normal.
- The mean of the sampling distribution is
  $E\{\hat{Y}_h\} = E\{b_0\} + E\{b_1\} X_h = \beta_0 + \beta_1 X_h$
  Biased or unbiased?

Sampling Distribution of $\hat{Y}_h$

- So, plugging in, we get
  $\sigma^2\{\hat{Y}_h\} = \sigma^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$
- Since we often won't know $\sigma^2$, we can, as usual, plug in $s^2 = SSE/(n-2)$, our estimate of it, to get our estimate of this sampling distribution variance:
  $s^2\{\hat{Y}_h\} = s^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$

No surprise...

- The standardized point estimator of the mean response follows a $t$-distribution with $n - 2$ degrees of freedom:
  $\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)$
- This means that we can construct confidence intervals in the same manner as before.

Confidence Intervals for $E(Y_h)$

- The $1 - \alpha$ confidence limits for $E(Y_h)$ are
  $\hat{Y}_h \pm t(1 - \alpha/2; n-2)\, s\{\hat{Y}_h\}$
- From this, hypothesis tests can be constructed as usual.

Prediction interval for single new observation

- If the regression parameters are unknown, the $1 - \alpha$ prediction interval for a new observation $Y_h$ is given by
  $\hat{Y}_h \pm t(1 - \alpha/2; n-2)\, s\{pred\}$
- We have
  $\sigma^2\{pred\} = \sigma^2\{Y_h - \hat{Y}_h\} = \sigma^2\{Y_h\} + \sigma^2\{\hat{Y}_h\} = \sigma^2 + \sigma^2\{\hat{Y}_h\}$
  An unbiased estimator of $\sigma^2\{pred\}$ is $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\}$, which is given by
  $s^2\{pred\} = MSE\left(1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$
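As an illustration of how the interval formulas above fit together, the following sketch (again not part of the original slides) recomputes the least squares quantities from the same placeholder data, then forms the $1 - \alpha$ confidence limits for $\beta_1$, the two-sided test statistic $t^*$, and the confidence and prediction intervals at a hypothetical new predictor value $X_h$. SciPy is assumed to be available for the $t$ quantiles; the values of $\alpha$ and $X_h$ are arbitrary choices.

    # Sketch of the t-based intervals and slope test, using placeholder data.
    import numpy as np
    from scipy import stats

    X = np.array([4.0, 7.0, 9.0, 12.0, 15.0, 18.0])
    Y = np.array([2.1, 3.4, 4.0, 5.6, 6.1, 7.9])
    n = len(X)
    x_bar = X.mean()
    Sxx = np.sum((X - x_bar) ** 2)
    b1 = np.sum((X - x_bar) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * x_bar
    MSE = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)
    se_b1 = np.sqrt(MSE / Sxx)

    alpha = 0.05
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # t(1 - alpha/2; n - 2)

    # 1 - alpha confidence limits and two-sided test for the slope beta_1
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    t_star = (b1 - 0) / se_b1
    p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)

    # Interval estimates at a new predictor value Xh (placeholder choice)
    Xh = 10.0
    y_hat_h = b0 + b1 * Xh
    se_mean = np.sqrt(MSE * (1 / n + (Xh - x_bar) ** 2 / Sxx))      # s{Yhat_h}
    se_pred = np.sqrt(MSE * (1 + 1 / n + (Xh - x_bar) ** 2 / Sxx))  # s{pred}
    ci_mean = (y_hat_h - t_crit * se_mean, y_hat_h + t_crit * se_mean)
    pi_new = (y_hat_h - t_crit * se_pred, y_hat_h + t_crit * se_pred)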
ANOVA table for simple linear regression

- Regression: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, df $= 1$, $MSR = SSR/1$, $E(MSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
- Error: $SSE = \sum (Y_i - \hat{Y}_i)^2$, df $= n - 2$, $MSE = SSE/(n-2)$, $E(MSE) = \sigma^2$
- Total: $SSTO = \sum (Y_i - \bar{Y})^2$, df $= n - 1$

F Test of $\beta_1 = 0$ vs. $\beta_1 \neq 0$

ANOVA provides a battery of useful tests. For example, ANOVA provides an easy test of the two-sided hypotheses
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
- Test statistic from before: $t^* = \frac{b_1 - 0}{s\{b_1\}}$
- ANOVA test statistic: $F^* = \frac{MSR}{MSE}$

F Distribution

- The F distribution is the ratio of two independent $\chi^2$ random variables, each normalized by its corresponding degrees of freedom.
- The test statistic $F^*$ follows the distribution $F^* \sim F(1, n-2)$.

Hypothesis Test Decision Rule

Since $F^*$ is distributed as $F(1, n-2)$ when $H_0$ holds, the decision rule to follow when the risk of a Type I error is to be controlled at $\alpha$ is:
  If $F^* \leq F(1 - \alpha; 1, n-2)$, conclude $H_0$
  If $F^* > F(1 - \alpha; 1, n-2)$, conclude $H_a$

General Linear Test

- The test of $\beta_1 = 0$ versus $\beta_1 \neq 0$ is but a single example of a general test for linear statistical models.
- The general linear test has three parts:
  - Full model
  - Reduced model
  - Test statistic

Full Model Fit

- A full linear model is first fit to the data:
  $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
- Using this model, the error sum of squares is obtained; here, for example, the simple linear model with non-zero slope is the "full" model:
  $SSE(F) = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2 = SSE$

Fit Reduced Model

- One can test the hypothesis that a simpler model is a "better" model via a general linear test (which is really a likelihood ratio test in disguise).
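To make the connection between the ANOVA table, the $F^* = MSR/MSE$ statistic, and the general linear test concrete, here is a final sketch (not from the slides, reusing the same placeholder data and $\alpha$) that computes the sums of squares, applies the decision rule against $F(1 - \alpha; 1, n-2)$, and forms the general linear test statistic with an intercept-only model playing the role of the reduced model.

    # Sketch of the ANOVA F test of beta_1 = 0 and the equivalent general
    # linear test, using the same placeholder data as the earlier snippets.
    import numpy as np
    from scipy import stats

    X = np.array([4.0, 7.0, 9.0, 12.0, 15.0, 18.0])
    Y = np.array([2.1, 3.4, 4.0, 5.6, 6.1, 7.9])
    n = len(X)
    x_bar, y_bar = X.mean(), Y.mean()
    b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    Y_hat = b0 + b1 * X

    SSTO = np.sum((Y - y_bar) ** 2)       # total sum of squares, n - 1 df
    SSR = np.sum((Y_hat - y_bar) ** 2)    # regression sum of squares, 1 df
    SSE = np.sum((Y - Y_hat) ** 2)        # error sum of squares, n - 2 df
    MSR, MSE = SSR / 1, SSE / (n - 2)

    alpha = 0.05
    F_star = MSR / MSE                                  # equals (t*)^2 here
    F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)   # F(1 - alpha; 1, n - 2)
    reject_H0 = F_star > F_crit

    # General linear test: reduction in SSE from the reduced (intercept-only)
    # model to the full model; for simple regression this reproduces F_star.
    SSE_R, df_R = SSTO, n - 1
    SSE_F, df_F = SSE, n - 2
    F_general = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)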