Week 3: Finish SLR Inference Then Multiple Linear Regression

BUS41100 Applied Regression Analysis Week 3: Finish SLR Inference Then Multiple Linear Regression I. Confidence and Prediction Intervals II. Polynomials, log transformation, categorical variables, interactions & main effects Max H. Farrell The University of Chicago Booth School of Business Quick Recap I. We drew a line through a cloud of points Y^ = b + b X & Y Y^ = e 0 1 − I It was a good line because: 1. It minimized the SSE 2. It extracted all linear information 3. It implemented the model II. The regression model helped us understand uncertainty Y = β + β X + "; " (0; σ2) 0 1 ∼ N I Sampling distribution: estimates change as data changes σ2 b β ; σ2 σ2 1 ( 1 b1 ) b1 = 2 ∼ N (n 1)sx − 1 Our work today I. Finish SLR I Put sampling distribution to work I Communicable summaries of uncertainty II. Multiple Linear Regression Y = β + β X +β X + "; " (0; σ2) 0 1 1 2 2 ∼ N I Everything carries over from SLR I Interpretation requires one extra piece 2 Summarizing the sampling distribution Remember the two types of regression questions: 1. Model 2. Prediction Y = β0 + β1X + " Y^ = b0 + b1X Y = b0 + b1X + e 1. Properties of βk I Sign: Does Y go up when X goes up? I Magnitude: By how much? ) A confidence interval captures uncertainty about β 2. Predicting Y I Best guess for Y given (or \conditional on") X. ) A prediction interval captures uncertainty about Y 3 Confidence Intervals and Testing 0 Suppose we think that the true βj is equal to some value βj (often 0). Does the data support that guess? We can rephrase this in terms of competing hypotheses. 0 (Null) H0 : βj = βj (Alternative) H : β = β0 1 j 6 j Our hypothesis test will either reject or fail to reject the null hypothesis I If the hypothesis test rejects the null hypothesis, we have statistical support for our claim I Gives only a \yes" or \no" answer! You choose the \probability" of false rejection: α I 4 We use bj for our test about βj. 0 I Reject H0 if bj is \far" from βj ; assume H0 when close I What we really care about is: 0 how many standard errors bj is away from βj I standard error = sb1 , cf σb1 0 bj βj H0 The t-statistic is this test is zbj = − (0; 1): sbj ∼ N \Big" z makes our guess β0 look silly reject j βj j j ) I If H0 is true, then P[ zbj >2] < 0:05 = α j j 0 bj βj 0 But: zβj = − > 2 βj (bj 2sbj ) j j sbj , 62 ± 5 Confidence intervals Since b (β ; σ2 ), j ∼ N j bj bj βj 1 α = P zα/2 < − < z1−α/2 − sbj h i = P βj (bj zα/2sbj ) 2 ± Why should we care about confidence intervals? I The confidence interval completely captures the information in the data about the parameter. I Center is your estimate I Length is how sure you are about your estimate I Any value outside would be rejected by a test! 6 Real life or pretend? h i P β1 (b1 2σb1 ) = 95% 2 ± or h i P β1 (b1 2σb1 ) =0 or 1 2 ± ? True β1 7 Level, Size, and p-values The p-value is P[ Z > zβj ]. j j j j I Test with size/level = p-value almost rejects I CI of level 1 (p-value) just excludes zβj − j j Zα 2 Z1 − α 2 −|zβj| |zβj| 1 − α p/2 Level α p/2 p−value 8 Example: revisit the CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?). H0 : β0 = 0 H : β = 0 1 0 6 I Recall: the intercept estimate b0 is the stock's \alpha" > summary(windsor.reg) ## output abbreviated Estimate Std. Error t value Pr(>|t|) (Intercept) 0.003647 0.001409 2.588 0.0105 * mfund$valmrkt 0.935717 0.029150 32.100 <2e-16 *** > 2*pnorm(-abs(0.003647/.001409)) [1] 0.009643399 We reject the null at α = :05, Windsor does have an \alpha" over the market. I Why set α = :05? What about at α = 0:01? 9 Now let's ask whether or not Windsor moves in a different way than the market (e.g., is it more conservative?). I Recall that the estimate of the slope b1 is the \beta" of the stock. This is a rare case where the null hypothesis is not zero: H0 : β1 = 1; Windsor is just the market (+ alpha). H : β = 1; Windsor softens or exaggerates market moves. 1 1 6 This time, R's output t/p values are not what we want (why?). > summary(windsor.reg) ## output abbreviated Estimate Std. Error t value Pr(>|t|) (Intercept) 0.003647 0.001409 2.588 0.0105 * mfund$valmrkt 0.935717 0.029150 32.100 <2e-16 *** 10 But we can get the appropriate values easily: I Test and p-value: > b1 <- 0.935717; sb1 <- 0.029150 > zb1 <- (b1 - 1)/sb1 [1] -2.205249 > 2*pnorm(-abs(zb1)) [1] 0.02743665 I Confidence Interval > confint(windsor.reg, level=0.95) 2.5 % 97.5 % (Intercept) 0.000865657 0.006428105 mfund$valmrkt 0.878193149 0.993240873 Reject at α=:05, so Windsor softens than the market. I What about other values of α? confint(windsor.reg, level=0.99) confint(windsor.reg, level=(1-2*pt(-abs(zb1), df=178))) 11 Forecasting & Prediction Intervals The conditional forecasting problem: n I Given covariate Xf and sample data Xi;Yi , predict f gi=1 the \future" observation Yf . The solution is to use our LS fitted value: Y^f = b0 + b1Xf . I That's the easy bit. The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. One method is to specify a prediction interval I a range of Y values that are likely, given an X value. 12 The least squares line is a prediction rule: Read Y^ off the line for a new X. I It's not a perfect prediction: Y^ is what we expect. ● ● Yˆ 160 ● 140 ● 120 ● ● price ● 100 ● ● ● ● ● 80 ● ● ● 60 1.0 1.5 2.0 2.5 3.0 3.5 size X 13 If we use Y^f , our prediction error has two pieces e = Y Y^ = Y b b X f f − f f − 0 − 1 f Yf E[Yf Xf ]=β0 + β1Xf " | ef { fit error b0 + b1Xf ˆ Yf } Xf 14 We can decompose ef into two sources of error: I Inherent idiosyncratic randomness (due to "). I Estimation error in the intercept and slope (i.e., discrepancy between our line and \the truth"). e = Y Y^ = (Y E[Y X ]) + E[Y X ] Y^ f f − f f − f j f f j f − f = " + (E[Y X ] Y^ ) f f j f − f = " + (β b ) + (β b )X : f 0 − 0 1 − 1 f The variance of our prediction error is thus var(e ) = var(" ) + var(E[Y X ] Y^ ) = σ2 + var(Y^ ) f f f j f − f f 15 From the sampling distributions derived earlier, var(Y^f ) is 2 var(b0 + b1Xf ) = var(b0) + Xf var(b1) + 2Xf cov(b0; b1) 1 (X X¯)2 = σ2 + f − : n (n 1)s2 − x 2 2 Replacing σ with s gives the standard error for Y^f . And hence the variance of our predictive error is 1 (X X¯)2 var(e ) = σ2 1+ + f − : f n (n 1)s2 − x 16 Putting it all together, we have that 1 (X X¯)2 Y^ Y ; σ2 1 + + f − f ∼ N f n (n 1)s2 − x A (1 α)100% confidence/prediction interval for Y is thus − f s ! 1 (X X¯)2 b + b X z s 1 + + f − : 0 1 f ± α/2 × n (n 1)s2 − x 17 Looking closer at what we'll call s 1 (X X¯)2 q s = s 1 + + f − = s2 + s2 : pred n (n 1)s2 fit − x A large predictive error variance (high uncertainty) comes from I Large s (i.e., large "'s). I Small n (not enough data). I Small sx (not enough observed spread in covariates). ¯ I Large (Xf X). − The first three are familiar... what about the last one? 18 For Xf far from our X¯, the space between lines is magnified ... Y (X X¯) f True small− (X,¯ Y¯ ) Line point of means Estimated Line (Xf X¯) large− X 19 The prediction (conf.) interval needs to widen away from X¯ ) 20 Returning to our housing data for an example ... > Xf <- data.frame(size=c(mean(size), 2.5, max(size))) > cbind(Xf,predict(reg, newdata=Xf, interval="prediction")) size fit lwr upr 1 1.853333 104.4667 72.92080 136.0125 2 2.500000 127.3496 95.18501 159.5142 3 3.500000 162.7356 127.36982 198.1013 I interval="prediction" gives lwr and upr, otherwise we just get fit I spred is not shown in this output 21 We can get spred from the predict output. > p <- predict(reg, newdata=Xf, se.fit=TRUE) > s <- p$residual.scale > sfit <- p$se.fit > spred <- sqrt(s^2+sfit^2) > b <- reg$coef > b[1] + b[2]*Xf[1,]+ c(0,-1, 1)*qnorm(.975)*spred[1] [,1] [,2] [,3] [1,] 104.4667 75.84713 133.0862 > b[1] + b[2]*Xf[1,]+ c(0,-1, 1)*qt(.975, df=n-2)*spred[1] [1,] 104.4667 72.92080 136.0125 I Or, we can calculate it by hand [see R code].

Week 3: Finish SLR Inference Then Multiple Linear Regression

CONFIDENCE Vs PREDICTION INTERVALS 12/2/04 Inference for Coefﬁcients Mean Response at X Vs

Choosing a Coverage Probability for Prediction Intervals

STATS 305 Notes1

Inference in Normal Regression Model

Sieve Bootstrap-Based Prediction Intervals for Garch Processes

On Small Area Prediction Interval Problems

Bayesian Prediction Intervals for Assessing P-Value Variability in Prospective Replication Studies Olga Vsevolozhskaya1,Gabrielruiz2 and Dmitri Zaykin3

4.7 Confidence and Prediction Intervals

1 Confidence Intervals

Correlations & Confidence Intervals

Lecture 32 the Prediction Interval Formulas for the Next Observation from a Normal Distribution When Σ Is Unknown

Chapters 2 and 10: Least Squares Regression