Lecture 2 Linear Regression: a Model for the Mean Sharyn O’Halloran Closer Look At

Lecture 2 Linear Regression: A Model for the Mean Sharyn O’Halloran Closer Look at: Linear Regression Model Least squares procedure Inferential tools Confidence and Prediction Intervals Assumptions Robustness Model checking Log transformation (of Y, X, or both) U9611 Spring 2005 2 Linear Regression: Introduction Data: (Yi, Xi) for i = 1,...,n Interest is in the probability distribution of Y as a function of X Linear Regression model: Mean of Y is a straight line function of X, plus an error term or residual Goal is to find the best fit line that minimizes the sum of the error terms U9611 Spring 2005 3 Estimated regression line Steer example (see Display 7.3, p. 177) Equation for estimated regression line: Intercept=6.98 7 .73 6.5 Fitted line ^ PH 1 Y= 6.98-.73X 6 Error term 5.5 0 1 2 ltime Fitted values PH U9611 Spring 2005 4 Create a new variable ltime=log(time) Regression analysis U9611 Spring 2005 5 Regression Terminology Regression: the mean of a response variable as a function of one or more explanatory variables: µ{Y | X} Regression model: an ideal formula to approximate the regression Simple linear regression model: µ{Y | X} = β 0 + β 1X “mean of Y given X” or Unknown “regression of Y on X” Intercept Slope parameter U9611 Spring 2005 6 Regression Terminology Y X Dependent variable Independent variable Explained variable Explanatory variable Response variable Control variable Y’s probability distribution is to be explained by X b0 and b1 are the regressionregression coefficientscoefficients (See Display 7.5, p. 180) Note: Y = b0 + b1 X is NOT simple regression U9611 Spring 2005 7 Regression Terminology: Estimated coefficients β 0 + β 1X β 0 + β 1 X βˆ 0 + βˆ 1 X βˆ 0 + βˆ 1 X β 0 + β 1 βˆ 0 βˆ 1 βˆ 0 + βˆ 1 Choose β ˆ 0 and β ˆ 1 to make the residuals small U9611 Spring 2005 8 Regression Terminology FittedFitted valuevalue for obs. i is its estimated mean: ˆ Y = fiti = µ{Y | X} = β 0 + β1 X ResidualResidual for obs. i: ˆ resi = Yi - fiti ⇒ ei = Yi −Y LeastLeast SquaresSquares statistical estimation method finds those estimates that minimize the sum of squared residuals. n n 2 ˆ 2 ∑(yi − (β0 + β1xi )) = ∑(yi − y) i=1 i=1 Solution (from calculus) on p. 182 of Sleuth U9611 Spring 2005 9 Least Squares Procedure The Least-squares procedure obtains estimates of the linear equation coefficients β0 and β1, in the model yˆi = β0 + β1xi by minimizing the sum of the squared residuals or errors (ei) 2 ˆ 2 SSE = ∑ei = ∑(yi − yi ) This results in a procedure stated as 2 2 SSE = ∑ei = ∑(yi − (β0 + β1xi )) Choose β0 and β1 so that the quantity is minimized. U9611 Spring 2005 10 Least Squares Procedure The slope coefficient estimator is n CORRELATION BETWEEN X AND Y ∑(xi − X )(yi −Y ) i=1 sY ˆ STANDARD DEVIATION β1 = = rxy n s OF Y OVER THE (x − X )2 X STANDARD DEVIATION ∑ i OF X i=1 And the constant or intercept indicator is ˆ ˆ β0 = Y − β1 X U9611 Spring 2005 11 Least Squares Procedure(cont.) Note that the regression line always goes through the mean X, Y. Think of this regression line as Relation Between Yield and Fertilizer the expected value 100 of Y for a given ) value of X. 80 60 40 Trend line Bushel/Acre That is, for any value of the ( independent variable there is 20 Yield Yield a single most likely value for 0 the dependent variable 0 100 200 300 400 500 600 700 800 Fertilizer (lb/Acre) U9611 Spring 2005 12 Tests and Confidence Intervals for β0, β1 Degrees of freedom: (n-2) = sample size - number of coefficients Variance {Y|X} σσ2= (sum of squared residuals)/(n-2) Standard errors (p. 184) Ideal normal model: the sampling distributions of β0 and β1 have the shape of a t-distribution on (n-2) d.f. Do t-tests and CIs as usual (df=n-2) U9611 Spring 2005 13 P values for Ho=0 Confidence intervals U9611 Spring 2005 14 Inference Tools Hypothesis Test and Confidence Interval for mean of Y at some X: Estimate the mean of Y at X = X0 by ˆ ˆ µˆ{Y | X 0 } = β 0 + β 1 X 0 ˆ Standard Error of β0 1 ( X − X ) 2 ˆ ˆ 0 SE [µ{Y | X 0 }] = σ + 2 n (n − 1)s x Conduct t-test and confidence interval in the usual way (df = n-2) U9611 Spring 2005 15 Confidence bands for conditional means confidence bands in simple regression have an hourglass shape, narrowest at the mean of X the lfitci command automatically calculate and graph the confidence bands U9611 Spring 2005 16 Prediction PredictionPrediction of a future Y at X=X0 Pred(Y | X 0 ) = µˆ{Y | X 0} StandardStandard errorerror ofof predictionprediction: 2 2 SE[Pred(Y | X 0 )] = σˆ + (SE[µˆ(Y | X 0 )]) Variability of Y Uncertainty in about its mean the estimated mean 95%95% predictionprediction intervalinterval: Pred(Y | X 0 ) ± tdf (.975)*SE[Pred(Y | X 0 )] U9611 Spring 2005 17 Residuals vs. predicted values plot After any regression analysis we can automatically draw a residual-versus-fitted plot just by typing U9611 Spring 2005 18 Predicted values (yhatyhat) After any regression, the predict command can create a new variable yhat containing predicted Y values about its mean U9611 Spring 2005 19 Residuals (ee) the resid command can create a new variable e containing the residuals U9611 Spring 2005 20 The residual-versus-predicted-values plot could be drawn “by hand” using these commands U9611 Spring 2005 21 Second type of confidence interval for regression prediction: “prediction band” This express our uncertainty in estimating the unknown value of Y for an individual observation with known X value Command: lftci with stdf option Additional note: Predict can generate two kinds of standard errors for the predicted y value, which have two different applications. Confidence bands for conditional means (stdp) Confidence bands for individual-case predictions (stdf) 3 3 2 2 1 Distance Distance 1 0 0 -1 -500 0 500 1000 -500 0 500 1000 VELOCITY VELOCITY Confidence bands for conditional means (stdp) 3 95% confidence interval for µ{Y|1000} 2 Distance 1 confidence band:band a set of confidence intervals 0 for µ{Y|X0} -500 0 500 1000 VELOCITY Confidence bands for individual-case predictions (stdf) 3 95% prediction interval 2 for Y at X=1000 1 Distance Calibration interval:interval 0 values of X for which Y0is in a prediction interval -1 -500 0 500 1000 U9611 Spring 2005 VELOCITY 24 Notes about confidence and prediction bands Both are narrowest at the mean of X Beware of extrapolation The width of the Confidence Interval is zero if n is large enough; this is not true of the Prediction Interval. U9611 Spring 2005 25 Review of simple linear regression 1. Model with µ{Y | X} = β 0 + β 1X constant variance. var{Y | X} = σ 2 n n 2. Least squares: ˆ 2 choose estimators β 1 = ∑(X i − X )(Yi −Y ) /∑(X i − X ) . β0 and β1 i=1 i=1 to minimize the sum of ˆ ˆ squared residuals. β 0 = Y − β1 X ˆ ˆ resi = Yi − β0 − β1 X i (i =1,..,n) n ˆ 2 2 3. Properties σ = ∑ resi /(n − 2) of estimators. i=1 ˆ 2 SE(β1) = σˆ / (n −1)sx ˆ 2 2 U9611 SE(β0Spring) = 2005σˆ / (1/ n) + X /(n −1)sx26 Assumptions of Linear Regression A linear regression model assumes: Linearity: µ {Y|X} = β0 + β1X Constant Variance: var{Y|X} = σ2 Normality Dist. of Y’s at any X is normal Independence Given Xi’s, the Yi’s are independent U9611 Spring 2005 27 Examples of Violations Non-Linearity The true relation between the independent and dependent variables may not be linear. For example, consider campaign fundraising and the probability of winning an election. P(w) Probability of The probability of Winning an winning increases with Election each additional dollar spent and then levels off after $50,000. $50,000 Spending U9611 Spring 2005 28 Consequences of violation of linearity : If “linearity” is violated, misleading conclusions may occur (however, the degree of the problem depends on the degree of non-linearity) U9611 Spring 2005 29 Examples of Violations: Constant Variance Constant Variance or Homoskedasticity The Homoskedasticity assumption implies that, on average, we do not expect to get larger errors in some cases than in others. Of course, due to the luck of the draw, some errors will turn out to be larger then others. But homoskedasticity is violated only when this happens in a predictable manner. Example: income and spending on certain goods. People with higher incomes have more choices about what to buy. We would expect that there consumption of certain goods is more variable than for families with lower incomes. U9611 Spring 2005 30 Violation of constant variance X10 Relation between Income X 8 and Spending violates Spending homoskedasticity X6 ε 8 ε 6 ε6=−+((YabX66 )) ε 9 ε9=−+((YabX99 )) X4 ε 7 X 2 As income increases so X X9 ε 5 7 do the errors (vertical distance from the X X 5 3 predicted line) X1 income U9611 Spring 2005 31 Consequences of non-constant variance If “constant variance” is violated, LS estimates are still unbiased but SEs, tests, Confidence Intervals, and Prediction Intervals are incorrect However, the degree depends… U9611 Spring 2005 32 Violation of Normality Non-Normality Nicotine use is characterized by a large number of people Frequency of not smoking at all and Nicotine use another large number of people who smoke every day.

Load more