Multiple Linear Regression (MLR) Handouts
Yibi Huang

• Data and Models
• Least Squares Estimate, Fitted Values, Residuals
• Sum of Squares
• How to Do Regression in R?
• Interpretation of Regression Coefficients
• t-Tests on Individual Regression Coefficients
• F-Tests for Comparing Nested Models

MLR - 1

You may skip this lecture if you have taken STAT 224 or 245. However, you are encouraged to at least read through the slides if you skip the video lecture.

MLR - 2

Data for Multiple Linear Regression Models

Multiple linear regression is a generalization of simple linear regression to the setting with multiple explanatory variables.

             SLR                       MLR
             x       y                 x_1      x_2     ...  x_p      y
    case 1:  x_1     y_1               x_{11}   x_{12}  ...  x_{1p}   y_1
    case 2:  x_2     y_2               x_{21}   x_{22}  ...  x_{2p}   y_2
      ...                                ...
    case n:  x_n     y_n               x_{n1}   x_{n2}  ...  x_{np}   y_n

• For SLR, we observe pairs of variables; for MLR, we observe rows of variables. Each row (or pair) is called a case, a record, or a data point.
• $y_i$ is the response (or dependent variable) of the $i$th case.
• There are $p$ explanatory variables (or covariates, predictors, independent variables), and $x_{ik}$ is the value of the $k$th explanatory variable $x_k$ for the $i$th case.

MLR - 3

Multiple Linear Regression Models

$$ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i $$

In the model above,
• the $\varepsilon_i$'s (errors, or noise) are i.i.d. $N(0, \sigma^2)$;
• the parameters are $\beta_0$, the intercept; $\beta_k$, the regression coefficient (slope) for the $k$th explanatory variable, $k = 1, \ldots, p$; and $\sigma^2 = \operatorname{Var}(\varepsilon_i)$, the variance of the errors;
• observed (known): $y_i, x_{i1}, x_{i2}, \ldots, x_{ip}$; unknown: $\beta_0, \beta_1, \ldots, \beta_p$, $\sigma^2$, and the $\varepsilon_i$'s;
• random variables: the $\varepsilon_i$'s and the $y_i$'s; constants (nonrandom): the $\beta_k$'s, $\sigma^2$, and the $x_{ik}$'s.

MLR - 4

Fitting the Model: the Least Squares Method

Recall that for SLR, the least squares estimate $(\hat\beta_0, \hat\beta_1)$ for $(\beta_0, \beta_1)$ gives the intercept and slope of the straight line with the minimum sum of squared vertical distances to the data points,
$$ \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2. $$

[Figure: scatterplot with Y = % in poverty on the vertical axis and X = % HS grad on the horizontal axis.]

MLR is just like SLR. The least squares estimate $(\hat\beta_0, \ldots, \hat\beta_p)$ for $(\beta_0, \ldots, \beta_p)$ gives the intercept and slopes of the (hyper)plane with the minimum sum of squared vertical distances to the data points,
$$ \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2. $$

MLR - 5

The "Hat" Notation

From now on, we use the "hat" notation to differentiate
• the estimated coefficient $\hat\beta_j$ from
• the actual unknown coefficient $\beta_j$.

MLR - 6

Solving the Least Squares Problem (1)

To find the $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ that minimize
$$ L(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p) = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2, $$
one can take the partial derivatives of $L$ with respect to the $\hat\beta_j$'s,
$$ \frac{\partial L}{\partial \hat\beta_0} = -2 \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}), $$
$$ \frac{\partial L}{\partial \hat\beta_k} = -2 \sum_{i=1}^n x_{ik} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}), \qquad k = 1, 2, \ldots, p, $$
and then equate them to 0. This results in the system of $(p+1)$ equations in $(p+1)$ unknowns on the next page.

MLR - 7

Solving the Least Squares Problem (2)

$$ \hat\beta_0 \cdot n + \hat\beta_1 \sum_{i=1}^n x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip} = \sum_{i=1}^n y_i $$
$$ \hat\beta_0 \sum_{i=1}^n x_{i1} + \hat\beta_1 \sum_{i=1}^n x_{i1}^2 + \cdots + \hat\beta_p \sum_{i=1}^n x_{i1} x_{ip} = \sum_{i=1}^n x_{i1} y_i $$
$$ \vdots $$
$$ \hat\beta_0 \sum_{i=1}^n x_{ik} + \hat\beta_1 \sum_{i=1}^n x_{ik} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ik} x_{ip} = \sum_{i=1}^n x_{ik} y_i $$
$$ \vdots $$
$$ \hat\beta_0 \sum_{i=1}^n x_{ip} + \hat\beta_1 \sum_{i=1}^n x_{ip} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip}^2 = \sum_{i=1}^n x_{ip} y_i $$

Every sum in these equations is a known quantity computed from the data; only the $\hat\beta_j$'s are unknown.

• Don't worry about solving the equations; R and many other software packages can do the computation for us.
• In general $\hat\beta_j \ne \beta_j$, but they will be close under some conditions.

MLR - 8
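As a side note: in matrix form, the system above is the familiar set of normal equations $X^\top X \hat\beta = X^\top y$, where the first column of $X$ is all 1s and the remaining columns hold the covariates. Below is a minimal sketch, using simulated data and illustrative variable names (not from the handout), that solves the system directly and checks the result against R's lm().

```r
# Sketch with simulated data (all names are illustrative): solve the
# normal equations directly, then compare with the coefficients from lm().
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)      # p = 2 covariates
y  <- 1 + 2*x1 - 3*x2 + rnorm(n)    # true (beta0, beta1, beta2) = (1, 2, -3)

X <- cbind(1, x1, x2)               # column of 1s, then the covariate columns
solve(t(X) %*% X, t(X) %*% y)       # solves (X'X) beta.hat = X'y
coef(lm(y ~ x1 + x2))               # lm() returns the same estimates
```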
Fitted Values

The fitted value or predicted value is
$$ \hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}. $$

• Again, the "hat" symbol is used to differentiate the fitted value $\hat y_i$ from the actual observed value $y_i$.

MLR - 9

Errors and Residuals

• One cannot directly compute the errors
$$ \varepsilon_i = y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}, $$
since the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are unknown.
• The errors $\varepsilon_i$ can be estimated by the residuals $e_i$, defined as
$$ e_i = \text{observed } y_i - \text{predicted } y_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}). $$
• In general $e_i \ne \varepsilon_i$, since $\hat\beta_j \ne \beta_j$.

MLR - 10

Properties of Residuals

Recall that the least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ satisfies the equations
$$ \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}) = 0 \quad\text{and}\quad \sum_{i=1}^n x_{ik} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}) = 0, \quad k = 1, 2, \ldots, p. $$
Since $y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip} = y_i - \hat y_i = e_i$, the residuals have the properties
$$ \sum_{i=1}^n e_i = 0 \qquad\text{and}\qquad \sum_{i=1}^n x_{ik} e_i = 0, \quad k = 1, 2, \ldots, p. $$
That is, the residuals add up to 0, and the residuals are orthogonal to the covariates.

MLR - 11

Sum of Squares

Observe that
$$ y_i - \bar y = \underbrace{(\hat y_i - \bar y)}_{a} + \underbrace{(y_i - \hat y_i)}_{b}. $$
Squaring both sides using the identity $(a + b)^2 = a^2 + b^2 + 2ab$, we get
$$ (y_i - \bar y)^2 = (\hat y_i - \bar y)^2 + (y_i - \hat y_i)^2 + 2 (\hat y_i - \bar y)(y_i - \hat y_i). $$
Summing over all the cases $i = 1, 2, \ldots, n$, we get
$$ \underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{\text{SST}} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{\text{SSR}} + \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{\text{SSE}} + 2 \sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i), $$
where $y_i - \hat y_i = e_i$ and the last (cross-product) term is 0; see the next page.

MLR - 12

Why is $\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = 0$?

$$ \sum_{i=1}^n (\hat y_i - \bar y)\underbrace{(y_i - \hat y_i)}_{=\,e_i} = \sum_{i=1}^n \hat y_i e_i - \bar y \sum_{i=1}^n e_i = \sum_{i=1}^n (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}) e_i - \bar y \sum_{i=1}^n e_i $$
$$ = \hat\beta_0 \underbrace{\sum_{i=1}^n e_i}_{=0} + \hat\beta_1 \underbrace{\sum_{i=1}^n x_{i1} e_i}_{=0} + \cdots + \hat\beta_p \underbrace{\sum_{i=1}^n x_{ip} e_i}_{=0} - \bar y \underbrace{\sum_{i=1}^n e_i}_{=0} = 0, $$
in which we used the properties of the residuals that $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_{ik} e_i = 0$ for all $k = 1, \ldots, p$.

MLR - 13

Interpretation of Sum of Squares

$$ \underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{\text{SST}} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{\text{SSR}} + \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{\text{SSE}} $$

• SST = total sum of squares
  - the total variability of $y$
  - depends on the response $y$ only, not on the form of the model
• SSR = regression sum of squares
  - the variability of $y$ explained by $x_1, \ldots, x_p$
• SSE = error (residual) sum of squares
  - $= \min_{\beta_0, \beta_1, \ldots, \beta_p} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$
  - the variability of $y$ not explained by the $x$'s

MLR - 14

Degrees of Freedom

If the MLR model
$$ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \text{ i.i.d.} \sim N(0, \sigma^2), $$
is true, it can be shown that
$$ \frac{\text{SSE}}{\sigma^2} \sim \chi^2_{n-p-1}. $$
If we further assume that $\beta_1 = \beta_2 = \cdots = \beta_p = 0$, then
$$ \frac{\text{SST}}{\sigma^2} \sim \chi^2_{n-1}, \qquad \frac{\text{SSR}}{\sigma^2} \sim \chi^2_{p}, $$
and SSR is independent of SSE. Note that the degrees of freedom of the three chi-square distributions,
$$ df_T = n - 1, \qquad df_R = p, \qquad df_E = n - p - 1, $$
break down similarly, $df_T = df_R + df_E$, just like SST = SSR + SSE.

MLR - 15
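As a side note, the residual properties and the decomposition SST = SSR + SSE can be checked numerically. A minimal sketch with simulated data and illustrative names (the same setup as the earlier sketch, not from the handout):

```r
# Numerical check of the residual properties and of SST = SSR + SSE.
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 - 3*x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
e   <- resid(fit)

sum(e)                        # ~ 0: residuals add up to 0
c(sum(x1 * e), sum(x2 * e))   # ~ 0: residuals are orthogonal to the covariates

SST <- sum((y - mean(y))^2)
SSR <- sum((fitted(fit) - mean(y))^2)
SSE <- sum(e^2)
SST - (SSR + SSE)             # ~ 0 up to floating-point rounding
```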
Why Does SSE Have $n - p - 1$ Degrees of Freedom?

The $n$ residuals $e_1, \ldots, e_n$ cannot all vary freely. There are $p + 1$ constraints:
$$ \sum_{i=1}^n e_i = 0 \qquad\text{and}\qquad \sum_{i=1}^n x_{ik} e_i = 0 \quad \text{for } k = 1, \ldots, p. $$
So only $n - (p + 1)$ of them can vary freely. The $p + 1$ constraints come from the $p + 1$ coefficients $\beta_0, \ldots, \beta_p$ in the model; each contributes one constraint of the form
$$ \frac{\partial L}{\partial \hat\beta_k} = 0. $$

MLR - 16

Mean Square Error (MSE): the Estimate of $\sigma^2$

A mean square is a sum of squares divided by its degrees of freedom:
$$ \text{MST} = \frac{\text{SST}}{df_T} = \frac{\text{SST}}{n - 1} = \text{the sample variance of } Y, $$
$$ \text{MSR} = \frac{\text{SSR}}{df_R} = \frac{\text{SSR}}{p}, $$
$$ \text{MSE} = \frac{\text{SSE}}{df_E} = \frac{\text{SSE}}{n - p - 1} = \hat\sigma^2. $$

• From the fact that $\text{SSE}/\sigma^2 \sim \chi^2_{n-p-1}$ and that the mean of a $\chi^2_k$ distribution is $k$, we know that MSE is an unbiased estimator for $\sigma^2$.

MLR - 17

Example: Housing Price

    Price  BDR   FLR  FP  RMS  ST  LOT  BTH  CON  GAR  LOC
       53    2   967   0    5   0   39  1.5    1  0.0    0
       55    2   815   1    5   0   33  1.0    1  2.0    0
       56    3   900   0    5   1   35  1.5    1  1.0    0
       58    3  1007   0    6   1   24  1.5    0  2.0    0

Price = Selling price in $1000
BDR = Number of bedrooms
FLR = Floor space in sq. ft.
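As a side note: only the first rows and variable definitions of the housing data appear above. If the full data set were loaded into a data frame, say `housing` (a hypothetical name), the regression of price on all ten covariates would be fit with `lm(Price ~ BDR + FLR + FP + RMS + ST + LOT + BTH + CON + GAR + LOC, data = housing)`. A self-contained sketch of the MSE computation, again with simulated data and illustrative names:

```r
# Check that MSE = SSE/(n - p - 1) is the square of the "residual
# standard error" that summary(lm(...)) reports.
set.seed(1)
n <- 50; p <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 - 3*x2 + rnorm(n, sd = 1.5)   # true sigma = 1.5
fit <- lm(y ~ x1 + x2)

SSE <- sum(resid(fit)^2)
SSE / (n - p - 1)        # MSE, the unbiased estimate of sigma^2
summary(fit)$sigma^2     # the same value, as reported by summary()
```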