3. Linear Least-Squares Regression
Total Page:16
File Type:pdf, Size:1020Kb
Sociology 740 John Fox Lecture Notes 3. Linear Least-Squares Regression Copyright © 2014 by John Fox Linear Least-Squares Regression 1 1. Goals: I To review/introduce the calculation and interpretation of the least- squares regression coefficients in simple and multiple regression. I To review/introduce the calculation and interpretation of the regression standard error and the simple and multiple correlation coefficients. I To introduce and criticize the use of standardized regression coefficients I Time and interest permitting: To introduce matrix arithmetic and least-squares regession in matrix form. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 2 2. Introduction I Despite its limitations, linear least squares lies at the very heart of applied statistics: Some data are adequately summarized by linear least-squares • regression. The effective application of linear regression is expanded by data • transformations and diagnostics. The general linear model — an extension of least-squares linear • regression — is able to accommodate a very broad class of specifica- tions. Linear least-squares provides a computational basis for a variety of • generalizations (such as generalized linear models). I This lecture describes the mechanics and descriptive interpretation of linear least-squares regression. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 3 3. Simple Regression 3.1 Least-Squares Fit I Figure 1 shows Davis’s data on the measured and reported weight in kilograms of 101 women who were engaged in regular exercise. The relationship between measured and reported weight appears to • be linear, so it is reasonable to fit a line to the plot. I Denoting measured weight by \ and reported weight by [,aline relating the two variables has the equation \ = D + E[. No line can pass perfectly through all of the data points. A residual, H, • reflects this fact. The regression equation for the lth of the q = 101 observations is • \l = D + E[l + Hl = \l + Hl b c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 4 Measured Weight (kg) Weight Measured 40 50 60 70 40 45 50 55 60 65 70 75 Reported Weight (kg) Figure 1. A least-squares line fit to Davis’s data on reported and measured weight. (The broken line is the line \ = [.) Some points are over-plotted. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 5 The residual • Hl = \l \l = \l (D + E[l) is the signed vertical distance between the point and the line, as showninFigure2. b I A line that fits the data well makes the residuals small. q Simply requiring that the sum of residuals, l=1 Hl, be small is futile, • since large negative residuals can offset large positive ones. P Indeed, any line through the point ([>\ ) has Hl =0. • I Two possibilities immediately present themselves:P Find D and E to minimize the absolute residuals, Hl , which leads • to least-absolute-values (LAV) regression. | | P 2 Find D and E to minimize the squared residuals, Hl , which leads to • least-squares (LS) regression. P c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 6 Y (Xi, Yi) ^ Y A BX Yi Ei ^ Yi 0 X Xi Figure 2. The residual Hl is the signed vertical distance between the point and the line. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 7 I In least-squares regression, we seek the values of D and E that minimize: q 2 2 V(D> E)= Hl = (\l D E[l) l=1 X X For those with calculus: • – The most direct approach is to take the partial derivatives of the sum-of-squares function with respect to the coefficients: CV(D> E) = ( 1)(2)(\l D E[l) CD CV(D> E) X = ( [l)(2)(\l D E[l) CE Setting these partial derivativesX to zero yields simultaneous linear • equations for D and E,thenormal equations for simple regression: Dq + E [l = \l 2 D [l + E X[l = X [l\l c 2014 by John Fox Sociology 740 ° X X X Linear Least-Squares Regression 8 Solving the normal equations produces the least-squares coefficients: • D = \ E[ q [ \ [ \ ([ [)(\ \ ) E = l l l l = l l 2 2 2 q [ ( [l) ([l [) P l P P P P P P – The formula for D implies that the least-squares line passes through the point-of-means of the two variables. The least-squares residuals therefore sum to zero. – The second normal equation implies that [lHl =0; similarly, \lHl =0. These properties imply that the residuals are uncorre- P latedwithboththe[’s and the \ ’s. P b b c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 9 I For Davis’s data on measured weight (\ ) and reported weight ([): q = 101 5780 \ = =57=23 101 5731 [ = =56=74 101 ([l [)(\l \ ) = 4435= 2 X ([l [) = 4539= 4435 X E = =0=9771 4539 D =57=23 0=9771 56=74 = 1=789 × The least-squares regression equation is • Measured\ Weight =1=79 + 0=977 Reported Weight × c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 10 I Interpretation of the least-squares coefficients: E =0=977: A one-kilogram increase in reported weight is associated • on average with just under a one-kilogram increase in measured weight. – Since the data are not longitudinal, the phrase “a unit increase” here implies not a literal change over time, but rather a static comparison between two individuals who differ by one kilogram in their reported weights. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 11 Ordinarily, we may interpret the intercept D as the fitted value • associated with [ =0, but it is impossible for an individual to have a reported weight equal to zero. – The intercept D is usually of little direct interest, since the fitted value above [ =0is rarely important. – Here, however, if individuals’ reports are unbiased predictions of their actual weights, then we should have \ = [ — i.e., D =0.The intercept D =1=79 is close to zero, and the slope E =0=977 is close to one. b c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 12 3.2 Simple Correlation I It is of interest to determine how closely the line fits the scatter of points. I The standard deviation of the residuals, VH, called the standard error of the regression, provides one index of fit. Because of estimation considerations, the variance of the residuals is • defined using q 2 degrees of freedom: H2 V2 = l H q 2 P The standard error is therefore • 2 Hl VH = sq 2 P Since it is measured in the units of the response variable, the standard • error represents a type of ‘average’ residual. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 13 For Davis’s regression of measured on reported weight, the sum of • 2 squared residuals is Hl = 418=9, and the standard error 418=9 VP= =2=05 kg. H 101 2 r I believe that social scientists overemphasize correlation and pay • insufficient attention to the standard error of the regression. I The correlation coefficient provides a relative measure of fit: To what degree do our predictions of \ improve when we base that prediction on the linear relationship between \ and [? A relative index of fit requires a baseline — how well can \ be • predicted if [ is disregarded? – To disregard the explanatory variable is implicitly to fit the equation \l = D0 + Hl0 – We can find the best-fitting constant D0 by least-squares, minimizing 2 2 V(D0)= H0 = (\l D0) l X X c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 14 – The value of D0 that minimizes this sum of squares is the response- variable mean, \ . The residuals Hl = \l \l from the linear regression of \ on [ • will generally be smaller than the residuals H0 = \l \ , and it is l necessarily the case that b 2 2 (\l \l) (\l \ ) – This inequality holds because the ‘null model,’ \ = D + H is X X l 0 l0 a special case of the moreb general linear-regression ‘model,’ \l = D + E[l + Hl,settingE =0. We call • 2 2 H0 = (\l \ ) l the total sum of squares for \ , abbreviated TSS,while X X 2 2 H = (\l \l) l is called the residual sumX of squaresX , and is abbreviated RSS. b c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 15 The difference between the two, termed the regression sum of • squares, RegSS TSS RSS gives the reduction in squared error due to the linear regression. The ratio of RegSS to TSS, the proportional reduction in squared error, • defines the square of the correlation coefficient: RegSS u2 TSS To find the correlation coefficient u we take the positive square root of • u2 when the simple-regression slope E is positive, and the negative square root when E is negative. If there is a perfect positive linear relationship between \ and [,then • u =1. A perfect negative linear relationship corresponds to u = 1. • If there is no linear relationship between \ and [,thenRSS= TSS, • RegSS =0,andu =0. c 2014 by John Fox Sociology 740 ° Linear Least-Squares Regression 16 Between these extremes, u gives the direction of the linear relationship • between the two variables, and u2 may be interpreted as the proportion of the total variation of \ that is ‘captured’ by its linear regression on [. Figure 3 depicts several different levels of correlation. • I The decomposition of total variation into ‘explained’ and ‘unexplained’ components, paralleling the decomposition of each observation into a fitted value and a residual, is typical of linear models.