3. Linear Least-Squares Regression


Sociology 740, John Fox, Lecture Notes. Copyright © 2014 by John Fox.

1. Goals

• To review/introduce the calculation and interpretation of the least-squares regression coefficients in simple and multiple regression.
• To review/introduce the calculation and interpretation of the regression standard error and the simple and multiple correlation coefficients.
• To introduce and criticize the use of standardized regression coefficients.
• Time and interest permitting: to introduce matrix arithmetic and least-squares regression in matrix form.

2. Introduction

• Despite its limitations, linear least squares lies at the very heart of applied statistics:
  – Some data are adequately summarized by linear least-squares regression.
  – The effective application of linear regression is expanded by data transformations and diagnostics.
  – The general linear model — an extension of least-squares linear regression — is able to accommodate a very broad class of specifications.
  – Linear least squares provides a computational basis for a variety of generalizations (such as generalized linear models).
• This lecture describes the mechanics and descriptive interpretation of linear least-squares regression.

3. Simple Regression

3.1 Least-Squares Fit

• Figure 1 shows Davis's data on the measured and reported weight, in kilograms, of 101 women who were engaged in regular exercise.
  – The relationship between measured and reported weight appears to be linear, so it is reasonable to fit a line to the plot.
• Denoting measured weight by $Y$ and reported weight by $X$, a line relating the two variables has the equation $Y = A + BX$.
  – No line can pass perfectly through all of the data points; a residual, $E$, reflects this fact.
  – The regression equation for the $i$th of the $n = 101$ observations is
    $$Y_i = A + BX_i + E_i = \hat{Y}_i + E_i$$

[Figure 1: scatterplot of measured weight (kg) against reported weight (kg).] Figure 1. A least-squares line fit to Davis's data on reported and measured weight. (The broken line is the line $Y = X$.) Some points are over-plotted.

• The residual
  $$E_i = Y_i - \hat{Y}_i = Y_i - (A + BX_i)$$
  is the signed vertical distance between the point and the line, as shown in Figure 2.
• A line that fits the data well makes the residuals small.
  – Simply requiring that the sum of residuals, $\sum_{i=1}^{n} E_i$, be small is futile, since large negative residuals can offset large positive ones.
  – Indeed, any line through the point of means $(\bar{X}, \bar{Y})$ has $\sum E_i = 0$ (a small numeric demonstration follows Figure 2 below).
• Two possibilities immediately present themselves:
  – Find $A$ and $B$ to minimize the absolute residuals, $\sum |E_i|$, which leads to least-absolute-values (LAV) regression.
  – Find $A$ and $B$ to minimize the squared residuals, $\sum E_i^2$, which leads to least-squares (LS) regression.

[Figure 2: the point $(X_i, Y_i)$, the fitted line $\hat{Y} = A + BX$, and the residual $E_i$.] Figure 2. The residual $E_i$ is the signed vertical distance between the point and the line.
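The claim that any line through the point of means has residuals summing to zero is easy to verify numerically. Below is a minimal Python sketch; the data are made up for illustration (they are not Davis's data):

```python
import numpy as np

# Made-up weights (kg) for illustration -- these are NOT Davis's data.
X = np.array([50.0, 55.0, 60.0, 65.0, 70.0])   # reported weight
Y = np.array([52.0, 54.0, 61.0, 64.0, 71.0])   # measured weight

x_bar, y_bar = X.mean(), Y.mean()

for B in (0.5, 1.0, 1.5):           # several arbitrary slopes
    A = y_bar - B * x_bar           # force the line through (X-bar, Y-bar)
    E = Y - (A + B * X)             # residuals from that line
    print(f"B = {B:.1f}: sum of residuals = {E.sum():+.12f}")

# Every sum is zero (up to floating-point rounding), so the criterion
# "make the sum of residuals small" cannot choose among these lines.
```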
• In least-squares regression, we seek the values of $A$ and $B$ that minimize the sum-of-squares function
  $$S(A, B) = \sum_{i=1}^{n} E_i^2 = \sum (Y_i - A - BX_i)^2$$
• For those with calculus, the most direct approach is to take the partial derivatives of the sum-of-squares function with respect to the coefficients:
  $$\frac{\partial S(A, B)}{\partial A} = \sum (-1)(2)(Y_i - A - BX_i)$$
  $$\frac{\partial S(A, B)}{\partial B} = \sum (-X_i)(2)(Y_i - A - BX_i)$$
• Setting these partial derivatives to zero yields simultaneous linear equations for $A$ and $B$, the normal equations for simple regression:
  $$An + B \sum X_i = \sum Y_i$$
  $$A \sum X_i + B \sum X_i^2 = \sum X_i Y_i$$
• Solving the normal equations produces the least-squares coefficients:
  $$A = \bar{Y} - B\bar{X}$$
  $$B = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
  – The formula for $A$ implies that the least-squares line passes through the point of means of the two variables. The least-squares residuals therefore sum to zero.
  – The second normal equation implies that $\sum X_i E_i = 0$; similarly, $\sum \hat{Y}_i E_i = 0$. These properties imply that the residuals are uncorrelated with both the $X$'s and the $\hat{Y}$'s.
• For Davis's data on measured weight ($Y$) and reported weight ($X$):
  $$n = 101 \qquad \bar{Y} = \frac{5780}{101} = 57.23 \qquad \bar{X} = \frac{5731}{101} = 56.74$$
  $$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 4435 \qquad \sum (X_i - \bar{X})^2 = 4539$$
  $$B = \frac{4435}{4539} = 0.9771 \qquad A = 57.23 - 0.9771 \times 56.74 = 1.789$$
  – The least-squares regression equation is
    $$\widehat{\text{Measured Weight}} = 1.79 + 0.977 \times \text{Reported Weight}$$
• Interpretation of the least-squares coefficients:
  – $B = 0.977$: a one-kilogram increase in reported weight is associated, on average, with just under a one-kilogram increase in measured weight.
  – Since the data are not longitudinal, the phrase "a unit increase" here implies not a literal change over time, but rather a static comparison between two individuals who differ by one kilogram in their reported weights.
  – Ordinarily, we may interpret the intercept $A$ as the fitted value associated with $X = 0$, but it is impossible for an individual to have a reported weight equal to zero. The intercept $A$ is therefore usually of little direct interest, since the fitted value at $X = 0$ is rarely important.
  – Here, however, if individuals' reports are unbiased predictions of their actual weights, then we should have $\hat{Y} = X$, i.e., $A = 0$ (and $B = 1$). The intercept $A = 1.79$ is close to zero, and the slope $B = 0.977$ is close to one.

3.2 Simple Correlation

• It is of interest to determine how closely the line fits the scatter of points.
• The standard deviation of the residuals, $S_E$, called the standard error of the regression, provides one index of fit.
  – Because of estimation considerations, the variance of the residuals is defined using $n - 2$ degrees of freedom:
    $$S_E^2 = \frac{\sum E_i^2}{n - 2}$$
  – The standard error is therefore
    $$S_E = \sqrt{\frac{\sum E_i^2}{n - 2}}$$
  – Since it is measured in the units of the response variable, the standard error represents a type of 'average' residual.
• For Davis's regression of measured on reported weight, the sum of squared residuals is $\sum E_i^2 = 418.9$, and the standard error is
  $$S_E = \sqrt{\frac{418.9}{101 - 2}} = 2.05 \text{ kg}$$
• I believe that social scientists overemphasize correlation and pay insufficient attention to the standard error of the regression.
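The closed-form formulas above translate directly into code. The sketch below is a minimal Python implementation, assuming NumPy arrays as inputs; the function name and the example data are my own, not part of the original notes. Applied to Davis's data, these formulas give the values computed above ($A = 1.79$, $B = 0.977$, $S_E = 2.05$ kg).

```python
import numpy as np

def least_squares_fit(X, Y):
    """Closed-form simple regression of Y on X: returns the intercept A,
    the slope B, and the regression standard error S_E, computed with
    n - 2 degrees of freedom as in the notes."""
    n = len(X)
    x_bar, y_bar = X.mean(), Y.mean()
    B = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    A = y_bar - B * x_bar                     # line passes through the point of means
    E = Y - (A + B * X)                       # least-squares residuals
    S_E = np.sqrt(np.sum(E ** 2) / (n - 2))   # a type of 'average' residual, in Y's units
    return A, B, S_E

# Example call on made-up data (not Davis's):
X = np.array([55.0, 58.0, 60.0, 62.0, 70.0])
Y = np.array([56.0, 59.5, 60.5, 63.0, 71.0])
A, B, S_E = least_squares_fit(X, Y)
print(f"A = {A:.3f}, B = {B:.3f}, S_E = {S_E:.3f}")
```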
• The correlation coefficient provides a relative measure of fit: to what degree do our predictions of $Y$ improve when we base that prediction on the linear relationship between $Y$ and $X$?
• A relative index of fit requires a baseline — how well can $Y$ be predicted if $X$ is disregarded?
  – To disregard the explanatory variable is implicitly to fit the equation
    $$Y_i = A' + E_i'$$
  – We can find the best-fitting constant $A'$ by least squares, minimizing
    $$S(A') = \sum E_i'^2 = \sum (Y_i - A')^2$$
  – The value of $A'$ that minimizes this sum of squares is the response-variable mean, $\bar{Y}$.
• The residuals $E_i = Y_i - \hat{Y}_i$ from the linear regression of $Y$ on $X$ will generally be smaller than the residuals $E_i' = Y_i - \bar{Y}$, and it is necessarily the case that
  $$\sum (Y_i - \hat{Y}_i)^2 \le \sum (Y_i - \bar{Y})^2$$
  – This inequality holds because the 'null model' $Y_i = A' + E_i'$ is a special case of the more general linear-regression 'model' $Y_i = A + BX_i + E_i$, setting $B = 0$.
• We call
  $$\text{TSS} = \sum E_i'^2 = \sum (Y_i - \bar{Y})^2$$
  the total sum of squares for $Y$, while
  $$\text{RSS} = \sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2$$
  is called the residual sum of squares.
• The difference between the two, termed the regression sum of squares,
  $$\text{RegSS} \equiv \text{TSS} - \text{RSS}$$
  gives the reduction in squared error due to the linear regression.
• The ratio of RegSS to TSS, the proportional reduction in squared error, defines the square of the correlation coefficient:
  $$r^2 \equiv \frac{\text{RegSS}}{\text{TSS}}$$
• To find the correlation coefficient $r$, we take the positive square root of $r^2$ when the simple-regression slope $B$ is positive, and the negative square root when $B$ is negative.
• If there is a perfect positive linear relationship between $Y$ and $X$, then $r = 1$; a perfect negative linear relationship corresponds to $r = -1$.
• If there is no linear relationship between $Y$ and $X$, then RSS = TSS, RegSS = 0, and $r = 0$.
• Between these extremes, $r$ gives the direction of the linear relationship between the two variables, and $r^2$ may be interpreted as the proportion of the total variation of $Y$ that is 'captured' by its linear regression on $X$.
• Figure 3 depicts several different levels of correlation.
• The decomposition of total variation into 'explained' and 'unexplained' components, paralleling the decomposition of each observation into a fitted value and a residual, is typical of linear models.
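The sums-of-squares decomposition also translates directly into code. Here is a short Python sketch in the same hypothetical style as the earlier ones (the function name is my own); it computes TSS, RSS, RegSS, and $r$, attaching the sign of the slope $B$ to the square root of $r^2$ as described above:

```python
import numpy as np

def correlation_summary(X, Y):
    """TSS, RSS, RegSS, r^2, and r for the simple regression of Y on X,
    using the proportional-reduction-in-squared-error definition."""
    x_bar, y_bar = X.mean(), Y.mean()
    B = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    A = y_bar - B * x_bar
    Y_hat = A + B * X
    TSS = np.sum((Y - y_bar) ** 2)      # baseline: predict Y by its mean
    RSS = np.sum((Y - Y_hat) ** 2)      # squared error around the LS line
    RegSS = TSS - RSS                   # reduction in squared error
    r2 = RegSS / TSS
    r = np.copysign(np.sqrt(r2), B)     # r carries the sign of the slope B
    return TSS, RSS, RegSS, r2, r
```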
