Geometric Constructions Relating Various Lines of Regression
Applied Mathematical Sciences, Vol. 10, 2016, no. 23, 1111 - 1130
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2016.512735

Geometric Constructions Relating Various Lines of Regression

Brian J. McCartin†
Applied Mathematics, Kettering University
1700 University Avenue, Flint, MI 48504-6214 USA

†Deceased January 29, 2016

Copyright © 2016 Brian J. McCartin. This article is distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

As a direct consequence of the Galton-Pearson-McCartin Theorem [10, Theorem 2], the concentration ellipse provides a unifying thread to the Euclidean construction of various lines of regression. These include the lines of coordinate regression [7], orthogonal regression [13], $\lambda$-regression [8] and $(\lambda,\mu)$-regression [9], whose geometric constructions are afforded a unified treatment in the present paper.

Mathematics Subject Classification: 51M15, 51N20, 62J05

Keywords: geometric construction, linear regression, concentration ellipse, Galton-Pearson-McCartin Theorem

1 Introduction

In fitting a linear relationship [6] to a set of $(x, y)$ data (Figure 1), it is often assumed that one ('independent') variable, say $x$, is known exactly while the other ('dependent') variable, say $y$, is subject to error. The line $L$ (Figure 2) is then chosen to minimize the total square vertical deviation $\Sigma d_y^2$. This is known as the line of regression of y-on-x. Reversing the roles of $x$ and $y$ minimizes instead the total square horizontal deviation $\Sigma d_x^2$, thereby yielding the line of regression of x-on-y. Either method may be termed coordinate regression.

This procedure may be generalized to the situation where both variables are subject to independent errors with zero means, albeit with equal variances. In this case, $L$ is chosen to minimize the total square orthogonal deviation $\Sigma d_\perp^2$ (Figure 2). This is referred to as orthogonal regression.

Figure 1: Linear Fitting of Discrete Data

This approach may be further extended to embrace the case where the error variances are unequal. Defining $\sigma_u^2$ = variance of the $x$-error, $\sigma_v^2$ = variance of the $y$-error and $\lambda = \sigma_v^2/\sigma_u^2$, a weighted least squares procedure [1, 16] is employed whereby $\frac{1+m^2}{\lambda+m^2} \cdot \Sigma d_\perp^2$ is minimized in order to determine $L$. This is referred to as $\lambda$-regression. Finally, if the errors are correlated with $p_{uv}$ = covariance of the errors and $\mu = p_{uv}/\sigma_u^2$, then $\frac{1+m^2}{\lambda - 2\mu m + m^2} \cdot \Sigma d_\perp^2$ is minimized in order to determine $L$. This is referred to as $(\lambda,\mu)$-regression.

In the next section, the concept of oblique linear least squares approximation is introduced (Figure 3). In this mode of approximation, one first selects a direction specified by the slope $s$. One then obliquely projects each data point in this direction onto the line $L$. Finally, $L$ is chosen so as to minimize the total square oblique deviation $\Sigma d_i^2$. It is known that this permits a unified treatment of all of the previously described flavors of linear regression [10].
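To make the notion of oblique deviation concrete, here is a minimal numerical sketch (not part of the original paper; the synthetic data, the helper name oblique_sq_deviation, and the particular slopes are illustrative assumptions). It obliquely projects each data point along a direction of slope $s$ onto a candidate line through the centroid and accumulates the total square oblique deviation.

```python
import numpy as np

def oblique_sq_deviation(x, y, m, s):
    """Total square oblique deviation of the data from the candidate line
    y - ybar = m*(x - xbar), measured along the direction of slope s
    (requires s != m so the projection direction is not parallel to the line)."""
    xbar, ybar = x.mean(), y.mean()
    # Solve (y_i + t*s) - ybar = m*((x_i + t) - xbar) for the step t along (1, s).
    t = (m * (x - xbar) - (y - ybar)) / (s - m)
    x_star, y_star = x + t, y + t * s          # oblique projections onto the line
    return np.sum((x - x_star) ** 2 + (y - y_star) ** 2)

# Synthetic data roughly following y = 0.8 x.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 25)
y = 0.8 * x + 0.1 * rng.standard_normal(x.size)

# s = 0 projects horizontally (x-on-y deviations); a very steep s approximates
# vertical (y-on-x) deviations.
print(oblique_sq_deviation(x, y, m=0.8, s=0.0))
print(oblique_sq_deviation(x, y, m=0.8, s=1e6))
```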
Galton [7] first provided a geometric interpretation of coordinate regression in terms of the best-fitting concentration ellipse, which has the same first moments and second moments about the centroid as the experimental data [3]. Pearson [13] then extended this geometric characterization to orthogonal regression. McCartin [8] further extended it to $\lambda$-regression. Finally, McCartin [9] succeeded in extending it to the general case of $(\lambda,\mu)$-regression. In a similar vein, a geometric characterization of oblique linear least squares approximation [10] further generalizes these Galton-Pearson-McCartin geometric characterizations of linear regression. This result is known as the Galton-Pearson-McCartin Theorem and is reviewed below (Theorem 1).

Figure 2: Coordinate and Orthogonal Regression

The primary purpose of the present paper is to utilize the Galton-Pearson-McCartin Theorem to provide a unified approach to the Euclidean construction of these various lines of regression. As such, the concentration ellipse will serve as the adhesive which binds together these varied geometric constructions. However, before this can be accomplished, the basics of oblique least squares, linear regression and the concentration ellipse must be reviewed.

2 Oblique Least Squares

Consider the experimentally 'observed' data $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = X_i + u_i$ and $y_i = Y_i + v_i$. Here $(X_i, Y_i)$ denote theoretically exact values with corresponding random errors $(u_i, v_i)$. It is assumed that $E(u_i) = 0 = E(v_i)$, that successive observations are independent, and that $Var(u_i) = \sigma_u^2$, $Var(v_i) = \sigma_v^2$, $Cov(u_i, v_i) = p_{uv}$, irrespective of $i$.

Next, define the statistics of the sample data corresponding to the above population statistics. The mean values are given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad (1)$$

the sample variances are given by

$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad (2)$$

the sample covariance is given by

$$p_{xy} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}), \qquad (3)$$

and (assuming that $\sigma_x \cdot \sigma_y \neq 0$) the sample correlation coefficient is given by

$$r_{xy} = \frac{p_{xy}}{\sigma_x \cdot \sigma_y}. \qquad (4)$$

Figure 3: Oblique Projection

Note that if $\sigma_x^2 = 0$ then the data lie along the vertical line $x = \bar{x}$, while if $\sigma_y^2 = 0$ then they lie along the horizontal line $y = \bar{y}$. Hence, we assume without loss of generality that $\sigma_x^2 \cdot \sigma_y^2 \neq 0$, so that $r_{xy}$ is always well defined. Furthermore, by the Cauchy-Buniakovsky-Schwarz inequality,

$$p_{xy}^2 \leq \sigma_x^2 \cdot \sigma_y^2 \quad (\text{i.e. } -1 \leq r_{xy} \leq 1), \qquad (5)$$

with equality if and only if $(y_i - \bar{y}) \propto (x_i - \bar{x})$, in which case the data lie on the line

$$y - \bar{y} = \frac{p_{xy}}{\sigma_x^2}(x - \bar{x}) = \frac{\sigma_y^2}{p_{xy}}(x - \bar{x}), \qquad (6)$$

since $p_{xy} \neq 0$ in this instance. Thus, we may also restrict attention to $-1 < r_{xy} < 1$.

Turning our attention to Figure 3, the oblique linear least squares approximation $L$ corresponding to the specified slope $s$ may be expressed as

$$L: \; y - \bar{y} = m(x - \bar{x}), \qquad (7)$$

since it is straightforward to show that it passes through the centroid of the data, $(\bar{x}, \bar{y})$ [10]. $L$ is to be selected so as to minimize the total square oblique deviation

$$S := \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left[(x_i - x_*)^2 + (y_i - y_*)^2\right], \qquad (8)$$

where $(x_*, y_*)$ denotes the oblique projection of $(x_i, y_i)$ onto $L$. Furthermore, this optimization problem may be reduced [10] to choosing $m$ to minimize

$$S(m) = \frac{1+s^2}{(m-s)^2} \cdot \sum_{i=1}^{n} \left[m(x_i - \bar{x}) - (y_i - \bar{y})\right]^2, \qquad (9)$$

thereby producing the extremizing slope [10]

$$m = \frac{\sigma_y^2 - s \cdot p_{xy}}{p_{xy} - s \cdot \sigma_x^2}. \qquad (10)$$

Equations (7) and (10) together comprise a complete solution to the problem of oblique linear least squares approximation.
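The following sketch (an illustrative check, not part of the original paper; the synthetic data and the choice $s = -2$ are assumptions) computes the sample statistics (1)-(3), evaluates the extremizing slope of Equation (10), and verifies numerically that, for this data and this projection direction, it does at least as well as nearby slopes under the reduced objective $S(m)$ of Equation (9).

```python
import numpy as np

# Synthetic data roughly following y = 0.6 x.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = 0.6 * x + 0.15 * rng.standard_normal(x.size)

# Sample statistics, Equations (1)-(3) (note the 1/n normalization).
xbar, ybar = x.mean(), y.mean()
sigma_x2 = np.mean((x - xbar) ** 2)
sigma_y2 = np.mean((y - ybar) ** 2)
p_xy = np.mean((x - xbar) * (y - ybar))

s = -2.0  # chosen oblique projection slope (any s with p_xy - s*sigma_x2 != 0)

# Extremizing slope, Equation (10).
m_star = (sigma_y2 - s * p_xy) / (p_xy - s * sigma_x2)

# Reduced objective, Equation (9).
def S(m):
    return (1 + s ** 2) / (m - s) ** 2 * np.sum((m * (x - xbar) - (y - ybar)) ** 2)

# S at the extremizing slope is no larger than at nearby slopes.
print(m_star)
print(S(m_star) <= min(S(m_star + 0.05), S(m_star - 0.05)))
```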
3 Linear Regression

Some of the implications of oblique linear least squares approximation for linear regression are next considered, beginning with the case of coordinate regression.

Setting $s = 0$ in Equation (10) produces regression of x-on-y:

$$m_x^+ := \frac{\sigma_y^2}{p_{xy}}, \qquad m_x^- := 0. \qquad (11)$$

Likewise, letting $s \to \infty$ in Equation (10) produces regression of y-on-x:

$$m_y^+ := \frac{p_{xy}}{\sigma_x^2}, \qquad m_y^- := \infty. \qquad (12)$$

Unsurprisingly, we see that the coordinate regression lines [7] are special cases of oblique regression.

The more substantial case of orthogonal regression is now considered. Setting $s = -\frac{1}{m}$ in Equation (10) produces

$$p_{xy} \cdot m^2 - (\sigma_y^2 - \sigma_x^2) \cdot m - p_{xy} = 0, \qquad (13)$$

whose roots provide both the best orthogonal regression line

$$m_\perp^+ = \frac{(\sigma_y^2 - \sigma_x^2) + \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4p_{xy}^2}}{2p_{xy}}, \qquad (14)$$

as well as the worst orthogonal regression line

$$m_\perp^- = \frac{(\sigma_y^2 - \sigma_x^2) - \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4p_{xy}^2}}{2p_{xy}}. \qquad (15)$$

Thus, orthogonal regression [13] is also a special case of oblique regression.

Turning now to the case of $\lambda$-regression, set $s = -\frac{\lambda}{m}$ in Equation (10):

$$p_{xy} \cdot m^2 - (\sigma_y^2 - \lambda\sigma_x^2) \cdot m - \lambda p_{xy} = 0, \qquad (16)$$

whose roots provide both the best $\lambda$-regression line

$$m_\lambda^+ = \frac{(\sigma_y^2 - \lambda\sigma_x^2) + \sqrt{(\sigma_y^2 - \lambda\sigma_x^2)^2 + 4\lambda p_{xy}^2}}{2p_{xy}}, \qquad (17)$$

as well as the worst $\lambda$-regression line

$$m_\lambda^- = \frac{(\sigma_y^2 - \lambda\sigma_x^2) - \sqrt{(\sigma_y^2 - \lambda\sigma_x^2)^2 + 4\lambda p_{xy}^2}}{2p_{xy}}. \qquad (18)$$

Thus, $\lambda$-regression [8] is also a special case of oblique regression.

Proceeding to the case of $(\lambda,\mu)$-regression, set $s = \frac{\mu m - \lambda}{m - \mu}$ in Equation (10):

$$(p_{xy} - \mu\sigma_x^2) \cdot m^2 - (\sigma_y^2 - \lambda\sigma_x^2) \cdot m - (\lambda p_{xy} - \mu\sigma_y^2) = 0, \qquad (19)$$

whose roots provide both the best $(\lambda,\mu)$-regression line

$$m_{\lambda,\mu}^+ = \frac{(\sigma_y^2 - \lambda\sigma_x^2) + \sqrt{(\sigma_y^2 - \lambda\sigma_x^2)^2 + 4(p_{xy} - \mu\sigma_x^2)(\lambda p_{xy} - \mu\sigma_y^2)}}{2(p_{xy} - \mu\sigma_x^2)}, \qquad (20)$$

as well as the worst $(\lambda,\mu)$-regression line

$$m_{\lambda,\mu}^- = \frac{(\sigma_y^2 - \lambda\sigma_x^2) - \sqrt{(\sigma_y^2 - \lambda\sigma_x^2)^2 + 4(p_{xy} - \mu\sigma_x^2)(\lambda p_{xy} - \mu\sigma_y^2)}}{2(p_{xy} - \mu\sigma_x^2)}. \qquad (21)$$

Thus, $(\lambda,\mu)$-regression [9] is also a special case of oblique regression.

4 Concentration Ellipse

As established in the previous section, all of the garden varieties of linear regression are subsumed under the umbrella of oblique linear least squares approximation. The generalization of the Galton-Pearson-McCartin geometric characterizations of linear regression to oblique linear least squares approximation is now considered.
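As a numerical illustration of this subsumption (a sketch added for this edition, not part of the original paper; the synthetic data and the values $\lambda = 2$, $\mu = 0.3$ are arbitrary assumptions), the following code evaluates the 'best' slopes (14), (17) and (20) in closed form and confirms that each reproduces itself when its defining projection slope $s$ is substituted back into Equation (10).

```python
import numpy as np

# Synthetic data roughly following y = 0.7 x.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 50)
y = 0.7 * x + 0.2 * rng.standard_normal(x.size)

xbar, ybar = x.mean(), y.mean()
sx2 = np.mean((x - xbar) ** 2)           # sigma_x^2
sy2 = np.mean((y - ybar) ** 2)           # sigma_y^2
pxy = np.mean((x - xbar) * (y - ybar))   # p_xy

def m_oblique(s):
    """Extremizing slope of Equation (10) for projection slope s."""
    return (sy2 - s * pxy) / (pxy - s * sx2)

lam, mu = 2.0, 0.3  # illustrative error-variance ratio and covariance ratio

# 'Best' slopes from the closed forms of Section 3.
m_orth = ((sy2 - sx2) + np.hypot(sy2 - sx2, 2 * pxy)) / (2 * pxy)                    # Eq. (14)
m_lam = ((sy2 - lam * sx2)
         + np.sqrt((sy2 - lam * sx2) ** 2 + 4 * lam * pxy ** 2)) / (2 * pxy)         # Eq. (17)
m_lm = ((sy2 - lam * sx2)
        + np.sqrt((sy2 - lam * sx2) ** 2
                  + 4 * (pxy - mu * sx2) * (lam * pxy - mu * sy2))) / (2 * (pxy - mu * sx2))  # Eq. (20)

# Self-consistency with Equation (10): each slope reproduces itself under its
# defining projection direction.  (y-on-x, Eq. (12), arises in the limit s -> infinity.)
print(np.isclose(m_oblique(0.0), sy2 / pxy))                          # x-on-y, Eq. (11)
print(np.isclose(m_orth, m_oblique(-1.0 / m_orth)))                   # orthogonal, s = -1/m
print(np.isclose(m_lam, m_oblique(-lam / m_lam)))                     # lambda-regression, s = -lambda/m
print(np.isclose(m_lm, m_oblique((mu * m_lm - lam) / (m_lm - mu))))   # (lambda, mu)-regression
```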