
ECE 3040 Lecture 18: Curve Fitting by Least-Squares-Error Regression © Prof. Mohamad Hassoun

This lecture covers the following topics:

 Introduction
 Linear least-squares-error (LSE) regression: The straight-line model
 Linearization of nonlinear models
 General linear LSE regression and the polynomial model
 Polynomial regression with Matlab: polyfit
 Non-linear LSE regression
 Numerical solution of the non-linear LSE optimization problem: Gradient search and Matlab’s fminsearch function
 Solution of differential equations based on LSE minimization
 Appendix: Explicit matrix formulation for the quadratic regression problem

Introduction

In the previous lecture, polynomial and cubic spline interpolation methods were introduced for estimating a value between a given set of precise points. The idea was to “fit” (interpolate) a function to the data points so as to pass perfectly through all of them. Many engineering and scientific observations, however, are made by conducting experiments in which physical quantities are measured and recorded as inexact (noisy) data points. In this case, the objective is to find the best-fit analytic curve (model) that approximates the underlying functional relationship present in the data set. Here, the best-fit curve is not required to pass through the data points, but it is required to capture the shape (general trend) of the data. This curve fitting problem is referred to as regression. The following sections present formulations for the regression problem and provide solutions.

The following figure compares two polynomials that attempt to fit the shown data points. The blue curve is the solution to the interpolation problem. The green curve is the solution (the one we seek) to the regression problem.

Linear Least-Squares-Error (LSE) Regression: The Straight-Line Model

The regression problem will first be illustrated for fitting the linear model (straight line), y(x) = a_1 x + a_0, to a set of n paired experimental observations: (x_1, y_1), (x_2, y_2), …, (x_n, y_n). The idea here is to position the straight line (i.e., to determine the regression coefficients a_0 and a_1) so that some error measure of the fit is minimized. A common error measure is the sum-of-the-squares (SSE) of the residual errors e_i = y_i − y(x_i),

E(a_0, a_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} [y_i - y(x_i)]^2 = \sum_{i=1}^{n} [y_i - (a_1 x_i + a_0)]^2

The residual error 푒푖 is the discrepancy between the measured value, 푦푖, and the approximate value 푦(푥푖) = 푎0 + 푎1푥푖, predicted by the straight-line regression model. The residual error for the 푖th data point is depicted in the following figure.

A solution can be obtained for the regression coefficients, {a_0, a_1}, that minimizes E(a_0, a_1). This criterion, E, which is called the least-squares-error (LSE) criterion, has a number of advantages, including that it yields a unique line for a given data set. Differentiating E(a_0, a_1) with respect to each of the unknown regression model coefficients and setting the results to zero leads to a system of two linear equations,

\frac{\partial}{\partial a_0} E(a_0, a_1) = 2 \sum_{i=1}^{n} (y_i - a_1 x_i - a_0)(-1) = 0

\frac{\partial}{\partial a_1} E(a_0, a_1) = 2 \sum_{i=1}^{n} (y_i - a_1 x_i - a_0)(-x_i) = 0

After expanding the sums, we obtain

-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0

-\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0

Now, realizing that \sum_{i=1}^{n} a_0 = n a_0, and that multiplicative factors that do not depend on the summation index i can be brought outside the summation (i.e., \sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i), we may rewrite the above equations as

n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i

\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i

These are called the normal equations. We can solve for 푎1 using Cramer’s rule and for 푎0 by substitution (Your turn: Perform the algebra) to arrive at the following LSE solution:

a_1^* = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

a_0^* = \frac{\sum_{i=1}^{n} y_i}{n} - a_1^* \frac{\sum_{i=1}^{n} x_i}{n}

Therefore, the regression model is the straight line y(x) = a_1^* x + a_0^*. The value E(a_0^*, a_1^*) represents the LSE value and will be referred to as E_{LSE}, expressed as

E_{LSE} = \sum_{i=1}^{n} (y_i - a_1^* x_i - a_0^*)^2

Any other straight-line will lead to an error 퐸(푎0, 푎1) > 퐸퐿푆퐸.

Let the value of the sum of the squares of the differences between the y_i values and their average value, \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, be

E_M = \sum_{i=1}^{n} (y_i - \bar{y})^2

Then, the (positive) difference E_M − E_{LSE} represents the improvement (the smaller E_{LSE}, the better) gained by describing the data with a straight line rather than with an average value (a straight line with zero slope and y-intercept equal to \bar{y}). The coefficient of determination, r^2, is defined as this improvement relative to E_M,

r^2 = \frac{E_M - E_{LSE}}{E_M} = 1 - \frac{E_{LSE}}{E_M}

For a perfect fit, where the regression line goes through all data points, E_{LSE} = 0 and r^2 = 1, signifying that the line explains 100% of the variability in the data. On the other hand, for E_M = E_{LSE} we obtain r^2 = 0, and the fit represents no improvement over a simple average. A value of r^2 between 0 and 1 represents the extent of the improvement. So, r^2 = 0.8 indicates that 80% of the original uncertainty has been explained by the linear model. Using the above expressions for E_{LSE}, E_M, a_0^* and a_1^*, one may derive the following formula for the correlation coefficient, r (your turn: perform the algebra),

r = \sqrt{\frac{E_M - E_{LSE}}{E_M}} = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n \sum x_i^2 - (\sum x_i)^2}\;\sqrt{n \sum y_i^2 - (\sum y_i)^2}}

where all sums are performed from 1 to n.

Example. Fit a straight line to the data provided in the following table. Find r^2.

x   1     2    3    4    5    6     7
y   2.5   7    38   55   61   122   110

Solution. The following Matlab script computes the linear regression coefficients, a_0^* and a_1^*, for a straight line employing the LSE solution.

x=[1 2 3 4 5 6 7];
y=[2.5 7 38 55 61 122 110];
n=length(x);
a1=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.^2)-(sum(x)).^2)
a0=sum(y)/n-a1*sum(x)/n

The solution is a_1^* = 20.5536 and a_0^* = −25.7143. The following plot displays the data and the regression model, y(x) = 20.5536x − 25.7143.

The following script computes the correlation coefficient, r.

x=[1 2 3 4 5 6 7];
y=[2.5 7 38 55 61 122 110];
n=length(x);
r=(n*sum(x.*y)-sum(x)*sum(y))/((sqrt(n*sum(x.^2)- ...
  (sum(x))^2))*(sqrt(n*sum(y.^2)-(sum(y))^2)))

The script returns 푟 = 0.9582 (so, 푟2 = 0.9181). These results indicate that about 92% of the variability in the data has been explained by the linear model.

A word of caution: Although the coefficient of determination provides a convenient measure of the quality of fit, you should be careful not to rely on it completely. It is possible to construct data sets that yield similar r^2 values even though the regression line is poorly positioned for some of them. A good practice is to visually inspect the plot of the data along with the regression curve. The following example illustrates these ideas.

Example. Anscombe's quartet comprises four datasets that have 푟2 ≅ 0.666, yet appear very different when graphed. Each dataset consists of eleven (푥푖, 푦푖) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. Notice that if we are to ignore the outlier point in the third data set, then the regression line would be perfect, with 푟2 = 1.

Your turn: Employ linear regression to generate the above plots and determine r^2 for each of Anscombe’s data sets.

Linearization of Nonlinear Models

The straight-line regression model is not always suitable for curve fitting. The choice of regression model is often guided by the plot of the available data, or can be guided by the knowledge of the physical behavior of the system that generated the data. In general, polynomial or other nonlinear models are more suitable. A technique (introduced later) is available to fit complicated nonlinear equations to data. However, some basic nonlinear functions can be readily transformed into linear functions in their regression coefficients (we will refer to such functions as transformable or linearizable). Here, we can take advantage of the LSE regression formulas, which we have just derived, to fit the transformed equations to the data.

One example of a linearizable nonlinear model is the exponential model, y(x) = αe^{βx}, where α and β are constants. This equation is very common in engineering (e.g., capacitor transient voltage) and science (e.g., population growth or radioactive decay). We can linearize it by simply taking its natural logarithm to yield ln(y) = ln(α) + βx. Thus, if we transform the y_i values in our data by taking their natural logarithms, and define a_0 = ln(α) and a_1 = β, we arrive at the equation of a straight line (of the form Y = a_0 + a_1 x). Then, we can readily use the formulas for the LSE solution, (a_0^*, a_1^*), derived earlier. The final step is to set α = e^{a_0^*} and β = a_1^*, which gives the regression solution

y(x) = αe^{βx} = (e^{a_0^*}) e^{a_1^* x}

A second common linearizable nonlinear regression model is the power model, y(x) = αx^β. We can linearize this equation by taking its natural logarithm to yield ln(y) = ln(α) + β ln(x), which is a linear model of the form Y = a_0 + a_1 X. In this case, we need to first transform the y_i and x_i values into ln(y_i) and ln(x_i), respectively, and then apply the LSE solution to the transformed data.

Other useful linearizable models include the logarithmic function, y(x) = α ln(x) + β, the reciprocal function, y(x) = 1/(αx + β), and the saturation-growth-rate function, y(x) = αx/(β + x). The following two tables list twelve linearizable nonlinear models along with their corresponding linearized forms and change-of-variable formulas, respectively.

Example. Fit the exponential model and the power model to the data in the following table. Compare the fit quality to that of the straight-line model.

x   1     2    3    4    5    6     7    8
y   2.5   7    38   55   61   122   83   143

Solution. Matlab script (linear.m) for the linear model, y = a_0 + a_1 x:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
n=length(x);
a1=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.^2)-(sum(x)).^2);
a0=sum(y)/n-a1*sum(x)/n;
r=(n*sum(x.*y)-sum(x)*sum(y))/((sqrt(n*sum(x.^2)-(sum(x))^2))*( ...
  sqrt(n*sum(y.^2)-(sum(y))^2)));
a1, a0, r^2

Result 1. Linear model solution: y = 19.3036x − 22.9286, r^2 = 0.8811.

Matlab script (exponential.m) for the exponential model, y = αe^{βx}:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
ye=log(y);
n=length(x);
a1=(n*sum(x.*ye)-sum(x)*sum(ye))/(n*sum(x.^2)-(sum(x)).^2);
a0=sum(ye)/n-a1*sum(x)/n;
r=(n*sum(x.*ye)-sum(x)*sum(ye))/((sqrt(n*sum(x.^2)-(sum(x))^2)) ...
  *(sqrt(n*sum(ye.^2)-(sum(ye))^2)));
alpha=exp(a0), beta=a1, r^2

Result 2. Exponential model solution: y = 3.4130e^{0.5273x}, r^2 = 0.8141.

Matlab script (power_eq.m) for the power model, y = αx^β:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
xe=log(x);
ye=log(y);
n=length(x);
a1=(n*sum(xe.*ye)-sum(xe)*sum(ye))/(n*sum(xe.^2)-(sum(xe)).^2);
a0=sum(ye)/n-a1*sum(xe)/n;
r=(n*sum(xe.*ye)-sum(xe)*sum(ye))/((sqrt(n*sum(xe.^2)- ...
  (sum(xe))^2))*(sqrt(n*sum(ye.^2)-(sum(ye))^2)));
alpha=exp(a0), beta=a1, r^2

Result 3. Power model solution: y = 2.6493x^{1.9812}, r^2 = 0.9477.

From the results for 푟2, the power model has the best fit. The following graph compares the three models. By visually inspecting the plot we see that, indeed, the power model (red; 푟2 = 0.9477) is a better fit compared to the linear model (blue; 푟2 = 0.8811) and to the exponential model (green; 푟2 = 0.8141). Also, note that the straight-line fits the data better than the exponential model.

Your turn: Repeat the above regression problem employing: (a) the logarithmic function, y = α ln(x) + β; (b) the reciprocal function, y = 1/(αx + β); (c) the saturation-growth-rate function, y(x) = αx/(β + x). Compare the results by plotting.

General Linear LSE Regression and the Polynomial Model

For some data sets, the underlying model cannot be captured accurately with straight-line, exponential, logarithmic or power models. A model with a higher degree of nonlinearity (i.e., with added flexibility) is required. There are a number of higher-order functions that can be used as regression models; one important choice is a polynomial. A general LSE formulation is presented next. It extends the earlier linear formulation to a wider class of nonlinear functions, including polynomials. (Note: when we say linear regression, we are referring to a model that is linear in its regression coefficients, a_i, not in x.) Consider the general function in z,

y = a_m z_m + a_{m-1} z_{m-1} + \cdots + a_1 z_1 + a_0     (1)

where each z_i represents a basis function in x. It can easily be shown that if the basis functions are chosen as z_i = x^i, then the above model is that of an m-degree polynomial,

y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0

There are many classes of functions that can be described by the above general function in Eqn. (1). Examples include:

y = a_0 + a_1 x,     y = a_0 + a_1 \cos(x) + a_2 \sin(2x),     and     y = a_0 + a_1 x + a_2 e^{-x^2}

One example of a function that can’t be represented by the above general function is the radial-basis-function (RBF)

y = a_0 + a_1 e^{a_2 (x - a_3)^2}

In other words, this latter function is not transformable into a linear regression model, as was the case (say) for the exponential function, y = αe^{βx}. Regression with such non-transformable functions is known as nonlinear regression and is considered later in this lecture. In the following formulation of the LSE regression problem we restrict the regression model to the polynomial

y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0

Given n data points {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, we want to determine the regression coefficients by solving the system of n equations in m + 1 unknowns:

a_m x_1^m + a_{m-1} x_1^{m-1} + \cdots + a_1 x_1 + a_0 = y_1
a_m x_2^m + a_{m-1} x_2^{m-1} + \cdots + a_1 x_2 + a_0 = y_2
\vdots
a_m x_n^m + a_{m-1} x_n^{m-1} + \cdots + a_1 x_n + a_0 = y_n

The above system can be written in matrix form as Za = y,

\begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}

where Z is an n×(m+1) rectangular matrix formulated from the x_i data values with z_{ij} = x_i^{m-j+1} (i = 1, 2, …, n; j = 1, 2, …, m+1), a is an (m+1)-element column vector of unknown regression coefficients, and y is an n-element column vector of the y_i data values. For regression problems, the above system is over-determined (n > m+1) and, therefore, there is generally no solution a that satisfies Za = y. So, we seek the LSE solution, a*: the solution which minimizes the sum-of-squared-error (SSE) criterion

E(a) = ||y - Za||^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} z_{ij} a_{m-j+1}\right)^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right)^2

where ||·|| denotes the vector norm. As we did earlier in deriving the straight-line regression coefficients a_0 and a_1, we set all partial derivatives \frac{\partial}{\partial a_i} E(a) to zero and solve the resulting system of (m+1) equations:

\frac{\partial}{\partial a_0} E(a_0, a_1, …, a_m) = -2 \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) x_i^0 = 0

\frac{\partial}{\partial a_1} E(a_0, a_1, …, a_m) = -2 \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) x_i^1 = 0

\frac{\partial}{\partial a_2} E(a_0, a_1, …, a_m) = -2 \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) x_i^2 = 0

\vdots

\frac{\partial}{\partial a_m} E(a_0, a_1, …, a_m) = -2 \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) x_i^m = 0

This system can be rearranged as (note: x_i^0 = 1)

\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = 0

\sum_{i=1}^{n}\left(x_i y_i - x_i \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = 0

\sum_{i=1}^{n}\left(x_i^2 y_i - x_i^2 \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = 0

\vdots

\sum_{i=1}^{n}\left(x_i^m y_i - x_i^m \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = 0

or,

\sum_{i=1}^{n}\left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} y_i

\sum_{i=1}^{n} x_i \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i y_i

\sum_{i=1}^{n} x_i^2 \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i^2 y_i

\vdots

\sum_{i=1}^{n} x_i^m \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i^m y_i

which, by setting z_{ij} = x_i^{m-j+1} (i = 1, 2, …, n; j = 1, 2, …, m+1), can then be expressed in matrix form as (Your turn: derive it)

Z^T(Za) = Z^T y     or     (Z^T Z)a = Z^T y

(Refer to the Appendix for an explicit representation of the above equation for the case of a quadratic regression polynomial.) The matrix Z^T Z is an (m+1)×(m+1) square matrix (recall that m is the degree of the polynomial model being used). Generally speaking, the inverse of Z^T Z does exist for the above regression formulation. Multiplying both sides of the equation by (Z^T Z)^{-1} leads to the LSE solution for the regression coefficient vector a,

Ia* = a* = [(Z^T Z)^{-1} Z^T]y

where I is the identity matrix. Matlab offers two ways of solving the above system of linear equations: (1) using the left-division operator, a = (Z'*Z)\(Z'*y), where ' is the Matlab transpose operator, or (2) using a = pinv(Z)*y, where pinv is the built-in Matlab pseudo-inverse function that computes the matrix (Z^T Z)^{-1} Z^T. The coefficient of determination, r^2, for the above polynomial regression formulation is given by (for n ≫ m)

r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

where \hat{y}_i is the ith component of the prediction vector Za*, and \bar{y} is the mean of the y_i values. Matlab can conveniently compute r^2 as,

1-sum((y-Z*a).^2)/sum((y-mean(y)).^2)
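The same recipe applies to any model that is linear in its coefficients, not just polynomials: each column of Z is one basis function evaluated at the data points. The following is a minimal sketch (not taken from the original notes); the data vectors xd and yd are hypothetical placeholders, and the basis is the model y = a_0 + a_1 x + a_2 e^{-x^2} mentioned above.

xd = (0:0.5:4)';                          % hypothetical data (placeholder values)
yd = [1.8 2.1 2.2 2.9 3.4 4.1 4.4 5.1 5.6]';
Z  = [ones(size(xd)), xd, exp(-xd.^2)];   % columns = basis functions 1, x, exp(-x^2)
a  = pinv(Z)*yd                           % LSE coefficients [a0; a1; a2]
r2 = 1 - sum((yd - Z*a).^2)/sum((yd - mean(yd)).^2)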

Example. Employ the polynomial LSE regression formulation to solve for a cubic curve fit for the following data set. Also, compute 푟2.

x   1     2    3    4    5    6     7    8
y   2.5   7    38   55   61   122   83   145

Solution. A cubic function is a third-order polynomial (m = 3) with the four coefficients a_0, a_1, a_2 and a_3. The number of data points is 8; therefore, n = 8 and the Z matrix is n×(m+1) = 8×4. The matrix formulation (Za = y) for this linear regression problem is

[   1    1   1   1 ]             [   2.5 ]
[   8    4   2   1 ]   [ a_3 ]   [   7   ]
[  27    9   3   1 ]   [ a_2 ]   [  38   ]
[  64   16   4   1 ]   [ a_1 ] = [  55   ]
[ 125   25   5   1 ]   [ a_0 ]   [  61   ]
[ 216   36   6   1 ]             [ 122   ]
[ 343   49   7   1 ]             [  83   ]
[ 512   64   8   1 ]             [ 145   ]

Solving using the pinv function, a = pinv(Z)*y (note that y must be entered as a column vector):

Alternatively, we may use the left-division operator and obtain the same result:
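Since the original Matlab session output is not reproduced here, the following sketch shows commands along those lines for this example, covering both the pseudo-inverse and the left-division solutions (assuming the data are entered as column vectors):

x = [1 2 3 4 5 6 7 8]';
y = [2.5 7 38 55 61 122 83 145]';
Z = [x.^3 x.^2 x ones(size(x))];   % 8x4 matrix of the cubic model
a = pinv(Z)*y                      % pseudo-inverse solution: a = [a3; a2; a1; a0]
a = (Z'*Z)\(Z'*y)                  % left-division solution (same result)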

Therefore, the cubic fit solution is y = 0.029x^3 − 0.02x^2 + 17.6176x − 19.2857, whose plot is shown below. Note that the contribution of the cubic and quadratic terms is very small compared to the linear part of the solution for 0 ≤ x ≤ 8. That is why the plot of the cubic fit model is close to linear.

The coefficient of determination 푟2 is computed as,

which indicates that the cubic regression model explains 88% of the variability in the data. This result has a similar quality to that of the straight-line regression model (computed in an earlier example).

The following is a snapshot of a session with the “Basic Fitting tool” (introduced in the previous lecture) applied to data in the above example. It computes and compares a cubic fit to a 5th-degree and a 6th-degree polynomial fit.

Your turn. Employ the linear LSE regression formulation to fit the following data set employing the model y(x) = a_0 + a_1 \cos(x) + a_2 \sin(2x). Also, determine r^2 and plot the data along with y(x). Hint: First determine the 10×3 matrix Z required for the Za = y formulation.

x   1      2      3       4       5      6      7      8      9       10
y   1.17   0.93   −0.71   −1.31   2.01   3.42   1.53   1.02   −0.08   −1.51

Polynomial Regression with Matlab: polyfit

The Matlab polyfit function was introduced in the previous lecture for solving polynomial interpolation problems (m + 1 = n, same number of equations as unknowns). This function can also be used for solving the m-degree polynomial regression problem given n data points (m + 1 < n, more equations than unknowns). The syntax of the polyfit call is p=polyfit(x,y,m), where x and y are the vectors of the independent and dependent variables, respectively, and m is the degree of the regression polynomial. The function returns a row vector, p, that contains the polynomial coefficients.

Example. Here is a solution to the straight-line regression problem (first example encountered in this lecture):
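The session itself is not shown above; a call along the following lines (a sketch) reproduces the straight-line fit:

x = [1 2 3 4 5 6 7];
y = [2.5 7 38 55 61 122 110];
p = polyfit(x, y, 1)     % returns [a1 a0], i.e., slope and intercept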

Example. Use polyfit to solve for the cubic regression model encountered in the example from the previous section. Solution:
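Again, the session output is not reproduced here; a sketch of the commands is

x = [1 2 3 4 5 6 7 8];
y = [2.5 7 38 55 61 122 83 145];
p = polyfit(x, y, 3)       % returns [a3 a2 a1 a0]
yhat = polyval(p, x);      % model predictions at the data points (for plotting or r^2)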

Note that this solution is identical to the one obtained using the pseudo-inverse-based solution.

Non-Linear LSE Regression

In some engineering applications, nonlinear models are required to fit a given data set. The above general linear regression formulation can handle such regression problems as long as the nonlinear model is transformable into an equivalent linear function of the unknown coefficients. However, in some cases the model is not transformable. We then have to come up with an appropriate set of equations whose solution leads to the LSE solution.

As an example, consider the nonlinear model y(x) = a_0(1 − e^{a_1 x}). This equation can’t be manipulated into a linear regression formulation in the a_0 and a_1 coefficients. The LSE formulation (for this model with n data points) takes the form

E(a_0, a_1) = \sum_{i=1}^{n} (y_i - y(x_i))^2 = \sum_{i=1}^{n} \left(y_i - a_0(1 - e^{a_1 x_i})\right)^2

\frac{\partial}{\partial a_0} E(a_0, a_1) = 2 \sum_{i=1}^{n} \left(y_i - a_0(1 - e^{a_1 x_i})\right)(-1 + e^{a_1 x_i}) = 0

\frac{\partial}{\partial a_1} E(a_0, a_1) = 2 \sum_{i=1}^{n} \left(y_i - a_0(1 - e^{a_1 x_i})\right)(a_0 x_i e^{a_1 x_i}) = 0

This set of two nonlinear equations needs to be solved for the two coefficients, a_0 and a_1. Numerical algorithms such as Newton’s iterative method for solving a set of two nonlinear equations, or Matlab’s built-in fsolve and solve functions, can be used to solve this system of equations, as shown in the next example.

Example. Employ the regression function y(x) = a_0(1 − e^{a_1 x}) to fit the following data.

x   −2   0    2    4
y    1   0   −4   −12

Here, we have n = 4, and the system of nonlinear equations to be solved is given by

\sum_{i=1}^{4} \left(y_i - a_0(1 - e^{a_1 x_i})\right)(-1 + e^{a_1 x_i}) = 0

\sum_{i=1}^{4} \left(y_i - a_0(1 - e^{a_1 x_i})\right)(a_0 x_i e^{a_1 x_i}) = 0

Substituting the data point values in the above equations leads to

(1 - a_0(1 - e^{-2a_1}))(-1 + e^{-2a_1}) + (-4 - a_0(1 - e^{2a_1}))(-1 + e^{2a_1}) + (-12 - a_0(1 - e^{4a_1}))(-1 + e^{4a_1}) = 0

(1 - a_0(1 - e^{-2a_1}))(-2a_0 e^{-2a_1}) + (-4 - a_0(1 - e^{2a_1}))(2a_0 e^{2a_1}) + (-12 - a_0(1 - e^{4a_1}))(4a_0 e^{4a_1}) = 0

After expansion and combining terms, we get

15 + (e^{-2a_1} - 4e^{2a_1} - 12e^{4a_1}) + a_0(3 + e^{-4a_1} - 2e^{-2a_1} - 2e^{2a_1} - e^{4a_1} + e^{8a_1}) = 0

2a_0[-a_0 e^{-4a_1} + (a_0 - 1)e^{-2a_1} - (a_0 + 4)e^{2a_1} - (a_0 + 24)e^{4a_1} + 2a_0 e^{8a_1}] = 0

Matlab solution using the function solve:

syms a0 a1
f1=15+(exp(-2*a1)-4*exp(2*a1)-12*exp(4*a1))+a0*(3+exp(-4*a1)...
   -2*exp(-2*a1)-2*exp(2*a1)-exp(4*a1)+exp(8*a1));
f2=2*a0*(-a0*exp(-4*a1)+(a0-1)*exp(-2*a1)-(a0+4)*exp(2*a1)...
   -(a0+24)*exp(4*a1)+2*a0*exp(8*a1));
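The solve call that produced the solutions listed below is not shown in the script above; a call along these lines (a sketch; older Matlab releases also accept solve(f1,f2)) completes it:

S = solve([f1; f2], [a0; a1]);   % solve f1 = 0 and f2 = 0 for a0 and a1
vpa([S.a0, S.a1], 5)             % display the candidate solutions numerically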

Matlab returns a set of four solutions to the above minimization problem. The first thing we notice is that for nonlinear regression, minimizing the LSE criterion may lead to multiple solutions (multiple minima). The solutions for this particular problem are:

1. a_0 = a_1 = 0, which leads to: y = 0(1 − e^0) = 0, or y = 0 (the x-axis).
2. a_0 = 0 and a_1 ≅ −1.3610 + 1.5708i, which leads to: y = 0.
3. a_0 = 0 and a_1 ≅ 0.1186 + 1.5708i, which leads to: y = 0.
4. a_0 ≅ 2.4979 and a_1 ≅ 0.4410, which leads to: y(x) = 2.4979(1 − e^{0.441x}).

The solutions 푦(푥) = 0 and 푦(푥) = 2.4979(1 − 푒0.441푥) are plotted below. It is obvious that the optimal solution (in the LSE sense) is 푦(푥) = 2.4979(1 − 푒0.441푥).

Your turn: Solve the above system of two nonlinear equations employing Matlab’s fsolve.

Your turn: Fit the exponential model, y = αe^{βx}, to the data in the following table employing nonlinear regression. Then, linearize the model and determine the model coefficients by employing regression (i.e., use the formulas derived in the first section or polyfit). Plot the solutions.

x   0     1     2     3     4
y   1.5   2.5   3.5   5.0   7.5

Ans. Nonlinear least-squares fit: y = 1.61087e^{0.38358x}. Fit by linearization: y = 1.57991e^{0.39120x}.

Numerical Solution of the Non-Linear LSE Optimization Problem: Gradient Search and Matlab’s fminsearch Function

In the above example, we were lucky in the sense that the (symbolic-based) solve function returned the optimal solution for the optimization problem at hand. In more general non-linear LSE regression problems the models employed are complex and normally have more than two unknown coefficients. Here, solving (symbolically) for the partial derivatives of the error function becomes tedious and impractical. Therefore, one would use numerically-based multi-variable optimization algorithms to minimize 퐸(풂) = 퐸(푎0, 푎1, 푎2, … ), which are extensions of the ones considered in Lectures 13 and 14.

One method would be to extend the gradient-search minimization function grad_optm2 to handle a function of two variables. Recall that this version of the function approximates the gradient numerically, so there is no need to determine analytical expressions for the derivatives. For the case of two variables a_0 and a_1, the gradient-descent equations are

a_0(k+1) = a_0(k) + r \frac{\partial}{\partial a_0} E(a_0, a_1)

a_1(k+1) = a_1(k) + r \frac{\partial}{\partial a_1} E(a_0, a_1)

where −1 < r < 0. Upon using the simple backward finite-difference approximation for the derivatives, we obtain

a_0(k+1) = a_0(k) + r \frac{E[a_0(k), a_1(k)] - E[a_0(k-1), a_1(k)]}{a_0(k) - a_0(k-1)}

a_1(k+1) = a_1(k) + r \frac{E[a_0(k), a_1(k)] - E[a_0(k), a_1(k-1)]}{a_1(k) - a_1(k-1)}

The following is a Matlab implementation (function grad_optm2d) of these iterative formulas.
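(The original listing is not reproduced here. The sketch below shows one way such a function might be written; the argument names, the stopping test, and the small initial offsets are assumptions, not the author’s exact code.)

function [a0, a1, Emin] = grad_optm2d(E, a0, a1, r, tol, maxit)
% Gradient search on E(a0,a1) using backward finite-difference gradient
% estimates. E is a function handle, r is the step factor (-1 < r < 0),
% tol is the stopping tolerance, and maxit the maximum iteration count.
a0_old = a0 + 0.01;  a1_old = a1 + 0.01;   % offsets so the first differences are defined
for k = 1:maxit
    dEda0 = (E(a0, a1) - E(a0_old, a1)) / (a0 - a0_old);  % finite-difference derivative
    dEda1 = (E(a0, a1) - E(a0, a1_old)) / (a1 - a1_old);
    a0_new = a0 + r*dEda0;                                % gradient-descent updates
    a1_new = a1 + r*dEda1;
    if max(abs([a0_new - a0, a1_new - a1])) < tol
        a0 = a0_new;  a1 = a1_new;  break
    end
    a0_old = a0;  a1_old = a1;
    a0 = a0_new;  a1 = a1_new;
end
Emin = E(a0, a1);
end

% Example use for the model y = a0*(1 - exp(a1*x)) and the data above:
% x = [-2 0 2 4];  y = [1 0 -4 -12];
% E = @(a0,a1) sum((y - a0*(1 - exp(a1*x))).^2);
% [a0s, a1s, Emin] = grad_optm2d(E, 2, 0.5, -1e-4, 1e-8, 1e5)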

The above function [with a_0(0) = 2, a_1(0) = 0.5 and r = −10^{-4}] returns the same solution that was obtained above with solve [with a minimum error value of E(2.4976, 0.4410) = 0.4364]:

Matlab has an important built-in function for numerical minimization of nonlinear multivariable functions. The function name is fminsearch. The (basic) function call syntax is [a,fa] = fminsearch(f,a0), where f is an anonymous function, a0 is a vector of initial values. The function returns a solution vector ‘a’ and the value of the function at that solution, fa. Here is an application of fminsearch to solve the above non-linear regression problem [note how the unknown coefficients are represented as the elements of the vector, 푎0 = a(1), 푎1 = a(2), and are initialized at [0 0] for this problem].
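The session itself is not shown above; a sketch of the call for this problem is

x = [-2 0 2 4];  y = [1 0 -4 -12];
E = @(a) sum((y - a(1)*(1 - exp(a(2)*x))).^2);   % SSE; a(1) = a0, a(2) = a1
[a, Emin] = fminsearch(E, [0 0])   % per the discussion above, converges to approximately [2.4979 0.4410]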

A more proper way to select the initial search vector a = [푎0 푎1] for the above optimization problem is to solve a set of 푘 nonlinear equations that is obtained from forcing the model to go through 푘 points (selected randomly from the data set). Here, 푘 is the number of unknown model parameters. For example, for the above problem, we solve the set of two nonlinear equations

y_i - a_0(1 - e^{a_1 x_i}) = 0

y_j - a_0(1 - e^{a_1 x_j}) = 0

where (x_i, y_i) and (x_j, y_j) are two distinct points selected randomly from the set of points being fitted. A numerical nonlinear equation solver can be used, say Matlab’s fsolve, as shown below [here, the end points (−2, 1) and (4, −12) were selected].
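The fsolve session is not reproduced above; a sketch of that computation (the starting guess [1 1] is an assumption) is

g = @(a) [  1 - a(1)*(1 - exp(-2*a(2)));          % force the model through (-2, 1)
          -12 - a(1)*(1 - exp( 4*a(2)))];         % and through (4, -12)
a_init = fsolve(g, [1 1])    % use a_init as the starting vector for fminsearch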

Your turn: The height of a person at different ages is reported in the following table.

x (age)   0    5      8    12   16     18
y (in)    20   36.2   52   60   69.2   70

Determine the parameters a, b and c so that the following regression model is optimal in the LSE sense.

y(x) = \frac{a}{1 + b e^{-cx}}

Ans. y(x) = \frac{74.321}{1 + 2.823 e^{-0.217x}}

Your turn: Employ nonlinear LSE regression to fit the function

y(x) = \frac{K}{\sqrt{x^4 + (a^2 - 2b)x^2 + b^2}}

to the data

x   0      0.5     1      2       3
y   0.95   1.139   0.94   0.298   0.087

Plot the data points and your solution for 푥 ∈ [0 6].

Ans. y(x) = \frac{0.888}{\sqrt{x^4 - 0.956x^2 + 0.819}}

As mentioned earlier, different initial conditions may lead to different local minima of the nonlinear function being minimized. For example, consider the function of two variables that exhibits multiple minima (refer to the plot): 푓(푥, 푦) = −0.02 sin(푥 + 4푦) − 0.2 cos(2푥 + 3푦) − 0.3 sin(2푥 − 푦) + 0.4cos (푥 − 2푦)

The following script generates a plot of 푓(푥, 푦), for −1 ≤ 푥 ≤ 3 and −4 ≤ 푦 ≤ 4.
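The script is not reproduced above; a sketch along those lines is

f = @(x,y) -0.02*sin(x + 4*y) - 0.2*cos(2*x + 3*y) ...
           - 0.3*sin(2*x - y) + 0.4*cos(x - 2*y);
[X, Y] = meshgrid(-1:0.05:3, -4:0.05:4);          % grid over the stated ranges
surf(X, Y, f(X, Y)), shading interp
xlabel('x'), ylabel('y'), zlabel('f(x,y)')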

A contour plot can be generated as follows (the local minima are located at the center of the blue contour lines):
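Again, the original commands are not shown; a self-contained sketch is

f = @(x,y) -0.02*sin(x + 4*y) - 0.2*cos(2*x + 3*y) ...
           - 0.3*sin(2*x - y) + 0.4*cos(x - 2*y);
[X, Y] = meshgrid(-1:0.05:3, -4:0.05:4);
contour(X, Y, f(X, Y), 30), xlabel('x'), ylabel('y')   % minima sit inside the closed contours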

The following are the local minima discovered by function grad_optm2d for the indicated initial conditions:

The same local minima are discovered by fminsearch when starting from the same initial conditions:

Note that for this limited set of searches, the solution with the smallest value of the objective function is (x*, y*) = (0.0441, −1.7618), which is the best of the local minima found.

Your turn (Email your solution to your instructor one day before Test 3). Fit the following data

x   1      2      3       4       5      6      7      8      9       10
y   1.17   0.93   −0.71   −1.31   2.01   3.42   1.53   1.02   −0.08   −1.51

employing the model

푦(푥) = 푎0 + 푎1 cos(푥 + 푏1) + 푎2cos (2푥 + 푏2) This problem can be solved employing nonlinear regression (think solution via fminsearch), or it can be linearized which allows you to use linear regression (think solution via pseudo-inverse). Hint: cos(푥 + 푏) = cos(푥) cos(푏) − sin(푥) sin(푏). Plot the data points and 푦(푥) on the same graph.

In practice, a nonlinear regression model can have hundreds or thousands of coefficients. Examples of such models are neural networks and radial-basis-function models that often involve fitting multidimensional data sets, where each y value depends on many variables, y(x_1, x_2, x_3, …). Numerical methods such as gradient-based optimization methods are often used to solve for the regression coefficients associated with those high-dimensional models. For a reference, check Chapters 5 and 6 in the following textbook: Fundamentals of Artificial Neural Networks, Mohamad Hassoun (MIT Press, 1995).

Solution of Differential Equations Based on LSE Minimization

Consider the second-order time-varying coefficient differential equation

\ddot{y}(x) + \frac{1}{5}\dot{y}(x) + 9x^2 y(x) = 0     with y(0) = 1 and \dot{y}(0) = 2

defined over the interval x ∈ [0, 1].

We seek a polynomial \tilde{y}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 that approximates the solution, y(x). In general, \tilde{y}(x) does not have to be a polynomial. By applying the initial conditions to \tilde{y}(x) we can solve for a_0 and a_1:

\tilde{y}(0) = a_0 = y(0) = 1     and     \frac{d\tilde{y}(0)}{dx} = a_1 = \dot{y}(0) = 2

Now, we are left with the problem of estimating the remaining polynomial coefficients 푎2, 푎3, 푎4 such that the residual

f(x, a_2, a_3, a_4) = \frac{d^2\tilde{y}(x)}{dx^2} + \frac{1}{5}\frac{d\tilde{y}(x)}{dx} + 9x^2\tilde{y}(x)

is as close to zero as possible for all x ∈ [0, 1]. We will choose to minimize the integral of the squared residual,

I(a_2, a_3, a_4) = \int_0^1 [f(x, a_2, a_3, a_4)]^2 \, dx

First, we compute the derivatives \frac{d\tilde{y}(x)}{dx} and \frac{d^2\tilde{y}(x)}{dx^2},

\frac{d\tilde{y}(x)}{dx} = 2 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3

\frac{d^2\tilde{y}(x)}{dx^2} = 2a_2 + 6a_3 x + 12a_4 x^2

which lead to (recall that a_0 = 1 and a_1 = 2)

f(x, a_2, a_3, a_4) = \frac{d^2\tilde{y}(x)}{dx^2} + \frac{1}{5}\frac{d\tilde{y}(x)}{dx} + 9x^2\tilde{y}(x)
= 2a_2 + 6a_3 x + 12a_4 x^2 + \frac{1}{5}(2 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3) + 9x^2(1 + 2x + a_2 x^2 + a_3 x^3 + a_4 x^4)

or,

f(x, a_2, a_3, a_4) = \frac{2}{5} + 9x^2 + 18x^3 + a_2\left(2 + \frac{2}{5}x + 9x^4\right) + a_3\left(6x + \frac{3}{5}x^2 + 9x^5\right) + a_4\left(12x^2 + \frac{4}{5}x^3 + 9x^6\right)

The following Matlab session shows the results of using fminsearch to solve for the coefficients 푎2, 푎3, 푎4 that minimize the error function

I(a_2, a_3, a_4) = \int_0^1 [f(x, a_2, a_3, a_4)]^2 \, dx
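The session itself is not reproduced here; a sketch of the computation (following the note below, the coefficients a_2, a_3, a_4 are stored in a(2), a(3), a(4) and a(1) is unused) is

f = @(x,a) 2/5 + 9*x.^2 + 18*x.^3 + a(2)*(2 + 2*x/5 + 9*x.^4) ...
         + a(3)*(6*x + 3*x.^2/5 + 9*x.^5) + a(4)*(12*x.^2 + 4*x.^3/5 + 9*x.^6);
I = @(a) integral(@(x) f(x,a).^2, 0, 1);   % integral of the squared residual
[a, Imin] = fminsearch(I, [0 0 0 0])       % a(2), a(3), a(4) are the coefficients sought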

Note: the first component a(1) in the solution vector ’a’ is redundant; it is not used in function 퐼.

Therefore, the optimal solution is

\tilde{y}(x) = 1 + 2x + 0.1347x^2 − 0.8170x^3 − 0.8345x^4

The following plot compares the “direct” numerical solution (red trace) to the minimum-residual solution (blue trace). We will study the very important topic of numerical solution of differential equations in Lecture 22 (e.g., employing ode45).

Your turn: Consider the first-order, nonlinear, homogeneous differential equation with varying coefficient \dot{y}(x) + (2x − 1)y^2(x) = 0, with y(0) = 1 and x ∈ [0, 1]. Employ the method of minimizing the squared residual to solve for the approximate solution \tilde{y}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 over the interval x ∈ [0, 1]. Plot \tilde{y}(x) and the exact solution y(x) given by

y(x) = \frac{1}{x^2 - x + 1}

Ans. \tilde{y}(x) = 1 + 0.964x + 0.487x^2 − 2.903x^3 + 1.452x^4

Your turn: Determine the parabola y(x) = ax^2 + bx + c that approximates the cubic g(x) = 2x^3 − x^2 + x + 1 (over the interval x ∈ [0, 2]) in the LSE sense. In other words, determine the coefficients a, b and c such that the following error function is minimized,

E(a, b, c) = \int_0^2 [g(x) - y(x)]^2 \, dx

Solve the problem in two ways: (1) analytically; and (2) employing fminsearch after evaluating the integral. Plot g(x) and y(x) on the same set of axes.

Appendix: Explicit Matrix Formulation for the Quadratic Regression Problem

Earlier in this lecture we derived the m-degree polynomial LSE regression formulation as follows,

\sum_{i=1}^{n}\left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} y_i

\sum_{i=1}^{n} x_i \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i y_i

\sum_{i=1}^{n} x_i^2 \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i^2 y_i

\vdots

\sum_{i=1}^{n} x_i^m \left(\sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1}\right) = \sum_{i=1}^{n} x_i^m y_i

Now, setting 푚 = 2 (designating a quadratic regression model) leads to three equations:

\sum_{i=1}^{n}\left(\sum_{j=1}^{3} x_i^{3-j} a_{3-j}\right) = \sum_{i=1}^{n} y_i

\sum_{i=1}^{n} x_i \left(\sum_{j=1}^{3} x_i^{3-j} a_{3-j}\right) = \sum_{i=1}^{n} x_i y_i

\sum_{i=1}^{n} x_i^2 \left(\sum_{j=1}^{3} x_i^{3-j} a_{3-j}\right) = \sum_{i=1}^{n} x_i^2 y_i

It can be shown (Your turn) that the above equations can be cast in matrix form as

\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} x_i^2 y_i \end{bmatrix}

This is a 3×3 linear system that can be solved using the methods of Lectures 15 & 16. With this type of formulation, care must be taken to employ numerical solution methods that can handle ill-conditioned coefficient matrices; note the dominance of the (all positive) coefficients in the last row of the matrix. The following two-part video (part 1, part 2) derives the above result (directly) from basic principles. Here is an example of quadratic regression: Part 1, Part 2.

Example. Employ the above formulation to fit a parabola to the following data.

x   0    5      8    12   16     18
y   20   36.2   52   60   69.2   70

Matlab solution:

The code that generated the above result is shown below.
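(The listing is not reproduced here; a sketch of code along those lines, building the 3×3 normal-equation system directly from the data and solving it with the left-division operator, is shown below.)

x = [0 5 8 12 16 18];   y = [20 36.2 52 60 69.2 70];
n = length(x);
A = [n          sum(x)      sum(x.^2);
     sum(x)     sum(x.^2)   sum(x.^3);
     sum(x.^2)  sum(x.^3)   sum(x.^4)];
b = [sum(y); sum(x.*y); sum(x.^2.*y)];
a = A\b        % a(1) = a0, a(2) = a1, a(3) = a2 of y = a0 + a1*x + a2*x^2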

Your turn: Verify the above solution employing polyfit. Repeat employing the pseudo-inverse solution a = pinv(Z)*y applied to the formulation

Za = \begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}

Your turn: Extend the explicit matrix formulation of this appendix to a cubic function f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 and use it to determine the polynomial coefficients for the data set from the last example. Compare (by plotting) your solution to the following solution that was obtained using nonlinear regression,

y(x) = \frac{74.321}{1 + 2.823 e^{-0.217x}}

Verify your solution employing polyfit.

Ans. Your solution should look like the one depicted below.

Solution Strategies for LSE Regression Problem