2/18/2020
Computational Science: Computational Methods in Engineering
Introduction to Curve Fitting & Linear Regression
Outline
• Introduction
• Statistics of Data Sets
• Best Fit Methods
• Linear Regression (ugly math)
• Linear Least Squares (clean math)
Introduction
What is Curve Fitting?
Curve fitting is simply fitting an analytical equation to a set of measured data.
[figure: measured data samples and the best-fit curve]

$$f(x) = A + B\,e^{-\left(x - C\right)^2/D}$$
“Curve fitting” determines the values of A, B, C, and D so that f(x) best represents the given data.
Why Fit Data to a Curve?
• Estimate data between discrete values (interpolation).
• Find a maximum or minimum.
• Derive finite-difference approximations.
• Fit measured data to an analytical equation to extract meaningful parameters.
• Remove noise from a function.
• Observe and quantify general trends.
Two Categories of Curve Fitting
Best Fit – The measured data has noise, so the curve does not attempt to intercept every point.
• Linear regression (ugly math)
• Linear least-squares (clean math)
• Nonlinear regression (moderate math)

Exact Fit – The data samples are assumed to be exact, and the curve is forced to pass through each one.
• Fitting to polynomials
Statistics of Data Sets
Arithmetic Mean
If there were a single number that best represented an entire set of data, it would be the arithmetic mean.
$$f_{\text{avg}} = \frac{f_1 + f_2 + \cdots + f_M}{M} = \frac{1}{M}\sum_{m=1}^{M} f_m$$
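As a minimal sketch in plain Python (the function name `arithmetic_mean` is just illustrative):

```python
def arithmetic_mean(f):
    """Arithmetic mean: the sum of the M samples divided by M."""
    return sum(f) / len(f)

print(arithmetic_mean([2.0, 4.0, 6.0, 8.0]))  # 5.0
```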
Geometric Mean
The geometric mean is defined as
$$f_{\text{gm}} = \sqrt[M]{f_1 f_2 \cdots f_M}$$
The arithmetic mean tends to suppress the significance of outlying data samples. With the geometric mean, even a single small value among many large values can dominate the mean.
This is useful in optimizations where multiple parameters must be maximized at the same time and it is not acceptable to have any one of them low.
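A short sketch (Python 3.8+, using `math.prod`) illustrating how a single small value dominates the geometric mean while barely moving the arithmetic mean:

```python
import math

def geometric_mean(f):
    """Geometric mean: the Mth root of the product of the M samples."""
    return math.prod(f) ** (1.0 / len(f))

# One small value among large ones pulls the geometric mean down hard,
# while the arithmetic mean barely notices it.
data = [100.0, 100.0, 100.0, 0.01]
print(geometric_mean(data))   # ~10.0
print(sum(data) / len(data))  # 75.0025
```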
Variance & Standard Deviation
Standard Deviation sf The standard deviation is a measure of the “spread” of the data about the mean. It is convenient because it shares the same units as the data.
$$s_f = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(f_m - f_{\text{avg}}\right)^2}$$
Variance vf Variance is used more commonly in calculations, but carries the same information as the standard deviation.
$$v_f = s_f^2 = \frac{1}{M}\sum_{m=1}^{M}\left(f_m - f_{\text{avg}}\right)^2$$
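These two definitions (population forms, dividing by M) can be sketched directly; the function names are illustrative:

```python
import math

def variance(f):
    """Population variance: v_f = (1/M) * sum((f_m - f_avg)^2)."""
    f_avg = sum(f) / len(f)
    return sum((fm - f_avg) ** 2 for fm in f) / len(f)

def std_dev(f):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(f))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0
```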
Coefficient of Variation
The coefficient of variation (CV) is the standard deviation normalized to the mean.
Think of it as “relative standard deviation.”
$$\text{CV} = \frac{s_f}{f_{\text{avg}}}$$
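A one-function sketch (name illustrative) combining the definitions above:

```python
def coefficient_of_variation(f):
    """CV: the standard deviation normalized to the arithmetic mean."""
    f_avg = sum(f) / len(f)
    s_f = (sum((fm - f_avg) ** 2 for fm in f) / len(f)) ** 0.5
    return s_f / f_avg

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(data))  # 0.4  (s_f = 2, f_avg = 5)
```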
Linear Regression (Best Fit, Ugly Math)
Goal of Linear Regression
The goal of linear regression is to fit a straight line to a set of measured data that has noise.
$$\left(x_1, y_1\right),\ \left(x_2, y_2\right),\ \ldots,\ \left(x_M, y_M\right)$$

$$y = a_0 + a_1 x$$
Statement of Problem
Given a set of measured data points (x₁, y₁), (x₂, y₂), …, (x_M, y_M), the equation of the line is written for each point. To be completely correct, an error term e_m, called the residual, is included in each equation.

$$\begin{aligned} y_1 &= a_0 + a_1 x_1 + e_1 \\ y_2 &= a_0 + a_1 x_2 + e_2 \\ &\;\;\vdots \\ y_M &= a_0 + a_1 x_M + e_M \end{aligned}$$
It is desired to determine values of a0 and a1 such that the residual terms em are as small as possible.
Criteria for “Best Fit”
A single quantity is needed that measures how “good” the line fits the set of data.
Guess #1 – Sum of residuals
$$E = \sum_{m=1}^{M} e_m$$
This does not work because negative and positive residuals can cancel and mislead the criterion into indicating there is no error.

Guess #2 – Sum of magnitudes of residuals
$$E = \sum_{m=1}^{M} \left|e_m\right|$$
This does not work because it does not lead to a unique best fit.

Guess #3 – Sum of squares of residuals
$$E = \sum_{m=1}^{M} e_m^2$$
This works and leads to a unique solution.
Equation for Criterion
The line equation for the mth sample is
$$y_m = a_0 + a_1 x_m + e_m$$
Solving this for the residual em gives
$$e_m = y_m - \left(a_0 + a_1 x_m\right)$$

Here $y_m$ is the measured value of $y$, and $a_0 + a_1 x_m$ is the value of $y$ on the line at the point $x_m$.
From this, the error criterion is written as
$$E = \sum_{m=1}^{M} e_m^2 = \sum_{m=1}^{M}\left(y_{\text{measured},m} - y_{\text{line},m}\right)^2 = \sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)^2$$
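The error criterion above translates directly into code; a minimal sketch (function name illustrative):

```python
def error_criterion(x, y, a0, a1):
    """E: sum of squared residuals of the line y = a0 + a1*x."""
    return sum((ym - a0 - a1 * xm) ** 2 for xm, ym in zip(x, y))

x = [0.0, 1.0, 2.0]
y = [0.1, 1.1, 1.9]
print(error_criterion(x, y, 0.0, 1.0))  # residuals 0.1, 0.1, -0.1 -> ~0.03
```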
Least-Squares Fit
It is desired to minimize the error criterion E.
Minima can be identified where the first-order derivatives are zero:

$$\frac{\partial E}{\partial a_0} = 0 \qquad\text{and}\qquad \frac{\partial E}{\partial a_1} = 0$$
Values of a0 and a1 are sought that satisfy these equations.
This approach is solving the problem by least-squares (i.e. minimizing the squares of the residuals).
The Fun Math
Step 1 – Differentiate E with respect to each of the unknowns.
$$\frac{\partial E}{\partial a_0} = \frac{\partial}{\partial a_0}\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)^2 = \sum_{m=1}^{M}\frac{\partial}{\partial a_0}\left(y_m - a_0 - a_1 x_m\right)^2 = -2\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)$$

$$\frac{\partial E}{\partial a_1} = \frac{\partial}{\partial a_1}\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)^2 = \sum_{m=1}^{M}\frac{\partial}{\partial a_1}\left(y_m - a_0 - a_1 x_m\right)^2 = -2\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)x_m$$
The Fun Math
Step 2 – Set the derivatives to zero to locate the minimum of E.
$$-2\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right) = 0 \quad\Rightarrow\quad \sum_{m=1}^{M} y_m = M a_0 + a_1\sum_{m=1}^{M} x_m$$

$$-2\sum_{m=1}^{M}\left(y_m - a_0 - a_1 x_m\right)x_m = 0 \quad\Rightarrow\quad \sum_{m=1}^{M} x_m y_m = a_0\sum_{m=1}^{M} x_m + a_1\sum_{m=1}^{M} x_m^2$$
The Fun Math
Step 3 – Write these as two simultaneous equations. These are called the normal equations.

$$M a_0 + a_1\sum_{m=1}^{M} x_m = \sum_{m=1}^{M} y_m$$

$$a_0\sum_{m=1}^{M} x_m + a_1\sum_{m=1}^{M} x_m^2 = \sum_{m=1}^{M} x_m y_m$$
The Fun Math
Step 4 – The normal equations are solved simultaneously and the solution is
$$a_1 = \frac{M\displaystyle\sum_{m=1}^{M} x_m y_m - \sum_{m=1}^{M} x_m \sum_{m=1}^{M} y_m}{M\displaystyle\sum_{m=1}^{M} x_m^2 - \left(\sum_{m=1}^{M} x_m\right)^2} \qquad a_0 = y_{\text{avg}} - a_1 x_{\text{avg}}$$

$$x_{\text{avg}} = \frac{1}{M}\sum_{m=1}^{M} x_m \qquad y_{\text{avg}} = \frac{1}{M}\sum_{m=1}^{M} y_m$$

Yikes! There has to be an easier way!
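Ugly or not, the closed-form solution is easy to check in code. A minimal sketch (function name illustrative), verified on noise-free points lying exactly on y = 2 + 3x:

```python
def linear_regression(x, y):
    """Closed-form least-squares line fit: returns (a0, a1) for y = a0 + a1*x."""
    M = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xm * xm for xm in x)
    sxy = sum(xm * ym for xm, ym in zip(x, y))
    a1 = (M * sxy - sx * sy) / (M * sxx - sx * sx)
    a0 = sy / M - a1 * (sx / M)   # a0 = y_avg - a1 * x_avg
    return a0, a1

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]
print(linear_regression(x, y))  # approximately (2.0, 3.0)
```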
Linear Least-Squares (Best Fit, Clean Math)
Statement of Problem
It is desired to fit a set of M measured data points to a curve containing N + 1 terms:
$$f = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_N z_N$$

f – measured value
z_n – parameters from which f is evaluated
a_n – coefficients for the curve fit
Formulation of Matrix Equation
Start by writing the function f for each of the M measurements. The residual terms are also incorporated.
$$\begin{aligned} f_1 &= a_0 z_{0,1} + a_1 z_{1,1} + \cdots + a_N z_{N,1} + e_1 \\ f_2 &= a_0 z_{0,2} + a_1 z_{1,2} + \cdots + a_N z_{N,2} + e_2 \\ &\;\;\vdots \\ f_M &= a_0 z_{0,M} + a_1 z_{1,M} + \cdots + a_N z_{N,M} + e_M \end{aligned}$$

This large set of equations is put into matrix form:

$$\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{bmatrix} = \begin{bmatrix} z_{0,1} & z_{1,1} & \cdots & z_{N,1} \\ z_{0,2} & z_{1,2} & \cdots & z_{N,2} \\ \vdots & \vdots & & \vdots \\ z_{0,M} & z_{1,M} & \cdots & z_{N,M} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_M \end{bmatrix} \qquad\text{or}\qquad \mathbf{f} = \mathbf{Z}\mathbf{a} + \mathbf{e}$$
Formulation of Solution by Least-Squares (1 of 4)
Step 1 – Solve the matrix equation for e.

$$\mathbf{e} = \mathbf{f} - \mathbf{Z}\mathbf{a}$$

Step 2 – Calculate the error criterion E from e.

$$E = \sum_{m=1}^{M} e_m^2 = \mathbf{e}^T\mathbf{e}$$

Step 3 – Substitute the equation for e from Step 1 into the equation for E from Step 2.

$$E = \mathbf{e}^T\mathbf{e} = \left(\mathbf{f} - \mathbf{Z}\mathbf{a}\right)^T\left(\mathbf{f} - \mathbf{Z}\mathbf{a}\right)$$
Formulation of Solution by Least-Squares (2 of 4)
Step 4 – The new matrix equation is algebraically manipulated as follows in order to make it easier to find its first-order derivative.
$$E = \left(\mathbf{f} - \mathbf{Z}\mathbf{a}\right)^T\left(\mathbf{f} - \mathbf{Z}\mathbf{a}\right) \qquad\text{original equation}$$

$$= \left(\mathbf{f}^T - \mathbf{a}^T\mathbf{Z}^T\right)\left(\mathbf{f} - \mathbf{Z}\mathbf{a}\right) \qquad\text{distribute the transpose}$$

$$= \mathbf{f}^T\mathbf{f} - \mathbf{f}^T\mathbf{Z}\mathbf{a} - \mathbf{a}^T\mathbf{Z}^T\mathbf{f} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a} \qquad\text{expand the equation}$$

The terms $\mathbf{f}^T\mathbf{Z}\mathbf{a}$ and $\mathbf{a}^T\mathbf{Z}^T\mathbf{f}$ are scalars and transposes of each other, so they are equal.

$$= \mathbf{f}^T\mathbf{f} - 2\mathbf{a}^T\mathbf{Z}^T\mathbf{f} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a} \qquad\text{combine terms}$$
Formulation of Solution by Least-Squares (3 of 4)
Step 5 – Differentiate E with respect to a.
It is desired to determine a that minimizes E. This can be accomplished using the first-derivative rule.
$$E = \mathbf{f}^T\mathbf{f} - 2\mathbf{a}^T\mathbf{Z}^T\mathbf{f} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a}$$

$$\frac{\partial E}{\partial \mathbf{a}} = \frac{\partial}{\partial \mathbf{a}}\left[\mathbf{f}^T\mathbf{f} - 2\mathbf{a}^T\mathbf{Z}^T\mathbf{f} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a}\right] \qquad\text{substitute in the expression for } E$$

$$= -2\mathbf{Z}^T\mathbf{f} + 2\mathbf{Z}^T\mathbf{Z}\mathbf{a} \qquad \mathbf{f}^T\mathbf{f}\text{ is not a function of }\mathbf{a}\text{, so its derivative vanishes}$$
Formulation of Solution by Least-Squares (4 of 4)
Step 6 – Find the value of a that makes the derivative equal to zero.
$$\frac{\partial E}{\partial \mathbf{a}} = -2\mathbf{Z}^T\mathbf{f} + 2\mathbf{Z}^T\mathbf{Z}\mathbf{a} = 0$$

$$2\mathbf{Z}^T\mathbf{Z}\mathbf{a} = 2\mathbf{Z}^T\mathbf{f}$$

$$\mathbf{Z}^T\mathbf{Z}\mathbf{a} = \mathbf{Z}^T\mathbf{f}$$

Observe that this is the original equation premultiplied by $\mathbf{Z}^T$.

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{f}$$
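A minimal sketch of this result using NumPy, fitting a quadratic (N = 2, so N + 1 = 3 terms) to noise-free samples so the recovered coefficients can be checked. The data values here are made up for illustration:

```python
import numpy as np

# Least-squares fit of f(x) = a0 + a1*x + a2*x^2.
# Each column of Z is one basis term z_n evaluated at every sample point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f = 1.0 + 2.0 * x + 0.5 * x**2          # noise-free samples for checking

Z = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations: (Z^T Z) a = Z^T f.  Solving the square system directly
# is numerically preferable to forming the explicit inverse of Z^T Z.
a = np.linalg.solve(Z.T @ Z, Z.T @ f)
print(a)  # approximately [1.0, 2.0, 0.5]
```

In production code, `np.linalg.lstsq(Z, f)` solves the same problem with better numerical behavior for ill-conditioned Z.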
DO NOT SIMPLIFY FURTHER!
If the least-squares equation were simplified further, it would give

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{f} = \mathbf{Z}^{-1}\left(\mathbf{Z}^T\right)^{-1}\mathbf{Z}^T\mathbf{f} = \mathbf{Z}^{-1}\mathbf{f}$$

This is just the original equation again ($\mathbf{f} = \mathbf{Z}\mathbf{a}$) without the least-squares approach incorporated. The simplification is not even valid here, because $\mathbf{Z}$ is not square and therefore has no inverse.
Visualizing Least-Squares (1 of 3)
Initially, a matrix equation was given that had more equations than unknowns.
Visualizing Least-Squares (2 of 3)
The equation was premultiplied by the transpose of A.
Visualizing Least-Squares (3 of 3)
The matrix equation reduced to the same number of equations as unknowns, which is solvable by many standard algorithms.
Least-Squares Algorithm
Step 1 – Construct the matrices. Z is essentially just a matrix of the coordinates of the data points; f is a column vector of the measurements.

$$\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{bmatrix} \qquad \mathbf{Z} = \begin{bmatrix} z_{0,1} & z_{1,1} & \cdots & z_{N,1} \\ z_{0,2} & z_{1,2} & \cdots & z_{N,2} \\ \vdots & \vdots & & \vdots \\ z_{0,M} & z_{1,M} & \cdots & z_{N,M} \end{bmatrix}$$

Step 2 – Solve for the unknown coefficients a.

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{f}$$

Step 3 – Extract the coefficients from a.

$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}$$
Least-Squares for Solving Ax = b
Suppose it is desired to solve Ax = b, but there exist more equations than there are unknowns.
This must be solved as a “best fit” because a perfect fit is impossible in the presence of noise.
Least-squares is implemented simply by premultiplying the matrix equation by AT.
$$\mathbf{A}\mathbf{x} = \mathbf{b} \quad\rightarrow\quad \mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$$
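A minimal NumPy sketch of this recipe on an overdetermined system; the numbers are made up for illustration:

```python
import numpy as np

# Overdetermined system: three equations, two unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.9, 4.1, 5.9])

# Premultiply by A^T to get the square system (A^T A) x = A^T b, then solve.
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # best-fit solution, approximately [-0.033, 2.0]
```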
Example 1 (1 of 3)
Fit a line to the following set of points.
[figure: scatter plot of the measured points]

$$y = mx + b$$
Example 1 (2 of 3)
Step 1 – Build matrices
$$\begin{aligned} y_1 &= m x_1 + b \\ y_2 &= m x_2 + b \\ y_3 &= m x_3 + b \\ y_4 &= m x_4 + b \end{aligned} \qquad\rightarrow\qquad \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ x_4 & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} \qquad\rightarrow\qquad \underbrace{\begin{bmatrix} 0.16 \\ 1.13 \\ 1.57 \\ 2.35 \end{bmatrix}}_{\mathbf{f}} = \underbrace{\begin{bmatrix} 3.01 & 1 \\ 4.98 & 1 \\ 6.91 & 1 \\ 8.76 & 1 \end{bmatrix}}_{\mathbf{Z}}\begin{bmatrix} m \\ b \end{bmatrix}$$

With practice, you will be able to write the matrices directly from the measured data.

Step 2 – Solve by least squares.

$$\mathbf{Z}^T\mathbf{Z} = \begin{bmatrix} 3.01 & 4.98 & 6.91 & 8.76 \\ 1 & 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 3.01 & 1 \\ 4.98 & 1 \\ 6.91 & 1 \\ 8.76 & 1 \end{bmatrix} = \begin{bmatrix} 158.3462 & 23.6600 \\ 23.6600 & 4.0000 \end{bmatrix}$$

$$\mathbf{Z}^T\mathbf{f} = \begin{bmatrix} 3.01 & 4.98 & 6.91 & 8.76 \\ 1 & 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 0.16 \\ 1.13 \\ 1.57 \\ 2.35 \end{bmatrix} = \begin{bmatrix} 37.5437 \\ 5.2100 \end{bmatrix}$$

$$\begin{bmatrix} 158.3462 & 23.6600 \\ 23.6600 & 4.0000 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 37.5437 \\ 5.2100 \end{bmatrix} \quad\Rightarrow\quad m = 0.3656, \quad b = -0.8602$$
Example 1 (3 of 3)
Step 3 – Extract coefficients
$$m = 0.3656 \qquad b = -0.8602$$

Step 4 – Plot the result.
$$y = mx + b = 0.3656\,x - 0.8602$$

[figure: the data points with the fitted line]
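The worked example can be reproduced in a few lines of NumPy, which also serves as a check on the arithmetic:

```python
import numpy as np

# Data points from Example 1.
x = np.array([3.01, 4.98, 6.91, 8.76])
y = np.array([0.16, 1.13, 1.57, 2.35])

Z = np.column_stack([x, np.ones_like(x)])   # columns [x, 1] for y = m*x + b
m, b = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(round(m, 4), round(b, 4))  # 0.3656 -0.8602
```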