
MISN-0-359

LEAST SQUARES FITTING OF EXPERIMENTAL DATA

by Robert Ehrlich

1. Introduction to Hypothesis Testing
   a. Overview
   b. Consistency Between Data and Theory
   c. Consistency Measured In Terms of Probability
   d. Chi-Squared and Probability
   e. An Example of Hypothesis Testing
2. χ² Test and Least Squares Fitting
3. Linear, One Adjustable Parameter
   a. Formal Solution
   b. Example: Fitting ω Mass Data
   c. Whether the "Bad" Data Should be Dropped
4. Two Adjustable Parameters
   a. Formal Solution
   b. Pseudo-Linear Least Square Fits
   c. Example: Acceleration Data
   d. Fit With Unknown Measurement Errors
5. Program for Least Squares Fits
   a. Input
   b. Sample Output
6. A Project
   a. A Linear Least Squares Fit
   b. Goodness-of-Fit Test
Acknowledgments
A. Fortran, Basic, C++ Programs

Project PHYSNET · Physics Bldg. · Michigan State University · East Lansing, MI


ID Sheet: MISN-0-359

Title: Least Squares Fitting of Experimental Data

Author: R. Ehrlich, Physics Dept., George Mason Univ., Fairfax, VA 22030; (703) 323-2303.

Version: 3/22/2002    Evaluation: Stage 0

Length: 2 hr; 24 pages

Input Skills:

1. Vocabulary: cumulative probability, Gaussian distribution, percent error.
2. Enter and run a computer program using advanced programming techniques such as loops and arrays in FORTRAN (MISN-0-347) or in BASIC.
3. Take partial derivatives of simple functions of two variables (MISN-0-201).

Output Skills (Knowledge):

K1. Vocabulary: chi-squared (χ²), confidence level, correlation term, degrees of freedom, error bars, least squares fit, method of least squares, random error of measurement, residual.
K2. State how least squares fitting is used to determine unknown parameters in a mathematical function used to fit data.
K3. Describe the chi-squared test for the goodness of fit of a mathematical function to a set of experimental data.

Output Skills (Project):

P1. Enter and run a computer program to do a linear least squares fit to a set of experimental or simulated data.
P2. Use the chi-squared value computed for the fit in P1 to determine the goodness of fit, and use the computed residuals to reject "bad" data points, then re-run the analysis with the "bad" data points removed.

External Resources (Required):

1. A computer with FORTRAN or BASIC.

THIS IS A DEVELOPMENTAL-STAGE PUBLICATION OF PROJECT PHYSNET

The goal of our project is to assist a network of educators and scientists in transferring physics from one person to another. We support manuscript processing and distribution, along with communication and information systems. We also work with employers to identify basic scientific skills as well as physics topics that are needed in science and technology. A number of our publications are aimed at assisting users in acquiring such skills.

Our publications are designed: (i) to be updated quickly in response to field tests and new scientific developments; (ii) to be used in both classroom and professional settings; (iii) to show the prerequisite dependencies existing among the various chunks of physics knowledge and skill, as a guide both to mental organization and to use of the materials; and (iv) to be adapted quickly to specific user needs ranging from single-skill instruction to complete custom textbooks.

New authors, reviewers and field testers are welcome.

PROJECT STAFF

Andrew Schnepp    Webmaster
Eugene Kales      Graphics
Peter Signell     Project Director

ADVISORY COMMITTEE

D. Alan Bromley       Yale University
E. Leonard Jossem     The Ohio State University
A. A. Strassenburg    S. U. N. Y., Stony Brook

Views expressed in a module are those of the module author(s) and are not necessarily those of other project participants.

© 2002, Peter Signell for Project PHYSNET, Physics-Astronomy Bldg., Mich. State Univ., E. Lansing, MI 48824; (517) 355-3784. For our liberal use policies see: http://www.physnet.org/home/modules/license.html.


LEAST SQUARES FITTING OF EXPERIMENTAL DATA

by Robert Ehrlich

1. Introduction to Hypothesis Testing

1a. Overview. In this module we discuss and use the "least squares" method for fitting a mathematical function to a set of experimental data. The various values in the data set result from unavoidable experimental error and from controlled variations of one or more independent parameters by the experimentalist. The mathematical fitting function must be a function of the same independent parameters as were varied in the experiment. Such a mathematical fitting function constitutes a hypothesis about how the dependent (measured) quantity varies in nature as the independent variables change. Thus it is a hypothesis as to the theoretical relationship involved. The least squares fitting method is accomplished by varying one or more additional ("free") parameters in the mathematical function so as to minimize the sum of the squares of the deviations of the fitting function from the data at corresponding values of the independent experimentally-varied parameters. The minimum value for the sum of the squared deviations is called "chi-squared" and is written χ². The values of the free parameters yielding the minimum square deviation sum define the best fit to the experimental data, given the chosen mathematical form for the assumed relationship among the data. A comparison of the value of chi-squared for this fit with a standard chi-squared table enables us to determine the probability that deviations of the data from the theoretical curve could be attributed to the kinds of random fluctuations that are assumed to be present in the particular experimental measurement process. If this "goodness of fit" test yields a low probability that the deviations are due solely to the assumed random fluctuations, then it could mean either that there are unexpected errors in the data or that the assumed theoretical relationship is in error. A high probability would only mean that the data are consistent with the theory, not that they and the relationship are necessarily correct.

1b. Consistency Between Data and Theory. In analyzing data from an experiment, we are generally trying to determine whether the data are consistent with a particular theoretical relationship. Suppose the data consist of a set of n values of some measured quantity y: y_1, y_2, ..., y_n, corresponding to n associated values of some independent variable x: x_1, x_2, ..., x_n. In most cases it is also possible to assign to each data point (x_j, y_j) a "random error" (really an uncertainty) of measurement, σ_j, which depends on the precision of the measuring apparatus. There may also be an uncertainty in the independent x variable, but we shall simplify the analysis by ignoring this possibility. Suppose we wish to test whether x and y satisfy some particular functional relationship y = f(x). A rough, simple test of whether the data are consistent with the hypothesized relationship can be made by plotting the function y = f(x) on the same graph as the data (x_1, y_1), ..., (x_n, y_n). The data may be considered roughly consistent with the function y = f(x) if they lie near the curve and are scattered on either side of it in a random-looking manner (see Fig. 1). It is highly unlikely that the center points of all of the data will lie exactly on the curve, due to random measurement error: the data and the functional curve shown in Fig. 1 would be considered quite consistent with each other.

[Figure 1. A set of data which is reasonably consistent with the function y = f(x).]

Note that the vertical "error bar" on each data point extends above and below the point a distance equal to the random measurement "error" σ_j. A data point could be considered "near" the curve if its distance to the curve in the y direction is no greater than the assigned measurement error σ_j, in which case the error bar intersects the curve. This vertical distance from the data point to the curve is the "residual" for the point (x_j, y_j), and is given by

    r_j = y_j - f(x_j).    (1)

Note that the residual r_j is positive or negative, depending on whether the data point is above or below the curve.


1c. Consistency Measured In Terms of Probability. The quantity t_j = r_j/σ_j is a measure of the discrepancy between the j-th data point and the theoretical curve. We might consider a set of data consistent with a curve even if the magnitudes of some of the residuals r_j are greater than the corresponding measurement errors σ_j, which is the case for two of the data points in Fig. 1. This is because the measurement error σ_j is usually defined such that there is some specific probability (less than 100%) of any particular measurement falling within a range of ±σ_j of the true value. In fact, one way to experimentally determine the measurement error σ_j is to make many repeated measurements of y_j for the same value of x_j, and then find a value for σ_j such that a specific percentage (often chosen to be 68%) of all measurements fall within ±σ_j of the average value y_j. If the theoretical curve y = f(x) is correct, as the number of repeated measurements at the same value of x_j becomes infinite, we expect y_j = f(x_j), i.e., we expect the average of all measurements to lie right on the curve. Furthermore, for any single measurement y_j, there is a 68% chance that the residual r_j is smaller in magnitude than σ_j. Thus, there is a 68% chance that |r_j/σ_j| < 1 (error bar intersects curve), and a 32% chance that |r_j/σ_j| > 1 (error bar does not intersect curve).

1d. Chi-Squared and Probability. The quantity chi-squared (χ²) is a measure of the overall discrepancy between all the data points and the theoretical curve, where χ² is defined as

    χ² = Σ_j t_j² = Σ_j (r_j/σ_j)².    (2)

Thus the value of χ² is a measure of the goodness of fit of a theoretical curve y = f(x) to a set of data points. In the best conceivable fit, the case where all data points would be lying exactly on the curve, we would find χ² = 0. For a reasonably good fit (all the residuals r_j comparable to the measurement errors σ_j), we would find χ² ≈ n, where n is the number of data points. If each of the individual measurements is assumed to have a Gaussian probability distribution about some unknown true value, it is possible to calculate the probability that χ² has a particular value. This is usually expressed in terms of a cumulative probability, P(χ², n), or "confidence level." The number P(χ², n) is the probability of finding a value for χ² at least as great as the value actually computed, assuming that the hypothesized function y = f(x) is correct and that the deviations for each data point are only the result of random measurement error. A chi-squared table is given in Table 1.

Table 1. Chi-squared values for sample size and confidence level.

    Sample          Confidence Level (P)
    Size (n)    .90     .80     .50     .30     .01
     1         .0158   .0642    .455   1.074   6.635
     2          .211    .446   1.386   2.408   9.210
     3          .584   1.005   2.366   3.665  11.345
     4         1.064   1.649   3.357   4.878  13.277
     5         1.610   2.343   4.351   6.064  15.086
     6         2.204   3.070   5.348   7.231  16.812
     7         2.833   3.822   6.346   8.383  18.475
     8         3.490   4.594   7.344   9.524  20.090
     9         4.168   5.380   8.343  10.656  21.666
    10         4.865   6.179   9.342  11.781  23.209

The first column in the table lists n values from 1 to 10. The table entries in the row for each n value are the values of χ² corresponding to the confidence levels listed as column headings. As can be seen, for a given value of n, a value of χ² which greatly exceeds n goes along with a low confidence level. A low confidence level means that the fit is a poor one, and that it is unlikely that the hypothesized relation y = f(x) is consistent with the data.

1e. An Example of Hypothesis Testing. To illustrate these points, suppose we do not know which of three hypothesized functions f1(x), f2(x), or f3(x) describes the relation between the variables y and x (see Fig. 2). For the three functions we use Eqs. (1) and (2) to compute three values of χ² (χ1², χ2², χ3²), which indicate the goodness of fit for the three functions. We can use these three values in a chi-squared test in an attempt to determine which of the three functions is correct. Let us assume that the three computed values are χ1² = 6.2, χ2² = 10.8, χ3² = 23.2. We then use the chi-squared table (with n = 10) to find the three probabilities corresponding to these values of χ²: P1 = .80, P2 = .40, P3 = .01. An interpolation between table entries is necessary to get P2. The result P3 = .01 means that only 1% of the time would random errors result in so large a value of χ², assuming that y = f3(x) is the correct relation between y and x. We normally regard a confidence level this low as grounds for rejecting the hypothesis that f3(x) is the correct relation between y and x. For the functions f1(x) and f2(x), the respective probabilities 80% and 40% mean that f1(x) is a somewhat better fit than f2(x). However, we do not reject f2(x) simply because the probability P2 is only half P1.
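This kind of comparison is easy to carry out numerically. The following Python sketch evaluates Eq. (2) for two candidate functions against a small made-up data set (the data and candidates are invented for illustration):

```python
def chi_squared(xs, ys, sigmas, f):
    """Eq. (2): sum of squared residuals, each scaled by its error."""
    return sum(((y - f(x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))

# Made-up data that roughly follow y = x**2:
xs = [1.0, 2.0, 3.0]
ys = [1.2, 3.8, 9.1]
sigmas = [0.2, 0.2, 0.2]

chi2_good = chi_squared(xs, ys, sigmas, lambda x: x * x)    # candidate 1
chi2_bad = chi_squared(xs, ys, sigmas, lambda x: 2.0 * x)   # candidate 2
```

For these three points, the first candidate gives χ² ≈ 2.25, close to the n = 3, P = .50 entry of Table 1 (2.366), so it is a reasonable fit; the second gives χ² in the hundreds, far beyond the P = .01 entry (11.345), and would be rejected.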


[Figure 2. A set of data points and three hypothesized functions f1(x), f2(x), and f3(x). The chi-squared test can be used to reject f3(x), but it cannot differentiate between f1(x) and f2(x).]

A 40% confidence level is not so low that the hypothesis can be rejected. Furthermore, we cannot even claim that the probability of f1(x) being the correct function is twice as great as the probability of f2(x) being the correct function. The only conclusion we can draw from the chi-squared test is that we cannot decide which of the two functions f1(x) or f2(x) is the correct one, since both give reasonably good fits to the data. From this example, we see that the chi-squared test is useful in rejecting hypotheses, but it cannot be used to prove the correctness of a particular hypothesis (unless every other possible hypothesis is rejected).

2. χ² Test and Least Squares Fitting

Apart from finding the goodness of fit for comparing specific functions [such as f1(x), f2(x) and f3(x)], the chi-squared test can find the goodness of fit for a function f(x) which is not completely specified, but which has a number of "adjustable" parameters a_1, a_2, ..., a_m, whose values are not known a priori. In fact, an important application of the chi-squared test is the determination of the values of the parameters which give the best fit to the data. In other words, we wish to determine the values of the parameters a_1, a_2, ..., a_m which give χ², the minimum value of the sum of the squares of the residuals. This is known as the method of least squares, since it involves minimizing the sum of the squares of deviations (Eq. 2) with respect to the adjustable parameters a_1, a_2, ..., a_m. The function f(x) which has the minimum square residual sum is called the "least squares fit," and, according to the chi-squared criterion, it is the best fit to the data. At the same time that we find the least squares fit, we can also determine if the fit is, in fact, a good one. This means finding whether the value of χ² corresponds to a reasonably high confidence level. In using the chi-squared table to find the confidence level for a fit to n data points which involves m adjustable parameters, we must use the difference n - m (rather than n itself). The difference n - m is known as the number of "degrees of freedom."

3. Linear, One Adjustable Parameter

3a. Formal Solution. The simplest type of least squares fit involves a single adjustable parameter α, which multiplies a function f(x) that is otherwise completely specified: y = αf(x). In this case, we may write for S:

    S = Σ_j (r_j/σ_j)² = Σ_j (y_j - α f(x_j))²/σ_j².    (3)

To find the value of the parameter α which minimizes S, we set the partial derivative ∂S/∂α = 0:

    ∂S/∂α = Σ_j [-2 f(x_j)(y_j - α f(x_j))/σ_j²] = 0.

Solving for α, we obtain the expression

    α* = [Σ_j f(x_j) y_j/σ_j²] / [Σ_j (f(x_j))²/σ_j²],    (4)

where the asterisk indicates that this is the value of α which gives the minimum value of S, which is χ². We can obtain an estimate of the uncertainty in α by determining the change in α from the least squares fit value α* which increases S by some specified amount greater than the minimum value, χ². The uncertainty ∆α is usually defined so that the value of S for both α = α* + ∆α and α = α* - ∆α is equal to χ² + 1, where χ² = S_min for α = α* (see Fig. 3). It can be shown that, for a one-parameter fit, the uncertainty ∆α is given by

    ∆α = [Σ_j (f(x_j))²/σ_j²]^(-1/2).    (5)


[Figure 3. A plot of S against α near the minimum value, chi-squared. The S axis is marked at χ² and χ² + 1; the α axis is marked at α* - ∆α, α*, and α* + ∆α.]

A special case of a one-parameter fit that occurs quite frequently is f(x) = 1, which gives y = α for the theoretical curve. In this case, the least squares fit value of α given by Eq. (4) is the y-intercept of the horizontal straight line which gives the best fit to the data points (x_1, y_1), ..., (x_n, y_n). In effect, α* is the weighted average of y_1, ..., y_n, with weights inversely proportional to the squares of the measurement errors σ_1, ..., σ_n.

3b. Example: Fitting ω Mass Data. As an example of the special case f(x) = 1, in Table 2 we list nine independent measurements (y_1, ..., y_9) of the mass of the omega meson (a subatomic particle), together with estimated measurement errors (σ_1, ..., σ_9) from nine different experiments.

Table 2. Values of ω mass and year reported.

    Mass (MeV/c²)   Error (MeV/c²)   Year Reported
    779.4           1.4              1962
    784.0           1.0              1963
    781.0           2.0              1964
    785.6           1.2              1965
    786.0           1.0              1966
    779.5           1.5              1967
    784.8           1.1              1968
    784.8           1.1              1969
    784.1           1.2              1970

For the nine x-values x_1, ..., x_9, we use the year each measurement was reported. Since the mass of a subatomic particle should not depend on the year the measurement was made, the curve we wish to fit to the nine data points, shown plotted in Fig. 4, is a horizontal line y = α. To find the value of α which minimizes S, we use Eq. (4) with f(x) = 1:

    α* = (Σ_j y_j/σ_j²) / (Σ_j 1/σ_j²).

Upon substitution of the nine masses (y_1, ..., y_9) and measurement errors (σ_1, ..., σ_9), we find α* = 784.12. Similarly, to find the estimated uncertainty ∆α, we use Eq. (5) with f(x) = 1:

    ∆α = (Σ_j 1/σ_j²)^(-1/2),

from which we find ∆α = 0.39. Thus the value α = 784.12 ± 0.39 represents the weighted average of the omega meson's mass found from a one-parameter least squares fit to the nine experimental values. The solid horizontal line in Fig. 4 indicates the value of α* and the dotted horizontal lines indicate the band corresponding to the range α* ± ∆α. In addition to finding the value of one or more parameters from a least squares fit, we can use the chi-squared test to determine if this best fit is a good one. By substitution of α = 784.12, f(x) = 1, and the experimental values for y_1, ..., y_9; σ_1, ..., σ_9 into Eq. (3), we obtain χ² = 32.3. According to the chi-squared table in the Appendix, the probability of finding a chi-squared this high for eight degrees of freedom is less than .01.
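The weighted average can be reproduced directly from the Table 2 values. A Python sketch follows; the numbers recomputed from the table as transcribed here can differ in the last digits from the printed α* = 784.12, ∆α = 0.39, and χ² = 32.3 because of rounding, but the conclusion is unchanged: χ² lands far above the n = 8, P = .01 entry of Table 1 (20.090), so the constant-mass fit is poor.

```python
# Omega-meson masses and errors from Table 2; f(x) = 1 special case
# of Eqs. (4) and (5), plus chi-squared from Eq. (3).
masses = [779.4, 784.0, 781.0, 785.6, 786.0, 779.5, 784.8, 784.8, 784.1]
errors = [1.4, 1.0, 2.0, 1.2, 1.0, 1.5, 1.1, 1.1, 1.2]

weights = [1.0 / s**2 for s in errors]
alpha = sum(w * y for w, y in zip(weights, masses)) / sum(weights)
dalpha = sum(weights) ** -0.5
chi2 = sum(w * (y - alpha) ** 2 for w, y in zip(weights, masses))
```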

In other words, the best fit is not a good one in this case. This actually should be obvious from a visual inspection of the data points in Fig. 4, since there is no horizontal line which would pass reasonably close to all the data points.

[Figure 4. Plot of ω mass against year reported.]

3c. Whether the "Bad" Data Should be Dropped. If all the data points and measurement errors are correct, the chi-squared test gives grounds for rejecting the hypothesis that the mass of the omega meson is a constant, independent of the year it is measured! A more plausible interpretation of the poor fit is that one or more of the reported measurements is in error. We notice, for example, that if the measurements made in 1962, 1964, and 1967 are dropped, the remaining six measurements are quite consistent with a horizontal line. However, arbitrarily dropping these three data points in order to improve the fit would probably be unwise, because it seems likely that some of the experimenters made systematic (i.e., nonrandom) errors, and statistical tests cannot be used to prove which experiments are in error. In practice, of course, if one experiment out of a large number gives a significantly different result from the others, we might be tempted to discard that result. However, this should not be done unless we can be reasonably certain that the anomalous result cannot be ascribed to better apparatus or other favorable conditions which were absent in all the other experiments, causing them to be in error. Dropping "bad" individual data points within a given experiment is a less dangerous procedure, but even here caution must be exercised lest the "bad" data points reveal systematic errors.

4. Two Adjustable Parameters

4a. Formal Solution. A two-parameter least squares fit to a set of data can be easily made if the relation between y and x is linear in the parameters α1 and α2, that is, if

    y = α1 f1(x) + α2 f2(x),

where f1(x) and f2(x) are two specified functions. We shall consider the important special case f1(x) = 1 and f2(x) = x, where we wish to make a least squares fit to the straight line y = α1 + α2 x. In order to minimize the square residual sum with respect to the two parameters α1 and α2, we require that the two partial derivatives ∂S/∂α1 and ∂S/∂α2 vanish. Thus, with S given by

    S = Σ_j (y_j - α1 - α2 x_j)²/σ_j²,

we require that

    ∂S/∂α1 = Σ_j [-2(y_j - α1 - α2 x_j)/σ_j²] = 0    (6)

and

    ∂S/∂α2 = Σ_j [-2 x_j (y_j - α1 - α2 x_j)/σ_j²] = 0.    (7)

Solving Eqs. (6) and (7) simultaneously for the two unknowns, α1 and α2, we obtain the values of α1 and α2 which minimize S:

    α1* = (1/∆)(s_xx s_y - s_x s_xy),
    α2* = (1/∆)(s_1 s_xy - s_x s_y),    (8)

where we have used the abbreviations

    s_1 = Σ_j 1/σ_j²;   s_x = Σ_j x_j/σ_j²;   s_y = Σ_j y_j/σ_j²;
    s_xx = Σ_j x_j²/σ_j²;   s_xy = Σ_j x_j y_j/σ_j²;   ∆ = s_1 s_xx - s_x².
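Eq. (8) gives the weighted straight-line fit in closed form. A Python sketch, with the s-sums written out inline (the uncertainty expressions correspond to Eq. (9); the check data are invented and lie exactly on y = 1 + 2x, so the fit must recover intercept 1 and slope 2):

```python
def line_fit(xs, ys, sigmas):
    """Weighted least squares straight line y = a1 + a2*x, per Eq. (8)."""
    w = [1.0 / s**2 for s in sigmas]
    s1 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = s1 * sxx - sx**2
    a1 = (sxx * sy - sx * sxy) / delta   # intercept alpha1*
    a2 = (s1 * sxy - sx * sy) / delta    # slope alpha2*
    da1 = (sxx / delta) ** 0.5           # Delta-alpha1, Eq. (9)
    da2 = (s1 / delta) ** 0.5            # Delta-alpha2, Eq. (9)
    return a1, a2, da1, da2

# Check on points lying exactly on y = 1 + 2x:
a1, a2, da1, da2 = line_fit([0.0, 1.0, 2.0, 3.0],
                            [1.0, 3.0, 5.0, 7.0],
                            [0.1, 0.1, 0.1, 0.1])
```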

As in the case of the one-parameter least squares fit, each of the parameters α1 and α2 has an uncertainty which indicates the amount each parameter must be independently changed from the least squares fit values in order to increase the computed value of S to one greater than the minimum value, χ². It can be shown that the two uncertainties ∆α1 and ∆α2 are given by

    ∆α1 = (s_xx/∆)^(1/2),
    ∆α2 = (s_1/∆)^(1/2).    (9)

We may also define a "correlation term" which indicates how the value of χ² changes when the parameters are simultaneously varied. In the case of the two-parameter fit, there is one correlation term: ∆α12 = -s_x/∆.

4b. Pseudo-Linear Least Square Fits. Least squares fits can be found for functions which are not linear, but the nonlinear fitting can be quite complex. Therefore it is often desirable to transform the measured variables to other variables which do obey a linear relation. For example, if we measure the level of radioactivity (R) from a radioactive source as a function of time (t), it is found that R is an exponential function of time given by

    R = R_0 e^(-t/T),    (10)

where R_0 is the level of radioactivity at time t = 0, and T is the so-called "lifetime," the time for the level of radioactivity to drop to e^(-1) ≈ 37% of its original value. Taking the logarithm of each side of Eq. (10) yields:

    ln R = ln R_0 - t/T.    (11)

Now substitute y = ln R and x = t and see that you have transformed the radioactive decay law [Eq. (10)] into a relationship in which the y and x variables are linearly related. This substitution of variables then permits a least squares straight line fit to be done on a set of data, from which the value of the lifetime T can be determined. Another reason for transforming the measured quantities to obtain a linear relation is that we can then often tell by visual inspection whether a straight line is a good fit to the plotted data points, thereby testing the original relational hypothesis.

4c. Example: Acceleration Data. As an illustration of a least squares straight line fit, in Table 3 we show a set of data obtained from a simple experiment using a cart sliding down an incline with negligible friction (an air track). The angle of inclination from the horizontal is θ = 0.005546 rad. The data represent the measured times for a cart released from rest to travel various distances along the track.

Table 3. Distance (d) vs. time (t) for an air-track cart.

    t (sec): 0.7  1.3  1.9  2.5  3.1  3.7  4.1  4.9  5.6
    d (cm):  1    4    9    16   25   36   49   64   81

    t (sec): 6.1  6.7  7.5  7.9  8.5  9.1
    d (cm):  100  121  144  169  196  225

We expect the data to be consistent with the relation:

    d = (1/2) g sin θ t²,    (12)

where g is the acceleration due to gravity. We can solve Eq. (12) for t to obtain

    t = [2/(g sin θ)]^(1/2) d^(1/2).    (13)

We can therefore get a linear relation if we make the transformation t → y, d^(1/2) → x. The transformed data points (t against d^(1/2)) are plotted in Fig. 5. The small error bars on each point indicate the measurement errors in t. Based on repeated measurements for each point, the error was estimated to be ±0.1 seconds for all points. The data points seem to be reasonably consistent with a straight line y = α1 + α2 x, where α1 = 0. In fact, a fairly good straight line fit to the data can be found by simply drawing a straight line with the data points distributed on either side of it in a random manner.

For a more precise result, we can find the slope α2, y-intercept α1, and the associated errors ∆α2 and ∆α1, for the least squares fit straight line, using Eqs. (8) and (9). We can determine a value for g, the acceleration due to gravity, from the slope of the line.
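The transformation of Eqs. (10) and (11) can be sketched as follows. The data here are synthetic and noise-free (R_0 = 100 and T = 2.0 are invented values), so the straight-line fit of ln R against t must recover R_0 and T essentially exactly:

```python
import math

# Eq. (10): R = R0*exp(-t/T); Eq. (11): ln R = ln R0 - t/T.
R0_true, T_true = 100.0, 2.0
ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
Rs = [R0_true * math.exp(-ti / T_true) for ti in ts]

xs = ts                          # x = t
ys = [math.log(Ri) for Ri in Rs] # y = ln R

# Unweighted straight-line fit y = a1 + a2*x:
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(xi * xi for xi in xs)
sxy = sum(xi * yi for xi, yi in zip(xs, ys))
delta = n * sxx - sx**2
a1 = (sxx * sy - sx * sxy) / delta   # ln R0
a2 = (n * sxy - sx * sy) / delta     # -1/T

R0_fit = math.exp(a1)
T_fit = -1.0 / a2
```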


According to the form of Eq. (13), the slope α2 is given by

    α2 = [2/(g sin θ)]^(1/2),    (14)

which can be solved for g to obtain

    g = 2/(α2² sin θ).    (15)

[Figure 5. Plot of t against d^(1/2) for data from the air-track experiment.]

The theory of first-order error propagation can be used to find the uncertainty in a quantity calculated from a known function of the parameters α1 and α2. Since g depends only on α2, we can find the magnitude of the error in g using

    ∆g = |dg/dα2| ∆α2.    (16)

From Eqs. (15) and (16), we find that the percentage error in g is twice the percentage error in α2:

    ∆g/g = 2 ∆α2/α2.    (17)

In this example, we could have used our a priori knowledge that the y-intercept should be zero, and found a one-parameter least squares fit to the function y = αx. We would use Eqs. (4) and (5) with f(x) = x to obtain

    α* = (Σ_j x_j y_j/σ_j²) / (Σ_j x_j²/σ_j²)

and

    ∆α = (Σ_j x_j²/σ_j²)^(-1/2).

One advantage of the one-parameter fit here is that a smaller value for the uncertainty in the slope ∆α would probably be found, due to the more constrained nature of the fit. On the other hand, the two-parameter fit has the advantage that we may possibly establish the existence of a systematic error in the data should we find that α1* (the y-intercept) is significantly different from zero.

4d. Fit With Unknown Measurement Errors. We can find a least squares fit even in cases where the measurement errors σ_1, ..., σ_n are unknown, by assuming that the errors are the same for all points. If we call the common (but unknown) error σ, then Eq. (2) becomes

    S = (1/σ²) Σ_{j=1}^{n} r_j².    (18)

Since σ is a constant, we can minimize S with respect to the parameters α1, α2, ..., αm without knowing the value of σ. However, if we wish to determine the uncertainties in the parameters for the best fit, ∆α1, ∆α2, ..., ∆αm, a value for σ is needed. A reasonable estimate for σ is the root mean square (RMS) value of the residuals:

    σ ≈ (Σ_j r_j²/n)^(1/2),    (19)

which is a measure of the average "scatter" of the points about the least squares fit curve. Thus it is possible to find values for the parameters α1, α2, ..., αm, and estimated uncertainties ∆α1, ∆α2, ..., ∆αm, for the least squares fit, even though the measurement errors σ_1, σ_2, ..., σ_n are unknown.
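A sketch of such a fit for the straight-line case: the σ² factor drops out of the minimization of Eq. (18), so the fit can proceed without it; σ is then estimated from Eq. (19) and used to scale the parameter uncertainties. The data below are invented:

```python
def fit_line_unknown_sigma(xs, ys):
    """Straight-line fit with a common, unknown error: Eqs. (18), (19)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx**2
    a1 = (sxx * sy - sx * sxy) / delta
    a2 = (n * sxy - sx * sy) / delta
    resid = [y - a1 - a2 * x for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / n) ** 0.5  # RMS residual, Eq. (19)
    da1 = sigma * (sxx / delta) ** 0.5              # Eq. (9) scaled by sigma
    da2 = sigma * (n / delta) ** 0.5
    return a1, a2, sigma, da1, da2

a1, a2, sigma, da1, da2 = fit_line_unknown_sigma(
    [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9])
```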

However, it is not possible to use the computed value of χ² to tell if the "best" fit is any good, because by assuming the value for σ computed via Eq. (19), we have, in effect, assumed a value for χ² equal to n, as can be seen by substitution of Eq. (19) into Eq. (18).

5. Program for Least Squares Fits

5a. Input. Our program¹ accepts as input the data points and associated errors (x_j, y_j, σ_j); j = 1, 2, ..., N. The errors (σ_j) are optional. The program does a two-parameter least squares straight line fit to the data and it calculates the best fit values of the parameters and their associated errors: α1 ± ∆α1 and α2 ± ∆α2. It also calculates a value for chi-squared for the best-fit straight line. If zero measurement errors are input, the program assumes that the σ_j are unknown. In that case, all the σ_j are assumed to be equal to the computed RMS value of the residuals, and then chi-squared will not be a meaningful indicator of the goodness of fit.

5b. Sample Output. The output shown in Table 4 was obtained using the 15 data points listed in Sect. 4c. After typing back the input variables and the calculated residuals, the program outputs the calculated parameters (α1, α2, χ²) and gives a plot of the best fit straight line, as shown in Fig. 6. The data points are indicated on the plot by the letter "O."

Table 4. Computer output: residuals for some air-track data.

    NUMBER OF DATA POINTS = 15.00000
        X          Y         DY       RESIDUALS
     1.00000    0.70000    0.10000     .02083
     2.00000    1.30000    0.10000     .01690
     3.00000    1.90000    0.10000     .01298
     4.00000    2.50000    0.10000     .00905
     5.00000    3.10000    0.10000     .00512
     6.00000    3.70000    0.10000     .00119
     7.00000    4.10000    0.10000    -.20274
     8.00000    4.90000    0.10000    -.00667
     9.00000    5.60000    0.10000     .08940
    10.00000    6.10000    0.10000    -.01452
    11.00000    6.70000    0.10000    -.01845
    12.00000    7.50000    0.10000     .17762
    13.00000    7.90000    0.10000    -.02631
    14.00000    8.50000    0.10000    -.03024
    15.00000    9.10000    0.10000    -.03417

Examining the output, we notice that the residuals are all reasonably small compared to the measurement errors; in only one case does a residual exceed twice the error. There may, however, be some systematic variation in the residuals, which would indicate the presence of a small systematic error. The plotted least squares fit straight line appears to be quite consistent with the data points (shown as circles). In order to use the table in the Appendix to determine the confidence level for the fit, we need to specify the number of degrees of freedom (13). The value of χ² for the least squares fit is 8.5 in the output. According to the χ² table, the confidence level for a χ² of 8.5 with 13 degrees of freedom is about 0.80, or 80 percent, which means the fit is very good. The values of the parameters α1, α2, and their errors, for the least squares fit straight line are given in the output as:

    α1 = 0.07523 ± 0.05433,
    α2 = 0.60392 ± 0.00597.

The value of the y-intercept (α1) is therefore almost consistent with zero. The fact that the value is a bit further from zero than the uncertainty ∆α1 is probably not significant. From the value for the slope α2, we can compute values for g and ∆g using Eqs. (15) and (17):

    g = 986 ± 20 cm/sec²,

which is consistent with the accepted value, 980 cm/sec².

6. A Project

6a. A Linear Least Squares Fit. Run the program using a set of experimental data (see below). The measured quantities in the experiment should either satisfy a linear relationship or should be so related that they can easily be transformed into a pair of linearly related variables. If you have no such data readily available, you may use a simulated set of measurements which might have been obtained from a radioactive decay measurement. Note that the parameters of the fit will be α1 = ln R_0 and α2 = -1/T in this case.
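The sample output above can be reproduced from the data of Table 3. The following is a Python sketch, not the module's program; it simply re-applies Eqs. (8), (9), and (15) to y = t against x = d^(1/2) with σ = 0.1 s for every point:

```python
import math

# Air-track data of Table 3.
t = [0.7, 1.3, 1.9, 2.5, 3.1, 3.7, 4.1, 4.9, 5.6,
     6.1, 6.7, 7.5, 7.9, 8.5, 9.1]
d = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
x = [math.sqrt(di) for di in d]        # 1, 2, ..., 15
sigma = 0.1

w = 1.0 / sigma**2                     # common weight 1/sigma^2
s1 = w * len(x)
sx = w * sum(x)
sy = w * sum(t)
sxx = w * sum(xi * xi for xi in x)
sxy = w * sum(xi * yi for xi, yi in zip(x, t))
delta = s1 * sxx - sx**2
a1 = (sxx * sy - sx * sxy) / delta     # y-intercept, Eq. (8)
a2 = (s1 * sxy - sx * sy) / delta      # slope, Eq. (8)
da1 = (sxx / delta) ** 0.5             # Eq. (9)
da2 = (s1 / delta) ** 0.5
chi2 = sum(((yi - a1 - a2 * xi) / sigma) ** 2 for xi, yi in zip(x, t))

theta = 0.005546                       # incline angle, rad
g = 2.0 / (a2**2 * math.sin(theta))    # Eq. (15), cm/sec^2
```

The computed values agree with the output above (α1 ≈ 0.0752, α2 ≈ 0.6039, χ² ≈ 8.5), and the value of g obtained from Eq. (15) falls within the quoted 986 ± 20 cm/sec².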

1See this module’s Computer Program Supplement.

19 20 MISN-0-359 17 MISN-0-359 18

[Figure 6. Sample output from the least squares fitting program. The original shows a plot of the data points and the fitted straight line; only the caption is reproducible here.]

Table 5. Simulated Data from a Radioactive Decay.

t (sec)   ln R   ∆(ln R)
  1.0     10.2     0.2
  2.0      9.6     0.2
  3.0      8.2     0.2
  4.0      5.0     0.2
  5.0      6.2     0.2
  6.0      4.8     0.2
  7.0      4.2     0.2
  8.0      2.8     0.2
  9.0      2.2     0.2
 10.0      3.0     0.2
 11.0     −0.6     0.2
 12.0     −1.2     0.2

Run the program using this data and use the output to find the lifetime (T) of the radioactive source. Compare your "experimentally" determined lifetime with the "actual" value of T = 1.0 seconds.

6b. Goodness-of-Fit Test. Use the computed value of chi-squared for the fit to find a value for the confidence level using the χ² table. Based on the computed residuals, are there any very "bad" data points, "bad" in the sense of having residuals large compared to the measurement uncertainty? Redo the fit with these "bad" data points removed. How well does the computed lifetime T now agree with the expected value? How good is the fit with the "bad" data points removed, based on the chi-squared table?

Acknowledgments

Preparation of this module was supported in part by the National Science Foundation, Division of Science Education Development and Research, through Grant #SED 74-20088 to Michigan State University.

A. Fortran, Basic, C++ Programs

All programs are at

http://www.physnet.org/home/modules/support_programs

which can be navigated to from the home page at

http://www.physnet.org

by following the links: → modules → support programs. The programs are:

m359p1f.for, Fortran;
m359p1b.bas, Basic;
m359p1c.cpp, C++;
lib351.h, needed Library for C++ program.
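As a cross-check on the project of Sect. 6, the same straight-line machinery applies to the simulated decay data, fitting ln R = α1 + α2·t with σ = 0.2 and extracting T = −1/α2. The sketch below is again Python rather than the module's supplied programs; the 3σ cut for flagging candidate "bad" points, and the closed-form confidence level (exact for an even number of degrees of freedom, standing in for the printed χ² table), are choices made here for illustration:

```python
# Fit ln R = a1 + a2*t to the simulated decay data of Table 5
# (sigma = 0.2 for every point), extract the lifetime T = -1/a2,
# and compute chi-squared and a confidence level for Sect. 6b.
import math

t = list(range(1, 13))
lnR = [10.2, 9.6, 8.2, 5.0, 6.2, 4.8, 4.2, 2.8, 2.2, 3.0, -0.6, -1.2]
sigma = 0.2
n = len(t)

st, sy = sum(t), sum(lnR)
stt = sum(ti * ti for ti in t)
sty = sum(ti * yi for ti, yi in zip(t, lnR))
delta = n * stt - st * st

a1 = (stt * sy - st * sty) / delta     # intercept = ln R0
a2 = (n * sty - st * sy) / delta       # slope = -1/T
da2 = sigma * math.sqrt(n / delta)

T = -1.0 / a2                          # lifetime in seconds
dT = da2 / a2 ** 2                     # propagated error on T

residuals = [yi - (a1 + a2 * ti) for ti, yi in zip(t, lnR)]
chi2 = sum((r / sigma) ** 2 for r in residuals)
dof = n - 2                            # 12 points minus 2 fitted parameters

def confidence_level(chi2, dof):
    """P(chi-squared >= chi2); exact closed form for even dof."""
    assert dof % 2 == 0
    half = chi2 / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(dof // 2))

cl = confidence_level(chi2, dof)

# Residuals much larger than sigma mark candidate "bad" points:
bad = [ti for ti, r in zip(t, residuals) if abs(r) > 3 * sigma]

print(f"T = {T:.3f} +/- {dT:.3f} s")
print(f"chi2 = {chi2:.1f} for {dof} d.o.f., confidence level ~ {cl:.1e}")
print("suspect points at t =", bad)
```

With all 12 points the fit gives T ≈ 1.03 s, but a very large chi-squared (hence a negligible confidence level), with the largest residuals at t = 4 and t = 10; this is exactly the situation Sect. 6b asks you to diagnose and repair by refitting without the "bad" points.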

MODEL EXAM

INSTRUCTIONS TO GRADER: If the student has submitted copies rather than originals of the computer output, state that on the exam answer sheet, immediately stop grading the exam, and give it a grade of zero.

Examinee: On your computer output sheet(s): (i) mark page numbers in the upper right corners of all sheets; (ii) label all output, including all axes on all graphs. On your Exam Answer Sheet(s), for each of the following parts of items (below this box), show: (i) a reference to your annotated output; and (ii) a blank area for grader comments. When finished, staple together your sheets as usual, but include the original of your annotated output sheets just behind the Exam Answer Sheet.

1-3. See Output Skills K1-K3.

4. Submit your hand-annotated output showing:
   a. the data values;
   b. the residuals and chi-squared for the fit;
   c. a graph of the data values and the fitted curve;
   d. the deduced lifetime;
   e. the standard deviation for the deduced lifetime.

5. Submit your determination of:

   a. the confidence level for the fit;
   b. whether there are any "bad" data;
   c. a refit omitting data hypothesized to be "bad";
   d. a comparison of the expected lifetime with the fitted values, with and without the "bad" data;
   e. a comparison of the confidence levels of the fits, with and without the "bad" data.
