
Chemometrics in Spectroscopy

Linearity in Calibration: How to Test for Non-linearity

Previous methods for linearity testing discussed in this series contain certain shortcomings. In this installment, the authors describe a method they believe is superior to others.

Howard Mark and Jerome Workman Jr.

In the previous installment of "Chemometrics in Spectroscopy" (1), we promised we would present a description of what we believe is the best way to test for linearity (or non-linearity, depending upon your point of view). In the first three installments of this column series (1–3) we examined the Durbin–Watson (DW) statistic along with other methods of testing for non-linearity. We found that while the Durbin–Watson statistic is a step in the right direction, it has shortcomings, including the fact that it can be fooled by data with the right (or wrong!) characteristics. The method we present here is mathematically sound, more amenable to statistical validity testing, based upon well-known mathematical principles, has much higher statistical power than DW, and can distinguish different types of non-linearity from one another. This new method also has been described recently in the literature (4). But let us begin by discussing what we want to test.

The FDA/ICH guidelines, starting from a univariate perspective, consider the relationship between the actual analyte concentration and what they generically call the "test result," a term that is independent of the technology used to ascertain the analyte concentration. This term therefore holds good for every analytical methodology, from manual wet chemistry to the latest high-tech instrument. In the end, even the latest instrumental methods have to produce a number representing the final answer for that instrument's quantitative assessment of the concentration, and that is the test result from that instrument. This is a univariate concept to be sure, but it is the same concept that applies to all other analytical methods. Things may change in the future, but this is currently the way analytical results are reported and evaluated. So the question to be answered is, for any given method of analysis: Is the relationship between the instrument readings (test results) and the actual concentration linear?

This method of determining non-linearity can be viewed from a number of different perspectives, and can be considered as coming from several sources. One way to view it is as having a pedigree as a method of numerical analysis (5). Our new method of determining non-linearity (or showing linearity) also is related to our discussion of derivatives, particularly when using the Savitzky–Golay method of convolution functions, as we discussed recently (6). This last is not very surprising, once you consider that the Savitzky–Golay convolution functions also are (ultimately) derived from considerations of numerical analysis.

In some ways it also bears a resemblance to the method of assessing linearity that the FDA and ICH guidelines currently recommend: fitting a straight line to the data and assessing the goodness of the fit. As we have shown (2, 3), based upon the work of Anscombe (7), the currently recommended method for assessing linearity is faulty because it cannot distinguish linear from non-linear data, nor can it distinguish between non-linearity and other types of defects in the data. But an extension of that method can.

Jerome Workman Jr. serves on the Editorial Advisory Board of Spectroscopy and is director of research, technology, and applications development for the Molecular Spectroscopy & Microanalysis division of Thermo Electron Corp. He can be reached by e-mail at: [email protected]. Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics (Suffern, NY). He can be reached via e-mail at: [email protected].

Expanding a Definition
In our recent column we proposed a definition of linearity (2). We defined linearity as "The property of data comparing test results to actual concentrations, such that a straight line provides as good a fit (using the least-squares criterion) as any other mathematical function." This almost seems to be the same as the FDA/ICH approach, which we have just discredited. But there is a difference. The difference is the question of

Spectroscopy 20(9), September 2005. www.spectroscopyonline.com

Table I. The results of applying the new method of detecting non-linearity to Anscombe's data sets (linear and non-linear).

                        Using only linear term        Using square term as well
Parameter            Coefficient     t-value        Coefficient     t-value

Results for non-linear data
Constant                3.000           --             4.268           --
Linear term             0.500          4.24            0.5000        3135.5
Square term              --             --            -0.1267       -2219.2
SEE                     1.237           --             0.0017          --
R                       0.816           --             1.0             --

Results for normal data
Constant                3.000           --             3.316           --
Linear term             0.500          4.24            0.500           4.1
Square term              --             --            -0.0316         -0.729
SEE                     1.237           --             1.27            --
R                       0.816           --             0.8291          --

fitting other possible functions to the data; the FDA/ICH guidelines only specify trying to fit a straight line to the data. This also is more in line with our own proposed definition of linearity. We can try to fit functions other than a straight line to the data, and if we cannot obtain an improved fit, we can conclude that the data is linear.

But it also is possible to fit other functions to a set of data using least-squares mathematics. In fact, this is what the Savitzky–Golay method does. The Savitzky–Golay algorithm, however, does a whole bunch of things and lumps them all together in a single set of convolution coefficients: it includes smoothing, differentiation, curve-fitting of various degrees, and least-squares calculations; it does not include interpolation (although it could); and it finally combines all those operations into a single set of numbers that you can multiply your measured data by to get the desired final answer directly.

For our purposes, though, we don't want to lump all those operations together. Rather, we want to separate them and retain only those operations that are useful for our own purposes. For starters, we discard the smoothing, the derivatives, and the performing of a successive (running) fit over different portions of the data set, and keep only the curve-fitting.

Texts dealing with numerical analysis tell us what to do and how to do it. Many texts exist dealing with this subject, but we will follow the presentation of Arden (5). Arden points out and discusses in detail many applications of numerical analysis: fitting data, determining derivatives and integrals, interpolation (and extrapolation), solving systems of equations, and solving differential equations. These methods are all based on using a Taylor series to form an approximation to a function describing a set of data. The nature of the data, and the nature of the approximation considered, differ from what we are used to thinking about, however. The data is assumed to be univariate (which is why this is of interest to us here) and to follow the form of some mathematical function, although we might not know what the function is. All the applications mentioned therefore are based upon the concept that, since a function exists, our task is to estimate the nature of that function, using a Taylor series, and then evaluate the parameters of the function by imposing the condition that our approximating function must pass through all the data points available, because those data points all are described exactly by that function. Using a Taylor series implies that the approximating function that we wind up with will be a polynomial, and perhaps one of very high degree (the "degree" of a polynomial being the highest power to which the variable is raised in that polynomial). If we have chosen the wrong function, then there might be some error in the estimate of data between the known data points, but at the data points the error must be zero. A good deal of mathematical analysis goes into estimating the error that can occur between the data points.

Approximation
The concepts of interest to us are contained in Arden's book in a chapter titled "Approximation." This chapter takes a slightly different tack than the rest of the discussion, but one that goes exactly in the direction that we want to go. In this chapter, the scenario described above is changed very slightly. There is still the assumption that there is a single (univariate) mathematical system (corresponding to "analyte concentration" and "test reading"), and that there is a functional relationship between the two variables of interest, although again, the nature of the relationship might be unknown. The difference, however, is the recognition that data might have error, and therefore we no longer impose the condition that the function we arrive at must pass through every data point. We replace that criterion with a different one: the criterion we use is one that will allow us to say that the function we use to describe the data "follows" the data in some sense. While


other criteria can be used, a common criterion used for this purpose is the "least squares" principle: to find parameters for any given function that minimize the sum of the squares of the differences between the data and the corresponding points of the function.

Similarly, many different types of functions can be used. Arden discusses, for example, the use of Chebyshev polynomials, which are based upon trigonometric functions (sines and cosines). But these polynomials have a major limitation: they require the data to be collected at uniform X-intervals throughout the range of X, and real data will seldom meet that criterion. Therefore, because they also are by far the simplest to deal with, the most widely used approximating functions are simple polynomials; they also are convenient in that they are the direct result of applying Taylor's theorem, whereby Taylor's theorem produces a description of a polynomial that estimates the function being reproduced:

Y = a₀ + a₁X + a₂X² + ... + aₙXⁿ    [1]

Also, as we will see, they lead to a procedure that can be applied to data having any distribution of the X-values.

While discussing derivatives, we noted in a previous column that for certain data a polynomial can provide a better fit to that data than can a straight line (see Figure 6b of reference 8). In fact, we reproduce that Figure 6b here again as Figure 1 in this column, for ease of reference. Higher-degree polynomials might provide an even better fit, if the data requires it. Arden points this out, and also points out that, for example, in the non-approximation case (assuming exact functionality), if the underlying function is in fact itself a polynomial of degree n, then no higher-degree polynomial is needed, and in fact it is impossible to fit a higher-degree polynomial to the data: even if an attempt is made to do so, the coefficients of any higher-degree terms will be zero. For functions other than polynomials the "best" fit might not be clear, but as we will see, that will not affect our efforts here.

The mathematics of fitting a polynomial by least squares are relatively straightforward, and we present a derivation here, one that follows Arden but is rather generic, as we will see. Starting from Equation 1, we want to find coefficients (the aᵢ) that minimize the sum-squared difference between the data and the function's estimate of that data, given a set of values of X. Therefore we first form the differences:

D = Y − (a₀ + a₁X + a₂X² + ... + aₙXⁿ)    [2]

We then square those differences and sum those squares over all the sets of data (corresponding to the samples used to generate the data):

ΣD² = Σ[Y − (a₀ + a₁X + a₂X² + ... + aₙXⁿ)]²    [3]

The problem now is to find a set of values for the aᵢ that minimizes ΣD². We do this by the usual procedure of taking the derivative of ΣD² with respect to each aᵢ and setting each of those derivatives equal to zero. Note that because there are n + 1 different aᵢ we wind up with n + 1 equations, although we only show the first three of the set:

∂(ΣD²)/∂a₀ = 0    [4a]

∂(ΣD²)/∂a₁ = 0    [4b]

∂(ΣD²)/∂a₂ = 0    [4c]

Now we actually take the indicated derivative of each term and separate the summations. Noting that ∂(ΣF²) = 2ΣF ∂F (where F is the inner function of the aᵢ and X):

Σ 2[Y − (a₀ + a₁X + ... + aₙXⁿ)](−1) = 0    [5a]

Σ 2[Y − (a₀ + a₁X + ... + aₙXⁿ)](−X) = 0    [5b]

Σ 2[Y − (a₀ + a₁X + ... + aₙXⁿ)](−X²) = 0    [5c]

etc.

Dividing both sides of Equations 5 by 2 eliminates the constant term, and subtracting the term involving Y from each side of the resulting equations puts the equations in their final form:

a₀n + a₁ΣX + a₂ΣX² + ... + aₙΣXⁿ = ΣY    [6a]

a₀ΣX + a₁ΣX² + a₂ΣX³ + ... + aₙΣXⁿ⁺¹ = ΣXY    [6b]

a₀ΣX² + a₁ΣX³ + a₂ΣX⁴ + ... + aₙΣXⁿ⁺² = ΣX²Y    [6c]

etc.

The values of X and Y are known, since they constitute the data. Therefore Equations 6a–c comprise a set of n + 1 equations in n + 1 unknowns, the unknowns being the various values of the aᵢ, because the summations, once evaluated, are constants. Therefore, solving Equations 6a–c as simultaneous equations for the aᵢ results in the calculation of the coefficients that describe the polynomial (of degree n) that best fits the data.

In principle, the relationships described by Equations 6a–c could be used directly to construct a function that relates test results to sample concentrations. In practice there are some important considerations that must be taken into account. The major consideration is the possibility of correlation between the various powers of X. We find, for example, that the correlation coefficient of the integers 1–10 with their

squares is 0.974, a rather high value. Arden describes this mathematically and shows how the determinant of the matrix formed by Equations 6a–c becomes smaller and smaller as the number of terms included in Equation 1 increases, due to correlation between the various powers of X. Arden is concerned with computational issues, and his concern is that the determinant will become so small that operations such as matrix inversion will become impossible to perform because of truncation error in the computer used. Our concerns are not so severe; as we will see, we are not likely to run into such drastic problems.

Nevertheless, correlation effects still are of concern for us, for another reason. Our goal, recall, is to formulate a method of testing linearity in such a way that the results can be justified statistically. Ultimately we will want to perform statistical testing on the coefficients of the fitting function that we use. In fact, we will want to use a t-test to see whether any given coefficient is statistically significant, compared to the standard error of that coefficient. We do not need to solve the general problem, however, just as we do not need to create the general solution implied by Equation 1. In the broadest sense, Equation 1 is the basis for computing the best-fitting function to a given set of data, but that is not our goal. Our goal is only to determine whether the data represents a straight line or not. To this end it suffices only to ascertain whether the data can be fitted better by any polynomial of degree greater than 1 than it can by a straight line (which is a polynomial of degree 1). To this end we need to test a polynomial of any higher degree. While in some cases the use of more terms might be warranted, in the limit we need test only the ability to fit the data using only one term of degree greater than 1. Hence, while in general we might wish to try fitting equations of degrees 2, 3, ... m (where m is some upper limit less than n), we can begin by using polynomials of degree 2, that is, quadratic fits.

A complication arises. We learn from considerations of multiple regression analysis that when two (or more) variables are correlated, the standard error of both variables is increased over what would be obtained if equivalent but uncorrelated variables were used. This is discussed by Daniel and Wood (see page 55 in reference 9), who show that the variance of the estimates of the coefficients (their standard errors) is increased by a factor of

VIF = 1 / (1 − R²)    [7]

when there is correlation between the variables, where R represents the correlation coefficient between the variables and we use the term VIF, as is sometimes done, to mean "variance inflation factor." Thus we would like to use uncorrelated variables. Arden describes a general method for removing the correlation between the various powers of X in a polynomial, based upon the use of orthogonal Chebyshev polynomials, as we briefly mentioned above. But this method is unnecessarily complicated for our current purposes, and in any case has limitations of its own. When applied to actual data, Chebyshev and other types of orthogonal polynomials (Legendre, Jacobi, and others) that could be used will be orthogonal only if the data is uniformly, or at least symmetrically, distributed; real data will not always meet that requirement.

Because, as we shall see, we do not need to deal with the general case, we can use a simpler method to orthogonalize the variables, based upon Daniel and Wood, who showed how a variable can be transformed so that the square of that variable is uncorrelated with the variable itself. This is a matter of creating a new variable by simply calculating a quantity Z and subtracting it from each of the original values of X. A symmetric distribution of the data is not required, since that is taken into account in the formula. Z is calculated using the following expression (see page 121 in reference 9; in Appendix A we present the derivation of this formula):

Z = Σ(Xᵢ − X̄)Xᵢ² / (2 Σ(Xᵢ − X̄)²)    [8]
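As a quick numerical check of the transformation in Equation 8, the following sketch (numpy is assumed, and the function name is ours) computes Z for a deliberately asymmetric set of X values and confirms that the resulting squared variable is uncorrelated with X:

```python
import numpy as np

def orthogonalizing_z(x):
    """Equation 8: Z = sum((X - Xbar) * X**2) / (2 * sum((X - Xbar)**2)).
    Subtracting this Z from X before squaring makes (X - Z)**2
    uncorrelated with X (Daniel and Wood, reference 9)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d * x**2) / (2.0 * np.sum(d**2))

# A deliberately asymmetric set of X values
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
z = orthogonalizing_z(x)
r = np.corrcoef(x, (x - z) ** 2)[0, 1]
print(z, r)  # r is zero to within floating-point round-off
```

For data that happen to be symmetric about their mean, Z reduces to the mean itself; the point of the formula is that symmetry is not required.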

Then the set of values (X − Z)² will be uncorrelated with X, and the estimates of the coefficients will have the minimum possible variance, making them suitable for statistical testing. In Appendix A we also present formulas for making the cubes, quartics, and, by induction, higher powers of X orthogonal to the set of values of the variable itself.

Figure 1. A quadratic polynomial can provide a better fit to a non-linear function over a given region than a straight line can; in this case, the second derivative of a normal absorbance band. (The plot shows response versus wavelength, with the second-derivative band and the fitted parabola overlaid.)

In his discussion of using these approximating polynomials, Arden presents a computationally efficient method of setting up and solving the pertinent equations. But we are less concerned with abstract concepts of efficiency than we are with achieving our goal of determining linearity. To this end, we point out that Equations


6a–c, and indeed the whole derivation of them, are familiar to us, although in a different context. We all are familiar with using a relationship similar to Equation 1; in using spectroscopy to perform quantitative analysis, one of the representations of the equation involved is:

C = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + ...    [9]

which is the form we use commonly to represent the equations needed for performing quantitative spectroscopic analysis using the MLR algorithm. The various Xᵢ in Equation 9 represent entirely different variables. Nevertheless, starting from Equation 9, we can derive the set of equations for calculating the MLR calibration coefficients in exactly the same way we derived Equations 6a–c from Equation 1. An example of this derivation is presented in reference 10. Because of this parallelism we can set up the equivalencies:

a₀ = b₀
a₁ = b₁    X₁ = X
a₂ = b₂    X₂ = X²
a₃ = b₃    X₃ = X³
etc.

and we see that by replacing our usual MLR-oriented variables X₁, X₂, X₃, and so forth with X, X², X³, and so forth, respectively, we can use our common and well-understood mathematical methods (and computer programs) to perform the necessary calculations. Furthermore, along with the values of the coefficients, we can obtain all the usual statistical estimates of variances, standard errors, goodness of fit, and so on that MLR programs produce for us. Of special interest is the fact that MLR programs compute estimates of the standard errors of the coefficients, as described by Draper and Smith (see, for example, page 129 in reference 11). This allows testing the statistical significance of each of the coefficients, which, as we recall, are now the coefficients of the various powers of X that comprise the polynomial we are fitting to the data.

This is the basis of our tests for non-linearity. We need not use polynomials of high degree because our goal is not necessarily to fit the data as well as possible. Especially because we expect that well-behaved methods of chemical analysis will produce results that already are close to linearly related to the analyte concentrations, we expect non-linear terms to decrease as the degree of the fitting equation used increases. Thus we need only fit a quadratic, or at most a cubic, equation to our data to test for linearity, although there is nothing to stop us from using equations of higher degree if we choose. Data well described by a linear equation will produce a set of coefficients with a statistically significant value for the term X₁ (which is X, of course) and non-significant values for the coefficients of X² or higher degree.

Conclusion
This is the basis for our new test of linearity. It has all the advantages we described: it gives an unambiguous determination of whether any non-linearity is affecting the relationship between the test results and analyte concentration. It provides a means of distinguishing between different types of non-linearity, if they are present, because only those that have statistically significant coefficients are active. It also is more sensitive than any other statistical linearity test, including the Durbin–Watson statistic. The tables in Draper and Smith for the thresholds of the Durbin–Watson statistic only give values for more than 10 samples; as we will see shortly, however, this method of linearity testing is quite satisfactory for much smaller numbers of samples.

As an example, we applied these concepts to the Anscombe data (7). Table I shows the results of applying this to both the "normal" data (Anscombe's X1, Y1 set) and the data showing non-linearity. We computed the fit using only a straight-line (linear) fit, as was done originally by Anscombe, and we also fitted a polynomial including the quadratic term. It is interesting to compare results both ways. We find that in all four cases, the coefficient of the linear term is 0.5. In Anscombe's original paper the straight-line fit is all he did, and he obtained the same result, but this was by design: the synthetic data he generated was designed and intended to give this result for all the data sets. The fact that we obtained the same coefficient (for X) using the polynomial demonstrates that the quadratic term was indeed uncorrelated with the linear term.


The improvement in the fit from the quadratic polynomial applied to the non-linear data indicates that the square term was indeed an important factor in fitting that data. In fact, including the quadratic term gives well-nigh a perfect fit to that data set, limited only by the computer's truncation precision. The coefficient obtained for the quadratic term is comparable in magnitude to the one for the linear term, as we might expect from the amount of curvature of the line we see in Anscombe's plot (7). The coefficient of the quadratic term for the "normal" data, on the other hand, is much smaller than the one for the linear term.

As we expected, furthermore, for the "normal," linear relationship, the t-value for the quadratic term is not statistically significant. This demonstrates our contention that this method of testing linearity is indeed capable of distinguishing the two cases, in a manner that is statistically justifiable.

The performance statistics (the SEE and the correlation coefficient) show that including the square term in the fitting function for Anscombe's non-linear data set gives, as we noted above, essentially a perfect fit. It is clear that the values of the coefficients obtained are the ones he used to generate the data in the first place. The very large t-values of the coefficients indicate that we are near to having only computer round-off error operative in the difference between the data he provided and the values calculated from the polynomial that included the second-degree term.

Thus this new test also provides all the statistical tests that the current FDA/ICH test procedure recommends. It provides information as to whether, and how well, the analytical method gives a good fit of the test results to the actual concentration values. It can distinguish between different types of non-linearities, if necessary, while simultaneously evaluating the overall goodness of the fitting function. As the results from applying it to the Anscombe data show, it is eminently suited to evaluating the linearity characteristics of small data sets as well as large ones.

Appendix A: Derivation and Discussion of the Formula in Equation 8
Starting with a set of data values Xᵢ, we want to create a set of other values from these Xᵢ such that the squares of those values are uncorrelated with the Xᵢ themselves. We do this by subtracting a value Z from each of the Xᵢ, and we must find a suitable value of Z so that the set of values (Xᵢ − Z)² is uncorrelated with the Xᵢ. From the definition of the correlation coefficient, then, this means that the following must hold:

Σ[(Xᵢ − X̄)((Xᵢ − Z)² − Q̄)] / √[Σ(Xᵢ − X̄)² Σ((Xᵢ − Z)² − Q̄)²] = 0    [A1]

where Q̄ denotes the mean of the (Xᵢ − Z)² values. Multiplying both sides of Equation A1 by the denominator of the left-hand side, and noting that the term in Q̄ drops out because Σ(Xᵢ − X̄) = 0, results in the much-simplified expression:

Σ(Xᵢ − X̄)(Xᵢ − Z)² = 0    [A2]

We now need to solve this expression for Z. We begin by expanding the square term:

Σ(Xᵢ − X̄)(Xᵢ² − 2ZXᵢ + Z²) = 0    [A3]

We then multiply through and, distributing the summations and bringing the constants outside the summations, obtain:

Σ(Xᵢ − X̄)Xᵢ² − 2Z Σ(Xᵢ − X̄)Xᵢ + Z² Σ(Xᵢ − X̄) = 0    [A4]

Since

Σ(Xᵢ − X̄) = 0

the last term in Equation A4 vanishes, leaving:

Σ(Xᵢ − X̄)Xᵢ² − 2Z Σ(Xᵢ − X̄)Xᵢ = 0    [A5]

Equation A5 is now readily rearranged to solve for Z:

Z = Σ(Xᵢ − X̄)Xᵢ² / (2 Σ(Xᵢ − X̄)Xᵢ)    [A6]

Equation A6 appears to differ from the expression in Daniel and Wood (9), in that the denominator expressions differ. To show that they are equivalent, we start with the denominator term of the expression on page 121 of reference 9:

Σ(Xᵢ − X̄)²    [A7]

Again, we expand this expression:

Σ(Xᵢ² − 2XᵢX̄ + X̄²)    [A8]

and, separating and collecting terms (noting that ΣX̄² = nX̄²):

ΣXᵢ² − 2X̄ΣXᵢ + nX̄²    [A9]

Rewriting the last term in the expression, since

nX̄ = ΣXᵢ

we obtain:

ΣXᵢ² − 2X̄ΣXᵢ + X̄ΣXᵢ    [A10]

The last term in Equation A10 now cancels half of the middle term, leaving:

ΣXᵢ² − X̄ΣXᵢ    [A11]

And upon combining the summations and factoring out Xᵢ:

Σ(Xᵢ − X̄)Xᵢ    [A12]

which is thus seen to be the same as the denominator term we derived in Equation A6. QED.

By similar means we can derive expressions that will create transformations of other powers of the X-variable that make the corresponding power uncorrelated with the X-variable itself. Thus, analogously to Equation A2, if we wish to find a quantity Z₃ that will make (Xᵢ − Z₃)³ uncorrelated with X, we set up the expression:

Σ(Xᵢ − X̄)(Xᵢ − Z₃)³ = 0    [A13]

Expanding as before (the Z₃³ term vanishes, again because Σ(Xᵢ − X̄) = 0) provides the following polynomial in Z₃:

Σ(Xᵢ − X̄)Xᵢ³ − 3Z₃ Σ(Xᵢ − X̄)Xᵢ² + 3Z₃² Σ(Xᵢ − X̄)Xᵢ = 0    [A14]

Equation A14 is quadratic in Z₃ and thus, after the summations are evaluated, is easily solved through use of the quadratic formula.

Similarly, for fourth powers we set up the expression:

Σ(Xᵢ − X̄)(Xᵢ − Z₄)⁴ = 0    [A15]

which gives:

Σ(Xᵢ − X̄)Xᵢ⁴ − 4Z₄ Σ(Xᵢ − X̄)Xᵢ³ + 6Z₄² Σ(Xᵢ − X̄)Xᵢ² − 4Z₄³ Σ(Xᵢ − X̄)Xᵢ = 0    [A16]

Again, Equation A16 is cubic in Z₄ and can be solved by algebraic methods. For higher powers of the variable we can derive similar expressions. From the sixth power on, algebraic methods are no longer available to solve for the Zᵢ, but after evaluating the summations, computerized approximation methods can be used.

Thus, the contribution of any power of the X-variable to the non-linearity of the data can be tested similarly by these means. ■

References
1. H. Mark and J. Workman, Spectroscopy 20(1), 56–59 (2005).
2. H. Mark and J. Workman, Spectroscopy 20(3), 34–39 (2005).
3. H. Mark and J. Workman, Spectroscopy 20(4), 38–39 (2005).
4. H. Mark, J. Pharm. Biomed. Anal. 33, 7–20 (2003).
5. B.W. Arden, An Introduction to Digital Computing, 1st ed. (Addison-Wesley Publishing Co., Inc., Reading, MA, 1963).
6. H. Mark and J. Workman, Spectroscopy 18(12), 106–111 (2003).
7. F.J. Anscombe, Am. Stat. 27, 17–21 (1973).
8. H. Mark and J. Workman, Spectroscopy 18(9), 25–28 (2003).
9. C. Daniel and F. Wood, Fitting Equations to Data: Computer Analysis of Multifactor Data for Scientists and Engineers, 1st ed. (John Wiley & Sons, New York, 1971).
10. H. Mark, Principles and Practice of Spectroscopic Calibration (John Wiley & Sons, New York, 1991).
11. N. Draper and H. Smith, Applied Regression Analysis, 3rd ed. (John Wiley & Sons, New York, 1998).
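The higher-power transformations of the appendix can also be checked numerically. The sketch below (numpy assumed; the function name is ours) builds the polynomial in Z implied by Equations A13–A16 for any power p and solves it with a numerical root finder rather than the quadratic or cubic formulas. One caution worth noting: for even powers of X the polynomial in Z has odd degree, so a real root always exists, while for odd powers a real solution need not exist for every data set; the demonstration therefore uses the square and fourth-power cases.

```python
import numpy as np
from math import comb

def z_for_power(x, p):
    """Solve sum_i (x_i - xbar) * (x_i - Z)**p = 0 for Z (cf. Equations
    A13-A16).  The Z**p term drops out because sum(x_i - xbar) = 0, so
    the polynomial in Z has degree p - 1.  Returns the real root
    closest to the mean of x."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # Coefficient of Z**k is C(p, k) * (-1)**k * sum(d * x**(p - k))
    coefs = [comb(p, k) * (-1.0) ** k * np.sum(d * x ** (p - k))
             for k in range(p)]
    roots = np.roots(coefs[::-1])  # np.roots wants the highest power first
    real = roots[np.isclose(roots.imag, 0.0)].real
    return real[np.argmin(np.abs(real - x.mean()))]

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
for p in (2, 4):
    z = z_for_power(x, p)
    r = np.corrcoef(x, (x - z) ** p)[0, 1]
    print(p, round(z, 4), r)  # r is zero to round-off in both cases
```

For p = 2 this reproduces the closed-form Z of Equations 8 and A6 exactly, so the general root-finding route and the algebraic route agree.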