Chemometrics in Europe: I (R;Q,P)= I (Q11p) Na (2) Selected Results
Total Page:16
File Type:pdf, Size:1020Kb
Volume 93, Number 3, May-June 1988 Journal of Research of the National Bureau of Standards Accuracy in Trace Analysis [C]={[E]' [El> [E] [A] tions to the field. Obviously, this is not the place to offer a review on chemometrics, let alone one that to recalculate the concentration profiles. This pro- is restricted to a continent. cess (truncation, normalization and pseudoinverse The definition of chemometrics [I] comprises followed by pseudoinverse) was repeated until no three distinct areas characterized by the key words further refinement occurred. "optimal measurements," "maximum chemical in- The concentration profiles and spectra of the formation" and, for analytical chemistry something three unknown components of stearyl alcohol in that sounds like the synopsis of the other two: "op- carbon tetrachloride obtained in this manner were timal way [to obtain] relevant information." found to make chemical sense. This EFA procedure, unlike others, was success- ful in extracting concentration profiles from situa- Information Theory tions where one component profile was completely encompassed underneath another component Eckschlager and Stepanek [2-5] pioneered the profile. adaption and application of information theory in analytical chemistry. One of their important results gives the information gain of a quantitative deter- References mination [5] [I] Malinowski, E. R., and Howery, D. G., Factor Analysis in Chemistry, Wiley Interscience, New York (1980). [2] Gemperline, P. G., J. Chem. Inf. Comput. Sci. 24, 206 I (qI 1p)=lIn toII)= n(X 2SV2xR-en-xl) \/nA (I) (1984). [3] Vandeginste, B. G. M., Derks, W., and Kateman, G., Anal. Chim. Acta 173, 253 (1985). [4] Gampp, H., Macder, M., Meyer, C. J., and Zuberbuhler, A. where q and p are the prior and posterior distribu- D., Talanta 32, 1133 (1985). tion of the analyte concentration for the specific [5] Malinowski, F. R., J. Chemometrics 1, 33 (1987). cases of a rectangular prior distribution in (x,,x2) [6] Wold, S., Technometrics 20, 397 (1978). and a Gaussian posterior with a standard deviation s determined from /A independent results. The penalty for an inaccurate analysis is considerable and can be expressed as Chemometrics in Europe: I (r;q,p)= I (q11P)_nA (2) Selected Results Wolfhard Wegscheider with d the difference between obtained value and the true value of x. The concept has also been ex- Institute for Analytical Chemistry tended to multicomponent analysis and multi- Mikro- and Radiochemistry method characterization. In the latter case, Technical University Graz, A-8010 Graz, Austria correlations between the information provided by the different methods need to be accounted for. Chemometrics is a very international branch of Given the cost of and time needed for an analysis, science, perhaps more so than chemistry at large, information efficiency can be deduced in a straight- and it is therefore appropriate to question the suit- forward manner [2]. Recently, work was published ability of the topic to be presented. It is, however, [5] suggesting the incorporation of various rele- the author's opinion that the profile of European vance coefficients; this, indeed, is a very important chemometric research has a couple of distinct fea- step since it provides a way to single out the infor- tures that may originate more in the structure of mation that is judged to be relevant for a given the educational system than in the actual research problem. It also opens up the possibility to draw on topics. The profile as it will be presented is the one information theory for defining objective functions perceived by the author, and therefore comprises a in computer-aided optimization of laboratory pro- very subjective selection of individual contribu- cedures and instruments. 257 Volume 93, Number 3, May-June 1988 Journal of Research of the National Bureau of Standards Accuracy in Trace Analysis Optimization of Chromatographic back of this and all similar approaches is that an Separations assignment of the peaks to the analytes is required at each stage. Currently, experimental work is un- In the area of optimization, significant activity is der way to evaluate fuzzy peak tracking based on spotted in chromatography since systematic strate- peak area and mean elution order [10]. gies for tuning the selectivity are badly needed. Due to the randomness of elution, a hypothesis supported recently by Martin et al. [6], peak over- Selectivity and Multicomponent Analysis lap is the rule rather than the exception even for systems with high peak capacity. Assuming In spite of the great advances in the development equiprobable elution volumes for all analytes, the of methods at trace levels, neither current separa- authors [6] have derived from standard probability tion techniques nor the detection principles are suf- theory an expression for the probability that all an- ficiently selective. An attempt was recently made alytes are separated. An evaluation of eq (9) in to redefine selectivity in chemometric terms, since reference [6] Kaiser's well received proposal on this subject fails in two ways [11]. First, it does not give a clear account of the interrelationship between selectiv- P.nmn() (1 -I (3) ity, accuracy and precision, and second it gives a value of infinity for full selectivity, a somewhat fuzzy result. By Taylor expansion, any multicom- (m the number of components, n the peak capacity ponent calibration system can be assessed for its of the system and Pm,(m) the probability that all selectivity by the condition number of the sensitiv- components are separated) is provided in table I ity matrix, cond(K); the condition number assumes and shows that the absence of overlap is really the value of one for full selectivity and grows as very improbable except in cases of an extremely the mutual dependence of the columns of K in- high ratio of peak capacity/number of components. creases. Since the condition number can also be The necessity of a separate optimization for each shown to relate to the analytical error, it was mixture of compounds is thus the rule and not an termed "error amplification" factor [12]. It can exception. thus be regarded a prime concern in multicompo- Chromatographic optimization cannot be reli- nent analysis to find suitable means to minimize the ably accomplished, however, by direct search tech- condition number. niques like the SIMPLEX, since each change of Recently, a chemometric approach was sug- elution order corresponds to a minimum in the ob- gested by Lohninger and Varmuza [13], who jective function. For multimodal surfaces, tech- derived a linear discriminator to serve as a selec- niques that model the retention behaviour of each tive detector for PAH in GC-MS. This was accom- analyte separately and subsequently assemble a re- plished by selecting the best of 40 proposed sponse function for a grid-like pattern in the space features both in terms of discriminatory power of experimental variables seem to be most promis- against NOT-PAHs and by a rigorous sensitivity ing [7]. Lankmayr and Wegscheider [8] have sug- analysis to discard those that are unreliable in the gested to use a "moving least-squares" algorithm presence of experimental error. The authors claim known from the interpolation part of the X-ray flu- a recognition rate of 99%. Surely the concept is in orescence program NRLXRF [9] for modelling its infancy, but the chemometric detector may well elution. This results in an efficient optimization that have potential in other areas, as well. is not stranded on secondary maxima of the re- sponse surface. A flow chart of this procedure is provided in figure 1. The efficiency stems mainly Partial Least Squares and Multivariate from three things: (i) a first estimate of the opti- Design mum can be made from only two chromatographic runs under different conditions; (ii) additional data Relating sets of measurements to each other is can be incorporated to update the model of the re- frequently done by least squares techniques. sponse surface as they become available; and (iii) if Among those, the partial least squares (PLS) al- required, additional experimental variables can be gorithm as developed by Wold et al. (14] is most entered in the optimization at any stage. A draw- popular among chemometricians. As opposed to 258 Volume 93, Number 3. Mav-June 1988 Journal of Research of the National Bureau of Standards Accuracy in Trace Analysis principal component regression, another remedy in duced in analytical chemistry [18]. Its basic notion case of collinearity, PLS enhances relatively small is that crisp sets and numbers are replaced by their variance structures if they are relevant for predic- fuzzy analogues, to allow for variability on a non- tions of Y from X. This is done by using eigenvec- probabilistic basis. The type of variability is defined tors of X'YY'X instead of X'X as done in principal in a so-called membership function that can be con- components analysis [15]. The most recent applica- structed from supplemental knowledge or subjec- tion of PLS are in quantitative structure activity tive judgement by an expert in the field. By doing relation (QSAR) studies of various classes of com- this, each number manipulated also carries informa- pounds. tion on its spread. Important applications of fuzzy Examples are contained in reference [16] and theory so far have been library searching [19], pat- briefly outlined in reference [17]. For a series of tern classification [20J, and calibration with linear bradykinin-potentiating pentapeptides, a model and non-linear signal/analyte dependencies in the was derived to predict the biological activity of presence of errors in x and y [21]. In the future, the another set of peptides by describing the amino incorporation of fuzzy information in expert sys- acid sequence from principal properties of single tems will undoubtedly play a major role.