Book Review

Review of Chemometrics: Data Analysis for the Laboratory and Chemical Plant, by Richard Brereton, Wiley (2003)

Howard Mark

Data analysis comprises more than chemometrics, statistics, or any single named discipline in seeking out the information contained in any given set of data. Happily, despite its title, Chemometrics: Data Analysis for the Laboratory and Chemical Plant recognizes that simple fact and makes serious efforts to include examples of many of the various ways in which information about the underlying behavior of a system can be extracted from data obtained from the system. This includes both standard (for example, chemometrics and [...]) and nonstandard (for example, “Manhattan Distance”) ways to view and examine data. A short summary of elementary statistics is included in an appendix. While welcome, the necessary brevity of the treatment probably makes it too compressed for a reader to actually learn statistics from that presentation.

The book is organized according to application, rather than by technique. Thus, principal components largely are described in the chapter on pattern recognition, and partial least squares (PLS) is dealt with in the chapter on calibration. Every chapter includes several problems at the end to give readers practice in applying, and to promote thought about, the principles developed during the course of the chapter.

In the introductory chapter the author discusses the various ways different disciplines (physics, [...], and engineering) approach the examination of data, and the variety of backgrounds and mathematical sophistication these disciplines bring to this study. He then goes on to introduce his own philosophy of data analysis and how to learn it, including the importance of practical application. The chapter is peppered with good advice (“Scale your variables appropriately before comparing them,” for example). To that end the author uses programs written in MATLAB and Excel, and through their use illustrates the concepts with numerous examples, both graphical and numerical. He also makes the programs and several of the data sets available to his readers on the Internet, an innovation in publishing that expands the book beyond what is contained between its covers. He concludes this chapter with a discussion and list of other resources, including both print and electronic (Internet-based) resources.

The second chapter, dealing with experimental designs, is a wonderful merging of statistical principles with more-sophisticated chemometric designs for [...]. The principles of applying distributional and numerical statistical tests to the data are explained, with numerical examples. The use of standard statistical hypothesis tests (t- and F-tests, for example) is a good example of the application of rigorous statistical principles to the analysis of chemometric algorithms and is extremely welcome in a book about chemometrics. The fact that so much material is packed into a book of finite size means, however, that some subjects are not explained in the detail needed to prevent the novice from falling into error. An example of this is the author’s discussion of cumulative normal distribution (CND) plots. The presentation and illustration of the application of this tool is virtually unique in the chemometric literature; however, guidelines for its use in distinguishing real effects from the variability of random data are absent. Cuthbert Daniel has shown that, especially for small numbers of samples, the variability due to [...] can mislead the novice into mistaking them for real effects (see C. Daniel and F. Wood, Fitting Equations to Data [Wiley, 1971] and also C. Daniel, Applications of Statistics to Industrial Experimentation [Wiley, 1976]). In the same books, Daniel presented illustrative examples based upon the CND of normal random numbers as a reference to which actual CND plots can be compared. As Brereton points out, before computers were widely available, compilations such as Daniel’s were the only way in which such information could be disseminated. Nowadays, however, because computers and appropriate software are readily available, users of these tools are well advised to create their own comparison plots by applying the tool (the CND plot, in this case) to known-random, known-normal data. Brereton initially applies these tests to simple, statistically based designs. Then more complex and sophisticated experimental designs (for example, mixture designs, response-surface designs, and so forth) are introduced, and the analysis based upon statistical principles is continued as long as reasonably possible. Other characteristics of designs, such as the leverage of the design points and how the overall design affects them, are also treated.

The third chapter, “Signal Processing,” deals with methods of analyzing sequential signals. After a brief introductory passage presenting some of the common application areas, the chapter again starts with elementary methods (smoothing, curve fitting), and expands its coverage to discussions of the effects of noise and digitization. It then goes into descriptions of some of the more complex and sophisticated techniques (Savitzky-Golay methods, Fourier transforms, and so forth) and ends with discussions of the latest cutting-edge methodologies, which still are being examined largely by the research community. Examples are wavelet transforms, maximum entropy, and Bayesian methods.

The fourth chapter, “Pattern Recognition,” again follows the overall architecture of the book by starting with descriptive material and elementary methods (in this case “naked-eye” methods such as dendrograms). The chapter then goes into a fairly extensive examination of principal components, illustrating it with an application to spectra of chromatographic effluent having overlapping peaks and using the application of principal components to the chromatographic data to separate the peaks.

The fifth chapter, “Calibration,” continues the pattern by starting with a one-variable example, presenting both the “classical” and “inverse” methods. Here again, as in previous chapters, secondary considerations are discussed as well as the main subject. The author describes and discusses effects of intercept, centering, and the locations of errors in the two cases. The concepts then are expanded to use multiple regression and principal component analysis (PCA) as multivariate calibration methods. Following this is an extensive discussion of PLS in several of its varieties, including the extension of the discussion to more-advanced topics, particularly trilinear PLS1 (the application of the PLS algorithm to a single dependent variable at a time). This extension requires use of the next mathematical step beyond vectors and matrices: tensors. Tensor analysis requires some industrial-strength math, far beyond what chemists are used to dealing with. For this reason it is questionable whether its inclusion in this book is appropriate, even though some of the more advanced researchers are investigating the topic. The chapter winds up with a discussion of methods for validating a calibration model once it is produced.

The sixth chapter, “Evolutionary Signals,” deals with what basically is a specialized application of time-series analysis; it goes into depth about the analysis of spectroscopic detection of chromatographic effluent. Examples of the applications examined are the use of PCA and PLS to separate the various components of a sample when their chromatographic peaks are overlapped and, having separated them, to quantitate one or more of the components. The general problem is divided into multiple possible situations based upon the degree of overlap, from the easiest (simple partial overlap) to the most difficult (a small peak completely embedded under a large peak). Various auxiliary techniques are also discussed, and recommendations are made.

Appendix: There is only a single appendix, but the topics covered vary so widely that each one could have merited a separate appendix of its own. The appendix discusses the basics of vector and matrix math, the description of PCA and PLS in terms of matrices, basic statistics including hypothesis testing (and some tables of common statistics), and a discussion of how to use Excel and MATLAB for performing chemometric and other forms of data analysis. Again, inclusion of these topics is welcome, although the treatment probably makes the discussion too compressed for a reader to actually learn from the presentation. On the other hand, the topics can serve as introductions to more-extensive treatments, so that readers will be familiar with the terminology and approaches used when delving more fully into a topic of interest. They could also be valuable reviews for those readers who once knew the topics but whose knowledge has since gotten rusty.

Spectroscopy 19(3), March 2004, www.spectroscopyonline.com