A Statistical View of Some Chemometrics Regression Tools

Ildiko E. Frank
Jerll, Inc.
Stanford, CA 94305

Jerome H. Friedman
Department of Statistics and Stanford Linear Accelerator Center
Stanford University
Stanford, CA 94305

Source: Technometrics, Vol. 35, No. 2 (May 1993), pp. 109-135
Published by: American Statistical Association and American Society for Quality
Stable URL: http://www.jstor.org/stable/1269656

© 1993 American Statistical Association and the American Society for Quality Control

Chemometrics is a field of chemistry that studies the application of statistical methods to chemical data analysis. In addition to borrowing many techniques from the statistics and engineering literatures, chemometrics itself has given rise to several new data-analytical methods. This article examines two methods commonly used in chemometrics for predictive modeling (partial least squares and principal components regression) from a statistical perspective. The goal is to try to understand their apparent successes and in what situations they can be expected to work well and to compare them with other statistical methods intended for those situations. These methods include ordinary least squares, variable subset selection, and ridge regression.

KEY WORDS: Multiple response regression; Partial least squares; Principal components regression; Ridge regression; Variable subset selection.

1. INTRODUCTION

Statistical methodology has been successfully applied to many types of chemical problems for some time. For example, experimental design techniques have had a strong impact on understanding and improving industrial chemical processes. Recently the field of chemometrics has emerged with a focus on analyzing observational data originating mostly from organic and analytical chemistry, food research, and environmental studies. These data tend to be characterized by many measured variables on each of a few observations. Often the number of such variables p greatly exceeds the observation count N. There is generally a high degree of collinearity among the variables, which are often (but not always) digitizations of analog signals.

Many of the tools employed by chemometricians are the same as those used in other fields that produce and analyze observational data and are more or less well known to statisticians. These tools include data exploration through principal components and cluster analysis, as well as modern computer graphics. Predictive modeling (regression and classification) is also an important goal in most applications. In this area, however, chemometricians have invented their own techniques based on heuristic reasoning and intuitive ideas, and there is a growing body of empirical evidence that they perform well in many situations. The most popular regression method in chemometrics is partial least squares (PLS) (H. Wold 1975) and, to a somewhat lesser extent, principal components regression (PCR) (Massy 1965). Although PLS is heavily promoted (and used) by chemometricians, it is largely unknown to statisticians. PCR is known to, but seldom recommended by, statisticians. [The Journal of Chemometrics (John Wiley) and Chemometrics and Intelligent Laboratory Systems (Elsevier) contain many articles on regression applications to chemical problems using PCR and PLS. See also Martens and Naes (1989).]

The original ideas motivating PLS and PCR were entirely heuristic, and their statistical properties remain largely a mystery. There has been some recent progress with respect to PLS (Helland 1988; Lorber, Wangen, and Kowalski 1987; Phatak, Reilly, and Penlidis 1991; Stone and Brooks 1990). The purpose of this article is to view these procedures from a statistical perspective, attempting to gain some insight as to when and why they can be expected to work well. In situations for which they do perform well, they are compared to standard statistical methodology intended for those situations. These include ordinary least squares (OLS) regression, variable subset selection (VSS) methods, and ridge regression (RR) (Hoerl and Kennard 1970). The goal is to bring all of these methods together into a common framework to attempt to shed some light on their similarities and differences. The characteristics of PLS in particular have so far eluded theoretical understanding. This has led to unsubstantiated claims concerning its performance relative to other regression procedures, such as that it makes fewer assumptions concerning the nature of the data. Simply not understanding the nature of the assumptions being made does not mean that they do not exist.

Space limitations force us to limit our discussion here to methods that so far have seen the most use in practice. There are many other suggested approaches [e.g., latent root regression (Hawkins 1973; Webster, Gunst, and Mason 1974), intermediate least squares (Frank 1987), James-Stein shrinkage (James and Stein 1961), and various Bayes and empirical Bayes methods] that, although potentially promising, have not yet seen wide applications.

1.1 Summary Conclusions

RR, PCR, and PLS are seen in Section 3 to operate in a similar fashion. Their principal goal is to shrink the solution coefficient vector away from the OLS solution toward directions in the predictor-variable space of larger sample spread. Section 3.1 provides a Bayesian motivation for this under a prior distribution that provides no information concerning the direction of the true coefficient vector; all directions are equally likely to be encountered. Shrinkage away from low spread directions is seen to control the variance of the estimate. Section 3.2 examines the relative shrinkage structure of these three methods in detail. PCR and PLS are seen to shrink more heavily away from the low spread directions than RR, which provides the optimal shrinkage (among linear estimators) for an equidirection prior. Thus PCR and PLS make the assumption that the truth is likely to have particular preferential alignments with the high spread directions of the predictor-variable (sample) distribution. A somewhat surprising result is that PLS (in addition) places increased probability mass on the true coefficient vector aligning with the Kth principal component direction, where K is the number of PLS components used, in fact expanding the OLS solution in that direction. The solutions and hence the performance of RR, PCR, and PLS tend to be quite similar in most situations, largely because they are applied to problems involving high collinearity in which variance tends to dominate the bias, especially in the directions of small predictor spread, causing all three methods to shrink heavily

axes. This leads to the assumption that the response is likely to be influenced by a few of the predictor variables but leaves unspecified which ones. It will therefore tend to work best in situations characterized by true coefficient vectors with components consisting of a very few (relatively) large (absolute) values.

Section 5 presents a simulation study comparing the performance of OLS, RR, PCR, PLS, and VSS in a variety of situations. In all of the situations studied, RR dominated the other methods, closely followed by PLS and PCR, in that order. VSS provided distinctly inferior performance to these but still considerably better than OLS, which usually performed quite badly.

Section 6 examines multiple-response regression, investigating the circumstances under which considering all of the responses together as a group might lead to better performance than a sequence of separate regressions of each response individually on the predictors. Two-block multiresponse PLS is analyzed. It is seen to bias the solution coefficient vectors away from low spread directions in the predictor variable space (as would a sequence of separate PLS regressions) but also toward directions in the predictor space that preferentially predict the high spread directions in the response-variable space. An (empirical) Bayesian motivation for this behavior is developed by considering a joint prior on all of the (true) coefficient vectors that provides information on the degree of similarity of the dependence of the responses on the predictors (through the response correlation structure) but no information as to the particular nature of those dependences. This leads to a multiple-response analog of RR that exhibits similar behavior to that of two-block PLS. The two procedures are compared in a small simulation study in which multiresponse ridge slightly outperformed two-block PLS. Surprisingly, however, neither did dramatically better than the corresponding uniresponse procedures applied separately to the individual responses, even though the situations were designed to be most favorable to the multiresponse methods.

Section 7 discusses the invariance properties of