Bayesian Wavelet Regression on Curves with Application to A

Bayesian WaveletRegression on Curves With Application toa Spectroscopic Calibration Problem P. J. Brown, T. Fearn, and M. Vannucci Motivatedby calibration problems in near-infrared (N IR)spectroscopy,we considerthe linear regression setting in which the many predictorvariables arise fromsampling an essentially continuous curve at equallyspaced pointsand there may bemultiple predictands. We tackle thisregression problem by calculating the wavelet transformsof the discretized curves,then applying a Bayesian variable selectionmethod using mixture priors to the multivariate regression of predictands on wavelet coefcients. For prediction purposes, we average overa set oflikely models. Applied to a particularproblem in N IRspectroscopy,this approach was able to ndsubsets ofthe wavelet coefcients withoverall better predictive performance thanthe more usualapproaches. I ntheapplication, the available predictorsare measurements ofthe N IRreectance spectrum ofbiscuit dough pieces at 256equally spaced wavelengths.The aim is topredict the composition (i.e., the fat, our,sugar, and water content)of the dough pieces usingthe spectral variables.Thus we have amultivariateregression of four predictands on 256 predictors with quite high intercorrelation among the predictors. A trainingset of 39samples is availableto tthisregression. Applying a wavelet transformreplaces the256 measurements oneach spectrum with256 wavelet coefcients thatcarry thesame information.The variable selection method could use subsetsof these coefcients thatgave goodpredictions for all fourcompositional variables on aseparate test set ofsamples. Selectingin the wavelet domainrather thanfrom theoriginal spectral variablesis appealingin this application, because asinglewavelet coefcient can carry informationfrom a bandof wavelengthsin theoriginal spectrum. This band can benarrowor wide,depending on thescale ofthewavelet selected. KEY WORDS:Markov chain Monte Carlo; Mixture prior; Model averaging; Multivariate regression; Near-infrared spectroscopy; Variable selection. 1. INTRODUCTION measurethe composition of biscuit dough pieces (formed but unbakedbiscuits), for possibleon-line implementation. (For a Thisarticle presents a newway of tacklinglinear regression problemsin whichthe predictor variables arise from sampling fulldescription of the experiment, see Osborne, Fearn, Miller, anessentially continuous curve at equally spaced points. The andDouglas 1984.) Brie y, two similar sample sets were workwas motivatedby calibration problems in near-infrared madeup, with the standard recipe varied to provide a large (NIR)spectroscopy,of which the following example is typical. rangefor eachof the four constituents under investigation: fat,sucrose, dry our,and water. The calculated percentages 1.1 Near-Infrared Spectroscopy of Biscuit Doughs ofthesefour ingredients represent the q 4responses.There D were n 39samples in the calibration or training set, with QuantitativeN IRspectroscopyis used to analyze such D diversematerials as food and drink, pharmaceutical prod- sample23 excluded from theoriginal 40 as an outlier, and a further m 39in the separate prediction or validation set, ucts,and petrochemicals. The N IRspectrumof a sampleof, D say,wheat ouris a continuouscurve measured by modern againafter one outlier was excluded.Thus Y and Yf , the scanninginstruments at hundreds of equally spaced wave- matricesof compositional data for thetraining and validation sets,are both of dimension39 4. lengths.The information contained in this curve can be used topredict the chemical composition of the sample. The prob- AnN IRreectance spectrum is available for eachdough lemlies in extracting the relevant information from possibly piece.The original spectral data consist of 700 points mea- thousandsof overlapping peaks. Osborne, Fearn, and Hindle suredfrom 1100to 2498 nanometers (nm) insteps of 2 nm. (1993)described applications in food analysis and reviewed For ouranalyses using wavelets, we havechosen to reduce the someof thestandard approaches to the calibration problem. numberof spectralpoints to savecomputational time. The rst Theexample studied in detail here arises from anexper- 140and last 49 wavelengths, which were thoughtto contain imentdone to test the feasibility of N IRspectroscopyto littleuseful information, were removed,leaving a wavelength rangefrom 1380nm to 2400 nm, over which we tookevery otherpoint, thus increasing the gap to 4 nmand reducing the P.J.Brownis Pzer Professorof Medical Statistics, I nstituteof Math- numberof points to p 256.The matrices X and X of spec- ematics andStatistics, University of Kent, Canterbury, Kent, CT2 7NF, D f U.K.(E-mail: [email protected] ).T.Fearnis Professorof Applied traldata are then 39 256.Samples of three centered spectra Statistics,Department ofStatistical Science, UniversityCollege London, aregiven on the left side of Figure1. WC1E6BT, U.K. (E-mail: [email protected] ).M.Vannucciis Assistant Professorof Statistics, Department ofStatistics, Texas A&M University, Theaim is to derive an equation that will predict the CollegeStation, TX 77843(E-mail: [email protected] ). This work responsevalues Y from thespectral data X for futuresam- was supportedby theU.K. Engineering and Physical Sciences Research Coun- cil underthe Stochastic Modelling in Science andTechnology I nitiative,grant ples where Y isunknown but X canbe measuredcheaply and GK/K73343.M. Vannuccialso acknowledges support from the Texas Higher rapidly. AdvancedResearch Program,grant 010366-0075 from Texas A&M Inter- nationalResearch TravelAssistant Programand from the National Science FoundationCAREER award DMS-0093208.The spectroscopic calibration problemwas providedby theFlour Milling and Baking Research Association. Theauthors thank the associate editorand a referee, as well as MikeW est, © 2001 American StatisticalAssociation DukeUniversity, and Adrian Raftery, University of W ashington,for sugges- Journal of theAmerican StatisticalAssociation tionsthat helped improve the article. June 2001, Vol.96, No.454, Applicationsand Case Studies 398 Brown, Fearn and Vannucci:Bayesian WaveletRegression on Curves 399 – 0.06 0.5 e – 0.08 0 c n T a t c – 0.1 W – 0.5 e l D f e R – 0.12 – 1 – 0.14 – 1.5 1400 1600 1800 2000 2200 2400 0 50 100 150 200 250 0.1 1 e c n 0.5 T a t c 0.05 W e l D f e 0 R 0 – 0.5 1400 1600 1800 2000 2200 2400 0 50 100 150 200 250 0.08 e 0.06 0.4 c n T a t c 0.04 W 0.2 e l D f e R 0.02 0 0 – 0.2 1400 1600 1800 2000 2200 2400 0 50 100 150 200 250 Wavelength Wavelet coefficient Figure1. Original Spectra(left column)and Wavelet Transforms (right column). 1.2 StandardAnalyses 1.3 SelectingWavelet Coef’ cients Themost commonly used approaches to this calibration Theapproach that we investigatehere involves selecting problemregress Y on X,withthe linear form variables,but we selectfrom derivedvariables. The idea is totransform each spectrum into a setof wavelet coef cients, thewhole of whichwould suf ce to reconstructthe spectrum, Y XB E D C andselect good predictors from amongthese. There are good reasonsfor thinkingthat this approach might have advantages beingjusti ed either by appeals to the Beer– Lambert law overselecting from theoriginal variables. (Osborneet al. 1993) or on the grounds that it worksin prac- Inpreviouswork (Brown, Fearn, and V annucci1999; tice.I nSection6 we alsoinvestigate logistic transformations Brown,V annucci,and Fearn 1998a,b) we exploredBayesian ofthe responses, showing that overall their impact on the pre- approachesto the problem of selecting predictor variables in dictionperformance of the model is not bene cial. thismultivariate regression context. W eapplythis methodol- Theproblem is not straightforward, because there are many ogyto wavelet selection here. morepredictor variables (256) than training samples (39) in Weknowthat wavelets can be used successfully for com- ourexample. The most commonly used methods for overcom- pressionof curveslike the spectra in ourexample, in thesense ingthis dif culty fall into two broad classes: variable selection thatthe curves can be accuratelyreconstructed from afraction andfactor-based methods. When scanning N IRinstruments ofthe full set of wavelet coef cients (Trygg and W old1998; rst appeared,the standard approach was toselect (typically Walczakand Massart 1997). Furthermore, the wavelet decom- positionof the curve is a localone, so that if the information usinga stepwiseprocedure) predictors at a smallnumber of relevantto our prediction problem is contained in a particular wavelengthsand use multiple linear regression with this subset partor partsof the curve, as ittypically is, then this informa- (Hrushka1987). Later, this approach was largelysuperseded tionwill be carried by a verysmall number of wavelet coef- bymethods that reduce the p spectralvariables to scores on cients.Thus we mayexpect selection to work. The ability amuchsmaller number of factors and then regress on these ofwavelets to model the curve at different levels of resolu- scores.T wovariants— principal components regression (PCR; tiongives us theoption of selectingfrom ourcurve at a range Coweand McNicol 1985) and partial least squares regres- ofbandwidths. I nsomesituations it may be advantageous to sion(PLS; Wold,Martens, and W old1983)— are now widely selecta sharpband, as we dowhenwe selectone of theorigi- used,with equal effectiveness, as thestandard approaches. The nalvariables; in other situations a broadband, averaging over increasingpower of computershas triggered renewed research manyadjacent

Load more