Eur. J. Lipid Sci. Technol. 2010, 112, 463–475

Research Article

Chemometric analysis of combined NIR and MIR spectra to characterize French olives

Nathalie Dupuy1, Oswin Galtier1, Yveline Le Dréau1, Christian Pinatel2, Jacky Kister1 and Jacques Artaud1

1 CNRS UMR6263, ISM2, Équipe AD2EM, Groupe Systèmes Chimiques Complexes, Case 451, Université Paul Cézanne, Marseille, France
2 Centre Technique de l'Olivier (CTO), Maison des Agriculteurs, Aix-en-Provence, France

Chemometric treatment of combined near-infrared (NIR) and mid-infrared (MIR) spectra was used, firstly, to predict the oil and water contents of fresh olive fruit samples (n = 223) and, secondly, to classify these samples into five principal French cultivar origins (Aglandau, Cailletier, Olivière, Salonenque, and Tanche). The study was carried out over four crop years (2005/2006 to 2008/2009) to take seasonal variations into account. The results obtained in the combined range (REP = 2.6% for the water content and 3.5% for the oil content) show a clear advantage over the NIR and MIR techniques used separately. Fresh olive fruit cultivars were satisfactorily classified with the partial least squares-discriminant analysis (PLS-DA) method in the combined range. After K-means clustering of the PLS-DA scores, all the samples were correctly assigned to their five groups of origin. The use of combined infrared spectra considerably improves the estimation of olive fruit quality (oil and water contents, varietal origin).

Keywords: Cultivar / Oil content / Olive fruit / Partial least squares-discriminant analysis / Water content
Received: September 4, 2009; accepted: January 4, 2010
DOI: 10.1002/ejlt.200900198

Correspondence: Dr. Yveline Le Dréau, CNRS UMR6263, ISM2, Équipe AD2EM, Groupe Systèmes Chimiques Complexes, Case 451, Université Paul Cézanne, 13397 Marseille Cedex 20, France
E-mail: [email protected]
Fax: +33-4-91289152

Abbreviations: NIR, near-infrared; MIR, mid-infrared; PCA, principal component analyses; PLS, partial least squares; REP, relative error of prediction; SEC, standard error of calibration; SOD, sum of distances

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.ejlst.com

1 Introduction

The olive tree is one of the most important crops in the Mediterranean countries. Virgin olive oil is obtained from the olive fruit solely by mechanical or physical methods that do not alter the oil. The quality of virgin olive oil is principally a function of four parameters: varietal origin, geographic origin, olive fruit quality, and extraction process. The oil content of olive fruit has been shown to vary widely between cultivars [1]. The production of good-quality virgin olive oil must start with raw materials that meet well-defined quality standards [2]. In laboratories, oil and water contents are determined by traditional analytical methods: drying the sample to evaluate the water content, and Soxhlet extraction to evaluate the oil content. These methods are time-consuming, expensive, and impractical for processing large numbers of samples. Although NMR has widely replaced these procedures, the samples still have to be dried to constant mass before analysis [3], and NMR is inconvenient and too expensive for on-line application.

Near-infrared (NIR) and mid-infrared (MIR) spectroscopy associated with chemometric analyses should be able to advantageously replace the traditional analytical methods for determining these parameters, because no complex sample preparation is required and spectral acquisition is relatively fast. Moreover, several parameters can be quantified from a single infrared spectrum. The determination of the oil (fat) and moisture contents of olive pastes [4–6] or olive pomace [7–9] by NIR spectroscopy and Raman spectroscopy [8] has been reported. Recently, Leon et al. [10] determined the oil and moisture contents of intact frozen olive fruits of three cultivars, among them Frantoio, with a VIS/NIR analysis system in reflectance mode. Partial least squares (PLS) regression was used for each cultivar and each harvest year. For the samples of the three cultivars, the relative error of prediction (REP) for the oil and moisture contents varied with the harvest year: for the years 1996 and 1997, the estimated REP values for oil were about
8.5 and 16.9% and for moisture about 3.7 and 5.5%, respectively.

The number of olive cultivars in the world is estimated at 2000, and there are about 200 cultivars in France [11]. Each of the principal French cultivars is grown in a specific area, outside of which its production is disappointing. Some of the principal French olive cultivars are Aglandau, Cailletier, Olivière, Salonenque, and Tanche. A cultivar is identified using various morphological descriptors (tree, leaf, fruit, and stone). However, on the basis of these criteria, only highly specialized experts are able to differentiate strictly between all the cultivated varieties. Currently, one major problem in the food industry is the need for objective tools to determine the origin of raw materials as well as finished products, so that products can be traced from the producer to the consumer. The origin and authenticity of virgin olive oils have been the subject of many studies in the past few years [12–14]. However, to our knowledge, little has been done to determine the origin of raw materials beyond morphological characterization [15], and a literature search revealed no studies on the determination of the varietal origin of olive fruits by infrared spectroscopy.

The aim of this study was to show the advantages of combined NIR and MIR spectroscopy associated with chemometric treatment as a direct and rapid test method, firstly to predict the water and oil contents of olives and secondly to recognize the variety of olive fruits. The results were compared with those obtained independently in the NIR or MIR range.

2 Materials and methods

2.1 Olive fruit samples

Fresh olive fruit samples were obtained from the Centre Technique de l'Olivier (CTO). A total of 223 batches of 50 olive fruits were collected during four successive crops (2005/2006, 2006/2007, 2007/2008, and 2008/2009) to take seasonal variations into account.

For the prediction of the water and oil contents, only olive fruits collected during the first three successive crops (2005/2006, 2006/2007, and 2007/2008) were taken into account. In total, 178 samples were used (125 in the calibration set and 53 in the prediction set): Aglandau (n = 67), Cailletier (n = 31), Olivière (n = 19), Salonenque (n = 27), and Tanche (n = 34).

For the prediction of varietal origin, 223 samples were used: Aglandau (n = 88), Cailletier (n = 34), Olivière (n = 29), Salonenque (n = 34), and Tanche (n = 38), with 128 in the calibration set and 95 in the prediction set.

The olive fruit samples differed in cultivar origin, crop year, degree of ripening, and growing area. All the fresh olive fruits were analyzed directly on receipt in the laboratory.

2.2 Reference analysis

A 1.5 kg sample of fresh olive fruits was milled in a hammer mill. Then, 45 g of the milled fruit was dried in a forced-air oven at 104 °C for 24 h; the weight loss gave the percentage of water and volatile matter in the sample.

For oil content determination, the dried sample was transferred into a glass tube 40 mm in diameter and analyzed with a time-domain NMR instrument (Minispec, Bruker) following the AOCS procedure [16].

2.3 NIR spectroscopy

Fresh olive fruit samples were pitted prior to measurement on a Nicolet Antaris spectrometer interfaced to a PC and equipped with an InGaAs detector, an H2 NIR source, and a CaF2/germanium beam splitter. The spectrometer was placed in an air-conditioned room (21 °C). Fourier transform (FT)-NIR spectra were recorded by collecting the NIR energy scattered from the surface of the olive fruit, using an integrating sphere in diffuse reflectance mode. Each spectrum shown is the mean of 25 spectra obtained on 25 different olive fruits from the same batch, placed in an autosampler. All spectra were acquired at 8 cm−1 resolution, with 16 scan accumulations between 4000 and 10 000 cm−1, using the Result Integration 2.1 software (Thermo Nicolet). A background spectrum was collected under the same conditions before each batch measurement.

2.4 MIR spectra

Spectra of each olive sample were recorded from 4000 to 700 cm−1, with 4 cm−1 resolution and 100 scans, on a Nicolet Avatar spectrometer equipped with a DTGS detector, an Ever-Glo source, and a KBr/germanium beam splitter. The spectrometer was placed in the same air-conditioned room. Each spectrum is the mean of five spectra obtained on pieces of five different olive fruits from the same batch, placed without preparation on a single-bounce attenuated total reflectance (SB-ATR) cell fitted with a diamond crystal. Air was taken as the reference for the background spectrum before each sample. Between spectra, the ATR plate was cleaned in situ by scrubbing with ethanol solution and then allowed to dry. Cleanliness was verified by collecting a background spectrum and comparing it with the previous background spectrum.

2.5 PLS regression

PLS is a supervised analysis based on the relation between the signal intensity and the characteristics of the sample [17]. Interference and overlapping of the spectral information may be overcome by using a powerful multicomponent analysis such as PLS regression. PLS allows a sophisticated statistical approach using a spectral region rather
than unique and isolated analytical bands [18, 19]. The algorithm is based on the ability to mathematically correlate spectral data with a property matrix of interest while simultaneously accounting for all other significant spectral factors that perturb the spectrum. It is thus a multivariate regression method that uses a selected spectral region and is based on latent variables.

To construct a model, the first step is to perform a calibration. This involves collecting a calibration set of reference samples that should contain all the chemical and physical variations expected in the unknown samples to be predicted later. The purpose of this calibration is to establish a linear regression between the NIR or MIR spectral data and the various parameters of the sample set [volatile compounds (mostly water), lipid content, or varietal origin]. The Jack-Knife technique [20] was used to fix the required number of latent variables for model construction: cross-validation was applied in the regression, so the optimal number of latent variables was determined from the prediction of samples kept out of the individual models. The second step is to validate the model using a prediction set (different from the calibration set), i.e. to compare the values obtained by the model with the values obtained by the reference method.

2.6 Multivariate calibration

The calibration performance is evaluated by computing the standard error of calibration (SEC), which compares the reference values with the values computed by the model:

$$\mathrm{SEC} = \sqrt{\frac{\sum_{i=1}^{N}\left(C_i - C'_i\right)^2}{N - 1 - p}} \qquad (1)$$

where $C_i$ is the known value, $C'_i$ the calculated value, $N$ the number of samples, and $p$ the number of independent variables in the regression, optimized by cross-validation.

The standard error of prediction (SEP) estimates the prediction performance during the validation step of the calibration equation:

$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{M}\left(C_i - C'_i\right)^2}{M}} \qquad (2)$$

where $C_i$ is the known value, $C'_i$ the value calculated by the calibration equation, and $M$ the number of samples in the prediction set.

Another useful parameter is the REP, which shows the predictive ability of the model and is calculated as follows:

$$\mathrm{REP} = 100 \times \frac{\mathrm{SEP}}{\bar{y}} \qquad (3)$$

where $\bar{y}$ is the mean of the observed values.

The predictive ability of the model is also expressed by the bias and the correlation coefficient R2. As specified in the documentation of the software used, the bias is the systematic difference between predicted and measured values and is computed as the average value of the residuals. The residual measures the variation that is not taken into account by the model; for a given sample and a given variable, it is computed as the difference between the observed value and the fitted (or projected or predicted) value of the variable for that sample.

2.7 PLS-discriminant analysis regression

PLS regression is not per se suited to pattern recognition problems such as classification. However, this technique can be adapted for classification [21–23], giving rise to the PLS-discriminant analysis (PLS-DA) method. PLS-DA is carried out using an exclusive binary coding scheme with one bit per class. For the coding of olive fruit origin, the five cultivars were arbitrarily ordered as Aglandau, Cailletier, Olivière, Salonenque, and Tanche. Therefore, for each sample, the origin may be represented by a five-dimensional output vector with 1 at the position corresponding to its cultivar and 0 at the other positions. For instance, the samples of class 2, which belong to the Cailletier cultivar, are coded by the vector {0; 1; 0; 0; 0}. During the calibration process, the PLS-DA method is trained to compute the five "membership values," one for each class; the sample is then assigned to the class showing the highest membership value [24]. Five models were computed, one for each origin. The performance of the calibration models was estimated from the percentage of correctly classified samples (%CC):

$$\%\mathrm{CC} = \frac{N_c}{N_c + N_{ic}} \times 100 \qquad (4)$$

where $N_c$ is the number of correctly classified samples and $N_{ic}$ is the number of incorrectly classified samples [25].

2.8 Combined methods

Three-way data can be analyzed with various methods. Combined analysis [26, 27] refers to multi-way analysis. The data for combined PLS were arranged as follows:

X-Block: Olive fruit spectral data were arranged in a two-way array by taking the NIR and the MIR absorbance as
columns, yielding the X-matrix. The X-matrix data constituted the predictor set of variables.

Y-Block: The Y-block data were the set of dependent variables: water content and oil content for the first part of the study, and the origin of the olives for the second part.

2.9 K-Means

K-means is a data clustering algorithm. The samples are grouped into K clusters (a user-determined number) on the basis of a specific distance measure, so that the sum of distances between each sample and its cluster centroid is minimized. The procedure consists of the following three steps:

(i) The algorithm is initiated by creating K different clusters: the samples included in the analysis are first randomly distributed among these K clusters.
(ii) Next, the distance between each sample and its cluster centroid is calculated.
(iii) Samples are then assigned to the nearest cluster (e.g. K1 or K2) on the basis of the sample-to-centroid distance.

The number of iterations is the number of times the K-means algorithm is repeated, each time starting from random initial clusters, to obtain an optimal clustering solution. The sum of distances (SOD) is the sum of the distances of each sample to its cluster centroid, summed over all K clusters. This parameter is calculated for each set of clusters produced by a single iteration of the algorithm. The results from all iterations are compared, and the solution with the lowest SOD is retained.

2.10 Data pretreatment

Data analysis was carried out using the full spectra. Mean centering was used to enhance the smaller spectral differences by removing the information common to all spectra. The first derivative of the spectral data was computed (gap size of five data points) with the algorithm developed by Savitzky and Golay [28] to remove unwanted spectral variations such as offsets. Absorbance values were corrected using mean normalization. None of the other mathematical treatments tested (multiplicative scatter correction, second derivative, etc.) nor any other wavelength range improved the prediction accuracy of the models.

2.11 Software

The chemometric analyses were performed with The Unscrambler software, version 9.8, from CAMO (Computer Aided Modelling, Trondheim, Norway).

3 Results and discussion

3.1 NIR spectra of olive fruits

Typical NIR spectra of olive fruits are given in Fig. 1a; bands were assigned according to the literature [5, 29, 30]. The NIR spectra of all samples look similar.

The significant bands due to oil are clearly visible in the olive spectra (A-bands in Fig. 1a). The bands at 4250 and 4340 cm−1 are characteristic of the combination of the CH stretching vibrations of CH3 and CH2 with other vibrations; the two bands at 5650 and 5800 cm−1 correspond to the first overtone of the CH stretching vibrations of CH3, CH2, and CH=CH; and the bands around 8300 cm−1 correspond to the second overtone of the CH stretching vibrations of CH3, CH2, and CH=CH.

Two other bands, around 4500–5400 and 6100–7500 cm−1, are due to water (B-bands in Fig. 1a). They are assigned respectively to the combination of the OH stretching and OH bending bands (5200 cm−1), and to the first overtone of the symmetric and asymmetric OH stretching bands or the combination of the symmetric and asymmetric OH stretching bands (6900 cm−1) [31, 32]. A third band, with a very low
absorbance around 8600 cm−1, is generally considered to be a combination of the symmetric and asymmetric OH stretching and OH bending bands [32].

Figure 1. (a) NIR spectra of fresh olive fruits; (b) MIR spectra of fresh olive fruits. A: bands due to oil; B: bands due to water.

3.2 MIR spectra of olive fruits

The MIR spectra of all olive fruit samples also look similar. Typical spectra are given in Fig. 1b; bands were assigned according to the literature [33, 34]. The significant water bands are clearly visible in the olive spectra at 3400 and 1641 cm−1. The other bands are characteristic of olive oil and of the dry matter in the fruit.

3.3 Prediction (PLS) of oil content and water content

Oil and water contents can be predicted directly from olive fruit spectra, without the standard time-consuming determinations. The reference data (n = 178) on oil and water contents were obtained by the traditional procedures. Descriptive statistics (range, mean, SD) of the reference values of oil content and water content are presented by year in Table 1. These results show the range of variation of the oil and water contents over different crop years. The oil and water contents were predicted in pitted olive fruits by chemometric analysis of the NIR spectra, MIR spectra, or NIR + MIR combined spectra using the PLS algorithm. The data set was randomized without reference to the crop year; 70% of the data set (n = 125) was used for calibration and 30% (n = 53) for prediction.

Table 1. Oil and water contents in olive fruits obtained by reference procedures.

| Crop year | n | Oil content: mean (%) | Oil content: SD | Oil content: range (%) | Water content: mean (%) | Water content: SD | Water content: range (%) |
|---|---|---|---|---|---|---|---|
| 2005/2006 | 50 | 21.1 | 3.8 | 12.2–28.5 | 49.2 | 7.9 | 32.6–67.6 |
| 2006/2007 | 37 | 21.2 | 3.8 | 11.8–27.6 | 53.5 | 6.6 | 42.1–68.1 |
| 2007/2008 | 91 | 21.1 | 4.1 | 11.9–28.5 | 47.2 | 6.7 | 34.2–60.9 |
| Total | 178 | 21.1 | 3.9 | 11.8–28.5 | 51.6 | 8.4 | 32.6–68.1 |

SD, standard deviation.

3.3.1 Prediction from NIR analysis

The best results were obtained using mean normalization, with nine latent variables for the oil content model and six latent variables for the water content model (Table 2). The prediction performance is satisfactory: R is higher than 0.95, and REP is 5.8% for oil content and 3.9% for water content.

3.3.2 Prediction from MIR analysis

The best results were obtained using mean normalization and six or seven latent variables to build the models (Table 2). Compared with results obtained from NIR spectra, the
prediction performance is unsatisfactory: REP is higher than 5% for the water content and higher than 9% for the oil content.

Table 2. Prediction of oil and water contents in olive fruits by NIR, MIR, and NIR + MIR combined spectroscopy.

| Step | Parameter | Oil: NIR§ | Oil: MIR$ | Oil: NIR + MIR§ | Water: NIR§ | Water: MIR$ | Water: NIR + MIR§ |
|---|---|---|---|---|---|---|---|
| Calibration | R2 | 0.98 | 0.94 | 0.97 | 0.98 | 0.97 | 0.97 |
| | SEC | 0.85 | 1.32 | 0.68 | 1.27 | 1.94 | 1.26 |
| | Bias | 0.01 | 0.06 | 0.02 | 1.22 | 0.51 | 0.04 |
| | Latent variables | 9 | 6 | 6 | 6 | 7 | 7 |
| Prediction | R2 | 0.96 | 0.85 | 0.98 | 0.97 | 0.93 | 0.98 |
| | SEP | 1.18 | 1.86 | 0.72 | 1.98 | 2.57 | 1.28 |
| | Bias | 0.05 | 0.23 | 0.06 | 0.03 | 0.17 | 0.06 |
| | REP (%) | 5.8 | 9.1 | 3.5 | 3.9 | 5.1 | 2.6 |

§ Models with 62.5% of the data set and first-derivative spectra. $ Models with 62.5% of the data set and standard normal variate (SNV) correction. R2, correlation coefficient; SEC, standard error of calibration; SEP, standard error of prediction; REP, relative error of prediction.

Figure 2. (a) PCA obtained in the NIR range; (b) first principal component; (c) second principal component.

3.3.3 Prediction from the combined NIR and MIR spectra

To test the benefit of using the information provided by the two spectral regions, the data were combined. In this case, the variables were normalized using maximum normalization, because absorbance intensities are not the same in the two spectral ranges. This correction gives equal weight to the information from each spectral range. The results are shown in Table 2. Compared with the results obtained from the NIR or the MIR spectra alone, i.e. from the separate spectral ranges, the results obtained from the whole spectral range are better: REP is 2.6% for the water content and 3.5% for the oil content.
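The combined data arrangement of Sections 2.8 and 3.3.3 (one row per sample, NIR variables followed by MIR variables, each block scaled by maximum normalization, then mean centered as in Section 2.10) can be sketched as below. This is a minimal plain-Python illustration, not the authors' Unscrambler workflow: the helper names are hypothetical, and the sketch assumes that maximum normalization divides each spectrum by its own largest value, so that both blocks peak at 1 and carry comparable weight.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def max_normalize(spectrum: Sequence[float]) -> List[float]:
    """Scale one spectrum so that its largest value becomes 1 (assumed definition)."""
    peak = max(spectrum)
    if peak <= 0:
        raise ValueError("maximum normalization assumes a positive maximum")
    return [value / peak for value in spectrum]


def combine_blocks(nir_rows: Sequence[Sequence[float]],
                   mir_rows: Sequence[Sequence[float]]) -> Matrix:
    """Combined X-matrix: normalized NIR columns followed by normalized MIR columns."""
    if len(nir_rows) != len(mir_rows):
        raise ValueError("both blocks must list the same samples in the same order")
    return [max_normalize(nir) + max_normalize(mir)
            for nir, mir in zip(nir_rows, mir_rows)]


def mean_center(rows: Matrix) -> Matrix:
    """Subtract each column (variable) mean, removing information common to all spectra."""
    n_samples = len(rows)
    column_means = [sum(column) / n_samples for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, column_means)] for row in rows]


if __name__ == "__main__":
    # Toy absorbances (hypothetical): MIR values are ten times larger than NIR values.
    nir = [[1.0, 4.0, 2.0], [0.5, 2.5, 1.0]]
    mir = [[10.0, 40.0], [5.0, 20.0]]
    x_combined = combine_blocks(nir, mir)
    print(x_combined[0])  # [0.25, 1.0, 0.5, 0.25, 1.0]
    print(mean_center(x_combined))
```

With the toy data, the MIR absorbances are ten times larger than the NIR ones, yet after normalization both halves of each row peak at 1.0; without that step, a mean-centered PLS model would tend to be dominated by the block with the larger absolute intensities.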
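The figures of merit reported in Table 2 and in Section 3.4 follow Eqs. (1)–(4). A minimal plain-Python sketch of those formulas is given below for checking reported values; the function names are hypothetical, the toy measurements are invented for illustration, and the bias sign (observed minus predicted, following the residual definition in Section 2.6) is an assumption, because the software's exact convention is not stated.

```python
import math
from typing import Sequence


def _sum_squared_errors(known: Sequence[float], calculated: Sequence[float]) -> float:
    if len(known) != len(calculated) or not known:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((c - c_hat) ** 2 for c, c_hat in zip(known, calculated))


def sec(known, calculated, p: int) -> float:
    """Standard error of calibration, Eq. (1), with N - 1 - p degrees of freedom."""
    dof = len(known) - 1 - p
    if dof <= 0:
        raise ValueError("N - 1 - p must be positive")
    return math.sqrt(_sum_squared_errors(known, calculated) / dof)


def sep(known, calculated) -> float:
    """Standard error of prediction, Eq. (2), divided by M, the prediction-set size."""
    return math.sqrt(_sum_squared_errors(known, calculated) / len(known))


def rep(sep_value: float, observed) -> float:
    """Relative error of prediction in %, Eq. (3): SEP relative to the mean observed value."""
    return 100.0 * sep_value / (sum(observed) / len(observed))


def bias(known, calculated) -> float:
    """Average residual; residual assumed to be observed minus predicted."""
    return sum(c - c_hat for c, c_hat in zip(known, calculated)) / len(known)


def percent_correct(n_correct: int, n_incorrect: int) -> float:
    """Percentage of correctly classified samples, Eq. (4)."""
    return 100.0 * n_correct / (n_correct + n_incorrect)


if __name__ == "__main__":
    measured = [20.0, 22.0, 18.0, 24.0]   # toy reference values (% oil)
    predicted = [21.0, 21.0, 19.0, 23.0]  # toy model predictions
    s = sep(measured, predicted)          # sqrt(4 / 4) = 1.0
    print(s, rep(s, measured), bias(measured, predicted))
    # Table 3, NIR models: 34 + 7 + 13 + 11 + 16 = 81 correct, 14 false negatives.
    print(round(percent_correct(81, 14), 1))  # 85.3
```

Applied to the counts in Table 3, Eq. (4) gives 81/95 ≈ 85.3% for the NIR models, 84/95 ≈ 88.4% for the MIR models, and 94/95 ≈ 98.9% for the combined models, consistent with the 85%, 88%, and 99% quoted in Section 3.4.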

So, the use of the two spectral regions considerably improved the prediction of oil and water contents in olive fruits. In all three cases, the REP values found for the oil content are higher than those found for the water content. Similar observations can be made on the results found by Leon et al. [35] on frozen olive fruits and by several authors [4–6] on mixed olives. Furthermore, the use of combined NIR and MIR spectra of fresh whole unsorted olives is faster and leads to an improved prediction of oil content (REP = 3.5%) compared with a recent NIR study on the pulp of 287 olive batches (REP = 4.1%) [6].

Figure 3. (a) PCA obtained in the MIR range; (b) first principal component; (c) second principal component.

3.4 Prediction of olive fruit cultivar origin

Principal component analyses (PCA) were performed on each data set to study the data structure. Figures 2a and 3a show the PCA obtained in the NIR and in the
MIR ranges, respectively. All the distinct groups could be identified. The PCA obtained for the combined spectra (Fig. 4a; 700–1800 and 4500–9500 cm−1) shows a better separation of the Olivière and Aglandau groups than the PCA obtained in the NIR or MIR ranges. Examination of the principal components shows that the information used in the combined analysis is not the same as that used in each separate range, in particular in the NIR range. For the combined PCA, the first and second loadings (Fig. 4b, c) show, in the 5000–6000 cm−1 range, different variations compared with the loadings obtained in the NIR range (Fig. 2b, c): for the first loading, a significant decrease in the intensities of the bands at 5668, 5765, 5879, and 5911 cm−1, while the band at 5286 cm−1 remains practically unchanged; and for the second loading, an inversion of the bands at 5286 and 5026 cm−1, while the others keep the same intensities. The information used in the MIR range seems to be the same in the MIR analysis and in the combined one for the first principal component (Figs. 3b and 4b), and slightly different for the second one (Figs. 3c and 4c).

Figure 4. (a) PCA obtained in the NIR and MIR combined ranges; (b) first principal component; (c) second principal component.

Regarding the traceability of olive oils, the determination of the varietal origin of olive fruits is very useful for producers of protected designation of origin (PDO) olive oils. The spectra were therefore used to determine the varietal origin of the olive fruits. The data set was randomized: 55% of the data set (n = 128) was used for calibration, and the prediction set included 95 olive fruit samples (Aglandau n = 40, Cailletier n = 11, Olivière n = 14, Salonenque n = 14, and Tanche n = 16). The prediction performance of the models was estimated from the percentage of correctly classified samples.

3.4.1 Prediction from NIR spectra

The results obtained for each cultivar origin in NIR are given in Table 3. Of the samples, 85% are correctly classified; 9 samples are misclassified as false positives and 14 as false negatives (false positive: a sample classified in a class although it belongs to another class; false negative: a sample not classified in its own class). Three models give poor results, the Cailletier, Salonenque, and Aglandau models, with no more than 85% of their samples correctly classified. The model built for the Tanche variety gives the best results.

3.4.2 Prediction from MIR spectra

The results obtained in MIR for each cultivar origin are slightly better (Table 3): 88% of the samples are correctly classified, and now only six samples are wrongly predicted as false positives and 11 as false negatives. For the Olivière, Salonenque, and Tanche varieties, the results are similar to those obtained in the NIR range. The results are significantly better for the Cailletier variety.

3.4.3 Prediction from combined NIR and MIR spectra

As the combination of the two spectral ranges gives better results for the determination of the oil and water contents, it is worth testing this combination for the determination of olive fruit origin. The results obtained with the combined NIR and MIR spectra are given in Table 3: 99% of the samples are correctly classified, and there is now only one false-positive and one false-negative prediction.

Table 3. Classification matrix obtained in prediction for NIR, MIR, and NIR + MIR combined first-derivative spectra.

Prediction set: Aglandau (n = 40), Cailletier (n = 11), Olivière (n = 14), Salonenque (n = 14), Tanche (n = 16).

| Model (variety) | Number of latent variables | Correctly classified samples | Samples of other varieties assigned to this class | False-negative samples |
|---|---|---|---|---|
| NIR models | | | | |
| Aglandau | 8 | 34 | 1 + 5 | 6 |
| Cailletier | 7 | 7 | | 4 |
| Olivière | 9 | 13 | 1 + 1 | 1 |
| Salonenque | 5 | 11 | | 3 |
| Tanche | 9 | 16 | 1 | 0 |
| MIR models | | | | |
| Aglandau | 11 | 35 | 1 | 5 |
| Cailletier | 7 | 10 | | 1 |
| Olivière | 9 | 12 | 2 | 2 |
| Salonenque | 10 | 12 | 2 | 2 |
| Tanche | 9 | 15 | 1 | 1 |
| Combined models | | | | |
| Aglandau | 10 | 40 | | 0 |
| Cailletier | 9 | 11 | 1 | 0 |
| Olivière | 11 | 13 | | 1 |
| Salonenque | 8 | 14 | | 0 |
| Tanche | 10 | 16 | | 0 |

Figure 5 compares the varietal origin predictions obtained in the NIR spectral region, in the MIR region, and in the combined analysis for the Aglandau models, which give the worst results in the NIR spectral region.

Figure 5c shows that the combined model gave predicted values between 0.65 and 1.40 for the samples belonging to the class and between 0.35 and 0.60 for the samples not belonging to it. If results between 0.30 and 0.70 are considered suspicious, the combined analysis shows a marked improvement over the separate MIR or NIR analyses. In the NIR range (Fig. 5a), 23 Aglandau samples are correctly classified as Aglandau with a predicted value higher than 0.7, and 16 Aglandau samples have a suspicious predicted value between 0.3 and 0.7 with regard to their origin. Twelve samples of other cultivars (one Cailletier, one Olivière, eight Salonenque, and two Tanche) are suspicious, with a predicted value between 0.3 and 0.7, because they could be taken for the Aglandau cultivar. Among them, five Salonenque samples
have high predicted values (between 0.5 and 0.7). In the MIR range (Fig. 5b), 16 samples are correctly classified as Aglandau with a predicted value higher than 0.7, and 24 Aglandau samples have a suspicious predicted value between 0.3 and 0.7 with regard to their origin. One Cailletier sample is suspicious, with a predicted value of 0.35. In the combined range (Fig. 5c), 38 samples are correctly classified as Aglandau with a predicted value higher than 0.7, and 1 Cailletier sample and 1 Olivière sample are suspicious, with a predicted value between 0.3 and 0.5. The same remarks can be made for all cultivar models. Thus, combining NIR and MIR spectroscopy not only increases the percentage of correctly classified samples but also increases their discriminating power.

Figure 5. Prediction of the varietal origin by the Aglandau model (a) in the NIR spectral region, (b) in the MIR spectral region, and (c) in the NIR and MIR spectral regions.

The use of the combined spectral range increases the quality of the prediction, so the use of the spectral
ß 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.ejlst.com Eur. J. Lipid Sci. Technol. 2010, 112, 463–475 Chemometric analysis of combined spectra of French olives 473

Figure 6. First loading obtained with PLS-DA on the Aglandau model (a) in the MIR range, (b) in the NIR range, and (c) in the combined range. information may be different in each spectral range. Figure 6 particularly in the 5500–6200 cm1 range. This fact could shows the first loading obtained for the Aglandau model in explain the bad results obtained for the Aglandau model in each range. In the NIR range, the loadings are different from the NIR range. It was interesting to observe that the use of the combined one (Fig. 6c) and from the NIR one (Fig. 6b), two different spectral ranges changes the correlation found by

ß 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.ejlst.com 474 N. Dupuy et al. Eur. J. Lipid Sci. Technol. 2010, 112, 463–475

Figure 7. PCA on the PLS scores of Aglandau (A), Cailletier (C), Olivie`re (O), Salonenque (S), and Tanche (T) cultivars.

the PLS method between the spectral data and the predicted value. In order to obtain a supervised clustering, a K-means clustering method was applied to the predicted PLS results obtained for the five models. The Euclidean distance was used, and the model reached its optimum within 50 iterations. The K-means results were optimal for five groups, with an SOD of 2.97. Figure 7 presents the PCA obtained on the PLS scores of the five variety predictions; the samples are colored according to the results of the K-means clustering. After K-means clustering, the samples are gathered into five groups, with 100% of the samples correctly classified.

4 Conclusions

Infrared spectroscopy shows high potential for the quality control of fresh olive fruits. Two spectral regions, MIR and NIR, were analyzed. Taken separately, the two techniques provided interesting results for quantifying the water and oil contents of fresh olive fruits; however, chemometric treatment of the combined infrared spectra covering both spectral regions provides a clear advantage over either technique used alone.

Moreover, combined infrared spectroscopy allows recognition of the varietal origin of the five principal French cultivars, with 100% correct classification after application of a K-means clustering method. This work has shown the advantage of applying different analytical methods to the same matrix for complex analyses.

The authors are grateful to Carole Fusari (Centre Technique de l’Olivier, Aix-en-Provence, France) and Bernat Argilès for their technical assistance.

The authors have declared no conflict of interest.

References

[1] Uceda, M., Hermoso, M., Garcia-Ortiz, A., Jimenez, A., et al. Intraspecific variation of oil content and the characteristics of oils in olive cultivars. Acta Hortic. ISHS 1999, 474, 659–662.
[2] Gomez-Caravaca, A. M., Cerretani, L., Bendini, A., Segura-Carretero, A., et al. Effects of fly attack (Bactrocera oleae) on the phenolic profile and selected chemical parameters of olive oil. J. Agric. Food Chem. 2008, 56, 4577–4583.
[3] Del Río, C., Romero, A. M., Whole unmilled olives can be used to determine their oil content by nuclear magnetic resonance. HortTechnology 1999, 9, 675–680.
[4] Jimenez, A., Izquierdo, E., Rodriguez, F., Duenas, J. I., et al. Determination of fat and moisture in olives by near-infrared reflectance spectroscopy. Grasas Aceites 2000, 51, 311–315.
[5] Ayora-Canada, M. J., Muik, B., Garcia-Mesa, J. A., Ortega-Calderon, D., et al. Fourier-transform near-infrared spectroscopy as a tool for olive fruit classification and quantitative analysis. Spectrosc. Lett. 2005, 38, 769–785.
[6] Bendini, A., Cerretani, L., Di Virgilio, F., Belloni, P., et al. In-process monitoring in industrial olive mill by means of FT-NIR. Eur. J. Lipid Sci. Technol. 2007, 109, 498–504.
[7] Mesa, J. A. G., Fernandez, M. H., Alonso, P. C., Determination of oil and moisture in two-phases olive-pomace by using near-infrared spectroscopy. Grasas Aceites 1996, 47, 317–322.
[8] Muik, B., Lendl, B., Molina-Diaz, A., Perez-Villarejo, L., et al. Determination of oil and water contents in olive pomace using near infrared and Raman spectrometry. A comparative study. Anal. Bioanal. Chem. 2004, 379, 35–41.
[9] Sanchez, A. G., Martos, N. R., Ballesteros, E., Comparative study of various analytical techniques (NIR and NMR spectroscopies, and Soxhlet extraction) for the determination of the fat and moisture content of olives and pomace obtained from Jaen (Spain). Grasas Aceites 2005, 56, 220–227.
[10] Leon, L., Rallo, L., Garrido, A., Analisis de aceituna intacta mediante espectroscopia en el infrarrojo cercano (NIRS): Una herramienta de utilidad en programas de mejora de olivo. Grasas Aceites 2003, 54, 41–47.
[11] Moutier, M., Pinatel, C., Martre, A., Roger, J. P., et al. Identification et Caractérisation des Variétés d’Olivier Cultivées en France, Naturalia Publications, Turriers (France) 2004.
[12] Bertran, E., Blanco, M., Iturriaga, H., Maspoch, S., et al. Near infrared spectrometry and pattern recognition as screening methods for the authentification of virgin olive oils of very close geographical origins. J. Near Infrared Spec. 2000, 8, 45–52.
[13] Downey, G., McIntyre, P., Davies, A. N., Geographic classification of extra virgin olive oils from the Eastern Mediterranean by chemometric analysis of visible and near-infrared spectroscopic data. Appl. Spectrosc. 2003, 57, 158–163.
[14] Galtier, O., Dupuy, N., Le Dréau, Y., Ollivier, D., et al. Geographic origins and compositions of virgin olive oils determinated by chemometric analysis of NIR spectra. Anal. Chim. Acta 2007, 595, 136–144.
[15] Pinheiro, P. B. M., da Silva, J., Chemometric classification of olives from three Portuguese cultivars of Olea europaea L. Anal. Chim. Acta 2005, 544, 229–235.
[16] American Oil Chemists’ Society (AOCS), Official Methods and Recommended Practices of the AOCS, AOCS Press, Champaign, IL, USA 2003, Method Ca 5a-40.
[17] Fuller, M. P., Griffiths, P. R., Diffuse reflectance measurements by infrared Fourier transform spectrometry. Anal. Chem. 1978, 50, 1906–1910.
[18] Martens, H., Naes, T., Multivariate Calibration, John Wiley, Chichester, UK 1989.
[19] Liang, Y. L., Kvalheim, O. M., Robust methods for multivariate analysis – a tutorial review. Chemom. Intell. Lab. Syst. 1996, 32, 1–10.
[20] Haaland, D. M., Thomas, E. V., Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information. Anal. Chem. 1988, 60, 1193–1202.
[21] Ståhle, L., Wold, S., Partial least squares analysis with cross-validation for the two-class problem: A Monte Carlo study. J. Chemom. 1987, 1, 185–196.
[22] Vong, R., Geladi, P., Wold, S., Esbensen, K., Source contributions to ambient aerosol calculated by discriminant partial least squares regression (PLS). J. Chemom. 1988, 2, 281–286.
[23] Vandeginste, B. G. M., Massart, D. L., Buydens, L. M. C., De Jong, S., et al. Handbook of Chemometrics, Part B, Elsevier, Amsterdam, The Netherlands 1998.
[24] Kemsley, K., Discriminant analysis of high-dimensional data: A comparison of principal components analysis and partial least squares data methods. Chemom. Intell. Lab. Syst. 1996, 33, 47–61.
[25] Roussel, S., Bellon-Maurel, W., Roger, J. M., Grenier, P., Authenticating white grape must variety with classification models based on aroma sensors, FT-IR and UV spectrometry. J. Food Eng. 2003, 60, 407–419.
[26] Brás, L. P., Bernardino, S. A., Lopes, J. A., Menezes, J. C., Multiblock PLS as an approach to compare and combine NIR and MIR spectra in calibrations of soybean flour. Chemom. Intell. Lab. Syst. 2005, 75, 91–99.
[27] Ciosek, P., Brzozka, Z., Wroblewski, W., Martinelli, E., et al. Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN for ISE-based electronic tongue – Effect of supervised feature extraction. Talanta 2005, 67, 590–596.
[28] Savitzky, A., Golay, M. J. E., Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639.
[29] Hourant, P., Baeten, V., Morales, M. T., Meurens, M., et al. Oil and fat classification by selected bands of near-infrared spectroscopy. Appl. Spectrosc. 2000, 54, 1168–1174.
[30] Büning-Pfaue, H., Analysis of water in food by near infrared spectroscopy. Food Chem. 2003, 82, 107–115.
[31] Maeda, H., Ozaki, Y., Tanaka, M., Hayashi, N., et al. Near infrared spectroscopy and chemometric studies of temperature dependent spectral variations of water: Relationship between spectral changes and hydrogen bonds. J. Near Infrared Spec. 1995, 3, 191–201.
[32] Bertrand, D., Spectroscopie de l’Eau, in: La spectroscopie infrarouge et ses applications analytiques, 2nd Edn., Tec & Doc editions, Paris, France 2006, pp. 94–104.
[33] Hafidi, M., Amir, S., Revel, J. C., Structural characterization of olive mill waste-water after aerobic digestion using elemental analysis, FTIR and 13C NMR. Process Biochem. 2005, 40, 2615–2622.
[34] Droussi, Z., D’Orazio, V., Provenzano, M. R., Hafidi, M., et al. Study of the biodegradation and transformation of olive-mill residues during composting using FTIR spectroscopy and differential scanning calorimetry. J. Hazard. Mater. 2009, 164, 1281–1285.
[35] Leon, L., Garrido-Varo, A., Downey, G., Parent and harvest year effects on near-infrared reflectance spectroscopic analysis of olive (Olea europaea L.) fruit traits. J. Agric. Food Chem. 2004, 52, 4957–4962.
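The K-means step reported in the results above (Euclidean distance, five groups, at most 50 iterations, applied to the predicted values of the five cultivar PLS-DA models) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation: the function name `kmeans_pls_scores`, the farthest-point (maximin) initialization, the convergence test, and the reading of SOD as the sum of sample-to-centroid Euclidean distances are all assumptions made for illustration.

```python
import numpy as np


def kmeans_pls_scores(scores, k=5, max_iter=50, seed=0):
    """K-means (Euclidean distance) on PLS-DA predicted values.

    scores : array-like, shape (n_samples, n_models); hypothetical layout with
             one column per cultivar model.
    Returns (labels, centroids, sod), where sod is ASSUMED to be the sum of
    Euclidean distances from each sample to its assigned centroid.
    """
    X = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)

    # Farthest-point (maximin) initialization (an assumption, not from the
    # paper): start from one random sample, then repeatedly add the sample
    # whose distance to the nearest already-chosen centroid is largest.
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].copy()

    labels = np.full(len(X), -1)
    for _ in range(max_iter):  # at most max_iter Lloyd iterations
        # Assign each sample to its nearest centroid (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each centroid to the mean of its members; an empty cluster
        # keeps its previous centroid.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

    # Sum of distances from each sample to its assigned centroid.
    sod = float(np.linalg.norm(X - centroids[labels], axis=1).sum())
    return labels, centroids, sod
```

For a matrix of predicted values with one column per cultivar model (a hypothetical layout), `labels, centroids, sod = kmeans_pls_scores(Y_pred)` returns one cluster index per sample. Cluster indices are arbitrary, so they must first be matched to the known cultivars (for example by majority vote) before a percentage of correctly classified samples can be computed.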

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.ejlst.com