Improving Results: Calibration for Optical Spectrometers
Brian Rohrback, Infometrix, Inc.
PEFTEC Conference, November 2015
Abstract

A project has been undertaken to assess how to reduce the amount of effort devoted to maintaining and optimizing spectroscopic model performance in support of refinery and chemical plant labs. Over the last five years, a series of algorithmic approaches have been examined with the goal of streamlining the process of chemometric model construction to make the models significantly more robust when put into routine practice. This effort generated the following observations:

1. Even though there are published "Best Practices" for generating chemometric models, these practices are infrequently followed;
2. Recalibration of an optical spectrometer is perceived to be warranted due to changes in crude slates and blending component composition, but may not be required;
3. Even if a calibration was performed properly during initial installation, staffing changes and lack of training undermine subsequent recalibrations;
4. It is of benefit to minimize software maintenance frequency and to control product giveaway.
Abstract, continued

In order to use optical analyzers effectively, the analyst needs to consider the limitations that ultimately determine the eventual success and life expectancy of any calibration. There are many areas that have an impact on the ultimate quality of a chemometrics calibration: outlier detection, selection of the number of factors, and even choice of algorithm. This presentation reports on a multi-organization effort leading to an improvement in calibration procedures for on-line and laboratory multiwavelength spectrometers. Here the process of calibration is examined in detail and a path is outlined for building calibrations that are more reliable and less sensitive to process shifts. Much of this improvement can be attained without requiring replacement of either the hardware or software in place. Additional improvement in calibration quality is available through the use of well-referenced methods that constitute the best technologies available.
38 Years of Chemometrics

What problems do we see that create the most trouble in building a chemometric system?

1. Poorly characterized standards
2. Groupings in the data
3. Overfitting the inferential model
4. Non-optimal calibration set
5. Changing protocol
6. Complex samples
7. Poor instrument stability
Analytical Chemistry Basics

When looking at an analyzer technology for a specific application, you need to first look at the sources of variation, typically instrumental and process (from the feedstock and the process parameters themselves). You would like to try for "Hard Models" as they will be more stable over time, but often, instruments designed to work with Soft Modeling techniques are favored as, in the case of optical spectroscopy, they can often provide their measurements more quickly. The penalty you pay for Soft Models is that they will require more frequent recalibration.
Analytical Chemistry Basics

• What is a Hard Model?
  – GC for natural gas: separate all of the components, look up the associated heat of combustion for each of the components, multiply and add to get BTU content
• What is a Soft Model?
  – Application of NIR on natural gas samples representing different compositions: measure the BTU content of the gases as a whole using a reference method; apply PLS to calibrate the NIR response to the reference
  – Hydroxyl value of polyurethane polyol prepolymers
• Inferential models employ chemometrics, but not all chemometric models are inferential
Consider …

• New colleague says "Hey, I live near you; how long does it take to get to work?"
• You answer "Today it took me 45 minutes, but there was an accident. Yesterday was a breeze; I left a little early, made all the lights, and got here in 20."
• But Thursdays you usually stop for donuts, Fridays you fill the tank, and sometimes you drop your kids off along the way; there's always something.
• So, what is the real driving time?
  – It depends on many factors
  – We can develop some measures, getting a feeling by averaging over multiple days
  – A single day's result is not necessarily indicative of future expectation
Building an Understanding of Error

[Histogram: Reference Values = Driving Time, 20–48 minutes]
Building an Understanding of Error

[Histograms: Reference Values = Driving Time (20–48 minutes); Reference Values = Octane Rating (85.5–89)]
Sources of Error in All Chemometric Models Built for Spectrometers

• Reference Error
  – The reference method is not perfect
  – Nevertheless, it is considered to be true (accurate)
• "Spectrometric" Error
  – Instrument error + error in chemometric model
  – Not perfect
  – Precision is typically much better than the reference method
  – Error assessed in terms of precision and/or reproducibility
Understanding RMSEP

• How are Reference Error, Spectrometric Error, and RMSEP related?

  RMSEP = √(RefError² + SpectrometricError²)

• Two examples to illustrate:
  – Example A looks at a single sample
  – Example B considers a population of samples
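The quadrature relationship can be checked numerically. This minimal sketch (illustrative only, not Infometrix code) reproduces the Case A1 values from the bar charts that follow:

```python
import math

def rmsep(ref_error, spec_error):
    # Independent error sources add in quadrature.
    return math.sqrt(ref_error**2 + spec_error**2)

# Case A1: similar reference (1.94) and spectrometric (2.06) error
print(round(rmsep(1.94, 2.06), 2))  # 2.83
```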
Case A1: Reference and Spectrometric Error are similar

  Reference Error = 1.94, Spectrometric Error = 2.06, RMSEP = 2.83
Case A2: Spectrometric Error reduced

  Reference Error = 1.94, Spectrometric Error = 1.81, RMSEP = 2.65
Case A3: Spectrometric Error reduced further …

  Reference Error = 1.94, Spectrometric Error = 1.49, RMSEP = 2.45
Case A4: … and further …

  Reference Error = 1.94, Spectrometric Error = 1.20, RMSEP = 2.28
Case A5: … and further …

  Reference Error = 1.94, Spectrometric Error = 0.98, RMSEP = 2.17
Case A6: … and further …

  Reference Error = 1.94, Spectrometric Error = 0.51, RMSEP = 2.01
Case A7: … and further …

  Reference Error = 1.94, Spectrometric Error = 0.21, RMSEP = 1.95
Case A8: … and further …

  Reference Error = 1.94, Spectrometric Error now negligible, RMSEP = 1.94
Case A9: … until the Spectrometric Error is zero

  Reference Error = 1.94, Spectrometric Error = 0.00, RMSEP = 1.94
Case B1: Spectrometric Error >> Reference Error

  Reference Error = 0.98, Spectrometric Error = 2.99, RMSEP = 3.15
Case B2: Spectrometric Error decreases by ~1/3; RMSEP decreases by < 1/3

  Reference Error = 0.98, Spectrometric Error = 2.07, RMSEP = 2.29
Case B3: Spectrometric and Reference Errors are about equal

  Reference Error = 0.98, Spectrometric Error = 1.02, RMSEP = 1.41
Case B4: Spectrometric Error is about 1/3 the Reference Error

  Reference Error = 0.98, Spectrometric Error = 0.39, RMSEP = 1.05

  RMSEP is approaching the Reference Error.
Case B5: Spectrometric Error reduced by ~50% vs. Case B4, little effect on RMSEP

  Reference Error = 0.98, Spectrometric Error = 0.21, RMSEP = 1.00

  RMSEP is about the same as the Reference Error.
Case B6: Spectrometric Error is zero; RMSEP equals Reference Error

  Reference Error = 0.98, Spectrometric Error = 0.00, RMSEP = 0.98
RMSEP is Limited by Reference Error

… as the Spectrometric Error approaches zero, the total system error (RMSEP) approaches the Reference Error. At Spectrometric Error = 0, the incorrect conclusion is that "the model cannot be better than the reference method."
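The Case A sweep above can be reproduced in a few lines. A sketch (the error values are taken from the slides):

```python
import math

REF_ERROR = 1.94  # reference error held fixed throughout Case A

# As the spectrometric error shrinks, RMSEP converges to the reference error.
for spec in [2.06, 1.81, 1.49, 1.20, 0.98, 0.51, 0.21, 0.0]:
    print(f"spec={spec:.2f}  RMSEP={math.sqrt(REF_ERROR**2 + spec**2):.2f}")
```

The printed RMSEP column descends from 2.83 to the 1.94 floor set by the reference method.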
Model Construction Issues

Groupings in the data: a clear case of not handling the data properly. The distribution appears to be biased low.
Segmented Predictions

• When samples do not group into a single homogeneous cluster, breaking the calibration problem into multiple regressions might improve the overall prediction quality.
• In the case of motor fuel properties, the chemical composition of sub-grade and super-grade fuels is sufficiently different that a hierarchical, or segmented, approach could be attempted:
  – Use one model to classify the grade
  – Use a second model, one for each category found by the first model, to determine the quality rating
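The two-step scheme can be sketched as follows. Everything here is a stand-in: the data are synthetic, nearest-centroid classification stands in for the grade classifier, and ordinary least squares stands in for the per-grade PLS models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra of two fuel grades (hypothetical data):
# supergrade spectra are offset from subgrade; properties occupy two ranges.
X_sub = rng.normal(0.0, 0.3, (40, 20)); y_sub = rng.uniform(80, 86, 40)
X_sup = rng.normal(1.0, 0.3, (40, 20)); y_sup = rng.uniform(88, 94, 40)

# Step 1: one model to classify the grade (nearest centroid here).
centroids = {0: X_sub.mean(axis=0), 1: X_sup.mean(axis=0)}
def classify(x):
    return min(centroids, key=lambda g: np.linalg.norm(x - centroids[g]))

# Step 2: a separate regression per grade (least squares with intercept).
def fit(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(A, y, rcond=None)[0]
coefs = {0: fit(X_sub, y_sub), 1: fit(X_sup, y_sup)}

def predict(x):
    g = classify(x)                       # route sample to its grade model
    return float(np.append(x, 1.0) @ coefs[g])
```

Routing each sample to a grade-specific model narrows the property range each regression must cover, which is the mechanism behind the error reductions shown in the next slides.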
Sub- and Super-Grade Fuels

[Scatter plots: Pred Val vs. Measured Y for MON (80–86) and RON (88–94), with subgrade and supergrade samples shown separately]

Feeding forward information on the blend reduces the property range that is being modeled and can provide a better model and more precise measurement of fuel properties.
Comparison of Octane Models

• For the data set that includes both grades, the standard error of cross validation for RON is 0.22 and for MON is 0.25.
• If we build a model for the two ranges of octane measurements separately, there is just a small change in the evaluation of sub-grade gasoline, but there is a significant drop in the error associated with premium.

                MON     RON
  All Samples   0.25    0.22
  Subgrade      0.23    0.22
  Premium       0.20    0.15
Result: a More Parsimonious Model

• Looking at several properties, the value of integrating readily available information improves the fuel property assessment considerably.
• Note that the number of factors required to build the PLS model is significantly reduced and likely provides more stability over time.

Outliers removed:

            All Samples       Subgrade          Premium
            SECV  Factors     SECV  Factors     SECV  Factors
  IBP       3.69    9         3.66    4         1.65    4
  Aro       1.54    8         0.98    7         1.18    4
  Ole       1.99    7         1.83    3         1.55    6
  d50       0.04    8         0.02    5         0.02    4
  d90       0.24    7         0.23    5         0.20    5
  RON       0.23    7         0.22    5         0.15    3
  Benz      0.58    7         0.57    4         0.32    4
  MON       0.87    7         0.76    3         0.84    2
Choosing the Best # of Factors

[Plot: Standard Error vs. # of factors; the error falls while the model is fitting information, passes through an optimum, then rises as the model begins fitting noise]
Avoiding Overfitting

• As more factors are added
  – calibration error is reduced
  – noise variation in the training data is built into the model
• Prediction data will have different noise; after all, noise is random
• A model with more factors than necessary will try to apply the noise portion of the model to prediction samples – and fail – resulting in increased error
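The error-versus-factors curve can be illustrated with a small simulation. This is a sketch of the idea, not the presenter's procedure: the data are synthetic (three true latent factors plus noise), and principal-component regression stands in for PLS.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 3 real factors plus noise in both X and y.
n, p = 60, 30
scores = rng.normal(size=(n, 3))
loads = rng.normal(size=(3, p))
X = scores @ loads + rng.normal(scale=0.2, size=(n, p))
y = scores @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

train, test = slice(0, 40), slice(40, 60)

def pcr_rmsep(k):
    """Principal-component regression with k factors; error on held-out data."""
    xm, ym = X[train].mean(axis=0), y[train].mean()
    Xc = X[train] - xm
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:k].T                      # scores on the first k components
    b = np.linalg.lstsq(T, y[train] - ym, rcond=None)[0]
    pred = (X[test] - xm) @ Vt[:k].T @ b + ym
    return float(np.sqrt(np.mean((pred - y[test]) ** 2)))

errors = {k: pcr_rmsep(k) for k in range(1, 11)}
best = min(errors, key=errors.get)  # error typically bottoms out near 3 factors
```

Too few factors underfit (large error at k = 1); beyond the optimum, extra factors model noise that the held-out samples do not share.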
Noise Added to Mean Spectrum

[Plot: mean spectrum + noise, noise std dev = 35]
Impact of Noise on Predictions

[Plot]
Avoiding Overfitting (2)

• As more factors are added
  – calibration error is reduced
  – information from an interferent not correlated to the calibration property may be added to the model
• Prediction data will have different levels of interferent
• A model with more factors than necessary will try to apply the uncorrelated portion of the model to prediction samples – and fail – resulting in increased prediction error
Addition of a Small Perturbation

[Plot: added peak intensity ranging from 1,000 to 10,000]
Infometrix Overfitting’s Impact on Predictions Relative Magnitude PerturbationRelative of Magnitude
Jaggedness for Model Complexity

• SECV may not always indicate the optimal number of factors for a regression model
• The 'shape' of the regression vector usually shows noise structure when overfitting
• Jaggedness quantifies this noise structure in the regression vector
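One plausible way to quantify jaggedness is the summed squared first difference of the regression vector. The slides do not give Infometrix's exact formula, so this metric is purely illustrative:

```python
import numpy as np

def jaggedness(regression_vector):
    # Roughness as the sum of squared point-to-point differences.
    return float(np.sum(np.diff(regression_vector) ** 2))

x = np.linspace(0, 1, 200)
smooth = np.sin(2 * np.pi * x)                    # well-behaved vector
noisy = smooth + np.random.default_rng(2).normal(scale=0.1, size=x.size)

# An overfit model's noisy regression vector scores much higher.
print(jaggedness(smooth) < jaggedness(noisy))  # True
```

Plotted against the number of factors, a metric like this jumps when added factors start encoding noise, complementing the SECV curve.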
Jaggedness Demonstrated

[Plots: calibration spectra, SECV, regression vector, and jaggedness vs. number of factors]
PLS Model Has Outliers

First PLS model: RMSECV = 0.55. Use diagnostics to flag outliers.
Second PLS Model – Additional Outliers

Second PLS model: RMSECV = 0.23. Use diagnostics to flag outliers.
Robust PLS – Just One Pass

First PLS model: RMSECV = 0.55. Robust PLS uses diagnostics to flag the outliers in a single pass.
Robust Model – No More Outliers

Robust PLS model: RMSECV = 0.12.
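The flag-and-rebuild cycle in the PLS/outlier sequence above can be sketched as a loop. Ordinary least squares stands in for PLS, and the data, threshold, and outlier positions are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic calibration set with three gross reference errors injected.
n, p = 50, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)
y[[5, 17, 33]] += 4.0                    # simulated outliers

keep = np.arange(n)
for _ in range(5):                       # repeat model / flag / rebuild
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid = y[keep] - X[keep] @ b
    rmse = np.sqrt(np.mean(resid ** 2))
    ok = np.abs(resid) < 3 * rmse        # flag samples beyond 3x RMSE
    if ok.all():
        break
    keep = keep[ok]

# The simulated outliers end up excluded and the error drops sharply.
```

Robust regression methods accomplish the same exclusion by down-weighting suspect samples during fitting, which is why a robust PLS run can replace several manual flag-and-rebuild passes.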
38 Years of Chemometrics

What problems do we see that create the most trouble in building a chemometric system?

1. Poorly characterized standards
2. Groupings in the data
3. Overfitting the inferential model
4. Non-optimal calibration set
5. Changing protocol
6. Complex samples
7. Poor instrument stability