Improving Calibration Results for Optical Spectrometers
Brian Rohrback, Infometrix, Inc.

PEFTEC Conference, November 2015

Abstract

A project has been undertaken to assess how to reduce the amount of effort devoted to maintaining and optimizing spectroscopic model performance in support of refinery and chemical plant labs. Over the last five years, a series of algorithmic approaches have been examined with the goal of streamlining the process of chemometric model construction to make the models significantly more robust when put into routine practice. This effort generated the following observations:

1. Even though there are published “Best Practices” for generating chemometric models, these practices are infrequently followed;
2. Recalibration of an optical analyzer is perceived to be warranted due to changes in crude slates and blending component composition, but may not be required;
3. Even if a calibration was performed properly during initial installation, staffing changes and lack of training undermine subsequent recalibrations;
4. It is of benefit to minimize software maintenance frequency and to control product giveaway.

Abstract, continued

In order to use optical analyzers effectively, the analyst needs to consider the limitations that ultimately determine the eventual success and life expectancy of any calibration. There are many areas that have an impact on the ultimate quality of a calibration: outlier detection, selection of the number of factors, and even choice of algorithm. This presentation reports on a multi-organization effort leading to an improvement in calibration procedures for on-line and laboratory multiwavelength spectrometers. Here the process of calibration is examined in detail and a path is outlined for building calibrations that are more reliable and less sensitive to process shifts. Much of this improvement can be attained without requiring replacement of either the hardware or software in place. Additional improvement in calibration quality is available through the use of well-referenced methods that constitute the best technologies available.

38 Years of Chemometrics

What do we see that creates the most problems in building a chemometric system?

1. Poorly characterized standards
2. Groupings in the data
3. Overfitting the inferential model
4. Non-optimal calibration set
5. Changing protocol
6. Complex samples
7. Poor instrument stability

Analytical Basics

When looking at an analyzer technology for a specific application, you need to first look at the sources of variation, typically instrumental and process (from the feedstock and the process parameters themselves). You would like to try for “Hard Models,” as they will be more stable over time, but Soft Modeling techniques are often favored because, as is the case with optical instruments, they can provide their measurements more quickly. The penalty you pay for Soft Models is that they will require more frequent recalibration.

Inferential Basics

• Inferential models employ chemometrics, but not all chemometric models are inferential
• What is a Hard Model?
  – GC for natural gas: separate all of the components, look up the associated heat of combustion for each of the components, multiply and add to get BTU content
  – Hydroxyl value of polyols in polyurethane prepolymers
• What is a Soft Model?
  – Application of NIR on natural gas samples representing different compositions: measure the BTU content of the gases as a whole using a reference method; apply PLS to calibrate the NIR response to the reference

So, what is the real driving time? Consider…

• New colleague says, “Hey, I live near you; how long does it take to get to work?”
• You answer, “Today it took me 45 minutes, but there was an accident. Yesterday was a breeze; I left a little early, made all the lights, and got there in 20.”
• But Thursdays you usually stop for donuts, Fridays you fill the tank, and sometimes you drop your kids off along the way; there’s always something.
• It depends on many factors
  – A single day’s result is not necessarily indicative of future expectation
  – We can develop some measures, getting a feeling by observing multiple trips over a period of time

Building an Understanding of Error

[Figure: distribution of Reference Values = Driving Time, 20–48 minutes]

Building an Understanding of Error

[Figure: Reference Values = Driving Time (20–48 minutes) shown alongside Reference Values = Octane Rating (85.5–89.0)]

Sources of Error in All Chemometric Models Built for Spectrometers

• Reference Error
  – The reference method is not perfect
  – Error assessed in terms of precision and/or reproducibility
  – Nevertheless, it is considered to be true (accurate)
• “Spectrometric” Error
  – Instrument error + error in chemometric model
  – Not perfect
  – Precision is typically much better than the reference method

Understanding RMSEP

How are Reference Error, Spectrometric Error, and RMSEP related?

RMSEP = √(RefError² + SpectrometricError²)

• Two examples to illustrate
  – Example A looks at a single sample
  – Example B considers a population of samples
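The quadrature sum above is easy to check numerically. A minimal sketch (the function name is ours, not from the presentation), reproducing the Case A1 numbers from the following slides:

```python
import math

def combined_rmsep(reference_error: float, spectrometric_error: float) -> float:
    """Total prediction error as the quadrature sum of its two components."""
    return math.sqrt(reference_error ** 2 + spectrometric_error ** 2)

# Case A1 from the slides: similar reference and spectrometric errors
print(round(combined_rmsep(1.94, 2.06), 2))  # 2.83
```

Note that because the components add in quadrature, the smaller component contributes much less than its face value suggests.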

Example A: a single sample, Reference Error held fixed at 1.94

Case A1 (Reference and Spectrometric Error are similar): Spectrometric Error 2.06, RMSEP 2.83
Case A2 (Spectrometric Error reduced): Spectrometric Error 1.81, RMSEP 2.65
Case A3 (Spectrometric Error reduced further): Spectrometric Error 1.49, RMSEP 2.45
Case A4 (…and further): Spectrometric Error 1.20, RMSEP 2.28
Case A5 (…and further): Spectrometric Error 0.98, RMSEP 2.17
Case A6 (…and further): Spectrometric Error 0.51, RMSEP 2.01
Case A7 (…and further): Spectrometric Error 0.21, RMSEP 1.95
Case A8 (…and further): Spectrometric Error near zero, RMSEP 1.94
Case A9 (…until the Spectrometric Error is zero): RMSEP 1.94, equal to the Reference Error

Example B: a population of samples, Reference Error held fixed at 0.98

Case B1 (Spectrometric Error >> Reference Error): Spectrometric Error 2.99, RMSEP 3.15
Case B2 (Spectrometric Error decreases by ~1/3; RMSEP decreases by less than 1/3): Spectrometric Error 2.07, RMSEP 2.29
Case B3 (Spectrometric and Reference Errors are about equal): Spectrometric Error 1.02, RMSEP 1.41
Case B4 (Spectrometric Error is about 1/3 the Reference Error): Spectrometric Error 0.39, RMSEP 1.05; RMSEP is approaching the Reference Error
Case B5 (Spectrometric Error reduced by ~50% vs. Case B4, little effect on RMSEP): Spectrometric Error 0.21, RMSEP 1.00; RMSEP is about the same as the Reference Error
Case B6 (Spectrometric Error is zero): RMSEP 0.98, equal to the Reference Error

RMSEP Is Limited by Reference Error

As the Spectrometric Error approaches zero, the total system error (RMSEP) approaches the Reference Error value. At Spectrometric Error = 0, the incorrect conclusion is that “the model cannot be better than the reference method.”
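The limiting behavior follows directly from the quadrature sum. A quick numeric sweep (our own illustration, reusing the Case A values) shows RMSEP falling toward, but never below, the fixed reference error:

```python
import math

def combined_rmsep(ref_err: float, spec_err: float) -> float:
    """RMSEP as the quadrature sum of reference and spectrometric error."""
    return math.sqrt(ref_err ** 2 + spec_err ** 2)

REF_ERR = 1.94  # reference error held fixed, as in Cases A1-A9
for spec_err in (2.06, 1.49, 0.98, 0.51, 0.21, 0.0):
    print(f"spectrometric {spec_err:4.2f} -> RMSEP {combined_rmsep(REF_ERR, spec_err):4.2f}")
# RMSEP drops from 2.83 toward the floor of 1.94 set by the reference error
```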

Model Construction Issues: Groupings in the Data

A clear case of not handling the data properly: the distribution appears to be biased low.

[Figure: predictions from grouped data showing the low bias]

Segmented Predictions

• When samples do not group into a single homogeneous cluster, breaking the calibration problem into multiple regressions might improve the overall prediction quality.
• In the case of motor fuel properties, the chemical composition of sub-grade and super-grade fuels is sufficiently different that a hierarchical, or segmented, approach could be attempted:
  – Use one model to classify the grade
  – Use a second model, one for each category found by the first model, to determine the quality rating
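The two-step scheme above can be sketched in a few lines. This is a toy illustration only: the data are synthetic stand-ins for the fuel example, the grade classifier is a nearest-class-mean rule, and the per-grade regressions are plain least squares; in practice the classifier would be something like KNN or SIMCA on the spectra and each regression a PLS model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three summary features per sample; sub-grade (0) and
# super-grade (1) samples are offset in feature space and property value.
n = 200
grade = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3)) + 5.0 * grade[:, None]
y = 84.0 + 6.0 * grade + X @ np.array([0.3, 0.2, 0.1])  # noise-free for clarity

# Step 1: one model to classify the grade (nearest class mean)
class_means = np.array([X[grade == g].mean(axis=0) for g in (0, 1)])

def classify(x):
    return int(np.argmin(((class_means - x) ** 2).sum(axis=1)))

# Step 2: one regression model per category found by the first model
coefs = {}
for g in (0, 1):
    rows = grade == g
    A = np.hstack([X[rows], np.ones((rows.sum(), 1))])  # append intercept column
    coefs[g], *_ = np.linalg.lstsq(A, y[rows], rcond=None)

def predict(x):
    g = classify(x)                     # route the sample to its grade model
    return float(np.append(x, 1.0) @ coefs[g])
```

The design point is that each regression only has to span its own grade's property range, which is what narrows the error in the octane comparison that follows.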

Modeling Sub- and Super-Grade Fuels

Feeding forward information on the range that is being modeled reduces the property range each model must span, and can provide a better model and a more precise measurement of fuel properties.

[Figure: predicted vs. measured MON (80–86) and RON (88–94), subgrade and supergrade samples marked separately]

Comparison of Octane Models

• For the data set that includes both grades, the standard error of cross validation for RON is 0.22 and for MON is 0.25.
• If we build a model for the two ranges of octane measurements separately, there is just a small change in the evaluation of sub-grade gasoline, but there is a significant drop in the error associated with premium.

        All Samples   Subgrade   Premium
RON     0.22          0.22       0.15
MON     0.25          0.23       0.20

Result: a More Parsimonious Model

• Looking at several properties, the value of integrating readily available information improves the fuel property assessment considerably.
• Note that the number of factors required to build the PLS model is significantly reduced and likely provides more stability over time.

SECV and number of factors, outliers removed:

        All Samples        Subgrade          Premium
        SECV   Factors     SECV   Factors    SECV   Factors
IBP     3.69   9           3.66   4          1.65   4
Aro     1.54   8           0.98   7          1.18   4
Ole     1.99   7           1.83   3          1.55   6
d50     0.04   8           0.02   5          0.02   4
MON     0.24   7           0.23   5          0.20   5
RON     0.23   7           0.22   5          0.15   3
Benz    0.58   7           0.57   4          0.32   4
d90     0.87   7           0.76   3          0.84   2

Choosing the Best # of Factors

[Figure: Standard Error vs. # of factors; error falls while fitting information, passes through an optimum, then rises while fitting noise]

Avoiding Overfitting

• As more factors are added
  – calibration error is reduced
  – noise variation in the training data is built into the model
• Prediction data will have different noise; after all, noise is random
• A model with more factors than necessary will try to apply the noise portion of the model to prediction samples, and fail, resulting in increased error
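The train/test divergence described above can be demonstrated on synthetic data. This sketch (entirely our own; principal-component regression stands in for PLS to keep the code dependency-free) builds "spectra" from three real sources of variation plus noise, then fits models of increasing rank:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "spectra": three real sources of variation plus random noise
n_train, n_test, n_wav = 60, 60, 50
S = rng.normal(size=(3, n_wav))               # pure-component profiles
C_train = rng.normal(size=(n_train, 3))       # concentrations
C_test = rng.normal(size=(n_test, 3))
X_train = C_train @ S + 0.5 * rng.normal(size=(n_train, n_wav))
X_test = C_test @ S + 0.5 * rng.normal(size=(n_test, n_wav))
y_train = C_train[:, 0]                       # property tied to the first source
y_test = C_test[:, 0]

def pcr_rmse(k):
    """Principal-component regression with k factors; returns (train, test) RMSE."""
    _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
    P = Vt[:k].T                              # loadings for the first k factors
    b, *_ = np.linalg.lstsq(X_train @ P, y_train, rcond=None)
    err = lambda X, y: float(np.sqrt(np.mean((X @ P @ b - y) ** 2)))
    return err(X_train, y_train), err(X_test, y_test)

for k in (1, 3, 10, 40):
    train_rmse, test_rmse = pcr_rmse(k)
    print(f"{k:2d} factors: train {train_rmse:.3f}  test {test_rmse:.3f}")
```

With only three true sources, the calibration error keeps shrinking as factors are added, while the test error bottoms out near three factors and then grows: the extra factors are fitting training-set noise that the prediction set does not share.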

Noise Added to the Mean Spectrum

[Figure: mean spectrum + noise, noise std dev = 35]

Impact of Noise on Predictions

[Figure]

Avoiding Overfitting, 2

• As more factors are added
  – calibration error is reduced
  – information from an interferent not correlated to the property may be added to the model
• Prediction data will have different levels of interferent
• A model with more factors than necessary will try to apply the uncorrelated portion of the model to samples, and fail, resulting in increased prediction error

Addition of a Small Perturbation

[Figure: spectra with an added peak, intensity 1,000 to 10,000]

Overfitting’s Impact on Predictions

[Figure: prediction error vs. relative magnitude of the perturbation]

Jaggedness for Model Complexity

• SECV may not always indicate the optimal number of factors for a regression model
• When overfitting, the regression vector ‘shape’ usually shows noise overlaying the structure
• Jaggedness quantifies this noise in the regression vector
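The slides name the jaggedness metric but do not give a formula; one plausible formulation, which we assume here purely for illustration, is the summed squared first differences of the regression vector. A noisy (overfit) regression vector scores far higher than a smooth one of the same overall shape:

```python
import numpy as np

def jaggedness(regression_vector):
    """Assumed roughness measure: summed squared first differences,
    normalized by vector length.  (Our reading; not the slides' formula.)"""
    b = np.asarray(regression_vector, dtype=float)
    return float(np.sum(np.diff(b) ** 2) / b.size)

# A smooth regression vector vs. the same vector with noise overlaid
x = np.linspace(0.0, 1.0, 200)
smooth = np.sin(2.0 * np.pi * x)
noisy = smooth + np.random.default_rng(2).normal(scale=0.3, size=x.size)
print(jaggedness(smooth), jaggedness(noisy))
```

Tracking such a score against the number of factors gives a second opinion alongside SECV when choosing model complexity.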

Jaggedness Demonstrated

[Figure: calibration spectra; SECV and jaggedness vs. number of factors; regression vector]

First PLS Model Has Outliers

First PLS model: RMSECV = 0.55. Use diagnostics to flag outliers.

Second PLS Model

Second PLS model: RMSECV = 0.23. Use diagnostics to flag additional outliers.

Robust PLS

First PLS model: RMSECV = 0.55. Robust PLS uses diagnostics to flag the outliers in just one pass.

Robust PLS Model

Robust PLS model: RMSECV = 0.12. Diagnostics flag no more outliers.
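The multi-pass flag-and-refit loop shown in the preceding slides can be sketched as follows. This is only an illustration of the iterative-diagnostics workflow: the data are synthetic, ordinary least squares stands in for PLS, and a 2.5-sigma residual cut is an assumed diagnostic; the single-pass Robust PLS algorithm the slides contrast it with is a different method not shown here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic calibration with five grossly mislabeled reference values
n = 100
X = rng.normal(size=(n, 5))
true_b = np.array([1.0, 0.5, -0.3, 0.2, 0.1])
y = X @ true_b + 0.05 * rng.normal(size=n)
y[:5] += 3.0                                   # the "outliers"

def fit(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

keep = np.ones(n, dtype=bool)
for _ in range(5):                             # repeated passes, as on the slides
    b = fit(X[keep], y[keep])
    resid = y - X @ b
    keep_next = np.abs(resid) < 2.5 * resid[keep].std()
    if np.array_equal(keep_next, keep):
        break                                  # diagnostics flag no more outliers
    keep = keep_next

b = fit(X[keep], y[keep])                      # final model on the cleaned set
rmse = float(np.sqrt(np.mean((X[keep] @ b - y[keep]) ** 2)))
print(f"kept {keep.sum()} of {n} samples, RMSE {rmse:.3f}")
```

Each pass refits on the retained samples, so the residual scale shrinks as gross outliers leave and the error statistic drops, mirroring the RMSECV sequence on the slides.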

38 Years of Chemometrics

What do we see that creates the most problems in building a chemometric system?

1. Poorly characterized standards
2. Groupings in the data
3. Overfitting the inferential model
4. Non-optimal calibration set
5. Changing protocol
6. Complex samples
7. Poor instrument stability