Design of Experiments – Get It Right from the Beginning
Margrét Thorsteinsdóttir
Professor at the Faculty of Pharmaceutical Sciences, University of Iceland

Objective
• The objective is to illustrate how design of experiments (DoE) can be implemented to optimize a quantitative LC-MS/MS clinical diagnostic method

Outline
• Part 1 - Introduction to Design of Experiments (DoE)
  – Why use DoE?
  – Basic concepts
• Part 2 - Design of Experiments (DoE)
  – Experimental screening
  – Optimization
  – Quantitative modeling
• Part 3 - Practical Example
  – Optimization of a clinical LC-MS-based assay utilizing DoE

What is Chemometrics?
Chemometrics is the chemistry discipline that uses mathematical and statistical methods
  – to design or select optimal measurement procedures and experiments
  – to provide maximum chemical information by analyzing chemical data
(Wold and Kowalski, 1972)
[Photo: Professor Svante Wold and Professor Bruce Kowalski, 11th Scandinavian Symposium on Chemometrics (SSC), Loen, Norway]

Application of DoE
• Development of new products, processes and analytical methods
• Enhancement of existing analytical methods
• Optimization of the quality and performance of an analytical method
• Optimization of existing analytical methods
• Screening of important factors
• Robustness testing

Multivariate Problems
• One phenomenon usually depends on several factors!
  – The question is not: Is the problem multivariate?
  – The question is: How should multivariate problems be handled?

How shall I find the optimum?
• COST approach (Changing One Separate factor at a Time)
• DoE (Design of Experiments)

Why Design of Experiments (DoE)?
• To provide a framework for changing all important factors systematically with a limited number of experiments
• To ensure that the selected experiments are maximally informative
• To get an overview of the relationships between all the parameters
• To make R&D more efficient

COST approach (Changing One Separate factor at a Time)
• Does not lead to the real optimum, and gives different conclusions depending on the starting point
• Leads to many experiments and little information
• Systems influenced by more than one factor are poorly investigated – interactions are missed

A better approach - DoE
• The solution is to construct a carefully prepared set of representative experiments in which all relevant factors are varied simultaneously
[Figure: design in three factors, X1 (200–400), X2 (50–100) and X3 (50–100), with the standard experiment at the center point 300/75/75]

A successful experiment has several prerequisites
• Define your objective(s)
• Planning
• Estimation of experimental error

Variability
• Every measurement and experiment is influenced by noise
• Under stable conditions, every process and system varies around its mean
  – Variance in analysis
  – Variance in sampling
• The experiment must be of sufficient precision to satisfy the main objective
• The experiment must be unbiased

Reacting to noise
• Consider one experiment where the temperature is changed from 35 °C to 40 °C
• The response changes from slightly below 93% to close to 96% yield, but this change lies within the variability interval found when replicating
[Figure: ten measurements of yield under identical conditions (yield axis 92–98%), and two measurements of yield: any real difference?]

Consequence of variability
[Figure, three panels of Y versus X: two experiments close to each other make the slope of the line poorly determined; two experiments far away from each other make the slope well determined; adding a center point makes it possible to explore whether the model is linear or non-linear]
• Design is needed – it matters where the experiments are positioned!
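The effect of run placement on the slope can be checked with the standard ordinary-least-squares result that the slope's standard error is σ/√Σ(xᵢ − x̄)². The sketch below is an illustration under stated assumptions, not part of the original method: it assumes a straight-line model with a known, constant noise standard deviation σ, and the settings (35/36 versus 30/40) and responses are hypothetical numbers. The curvature check compares the center-point response with the average of the two end points, which is one simple way to see whether a linear model is adequate.

```python
import math


def slope_standard_error(x_values, sigma):
    """Standard error of the OLS slope b1 in y = b0 + b1*x,
    assuming a known, constant noise standard deviation sigma."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    # Sum of squared deviations of the factor settings from their mean
    sxx = sum((x - x_mean) ** 2 for x in x_values)
    return sigma / math.sqrt(sxx)


def curvature_check(y_low, y_center, y_high):
    """Center-point response minus the straight-line prediction at the
    center (the mean of the two end-point responses). A value that is
    small relative to the replicate noise is consistent with linearity."""
    return y_center - (y_low + y_high) / 2


# Hypothetical settings: two runs close together vs. two runs far apart
sigma = 1.0  # assumed noise SD of the response, e.g. % yield
close = [35.0, 36.0]
far = [30.0, 40.0]

se_close = slope_standard_error(close, sigma)
se_far = slope_standard_error(far, sigma)
print(f"SE of slope, runs 1 unit apart:   {se_close:.3f}")  # 1.414
print(f"SE of slope, runs 10 units apart: {se_far:.3f}")    # 0.141

# Hypothetical responses at the low, center and high settings
print(f"Curvature estimate: {curvature_check(92.0, 95.5, 96.0):.2f}")  # 1.50
```

Spreading the two runs ten times further apart shrinks the slope's standard error tenfold, since Σ(xᵢ − x̄)² grows with the square of the spacing. Whether a curvature estimate of 1.5 matters can only be judged against the replicate noise.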
Focusing on effects
• DoE provides a better flow from measurements to information to knowledge
• DoE leads to more precise effect estimates
[Figure: response Y1 plotted against factors X1, X2 and X3]

Estimating real effects and noise
• Real effects are estimated by the coefficients
• The noise is contained in the confidence intervals (the uncertainty of each coefficient)

Assessment of DoE
• Define the experimental objective(s)
• Define factors
• Define responses
• Select a regression model
• Generate the experimental design
• Create the worksheet
• Analyze the data

Terminology
• Factors: parameters that are changed to influence the responses and direct the system/process towards a desired response profile
• Responses: variables describing the properties of the system/process
• Model: mathematical expression linking the changes in the factors to the changes in the responses
[Figure: Factors (X) → System/Process → Responses (Y)]

Selection of Experimental Objective
• Experimental objectives may be selected from different stages of DoE:
  – Familiarization
  – Screening
  – Finding the optimal region
  – Optimization
  – Robustness testing
  – Mechanistic modelling

Important questions
• The experimental objective tells which kind of investigation one wants to do. One should ask:
  – Why is the experiment done?
  – For what purpose?
  – What is the desired result?

Selection of Model and Generation of Design
• The chosen model and the design to be generated are intimately linked

Modeling
• The results are expressed as a mathematical function
• Advantages:
  – Replaces large tables of data on experimental conditions with a single equation
  – Provides a means to predict and estimate results at levels that were not directly studied

Software
• MODDE 12, MKS Data Analytics, Umetrics
• Unscrambler® X, CAMO Software
• JMP® software from SAS
• MATLAB
• R packages in the ExperimentalDesign task view on CRAN

Part 2 - Design of Experiments (DoE)
• Experimental screening
• Optimization
• Quantitative modeling

The primary experimental objectives
• Screening
  – Which experimental factors are most influential?
  – What are their appropriate ranges?
• Optimization
  – How shall we define the optimum?
  – Is there a unique optimum, or is a compromise necessary to meet conflicting demands on the responses?
• Quantitative modeling
  – What are the predicted values of the responses for given settings of the factors in a model?

Screening - objective
• To explore many factors in order to reveal whether they have an influence on the responses
• To identify their appropriate ranges
• To investigate whether the factor/response relationship is linear or non-linear

Specification of Factors
• Categorization of factors
  – Controlled and uncontrolled
  – Quantitative and qualitative
• Define ranges
Example:
• Quantitative
  – Temperature
  – Flow rate
  – Capillary voltage
• Qualitative
  – Type of column
  – Type of organic solvent

Uncontrolled Factors
• Factors that cannot be controlled but may still influence the results (responses)
  – Example: ambient humidity and temperature
• Record the values of uncontrolled factors and include them in the data analysis
• Randomize the run order of the experiments

Specification of Responses
• Choose responses that are relevant to the objective(s)
  – Example: retention time, peak height, peak area
• Responses often need to be transformed
  – Transform the responses after executing the experiments from the design
  – Example: log transformation if there is a non-linear relationship between y and x

Factorial Designs
• Full factorial design: 2³ = 8 experiments
• Fractional factorial design: 2³⁻¹ = 4 experiments

Two-level full factorial designs
• These designs enable interaction models to be estimated, which is adequate for screening
• Each factor is investigated at both levels of all other factors
  – balancing
  – orthogonality

k (factors)   Full factorial runs   Fractional factorial runs
2             4                     –
3             8                     4
4             16                    8
5             32                    16
6             64                    16
7             128                   16
8             256                   16
9             512                   32
10            1024                  32

• Full factorial designs are realistic choices with 2–4 factors; with 5 or more factors, fractional factorial
designs are recommended

Fractional Factorial Design
• Fractional factorial designs are used in screening and robustness testing
• Advantage: reduction in the number of experiments
• Disadvantage: confounding of effects
[Figure: cake-mix example with x1 = Flour (200–400), x2 = Shortening (50–100) and x3 = Eggpowder (50–100); the 2³ cube with runs 1–8 and the two half-fractions]

Half-fraction with x3 = x1x2:
Run   x1   x2   x3 = x1x2
5     −    −    +
2     +    −    −
3     −    +    −
8     +    +    +

Half-fraction with x3 = −x1x2:
Run   x1   x2   x3 = −x1x2
1     −    −    −
6     +    −    +
7     −    +    +
4     +    +    −

Graphical interpretation of confoundings (2⁴⁻¹ design)
• Only the sum of confounded terms is estimated
• Main effects usually dominate over three-factor interactions
• More experiments are needed to resolve confounded terms

D-Optimal Design
• Multi-level qualitative factors: type of organic solvent, type of column, pH of mobile phase
• Three quantitative factors

Screening Example: Quantification of a biomarker in human plasma with LC-MS/MS
Objective: Optimize the response for compound x in as short a time as possible
• Factors:
  – Type of LC column
  – pH of the mobile phase
  – Amount of acetonitrile in mobile phase B
  – Slope of gradient
  – Flow rate
  – Amount injected
• Responses:
  – Retention time
  – Peak area

Design
• Full factorial design of 4 factors for each pH and each column type
  – 2⁴ = 16 experiments + 3 experiments at the center point = 19 experiments
  – 4 × 19 experiments = 76 experiments in total
[Figure: creation of the worksheet and results of the experiments, MODDE 7, Umetrics AB]

Data Analysis
• Evaluation of raw data
  – Replicate plot, histogram
• Regression analysis and model interpretation
  – R2/Q2/Model validity/Reproducibility
  – Coefficient plot
• Use of regression model
  – Response contour plot
[Figure: panels for Peak Area and Retention Time]

Evaluation of raw data - Replicate plot
• The replicate plot shows the variation among the replicates in relation to the variation across the entire design (“reproducibility”)

Regression analysis – summary of fit plot
R2 – measures
fit (“explained variation”)
Q2 – measures predictive