Introduction to Chemometrics

Introduction to Chemometrics

Introduction to Chemometrics Torbjörn Lundstedt KI 080609 AcurePharma Chemometrics • Use of mathematical and statistical methods for selecting optimal experiments Statistical experimental design Design of Experiments ( DoE ) • Extracting maximum amount of information when analysing multivariate (chemical) data E.g. classification, (process) monitoring, multivariate calibration, Quantitative Structure- Activity Relationships (QSAR) AcurePharma CONFIDENTIAL 2/92 Why perform experiments? • Increase understanding of observed phenomenon(s) • Identify what is important for influencing an investigated system • Find experiments (compounds) with desired properties • Make predictions about the outcome of new experiments AcurePharma CONFIDENTIAL 3/92 DoE – terminology • Experimental domain The experimental area studied, area where model is valid • Factors Controlled variables which can be varied independently and have an impact on the result in the experiments (“X-block”) • Independent variables Same as factors • Quantitative variables Continuous variables – Independent variables which can be adjusted to any value over a specified the range AcurePharma CONFIDENTIAL 4/92 DoE – terminology • Qualitative or Discrete Variables Independent variables which describe non- continuous variation, e.g. type of solvent, cell medium A or B • Responses Variables which are observed and a result from changing independent variables (“Y-block”) • Dependent Variables Same as responses AcurePharma CONFIDENTIAL 5/92 DoE – terminology • Residuals The difference between the observed response and the response predicted from the model • Uncontrolled or background variables Known variables which are not possible/desirable to alter • Unknown variables Currently unidentified variables AcurePharma CONFIDENTIAL 6/92 Aim of Modelling • Present the result in a clear and interpretable way – graphics very useful! • Extract as much information as possible from the experiments • Provide a “correct” conclusion – validate! • Indicate which new experiments to perform and the probable outcome of these AcurePharma CONFIDENTIAL 7/92 Model types • Fundamental models (hard models, “global models”) 2 -kt E = mc y = y 0e U = IR • Empirical models (soft models, “local models”) Taylor expansions (polynomials of different complexity) y = b 0 + b 1x1 + b 2x2 + b 12 x12 + e AcurePharma CONFIDENTIAL 8/92 Models Mathematical Equation Describing a System • Chemistry • Biology • Physics • Economics • etc . AcurePharma CONFIDENTIAL 9/92 Soft Modelling Smaller Parts of the Universe is Modelled A smaller experimental domain is investigated AcurePharma CONFIDENTIAL 10/92 Linear Model y = b 0 + b 1x1 + b 2x2 + e e = The residual, the part of the data the model does not explain Important for validating the modell AcurePharma Second Order Interaction Model y = b 0 + b 1x1 + b 2x2 + b 12 x1x2 + e b12 Interaction term, resulting from the effect of two variables. Skews the surface. AcurePharma Quadratic Model y = b 0 + b 1x1 + b 2x2 + b 12 x1x2 + 2 2 b11 x1 + b22 x2 + e b11 Quadratic term, resulting from the effect of one variable. Curves the surface. AcurePharma Establishing the Model • Generally a model is based on a set of experiments, where some output ( response or responses) has been measured • In the experiments different factors , variables, are investigated at different levels , i.e. settings (e.g. temp., conc., logP, nos. C, etc.) AcurePharma CONFIDENTIAL 14/92 Limitation of Models • Usually only local validity (soft models) (interpolation/extrapolation) • All models are wrong … but some are still very useful! • How should the experiments be performed in order to gain as much information as possible? AcurePharma CONFIDENTIAL 15/92 Change One Separate factor at a Time x2 Best x2 Best! 44 43 40 36 30 52 52 67 55 Temperature Temperature 47 Time x Time 1 x1 Optimum? Examine x1 Examine x2 Time Temperature Best yield? AcurePharma CONFIDENTIAL 16/92 COST Change One Separate factor at a Time x2 ”Best” x2 ”Best” x2 Best ? Temperature Temperature Temperature Time x Time Time 1 x1 x1 Examine x1 Examine x2 If there is an interaction Not the optimum! AcurePharma CONFIDENTIAL 17/92 Statistical Experimental Design x 2 Change all variables at the same time – Find Directions – Temperature Time x1 AcurePharma CONFIDENTIAL 18/92 Statistical Experimental Design • Planning of experiments to perform in order to extract as much information as possible with as few experiments as possible • Analysis of the result – modelling AcurePharma CONFIDENTIAL 19/92 Experimental Strategy Most important: Definition of Aim(s) • Problem formulation: - What is the aim? - What is desired? (yield, purity, activity, robustness) • Familiarization - What is known? - What is unknown? - Test experiments AcurePharma CONFIDENTIAL 20/92 The time you spend in the beginning to define a project you will have back with interest in the end! AcurePharma CONFIDENTIAL 21/92 Screening designs: Full Factorial Designs • Every level of a factor is investigated at both levels of all the other factors • It is a balanced (orthogonal) design • k factors (experimental variables) gives with a 2 level full factorial design 2k experiments AcurePharma CONFIDENTIAL 22/92 Full Factorial Designs Most common: investigate in two levels - + + + - + + + + + x x 2 3 - - + + - + - + - + + - -- + - x --- + - - 2 x1 x1 The experiments in a The experiments in a design with two variables design with three variables More variables – hyper cube AcurePharma CONFIDENTIAL 23/92 Full Factorial Designs 2k Experiments For two variables For three variables For four variables Exp. No. x1 x2 Exp. No. x1 x2 x3 Exp. No. x1 x2 x3 x4 1 -- 1 --- 1 ---- 2 + - 2 + - - 2 + - - - 3 - + 3 - + - 3 - + - - 4 + + 4 + + - 4 + + - - 5 - - + 5 - - + - 6 + - + 6 + - + - 7 - + + 7 - + + - 8 + + + 8 + + + - 9 - - - + 10 + - - + 11 - + - + Simple to generate – similar pattern 12 + + - + 13 - - + + no matter the number of variables to 14 + - + + investigate! 15 - + + + 16 + + + + AcurePharma CONFIDENTIAL 24/92 Analysis of result Multiple Linear Regression (MLR) • Regression method using a least squares fit ”Classical regression” • Requires independent variables in the X-block • Separate model for each Y response Coefficients for each y analysed – variable influence can be identified AcurePharma CONFIDENTIAL 25/92 Always look at the raw data – e.g. a replicate plot • Each point Investigation: enamdemo HT02 represent an Plot of Replications for Yield experiment 8 • Exp. no 15 performed in 70 12 10 14 replicate 4 7 60 16 • Variation in overall Yield 17 6 1518 response, not in 13 3 50 9 replicate 5 2 11 1 40 1 2 3 4 5 6 7 8 9 101112131415 Replicate Index AcurePharma CONFIDENTIAL 26/92 Cross-validation gives Q 2 Y Parts of the data is held out X and a model is build on the remaining repeated until all data has been kept out once X Σ Q2 = 1 – (Yobs-Ypred) Σ(Yobs-Yaverage) X etc. AcurePharma CONFIDENTIAL 27/92 Model diagnostics Nature of data R2 Q2 Chemical Acceptable: ≥ 0,8 Acceptable: ≥ 0,5 Excellent: > 0,8 Biological Acceptable: > 0,7 Acceptable: > 0,4 • The goal is not to optimise Q 2 • A stable and interpretable model which can be used for predictions is desired • A lousy model can still provide useful information AcurePharma CONFIDENTIAL 28/92 Useful plots • Replicate plot • Design matrix • Residuals • Coefficients • ANOVA tables/plots • Contour plots • More… AcurePharma CONFIDENTIAL 29/92 Candy production – ”sega råttor” X (independent) –variables Y (dependent) –variables • Amount sugar (g) • Colour • Amount glucose (g) • Taste • Amount H O 2 • Sweetness • Amount Gelatine • ”Seghet” • Amount H 2O • Mix H 2O/gelatine speed • Form • Mix H 2O/gelatine time • Size • Mix H 2O/gelatine heat • Mix 2 speed • Heat 114 • Cool temperature • Colour • Flavour AcurePharma• etc. CONFIDENTIAL 30/92 Full Factorial Designs Sega råttor… Problem! Number of Number of variables Experiments 2 4 4 16 With an increasing 6 64 number of variables the 8 256 required number of 10 1024 experiments rapidly 12 4096 becomes impractical to 14 16384 handle… 16 65536 AcurePharma CONFIDENTIAL 31/92 Fractional Factorial Designs (FFD) • One solution is to use a smaller part − a fraction − of the full factorial design • Possible to greatly reduce the number of experiments • Still investigate the defined experimental domain well • 2k-p experiments required, k = number of variables, p the size of the fraction AcurePharma CONFIDENTIAL 32/92 Factorial Designs (2 3) Full (2 3-1) Fractional + + + - + + + + + x - - + + - + x3 - - + 3 000 000 - + - - + - + + - x - - - + - - 2 x2 x + - - 1 x1 Maximum volume Maximum volume with a minimum number of experiments AcurePharma CONFIDENTIAL 33/92 Statistical Experimental Designs Example of different types of designs • Full factorial designs • Fractional factorial designs • Plackett-Burman designs (special case of FD) • D-optimal designs • Taguchi designs • Central Composite Designs (CCC and CCF) • Mixture Designs • Simplex Designs AcurePharma CONFIDENTIAL 34/92 Multivariate analysis • PCA • PLS • MVD AcurePharma CONFIDENTIAL 35/92 Chemometrics • Use of mathematical and statistical methods for selecting optimal experiments Statistical experimental design and optimisation • Extracting maximum amount of information when analysing chemical data Multivariate data analysis Multivariate design Combining statistical experimental design and multivariate data analysis – a tool in drug discovery AcurePharma CONFIDENTIAL 36/92 The (scientific) world today • Generating numbers to understand and quantify phenomenon's

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    92 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us