Introduction to Chemometrics

Torbjörn Lundstedt KI 080609

AcurePharma Chemometrics

• Use of mathematical and statistical methods for selecting optimal Statistical experimental design ( DoE ) • Extracting maximum amount of information when analysing multivariate (chemical) E.g. classification, (process) monitoring, multivariate calibration, Quantitative Structure- Activity Relationships (QSAR)

AcurePharma CONFIDENTIAL 2/92 Why perform experiments? • Increase understanding of observed phenomenon(s) • Identify what is important for influencing an investigated system • Find experiments (compounds) with desired properties • Make predictions about the outcome of new experiments

AcurePharma CONFIDENTIAL 3/92 DoE – terminology

• Experimental domain The experimental area studied, area where model is valid • Factors Controlled variables which can be varied independently and have an impact on the result in the experiments (“X-block”) • Independent variables Same as factors • Quantitative variables Continuous variables – Independent variables which can be adjusted to any value over a specified the

AcurePharma CONFIDENTIAL 4/92 DoE – terminology

• Qualitative or Discrete Variables Independent variables which describe non- continuous variation, e.g. type of solvent, cell medium A or B • Responses Variables which are observed and a result from changing independent variables (“Y-block”) • Dependent Variables Same as responses

AcurePharma CONFIDENTIAL 5/92 DoE – terminology

• Residuals The difference between the observed response and the response predicted from the model • Uncontrolled or background variables Known variables which are not possible/desirable to alter • Unknown variables Currently unidentified variables

AcurePharma CONFIDENTIAL 6/92 Aim of Modelling

• Present the result in a clear and interpretable way – graphics very useful! • Extract as much information as possible from the experiments • Provide a “correct” conclusion – validate! • Indicate which new experiments to perform and the probable outcome of these

AcurePharma CONFIDENTIAL 7/92 Model types • Fundamental models (hard models, “global models”)

2 -kt E = mc y = y 0e U = IR

• Empirical models (soft models, “local models”) Taylor expansions (polynomials of different complexity)

y = b 0 + b 1x1 + b 2x2 + b 12 x12 + e

AcurePharma CONFIDENTIAL 8/92 Models Mathematical Equation Describing a System • • Physics • Economics • etc .

AcurePharma CONFIDENTIAL 9/92 Soft Modelling Smaller Parts of the Universe is Modelled

A smaller experimental domain is investigated

AcurePharma CONFIDENTIAL 10/92

y = b 0 + b 1x1 + b 2x2 + e

e = The residual, the part of the data the model does not explain Important for validating the modell

AcurePharma Second Order Model

y = b 0 + b 1x1 + b 2x2 + b 12 x1x2 + e

b12 Interaction term, resulting from the effect of two variables. Skews the surface.

AcurePharma Quadratic Model

y = b 0 + b 1x1 + b 2x2 + b 12 x1x2 + 2 2 b11 x1 + b22 x2 + e

b11 Quadratic term, resulting from the effect of one variable. Curves the surface.

AcurePharma Establishing the Model

• Generally a model is based on a set of experiments, where some output ( response or responses) has been measured

• In the experiments different factors , variables, are investigated at different levels , i.e. settings (e.g. temp., conc., logP, nos. C, etc.)

AcurePharma CONFIDENTIAL 14/92 Limitation of Models • Usually only local validity (soft models) (interpolation/extrapolation) • All models are wrong … but some are still very useful! • How should the experiments be performed in order to gain as much information as possible?

AcurePharma CONFIDENTIAL 15/92 Change One Separate factor at a Time

x2 Best x2 Best! 44 43 40 36 30 52 52 67

55 Temperature Temperature

47

Time x Time 1 x1

Optimum? Examine x1 Examine x2 Time Temperature Best yield? AcurePharma CONFIDENTIAL 16/92 COST Change One Separate factor at a Time

x2 ”Best” x2 ”Best” x2 Best ? Temperature Temperature Temperature

Time x Time Time 1 x1 x1

Examine x1 Examine x2 If there is an interaction  Not the optimum! AcurePharma CONFIDENTIAL 17/92 Statistical Experimental Design

x 2 Change all variables at the same time – Find Directions – Temperature

Time x1 AcurePharma CONFIDENTIAL 18/92 Statistical Experimental Design • Planning of experiments to perform in order to extract as much information as possible with as few experiments as possible • Analysis of the result – modelling

AcurePharma CONFIDENTIAL 19/92 Experimental Strategy Most important: Definition of Aim(s)

• Problem formulation: - What is the aim? - What is desired? (yield, purity, activity, robustness) • Familiarization - What is known? - What is unknown? - Test experiments

AcurePharma CONFIDENTIAL 20/92 The time you spend in the beginning to define a project you will have back with interest in the end!

AcurePharma CONFIDENTIAL 21/92 Screening designs: Full Factorial Designs • Every level of a factor is investigated at both levels of all the other factors

• It is a balanced (orthogonal) design

• k factors (experimental variables) gives with a 2 level full factorial design 2k experiments

AcurePharma CONFIDENTIAL 22/92 Full Factorial Designs

Most common: investigate in two levels

- + + + - + + + + + x x 2 3 - - + + - +

- + - + + - -- + - x --- + - - 2 x1 x1 The experiments in a The experiments in a design with two variables design with three variables

More variables – hyper cube AcurePharma CONFIDENTIAL 23/92 Full Factorial Designs 2k Experiments

For two variables For three variables For four variables

Exp. No. x1 x2 Exp. No. x1 x2 x3 Exp. No. x1 x2 x3 x4 1 -- 1 --- 1 ---- 2 + - 2 + - - 2 + - - - 3 - + 3 - + - 3 - + - - 4 + + 4 + + - 4 + + - - 5 - - + 5 - - + - 6 + - + 6 + - + - 7 - + + 7 - + + - 8 + + + 8 + + + - 9 - - - + 10 + - - + 11 - + - + Simple to generate – similar pattern 12 + + - + 13 - - + + no matter the number of variables to 14 + - + + investigate! 15 - + + + 16 + + + +

AcurePharma CONFIDENTIAL 24/92 Analysis of result Multiple (MLR) • Regression method using a fit ”Classical regression” • Requires independent variables in the X-block • Separate model for each Y response Coefficients for each y analysed – variable influence can be identified

AcurePharma CONFIDENTIAL 25/92 Always look at the raw data – e.g. a replicate plot • Each point

Investigation: enamdemo HT02 represent an Plot of Replications for Yield 8 • Exp. no 15 performed in 70 12

10 14 replicate 4 7

60 16 • Variation in overall

Yield 17 6 1518 response, not in 13 3 50 9 replicate 5 2 11 1 40 1 2 3 4 5 6 7 8 9 101112131415

Replicate Index

AcurePharma CONFIDENTIAL 26/92 Cross-validation gives Q 2 Y Parts of the data is held out X and a model is build on the remaining  repeated until all data has been kept out once X Σ Q2 = 1 – (Yobs-Ypred) Σ(Yobs-Yaverage) X etc. AcurePharma CONFIDENTIAL 27/92 Model diagnostics

Nature of data R2 Q2 Chemical Acceptable: ≥ 0,8 Acceptable: ≥ 0,5 Excellent: > 0,8 Biological Acceptable: > 0,7 Acceptable: > 0,4

• The goal is not to optimise Q 2 • A stable and interpretable model which can be used for predictions is desired • A lousy model can still provide useful information

AcurePharma CONFIDENTIAL 28/92 Useful plots • Replicate plot • Design matrix • Residuals • Coefficients • ANOVA tables/plots • Contour plots • More…

AcurePharma CONFIDENTIAL 29/92 Candy production – ”sega råttor”

X (independent) –variables Y (dependent) –variables • Amount sugar (g) • Colour • Amount glucose (g) • Taste • Amount H O 2 • Sweetness • Amount Gelatine • ”Seghet” • Amount H 2O

• Mix H 2O/gelatine speed • Form

• Mix H 2O/gelatine time • Size

• Mix H 2O/gelatine heat • Mix 2 speed • Heat 114 • Cool temperature • Colour • Flavour AcurePharma• etc. CONFIDENTIAL 30/92 Full Factorial Designs

Sega råttor… Problem! Number of Number of variables Experiments 2 4 4 16 With an increasing 6 64 number of variables the 8 256 required number of 10 1024 experiments rapidly 12 4096 becomes impractical to 14 16384 handle… 16 65536

AcurePharma CONFIDENTIAL 31/92 Fractional Factorial Designs (FFD)

• One solution is to use a smaller part − a fraction − of the full factorial design • Possible to greatly reduce the number of experiments • Still investigate the defined experimental domain well • 2k-p experiments required, k = number of variables, p the size of the fraction

AcurePharma CONFIDENTIAL 32/92 Factorial Designs

(2 3) Full (2 3-1) Fractional + + + - + + + + + x - - + + - + x3 - - + 3

000 000

- + - - + - + + - x - - - + - - 2 x2 x + - - 1 x1 Maximum volume Maximum volume with a minimum number of experiments

AcurePharma CONFIDENTIAL 33/92 Statistical Experimental Designs Example of different types of designs • Full factorial designs • Fractional factorial designs • Plackett-Burman designs (special case of FD) • D-optimal designs • Taguchi designs • Central Composite Designs (CCC and CCF) • Mixture Designs • Simplex Designs

AcurePharma CONFIDENTIAL 34/92 Multivariate analysis • PCA • PLS • MVD

AcurePharma CONFIDENTIAL 35/92 Chemometrics • Use of mathematical and statistical methods for selecting optimal experiments Statistical experimental design and optimisation • Extracting maximum amount of information when analysing chemical data Multivariate data analysis

Multivariate design Combining statistical experimental design and multivariate data analysis – a tool in drug discovery

AcurePharma CONFIDENTIAL 36/92 The (scientific) world today

• Generating numbers to understand and quantify phenomenon's around us • Many responses are measured, sometimes at regular time intervals • ”Large” data tables are generated • Tools for viewing all data simultaneously are needed K variables XX N observations

AcurePharma CONFIDENTIAL 37/92 Principal Component Analysis (PCA)

• A projection method – extract information () from large data sets • Creates “windows” in a multidimensional space (matrix with several variables correlated to each other) • Graphical plots to interpret the result; identify classes, patterns, outliers, etc.

AcurePharma CONFIDENTIAL 38/92 Data pre–treatment

Mean- Scaling to unit centring variance

Subtract 1/SD average variable for each variable

The variables have Raw data Moving the an equal chance to coordinate system to influence the PCA the origin model

Default setting is centering and unit variance scaling

AcurePharma CONFIDENTIAL 39/92 PCA – graphical description Investigating three variables, e.g. formula weight, melting point and log P

x x1 x2 x3 2

x3

x1

AcurePharma CONFIDENTIAL 40/92 t PCA 2

x1 x2 x3 Comp 1 (t ) 1 t1 x2

Observation i Scores (observations) Plane PCA Projection x3

Comp 2 (t 2)

p2

x1

p1 x2 x3 x1 Loadings (variables)

AcurePharma CONFIDENTIAL 41/92 PCA - A graphical example

Raw data

No. Gender Shoe Size Height (cm) 1 Female1 37 168 2 Female2 36 166 3 Male1 42 185 4 Female3 38 171 5 Male2 41 174 6 Male3 43 180 Mean 39.5 174

AcurePharma CONFIDENTIAL 42/92 Centring of data

Raw data Centred data No. Gender Shoe Size Height (cm) Shoe Size Height 1 Female1 37 168 -2.5 -6 2 Female2 36 166 -3.5 -8 3 Male1 42 185 2.5 11 4 Female3 38 171 -1.5 -3 5 Male2 41 174 1.5 0 6 Male3 43 180 3.5 6 Mean 39.5 174

The mean is calculated for each variable. The centring subtracts the mean from values of each variable.

AcurePharma CONFIDENTIAL 43/92 Plotting the data Raw data vs Centring

Raw data Centring

190 15

185 10

180 5

175 0 170 -4 -2 0 2 4 -5 165 35 40 45 -10

Ad scaling! AcurePharma CONFIDENTIAL 44/92 Height 1st principal component 15

10 1. Fit a line to the data points through the origin Score value 2. Make a perpendicular 5 projection to the principal component for all data Size points

-5 5 3. Measure the distance from the origin to the -5 projections

Score values (t i) -10

AcurePharma CONFIDENTIAL 45/92 st 15 Height 1 principal component

10 4. Measure the angle, α, between the principal α component and each 5 Längd variable α 5. Calculate cos( α) Storlek Loadings (p i) -5 5 Size -5

-10

AcurePharma CONFIDENTIAL 46/92 Height 1st Principal component 15

10 6. DModX – Distance to the Model in X (X = the DModX data table) 5 Finding deviating observations Size

-5 5

-5

-10

AcurePharma CONFIDENTIAL 47/92 Comparison between the “graphical” PCA and the PCA obtained from SIMCA

“Graphical”PCA SIMCA PCA No. Gender t1 p1 t1 p1 1 Female1 -6.5 Size = 0.39 -6.4945 Size =0.35 2 Female2 -8.75 Height = 0.92 -8.7171 Height = 0.94 3 Male1 11.05 11.185 4 Female3 -3.35 -3.3339 5 Male2 0.6 0.51964 6 Male3 6.85 6.841 R2X = 0.98033 Calculation of R2X

2 Σ 2 Σ 2 R X=1- (x i – xipred ) / (x i –Xmean ) i.e. 1-SS Res /SS Tot or 2 Σ 2 Σ 2 R X= (x ipred ) / (x i –Xmean ) i.e. SS Pred /SS Tot AcurePharma CONFIDENTIAL 48/92 Score plot (t 1) - to evaluate the result

15

Males

Females

-15

AcurePharma CONFIDENTIAL 49/92 PCA example

• Questions in the form of on a continuous scale or as yes/no

• 213 general questions about TV-programs, celebrities, food habits, ethical opinion, etc.

• “Limited time” for answering the questionnaire

AcurePharma CONFIDENTIAL 50/92 Data – the chemistry department • 14 persons  observations • 213 questions “describing” the chemists  variables Variable 1 ...... K

Observation 1 . . X . . . . N

AcurePharma CONFIDENTIAL 51/92 Finding relations • Relating the persons to the questions, i.e. the observations to the variables

10 0.150 CocaP3 Cola 2 Low East&WestO´Learys O´ConnorsSvenssonsFredrikHembränt Ju Kronåsens Katalin Tipsextra Anja Perss Absolut voSimpsonsVännerMikael Per FolkparkerBrad Pitt Surströmmi Sova i tälSportNationalis påLasse T KronABBA 13 0.100 SurströmmingLängdSystembolaSkoterNationernaHockeyJohnJamesFotbollRegistrera Grish Bond Pasta 14 UtförsåkniRådhuspubeSpelaKentHundar inneDjurförsök WhiskeyDeklaratioSnus AlkoläskGylleneGodis Ti EscobarPotatisBadaSAAB bastu 5 MögelostKött Karl-BertiFlustretKOPPS StalletsBeatlesVita tlögne IKEA 1 Sagan om r SkonummerRöttManchester vinMagnusBertNorrlandAbort KarlsSn HärFrank Sina 3 Kaffe OrmarGrötfrukos Enköping S KontaktannMagnusAstrologiCharterres och Vitlök MelacuresDataspel Vitt NaknevinOrgandonatGolf kock DödshjälpFolkmusikFisk 4 0.050 l KallaKatter Föregåendefött JonasSemlorTe Gard SvenskaJanAktuelltDrottningKnorrsUppsalaRis Guillo na sop Ketchup NewsSvartjobb CafeCurryCityakutenBingoKajsaGenmanipulJämställdh lott BerqKungAustralienAttKina dansaKarl BaywatchDjurparkerLättmjölkPepsiExpressenSillGöran cola ochOasis Pers P Stockholm KjellKaraoke Erik KommunalVolvoMatLekar s (Tina)KällsorterGrönsaker t[2] Le Parc HarryTomten Pott p[2] SalvadorLasseLondon D Berg MOMS Dagens Nyh Spindlar SandraFelliniHöga Bul höjde Mobiltelef 0 0.000 SkeppsholmBigPåvelsTräskor BrotheAftonbladeSAS Långkalson Ally McBeaFredmannsKärnkraftGES JoggaEuronAIK FameRadioPolitik AnsjovisFactoSolarium RIXParisOliverBantningUSA EU Zlatan PC A-ekonomiDödsstraffVägsalt 7 9 Monstertju Trivial Pu Marilin MaNyårslöfteStrumpbyxoSkönhetsplMacIntoshYoga Camp Maloy Fläskpannk Radio NRJRunar Söör 10 -0.050 FarmenAndersOperaStatistik Bjö Dörrvakter 12 Cigaretter AstraJeopardyKönresundsbro (Man 0 5 6 SaddamVita huset Hus AvundsjukaBarn AndningsövGeorge Bus -5 RomantiskaTelitubbieLoisDavidGöteborg och Lett CDjurförsök Tidiga morÅlder High 8 -0.100 Olof Palme 11 Pharmacia StyrketränTandläkareTandläkaren

-10-8 -6 -4 -2 0 2 4 6 8 10 1214 -0.100 -0.050 0.000 0.050 0.100 t[1] p[1]

AcurePharma CONFIDENTIAL 52/92 Finding patterns

10 2

No 13 14 children 5 1 3 4 Male t[2] 0 Female

7 9 10 12 5 6 Have -5 children 11 8 -10-8 -6 -4 -2 0 2 4 6 8 10 12 14 t[1]

AcurePharma CONFIDENTIAL 53/92 PCA • Same principals with a larger number of variables • Principal components are always orthogonal (independent) from each other • Each principal component summarises the data set by generating scores for the observations with corresponding loadings for the variables, i.e. scores and loadings should be compared to each other • Can handle moderate amount of (25%)

AcurePharma CONFIDENTIAL 54/92 Determining the number of principal components •Q2 – cross-validated value indicating how well the model is able to predict the data (explained variance) • Eigenvalue – the length of the principal component • (Chemical) interpretation in the plots, e.g. A=1 (A denotes the number of components)

AcurePharma CONFIDENTIAL 55/92 Outliers – deviating observations

DModX x2 for 1 PC1

1 Outlier in DModX

x1

2 t-value for t-value for observation 1

DModX for 2 Outlier in 2 score space

AcurePharma PCA objectives

• Overview of data – always a good starting point (historical data, data from other sources) • Identify patterns in the data set • Identify important variables • Identify outliers • Understand how variables (loadings) and observations (scores) are related to each other

AcurePharma CONFIDENTIAL 57/92 PCA objectives

• Classification & clustering – dividing the data set depending on the patterns (structure) of the data set – Treated vs. untreated – Before vs. after treatment (cross-over designs, difference in time) – (proteins, enzymes) • Classify new observations • Summarise data with a fewer number of variables – generating ”principal properties” design variables for multivariate design (starting materials or products, chemical

AcurePharmalibraries) CONFIDENTIAL 58/92 Soft Independent Modelling of Class Analogy • Referred to as SIMCA classification • Separate PCA models for each identified class • Predictions of new objects in score space • Predictions of new objects in DModX

• Use your and the knowledge of others…

AcurePharma CONFIDENTIAL 59/92 PCA Modelling

x PC1 X-matrix 3

Class I

x2

Class II

x1

New samples Class III

AcurePharma CONFIDENTIAL 60/92 SIMCA Modelling, Class I

x X-matrix 3 PC1 PCA Class I Class I

x2

Class II x1

New samples Class III Predict in to model AcurePharma CONFIDENTIAL 61/92 PLS - Partial Least Squares Projection to Latent Structures • A projection method, ”regression extension of PCA” • Find the relation between the latent structure in X and latent structure in Y • Maximize the between the X block and the Y block • PLS1 - one Y variable • PLS2 - more than one Y variable

AcurePharma CONFIDENTIAL 62/92 Analysis of the result • Relating the variables (X-block) to the response or responses (Y-block) • Need a regression method which can handle correlated X-variables • Analyse many Y variables at the same time

X Y

AcurePharma CONFIDENTIAL 63/92 PLS Describing Response variables T´ U´ variables p ´ P´ 1 p2´ t1 t2 u1 u2 • Can handle many noisy collinear variables X Y (compare with MLR)

w ´ 1 c1´ C´ • Tolerate moderate W´ w ´ c ´ 2 2 amounts of missing X-space Y- space

x y3 3 data (X and Y) t • Multiple responses

y modelled at the same x1 1 u y time x2 2

u • The result can be graphically visualized t i.e. score plots and

AcurePharma CONFIDENTIAL loading plots 64/92 PLS - Geometric Interpretation

Same observation

x 3 y3

x2 y2

x1 y1 • Each observation is represented by one point in the X-space and one in the Y-space • As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems

AcurePharma CONFIDENTIAL 65/92 PLS - Geometric Interpretation

x3 y3

x2 y2

x1 y1 Same observation

• The mean-centering procedure implies that the origin of each coordinate system is re-positioned

AcurePharma CONFIDENTIAL 66/92 PLS - Geometric Interpretation

x3 Comp 1 (t 1) y3 Comp 1 (u 1)

x2 y2

x1 y1 Projection of observation i • The first PLS-component is a line in the X-space and a line in the Y-space, calculated to a) approximate the point-swarms well in X and Y b) provide a good correlation between the projections (t 1 and u1) • Directions are w 1 and c 1 and co-ordinates along these vectors are t and u , respectively. AcurePharma1 1 CONFIDENTIAL 67/92 PLS - Geometric Interpretation

• The projection coordinates, t 1 and u 1, in the two spaces, X and Y, are connected and correlated through the inner relation ui1 = t i1 + h i (h i is a residual)

AcurePharma• The slope of the dotted lineCONFIDENTIAL is 1.0 68/92 PLS predictions • A new observation is similar to the training set if it X-space Y-space y is inside the tolerance x3 3 cylinder in X-space t • Then its projection on the X-model (t) can be entered y into the T-U-relation giving x1 1 a u-value for each model u x y2 dimension 2 • These values define a u point on the Y-space model, which, in turn, corresponds t to a predicted value for each y-variable

AcurePharma CONFIDENTIAL 69/92 PLS - Model diagnostics SIMCA supports two internal model validation strategies 1. Cross validation To estimate the optimal model complexity 2. Response permutation test (Validate-option) To check the degree of overfit

AcurePharma CONFIDENTIAL 70/92 Evaluation of R 2 and Q 2

• PRESS is the sum of •R2 is always larger than Q 2 squared differences • High R 2 and high Q 2 is between predicted and desired observed y-elements • The difference between R 2 2 and Q 2 should not be too P R E S S=∑ ( y − y∃ ) im im large

• PRESS can be transferred into a dimensionless quantity, Q2, which resembles R 2

2 Q = 1 - PRESS/SSY total

R2 = 1 - SSY /SSY AcurePharma resid total CONFIDENTIAL 71/92 PLS-DA PLS Discriminant Analysis

Sekvens Grupp Y-block X-block y1 y2 • Adds information in a 1 B 0 1 2 A 1 0 Y block indicating 3 A 1 0 4 B 0 1 group belonging 5 B 0 1 6 A X 1 0 • X variables important 7 B 0 1 8 A 1 0 for separation can be 9 B 0 1 . . . identified ...... • Works for more than etc. etc. etc. two groups For example two groups

Group A; y 1=1, y 2=0 Group B; y 1=0, y 2=1

AcurePharma CONFIDENTIAL 72/92 PLS – step by step 1. Problem definition / Objective 2. Collect data 3. Import data 4. Pre-treatments 5. Calculate model 6. Evaluate/Validate model 7. Analyse model 8. Suggest new experiments

AcurePharma CONFIDENTIAL 73/92 Multivariate design

AcurePharma Combinatorial Chemistry • Fast compound generation On solid phase, in solution, phage display, in parallel, in mixtures • Purification, analysis methods • ID/characterization Coding, LC-MS, NMR • Fast biological testing High Trough-put Screening • Increase structure diversity Synthesize and test a large number of compounds AcurePharma CONFIDENTIAL 75/92 Drug Discovery Aim: Reduce Development Time 6 200 Synthesized 21 Subacute Toxicology Time span 6.5 12.8 Years Tested in humans 2.5 Phase III clinical trials One Approved drug

COST ~ $ 350 000 000 Nature vol. 384, 1996, supp.

AcurePharma CONFIDENTIAL 76/92 Information Drives the Drug Discovery Process

Aim: Gain information – the more – high quality data  design Guided– the quicker learning – the better FILED PATENT  THE CLOCK IS RUNNING

AcurePharmaNew or better drugs CONFIDENTIAL 77/92 The Combinatorial Explosion • Rapidly generate a great number of possible compounds

Synthesis + +

100 200 300 6 000 000 Diamines Ketones Carboxylic acids products

HTS ID, verification New lead compounds & new drugs

• Commercially available reactants (ACD – Available Chemicals Directory, others)

AcurePharma• Proprietary data basesCONFIDENTIAL 78/92 The Chemical ”Space”

• ~ 10 200 organic molecules with a molecular weight of less than 850 g/mol • ~10 40 organic compounds with “drug like properties” • 10 17 seconds have passed since the Big Bang Roughly 10 to 20 billion years ago – if you believe in that model…

AcurePharma CONFIDENTIAL 79/92 Practical Limitations

• ~ One million compounds in mixtures • ~1000 compounds in parallel synthesis • Costs and practical obstacles to consider − Equipment − Synthesis − Work up − Biological testing (£ 0.1 - 2.0 per sample, 1996) − Disposal considerations − Personnel (salaries, training, etc.)

AcurePharma CONFIDENTIAL 80/92 Important concepts • Maximum diversity Chose as different compounds as possible Diversity : spread of compounds with a defined set of descriptors Descriptor : a variable characterizing a property of the compounds

• Similarity Chose structures similar to a known active compound, lead optimization

AcurePharma CONFIDENTIAL 81/92 Best Selection? Selection in a chemical space defined by two descriptors

All structures

9 selected 2 selected 9 selected Not only the number of compounds synthesised (experiments) that are of importance! AcurePharma CONFIDENTIAL 82/92 Design – screening

Next step  Increase training set – decrease risk of leverage due to deviating results

AcurePharma CONFIDENTIAL 83/92 Follow up design

AcurePharma CONFIDENTIAL 84/92 Multivariate design • Multivariate characterization + • Multivariate data analysis (PCA and PLS) + • Factorial designs 

• MULTIVARIATE DESIGN (MVD) • When used in drug development and with design in identified sub-cluster – Statistical Molecular Design ( SMD )

AcurePharma CONFIDENTIAL 85/92 Multivariate Design • Tool for selecting representative structures • Full factorial designs • Fractional factorial designs • D-optimal designs • Works for more than two principal properties

• Factorial designs - synthesis optimisation - formulation optimisation - process optimisation etc.

AcurePharma CONFIDENTIAL 86/92 Summary • Historical data  Always a good starting point for analysis (PCA)  Determine data structure, preferred format/output  Get acquainted with the process and the data  Find “hidden” information  Good starting point for discussions • Determine  The aim – is there a defined stop criteria (yield, purity, accepted batches, ID important/”sensitive” variables etc.)  Important to define prior to investigation  know when to stop  The experimental domain (variables, settings, responses)

AcurePharma CONFIDENTIAL 87/92 Summary • Design of Experiments (DoE)  Simplify analysis  Ensure a systematic variation in the investigated experimental domain  Small design within the defined limits  (Design in historical data) • Analysis  PCA (SIMCA etc.)  PLS, PLS-DA • Next step  Aim(s) reached  Important variables  New optimal experiments

AcurePharma CONFIDENTIAL 88/92 Acure Pharma Business model ACURE P H A R M A

Consulting services Drug development • Support chemistry • Proprietary compound library • Support IPR questions and • Finding financing for drug problems development • Second opinion/news value Company collaborations • Chemometrics Venture capital PAT (Process Analytical Governmental funding Technology, FDA guidelines) (7FP, VINNOVA) • Network • Defined AcurePharma projects • Exploratory research

AcurePharma CONFIDENTIAL 89/92 Acure Pharma History

… AcureOmics AB September 2007 VINNOVA grant BBB, May 2007 Start-up company of the year 2007 Action Pharma A/S – collaboration Uminova – in-licensing of Cancer project Research agreement with professor Sharma AnaMar Medical AB, Out licensing Carlsson Research – Research agreement, MVA Educational activities (SE, UK) Second opinion, Novelty search Consulting in research and development Acure Pharma Consulting AB  Acure Pharma AB Bidding on the compound library (Melacure) – 2004.

AcurePharma CONFIDENTIAL 90/92 Future outlook

… … … Clinical trials Phase I Identification of new early projects Start up Ltd in UK? – EU project, 2008 – 2011 Consulting and educational activities Establish additional research collaborations

AcurePharma CONFIDENTIAL 91/92 Project pipeline 2008 Pre-clinical development ACURE P H A R M A Hits Leads Pre-CDs CD Phase I TNF-alpha

Blood Brain Barrier

Colorectal cancer

ACURE RESEARCH COLLABORATIONS P H A R M A

Adenosine 2a agonists

MCR

Serotonin modulators

SCI

Metabolic syndrom

AcurePharma CONFIDENTIAL 92/92