Building innovative drug discovery alliances
What to make next? Augmented Design
Cresset UGM, 29th June 2017
Or
2 Old blokes and a couple of buns
What to make next?
PAGE 2 What’s medicinal chemistry?
PAGE 3 What do I make next? Lead Telemetry
1st Candidate
• MPO scores comprised of: Potency MPO Solubility Score Protein binding Metabolic stability LogD
Project progression
PAGE 4 Good medicinal chemistry? Project Progress as a Function of Time & compound number
[Figure: pIC50 against time and compound number for Target, Off Target 1 and Off Target 2; scale bar: 6 months]
Annotations on the plot:
• Initial Hit (LLE 3.2)
• Ames liability identified
• Selectivity improved
• Ames liability removed … with drop in potency
• Rapidly regained potency
• AO liability identified
• First co-crystal structure delivered
• Rapid progress increasing potency
• Structure based design: targeting specific unconserved cysteine residue
• Core redesigned: replacement of aromatic CH with heteroatom; intramolecular H-bonding
• Docking based design: introduction of axial methyl “lock”; rigidification of molecule
• Comp. 1 (LLE 6.8), Comp. 2 (LLE 7.9), Comp. 3 (LLE 8.0), Comp. 4 (LLE 8.0)
• Advanced Lead identified: No Ames liability, No AO activity
PAGE 5 Key tactics for early series evaluation
Establishing the Pharmacophore
• Determine which molecular features are driving or limiting potency
• Evolution of molecular scaffolds using series hybridisation and computational scaffold hopping approaches
Understanding Conformation
• Assessing conformational landscape of hit series
• Exploration of molecular features limiting conformational freedom and assessing their impact on biological profile
Focus on Properties
• Focus on aligning molecular properties of a series with desirable property space*
• Establish independence between trends in molecular properties and biological profile
Ensuring Compound Efficiency
• Hypothesis driven, iterative approach to compound design
• Preparing minimum number of compounds required to address an issue or assess potential of a series
* Desirable property space depends on route of administration, target organ, etc.
PAGE 6 What do chemists do well? An under-utilised resource
200 chemists, ~10 reactions per week each
Lab notebooks (eLN) feed a proprietary reaction database
Adding, say, ~100,000 reactions per year, with a strong medicinal chemistry bias
PAGE 7 The ‘Bread & Butter’ A brief history of synthetic medicinal chemistry ….
‘Our study also shows a steady increase in the number of different reaction types used in pharmaceutical patents but a trend toward lower median yield for some of the reaction classes.’
Huge pool of known chemistry waiting to be tapped
PAGE 8 Landrum et al., J. Med. Chem. 2016, 59, 4385−4402; Roughley & Jordan, J. Med. Chem. 2011, 54, 3451−3479 Reaction Vectors
[Reaction scheme: R1 + R2 → P; structures not recoverable from the extracted text]

Bond index (I)                     1     2     3      4
Bond                               C-C   C=O   C-OH   C-OR
reactant vector, R = (R1 + R2)     4     1     2      0
product vector, P                  4     1     0      2
reaction vector, D = P - R         0     0     -2     2
PAGE 9 Broughton, H. B., Hunt, P. & Mackey, M. (2003) Methods for Classifying and Searching Chemical Reactions. United States Patent Application 367550. Reaction Vectors in Structure Generation
• The reaction vector, D, equals the difference between the product vector, P, and the reactant vector, R: D = P – R
• Given a reaction vector, D, and a reactant vector, R, the product vector, P, can be obtained: P = D + R
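The vector arithmetic above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the four bond-type counts from the example reaction as the descriptor; the function names `reaction_vector` and `apply_reaction_vector` are hypothetical and not taken from any reaction vector implementation.

```python
from collections import Counter

def reaction_vector(reactants, product):
    """Return D = P - R, keeping only the bond types whose count changes."""
    d = Counter(product)
    d.subtract(reactants)          # subtract allows negative counts
    return {bond: n for bond, n in d.items() if n != 0}

def apply_reaction_vector(d, reactants):
    """Return P = D + R, dropping bond types whose count ends at zero."""
    p = Counter(reactants)
    p.update(d)                    # update adds counts, including negative ones
    return {bond: n for bond, n in p.items() if n != 0}

# Bond counts from the example: R = R1 + R2, and the product P.
R = {"C-C": 4, "C=O": 1, "C-OH": 2, "C-OR": 0}
P = {"C-C": 4, "C=O": 1, "C-OH": 0, "C-OR": 2}

D = reaction_vector(R, P)
P_rebuilt = apply_reaction_vector(D, R)
print(D)          # two C-OH bonds lost, two C-OR bonds gained
print(P_rebuilt)  # same non-zero counts as P
```

Recovering the product vector is the easy half. Turning that vector back into a molecule is the open question the next line raises.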
Given a product vector, P, can we reconstruct the product molecule(s)?
Product vector: C-C 4, C=O 1, C-OH 0, C-OR 2 [candidate structures not recoverable from the extracted text]
A better descriptor is required
PAGE 10 J. Chem. Inf. Model., 2009, 49 (5), pp 1163–1184. Knowledge-Based Approach to de Novo Design Using Reaction Vectors So… Using known chemistry we can…
PAGE 11 Augmented Design: scaffold hopping in CDK2 Feature Similarity + Docking
• Reactions: 26K
• Reagents: 18K extracted from Aldrich
- No rings < 4
- Contain C, N, O, S, P, F, Cl, Br, I, B, Si, Se
- < 3 F
- Hann substructures removed
- Rb < 3
- Max 15 heavy atoms
• Scored feature similarity to 1H1S
Hinge polar fragment: 8 starting materials, 4400 products
Cyclohexyl fragment: 10 starting materials, 16000 products
PAGE 12 Augmented Design: Fast Follower Approach Find me a new IP free series?
Type II Kinase Inhibitor
PAGE 13 Spark meets Reaction Vectors Kinase inhibitor programme
Program: H2L
Target: Kinase inhibitor
Starting Point: Literature fast follower
End Point: Advanced Lead
Evotec contribution: 5-6 FTEs
Timeline: 9 Months

Literature Type II kinase inhibitor, Cmpd A: IC50 = 10 nM, crystal structure published [structure labelled Hinge binder, Core, Back pocket]

Stage 1: Generation of new cores
1) Spark replacement of Cmpd A core
2) ROCs overlay, GOLD docking & pharmacophore match using Cmpd A crystal structure
3) 4 new core building blocks selected and conformational analysis completed

Stage 2: RV Backpocket expansion
1) New structures generated with novel building blocks (shown)
2) New structures filtered on phys. chem. properties (cLogD 1.5-3.5, MW<500)
3) ROCs overlay / GOLD docking & pharmacophore match completed
4) Compounds docked in key kinase crystal structures to test for selectivity
5) Predictive DMPK workflow run
6) Scifinder IP searches conducted
7) 14 New compounds selected for synthesis
Novel lead compound generated with desirable potency suitable for further SAR expansion IC50 = 80 nM
PAGE 14 So…
PAGE 15 What do I make next? Options
RVs & Bayes FMO
Pan-Omics eAPPS
PAGE 16 What to make next? GPR Bayesian optimisation meets reaction sequence vectors
Balance between Exploitation and Exploration
Acquisition Functions
• For finding the maximum: Probability of improvement (PI), Expected improvement (EI), Upper confidence bound (UCB)
• For finding the minimum: Lower confidence bound (LCB)
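For reference, the four acquisition functions named above can be written down in a few lines. This is a sketch using the standard textbook forms for a Gaussian predictive distribution with mean `mu` and standard deviation `sigma`. The exact variants used in the work presented are not stated on the slide, and the margin `xi` and the default `kappa` are assumptions.

```python
import math

def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, best, xi=0.0):
    """PI: probability that the prediction beats the best value seen so far."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    return _cdf((mu - best - xi) / sigma)

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI: expected amount by which the prediction beats the best value."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * _cdf(z) + sigma * _pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB: optimistic estimate, used when searching for a maximum."""
    return mu + kappa * sigma

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB: pessimistic estimate, used when searching for a minimum."""
    return mu - kappa * sigma

# A compound predicted at pIC50 6.0 +/- 0.5, with best measured pIC50 of 6.0.
print(upper_confidence_bound(6.0, 0.5), lower_confidence_bound(6.0, 0.5))
print(probability_of_improvement(6.0, 0.5, 6.0), expected_improvement(6.0, 0.5, 6.0))
```

A larger `kappa` weights the uncertainty term more heavily and so favours exploration, while a smaller one favours exploitation of the current model.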
PAGE 17 Nafisa Sharif & Mike Osborne 2016 The Dataset The Chemist’s QSAR
The Experiment
[Figure: pIC50 against compounds (ordered chronologically), partitioned into Train, Train +1+2+3…n and Test; annotation: apply acquisition function]
• Train: build model (50).
• Train +1+2+3…n: calculate acquisition function. Add top scoring compound to model. Rebuild model.
• Test: predict on final compounds (42).
• Repeat 100x… (50+1+2…151)
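The loop described above (build a model, score the remaining compounds with an acquisition function, add the top scorer, rebuild) can be sketched as follows. This is a toy illustration, not the code used in the study. It assumes a small hand-written Gaussian process regressor with an RBF kernel, UCB as the acquisition function, and made-up one-dimensional descriptors. All function names, the kernel length scale and the noise level are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, X_new, noise=1e-2):
    """GP regression: return (mean, std) for each row of X_new."""
    y_bar = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [v - y_bar for v in y]))
    preds = []
    for x in X_new:
        k = [rbf(x, a) for a in X]
        mean = y_bar + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward_sub(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # prior variance is 1
        preds.append((mean, math.sqrt(var)))
    return preds

def grow_training_set(X, y, n_seed, n_add, kappa=2.0):
    """Seed with the first n_seed compounds, then add the top UCB scorer n_add times."""
    train = list(range(n_seed))
    pool = list(range(n_seed, len(X)))
    picked = []
    for _ in range(min(n_add, len(pool))):
        preds = gp_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in pool])
        scores = [m + kappa * s for m, s in preds]
        best = pool[scores.index(max(scores))]
        train.append(best)      # its measured value is now "revealed"
        pool.remove(best)
        picked.append(best)
    return picked

# Toy data: ten compounds with one descriptor each and a made-up pIC50.
X = [[0.5 * i] for i in range(10)]
y = [5.0 + math.sin(x[0]) for x in X]
picked = grow_training_set(X, y, n_seed=3, n_add=4)
print(picked)  # indices of pool compounds in the order they were added
```

In a retrospective experiment of this kind, a held-out test set would be predicted after every rebuild so that the test-set correlation can be followed as compounds are added.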
PAGE 18 Results: Observing Signal Formation
Application of the acquisition function to build the training set
[Figure: five panels of test set r2 value against number of compounds added to training set: No Bayesian Optimisation (Chronologically Ordered), PI, EI, UCB, LCB]
PAGE 19 Balancing Exploration with Exploitation Mixing the acquisition functions
[Figure: test set r2 value against number of compounds added to training set (Train +1+2+3…n, then Test), comparing No Bayesian Optimisation with LCB followed by UCB; kappa = 2]
Exploration followed by Exploitation
• r2 = -0.345: No Correlation
• r2 = 0.627: Good Correlation
PAGE 20 Modelling
“By definition all models are wrong … it’s just that some are more useful than others” George EP Box (1919-2013)
“Modelling isn’t about getting it right … It’s about understanding why you get the answer you get”
If you understand the why then you can build a better model
PAGE 21 Summary
• Demonstrated that given a molecule we can automatically suggest what chemistry we can apply and hence what molecules we can make
• Successful impact of RVs in combination with Cresset tools
• Described research into the application of Acquisition Functions to drive choice of compound(s) for synthesis from within the RV networks
PAGE 22 Acknowledgments
Dimitar Hristozov, Val Gillet, Mike Osborne, Atanas Patronov, Beining Chen, Nafisa Sharif, Craig Johnstone, Hina Patel, Michelle Southey, Ben Allen, Chris Stimson, James Wallace
PAGE 23
Your contact: Mike Bodkin, VP Research Informatics, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4RZ, UK. T: +44 (0)1235 44 1207
[email protected]
What do I make next? Sequence vector network
PAGE 25 AI & Augmenting Design
Local QSAR (Bench Chemist): Given a single change, what is the effect?
Global QSAR (Computational Chemist): Take all the data, what is the effect?
[Figure: scatter plot with fitted line y = 0.7044x + 0.9289, R2 = 0.6091; both axes run from 0 to about 8]
PAGE 26 Augmented Design
Input → Mutate → Score → Rank → Output
Load Molecule → Generate new molecules → Any scoring tool → Multi-objective → Best new molecules
Workflow: Starting material → In-silico reaction (Reaction Vectors) → Pool of possible products → Score → Select (Multi-objective Pareto optimisation algorithm) → Cream-off top scoring molecules for evolution
Scoring tools:
• Activity QSAR, Local Models (Q2 = 0.76; Predicted against Actual plot)
• Docking Score
• ADMET QSAR, Global Models (Q2 = 0.68; Predicted against Actual plot)
PAGE 27
1. Multi-dimensional de novo design of drug-like compounds. 2013, De Novo Molecular Design, ISBN 978-3-527-67700-9.
2. Validation of Reaction Vectors for de Novo Design. 2011, Library Design, Search Methods, and Applications of Fragment-Based Drug Design, ISBN 9780841224926.
3. Knowledge-Based Approach to de Novo Design Using Reaction Vectors. J. Chem. Inf. Model., 2009, 49 (5), pp 1163–1184.
Ion channel dual Inhibitor design QSARs drive the objective function
[Figure: KV1.5 pIC50 Prediction (log scale) against NaV1.5 pIC50 Prediction (log scale); points coloured by Pareto Rank: Red (High) -> Blue (Low). Panels: 2nd/3rd Iterations; Results (3 Iterations)]
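Pareto ranking of candidates on two predicted objectives can be sketched as below. This is a generic non-dominated sorting illustration, assuming both objectives are to be maximised and that rank 1 denotes the non-dominated front. The ranking convention and algorithm used to colour the plot are not stated on the slide, and the example values are invented.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_ranks(points):
    """Peel off successive non-dominated fronts; the first front gets rank 1."""
    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# Hypothetical (predicted pIC50 on objective 1, predicted pIC50 on objective 2).
points = [(7.0, 6.5), (6.0, 7.0), (6.5, 6.0), (5.0, 5.0)]
ranks = pareto_ranks(points)
print(ranks)  # [1, 1, 2, 3]
```

The first two candidates trade one objective against the other, so neither dominates and both sit on the first front.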
PAGE 28 Antipsychotic polypharmacology 4 objectives: QSAR + Pharmacophore Similarity
The chart shows the known affinity (Ki) values of antipsychotic drugs for a panel of receptors.
How can we go about designing a novel antipsychotic?
26K Reactions 93K Reagents
PAGE 29 Naunyn Schmiedebergs Arch Pharmacol. 2015 Mar 14 Fragment Growth in-situ PHIP2 Bromo-Domain
Reactions based around
Objective function
PAGE 30 Cox, O. B., Krojer, T., Collins, P., Monteiro, O., Talon, R., Bradley, A., … Glick, M. (2016). A poised fragment library enables rapid synthetic expansion yielding the first reported inhibitors of PHIP(2), an atypical bromodomain. Chem. Sci., 7(3), 2322–2330. http://doi.org/10.1039/C5SC03115J Interaction sampling using SIFts & K-means
• Approx 25% molecules in cluster 0 make interactions with GLU-1339 and GLN-1343
• All in cluster 2 make interactions with GLU-1349
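Clustering interaction fingerprints with k-means can be sketched as follows. This is a minimal illustration, assuming each SIFt is reduced to one bit per residue (1 if the pose interacts with it). The residue names come from the slide, but the fingerprints, the choice of k = 2 and the deterministic initialisation are invented for the example.

```python
def kmeans(vectors, k, n_iter=20):
    """Plain k-means; centroids are initialised from the first k vectors."""
    centroids = [[float(x) for x in v] for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((x - m) ** 2
                                        for x, m in zip(v, centroids[c])))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

residues = ["GLU-1339", "GLN-1343", "GLU-1349"]
# Hypothetical fingerprints: one row per docked molecule, one bit per residue.
sifts = [[1, 1, 0], [0, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]

labels, centroids = kmeans(sifts, k=2)
for c, centre in enumerate(centroids):
    # For binary fingerprints the centroid is the fraction of cluster members
    # that make each interaction.
    print(c, dict(zip(residues, centre)))
```

Reading the centroid as a per-residue fraction is what allows statements of the form "x% of molecules in a cluster interact with a given residue".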
PAGE 31 Evolutionary Design KNIME framework
PAGE 32 Fragment Molecular Orbital (FMO) QM-Based SBDD
• FMO is a quantum mechanical method that has been developed for application to large (biological) systems
• FMO provides detailed analysis of protein-ligand interactions and their chemical nature
• Calculate individual contribution of each residue and water molecule to binding enthalpy
Interaction energy terms: Electrostatic, Exchange Repulsion, Dispersion, Charge Transfer
Residues labelled in the figure: Glu81, Phe80, Phe82, His84, Leu134
“Strength of molecular interaction at that snapshot in time”
PAGE 33 Heifetz et al., J. Med. Chem., Article ASAP; J. Chem. Inf. Model., 2016, 56 (1), pp 159–172 QM Virtual SAR expansion using FMO 1WCC (CDK2) core modifications
PDB: 1WCC
IC50 = 350 µM, LE < 0.51
[Figure: core modifications 1 to 13 plotted by ΔE]
Removal of the chlorine detrimental to the fragment binding
IC50 = 7 µM
• Medium throughput FMO analysis can be rapidly carried out to answer SAR questions
• The technique is highly effective for prioritizing the initial fragment expansion directions or optimization for larger ligands
ΔE = Sum PIE – Sum PIE (1WCC fragment)
PAGE 34 Modified Atom Pairs
Atom Pairs 2 (AP2) : X1(n, p, r)-2(BO)-X2(n, p, r)
X: element type
n: number of bonds to heavy atoms
p: number of π bonds
r: number of ring memberships
BO: bond order
Atom Pairs 3 (AP3): X1(n, p, r)-3-X2(n, p, r)
• Extending the bond distance in atom pairs encodes more of the environment of the reaction centre
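The AP2 and AP3 string forms defined above can be produced with a small helper. This is a sketch of the notation only, assuming the atom environment (element, n, p, r) has already been computed by some cheminformatics toolkit. The class and function names are hypothetical, and no canonical ordering of the two atoms is applied because the slides do not state one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomEnv:
    element: str  # X: element type
    n: int        # number of bonds to heavy atoms
    p: int        # number of pi bonds
    r: int        # number of ring memberships

    def __str__(self):
        return f"{self.element}({self.n},{self.p},{self.r})"

def ap2(a, b, bond_order):
    """Atom pair for two directly bonded atoms, including the bond order."""
    return f"{a}-2({bond_order})-{b}"

def ap3(a, b):
    """Atom pair for two atoms separated by one intervening atom."""
    return f"{a}-3-{b}"

# Environments as they appear in the Beckmann rearrangement example.
oxime_c = AtomEnv("C", 3, 1, 0)
oxime_n = AtomEnv("N", 2, 1, 0)
methyl_c = AtomEnv("C", 1, 0, 0)

print(ap2(oxime_c, oxime_n, 2))  # C(3,1,0)-2(2)-N(2,1,0)
print(ap3(oxime_n, methyl_c))    # N(2,1,0)-3-C(1,0,0)
```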
PAGE 35 Beckmann Rearrangement
[Reaction scheme: structures not recoverable from the extracted text]

Reaction vector

Negative APs                      Positive APs
AP2
1- C(3,2,1)-2(1)-C(3,1,0)         1+ C(3,2,1)-2(1)-N(2,0,0)
2- C(3,1,0)-2(2)-N(2,1,0)         2+ C(3,1,0)-2(1)-N(2,0,0)
3- N(2,1,0)-2(1)-O(1,0,0)         3+ C(3,1,0)-2(2)-O(1,1,0)
AP3
a- C(3,2,1)-3-N(2,1,0)            a+ C(2,2,1)-3-N(2,0,0)
b- C(3,2,1)-3-C(1,0,0)            b+ C(2,2,1)-3-N(2,0,0)
c- C(3,1,0)-3-C(2,2,1)            c+ C(2,2,1)-3-N(2,0,0)
d- C(3,1,0)-3-C(2,2,1)            d+ N(2,0,0)-3-C(1,0,0)
e- C(3,1,0)-3-O(1,0,0)            e+ N(2,0,0)-3-O(1,1,0)
f- N(2,1,0)-3-C(1,0,0)            f+ O(1,1,0)-3-C(1,0,0)
PAGE 36 Applying a RV to a reactant to generate a Product
1. Removing the negative atom pairs from the reactant
[Figure: reactant with the negative atom pairs 1-, 2- and 3- marked, and the fragment that remains after their removal (atoms numbered 1 to 9). The negative and positive APs are those listed for the Beckmann rearrangement.]
PAGE 37 Applying a RV to a reactant to generate a Product
2. Adding positive atom pairs to the fragment
[Figure: stepwise reconstruction starting from fragment A. Positive atom pairs are added one at a time, giving intermediates B to H; the steps are labelled 2+ (d+), 3+ (f+), 1+ (a+,b+,c+), 2+ (d+,e+), 1+ (a+), 3+ (e+,f+) and 1+ (a+,b+,c+). One branch ends with “No AP2s left in the reaction vector that match atom …”; the others end in a “Final Solution” and a “Duplicate solution”. The negative and positive APs are those listed for the Beckmann rearrangement.]
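The bookkeeping behind the two steps (remove the negative atom pairs, then add the positive ones) can be sketched with multisets. This covers the counts only: the published algorithm rebuilds the molecular graph, which this sketch does not attempt. The AP2 strings are those of the Beckmann example, while the extra atom pair in the reactant and all function names are assumptions made for illustration.

```python
from collections import Counter

NEGATIVE_AP2 = Counter({
    "C(3,2,1)-2(1)-C(3,1,0)": 1,
    "C(3,1,0)-2(2)-N(2,1,0)": 1,
    "N(2,1,0)-2(1)-O(1,0,0)": 1,
})
POSITIVE_AP2 = Counter({
    "C(3,2,1)-2(1)-N(2,0,0)": 1,
    "C(3,1,0)-2(1)-N(2,0,0)": 1,
    "C(3,1,0)-2(2)-O(1,1,0)": 1,
})

def can_apply(molecule_aps, negative_aps):
    """A reaction vector is applicable only if every negative AP is present."""
    return all(molecule_aps[ap] >= n for ap, n in negative_aps.items())

def apply_counts(molecule_aps, negative_aps, positive_aps):
    """Step 1: remove the negative APs. Step 2: add the positive APs."""
    if not can_apply(molecule_aps, negative_aps):
        raise ValueError("reactant lacks an atom pair the reaction removes")
    result = Counter(molecule_aps)
    result.subtract(negative_aps)
    result.update(positive_aps)
    return +result  # unary plus drops entries whose count is zero

# Hypothetical reactant: the three negative AP2s plus one untouched pair.
reactant = Counter(NEGATIVE_AP2)
reactant["C(3,2,1)-2(1)-Cl(1,0,0)"] += 1  # illustrative aryl chloride pair

product = apply_counts(reactant, NEGATIVE_AP2, POSITIVE_AP2)
print(sorted(product))
```

The applicability check is the useful part in practice: it is how a database of reaction vectors can be screened for transformations that a given starting material could undergo.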
PAGE 38 How well does it work? Organic Chemistry Database
Reaction Type                    Number of Reactions   Correctly Reproduced   Per cent
Epoxide reduction                450                   449                    99.8
Epoxide formation                450                   444                    98.7
Ester to amide                   172                   172                    100.0
Alcohol dehydration              171                   169                    98.8
Claisen rearrangement            61                    54                     88.5
Beckmann rearrangement           123                   123                    100.0
Friedel-Crafts acylation         113                   113                    100.0
Olefin metathesis                9                     7                      77.8
Dieckmann condensation           98                    91                     92.9
Nitro reduction                  231                   230                    99.6
Alkene oxidation                 272                   272                    100.0
Cope rearrangement               453                   306                    67.5
Aldol condensation               134                   134                    100.0
Alcohol amination                97                    97                     100.0
Amide reduction                  51                    51                     100.0
Diels-Alder hetero               441                   320                    72.6
Ether halogenation               58                    58                     100.0
Ozonolysis                       132                   125                    94.7
Claisen condensation             98                    77                     78.6
Carboxylic acids to aldehydes    194                   194                    100.0
Nitrile reduction                102                   102                    100.0
Diels-Alder cycloaddition        106                   65                     61.3
Fischer indole                   230                   94                     40.9
Alkene halogenation              310                   281                    90.6
Nitrile hydrolysis               460                   460                    100.0
Olefination                      455                   427                    93.8
Wittig-Horner                    211                   190                    90.0
Robinson annulation              13                    10                     76.9
Total                            5,695                 5,115                  89.8

Version II: Stored as SQL database
Includes:
• the reaction vector
• the environment around each broken bond
• the fragmentation path
• the reconstruction path
• the original reagent and reactants

Improved algorithm [inset figure: the atom pair reconstruction tree from the previous slide]
PAGE 39 Hristozov, D., Bodkin, M., Chen, B., Patel, H. & Gillet, V. J. 2011. Validation of Reaction Vectors for De Novo Design. Library Design, Search Methods, and Applications of Fragment-Based Drug Design. American Chemical Society. The algorithm looks great but ...
1. The algorithm only knows about transformation types that are in the Db!
2. The AP2/3s cover 1 and 2 bonds. Remote functionality isn’t considered.
3. A reaction path is not a drug “optimisation”!
[Scheme: molecules a, b, c, d, e linked by reaction steps i, ii, iii, iv]
But how do we get from here to here?
PAGE 40 Reaction Sequence Vectors Tools for molecular design
[Scheme: molecules a, b, c, d, e linked by reaction steps i, ii, iii, iv]
Sequence Vectors: a→b, a→c, a→d, a→e, … b→x
One more to think about
Molecules = Nodes // RVs = Edges
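One plausible reading of a sequence vector is the net change over a path through the network, that is, the sum of the reaction vectors on the edges between two molecules. The slide does not define it, so the sketch below is an assumption. The bond-count vectors are invented, and `sequence_vector` is a hypothetical name.

```python
from collections import Counter

def combine(vectors):
    """Sum reaction vectors and drop components that cancel to zero."""
    total = Counter()
    for v in vectors:
        total.update(v)  # update adds counts, including negative ones
    return {k: n for k, n in total.items() if n != 0}

def sequence_vector(edges, path):
    """Net vector along a path of molecules (nodes) joined by RVs (edges)."""
    steps = [edges[(a, b)] for a, b in zip(path, path[1:])]
    return combine(steps)

# Hypothetical network: molecules a..e as nodes, one toy RV per edge (steps i..iv).
edges = {
    ("a", "b"): {"C-OH": -1, "C=O": 1},   # step i
    ("b", "c"): {"C=O": -1, "C-N": 1},    # step ii
    ("c", "d"): {"C-Br": -1, "C-C": 1},   # step iii
    ("d", "e"): {"C-OR": -1, "C-OH": 1},  # step iv
}

print(sequence_vector(edges, ["a", "b", "c"]))            # intermediate C=O cancels
print(sequence_vector(edges, ["a", "b", "c", "d", "e"]))  # net change from a to e
```

Under this reading, intermediates that are made and consumed along the route drop out, so the vector describes the overall change from start to end rather than any single step.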
PAGE 41 SAR Exploration Succinyl Hydroxamates
PAGE 42 Bailey, S., et al., 2008. Bioorganic & Medicinal Chemistry Letters, 18, 6562-6567 Novel SAR Succinyl Hydroxamates
PAGE 43 James Wallace ukQSAR 2014 Principal Components Analysis of Property Space Succinyl Hydroxamates
PAGE 44 James Wallace ukQSAR 2014.