<<

ChEMBL an Open Data Resource of Medicinal Chemistry and Patent Data John P. Overington

European Molecular Biology Laboratory - European Institute ChEMBL • The world’s largest primary public database of medicinal chemistry data – ~1.4 million compounds, ~9,000 targets, ~12 million bioactivities • Truly Open Data - CC-BY-SA license • Many download/access formats – Semantic Web • RDF download, SPARQL endpoint at http://rdf.ebi.ac.uk/chembl – ChEMBL Applicances • myChEMBL – linux VM • ChEMpi – raspberry pi • ChEMBL 18 released next week Compound >Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGInhibition of SSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGD EEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEAD K =4.5 nM CGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLhuman Thrombin i TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLK KPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVC KDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFY THVFRLKKWIQKVIDQFGE SAR Data

PTT (partial ED =230 nM thromboplastin 2

time) Assay SureChEMBL

• EMBL-EBI have acquired the SureChem product from Digital Science – >15 million chemical structures – Automatically extracted chemical structures from full-text patent • EMBL-EBI will Provide an ongoing free, Open resource to entire community – Add target, sequence, disease, animal model, cell-line indexing Document Similarity Document 1 Words, n-grams, … Document 2

toxicity pharmacokinetics synthetized toxicity thrombin pyridine pyridine cancer ulcerative colitis Chemicals

N N N N

O + Cl O + N N N N N N O O O Cl O S Cl Cl S O N N Cl Cl N N Cl Cl Cl O Cl O Cl Targets

• Document similarity methods currently rely on word/concept co-occurrence • Possible to extend to include overlap/similarity of shared molecular objects – Sequences and Ligands – Greater richness possible in similarity measures and searches • Sequences – Sequence similarity, domains, structures,… • Ligands – Tanimoto similarity, scaffolds,….

Polypharmacology via Binding Sites

ATC Disease indications linked by shared similarity of target site, for Ligand-gated ion channels (Pfam PF02932)

Santos et al, submitted Target – Pathway - Disease Assays in Drug Discovery

Human clinical trial

• Traditional medicines • , Artemesinin, Arsenic trioxide…. • Very slow and error prone • Not hypothesis led, ad hoc discovery Assays in Drug Discovery

Animal Human disease model clinical trial

• +ve • Higher Throughput • Greater Safety • Faster, cheaper, smaller scale • -ve • Less predictive Assays in Drug Discovery

Functional Animal Human assay disease model clinical trial

• +ve • Higher Throughput • Mechanistic insights and use of advances in basic science • Faster, cheaper, smaller scale • -ve • Less predictive Assays in Drug Discovery

Cell-based Functional Animal Human screen assay disease model clinical trial

• +ve • Higher Throughput • Mechanistic insights and use of advances in science • Faster, cheaper, smaller scale • -ve • Less predictive Assays in Drug Discovery

1980s 1960s 1950s 1920s Ancient

Biochemical Cell-based Functional Animal Human assay screen assay disease model clinical trial

• +ve • Higher Throughput • Mechanistic insights and use of advances in science • Recombinant DNA technology and Genomics • Faster, cheaper, smaller scale • -ve • Less predictive Drug Discovery Assay “Cascade”

Biochemical Cell-based Functional Animal Human assay screen assay disease model clinical trial

• Move from ‘quick, low-cost, less predictive’ assays to ‘slow, high-cost, more predictive’ assays • Make selection of which compounds to progress to later assays on basis of activity in earlier screens • Early, cheap assays are used a lot of times; later, expensive assays rarely • Attrition – failure of compounds in that screening pipeline • Clinical trials configured in staged Phase 1, 2, 3 mode Assay Costs

Costs are estimates!!

Biochemical Cell-based Functional Animal Human In silico assay screen assay disease model clinical trial

Cost (€) 0.0001 10 100 1,000 10,000 100,000,000 Targets & Diseases Connected via Drugs

Animal Biochemical Cell-based Functional Human disease assay clinical trial screen assay model

PPARγ …..

PPARa Type 2

SUR1 …..

K(ATP) channels …..

DPP-IV …..

GLP1R ….. Deep vein thrombosis Thrombin ….. Factor Xa … Target Disease Ontology of Diabetes Drugs Increase Unclear

secretion PPAR-g

Aleglitazar Sensitize to PPAR-gad insulin ATP-sensitive K+ channel Glisoxeoide Metiglinide

GLP-1 R Replace insulin

Alogliptin Diabetes DPP-IV Block carbohydrate GPR40 Fasiglifam absorption , , ) Insulin Insulin, , , Insulin Control degludec satiety/gastric a-glucosidase , , emptying Calcitonin receptor Control Glucose , , , transport SGLT-2 Remogliflozin, Sergliflozin, Biochemical Cell-based Functional Animal Human assay screen assay disease model clinical trial Mus musculus Rattus norvegicus Homo sapiens Psammomys obesus Canis familiaris

Target Assay Cellular Assay Genetic/Induced Animal Model Clinical Trial PPAR-g NOD mouse Type 1 diabetes (E10) PPAR-a Ob/Ob mouse PPAR-d db/db mouse KK mouse PTP1B Nagoya-Shibata-Yasuda (E11) DPP-IV (NSY) mouse Streptozocin-treated RXR-a mouse Alloxan-treated mouse Gestational SGLT-2 diabetes (O24) BB rat Fructose-1,6-bisphosphatase Zucker fa/fa rat Acyl-CoA desaturase CF-related Goto Kakizaki (GK) rat diabetes FXR Otsuka Long-Evans Tokushima fatty (OLETF) rat SGLT-1 Streptozocin-treated rat Glucocorticoid- Glucose-6-phosphatase Alloxan-treated rat related diabetes

G- bile acid receptor 1 Psammomys obesus

Alloxan-treated dog iPSC derived b-cells Maturity onset-diabetes (GCK mutant) of the young - MODY 2 Compounds & Bioassays in ChEMBL

Compound

Assay 1 Compound Thrombin inhibition Activity 1

Ki=4.5 nM

Assay Activity 2 Assay 2

ED2=230 nM PTT (partial

thromboplastin time) ChEMBL Assays as a Graph

Compound

Activity

Assay

17 compounds Assay 2 Assay 1

24 compounds 3 compounds

2 compounds 2 compounds

Assay 3 Assay 4 1 compound EGFR Signaling Pathway

K. Oda, et al. Mol. Syst. Biol. 1 DOI:10.1038/msb4100014 EGFR Assay Cascades From ChEMBL

Assay Network for EGFR pathway inhibitors

F. Krueger (unpublished) EGFR Assay Cascades from ChEMBL

Physicochemical properties for cSrc EGFR pathway inhibitors

Biochemical assay Cell-based assay In vivo assay

AlogP

Mol. Wt. (Da) Mol. Wt. (Da) Mol. Wt. (Da)

F. Krueger (unpublished) Extraction & Curation of PK Data Single Imatinib 400 mg dose Imatinib Polypharmacology Spectra

Tyrosine-protein kinase FYN 5.38 ATP-binding cassette sub-family G member 2 5.39 c-Jun N-terminal kinase 1 5.40 Serine/threonine-protein kinase 17A 5.41 c-Jun N-terminal kinase 3 5.50 Dual specificity protein kinase CLK4 5.53 Mixed lineage kinase 7 5.59 Tyrosine-protein kinase FGR 5.62 Tyrosine-protein kinase FRK 5.64 Maternal embryonic leucine zipper kinase 5.72 Serine/threonine-protein kinase GAK 5.72 type-A receptor 8 5.77 Serine/threonine-protein kinase RAF 5.77 -1 receptor-associated kinase 1 5.92

Carbonic anhydrase XII 6.01

Homeodomain-interacting protein kinase 4 6.02

) Tyrosine-protein kinase Lyn 6.05 1 - III 6.28 Tyrosine-protein kinase BLK 6.28 Carbonic anhydrase XIV 6.33 Tyrosine-protein kinase LCK 7.00 BCR/ABL p210 fusion protein 6.41 Platelet-derived growth factor receptor alpha 7.09 Carbonic anhydrase VI Carbonic anhydrase 15 7.11 6.41 Carbonic anhydrase IX 7.12 Phosphatidylinositol-5-phosphate 4-kinase type-2 gamma 6.42 Platelet-derived growth factor receptor beta 7.14 Macrophage colony stimulating factor receptor 6.54 Tyrosine-protein kinase ABL 7.20 Stem cell growth factor receptor 6.62 Platelet-derived growth factor receptor 7.30 Bcr/Abl fusion protein 6.66 Discoidin domain-containing receptor 2 7.34 Carbonic anhydrase VII Epithelial discoidin domain-containing receptor 1 7.37 6.96 Carbonic anhydrase I 7.50 Carbonic anhydrase II 7.52

Tyrosine-protein kinase ABL2 7.94 Concentration (ng.ml Concentration

6.0

7.0 8.0

Time (hr)

Imatinib 400 mg single dose from Jawhari et al (2011) J Bioequiv Availab 3: 161-164; Data is median pChEMBL for human targets from ChEMBL 16 Kinase Inhibitor Polypharmacology

Staurosporine Sunitinib Sorafenib Imatinib Dasatinib (no trials)

Erlotinib Gefitinib Lapatinib Tofacitinib Tozasertib (Ph. II)

US launched Adapted from Ghoreschi et al, Nature Immunology 10, 356 - 360 (2009) Imatinib Pharmacokinetics

• ‘Tides’ of target exposure during dosing schedule

imatinib

)

1

- Concentration (ng.mL Concentration N-desmethyl imatinib

Imatinib, 400 mg uid

Note log scale for concentration axis Time (hr) Acknowledgements

• ChEMBL • eTox – Anne Hersey – Ruth Akhtar – Francis Atkinson – Anna Gaulton – Patricia Bento – Louisa Bellis • Research – George Papadatos – Felix Kruger – Mark Davies – Ben Stauch – Michal Nowotka – Rita Santos • UniChem – Joey Bach Hardie – Grace Mugumbate – Jon Chambers – Gerard Van Westen