EMBL-EBI/Wellcome Trust Course: Resources for Computational Drug Discovery
John P. Overington
Welcome!
• Introductions • Overview of course material • Why ChEMBL had to be built!
• The more questions you ask the more you’ll learn what you want to know
• Your feedback is important to us
Project Idea
[Figure: project decision flowchart. Assay type: phenotypic or target-based. Target: previously screened or novel. Knowledge: unknown target structure, known/similar target structure, known/similar ligands. Compound sources: in-house compound collection, commercial compounds, virtual compounds. Actions: HTS screen, purchase of specific compounds, commission synthesis. Endpoint: drug.]
Overview of Material
• Monday pm
• Important safety stuff
• Meandering intro by jpo
• Perspectives and Challenges in Drug Discovery
– Darren Green (GSK)
• Delegate presentations
– Tell us what you’re interested in, what you expect, and we’ll see what we can do
• Tuesday
• Target Analysis and Selection
– Mike Barnes (Queen Mary College), Bissan Al-Lazikani (Institute of Cancer Research) and Anna Gaulton (EMBL-EBI)
– Pathways, druggability, genetics, disease linkage, target annotation, prioritisation
– Hands-on sessions
• Wednesday
• Databases and resources for Compound Selection
– Anne Hersey (EMBL-EBI), John Irwin (UCSF), Noel O’Boyle (NextMove)
– ChEMBL, PubChem, ZINC, searching, complexities of compound structure representation
– Software tools
– Hands-on session
• Thursday
• Computational Chemistry
– Val Gillet (Sheffield), Francis Atkinson (EMBL-EBI), George Papadatos (EMBL-EBI), Gary Battle (EMBL-EBI)
– Chemoinformatics in compound design, structure-based design, protein structure, target profiling
– Software tools - KNIME
– Hands-on session
• Friday Morning
– Drug Repurposing
– John Overington (EMBL-EBI)
– Methods and Resources, Tactics for repurposing, Product Profile
– Hands-on session
What if things go wrong?
• If you get lost, need information on transport, miss a bus, lose your computer, are arrested, need a recommendation for a nice pub, want to buy an animal to take home, …
• Tom Hancocks during the course itself
• For medical emergencies – Telephone 999
• For everything else – Telephone 07557-767072
Mapping Drug and Medicinal Chemistry Space – The ChEMBL Database
John P. Overington EMBL-EBI
Our Strategy and Hopes
• Comprehensively catalogue historical drug discovery
• Include successes and failures
• Large-scale abstraction and curation of primary literature
• Direct depositions
• Drugs can be small molecules, peptides, recombinant proteins, siRNA, cells, viruses, etc.
• ‘Learn’ rules for drug discovery ‘success’
• Target selection and prioritisation - druggability
• Lead discovery, optimisation, clinical candidate selection
• Develop approaches to new target classes – e.g. PPIs
• Drug combinations to improve safety and variability
Elements in Bioactive Molecules
H C C N O F S Cl Organic Chemistry
• Carbon-based chemistry – the chemistry of life
– Natural - Methane (CH4), Carbon Dioxide (CO2), β-carotene…
– Synthetic - Armodafinil, Atorvastatin
How Big Is Chemical Space?
• GDB databases from Jean-Louis Reymond, University of Berne, Switzerland
• GDB-13 database
– Small organic molecules up to 13 atoms of C, N, O, S and Cl
– Simple chemical stability and synthetic feasibility rules
– 977,468,314 structures
– GDB-13 is the largest publicly available small organic molecule database
ADMET - The Rule of Five
• To be bioactive, a molecule needs to access its ‘target’
• ADMET – Absorption, Distribution, Metabolism, Excretion and Toxicity
– The body has evolved to be really choosy over the types of molecule it lets in and tolerates
• Chris Lipinski, while at Pfizer, uncovered the Rule of Five
– He proposed that an organic small molecule was likely to have poor oral drug properties if it has
• >500 Molecular Weight (size of molecule)
• >5 logP (greasiness of molecule)
• >5 Hydrogen bond donors (polarity of molecule)
• >10 Ns and Os (polarity of molecule)
– Topical and parenteral dosed drugs are different
How Big is Bioactive Chemical Space?
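The four Rule of Five thresholds listed on the ADMET slide can be written as a small check. This is only a sketch: the `Descriptors` container and the function name are hypothetical, the descriptor values are assumed to have been computed elsewhere, and the aspirin numbers are approximate illustrative values rather than data from the course.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Pre-computed properties for one molecule (hypothetical container)."""
    mol_weight: float    # molecular weight in Da
    logp: float          # logP (greasiness)
    h_bond_donors: int   # number of hydrogen bond donors
    n_and_o_count: int   # number of N and O atoms


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """Return the names of the Rule of Five thresholds the molecule exceeds."""
    checks = [
        ("molecular weight > 500", d.mol_weight > 500),
        ("logP > 5", d.logp > 5),
        ("hydrogen bond donors > 5", d.h_bond_donors > 5),
        ("Ns and Os > 10", d.n_and_o_count > 10),
    ]
    return [name for name, exceeded in checks if exceeded]


# Approximate values for aspirin (C9H8O4); assumed for illustration only.
aspirin = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, n_and_o_count=4)
print(rule_of_five_violations(aspirin))
```

The function returns the list of exceeded thresholds rather than a pass/fail verdict, since the slide does not say how many violations should count as a failure.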
Likely to be ~10^19 organic small molecules obeying Lipinski’s Rule of Five
How Many Biological Targets Are There?
• Genes within the genome encode ‘target’ proteins
– Bioactive molecules usually interact with proteins
• Typical gene numbers in important genomes
– φX174 Phage (a virus that infects bacteria) 11
– Escherichia coli (a bacterium) 4,377
– Plasmodium falciparum (the malaria parasite) 5,268
– Drosophila melanogaster (a fruit fly) ~17,000
– Homo sapiens ~21,000
Chemical and Biological Spaces
• ~10,000,000,000,000,000,000 potential small bioactive molecules
– 1 g sample of each would use all the Earth’s carbon
• ~100,000 potential relevant biological receptors
– Humans and pathogens
[Figure: Exploration of bioactivity space at genomic scale. Protein axis: drug targets, screened targets, all proteins (tick values 10^2, 10^3, 10^6). Molecule axis: drugs, screened molecules, all reasonable molecules (tick values 10^3, 10^7-8, 10^20). Regions labelled Drugs, Structure Activity Relationship (SAR), Chemogenomics and ChEMBL.]
Presented to P&G, Cincinnati, April 2005, © 2005 Inpharmatica Ltd.
Clustering and Families
• Small molecules and targets can be organized into ‘families’
– this feature greatly helps analysis and data mining
[Figure: sets of related targets; sets of related small molecules]
Bio- and Chemoinformatics
• Bioinformatics is largely the comparison of 1-D ‘strings’
– nucleic acid and protein sequences
– Alignment, searching, mapping, …
• Chemoinformatics is more complex
– Chemical structures are 2-D – more difficult to represent on computers
– Alignment, searching are still areas of active research
– Mapping is largely solved by the InChI
– Open Standard and Software for unambiguous representation of chemical structures
• originally developed by NIST
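To give a feel for why an InChI works as a mapping key, the sketch below splits the standard InChI for aspirin into its slash-separated layers and counts the atoms in the formula layer. The function names are hypothetical, and the parser is deliberately minimal: it assumes a single-component formula and does not handle charges, isotopes or multi-fragment strings.

```python
import re
from typing import Dict, Tuple


def split_inchi(inchi: str) -> Tuple[str, str, Dict[str, str]]:
    """Split an InChI into (version, formula, {layer prefix: layer body})."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    version, formula = parts[0], parts[1]
    # Remaining layers start with a one-letter prefix, e.g. 'c' (connectivity), 'h' (hydrogens).
    layers = {part[0]: part[1:] for part in parts[2:]}
    return version, formula, layers


def formula_counts(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C9H8O4' (single component only)."""
    counts: Dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


aspirin = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
version, formula, layers = split_inchi(aspirin)
print(version, formula, formula_counts(formula))
print(layers["c"])  # connectivity layer
print(layers["h"])  # hydrogen layer
```

Because the whole string is derived from the structure itself, two databases that hold the same compound should produce the same InChI, which is what makes exact-match mapping between resources straightforward.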
Aspirin: InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)
Chemical Space
[Figure: all compounds; drug-like compounds; available compounds]
Only certain molecules have features consistent with good pharmacological properties
Target Space
[Figure: all targets; druggable targets; available targets]
Only certain targets have binding sites capable of ligand-efficient binding of drug-like ligands
Accessible Pharmacological Space
[Figure: overlap of drug-like compounds and druggable targets. Regions: drug-like compounds but no complementary targets; druggable targets and complementary compounds; available compounds for target but non-drug-like; druggable targets but no complementary compounds.]
Drug Optimisation
[Figure: imidazole and triazole drug structures arranged as prototype and 1st to 4th generation. Prototype: Azomycin (1956), a Streptomyces natural product, trichomonacidal, ‘toxic’. Drugs shown: Metronidazole 1962, Clotrimazole 1970, Miconazole 1970, Tinidazole 1970, Econazole 1972, Ketoconazole 1978, Sulconazole 1980, Terconazole 1980, Bifonazole 1981, Itraconazole 1984, Fluconazole 1988, Voriconazole 2002, Fosfluconazole 2004, Posaconazole 2005.]
After W. Sneader
Drug Discovery
Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models
Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection
Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics
Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction
Phase 1, Phase 2, Phase 3
• Safety
• PK & tolerability
• Efficacy
Launch (Phase 4)
• Indication discovery, repurposing & expansion
ChEMBL content
• Discovery – Med. Chem. SAR
– >1,290,000 compound records
– >6,900,000 bioactivities
– ~44,500 abstracted papers
– ~8,000 targets
• Development – Clinical Candidates
– ~12,000 clinical candidates
• Use – Drugs
– ~1,600 drugs
Targets of Launched Drugs
Overington et al, Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)
Different Types of Drugs
E. Lounkine et al, Nature (2012)
NFκB Pathway
Drug Approvals
FDA Approved Drugs
Affinity of Drugs for their ‘Targets’
Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’
[Figure: histogram of frequency (0 to 400) against -log10 affinity (2 to 12, corresponding to 10 mM, 1 mM, 100 μM, 10 μM, 1 μM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, 1 pM)]
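The x-axis of this histogram pairs -log10 affinity values with molar concentrations (2 with 10 mM, 9 with 1 nM, 12 with 1 pM). A minimal sketch of that conversion follows; the function names are hypothetical, and the input is assumed to be expressed in mol/L.

```python
import math


def p_affinity(molar: float) -> float:
    """Convert an affinity in mol/L (Ki, Kd, IC50, ...) to its -log10 value."""
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)


def molar_from_p(p_value: float) -> float:
    """Convert a -log10 affinity value back to mol/L."""
    return 10.0 ** (-p_value)


# Axis end points and mid-range values from the histogram, in mol/L.
for label, molar in [("10 mM", 1e-2), ("1 uM", 1e-6), ("1 nM", 1e-9), ("1 pM", 1e-12)]:
    print(label, round(p_affinity(molar), 2))
```

Working on the log scale is what lets affinities spanning ten orders of magnitude share one axis.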
Overington et al, Nature Rev. Drug Discov. 5 pp. 993-996 (2006)
Gleeson et al, Nature Rev. Drug Discov. 10 pp. 197-208 (2011)
Clinical Candidates
• Database of clinical development candidates
– Contains ~12,000 2-D structures/sequences
• Estimated size ~35-45,000 compounds
– Work in progress
• Deeper coverage of key gene families
• e.g. Protein kinases, 361 distinct clinical candidates
Pharma Industry Productivity
[Figure: File registration number (0 to 800,000) vs. USAN date (1960 to 2010); series labelled Phase 2b date and ~Discovery date]
Overington, unpublished
Pharma Industry Productivity
[Figure: bar chart (y-axis 0 to 70) by file registration number range, in bins of 100,000 from 1-100,000 to 700,001-800,000. Annotations: 16 Drugs/100,000 compounds and 64 USANs/100,000 compounds; 0.4 Drugs/100,000 compounds and 1.9 USANs/100,000 compounds.]
• Large Pharma needs to synthesise and test ~250,000 compounds for each drug
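The ~250,000 figure appears consistent with the lower rate annotated on this chart, 0.4 drugs per 100,000 compounds. The arithmetic is sketched below; the function name is hypothetical, and reading the slide's number as the reciprocal of that rate is an interpretation rather than something the slide states.

```python
def compounds_per_drug(drugs_per_100k: float) -> float:
    """Number of registered compounds per drug, given a rate per 100,000 compounds."""
    if drugs_per_100k <= 0:
        raise ValueError("rate must be positive")
    return 100_000 / drugs_per_100k


# Rates annotated on the chart (drugs per 100,000 compounds).
for rate in (16, 0.4):
    print(rate, round(compounds_per_drug(rate)))
```

On the same reading, the higher annotated rate of 16 drugs per 100,000 compounds corresponds to 6,250 compounds per drug, a 40-fold difference.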
Overington, unpublished
Clinical Candidates
What Is the ChEMBL Data?
Compound
Assay
– Inhibition of human Thrombin: Ki = 4.5 nM
– PTT (partial thromboplastin time): ED2 = 230 nM
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
SSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGD
EEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEAD
CGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVL
TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLK
KPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVC
KDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFY
THVFRLKKWIQKVIDQFGE
SAR Data
ChEMBL Target Types
• Protein – e.g. PDE5
• Protein complex – e.g. Nicotinic acetylcholine receptor
• Protein family – e.g. Muscarinic receptors
• Nucleic Acid – e.g. DNA
• Cell line – e.g. HEK293 cells
• Tissue – e.g. Trachea
• Sub-cellular fraction – e.g. Mitochondria
• Organism – e.g. Drosophila
A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, F. Atkinson, A.P. Bento, B. Al-Lazikani, D. Michalovich, & J.P. Overington (2011) ‘ChEMBL: A Large-scale Bioactivity Database For Chemical Biology And Drug Discovery’ Nucl. Acids Res. Database Issue. 40 D1100-1107 DOI:10.1093/nar/gkr777
http://www.ebi.ac.uk/chembl
Compound Searching
Spreadsheet Views
Ligand Efficiency
Target Class Data
Assay Organism Data
F.A. Krüger & J.P. Overington (2012) ‘Global analysis of small molecule binding to related protein targets’ PLoS Comp. Biol. 8, e1002333
Differences Between Human And Rat Orthologs
[Figure: two panels. Human vs Rat: rat pKd plotted against human pKd (-log(Kd)). Distribution of affinity differences: density of |human pKd - rat pKd|.]
Differences Between Different Assays
[Figure: two panels. Assay 2 pKd plotted against Assay 1 pKd. Distribution of inter-assay affinity differences: density of |human pKd - human pKd|. Caption text: Binding affinity in human and rat assays.]
Domain-level Annotation
• Site of binding is important in understanding and controlling function
• often several sites within same target protein
• Recently annotated binding sites (where possible) for entire ChEMBL target dictionary
• used Pfam domains http://www.pfam.org
Domain-level Binding Site Taxonomy
Depleted and Enriched Pfam Domains
Neur_chan_memb -1.63
zf-C4 -0.94
ANF_receptor -0.88
SH2 -0.83
Pkinase_C -0.70
fn3 -0.53
SH3_1 -0.51
Lig_chan -0.50
C2 -0.50
C1_1 -0.50
Guanylate_cyc -0.46
HATPase_c -0.46
I-set -0.44
adh_short -0.39
PH -0.39
Ank -0.39
…
Metallophos 0.35
Phospholip_A2_1 0.38
Peptidase_M10 0.41
Asp 0.45
SNF 0.48
Hist_deacetyl 0.48
Carb_anhydrase 0.50
Peptidase_C1 0.51
Trypsin 0.51
Beta-lactamase 0.57
p450 1.00
Hormone_recep 1.19
Ion_trans 1.66
Neur_chan_LBD 2.02
Pkinase_Tyr 2.12
Pkinase 5.87
7tm_1 7.30
Krueger and Overington, unpublished
Now, down to work…