EMBL-EBI/ Course: Resources for Computational Drug Discovery

John P. Overington

Welcome!

• Introductions
• Overview of course material
• Why ChEMBL had to be built!

• The more questions you ask, the more you’ll learn what you want to know
• Your feedback is important to us

Project Idea

[Figure: project decision flowchart with the following boxes.
Assay: phenotypic assay; target-based assay.
Status: previously screened; novel.
Knowledge: unknown target structure; known/similar target structure; known/similar ligands.
Compound source: in-house compound collection; commercial compounds; virtual compounds.
Route: HTS screen; purchase of specific compounds; commission synthesis.
End point: drug.]

Overview of Material

• Monday pm
• Important safety stuff
• Meandering intro by jpo
• Perspectives and Challenges in Drug Discovery – Darren Green (GSK)
• Delegate presentations – Tell us what you’re interested in, what you expect, and we’ll see what we can do

• Tuesday
• Target Analysis and Selection
  – Mike Barnes (Queen Mary College), Bissan Al-Lazikani (Institute of Cancer Research) and Anna Gaulton (EMBL-EBI)
  – Pathways, druggability, disease linkage, target annotation, prioritisation
  – Hands-on sessions

• Wednesday
• Databases and resources for Compound Selection
  – Anne Hersey (EMBL-EBI), John Irwin (UCSF), Noel O’Boyle (NextMove)
  – ChEMBL, PubChem, ZINC, searching, complexities of compound structure representation
  – Software tools
  – Hands-on session

• Thursday
• Computational Chemistry
  – Val Gillet (Sheffield), Francis Atkinson (EMBL-EBI), George Papadatos (EMBL-EBI), Gary Battle (EMBL-EBI)
  – Chemoinformatics in compound design, structure-based design, structure, target profiling
  – Software tools - KNIME
  – Hands-on

• Friday Morning
  – Drug Repurposing – John Overington (EMBL-EBI)
  – Methods and Resources, Tactics for repurposing, Product Profile
  – Hands-on session

What if things go wrong?

• If you get lost, need information on transport, miss a bus, lose your computer, are arrested, need a recommendation for a nice pub, want to buy an animal to take home, …
• Tom Hancocks during the course itself
• For medical emergencies – Telephone 999
• For everything else – Telephone 07557-767072

Mapping Drug and Medicinal Chemistry Space – The ChEMBL Database

John P. Overington EMBL-EBI

Our Strategy and Hopes

• Comprehensively catalogue historical drug discovery
• Include successes and failures
• Large-scale abstraction and curation of primary literature
• Direct depositions
• Drugs can be small molecules, peptides, recombinant proteins, siRNA, cells, viruses, etc.
• ‘Learn’ rules for drug discovery ‘success’
• Target selection and prioritisation - druggability
• Lead discovery, optimisation, clinical candidate selection
• Develop approaches to new target classes – e.g. PPIs
• Drug combinations to improve safety and variability

Elements in Bioactive Molecules

H, C, N, O, F, S, Cl

Organic Chemistry

• Carbon-based chemistry – the chemistry of life
  – Natural: CH4, CO2, β-carotene…
  – Synthetic: Armodafinil, …

How Big Is Chemical Space?

• GDB databases from Jean-Louis Reymond, University of Berne, Switzerland
• GDB-13 database
  – Small organic molecules up to 13 atoms of C, N, O, S and Cl
  – Simple chemical stability and synthetic feasibility rules
  – 977,468,314 structures
  – GDB-13 is the largest publicly available small organic molecule database

ADMET - The Rule of Five

• To be bioactive, a molecule needs to access its ‘target’
• ADMET – Absorption, Distribution, Metabolism, Excretion and Toxicity
  – The body has evolved to be really choosy over the types of molecule it lets in and tolerates
• Chris Lipinski, while at Pfizer, uncovered the Rule of Five
  – He proposed that an organic small molecule was likely to have poor oral drug properties if it has
    • Molecular weight >500 (size of molecule)
    • logP >5 (greasiness of molecule)
    • >5 hydrogen bond donors (polarity of molecule)
    • >10 Ns and Os (polarity of molecule)
  – Topical and parenterally dosed drugs are different

How Big is Bioactive Chemical Space?
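Whether a molecule counts towards this space depends on the four Rule of Five thresholds. As a minimal sketch, assuming the descriptors have already been calculated by a chemoinformatics toolkit, the check could look like this. The `Descriptors` class and `rule_of_five_violations` function are hypothetical names used only for illustration, and the aspirin logP is an approximate value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed properties for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol/water partition coefficient
    h_bond_donors: int  # number of hydrogen bond donors
    n_plus_o: int       # number of N and O atoms


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five thresholds that are exceeded."""
    checks = {
        "molecular weight > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "N + O count > 10": d.n_plus_o > 10,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# Aspirin (C9H8O4): MW about 180.2, one donor (the acid OH), four O atoms.
# The logP is an approximate figure, included for illustration only.
aspirin = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, n_plus_o=4)
print(rule_of_five_violations(aspirin))  # prints []
```

A molecule with an empty list passes all four thresholds; the slide's caveat still applies, since the rule concerns oral dosing only.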

Likely to be ~10^19 organic small molecules obeying Lipinski’s Rule of Five

How Many Biological Targets Are There?

• Genes within the genome encode ‘target’ proteins
  – Bioactive molecules usually interact with proteins
• Typical gene numbers in important genomes
  – φX174 phage (a virus that infects bacteria): 11
  – Escherichia coli (a bacterium): 4,377
  – Plasmodium falciparum (the malaria parasite): 5,268
  – Drosophila melanogaster (a fruit fly): ~17,000
  – Homo sapiens: ~21,000

Chemical and Biological Spaces

• ~10,000,000,000,000,000,000 potential small bioactive molecules
  – 1 g sample of each would use all the Earth’s carbon
• ~100,000 potential relevant biological receptors
  – Humans and pathogens

[Figure: Exploration of bioactivity space at genomic scale. Protein axis: drug targets (10^2), screened targets (10^3), all proteins (10^6). Molecule axis: drugs (10^3), screened molecules (10^7-8), all reasonable molecules (10^20). Labelled regions: Structure Activity Relationship (SAR), Chemogenomics, ChEMBL. Presented to P&G, Cincinnati, April 2005, © 2005 Inpharmatica Ltd.]

Clustering and Families

• Small molecules and targets can be organized into ‘families’
  – this feature greatly helps analysis and data mining

[Figure: sets of related targets; sets of related small molecules]

Bio- and Chemoinformatics

• Bioinformatics largely comes down to comparison of 1-D ‘strings’
  – nucleic acid and protein sequences
  – Alignment, searching, mapping, …
• Chemoinformatics is more complex
  – Chemical structures are 2-D – more difficult to represent on computers
  – Alignment, searching are still areas of active research
  – Mapping is largely solved by the InChI – Open Standard and Software for unambiguous representation of chemical structures
    • originally developed by NIST
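The aspirin InChI quoted on this slide illustrates the layered format: after the `InChI=1S` version prefix come the molecular formula and then layers marked by a one-letter prefix (`c` for connectivity, `h` for hydrogens). A minimal sketch in plain Python, assuming only simple single-component identifiers; `split_inchi_layers` and `element_counts` are hypothetical helper names and not part of the InChI software:

```python
import re

# Aspirin identifier as quoted on the slide.
ASPIRIN_INCHI = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"


def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI into its version prefix, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("no formula layer found")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        # Every later layer starts with a one-letter prefix, e.g. 'c' or 'h'.
        layers[part[0]] = part[1:]
    return layers


def element_counts(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C9H8O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


layers = split_inchi_layers(ASPIRIN_INCHI)
print(layers["formula"])                  # prints C9H8O4
print(element_counts(layers["formula"]))  # prints {'C': 9, 'H': 8, 'O': 4}
```

Because the string is canonical, exact string comparison is enough to test whether two records describe the same structure, which is why mapping is the easy part.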

Aspirin: InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)

Chemical Space

[Figure: nested sets: all compounds, drug-like compounds, available compounds]

Only certain molecules have features consistent with good pharmacological properties

Target Space

[Figure: nested sets: all targets, druggable targets, available targets]

Only certain targets have binding sites capable of ligand-efficient binding of drug-like ligands

Accessible Pharmacological Space

[Figure: overlap of compound space and target space, with regions labelled:
• Drug-like compounds but no complementary targets
• Druggable targets and complementary compounds
• Available compounds for target but non-drug-like
• Druggable targets but no complementary compounds]

Drug Optimisation

[Figure: chemical structures tracing drug optimisation from the prototype azomycin (1956; Streptomyces natural product, trichomonacidal, ‘toxic’) through 1st, 2nd, 3rd and 4th generation compounds, with the triazole group labelled. Named structures: Tinidazole 1970, Clotrimazole 1970, Sulconazole 1980, Bifonazole 1981, Voriconazole 2002, Fosfluconazole 2004; further structures are labelled by year only (1962 to 2005). After W. Sneader.]

Drug Discovery

Stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase 1, Phase 2, Phase 3, Launch (Phase 4)

• Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models
• Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection
• Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics
• Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction
• Clinical phases: safety, PK, tolerability, efficacy
• Launch: indication discovery, repurposing & expansion

ChEMBL content
• Discovery (Med. Chem. SAR): >1,290,000 compound records, >6,900,000 bioactivities, ~44,500 abstracted papers, ~8,000 targets
• Development (Clinical Candidates): ~12,000 clinical candidates
• Use (Drugs): ~1,600 drugs

Targets of Launched Drugs

Overington et al, Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

Different Types of Drugs

E. Lounkine et al, Nature (2012)

NFκB Pathway

Drug Approvals

FDA Approved Drugs

Affinity of Drugs for their ‘Targets’

Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’

[Figure: histogram of frequency (0 to 400) against -log10 affinity (2 to 12). The affinity axis values 2 to 12 correspond to 10 mM, 1 mM, 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM and 1 pM.]
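The -log10 affinity axis is the usual "p" scale (pKi, pKd, pIC50), computed on the molar concentration. A minimal sketch of the conversion; the `p_value` function and the `UNIT_TO_MOLAR` table are hypothetical names for illustration, and 4.5 nM is the thrombin Ki quoted later in these slides:

```python
import math

# Multipliers that convert a concentration into mol/L (M).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_value(concentration: float, unit: str = "nM") -> float:
    """Return -log10 of an affinity given as a concentration."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration * UNIT_TO_MOLAR[unit])


print(round(p_value(4.5, "nM"), 2))  # prints 8.35
print(round(p_value(100, "uM"), 2))  # prints 4.0
```

Each step of one unit on this scale is a tenfold change in concentration, so a 4.5 nM inhibitor sits between 8 and 9 on the histogram axis.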

Overington, et al, Nature Rev. Drug Discov. 5 pp. 993-996 (2006)
Gleeson et al, Nature Rev. Drug Discov. 10 pp. 197-208 (2011)

Clinical Candidates

• Database of clinical development candidates
  – Contains ~12,000 2-D structures/sequences
    • Estimated size ~35-45,000 compounds
  – Work in progress
    • Deeper coverage of key gene families
    • e.g. Protein kinases, 361 distinct clinical candidates

Pharma Industry Productivity

[Figure: file registration number (0 to 800,000) plotted against USAN date (1960 to 2010), with the ~discovery date and Phase 2b date marked.]

Overington, unpublished

[Figure: bar chart by file registration number range, in bins of 100,000 from 1-100,000 to 700,001-800,000 (vertical axis 0 to 70). Annotations: 16 drugs and 64 USANs per 100,000 compounds; 0.4 drugs and 1.9 USANs per 100,000 compounds.]

Large Pharma needs to synthesise and test ~250,000 compounds for each drug
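The ~250,000 figure is the reciprocal of the lower yield on the chart: 0.4 drugs per 100,000 compounds. A minimal sketch of that arithmetic, where `compounds_per_drug` is a hypothetical helper name:

```python
def compounds_per_drug(drugs_per_100k: float) -> int:
    """Convert a yield in drugs per 100,000 compounds into compounds per drug."""
    if drugs_per_100k <= 0:
        raise ValueError("yield must be positive")
    return round(100_000 / drugs_per_100k)


for rate in (16, 0.4):
    print(f"{rate} drugs per 100,000 compounds -> "
          f"{compounds_per_drug(rate):,} compounds per drug")
```

At 16 drugs per 100,000 compounds the same calculation gives 6,250 compounds per drug, a forty-fold difference between the two annotated yields.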

Overington, unpublished

What Is the ChEMBL Data?

Compound

>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
SSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGD
EEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEAD
CGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVL
TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLK
KPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVC
KDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFY
THVFRLKKWIQKVIDQFGE

Assay

SAR Data
• Inhibition of human Thrombin: Ki = 4.5 nM
• PTT (partial thromboplastin time): ED2 = 230 nM

ChEMBL Target Types

• Protein, e.g. PDE5
• Protein complex, e.g. Nicotinic receptor
• Protein family, e.g. Muscarinic receptors
• Nucleic acid, e.g. DNA
• Cell line, e.g. HEK293 cells
• Tissue, e.g. Trachea
• Sub-cellular fraction, e.g. Mitochondria
• Organism, e.g. Drosophila

A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, F. Atkinson, A.P. Bento, B. Al-Lazikani, D. Michalovich, & J.P. Overington (2011) ‘ChEMBL: A Large-scale Bioactivity Database For Chemical Biology And Drug Discovery’ Nucl. Acids Res. Database Issue 40, D1100-1107. DOI:10.1093/nar/gkr777

http://www.ebi.ac.uk/chembl

Compound Searching

Spreadsheet Views

Ligand Efficiency

Target Class Data

Assay Organism Data

F.A. Krüger & J.P. Overington (2012) ‘Global analysis of small molecule binding to related protein targets’ PLoS Comp. Biol. 8, e1002333

Differences Between Human And Rat Orthologs

[Figure: two panels. ‘Human vs Rat’: rat pKd against human pKd. ‘Distribution of affinity differences’: density of |human pKd - rat pKd|.]
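The quantity on the distribution panel is the absolute difference between paired pKd values for the same compound. A minimal sketch of how such a distribution could be summarised; the measurements below are invented purely for illustration and are not ChEMBL data, and `absolute_differences` is a hypothetical helper name:

```python
from statistics import median

# Hypothetical paired measurements (human pKd, rat pKd) for the same compounds.
# These numbers are invented for illustration and are not ChEMBL data.
paired_pkd = [(7.2, 7.0), (6.5, 6.9), (8.1, 8.0), (5.4, 6.3), (9.0, 8.8)]


def absolute_differences(pairs):
    """Return |human pKd - rat pKd| for each compound."""
    return [abs(human - rat) for human, rat in pairs]


diffs = absolute_differences(paired_pkd)
within_half_log = sum(d <= 0.5 for d in diffs) / len(diffs)
print(f"median |human pKd - rat pKd| = {median(diffs):.2f}")
print(f"fraction within 0.5 log units = {within_half_log:.2f}")
```

The same function applies unchanged to pairs of measurements from two assays on the same target, which is the comparison needed to judge whether an ortholog difference exceeds ordinary assay-to-assay variation.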

Differences Between Different Assays

[Figure: two panels. pKd in Assay 2 against pKd in Assay 1. ‘Distribution of inter-assay affinity differences’: density of |human pKd - human pKd|. Caption: Binding affinity in human and rat assays.]

Domain-level Annotation

• Site of binding is important in understanding and controlling function
  • often several sites within same target protein
• Recently annotated binding sites (where possible) for entire ChEMBL target dictionary
  • used Pfam domains (http://www.pfam.org)

Domain-level Binding Site Taxonomy

Depleted and Enriched Pfam Domains

Domain             Value
Neur_chan_memb     -1.63
zf-C4              -0.94
ANF_receptor       -0.88
SH2                -0.83
Pkinase_C          -0.70
fn3                -0.53
SH3_1              -0.51
Lig_chan           -0.50
C2                 -0.50
C1_1               -0.50
Guanylate_cyc      -0.46
HATPase_c          -0.46
I-set              -0.44
adh_short          -0.39
PH                 -0.39
Ank                -0.39
…
Metallophos         0.35
Phospholip_A2_1     0.38
Peptidase_M10       0.41
Asp                 0.45
SNF                 0.48
Hist_deacetyl       0.48
Carb_anhydrase      0.50
Peptidase_C1        0.51
[…]                 0.51
Beta-lactamase      0.57
p450                1.00
Hormone_recep       1.19
Ion_trans           1.66
Neur_chan_LBD       2.02
Pkinase_Tyr         2.12
Pkinase             5.87
7tm_1               7.30

Krueger and Overington, unpublished

Now, down to work…