EMBL-EBI/Wellcome Trust Course: Resources for Computational Drug Discovery

John P. Overington

Welcome!
• Introductions
• Overview of course material
• Why ChEMBL had to be built!
• The more questions you ask, the more you’ll learn what you want to know
• Your feedback is important to us

[Figure: flowchart from Project Idea to Drug. Phenotypic assay or Target-based assay; Previously screened or Novel; Unknown target structure, Known/similar target structure, Known/similar ligands; In-house compound collection, Commercial compounds, Virtual compounds; HTS Screen, Purchase of specific compounds, Commission synthesis]

Overview of Material
• Monday pm
• Important safety stuff
• Meandering intro by jpo
• Perspectives and Challenges in Drug Discovery
  – Darren Green (GSK)
• Delegate presentations
  – Tell us what you’re interested in, what you expect, and we’ll see what we can do

Overview of Material
• Tuesday
• Target Analysis and Selection
  – Mike Barnes (Queen Mary College), Bissan Al-Lazikani (Institute of Cancer Research) and Anna Gaulton (EMBL-EBI)
  – Pathways, druggability, genetics, disease linkage, target annotation, prioritisation
  – Hands-on sessions

Overview of Material
• Wednesday
• Databases and resources for Compound Selection
  – Anne Hersey (EMBL-EBI), John Irwin (UCSF), Noel O’Boyle (NextMove)
  – ChEMBL, PubChem, ZINC, searching, complexities of compound structure representation
  – Software tools
  – Hands-on session

Overview of Material
• Thursday
• Computational Chemistry
  – Val Gillet (Sheffield), Francis Atkinson (EMBL-EBI), George Papadatos (EMBL-EBI), Gary Battle (EMBL-EBI)
  – Chemoinformatics in compound design, structure-based design, protein structure, target profiling
  – Software tools - KNIME
  – Hands-on session

Overview of Material
• Friday Morning
  – Drug Repurposing
  – John Overington (EMBL-EBI)
  – Methods and Resources, Tactics for repurposing, Product Profile
  – Hands-on session

What if things go wrong?
• If you get lost, need information on transport, miss a bus, lose your computer, are arrested, need a recommendation for a nice pub, want to buy an animal to take home, ...
• Tom Hancocks during the course itself
• For medical emergencies
  – Telephone 999
• For everything else
  – Telephone 07557-767072

Mapping Drug and Medicinal Chemistry Space – The ChEMBL Database
John P. Overington
EMBL-EBI

Our Strategy and Hopes
• Comprehensively catalogue historical drug discovery
• Include successes and failures
• Large-scale abstraction and curation of primary literature
• Direct depositions
• Drugs can be small molecules, peptides, recombinant proteins, siRNA, cells, viruses, etc.
• ‘Learn’ rules for drug discovery ‘success’
• Target selection and prioritisation - druggability
• Lead discovery, optimisation, clinical candidate selection
• Develop approaches to new target classes, e.g. PPIs
• Drug combinations to improve safety and variability

Elements in Bioactive Molecules
H C N O F S Cl

Organic Chemistry
• Carbon-based chemistry – the chemistry of life
  – Natural: methane (CH4), carbon dioxide (CO2), β-carotene, ...
  – Synthetic: Armodafinil, Atorvastatin

How Big Is Chemical Space?
• GDB databases from Jean-Louis Reymond, University of Berne, Switzerland
• GDB-13 database
  – Small organic molecules up to 13 atoms of C, N, O, S and Cl
  – Simple chemical stability and synthetic feasibility rules
  – 977,468,314 structures
  – GDB-13 is the largest publicly available small organic molecule database

ADMET - The Rule of Five
• To be bioactive, a molecule needs to access its ‘target’
• ADMET – Absorption, Distribution, Metabolism, Excretion and Toxicity
  – The body has evolved to be really choosy over the types of molecule it lets in and tolerates
• Chris Lipinski, while at Pfizer, uncovered the Rule of Five
  – It proposes that an organic small molecule is likely to have poor oral drug properties if it has
    • Molecular weight > 500 (size of molecule)
    • logP > 5 (greasiness of molecule)
    • More than 5 hydrogen bond donors (polarity of molecule)
    • More than 10 Ns and Os (polarity of molecule)
  – Topical and parenteral dosed drugs are different

How Big is Bioactive Chemical Space?
Likely to be ~10^19 organic small molecules obeying Lipinski’s Rule of Five

How Many Biological Targets Are There?
• Genes within the genome encode ‘target’ proteins
  – Bioactive molecules usually interact with proteins
• Typical gene numbers in important genomes
  – φX174 phage (a virus that infects bacteria): 11
  – Escherichia coli (a bacterium): 4,377
  – Plasmodium falciparum (the malaria parasite): 5,268
  – Drosophila melanogaster (a fruit fly): ~17,000
  – Homo sapiens: ~21,000

Chemical and Biological Spaces
• ~10,000,000,000,000,000,000 potential small bioactive molecules
  – 1 g sample of each would use all the Earth’s carbon
• ~100,000 potential relevant biological receptors
  – Humans and pathogens

Chemogenomics
Exploration of bioactivity space at genomic scale
[Figure: Structure Activity Relationship (SAR) matrix of molecules against proteins, with regions labelled Drugs and ChEMBL. Molecules: Drugs 10^3, Screened molecules 10^7-8, All reasonable molecules 10^20. Proteins: Drug targets 10^2, Screened proteins 10^3, All reasonable proteins 10^6]
Presented to P&G, Cincinnati, April 2005, © 2005 Inpharmatica Ltd.

Clustering and Families
• Small molecules and targets can be organized into ‘families’
  – this feature greatly helps analysis and data mining
[Figure panels: Sets of related targets; Sets of related small molecules]

Bio- and Chemoinformatics
• Bioinformatics largely reduces to comparison of 1-D ‘strings’ – nucleic acid and protein sequences
  – Alignment, searching, mapping, ...
• Chemoinformatics is more complex
  – Chemical structures are 2-D – more difficult to represent on computers
  – Alignment, searching are still areas of active research
  – Mapping is largely solved by the InChI
  – Open Standard and Software for unambiguous representation of chemical structures
    • originally developed by NIST

Aspirin
InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)

Chemical Space
[Figure: nested sets labelled All compounds, Drug-like compounds, Available compounds]
Only certain molecules have features consistent with good pharmacological properties

Target Space
[Figure: nested sets labelled All targets, Druggable targets, Available targets]
Only certain targets have binding sites capable of ligand-efficient binding of drug-like ligands

Accessible Pharmacological Space
[Figure: overlap of chemical and target space, with regions labelled: Drug-like compounds but no complementary targets; Druggable targets and complementary compounds; Available compounds for target but non-drug-like; Druggable targets but no complementary compounds]

Drug Optimisation
[Figure: imidazole and triazole drug lineage, grouped as Prototype and 1st to 4th generation]
• Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’
• Metronidazole 1962
• Tinidazole 1970
• Clotrimazole 1970
• Miconazole 1970
• Econazole 1972
• Ketoconazole 1978
• Terconazole 1980
• Sulconazole 1980
• Bifonazole 1981
• Itraconazole 1984
• Fluconazole 1988
• Voriconazole 2002
• Fosfluconazole 2004
• Posaconazole 2005
After W. Sneader

Drug Discovery
Stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase 1, Phase 2, Phase 3, Launch (Phase 4)
Grouped as: Discovery, Development, Use
• Target identification
• Target validation
• Assay development
• Screening collection
• Microarray profiling
• Cellular/Animal disease models
• Biochemistry
• Clinical/Animal disease models
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction
• Pharmacokinetics
• Clinical phases: safety, PK, tolerability, efficacy
• After launch: indication discovery, repurposing & expansion

ChEMBL content
• Med. Chem. SAR
  – >1,290,000 compound records
  – >6,900,000 bioactivities
  – ~44,500 abstracted papers
  – ~8,000 targets
• Clinical Candidates
  – ~12,000 clinical candidates
• Drugs
  – ~1,600 drugs

Targets of Launched Drugs
Overington et al, Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

Different Types of Drugs
E. Lounkine et al, Nature (2012)

NFκB Pathway

Drug Approvals
FDA Approved Drugs

Affinity of Drugs for their ‘Targets’
Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’
[Figure: histogram of frequency (0 to 400) against -log10 affinity (2 to 12); the axis values correspond to 10 mM, 1 mM, 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, 1 pM]
Overington et al, Nature Rev. Drug Discov. 5 pp. 993-996 (2006)
Gleeson et al, Nature Rev. Drug Discov. 10 pp. 197-208 (2011)

Clinical Candidates
• Database of clinical development candidates
  – Contains ~12,000 2-D structures/sequences
    • Estimated size ~35-45,000 compounds
  – Work in progress
    • Deeper coverage of key gene families
    • e.g. Protein kinases, 361 distinct clinical candidates

Pharma Industry Productivity
[Figure: file registration number (0 to 800,000) against year (1960 to 2010), with series for USAN date, Phase 2b date and ~Discovery date]
Overington, unpublished

Pharma Industry Productivity
[Figure: bar chart by file registration number range, in bins of 100,000 from 1 to 800,000. Annotations: 16 Drugs/100,000 compounds and 64 USANs/100,000 compounds; 0.4 Drugs/100,000 compounds and 1.9 USANs/100,000 compounds]
• Large Pharma needs to synthesise and test ~250,000 compounds for each drug
Overington, unpublished

Clinical Candidates

What Is the ChEMBL Data?
[Figure: a Compound linked to a target by an activity, here Inhibition of Thrombin]
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
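
As a minimal sketch of the Rule of Five thresholds listed on the ADMET slide, the Python below reports which of the four limits a molecule exceeds. The `Descriptors` class and `rule_of_five_violations` function are hypothetical names introduced for illustration. The descriptor values are assumed to be computed elsewhere (for example by a cheminformatics toolkit), and the aspirin numbers are approximate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Pre-computed properties of one molecule (hypothetical container)."""
    mol_weight: float    # molecular weight
    logp: float          # calculated logP (greasiness)
    h_bond_donors: int   # number of hydrogen bond donors
    n_plus_o: int        # total count of N and O atoms


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """Return the Rule of Five limits that the molecule exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if d.n_plus_o > 10:
        violations.append("Ns and Os > 10")
    return violations


# Aspirin (C9H8O4): approximate, illustrative descriptor values.
aspirin = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, n_plus_o=4)

# A made-up large, greasy molecule, used only to show violations.
made_up = Descriptors(mol_weight=650.0, logp=6.1, h_bond_donors=2, n_plus_o=12)

if __name__ == "__main__":
    print("aspirin:", rule_of_five_violations(aspirin))
    print("made-up:", rule_of_five_violations(made_up))
```

The function returns the individual violations instead of a single pass/fail flag, so the caller can decide how many violations to tolerate; as the slide notes, topical and parenteral drugs are a different case.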

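
The affinity slide plots -log10 affinity from 2 to 12, which corresponds to concentrations from 10 mM down to 1 pM. The sketch below shows that conversion in both directions. The function names `p_from_molar` and `format_affinity` are hypothetical and introduced only for illustration, and concentrations are assumed to be molar.

```python
import math

# Concentration units from largest to smallest, as (scale in mol/L, label).
_UNITS = [(1e-3, "mM"), (1e-6, "µM"), (1e-9, "nM"), (1e-12, "pM")]


def p_from_molar(concentration_molar: float) -> float:
    """Convert a molar concentration (e.g. a Ki or IC50) to -log10 units."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def format_affinity(p_value: float) -> str:
    """Convert a -log10 value back to a concentration string such as '10 nM'."""
    molar = 10.0 ** (-p_value)
    for scale, label in _UNITS:
        # Small tolerance so that rounding in 10**(-p) does not pick
        # the next smaller unit for exact powers of ten.
        if molar >= scale * (1 - 1e-9):
            return f"{molar / scale:.3g} {label}"
    # Anything weaker than the table covers is reported in the smallest unit.
    scale, label = _UNITS[-1]
    return f"{molar / scale:.3g} {label}"


if __name__ == "__main__":
    for p in range(2, 13):
        print(p, format_affinity(p))
```

Working on the -log10 scale keeps values that span ten orders of magnitude on one linear axis, which is why a histogram of affinities is usually drawn this way.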