Linking Chemistry to Bioactivity
George Papadatos, PhD
Senior Technical Officer, ChEMBL group
[email protected]

Outline
• Chemistry and bioactivity resources at the EBI

• Small molecule databases
  • ChEBI database
  • ChEMBL database

• Integrated enzyme/ search
  • Enzyme Portal

24/07/2012 Information Sources in Biotechnology

ChEBI Database

What is ChEBI?
• Chemical Entities of Biological Interest
• Freely available
• Focused on small chemical entities
• Illustrated dictionary of chemical nomenclature
• High quality, manually annotated
• Provides ontologies
  • Structural and functional

• http://www.ebi.ac.uk/chebi/

ChEBI data overview

Nomenclature
• 1,3,7-trimethylxanthine
• methyltheobromine

Ontology
• metabolite
• CNS stimulant
• trimethylxanthines

Chemical data
• Formula: C8H10N4O2
• Charge: 0
• Mass: 194.19

Database Xrefs
• MSDchem: CFF
• KEGG DRUG: D00528

Chemical Informatics
• InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
• SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O

Visualisation
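As a rough cross-check of the chemical data on this slide, the listed mass can be recomputed from the formula. The sketch below is illustrative only: the helper name `molecular_mass` and the rounded atomic weights are assumptions, not part of ChEBI, and the parser handles only simple formulae without brackets or charges.

```python
import re

# Standard atomic weights, rounded (assumed values; only the elements needed here).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molecular_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return mass

print(round(molecular_mass("C8H10N4O2"), 2))  # 194.19
```

The result agrees with the "Mass: 194.19" value shown for this entry.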

ChEBI home page

ChEBI entry view

Automatic cross-references

The ChEBI ontology
Organised into two main sub-ontologies
• Molecular structure ontology
• Role ontology
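One way to picture the two sub-ontologies is as typed relationships attached to an entity. The sketch below reuses the terms from the ChEBI data overview slide; the relationship labels "is a" and "has role", the assignment of each term to a sub-ontology, and the helper `parents` are assumptions made for illustration, not an export of ChEBI data.

```python
# (entity, relationship, parent term) triples; labels are assumed for illustration.
relations = [
    ("1,3,7-trimethylxanthine", "is a", "trimethylxanthines"),   # molecular structure
    ("1,3,7-trimethylxanthine", "has role", "CNS stimulant"),    # role
    ("1,3,7-trimethylxanthine", "has role", "metabolite"),       # role
]

def parents(entity: str, relation: str) -> list[str]:
    """Return the parent terms linked to `entity` by `relation`."""
    return [obj for subj, rel, obj in relations
            if subj == entity and rel == relation]

print(parents("1,3,7-trimethylxanthine", "is a"))      # structure classification
print(parents("1,3,7-trimethylxanthine", "has role"))  # role classification
```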


Molecular structure ontology

Role ontology

Browsing the ChEBI ontologies

Further ChEBI resources
• Help & contact
  • [email protected]
• SourceForge
  • https://sourceforge.net/projects/chebi/
• User Manual
  • http://www.ebi.ac.uk/chebi/userManualForward.do
• Tutorial
  • https://www.ebi.ac.uk/chebi/tutorialForward.do
• RSS Feed

The ChEMBL Database

What is ChEMBL?

• Open access database for drug discovery
• Freely available – searchable and downloadable
• Contents:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature
  • Deposited data from neglected disease screening (e.g. malaria)
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure-searchable format
• Updated regularly with new data

What is in ChEMBL?

[Figure: SAR data in ChEMBL. Compounds (chemical structures), targets (e.g. the thrombin protein sequence) and assays are linked through compound bioactivities (e.g. Ki = 4.5 nM, APTT = 11 min).]

Drug discovery process
• Target Discovery
  • Target identification
  • Microarray profiling
  • Target validation
  • Assay development
  • Biochemistry
  • Clinical/Animal disease models
• Lead Discovery
  • High-throughput Screening (HTS)
  • Fragment-based screening
  • Focused libraries
  • Screening collection
• Lead Optimisation
  • Medicinal Chemistry
  • Structure-based drug design
  • Selectivity screens
  • ADMET screens
  • Cellular/Animal disease models
  • Pharmacokinetics
• Preclinical Development
  • Toxicology
  • In vivo safety pharmacology
  • Formulation
  • Dose prediction
• Clinical trials (Phase I, Phase II, Phase III) and Launch
  • Safety, PK, tolerability and efficacy
  • Indication discovery & expansion

ChEMBL database coverage
• Discovery (medicinal chemistry SAR): >1,000,000 distinct compounds, ~25,000 distinct lead series
• Development (clinical candidates): ~12,000 candidates
• Use (drugs): ~1,400 drugs

What is in ChEMBL?

ChEMBL13
• Compounds: 1,143,682
• Activities: 6,933,068
• Publications: 44,682
• Targets: 8,845

Targets 5674 organisms 1475 1431 cell lines

ChEMBL targets

Target types, each with an example:
• Protein: PDE5
• Protein complex: Nicotinic receptor
• Protein family: Muscarinic receptors
• Nucleic Acid: DNA
• Cell Line: HEK293 cells
• Tissue: Nerve
• Subcellular fraction: Mitochondria
• Organism: Drosophila

ChEMBL assays
• ChEMBL contains >6.9 million data points relating compounds to targets or effects
• Assays can be classified as:
  • binding measurements (40%)
    • e.g. IC50, Ki
  • functional assay endpoints (51%)
    • e.g. vasodilation, growth inhibition
  • ADMET data (9%)
    • e.g. LD50, half-life

ChEMBL compounds
• Chemical structures extracted as .mol files
• Including stereochemistry, if known
• No separation of tautomers
• Other rules for standardising NO2 groups, HCl salts
• Based on FDA Substance Registration System User's Guide
• Uniqueness is ensured by standard InChI identifiers
• Both salts and parent are kept
• Bioactivity is linked to salt form
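To illustrate how a standard InChI can serve as a uniqueness key, the sketch below groups compound records by their InChI string so that repeated records collapse onto one structure. The record identifiers are hypothetical, the two InChI strings are written out by hand (aspirin, and the ChEBI example entry with the standard `1S` prefix assumed), and this is not a description of ChEMBL's actual loading pipeline.

```python
from collections import defaultdict

# Hypothetical compound records: (record id, standard InChI).
ASPIRIN = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
XANTHINE = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
records = [("rec-1", ASPIRIN), ("rec-2", XANTHINE), ("rec-3", ASPIRIN)]

def group_by_inchi(records):
    """Map each distinct InChI to the list of record ids that share it."""
    groups = defaultdict(list)
    for record_id, inchi in records:
        groups[inchi].append(record_id)
    return dict(groups)

structures = group_by_inchi(records)
print(len(records), "records,", len(structures), "distinct structures")
```

Because salts and parents have different InChI strings, they would stay as separate entries under this scheme, which is consistent with both forms being kept.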

Marketed drugs
• Select set of interest
• Export to Excel or export SDF

How to access ChEMBL?

1. Web interface
  • Intuitive and secure
  • Compound, assay, target search
2. SQL dumps and flat files
  • Oracle, MySQL, Postgresql* dumps and .sd file
3. RESTful web services
  • Exact, substructure & similarity search
    • https://www.ebi.ac.uk/chemblws/compounds/substructure/CC(=O)Oc1ccccc1C(O)=O
  • Bioactivities for compound, assay and target id
    • https://www.ebi.ac.uk/chemblws/compounds/CHEMBL25/bioactivities
  • https://www.ebi.ac.uk/chembldb/index.php/ws
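The two web service URL patterns on this slide can be assembled programmatically. The sketch below only builds the URLs and does not call the service; the helper names are hypothetical, percent-encoding the SMILES is an assumption about safe practice (characters such as `=`, `(` and `#` can otherwise be misread in a URL path), and the endpoints are taken from the slide, so they may not match the service as it exists today.

```python
from urllib.parse import quote

# Base URL as shown on the slide (assumed unchanged for this sketch).
BASE = "https://www.ebi.ac.uk/chemblws"

def substructure_url(smiles: str) -> str:
    """URL for a substructure search; the SMILES is fully percent-encoded."""
    return f"{BASE}/compounds/substructure/{quote(smiles, safe='')}"

def bioactivities_url(chembl_id: str) -> str:
    """URL for the bioactivities recorded against one compound id."""
    return f"{BASE}/compounds/{quote(chembl_id, safe='')}/bioactivities"

print(substructure_url("CC(=O)Oc1ccccc1C(O)=O"))
print(bioactivities_url("CHEMBL25"))
```

Keeping URL construction separate from the HTTP request makes it easy to inspect or log the exact query before sending it.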

Other ChEMBL tools

What do people do with ChEMBL?
• Chemical space visualisation
• (Q)SAR analysis
• Data modelling, activity cliffs, Free-Wilson (FW) and matched molecular pair (MMP) analysis
• Bioisosteric replacement mining
• Target selection
• (Off-)target prediction and ADR analysis
• Polypharmacology networks
• Neglected tropical disease research

ChEMBL resources
• ChEMBL blog: http://chembl.blogspot.com
• If you would like help: [email protected]
• For ChEMBL news and data releases: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce

The ChEBI and ChEMBL databases

ChEBI
• Dictionary of molecular entities, focused on small molecules
• Incorporates an ontological classification
• Uses nomenclature, symbolism and terminology endorsed by international scientific bodies, such as IUPAC

ChEMBL
• Database of bioactive, drug-like small molecules
• Contains 2D structures, calculated properties and abstracted bioactivities
• Curates structures from published primary literature

The ChEBI and ChEMBL databases (continued)
• All ChEMBL compounds have been submitted into ChEBI
• Searching on either database will give you the same results
• ChEBI and ChEMBL compounds link out to each other via hyperlinks
• Both databases encourage user suggestions and comments on quality, errors and new features
• ChEBI is focused on the ontology of the compounds
• ChEMBL is focused on the associated bioactivity data

The Enzyme Portal

The Enzyme Portal
• A one-stop shop which integrates
  • Protein information (UniProt)
  • 3D structures (PDBe)
  • Protein-catalyzed reactions (Rhea)
  • Biochemical pathways (Reactome)
  • Enzyme nomenclature (IntEnz)
  • Small molecule chemistry (ChEBI and ChEMBL)
  • Cofactors and reaction mechanisms (CoFactor and MACiE)
• https://www.ebi.ac.uk/enzymeportal/

Searching the portal for enzymes

Result tabs

Acknowledgements
• ChEMBL group
  • John Overington
  • Anne Hersey
  • Kazuyoshi Ikeda
• ChEBI group & Enzyme Portal
  • Christoph Steinbeck
  • Paula de Matos
• EBI
  • Dominic Clark
  • Jennifer McDowall

• All of you for listening!

Back-up slides

Data statistics
• Focused towards compounds with drug-like properties by extraction from medicinal chemistry journals
• Includes small molecules (~92%) and (~7%)
• Abstracted from 43,418 papers across 34 journals
• 1,222,969 compound records
• 1,077,189 distinct compound structures
• 5,654,847 activities
  • Binding, functional and ADMET
• 8,703 targets, incl. 5,420 protein targets and 2,442 human targets
• Deposition of PubChem Substances and Bioassay assays
