Linking Chemistry to Bioactivity
George Papadatos, PhD
Senior Technical Officer, ChEMBL group
[email protected]

Outline
• Chemistry and bioactivity resources at the EBI
  • Small molecule databases
    • ChEBI database
    • ChEMBL database
  • Integrated enzyme/protein search
    • Enzyme Portal

24/07/2012 Information Sources in Biotechnology

ChEBI Database

What is ChEBI?
• Chemical Entities of Biological Interest
• Freely available
• Focused on small chemical entities
• Illustrated dictionary of chemical nomenclature
• High quality, manually annotated
• Provides ontologies
  • Structural and functional
• http://www.ebi.ac.uk/chebi/

ChEBI data overview (example entry: caffeine)
• Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
• Ontology: metabolite; CNS stimulant; trimethylxanthines
• Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
• Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528
• Chemical Informatics:
  • InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
  • SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
• Visualisation: [structure image]
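
As a quick sanity check on the chemical data in the caffeine entry, the average molecular mass can be recomputed from the formula. The following is a minimal sketch, not ChEBI's own method: the function name `average_mass` and the small table of rounded standard atomic weights are assumptions introduced for illustration, and the parser only handles simple formulae without brackets or charges.

```python
import re

# Rounded standard atomic weights (g/mol); illustrative subset only (assumption).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def average_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

caffeine_mass = average_mass("C8H10N4O2")
print(f"{caffeine_mass:.2f}")  # 194.19
```

The result agrees with the mass of 194.19 listed for caffeine.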

ChEBI home page

ChEBI entry view

Automatic cross-references

The ChEBI ontology
Organised into two main sub-ontologies
• Molecular structure ontology
• Role ontology
Example entry: (R)-adrenaline

Molecular structure ontology

Role ontology

Browsing the ChEBI ontologies

Further ChEBI resources
• Help & contact
  • [email protected]
• SourceForge
  • https://sourceforge.net/projects/chebi/
• User Manual
  • http://www.ebi.ac.uk/chebi/userManualForward.do
• Tutorial
  • https://www.ebi.ac.uk/chebi/tutorialForward.do
• RSS Feed

The ChEMBL Database

What is ChEMBL?
• Open access database for drug discovery
• Freely available – searchable and downloadable
• Contents:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature
  • Deposited data from neglected disease screening (e.g. malaria)
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure-searchable format
• Updated regularly with new data

What is in ChEMBL?
[Figure: the kinds of data ChEMBL links together]
• Compounds (2D chemical structure)
• Targets (e.g. the thrombin protein sequence)
• Assay (e.g. APTT = 11 min)
• Compound bioactivities (e.g. Ki = 4.5 nM)
• SAR data

Drug discovery process
[Figure: pipeline stages, typical activities at each stage, and the corresponding data in ChEMBL]
• Stages: Target Discovery → Lead Discovery → Lead Optimisation → Preclinical Development → Phase I → Phase II → Phase III (clinical trials) → Launch
• Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models
• Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection
• Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics
• Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction
• Clinical trials and launch: safety, tolerability, PK, efficacy, indication discovery and expansion
• Discovery: medicinal chemistry SAR (>1,000,000 distinct compounds, ~25,000 distinct lead series)
• Development: clinical candidates (~12,000 candidates)
• Use: drugs (~1,400 drugs)
(ChEMBL database)

What is in ChEMBL?
ChEMBL13
• Compounds: 1,143,682
• Activities: 6,933,068
• Publications: 44,682
• Targets: 8,845
  • 5,674 proteins
  • 1,475 organisms
  • 1,431 cell lines

ChEMBL targets
Target type: example
• Protein: PDE5
• Protein complex: Nicotinic acetylcholine receptor
• Protein family: Muscarinic receptors
• Nucleic acid: DNA
• Cell line: HEK293 cells
• Tissue: Nerve
• Subcellular fraction: Mitochondria
• Organism: Drosophila

ChEMBL assays
• ChEMBL contains >6.9 million data points relating compounds to targets or effects
• Assays can be classified as:
  • binding measurements (40%), e.g. IC50, Ki
  • functional assay endpoints (51%), e.g. vasodilation, growth inhibition
  • ADMET data (9%), e.g. LD50, half-life

ChEMBL compounds
• Chemical structures extracted as .mol files
• Including stereochemistry, if known
• No separation of tautomers
• Other rules for standardising NO2 groups, HCl salts
• Based on FDA Substance Registration System User's Guide
• Uniqueness is ensured by standard InChI identifiers
• Both salts and parent molecules are kept
• Bioactivity is linked to salt form
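
The uniqueness rule can be pictured as keying every structure on its standard InChI string, so that two records with different names but the same InChI collapse to one compound. This is a minimal sketch under stated assumptions, not ChEMBL's actual loading pipeline: the record layout and the function name `group_by_inchi` are hypothetical, and the standard InChI strings are assumed to have been generated beforehand by a cheminformatics toolkit.

```python
from collections import defaultdict

def group_by_inchi(records):
    """Group (name, standard_inchi) records so each InChI appears once."""
    groups = defaultdict(list)
    for name, inchi in records:
        groups[inchi].append(name)
    return dict(groups)

# Standard InChI strings for two well-known compounds (illustrative input).
CAFFEINE = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
ASPIRIN = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

records = [
    ("caffeine", CAFFEINE),
    ("1,3,7-trimethylxanthine", CAFFEINE),  # same structure, different name
    ("aspirin", ASPIRIN),
]
unique = group_by_inchi(records)
print(len(records), "records ->", len(unique), "distinct structures")
```

Because salts and parent molecules have different InChI strings, a scheme like this keeps them as separate entries, which is consistent with bioactivity being linked to the salt form.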

Marketed drugs
• Select set of interest
• Export to Excel or export SDF

How to access ChEMBL?
1. Web interface
  • Intuitive and secure
  • Compound, assay, target search
2. SQL dumps and flat files
  • Oracle, MySQL, PostgreSQL* dumps and .sd file
3. RESTful web services
  • Exact, substructure & similarity search
    • https://www.ebi.ac.uk/chemblws/compounds/substructure/CC(=O)Oc1ccccc1C(O)=O
  • Bioactivities for compound, assay and target id
    • https://www.ebi.ac.uk/chemblws/compounds/CHEMBL25/bioactivities
  • https://www.ebi.ac.uk/chembldb/index.php/ws
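
The two web service URL patterns listed for the RESTful services can be assembled programmatically. The sketch below only builds the URLs and makes no network request; the helper names `substructure_url` and `bioactivities_url` are hypothetical, percent-encoding the SMILES is an assumption (the slide shows it unencoded), and the `chemblws` endpoints are those given here, which may since have been superseded.

```python
from urllib.parse import quote

# Base URL as shown on the slide; this legacy service may no longer be live.
BASE = "https://www.ebi.ac.uk/chemblws"

def substructure_url(smiles: str) -> str:
    """URL for a substructure search; the SMILES is percent-encoded (assumption)."""
    return f"{BASE}/compounds/substructure/{quote(smiles, safe='')}"

def bioactivities_url(chembl_id: str) -> str:
    """URL for the bioactivities of one compound, e.g. CHEMBL25."""
    return f"{BASE}/compounds/{chembl_id}/bioactivities"

print(substructure_url("CC(=O)Oc1ccccc1C(O)=O"))
print(bioactivities_url("CHEMBL25"))
```

Encoding the SMILES avoids characters such as `=` and parentheses being misread as URL syntax by some clients.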

Other ChEMBL tools

What do people do with ChEMBL?
• Chemical space visualisation
• (Q)SAR analysis
• Data modeling, activity cliffs, FW, MMP analysis
• Bioisosteric replacement mining
• Target selection
• (off-)target prediction and ADR analysis
• Polypharmacology networks
• Neglected tropical disease research

ChEMBL resources
• ChEMBL blog: http://chembl.blogspot.com
• If you would like help: chembl[email protected]
• For ChEMBL news and data releases: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce

The ChEBI and ChEMBL databases
ChEBI
• Dictionary of molecular entities, focused on small molecules
• Incorporates an ontological classification
• Uses nomenclature, symbolism and terminology endorsed by international scientific bodies, such as IUPAC
ChEMBL
• Database of bioactive, drug-like small molecules
• Contains 2D structures, calculated properties and abstracted bioactivities
• Curates structures from published primary literature

The ChEBI and ChEMBL databases
• All ChEMBL compounds have been submitted into ChEBI
• Searching on either database will give you the same results
• ChEBI and ChEMBL compounds link out to each other via hyperlinks
• Both databases encourage user suggestions and comments on quality, errors and new features
• ChEBI is focused on the ontology of the compounds
• ChEMBL is focused on the associated bioactivity data

The Enzyme Portal
• A one-stop shop which integrates
  • Protein information (UniProt)
  • 3D structures (PDBe)
  • Protein-catalyzed reactions (Rhea)
  • Biochemical pathways (Reactome)
  • Enzyme nomenclature (IntEnz)
  • Small molecule chemistry (ChEBI and ChEMBL)
  • Cofactors and reaction mechanisms (CoFactor and MACiE)
• https://www.ebi.ac.uk/enzymeportal/

Searching the portal for enzymes

Result tabs

Acknowledgements
• ChEMBL group
  • John Overington
  • Anne Hersey
  • Kazuyoshi Ikeda
• ChEBI group & Enzyme Portal
  • Christoph Steinbeck
  • Paula de Matos
• EBI
  • Dominic Clark
  • Jennifer McDowall
• All of you for listening!

Back-up slides

Data statistics
• Focused towards compounds with drug-like properties by extraction from medicinal chemistry journals
• Includes small molecules (~92%) and peptides (~7%)
• Abstracted from 43,418 papers across 34 journals
• 1,222,969 compound records
• 1,077,189 distinct compound structures
• 5,654,847 activities
  • Binding, functional and ADMET
• 8,703 targets, incl. 5,420 protein targets and 2,442 human targets
• Deposition of PubChem Substances and Bioassay assays