Chemical Resources – ChEMBL
Anne Hersey – ChEMBL Group
EBI is an Outstation of the European Molecular Biology Laboratory.
Overview of Talk
• ChEMBL Content
• ChEMBL Use Cases
• UniChem
Opportunity for hands-on use of ChEMBL tomorrow afternoon
What is ChEMBL
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature, from journals such as J. Med. Chem.
  • Deposited data, e.g. neglected disease screening, GSK kinase set
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure-searchable format
• Protein targets are linked to protein sequences in UniProt
• Updated regularly with new data
• Secure searching (https://www.ebi.ac.uk/chembldb)
Accessing ChEMBL Data
Pipeline Pilot and KNIME protocols that use these web services are available on the forum pages
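The web services can also be called directly from a script. The sketch below only builds a query URL and makes no network call. It assumes the present-day ChEMBL REST interface at https://www.ebi.ac.uk/chembl/api/data, its `activity` resource and its `standard_value__lte` filter; none of these appear in the slides, and the service available at the time of the talk may have differed. `activity_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL REST web services (not stated in the slides).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def activity_url(target_chembl_id, standard_type, max_value_nm):
    """Build a query URL for activities on one target at or below a potency cut-off."""
    params = {
        "target_chembl_id": target_chembl_id,
        "standard_type": standard_type,
        "standard_units": "nM",
        "standard_value__lte": max_value_nm,  # assumed filter syntax
    }
    return f"{BASE_URL}/activity.json?{urlencode(params)}"


if __name__ == "__main__":
    # "CHEMBL_TARGET_ID" is a placeholder, not a real identifier.
    print(activity_url("CHEMBL_TARGET_ID", "IC50", 10))
```

The resulting URL could be fetched with any HTTP client; the response format should be checked against the current service documentation.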
ChEMBL Database Content
ChEMBL_14
• Compounds: 1,213,239
• Activities: 10,129,256
• Publications: 46,133
• Targets: 9,003
  • 5,918 proteins
  • 1,475 organisms
  • 1,433 cell lines
• Compounds*: 517,261 to 1,213,239
• Increase of >200,000 compounds from literature since ChEMBL_01
• ~1% overlap between ChEMBL literature and PubChem compounds
* Includes PubChem compounds
Organisation of ChEMBL Data
• Activities: type (e.g. IC50), values, units
• Compounds: names, properties
• Targets: type, sequence, organism
• Assays: experimental detail
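One way to picture how these four kinds of record relate is a small set of linked data classes. This is an illustrative sketch only, not the actual ChEMBL schema: the class and field names are assumptions, and the target, assay and activity value in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    chembl_id: str
    name: str


@dataclass
class Target:
    target_id: str
    target_type: str
    organism: str


@dataclass
class Assay:
    assay_id: str
    description: str  # experimental detail
    target: Target


@dataclass
class Activity:
    # An activity ties one compound to one assay (and through it, one target).
    compound: Compound
    assay: Assay
    activity_type: str  # e.g. "IC50"
    value: float
    units: str


# Hypothetical example record; only the compound name and ID come from the slides.
imatinib = Compound(chembl_id="CHEMBL941", name="Imatinib")
example_target = Target(target_id="TARGET-1", target_type="Protein", organism="Homo sapiens")
example_assay = Assay(assay_id="ASSAY-1", description="Hypothetical binding assay", target=example_target)
example_activity = Activity(imatinib, example_assay, "IC50", 100.0, "nM")
```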
ChEMBL Compounds
• Chemical structures in journal articles are drawn as .mol files
• If the stereochemistry is known, it is drawn as a specific enantiomer
• Tautomers of the same compound are not treated as separate compounds. The form shown is as in the paper
• Other “rules” for standardising NO2 groups, HCl salts etc. are based on the FDA Substance Registration System User's Guide: http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm
• Identifying unique compounds is done using Standard InChI
• Salts and parent molecules are grouped together for displaying bioactivity data, although activity data is recorded against the specific salt
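The salt and parent grouping in the last bullet can be sketched as follows: each activity stays recorded against the form that was tested, and the grouping by parent is applied only for display. All identifiers and values below are hypothetical, and `group_by_parent` is an illustrative name rather than anything in ChEMBL.

```python
from collections import defaultdict

# Hypothetical identifiers: maps each registered form to its parent molecule.
PARENT_OF = {
    "CMPD-1": "CMPD-1",      # a parent maps to itself
    "CMPD-1-HCL": "CMPD-1",  # hydrochloride salt of CMPD-1
    "CMPD-2": "CMPD-2",
}

# Hypothetical activity records: (tested form, activity type, value in nM).
ACTIVITIES = [
    ("CMPD-1", "IC50", 12.0),
    ("CMPD-1-HCL", "IC50", 9.5),
    ("CMPD-2", "Ki", 150.0),
]


def group_by_parent(activities, parent_of):
    """Group activity records under the parent molecule, keeping the tested form."""
    grouped = defaultdict(list)
    for form_id, activity_type, value_nm in activities:
        grouped[parent_of[form_id]].append((form_id, activity_type, value_nm))
    return dict(grouped)


if __name__ == "__main__":
    for parent, rows in group_by_parent(ACTIVITIES, PARENT_OF).items():
        print(parent, rows)
```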
ChEMBL Assays – Binding, Functional, ADMET
Binding Assays
• Assays which directly measure the binding of a compound to a particular target (e.g., competition binding assays with a radioligand)
Functional Assays
• Whole organism assays (e.g., anti-infectives/parasitics)
• Cell-based assays over-expressing the target (e.g., GPCR calcium mobilisation)
ADMET Assays
• Absorption, Distribution, Metabolism, Excretion, Toxicity (e.g. t1/2 in rat)
ChEMBL Targets:
• Protein, e.g., PDE5
• Protein complex, e.g., Nicotinic acetylcholine receptor
• Protein family, e.g., Muscarinic receptors
• Nucleic Acid, e.g., DNA
• Cell Line, e.g., HEK293 cells
• Tissue, e.g., Nervous
• Sub-cellular Fraction, e.g., Mitochondria
• Organism, e.g., Drosophila
ChEMBL Activities
• Bioactivity value for a particular compound in a specific assay against a specific target
e.g. data for Imatinib (Gleevec, CHEMBL941)
Use Cases
• Use Case 1
  • Interested in a specific compound (or substructure)
  • Which compounds are similar?
  • Which targets do they bind to?
  • What are their potential liabilities?
• Use Case 2
  • Interested in a specific target
  • Identify compounds that bind to that target
  • What data is available on these compounds?
  • Select compounds with good potency/good drug-like properties to start a synthetic project
  • If there is no data for compounds binding to my protein, is there data for closely related members of the same family?
Searching by Target in ChEMBL
Find target by name, BLAST search or target tree
Identify potent compounds: IC50 < 10 nM
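Applied to downloaded data, the IC50 < 10 nM cut-off amounts to a unit-aware filter. The sketch below assumes records as plain dictionaries with hypothetical keys (`compound`, `type`, `value`, `units`) and hypothetical values; it is not the format of the ChEMBL export.

```python
# Conversion factors to nanomolar for the units used in this example.
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}

# Hypothetical bioactivity records.
RECORDS = [
    {"compound": "A", "type": "IC50", "value": 2.0, "units": "nM"},
    {"compound": "B", "type": "IC50", "value": 0.005, "units": "uM"},
    {"compound": "C", "type": "IC50", "value": 50.0, "units": "nM"},
    {"compound": "D", "type": "Ki", "value": 1.0, "units": "nM"},
]


def potent(records, activity_type="IC50", cutoff_nm=10.0):
    """Return records of the given type below the cut-off, most potent first."""
    hits = []
    for rec in records:
        if rec["type"] != activity_type:
            continue
        value_nm = rec["value"] * TO_NM[rec["units"]]
        if value_nm < cutoff_nm:
            hits.append({**rec, "value_nm": value_nm})
    return sorted(hits, key=lambda r: r["value_nm"])


if __name__ == "__main__":
    for hit in potent(RECORDS):
        print(hit["compound"], hit["value_nm"])
```

Converting to a single unit before comparing avoids treating 0.005 uM and 5 nM as different potencies.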
Bioactivity Data
Download data
Columns shown: compound structures, activity values, assay details, target details, references
Information on a Target – Target Report Card
Compound Searching in ChEMBL
86 similar at >=90%
Also sub-structure searches
Filter on Lipinski Rule of Five etc.
Chart View
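The Rule of Five filter counts how many of Lipinski's four thresholds a compound exceeds (molecular weight > 500, logP > 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). A minimal sketch, assuming the properties are already available as a dictionary; the key names and the example values are hypothetical.

```python
def rule_of_five_violations(props):
    """Count Lipinski Rule of Five violations for one compound."""
    checks = [
        props["mol_weight"] > 500,
        props["logp"] > 5,
        props["h_bond_donors"] > 5,
        props["h_bond_acceptors"] > 10,
    ]
    return sum(checks)


# Hypothetical property sets for two compounds.
DRUG_LIKE = {"mol_weight": 350.0, "logp": 2.1, "h_bond_donors": 2, "h_bond_acceptors": 5}
LARGE_LIPOPHILIC = {"mol_weight": 720.0, "logp": 6.3, "h_bond_donors": 3, "h_bond_acceptors": 12}

if __name__ == "__main__":
    print(rule_of_five_violations(DRUG_LIKE))
    print(rule_of_five_violations(LARGE_LIPOPHILIC))
```

A common convention is to keep compounds with at most one violation; the cut-off used in the ChEMBL interface is not stated in the slides.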
Compound Search – Compound Report Card
Searching Assays
Marketed Drugs
Export to Excel
Export SDF
UniChem
http://chembl.blogspot.com/2011/11/unichem.html
Multiple EBI Resources hold Compound Structure Data
- Maintaining links between DBs is a manual, time-consuming task for each source.
- Business rules for constructing identifiers are not consistent, so users are confused.
Example identifiers for one compound across resources: '10003', 'RIBOSTAMYCIN', 'CHEMBL221572', 'RIO', EU_OPENSCREEN 'EU-?????', etc.
UniChem: a resource overlaying EBI chemistry resources
- Uses Standard InChI to compare chemical structures across databases
- Enables tracking of 'id-to-structure' assignments over time
- All EBI DBs share the benefit of maintained links between chemistry resources
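The idea of using Standard InChI as the common key can be sketched in a few lines: identifiers from different sources that share the same InChI are treated as the same structure, so any identifier can be mapped to any other source. This is a conceptual illustration, not UniChem's implementation. The source names, identifiers and InChI strings are hypothetical placeholders, and the sketch assumes at most one identifier per source for each structure.

```python
from collections import defaultdict

# Hypothetical records: (source, identifier in that source, Standard InChI).
# The InChI strings are placeholders, not real InChIs.
RECORDS = [
    ("source_a", "A-001", "InChI=1S/placeholder-1"),
    ("source_b", "B-17", "InChI=1S/placeholder-1"),
    ("source_c", "C-XYZ", "InChI=1S/placeholder-1"),
    ("source_a", "A-002", "InChI=1S/placeholder-2"),
]


def build_index(records):
    """Index identifiers by structure: {inchi: {source: identifier}}."""
    index = defaultdict(dict)
    for source, identifier, inchi in records:
        index[inchi][source] = identifier
    return index


def map_identifier(records, source, identifier, target_source):
    """Return the identifier of the same structure in target_source, or None."""
    for by_source in build_index(records).values():
        if by_source.get(source) == identifier:
            return by_source.get(target_source)
    return None


if __name__ == "__main__":
    print(map_identifier(RECORDS, "source_a", "A-001", "source_c"))
```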
Additional benefits of UniChem: Maintained links to external sources are shared.
- All EBI DBs share the benefits of maintained links to external resources.
- The 'mapping service' will be opened for use by external users.
UniChem – Sources
UniChem – InChI or Identifier Searching
• InChI searching
• Identifier searching
UniChem – Source Mapping
Useful information
Other ChEMBL Resources
If you would like help: chembl[email protected]
For ChEMBL news and data releases subscribe to: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
ChEMBL Paper (NAR 2011)
ChEMBL Blog: http://chembl.blogspot.com
Acknowledgements
ChEMBL Group: John Overington, Anne Hersey, Anna Gaulton, Mark Davies, Jon Chambers, Louisa Bellis, Kazuyoshi Ikeda, Patricia Bento, Shaun McGlinchey, Yvonne Light, Felix Krueger, Ben Stauch, Ruth Akhtar, Francis Atkinson, Rita Santos
Collaborators: Imperial Cancer Research, University of Dundee, University of Cambridge, Sanger Centre, University of Maryland, NCBI, TDR, IUPHAR, Bayer-Schering, Pfizer, GSK, Schering-Plough, MMV, Novartis, St Jude Children's Research Hospital
EMBL-EBI colleagues
Former Inpharmatica colleagues
ChEMBL Hands-on
Use Case 1
• Search for target by name (Adenosine A2a)
• Select human target
• Look at Target Report Card
• Return to Target Search Results (back button) and tick just the human target
• Filter bioactivity data for Ki < 10 nM
• Sort the data by value
• Output data to Excel
• Select and copy a few CHEMBL_IDs
• Paste into Compound Search - List Search
• View all data on compounds
Use Case 2
• Search by compound name (sildenafil)
• Look at data on parent and salt
• Look at links to crystal structure and clinical trials data
• Select sildenafil structure and use as query
• Go to compound search page
• Do similarity search >=85%
• Go to graph view of data
• Filter by compound properties, e.g. logP < 5 and Mol Wt < 500
• Look at bioactivity data on new dataset
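A similarity cut-off such as >=85% is commonly computed as a Tanimoto coefficient between structural fingerprints. The slides do not state which measure or fingerprint ChEMBL uses, so the sketch below is a generic illustration: fingerprints are represented as sets of hypothetical "on" bit positions, and `similar_compounds` is an illustrative name.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def similar_compounds(query_fp, library, threshold=0.85):
    """Return names of library compounds at or above the similarity threshold."""
    return [name for name, fp in library.items() if tanimoto(query_fp, fp) >= threshold]


# Hypothetical fingerprints as sets of bit positions.
QUERY = set(range(20))
LIBRARY = {
    "close_analogue": set(range(18)),  # 18 shared bits of 20 in the union
    "distant": set(range(10)),         # 10 shared bits of 20 in the union
}

if __name__ == "__main__":
    print(similar_compounds(QUERY, LIBRARY))
```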
Use Case 3
• Get FASTA sequence for IRAK2 from UniProt http://www.uniprot.org/
• Paste sequence into ChEMBL – Protein Target Search
• Sort results by BLAST Score
• View bioactivity data on 3 most similar proteins (IRAK1, IRAK3, IRAK4)
• Look at target report card for IRAK4. Look at database links for this protein, e.g. PDBe
• Select the few most ligand-efficient compounds that bind to this target and view the data
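Ligand efficiency, used in the last step, is usually taken as binding free energy per heavy (non-hydrogen) atom. The slides do not define it, so the sketch below assumes the common form LE = -ΔG / N_heavy with ΔG ≈ -RT ln(10) × pActivity, and treats an IC50 or Ki in nM as a stand-in for the dissociation constant, which is an approximation. The function name and example numbers are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def ligand_efficiency(value_nm, heavy_atoms, temperature_k=298.15):
    """Approximate ligand efficiency in kcal/mol per heavy atom."""
    p_activity = 9.0 - math.log10(value_nm)  # -log10 of the molar value
    delta_g = -R_KCAL * temperature_k * math.log(10) * p_activity
    return -delta_g / heavy_atoms


if __name__ == "__main__":
    # Hypothetical compound: 10 nM potency, 25 heavy atoms.
    print(round(ligand_efficiency(10.0, 25), 3))
```

For the same potency, a smaller molecule scores higher, which is why this measure is useful for picking starting points.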
Use Case 4
• Search for assays that contain the word “Alzheimer”
• Sort on activity counts
• Look at document report card for assay
Use Case 5
• Browse drugs
• Filter on oral
• Show download to Excel
• Drug approvals
• Show links to blog
Answers
Use Case 1
Paste CHEMBL IDs
Use Case 2
Use Case 3
Protein sequence of interest, e.g. from UniProt http://www.uniprot.org
Sort by BLAST Score
Data on IRAK1, IRAK3 and IRAK4 but not IRAK2
Use Case 4
Use Case 5