EMBL-EBI Powerpoint Presentation
Chemical Resources – ChEMBL
Anne Hersey – ChEMBL Group
EBI is an Outstation of the European Molecular Biology Laboratory.

Overview of Talk
• ChEMBL Content
• ChEMBL Use Cases
• UniChem
Opportunity for hands-on use of ChEMBL tomorrow afternoon

What is ChEMBL
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature, from journals such as J. Med. Chem.
  • Deposited data, e.g. neglected disease screening, GSK kinase set
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure-searchable format
• Protein targets are linked to protein sequences in UniProt
• Updated regularly with new data
• Secure searching (https://www.ebi.ac.uk/chembldb)

Accessing ChEMBL Data
Pipeline Pilot and KNIME protocols that use these web services are available on the forum pages

ChEMBL Database Content
ChEMBL_14
• Compounds: 1,213,239
• Activities: 10,129,256
• Publications: 46,133
• Targets: 9,003
Targets: 5918 proteins; organisms and cell lines: 1475 and 1433
Compounds*: 517261 to 1213239
• Increase of >200,000 compounds from literature since ChEMBL01
• ~1% overlap between ChEMBL literature and PubChem compounds
* Includes PubChem compounds

Organisation of ChEMBL Data
Activities
• Type (e.g. IC50)
• Values
• Units
Compounds
• Names
• Properties
Targets
• Type
• Sequence
• Organism
Assays
• Experimental detail

ChEMBL Compounds
• Chemical structures in journal articles are drawn as .mol files
• If the stereochemistry is known it is drawn as a specific enantiomer
• Tautomers of the same compound are not treated as separate compounds.
  The form shown is as in the paper
• Other “rules” for standardising NO2 groups, HCl salts etc. Based on the FDA Substance Registration System User's Guide: http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm
• Unique compounds are identified using Standard InChI
• Salts and parent molecules are grouped together for displaying bioactivity data, although activity data is recorded against the specific salt

ChEMBL Assays – Binding, Functional, ADMET
Binding Assays
• Assays which directly measure the binding of a compound to a particular target (e.g., competition binding assays with a radioligand)
Functional Assays
• Whole organism assays (e.g., anti-infectives/parasitics)
• Cell-based assays over-expressing the target (e.g., GPCR calcium mobilisation)
ADMET Assays
• Absorption, Distribution, Metabolism, Excretion, Toxicity (e.g. t1/2 in rat)

ChEMBL Targets:
• Protein (e.g., PDE5)
• Protein complex (e.g., Nicotinic acetylcholine receptor)
• Protein family (e.g., Muscarinic receptors)
• Nucleic Acid (e.g., DNA)
• Cell Line (e.g., HEK293 cells)
• Tissue (e.g., Nervous)
• Sub-cellular Fraction (e.g., Mitochondria)
• Organism (e.g., Drosophila)

ChEMBL Activities
• Bioactivity value for a particular compound in a specific assay against a specific target, e.g. data for Imatinib (Gleevec, CHEMBL941)

Use Cases
• Use Case 1
  • Interested in a specific compound (or substructure)
  • Which compounds are similar?
  • Which targets do they bind to?
  • What are their potential liabilities?
• Use Case 2
  • Interested in a specific target
  • Identify compounds that bind to that target
  • What data is available on these compounds?
  • Select compounds with good potency/good drug-like properties to start a synthetic project
  • If there is no data for compounds binding to my protein, is there data for closely related members of the same family?
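
The selection step in Use Case 2 (good potency plus good drug-like properties) can be illustrated with a minimal Python sketch. This is not ChEMBL code: the record fields, the compound identifiers and all values are hypothetical, and "drug-like" is assumed here to mean no violations of Lipinski's Rule of Five (Mol Wt <= 500, logP <= 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The 10 nM cut-off is likewise only a default.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One bioactivity row; field names are hypothetical, not the ChEMBL schema."""
    compound_id: str
    ic50_nm: float      # potency in nM (lower is more potent)
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def ro5_violations(r: Record) -> int:
    """Count how many of Lipinski's four Rule of Five limits the compound exceeds."""
    return sum([
        r.mol_weight > 500,
        r.logp > 5,
        r.h_donors > 5,
        r.h_acceptors > 10,
    ])


def select_leads(records, max_ic50_nm=10.0, max_violations=0):
    """Keep potent, drug-like compounds and sort them by potency (most potent first)."""
    hits = [
        r for r in records
        if r.ic50_nm < max_ic50_nm and ro5_violations(r) <= max_violations
    ]
    return sorted(hits, key=lambda r: r.ic50_nm)


# Made-up example data, for illustration only.
RECORDS = [
    Record("CMPD-A", 2.5, 410.0, 3.1, 2, 6),
    Record("CMPD-B", 8.0, 612.0, 5.8, 3, 9),    # potent, but too heavy and too lipophilic
    Record("CMPD-C", 150.0, 350.0, 2.0, 1, 4),  # drug-like, but not potent enough
    Record("CMPD-D", 0.9, 455.0, 4.2, 1, 7),
]

if __name__ == "__main__":
    for lead in select_leads(RECORDS):
        print(lead.compound_id, lead.ic50_nm)
```

Counting violations rather than returning a yes/no answer makes it easy to relax the filter, for example by passing max_violations=1.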

Searching by Target in ChEMBL
• Find target by name, BLAST search or target tree
• Identify potent compounds: IC50 < 10 nM

Bioactivity Data
• Download data
• Compound structures, activity values, assay details, target details, refs

Information on a Target – Target Report Card

Compound Searching in ChEMBL
• 86 similar at >=90%
• Also sub-structure searches
• Filter on Lipinski Rule of Five etc.
• Chart View

Compound Search – Compound Report Card

Searching Assays

Marketed Drugs
• Export to Excel
• Export SDF

UniChem
http://chembl.blogspot.com/2011/11/unichem.html

Multiple EBI Resources hold Compound Structure Data
- Maintaining links between DBs is manual and time-consuming for each source.
- Business rules for constructing identifiers are not consistent, so users are confused.
Example identifiers: '10003', 'RIBOSTAMYCIN', 'CHEMBL221572', 'RIO', '??'; EU_OPENSCREEN: 'EU-?????', etc.

UniChem: a resource overlaying EBI chemistry resources (EU_OPENSCREEN etc.)
• Uses standard InChI to compare chemical structures across databases
• Enables tracking of 'id-to-structure' assignments over time
• All EBI DBs share the benefit of maintained links between chemistry resources

Additional benefits of UniChem: Maintained links to external sources are shared (EU_OPENSCREEN etc.).
- All EBI DBs share the benefits of maintained links to external resources.
- The 'mapping service' will be opened for use by external users.
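
The idea of comparing structures across databases through their Standard InChI can be sketched in a few lines of Python. This is only an illustration of the principle, not the UniChem implementation or its interface: the source names and identifiers below are hypothetical, and the two InChI strings (ethanol and water) are used simply as stand-ins for real structures.

```python
from collections import defaultdict

# Standard InChI strings used as the shared structure key.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
WATER = "InChI=1S/H2O/h1H2"

# (source, identifier, Standard InChI) assignments; sources and ids are made up.
ASSIGNMENTS = [
    ("source_a", "A-1", ETHANOL),
    ("source_b", "ETHANOL", ETHANOL),
    ("source_c", "C-77", ETHANOL),
    ("source_a", "A-2", WATER),
    ("source_b", "WATER", WATER),
]


def build_index(assignments):
    """Group (source, identifier) pairs by the Standard InChI they are assigned to."""
    index = defaultdict(set)
    for source, identifier, inchi in assignments:
        index[inchi].add((source, identifier))
    return index


def map_identifier(assignments, source, identifier):
    """Return the identifiers in other records that share a structure with the query."""
    query = (source, identifier)
    for members in build_index(assignments).values():
        if query in members:
            return sorted(members - {query})
    return []  # unknown identifier: nothing to map


if __name__ == "__main__":
    print(map_identifier(ASSIGNMENTS, "source_a", "A-1"))
```

Because every source is linked only to the shared structure key, adding a new source means adding its assignments once rather than maintaining pairwise links to every other database.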

UniChem – Sources

UniChem – InChI or Identifier Searching
• InChI searching
• Identifier searching

UniChem – Source Mapping

Useful information
Other ChEMBL Resources
• If you would like help: [email protected]
• For ChEMBL news and data releases subscribe to: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
• ChEMBL Paper (NAR 2011)
• ChEMBL Blog: http://chembl.blogspot.com

Acknowledgements
ChEMBL Group: John Overington, Anne Hersey, Anna Gaulton, Mark Davies, Jon Chambers, Louisa Bellis, Kazuyoshi Ikeda, Patricia Bento, Shaun McGlinchey, Yvonne Light, Felix Krueger, Ben Stauch, Ruth Akhtar, Francis Atkinson, Rita Santos
Collaborators: Imperial Cancer Research, University of Dundee, University of Cambridge, Sanger Centre, University of Maryland, NCBI, TDR, IUPHAR, Bayer-Schering, Pfizer, GSK, Schering-Plough, MMV, Novartis, St Jude Children's Research Hospital
EMBL-EBI colleagues
Former Inpharmatica colleagues

ChEMBL Hands-on (https://www.ebi.ac.uk/chembldb)

Use Case 1
• Search for target by name (Adenosine A2a)
• Select human target
• Look at Target Report Card
• Return to Target Search Results (back button) and tick just the human target
• Filter bioactivity data for Ki < 10 nM
• Sort the data by value
• Output data to Excel
• Select and copy a few CHEMBL_IDs
• Paste into Compound Search – List Search
• View all data on compounds

Use Case 2
• Search by compound name (sildenafil)
• Look at data on parent and salt
• Look at links to crystal structure and clinical trials data
• Select sildenafil structure and use as query
• Go to compound search page
• Do similarity search >=85%
• Go to graph view of data
• Filter by compound properties, e.g. logP < 5 and Mol Wt < 500
• Look at bioactivity data on new dataset

Use Case 3
• Get FASTA sequence for IRAK2 from UniProt http://www.uniprot.org/
• Paste sequence into ChEMBL – Protein Target Search
• Sort results by BLAST Score
• View bioactivity data on the 3 most similar proteins (IRAK1, IRAK3, IRAK4)
• Look at target report card for IRAK4. Look at database links for this protein, e.g. PDBe
• Select the few most ligand-efficient compounds that bind to this target and view the data

Use Case 4
• Search for assays that contain the word “Alzheimer”
• Sort on activity counts
• Look at document report card for assay

Use Case 5
• Browse drugs
• Filter on oral
• Show download to Excel
• Drug approvals
• Show links to blog

Answers
Use Case 1
• Paste CHEMBL IDs
Use Case 2
Use Case 3
• Protein sequence of interest, e.g. from UniProt http://www.uniprot.org
• Sort by BLAST Score
• Data on IRAK1, IRAK3 and IRAK4 but not IRAK2
Use Case 4
Use Case 5