
Chemical Resources – ChEMBL

Anne Hersey – ChEMBL Group

EBI is an Outstation of the European Molecular Biology Laboratory.

Overview of Talk

• ChEMBL Content

• ChEMBL Use Cases

• UniChem

Opportunity for hands-on use of ChEMBL tomorrow afternoon

What is ChEMBL?
• A database for drug discovery
• Freely available (searchable and downloadable)
• Content:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature, from journals such as J. Med. Chem.
  • Deposited data, e.g. neglected disease screening, GSK kinase set
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure-searchable format
• Protein targets are linked to protein sequences in UniProt
• Updated regularly with new data
• Secure searching (https://www.ebi.ac.uk/chembldb)

Accessing ChEMBL Data

Pipeline Pilot and KNIME protocols that use the ChEMBL web services are available on the forum pages

ChEMBL Database Content (ChEMBL_14)
• Compounds: 1,213,239
• Activities: 10,129,256
• Publications: 46,133
• Targets: 9,003

[Figure: target breakdown (5,918 proteins; organisms and cell lines, labelled 1,475 and 1,433) and growth in compounds* from 517,261 to 1,213,239]

• Increase of >200,000 compounds from the literature since ChEMBL_01
• ~1% overlap between ChEMBL literature compounds and PubChem compounds

* Includes PubChem compounds

Organisation of ChEMBL Data

• Compounds: names, properties
• Activities: type (e.g. IC50), values, units
• Assays: experimental detail
• Targets: type, sequence, organism
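The four entities above can be sketched as simple Python record types. This is a minimal illustration, assuming hypothetical class and field names; it is not the ChEMBL database schema, only a way to show how one activity record links a compound, an assay and a target.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical, simplified record types for the four entities on the slide.
# Field names are illustrative only; they are NOT the ChEMBL schema.

@dataclass(frozen=True)
class Compound:
    chembl_id: str
    name: str

@dataclass(frozen=True)
class Target:
    chembl_id: str
    target_type: str  # e.g. "protein", "cell line"
    organism: str

@dataclass(frozen=True)
class Assay:
    chembl_id: str
    description: str  # free-text experimental detail

@dataclass(frozen=True)
class Activity:
    compound: Compound
    assay: Assay
    target: Target
    activity_type: str  # e.g. "IC50", "Ki"
    value: float
    units: str          # e.g. "nM"

def activities_for_target(activities: Iterable[Activity], target_id: str) -> List[Activity]:
    """Return the activity records measured against one target."""
    return [a for a in activities if a.target.chembl_id == target_id]
```

Keeping the activity as the record that points at the other three mirrors the slide: a value only means something in the context of a specific compound, assay and target.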

ChEMBL Compounds
• Chemical structures from journal articles are drawn as .mol files
• If the stereochemistry is known, the compound is drawn as a specific enantiomer
• Tautomers of the same compound are not treated as separate compounds; the form stored is the one shown in the paper

• Other “rules” for standardising NO2 groups, HCl salts, etc. are based on the FDA Substance Registration System User's Guide (http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm)
• Unique compounds are identified using Standard InChI
• Salts and parent are grouped together for displaying bioactivity data, although the activity data is recorded against the specific salt
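The salt/parent rule above can be sketched as a grouping step: each activity stays attached to the form actually tested, but is displayed under its parent. This minimal sketch assumes the salt-to-parent mapping has already been derived (in ChEMBL that comes from structure standardisation, which is not reproduced here); the function name and the IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def group_by_parent(
    activities: Iterable[Tuple[str, float]],
    parent_of: Dict[str, str],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (compound_id, value) records under their parent compound.

    parent_of maps a salt form's ID to its parent ID (hypothetical input);
    IDs absent from the mapping are treated as parents themselves.
    """
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for compound_id, value in activities:
        parent_id = parent_of.get(compound_id, compound_id)
        # Keep the original (salt) ID so each record stays traceable
        # to the form it was actually measured on.
        grouped[parent_id].append((compound_id, value))
    return dict(grouped)
```

For example, `group_by_parent([("SALT-A", 5.0), ("PARENT-A", 7.0)], {"SALT-A": "PARENT-A"})` returns `{"PARENT-A": [("SALT-A", 5.0), ("PARENT-A", 7.0)]}`: both measurements are shown under the parent, and each keeps the ID it was recorded against.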

ChEMBL Assays – Binding, Functional, ADMET

Binding Assays
• Assays that directly measure the binding of a compound to a particular target (e.g., competition binding assays with a radioligand)

Functional Assays
• Whole-organism assays (e.g., anti-infectives/parasitics)
• Cell-based assays with an over-expressed target (e.g., GPCR calcium mobilisation)

ADMET Assays
• Absorption, Distribution, Metabolism, Excretion, Toxicity (e.g., t1/2 in rat)


ChEMBL Targets:

• Protein, e.g., PDE5
• Protein complex, e.g., nicotinic receptor
• Protein family, e.g., muscarinic receptors
• Nucleic acid, e.g., DNA
• Cell line, e.g., HEK293 cells
• Tissue, e.g., nervous tissue
• Sub-cellular fraction, e.g., mitochondria
• Organism, e.g., Drosophila

ChEMBL Activities

• Bioactivity value for a particular compound in a specific assay against a specific target

e.g. data for Gleevec (CHEMBL941)

Use Cases
• Use Case 1
  • Interested in a specific compound (or substructure)
  • Which compounds are similar?
  • Which targets do they bind to?
  • What are their potential liabilities?

• Use Case 2
  • Interested in a specific target
  • Identify compounds that bind to that target
  • What data is available on these compounds?
  • Select compounds with good potency and good drug-like properties to start a synthetic project
  • If there is no data for compounds binding to my protein, is there data for closely related members of the same family?

Searching by Target in ChEMBL

Find the target by name, BLAST search or target tree

Identify potent compounds (IC50 < 10 nM)
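Applying a potency cut-off such as IC50 < 10 nM to downloaded data requires the values on a common unit first. A minimal sketch, assuming records are dicts with hypothetical `type`, `value` and `units` keys and that only the molar units listed need handling:

```python
from typing import Dict, Iterable, Iterator

# Conversion factors to nanomolar for a few common molar concentration units.
TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}

def potent(
    records: Iterable[Dict],
    activity_type: str = "IC50",
    threshold_nm: float = 10.0,
) -> Iterator[Dict]:
    """Yield records of the given type whose value is below threshold_nm.

    Records in units not listed in TO_NM are skipped rather than guessed.
    """
    for rec in records:
        factor = TO_NM.get(rec["units"])
        if rec["type"] != activity_type or factor is None:
            continue
        if rec["value"] * factor < threshold_nm:
            yield rec
```

Skipping unknown units (such as µg/mL, which would need a molecular weight to convert) is a deliberate choice here, to avoid silently mixing scales.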

Bioactivity Data
• Download data

[Screenshot callouts: compound structures, activity values, assay details, target details, references]

Information on a Target – Target Report Card

Compound Searching in ChEMBL

86 compounds similar at ≥90%
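The slide does not say which fingerprint or similarity measure the ChEMBL interface uses; a common choice for this kind of cut-off search is the Tanimoto coefficient on binary fingerprints. The sketch below assumes fingerprints have already been computed (for example by a cheminformatics toolkit) and are stored as sets of on-bit indices; `library` and the compound IDs in it are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits.

    Two empty fingerprints are defined here as having similarity 0.0.
    """
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similar(
    query_fp: Set[int],
    library: Dict[str, Set[int]],
    cutoff: float = 0.90,
) -> List[Tuple[str, float]]:
    """Return (compound_id, score) pairs with score >= cutoff, best first."""
    hits = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    return sorted((h for h in hits if h[1] >= cutoff), key=lambda h: h[1], reverse=True)
```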

• Also substructure searches
• Filter on Lipinski Rule of Five, etc.
• Chart view
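A Rule of Five filter reduces to counting threshold violations: molecular weight over 500, calculated logP over 5, more than 5 H-bond donors, more than 10 H-bond acceptors. This sketch assumes the four properties are precomputed and, following Lipinski's observation that poor absorption is more likely when more than one rule is broken, lets a compound pass with at most one violation; how the ChEMBL filter counts violations is not stated here. `Props` is a hypothetical container.

```python
from dataclasses import dataclass

@dataclass
class Props:
    mw: float    # molecular weight (Da)
    logp: float  # calculated logP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors

def ro5_violations(p: Props) -> int:
    """Count Lipinski Rule of Five violations (each True counts as 1)."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])

def passes_ro5(p: Props, max_violations: int = 1) -> bool:
    """True if the compound breaks no more than max_violations rules."""
    return ro5_violations(p) <= max_violations
```

Exposing the violation count, not just a pass/fail flag, makes it easy to apply a stricter filter (e.g. `max_violations=0`) to the same data.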

Compound Search – Compound Report Card

Searching Assays

Marketed Drugs
• Export to Excel
• Export SDF

UniChem
http://chembl.blogspot.com/2011/11/unichem.html

Multiple EBI Resources hold Compound Structure Data

- Maintaining links between databases is manual and time-consuming for each source.
- Business rules for constructing identifiers are not consistent, so users are confused.

[Figure: identifiers assigned by different resources, e.g. '10003', 'RIBOSTAMYCIN', 'CHEMBL221572', 'RIO', and for EU_OPENSCREEN 'EU-?????']

UniChem: a resource overlaying EBI chemistry resources

- Uses Standard InChI to compare chemical structures across databases
- Enables tracking of 'id-to-structure' assignments over time
- All EBI DBs share the benefit of maintained links between chemistry resources
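The core idea, joining compound identifiers from different sources through a shared Standard InChI, can be sketched as an inverted index. This is not UniChem's implementation: it assumes each source already supplies a Standard InChI per compound ID, and the source names, IDs and InChI placeholders used with it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

Sources = Dict[str, Dict[str, str]]    # {source_name: {compound_id: standard_inchi}}
Index = Dict[str, Dict[str, Set[str]]]  # {standard_inchi: {source_name: {compound_ids}}}

def build_index(sources: Sources) -> Index:
    """Invert per-source (id -> InChI) tables into InChI -> {source: ids}."""
    index: Index = defaultdict(lambda: defaultdict(set))
    for source, records in sources.items():
        for compound_id, inchi in records.items():
            index[inchi][source].add(compound_id)
    return index

def map_id(
    index: Index,
    sources: Sources,
    from_source: str,
    compound_id: str,
    to_source: str,
) -> Set[str]:
    """Return IDs in to_source whose structure matches compound_id in from_source."""
    inchi = sources[from_source].get(compound_id)
    if inchi is None or inchi not in index:
        return set()
    return set(index[inchi].get(to_source, set()))
```

Because every source is joined through the same index, adding a new source makes its IDs reachable from all existing ones, which is the shared-maintenance benefit described above.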

Additional benefits of UniChem: maintained links to external sources are shared

- All EBI DBs share the benefits of maintained links to external resources.
- The 'mapping service' will be opened for use by external users.

UniChem – Sources

UniChem – InChI or Identifier Searching

InChI searching

Identifier searching

UniChem – Source Mapping

Useful Information

Other ChEMBL resources:
• If you would like help: [email protected]
• For ChEMBL news and data releases, subscribe to: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce

ChEMBL Paper (NAR 2011)

ChEMBL Blog: http://chembl.blogspot.com

Acknowledgements

ChEMBL Group: John Overington, Anne Hersey, Anna Gaulton, Mark Davies, Jon Chambers, Louisa Bellis, Kazuyoshi Ikeda, Patricia Bento, Shaun McGlinchey, Yvonne Light, Felix Krueger, Ben Stauch, Ruth Akhtar, Francis Atkinson, Rita Santos

Collaborators: Imperial Cancer Research, University of Dundee, University of Cambridge, Sanger Centre, University of Maryland, NCBI, TDR, IUPHAR, Bayer-Schering, Pfizer, GSK, Schering-Plough, MMV, Novartis, St Jude Children's Research Hospital; EMBL-EBI colleagues; former Inpharmatica colleagues

ChEMBL Hands-on

Use Case 1
• Search for target by name (A2a)
• Select the human target
• Look at the Target Report Card
• Return to Target Search Results (back button) and tick just the human target
• Filter bioactivity data for Ki < 10 nM
• Sort the data by value
• Output data to Excel
• Select and copy a few CHEMBL_IDs
• Paste into Compound Search - List Search
• View all data on the compounds
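The filter, sort and export steps of Use Case 1 are done interactively in the web interface. For data that has already been downloaded, the same steps can be sketched offline; this assumes rows are dicts with hypothetical keys `chembl_id`, `type` and `value_nm` (values already in nM), and writes CSV, which Excel opens, as a stand-in for the Excel export.

```python
import csv
import io
from typing import Dict, Iterable

FIELDS = ["chembl_id", "type", "value_nm"]  # hypothetical column names

def export_potent_ki(rows: Iterable[Dict], threshold_nm: float = 10.0) -> str:
    """Keep Ki rows below threshold_nm, sort most potent first, return CSV text."""
    kept = [r for r in rows if r["type"] == "Ki" and r["value_nm"] < threshold_nm]
    kept.sort(key=lambda r: r["value_nm"])  # ascending value = most potent first
    buf = io.StringIO()
    # extrasaction="ignore" drops any extra keys instead of raising ValueError.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(kept)
    return buf.getvalue()
```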

https://www.ebi.ac.uk/chembldb

ChEMBL Hands-on

Use Case 2
• Search by compound name ()
• Look at data on parent and salt
• Look at links to crystal structure and clinical trials data
• Select the sildenafil structure and use it as a query
• Go to the compound search page
• Do a similarity search at ≥85%
• Go to the graph view of the data
• Filter by compound properties, e.g. logP < 5 and MW < 500
• Look at bioactivity data on the new dataset

ChEMBL Hands-on

Use Case 3
• Get the FASTA sequence for IRAK2 from UniProt (http://www.uniprot.org/)
• Paste the sequence into ChEMBL – Protein Target Search
• Sort results by BLAST score
• View bioactivity data on the 3 most similar proteins (IRAK1, IRAK3, IRAK4)
• Look at the target report card for IRAK4
• Look at database links for this protein, e.g. PDBe
• Select the few most ligand-efficient compounds that bind to this target and view the data
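Ranking compounds by ligand efficiency normalises potency by molecular size. One common approximation is LE ≈ 1.37 × pIC50 / N_heavy (kcal/mol per heavy atom), where 1.37 ≈ 2.303·RT at about 298 K and the IC50 stands in for the dissociation constant. How ChEMBL computes ligand efficiency is not stated here, so treat this as an assumption-laden sketch; heavy-atom counts are taken as given and the tuple layout is hypothetical.

```python
import math
from typing import List, Tuple

def p_activity(value_nm: float) -> float:
    """Convert a potency in nM (e.g. an IC50) to -log10 of the molar value."""
    return -math.log10(value_nm * 1e-9)

def ligand_efficiency(value_nm: float, heavy_atoms: int) -> float:
    """Approximate LE in kcal/mol per heavy atom (2.303*R*T ~ 1.37 at ~298 K)."""
    return 1.37 * p_activity(value_nm) / heavy_atoms

def most_efficient(
    compounds: List[Tuple[str, float, int]], n: int = 3
) -> List[Tuple[str, float]]:
    """Rank (id, potency_nM, heavy_atoms) tuples by LE, highest first, keep n."""
    scored = [(cid, ligand_efficiency(v, ha)) for cid, v, ha in compounds]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:n]
```

A small compound with modest potency can outrank a larger, more potent one, which is why ligand efficiency is useful when picking starting points.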

ChEMBL Hands-on

Use Case 4
• Search for assays that contain the word “Alzheimer”
• Sort on activity counts
• Look at the document report card for the assay

Use Case 5
• Browse drugs
• Filter on oral
• Show download to Excel
• Drug approvals
• Show links to blog

Answers

Use Case 1

Paste CHEMBL IDs

Use Case 2

Use Case 3

Protein sequence of interest, e.g. from UniProt (http://www.uniprot.org)

Sort by BLAST score

Data on IRAK1, IRAK3 and IRAK4, but not IRAK2


Use Case 4

Use Case 5