
ChEMBL resources and KNIME

George Papadatos [email protected]

Outline

• ChEMBL data
• ChEMBL nodes
• Web services v2.0
• UniChem
• Cheminformatics utilities
• myChEMBL
• SureChEMBL and Open PHACTS

ChEMBL: Data for drug discovery

1. Scientific facts
2. Organization, integration, curation and standardization of pharmacology data
3. Insight, tools and resources for translational drug discovery

Figure: a compound and an assay/target (thrombin, shown as its protein sequence) connected by bioactivity data: Ki = 4.5 nM, APTT = 11 min.

KNIME at the EBI

• Access ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
• Algorithms development
• Document classification
• Share example workflows and use cases
• Provide KNIME training to scientists and researchers
  • Drug discovery courses, EMBL courses
• CDK community nodes development

http://tech.knime.org/book/embl-ebi-nodes

ChEMBL nodes

ChEMBL KNIME nodes

Example: All bioactivities for hERG

All bioactivities for hERG

Activity value, assay description, compound, reference

Example: Compound searching in ChEMBL

List of NNs

Query

Example: Polypharmacology profile

Find NNs

Filter, summarise & pivot

Retrieve bioactivities
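The "filter, summarise & pivot" step of this workflow can also be sketched outside KNIME in plain Python. The records, the field names and the potency cutoff below are illustrative assumptions, not real ChEMBL data or the workflow's actual settings. The sketch only shows the shape of the transformation: keep sufficiently potent measurements, take the median per compound and target, and pivot the result into a compound-by-target table.

```python
from collections import defaultdict
from statistics import median

# Hypothetical activity records, as might be retrieved for a set of
# nearest neighbours (NNs); identifiers and values are made up.
activities = [
    {"compound": "CPD-1", "target": "Target A", "pchembl": 7.2},
    {"compound": "CPD-1", "target": "Target A", "pchembl": 6.8},
    {"compound": "CPD-1", "target": "Target B", "pchembl": 5.1},
    {"compound": "CPD-2", "target": "Target A", "pchembl": 8.0},
    {"compound": "CPD-2", "target": "Target C", "pchembl": 4.0},
]


def polypharmacology_profile(records, min_pchembl=5.0):
    """Filter, summarise (median) and pivot activity records.

    Returns {compound: {target: median potency}}.
    """
    grouped = defaultdict(list)
    for rec in records:
        if rec["pchembl"] >= min_pchembl:              # filter
            key = (rec["compound"], rec["target"])
            grouped[key].append(rec["pchembl"])

    profile = defaultdict(dict)
    for (compound, target), values in grouped.items():
        profile[compound][target] = median(values)      # summarise + pivot
    return {compound: dict(row) for compound, row in profile.items()}


profile = polypharmacology_profile(activities)
for compound, row in sorted(profile.items()):
    print(compound, row)
```

Using the median rather than the mean is a design choice made for this sketch: it is less sensitive to a single outlying measurement when a compound has several values against the same target.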

Query Compounds

Web services v2.0

• Many more entities → granularity
• Pagination, filtering, ordering

UniChem integration

EMBL-EBI chemistry resources

RDF and REST API interfaces

• Atlas (750): Ligand-induced transcript response
• PDBe (15K): Ligand structures from structurally defined protein complexes
• ChEBI (24K): Nomenclature of primary and secondary metabolites. Chemical Ontology
• ChEMBL (1.5M): Bioactivity data from literature and depositions
• SureChEMBL (~17M): Chemical structures from patent literature
• 3rd Party Data (~70M): ZINC, PubChem, ThomsonPharma, DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, …

UniChem – InChI-based chemical resolver (full + relaxed ‘lenses’) >90M
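As a rough illustration of what "full" and "relaxed" lenses could mean for an InChI-based resolver, the sketch below matches a query either on the complete standard InChIKey or only on its first 14-character block, which encodes connectivity and ignores stereochemistry. Treating the relaxed lens as a connectivity-block match is an assumption made here. The in-memory index, source names and record identifiers are hypothetical stand-ins for real cross-references, and no UniChem service is called.

```python
# Toy cross-reference index: standard InChIKey -> [(source, identifier)].
# Source names and identifiers are hypothetical placeholders.
INDEX = {
    "QNAYBMKLOCPYGJ-REOHJDIQSA-N": [("source_a", "A-001")],  # one stereoisomer
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N": [("source_b", "B-042")],  # stereo undefined
}


def connectivity_block(inchikey):
    """Return the first block of a standard InChIKey (14 characters)."""
    return inchikey.split("-")[0]


def resolve(inchikey, lens="full"):
    """Look up cross-references under the 'full' or 'relaxed' lens."""
    if lens == "full":
        return list(INDEX.get(inchikey, []))
    if lens == "relaxed":
        block = connectivity_block(inchikey)
        return [xref
                for key, xrefs in INDEX.items()
                if connectivity_block(key) == block
                for xref in xrefs]
    raise ValueError(f"unknown lens: {lens!r}")


def is_novel(inchikey, lens="full"):
    """In this sketch a structure is novel if no indexed source has it."""
    return not resolve(inchikey, lens)


query = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"  # a different stereoisomer
print(resolve(query, "full"))       # exact-key lookup
print(resolve(query, "relaxed"))    # connectivity-only lookup
print(is_novel(query, "full"), is_novel(query, "relaxed"))
```

The same query can therefore look novel under the full lens and known under the relaxed one, which is why the choice of lens matters when checking a structure against existing sources.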

REST API Interface - https://www.ebi.ac.uk/unichem/

Novelty checking with UniChem

Cheminformatics utilities (aka ‘Beaker’)

• Chemical format conversions
• Dynamic image generation
• Image processing (via OSRA)
• Descriptors and property calculations
• Chemical modifications and standardization
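A request to one of the utilities listed above might be assembled as in the sketch below. The base URL is taken from the documentation link on this slide; the method name `smiles2ctab` and the convention of passing the input as URL-safe base64 in the path are assumptions that should be checked against that documentation. The sketch only builds and decodes the URL and does not contact the service.

```python
import base64

# Base URL derived from the docs link on the slide. The method name and
# the base64-in-path convention used below are assumptions.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/utils"


def utils_url(method, payload):
    """Build a GET URL carrying the payload as URL-safe base64."""
    encoded = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    return f"{BASE_URL}/{method}/{encoded}"


def decode_payload(url):
    """Recover the original payload from a URL built by utils_url."""
    encoded = url.rsplit("/", 1)[1]
    return base64.urlsafe_b64decode(encoded).decode("utf-8")


smiles = "CC(=O)Oc1ccccc1C(=O)O"          # aspirin
url = utils_url("smiles2ctab", smiles)     # hypothetical method name
print(url)
print(decode_payload(url) == smiles)
```

Fetching the URL (for instance with `urllib.request.urlopen`) is deliberately left out so that the sketch runs offline.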

https://www.ebi.ac.uk/chembl/api/utils/docs

Example: Image to Structure

image URL

myChEMBL integration

Accessing local data with myChEMBL

Using KNIME to connect to myChEMBL

SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr, molecule_dictionary md, compound_properties cp
WHERE mr.m @> '$${SMolecule}$$'::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno;

SureChEMBL and Open PHACTS

SureChEMBL

SciBite Termite

Open PHACTS API

https://dev.openphacts.org/docs/develop
https://github.com/openphacts/OPS-Knime/
http://rdf.ebi.ac.uk/resource/surechembl/patent/US-8877786-B2
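The SureChEMBL RDF link above ends in the patent identifier. A small sketch, assuming identifiers always have the three-part AUTHORITY-NUMBER-KIND form seen in US-8877786-B2 (other forms may exist), shows how such identifiers could be split and turned into resource URIs, for example to tally patents by issuing authority. The helper names are hypothetical and nothing here queries the RDF service.

```python
from collections import Counter

# URI prefix copied from the link on the slide.
PATENT_PREFIX = "http://rdf.ebi.ac.uk/resource/surechembl/patent/"


def parse_patent_id(patent_id):
    """Split an identifier such as 'US-8877786-B2' into its three parts."""
    authority, number, kind = patent_id.split("-")
    return {"authority": authority, "number": number, "kind": kind}


def patent_uri(patent_id):
    """Build the RDF resource URI for a patent identifier."""
    return PATENT_PREFIX + patent_id


def count_by_authority(patent_ids):
    """Tally patents per issuing authority (e.g. EP, WO, US)."""
    return Counter(parse_patent_id(pid)["authority"] for pid in patent_ids)


print(parse_patent_id("US-8877786-B2"))
print(patent_uri("US-8877786-B2"))
print(count_by_authority(["US-8877786-B2"]))
```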

MCS scaffold

US-8877786-B2: Substituted carbamoylmethylamino derivatives as novel NEP inhibitors

Most relevant targets and diseases

http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL371804

Foretinib, a kinase inhibitor in clinical phase II

Patent publication date histogram

Found in 89 EP, WO and US patents

Most relevant diseases

Most relevant targets

Summary

• KNIME: democratizes access to data and tools
• Access public domain structure and bioactivity data and services with KNIME
  • ChEMBL KNIME Nodes
  • UniChem
  • Cheminformatics services
  • myChEMBL
  • SureChEMBL

Publications

Acknowledgements

• Francis Atkinson
• Thorsten Meinl
• Louisa Bellis
• KNIME
• Jon Chambers
• KNIME community
• Michał Nowotka
• Anne Hersey
• Stefan Beisken
• Edmund Duesbury
• Daniela Digles

All workflow examples are available on request.