ChEMBL resources and KNIME
George Papadatos
Outline
• ChEMBL data
• ChEMBL nodes
• Web services v2.0
• UniChem
• Cheminformatics utilities
• myChEMBL
• SureChEMBL and Open PHACTS
ChEMBL: Data for drug discovery
1. Scientific facts
2. Organization, integration, curation and standardization of pharmacology data
3. Insight, tools and resources for translational drug discovery
[Figure: a compound linked to its target (thrombin, shown as a protein sequence) through assays, with example bioactivity data: Ki = 4.5 nM, APTT = 11 min.]
KNIME at the EBI
• Access ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
• Algorithms development
• Document classification
• Share example workflows and use cases
• Provide KNIME training to scientists and researchers
• Wellcome Trust drug discovery courses, EMBL courses
• CDK community nodes development
http://tech.knime.org/book/embl-ebi-nodes
ChEMBL nodes
ChEMBL KNIME nodes
Example: All bioactivities for hERG
[Workflow figure: all bioactivities for hERG, returning activity value, assay description, compound and reference.]
Example: Compound searching in ChEMBL
[Workflow figure: a query compound returns a list of nearest neighbours (NNs).]
Example: Polypharmacology profile
[Workflow figure: query compounds, find NNs, retrieve bioactivities, then filter, summarise and pivot.]
Web services v2.0
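The v2.0 web services support filtering, ordering and pagination. A minimal sketch of how a client might use them follows. The base URL, the `limit`/`offset`/`order_by` parameter names and the `activities` response key are assumptions that should be checked against the official documentation. `build_url` and `iter_records` are hypothetical helper names, not part of any ChEMBL library.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL web services; verify against the docs.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_url(resource, filters=None, order_by=None, limit=20, offset=0):
    """Build the URL for one page of a resource such as 'activity'."""
    params = dict(filters or {})      # filtering: field=value pairs
    if order_by:
        params["order_by"] = order_by  # ordering
    params["limit"] = limit            # pagination: page size
    params["offset"] = offset          # pagination: start position
    return f"{BASE_URL}/{resource}.json?{urlencode(params)}"


def iter_records(fetch_json, resource, key, page_size=20, **kwargs):
    """Yield records page by page until a short or empty page comes back.

    fetch_json is any callable that maps a URL to a parsed JSON dict, so
    the paging logic can be exercised without network access.
    """
    offset = 0
    while True:
        url = build_url(resource, limit=page_size, offset=offset, **kwargs)
        records = fetch_json(url).get(key, [])
        yield from records
        if len(records) < page_size:
            break
        offset += page_size
```

For the hERG example, one could pass `filters={"target_chembl_id": "CHEMBL240"}` with `resource="activity"` and `key="activities"`, and supply a `fetch_json` built on `urllib.request.urlopen` and `json.load`. Injecting the fetch function keeps the sketch independent of any particular HTTP client.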
• Many more entities → granularity
• Pagination, filtering, ordering
UniChem integration
EMBL-EBI chemistry resources
RDF and REST API interfaces
• Atlas (750): ligand-induced transcript response
• PDBe (15K): ligand structures from structurally defined protein complexes
• ChEBI (24K): nomenclature of primary and secondary metabolites; chemical ontology
• ChEMBL (1.5M): bioactivity data from literature and depositions
• SureChEMBL (~17M): chemical structures from patent literature
• 3rd Party Data (~70M): ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, ….
UniChem – InChI-based chemical resolver (full + relaxed ‘lenses’), >90M
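Because UniChem resolves structures by InChI, a simple novelty check can ask whether any source already knows a given InChIKey. The sketch below rests on assumptions: the URL pattern is the legacy REST path, which may since have been superseded, and the record layout (`src_id`, `src_compound_id`) is assumed as well. All function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: legacy UniChem REST path for InChIKey lookups; verify before use.
UNICHEM_INCHIKEY_URL = "https://www.ebi.ac.uk/unichem/rest/inchikey/{key}"


def lookup_url(inchikey):
    """Return the lookup URL for a standard InChIKey."""
    return UNICHEM_INCHIKEY_URL.format(key=inchikey.strip())


def sources_for(records):
    """Map source id -> list of compound ids from a parsed response.

    Assumes each record looks like {"src_id": ..., "src_compound_id": ...}.
    Anything that is not a list of such records counts as 'no hits'.
    """
    hits = {}
    if isinstance(records, list):
        for rec in records:
            if isinstance(rec, dict) and "src_id" in rec:
                hits.setdefault(rec["src_id"], []).append(rec.get("src_compound_id"))
    return hits


def is_novel(records):
    """Treat a structure as novel when no source reports it."""
    return not sources_for(records)


def fetch(inchikey):
    """Network call, kept separate so the parsing above works offline."""
    with urlopen(lookup_url(inchikey)) as resp:
        return json.load(resp)
```

In use, `is_novel(fetch(key))` would flag structures absent from every registered source. Note that `urlopen` raises an exception on HTTP errors, so a real workflow would need to decide whether an error means "unknown structure" or "service problem".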
REST API interface: https://www.ebi.ac.uk/unichem/
Novelty checking with UniChem
Cheminformatics utilities (aka ‘Beaker’)
• Chemical format conversions
• Dynamic image generation
• Image processing (via OSRA)
• Descriptors and property calculations
• Chemical modifications and standardization
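As a rough illustration of calling such a utility over HTTP, the sketch below builds a GET URL with the input encoded as URL-safe base64. This calling convention is an assumption, as is the method name `smiles2ctab` used in the usage note; both must be confirmed in the service documentation. `encode_payload` and `get_url` are hypothetical helpers.

```python
import base64

# Assumption: service root, derived from the documentation URL of the utilities.
BEAKER_ROOT = "https://www.ebi.ac.uk/chembl/api/utils"


def encode_payload(text):
    """URL-safe base64 encoding of a text payload such as a SMILES string."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def get_url(method, payload):
    """Build a GET URL of the form <root>/<method>/<encoded payload>.

    The method name is supplied by the caller and is not validated here;
    check it against the service documentation.
    """
    return f"{BEAKER_ROOT}/{method}/{encode_payload(payload)}"
```

A format conversion request might then look like `get_url("smiles2ctab", "CCO")`. Encoding the payload avoids problems with characters such as `/`, `#` and `+`, which are common in SMILES and have special meaning in URLs.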
https://www.ebi.ac.uk/chembl/api/utils/docs
Example: Image to Structure
[Workflow figure: an image URL is converted to a chemical structure.]
myChEMBL integration
Accessing local data with myChEMBL
Using KNIME to connect to myChEMBL
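Outside KNIME, the same kind of substructure search against a local myChEMBL database could be scripted. The sketch below mirrors the table and column names of the KNIME query on this slide, but replaces the KNIME flow variable with a DB-API placeholder. The `%s` placeholder style assumes a driver such as psycopg2, and `substructure_search` is a hypothetical helper name.

```python
# Same tables and joins as the KNIME query; %s is a DB-API placeholder
# (psycopg2 style), so the query structure is passed as a bound parameter.
SUBSTRUCTURE_SQL = """
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> %s::qmol
"""


def substructure_search(cursor, smarts):
    """Run the substructure query through any DB-API 2.0 cursor."""
    if not smarts or not smarts.strip():
        raise ValueError("empty substructure query")
    cursor.execute(SUBSTRUCTURE_SQL, (smarts.strip(),))
    return cursor.fetchall()
```

With a live connection, `substructure_search(conn.cursor(), "c1ccccc1")` would return the matching rows. Binding the query as a parameter, rather than pasting it into the SQL string, avoids quoting problems with characters in the pattern.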
-- $${SMolecule}$$ is a KNIME flow variable holding the query structure
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> '$${SMolecule}$$'::qmol;
SureChEMBL and Open PHACTS
SureChEMBL
SciBite Termite
Open PHACTS API
https://dev.openphacts.org/docs/develop
https://github.com/openphacts/OPS-Knime/
http://rdf.ebi.ac.uk/resource/surechembl/patent/US-8877786-B2
US-8877786-B2: Substituted carbamoylmethylamino acetic acid derivatives as novel NEP inhibitors
[Figure: MCS scaffold; most relevant targets and diseases.]
http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL371804
Foretinib, a kinase inhibitor in clinical phase II
Found in 89 EP, WO and US patents
[Figure: patent publication date histogram; most relevant diseases; most relevant targets.]
Summary
• KNIME: democratizes access to data and tools
• Access public domain structure and bioactivity data and services with KNIME
• ChEMBL KNIME Nodes
• UniChem
• Cheminformatics services
• myChEMBL
• SureChEMBL
Publications
Acknowledgements
• Francis Atkinson
• Thorsten Meinl
• Louisa Bellis
• KNIME
• Jon Chambers
• KNIME community
• Michał Nowotka
• Anne Hersey
• Stefan Beisken
• Edmund Duesbury
• Daniela Digles
All workflow examples are available on request.