Data For Drugs

John Overington, CIO [email protected]

@johnpoverington

©2017 Medical Discovery Catapult. All rights reserved. Medical Discovery Catapult and the Medical Discovery Catapult logo are among the trademarks or registered trademarks owned by or licensed to Medical Discovery Catapult. All other marks are the property of their respective owners.

• The Medicines Discovery Catapult
• ChEMBL, SureChEMBL & UniChem
• Errors, Errors, Everywhere
• Drug Blending
• Resistance
• Competitive Intelligence
• Are Antibacterials Really Different?
• Assay Networks

The Medicines Discovery Catapult

The UK Catapult Programme

The Catapult centres are a network of world-leading centres designed to transform the UK’s capability for innovation in specific areas and help drive future economic growth.

Medicines Discovery Catapult

• Supporting innovative ‘Fast-to-Patient’ Medicines Discovery
• A not-for-profit company set up and funded by Innovate UK
• Helping to solve shared problems through new disease-based Syndicates corner-stoned by medical research charities
• Focus on translating potential drug candidates into clinical trials as quickly as possible for the good of the wealth and health of the UK
• Doing wet science, informatics, virtual discovery, technology development, process challenge
• Lower barrier to entry and improve market liquidity

ChEMBL, SureChEMBL & UniChem

ChEMBL – https://www.ebi.ac.uk/chembl

• The world’s largest primary public database of medicinal chemistry data
• ~1.7 million compounds
• ~11,000 targets
• ~14 million bioactivities
• Truly Open Data - CC-BY-SA license
• ChEMBL data also loaded into BindingDB, PubChem BioAssay and BARD
• MyChEMBL VM, RDF, full relational download….

A. Gaulton et al (2012) Nucleic Acids Research Database Issue. 40 D1100-1107

ChEMBL

[Figure: a ChEMBL record links a compound, an assay and a target. Example shown: inhibition of human thrombin, Ki = 4.5 nM; PTT (partial thromboplastin time) assay, ED2 = 230 nM. The target is shown as the thrombin protein sequence.]

Affinity of Drugs for Their Efficacy Targets

Ki, Kd, IC50, EC50 & pA2 endpoints for drugs against their ‘efficacy targets’

[Figure: histogram of frequency (0 to 400) against -log10 affinity (2 to 12). The corresponding concentration scale runs 10 mM, 1 mM, 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, 1 pM.]
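The -log10 affinity axis can be reproduced with a one-line transform. The sketch below is illustrative only: the function names and the unit table are assumptions, and the 4.5 nM input is the thrombin Ki quoted on the ChEMBL slide.

```python
import math

# Multipliers that convert a concentration into mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def p_value(value, unit="nM"):
    """Return -log10 of a concentration after converting it to mol/L."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])

def from_p_value(p, unit="nM"):
    """Inverse transform: a p-scale value back to a concentration in `unit`."""
    return 10 ** (-p) / UNIT_TO_MOLAR[unit]

# Thrombin Ki of 4.5 nM expressed on the p-scale.
thrombin_pki = p_value(4.5, "nM")
```

On this scale each unit is a ten-fold change in concentration, so 1 mM sits at 3 and 1 pM at 12.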

Overington, et al, Nature Rev. Drug Disc. 5 pp. 993-996 (2006)
Gleeson et al, Nature Rev. Drug Disc. 10 pp. 197-208 (2011)

SureChEMBL – https://www.surechembl.org

• New public chemistry patent resource
• Donated by Digital Science – SureChem commercial product
• Automatically extracted chemical structures from full-text patents
• ~15 million chemical structures
• Updated daily
• Full chemistry download

UniChem – https://www.ebi.ac.uk/unichem

• Simple chemical integration service
• >144 million structures from ~30 sources
• URI/resource ID/Standard InChI based lookups
• Available chemicals, PubChem, ZINC, real time, private
• Chemical structure ‘Time Machine’

J. Chambers et al (2013) J. Cheminf. DOI:10.1186/1758-2946-5-3

Some Personal Perspectives on ChEMBL

• Things that worked well
  • Single, major visionary funder – Wellcome Trust
  • Focus on data content/backend not GUI
  • Bioinformatics and cheminformatics
  • Clear License – CC-BY-SA - same license as Wikipedia content
  • Private/secure https: services from start
  • Opportunism – SureChEMBL
  • Open data reinvigorated cheminformatics research
• Things that didn’t work so well
  • Community curation attempts
  • Publisher interactions – except Royal Society of Chemistry
  • Insufficient staff in outreach/training
  • Migration to FOSS was too slow

Errors, Errors, Everywhere

The Reproducibility Crisis

Begley & Lee (2012) Nature DOI:10.1038/483531 & Prinz et al (2011) NRDD DOI:10.1038/nrd3439-c1

Errors in ChEMBL

“The more complex the parameter, the more frequent the errors”

Enhanced data model for ChEMBL can appear as errors – complexes, receptor sets, model organisms

Tiikkainen et al (2013) JCIM DOI:10.1021/ci400099q

Errors in SureChEMBL

Senger et al (2015) J Cheminf DOI:10.1186/s13321-015-0097-z

Inter-species Assay Variability

Same compound, same end-point for rat and human orthologs

[Figure: left, scatter plot of measured potencies (pKi human against pKi rat, both axes 2 to 12); right, distribution of potency differences (normalised density against diff(human, rat), −4 to 4); annotated n = 2.781.]

Krüger & Overington (2012) PLoS Comp. Biol. DOI:10.1371/journal.pcbi.1002333

Inter-lab Variability

Same compound, same species, different publication

[Figure: left, scatter plot of measured potencies (pKi Assay1 against pKi Assay2, both axes 2 to 12); right, distribution of potency differences (normalised density against diff(assay1, assay2), −4 to 4); annotated n = 3.000.]
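A comparison of this kind can be assembled by pairing measurements of the same compound against the same target from different publications and taking the differences. The sketch below is illustrative only: the records are hypothetical, and the grouping and summary choices are assumptions rather than the published method.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical records: (compound, target, publication, pKi).
records = [
    ("cpd1", "thrombin", "paperA", 8.3),
    ("cpd1", "thrombin", "paperB", 7.9),
    ("cpd2", "thrombin", "paperA", 6.1),
    ("cpd2", "thrombin", "paperC", 6.8),
    ("cpd2", "thrombin", "paperB", 6.4),
    ("cpd3", "EGFR", "paperA", 9.0),
]

def paired_differences(rows):
    """pKi differences for the same compound/target pair across publications."""
    groups = {}
    for compound, target, publication, pki in rows:
        groups.setdefault((compound, target), []).append((publication, pki))
    diffs = []
    for measurements in groups.values():
        for (pub_a, pki_a), (pub_b, pki_b) in combinations(measurements, 2):
            if pub_a != pub_b:  # only compare independent publications
                diffs.append(pki_a - pki_b)
    return diffs

diffs = paired_differences(records)
summary = {"n_pairs": len(diffs), "mean": mean(diffs), "sd": pstdev(diffs)}
```

The same pairing applied to rat and human orthologs of one target, instead of publications, would give the inter-species distribution.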

Krüger & Overington (2012) PLoS Comp. Biol. DOI:10.1371/journal.pcbi.1002333

Inter-species vs Inter-lab Variability

[Figure: density of pKi differences between paired measurements, inter-orthologue versus inter-publication.]

Krüger & Overington (2012) PLoS Comp. Biol. DOI:10.1371/journal.pcbi.1002333

Large-Scale Cell-line Screening Data

M.J. Garnett et al (2012) Nature DOI:10.1038/nature11005 & J. Barretina et al (2012) Nature DOI:10.1038/nature11003

Inconsistent Cell-line Screening Data

B. Haibe-Kains et al (2013) Nature DOI:10.1038/nature12831 (see also Stransky et al (2015) Nature DOI:10.1038/nature15736)

Primary Data – Batches and Replicates

http://www.wexlerwallace.com/wp-content/uploads/2012/04/Southeast-Laborers-Health-v-Pfizer.pdf

Incorrect Chemical Structures

Bosutinib, Voxtalisib

http://cen.acs.org/articles/90/web/2012/05/Bosutinib-Buyer-Beware.html and Overington & Wennerberg unpublished

Drug Blending

Drug Targeting

                    Single Drug                                         Multiple Drugs
Single Target       Classic Drug Discovery, Ehrlich’s ‘Magic Bullet’    Drug Blending
Multiple Targets    Designed Polypharmacology                           Combination Therapy

Overington and Al-Lazikani - unpublished

Monotherapy vs Polypharmacology

Illustrative only

• Monotherapy, monopharmacology – Cetuximab : EGFR
• Monotherapy, polypharmacology – Erlotinib : EGFR

Overington and Al-Lazikani - unpublished

Combination Therapy vs Blending

• Combination therapy, polypharmacology – Erlotinib : EGFR, Losmapimod : p38a
• Combination therapy, monopharmacology – Erlotinib : EGFR, Gefitinib : EGFR

Overington and Al-Lazikani - unpublished

Drug Targeting

                    Single Drug                                         Multiple Drugs
Single Target       Classic Drug Discovery, Ehrlich’s ‘Magic Bullet’    Drug Blending
Multiple Targets    Designed/Serendipitous Polypharmacology             Combination Therapy

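Overington and Al-Lazikani - unpublished

Drug

• Drugs do not work under steady state conditions

[Figure: plasma concentration against time, annotated with Cmax and Tmax; the rising phase is absorption (rate ka) and the falling phase is elimination (rate kel).]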
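A curve of this shape can be sketched with a one-compartment model with first-order absorption (ka) and elimination (kel). The model choice, the bioavailability f and the volume v are assumptions made for illustration; ka = 0.5 and kel = 0.2 are the values quoted on the later dosing slide.

```python
import math

def concentration(t, dose, ka, kel, f=1.0, v=1.0):
    """Plasma concentration at time t for a single oral dose.

    C(t) = f*dose*ka / (v*(ka - kel)) * (exp(-kel*t) - exp(-ka*t))
    """
    if ka == kel:
        raise ValueError("ka and kel must differ for this closed form")
    scale = f * dose * ka / (v * (ka - kel))
    return scale * (math.exp(-kel * t) - math.exp(-ka * t))

def t_max(ka, kel):
    """Time of peak concentration; independent of dose in this model."""
    return math.log(ka / kel) / (ka - kel)

def c_max(dose, ka, kel, f=1.0, v=1.0):
    """Peak concentration, evaluated at t_max."""
    return concentration(t_max(ka, kel), dose, ka, kel, f, v)

peak_time = t_max(0.5, 0.2)
peak_75 = c_max(75.0, 0.5, 0.2)
peak_37_5 = c_max(37.5, 0.5, 0.2)
```

In this model halving the dose halves Cmax but leaves Tmax unchanged, which is why lower doses spend less time above any fixed threshold.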

Drug Action

• n.b. effective concentration at site of drug action can be higher or lower than plasma concentration

[Figure: % effect (25 to 100) against concentration for the ‘efficacy’ target, with MEC ∝ XC50 marked.]

MEC = Minimum Effective Concentration

Adverse Drug Reactions (ADRs)

• Acute ADRs are usually related to adverse pharmacology at/around Cmax
• Cmax can vary greatly due to drug dose and a wide range of environmental and genetic factors
• Occurrence and duration of side-effects appear stochastic
• Examples
  • QT prolongation/hERG effects for cisapride – potentially fatal
  • Blurred vision side effect for sildenafil – an inconvenience

[Figure: % effect (25 to 100) against concentration, with the MEC for the ADR target marked.]

One Drug - Many Targets

• As concentration increases, an ever larger number of targets are modulated
• Polypharmacology – many effects from one drug
  • Same target -> different effects
  • Different target -> different effects
• Dose dependent

[Figure: concentration thresholds in increasing order: MEC for the efficacy target, XC50 for an ‘off’ target, XC50 for an ADR target.]
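The way rising concentration recruits more targets can be sketched with a simple one-site model. The Hill form with coefficient 1, the 50% threshold and the use of total rather than free concentration are assumptions; the three pChEMBL values are taken from the imatinib slide in this deck.

```python
def fractional_effect(conc_molar, p_xc50, hill=1.0):
    """Fraction of maximal effect at a target with potency p_xc50 (-log10 M)."""
    xc50 = 10 ** (-p_xc50)
    return conc_molar ** hill / (conc_molar ** hill + xc50 ** hill)

# Median pChEMBL values for three imatinib targets, as listed in the deck.
imatinib_targets = {
    "Tyrosine-protein kinase ABL": 7.20,
    "Carbonic anhydrase II": 7.52,
    "Tyrosine-protein kinase FYN": 5.38,
}

def modulated_targets(conc_molar, potencies, threshold=0.5):
    """Names of targets whose fractional effect reaches the threshold."""
    return sorted(
        name
        for name, p_xc50 in potencies.items()
        if fractional_effect(conc_molar, p_xc50) >= threshold
    )

at_1_nm = modulated_targets(1e-9, imatinib_targets)
at_100_nm = modulated_targets(1e-7, imatinib_targets)
at_10_um = modulated_targets(1e-5, imatinib_targets)
```

Sweeping the concentration upwards adds targets one by one, in order of decreasing potency.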

Lower Dose, Shorter Duration of Action

[Figure: concentration-time curves for 75 mg, 37.5 mg and 18.75 mg doses (each with ka = 0.5, kel = 0.2), with Cmax marked for each dose and the MEC shown as a threshold.]

Imatinib Polypharmacology Spectra

[Figure: imatinib plasma concentration (ng.ml-1) against time (hr), overlaid with the potency of imatinib at each of its targets.]

Target, median pChEMBL:
Tyrosine-protein kinase FYN 5.38
ATP-binding cassette sub-family G member 2 5.39
c-Jun N-terminal kinase 1 5.40
Serine/threonine-protein kinase 17A 5.41
c-Jun N-terminal kinase 3 5.50
Dual specificity protein kinase CLK4 5.53
Mixed lineage kinase 7 5.59
Tyrosine-protein kinase FGR 5.62
Tyrosine-protein kinase FRK 5.64
Maternal embryonic leucine zipper kinase 5.72
Serine/threonine-protein kinase GAK 5.72
Ephrin type-A receptor 8 5.77
Serine/threonine-protein kinase RAF 5.77
Interleukin-1 receptor-associated kinase 1 5.92
Carbonic anhydrase XII 6.01
Homeodomain-interacting protein kinase 4 6.02
Tyrosine-protein kinase Lyn 6.05
Carbonic anhydrase III 6.28
Tyrosine-protein kinase BLK 6.28
Carbonic anhydrase XIV 6.33
BCR/ABL p210 fusion protein 6.41
Carbonic anhydrase VI 6.41
Phosphatidylinositol-5-phosphate 4-kinase type-2 gamma 6.42
Macrophage colony stimulating factor receptor 6.54
Stem cell growth factor receptor 6.62
Tyrosine-protein kinase LCK 7.00
Bcr/Abl fusion protein 6.66
Platelet-derived growth factor receptor alpha 7.09
Carbonic anhydrase VII 6.96
Carbonic anhydrase 15 6.07.11
Carbonic anhydrase IX 7.12
Platelet-derived growth factor receptor beta 7.14
Tyrosine-protein kinase ABL 7.20
Platelet-derived growth factor receptor 7.30
Discoidin domain-containing receptor 2 7.34
Epithelial discoidin domain-containing receptor 1 7.37
Carbonic anhydrase I 7.50
Carbonic anhydrase II 7.52
Tyrosine-protein kinase ABL2 7.94

Imatinib 400 mg single dose from Jawhari et al (2011) J Bioequiv Availab 3: 161-164; Data is median pChEMBL for human targets from ChEMBL 16

Approved Drugs

[Figure: chemical structures of approved statins, including Rosuvastatin, labelled with IC50 values of 7.5 nM, 1.3 nM, 0.3 nM, 0.6 nM, 1.7 nM, 2.3 nM, 12.0 nM and 0.12 nM.]

* Tenivastatin is the hydrolyzed form of Simvastatin

IC50 values in a rat microsomal assay. From ‘HMGCoA Reductase Inhibitors’, ed. Schmitz and Torzewski (2002)

Statin Efficacy & Safety

[Figure: left, mean difference in LDL-C (mmol/L) against daily dose (mg), with curves labelled by IC50 (nM): 1.7, 7.5, 12.0, 0.12, 2.3 and 0.3; right, clinical safety of Atorvastatin, incidence of side effects (%) against daily dose (mg).]

Statin Systems Pharmacology

Columns: INN; trade name; pro-drug; M.Wt.; lipophilicity (Log D); absorption (%); oral bioavailability F (%); plasma protein binding (%); volume of distribution Vd (L/kg); clearance CL (L/hr/kg); Tmax (hr); half-life T1/2 (hr); passive permeability (nm/s); main metabolic enzymes; main transporters. Values are listed in source order; rows with fewer values than columns have cells that are blank in the source.

Atorvastatin (Lipitor): 559; 1.00-1.25; 30; 12; 80-90; 5.2; 0.25; 2-3; 15-30; 23; 3A4; transporters BCRP, MRP2, NTCP, OATP1A2, OATP1B1, OATP1B3, OATP2B1, Pgp
Cerivastatin (Baycol): 460; 1.5-1.75; 98; 60; >99; 0.33; 0.2; 2-3; 137; 2C8, 3A4; transporters BCRP and OATP1B1
Fluvastatin (Lescol): 411; 1.00-1.25; 98; 19-29; >99; 0.15-0.17; 0.97; <1; 0.5-2.3; 2C8, 2C9, 3A4; transporters BCRP, OATP1B1, OATP1B3 and OATP2B1
Lovastatin (Mevacor, pro-drug): 405; 3.91; 30; 5; >95; 0.26-1.1; 2-4; 2.9; 328; 3A4; transporters OATP1B1 and P-gp
Pitavastatin (Livalo): 421; 1.5; 80; >60; 96; 2; 1; 11; 35; 2C9; transporters BCRP, MRP2, NTCP, OATP1B1, OATP1B3 and Pgp
Pravastatin (Pravachol): 425; -1.00 to -0.7; 34; 18; 43-55; 0.46; 0.81; 1-1.5; 1.3-2.8; 7.5; non CYP; transporters OATP1B1, OATP1B3, OATP2B1, Pgp, MRP2, BCRP and OAT3 in renal clearance
Rosuvastatin (Crestor): 482; -0.5 to -0.25; 50; 20; 88; 1.8; 0.67; 3-4; 20.8; 4.4; 2C9; transporters OATP1A2, OATP1B1, OATP1B3, OATP2B1, BCRP, Pgp, MRP2 and NTCP, OAT3 in renal clearance
Simvastatin (Zocor, pro-drug): 419; 4.4; 60-80; 5; 94-98; 0.45; 4; 2-3; 352; 2C8, 3A4; transporters BCRP and Pgp

Lipophilic to polar: Lovastatin*, Simvastatin*, Atorvastatin, Cerivastatin, Fluvastatin, Pitavastatin, Pravastatin

Data adapted from Generaux et al. Xenobiotica, 2011; 41: 639–651, and 2011 US prescribing information

Statin Pharmacogenetics

Figures from Niemi, Clinical Pharmacology & Therapeutics, 87, pp. 130-133 (2010)

Combination Therapy

• Combination therapy, polypharmacology (Multiple Targets) – Erlotinib : EGFR, Losmapimod : p38a
• Combination therapy, monopharmacology (Single Target) – Erlotinib : EGFR, Gefitinib : EGFR

Combination Therapy – Multiple Targets

• Combine drugs against different targets and look for improved outcomes
  • mechanistic synergy, sensitization, treatment of comorbidities, increased compliance?
• Examples
  • Hyzaar (losartan potassium and hydrochlorothiazide) – hypertension
  • Vytorin (simvastatin and ezetimibe) – hypercholesterolemia
• Dosing levels not usually reduced in combination products compared to monotherapy

Combination Drugs - Dosing

88 Drugs used in combinations, covering 36 targets

[Figure: dose (mg) for each drug, listed by USAN.]

Al-Lazikani & Overington, unpublished. Data from Chembl13 load of FDA Orange Book

Drug Blending

• Combine drugs against the same targets and look for improved outcomes
• Combined ‘Me too’ drugs, but…..
  • Differing off-target bioactivity spectra, Cmax, Tmax, AUC∞
• Benefits
  • Minimize effects of genetic variation of target/ADMET system
  • Efficacy target ‘sees’ pooled concentrations
  • Off-targets ‘see’ reduced concentrations of components
  • Reduced resistance in anti-infective and anti-cancer settings
  • Ability to dose higher in anti-infective setting
  • Improved population-level safety and minimized intrapatient variability

Al-Lazikani & Overington, unpublished. Data from Chembl13 load of FDA Orange Book

Drug Blending – single agent

[Figure: concentration-time curves for Drug A at 75 mg and 150 mg, with the efficacy MEC and the ADR MEC of Drug A marked.]

Drug Blending – two agents

[Figure: concentration-time curves for Drug A (75 mg), Drug B (30 mg) and the pooled concentration of A and B, with the efficacy MEC and the ADR MEC of each drug marked.]
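The blending argument can be sketched numerically. Everything below is an assumption for illustration except the 75 mg and 30 mg doses: a one-compartment oral model, hypothetical rate constants and MEC values, additive potency-normalised concentrations at the shared efficacy target, and a separate ADR target for each drug.

```python
import math

# Hypothetical drugs; only the two doses come from the slide.
DRUG_A = {"dose": 75.0, "ka": 0.5, "kel": 0.2, "mec_efficacy": 30.0, "mec_adr": 60.0}
DRUG_B = {"dose": 30.0, "ka": 0.9, "kel": 0.1, "mec_efficacy": 20.0, "mec_adr": 40.0}

def concentration(t, drug):
    """One-compartment oral model, bioavailability and volume set to 1."""
    ka, kel = drug["ka"], drug["kel"]
    scale = drug["dose"] * ka / (ka - kel)
    return scale * (math.exp(-kel * t) - math.exp(-ka * t))

def is_effective(t, drugs):
    """The efficacy target 'sees' the pooled, potency-normalised concentration."""
    return sum(concentration(t, d) / d["mec_efficacy"] for d in drugs) >= 1.0

def causes_adr(t, drugs):
    """Each drug's own ADR target only 'sees' that drug's concentration."""
    return any(concentration(t, d) >= d["mec_adr"] for d in drugs)

def hours(condition, drugs, t_end=48.0, step=0.1):
    """Approximate time during which the condition holds, on a fixed grid."""
    steps = int(round(t_end / step))
    return step * sum(1 for i in range(steps + 1) if condition(i * step, drugs))

a_only = hours(is_effective, [DRUG_A])
b_only = hours(is_effective, [DRUG_B])
blend = hours(is_effective, [DRUG_A, DRUG_B])
blend_adr = hours(causes_adr, [DRUG_A, DRUG_B])
double_a_adr = hours(causes_adr, [dict(DRUG_A, dose=150.0)])
```

Under these assumptions the blend stays effective for longer than either component alone without crossing either ADR threshold, whereas doubling the dose of Drug A does cross its ADR threshold.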

Resistance

Mechanisms of Drug Resistance

• Efficacy Target – expression, coding mutation – e.g. HIV-1 Proteinase (antivirals)
• Metabolism – expression – e.g. Beta lactamase (antibacterials)
• Transport – expression – e.g. PGP (anti-cancers)

Real World Drug Resistance

Gainor and Shaw (2013) J. Clin. Oncol. 31 3987-3996

Selected Clinical EGFR Inhibitors

• Erlotinib – Tarceva (OSI-744, CP-358774)
• Gefitinib – Iressa (AZD-1839)
• Lapatinib – Tykerb/Tyverb (GW-572016)
• AEE-788

Selectivity data taken from Ghoreschi et al, Nature Immunol. (2009) 10 356-360

Overlay of EGFR Inhibitors

[Figure: 2-D overlay of Erlotinib, Gefitinib, Lapatinib and AEE-788, annotated with Hydrophobic Pocket I, Hydrophobic Pocket II (allosteric site), the adenine mimic and the adenine ring of ATP.]

Overington and Van Westen - unpublished

Drug Resistance

[Figure: a drug binding across Subsite 1, the core site and Subsite 2 of a wild-type target; mutation / selection produces a mutant target.]

n.b. Many alternative mechanisms for resistance exist!

Resistance – Switched Sequential Therapy

[Figure: two successive rounds of mutation / selection, each turning a wild-type target into a mutant target as therapy is switched.]

Resistance – Blending

[Figure: under a blend, mutation / selection of the wild-type target yields mutant targets that remain blend sensitive; a blend-resistant mutant target must be resistant to every component.]

What is the probability of a jointly resistant mutant arising simultaneously?

DNA Mutations Can Change Coded Protein

[Figure: translation of the gene TTC CCA ATG CGT GGA GAC into the protein Phe Pro Met Arg Gly Asp. Mutating the second base of the first codon changes Phe to Tyr (A), Ser (C) or Cys (G).]
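The effect of each possible point mutation on the coded protein can be enumerated directly from the standard genetic code. The sketch below uses the 18-base gene from the slide; the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, one letter per codon ('*' is stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stop", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}

def residue(codon):
    """Three-letter amino acid name for one codon."""
    return THREE_LETTER[CODON_TABLE[codon]]

def translate(dna):
    """Translate complete codons; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return [residue(dna[i:i + 3]) for i in range(0, usable, 3)]

def point_mutants(dna):
    """Yield (position, new_base, old_residue, new_residue) for every substitution."""
    usable = len(dna) - len(dna) % 3
    for pos in range(usable):
        start, offset = pos - pos % 3, pos % 3
        codon = dna[start:start + 3]
        for new_base in BASES:
            if new_base != dna[pos]:
                mutated = codon[:offset] + new_base + codon[offset + 1:]
                yield pos, new_base, residue(codon), residue(mutated)

GENE = "TTCCCAATGCGTGGAGAC"
protein = translate(GENE)
second_base = {base: new for pos, base, old, new in point_mutants(GENE) if pos == 1}
```

Filtering the mutants on `old != new` separates amino acid changes from silent substitutions.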

Mutation Probabilities Are Not Random

Alexandrov et al., Nature 500, 415–421 (22 August 2013) doi:10.1038/nature12477

Mutation Probabilities Are Not Random

Alexandrov et al., Nature 500, 415–421 (22 August 2013) doi:10.1038/nature12477

Different Profiles = Different Mutants

[Figure: the gene TTC CCA ATG CGT GGA GAC (protein Phe Pro Met Arg Gly Asp) under two mutational signatures. Signature 7 (melanoma) introduces T substitutions, giving Ser, Phe, Leu, Cys and Asp; Signature 18 (neuroblastoma) introduces A substitutions, giving Leu, Thr, Ser, Glu and Gln.]

Resistance Is Practically Bounded

• What if only a relatively limited repertoire of mutations were possible in a tumour?
• Take all CDS in Ensembl
• Apply Alexandrov et al frequencies to score all possible mutations in all genes
• Gives precomputed library of mutants specific to particular cancer background
• Can select from this set efficacy targets for drugs
• Can model binding site differences
• Effect + Likelihood
• Does a ‘blend’ of inhibitors to same target offer significant advantages in resistance?
• Inhibitors will have distinct target binding, metabolism and transport SARs – more robust to multiple types of resistance

Mutational Profiles in Different Cancers

Competitive Intelligence

Privileged Target Families

ChEMBL17 Drugs

Santos & Overington, unpublished

Clinical Kinome

Overington, Al-Lazikani & Wennerberg, unpublished

Clinical Kinome

• 399 Clinical stage human kinase inhibitors
  • 29 Approved small molecule kinase inhibitors
    • 15 -tinib – tyrosine kinase inhibitors
    • 5 -rolimus – mTOR inhibitors
    • 4 -rafenib – Raf inhibitors
    • 2 -anib – angiogenesis inhibitors
    • 1 -metinib – MEK inhibitor
    • 1 -brutinib – Bruton tyrosine kinase inhibitor
    • 1 -dil – Rho kinase inhibitor (Japan only)
  • 38 Phase 3
  • 143 Phase 2
  • 189 Phase 1
• Phase 1:2 ratio is atypical due to many kinase inhibitor trials being phase 1/2 oncology trials

Kinase Inhibitors in Clinical Development

Overington, Bellis, Al-Lazikani & Wennerberg, unpublished

Kinase Inhibitor Attrition

Overington, unpublished

Kinase Inhibitor Productivity

Overington, unpublished

Are Antibacterials Really Different?

Antibacterial Physicochemical Properties

• Antibacterials are widely known to fall in a different region of ‘chemical space’ to ‘human’ drugs
• Larger and more polar
• Mostly natural products
• Seen as exceptions to Lipinski’s rule-of-five

O’Shea & Moser (2008) J Med Chem DOI:10.1021/jm700967e

Antibacterial Drug Target Classes

[Figure: two target classes with example drug structures. Protein: PBP, example Amoxicillin (oral, natural product-derived). RNA/riboprotein: 30S ribosomal subunit, example structure shown (oral, natural product).]

Target Class View of Physicochemistry

Mugumbate & Overington (2015) Bioorg Med Chem DOI:10.1016/j.bmc.2015.04.063

Oligonucleotide vs Oligopeptide Polarity

23 natural amino acids, 4 RNA nucleosides

Element   % protein   % RNA
C         65          45
N         17          33
O         17          17
S         1           0
P         0           5

Percentages are unweighted monomer compositions.

• RNA species are significantly more polar than proteins
• Binding site composition comparisons underway

RNA Target Ligands

• Discovery of a novel class of , binding to a ncRNA riboflavin riboswitch - ribB
• Screened 57,000 known synthetic antibacterials for riboflavin essentiality

[Figure: chemical structures of Roseoflavin and Ribocil.]

• Roseoflavin (IC50 = 300nM) – FMN antimetabolite, binds ribB; MIC > 128 mg/ml, E.coli MB5746
• Ribocil (IC50 = 300nM) – competitive binding wrt FMN; MIC > 2 mg/ml, E.coli MB5746

Howe et al (2015) Nature DOI:10.1038/nature15542

Implications

• Clear difference observed in physicochemical properties of antibacterials
• Basis of the differences is likely to be target class, not organism
• Structural analysis supports larger, more polar ligands for RNA targets
• Antibacterial protein-targeted compounds are similar to human protein-targeted compounds
• Historical antibacterial data likely biased to RNA-directed chemotypes
• Alignment of compound collections to RNA-directed property profile likely to generate more RNA-directed compounds from phenotypic hits
• These however may make great drugs!

Mugumbate & Overington (2015) Bioorg Med Chem DOI:10.1016/j.bmc.2015.04.063

Assay Networks

Assays from Target to Clinic

Biochemical assay -> Functional assay -> Cell-based screen -> Animal disease model -> Human clinical trial

• Build assay networks from co-occurrence
• Link to animal models and genetics
• Directed graph of all assays from targets to trials
• Understand attrition through drug development

Zwierzyna, Atkinson & Overington, unpublished

Assay Graph of Approved Drugs

FDA approved drugs linked by shared activity in a phenotypic assay
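A graph of this kind can be sketched by linking every pair of drugs that are active in the same assay and then reading off the connected clusters. The assay identifiers and the drug-to-assay assignments below are hypothetical; they are not data from ChEMBL.

```python
from itertools import combinations

# Hypothetical activity records: assay identifier -> drugs active in it.
assay_hits = {
    "assay_1": {"ERLOTINIB", "GEFITINIB", "LAPATINIB"},
    "assay_2": {"GEFITINIB", "IMATINIB"},
    "assay_3": {"SIMVASTATIN", "ATORVASTATIN"},
    "assay_4": {"ASPIRIN"},
}

def build_graph(hits):
    """Undirected graph: drugs are nodes, shared assays create edges."""
    graph = {}
    for drugs in hits.values():
        for drug in drugs:
            graph.setdefault(drug, set())
        for a, b in combinations(sorted(drugs), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def components(graph):
    """Connected components, found with an iterative depth-first search."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        result.append(sorted(group))
    return result

drug_graph = build_graph(assay_hits)
clusters = components(drug_graph)
```

Imatinib joins the EGFR inhibitor cluster here only through the assay it shares with gefitinib, which is how indirect links build larger groupings.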

[Figure: network graph with one node per FDA approved drug, labelled by drug name (for example MEMANTINE, IMATINIB, SIMVASTATIN, RITONAVIR). Labelled clusters: Inflammation, Pain, Signal transduction, Dopaminergics, DNA replication & regulation, Oncology.]

Zwierzyna & Overington, unpublished

Assay Clustering Using Word Embedding

Word2vec clustering on noun phrases from ChEMBL phenotypic (F) assays

Zwierzyna & Overington, unpublished

PCA of Word2Vec Assay Descriptions

Each assay description: average over its word vectors. Data points projected from a 200-dimensional space to 2D using PCA
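The averaging step can be sketched in a few lines. The word vectors below are toy 3-dimensional values invented for illustration (the slide describes 200 dimensions from a trained model), and the cosine comparison stands in for the PCA projection, which is not shown.

```python
import math

# Toy word vectors; hypothetical values, not output of a trained model.
WORD_VECTORS = {
    "tumor": [0.9, 0.1, 0.0],
    "growth": [0.8, 0.2, 0.1],
    "blood": [0.1, 0.9, 0.0],
    "pressure": [0.0, 0.8, 0.2],
    "inhibition": [0.5, 0.5, 0.5],
}

def embed(description):
    """Average the vectors of the known words in an assay description."""
    words = [w for w in description.lower().split() if w in WORD_VECTORS]
    if not words:
        raise ValueError("description contains no known words")
    dims = len(next(iter(WORD_VECTORS.values())))
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words) for i in range(dims)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

query = embed("Inhibition of tumor growth")
similar = cosine(query, embed("tumor growth"))
dissimilar = cosine(query, embed("blood pressure"))
```

Words missing from the vocabulary are skipped, so descriptions that differ only in unknown words receive the same vector.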

Zwierzyna & Overington, unpublished

Word2vec Embedding of Assays

ChEMBL assays of known drugs annotated with different ATC codes (~15k of ~94k)

• N03 (antiepileptic)
• M01 (anti-inflammatory)
• L01 (antineoplastic)
• C02 (antihypertensive)
• A10 (antidiabetic)
• N02 (analgesic)

Zwierzyna & Overington, unpublished

Word2vec Assay Graph

[Figure: assay graph with clusters labelled by ATC code: C09 angiotensin system, C02 antihypertensive, G04 urological, J01 antibacterial, L01 antineoplastic, A10 antidiabetic, P01, N03 antiepileptic, N05, C10 lipid modifying, M01 anti-inflammatory, N02, M02 muscular pain, C01 cardiac therapy, A03 antiemetics, A07 antidiarrheals, N06 antidepressants, N01, N05 psycholeptics, A11 vitamins.]

Zwierzyna & Overington, unpublished

Acknowledgements

Bissan Al-Lazikani Aroon Hingorani, Marc Marti-Renom Juan Pablo-Casas Francesco Martinez

Magda Zwierzyna, Mark Davies Krister Wennerberg

WT086151/Z/08/Z (2008-2014)
WT104104/Z/14/Z (2014-2019)

Medicines Discovery Catapult
Mereside, Alderley Park, Alderley Edge, Cheshire, SK10 4TG
md.catapult.org.uk
@MediDiscCat
[email protected]