
Rationalizing CMap Gene Expression Readouts via Target Prediction

Nolen Joy Perualila
Non-Clinical Statistics Conference 2014, 9 October 2014

RESEARCH GROUP

Hasselt University, Belgium: Nolen Joy Perualila, Ziv Shkedy
Durham University, UK: Adetayo Kasim
Cambridge University, UK: Aakash Chavan Ravindranath, Andreas Bender
Janssen Pharmaceutica NV, Belgium: Luc Bijnens, Willem Talloen, Hinrich W.H. Göhlmann
QSTAR: http://qstar-consortium.org

N. J. Perualila NCS2014 2/18

OUTLINE

1 Background
    Mechanism of Action (MoA) of Compounds

2 Data Sources
    Target prediction Data
    Gene expression Data

3 Analysis Flow

4 Results
    Associating Protein Targets to compounds
    Associating Genes with compounds
    Gene-set Enrichment
    Using Pathways to understand MoA
    Biclustering of CMap Gene expression data

5 Discussion

MOA OF COMPOUNDS

Connectivity Map In silico

Aim: To find subsets of compounds that share similar target prediction and gene expression profiles.

EARLY DRUG DISCOVERY

Compound - Protein Target

The development of every drug begins with the search for a target on which the drug can act.
Lead compounds must be able to bind well to the target protein, like a key into a lock.
If a drug binds to one protein, its drug target, it may also bind to another one (or many) (non-selective ligands).

Which drugs will bind to which protein?

image source: http://vds.cm.utexas.edu/

OVERVIEW: TARGET PREDICTION TOOL

Calculates the likelihood of binding for every protein target (see Koutsoukas, 2011).

COMPOUNDS, TARGETS, AND GENES

Ligand binding modifies the biological function of the protein target; a series of target-related downstream genes is then influenced.
Drugs sharing common targets result in similar gene-expression profiles.

image source: http://cc.scu.edu.cn/G2S/eWebEditor/uploadfile/20120810155023582.jpg

DATA SOURCES

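1 The target prediction score matrix (binary), J targets x I compounds:

$$
T = \begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1I} \\
T_{21} & T_{22} & \cdots & T_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
T_{J1} & T_{J2} & \cdots & T_{JI}
\end{pmatrix},
\qquad
T_{ji} = \begin{cases}
1 & \text{compound } i \text{ hits target } j, \\
0 & \text{otherwise.}
\end{cases}
$$

2 The gene expression matrix, G genes x I compounds:

$$
X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1I} \\
X_{21} & X_{22} & \cdots & X_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
X_{G1} & X_{G2} & \cdots & X_{GI}
\end{pmatrix},
\qquad
X_{gi} = \text{expression level of gene } g \text{ for compound } i.
$$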

APPLICATION: CONNECTIVITY MAP

I = 35 compounds.
MCF7 cell line, 6 hours after exposure, dose of 10 micromolar.
G ≈ 2400 genes after pre-processing.
J = 477 protein targets.
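To make the layout of the two data matrices concrete, here is a minimal sketch in plain Python. Everything in it is illustrative: the target and compound names, the list of predicted hits, the expression values, and the helper names `build_target_matrix` and `compound_profile` are all invented for this example and do not come from the CMap data.

```python
# Toy illustration only: every name and value below is invented.
targets = ["target_A", "target_B", "target_C"]        # J = 3 hypothetical targets
compounds = ["cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"]  # I = 4 hypothetical compounds

# Hypothetical predicted hits, stored as (target, compound) pairs.
predicted_hits = {
    ("target_A", "cmpd_1"),
    ("target_A", "cmpd_2"),
    ("target_C", "cmpd_4"),
}


def build_target_matrix(targets, compounds, hits):
    """Return T as a J x I list of lists, T[j][i] = 1 if compound i hits target j."""
    return [[1 if (t, c) in hits else 0 for c in compounds] for t in targets]


def compound_profile(matrix, i):
    """Return column i of a matrix, i.e. the profile of compound i."""
    return [row[i] for row in matrix]


T = build_target_matrix(targets, compounds, predicted_hits)

# Toy G x I expression matrix (G = 2 hypothetical genes), same compound order as T.
X = [
    [0.8, 0.1, -0.2, 0.0],
    [0.3, 0.4, 0.0, -0.1],
]

print(T)
print(compound_profile(T, 3), compound_profile(X, 0))
```

Keeping the compounds as columns in both matrices means that a compound cluster found from T can be used directly to select columns of X.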

ANALYSIS FLOW

Step I
Target prediction-based clustering of compounds.

Step II
For every target-driven compound cluster (a cluster of compounds):
What are the shared targets? Fisher's Exact Test on the 2 x 2 table of target j (0/1) by cluster membership (in/out): top K targets.
What genes are differentially expressed? LIMMA: top N differentially expressed genes.
Which biological pathways are affected? Pathways/MLP, and the overlap between pathways.
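The shared-targets step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a one-sided (enrichment) Fisher's exact test computed from the hypergeometric tail, and the function names `fisher_enrichment_pvalue` and `top_k_targets`, the toy matrix, and the target names are all hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(a, b, c, d):
    """One-sided Fisher's exact test for the 2 x 2 table

                    in cluster   out of cluster
        target hit      a              b
        no hit          c              d

    Returns P(at least `a` hits inside the cluster) under the hypergeometric null.
    """
    n_total = a + b + c + d
    n_hits = a + b
    n_cluster = a + c
    upper = min(n_hits, n_cluster)
    tail = sum(
        comb(n_hits, k) * comb(n_total - n_hits, n_cluster - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_cluster)


def top_k_targets(T, target_names, in_cluster, k):
    """Rank targets (rows of the binary matrix T) by enrichment in the cluster."""
    results = []
    for name, row in zip(target_names, T):
        a = sum(1 for hit, member in zip(row, in_cluster) if hit and member)
        b = sum(1 for hit, member in zip(row, in_cluster) if hit and not member)
        c = sum(1 for hit, member in zip(row, in_cluster) if not hit and member)
        d = len(row) - a - b - c
        results.append((fisher_enrichment_pvalue(a, b, c, d), name))
    results.sort()  # smallest p-value first
    return [(name, p) for p, name in results[:k]]


# Toy data (invented): 2 targets x 6 compounds, first three compounds form the cluster.
T_toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
]
membership = [True, True, True, False, False, False]
print(top_k_targets(T_toy, ["target_A", "target_B"], membership, k=1))
```

A two-sided test, or multiplicity adjustment over the J targets, would be natural variations; the slides do not say which was used.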

TARGET PREDICTION-BASED CLUSTERING

Similarity matrix based on Tanimoto scores.
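For binary target profiles, the Tanimoto score between two compounds is the number of targets hit by both divided by the number of targets hit by at least one. A minimal sketch, assuming the similarity is computed between the columns of the binary target matrix; the function names and the toy matrix are hypothetical, and returning 0.0 for two empty profiles is a convention chosen here, not something stated in the slides.

```python
def tanimoto(u, v):
    """Tanimoto (Jaccard) similarity between two binary vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumed convention: two profiles with no hits at all get similarity 0.0.
    return both / either if either else 0.0


def similarity_matrix(T):
    """Compound-by-compound Tanimoto matrix from a J x I binary target matrix T."""
    n_compounds = len(T[0])
    profiles = [[row[i] for row in T] for i in range(n_compounds)]
    return [[tanimoto(p, q) for q in profiles] for p in profiles]


# Toy binary matrix (invented): 3 targets x 3 compounds.
T_toy = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
S = similarity_matrix(T_toy)
for row in S:
    print(row)
```

A distance such as 1 - S can then be passed to a clustering routine to obtain the compound clusters of Step I.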

[Figure: heatmap of the compound-by-compound Tanimoto similarity matrix; color key from 0 to 0.6.]

ASSOCIATING TARGETS TO COMPOUNDS

Identify the top predicted protein targets of compounds in cluster 1.

[Figure: heatmap of targets (rows) by compounds (columns).
Target labels, truncated as printed: D.2._dopamtor, Histamine_tor, NADPH_oxide_1, Cytochrome2D6, Muscarinic_M1, Muscarinic_M3, D.1B._dopator, X5.hydroxyr_6.
Compounds: butein, W-13, fasudil, imatinib, probucol, rofecoxib, celecoxib, diclofenac, LM-1685, tioguanine, clozapine, felodipine, metformin, nifedipine, haloperidol, SC-58125, LY-294002, phenformin, verapamil, benserazide, thioridazine, nitrendipine, fluphenazine, flufenamic acid, dexverapamil, trifluoperazine, chlorpromazine, phenyl biguanide, arachidonic acid, prochlorperazine, 4,5-dianilinophthalimide, tetraethylenepentamine, N-phenylanthranilic acid, 15-delta prostaglandin J2, arachidonyltrifluoromethane.]

ASSOCIATING GENES WITH COMPOUNDS

Identify the most differentially expressed genes for compounds in cluster 1.

[Figure: volcano plot of -log p-value (0 to 12) against log fold change (-0.4 to 0.2). Labeled genes: IDI1, SQLE, MSMO1, INSIG1, MNT, SRSF7, HMGCS1, CCNG2, KLHL24, CCR1, SLC38A2, NPC2, PPIF, SGCE, PNO1, BARD1, HMGCR, LPIN1, LDLR, TGS1.]

[Figure: profiles across the 35 compounds (vertical axis from -0.2 to 0.8) for IDI1, INSIG1, MSMO1, LPIN1, SQLE, HMGCS1, NPC2, BHLHE40.]
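The idea behind ranking genes for a compound cluster can be sketched with a plain per-gene Welch t-statistic comparing compounds inside the cluster with those outside it. This is a deliberately simplified stand-in: LIMMA fits linear models with moderated (empirical Bayes) t-statistics, which this sketch does not reproduce. The function names and the toy expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(inside, outside):
    """Welch t-statistic for the difference in means of two groups.

    Each group needs at least two values, and the two groups must not both
    have zero variance (otherwise the standard error is zero).
    """
    se = sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / se


def top_n_genes(X, gene_names, in_cluster, n):
    """Return the n genes with the largest absolute t-statistic (cluster vs. rest)."""
    scored = []
    for name, row in zip(gene_names, X):
        inside = [x for x, member in zip(row, in_cluster) if member]
        outside = [x for x, member in zip(row, in_cluster) if not member]
        scored.append((abs(welch_t(inside, outside)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]


# Toy expression matrix (invented): 2 genes x 6 compounds, first three in the cluster.
X_toy = [
    [0.8, 0.7, 0.9, 0.0, 0.1, -0.1],   # gene_1: shifted inside the cluster
    [0.1, 0.0, -0.1, 0.0, 0.1, -0.1],  # gene_2: no difference
]
membership = [True, True, True, False, False, False]
print(top_n_genes(X_toy, ["gene_1", "gene_2"], membership, n=1))
```

The difference in group means plays the role of the log fold change on the horizontal axis of a volcano plot, while the test's p-value gives the vertical axis.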

BIOLOGICAL PATHWAYS: CLUSTER 1

Compounds: clozapine, thioridazine, chlorpromazine, trifluoperazine, prochlorperazine, fluphenazine
Pathway: Steroid metabolic process
Target: Cytochrome P450 2D6
Genes: INSIG1, LDLR

[Figure: GO pathways containing the top gene sets with MLP analysis. Nodes as labeled: GO:0006695 cholesterol biosynthetic, GO:0016126 sterol biosynthetic, GO:0008203 cholesterol metabolic, GO:0046165 alcohol biosynthetic, GO:0016125 sterol metabolic, GO:0006694 steroid biosynthetic, GO:0006066 alcohol metabolic, GO:0008202 steroid metabolic, GO:0008610 lipid biosynthetic.]

[Figure: top genes contributing to cholesterol biosynthetic process (GO:0006695); significance of tested genes, axis from 0 to 5. Genes: DHCR24 (24-dehydrocholesterol reductase), G6PD (glucose-6-phosphate dehydrogenase), HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase), HMGCS1 (3-hydroxy-3-methylglutaryl-CoA synthase), IDI1 (isopentenyl-diphosphate delta isomerase), INSIG1 (insulin induced gene 1), INSIG2 (insulin induced gene 2), PEX2 (peroxisomal biogenesis factor 2), MSMO1 (methylsterol monooxygenase 1), SOD1 (superoxide dismutase 1, soluble), SQLE (squalene epoxidase), CNBP (CCHC-type zinc finger, nucleic acid binding).]
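The gene-set step can be illustrated with a mean-log-p style statistic: average the -log10 p-values of the genes in a set and compare that average with what random gene sets of the same size give. This is a simplified sketch under that assumption; the MLP R package's actual procedure may differ in its details. The function names, gene names, and p-values below are all invented.

```python
import random
from math import log10


def mlp_statistic(p_values, gene_set):
    """Mean of -log10(p) over the genes in gene_set (all must be keys of p_values)."""
    return sum(-log10(p_values[g]) for g in gene_set) / len(gene_set)


def permutation_pvalue(p_values, gene_set, n_perm=2000, seed=0):
    """Share of random gene sets of equal size scoring at least as high as gene_set."""
    rng = random.Random(seed)
    genes = list(p_values)
    observed = mlp_statistic(p_values, gene_set)
    size = len(gene_set)
    hits = 0
    for _ in range(n_perm):
        if mlp_statistic(p_values, rng.sample(genes, size)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy per-gene p-values (invented), e.g. from a differential expression step.
p_toy = {
    "gene_1": 0.0001, "gene_2": 0.001, "gene_3": 0.5, "gene_4": 0.8,
    "gene_5": 0.3, "gene_6": 0.9, "gene_7": 0.6, "gene_8": 0.7,
}
hypothetical_set = ["gene_1", "gene_2"]
print(mlp_statistic(p_toy, hypothetical_set))
print(permutation_pvalue(p_toy, hypothetical_set))
```

Gene sets whose members carry consistently small p-values get a large statistic, which is the sense in which the cholesterol and steroid related GO terms stand out for cluster 1.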

GENE EXPRESSION DATA ANALYSIS

[Figure: heatmap of the expression profiles, genes (rows) by compounds (columns), shown next to the gene expression matrix X.]

Biclustering of gene expression data provides a simultaneous local search for a subset of genes that show similar expression profiles across a subset of compounds.

BICLUSTERING OF GENE EXPRESSION DATA

Target prediction-based clustering of compounds + identification of differentially expressed genes for a compound cluster of interest.
⇒ Gives a subset of genes exhibiting consistent patterns over a subset of compounds.
⇒ A bicluster

Biclustering on gene expression data using FABIA identifies a similar cluster of compounds and subset of genes.

[Figure: profiles across the 35 compounds (vertical axis from -0.2 to 0.8) for IDI1, INSIG1, MSMO1, LPIN1, SQLE, HMGCS1, NPC2, BHLHE40; legend: Genes, HC, Fabia.]

FABIA Bicluster 1
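In data terms, a bicluster is simply a pair (subset of genes, subset of compounds) that picks out a submatrix of the expression matrix. The sketch below shows only that bookkeeping; it does not implement FABIA, which is a factor-analysis model. The function name `extract_bicluster` and all names and values in the toy matrix are invented.

```python
def extract_bicluster(X, gene_names, compound_names, genes, compounds):
    """Return the submatrix of X for the given genes (rows) and compounds (columns)."""
    rows = [gene_names.index(g) for g in genes]
    cols = [compound_names.index(c) for c in compounds]
    return [[X[r][c] for c in cols] for r in rows]


# Toy expression matrix (invented): 3 genes x 4 compounds.
gene_names = ["gene_1", "gene_2", "gene_3"]
compound_names = ["cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"]
X_toy = [
    [0.8, 0.7, 0.0, 0.1],
    [0.6, 0.9, -0.1, 0.0],
    [0.0, 0.1, 0.0, -0.1],
]

# A hypothetical bicluster: two genes that move together over two compounds.
sub = extract_bicluster(
    X_toy, gene_names, compound_names,
    genes=["gene_1", "gene_2"],
    compounds=["cmpd_1", "cmpd_2"],
)
print(sub)
```

Comparing the gene and compound subsets obtained from the target-driven route with those returned by a biclustering method then reduces to comparing two such pairs of index sets.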

DISCUSSION

The similarity of the biclustering results with the integrated approach shows that accounting for another source of information in the analysis of gene expression data gives a more refined search for ‘biclusters’ containing a subset of ‘mechanistically’ related compounds regulating a subset of functionally related genes.
Combining two sources of data provides a better understanding of the mechanism of action of a compound cluster.
The approach is not limited to gene expression and target prediction data.

THANK YOU!