
Vilar et al - 2008

Supplementary Methods.

RNA isolation and Microarray Procedures. Primary tumors and normal tissues were obtained at the time of surgical resection, snap frozen in liquid nitrogen at -80°C, embedded in optimum cutting temperature freezing media (Miles Scientific, Naperville, IL), cryotome sectioned, stained with H&E, and evaluated by a surgical pathologist (T.J.G.). Areas with >70% tumor cellularity were identified for RNA isolation. Selected sections of tumor samples were homogenized in Trizol (Invitrogen, Carlsbad, CA), and total cellular RNA was purified according to the instructions of the manufacturer, with additional purification using RNeasy spin columns (Qiagen, Valencia, CA). RNA quality was assessed by 1% agarose gel electrophoresis, and samples were included only if the 18S and 28S bands were discrete and approximately equal.

Expression levels were measured using Affymetrix (Santa Clara, CA) HuFL arrays consisting of 7129 probe sets, each representing a transcript. Preparation of target cRNA, hybridization, and scanning were performed according to the protocols of the manufacturer.

Expression analysis was done as previously described (1). In brief, each probe set typically consists of 20 perfectly complementary 25-base-long probes as well as 20 mismatch probes that are identical except for an altered central base. Mismatch probe values were subtracted from the perfect match values, and the middle 60% of these differences were averaged as the expression measure for that probe set. A quantile normalization procedure was then used to adjust for differences in the probe intensity distribution across different arrays, in which a linear spline was applied to each array to make 99 evenly spaced quantiles (at .01, .02, ..., .99) have the same values for all of the arrays. The data were log-transformed using log(100 + max(X + 100, 0)). The mean expression levels for the two biological states of interest (MSI vs MSS) were compared by two-sample t-test. The heatmap was generated using the publicly available software Java Treeview.
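The probe-set summary and log transformation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the natural logarithm is an assumption (the text does not state the base), and the trimming rule (dropping an equal number of differences from each tail) is one plausible reading of "the middle 60%".

```python
import math

def probe_set_expression(pm, mm, keep_fraction=0.60):
    """Average the middle `keep_fraction` of the PM - MM differences.

    pm, mm: perfect-match and mismatch intensities for one probe set
    (typically 20 values each). Equal-tail trimming is an assumption.
    """
    diffs = sorted(p - m for p, m in zip(pm, mm))
    n = len(diffs)
    # Number of differences discarded from each tail (4 of 20 for 60%).
    drop = int(round(n * (1.0 - keep_fraction) / 2.0))
    middle = diffs[drop:n - drop]
    return sum(middle) / len(middle)

def log_transform(x):
    """Apply log(100 + max(x + 100, 0)); natural log is assumed."""
    return math.log(100.0 + max(x + 100.0, 0.0))
```

Because only the central differences are averaged, a single saturated or defective probe pair does not move the expression measure, and the offset inside the logarithm keeps the transform defined for negative trimmed means.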

Gene set enrichment testing. For enrichment testing of functional lists, we selected a larger set of genes that gave P<0.01 for a two-sample t-test and that also had average fold-differences between the groups of at least 1.2-fold, which selected 678 probe sets. Using 1000 data sets in which the sample labels were randomly permuted, we obtained only 6.0% as many probe sets on average. We collapsed probe sets to 5633 distinct genes using IDs, and called a gene different if any probe set for the gene was selected as differing, which resulted in selecting 308 up and 319 down genes. We tested these two lists of genes for over-representation in 201 KEGG pathways, using one-sided Fisher's exact tests, and estimated false discovery rates for the most significant intersections by repeating the tests with 100 random permutations of the gene identifiers.
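A one-sided Fisher's exact test for over-representation is the upper tail of a hypergeometric distribution, and can be sketched as below. The function name and argument names are hypothetical, and the choice of gene universe (for example, all 5633 genes versus only genes annotated in KEGG) is an assumption that the text does not settle, so this sketch is not expected to reproduce the published P-values exactly.

```python
from math import comb

def fisher_upper_tail(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_universe genes, n_pathway of which belong to the pathway."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Illustrative call: a 51-gene pathway with 8 genes among 308 selected,
# assuming a universe of 5633 genes.
p_value = fisher_upper_tail(5633, 51, 308, 8)
```

The false discovery rate step described in the text would repeat this calculation after permuting the gene identifiers and compare the observed P-values with the permuted ones.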

External data sources and generation of compound list in “Connectivity Map”. The Watanabe set contained a list of 177 probe sets that were differentially expressed at a significance level of P<0.05 (2), and the Koinuma set contained 24 probes at a significance of P<0.001 and an absolute difference in mean expression level ≥50 U (3). We selected the data set from Banerjea generated from sporadic MSI tumors only. Because this set contains a list of 2184 probes at a significance of P<0.005, we restricted it to 499 down- and upregulated probes sorted by fold-change (4).

Kruhoffer et al built a maximum likelihood classifier using a ‘leave-one-out’ cross-validation scheme. This process included nine genes that were used in at least 70% of the cross-validations (5). Probes had to be consistently annotated across platforms using HG-U133A Affymetrix array nomenclature. Therefore, we mapped the probe lists across the various generations of Affymetrix GeneChip arrays using the information provided by the company on its website. This process led to the loss of 107 out of 177 probes in the Watanabe data set and 7 out of 24 in the Koinuma data set, which used the HG-U133A&B platform. In the case of the MECC signature, which was built from HuFL gene chips, probe sets were mapped using BLAST. Only one probe set (M11119_at) failed to be located in HG-U133A. The final probe sets from each publication submitted to the Connectivity Map web site are detailed in Supplementary Tables S2 and S16-19.

DNA Isolation and Determination of MSI in tumor samples. Briefly, forward primers for BAT25, BAT26, D2S123, and D5S346 and reverse primers for D17S250 were labeled with [γ-32P]ATP and included in a 20 µL PCR reaction containing 1 µL of microdissected DNA. PCR products were run on 6% polyacrylamide gels for 3 hours at 65 W and exposed to film at -80°C for 12 to 20 hours. Films were double scored and entered as stable, unstable, or loss of heterozygosity. Tumors were designated as MSI-high if there was instability at two or more markers. For uninformative markers, a sixth marker from the consensus panel was substituted.
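The MSI-high rule stated above (instability at two or more markers) can be written as a short function. The function name and the string labels used for the film scores are hypothetical, and the substitution of a sixth marker for uninformative ones is left to the caller.

```python
def is_msi_high(marker_calls, min_unstable=2):
    """Return True when at least `min_unstable` markers are scored unstable.

    marker_calls maps a marker name to one of the scores entered from the
    films: "stable", "unstable" or "LOH" (labels are illustrative).
    """
    n_unstable = sum(1 for call in marker_calls.values() if call == "unstable")
    return n_unstable >= min_unstable

example = {
    "BAT25": "unstable",
    "BAT26": "unstable",
    "D2S123": "stable",
    "D5S346": "LOH",
    "D17S250": "stable",
}
msi_high = is_msi_high(example)  # two unstable markers meet the threshold
```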

MLH1 immunohistochemical analysis in tumor samples. A total of 19 samples were analyzed by immunohistochemistry (5 MSI-H and 14 MSS cases) as previously described (6).

MLH1 and MSH2 germline mutational studies. We studied the germline mutational status of MLH1 and MSH2 in all the patients harboring MSI-H and BRAF-negative tumors. MLH1 and MSH2 mutations were assessed by direct sequencing. Primer sequences and PCR reactions are available upon request. In addition, to detect deletions, an MLPA assay was performed on DNA samples from 4 patients in whom mutations had not been previously detected by direct sequencing. Reagents for the MLPA test were purchased from MRC-Holland (Amsterdam, the Netherlands). Details on methods, probe sequences and protocol can be found at www.mrc-holland.com.

Cytotoxicity experiments. For cytotoxicity experiments, 3 × 10³ cells were seeded in triplicate per well. After 24 hours, the medium was replaced with medium containing seven different concentrations of each drug, as shown in Supplementary Table S14. In control wells, cells were fed with standard medium. After 5 days of treatment, cell viability was estimated on the basis of the cells' ability to metabolize the tetrazolium salt WST-1 to formazan by mitochondrial dehydrogenases. Absorbance was quantified using a Spectramax 190 (Molecular Devices Corporation, Sunnyvale, CA). The percentage of surviving cells at each concentration relative to the non-treated group was recorded. The drug concentrations resulting in 50% growth inhibition (IC50) were then estimated by fitting a sigmoid-shaped dose-response curve using the drc package in R, and reconfirmed using GraphPad Prism 4.0 (GraphPad Software, San Diego, CA).
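To illustrate how an IC50 follows from survival percentages, the sketch below fits a two-parameter log-logistic (Hill) curve by linear regression on the logit scale. This is a deliberately simplified stand-in, not the drc or GraphPad Prism fit used in the study: it assumes survival runs from 100% down to 0%, it discards points at or beyond those limits, and the function name is hypothetical.

```python
import math

def estimate_ic50(concentrations, survival_percent):
    """Estimate IC50 and Hill slope from s = 100 / (1 + (c / IC50) ** h).

    Taking logs gives ln((100 - s) / s) = h * ln(c) - h * ln(IC50),
    a straight line in ln(c) that is fitted by ordinary least squares.
    """
    xs, ys = [], []
    for c, s in zip(concentrations, survival_percent):
        frac = s / 100.0
        if 0.0 < frac < 1.0:  # the logit is undefined at 0% and 100%
            xs.append(math.log(c))
            ys.append(math.log((1.0 - frac) / frac))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points strictly between 0% and 100%")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    hill = sxy / sxx
    intercept = mean_y - hill * mean_x
    ic50 = math.exp(-intercept / hill)
    return ic50, hill
```

A full four-parameter fit additionally estimates the upper and lower plateaus, which matters when a drug does not kill all cells at the highest concentration tested.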

Differences in drug sensitivity (mean IC50 of three independent experiments for every cell line) between cell lines classified by microsatellite status (MSS and MSI-H) were determined using an unpaired two-sided t-test, considering P<0.05 as statistically significant. In addition, we explored independently the cell lines displaying MSI-H due to hypermethylation of the hMLH1 promoter (Meth-MSI). The following comparisons were established: MSS vs MSI-H and MSS vs Meth-MSI. The same comparisons were established using cell proliferation data at a given drug concentration.

Analysis of BRAF and KRAS mutations. Mutation analyses for the BRAF V600E mutation and KRAS codon 12/13 mutations were done on DNA extracted from cell lines using a DNeasy kit (Qiagen, Valencia, CA) and from paraffin-embedded tumor slides. A 108-bp fragment of the BRAF gene spanning codon 600 was amplified using the following primers: F, 5'-TTCATGAAGACCTCACAGTAAAAA-3' and R, 5'-GTCTGGATCCATTTTGTGGA-3'. Similarly, KRAS codon 12 and 13 mutations were analyzed as described previously (7). A 216-bp fragment that spanned codons 12 and 13 was amplified using the following primers: F, 5'-TGTGTGACATGTTCTAATATAGTCAC-3' and R, 5'-TTACTGGTGCAGGACCATTCT-3'. After gel electrophoresis to confirm the amplified product, the PCR product was cleaned using the QIAquick Purification kit (Qiagen, Valencia, CA). A 3-µL aliquot of the product was sequenced using the forward and reverse primers with an ABI BigDye Terminator v3.1 Cycle Sequencing kit on an ABI 9700 thermocycler (Applied Biosystems, Inc., Foster City, CA). The data were analyzed using Sequencher 4.6 software (Gene Codes Corporation, Ann Arbor, MI).

MSI analysis of cell lines under treatment. Microsatellite marker loci were amplified by three multiplex PCRs as follows: 1) D10S197, BAT26, beta-catenin; 2) D18S58, BAT40, D2S123; 3) D17S250, BAT25, TGF-b-RII, D5S346 (see Table S15 for primer sequences). All three PCRs were performed under the same conditions using AmpliTaqGold DNA Polymerase (Applied Biosystems, Foster City, CA): an initiation step of 10 min at 95°C, followed by three cycles of 95°C for 1 min, 57°C for 1 min, and 72°C for 1.5 min, then 40 cycles of 95°C for 1 min, 51°C for 1 min, and 72°C for 1.5 min, and a final elongation step of 72°C for 10 min. The PCR fragments were detected by capillary electrophoresis on ABI370 at the University of Michigan Sequencing Core. The chromatograms were analyzed using GeneMarker v1.51 software (SoftGenetics, LLC., State College, USA). The patterns of the microsatellite markers before and after treatment were compared to find changes in marker length.


Cell cycle analysis and Annexin V assays. For cell cycle analysis, cells were washed in cold PBS, fixed in 70% ethanol, washed, resuspended in 25 µg/ml propidium iodide (PI) with 100 µg/ml RNase A (BD Pharmingen, San Jose, CA), and incubated for 30 min at 37°C. Fluorescence was measured on a BD Biosciences FACSCalibur flow cytometer within 1 h. Data were analyzed using the MODFIT 2.0 program (Verity Software House, Topsham, ME).

For Annexin V assays, cells were washed in cold PBS and suspended in 100 µL Annexin-V binding buffer containing 5 µg/ml PI and 0.5 µg/ml Annexin-V-FITC, incubated for 15 min at room temperature in the dark, and diluted in 400 µL Annexin-V binding buffer (BD Pharmingen, San Jose, CA). Fluorescence was measured on a BD Biosciences FACSCalibur flow cytometer within 1 h. The collected events were gated on the forward and side scatter plots to exclude cellular debris.

References:

1. Giordano TJ, Shedden KA, Schwartz DR, et al. Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles. Am J Pathol 2001;159:1231-8.

2. Watanabe T, Kobunai T, Toda E, et al. Distal colorectal cancers with microsatellite instability (MSI) display distinct gene expression profiles that are different from proximal MSI cancers. Cancer Res 2006;66:9804-8.

3. Koinuma K, Yamashita Y, Liu W, et al. Epigenetic silencing of AXIN2 in colorectal carcinoma with microsatellite instability. Oncogene 2006;25:139-46.

4. Banerjea A, Ahmed S, Hands RE, et al. Colorectal cancers with microsatellite instability display mRNA expression signatures characteristic of increased immunogenicity. Mol Cancer 2004;3:21.

5. Kruhoffer M, Jensen JL, Laiho P, et al. Gene expression signatures for colorectal cancer microsatellite status and HNPCC. Br J Cancer 2005;92:2240-8.

6. Lindor NM, Burgart LJ, Leontovich O, et al. Immunohistochemistry versus microsatellite instability testing in phenotyping colorectal tumors. J Clin Oncol 2002;20:1043-8.

7. Ogino S, Kawasaki T, Brahmandam M, et al. Sensitive sequencing method for KRAS mutation detection by Pyrosequencing. J Mol Diagn 2005;7:413-21.

Supplementary Figures Legends.

Figure S1. (A) Microsatellite and KRAS mutational status and response to Rapamycin in vitro. Analysis of cell viability using the WST-1 reagent in CRC cell lines treated with Rapamycin 17 µM. Rapamycin response in CRC cell lines separated according to microsatellite status (MSS vs MSI-H, P=NS), mismatch repair gene status (MSS vs Meth-MSI, P=0.053), or presence or absence of mutations in the KRAS pathway (P=NS). Note that exclusion of HCT-116 from the analysis results in significant differences between MSI-H* and MSS (P=0.02). (B) Microsatellite and KRAS mutational status and response to LY-294002 in vitro. Analysis of cell viability in CRC cell lines treated with LY-294002 17 µM. LY-294002 response in CRC cell lines separated according to microsatellite status (MSS vs MSI-H, **P=0.0021), mismatch repair gene status (MSS vs Meth-MSI, *P=0.0157), or presence or absence of mutations in the KRAS pathway (P=NS).

Figure S2. Number of cells in G1 phase determined by flow cytometry performed on SW-48, RKO and HT-29 cell lines treated with Rapamycin 3 µM for five consecutive days and on controls grown in drug-free supplemented medium. Note that MSI-H cells experienced a [...]. Values significantly higher (*) than control values are indicated. Error bars, SD.

[Figure S1, panels A and B]

[Figure S2]

Supplementary Tables.

Table S1. Clinicopathologic characteristics of CRC patients included in the gene expression signature defining MSI tumors from MECC. Overall, no patient met Amsterdam criteria. We found that four patients with MSI-H tumors harbored the BRAF V600E mutation, which is consistent with a sporadic origin. In addition, we completed a molecular diagnostic workup for all the MSI-H BRAF-negative cases, and found a truncating mutation (MSH2 Q324X) and a missense mutation (MSH2 A636P) in two different cases. The MLH1-positive case (*) was weakly positive for MLH1 and negative for MSH2 by immunohistochemistry.

Characteristics                 MSI (n = 13)    MSS (n = 38)    P-value
Age
  Years                         71.9            74.5            NS
  Range                         48.7 - 83.8     23.1 - 95.08
Patient gender
  Male                          8               20              NS
  Female                        5               18
Tumor site
  Proximal colon                11              11              0.006
  Distal colon                  1               18
  Rectum                        1               6
  Not available                 0               3
Histological type
  Well-differentiated           4               4               0.081
  Moderate-differentiated       7               32
  Poor-differentiated           2               1
  Not available                 0               1
Dukes' stage
  A                             1               5               NS
  B                             8               21
  C                             4               9
  D                             0               2
  Not available                 0               1
MLH1 immunohistochemistry
  Positive                      1*              14              NA
  Negative                      4               0
  Not assessed                  8               24
BRAF V600E mutational status
  Positive                      4               0               0.004
  Negative                      9               33
  Not assessed                  0               5


Table S2. HuFL Affymetrix probe set IDs from MECC data set. Upregulated and downregulated probes in MSI tumors have been separated by the line.

HuFL probe ID       Gene Symbol  Gene Title
M11567_rna1_at      ANG          , , RNase A family, 5
X51698_s_at         TFF2         trefoil factor 2
U30828_at           SFRS6        splicing factor, arginine/serine-rich 6
X75091_s_at         SET          SET translocation
S72024_s_at         EIF5A        eukaryotic translation initiation factor 5A
U27185_at           RARRES1      retinoic acid receptor responder 1
U09770_at           CRIP1        cysteine-rich 1
M31516_s_at         CD55         CD55 molecule, decay accelerating factor for complement
U51903_at           IQGAP2       IQ motif containing GTPase activating protein 2
M24486_s_at         P4HA1        procollagen-proline, 2-oxoglutarate 4-dioxygenase, alpha polypeptide I
D00596_at           TYMS         thymidylate synthetase
X76648_at           GLRX         glutaredoxin
X76732_at           NUCB2        nucleobindin 2
D89289_at           FUT8         fucosyltransferase 8
X16135_at           HNRNPL       heterogeneous nuclear ribonucleoprotein L
S67325_at           PCCB         propionyl Coenzyme A carboxylase, beta polypeptide
U17969_at           EIF5A        eukaryotic translation initiation factor 5A
U37690_at           POLR2L       polymerase II polypeptide L
U14193_at           GTF2A2       general factor IIA, 2
S82597_rna1_s_at    GALNT1       UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 1
L04490_at           NDUFA9       NADH dehydrogenase 1 alpha subcomplex, 9
M19309_s_at         TNNT1        troponin T type 1
L19872_at           AHR          aryl hydrocarbon receptor
M31158_at           PRKAR2B      protein kinase, cAMP-dependent, regulatory, type II, beta
D14710_at           ATP5A1       ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit 1, cardiac muscle
M63175_at           AMFR         autocrine motility factor receptor
U63717_at           OSTF1        osteoclast stimulating factor 1
HG1862-HT1897_at    CALM1        calmodulin 1
M83751_at           ARMET        arginine-rich, mutated in early stage tumors
X66899_at           EWSR1        Ewing sarcoma breakpoint region 1
Z12830_at           SSR1         signal sequence receptor, alpha
U04209_at           MFAP1        microfibrillar-associated protein 1
U40038_at           GNAQ         guanine nucleotide binding protein, q polypeptide
L76703_at           PPP2R5E      protein 2, regulatory subunit B', epsilon isoform
U30888_at           USP14        ubiquitin specific peptidase 14
U19523_at           GCH1         GTP cyclohydrolase 1
X85137_s_at         KIF11        kinesin family member 11
U13022_at           CASP2        caspase 2, apoptosis-related cysteine peptidase
U08989_at           SLC1A1       solute carrier family 1, member 1
L25441_at           PGGT1B       protein geranylgeranyltransferase type I, beta subunit
U57093_at           RAB27B       RAB27B, member RAS oncogene family
U79260_at           FTO          fat mass and obesity associated
U72621_at           PLAGL1       pleiomorphic adenoma gene-like 1
Z35102_at           STK38        serine/threonine kinase 38
M18533_at           DMD          dystrophin
U02493_at           NONO         non-POU domain containing, octamer-binding
L37043_at           CSNK1E       casein kinase 1, epsilon
M11119_at           ---          ---
X59871_at           TCF7         transcription factor 7
D80002_at           POFUT1       protein O-fucosyltransferase 1
U57627_at           OCRL         oculocerebrorenal syndrome of Lowe
M82882_at           ELF1         E74-like factor 1
X13916_at           LRP1         low density lipoprotein-related protein 1
U07418_at           MLH1         mutL homolog 1, colon cancer, nonpolyposis type 2
D10522_at           MARCKS       myristoylated alanine-rich protein kinase C substrate
J04111_at           JUN          jun oncogene
HG2994-HT4850_s_at  ELN          elastin
U59878_at           RAB32        RAB32, member RAS oncogene family
U35048_at           TSC22D1      TSC22 domain family, member 1
X57346_at           YWHAB        tyrosine 3-monooxygenase/ 5-monooxygenase activation protein, beta polypeptide
AB000220_at         SEMA3C       sema domain, immunoglobulin domain, short basic domain, secreted, 3C
U66661_at           GABRE        gamma-aminobutyric acid A receptor, epsilon
Z29067_at           NEK3         NIMA -related kinase 3
L33881_at           PRKCI        protein kinase C, iota
D50683_at           TGFBR2       transforming growth factor, beta receptor II
U12255_at           FCGRT        Fc fragment of IgG, receptor, transporter, alpha
M55131_at           CFTR         cystic fibrosis transmembrane conductance regulator
U49188_at           SERINC3      serine incorporator 3
D86956_at           HSPH1        heat shock protein 1
X14253_s_at         TDGF1        teratocarcinoma-derived growth factor 1
M29874_s_at         CYP2B7P1     cytochrome P450, family 2, subfamily B, polypeptide 7 pseudogene 1


Table S3. List of probe sets of genes that appear in at least two signatures in the discovery set. Genes that appear in common in the “intersection” signature between discovery and validation set are marked with a shadow.

Discovery Set
Probe ID       Gene symbol  Gene Title
208050_s_at    CASP2        Caspase 2, apoptosis-related cysteine peptidase
211140_s_at    CASP2        Caspase 2, apoptosis-related cysteine peptidase
209812_x_at    CASP2        Caspase 2, apoptosis-related cysteine peptidase
205081_at      CRIP1        Cysteine-rich protein 1
201123_s_at    EIF5A        Eukaryotic translation initiation factor 5A
201122_x_at    EIF5A        Eukaryotic translation initiation factor 5A
202072_at      HNRPL        Heterogeneous nuclear ribonucleoprotein L
35201_at       HNRPL        Heterogeneous nuclear ribonucleoprotein L
206108_s_at    SFRS6        Splicing factor, arginine/serine-rich 6
202589_at      TYMS         Thymidylate synthetase
206754_s_at    CYP2B7P1     Cytochrome P450, family 2, subfamily B, polypeptide 7 pseudogene 1
206976_s_at    HSPH1        Heat shock 105kDa/110kDa protein 1
202520_s_at    MLH1         MutL homolog 1, colon cancer, nonpolyposis type 2
206286_s_at    TDGF1        Teratocarcinoma-derived growth factor 1


Table S4. List of probe sets of genes that appear in at least two signatures in the validation set. Genes that appear in common in the “intersection” signature between discovery and validation set are marked with a shadow.

Validation Set
Probe ID       Gene symbol  Gene Title
35201_at       HNRPL        Heterogeneous nuclear ribonucleoprotein L
202072_at      HNRPL        Heterogeneous nuclear ribonucleoprotein L
203444_s_at    MTA2         Metastasis associated 1 family, member 2
206108_s_at    SFRS6        Splicing factor, arginine/serine-rich 6
212062_at      ATP9A        ATPase, class II, type 9A
216129_at      ATP9A        ATPase, class II, type 9A
218345_at      TMEM176A     Hepatocellular carcinoma-associated antigen 112
222244_s_at    TUG1         Taurine upregulated gene 1
209049_s_at    ZMYND8       Protein kinase C binding protein 1


Table S5. Compounds tested once in the Connectivity Map that obtained a negative connectivity score below -0.6 in at least two signatures in the discovery set. CS, connectivity score; n, number of signatures in which the individual compound obtained a CS below -0.6.

Compound                  n    Mean CS
3-hydroxy-L-kynurenine    2    -0.653
5114445                   4    -0.746
5152487                   3    -0.769
Azathioprine              4    -0.778
Celastrol                 4    -0.741
Clotrimazole              4    -0.886
Cytochalasin B            2    -0.687
MG-132                    3    -0.736
Pararosaniline            3    -0.795
Phenanthridinone          3    -0.713
Sulfasalazine             3    -0.798
Tioguanine                2    -0.846


Table S6. Compounds tested once in the Connectivity Map that obtained a negative connectivity score below -0.6 in at least two signatures in the validation set. CS, connectivity score; n, number of signatures in which the individual compound obtained a CS below -0.6.

Compound                  n    Mean CS
(-)-catechin              2    -0.647
1,5-isoquinolinediol      2    -0.603
3-hydroxy-L-kynurenine    2    -0.723
DL-PPMP                   2    -0.798
Gefitinib                 3    -0.703
Pararosaniline            2    -0.679
Pentamidine               2    -0.666
Phenanthridinone          2    -0.717
Phenformin                2    -0.712
Phenylbiguanide           2    -0.741
Prednisolone              2    -0.716


Table S7. Final list of compounds from discovery set. Total negative connectivity scores, total positive connectivity scores, percentage of the difference between the number of negative and positive connectivity scores, summary statistic and mean connectivity scores are presented for every compound. The summary statistic stands for -Σ(log P-value), where the sum extends over each of the five compound lists used in the drug-discovery set (Watanabe et al, Koinuma et al and MECC data set, and the two artificially generated lists, union and intersection signatures). A larger value of this measure indicates stronger evidence across lists generated from different studies. Note that this measure was not calculated for those compounds which did not receive a P-value in all lists. Final ranking was established based on sequential ranking techniques as detailed in the text. Note that higher values for total negative connectivity scores, percentage of the difference and summary statistic lead to lower ranks, but for mean connectivity scores lower values lead to lower ranks. -ve, negative; +ve, positive; Mean CS, mean connectivity score; Sum stat, summary statistic.

Rank  Compound                   Total -ve  Total +ve  % dif.  Sum stat  Mean CS  Appears in Validation set
1     LY-294002                  33         4          78.4    10.66     -0.22    Y
2     17-AAG                     42         8          68.0    14.41     -0.21    Y
3     Trichostatin A             22         5          63.0    20.23     -0.17    N
4     17-DAG                     8          1          77.8    10.91     -0.54    N
5     5224221                    7          1          75.0    11.83     -0.42    N
6     Geldanamycin               16         4          60.0    7.60      -0.23    Y
7                                9          0          100.0   3.86      -0.39    N
8     Monorden                   16         5          52.4    11.62     -0.12    Y
9     Resveratrol                11         2          69.2    4.92      -0.24    N
10    Prazosin                   7          1          75.0    5.31      -0.35    N
11    Rapamycin                  17         7          41.7    5.44      -0.13    Y
12                               11         1          83.3    1.58      -0.19    N
13    SC-58125                   9          1          80.0    2.81      -0.16    Y
14    Calmidazolium              6          2          50.0    9.26      -0.29    N
15    Valproic Acid              26         15         26.8    6.81      -0.06    N
16    15-delta Prostaglandin J2  9          3          50.0    5.03      -0.17    Y
17                               7          0          100.0   -         -0.22    Y
18    Raloxifene                 7          1          75.0    2.35      -0.24    N
19    Monastrol                  11         6          29.4    5.28      -0.06    N
20    Tretinoin                  11         4          46.7    2.41      -0.11    Y
21    Fulvestrant                9          7          12.5    11.24     -0.02    Y
22    Rofecoxib                  10         5          33.3    3.65      -0.10    N
23                               13         6          36.8    2.10      -0.08    Y
24    alpha-Estradiol            8          2          60.0    2.25      -0.10    Y
25    Celecoxib                  7          3          40.0    4.40      -0.09    Y
26    Tetraethylenepentamine     8          6          14.3    4.10      -0.06    Y
27    Genistein                  6          6          0.0     5.10      0.01     N
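The "% dif." column of Table S7 is consistent with 100 × (negative - positive) / (negative + positive); for example, LY-294002 with 33 negative and 4 positive scores gives 78.4. The sketch below computes that percentage and a summary statistic read as the negated sum of log P-values. The function names are hypothetical and the natural logarithm is an assumption, since the legend does not state the base.

```python
import math

def percent_difference(n_negative, n_positive):
    """Percentage by which negative scores outnumber positive scores."""
    return 100.0 * (n_negative - n_positive) / (n_negative + n_positive)

def summary_statistic(p_values):
    """Negated sum of log P-values across compound lists (natural log assumed)."""
    return -sum(math.log(p) for p in p_values)

ly294002_dif = round(percent_difference(33, 4), 1)   # Table S7, rank 1
rapamycin_dif = round(percent_difference(17, 7), 1)  # Table S7, rank 11
```

Because the statistic sums over lists, a compound with a small P-value in several independent signatures scores higher than one with a single very small P-value in only one of them.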


Table S8. Final list of compounds from validation set. Total negative connectivity scores, total positive connectivity scores, percentage of the difference between the number of negative and positive connectivity scores, summary statistic and mean connectivity scores are presented for every compound. The summary statistic stands for -Σ(log P-value), where the sum extends over each of the four compound lists used in the validation set (Kruhoffer et al and Banerjea et al, and the two artificially generated lists, union and intersection signatures). A larger value of this measure indicates stronger evidence across lists generated from different studies. Note that this measure was not calculated for those compounds which did not receive a P-value in all lists. Final ranking was established based on sequential ranking techniques as detailed in the text. Note that higher values for total negative connectivity scores, percentage of the difference and summary statistic lead to lower ranks, but for mean connectivity scores lower values lead to lower ranks. -ve, negative; +ve, positive; Mean CS, mean connectivity score; Sum stat, summary statistic.

Rank  Compound                   Total -ve  Total +ve  % dif.  Sum stat  Mean CS
1     Rapamycin                  16         0          100.0   14.28     -0.27
2     LY-294002                  31         5          72.2    24.36     -0.29
3     Wortmannin                 14         2          75.0    5.85      -0.31
4     5252917                    6          0          100.0   8.92      -0.49
5     17-AAG                     31         11         47.6    7.99      -0.19
6     Monorden                   15         3          66.7    3.56      -0.21
7     Geldanamycin               11         4          46.7    4.16      -0.21
8     Sodium Phenylbutyrate      13         3          62.5    3.31      -0.20
9     Tamoxifen                  6          0          100.0   3.20      -0.27
10    SC-58125                   8          2          60.0    4.07      -0.17
11    Fluphenazine               5          1          66.7    3.90      -0.13
12    Fulvestrant                11         10         4.8     7.86      -0.02
13                               7          4          27.3    3.44      -0.15
14    Celecoxib                  6          2          50.0    4.85      -0.05
15    Tetraethylenepentamine     9          6          20.0    2.41      -0.17
16    15-delta Prostaglandin J2  6          1          71.4    -         -0.14
17    alpha-Estradiol            7          3          40.0    2.19      -0.13
18    Tretinoin                  11         8          15.8    2.12      -0.10
19    Metformin                  5          3          25.0    6.49      -0.02
20                               4          3          14.3    1.01      -0.05


Table S9. Cell lines selected to validate candidate compounds. MSI-H, high frequency microsatellite instability; MSS, microsatellite stability; mut, mutated; wt, wild type; meth, promoter hypermethylation; Mut-MSI, microsatellite instability cell lines harboring mutation in any of the mismatch repair genes; Meth-MSI, microsatellite instability due to hypermethylation of MLH1.

Cell line  MSI status  MLH1  MSH2  MSH6  KRAS  BRAF   MSI Group
HCT-116    MSI-H       mut   wt    wt    G13D  wt     Mut-MSI
LoVo       MSI-H       wt    mut   wt    G13D  wt     Mut-MSI
HCT-15     MSI-H       wt    wt    mut   G13D  wt     Mut-MSI
HCT-8      MSI-H       wt    wt    mut   G13D  wt     Mut-MSI
SW-48      MSI-H       meth  wt    wt    wt    wt     Meth-MSI
RKO        MSI-H       meth  wt    wt    wt    V600E  Meth-MSI
HT-29      MSS         wt    wt    wt    wt    V600E  MSS
SW-480     MSS         wt    wt    wt    G12V  wt     MSS


Table S10. IC50 for Rapamycin, LY-294002, 17-AAG and Trichostatin A. Standard deviations have been calculated based on the results of three independent experiments for every cell line. SD, standard deviation; IC50, 50% inhibitory concentration.

Cell line                  RKO     SW-48   HCT-116  HCT-15  LoVo    HCT-8   HT-29   SW-480
Rapamycin IC50 [µM]        5.77    3.08    45.37    0.97    2.71    1.25    14.33   22.45
  SD                       0.75    2.54    7.52     0.67    2.29    0.11    3.23    4.58
LY-294002 IC50 [µM]        11.09   9.64    54.14    6.82    4.16    6.48    14.42   15.62
  SD                       4.71    1.47    21.52    1.29    0.78    2.44    3.81    2.38
17-AAG IC50 [nM]           92.58   6.5     579.3    1340    1056    135.9   16.92   179.6
  SD                       15.46   2.5     124.2    1271    620     138.9   10.62   152.6
Trichostatin A IC50 [nM]   86.84   52.92   52.51    138.7   23.17   15.31   150.4   47.36
  SD                       38.46   22.03   42.47    22.78   0.79    3.86    152.4   23.97


Table S11. IC50 comparisons in cell lines by microsatellite status. MSS, microsatellite stable; meth-MSI, MSI due to hypermethylation of MLH1; MSI-H, microsatellite instability-high; MSI-L, microsatellite instability-low.

Comparison                               Fold change  P-value
Rapamycin
  MSS versus MSI-H                       1.87         0.540
  MSS versus MSI-H (excluding HCT-116)   6.67         0.002
  MSS versus meth-MSI                    4.16         0.082
LY-294002
  MSS versus MSI-H                       0.98         0.990
  MSS versus MSI-H (excluding HCT-116)   1.97         0.016
  MSS versus meth-MSI                    1.45         0.038
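The fold changes in Table S11 match the ratio of the mean IC50 of the MSS cell lines to the mean IC50 of each comparison group, using the values in Table S10 and the group assignments in Table S9. The sketch below recomputes them; the variable and function names are hypothetical, and the P-values are not recomputed because the per-experiment data are not listed.

```python
# IC50 values in µM, copied from Table S10.
IC50_UM = {
    "Rapamycin": {"RKO": 5.77, "SW-48": 3.08, "HCT-116": 45.37, "HCT-15": 0.97,
                  "LoVo": 2.71, "HCT-8": 1.25, "HT-29": 14.33, "SW-480": 22.45},
    "LY-294002": {"RKO": 11.09, "SW-48": 9.64, "HCT-116": 54.14, "HCT-15": 6.82,
                  "LoVo": 4.16, "HCT-8": 6.48, "HT-29": 14.42, "SW-480": 15.62},
}

# Group membership from Table S9.
MSS = ("HT-29", "SW-480")
METH_MSI = ("RKO", "SW-48")
MSI_H = ("RKO", "SW-48", "HCT-116", "HCT-15", "LoVo", "HCT-8")
MSI_H_NO_HCT116 = tuple(line for line in MSI_H if line != "HCT-116")

def mean_ic50(drug, cell_lines):
    """Arithmetic mean of the IC50 values of the given cell lines."""
    values = [IC50_UM[drug][line] for line in cell_lines]
    return sum(values) / len(values)

def fold_change(drug, group):
    """Mean IC50 of MSS lines divided by the mean IC50 of `group`."""
    return mean_ic50(drug, MSS) / mean_ic50(drug, group)

rapamycin_folds = [round(fold_change("Rapamycin", g), 2)
                   for g in (MSI_H, MSI_H_NO_HCT116, METH_MSI)]
# rapamycin_folds == [1.87, 6.67, 4.16], as listed in Table S11
```

The jump from 1.87 to 6.67 when HCT-116 is excluded reflects that single cell line's IC50 of 45.37 µM, which dominates the MSI-H mean.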


Table S12. Gene Set Enrichment Testing performed from expression data from the MECC study. Upregulated and downregulated probes in MSI tumors have been separated by the line.

Title                                             No. genes  No. genes from MECC  Actual P-value  FDR
Dentatorubropallidoluysian atrophy (DRPLA)        11         5                    0.00016         0.05
Neurodegenerative Disorders                       31         7                    0.00115         0.055
Proteasome                                        18         5                    0.00224         0.103
One carbon pool by folate                         12         4                    0.00305         0.105
VEGF signaling pathway                            51         8                    0.00592         0.138
Huntington's disease                              25         5                    0.01017         0.198
Oxidative phosphorylation                         57         8                    0.01158         0.19
Phosphatidylinositol signaling system             52         7                    0.02195         0.272
Glycan structures - biosynthesis 1                31         5                    0.0249          0.284
Adherens junction                                 54         7                    0.02649         0.269
Inositol phosphate metabolism                     32         5                    0.02823         0.271
Alzheimer's disease                               22         4                    0.02936         0.260
Ether lipid metabolism                            13         3                    0.03076         0.247
Porphyrin and chlorophyll metabolism              23         4                    0.03405         0.246
Keratan sulfate biosynthesis                      6          2                    0.03861         0.255
Protein export                                    6          2                    0.03861         0.239
Pyrimidine metabolism                             48         6                    0.04513         0.284
O-Glycan biosynthesis                             7          2                    0.05213         0.325
N-Glycan biosynthesis                             16         3                    0.05339         0.326
signaling pathway                                 38         5                    0.05406         0.314
Apoptosis                                         63         7                    0.05477         0.301
Long-term depression                              64         7                    0.05876         0.299
Melanogenesis                                     64         7                    0.05876         0.286
Pancreatic cancer                                 64         7                    0.05876         0.274
GnRH signaling pathway                            71         7                    0.09156         0.384
Long-term potentiation                            59         6                    0.1015          0.415
Type II diabetes mellitus                         33         4                    0.1033          0.402
Vitamin B6 metabolism                             2          1                    0.1034          0.412
Retinol metabolism                                2          1                    0.1034          0.398
Wnt signaling pathway                             75         7                    0.1141          0.421
Glycosphingolipid biosynthesis - neo-lactoseries  11         2                    0.1183          0.434
Colorectal cancer                                 65         6                    0.1428          0.524
Glycolysis / Gluconeogenesis                      51         5                    0.1443          0.514
Cyanoamino acid metabolism                        3          1                    0.1551          0.575
MAPK signaling pathway                            175        13                   0.1601          0.573
Axon guidance                                     70         6                    0.1823          0.603
Purine metabolism                                 86         7                    0.1892          0.594
Linoleic acid metabolism                          15         2                    0.1964          0.601
Prostate cancer                                   72         6                    0.1992          0.591
Terpenoid biosynthesis                            4          1                    0.2015          0.597
Tryptophan metabolism                             43         4                    0.2067          0.585
mTOR signaling pathway                            29         3                    0.2095          0.585
Sphingolipid metabolism                           16         2                    0.2168          0.596
Amyotrophic lateral sclerosis (ALS)               16         2                    0.2168          0.583
metabolism                                        30         3                    0.2241          0.576
Glyoxylate and dicarboxylate metabolism           5          1                    0.2452          0.645
Complement and coagulation cascades               62         5                    0.25            0.636
--------------------------------------------------------------------------------------------------
Wnt signaling pathway                             75         10                   0.009           0.93
Colorectal Cancer                                 65         9                    0.01033         0.545
Notch signaling pathway                           25         5                    0.01174         0.4033
TGF-Beta signaling pathway                        68         9                    0.01373         0.3375
Cell cycle                                        78         9                    0.0311          0.618
Adherens junction                                 54         7                    0.03132         0.52
Basal cell carcinoma                              24         4                    0.0437          0.657
B cell receptor signaling pathway                 48         6                    0.05202         0.716
Melanogenesis                                     64         7                    0.06836         0.853
Hedgehog signaling pathway                        29         4                    0.0785          0.858
Pathogenic Escherichia coli infection             29         4                    0.0785          0.78
Pathogenic Escherichia coli infection             29         4                    0.0785          0.715
Basal transcription factors                       20         3                    0.1005          0.810


Table S13. Detailed information on gene IDs, total number of genes, number of genes present and P-values of those pathways more closely related with the PI3K-AKT-mTOR pathway that arise in the Gene Set Enrichment Testing. Note that key genes closely related with the PI3K-AKT-mTOR pathway appear in this analysis.

Pathway name                           No. of genes, total  No. of genes  P-value

VEGF signaling pathway                 51                   8             0.00592
  CDC42     cell division cycle 42
  PTGS2     prostaglandin-endoperoxide synthase 2
  PLA2G2A   phospholipase A2, group IIA
  PLA2G4A   phospholipase A2, group IVA
  CHP       calcium binding protein P22
  PPP3CA    protein phosphatase 3, catalytic subunit, alpha isoform
  PIK3R3    phosphoinositide-3-kinase, regulatory subunit 3
  MAPK1     mitogen-activated protein kinase 1

Phosphatidylinositol signaling system  52                   7             0.02195
  PLCB3     phospholipase C, beta 3
  CALM1     calmodulin 1
  PIK3R3    phosphoinositide-3-kinase, regulatory subunit 3
  PIP4K2A   phosphatidylinositol-5-phosphate 4-kinase, type II, alpha
  INPP1     inositol polyphosphate-1-phosphatase
  ITPK1     inositol 1,3,4-triphosphate 5/6 kinase
  ITPKA     inositol 1,4,5-trisphosphate 3-kinase A

Inositol phosphate metabolism          32                   5             0.02823
  ITPKA     inositol 1,4,5-trisphosphate 3-kinase A
  ITPK1     inositol 1,3,4-triphosphate 5/6 kinase
  INPP1     inositol polyphosphate-1-phosphatase
  PIP4K2A   phosphatidylinositol-5-phosphate 4-kinase, type II, alpha
  PLCB3     phospholipase C, beta 3

Type II diabetes mellitus              33                   4             0.1033
  PKM2      pyruvate kinase, muscle
  PIK3R3    phosphoinositide-3-kinase, regulatory subunit 3
  INSR      insulin receptor
  MAPK1     mitogen-activated protein kinase 1

MAPK signaling pathway                 175                  13            0.1601
  DUSP4     dual specificity phosphatase 4
  PTPRR     protein tyrosine phosphatase, receptor type, R
  PLA2G2A   phospholipase A2, group IIA
  PLA2G4A   phospholipase A2, group IVA
  MAPK1     mitogen-activated protein kinase 1
  FGFR2     fibroblast growth factor receptor 2
  CASP3     caspase 3, apoptosis-related cysteine peptidase
  FAS       Fas (TNF receptor superfamily, member 6)
  IL1R2     interleukin 1 receptor, type II
  CDC42     cell division cycle 42
  CHP       calcium binding protein P22
  PPP3CA    protein phosphatase 3, catalytic subunit, alpha isoform
  MAP4K1    mitogen-activated protein kinase kinase kinase kinase 1

mTOR signaling pathway                 29                   3             0.2095
  HIF1A     hypoxia-inducible factor 1, alpha subunit
  MAPK1     mitogen-activated protein kinase 1
  PIK3R3    phosphoinositide-3-kinase, regulatory subunit 3

Insulin signaling pathway              103                  6             0.4975
  MAPK1     mitogen-activated protein kinase 1
  PKM2      pyruvate kinase, muscle
  CALM1     calmodulin 1
  PRKAR2B   protein kinase, cAMP-dependent, regulatory, type II, beta
  INSR      insulin receptor
  PIK3R3    phosphoinositide-3-kinase, regulatory subunit 3

Table S14. Drug concentrations used in cytotoxicity experiments.

Compound              Concentration [M]

17AAG             #1  5.00 x 10^-6
                  #2  1.67 x 10^-6
                  #3  5.56 x 10^-7
                  #4  1.85 x 10^-7
                  #5  6.17 x 10^-8
                  #6  2.06 x 10^-8
                  #7  6.86 x 10^-9

LY294002          #1  1.00 x 10^-4
                  #2  3.33 x 10^-5
                  #3  1.11 x 10^-5
                  #4  3.70 x 10^-6
                  #5  1.23 x 10^-6
                  #6  4.12 x 10^-7
                  #7  1.37 x 10^-7

Rapamycin         #1  1.00 x 10^-4
                  #2  3.33 x 10^-5
                  #3  1.11 x 10^-5
                  #4  3.70 x 10^-6
                  #5  1.23 x 10^-6
                  #6  4.12 x 10^-7
                  #7  1.37 x 10^-7

Trichostatin A    #1  5.00 x 10^-6
                  #2  1.67 x 10^-6
                  #3  5.56 x 10^-7
                  #4  1.85 x 10^-7
                  #5  6.17 x 10^-8
                  #6  2.06 x 10^-8
                  #7  6.86 x 10^-9
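
The concentrations listed in Table S14 are consistent with a seven-point, three-fold serial dilution starting from the highest concentration of each compound. The dilution factor is not stated in the source; it is inferred here from the tabulated values. The following minimal Python sketch, in which the helper name `serial_dilution` and the dictionary `TOP_CONCENTRATION_M` are illustrative assumptions, reproduces each series to three significant figures:

```python
# Sketch only: the three-fold factor is an inference from the Table S14
# values, not a statement made in the source.

def serial_dilution(top_molar, n_points=7, fold=3.0):
    """Return n_points concentrations, each `fold` times lower than the last."""
    return [top_molar / fold**i for i in range(n_points)]

# Highest concentration (#1) of each compound, in mol/L, as listed in Table S14.
TOP_CONCENTRATION_M = {
    "17AAG": 5.00e-6,
    "LY294002": 1.00e-4,
    "Rapamycin": 1.00e-4,
    "Trichostatin A": 5.00e-6,
}

for compound, top in TOP_CONCENTRATION_M.items():
    series = serial_dilution(top)
    # Scientific notation with three significant figures, as in the table.
    print(compound, ", ".join(f"{c:.2e}" for c in series))
```

Each printed series can be compared directly against the seven dilution points (#1 to #7) given for the corresponding compound.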

Table S15. Primer sequences and labels used for microsatellite status analyses.

Mix     Marker            Primer                                Label

MIX1    BAT26-F           5’-TGACTACTTTTGACTTCAGCC-3’           FAM
        BAT26-R           5’-AACCATTCAACATTTTAACCC-3’           #
        D10S197-F         5’-GCACCACTGCACTTCAGGT-3’             FAM
        D10S197-R         5’-GTCCTCAGGTCTCCCTCCTT-3’            #
        Beta-catenin-F    5’-AGAATATCTGTAATGGTACTGACTTTGC-3’    HEX
        Beta-catenin-R    5’-GAAATTGCTGTAGCAGTATTCACTATAA-3’    #

MIX2    D18S58-F          5’-CTCCCGGCTGGTTTTATTTA-3’            FAM
        D18S58-R          5’-ATCCCTTAGGAGGCAGGAAA-3’            #
        BAT40-F           5’-CCTACACCACAACCCTGCTT-3’            FAM
        BAT40-R           5’-GCACTCTAGCCTGGGTGGTA-3’            #
        D2S123-F          5’-AAACAGGATGCCTGCCTTTA-3’            FAM
        D2S123-R          5’-GGACTTTCCACCTATGGGAC-3’            #

MIX3    BAT25-F           5’-TCGCCTCCAAGAATGTAAGT-3’            FAM
        BAT25-R           5’-TCTGCATTTTAACTATGGCTC-3’           #
        D17S250-F         5’-GGAAGAATCAAATAGACAAT-3’            FAM
        D17S250-R         5’-GCTGGCCATATATATATTTAAACC-3’        #
        TGF-b-RII-F       5’-TGACTTTATTCTGGAAGATGCTG-3’         FAM
        TGF-b-RII-R       5’-AACACATGAAGAAAGTCTCACCA-3’         #
        D5S346-F          5’-ACTCACTCTAGTGATAAATCG-3’           HEX
        D5S346-R          5’-AGCAGATAAGACAGTATTACTAGTT-3’       #

Table S16. HG-U133A Affymetrix probes used from the Watanabe et al data set. Probes upregulated and downregulated in MSI tumors are separated by a line.

HG-U133A probe ID    Gene symbol

215505_s_at          STRN3
201123_s_at          EIF5A
214385_s_at          MUC5AC
206108_s_at          SFRS6
204014_at            DUSP4
203444_s_at          MTA2
214303_x_at          MUC5AC
213380_x_at          MSTP9
205614_x_at          MST1
210145_at            PLA2G4A
205501_at            ---
203349_s_at          ETV5
205278_at            GAD1
204073_s_at          C11orf9
212859_x_at          MT1E
211140_s_at          CASP2
204653_at            TFAP2A
205081_at            CRIP1
209812_x_at          CASP2
202072_at            HNRNPL
204415_at            IFI6
201122_x_at          EIF5A
209949_at            NCF2
216320_x_at          MST1
216834_at            RGS1
204364_s_at          REEP1
206755_at            CYP2B6
214912_at            ---
220065_at            TNMD
205184_at            GNG4
205471_s_at          DACH1
202410_x_at          IGF2
219658_at            PTCD2
206696_at            GPR143
216775_at            USP53
213953_at            KRT20
218806_s_at          VAV3
213788_s_at          FLJ35348
205472_s_at          DACH1
203628_at            IGF1R
44790_s_at           C13orf18
211207_s_at          ACSL6
206149_at            CHP2
204712_at            WIF1
216255_s_at          GRM8
205097_at            SLC26A2
217617_at            ---
212768_s_at          OLFM4
219727_at            DUOX2
210282_at            ZMYM2
206239_s_at          SPINK1
213757_at            EIF5A
205405_at            SEMA5A
222313_at            ---
204607_at            HMGCS2
205549_at            PCP4
218002_s_at          CXCL14
213789_at            TBC1D25
206286_s_at          TDGF1
213048_s_at          ---
222257_s_at          ACE2
213369_at            PCDH21
219962_at            ACE2
205892_s_at          FABP1
206754_s_at          CYP2B7P1
206143_at            SLC26A3
207457_s_at          LY6G6D
205910_s_at          CEL
207412_x_at          CELP
218963_s_at          KRT23

Table S17. HG-U133A Affymetrix probes used from the Koinuma et al data set. Probes upregulated and downregulated in MSI tumors are separated by a line.

HG-U133 probe ID     Gene symbol

200043_at            ERH
201533_at            CTNNB1
202589_at            TYMS
203476_at            TPBG
203915_at            CXCL9
203932_at            HLA-DMB
204533_at            CXCL10
206858_s_at          HOXC6
208799_at            PSMB5
217933_s_at          LAP3
218662_s_at          NCAPG
221505_at            ANP32E
35201_at             HNRNPL
201324_at            EMP1
202520_s_at          MLH1
206976_s_at          HSPH1
214039_s_at          LAPTM4B

Table S18. HG-U133A Affymetrix probes used from the Kruhoffer et al data set. Probes upregulated and downregulated in MSI tumors are separated by a line.

HG-U133A probe ID    Gene symbol

202072_at            HNRNPL
203444_s_at          MTA2
204533_at            CXCL10
206108_s_at          SFRS6
209048_s_at          ZMYND8
212062_at            ATP9A
213047_x_at          SET
218345_at            TMEM176A
222244_s_at          TUG1

Table S19. HG-U133A Affymetrix probes used from the Banerjea et al data set. Note that only the first 25 probes of 499 are shown in this table. Probes upregulated and downregulated in MSI tumors are separated by a line.

HG-U133A probe ID    Gene symbol

203444_s_at          MTA2
205626_s_at          CALB1
206108_s_at          SFRS6
204698_at            ISG20
201123_s_at          EIF5A
210143_at            ANXA10
215505_s_at          STRN3
205164_at            GCAT
37145_at             GNLY
213201_s_at          TNNT1
204015_s_at          DUSP4
202072_at            HNRNPL
203240_at            FCGBP
206858_s_at          HOXC6
205081_at            CRIP1
32128_at             CCL18
213309_at            PLCL2
204678_s_at          KCNK1
204014_at            DUSP4
204669_s_at          RNF24
214358_at            ACACA
201533_at            CTNNB1
202859_x_at          IL8
219412_at            RAB38
219082_at            AMDHD2
205910_s_at          CEL
218002_s_at          CXCL14
204044_at            QPRT
207412_x_at          CELP
218963_s_at          KRT23
216129_at            ATP9A
202410_x_at          IGF2
207457_s_at          LY6G6D
213369_at            PCDH21
207607_at            ASCL2
210881_s_at          IGF2
44790_s_at           C13orf18
218806_s_at          VAV3
219471_at            C13orf18
203896_s_at          PLCB4
205767_at            EREG
214043_at            PTPRD
216992_s_at          GRM8
208121_s_at          PTPRO
206755_at            CYP2B6
205253_at            PBX1
216255_s_at          GRM8
206696_at            GPR143
206418_at            NOX1
203698_s_at          FRZB
215657_at            SLC26A3
