[Suppl. Figure 1. Panel A: Kaplan-Meier curves, Gain vs. Normal, p = 0.0192. Panel B: Kaplan-Meier curves, Below Median vs. Above Median, p = 0.034. Y-axis: Proportion Surviving (0.0 to 1.0); x-axis: Months After Esophagectomy (0 to 72).]

[Suppl. Figures 2, 3 and 4: network diagrams, images not reproduced here.]

[Suppl. Figure 5. Panel A: representative slides. Score 0: N=25 (22.5% of patients); Score 1: N=53 (47.7% of patients); Score 2: N=23 (20.7% of patients); Score 3: N=10 (9.0% of patients). Panel B: histogram of % of Patients (0 to 60) versus Score (0 to 3).]

SUPPLEMENTARY MATERIAL

Supplementary methods:

Real-time PCR analysis. Real-time PCR analysis for CDK4 expression was performed in 108 tumors using Assays-on-Demand (AOD; Life Technologies Corp., Carlsbad, CA). A total of 88 ng of cDNA (RNA equivalents) template was used in each qPCR reaction, and assays were performed in triplicate. CDK4 expression was quantified relative to the geometric mean of five endogenous controls (HMBS, POLRA, TBP, PGK, UCBH).
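As an illustration of this normalization, and not the authors' actual pipeline, the following Python sketch computes target expression relative to the geometric mean of several control genes. It assumes perfect amplification efficiency (quantity proportional to 2^-Ct). The function name and all Ct values are hypothetical.

```python
import math
from statistics import mean

def relative_expression(target_ct, control_cts):
    """Target expression relative to the geometric mean of control genes.

    Assumes perfect doubling per cycle, so relative quantity = 2 ** (-Ct).
    target_ct holds triplicate Ct values for the target gene;
    control_cts maps each control gene to its triplicate Ct values.
    """
    target_qty = 2.0 ** (-mean(target_ct))
    control_qtys = [2.0 ** (-mean(cts)) for cts in control_cts.values()]
    # Geometric mean of the control quantities, computed in log space.
    geo_mean = math.exp(mean(math.log(q) for q in control_qtys))
    return target_qty / geo_mean

# Hypothetical triplicate Ct values; control names follow the text above.
cdk4_ct = [24.1, 24.0, 24.2]
controls = {
    "HMBS": [26.0, 26.1, 25.9],
    "POLRA": [25.0, 25.0, 25.0],
    "TBP": [27.0, 27.1, 26.9],
    "PGK": [22.0, 22.0, 22.0],
    "UCBH": [21.0, 21.0, 21.0],
}
ratio = relative_expression(cdk4_ct, controls)
```

Under the stated efficiency assumption, dividing by the geometric mean of the control quantities is equivalent to subtracting the arithmetic mean of the control Ct values from the target Ct.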

Network-based analysis. Genes that were differentially expressed between CDK6-positive and CDK6-negative samples with a p-value < 0.05 and fold-change > 0.5 were identified using the limma package from Bioconductor (1). Network-based analysis was performed using the functional interaction (FI) network (2). Briefly, this network consists of 10,956 proteins and more than 200,000 curated functional interactions. We calculated pairwise shortest paths among genes of interest in the FI network, hierarchically clustered them using the average linkage method, and then selected a cluster containing more than 80% altered genes. To calculate a p-value for the average shortest path, we performed a 1000-fold permutation test by randomly selecting the same number of genes from the largest connected network component. A minimum spanning tree algorithm was used to find linkers that connect all genes of interest in one subnetwork (3). For network clustering we used the Markov Cluster Algorithm (MCL) (4) with an inflation parameter of 1.8. Only clusters with 10 or more genes were taken into account. We used the hypergeometric test to evaluate whether up- (or down-) regulated genes were represented more than expected by chance within each cluster. All network diagrams were drawn with Cytoscape (5). The functional enrichment analysis for pathways was based on the binomial test. The false discovery rate was calculated based on 1000 permutations on all genes in the FI network.

siRNA sequences:

CDK6, si1: Sense - GUUUGUAACAGAUAUCGAUtt, Antisense - AUCGAUAUCUGUUACAAACtt

CDK6, si2: Sense - GCAGAAAUGUUUCGUAGAAtt, Antisense - UUCUACGAAACAUUUCUGCaa

CDK4, si1: Sense - UGCUGACUUUUAACCCACAtt, Antisense - UGUGGGUUAAAAGUCAGCAtt

CDK4, si2: Sense - CACCCGUGGUUGUUACACUtt, Antisense - AGUGUAACAACCACGGGUGta

Construction of Tissue Microarray (TMA): TMAs containing 38 cases of Barrett’s esophagus (BE), 81 cases of columnar cell metaplasia (CCM), 86 cases of squamous epithelium (SE), 18 cases of low grade dysplasia (LGD), 15 cases of high grade dysplasia (HGD), and 116 cases of EAC were constructed from representative areas of formalin-fixed specimens collected between 1997 and 2005 in the Department of Pathology and Laboratory Medicine, University of Rochester Medical Center/Strong Memorial Hospital, Rochester, New York. Five-micron sections were cut from the tissue microarrays and stained with H&E to confirm the presence of the expected tissue histology within each tissue core. Additional sections were cut for IHC analysis.

Patients for Tissue Microarrays: All 116 patients with esophageal adenocarcinoma used for the tissue microarray construction were treated with esophagectomy at Strong Memorial Hospital/University of Rochester between 1997 and 2005. These patients included 104 males (89.6%) and 12 females (10.4%). Patient age ranged from 34 to 85 years, with a mean of 65 years.

Immunohistochemistry. Tissue sections were deparaffinized, rehydrated and incubated with anti-CDK6 antibody (Santa Cruz Biotechnology, Santa Cruz, CA). Sections were then incubated with the secondary antibody (Flex mouse-link, Dako North America, Inc., Carpinteria, CA), followed by Flex-HRP and then Flex DAB chromogen, and counterstained with hematoxylin. The intensity of the CDK6 immunostaining was scored as follows: score 0, no stain; score 1, weak stain in cytoplasm and/or nucleus (>30% of cells); score 2, moderate stain in cytoplasm and/or nucleus (>30% of cells); and score 3, strong stain in cytoplasm and/or nucleus (>30% of cells).

Anchorage-independent growth assay. For anchorage independence, two layers of agarose-containing media were plated in 24-well plates. The first layer (300 μl) was McCoy growth medium (Invitrogen) containing 0.8% sea plaque agarose (a gift from Dr. Andrew Bateman, McGill University, Montreal, Canada) and 5% FBS. The second layer contained 3000 cells in McCoy medium containing 0.4% agarose and 7.5% FBS. Colonies were left to grow in 5% CO2 at 37ºC in a humidified incubator for about three weeks. Colonies were counted using an inverted microscope at 4X magnification.

Supplementary Table 1: Clinical characteristics of esophageal adenocarcinoma patients.

Variable                No. of patients

Sex
  Male                  95
  Female                21

N-Stage
  N0                    48
  N1                    65
  NX                    3

T-Stage
  T1                    36
  T2                    17
  T3                    59
  T4                    2
  TX                    2

Overall Stage
  I                     28
  II                    31
  III                   49
  IV                    7
  Unknown               1

Follow Up (months)
  Median                26.8
  Range                 2.3-76.5

Supplementary Table 2:

Pathway annotation for the two major network clusters identified by the MCL algorithm. P-values in the table are calculated based on the binomial test, and FDR values are based on 1000 permutation tests. Only outputs with p-value < 0.05 and FDR < 0.05 are listed. The letter in brackets after each pathway marks the data resource where the pathway was curated: “R” for Reactome, “K” for KEGG, “B” for BioCarta, “P” for Panther, and “N” for NCI-Nature.

Cluster | Pathway | P-Value | FDR | IDs
2 | TCR signaling (R) | 3.42E-14 | <1.000e-03 | [HLA-DQB1, ITK, HLA-DRB1, HLA-DMA, HLA-DQA1, TRAC, HLA-DPA1, INPP5D, HLA-DRA]
2 | Antigen processing and presentation (K) | 1.07E-09 | <5.000e-04 | [HLA-DQB1, HLA-DRB1, CTSS, HLA-DQA1, HLA-DRB4, HLA-DPA1, HLA-DRA]
2 | Cell adhesion molecules (K) | 8.17E-07 | 3.33E-04 | [HLA-DQB1, HLA-DRB1, HLA-DQA1, HLA-DRB4, HLA-DPA1, HLA-DRA]
2 | LCK and FYN tyrosine kinases in initiation of TCR activation (B) | 6.43E-06 | 2.50E-04 | [HLA-DRB1, TRAC, HLA-DRA]
2 | T cell activation (P) | 9.14E-06 | 2.00E-04 | [TRA@, HLA-DMA, HLA-DQA1, HLA-DPA1, HLA-DRA]
2 | IL 4 signaling pathway (B) | 1.25E-05 | 1.67E-04 | [HLA-DRB1, IL2RG, HLA-DRA]
2 | IL12-mediated signaling events (N) | 1.58E-05 | 2.86E-04 | [HLA-DRB1, TRAC, IL2RG, JAK2, HLA-DRA]
2 | The co-stimulatory signal during T-cell activation (B) | 3.39E-05 | 3.75E-04 | [HLA-DRB1, TRAC, HLA-DRA]
2 | TCR signaling in naive CD4+ T cells (N) | 3.58E-05 | 3.33E-04 | [ITK, HLA-DRB1, TRAC, INPP5D, HLA-DRA]
2 | Role of mef2d in T-cell (B) | 6.37E-05 | 6.00E-04 | [HLA-DRB1, TRAC, HLA-DRA]
2 | Activation of csk by camp-dependent kinase inhibits signaling through the T cell receptor (B) | 2.61E-04 | 2.82E-03 | [HLA-DRB1, TRAC, HLA-DRA]
2 | T cell receptor signaling pathway (B) | 4.59E-04 | 5.50E-03 | [HLA-DRB1, TRAC, HLA-DRA]
2 | IL4-mediated signaling events (N) | 7.71E-04 | 1.05E-02 | [IL2RG, JAK2, INPP5D]
2 | Jak-STAT signaling pathway (K) | 3.07E-03 | 3.50E-02 | [IL7R, IL2RG, JAK2]
2 | IL12 signaling mediated by STAT4 (N) | 3.66E-03 | 3.66E-02 | [HLA-DRB1, HLA-DRA]
2 | EPO signaling pathway (N) | 4.40E-03 | 3.81E-02 | [JAK2, INPP5D]
3 | Receptor-ligand complexes bind G proteins (R) | 3.67E-08 | <1.000e-03 | [GNGT1, ADRB1, CCR6, GNB2, CXCL9, CXCL11]
3 | Class A/1 (Rhodopsin-like receptors) (R) | 1.23E-04 | 9.50E-03 | [ADRB1, CCR6, CXCL9, CXCL11]
3 | Cytokine-cytokine receptor interaction (K) | 3.55E-04 | 1.80E-02 | [CCR6, CXCL13, CXCL9, CXCL11]
3 | Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway (P) | 8.50E-04 | 2.68E-02 | [GNGT1, GNB2, RGS5]
3 | 5HT2 type receptor mediated signaling pathway (P) | 9.32E-04 | 2.30E-02 | [GNGT1, GNB2]
3 | Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway (P) | 1.95E-03 | 3.33E-02 | [GNGT1, ADRB1, RGS5]

Supplementary results:

Network-analysis of CDK6 amplified versus non-amplified tumors.

While the association of amplification and overexpression of CDK6 with poor patient outcome seems to be readily explained by the important role of CDK6 in regulating the cell cycle, a network-based analysis of genes that are differentially expressed between CDK6-amplified and non-amplified tumors raises another intriguing possibility. It is well known that solid human tumors are often infiltrated by lymphocytes (tumor-infiltrating lymphocytes [TILs]), and T cells have been shown to be mediators of anti-tumor immunity (6). The presence of TILs in esophageal carcinoma is often correlated with improved patient survival (7). In our analysis we found one cluster of genes associated with T-cell signaling and another cluster associated with chemokines and chemokine receptors, several of which are involved in the attraction of T and B lymphocytes. Thus it is possible that tumors with increased CDK6 copy number produce fewer T-cell chemoattractants and have fewer TILs, and that these patients consequently have worse survival. The nature of this observation and its correlation with CDK6 amplification are not clear and need to be experimentally verified. Although there is some evidence that chemokine signaling can influence CDK6 activity via pRB (8), we are not aware of any data suggesting that CDK6 activity can influence chemokine expression.

We identified 213 genes (including CDK6) that were differentially expressed between CDK6-amplified and CDK6-non-amplified tumor samples. Of these, 111 (52.11%) were in the FI network, and hierarchical clustering reduced this to a set of 88 of the most interconnected candidates. This set represents a putative CDK6 amplification-associated EAC cancer gene set and was used for further analyses. The average shortest distance calculation showed that genes in this set are linked together much more tightly than would be expected by chance alone (p<0.001), indicating that these differentially expressed genes occupy a small corner of the large FI network space.
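The permutation test behind this average-shortest-distance p-value can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the network is treated as an unweighted, connected graph stored as an adjacency dictionary, the add-one correction in the p-value is our own choice, and the function names and toy ring network are hypothetical.

```python
import random
from collections import deque
from itertools import combinations

def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_shortest_path(graph, genes):
    """Mean pairwise shortest-path length among the given genes."""
    genes = list(genes)
    dist = {g: bfs_distances(graph, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)

def permutation_p_value(graph, genes, n_perm=1000, seed=0):
    """Share of random gene sets that are at least as tightly linked as `genes`."""
    rng = random.Random(seed)
    observed = average_shortest_path(graph, genes)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(genes))
        if average_shortest_path(graph, sample) <= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy network (hypothetical): a ring of 40 nodes, each linked to its two neighbours.
ring = {i: {(i - 1) % 40, (i + 1) % 40} for i in range(40)}
genes_of_interest = [0, 1, 2, 3]  # four adjacent nodes
observed, p_value = permutation_p_value(ring, genes_of_interest, n_perm=200)
```

In the real analysis the random gene sets would be drawn from the largest connected component of the FI network, which is why the sketch assumes that every pair of nodes is reachable.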

From these 88 genes, a sub-network was built by adding the minimum number of linker genes required to form a fully connected sub-network. This CDK6-associated EAC network thus consists of 134 genes, 46 of which are linkers. We then used network community analysis to automatically identify network clusters that contain genes and gene products involved in common cellular processes. The Markov Cluster algorithm identified 3 clusters consisting of more than 10 genes (Suppl. Fig. 2). Cluster 1 comprises 35 genes but is not significantly associated with any specific cellular process or pathway. Cluster 2 comprises 22 genes (18 CDK6-associated genes and 4 linkers), and all 18 CDK6-associated genes are down-regulated in CDK6-amplified tumors (significant enrichment with p = 1.31E-5 from the hypergeometric test) (Suppl. Fig. 3). Functional annotation of cluster 2 shows that it is significantly related to T-cell receptor signaling, antigen processing and presentation, and T-cell activation (Suppl. Table 2), all known characteristics of T-cell function. Cluster 3 is mainly enriched in G-protein-coupled receptor interactions (Suppl. Table 2) and notably contains several down-regulated chemokines (CXCL9, CXCL13 and CXCL11) and a chemokine receptor (CCR6) (Suppl. Fig. 4). Interestingly, CXCL9 and CXCL11 are T-cell chemoattractants, while CXCL13 is a B-cell chemoattractant. Thus, clusters 2 and 3 together suggest down-regulation of T-cell function in CDK6-amplified tumors, possibly as a consequence of down-regulated chemoattractant production.
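The hypergeometric enrichment test applied to each cluster can be sketched as below. This is an illustrative one-sided test written with the Python standard library only. The text does not state how many of the 88 genes were down-regulated overall, so the value of 40 used in the example is a hypothetical placeholder and the result is not expected to match the reported p = 1.31E-5.

```python
from math import comb

def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: all genes considered (e.g. the gene set entering clustering)
    successes:  genes in the population that are down-regulated
    draws:      genes in the cluster of interest
    observed:   down-regulated genes found in that cluster
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical example: 88 genes in total, 40 of them down-regulated
# (placeholder value), and a cluster of 18 genes that are all down-regulated.
p_example = hypergeom_enrichment_p(88, 40, 18, 18)
```

The same function covers the up-regulated case by passing the number of up-regulated genes as `successes`.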

Genes in the 12q13 amplicon.

12q13 amplification comprises 102 genes: OR6C75, OR6C65, OR6C76, OR6C2, OR6C70, OR6C68, OR6C4, OR10P1, METTL7B, ITGA7, BLOC1S1, RDH5, CD63, GDF11, SARNP, ORMDL2, DNAJC14, MMP19, WIBG, DGKA, SILV, CDK2, RAB5B, SUOX, IKZF4, RPS26, ERBB3, PA2G4, RPL41, ZC3H10, ESYT1, MYL6B, MYL6, SMARCC2, RNF41, OBFC2B, SLC39A5, ANKRD52, COQ10A, CS, CNPY2, PAN2, IL23A, STAT2, APOF, TIMELESS, MIP, SPRYD4, GLS2, RBMS2, BAZ2A, ATP5B, SNORD59B, SNORD59A, PTGES3, NACA, PRIM1, HSD17B6, SDR9C7, RDH16, GPR182, ZBTB39, TAC3, MYO1A, TMEM194A, NAB2, STAT6, LRP1, MIR1228, NXPH4, SHMT2, NDUFA4L2, STAC3, R3HDM2, INHBC, INHBE, GLI1, ARHGAP9, MARS, DDIT3, MBD6, DCTN2, KIF5A, PIP4K2C, DTX3, GEFT, SLC26A10, B4GALNT1, OS9, LOC100130776, AGAP2, TSPAN31, CDK4, MARCH9, CYP27B1, METTL1, FAM119B, TSFM, AVIL, MIR26A2, CTDSP2, XRCC6BP1

Immunohistochemistry of EAC tissue microarrays.

Given that CDK6 is a potential driver of the 7q21 amplicon, we analyzed CDK6 protein expression in tissue microarrays from EAC tumors using immunohistochemistry. In an independent cohort of 111 EAC tumor specimens, CDK6 protein showed both cytoplasmic and nuclear immunostaining in a subset of tumors (Supplementary Figure 5). The combined frequency of the highest signals (scores 2 and 3) was 30% (33/111), which is comparable to the frequency of genetic amplification (35%) observed in our array data.
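The frequencies quoted here follow directly from the per-score patient counts given in Supplementary Figure 5. The short Python check below recomputes them; the variable names are ours.

```python
# Patient counts per CDK6 staining score, as listed in Supplementary Figure 5.
score_counts = {0: 25, 1: 53, 2: 23, 3: 10}

total = sum(score_counts.values())
percent = {s: round(100 * n / total, 1) for s, n in score_counts.items()}

# Combined frequency of the two highest scores (2 and 3).
high = score_counts[2] + score_counts[3]
high_percent = round(100 * high / total, 1)
```

The combined value of 33/111 is 29.7%, which the text rounds to 30%.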

Supplementary figure legends:

Suppl. Figure 1: Kaplan-Meier survival curves showing the correlation of (A) 12q13 amplification and (B) CDK4 expression (median split) with overall survival. Expression of CDK4 was measured by quantitative real-time PCR (see supplementary methods above).

Suppl. Figure 2: Sub-network of CDK6-specific EAC. MCL network clustering results for the genes differentially expressed between CDK6-positive and CDK6-negative tumor samples. Gene nodes in the three biggest clusters are displayed in different colors. Linker genes used to connect cancer genes are shown as triangles. Up-regulated genes are shown with a red border, down-regulated genes with a blue border.

Suppl. Figure 3: Diagram of cluster 2. Linker genes are shown as white triangles. All proteins besides linkers in this cluster were down-regulated. Labeled proteins belong to the “TCR signaling” pathway curated by Reactome.

Suppl. Figure 4: Diagram of cluster 3. Linker genes are shown as white triangles. This cluster consists of three G-proteins, three GPCRs (two of which are chemokine receptors) and three chemokine ligands. CXCL9 and CXCL11 are involved in T cell trafficking and activation [Ref], and CXCL13 preferentially promotes the migration of B lymphocytes [PubMed].

Suppl. Figure 5: A. Representative slides of immunohistochemical detection of CDK6 on tissue microarrays from an independent cohort of 111 tumors. CDK6 shows both nuclear and cytoplasmic staining in positive tumors. Scoring (0 to 3) is based on CDK6 signal intensity. The number (N) and percent of patients corresponding to each score are shown below the slides. B. A histogram showing the percent of patients versus CDK6 staining intensity (score).

Supplementary references:

1. Smyth GK (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3:Article 3.
2. Wu G, Feng X, & Stein L (2010) A human functional protein interaction network and its application to cancer data analysis. Genome Biol 11(5):R53.
3. Gross J & Yellen J (1998) Graph Theory and Its Applications (CRC Press).
4. Enright AJ, Van Dongen S, & Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30(7):1575-1584.
5. Shannon P, et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498-2504.
6. Toes RE, Ossendorp F, Offringa R, & Melief CJ (1999) CD4 T cells and their role in antitumor immune responses. J Exp Med 189(5):753-756.
7. Cho Y, et al. (2003) CD4+ and CD8+ T cells cooperate to improve prognosis of patients with esophageal squamous cell carcinoma. Cancer Res 63(7):1555-1559.
8. Khan MZ, et al. (2008) The chemokine CXCL12 promotes survival of postmitotic neurons by regulating Rb protein. Cell Death Differ 15(10):1663-1672.