<<

Supporting Information

Bloushtain-Qimron et al. 10.1073/pnas.0805206105
www.pnas.org/cgi/content/short/0805206105

SI Text

Cell Culture. For 2D culture, cells were plated at a concentration of 5,000–10,000 cells per well in 48-well plates or eight-chamber glass slides coated with collagen or an irradiated 3T3 feeder layer. Cells were plated in medium containing 2% Matrigel. Growth media used: Medium 1: MEBM plus supplements: B27, 20 ng/ml EGF, 20 ng/ml bFGF, and 4 μg/ml heparin (bovine pituitary extract was excluded); Medium 2: DMEM/F12 plus 1 mM glutamine, 5 μg/ml insulin, 500 ng/ml hydrocortisone, 10 ng/ml EGF, 20 ng/ml cholera toxin, and 0.1% BSA; Medium M3: equal mix of MEBM supplemented with 5.0 μg/ml insulin, 70.0 μg/ml bovine pituitary extract, 0.5 μg/ml hydrocortisone, 5.0 ng/ml EGF, 5.0 μg/ml transferrin, 10 nM isoproterenol, 2.0 mM glutamine, and DMEM/F12 supplemented with 10 μg/ml insulin, 10 nM tri-iodothyronine, 1.0 nM 17β-estradiol, 0.1 μg/ml hydrocortisone, 5 ng/ml EGF, 2 mM glutamine, 0.5% FCS and 4 ml of AlbuMAX (5% solution in water) with oxytocin added to 0.1 nM final concentration.

Immunofluorescence. For immunofluorescence, cells were fixed in 4% formaldehyde in PBS for 20 min at room temperature (RT) and permeabilized in 0.1% Triton X-100 for 5 min. Nonspecific binding was blocked with 10% goat serum for 30 min, followed by incubation with primary antibody for 1 h and then with secondary antibody for 45 min at RT. To visualize nuclei, DAPI was included in the secondary antibody incubation step at 0.2 μg/ml final concentration.

Analysis of SAGE and MSDK Data. To identify differentially methylated tags, MSDK libraries were merged and differentially methylated tags were determined based on Poisson analysis (http://genome.dfci.harvard.edu/sager/, P < 0.05). MSDK tags were mapped to the genome (National Center for Biotechnology Information build 36) and the genes nearest to each tag-associated AscI site were collected. Ratios were drawn across the genome, and where two or more ratios were involved a line was drawn between them to denote that they are linked to the same AscI site. Ratio points for genes of interest were circled and the genes labeled. To calculate an arbitrary hypomethylation score for each MSDK library, every AscI site on the genome (National Center for Biotechnology Information build 36) was assessed for its methylation status based on MSDK tag counts. One AscI site can give rise to up to two MSDK tags. When either one of these tags has a count greater than or equal to 2, the site is marked as hypomethylated. A total of 6,079 MSDK tags were mapped to 3,517 AscI sites within all MSDK libraries generated. The hypomethylation score for each library is then calculated as the sum of all hypomethylated AscI sites in that library. MSDK tags that are significantly hypomethylated in one MSDK library compared with another were collected based on Poisson analysis (http://genome.dfci.harvard.edu/sager/, P < 0.05) and normalized ratios (cutoff at 2-fold difference). These tags were mapped to unique AscI sites on the genome. Hypomethylated genes were then inferred as the genes surrounding the hypomethylated AscI sites. To find genes enriched in the hypomethylated subgroup, Fisher exact tests were performed on ontology terms (using data from CGAP, ftp://ftp1.nci.nih.gov/pub/CGAP/Hs_GeneData.dat) present in the hypomethylated group versus those present in the background list, which consists of the genes surrounding all uniquely mapped AscI sites on the genome. SAGE library analysis was performed similarly but by matching the tags to transcripts. Human genome information (March 2006, hg18) was downloaded from the UCSC site (http://hgdownload.cse.ucsc.edu/downloads.html#human) and used to map all mRNAs immediately close to AscI sites. This mapping was further modified using a proximal promoter region of −5 kb to −80 kb upstream of the transcriptional start site. When an AscI site falls inside a proximal promoter or inside a gene, any additional relations to other genes not meeting the same criteria are rejected. This consistently brings the number of mapped genes down from 6,000 to 4,300. A histogram of the distances of AscI sites in relation to the transcriptional start sites was made using the −20 kb proximal promoter setting. The result is arbitrarily divided into four groups representing three peak AscI location groups (blue, red, green) and a remaining distal AscI group (yellow). A pie chart is drawn to show the percentage composition of each group.

Suz12 Target Enrichment. After merging and noise removal, SAGE libraries (CD44, CD24, CD10, MUC1, in pairs) were subjected to Poisson analysis (http://genome.dfci.harvard.edu/sager/). SAGE libraries were also normalized and the ratio between CD44+ cells and each other group was calculated for each tag. Tags were mapped to best genes using the CGAP data file (ftp://ftp1.nci.nih.gov/pub/SAGE/HUMAN/) and the gene list was merged based on the lower P value from the Poisson analysis. Differentially expressed genes were ordered evenly as a spectrum going from red (down-regulated genes) to black (no-change genes) and to green (up-regulated genes). Yellow bars were used to mark the 2-fold threshold. Suz12 target genes were marked with black bars and their densities (sum of target gene numbers in a window of 2% of the total gene number) were appended. Fisher exact tests were performed starting from the two ends using a starting window with genes above the threshold, then moving toward the midpoint using windows of the same size, testing the enrichment of Suz12 targets inside the windows.

Pathway Map and Network Analyses Using METACORE. For enrichment analysis of gene lists in functional categories, lists of differentially methylated or expressed genes were analyzed for relative enrichment in certain functional ontology categories in Metacore, including GO and GeneGo cellular processes, and canonical pathway maps. P values were calculated using a basic formula for the hypergeometric distribution, where the P value essentially represents the probability of a particular mapping arising by chance, given the number of genes in the set of all genes on maps/networks/processes, the genes on a particular map/network/process, and the genes in the experiment. For network visualization and analysis, sets of genes differentially methylated or expressed in the four normal mammary epithelial cell types were uploaded into the Metacore analytical suite version 4.2 (GeneGo, Inc., St. Joseph, MI).

Immunohistochemistry. For double staining, formalin-fixed, paraffin-embedded 5-μm sections were mounted on glass slides, deparaffinized, and rehydrated through graded alcohols. For antigen retrieval, sections were immersed in 10 mM citrate buffer (pH 6.0) and heated in a pressure cooker (120°C for 5 min, followed by 90°C for 10 seconds). Slides were blocked with hydrogen peroxide, avidin, biotin, and serum, and incubated with anti-HoxA11 antibody at 1:200 overnight at 4°C. Slides were incubated with biotinylated secondary antibody followed by streptavidin-peroxidase complex (Biogenex, San Ramon, CA) and 3,3′-diaminobenzidine as a substrate. Blocking steps were repeated and slides were incubated with anti-CD44v6 antibody (clone VFF-18, Millipore) at 1:400 for 1 h at room temperature. Detection with secondary antibody was repeated and slides were incubated with VIP substrate (Vector Labs, Burlingame, CA). Counterstaining was performed with Methyl green.

Statistical Analyses. To determine the association of the methylation status of the three genes (PACAP, SLC3A9R1, and FOXC1) with tumor ER/PR/HER2 and BRCA1/2 characteristics, and of two additional genes (LHX1 and HOXA10) with tumor BRCA1/2 characteristics, Kruskal-Wallis tests were used to evaluate whether distributions of gene MSP methylation values were different among different tumor ER/PR/HER2 categories. Wilcoxon rank sum tests were used to evaluate whether distributions of gene MSP methylation values were different between BRCA1 and BRCA2. Each of the three genes was tested separately.

Association with Clinical Outcome Analysis. Expression data on frozen primary tumor samples and follow-up clinical data about the patients involved were downloaded for Data Set 1 (from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2034) and Data Set 2 (from http://www.rii.com/publications/2002/nejm.html and http://microarray-pubs.stanford.edu/wound_NKI/explore.html). The clinical data table for Data Set 1 listed sample 625 as having a relapse, but this was changed to relapse-free because all information on the National Center for Biotechnology Information database indicates that this patient is relapse-free. Only lymph node-negative tumors of patients who had not undergone chemotherapy or hormonal treatment were included in our analyses since Data Set 1 only contained patients of this type. Information about the microarray probes downloaded with the data was used to map them to official National Center for Biotechnology Information gene symbols. Data were log2-transformed, any flagged expression values were removed, and remaining values were median-centered by array. Microarray probes in the two datasets corresponding to genes identified as differentially expressed and methylated between CD44+ cells and other normal cell types in this study (CD24+, MUC1+, and CD10+ cells) were identified. Data for these probes were median-centered by gene. Then, only genes with probes with expression values >(mean + standard deviation) or <(mean − standard deviation) in at least 10% of each dataset were kept for further analyses. All five remaining genes were hypomethylated in CD44+ cells, and the mRNA of one (PACAP) was low in these cells, while the mRNA of the other four (FOXC1, HOXA10, LHX1, and SOX13) was high. Logrank tests and Kaplan–Meier analysis were performed using GraphPad Prism 4 (GraphPad Software, 2005). Heat maps were created using MapleTree (developed by L. Simirenko).
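The hypomethylation score defined under Analysis of SAGE and MSDK Data can be illustrated with a short Python sketch. It is not the authors' code: the function names, the dictionary-based inputs, and the example tag and site identifiers are all hypothetical, and only the rule itself (a site counts as hypomethylated when either of its tags has a count of at least 2, and the score is the number of such sites) comes from the text.

```python
# Threshold stated in the text: a tag count >= 2 marks the AscI site as hypomethylated.
HYPOMETHYLATION_MIN_COUNT = 2


def hypomethylated_sites(tag_counts, tag_to_site, min_count=HYPOMETHYLATION_MIN_COUNT):
    """Return the set of AscI sites with at least one tag at or above min_count.

    tag_counts: dict mapping MSDK tag -> count in one library (hypothetical input layout)
    tag_to_site: dict mapping MSDK tag -> AscI site ID (up to two tags per site)
    """
    sites = set()
    for tag, count in tag_counts.items():
        site = tag_to_site.get(tag)  # tags without a mapped AscI site are ignored
        if site is not None and count >= min_count:
            sites.add(site)
    return sites


def hypomethylation_score(tag_counts, tag_to_site):
    """Score of one library = number of hypomethylated AscI sites."""
    return len(hypomethylated_sites(tag_counts, tag_to_site))


# Hypothetical example: site1 has two tags, site2 and site3 have one each.
example_tag_to_site = {"tagA": "site1", "tagB": "site1", "tagC": "site2", "tagD": "site3"}
example_counts = {"tagA": 0, "tagB": 3, "tagC": 1, "tagD": 2, "tagE": 9}
example_score = hypomethylation_score(example_counts, example_tag_to_site)
```

In this toy input, site1 qualifies through tagB and site3 through tagD, while tagE is skipped because it has no mapped AscI site. Using a set keeps a site from being counted twice when both of its tags pass the threshold.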

1. Carroll JS, et al. (2005) Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell 122:33–43.
2. Ling JQ, et al. (2006) CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science 312:269–272.

Fig. S1. Cell purification and analysis. (A) Summary of multicolor FACS analyses to determine the fraction of cells single or double positive for each of the four markers analyzed. The majority (45–65%) of mammary epithelial cells were CD24+ and a significant fraction (≈30%) of these were also MUC1+, whereas only 2–3% of CD24+ cells were positive for CD44 and CD10. The CD10+ cell population comprised 11–14% of total epithelial cells and approximately 20% of these were also CD44+. The CD44+ cell fraction demonstrated significant overlap with each of the other three cell populations, but at least approximately 40% of CD44+ cells were negative for both luminal and myoepithelial lineage markers. (B) Schematic outline of tissue fractionation and enrichment procedure for the various cell types. Cells are captured using the indicated antibody-coupled magnetic beads. * denotes that in some experiments the order of capturing with MUC1 and CD24 beads was reversed. Detailed experimental procedure is available from the authors upon request. (C) Semiquantitative RT-PCR analysis of enriched cell fractions. mRNAs prepared from the indicated cell fractions were tested for the expression of known luminal (Lum), myoepithelial (Myoep), and progenitor (Prog) cell-specific genes. PPIA (Peptidyl-Prolyl Isomerase A) was used as loading control. Triangles indicate increasing number of PCR cycles (25, 30, 35). (D) Analysis of human embryonic stem cells. Semiquantitative RT-PCR analysis of undifferentiated (ES-UD) and BMP4-treated (ES-D) human embryonic stem cells used for MSDK analysis using stem cell (Stem) and trophoblast-specific (Troph) markers. Triangles indicate increasing number of PCR cycles (25, 30, 35).

Fig. S2. MSDK analysis. Distribution of AscI sites relative to transcription start sites of genes. To determine the genes analyzed by our MSDK approach, we identified all genes near AscI sites. This revealed that AscI positions in relation to genes are not random, as nearly half of the AscI sites fall inside a region that is −1,000 bp to +4,000 bp from a predicted transcriptional start site. Of the estimated 4,300 genes found near AscI sites, more than 3,600 genes (83%) have an AscI site located within the 20,000 bp promoter region or inside the gene. Thus, although not evaluating the entire genome, our method still provides a fairly comprehensive and unbiased DNA methylation analysis. For some genes the distance between the proximal promoter and the AscI site is large, but numerous data from several labs have demonstrated the existence of long-range regulatory elements that can function as enhancers (1) or as regulators of imprinting (2). All mappings are based on the March 2006 hg18 freeze from UCSC, and a proximal promoter of −20 kb was used to map AscI-related genes. (A) Histogram of AscI site frequencies found at different locations from transcription start sites. These are arbitrarily divided into Group 1 (blue): −256 to −2 kb, 1,874 genes; Group 2 (red): −1 to +4 kb, 2,341 genes; Group 3 (green): +8 to +256 kb, 616 genes; Group 4 (yellow): all other positions, 228 genes. (B) Pie chart showing the percentage composition of each group described in (A).
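As a small worked example, the percentage composition shown in the pie chart of panel (B) can be recomputed from the four group counts given in the legend. The Python sketch below assumes the pie chart is simply each group's share of the four counts combined (they sum to 5,059); the dictionary keys and function name are illustrative.

```python
# Gene counts per AscI location group, copied from the Fig. S2 legend.
GROUP_COUNTS = {
    "Group 1 (blue)": 1874,
    "Group 2 (red)": 2341,
    "Group 3 (green)": 616,
    "Group 4 (yellow)": 228,
}


def group_percentages(counts):
    """Return each group's share of the combined count, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = group_percentages(GROUP_COUNTS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

Under this assumption Group 2 accounts for about 46% of the total, which agrees with the legend's statement that nearly half of the AscI sites lie between −1,000 bp and +4,000 bp of a transcriptional start site.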

Fig. S3. MSDK data analysis. Graphs depicting the ratio and chromosomal location of tags statistically significantly (P < 0.05) differentially present in the indicated MSDK libraries. Graphs include only tags with a unique match to the genome. Dots corresponding to genes selected for further validation are circled in red. The y-axis represents the ratio of normalized tags from the indicated libraries in the various comparisons. A higher ratio reflects hypomethylation. CD44+/All indicates the comparison of CD44+ cells against all other normal mammary epithelial cells (CD10+, CD24+, and MUC1+ cells). In multi-library comparisons, dots with different colors correspond to the ratio of tags in the different pairwise comparisons as indicated in the insets.

Fig. S4. Confirmation of MSDK data. Sequencing (A) and qMSP (B) results for selected genes using bisulfite-treated DNA from CD44+ (red), CD10+ (green), MUC1+ (yellow), and CD24+ (blue) cells. Methylation sites (CpG) are represented as circles, CpGs located in the AscI sites are marked with red rectangles, and numbers indicate their distance from the transcriptional start site. The methylation percentage of each site (1–100%) is depicted by shading intensity. Red bars mark the location of the AscI sites; predicted exons and direction of transcription are indicated with blue rectangles and arrowheads, respectively. Numbers indicate genomic position and green rectangles correspond to CpG islands. In the graph depicting the qMSP results, each column corresponds to cells purified from a different individual. Relative methylation levels normalized to ACTB (β-actin) are indicated on the y-axis. P values indicate the statistical significance of the observed differences in DNA methylation levels between the two cell types or, in the case of SCGB3A1, between the groups of basal (CD44+ and CD10+) and luminal (MUC1+ and CD24+) cells.

Fig. S5. Methylation of the indicated genes in CD24+ and CD44+ cells isolated from tumors and normal breast tissue. qMSP analysis of the methylation of the indicated genes in CD44+ and CD24+ cells isolated from breast tumors and normal breast. The y-axis indicates the ln ratio of qMSP values in CD24+ and CD44+ cells. PE, pleural effusion; IDC, invasive ductal carcinoma. Colors indicate different samples.

Fig. S6. Tumor subtype-specific DNA methylation patterns and their clinical relevance. (A) Comparison of clinical outcome for patients with "CD24+ cell-like" and "CD44+ cell-like" expression patterns of the indicated genes, found to be differentially methylated between the two cell types. Tumors with low (≤ 50th percentile) expression of PACAP and high (≥ 75th percentile) average expression of all other listed genes were called CD44+ cell-like, whereas tumors with high (≥ 50th percentile) expression of PACAP and low (≤ 25th percentile) expression of all other listed genes were called CD24+ cell-like. The heat maps show the expression of the listed genes in each tumor in Datasets 1 and 2, with tumors arranged in columns in order according to first PACAP expression value and then average expression value of the other genes. Kaplan–Meier curves and logrank test P values show that patients with CD44+ cell-like tumors in both datasets have statistically significantly (P < 0.05) shorter distant metastasis-free survival times than patients with CD24+ cell-like tumors. (B) qMSP analysis of the indicated genes in matched primary tumors (marked with star and in red) and distant metastases (DM) collected from different organs. Numbers represent independent patients.
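The percentile rule in panel (A) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the percentile cutoffs are taken within one dataset using the "inclusive" interpolation of `statistics.quantiles`, the average expression of the other genes is used for both classes (the legend says "average" only for the CD44+ cell-like rule), and the function names and the "unclassified" label are hypothetical.

```python
import statistics


def quartile_cutoffs(values):
    """Return the 25th, 50th and 75th percentiles (inclusive interpolation)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def classify_tumors(pacap, others_mean):
    """Label each tumor following the percentile rule in the legend.

    pacap: dict tumor ID -> PACAP expression value
    others_mean: dict tumor ID -> average expression of the other listed genes
    """
    _, pacap_median, _ = quartile_cutoffs(list(pacap.values()))
    low_cut, _, high_cut = quartile_cutoffs(list(others_mean.values()))
    labels = {}
    for tumor, p in pacap.items():
        o = others_mean[tumor]
        if p <= pacap_median and o >= high_cut:
            labels[tumor] = "CD44+ cell-like"
        elif p >= pacap_median and o <= low_cut:
            labels[tumor] = "CD24+ cell-like"
        else:
            labels[tumor] = "unclassified"  # hypothetical label for all other tumors
    return labels


# Hypothetical median-centered expression values for five tumors.
example_pacap = {"t1": -2.0, "t2": -1.0, "t3": 0.0, "t4": 1.0, "t5": 2.0}
example_others = {"t1": 2.0, "t2": 0.5, "t3": 0.0, "t4": -0.5, "t5": -2.0}
example_labels = classify_tumors(example_pacap, example_others)
```

Tumors that satisfy neither rule are left out of both groups here; the legend does not say how such tumors were handled, so that choice is an assumption of the sketch.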

Fig. S7. The highest scored canonical pathway map "Chemokines and regulation" in the four normal mammary epithelial cell types. Small thermometers indicate MSDK (red) and SAGE (blue) data. The height of the bars is proportional to tag counts. Numbers indicate samples as follows: 1,5, CD10+; 2,6, CD24+; 3,7, CD44+; 4,8, MUC1+. The complex object "integrins" on the map corresponds to ITGB5 for MSDK and INTB2 for SAGE data, respectively. Symbols correspond to different protein classes (see Fig. S13 for detailed description).

Fig. S8. The highest scored canonical pathway map "cytoskeleton remodeling" in the four normal mammary epithelial cell types. Small thermometers indicate MSDK (red) and SAGE (blue) data. The height of the bars is proportional to tag counts. Numbers indicate samples as follows: 1,5, CD10+; 2,6, CD24+; 3,7, CD44+; 4,8, MUC1+. The complex object "integrins" on the map corresponds to ITGB5 for MSDK and INTB2 for SAGE data, respectively. Symbols correspond to different protein classes (see Fig. S13 for detailed description).

Fig. S9. Enrichment analysis for the gene content of the highest scored AN networks for SAGE data for CD10+, CD24+, CD44+, and MUC1+ cells. P values for cellular processes are calculated using hypergeometric distribution statistics. Genes with MSDK data mapped on are marked with solid red circles, whereas genes with SAGE data mapped on are marked with solid blue circles.
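For readers unfamiliar with hypergeometric enrichment P values, the sketch below computes the standard upper-tail probability in Python. The exact formula used inside Metacore is not given in this document, so this is the textbook form under that assumption; the function and parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_in_category, genes_in_experiment, overlap):
    """P(X >= overlap), where X is the number of category genes drawn when
    genes_in_experiment genes are sampled without replacement from total_genes,
    of which genes_in_category belong to the category."""
    max_overlap = min(genes_in_category, genes_in_experiment)
    denominator = comb(total_genes, genes_in_experiment)
    tail = sum(
        comb(genes_in_category, k)
        * comb(total_genes - genes_in_category, genes_in_experiment - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


# Toy numbers: 10 genes in total, 4 in the category, 3 in the experiment, 2 shared.
toy_p = hypergeom_enrichment_p(10, 4, 3, 2)  # (36 + 4) / 120 = 1/3
```

A small P value means that an overlap at least this large would rarely arise by chance given the sizes of the category, the experiment list, and the gene universe.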

Fig. S10. Enrichment analysis of gene content in the highest scored AN networks for SAGE data of individual cell types. Hubs distribution in the highest scored AN networks for SAGE data of individual cell types. Hubs are defined as the 25% of the most connected network nodes. Symbols correspond to different protein classes (see Fig. S13 for detailed description). # edges indicates the number of interactions (edges) for a node.
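The hub definition above (the 25% most connected nodes) can be sketched in Python as follows. The edge-list input, the rounding up of the hub count, and the alphabetical tie-break are assumptions made for illustration; the legend does not specify them, and the node names are placeholders.

```python
from collections import Counter
from math import ceil


def node_degrees(edges):
    """Count the number of interactions (edges) for every node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def hubs(edges, fraction=0.25):
    """Return (node, degree) pairs for the top `fraction` most connected nodes."""
    degree = node_degrees(edges)
    n_hubs = ceil(fraction * len(degree))
    # Sort by descending degree; ties are broken alphabetically (an assumption).
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n_hubs]


# Hypothetical network with eight nodes.
example_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "F"), ("G", "H"),
]
example_hubs = hubs(example_edges)
```

With eight nodes, the top 25% corresponds to two hubs, here the nodes with four and three edges.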

Fig. S11. The top scored AN networks in the four normal cell types built on SAGE data. Genes with normalized tag counts >5/50,000 total tags/library were included in the analysis and these are marked with solid red circles. Fragments of canonical pathways on the network are marked with cyan lines.

Fig. S12. FOXC1 network neighborhood and cellular processes. (A) Interactome network of FOXC1. Solid red circles mark genes differentially methylated between CD44+ and CD24+ cells, cyan lines/circles indicate downstream targets of FOXC1, and yellow lines/circles highlight its direct upstream regulators including microRNAs. (B) Cellular processes enriched in the FOXC1 neighborhood network. % indicates the fraction of all objects on the network belonging to a certain GO process. P values of enrichment for the indicated cellular processes are calculated using hypergeometric distribution statistics. (C) Hubs distribution and enrichment of cellular processes for the FOXC1 network. Hubs are defined as the 25% of the most connected network nodes. Symbols correspond to different protein classes (see Fig. S13 for detailed description). # edges indicates the number of interactions (edges) for a node.

Fig. S13. Detailed description of symbols used in pathway maps and networks.

Table S1. Summary of tissue samples used

Sample | Age | Tissue | Histology | Cell types purified | Assay used for
N1 | 48 | reduction | normal | MUC1, CD10, CD24, CD44 | qRT-PCR, qMSP
N2 | 22 | reduction | normal | MUC1, BerEP4, CD10, CD24, CD44 | SAGE
N3 | 24 | reduction | normal | MUC1, CD10, CD24, CD44 | qRT-PCR, qMSP
N5 | 58 | reduction | normal | MUC1, CD10, CD24, CD44 | qRT-PCR, qMSP, SAGE
N4 | 40–50 | reduction | normal | MUC1, CD24, CD10, BerEP4, CD44 | qRT-PCR, qMSP
N6 | 40–50 | reduction | normal | MUC1, CD24, CD10, CD44 | qRT-PCR, qMSP
N7 | 18 | reduction | normal | MUC1, BerEP4, CD10, CD24, CD44 | qRT-PCR, qMSP
N8 | 43 | reduction | normal | MUC1, BerEP4, CD24, CD10, CD44 | qRT-PCR, qMSP, SAGE
N9 | 29 | reduction | normal | MUC1, BerEP4, CD10, CD24, CD44 | qRT-PCR, qMSP, MSDK
N10 | 51 | reduction | normal | MUC1, BerEP4, CD10, CD24, CD44 | qRT-PCR, qMSP
N15 | 22 | reduction | normal | MUC1, BerEP4, CD10, CD24, CD44 | SAGE
IDC31 | 50 | invasive tumor | inflammatory carcinoma ER-/PR-/HER2+ | CD24, CD44, PROCR | qMSP, MSDK
IDC28 | 50 | invasive tumor | ductal carcinoma ER+/PR+/HER2- | CD24, CD44, PROCR | qMSP, SAGE
IDC26 | 32 | invasive tumor | ductal carcinoma ER+/PR+/HER2- | CD24, CD44, PROCR | qMSP, SAGE
PE6 | 45 | pleural effusion | ductal carcinoma ER+/PR+/HER2+ | CD24, CD44 | qMSP
PE3B | UK | pleural effusion | ductal carcinoma ER+/PR+/HER2- | CD24, CD44, PROCR | qMSP

Table S2. Matched primary tumors and distant metastases

Case | Primary | Metastases | Her-2/neu (FISH) | CK5/6 | EGFR
1 | ER+ PR- ILC | ER- PR- ILC | negative | – | –
2 | ER+ PR+ IDC | ER+ PR- IDC | borderline amplified | – | –
3 | ER+ (strong) PR+ (weak) IDC | ILC: ER+ (weak) PR+ (weak); IDC: ER+ (weak) PR+ (strong) | negative | – | –
4 | ER- PR- IDC | ER- PR- IDC | negative | + | +

Table S4. Raw SAGE and MSDK gene data corresponding to Figs. 1B and 2B

GO_id | GO_term | a | b | c | d | pVal | Score | GO_id scanned | Gene list

MSDK
CD44+ vs CD24+
GO:0005737 | cytoplasm | 68 | 679 | 112 | 1668 | 0.008627027 | 2.06 | 783 | ['ACTG1', 'GLRX5', 'AGA', 'TMEM33', 'RANBP17', 'RAN', 'C20orf179', 'MARCKS', 'NARS2', 'TAS2R16', 'OCA2', 'WFS1', 'PLEKHA8', 'C1QL1', 'POFUT1', 'EXOSC4', 'APOB', 'PREX1', 'CDC42BPA', 'RRBP1', 'BET1L', 'LZTS2', 'WDR16', 'RIC8A', 'TUBB3', 'PTPN21', 'COL4A4', 'COL4A3', 'ZNF346', 'EIF4B', 'ESR2', 'MTHFD1', 'FLNA', 'PKM2', 'SMAD1', 'ACR', 'TMSB10', 'MTIF3', 'ZNF622', 'EFCBP1', 'HTRA2', 'PXK', 'DDX19B', 'CYP39A1', 'SIAH1', 'WBSCR17', 'PLCD1', 'LGALS7', 'LGALS4', 'RAB11FIP1', 'GPAA1', 'GPSM2', 'RBMS3', 'ANKRD15', 'AXIN2', 'RAVER2', 'TMEM55A', 'HIP1R', 'RRAGD', 'NR3C1', 'UXS1', 'CDC42EP3', 'SPSB4', 'NDEL1', 'DLEC1', 'ARSA', 'AARS', 'TXNDC11']
GO:0043565 | sequence-specific DNA binding | 12 | 117 | 168 | 2230 | 0.203660366 | 0.69 | 17 | ['NKX2-8', 'LHX1', 'SIX2', 'THRA', 'FOXF2', 'LMX1A', 'ESR2', 'FOXC1', 'HOXA11', 'PITX1', 'NR3C1', 'PHOX2B']
GO:0045786 | negative regulation of cell cycle | 4 | 22 | 176 | 2325 | 0.108939822 | 0.96 | 19 | ['LZTS2', 'FOXC1', 'ANKRD15', 'DLEC1']
GO:0005730 | nucleolus | 6 | 18 | 174 | 2329 | 0.005479901 | 2.26 | 23 | ['CD3EAP', 'EXOSC4', 'DDX56', 'ZNF346', 'POLR1E', 'DDX21']
GO:0008134 | transcription factor binding | 11 | 73 | 169 | 2274 | 0.033486622 | 1.48 | 26 | ['RAN', 'THRA', 'FOXF2', 'ESR2', 'FLNA', 'FOXC1', 'CBX4', 'TCF20', 'ZNF136', '', 'PHOX2B']
GO:0006913 | nucleocytoplasmic transport | 4 | 21 | 176 | 2326 | 0.097326238 | 1.01 | 70 | ['RANBP17', 'RAN', 'FLNA', 'DDX19B']
GO:0006355 | regulation of transcription, DNA-dependent | 25 | 393 | 155 | 1954 | 0.86498554 | 0.06 | 113 | ['NKX2-8', 'LHX1', 'RAN', 'SIX2', 'THRA', 'FOXF2', 'ZNF678', 'PLAGL2', 'SOX13', 'LMX1A', 'PRDM14', 'ESR2', 'FOXC1', 'SMAD1', 'CBX8', 'CBX4', 'HOXA11', 'TCF20', 'PRDM12', 'ZNF136', 'PITX1', 'EOMES', 'NR3C1', 'ZBTB5', 'PHOX2B']
GO:0048513 | organ development | 13 | 206 | 167 | 2141 | 0.800520121 | 0.1 | 1590 | ['ACTG1', 'LHX1', 'FOXF2', 'EMD', 'MDGA1', 'COL4A4', 'COL4A3', 'FOXC1', 'SOST', 'SMAD1', 'HOXA11', 'PITX1', 'NFAM1']
GO:0044428 | nuclear part | 16 | 128 | 164 | 2219 | 0.046379251 | 1.33 | 283 | ['RANBP17', 'RAN', 'FOXF2', 'CD3EAP', 'EMD', 'EXOSC4', 'DDX56', 'ZNF346', 'FOXC1', 'SMAD1', 'CBX8', 'DDX19B', 'POLR1E', 'CPSF1', 'PHOX2B', 'DDX21']
GO:0003723 | RNA binding | 11 | 70 | 169 | 2277 | 0.02633445 | 1.58 | 58 | ['LARP5', 'EXOSC4', 'DDX56', 'ZNF346', 'EIF4B', 'DDX19B', 'RBMS3', 'CPSF1', 'RAVER2', 'DDX21', 'AARS']
GO:0048519 | negative regulation of biological process | 12 | 182 | 168 | 2165 | 0.743395417 | 0.13 | 1112 | ['FOXF2', 'LZTS2', 'COL4A3', 'ESR2', 'FOXC1', 'SOST', 'SMAD1', 'CBX4', 'AATF', 'ZNF136', 'ANKRD15', 'DLEC1']
CD44+ vs MUC1+
GO:0005737 | cytoplasm | 9 | 788 | 20 | 1864 | 0.508703054 | 0.29 | 783 | ['UBE2Q2', 'PIK3CD', 'LZTS1', 'UXS1', 'OCA2', 'EPS8L1', 'ESR2', 'MTHFD1', 'STAT5A']
GO:0043565 | sequence-specific DNA binding | 5 | 137 | 24 | 2515 | 0.016463268 | 1.78 | 17 | ['PITX1', 'LHX3', 'LMX1A', 'ESR2', 'FOXC1']
GO:0045786 | negative regulation of cell cycle | 2 | 27 | 27 | 2625 | 0.038300194 | 1.42 | 19 | ['LZTS1', 'FOXC1']
GO:0005730 | nucleolus | 0 | 28 | 29 | 2624 | 1 | 0 | 23 | []
GO:0008134 | transcription factor binding | 2 | 89 | 27 | 2563 | 0.258236056 | 0.59 | 26 | ['ESR2', 'FOXC1']
GO:0006913 | nucleocytoplasmic transport | 0 | 25 | 29 | 2627 | 1 | 0 | 70 | []
GO:0006355 | regulation of transcription, DNA-dependent | 9 | 431 | 20 | 2221 | 0.037011945 | 1.43 | 113 | ['PITX1', 'CBX2', 'LHX3', 'LZTS1', 'LMX1A', 'ESR2', 'STAT5A', 'PRDM12', 'FOXC1']
GO:0048513 | organ development | 6 | 223 | 23 | 2429 | 0.032477036 | 1.49 | 1590 | ['PITX1', 'LHX3', 'UNC5C', 'STAT5A', 'CRHR2', 'FOXC1']
GO:0044428 | nuclear part | 1 | 154 | 28 | 2498 | 0.823845364 | 0.08 | 283 | ['FOXC1']
GO:0003723 | RNA binding | 0 | 83 | 29 | 2569 | 1 | 0 | 58 | []
GO:0048519 | negative regulation of biological process | 6 | 201 | 23 | 2451 | 0.020796556 | 1.68 | 1112 | ['CBX2', 'LZTS1', 'ESR2', 'STAT5A', 'CRHR2', 'FOXC1']
CD44+ vs CD10+
GO:0005737 | cytoplasm | 48 | 808 | 102 | 1869 | 0.348632587 | 0.46 | 783 | ['UBL5', 'C19orf2', 'AGA', 'PIK3CD', 'MIB2', 'C20orf179', 'MARCKS', 'GLRX5', 'TAS2R16', 'FXR2', 'PLEKHA8', 'HIP1R', 'PREX1', 'DGAT2', 'SYT15', 'RRBP1', 'LZTS2', 'TUBB3', 'COL4A4', 'COL4A3', 'EPS8L1', 'ESR2', 'MTHFD1', 'ST8SIA1', 'IPO9', 'FLNA', 'SSB', 'SCYL3', 'MTIF3', 'SPSB4', 'AKAP8L', 'ZNF346', 'AP3D1', 'DAB2IP', 'DDX19B', 'ACTR8', 'WDR16', 'GPSM2', 'CEP57', 'CTNNB1', 'LATS2', 'SERTAD2', 'ANKRD15', 'ACAA2', 'RSAD1', 'ABCB9', 'PRKCG', 'AARS']
GO:0043565 | sequence-specific DNA binding | 7 | 137 | 143 | 2540 | 0.652415773 | 0.19 | 17 | ['NFIL3', 'ELK4', 'MAF', 'ESR2', 'FOXC1', 'FOSL2', 'PITX1']
GO:0045786 | negative regulation of cell cycle | 5 | 29 | 145 | 2648 | 0.031484022 | 1.5 | 19 | ['LZTS2', 'FOXC1', 'DAB2IP', 'LATS2', 'ANKRD15']
GO:0005730 | nucleolus | 3 | 25 | 147 | 2652 | 0.183216419 | 0.74 | 23 | ['ZNF346', 'MKI67', 'DDX21']
GO:0008134 | transcription factor binding | 11 | 80 | 139 | 2597 | 0.007752911 | 2.11 | 26 | ['C19orf2', 'NFIL3', 'ELK4', 'MYST3', 'ESR2', 'FLNA', 'FOXC1', 'TLE1', 'CTNNB1', 'SERTAD2', 'NRIP1']
GO:0006913 | nucleocytoplasmic transport | 5 | 21 | 145 | 2656 | 0.010457598 | 1.98 | 70 | ['IPO9', 'FLNA', 'SSB', 'DDX19B', 'CEP57']
GO:0006355 | regulation of transcription, DNA-dependent | 19 | 441 | 131 | 2236 | 0.91382969 | 0.04 | 113 | ['C19orf2', 'PROX1', 'NFIL3', 'ELK4', 'FST', 'MYST3', 'MAF', 'PHF23', 'ESR2', 'FOXC1', 'TLE1', 'ZNF536', 'PRDM12', 'FOSL2', 'ZNF133', 'CTNNB1', 'PITX1', 'SERTAD2', 'NRIP1']
GO:0048513 | organ development | 13 | 235 | 137 | 2442 | 0.563397778 | 0.25 | 1590 | ['PROX1', 'FST', 'EMD', 'MYST3', 'COL4A4', 'COL4A3', 'FOXC1', 'SOST', 'TLE1', 'AP3D1', 'CTNNB1', 'PITX1', 'NRIP1']
GO:0044428 | nuclear part | 14 | 143 | 136 | 2534 | 0.036036522 | 1.44 | 283 | ['C19orf2', 'EMD', 'IPO9', 'FOXC1', 'HNRPH1', 'AKAP8L', 'ZNF346', 'DDX19B', 'MKI67', 'CTNNB1', 'PRPF18', 'DDX21', 'RPA1', 'NRIP1']
GO:0003723 | RNA binding | 9 | 81 | 141 | 2596 | 0.046662476 | 1.33 | 58 | ['LARP5', 'FXR2', 'SSB', 'HNRPH1', 'ZNF346', 'DDX19B', 'SFRS5', 'DDX21', 'AARS']
GO:0048519 | negative regulation of biological process | 16 | 204 | 134 | 2473 | 0.117800803 | 0.93 | 1112 | ['PROX1', 'FST', 'MYST3', 'LZTS2', 'COL4A3', 'ESR2', 'FOXC1', 'SOST', 'TLE1', 'DAB2IP', 'CTNNB1', 'LATS2', 'SERTAD2', 'ANKRD15', 'PRKCG', 'NRIP1']

SAGE
CD44+ vs CD24+
GO_id | GO_term | a | b | c | d | pVal | Score | GO_id scanned | Gene list
GO:0005581 | collagen | 11 | 10 | 258 | 6820 | 4.61E-11 | 10.34 | 25 | ['COL4A2', 'COL4A1', 'LUM', 'COL5A3', 'COL1A2', 'COL1A1', 'COL3A1', 'COL14A1', 'COL6A1', 'COL6A3', 'COL6A2']
GO:0007166 | cell surface receptor linked signal transduction | 46 | 381 | 223 | 6449 | 4.00E-11 | 10.4 | 239 | ['TIMP3', 'CPZ', 'ADAMTS4', 'INHBA', 'FMOD', 'THY1', 'CPE', 'CXCL1', 'CXCL3', 'CXCL2', 'BMP2', 'FN1', 'IL7R', 'IL1R1', 'IL6', 'GEM', 'VEGFA', 'IL1B', 'MPZL1', 'SPARC', 'C3', 'HTRA1', 'CD69', 'PTPRC', 'BDKRB1', 'IL2RB', 'SFRP4', 'ALDH2', 'SFRP2', 'CTGF', 'GNA12', 'COL1A2', 'EBI2', 'IL8', 'C1S', 'CRYAB', 'CSF3', 'EDNRB', 'CXCR7', 'CXCR4', 'LTBP4', 'SRC', 'PPAP2B', 'RGS16', 'PDGFRB', 'RGS5']
GO:0030154 | cell differentiation | 63 | 666 | 206 | 6164 | 9.11E-11 | 10.04 | 1418 | ['SNF1LK', 'TIMP3', 'TIMP2', 'TIMP1', 'PBXIP1', 'LGALS1', 'KLF6', 'INHBA', 'HSPA1A', 'THY1', 'SERPINB9', 'BMP2', 'SLC2A3', 'PTGS2', 'GLIS2', 'CDKN1A', 'NDRG1', 'PHLDA1', 'IL6', 'APOE', 'EGR1', 'MAFG', 'EPAS1', 'TNFAIP3', 'UBE2Z', 'CREM', 'VEGFA', 'CD74', 'SRGN', 'IL1B', 'HSPD1', 'C7', 'GJA1', 'BAG3', 'SERPINF1', 'NFKB1', 'PTPRC', 'IL2RB', 'SFRP4', 'IFI6', 'ID3', 'PPP1R15A', 'SFRP2', 'CTGF', 'EMP3', 'SLAMF7', 'HMOX1', 'CRYAB', 'MAP2K1', 'MCL1', 'CSF3', 'EDNRB', 'CEBPB', 'TWIST1', 'PEA15', 'ETS1', 'CXCR4', 'TNFSF9', 'SPON2', 'LTBP4', 'PGRMC1', 'NR4A2', 'MMP9']
GO:0030247 | polysaccharide binding | 12 | 33 | 257 | 6797 | 6.46E-08 | 7.19 | 11 | ['SOD3', 'FN1', 'APOE', 'THBS4', 'THBS2', 'THBS1', 'VEGFA', 'FSTL1', 'COL5A3', 'CTGF', 'TNFAIP6', 'LTBP4']
GO:0009611 | response to wounding | 31 | 141 | 238 | 6689 | 1.58E-13 | 12.8 | 227 | ['CXCL14', 'CXCL1', 'CXCL3', 'CXCL2', 'BMP2', 'FN1', 'PTGS2', 'CCL3L3', 'IL1R1', 'IL6', 'THBS1', 'FOS', 'FABP4', 'IL1B', 'C3', 'C7', 'GJA1', 'NFKB1', 'BDKRB1', 'CCL19', 'ID3', 'CTGF', 'GNA12', 'CFD', 'IL8', 'TNFAIP6', 'C1S', 'CEBPB', 'CXCR4', 'CFH', 'C1R']
GO:0006935 | chemotaxis | 11 | 44 | 258 | 6786 | 5.11E-06 | 5.29 | 58 | ['CXCL14', 'CXCL1', 'CXCL3', 'CXCL2', 'CCL3L3', 'VEGFA', 'IL1B', 'CCL19', 'IL8', 'MAP2K1', 'CXCR4']
GO:0006950 | response to stress | 53 | 428 | 216 | 6402 | 4.17E-13 | 12.38 | 476 | ['GSTA4', 'C19orf40', 'HSP90AB1', 'HSPA1A', 'SOD3', 'CXCL14', 'CXCL1', 'CXCL3', 'CXCL2', 'BMP2', 'FN1', 'PTGS2', 'CCL3L3', 'CDKN1A', 'MAPKAPK5', 'IL1R1', 'IL6', 'APOE', 'THBS1', 'FOS', 'EPAS1', 'EGLN1', 'GPX3', 'VEGFA', 'FABP4', 'IL1B', 'HYOU1', 'DNAJB1', 'HSPD1', 'C3', 'C7', 'GJA1', 'NFKB1', 'AKR1B1', 'BDKRB1', 'CCL19', 'ID3', 'PPP1R15A', 'CTGF', 'GNA12', 'COL1A1', 'CFD', 'IL8', 'TNFAIP6', 'MAP4K4', 'HMOX1', 'C1S', 'MAP2K1', 'CEBPB', 'CXCR4', 'CYGB', 'CFH', 'C1R']
GO:0048513 | organ development | 50 | 426 | 219 | 6404 | 1.27E-11 | 10.9 | 1590 | ['TIMP1', 'LGALS1', 'ADAMTS4', 'KLF6', 'INHBA', 'THY1', 'MSX1', 'GYPC', 'BMP2', 'PTGS2', 'LARGE', 'IL6', 'EGR1', 'MAFG', 'DCN', 'COL4A2', 'EPAS1', 'EGLN1', 'MEF2D', 'VEGFA', 'CD74', 'FST', 'SRGN', 'IL1B', 'SPARC', 'GJA1', 'COMP', 'COL5A3', 'SERPINF1', 'PTPRC', 'ID3', 'CTGF', 'COL1A2', 'COL1A1', 'IL8', 'HMOX1', 'COL3A1', 'CSF3', 'EDNRB', 'CEBPB', 'TWIST1', 'ODC1', 'ETS1', 'CXCR4', 'MYOM2', 'PPAP2B', 'TAGLN', 'MMP9', 'COL6A3', 'MMP2']
GO:0001664 | G-protein-coupled receptor binding | 7 | 22 | 262 | 6808 | 7.90E-05 | 4.1 | 220 | ['CXCL14', 'CXCL1', 'CXCL3', 'CXCL2', 'CCL3L3', 'CCL19', 'IL8']
Bloushtain-Qimron et al. www.pnas.org/cgi/content/short/0805206105
GO_id | GO_term | a | b | c | d | pVal | Score | scanned | Gene list

GO:0006954 | inflammatory response | 26 | 95 | 243 | 6735 | 2.27E-13 | 12.64 | 149 | CXCL14, CXCL1, CXCL3, CXCL2, BMP2, FN1, PTGS2, CCL3L3, IL1R1, IL6, FOS, FABP4, IL1B, C3, C7, NFKB1, BDKRB1, CCL19, CFD, IL8, TNFAIP6, C1S, CEBPB, CXCR4, CFH, C1R
GO:0006955 | immune response | 45 | 195 | 224 | 6635 | 5.59E-20 | 19.25 | 645 | IGHG1, CTSS, THY1, HLA-DPB1, HLA-DRB1, CXCL14, CXCL1, CXCL3, CXCL2, CD6, CCL3L3, IL7R, IGHA1, IL1R1, IL6, GEM, IGL@, HLA-DPA1, CD276, CD74, IL1B, C3, C7, CST7, PTPRC, IL2RB, IFI6, CCL19, SLAMF7, EBI2, CFD, IL8, IFITM2, HMOX1, C1S, CSF3, CEBPB, HLA-DRA, HLA-A, ETS1, CXCR4, TNFSF9, SPON2, CFH, C1R
GO:0007155 | cell adhesion | 23 | 239 | 246 | 6591 | 0.000136978 | 3.86 | 94 | MFAP4, THY1, SCARF2, FN1, CD6, SRPX, THBS4, THBS2, THBS1, PSCD1, DPT, COMP, COL5A3, CTGF, SLAMF7, IL8, TNFAIP6, SPON2, LGALS3BP, COL14A1, COL6A1, COL6A3, COL6A2
GO:0048519 | negative regulation of biological process | 56 | 521 | 213 | 6309 | 1.48E-11 | 10.83 | 1112 | TIMP2, TIMP1, PBXIP1, GSN, INHBA, HSPA1A, THY1, SERPINB9, TERF2IP, MSX1, CXCL1, BMP2, PTGS2, CCL3L3, GLIS2, CDKN1A, IL6, EGR1, TSPYL2, IGFBP6, COL4A2, TNFAIP3, VEGFA, IGFBP7, CD74, FOSB, GPNMB, FABP4, FST, SRGN, IL1B, ZEB2, HTRA1, GJA1, BAG3, PMP22, SERPINF1, PRKAR1A, NFKB1, PTPRC, IL2RB, IFI6, ID3, PPP1R15A, EMP3, IL8, HMOX1, CRYAB, MCL1, CEBPB, TWIST1, PEA15, ETS1, WARS, RGS16, RGS5
GO:0043069 | negative regulation of programmed cell death | 16 | 115 | 253 | 6715 | 3.17E-05 | 4.5 | 30 | HSPA1A, SERPINB9, CDKN1A, IL6, TNFAIP3, VEGFA, CD74, BAG3, NFKB1, IL2RB, IFI6, HMOX1, CRYAB, MCL1, CEBPB, PEA15
GO:0005576 | extracellular region | 92 | 438 | 177 | 6392 | 4.57E-39 | 38.34 | 132 | TIMP3, TIMP2, TIMP1, IGHG1, BGN, CPZ, MFAP4, LGALS1, GSN, ADAMTS4, CTSS, INHBA, FMOD, SOD3, CPE, CXCL14, CXCL1, CXCL3, CXCL2, BMP2, FN1, CCDC80, PLTP, CCL3L3, IL7R, METRNL, FBLN1, IL6, APOD, APOE, THBS4, THBS2, THBS1, PCOLCE, DCN, EFEMP1, IGFBP6, COL4A2, COL4A1, IGFBP5, PSAP, ADFP, GPX3, CTSL1, VEGFA, IL15RA, IGFBP7, PTGDS, DPT, FST, SRGN, IL1B, FSTL1, LUM, SPARC, C3, CST3, C7, CST7, HTRA1, COMP, COL5A3, SERPINF1, AKR1B1, SFRP4, CCL19, RARRES2, SFRP2, CTGF, COL1A2, COL1A1, CFD, IL8, TNFAIP6, C1S, SPARCL1, ECM1, COL3A1, CSF3, TNFSF9, PI16, SPON2, LTBP4, LGALS3BP, CFH, COL14A1, MMP9, COL6A1, C1R, COL6A3, COL6A2, MMP2

CD44+ vs Muc1+
GO:0005581 | collagen | 10 | 12 | 217 | 6860 | 4.24E-10 | 9.37 | 25 | COL4A1, COL4A2, LUM, COL1A2, COL1A1, COL3A1, COL14A1, COL6A1, COL6A3, COL6A2


GO:0007166 | cell surface receptor linked signal transduction | 44 | 386 | 183 | 6486 | 1.81E-12 | 11.74 | 239 | LIF, CPZ, ADAMTS4, INHBA, FMOD, CPE, CXCL1, CXCL3, CXCL2, FN1, PPP2CA, CCL2, ADRBK1, IL1R1, IL6, JAK1, GEM, UBE2N, VEGFA, IL1B, GPRC5A, PPAP2B, C3, SPARC, HTRA1, IL7R, CD69, BDKRB1, IL2RB, SFRP4, COL1A2, IL8, SPHK1, CRYAB, EDNRB, RGS5, CXCR7, RGS2, CXCR4, ANXA1, LTBP4, C1S, RGS16, PDGFRB
GO:0030154 | cell differentiation | 53 | 663 | 174 | 6209 | 2.54E-09 | 8.59 | 1418 | PTGS2, TIMP2, TIMP1, PEA15, INHBA, SERPINB9, PPP2CA, SLC2A3, MTPN, CCL2, GLIS2, PHLDA1, IL6, EGR1, MAP1B, TPT1, MAFG, EPAS1, TNFAIP3, CREM, VEGFA, CD74, BAG3, SRGN, IL1B, TNFSF9, S100A4, C7, BNIP3L, SERPINF1, NFKB1, LGALS1, IL2RB, SFRP4, ID3, PPP1R15A, EMP3, SLAMF7, SPHK1, CRYAB, HSPA1A, MCL1, EDNRB, TWIST1, ETS1, CXCR4, ANXA1, SPON2, LTBP4, ANXA5, APOE, NPM1, MMP9
GO:0030247 | polysaccharide binding | 8 | 32 | 219 | 6840 | 3.44E-06 | 5.46 | 11 | FN1, THBS2, THBS1, VEGFA, TNFAIP6, LTBP4, CD44, APOE
GO:0009611 | response to wounding | 29 | 144 | 198 | 6728 | 8.61E-14 | 13.06 | 227 | PTGS2, CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, FN1, MTPN, CCL3L3, CCL2, IL1R1, IL6, THBS1, IL1B, C3, C7, NFKB1, BDKRB1, CCL19, ID3, IL8, TNFAIP6, CXCR4, ANXA1, ANXA5, C1S, CFH, CFD, C1R
GO:0006935 | chemotaxis | 13 | 42 | 214 | 6830 | 1.14E-08 | 7.94 | 58 | CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, CCL3L3, CCL2, VEGFA, IL1B, FOSL1, CCL19, IL8, CXCR4
GO:0006950 | response to stress | 45 | 439 | 182 | 6433 | 2.64E-11 | 10.58 | 476 | HSPA6, PTGS2, C19orf40, CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, FN1, MTPN, CCL3L3, CCL2, MAPKAPK5, IL1R1, IL6, CYGB, THBS1, EPAS1, EGLN1, UBE2N, VEGFA, IL1B, DNAJB1, C3, C7, NFKB1, BDKRB1, CCL19, ID3, PPP1R15A, COL1A1, HSP90AA1, IL8, TNFAIP6, HSPA1A, CXCR4, ANXA1, ANXA5, HERPUD1, C1S, APOE, NPM1, CFH, CFD, C1R
GO:0048513 | organ development | 41 | 423 | 186 | 6449 | 1.46E-09 | 8.83 | 1590 | LIF, PTGS2, ADAMTS4, TIMP1, INHBA, PPP2CA, MTPN, CCL2, ADRBK1, IL6, EGR1, DCN, MAFG, EPAS1, EGLN1, VEGFA, COL4A2, CD74, SRGN, IL1B, PPAP2B, SPARC, SERPINF1, LGALS1, ID3, COL1A2, COL1A1, IL8, SPHK1, COL3A1, EDNRB, TWIST1, ETS1, CXCR4, MSX1, MYOM2, LMNA, ODC1, MMP9, COL6A3, MMP2
GO:0001664 | G-protein-coupled receptor binding | 9 | 18 | 218 | 6854 | 8.47E-08 | 7.07 | 220 | CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, CCL3L3, CCL2, CCL19, IL8
GO:0006954 | inflammatory response | 25 | 95 | 202 | 6777 | 2.89E-14 | 13.54 | 149 | PTGS2, CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, FN1, CCL3L3, CCL2, IL1R1, IL6, IL1B, C3, C7, NFKB1, BDKRB1, CCL19, IL8, TNFAIP6, CXCR4, ANXA1, C1S, CFH, CFD, C1R


GO:0006955 | immune response | 42 | 192 | 185 | 6680 | 8.11E-21 | 20.09 | 645 | LIF, IGHG1, HLA-DPB1, CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, CCL3L3, CCL2, HLA-B, IGHA1, HLA-E, IL1R1, IL6, GEM, IGL@, HLA-DPA1, UBE2N, CD276, CD74, CD6, IL1B, TNFSF9, C3, C7, CST7, BNIP3L, IL7R, IL2RB, CCL19, SLAMF7, IL8, IFITM2, HLA-DRA, ETS1, CXCR4, SPON2, C1S, CFH, CFD, C1R
GO:0007155 | cell adhesion | 19 | 249 | 208 | 6623 | 0.000901502 | 3.05 | 94 | MFGE8, MFAP4, FN1, PPP2CA, SRPX, CCL2, THBS2, THBS1, CD6, DPT, SLAMF7, IL8, TNFAIP6, SPON2, CD44, COL14A1, COL6A1, COL6A3, COL6A2
GO:0048519 | negative regulation of biological process | 56 | 511 | 171 | 6361 | 4.05E-15 | 14.39 | 1112 | LIF, PTGS2, TIMP2, TIMP1, PEA15, INHBA, SERPINB9, CXCL1, PPP2CA, CCL3L3, CCL2, GLIS2, ADRBK1, IL6, EGR1, GSN, MAP1B, TPT1, IGFBP6, IGFBP7, ZEB2, TNFAIP3, VEGFA, COL4A2, CD74, BAG3, FOSB, TBX2, SRGN, IL1B, SET, GPNMB, HTRA1, BNIP3L, PMP22, SERPINF1, PRKAR1A, NFKB1, IL2RB, ID3, PPP1R15A, EMP3, IL8, SPHK1, CRYAB, HSPA1A, MCL1, TWIST1, RGS5, ETS1, RGS2, MSX1, ANXA1, ANXA5, NPM1, RGS16
GO:0043069 | negative regulation of programmed cell death | 19 | 110 | 208 | 6762 | 2.09E-08 | 7.68 | 30 | PEA15, SERPINB9, CCL2, IL6, TPT1, TNFAIP3, VEGFA, CD74, BAG3, BNIP3L, NFKB1, IL2RB, SPHK1, CRYAB, HSPA1A, MCL1, ANXA1, ANXA5, NPM1
GO:0005576 | extracellular region | 78 | 460 | 149 | 6412 | 9.65E-33 | 32.02 | 132 | FXYD6, LIF, IGHG1, CPZ, MFAP4, TIMP2, ADAMTS4, TIMP1, INHBA, FMOD, CPE, BGN, CXCL14, CXCL1, CXCL3, CXCL2, CXCL5, FN1, CCDC80, PLTP, CCL3L3, CCL2, METRNL, PTGDS, IL6, FBLN1, THBS2, THBS1, PCOLCE, GSN, DCN, TPT1, EFEMP1, IGFBP6, IGFBP7, COL4A1, ADFP, CTSL1, VEGFA, COL4A2, DPT, SRGN, IL1B, LUM, TNFSF9, C3, C7, CST7, SPARC, HTRA1, IL7R, SPARCL1, SERPINF1, LGALS1, SFRP4, CCL19, RARRES2, COL1A2, COL1A1, IL8, TNFAIP6, ECM1, COL3A1, SPON2, LTBP4, C1S, APOE, CFH, CFD, COL14A1, CALU, MMP3, MMP9, COL6A1, C1R, COL6A3, COL6A2, MMP2

CD44+ vs CD10+
GO:0005581 | collagen | 11 | 12 | 335 | 6536 | 3.42E-09 | 8.47 | 25 | COL4A2, COL4A1, LUM, COL5A3, COL3A1, COL6A1, COL1A2, COL1A1, COL18A1, COL6A3, COL6A2
GO:0007166 | cell surface receptor linked signal transduction | 38 | 375 | 308 | 6173 | 0.000176622 | 3.75 | 239 | PPAP2B, IL6ST, CXCL12, EDNRA, EDNRB, IL6, SMAD3, HTRA1, SPARC, GPR124, ITGB5, ADAMTS4, ADRA2A, CSF3, ADAM33, CCL2, IGF2, PDGFRB, ECGF1, TGFB3, NOTCH3, CALM2, FMOD, CX3CL1, IL1B, SFRP2, RPSA, COL1A2, RGS5, ECEL1, C1S, TIMP3, ENPP2, SOCS3, STC1, IL1R1, CD59, ELF1


GO:0030154 | cell differentiation | 62 | 657 | 284 | 5891 | 1.04E-05 | 4.98 | 1418 | CDKN1A, EDNRB, IL6, APOE, HES4, SMAD3, NR2F2, EPAS1, KLF6, SPON2, AXUD1, IER3, RND1, SLC2A3, CSF3, PAK2, CCL2, LAMB1, CD74, ECGF1, IFI6, ZFP36, TGFB3, MCL1, ETS1, NR4A2, BCL6, NOTCH3, MMP9, PLDN, GADD45B, CTSB, NDRG1, NDRG2, JUN, MAFG, APOLD1, SRGN, IL1B, FOSL2, LGALS1, SFRP2, FOXO3, ALDH1A3, ANPEP, LDB1, BCL3, TIMP3, BOC, SERPINB9, SOCS3, F11R, HMOX1, TNFAIP3, TNFAIP2, BTG1, NFKB1, ZFP36L1, EMP1, STEAP4, PPP1R15A, TNN
GO:0030247 | polysaccharide binding | 8 | 37 | 338 | 6511 | 0.001564761 | 2.81 | 11 | APOE, CD44, CCL8, COL5A3, HAPLN3, POSTN, THBS4, TNFAIP6
GO:0009611 | response to wounding | 30 | 139 | 316 | 6409 | 8.62E-10 | 9.06 | 227 | CXCL12, IL6, TFPI, APOL2, FBN1, CCL2, CCL4, CCL8, IGFBP4, CD93, FABP4, ZFP36, MGLL, BCL6, CTSB, CX3CL1, SERPING1, IL1B, CHST2, ATRN, C1S, C1R, CCL3L3, IL1R1, PROCR, F11R, TNFAIP6, NFKB1, CD59, SELE
GO:0006935 | chemotaxis | 9 | 44 | 337 | 6504 | 0.001135384 | 2.94 | 58 | CXCL12, CCL2, CCL4, CCL8, ECGF1, CX3CL1, IL1B, ENPP2, CCL3L3
GO:0006950 | response to stress | 50 | 411 | 296 | 6137 | 1.15E-07 | 6.94 | 476 | ERRFI1, CXCL12, CDKN1A, IL6, APOE, SMAD3, TFPI, EPAS1, APOL2, FBN1, ADRA2A, REV3L, CCL2, CCL4, CCL8, IGFBP4, CD93, FABP4, CLIC1, ZFP36, MGLL, BCL6, GADD45B, CTSB, DNAJA1, HIF1A, CX3CL1, SERPING1, IL1B, CHST2, ATRN, COL1A1, SEPP1, C1S, C1R, BCL3, C19orf40, CCL3L3, PXDN, IL1R1, PROCR, F11R, TNFAIP6, HMOX1, NFKB1, DUSP1, CD59, PPP1R15A, SHANK3, SELE
GO:0048513 | organ development | 60 | 406 | 286 | 6142 | 2.95E-12 | 11.53 | 1590 | PPAP2B, TCF12, EDNRB, IL6, COL4A2, SMAD3, NR2F2, SPARC, EPAS1, KLF6, FBN1, ADAMTS4, CRISPLD2, CSF3, CCL2, LAMB1, PLXDC1, IGF2, IGF1, IGFBP4, GJA4, CD74, TPM4, ECGF1, COL5A3, ZFP36, TGFB3, COL3A1, ETS1, LMNA, BCL6, NOTCH3, MMP9, DCN, POSTN, HECA, MAFG, APOLD1, SRGN, IL1B, LGALS1, CHRD, COL1A2, COL1A1, SEPP1, FOXO3, ALDH1A3, ANPEP, ZBTB7A, COL18A1, LDB1, COL6A3, BCL3, BOC, HMOX1, TNFAIP2, BTG1, ZFP36L1, EMP1, MSX1
GO:0001664 | G-protein-coupled receptor binding | 6 | 21 | 340 | 6527 | 0.00184541 | 2.73 | 220 | CXCL12, CCL2, CCL4, CCL8, CX3CL1, CCL3L3
GO:0006954 | inflammatory response | 25 | 94 | 321 | 6454 | 5.91E-10 | 9.23 | 149 | CXCL12, IL6, APOL2, CCL2, CCL4, CCL8, IGFBP4, FABP4, ZFP36, MGLL, BCL6, CX3CL1, SERPING1, IL1B, CHST2, ATRN, C1S, C1R, CCL3L3, IL1R1, PROCR, F11R, TNFAIP6, NFKB1, SELE


GO:0006955 | immune response | 43 | 193 | 303 | 6355 | 4.32E-14 | 13.36 | 645 | IL6ST, B2M, CXCL12, TCF12, IL6, NFIL3, HLA-DRA, FCGRT, SPON2, HLA-DRB1, CSF3, CCL2, CCL4, CCL8, AZGP1, CD74, IFI6, HLA-DPA1, ETS1, BCL6, TRIM22, CX3CL1, CD276, SERPING1, IL1B, FTH1, TAPBP, IFITM2, C1S, C1R, HLA-DQA1, BCL3, CCL3L3, PXDN, HLA-C, HLA-B, HLA-A, HLA-E, IL1R1, PROCR, HMOX1, CD59, ELF1
GO:0007155 | cell adhesion | 41 | 240 | 305 | 6308 | 3.30E-10 | 9.48 | 94 | CXCL12, TGFBI, RGMA, LAMA4, SPON2, ITGB5, CD44, LGALS3BP, RND1, CCL2, CCL4, LAMB1, CLDN4, PSCD1, CD99, AZGP1, CD93, COL5A3, HAPLN3, VCL, COL6A1, BCL6, POSTN, FREM1, CX3CL1, LAMC3, CPXM1, ICAM1, RPSA, COL18A1, COL6A3, COL6A2, BOC, SRPX, THBS4, F11R, TNFAIP6, PLEKHC1, EMILIN1, TNN, SELE
GO:0048519 | negative regulation of biological process | 51 | 517 | 295 | 6031 | 2.44E-05 | 4.61 | 1112 | CDKN1A, IL6, TGFBI, COL4A2, ZEB2, SMAD3, HTRA1, BHLHB2, ARID5B, LMCD1, IER3, RND1, CCL2, STARD13, IGFBP7, AZGP1, CD74, FABP4, DAB2IP, IFI6, ZFP36, TGFB3, MCL1, ETS1, VCL, BCL6, NOTCH3, TNIP1, JUN, TBX2, SRGN, IL1B, FTH1, RGS5, ZBTB7A, COL18A1, LDB1, BCL3, GSN, SERPINB9, SOCS3, CCL3L3, HMOX1, TNFAIP3, FOSB, BTG1, NFKB1, RPS14, PPP1R15A, ELF1, MSX1
GO:0043069 | negative regulation of programmed cell death | 14 | 116 | 332 | 6432 | 0.005426849 | 2.27 | 30 | CDKN1A, IL6, IER3, CCL2, CD74, IFI6, MCL1, BCL6, BCL3, SERPINB9, SOCS3, HMOX1, TNFAIP3, NFKB1
GO:0005576 | extracellular region | 92 | 423 | 254 | 6125 | 2.19E-29 | 28.66 | 132 | IL6ST, B2M, CXCL12, ANGPTL2, IL6, APOD, APOE, TGFBI, COL4A2, COL4A1, PSAP, ADFP, HTRA1, SPARC, TFPI, LAMA4, FAM20C, CRTAP, APOL2, SPON2, FBN1, LGALS3BP, ADAMTS4, ADAMTS9, CRISPLD2, CSF3, CCL2, CCL4, CCL8, LAMB1, PLXDC1, IGF2, IGF1, IGFBP7, IGFBP4, IGFBP5, AZGP1, LUM, ECGF1, COL5A3, CST3, HAPLN3, TGFB3, COL3A1, SMOC2, CD248, VASN, HEG1, ST3GAL4, COL6A1, MMP9, BGN, DCN, FMOD, CTSB, CTSD, POSTN, SEMA3G, FREM1, CX3CL1, LAMC3, APOLD1, CTSL1, SERPING1, SRGN, IL1B, CPXM1, ATRN, LGALS1, SFRP2, CHRD, COL1A2, COL1A1, SEPP1, ECM1, COL18A1, C1S, C1R, COL6A3, COL6A2, TIMP3, GSN, ENPP2, STC1, CCL3L3, HLA-C, THBS4, TNFAIP6, TNFAIP2, EMILIN1, MMP7, TNN

List of genes differentially expressed or methylated between the indicated cell types grouped into the indicated functional categories.
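The columns of the table above can be related with a short sketch. The Score values agree with -log10(pVal) to two decimals for the rows checked (for example, 4.00E-11 gives 10.40). Reading a, b, c, d as the four cells of a 2x2 contingency table, and testing it with a one-sided hypergeometric (Fisher exact) tail, are assumptions made here for illustration; this section does not state which test produced pVal. The function names are hypothetical.

```python
import math


def score_from_p(p_value):
    """Return -log10(p), which reproduces the Score column for the rows checked."""
    return -math.log10(p_value)


def enrichment_p(a, b, c, d):
    """One-sided hypergeometric (Fisher exact) tail probability for a 2x2 table.

    Assumed layout (not stated in the source):
      a = genes in the list annotated with the GO term
      b = remaining genes annotated with the GO term
      c = genes in the list without the term
      d = remaining genes without the term
    """
    n_term = a + b            # all genes carrying the term
    n_list = a + c            # size of the gene list
    n_total = a + b + c + d   # all genes considered
    upper = min(n_term, n_list)
    # Sum the probabilities of observing a or more annotated genes in the list.
    numerator = sum(
        math.comb(n_term, k) * math.comb(n_total - n_term, n_list - k)
        for k in range(a, upper + 1)
    )
    return numerator / math.comb(n_total, n_list)


if __name__ == "__main__":
    # Counts taken from the first GO:0007166 row of the table.
    p = enrichment_p(46, 381, 223, 6449)
    print(f"p = {p:.3g}, score = {score_from_p(p):.2f}")
```

Because the test is an assumption, the p-value printed for a table row need not match the pVal column exactly; only the Score relationship is checked against the listed values.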

Table S5. Summary of cell culture conditions tested

Plating | Medium | CD44+ | CD10+ | MUC1+ | CD24+
2D, collagen | Medium 1 | growing, tight teardrop shape or mesenchymal cell shape; few luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | growing, tight teardrop shape | not growing well | not growing
2D, collagen | Medium 2 | not growing | not growing | not growing | not growing
2D, collagen | Medium 3 | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | not growing | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape
2D, 3T3 feeder | Medium 1 | growing, tight teardrop shape or mesenchymal cell shape; few luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | growing, tight teardrop shape | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | not growing
2D, 3T3 feeder | Medium 2 | not growing | not growing | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | not growing well
2D, 3T3 feeder | Medium 3 | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | not growing | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape | growing, luminal cell colonies with tight arrangement of cells (sometimes tighter in the periphery than in the center), flat, square shape
3D, Matrigel | Medium 1 | growing, small and big spherical structures | growing, big spherical structures | growing, small spherical structures | not growing
3D, Matrigel | Medium 2 | not growing | not growing | growing, small spherical structures | not growing
3D, Matrigel | Medium 3 | not growing | not growing | not growing | not growing

All cells were grown at low (5%) oxygen.

Table S9. Statistical analysis of the association between methylation of the indicated genes and breast tumor subtypes

Tumor subtype | PACAP mean | PACAP no. | HOXA10 mean | HOXA10 no. | FOXC1 mean | FOXC1 no. | SLC9A3R1 mean | SLC9A3R1 no.
Total | 3791 | 62 | 1166 | 74 | 172 | 46 | 216 | 62
Luminal | 5622 | 39 | 1566 | 54 | 267 | 28 | 237 | 39
HER2+ | 293 | 10 | 52 | 8 | 10 | 10 | 131 | 11
Basal | 987 | 13 | 108 | 12 | 38 | 8 | 224 | 12
p-value | 0.0005 | | 0.0006 | | 0.06 | | 0.99 |

Mean qMSP values and number of cases that have both gene qMSP and ER/PR/HER2 information available are listed.

Table S11. Statistical significance of enrichment of genes differentially methylated (A) or expressed (B) among the four different cell types in the listed gene ontology (GO) term categories

Columns: GO_term; CD44+ vs Muc1+; CD44+ vs CD10+; CD44+ vs CD24+; N5-CD44+ vs CD10+; N2.CD44 vs CD24; index

Differentially methylated genes
organ development: 1.49 0.25 0.1
cytoplasm: 0.29 0.46 2.06
nucleolus: 0 0.74 2.26
nuclear part: 0.08 1.44 1.33
nucleocytoplasmic transport: 0 1.98 1.01
regulation of transcription, DNA-dependent: 1.43 0.04 0.06
transcription factor binding: 0.59 2.11 1.48
RNA binding: 0 1.33 1.58
sequence-specific DNA binding: 1.78 0.19 0.69
negative regulation of biological process: 1.68 0.93 0.13
negative regulation of cell cycle: 1.42 1.5 0.96

Expressed genes
G-protein-coupled receptor binding: 7.07 4.1 7.58 1.36 15
cell surface receptor linked signal transduction: 11.74 10.4 8.59 2.84 14
negative regulation of programmed cell death: 7.68 4.5 11.2 3.19 13
negative regulation of biological process: 14.39 10.83 1.77 7.71 12
polysaccharide binding: 5.46 7.19 11.99 4.62 11
cell adhesion: 3.05 3.86 4.23 4.16 10
chemotaxis: 7.94 5.29 11.72 4.58 9
response to wounding: 13.06 12.8 11.48 7.09 8
inflammatory response: 13.54 12.64 13.19 5.71 6
immune response: 20.09 19.25 29.48 7.05 5
cell differentiation: 8.59 10.04 4.3 5.71 4
organ development: 8.83 10.9 11.16 14.41 3
collagen: 9.37 10.34 4.55 9.03 2
extracellular region: 32.02 38.34 34.69 19.89 1

GO terms and IDs are listed. Statistically significant P values for differentially methylated genes are in bold; P values for expressed genes are all statistically significant.

Table S12. Functional enrichment analysis of MSDK and SAGE data

MSDK
Pathway maps: Chemokines and adhesion; EphB receptors in dendritic spine morphogenesis and synaptogenesis; TGF, WNT and cytoskeletal remodeling; Cytoskeleton remodeling; Immunological synapse formation; Slit-Robo signaling; Role of Nek in cell cycle regulation; Reverse signalling by ephrin B; Plasmin signaling; Keratin filaments
GO processes: neutrophil chemotaxis; cellular extravasation; embryonic placenta development; leukocyte chemotaxis; localization of cell; cell motility; steroid signaling pathway; leukocyte adhesion; regulation of neurological process; -mediated signaling pathway
GeneGo processes: Chemotaxis; Cell adhesion_Attractive and repulsive receptors; Cell adhesion_Leucocyte chemotaxis; Cytoskeleton_Regulation of cytoskeleton rearrangement; Development_Neurogenesis:Axonal guidance; Cell adhesion_Integrin-mediated cell-matrix adhesion; Transcription_Nuclear receptors transcriptional regulation; Inflammation_Neutrophil activation; Cell adhesion_Integrin priming; Inflammation_IL-6 signaling

SAGE
Pathway maps: Keratin filaments; Cytoskeleton remodeling; TGF, WNT and cytoskeletal remodeling; Endothelial cell contacts by junctional mechanisms; Chemokines and adhesion; Lectin Induced complement pathway; Leptin signaling via JAK/STAT and MAPK cascades; IL3 activation and signaling pathway; GTP-XTP metabolism; Alternative complement pathway
GO processes: translation; sarcomere organization; morphogenesis of an epithelium; epidermis development; epithelial cell differentiation; myofibril assembly; striated muscle cell development; muscle cell development; cell differentiation; cellular developmental process
GeneGo processes: Translation_Translation initiation; Cytoskeleton_Intermediate filaments; Translation_Elongation-Termination; Cell adhesion_Cell junctions; Reproduction_Male sex differentiation; Cell cycle_G2-M; Reproduction_Spermatogenesis, motility and copulation; Inflammation_Complement system; Cell adhesion_Cell-matrix interactions; Translation_Regulation of initiation

Only genes that demonstrated statistically significant (P < 0.05) differences among the four cell types were included in this analysis.

Table S13. Functional enrichment analysis of SAGE data

CD10+
Canonical pathway maps: Regulation of translation initiation; Cytoskeleton remodeling; TGF, WNT and cytoskeletal remodeling; Endothelial cell contacts by non-junctional mechanisms; Regulation activity of EIF2; Role of tetraspanins in the integrin-mediated cell adhesion; Non-genomic (rapid) action of Androgen Receptor; Insulin regulation of the protein synthesis; Chemokines and adhesion; Integrin-mediated cell adhesion
GO processes: cellular component assembly; cellular component organization and biogenesis; embryonic development (sensu Mammalia); mRNA metabolic process; macromolecule complex assembly; mRNA processing; embryonic development (sensu Vertebrata); protein-RNA complex assembly; double-strand break repair via homologous recombination; recombinational repair
GeneGo processes: Transcription_mRNA processing; Proteolysis_Ubiquitin-proteasomal proteolysis; Reproduction_Feeding and Neurohormones signaling; Cytoskeleton_Actin filaments; Development_Skeletal muscle development; Cell adhesion_Integrin-mediated cell-matrix adhesion; Cell adhesion_Cadherins; Translation_Regulation of initiation; Muscle contraction; Cytoskeleton_Intermediate filaments

CD24+
Canonical pathway maps: Insulin regulation of fatty acid metabolism; Insulin signaling:generic cascades; Insulin regulation of the protein synthesis; TGF, WNT and cytoskeletal remodeling; Role of IAP- in apoptosis; Leucine, isoleucine and valine metabolism; Oxidative phosphorylation; Caspases cascade; Receptor-mediated HIF regulation; Regulation activity of EIF2
GO processes: cell-substrate junction assembly; translation; DNA topological change; biosynthetic process; regulation of cellular biosynthetic process; transcription-coupled nucleotide-excision repair; protein-RNA complex assembly; cellular component assembly; cholesterol biosynthetic process; cell-substrate adhesion
GeneGo processes: Translation_Regulation of initiation; Signal transduction_Insulin signaling; Cell adhesion_Integrin-mediated cell-matrix adhesion; Inflammation_Amphoterin signaling; Inflammation_TREM1 signaling; Signal transduction_ESR1-nuclear pathway; Cell adhesion_Glycoconjugates; Signal transduction_ signaling cross-talk; Proteolysis_Ubiquitin-proteasomal proteolysis; Apoptosis_Apoptotic mitochondria

CD44+
Canonical pathway maps: ECM remodeling; Cytoskeleton remodeling; Classic complement pathway; Chemokines and adhesion; TGF, WNT and cytoskeletal remodeling; Integrin-mediated cell adhesion; Plasmin signaling; CXCR4 signaling pathway; PLAU signaling; Leukocyte chemotaxis
GO processes: immune response; multicellular organismal process; immune system process; response to stimulus; patterning of blood vessels; germ cell migration; blood vessel development; vasculature development; embryonic development; skeletal development
GeneGo processes: Cell adhesion_Cell-matrix interactions; Proteolysis_ECM remodeling; Inflammation_IL-4 signaling; Immune_BCR pathway; Proteolysis_Connective tissue degradation; Cell adhesion_Platelet-endothelium-leucocyte interactions; Development_Blood vessel morphogenesis; Proliferation_Negative regulation of cell proliferation; Development_Ossification and bone remodeling; Development_Cartilage development

MUC1+
Canonical pathway maps: CD28 signaling; IP3 signaling; Oxidative phosphorylation; Angiotensin activation of Akt; TCA; Calcium signaling; NTS activation of IL-8 in colonocytes; ICOS-ICOSL pathway in T-helper cell; NFAT in immune response; Function in T lymphocytes
GO processes: regulation of granulocyte differentiation; positive regulation of granulocyte differentiation; positive regulation of myeloid cell differentiation; regulation of myeloid leukocyte differentiation; granulocyte differentiation; positive regulation of myeloid leukocyte differentiation; regulation of myeloid cell differentiation; female gonad development; positive regulation of hormone secretion; female sex differentiation
GeneGo processes: Proteolysis_Ubiquitin-proteasomal proteolysis; Transcription_mRNA processing; Development_Hemopoiesis, Erythropoietin pathway; Signal Transduction_TGF-beta, GDF and Activin signaling; Transcription_Chromatin modification; Inflammation_IL-2 signaling; Inflammation_IFN-gamma signaling; Reproduction_Progesterone signaling; Proliferation_Lymphocyte proliferation; Inflammation_IL-10 anti-inflammatory response

Genes with normalized tag counts > 5 per 50,000 total tags in at least one of the SAGE libraries were included in this analysis.
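The inclusion rule above can be sketched as follows. This is a minimal illustration, assuming each SAGE library is available as a mapping from gene to raw tag count and that normalization is a simple rescaling to 50,000 total tags; the function names and the example data are hypothetical and are not taken from the study.

```python
def normalize_per_50k(library):
    """Rescale raw tag counts so that the library sums to 50,000 tags."""
    total = sum(library.values())
    if total == 0:
        # An empty library contributes no normalized counts.
        return {gene: 0.0 for gene in library}
    return {gene: count * 50_000 / total for gene, count in library.items()}


def genes_passing_filter(libraries, threshold=5.0):
    """Return genes whose normalized count exceeds the threshold in at least one library."""
    normalized = {name: normalize_per_50k(lib) for name, lib in libraries.items()}
    all_genes = set().union(*(lib.keys() for lib in libraries.values()))
    return {
        gene
        for gene in all_genes
        if any(counts.get(gene, 0.0) > threshold for counts in normalized.values())
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = {
        "library_1": {"GENE_A": 10, "GENE_B": 99_990},
        "library_2": {"GENE_A": 0, "GENE_C": 100},
    }
    print(sorted(genes_passing_filter(example)))
```

The comparison is strict, so a gene sitting at exactly 5 tags per 50,000 in its best library is excluded, matching the "> 5" wording of the note.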

Other Supporting Information Files

Table S3

Table S6

Table S7

Table S8

Table S10
