Stromal Microenvironment Processes Unveiled by Biological Component Analysis of Expression in Xenograft Tumor Models

Xinan Yang, PhD1,§, Younghee Lee, PhD1,§, Yong Huang, MD, MSc1, James L Chen, MD2, H. Rosie Xing, PhD3,4* and Yves A Lussier, MD1,4,5*
1Sect. of Genetic Medicine, and 2Sect. of Hematology/Oncology, Dept. of Medicine; 3Depts. of Pathology and of Radiation and Cellular Oncology; 4UC Comprehensive Cancer Ctr; Ludwig Ctr for Metastasis Research; 5Inst. of Genomics and Systems Biology; Inst. for Translational Medicine; Comput. Inst., University of Chicago, IL, USA
§Equally contributed authors
*Corresponding authors

Abstract
Mouse xenograft models, in which human cancer cells are implanted in immune-suppressed mice, have been popular for studying the mechanisms of novel therapeutic targets, tumor progression and tumor metastasis. We hypothesized that we could exploit the interspecies genetic differences in these experiments to elucidate stromal microenvironment signals from probes on human arrays unintentionally cross-hybridizing with mouse homologs in xenograft tumor models. By identifying cross-species hybridizing probes from sequence alignment and from cross-species hybridization experiments for the human whole-genome arrays, deregulated stromal genes can be identified and their biological significance imputed from enrichment studies. Comparing these results with those found by laser capture microdissection of stromal cells from tumor specimens resulted in the discovery of significantly enriched stromal biological processes. Using this method, researchers can leverage xenograft experiments to better characterize the tumor microenvironment, in addition to their primary endpoints, without additional cost.
lussierlab.org/publication/Stroma

Introduction
Characterizing the tumor microenvironment is essential as it relates to clinical prognosis, metastatic potential, and treatment-related outcome[1]. Innovations in cell labeling techniques and small-animal in vivo imaging have enabled investigators to phenotype the microenvironment and cancer cell interactions. However, genome-wide expression analyses of the tumor and its microenvironment have not kept pace. Teasing apart cancer cells from stromal tissues in whole tissue expression requires time-consuming and costly laser microdissection of the tumor before RNA extraction[2, 3]. Computational methods have previously been proposed as a means of subtracting out the stromal signal[4]. Yet this approach simply eliminates genes that have conflated expression levels, and it does little to elucidate the tumor microenvironment.

In this paper, we describe a computational method of exploiting interspecies differences in the mouse xenograft model, in which human cancer cells are grown in immune-suppressed mice. This model is popular for studying the mechanisms of preclinical drug trials, tumor progression and the development of metastasis. Mouse xenografts consist of both human tumor cells and mouse stromal tissues[5], yet differential gene expression derived from human expression arrays is generally implicitly attributed to the cancer cells[6, 7]. However, cross-hybridization of human chip probes with homologous mouse genes may result in a mixed gene expression signal, where deregulated mouse stromal genes in xenograft tumor models are jointly measured along with the human cancer genes. To date, analyses of functional categories (Gene Ontology terms) enriched in the gene expression of whole tumor xenografts have neglected the impact of cross-species hybridization. Tumor/stromal interactions have been examined by modeling the diffusion of nutrients in the stroma, modeling the invasion of single cells into the stroma, and characterizing the function of the stromal elements[8-10]. To our knowledge, no studies have specifically focused on identifying mouse stromal signals in human gene arrays; unsurprisingly, a few groups have identified cross-species signals in multiple-species chips[11, 12]. We and others have designed pan-viral arrays comprising species probes for over 1000 species on one chip for diagnostic purposes[12, 13]. In collaboration with others, we have demonstrated that statistically based gene enrichment approaches applied to these pan-viral arrays were effective in increasing the species-specific signal[14]. Xenograft tumors profiled on human arrays present a related multiple-species problem: human probes were not designed to be species specific, and thus cross-hybridization is as important as (if not significantly more important than) in panmicrobial arrays.

We hypothesized that we could identify stromal microenvironment signals (Gene Ontology (GO) biological processes) from deregulated genes using the human array probes unintentionally cross-hybridizing with mouse homologs. This assumption is based on the observations that 1) the majority of the cross-hybridizing probes designed for human genes also target their mouse homologs, and 2) the majority of Gene Ontology[15] annotations for human and mouse homologs are identical (S. Methods). Thus, in this paper, we design an unbiased method to identify and optimize the choice of cross-species hybridizing probes. To test our method, we compared the Gene Ontology terms enriched in cross-hybridizing probes against those derived from microdissected stromal components.

Methods
We developed a two-stage method for deriving the underlying biology of the tumor stromal microenvironment that we call "biological component analysis", which is designed to enrich the gene expression signals in the stromal component separately from the cancer cell component of the tumor xenograft (Figure 1).

[Figure 1 diagram. Panel A: array analysis and Xhyb probe identification (NCBI homologs database, BLASTN) on the Agilent 44k human microarray (4.9k Xhyb + 14.8k remaining genes); 78 deregulated genes from a brain xenograft study; GO enrichment analyses of a) the full deregulated list vs the 20k genes of the Agilent array, b) deregulated Xhyb vs 4.9k Xhyb, c) deregulated remainder vs 14.8k remainder, d) deregulated Homo-Xhyb vs 4.9k Xhyb; Venn diagram of GO terms from the full, Xhyb and remainder lists (Xhyb genes enriched in stromal GO, but not the remaining genes). Panel B: Venn diagram of overlap with the stromal gold standard, and GO classification by expert (stromal GO, cancer GO, neither, unknown; for p<0.01 GO terms).]
Figure 1. Description of the Biological Component Analysis. Part A: To derive the possible contribution of the stromal microenvironment from the cross-hybridizing probes, we studied gene lists derived from xenograft models from the literature that used the same microarray platform. Robust GO enrichment analyses[16] were conducted over (i) all genes differentially expressed between the experimental conditions, (ii) the cross-hybridizing subset, or (iii) the remaining genes (A, Top Venn Diagram). Part B: We evaluated the GO terms enriched among cross-hybridizing genes against expert-classified GO terms and against the gold standards from the literature (Bottom Venn Diagram). Legend: Hs: Homo sapiens; Mm: Mus musculus; P: Probe; GS: gold standard.

Datasets. The GO annotations and gene homologs for the human genome were downloaded from the NCBI. Microarray platform information for the Agilent 44k whole human genome oligo microarray was obtained from the Gene Expression Omnibus[17] (GEO:GPL6480; Suppl. Methods, Table A).

Simulating a Xenograft Model with Human and Mouse RNA. In order to biologically determine the cross-hybridizing probes, human and mouse universal reference RNAs were obtained from ArrayIt (Sunnyvale, CA).

Array Selection. For this study, we selected the Agilent 44k whole human genome oligo microarray, which allows custom design. The Agilent microarray is designed with one or more 60-mer probes per gene rather than the more complex Affymetrix probeset containing multiple shorter (25-mer) probes voting for each gene (S. Methods, array processing).

Determining Probes of Human Arrays Cross-Hybridizing with Mouse RNA and the Proportion of Homologous Genes between Species. Cross-species hybridizing (Xhyb) probes between human and mouse RNA were determined by combining a computational method (BLAST[18]) and a biological method in which the channels of two arrays were independently exposed to either (i) mouse RNA, (ii) human RNA, or (iii) both. For simplicity, genes whose probes were highly expressed when exposed to mouse universal RNAs but lower when exposed to human universal RNAs are hereafter referred to as "Xhyb genes". From the identified genes with only cross-species hybridizing probes, the proportion of homologous genes between the two species was calculated. This last step was important to justify the GO enrichment studies conducted over cross-hybridizing probes of the human array that were used to produce sets of GO terms describing the biological processes in the stroma (S. Methods). The normalization model of microarray gene expression on a log scale, noted as y, in this xenograft model is:

$$y_{i,j} = (\mu_{iHs} + c_{jHs}) + (\mu_{iMm} + c_{jMm}) + \varepsilon_{i,j} \qquad (1)$$

where μ represents an overall mean value for a probe i, originally designed for its Homo sapiens (Hs) target (noted as $\mu_{iHs}$), or for a Mus musculus (Mm) homolog if cross-hybridizing (noted as $\mu_{iMm}$). The notation c is the main effect of a clinical phenotype j, additive to the probe expression. The notation ε is the stochastic error[19] that includes same-species cross-hybridization, non-homologous cross-species hybridization, and other technological errors. Because the four-array experiment was performed under the exact same conditions, we ignored the array effect and the array interaction effect in the data extracted by the Agilent Feature Extraction Software for the Agilent 4x44k array (4 arrays per slide). Thus in this case of cross-species hybridization, we can assume that the distribution of the term $(\mu_{iHs} + c_{jHs})$ is

stochastic, centered at zero and thus part of the error ε. Consequently, the equation for the expression of a cross-hybridizing probe becomes:

$$[y_{i,j}]_{Xhyb} = (\mu_{iMm} + c_{jMm}) + [\varepsilon_{i,j} + (\mu_{iHs} + c_{jHs})] = \mu_{iMm} + c_{jMm} + \varepsilon'_{i,j},$$

where ε' represents stochastic errors. Therefore, the differential expression of an Xhyb probe between two phenotypes a and b is really a stromal gene response rather than a cancer cell response on a log scale:

$$[FC_i]_{Xhyb} = [y_{i,a}]_{Xhyb} - [y_{i,b}]_{Xhyb} = (\mu_{iMm} + c_{aMm} + \varepsilon'_{i,a}) - (\mu_{iMm} + c_{bMm} + \varepsilon'_{i,b}) = c_{aMm} - c_{bMm} + \varepsilon''_{i,a,b}.$$

We make standard stochastic assumptions about the errors ε, ε' and ε''. As shown in S. Fig. 1, when Xhyb human probes are exposed to both human and mouse RNA, their expression levels mainly reflect the expression of their homologous mouse genes; thus the GO terms enriched in human gene targets represent the stromal response of their homologous mouse genes. In contrast, among the remaining deregulated genes, the majority of gene expression is contributed by the designed targets, which reflect the biological processes of human tumor tissue. A stromal GO term is contributed by genes where cross-species hybridization is more likely to occur, as presented in S. Tab. 1. Thus by selecting genes that are differentially expressed between phenotypes and also Xhyb, we identify stromal-specific GO terms that can distinguish among phenotypes. This is because each of the cross-hybridizing probes of the human array is expressed only when it is exposed to mouse RNA. Thus a probe differentially expressed between phenotypes reflects the dysregulation of its homologous mouse target rather than a stochastic effect or a cancer effect, since its human target is not expressed.

Stromal Effect Among Literature Reported Gene Lists (Figure 1A). Xenograft models of human cancer and mouse stroma are generally analyzed using human arrays for deregulated genes, and the expression profiles are generally attributed to the human gene's biological processes or function. To assess the stromal effect among the reported contributing genes, we searched PubMed with the keywords "xenograft, Agilent, cancer, gene expression". Two publications fit these criteria using the same platform (GEO:GPL6480), as shown in S. Tab. 2. A glioblastoma study[20] was selected for further GO enrichment analysis as it contained enough reported genes (S. Tab. 3). The reported genes whose probes cross-hybridize with mouse RNAs (Xhyb) were thereafter searched using the available GenBank IDs or symbols.

Determination of Statistical Significance of Enriched Cross-Species GO Terms: To find the overrepresented biological processes, the Gene IDs for annotated probes were input into the Bioconductor package GOstats[16] to evaluate the enrichment of biological process (BP) GO terms in four ways: a) the full literature-reported gene list was compared to the entire gene set on the 44k Agilent microarray; b) the Xhyb subset of the reported genes was compared to all Xhyb genes in the microarray that we had identified; c) the remaining reported genes were compared to the remaining genes in the microarray; and d) the reported genes whose probes cross-hybridize with a mouse homolog were compared to all the genes with cross-hybridizing probes on the array. Note that for test d), since we could not know which cross-hybridizing probes targeted their homologs on the whole array without experimental evidence, all genes with cross-hybridizing probes were used as the background, resulting in an estimated statistic. Alternatively, approach b) uses an evaluation by proxy based on the observations that the majority of mouse GO annotations are identical to human GO annotations, and that the majority of the cross-hybridizing genes target their mouse homologs (S. Methods). For each way of evaluation, we performed a "conditional" enrichment analysis, where conditional refers to the decorrelation of the GO graph structure instead of treating each term as independent from all other terms[16]. This "conditional" hypergeometric test allows robust results from small gene lists because it follows a bottom-up testing method. The method first tests the leaves of the GO graph; it then removes all genes annotated at significant child terms from the parent term's gene list before testing the terms whose descendants have already been tested.

Evaluation of the Methodology (Figure 1B). We validated our methodology using two independently published sets of stromal-associated biological processes derived from microarray data as "gold standards"[2, 3]. Fisher's exact test was performed on the GO terms overrepresented in these "gold standards" and among the "Xhyb genes". This was based on the assumption that the tumor stromal response to human cancer has a shared phenotype and thus shared Gene Ontology annotation, and thus could be applied to cancer studies. In particular, the first gold standard (GS1) contained 61 differentially expressed breast stromal genes with their associated enrichments identified by comparing stromal tissues (adjusted to cancer) with surrounding epithelium[3]. The second gold standard (GS2) compared gene expression profiles of breast tumor stroma captured by laser capture microdissection, and reported the overrepresented GO terms from clinical outcomes data divided into poor-prognosis, good-prognosis and mixture-outcome, respectively[2]. The 107 biological processes derived from the whole tumor with poor prognosis were used as the second gold standard because samples with poor prognosis are more representative for characterization of stromal signatures[2]. The network of the enriched GO biological process terms and those in the two gold standards was visualized using the Bioconductor packages Rgraphviz and GOstats. An additional evaluation for the union of GO terms that were statistically significant (unadjusted conditional hypergeometric test p-value<0.01) was performed as described in S. Methods (S. Table 4).

The Xhyb probes and R script can be downloaded from www.lussierlab.org/publication/Stroma.

Results and Discussion
Identification of Xhyb probes. As shown in the left plot in S. Methods, we examined the top 18% of BGS2 (notation BGS2.18) of the absolute expression of mouse probes when exposed to human RNA, because it had the highest F-score (precision=40%, recall=25%) together with the 3rd theoretical prediction model (M-Xhyb3). We examined multiple conditions in M-Xhyb3 and developed an optimized model (Fig. A, Suppl. Methods). Using this optimized model, we then adjusted the parameter x of BGS2. This resulted in 6,200 Xhyb probes involving 5,300 mouse genes from GPL7202 and 5,900 Xhyb probes involving 4,900 human genes from GPL6480, respectively.

Xhyb genes found in published deregulated genes from xenograft models. The identified Xhyb subset of the literature-reported tumor deregulated genes is listed in S. Table 3. Note that many genes were targeted by multiple Agilent probes. In such cases, a reported gene with even one mouse cross-hybridizing probe was considered a possible Xhyb gene. Details are given in S. Tab. 5-6. Furthermore, among the 17 identified mouse cross-hybridizing human probes that were reported in the literature (S. Tab. 5), 13 probes targeted their mouse homologs (S. Tab. 7), as predicted by the BLASTN algorithm[18] with default parameters.

Several biological processes associated with stromal cells were significantly (p<0.01) enriched only among the Xhyb deregulated genes. For example, it has previously been shown that implanted tumors develop intensive angiogenesis with vascular endothelial growth factor (VEGF) induction in the stroma[21]. Accordingly, we found that positive regulation of the vascular endothelial growth factor receptor signaling pathway was significantly enriched among the Xhyb deregulated genes (p=0.004), suggesting a different stromal/tumor cell cross-talk, which was further demonstrated by differences in angiogenesis and in necrosis[20].

Evaluation. The biological process terms associated with cross-hybridizing probes that targeted mouse homologs (see Methods, test d) significantly overlapped with the known stromal gold standard terms as defined in the Methods section. In S. Fig. 2, four out of twelve GO terms overlapped with GS1 (Fisher's p=7×10^-6). In contrast, none of the 15 GO terms associated with the remaining genes matched GS1 (S. Fig. 2A). Examining the second gold standard, three GO terms enriched among Xhyb genes overlapped with GS2 (Fisher's p=0.001). On the other hand, there was only one overlap from the GO terms enriched among the remaining genes (p=0.26) (S. Fig. 2B). Moreover, an alternative evaluation focused on the genes with cross-hybridizing probes (see Methods, test b) also resulted in significant overlap with the two gold standards (p-value=0.07 and 0.01, as presented in S. Fig. 3) for GO terms enriched among Xhyb genes. Notably, there was no overlap between the GO terms enriched among Xhyb genes and the whole tumor annotations from the subset with good prognosis. This corroborates the hypothesis that biological process annotations associated with poor prognosis contain more stromal components.

Figure 3 visualizes the graphical relationship of the identified BP terms and the two gold standards (S. Table 8). The biological process "regulation of behavior" is of interest as a root term of 4 out of 7 identified terms that also overlap with GS2. This may suggest that tumor stromal actions or reactions are in response to external or internal stimuli through the regulation of chemotaxis. Of note, 11 out of the 12 identified GO terms are child nodes of the stromal gold standard terms (S. Figure 4).

Figure 3. The network of the Xhyb enriched BP terms and the two gold standards. A full network including all identified BP terms is given in S. Fig. 4.
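The bottom-up "conditional" hypergeometric test described in the Methods can be illustrated with a minimal Python sketch. This is not the GOstats implementation: the two-term hierarchy, the gene names, and the helpers `upper_tail` and `conditional_test` are hypothetical, and the sketch assumes that only the parent term's gene list (not the universe or the selected list) is reduced when a child term is significant.

```python
from math import comb

def upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def conditional_test(children, term_genes, selected, universe, alpha=0.01):
    """Test terms bottom-up; genes of significant children are removed from the parent.

    Assumes `selected` is a subset of `universe`.
    """
    pvals = {}

    def visit(term):
        if term in pvals:
            return
        kids = children.get(term, [])
        for child in kids:          # children are always tested before their parent
            visit(child)
        genes = set(term_genes[term]) & universe
        for child in kids:
            if pvals[child] < alpha:
                genes -= set(term_genes[child])
        k = len(genes & selected)
        pvals[term] = upper_tail(k, len(genes), len(selected), len(universe))

    for term in term_genes:
        visit(term)
    return pvals

# Hypothetical toy data: 20 genes, 4 selected, one parent term with one child term.
universe = {f"g{i}" for i in range(1, 21)}
selected = {"g1", "g2", "g3", "g4"}
term_genes = {
    "parent": {f"g{i}" for i in range(1, 11)},
    "child": {f"g{i}" for i in range(1, 6)},
}
children = {"parent": ["child"]}

pvals = conditional_test(children, term_genes, selected, universe)
unconditional_parent = upper_tail(4, 10, 4, 20)
print(pvals, unconditional_parent)
```

In this toy case the child term is significant, so the parent is tested only on the genes it does not share with the child and is no longer called enriched, whereas an unconditional test of the parent would give a smaller p-value driven entirely by the child's genes.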

In addition, as presented in S. Fig. 5a, deregulated Xhyb genes (cyan circles) are significantly better at predicting biological processes of stromal cells when compared to the remaining genes and to the full reported gene list annotations in both studies. Conversely, S. Fig. 5b shows that the remaining deregulated genes (orange triangles) or the full gene list (blue squares) are better at predicting biological processes that are indicative of cancer cells.

Limitations and future studies. This study focused on human probes that inadvertently cross-hybridized with mouse RNAs. The confounded expression of 5,000 human and mouse genes can thus be assessed; however, these measures are limited as they are not genome-wide. The platform-specific cross-species hybridizing probes thus limit our study, as we are indirectly measuring the stromal microenvironment effects as compared to the direct measurements that could arguably be obtained with species-specific mouse probes.

For future studies, we plan to use knowledge extracted from the literature to further refine the interpretation, using BioMedLEE[22] in a high-throughput manner. Our current experiment identified Xhyb probes in Agilent human microarrays and therefore can only reveal part of the stromal signatures. A more complete view of stromal signatures in xenograft models should be investigated using mouse-specific probes. To further identify microenvironment factors associated with tumor progression and clinical prognosis, we are currently developing a novel methodology to dissect cancer and stromal signatures using a xenograft model of head and neck cancer.

Conclusion
In this paper, we introduced a novel design to detect biological processes of the stroma using human arrays in which approximately 25% of the probes cross-hybridize with mouse genes. We also conclude that the majority of our identified cross-hybridizing genes target their homologs. Although human and mouse RNA gene expression are confounded in this subset of probes, we have demonstrated that appropriately performed Gene Ontology enrichment analysis identified significant stroma-associated GO terms, using GO annotations from previous laser capture microdissections of the stroma. These results suggest that our method may be a reasonable alternative to laser microdissection in a specially designed chip. Our results also suggest that human chip probes cross-hybridizing with mouse genes are, in fact, over-annotated with stroma-associated biological processes. This leads us to believe that xenograft gene expression studies contain confounded cancer cell signals and stromal cell signals, which must be accounted for. An additional benefit of this methodology is cost: this in silico approach allows us to extract additional stromal knowledge from current xenograft tumor arrays without needing to perform further in vivo/in vitro experiments. In future studies, we intend to examine these cross-species enrichments for biological interactions between cancer cells and stromal cells, an important initiative of the National Cancer Institute Tumor Microenvironment Think Tank[23].

Funding & Acknowledgements. This work was supported by the National Institutes of Health (National Cancer Institute) 1U54CA121852 (National Center for Multiscale Analyses of Genomic and Cellular Networks - MAGNET).

References
[1] Joyce JA, Pollard JW. Microenvironmental regulation of metastasis. Nat Rev Cancer 2009;9: 239-52.
[2] Finak G, et al. Stromal gene expression predicts clinical outcome in breast cancer. Nat Med 2008;14: 518-27.
[3] Finak G, et al. Gene expression signatures of morphologically normal breast tissue identify basal-like tumors. Breast Cancer Res 2006;8: R58.
[4] Stuart RO, et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. PNAS 2004;101: 615-20.
[5] Tlsty TD, Coussens LM. Tumor stroma and regulation of cancer development. Annu Rev Pathol 2006;1: 119-50.
[6] Gouyer V, et al. Autocrine induction of invasion and metastasis by tumor-associated trypsin inhibitor in human colon cancer cells. Oncogene 2008;27: 4024-33.
[7] Klymkowsky MW, Savagner P. Epithelial-mesenchymal transition: a cancer researcher's conceptual friend and foe. Am J Pathol 2009;174: 1588-93.
[8] Anderson AR, et al. Microenvironment driven invasion: a multiscale multimodel investigation. J Math Biol 2009;58: 579-624.
[9] Sangar V, et al. Quantitative sequence-function relationships in proteins based on gene ontology. BMC Bioinformatics 2007;8: 294.
[10] Wang C, et al. J Hepatol 2009;50: 1122-31.
[11] Chiu CY, et al. J Clin Microbiol 2007;45: 2340-3.
[12] Palacios G, et al. Emerg Infect Dis 2007;13: 73-81.
[13] Jabado OJ, et al. Nucleic Acids Res 2008;36: e3.
[14] Liu Y, Sam L, Li J, Lussier YA. Robust methods for accurate diagnosis using pan-microbiological oligonucleotide microarrays. BMC Bioinformatics 2009;10 Suppl 2: S11.
[15] Camon E, et al. Genome Res 2003;13: 662-72.
[16] Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics 2007;23: 257-8.
[17] Edgar R, et al. Nucleic Acids Res 2002;30: 207-10.
[18] Altschul SF, et al. J Mol Biol 1990;215: 403-10.
[19] Wolfinger RD, et al. J Comput Biol 2001;8: 625-37.
[20] Sakariassen PO, et al. PNAS 2006;103: 16466-71.
[21] Fujita M, et al. Carcinogenesis 2005;26: 271-9.
[22] Lussier Y, Friedman C. BiomedLEE: a natural-language processor for extracting and representing phenotypes, underlying molecular mechanisms and their relationships. In: ISMB; 2007.
[23] Tumor Microenvironment Think Tank (http://www.cancer.gov/think-tanks-cancer-biology).
[24] Mason SJ, Graham NE. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Q J R Meteorol Soc 2002;128: 2145-66.

List of Supplementary Material:
Supplementary Methods
Supplementary Figures
S. Figure 1: A simulated visual model of differential gene expression in a xenograft experiment.
S. Figure 2: Comparison of the gold standards of stromal biological process (BP) with BP terms enriched among the genes with cross-hybridizing probes that target mouse homologs (Homo-Xhyb), and among the remaining genes for glioblastoma deregulation.
S. Figure 3: Comparison of the gold standards of stromal biological process (BP) with BP terms enriched among the genes with Xhyb probes and among the remaining genes for glioblastoma deregulation.
S. Figure 4: Significantly enriched (p<1%) BP terms among the genes with mouse homologous cross-hybridizing probes, and their overlap with the two gold standards.
S. Figure 5: Comparison of the overrepresented biological processes (BPs) derived from three GO enrichment tests for the reported gene list.
Supplementary Tables
S. Table 1: The relationship between deregulated genes and their contributed Gene Ontology in a xenograft model, using a human microarray.
S. Table 2: Previous xenograft cancer studies using the Agilent GPL6480 human microarray.
S. Table 3: Xhyb subset of the cancer deregulated genes reported by previous xenograft studies.
S. Table 4: The significant (p<1%) GO terms among the deregulated genes in a published glioblastoma xenograft model.
S. Table 5: Xhyb subset of the 78 deregulated genes reported by a previous glioblastoma xenograft study and their corresponding probes in the Agilent array (GPL6480).
S. Table 6: Xhyb subset of the 28 deregulated genes reported by a previous colon cancer xenograft study and their corresponding probes in the Agilent array (GPL6480).
S. Table 7: The human probes in S. Table 3 that target their mouse homologs.
S. Table 8: The identified stromal-associated GO terms that were also reported by the two gold standards using laser microdissection technology.
S. Table 9: The possible mouse targets for all human probes in the Agilent 44k whole human genome microarray, provided by Agilent (http://www.lussierlab.org/publication/Stroma/SupplTable9.xls).

Supplementary Methods:
Data: Multiple data resources are used in this study, as listed in Table A.

Table A. Data resources.
Content | Data source | Data version (download date)
The GO annotation for both human and mouse genes | NCBI (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz) | Sep. 22, 2009
The mouse homolog of human genes | NCBI (ftp://ftp.ncbi.nih.gov/pub/HomoloGene) | Build63
Probe annotation for Agilent 44k whole human genome array | GPL6480 from GEO (http://0-www.ncbi.nlm.nih.gov.millennium.unicatt.it/projects/geo/query/acc.cgi?acc=GPL6480) | Sep. 2009
Possible mouse target of probes in human array | Agilent company | May, 2008
78 genes differentially expressed between the first and later GBM xenograft generations | PNAS (2006) 103(44) 16466-16471 |
28 genes deregulated in K18Y_TATI mutant tumor xenografts versus the control 5M21 xenografts | Oncogene (2008) 27: 4024-4033 |
The mapping between gene symbol, GenBank ID and GO annotation for human genes | "org.Hs.eg.db" from Bioconductor | Version 2.2.0

Array Preprocessing. One slide of the Agilent whole genome 4x44k microarray with two dual-arrays was used. The total pooled RNA was amplified and labeled following the Agilent Low RNA Input Fluorescent Linear Amplification Kit protocol. Human RNA and mouse RNA were labeled with Cy5/red in two arrays, respectively, and the pooled human and mouse RNAs were labeled with Cy3/green. An amount of 1.5 µg labeled RNA per channel (1.5 µg human RNA or 1.5 µg mouse RNA in Cy5, and 0.75 µg human RNA pooled with 0.75 µg mouse RNA in Cy3) was hybridized onto the GPL6480 microarray for 17 hours, following the Agilent two-color Quick Amp labeling protocol. The two arrays were scanned at 5 µm resolution and 100% power gain at 600 PMT for both Cy3 and Cy5, using the GenePix 4000B Axon scanner. Subsequent data were extracted by the Agilent Feature Extraction Software.

Assumption Verification. We performed Gene Ontology enrichment studies over cross-hybridizing probes of the human array to produce a set of GO terms pertaining to the biological processes of the stroma. Two assumptions need to be verified:

Assumption A) Majority of the Xhyb Probes Designed for Human Genes Target their Homologous Mouse Genes. To verify this assumption, we estimated the proportion of homologs between human and mouse in two cases: a) all the 19,725 genes in the whole Agilent human microarray, and b) the identified 4,948 human genes with cross-hybridizing probes. To this end, we used two data resources: 1) A partial list of possible mouse targets for all human probes in the Agilent 44k whole human genome microarray, provided by Agilent (Suppl. Table 9 of the manuscript), involving 5,791 unique mouse genes with GenBank accession numbers. 2) The human and mouse homolog database downloaded from NCBI, which contains 16,715 unique human genes with mouse homologous genes. Of these, 94% (15,689) have probes in the Agilent 44k whole human genome microarray.
First, the majority of the possible cross-hybridization happens between homologous genes: we observed that 74% (4,266) of the possible mouse targets are the homologs of their designed human targets. Second, the majority of our identified Xhyb genes target their homologs. This is because 40% of the possible homologous targets (n=1,701) are identified as Xhyb, while only 26% of the possible homologous targets (n=3,247) are among the remainders in our experimental result. As shown in Figure A, this result suggests a significant enrichment between the possible homologous cross-hybridization and the identified genes with cross-hybridizing probes (Fisher's exact test p-value < 2×10^-16).
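The enrichment reported above can be sanity-checked with a short Python sketch built only from the counts quoted in the text (19,725 array genes, 4,266 possible homologous targets, 4,948 Xhyb genes, 1,701 in both sets). It assumes a one-sided test, for which Fisher's exact test reduces to a hypergeometric upper tail; the original analysis may have used a two-sided test or slightly different counts. The helper names `log_comb` and `hypergeom_upper_tail` are hypothetical.

```python
from math import lgamma, exp

def log_comb(n: int, k: int) -> float:
    """Natural log of C(n, k), via lgamma to avoid very large integers."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_upper_tail(overlap: int, set_a: int, set_b: int, universe: int) -> float:
    """P(X >= overlap) when set_b genes are drawn from a universe holding set_a marked genes."""
    log_denom = log_comb(universe, set_b)
    total = 0.0
    for k in range(overlap, min(set_a, set_b) + 1):
        total += exp(log_comb(set_a, k) + log_comb(universe - set_a, set_b - k) - log_denom)
    return total

# Counts quoted in the text (assumed here to describe a single 2x2 table).
universe = 19725   # human genes on the Agilent array
homologs = 4266    # possible mouse homolog targets
xhyb = 4948        # identified genes with cross-hybridizing probes
overlap = 1701     # genes in both sets

expected = homologs * xhyb / universe   # overlap expected by chance
p_value = hypergeom_upper_tail(overlap, homologs, xhyb, universe)
print(f"expected overlap: {expected:.0f}, observed: {overlap}, one-sided p: {p_value:.3g}")
```

Under these assumptions the observed overlap is well above the roughly 1,070 genes expected by chance, which is consistent with the reported p-value bound.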

[Figure A diagram: Venn diagram within the 19,725 Hs genes designed on the Agilent microarray, showing 4,266 possible mouse homologs of the human genes of the array and 4,948 identified human genes with cross-hybridizing probes; 2,563 homologous only, 1,701 in both sets, 3,247 Xhyb only; P<2×10^-16.]
Figure A. Majority of the Xhyb probes designed for human genes target the homologous mouse genes.

Assumption B) Majority of the Gene Ontology Annotations for human and mouse homologous genes are identical. As presented in Figure B and Table B, from the GO annotation point of view, of the 17,355 pairs of homologous genes between human and mouse downloaded from NCBI (build 63), 83% of pairs (14,339) have identical GO annotations. From the gene point of view, there are 17,920 mouse genes with GO annotations, of which 83% (14,803) have human homologs, and 80% (14,298) of mouse genes have a GO annotation that is identical to a GO annotation of their human homologs. Meanwhile, there are 17,688 human genes with GO annotations, of which 86% (15,223) have mouse homologs. The GO annotations of 80% (14,167) of all the human genes are identical to the GO annotations of their mouse homologs.
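As an illustration of how such counts could be derived, the Python sketch below counts shared GO-gene occurrences and genes with at least one shared GO term across homolog pairs, and then recomputes the percentages from the counts quoted above. The gene names, GO identifiers, and the function `shared_annotation_summary` are hypothetical, and reading "identical" as "at least one GO term shared with the homolog" is an assumption based on the layout of Table B.

```python
def shared_annotation_summary(pairs, go_hs, go_mm):
    """Return (shared GO-gene occurrences, homolog pairs with at least one shared GO term)."""
    shared_occurrences = 0
    pairs_with_shared = 0
    for hs_gene, mm_gene in pairs:
        shared = go_hs.get(hs_gene, set()) & go_mm.get(mm_gene, set())
        shared_occurrences += len(shared)
        if shared:
            pairs_with_shared += 1
    return shared_occurrences, pairs_with_shared

# Hypothetical toy annotations (placeholder identifiers, not real GO terms).
pairs = [("HS_A", "Mm_a"), ("HS_B", "Mm_b"), ("HS_C", "Mm_c")]
go_hs = {"HS_A": {"GO:t1", "GO:t2"}, "HS_B": {"GO:t3"}, "HS_C": {"GO:t4"}}
go_mm = {"Mm_a": {"GO:t1", "GO:t2"}, "Mm_b": {"GO:t3", "GO:t5"}, "Mm_c": {"GO:t6"}}
toy_summary = shared_annotation_summary(pairs, go_hs, go_mm)

# Percentages recomputed from the counts quoted in the text.
quoted = {
    "homolog pairs with identical GO": (14339, 17355),
    "mouse genes with human homologs": (14803, 17920),
    "mouse genes with identical GO": (14298, 17920),
    "human genes with mouse homologs": (15223, 17688),
    "human genes with identical GO": (14167, 17688),
}
percentages = {name: round(100 * part / whole) for name, (part, whole) in quoted.items()}
print(toy_summary, percentages)
```

The recomputed values round to the percentages stated in the paragraph above.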

[Figure B: Of the 17,355 pairs of Hs-Mm homologs, 83% (14,339 pairs) have identical GO annotations. Of the 17,920 mouse genes with GO annotations, 83% (14,803) have Hs homologs and 80% (14,298) have GO annotations identical to those of their Hs homologs. Of the 17,688 human genes with GO annotations, 86% (15,223) have Mm homologs and 80% (14,167) have GO annotations identical to those of their Mm homologs.]

Figure B. The proportion of genes with identical GO annotations to those of their homologs.

Table B. Counts of GO-gene occurrences and distinct genes, considering homologs. A: NCBI GO annotation.

| Species | A: GO-gene occurrences | A: distinct genes | (A) intersect (NCBI homologs): GO-gene occurrences | (A) intersect (NCBI homologs): distinct genes | Identical: GO-gene occurrences | Identical: distinct genes | Not identical: GO-gene occurrences | Not identical: distinct genes |
|---|---|---|---|---|---|---|---|---|
| Hs | 141,809 | 17,688 | 128,930 | 15,223 | 82,448 | 14,167 | 46,482 | 2,728 |
| Mm | 142,139 | 17,920 | 124,555 | 14,803 | 82,448 | 14,298 | 42,107 | 2,931 |
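The comparison behind Assumption B can be illustrated with a small sketch. It assumes that "identical" means exact equality of the two GO term sets of a homolog pair, which the text does not state explicitly; the gene names and GO labels below are made up for illustration, and the helper name `identical_fraction` is hypothetical.

```python
def identical_fraction(homolog_pairs, go_hs, go_mm):
    """Fraction of homolog pairs, annotated in both species,
    whose GO term sets are exactly equal."""
    annotated = [(h, m) for h, m in homolog_pairs if h in go_hs and m in go_mm]
    identical = sum(1 for h, m in annotated if go_hs[h] == go_mm[m])
    return identical / len(annotated) if annotated else 0.0

# Made-up genes and GO labels, for illustration only.
go_hs = {
    "HsGeneA": {"GO:x1", "GO:x2"},
    "HsGeneB": {"GO:x3"},
    "HsGeneC": {"GO:x4"},
}
go_mm = {
    "MmGeneA": {"GO:x2", "GO:x1"},   # same set, different order
    "MmGeneB": {"GO:x3"},
    "MmGeneC": {"GO:x4", "GO:x5"},   # one extra term, so not identical
}
pairs = [
    ("HsGeneA", "MmGeneA"),
    ("HsGeneB", "MmGeneB"),
    ("HsGeneC", "MmGeneC"),
    ("HsGeneD", "MmGeneD"),          # unannotated pair, skipped
]

toy_fraction = identical_fraction(pairs, go_hs, go_mm)

# The reported percentage of identical pairs follows from the quoted counts.
reported_pair_pct = round(100 * 14339 / 17355)
print(f"toy fraction = {toy_fraction:.2f}, reported = {reported_pair_pct}%")
```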

Blast for cross-hybrid chip design. The standalone Basic Local Alignment Search Tool (BLAST) software, version 2.2.14 (J Mol Biol 215: 403-10 (1990)), was downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/ and installed on a Solaris machine. The human and mouse RefSeq sequences in FASTA format were formatted separately with the formatdb program to build the Blast databases. The blastn program was executed with default options to search for sequence similarity, using each probe as a query against the target RNA sequence database: mouse cross-hybrid probes against human genes, or human cross-hybrid probes against mouse genes. BL2SSEQ was run for pairwise comparison of two sequences (two probes or two targets).

Array Analyses and cross-species hybridizing (Xhyb) Probe Identification (Figure C). The background-subtracted signal derived by Agilent Feature Extraction was log2-transformed. After excluding the technical control probes, the 34k human probes and 33k mouse probes that were expressed in the green channel were selected for further study, using the criterion that the expression intensity be larger than zero on the log scale. Xhyb probes for the human array and the mouse array were identified in the following six steps; as an example, we describe how to identify the mouse probes that are most likely to hybridize with human RNA:

Step 1: To predict cross-species hybridized probes, Blast was run with default options to assess the sequence similarity between human probes and mouse mRNA, and vice versa.

Step 2: Biological gold standards (BGSs) were defined (Table C) to capture the mouse probes whose targets are highly expressed when exposed to human RNA.

Step 3: Theoretical prediction models were defined (Figure D and Table D) under different sequence alignment conditions.

Step 4: To optimize the parameters of the above theoretical models, precision, recall and F-scores were calculated for the four biological GSs against the four theoretical models (Figure D, left), where $F = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. After selecting the BGS that best balances precision and recall, the possible combinations of Blast prediction conditions were further optimized according to the F-measure (Figure D, right).

Step 5: The parameter of the best biological GS was then set according to the combination of the optimal Blast conditions.

Step 6: Finally, the Xhyb probes were identified using the optimal BGS.
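The selection in Step 4 can be sketched as a small grid search over pairs of gold standard and theoretical model. This is an illustrative sketch under stated assumptions: each theoretical model is treated as a set of predicted probes and each biological GS as the reference set, and all probe IDs and set contents below are invented.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and F-score of a predicted probe set against a gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Invented probe sets; the labels mirror the naming used in Tables C and D.
models = {
    "M-Xhyb1": {"p1", "p2"},
    "M-Xhyb3": {"p1", "p2", "p3"},
}
standards = {
    "BGS1.x": {"p1", "p4", "p5"},
    "BGS2.x": {"p1", "p2", "p3", "p6"},
}

# F-score of every (gold standard, model) pair, then the best pair.
scores = {
    (gs_name, model_name): precision_recall_f(predicted, gold)[2]
    for gs_name, gold in standards.items()
    for model_name, predicted in models.items()
}
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

Using the F-score as the single selection criterion penalizes a pair that achieves high precision only by predicting very few probes, or high recall only by predicting many.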

[Figure C: Workflow. Biological experiment: Hs RNA and Mm RNA, each together with pooled Hs+Mm RNA, hybridized to Hs+Mm probes. Blast prediction: Hs probes vs. Mm RNA; Mm probes vs. Hs RNA. The four gold standards (GSs) and the four theoretical models are used to optimize the Blast parameters and the biological gold standard, yielding the optimal biological GS. Result: 25% Xhyb Hs genes (4.9k, involving 5.9k probes) and 75% remaining Hs genes (14.8k, involving 20k probes).]

Figure C. Array analyses and cross-species hybridizing (Xhyb) probe identification.

As a result, the optimal blast parameters are: {18

Figure D. Optimizing Blast parameters and biological gold standards.

The F-score for each biological GS against each model is given in the left plot, where different lines represent different BGSs and the numbers represent the different theoretical models. Per the plot, 1) biological GS 2.18 together with M-Xhyb3 achieves the highest F-score; 2) M-Xhyb3 always has a higher F-score than the other models. In the right plot, multiple Blast conditions were learned separately to obtain the optimal theoretical model.

Table C. Definition of the four biological GSs used to predict Xhyb probes.

| BGS | Parameters |
|---|---|
| BGS1.x | top x ratio of relative expression (red/green) when exposed to the wrong RNA vs. the correct RNA |
| BGS2.x | top x% absolute expression when exposed to human RNA with mouse probe |
| BGS3.x | top x% relative expression when exposed to human RNA with mouse probe |
| BGS4.x | the same as GS2 but absolute log2-based expression on both green channels of Hs+Mm RNA; expression of mouse probes > 4 |
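A gold standard of the BGS2.x kind (top x% absolute expression of mouse probes exposed to human RNA) can be sketched as follows. This is a hedged illustration rather than the authors' code: the raw intensities and probe IDs are invented, the log2 transform and the "larger than zero on the log scale" filter follow the description in the array analysis paragraph, and the helper name `top_percent` is hypothetical.

```python
import math

def top_percent(log2_signal, percent):
    """Probe IDs in the top `percent` % of the given log2 signals."""
    ranked = sorted(log2_signal, key=log2_signal.get, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * percent / 100))
    return set(ranked[:n_keep])

# Invented background-subtracted intensities of mouse probes
# on the channel hybridized with human RNA.
raw_signal = {
    "mm_probe_1": 900.0,
    "mm_probe_2": 12.0,
    "mm_probe_3": 0.8,
    "mm_probe_4": 150.0,
    "mm_probe_5": 3000.0,
}

# log2-transform, then keep probes whose log2 intensity exceeds zero.
log2_signal = {p: math.log2(v) for p, v in raw_signal.items() if v > 0}
expressed = {p: v for p, v in log2_signal.items() if v > 0}

gold_top25 = top_percent(expressed, 25)
gold_top50 = top_percent(expressed, 50)
print(sorted(gold_top25), sorted(gold_top50))
```

The percentage x is left as a free parameter here, since in the procedure above it is tuned jointly with the Blast conditions.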

Table D. The details of the four models for theoretical prediction. Models Conditions and parameters M-Xhyb1 score > 70 & Alig > 58 & mis < 6 M-Xhyb 2 Alig > 50 & mismatches < 6 {18
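As an illustration of how a model such as M-Xhyb1 might be applied, the sketch below filters Blast hits with the thresholds listed for that model. It rests on several assumptions that the text does not confirm: that "score" is the bit score, that "Alig" is the alignment length, that "mis" is the number of mismatches, and that the hits were written in the 12-column tabular output of legacy blastall (`-m 8`). The probe and target names are hypothetical.

```python
def parse_tabular_hit(line):
    """Parse one line of 12-column tabular Blast output (assumed -m 8 layout:
    query, subject, % identity, alignment length, mismatches, gap openings,
    q.start, q.end, s.start, s.end, e-value, bit score)."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "align_len": int(fields[3]),
        "mismatches": int(fields[4]),
        "bit_score": float(fields[11]),
    }

def passes_m_xhyb1(hit):
    """M-Xhyb1 conditions from Table D: score > 70 & Alig > 58 & mis < 6."""
    return hit["bit_score"] > 70 and hit["align_len"] > 58 and hit["mismatches"] < 6

# Hypothetical hits for three 60-mer probes.
lines = [
    "hyp_probe_1\thyp_target_1\t98.33\t60\t1\t0\t1\t60\t201\t260\t2e-26\t111",
    "hyp_probe_2\thyp_target_2\t100.00\t40\t0\t0\t1\t40\t11\t50\t1e-15\t79.8",
    "hyp_probe_3\thyp_target_3\t88.33\t60\t7\t0\t1\t60\t5\t64\t3e-10\t63.9",
]

hits = [parse_tabular_hit(line) for line in lines]
predicted_xhyb = [h["query"] for h in hits if passes_m_xhyb1(h)]
print(predicted_xhyb)
```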

S. Figure 1. A simulated visual model of differential gene expression in a xenograft experiment.

[Plot: fold change of gene expression in xenograft models (y-axis from -2 to +2) for the simulated differentially expressed probes g1 to g9 and g11 to g13, grouped into Xhyb probes and remaining probes.]

$$FC_i = \underbrace{[c_{i,aMm} - c_{i,bMm}]}_{\text{Mm homologs of the designed target}} + \underbrace{[c_{i,aHs} - c_{i,bHs}]}_{\text{Hs target designed for the probe}} + \varepsilon \;(\text{stochastic errors})$$

The labels in the plot are the simulated gene symbols of differentially expressed probes. a, b: two sample groups; c: effect of interest; i: probe id; FC: fold change (Equation 1).
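The decomposition of the fold change can be made concrete with a toy calculation. This sketch assumes that all contributions are expressed on a log2 scale, so that they add; the numerical values are invented, and the function name `fold_change` is illustrative.

```python
def fold_change(c_mm_a, c_mm_b, c_hs_a, c_hs_b, noise=0.0):
    """Measured fold change of one probe between sample groups a and b:
    mouse-homolog term plus human-target term plus a stochastic error."""
    return (c_mm_a - c_mm_b) + (c_hs_a - c_hs_b) + noise

# An Xhyb probe: the human target is unchanged, so the observed
# fold change comes entirely from the mouse (stromal) homolog.
fc_xhyb = fold_change(c_mm_a=8.0, c_mm_b=6.0, c_hs_a=5.0, c_hs_b=5.0)

# A remaining probe: no mouse signal is picked up, so the observed
# fold change reflects the human (cancer) target only.
fc_remaining = fold_change(c_mm_a=0.0, c_mm_b=0.0, c_hs_a=7.0, c_hs_b=9.0)

print(fc_xhyb, fc_remaining)
```

The two cases give fold changes of the same magnitude even though they have different cellular origins, which is why the probe annotation (Xhyb or not) is needed to interpret them.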

S. Figure 2. Comparison of the gold standards[2, 3] of stromal biological process (BP) with BP terms enriched among the genes with cross-hybridizing probes that target mouse homologs (Homo-Xhyb), and among the remaining genes for glioblastoma deregulation[20].

[Venn diagrams, each within a background of 5304 BPs. (A) Evaluated by GS1 (Breast Cancer 2006). Left: stromal BPs vs. BPs enriched among Homo-Xhyb genes, region counts 59, 4, 8; P=7×10^-6. Right: stromal BPs (61) vs. BPs enriched among the remainder (15); no overlap. (B) Evaluated by GS2 (Nature Medicine 2008). Left: stromal BPs vs. BPs enriched among Homo-Xhyb genes, region counts 104, 3, 9; p=0.001. Right: stromal BPs vs. BPs enriched among the remainder, region counts 106, 1, 14; p=0.26.]

Panel A: The 61 stromal BPs were identified in stromal tissues adjacent to breast cancers[3]. Panel B: The 107 stromal BPs were identified in cancerous stromal tissues laser-captured from breast cancer patients with poor prognosis[2]. In the left plots, the GS BPs associated with stroma are in brown; they overlap significantly with the GO terms enriched among Homo-Xhyb deregulated genes (in cyan), with Fisher's exact test p-values of 7×10^-6 and 0.001, respectively. In the right plots, by contrast, there is no significant overlap between the GSs and the GO terms enriched among the remaining deregulated genes (in orange).

S. Figure 3. Comparison of the gold standards of stromal biological process (BP) with BP terms enriched among the genes with Xhyb probes and among the remaining genes for glioblastoma deregulation.

[Venn diagrams, each within a background of 5304 BPs. (A) Breast Cancer 2006. Left: stromal BPs vs. BPs enriched among Xhyb genes, region counts 59, 2, 9; p=0.007. Right: stromal BPs (61) vs. BPs enriched among the remainder (15); no overlap. (B) Nature Medicine 2008. Left: stromal BPs vs. BPs enriched among Xhyb genes, region counts 104, 3, 8; p=0.001. Right: stromal BPs vs. BPs enriched among the remainder, region counts 106, 1, 14; p=0.26.]

Panel A: The 61 stromal BPs were identified in stromal tissues adjacent to breast cancers[3]. Panel B: The 107 stromal BPs were identified in cancerous stromal tissues laser-captured from breast cancer patients with poor prognosis[2]. In the left plots, the GS BPs associated with stroma are in brown; they overlap significantly with the GO terms enriched among Xhyb deregulated genes (in cyan), with Fisher's exact test p-values of 0.007 and 0.001, respectively. In the right plots, by contrast, there is no significant overlap between the GSs and the GO terms enriched among the remaining deregulated genes (in orange). The overlapping BPs are given in S. Table 7.
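The overlap statistics of S. Figure 3 can be approximated with a one-sided Fisher's exact test, computed as a hypergeometric upper tail. The sketch below is an assumption-laden check rather than the authors' procedure: it reads the three counts of each left-hand Venn diagram as (gold standard only, shared, enriched only), takes 5304 BPs as the background, and uses an illustrative function name.

```python
from math import comb

def overlap_p_value(universe, n_gold, n_enriched, n_overlap):
    """P(X >= n_overlap) when n_enriched terms are drawn from `universe`
    terms, of which n_gold belong to the gold standard."""
    total = comb(universe, n_enriched)
    tail = sum(
        comb(n_gold, k) * comb(universe - n_gold, n_enriched - k)
        for k in range(n_overlap, min(n_gold, n_enriched) + 1)
    )
    return tail / total

background_bps = 5304

# Panel A, left: 59 gold-only, 2 shared, 9 enriched-only.
p_panel_a = overlap_p_value(background_bps, n_gold=59 + 2, n_enriched=2 + 9, n_overlap=2)

# Panel B, left: 104 gold-only, 3 shared, 8 enriched-only.
p_panel_b = overlap_p_value(background_bps, n_gold=104 + 3, n_enriched=3 + 8, n_overlap=3)

print(f"panel A: p = {p_panel_a:.4f}; panel B: p = {p_panel_b:.4f}")
```

Under this reading of the counts, both values should land close to the p-values printed in the figure, which supports interpreting the reported statistics as one-sided overlap tests against the full set of BPs.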

S. Figure 4. Significantly enriched (p<1%) BP terms among the genes with mouse homologous cross-hybridizing probes, and their overlap with two gold standards.

Legend: The identified BP terms with their nearest parent terms in the gold standard:

| Identified ID | Identified Term | Nearest GS ID | Nearest GS Term | Shortest Distance (number of nodes in between + 1) |
|---|---|---|---|---|
| GO:0030949 | positive regulation of vascular endothelial growth factor receptor signaling pathway | GO:0007169 | transmembrane receptor tyrosine kinase signaling pathway | 3 |
| GO:0008219 | cell death | GO:0030154 | cell differentiation | 2 |
| GO:0001502 | cartilage condensation | GO:0007155 | cell adhesion | 2 |
| GO:0006950 | response to stress | GO:0050896 | response to stimulus | 1 |
| GO:0048523 | negative regulation of cellular process | | | |
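The "shortest distance" in the legend can be understood as the number of edges between an identified term and its nearest gold-standard ancestor, so that a direct parent has distance 1. The following is a minimal sketch using a breadth-first search over parent links. Apart from the relation between response to stress and response to stimulus, the graph below is hypothetical (terms prefixed with `HYP:`), and the function name is illustrative.

```python
from collections import deque

def nearest_gold_ancestor(term, parents, gold_terms):
    """Breadth-first search upward through parent links. Returns the nearest
    gold-standard term and the number of edges to it, or None if none is found."""
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        node, dist = queue.popleft()
        if node in gold_terms:
            return node, dist
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None

# Child -> parents map. Only the first entry reflects a relation taken from
# the legend; the HYP: entries are invented to show a two-edge path.
parents = {
    "GO:0006950": ["GO:0050896"],          # response to stress -> response to stimulus
    "HYP:leaf": ["HYP:mid"],
    "HYP:mid": ["HYP:gold", "HYP:other"],
}
gold_terms = {"GO:0050896", "HYP:gold"}

direct = nearest_gold_ancestor("GO:0006950", parents, gold_terms)
two_steps = nearest_gold_ancestor("HYP:leaf", parents, gold_terms)
print(direct, two_steps)
```

Breadth-first search visits ancestors in order of increasing distance, so the first gold-standard term it reaches is guaranteed to be a nearest one.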

S. Figure 5. Comparison of the overrepresented biological processes (BPs) derived from three GO enrichment tests for the reported gene list. The data points are the proxy recall and precision under a series of thresholds for the GO enrichment test. Panel a shows that, in predicting stromal processes, the reported Xhyb subset of deregulated genes performs better than either the remaining genes or the full reported gene list, since only the precision-recall predictions from the reported Xhyb genes (cyan circles) are significant (proxy p<5%). Panel b shows that the Xhyb subset of deregulated genes is the worst at predicting cancer processes, with lower precision and recall than the other two kinds of gene lists.

S. Table 1: The relationship between deregulated genes and their contributed Gene Ontology terms in a xenograft model, using a human microarray. The two right-hand columns give the expression of significantly differentially expressed probes in the human array.

| Gene Ontology enrichment from the differentially expressed genes | Xhyb probes | Remainders |
|---|---|---|
| Biological process chiefly attributable to mouse stromal cells | $c_{i,jMm} \gg c_{i,jHs}$ | $c_{i,jHs}$ is stochastically distributed |
| Biological process chiefly attributable to human cancer cells | $c_{i,jMm}$ is stochastically distributed | $c_{i,jMm} \ll c_{i,jHs}$ |
| Biological processes enriched in the experiment | Both $c_{i,jMm}$ and $c_{i,jHs}$ are stochastically distributed | Both $c_{i,jMm}$ and $c_{i,jHs}$ are stochastically distributed |

S. Table 2: Previous xenograft cancer studies using the Agilent GPL6480 whole human microarray.

| Cancer | Sample | Reported Genes |
|---|---|---|
| Glioblastomas (GBM)[20] | 10 GBM xenografts | 78 genes differentially expressed between the first and later GBM xenograft generations. |
| Colon cancer[6] | 3-4 xenografts per group | 28 genes deregulated in mutant tumor xenografts versus the control tumor xenografts. |

S. Table 3: Xhyb subset of the cancer deregulated genes reported by previous xenograft studies.

| Cancer Type | Xhyb deregulated genes | Possible Xhyb deregulated genes |
|---|---|---|
| Glioblastomas (GBM)[20] | CLIC1, COL11A1, CRYAB, EMP3, GPR17, HTRA1, ID1, ITPKB, MAG, PTTG1, RAB33A, SERPINE2, SGK1, VEGFA | PDGFRA, INSM1, IGFBP3, CCND1, TMPO, C5orf13, PSME1, CDKN1A, TJP2, PPP3R1, DDR1, B2M, SCARA3, EMP1, GFAP |

Colon cancer[6]: IGFBP2, IGF2, FN1, TAGLN, ZBTB16, HPN, TFF3 PRKCA, ID8-72 (the assignment of these genes to the two columns is not recoverable from the source layout).

S. Table 4: The significant (p<1%) GO terms among the deregulated genes in a published glioblastoma xenograft model.

Columns: GOTERM; Classification (N, S, C); Conditional hypergeometric p-value (Full List, Xhyb, Remainder). Each row below gives the GO term followed by its non-blank classification marks and its non-blank p-values, in column order.

enzyme linked receptor protein signaling pathway 0.0084 0.0031
positive regulation of vascular endothelial growth factor receptor signaling pathway 1 0.0038
negative regulation of cellular process 0.0066 0.0043
negative regulation of transforming growth factor beta receptor signaling pathway 0.0004 0.0076
cartilage condensation 1 0.0076
BMP signaling pathway 1 1 0.0076
negative regulation of BMP signaling pathway 1 1 0.0076
positive regulation of chemotaxis 1 1 0.0076
regulation of positive chemotaxis 1 1 0.0076
induction of positive chemotaxis 1 1 0.0076
visual perception 1 0.0088
phosphoinositide-mediated signaling 0.0015 0.0003
nervous system development 1 0.0025 0.0011
anatomical structure development 0.0019
mesenchymal cell development 1 0.0063 0.0023
signal transduction 1 1 0.0000 0.0023
cell-cell signaling 1 1 0.0042
dTMP biosynthetic process 0.0052 0.0046
deoxyribonucleoside monophosphate biosynthetic process 1 1 0.0052 0.0046
pyrimidine deoxyribonucleoside monophosphate metabolic process 1 1 0.0052 0.0046
positive regulation of phosphoinositide 3-kinase cascade 0.0052 0.0046
positive regulation of cyclin-dependent protein kinase activity 0.0001 0.0091
posterior midgut development 1 0.0091
cyclin catabolic process 1 0.0091
pyrimidine deoxyribonucleotide biosynthetic process 0.0091
clathrin cage assembly 0.0091
transmembrane receptor protein tyrosine kinase signaling pathway 1 1 0.0003
cell growth 1 1 0.0005
multicellular organismal development 0.0006
regulation of phosphate metabolic process 0.0009
transforming growth factor beta receptor signaling pathway 1 1 0.0024
regulation of protein amino acid phosphorylation 0.0026
cell differentiation 1 1 0.0039
negative regulation of phosphorylation 0.0039
organ development 1 0.0042
response to stress 1 1 0.0043
positive regulation of non-apoptotic programmed cell death 1 0.0052
regulation of protein kinase activity 0.0060
phosphorus metabolic process 0.0064
regulation of transferase activity 0.0071
regulation of cell growth 1 1 0.0072
extracellular structure organization and biogenesis 1 1 0.0081
regulation of biological quality 1 1 0.0082

Legend: N: neither cancer nor stromal cell process; S: stromal cell or functional process; C: cancer cell process; blank in classification: unknown; blank in conditional hypergeometric p-value: not significant (p>1%).

S. Table 5: Xhyb subset of the differentially expressed genes in a previous xenograft study and their corresponding probes in the Agilent array (GPL6480). Genes in bold are those identified as Xhyb. According to the literature (Sakariassen et al. 2006), the tumors became more vascular and circumscribed, with emerging necrotic regions, in the subsequent generations. Moreover, MRI scans showed less invasive, strongly contrast-enhancing tumors in the higher generations.

| Biological symbol | FC | Agilent probe ID | Annotation | GENE | SYMBOL | V1Anno |
|---|---|---|---|---|---|---|
| B2M | -4 | A_23_P37441 | NM_004048 | 567 | B2M | Xhyb |
| | | A_24_P740662 | AK022379 | 567 | B2M | |
| C5orf13 | 2.4 | A_24_P149124 | NM_004772 | 9315 | C5orf13 | Xhyb |
| | | A_32_P98423 | NM_004772 | 9315 | C5orf13 | |
| CCND1 | 3.2 | A_24_P193011 | NM_053056 | 595 | CCND1 | Xhyb |
| | | A_23_P202837 | NM_053056 | 595 | CCND1 | |
| | | A_24_P124550 | NM_053056 | 595 | CCND1 | |
| CDKN1A | -2.8 | A_23_P59210 | NM_000389 | 1026 | CDKN1A | Xhyb |
| | | A_24_P89457 | NM_078467 | 1026 | CDKN1A | |
| CLIC1 | -2.9 | A_23_P30884 | NM_001288 | 1192 | CLIC1 | Xhyb |
| COL11A1 | 2.5 | A_23_P11806 | NM_080629 | 1301 | COL11A1 | Xhyb |
| CRYAB | -6.2 | A_24_P206776 | NM_001885 | 1410 | CRYAB | Xhyb |
| DDR1 | -3.9 | A_24_P123601 | NM_013993 | 780 | DDR1 | Xhyb |
| | | A_23_P93311 | NM_013993 | 780 | DDR1 | |
| | | A_24_P367289 | NM_013993 | 780 | DDR1 | |
| EMP1 | -7.6 | A_23_P76488 | NM_001423 | 2012 | EMP1 | Xhyb |
| | | A_24_P921446 | BC017854 | 2012 | EMP1 | |
| EMP3 | -2.4 | A_23_P119362 | NM_001425 | 2014 | EMP3 | Xhyb |
| GFAP | -7.8 | A_24_P31165 | NM_002055 | 2670 | GFAP | Xhyb |
| | | A_23_P66593 | NM_002055 | 2670 | GFAP | |
| | | A_24_P59786 | NM_001131019 | 2670 | GFAP | |
| GPR17 | -2.8 | A_23_P16963 | NM_005291 | 2840 | GPR17 | Xhyb |
| HTRA1 | -2.5 | A_23_P97990 | NM_002775 | 5654 | HTRA1 | Xhyb |
| ID1 | -10.5 | A_23_P252306 | NM_002165 | 3397 | ID1 | Xhyb |
| IGFBP3 | 3.4 | A_23_P215634 | NM_001013398 | 3486 | IGFBP3 | Xhyb |
| | | A_24_P320699 | NM_001013398 | 3486 | IGFBP3 | |
| INSM1 | 3.4 | A_24_P31676 | NM_002196 | 3642 | INSM1 | Xhyb |
| | | A_23_P109184 | NM_002196 | 3642 | INSM1 | |
| ITPKB | -3 | A_23_P372255 | NM_002221 | 3707 | ITPKB | Xhyb |
| MAG | -4.2 | A_24_P279704 | NM_080600 | 4099 | MAG | Xhyb |
| PDGFRA | 13.1 | A_23_P300033 | NM_006206 | 5156 | PDGFRA | Xhyb |
| | | A_23_P332536 | BC015186 | 5156 | PDGFRA | |
| | | A_32_P100379 | AA599881 | 5156 | PDGFRA | |
| PPP3R1 | -3.3 | A_23_P108592 | NM_000945 | 5534 | PPP3R1 | |
| | | A_24_P388252 | NM_000945 | 5534 | PPP3R1 | Xhyb |
| PSME1 | -2.3 | A_23_P151610 | NM_006263 | 5720 | PSME1 | Xhyb |
| | | A_23_P151614 | NM_006263 | 5720 | PSME1 | |
| PTTG1 | 2.1 | A_23_P7636 | NM_004219 | 9232 | PTTG1 | Xhyb |
| RAB33A | -3.1 | A_23_P147025 | NM_004794 | 9363 | RAB33A | Xhyb |
| SCARA3 | -4.9 | A_23_P215900 | NM_016240 | 51435 | SCARA3 | Xhyb |
| | | A_24_P232158 | NM_182826 | 51435 | SCARA3 | |
| SERPINE2 | 2.6 | A_23_P50919 | NM_006216 | 5270 | SERPINE2 | Xhyb |
| SGK1 | -2.5 | A_23_P19673 | NM_005627 | 6446 | SGK1 | Xhyb |
| TJP2 | -3.1 | A_23_P9293 | NM_004817 | 9414 | TJP2 | Xhyb |
| | | A_24_P201153 | NM_201629 | 9414 | TJP2 | |
| TMPO | 2.5 | A_24_P210244 | NM_001032283 | 7112 | TMPO | Xhyb |
| | | A_23_P325040 | NM_003276 | 7112 | TMPO | |
| VEGFA | 8.9 | A_23_P70398 | NM_001025370 | 7422 | VEGFA | Xhyb |
| | | A_23_P81805 | NM_001025366 | 7422 | VEGFA | Xhyb |
| | | A_24_P12401 | NM_001025366 | 7422 | VEGFA | Xhyb |
| | | A_24_P179400 | NM_001025370 | 7422 | VEGFA | |

S. Table 6: Xhyb subset of the 28 deregulated genes reported by a previous colon tumor xenograft study and their corresponding Agilent array (GPL6480) probes. Genes in bold are those identified as Xhyb. According to the authors (Gouye et al 2008), HT-29-derived 5M21 cells displayed a constitutive invasive behavior in type I collagen, and TATI was shown in that study to be a major autocrine and transforming factor produced by HT-29-derived 5M21 cells to control colon cancer cell invasion and metastasis.

| GenBank.no. | Biological symbol | Ratio | Agilent probe ID | GB_ACC | GENE | SYMBOL | Annotation |
|---|---|---|---|---|---|---|---|
| NM_177978 | CHRD | -5.2 | A_23_P502047 | NM_003741 | 8646 | CHRD | |
| | | | A_24_P117725 | NM_003741 | 8646 | CHRD | Xhyb |
| NM_00212482 | FN1 | -2.8 | A_24_P119745 | NM_212482 | 2335 | FN1 | Xhyb |
| | | | A_24_P85539 | NM_212482 | 2335 | FN1 | Xhyb |
| NM_00182983 | HPN | -2.1 | A_23_P101806 | NM_182983 | 3249 | HPN | Xhyb |
| | | | A_23_P406782 | NM_182983 | 3249 | HPN | |
| NM_002166 | ID2 | 2.4 | A_23_P143143 | NM_002166 | 3398 | ID2 | Xhyb |
| | | | A_32_P69368 | NM_002166 | 3398 | ID2 | Xhyb |
| NM_000612 | IGF2 | -4.6 | A_23_P150609 | NM_000612 | 3481 | IGF2 | Xhyb |
| | | | A_23_P421379 | NM_000612 | 3481 | IGF2 | |
| NM_000597 | IGFBP2 | -9.6 | A_23_P119943 | NM_000597 | 3485 | IGFBP2 | Xhyb |
| NM_002737 | PRKCA | 2.2 | A_23_P55099 | NM_002737 | 5578 | PRKCA | Xhyb |
| | | | A_24_P916496 | NM_002737 | 5578 | PRKCA | Xhyb |
| NM_001001522 | TAGLN | -2.2 | A_23_P87011 | NM_001001522 | 6876 | TAGLN | Xhyb |
| | | | A_23_P87013 | NM_001001522 | 6876 | TAGLN | Xhyb |
| NM_003226 | TFF3 | 2 | A_23_P257296 | NM_003226 | 7033 | TFF3 | Xhyb |
| | | | A_23_P393099 | NM_003226 | 7033 | TFF3 | |
| | | | A_24_P289208 | NM_003226 | 7033 | TFF3 | |
| NM_006006 | ZBTB16 | 2 | A_23_P104804 | NM_006006 | 7704 | ZBTB16 | Xhyb |

S. Table 7: The human probes in S. Table 4 that target their mouse homologs.

| ProbeID | HsGENE | HsSYMBOL | Anno | Mm.Homo | MmGENE | Target* | Score* | Homo-Xhyb** |
|---|---|---|---|---|---|---|---|---|
| A_23_P30884 | 1192 | CLIC1 | Xhyb | Clic1 | 114584 | AK150382 | 88.33 | Y |
| A_23_P11806 | 1301 | COL11A1 | Xhyb | Col11a1 | 12814 | AK157734 | 93.62 | Y |
| A_24_P206776 | 1410 | CRYAB | Xhyb | Cryab | 12955 | NW_001030907.1 | 91.5 | Y |
| A_23_P119362 | 2014 | EMP3 | Xhyb | Emp3 | 13732 | NW_001030849.1 | 78.8 | Y |
| A_23_P16963 | 2840 | GPR17 | Xhyb | Gpr176 | 381413 | | | N |
| A_23_P97990 | 5654 | HTRA1 | Xhyb | Htra1 | 56213 | | | N |
| A_23_P252306 | 3397 | ID1 | Xhyb | Id1 | 15901 | NT_039207.7 | 62.8 | Y |
| A_23_P372255 | 3707 | ITPKB | Xhyb | Itpkb | 320404 | | | N |
| A_24_P279704 | 4099 | MAG | Xhyb | Mag | 17136 | NT_039413.7 | 62.6 | Y |
| A_23_P7636 | 9232 | PTTG1 | Xhyb | Pttg1 | 30939 | XR_002209 | 92.73 | Y |
| A_23_P147025 | 9363 | RAB33A | Xhyb | Rab33a | 19337 | NM_011228.1 | 86 | Y |
| A_23_P50919 | 5270 | SERPINE2 | Xhyb | Serpine2 | 20720 | | | N |
| A_23_P19673 | 6446 | SGK1 | Xhyb | Sgk1 | 20393 | BC070401 | 96.67 | Y |
| A_23_P70398 | 7422 | VEGFA | Xhyb | Vegfa | 22339 | NM_001025250 | 96.67 | Y |
| A_23_P81805 | 7422 | VEGFA | Xhyb | Vegfa | 22339 | NM_001025250 | 90.48 | Y |
| A_24_P12401 | 7422 | VEGFA | Xhyb | Vegfa | 22339 | NM_001025250 | 98.33 | Y |
| A_24_P179400 | 7422 | VEGFA | | Vegfa | 22339 | NM_001025250 | 104 | Y |

Legend: Anno: annotation; Xhyb: cross-hybridizes with mouse; Homo: homologous; Homo-Xhyb: the human probe targets its mouse homolog, as predicted by BLASTN. * The reported target and score using the BLASTN algorithm with default parameters (http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=10090). ** Y: BLASTN reports a significant alignment; otherwise N.

S. Table 8: The identified stromal-associated GO terms (hypergeometric test p-value <1%) that were also reported by two gold standards using laser microdissection technology.

| GOID | Biological Process | Test Method | Reported Gold Standard |
|---|---|---|---|
| GO:0007167 | enzyme linked receptor protein signaling pathway | test b | GS1 - Breast Cancer Res. 2006 |
| GO:0007601 | visual perception | test b, test d | GS1 - Breast Cancer Res. 2006 |
| GO:0015698 | inorganic anion transport | test d | GS1 - Breast Cancer Res. 2006 |
| GO:0050795 | regulation of behavior | test d | GS1 - Breast Cancer Res. 2006 |
| GO:0007169 | transmembrane receptor protein tyrosine kinase signaling pathway | test d | GS1 - Breast Cancer Res. 2006 |
| GO:0050926 | regulation of positive chemotaxis | test d, test b | GS2 - Nat. Med 2008 |
| GO:0050930 | induction of positive chemotaxis | test d, test b | GS2 - Nat. Med 2008 |
| GO:0050921 | positive regulation of chemotaxis | test d, test b | GS2 - Nat. Med 2008 |

Test d and test b are conditional hypergeometric tests that use the same background gene list: all the identified genes with cross-hybridizing probes on the array. Test b uses the deregulated genes with identified cross-hybridizing probes as the input gene list, while test d filters that list down to the homologous genes with Xhyb probes (see Methods).