Supporting Information

Rhodes et al. 10.1073/pnas.0900351106 (www.pnas.org/cgi/content/short/0900351106)

SI Materials and Methods

COPA. COPA analysis was performed on breast cancer expression datasets in Oncomine (www.oncomine.org) as described previously (1). COPA has 3 simple steps. First, gene expression values are median-centered, setting each gene’s median expression value to zero. Second, the median absolute deviation (MAD) is calculated and scaled to 1 by dividing each gene expression value by its MAD. Of note, median and MAD were used for transformation as opposed to mean and standard deviation so that outlier expression values do not unduly influence the distribution estimates, and are thus preserved after normalization. Third, the 75th, 90th, and 95th percentiles of the transformed expression values are tabulated for each gene, and genes are then rank-ordered by their percentile scores, providing a prioritized list of outlier profiles. Genes scoring in the top 1% of outliers at any of the 3 percentile cutoffs were called outliers and submitted to the meta-analysis.

Meta-analysis. For each gene called an outlier in at least one dataset, we counted the number of datasets in which the gene was called an outlier and the total number of datasets in which the gene was measured and then calculated the percentage of datasets. We also used the binomial distribution function to compute a P value for each gene. The P value represents the probability that a given gene was found to be an outlier in the observed number of datasets by chance, given the number of datasets in which the gene was measured. We assumed that the background probability for a gene being called an outlier was 0.03 because outliers were selected as the top 1% of genes at 3 percentile cutoffs. We selected P < 1E-5 as a conservative threshold to define metaoutliers. Assuming that ≈25,000 genes were considered in our analysis, by chance we would expect less than one gene to have P < 1E-5 (25,000 × 1E-5 = 0.25).

Oncomine Data. Supplementary figures include gene expression data and graphs from Oncomine (www.oncomine.org). In total, 9 Oncomine datasets are referenced in Figs. S1–S10 (2–10). Also, AGTR1 data were obtained from the supporting information text of another breast cancer cell line profiling study (11).

Tissue Microarrays. Breast tissue samples were obtained from the Surgical Pathology files at the University of Michigan with Institutional Review Board approval. A total of 311 cases of invasive breast cancer were used to construct tissue microarrays by using a manual arrayer as described previously (12). Each tumor was sampled in triplicate to account for tumor heterogeneity.

FISH. Four-micrometer-thick tissue microarray sections were used for interphase FISH. Deparaffinized tissue was treated with 0.2 mol/L HCl for 10 min, 2× SSC for 10 min at 80 °C, and then digested with Proteinase K (Invitrogen) for 10 min. The tissues and BAC probes were codenatured for 5 min at 94 °C and hybridized overnight at 37 °C. Posthybridization washing was with 2× SSC with 0.1% Tween 20 for 5 min, and fluorescent detection was done by using anti-digoxigenin conjugated to fluorescein (Roche Applied Science) and streptavidin conjugated to Alexa Fluor 594 (Invitrogen). Slides were counterstained and mounted in ProLong Gold Antifade Reagent with 4′,6-diamidino-2-phenylindole (Invitrogen). Slides were examined by using an Axioplan Imaging Z1 microscope (Carl Zeiss) and imaged with a CCD camera using the ISIS software system in Metafer image analysis system (MetaSystems). FISH signals were scored manually (100× oil immersion) by a pathologist (R.M.) and enumerated in morphologically intact and nonoverlapping nuclei. Amplification was defined as a ratio of locus copy number to control copy number of 1.5 or greater (13).

BACs were obtained from the BACPAC Resource Center (Oakland, CA), and probe locations were verified by hybridization to metaphase spreads of normal peripheral lymphocytes. For detection of locus and control signal numbers, RP11-505J9 (mapping to AGTR1 locus on 3q24) and RP11-449F7 (3q control) were used, respectively. BAC DNA was isolated by using a QIAFilter Maxi Prep kit (Qiagen), and probes were synthesized by using digoxigenin- or biotin-nick translation mixes (Roche Applied Science).

AGTR1 Transfection. N-terminal 3XHA-tagged human AGTR1 (Variant 1) in pcDNA 3.1+ (Invitrogen) was obtained from the University of Missouri-Rolla cDNA Resource Center. 3XHA-AGTR1 was amplified by PCR and TOPO-TA cloned into the Gateway entry vector pCR8/GW/TOPO. To generate adenoviral constructs, pCR8/3XHA-AGTR1 was recombined with pAD/CMV/V5 (Invitrogen) by using LR Clonase II (Invitrogen). Adenoviruses were generated by the University of Michigan Vector Core (Ann Arbor, MI). The benign human mammary epithelial cell lines HME and H16N2 were plated in 6-well dishes. After 24 h, the cells were infected with 3XHA-AGTR1 adenovirus or LacZ adenovirus. Overexpression was confirmed by QPCR. At 48 h after infection, cells were treated with vehicle (ethanol), 1 μM angiotensin alone, 2 and 5 μM losartan alone, and a combination of 2 and 5 μM losartan with 1 μM angiotensin for 24 h before the invasion assay.

Cell Invasion Assay. Breast cell lines BT-549, Hs578T, HS16N2, HCC1528, HCC1500, and prostate carcinoma line DU145 were grown in 100-mm tissue culture plates overnight, then transferred to serum-free medium. A total of 1–5 μM losartan (kind gift from Merck) was added 30 min before 1 μM angiotensin II (American Peptide) treatment. Cell invasion was evaluated by using 24-well Matrigel invasion chambers (Becton Dickinson). Cells were trypsinized and seeded at equal numbers onto the basement membrane matrix present in the insert of a 24-well culture plate. FBS was added to the lower chamber, acting as a chemoattractant. After 48 h of additional incubation, the noninvading cells and extracellular matrix were removed gently with a cotton swab. The cells that had invaded were present on the lower side of the chamber and were stained, air-dried, and photographed. The invaded cells were counted under the microscope, assessing 6 random fields per experiment. The numbers of cells were averaged, and standard deviations were calculated. To assess relative change in invasion, the cell invasion with AT alone treatment was divided by the cell invasion at baseline. To assess percent reduction in invasion, the cell invasion with AT plus losartan treatment (2 μM) was subtracted from the AT alone treatment and then divided by AT alone treatment.

Quantitative PCR (QPCR). QPCR was performed by using SYBR Green dye on an Applied Biosystems 7300 Real Time PCR system (Applied Biosystems) essentially as described previously (1). Briefly, total RNA was isolated from three 10-μm sections from each formalin-fixed, paraffin-embedded (FFPE) tissue specimen by using a MasterPure RNA Purification Kit (Epicentre) according to the manufacturer’s instructions and was treated with DNase I. Total RNA was isolated from cell lines by using TRIzol (Invitrogen). RNA was quantified by using an ND-1000 spectrophotometer (Nanodrop Technologies), and 1–5 μg of total RNA was reverse-transcribed into cDNA by using SuperScript III (Invitrogen) in the presence of random primers. All QPCR reactions were performed in duplicate with SYBR Green Master Mix (Applied Biosystems) and 25 ng of both the forward and reverse primer using the manufacturer’s recommended thermocycling conditions. For each experiment, threshold levels were set during the exponential phase of the QPCR reaction by using Sequence Detection Software version 1.2.2 (Applied Biosystems). For experiments using RNA isolated from FFPE tissues, the amount of AGTR1 relative to the housekeeping gene GAPDH for each sample was determined by using the comparative threshold cycle (Ct) method (Applied Biosystems User Bulletin no. 2; http://docs.appliedbiosystems.com/pebiodocs/04303859.pdf). For experiments using RNA from cell lines, the amount of AGTR1 relative to the average of the housekeeping genes GAPDH, B2M, and HMBS was determined for each sample. For all experiments, the relative amount of AGTR1 for each sample was calibrated to the median amount from all samples in the experiment. All oligonucleotide primers were synthesized by Integrated DNA Technologies. B2M (14) and GAPDH and HMBS (15) primers were as described. Sequences for AGTR1 are as follows:
AGTR1_f GCTTTCCTACCGCCCCTCAGA
AGTR1_r TTTCGAACATGTCACTCAACCTCAA.
Approximately equal efficiencies of the primers were confirmed by using serial dilutions of pooled breast cancer RNA to use the comparative Ct method. All reactions were subjected to melt curve analysis.

Immunoblot Analysis. The benign human mammary epithelial cell line HME was plated in 6-well dishes. Cells were infected with AGTR1 adenovirus or LacZ adenovirus. At 48 h after infection, cells were starved, pretreated with vehicle (ethanol) or 2 μM losartan, and stimulated with 1 μM angiotensin for 5 min. Cells were then directly lysed in sample buffer, boiled, separated on SDS/PAGE, and transferred onto poly(vinylidene difluoride) membrane (GE Healthcare). The membrane was incubated for 1 h in blocking buffer [Tris-buffered saline, 0.1% Tween (TBS-T), 3% BSA] and incubated overnight at 4 °C with the following: rabbit anti-phospho-ERK antibody, rabbit anti-phospho-AKT, mouse total ERK, and rabbit total AKT antibody (1:1,000; Cell Signaling Technology). After wash with TBS-T, the blot was incubated with horseradish peroxidase-conjugated secondary antibody, and the signals were visualized by an enhanced chemiluminescence system as described by the manufacturer (GE Healthcare). The blot was reprobed with mouse monoclonal antibody to actin (1:2,000; Sigma) for confirmation of equal loading.

Mammary Fat Pad Xenograft Model. Four-week-old female Balb/C nu/nu mice were purchased from Charles River (Charles River Laboratory). MCF7 stable cells overexpressing AGTR1 or Gus (2.5 × 10⁶ cells) were resuspended in 100 μL of saline with 20% Matrigel (BD Biosciences). An anesthetic mixture of ketamine (80 mg/kg) and xylazine (5 mg/kg) was injected into the intraperitoneal cavity, and 2.5 × 10⁶ cells were implanted into the mammary fat pad of anesthetized mice by using 30-gauge needles. All experimental animals (n = 10) were implanted with a 60-day release 0.25-mg 17β-estradiol pellet (Innovative Research of America) through an s.c. route using a sterile trocar. Mice from group MCF7-AGTR1 or MCF7-Gus were treated every day with losartan (90 mg/kg of body weight) or vehicle control. All animals were monitored at weekly intervals for the development of tumors, and tumor sizes were recorded. Tumor volumes were calculated by using the formula (π/6) (L × W²), where L = length of tumor and W = width. Results were presented as the mean of tumor volumes recorded from all 4 groups at weeks 2 and 8. All procedures involving mice were approved by the University Committee on Use and Care of Animals at the University of Michigan and conform to their relevant regulatory standards.
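The tumor volume formula given in the xenograft methods, (π/6)(L × W²), can be sketched in Python as follows. This is only an illustration: the function name, the millimeter units, and the caliper measurements are assumptions and do not come from the study.

```python
import math


def tumor_volume(length, width):
    """Ellipsoid approximation (pi/6) * L * W^2 for one tumor.

    Units are whatever the caliper readings use (mm assumed here),
    so the result is in cubic units of that measure.
    """
    return (math.pi / 6.0) * length * width ** 2


# Hypothetical (length, width) caliper readings for one treatment group.
measurements_mm = [(10.0, 8.0), (12.0, 9.0), (9.0, 7.5)]

volumes = [tumor_volume(l, w) for l, w in measurements_mm]
group_mean = sum(volumes) / len(volumes)  # mean volume reported per group
print(f"mean tumor volume: {group_mean:.1f} mm^3")
```

Width enters squared, so the formula is more sensitive to error in W than in L.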

1. Tomlins SA, et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310:644–648.
2. Huang E, et al. (2003) Gene expression predictors of breast cancer outcomes. Lancet 361:1590–1596.
3. Perou CM, et al. (2000) Molecular portraits of human breast tumours. Nature 406:747–752.
4. Scherf U, et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet 24:236–244.
5. Sorlie T, et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100:8418–8423.
6. Staunton JE, et al. (2001) Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci USA 98:10787–10792.
7. van de Vijver MJ, et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999–2009.
8. van ’t Veer LJ, et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536.
9. Wang Y, et al. (2005) Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365:671–679.
10. West M, et al. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98:11462–11467.
11. Charafe-Jauffret E, et al. (2006) Gene expression profiling of breast cell lines identifies potential new basal markers. Oncogene 25:2273–2284.
12. Witkiewicz AK, et al. (2005) Alpha-methylacyl-CoA racemase expression is associated with the degree of differentiation in breast cancer using quantitative image analysis. Cancer Epidemiol Biomarkers Prev 14:1418–1423.
13. Prentice LM, et al. (2005) NRG1 gene rearrangements in clinical breast cancer: Identification of an adjacent novel amplicon associated with poor prognosis. Oncogene 24:7281–7289.
14. Mitas M, et al. (2001) Quantitative real-time RT-PCR detection of breast cancer micrometastasis using a multigene marker panel. Int J Cancer 93:162–171.
15. Vandesompele J, et al. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3:RESEARCH0034.

Fig. S1. A schematic for the meta-analysis of outlier gene expression profiles (MetaCOPA). (A) A cancer gene expression dataset, consisting of thousands of genes measured across tens or hundreds of samples, is normalized and sorted via the COPA method. (B) COPA normalizes the median expression per gene to zero and the median absolute deviation to one. A percentile cutoff is selected (e.g., 75%, 90% or 95%), and genes are sorted by their COPA value at the selected percentile. In the COPA map, the intensity of red indicates degree of overexpression, and black indicates masked COPA values less than 1. (C) The top 1% of genes is deemed to have outlier expression profiles. (D) A collection of outliers from independent datasets are compiled and submitted for meta-analysis. (E) Multiple datasets of a given cancer type are meta-analyzed to identify genes that are consistently called outliers across independent datasets. Significance is assessed via the binomial distribution.
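The three COPA steps (median-centering, MAD scaling, ranking by a percentile of the transformed values) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function names, the linear-interpolation percentile convention, the rounding used for the top 1% cut, and the toy data are all assumptions.

```python
from statistics import median


def copa_transform(values):
    """Median-center one gene's expression values and scale them by the MAD."""
    med = median(values)
    centered = [v - med for v in values]
    mad = median([abs(c) for c in centered])
    if mad == 0:
        raise ValueError("MAD is zero; gene cannot be scaled")
    return [c / mad for c in centered]


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation (assumed convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def rank_genes(expr, q):
    """Rank genes (highest first) by the q-th percentile of their COPA values."""
    scores = {g: percentile(copa_transform(v), q) for g, v in expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def call_outliers(expr, quantiles=(75, 90, 95), top_fraction=0.01):
    """Union of genes in the top fraction at any of the percentile cutoffs."""
    n_top = max(1, int(len(expr) * top_fraction))
    called = set()
    for q in quantiles:
        called.update(g for g, _ in rank_genes(expr, q)[:n_top])
    return called


# Hypothetical toy data: gene_a has one outlier sample, gene_b has none.
toy = {"gene_a": [1, 2, 3, 4, 100], "gene_b": [1, 2, 3, 4, 5]}
print(rank_genes(toy, 95)[0][0])
```

Because the median and MAD are barely affected by a single extreme sample, the outlier in `gene_a` keeps a large transformed value and dominates the upper percentiles.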

Fig. S2. COPA indicates that ERBB2 exhibits outlier expression in multiple breast cancer microarray datasets from Oncomine. (A) ERBB2 expression profile in the Perou_Breast (3) cDNA microarray dataset (n = 55). (B) ERBB2 expression profile in the vandeVijver_Breast (7) oligonucleotide dataset, segregated by ER status (n = 295). (C) Expression analysis of ERBB2 genomic neighborhood in breast cancer across 68 ER− breast cancer specimens (7). Genes are ordered by distance from the ERBB2 locus. Red indicates relative overexpression, and blue indicates relative underexpression.

Fig. S3. AGTR1 expression and comparison with ERBB2 expression in 3 additional Oncomine datasets. (A and B) Huang_Breast (2). (C and D) Sorlie_Breast (5). (E and F) vantVeer_Breast (8). Bar graph legends correspond to scatterplots except in D. See Fig. 2 for additional details.

Fig. S4. AGTR1 expression and comparison with ERBB2 expression in 2 additional Oncomine datasets. (A and B) West_Breast (10) and (C and D) Wang_Breast (9). Bar graph legends correspond to scatterplots as well. See Fig. 2 for additional details.

Fig. S5. AGTR1 expression in ER+, ERBB2− breast cancer and association with clinical outcome. (A) AGTR1 expression in breast tumor. From the Oncomine dataset, vandeVijver_Breast (7). Samples are grouped by metastatic event at 5 years. The red bar indicates the threshold of 1 used to determine AGTR1 positivity. (B) AGTR1 expression in breast tumor. From the Oncomine dataset, Wang_Breast (9). Samples are grouped by recurrence status at 5 years. The red line indicates the threshold used to determine AGTR1 positivity. Also, 2 × 2 contingency tables are provided. OR, odds ratio; p-val, Fisher’s Exact Test P value.
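For readers who want to recompute an odds ratio and Fisher's exact test P value from a 2 × 2 contingency table like those in Fig. S5, a minimal standard-library sketch follows. The caption does not say whether the test was one- or two-sided; the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption. The function names and the example counts are also hypothetical, not data from the figure.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows = AGTR1 positive/negative, cols = event/no event.
table = (3, 1, 1, 3)
print(odds_ratio(*table), fisher_exact_two_sided(*table))
```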

Fig. S6. AGTR1 expression in the context of genomic neighbors. Genes were sorted by their genomic distance to AGTR1. Only ER+ cases are shown. Oncomine datasets: (A) Wang_Breast (9) and (B) vandeVijver_Breast (7).

Fig. S7. AGTR1 expression by qRT-PCR. Expression of AGTR1 and LacZ in primary mammary epithelial cells infected with adenovirus expressing AGTR1 or LacZ. Expression levels were normalized to GAPDH expression and scaled by 1E5 to emphasize relative expression changes.

Fig. S8. AGTR1 overexpression and analysis of angiotensin II (AT) and losartan effects on cell invasion. Colorimetry readout of invasion assays. (A) Adenovirus transfection experiments in which LacZ-, AGTR1- or EZH2-expressing adenovirus was infected into H16N2 immortalized mammary epithelial cells. Cells were either untreated or were treated with angiotensin (1 μM), losartan (5 μM), angiotensin and losartan, or suberoylanilide hydroxamic acid (SAHA). (B) Identical experiment performed with HME immortalized mammary epithelial cells.
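The invasion summary measures described in the Cell Invasion Assay methods (mean and standard deviation over 6 fields, relative change with AT, and percent reduction with losartan) can be sketched as follows. The function names, the use of the sample standard deviation, and all cell counts are illustrative assumptions.

```python
from statistics import mean, stdev


def relative_change(at_alone, baseline):
    """Invasion with AT alone divided by invasion at baseline."""
    return at_alone / baseline


def percent_reduction(at_alone, at_plus_losartan):
    """(AT alone - AT plus losartan) / AT alone, expressed as a percentage."""
    return 100.0 * (at_alone - at_plus_losartan) / at_alone


# Hypothetical invaded-cell counts from 6 random fields per condition.
baseline_fields = [18, 22, 20, 19, 21, 20]
at_fields = [58, 62, 60, 61, 59, 60]
at_losartan_fields = [14, 16, 15, 15, 16, 14]

baseline = mean(baseline_fields)
at_alone = mean(at_fields)
at_losartan = mean(at_losartan_fields)

print(f"AT alone: {at_alone:.1f} +/- {stdev(at_fields):.1f} cells per field")
print(f"relative change: {relative_change(at_alone, baseline):.2f}")
print(f"percent reduction: {percent_reduction(at_alone, at_losartan):.1f}%")
```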

Fig. S9. Effect of angiotensin II (AT) on ERK phosphorylation. Treatment with AT enhances ERK phosphorylation in AGTR1-overexpressing benign immortalized HME breast epithelial cells. Losartan (10 μM) treatment inhibited AT-mediated phosphorylation of ERK. AKT did not undergo phosphorylation upon angiotensin treatment. Total ERK and AKT levels remain unaltered upon angiotensin treatment.

Fig. S10. AGTR1 expression in breast cancer cell lines. (A) The NCI-60 cell line panel, available in Oncomine as Scherf_CellLine (4). Breast cancer cell lines are colored red. Cell lines evaluated for AT-mediated invasion are labeled. (B) A second AGTR1 expression profile from the NCI-60 cell line panel, available in Oncomine as Staunton_CellLine (6). (C) AGTR1 expression profile in a panel of breast cancer cell lines (11). In C–E, AGTR1 lines evaluated for AT-mediated invasion are labeled. AGTR1-positive lines are colored red, and negative lines are colored blue. (D) AGTR1 expression profile in a second panel of breast cancer cell lines, available in Oncomine as Huang_CellLine (2). (E) AGTR1 expression levels were measured by qRT-PCR in a subset of cell lines and standardized against the average of GAPDH, HMBS, and B2M levels. To highlight relative changes in expression, the ratio of AGTR1 to (GAPDH + HMBS + B2M) was multiplied by 1E4.
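The comparative Ct normalization behind panel E (AGTR1 relative to the average of GAPDH, HMBS, and B2M, then calibrated to the median sample) can be sketched as below. Several points are assumptions rather than details from the study: amplification efficiency is taken as exactly 2-fold per cycle, the housekeeping average is taken as the arithmetic mean of Ct values, and the function names and Ct values are hypothetical.

```python
from statistics import mean, median


def relative_expression(ct_target, ct_references):
    """2^(-delta Ct), where delta Ct = target Ct minus mean reference Ct."""
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)


def calibrate_to_median(relative_by_sample):
    """Divide each sample's relative amount by the median across samples."""
    med = median(list(relative_by_sample.values()))
    return {s: v / med for s, v in relative_by_sample.items()}


# Hypothetical Ct values: (AGTR1, [GAPDH, HMBS, B2M]) for each cell line.
ct_values = {
    "line_1": (24.0, [18.0, 23.0, 19.0]),
    "line_2": (30.0, [18.5, 23.5, 19.0]),
    "line_3": (33.0, [18.0, 23.0, 19.5]),
}

relative = {s: relative_expression(t, refs) for s, (t, refs) in ct_values.items()}
calibrated = calibrate_to_median(relative)
for sample, value in calibrated.items():
    print(sample, round(value, 3))
```

Averaging Ct values corresponds to taking the geometric mean of the reference gene quantities, since Ct is on a log2 scale.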

Table S1. Oncomine datasets included in the Meta-COPA analysis

Dataset  Oncomine study       n    Outliers  Journal                   PMID      Array type
1        Miller_Breast        251  111       Proc Natl Acad Sci        16141321  Affymetrix
2        Farmer_Breast        49   107       Oncogene                  15897907  Affymetrix
3        Hess_Breast          133  99        J Clin Oncol              16896004  Affymetrix
4        Wang_Breast          286  96        Lancet                    15721472  Affymetrix
5        Pawitan_Breast       159  88        Breast Cancer Res         16280042  Affymetrix
6        vandeVijver_Breast   295  87        N Engl J Med              12490681  Custom
7        Bittner_Breast       132  80        https://expo.intgen.org/  15696145  Affymetrix
8        Perreard_Breast_2    40   75        Breast Cancer Res         16626501  Custom
9        Richardson_Breast_2  47   69        Cancer Cell               16473279  Affymetrix
10       vantVeer_Breast      117  58        Nature                    11823860  Custom
11       Sotiriou_Breast_3    189  54        J Natl Cancer Inst        16478745  Affymetrix
12       Ma_Breast_3          60   52        Cancer Cell               15193263  Custom
13       Zhao_Breast          64   48        Mol Biol Cell             15034139  Custom
14       Huang_Breast         89   47        Lancet                    12747878  Affymetrix
15       Bild_CellLine_3      158  42        Nature                    16273092  Affymetrix
16       Radvanyi_Breast      63   40        Proc Natl Acad Sci        16043716  Custom
17       Ma_Breast_2          40   37        Cancer Cell               15193263  Custom
18       Pollack_Breast_2     41   33        Proc Natl Acad Sci        12297621  Custom
19       Yu_Breast_3          96   33        Clin Cancer Res           16740749  Affymetrix
20       Kreike_Breast        59   33        Clin Cancer Res           17020974  Custom
21       Sorlie_Breast_2      167  31        Proc Natl Acad Sci        12829800  Custom
22       Perou_Breast         65   30        Proc Natl Acad Sci        10430922  Custom
23       Sorlie_Breast        85   25        Proc Natl Acad Sci        11553815  Custom
24       Finak_Breast         66   22        Breast Cancer Res         17054791  Custom
25       Sotiriou_Breast_2    98   22        Proc Natl Acad Sci        12917485  Custom
26       Thuerigen_Breast     91   11        J Clin Oncol              16622258  Custom
27       Gruvberger_Breast    58   9         Cancer Res                11507038  Custom
28       West_Breast          49   6         Proc Natl Acad Sci        11562467  Affymetrix
29       Perou_Breast_2       27   5         Nature                    10963602  Custom
30       Ma_Breast            61   4         Proc Natl Acad Sci        12714683  Custom
31       Hedenfalk_Breast     22   2         N Engl J Med              11207349  Custom

The first column corresponds to Fig. 2. The Oncomine Study naming convention is first author of the manuscript from which the dataset was collected and the tissue type. A number exists at the end of the study name if a single author has contributed multiple datasets. n is the number of samples in the datasets. Outliers indicates the number of the 158 meta-outliers that were called an outlier in the respective datasets. PMID is PubMed identifier.

Table S2. Significant meta-outliers

Entrez  Gene           Study count  Total studies  Misses  % total  P
2064    ERBB2          21           29             8       72       3.55875E-26
5469    PPARBP         19           26             7       73       6.24478E-24
2173    FABP7          18           27             9       67       1.40088E-21
2886    GRB7           18           29             11      62       9.76205E-21
10948   STARD3         18           30             12      60       2.37121E-20
51442   VGLL1          16           24             8       67       2.51785E-19
185     AGTR1          15           22             7       68       2.00428E-18
1548    CYP2A6         15           23             8       65       5.60037E-18
2891    GRIA2          16           29             13      55       2.0136E-17
3929    LBP            15           25             10      60       3.52672E-17
1280    COL2A1         14           23             9       61       3.02744E-16
93210   PERLD1         13           19             6       68       3.65146E-16
9635    CLCA2          13           21             8       62       2.58828E-15
866     SERPINA6       14           27             13      52       6.63407E-15
6947    TCN1           14           28             14      50       1.28973E-14
6279    S100A8         13           26             13      50       1.14887E-13
10202   DHRS2          12           21             9       57       1.21341E-13
1549    CYP2A7         13           28             15      46       3.90916E-13
3535    IGL@           13           28             15      46       3.90916E-13
7545    ZIC1           11           18             7       61       4.63839E-13
1381    CRABP1         13           29             16      45       6.88842E-13
5709    PSMD3          12           24             12      50       1.02629E-12
5816    PVALB          12           24             12      50       1.02629E-12
429     ASCL1          13           30             17      43       1.18182E-12
55859   BEX1           11           20             9       55       2.31544E-12
1485    CTAG1B         11           20             9       55       2.31544E-12
4604    MYBPC1         11           20             9       55       2.31544E-12
54763   ROPN1          10           16             6       63       4.00614E-12
4589    MUC7           11           21             10      52       4.72896E-12
5799    PTPRN2         12           27             15      44       6.0657E-12
54490   UGT2B28        9            13             4       69       1.26145E-11
3213    HOXB3          12           29             17      41       1.71215E-11
23532   PRAME          12           29             17      41       1.71215E-11
1048    CEACAM5        12           30             18      40       2.7748E-11
4102    MAGEA3         9            14             5       64       3.43679E-11
3857    KRT9           12           31             19      39       4.40232E-11
8534    CHST1          11           25             14      44       5.3468E-11
5178    PEG3           11           25             14      44       5.3468E-11
5744    PTHLH          11           25             14      44       5.3468E-11
5655    KLK10          10           20             10      50       8.27659E-11
1360    CPB1           10           21             11      48       1.53709E-10
4101    MAGEA2         9            16             7       56       1.85942E-10
2819    GPD1           11           28             17      39       2.36979E-10
4693    NDP            11           28             17      39       2.36979E-10
5567    PRKACB         11           28             17      39       2.36979E-10
3852    KRT5           11           29             18      38       3.71337E-10
22943   DKK1           9            17             8       53       3.84481E-10
2019    EN1            9            17             8       53       3.84481E-10
339479  FAM5C          9            17             8       53       3.84481E-10
419     ART3           10           23             13      43       4.71816E-10
22794   CASC3          10           23             13      43       4.71816E-10
2296    FOXC1          10           23             13      43       4.71816E-10
10232   MSLN           11           30             19      37       5.70256E-10
8942    KYNU           10           27             17      37       3.11607E-9
7103    TSPAN8         10           27             17      37       3.11607E-9
2266    FGG            9            21             12      43       4.16788E-9
55876   GSDML          9            21             12      43       4.16788E-9
11341   SCRG1          9            21             12      43       4.16788E-9
3158    HMGCS2         10           28             18      36       4.71556E-9
7980    TFPI2          10           28             18      36       4.71556E-9
9862    THRAP4         10           28             18      36       4.71556E-9
8581    LY6D           8            16             8       50       6.80388E-9
7367    UGT2B17        8            16             8       50       6.80388E-9
1081    CGA            9            23             14      39       1.09725E-8
3248    HPGD           9            23             14      39       1.09725E-8
1641    DCX            8            17             9       47       1.251E-8
1893    ECM1           10           31             21      32       1.46741E-8
4915    NTRK2          10           31             21      32       1.46741E-8
5691    PSMB3          10           31             21      32       1.46741E-8
1356    CP             9            24             15      38       1.70839E-8
1101    CHAD           8            18             10      44       2.19194E-8
10884   MRPS30         8            18             10      44       2.19194E-8
3205    HOXA9          9            25             16      36       2.5976E-8
7363    UGT2B4         9            25             16      36       2.5976E-8
3760    KCNJ3          8            19             11      42       3.68547E-8
1404    HAPLN1         9            26             17      35       3.86604E-8
2897    GRIK1          9            27             18      33       5.64325E-8
147179  WIPF2          9            27             18      33       5.64325E-8
247     ALOX15B        8            20             12      40       5.97928E-8
4916    NTRK3          8            20             12      40       5.97928E-8
2264    FGFR4          9            28             19      32       8.093E-8
3963    LGALS7         9            28             19      32       8.093E-8
6744    SSFA2          9            28             19      32       8.093E-8
2717    GLA            8            21             13      38       9.40236E-8
6445    SGCG           8            21             13      38       9.40236E-8
7368    UGT8           8            21             13      38       9.40236E-8
4753    NELL2          9            29             20      31       1.14198E-7
158763  RP13–102H20.1  6            10             4       60       1.37953E-7
117159  DCD            5            6              1       83       1.42155E-7
1746    DLX2           8            22             14      36       1.43829E-7
7200    TRH            8            22             14      36       1.43829E-7
8772    FADD           9            30             21      30       1.5876E-7
9598    CSAG2          7            16             9       44       1.97008E-7
2537    IFI6           9            31             22      29       2.17705E-7
1733    DIO1           8            24             16      33       3.13487E-7
30848   CTAG2          7            17             10      41       3.26157E-7
6549    SLC9A2         7            17             10      41       3.26157E-7
79933   SYNPO2L        7            17             10      41       3.26157E-7
9503    XAGE1          7            17             10      41       3.26157E-7
9       NAT1           8            25             17      32       4.48787E-7
51806   CALML5         7            18             11      39       5.19763E-7
3887    KRT81          7            18             11      39       5.19763E-7
795     S100G          7            18             11      39       5.19763E-7
80122   YSK4           7            18             11      39       5.19763E-7
91646   ECAT8          6            12             6       50       5.76252E-7
57575   PCDH10         6            12             6       50       5.76252E-7
1114    CHGB           8            26             18      31       6.31065E-7
3642    INSM1          8            26             18      31       6.31065E-7
5122    PCSK1          8            26             18      31       6.31065E-7
51412   ACTL6B         7            19             12      37       8.01459E-7
8927    BSN            7            19             12      37       8.01459E-7
5317    PKP1           7            19             12      37       8.01459E-7
57348   TTYH1          7            19             12      37       8.01459E-7
7783    ZP2            7            19             12      37       8.01459E-7
827     CAPN6          8            27             19      30       8.73016E-7
2568    GABRP          8            27             19      30       8.73016E-7
2888    GRB14          8            27             19      30       8.73016E-7
7018    TF             8            27             19      30       8.73016E-7
658     BMPR1B         7            20             13      35       1.20082E-6
6898    TAT            7            20             13      35       1.20082E-6
4588    MUC6           5            8              3       63       1.26134E-6
10720   UGT2B11        5            8              3       63       1.26134E-6
346171  ZFP57          5            8              3       63       1.26134E-6
124     ADH1A          8            29             21      28       1.59962E-6
3576    IL8            8            29             21      28       1.59962E-6
5644    PRSS1          8            29             21      28       1.59962E-6
6006    RHCE           8            29             21      28       1.59962E-6
6857    SYT1           8            29             21      28       1.59962E-6
7345    UCHL1          8            29             21      28       1.59962E-6
810     CALML3         7            21             14      33       1.75421E-6
1041    CDSN           7            21             14      33       1.75421E-6
10321   CRISP3         7            21             14      33       1.75421E-6
51755   CRKRS          7            21             14      33       1.75421E-6
10481   HOXB13         7            21             14      33       1.75421E-6
6280    S100A9         7            21             14      33       1.75421E-6
6278    S100A7         7            22             15      32       2.50573E-6
1811    SLC26A3        7            22             15      32       2.50573E-6
6664    SOX11          7            22             15      32       2.50573E-6
90070   LACRT          5            9              4       56       2.76722E-6
222008  VSTM2          5            9              4       56       2.76722E-6
286     ANK1           8            31             23      26       2.78648E-6
57718   KIAA1622       6            15             9       40       2.88761E-6
199974  CYP4Z1         6            16             10      38       4.50197E-6
94025   MUC16          6            16             10      38       4.50197E-6
727897  MUC5B          6            16             10      38       4.50197E-6
8714    ABCC3          7            24             17      29       4.82347E-6
10562   OLFM4          5            10             5       50       5.3965E-6
1029    CDKN2A         7            25             18      28       6.52474E-6
8671    SLC4A4         7            25             18      28       6.52474E-6
56938   ARNTL2         6            17             11      35       6.77968E-6
11254   SLC6A14        6            17             11      35       6.77968E-6
2571    GAD1           7            26             19      27       8.69609E-6
7136    TNNI2          7            26             19      27       8.69609E-6
284266  CD33L3         5            11             6       45       9.64715E-6
11283   CYP4F8         6            18             12      33       9.90964E-6
143662  MUC15          6            18             12      33       9.90964E-6
26050   SLITRK5        6            18             12      33       9.90964E-6
7475    WNT6           6            18             12      33       9.90964E-6
151126  ZNF533         6            18             12      33       9.90964E-6

The Entrez Gene ID and official gene symbols are listed. Study count is the number of datasets in which the gene was called an outlier. Total studies indicates the total number of datasets in which the gene was measured. Misses indicates the number of datasets in which the gene was measured but was not called an outlier. % total is the percentage of datasets measuring the gene in which it was called an outlier. The P value is based on the binomial distribution and indicates the probability that a gene would be called an outlier in the given number of datasets by chance.
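The binomial P value can be sketched with the Python standard library as below. Reading the P value as the upper tail P(X ≥ study count), with the background probability of 0.03 stated in the methods, is an assumption; it does appear to reproduce the tabulated values, for example about 3.56E-26 for ERBB2 (21 of 29 datasets). The function and variable names are illustrative.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.03):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more outlier calls."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Study count and total studies taken from Table S2.
p_erbb2 = binomial_upper_tail(21, 29)
p_agtr1 = binomial_upper_tail(15, 22)

# Meta-outlier threshold and the expected number of chance hits
# among roughly 25,000 genes, as described in the methods.
threshold = 1e-5
is_meta_outlier = p_agtr1 < threshold
expected_by_chance = 25_000 * threshold

print(f"ERBB2 P = {p_erbb2:.5e}")
print(f"AGTR1 P = {p_agtr1:.5e}, meta-outlier: {is_meta_outlier}")
print(f"expected chance hits: {expected_by_chance:.2f}")
```

The sum is evaluated directly rather than as 1 minus a cumulative probability, which avoids losing precision when the tail is this small.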

Table S3. Top-scoring meta-outliers

Chr  Symbol    No. of studies  Total no. of studies  % total  P
17q  PPARBP    19              26                    73       6.2E-24
17q  ERBB2     21              29                    72       3.6E-26
17q  PERLD1    13              19                    68       3.7E-16
3q   AGTR1     15              22                    68       2E-18
6q   FABP7     18              27                    67       1.4E-21
Xq   VGLL1     16              24                    67       2.5E-19
19q  CYP2A6    15              23                    65       5.6E-18
17q  GRB7      18              29                    62       9.8E-21
1p   CLCA2     13              21                    62       2.6E-15
3q   ZIC1      11              18                    61       4.6E-13
12q  COL2A1    14              23                    61       3E-16
17q  STARD3    18              30                    60       2.4E-20
20q  LBP       15              25                    60       3.5E-17
14q  DHRS2     12              21                    57       1.2E-13
4q   GRIA2     16              29                    55       2E-17
Xq   BEX1      11              20                    55       2.3E-12
14q  SERPINA6  14              27                    52       6.6E-15
11q  TCN1      14              28                    50       1.3E-14
1q   S100A8    13              26                    50       1.1E-13
17q  PSMD3     12              24                    50       1E-12
22q  PVALB     12              24                    50       1E-12
19q  CYP2A7    13              28                    46       3.9E-13
22q  IGL@      13              28                    46       3.9E-13
15q  CRABP1    13              29                    45       6.9E-13
12q  ASCL1     13              30                    43       1.2E-12

Chr, chromosome arm. The number of studies in which the gene was found to be an outlier (No. of studies) and the total number of studies in which the gene was measured (Total no. of studies) are listed. The P value was calculated by using the binomial distribution.
