Supplementary Section

a) Hierarchical clustering analysis

Hierarchical clustering analysis of all 81 samples of the study, as well as of specific subgroups of specimens, was used to explore the data obtained by gene expression profiling.

Gene expression matrices were filtered to exclude rows with more than 10% missing values, and both genes and experiments were mean-centered. Hierarchical clustering, carried out using the Pearson metric and the average clustering method, revealed the aggregation of cancer specimens from the same patient and of OSE samples.
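The filtering, centering and clustering steps described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the toy matrix and the choice to set remaining missing values to zero after centering are assumptions made only for illustration. SciPy's "correlation" metric corresponds to a distance of 1 minus the Pearson correlation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def filter_and_center(matrix, max_missing=0.10):
    """Drop genes (rows) with >10% missing values, then mean-center genes and experiments."""
    keep = np.isnan(matrix).mean(axis=1) <= max_missing
    m = matrix[keep]
    m = m - np.nanmean(m, axis=1, keepdims=True)  # center each gene (row)
    m = m - np.nanmean(m, axis=0, keepdims=True)  # center each experiment (column)
    return m, keep

def cluster_samples(centered):
    """Average-linkage tree of samples (columns) using distance 1 - Pearson r."""
    # Assumption: any remaining missing value is set to 0 after centering.
    filled = np.nan_to_num(centered, nan=0.0)
    return linkage(filled.T, method="average", metric="correlation")

# Toy data (hypothetical): 50 genes x 6 samples forming two opposite profiles.
rng = np.random.default_rng(0)
pattern = rng.normal(size=(50, 1))
toy = np.hstack([pattern + 0.1 * rng.normal(size=(50, 3)),
                 -pattern + 0.1 * rng.normal(size=(50, 3))])
toy[0, :] = np.nan  # a gene missing in every sample is filtered out

centered, kept = filter_and_center(toy)
tree = cluster_samples(centered)
groups = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into two clusters
```

In this toy case the first three and the last three samples fall into separate clusters, mirroring the kind of major split described for the tumor samples.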

Analysis of gene expression profiles using only tumor samples (excluding all cell line specimens) revealed several distinct clusters, which varied in number of genes and levels of expression (see Figure A).

Hierarchical clustering analysis indicated the following:

• Cluster D and clusters N and O displayed opposite gene expression profiles, determining the major split of samples. These three clusters contained a very large number of genes, so that clear functional relationships were not evident. However, several growth factors, morphogens and signal transduction proteins were revealed in these groups (VEGF, WNT5A, PDGFB in cluster D, WNT4 and BMP4 in cluster N and MADH2, MADH5 and MADH6 in cluster O).

• Clusters E and G included numerous genes involved in the host immune response, probably identifying tumor samples containing lymphocytes and immune cells. Cluster E contained several interferon induced proteins (G1P2, G1P3, MX1, IFIT1, IFI16), while cluster G contained several MHC class II molecules.

• Cluster F contained a number of genes previously proposed as EOC markers, such as KRT17, MEIS1, PAX8 and EPHB3 (Welsh et al, 2001; Schaner et al, 2003).

• Cluster M was clearly enriched in ECM proteins and contained both FGF2 and FGFR4.

b) Unsupervised class discovery, associated gene list extraction, and GO-terms distribution comparison

Class discovery (unsupervised clustering) was carried out using ISIS v.2.0 software (von Heydebreck et al, 2001). Briefly, the software generates a large set of average gene expression profiles by standard hierarchical clustering and then, for each average profile, checks whether the clustering suggests one or more binary class distinctions of the set of samples. Finally, a statistical score (diagonal linear discriminant, DLD) is calculated for each candidate bipartition, which quantifies how strongly the two classes are separated by the expression levels of a suitable subset of genes. The procedure was carried out using the default ISIS parameters (p=50, poffs=0, and all possible candidate splits=149), and 24 candidate splits were obtained. All 24 partitions revealed in tumor samples were further compared with available clinical data.
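To make the idea of scoring a bipartition more concrete, the following Python sketch computes a DLD-style separation score. It is a simplified illustration and not the ISIS implementation: the function names, the gene selection by absolute t-statistic and the toy data are assumptions. The sketch picks the p genes that best separate the two classes, projects each sample onto a diagonal linear discriminant axis, and reports the two-sample t-statistic of the projected values.

```python
import numpy as np

def two_sample_t(x0, x1):
    """Pooled-variance t-statistic along the last axis; also returns the pooled variance."""
    n0, n1 = x0.shape[-1], x1.shape[-1]
    pooled = ((n0 - 1) * x0.var(axis=-1, ddof=1) +
              (n1 - 1) * x1.var(axis=-1, ddof=1)) / (n0 + n1 - 2)
    t = (x1.mean(axis=-1) - x0.mean(axis=-1)) / np.sqrt(pooled * (1 / n0 + 1 / n1))
    return t, pooled

def dld_score(expr, labels, p=50):
    """DLD-style score of a bipartition (genes in rows, samples in columns)."""
    labels = np.asarray(labels, dtype=bool)
    x0, x1 = expr[:, ~labels], expr[:, labels]
    t, pooled = two_sample_t(x0, x1)
    top = np.argsort(-np.abs(t))[:p]  # p most discriminating genes
    # Diagonal discriminant weights: mean difference scaled by per-gene variance.
    weights = (x1[top].mean(axis=1) - x0[top].mean(axis=1)) / pooled[top]
    projection = weights @ expr[top]  # one projected value per sample
    t_proj, _ = two_sample_t(projection[~labels], projection[labels])
    return abs(float(t_proj))

# Toy data (hypothetical): 100 genes x 20 samples, 30 genes shifted in half the samples.
rng = np.random.default_rng(1)
expr = rng.normal(size=(100, 20))
true_split = np.arange(20) >= 10
expr[:30, true_split] += 3.0
alternating = np.arange(20) % 2 == 1  # control split unrelated to the signal

true_score = dld_score(expr, true_split, p=20)
control_score = dld_score(expr, alternating, p=20)
```

Because the genes are selected on the same data that are scored, even an arbitrary split receives a non-zero score; only the comparison between candidate splits is informative in this sketch.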

Because ISIS software does not provide the list of genes responsible for the partition formed, a two-sample univariate F-test (with randomized variance model and with FDR correction) was used to extract gene lists. A nominal P-value < 0.002 was chosen as a threshold, in order to limit the number of recovered genes. The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The procedure was applied to all discovered partitions.
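A rough sketch of the gene list extraction step is given below. It assumes a plain pooled-variance test rather than the randomized variance model and omits the false discovery rate assessment, so it does not reproduce the procedure used here. For two classes, the univariate F-test is equivalent to the pooled-variance t-test (F equals t squared), which is what the code relies on. Function and variable names are illustrative.

```python
import numpy as np
from scipy import stats

def significant_genes(expr, labels, alpha=0.002):
    """Indices of genes with nominal P < alpha between two classes, plus their direction."""
    labels = np.asarray(labels, dtype=bool)
    x0, x1 = expr[:, ~labels], expr[:, labels]
    # Two-class F-test == pooled-variance t-test (F = t**2), so the P-values coincide.
    _, p_values = stats.ttest_ind(x1, x0, axis=1)
    hits = np.flatnonzero(p_values < alpha)
    up_in_group1 = x1[hits].mean(axis=1) > x0[hits].mean(axis=1)
    return hits, p_values[hits], up_in_group1

# Toy data (hypothetical): 500 genes x 39 samples, 10 genes upregulated in group 1.
rng = np.random.default_rng(2)
labels = np.arange(39) >= 21
expr = rng.normal(size=(500, 39))
expr[:10, labels] += 3.0

hits, p_hits, up = significant_genes(expr, labels)
```

The `up_in_group1` flags allow the up- and downregulated genes of each list to be handled separately in downstream analyses.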

The number of retrieved genes varied among the different lists associated to the binary partitions, and GO-terms distribution analysis using EASE software v.2.0 (Hosack et al, 2003) was applied to reveal functional associations. The distribution of GO-terms associated to each gene list was compared to the distribution of GO-terms associated to all genes present on the array. Up- and downregulated genes within each list were analyzed separately, using only valid UniGene identifiers. Table A summarizes the results. Briefly, this analysis identified three gene lists (associated with ISIS classes 6, 20 and 24 respectively), which were enriched in GO-terms related to the ECM, and one list (associated with ISIS class 15) enriched in genes localized in the intracellular compartment. ECM and its remodeling emerged as the major relevant functional theme in our expression data. The three identified gene lists as well as the three related partitions largely overlapped, and the ISIS class 20 was chosen for further analysis, since it was the most balanced in terms of number of samples in each group.

c) Gene list description

The ECM/FGF2 signaling-related classifier, as predicted using our dataset, contained genes related to the ECM and to elements functionally related to FGF2 signaling. The functional category related to ECM included genes encoding ECM structural components as well as genes involved in its remodeling and in cell adhesion processes: collagens (COL3A1, COL5A1, COL6A3, COL9A3), the proteoglycans fibulin 2 (FBLN2) and fibromodulin (FMOD), the fibronectin 1 gene (FN1), thrombospondin 2 (THBS2), Lutheran blood group (LU), fibroblast activation protein-alpha (FAP), lysyl oxidase-like 1 (LOXL1), latent transforming growth factor-beta binding protein 2 (LTBP2), SPARC-like 1 (SPARCL1), proteoglycan 1, secretory granule (PRG1), osteoblast-specific factor 2 (OSF2). Along with FGF2 this classifier contained the FGFR4, one clone corresponding to an immature form of the FGFR2, as well as other ECM related genes reported to be regulated by FGF2 in various cellular contexts: OB cadherin (CDH11), lumican (LUM), biglycan (BGN).

Among the other genes of the top-DLD-classifier, several are associated with the immune/inflammatory response of the host: CXCL2 chemokine (also known as GRO2 oncogene), interleukin 16 (IL16), MHC class II transactivator (MHC2TA) and the Fc fragment of IgG receptor and transporter-alpha (FCGRT), Fc fragment of IgG low-affinity IIIa receptor (FCGR3A), Duffy blood group (FY), lymphocyte-specific protein tyrosine kinase (LCK), tumor necrosis factor receptor superfamily member 6 (TNFRSF6) and Rhesus blood group-associated glycoprotein (RHAG). The top-DLD-classifier also accounted for genes involved in transcription regulation, such as GATA binding protein 1 (GATA1), GA binding protein transcription factors alpha subunit (60 kDa) and beta subunit 2 (47 kDa) (GABPA, GABPB2), NGFI-A binding protein 2 (NAB2), AE binding protein 1 (AEBP1) and RING1 and YY1 binding protein (RYBP).

Complete gene lists are available from the web sites of IFOM (http://www.ifom.it/) and LNCIB (http://www.lncib.it/).

d) Ascitic cell recovery

Cells present in ascitic fluid were collected by centrifugation, resuspended in RPMI 1640 medium, stratified over a 75-100% Ficoll-Hypaque discontinuous density gradient and centrifuged to harvest tumor-associated lymphocytes and tumor cells. Tumor cells were enriched over the 75% Ficoll density gradient. Contaminating monocytes were removed by plastic adherence for 1 h at 37°C. Purity of ovarian tumor cell populations was determined by flow-cytometric analysis of different tumor markers (Ca125, FR, Herb-B2, EGF-R) and leukocyte differentiation antigens (CD3, CD14, CD16, CD28, CD25).

e) Analysis of TP53

Genomic DNA was extracted from frozen specimens when available (12 cases). In the remaining 30 cases, methylene-blue stained sections from formalin-fixed, paraffin-embedded tissues were microdissected under the microscope to obtain malignant tissues. Genomic DNA was extracted as described (Birindelli et al, 2001). Samples were screened by PCR-SSCP (single strand conformation polymorphism) (Donghi et al, 1993) or by DG-DGGE (double gradient-denaturing gradient gel electrophoresis) (Gelfi et al, 1997) for the presence of TP53 mutations in the most frequently affected exons (5 through 8) of the gene. Samples with mutations were identified by the presence of one or more new bands or a shift in position compared with a control wild-type cell line and control mutated samples. These cases were subjected to automated DNA sequencing (ABI Prism 377, Applied Biosystems), and each sequence reaction was performed at least twice in sense and antisense strands.

f) cDNA microarrays

After printing, slides were cross-linked at 45 mJ/cm2 and stored in a desiccator. Before hybridization, all slides were treated with 50% formamide for 2 min at 70°C to remove excess DNA, followed by 1% SDS/H2O for 5 min at room temperature to reduce overall background. Pre-hybridization was performed in UltraHyb hybridization buffer (Ambion, Austin, TX) for 1 h at 42°C. All clones were annotated according to their GenBank accession number or their I.M.A.G.E. clone identifier, using the SOURCE (Diehn et al, 2003) and the IFOM EST Annotation Machine (Guffanti et al, 2002) resources.

g) Target cDNA preparation

Probe labeling reaction was carried out in a final volume of 40 µl (1X first-strand buffer; 0.01 M DTT; 0.1 mM dATP, dGTP, dTTP; 6.25 µM dCTP; 0.33 mM Cy3 or Cy5 dCTP; 1 µCi 32P-dCTP; 20 U RNase inhibitor from human placenta (Roche Applied Science, Indianapolis, IN); 300 U SuperScript II reverse transcriptase (Life Technologies, Frederick, MD)). Samples were incubated at 42°C for 2 h and the reactions stopped by addition of 4 µl of 0.5 M EDTA, pH 8. Starting RNA template was removed by alkali hydrolysis, adding 4 µl of 0.5 M NaOH followed by incubation at 70°C for 15 min, and finally neutralized with 4 µl of 0.5 M HCl. Unincorporated nucleotides were removed from labeled probes using Microcon YM-50 columns (Millipore, Bedford, MA).

h) Slide scanning and image analysis

cDNA microarrays were scanned using the GenePix 4000A microarray scanner at a resolution of 10 µm and analyzed using GenePix Pro v.3.0 software (Axon Instruments, Union City, CA). Image scanning and acquisition processes were carried out as follows:

1. The Cy5 laser photomultiplier voltage (PMT) of each sample was modulated according to the fixed Cy3 PMT value of the reference to obtain a scatter plot of the fluorescent ratios with a regression ratio of 1, thereby indicating a balance between the two channels.
2. Removal of poor quality spots ("flagging") was carried out both automatically using GenePix Pro software and manually. Such spots were excluded from further analyses.

i) Raw data filtering, normalization and gene expression matrix construction

The GenePix Pro GPR raw data files containing the spot intensities were processed using the GenePix post-processing program GP3 (Fielden et al, 2002). Each GPR file associated to each sample was processed to correct, filter and normalize the data to obtain reliable Cy3 and Cy5 ratios of each cDNA target.

Raw data filtering procedures to remove low-quality data included:

1. Removal of failed PCR clones;
2. Removal of flagged spots identified during scanning;
3. Removal of spots meeting either of the following criteria:
   a. negative or saturated local background-corrected signal intensities in both channels;
   b. median signal intensity less than the median local background plus two standard deviations of the median local background;
4. Removal of clones showing only a single valid measurement in the three replicates.
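The spot-level and clone-level filters listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GP3 code: the array layout, the 16-bit saturation value, and the decision to apply the background-plus-two-SD rule when either channel fails are all assumptions, and the function names are hypothetical.

```python
import numpy as np

SATURATION = 65535  # assumption: ceiling of a 16-bit scanner signal

def valid_spots(fg, bg, bg_sd, flagged):
    """Boolean mask of spots that pass the quality filters.

    fg, bg, bg_sd: arrays of shape (n_spots, 2) holding the median signal,
    median local background and background SD for the (Cy3, Cy5) channels.
    flagged: boolean array marking spots flagged during scanning.
    """
    corrected = fg - bg
    # Negative or saturated background-corrected signal in both channels.
    bad_both = ((corrected < 0) | (fg >= SATURATION)).all(axis=1)
    # Assumption: a spot is "weak" if either channel is below background + 2 SD.
    weak = (fg < bg + 2 * bg_sd).any(axis=1)
    return ~(flagged | bad_both | weak)

def clones_to_keep(clone_ids, valid, min_valid=2):
    """Clones with at least min_valid valid replicate spots."""
    ids, inverse = np.unique(clone_ids, return_inverse=True)
    counts = np.bincount(inverse, weights=valid.astype(float))
    return set(ids[counts >= min_valid].tolist())

# Toy slide (hypothetical): two clones, each spotted in triplicate.
clone_ids = np.array(["A", "A", "A", "B", "B", "B"])
fg = np.array([[1000, 1200], [900, 1100], [50, 60],
               [1000, 1000], [40, 45], [65535, 65535]], dtype=float)
bg = np.full((6, 2), 50.0)
bg_sd = np.full((6, 2), 10.0)
flagged = np.zeros(6, dtype=bool)

valid = valid_spots(fg, bg, bg_sd, flagged)
kept_clones = clones_to_keep(clone_ids, valid)
```

In the toy slide, clone A retains two valid replicates and is kept, whereas clone B retains only one and is removed.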

After the filtering procedures, each slide was normalized to balance the Cy3 and Cy5 channels using a global trimmed mean obtained by eliminating the upper and lower 5% of the data. This value was subtracted from each data point. The final output of each sample contained Cy3 and Cy5 local background-corrected signals, Cy5 to Cy3 ratios (log2) and a flag tag for each corresponding clone. The final gene expression matrix was obtained by collating log2 ratios and flag data from the 81 experiments. In all subsequent analyses, control spike genes were removed and a maximum of 10% of missing (invalid) values was allowed. The nearest neighbor method was chosen to estimate missing values.
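The global trimmed-mean normalization described above can be sketched as follows. The sketch assumes that the normalization is applied to the log2 ratios of one slide and that missing values are simply ignored; the function names and the toy values are illustrative only.

```python
import numpy as np

def trimmed_mean(values, cut=0.05):
    """Mean after discarding the lowest and highest `cut` fraction of the values."""
    v = np.sort(values[~np.isnan(values)])
    k = int(cut * v.size)  # number of values trimmed at each end
    return v[k:v.size - k].mean()

def normalize_slide(log_ratios, cut=0.05):
    """Subtract the global trimmed mean from every log2 ratio of a slide."""
    return log_ratios - trimmed_mean(log_ratios, cut)

# Toy slide (hypothetical): ratios biased by +0.5 plus two extreme outliers.
raw = np.concatenate([np.linspace(-1.0, 1.0, 18) + 0.5, [10.0, -10.0]])
normalized = normalize_slide(raw)
```

Trimming keeps the few extreme spots from dominating the estimate of the channel imbalance, so the bulk of the ratios is centered on zero after subtraction.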

Mean centering was applied to all genes to standardize the dataset. The Pearson metric and average or complete linkage methods were used in hierarchical clustering. The gene expression matrix used for ISIS class discovery procedure accounted for 39 specimens of primary tumors from advanced disease (with the exclusion of the 2 clear cell cases) and was further filtered to about 2000 genes to reduce the noise, as suggested by the software authors (von Heydebreck et al, 2001), by keeping only genes showing a row standard deviation greater than 0.27.
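The variability filter applied before class discovery can be written in a few lines. In this sketch the sample standard deviation (ddof=1) and the handling of missing values are assumptions, since the text specifies only the 0.27 threshold; the function name and toy matrix are hypothetical.

```python
import numpy as np

def filter_by_row_sd(matrix, min_sd=0.27):
    """Keep genes (rows) whose standard deviation across samples exceeds min_sd."""
    sd = np.nanstd(matrix, axis=1, ddof=1)  # assumption: sample SD, ignoring NaN
    keep = sd > min_sd
    return matrix[keep], keep

# Toy matrix (hypothetical): only the second gene varies enough to be retained.
toy_matrix = np.array([[0.0, 0.1, -0.1, 0.0],
                       [1.0, -1.0, 0.5, -0.5],
                       [0.2, 0.2, 0.2, 0.2]])
filtered, keep_mask = filter_by_row_sd(toy_matrix)
```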

j) Supervised learning

Supervised learning methods implemented in BRB-ArrayTools were used to select the best discriminating genes associated with the known phenotypes. Standard class comparison was done using a two-sample univariate F-test (with randomized variance model and with FDR correction with a nominal P-value < 0.0025). The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The estimated probability of identifying at least 113 genes as significant (P < 0.0025) by chance, when no real differences exist between the classes, was 0.00606. All permutation tests carried out used at least 1000 permutations.
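The logic of a global permutation test of this kind can be sketched as below. This is not the BRB-ArrayTools implementation: it uses a plain pooled-variance test instead of the randomized variance model, and the function names, toy data and number of permutations are assumptions. The class labels are shuffled repeatedly, and the fraction of shuffles yielding at least as many significant genes as the real labels estimates the probability of obtaining such a gene list by chance.

```python
import numpy as np
from scipy import stats

def count_significant(expr, labels, alpha):
    """Number of genes with nominal two-sample P < alpha."""
    _, p_values = stats.ttest_ind(expr[:, labels], expr[:, ~labels], axis=1)
    return int((p_values < alpha).sum())

def global_permutation_p(expr, labels, alpha=0.0025, n_perm=1000, seed=0):
    """Observed gene count and the fraction of label permutations matching or exceeding it."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = count_significant(expr, labels, alpha)
    exceed = sum(
        count_significant(expr, rng.permutation(labels), alpha) >= observed
        for _ in range(n_perm)
    )
    return observed, exceed / n_perm

# Toy data (hypothetical): 200 genes x 20 samples, 15 genes differ between classes.
rng = np.random.default_rng(3)
labels = np.arange(20) >= 10
expr = rng.normal(size=(200, 20))
expr[:15, labels] += 3.0

observed, p_global = global_permutation_p(expr, labels, n_perm=200)
```

The toy run uses 200 permutations only to keep the example fast; the analyses reported here used at least 1000.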

k) Immunohistochemistry

Tumor sections (1-2 µm) were serially cut from formalin-fixed, paraffin-embedded tissue, mounted on poly-L-lysine (Sigma, St. Louis, MO)-coated slides, deparaffinized in xylene and hydrated in graded alcohols. Endogenous peroxidase activity was inhibited by treating sections with 0.3% hydrogen peroxide in methanol for 30 min. Slides were washed three times in 0.05 M PBS-0.1% Triton, incubated with normal goat or human albumin diluted 1:50 and 1:100, respectively, in 1% PBS-0.1% BSA-sodium azide and incubated overnight with the following primary antibodies:

| Antibody | Supplier | Epitope retrieval | Dilution (a) | Controls |
|---|---|---|---|---|
| FGF2 (147), sc-79, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) + 5-min in protease XIV 2% in TBS-EDTA | 1:50 | Pos: pancreas; Neg: blocking peptide |
| FGFR4 (c16), sc-124, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) | 1:200 | Pos: neuroendocrine carcinoma; Neg: blocking peptide |
| FN1, A 0245, pAb | Dako | 6-min autoclave in EDTA (pH 8) + 5-min in protease XIV 2% in TBS-EDTA | 1:800 | Pos: myofibroblastic sarcoma |

(a) Antibodies were diluted in 1% PBS-0.1% BSA-sodium azide.

TABLE A. ISIS classes, associated gene lists and GO-terms distribution analysis.

| ISIS class (a) | Group 0 (b) | Group 1 (b) | DLD score (c) | No. genes, P<0.002 (d) | No. genes, FP<10 genes (e) | No. genes, FP<10% (f) | Probability (g) | UP group 0 (h) | UP group 1 (h) | GO-terms, Bonferroni P<0.05 (i) | GO-term (j) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 24 | 15 | 16.82 | 100 | 79 | 47 | 0 | 47 | 53 | NO | |
| 2 | 10 | 29 | 15.2 | 25 | 20 | 3 | 0.051 | 24 | 1 | NO | |
| 3 | 32 | 7 | 13.82 | 35 | 20 | 3 | 0.037 | 14 | 21 | NO | |
| 4 | 12 | 27 | 13.46 | 298 | 263 | 482 | 0 | - | - | NO | |
| 5 | 10 | 29 | 13.25 | 314 | 239 | 514 | 0 | - | - | NO | |
| 6 | 12 | 27 | 13.17 | 181 | 147 | 196 | 0.001 | - | - | YES | ECM |
| 7 | 8 | 31 | 13.07 | 92 | 59 | 33 | 0.005 | - | - | NO | |
| 8 | 4 | 35 | 13.02 | 86 | 50 | 7 | 0.017 | - | - | NO | |
| 9 | 7 | 32 | 12.99 | 159 | 115 | 118 | 0 | - | - | NO | |
| 10 | 14 | 25 | 12.98 | 105 | 99 | 87 | 0.001 | - | - | NO | |
| 11 | 5 | 34 | 12.97 | 38 | 21 | 3 | 0.06 | 33 | 5 | NO | |
| 12 | 12 | 27 | 12.92 | 36 | 30 | 5 | 0.016 | 30 | 6 | NO | |
| 13 | 9 | 30 | 12.64 | 9 | 10 | 1 | 0.23 | 7 | 3 | NO | |
| 14 | 4 | 35 | 12.62 | 61 | 26 | 5 | 0.038 | - | - | NO | |
| 15 | 10 | 29 | 12.46 | 301 | 250 | 478 | 0 | - | - | YES | Intracellular |
| 16 | 33 | 6 | 12.41 | 16 | 11 | 3 | 0.154 | 2 | 14 | NO | |
| 17 | 5 | 34 | 12.26 | 450 | 281 | 605 | 0 | - | - | NO | |
| 18 | 9 | 30 | 12.07 | 90 | 72 | 38 | 0.002 | - | - | NO | |
| 19 | 9 | 30 | 12.00 | 266 | 239 | 323 | 0 | - | - | NO | |
| 20 | 21 | 18 | 11.63 | 75 | 67 | 57 | 0.003 | 10 | 65 | YES | ECM |
| 21 | 33 | 6 | 11.42 | 70 | 38 | 6 | 0.012 | 9 | 61 | NO | |
| 22 | 6 | 33 | 11.2 | 79 | 56 | 27 | 0.009 | 60 | 19 | NO | |
| 23 | 9 | 30 | 10.95 | 106 | 79 | 56 | 0.003 | 77 | 29 | YES | ECM |
| 24 | 6 | 33 | 9.493 | 31 | 16 | 5 | 0.062 | 29 | 2 | NO | |

(a) Binary partitions obtained by analysis of 39 advanced EOC by automated class discovery with ISIS software (von Heydebreck et al, 2001).
(b) Number of samples within each group of samples.
(c) DLD score associated to each binary partition discovered with ISIS.
(d) Number of genes significant at P < 0.002 in univariate F-test obtained using BRB-ArrayTools (Radmacher et al, 2002; McShane et al, 2002).
(e) Number of genes containing less than 10 false positives with a 90% interval of confidence.
(f) Number of genes containing less than 10% false positives with a 90% interval of confidence.
(g) Probability that the predicted number of genes is significant (at P < 0.002) by chance and not indicative of differences between the classes.
(h) Number of upregulated genes in group 0 or group 1 of tumor samples.
(i) ISIS classes with associated gene lists significantly (P < 0.05 using Bonferroni correction) enriched in specific GO-terms in comparison to the distribution of GO-terms of all genes present on the array, as assessed using EASE software v.2.0 (Hosack et al, 2003).
(j) Significant retrieved GO-terms.

FIGURE A. Hierarchical clustering of all analyzed samples except the cell lines. Clustering was done using 3309 genes, using the Pearson metric and the average clustering method. Expression levels are relative to a common reference, obtained by pooling RNA from ten human cell lines. Increased (orange) or decreased (blue) expression of the genes is shown for each sample. Major clusters and several of the genes they contain are shown in boxes A-O.

REFERENCES

Birindelli S, Perrone F, Oggionni M, Lavarino C, Pasini B, Vergani B, Ranzani GN, Pierotti MA and Pilotti S. (2001). Lab Invest, 81, 833-844.

Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA, Cherry JM, Botstein D, Brown PO and Alizadeh AA. (2003). Nucleic Acids Res, 31, 219-223.

Donghi R, Longoni A, Pilotti S, Michieli P, Della Porta G and Pierotti MA. (1993). J Clin Invest, 91, 1753-1760.

Fielden MR, Halgren RG, Dere E and Zacharewski TR. (2002). Bioinformatics, 18, 771-773.

Gelfi C, Righetti SC, Zunino F, Della TG, Pierotti MA and Righetti PG. (1997). Electrophoresis, 18, 2921-2927.

Guffanti A, Reid JF, Alcalay M and Simon G. (2002). Trends Genet, 18, 589-592.

Hosack DA, Dennis G, Jr., Sherman BT, Lane HC and Lempicki RA. (2003). Genome Biol, 4, R70.

McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC and Simon R. (2002). Bioinformatics, 18, 1462-1469.

Radmacher MD, McShane LM and Simon R. (2002). J Comput Biol, 9, 505-511.

Schaner ME, Ross DT, Ciaravino G, Sorlie T, Troyanskaya O, Diehn M, Wang YC, Duran GE, Sikic TL, Caldeira S, Skomedal H, Tu IP, Hernandez-Boussard T, Johnson SW, O'Dwyer PJ, Fero MJ, Kristensen GB, Borresen-Dale AL, Hastie T, Tibshirani R, van de RM, Teng NN, Longacre TA, Botstein D, Brown PO and Sikic BI. (2003). Mol Biol Cell, 14, 4376-4386.

von Heydebreck A, Huber W, Poustka A and Vingron M. (2001). Bioinformatics, 17, S107-S114.

Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ, Lockhart DJ, Burger RA and Hampton GM. (2001). Proc Natl Acad Sci U S A, 98, 1176-1181.