Supplementary Section

a)  Hierarchical clustering analysis

Hierarchical clustering analysis of all 81 samples in the study, as well as of specific subgroups of specimens, was used to explore the data obtained by gene expression profiling. Gene expression matrices were filtered to exclude rows in which more than 10% of values were missing, and both genes and experiments were mean-centered. Hierarchical clustering, carried out using the Pearson metric and the average-linkage clustering method, revealed the aggregation of cancer specimens from the same patient and of OSE samples. Analysis of gene expression profiles using only tumor samples (excluding all cell line specimens) revealed several distinct clusters, which varied in number of genes and levels of expression (see Figure A).
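The preprocessing steps named above (missing-value filtering, mean-centering and the Pearson metric) can be illustrated with a minimal sketch. It assumes that missing values are encoded as None and that the distance is taken as 1 minus the Pearson correlation; the function names are hypothetical and the clustering software actually used is not reproduced here.

```python
import math

def filter_rows(matrix, max_missing=0.10):
    """Keep rows (genes) whose fraction of missing values (None) is at most max_missing."""
    kept = []
    for row in matrix:
        missing = sum(1 for v in row if v is None)
        if missing / len(row) <= max_missing:
            kept.append(row)
    return kept

def mean_center(vector):
    """Subtract the mean of the observed values; missing values stay None."""
    observed = [v for v in vector if v is not None]
    mean = sum(observed) / len(observed)
    return [None if v is None else v - mean for v in vector]

def pearson_distance(x, y):
    """1 - Pearson correlation, using only positions observed in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return 1.0 - sxy / math.sqrt(sxx * syy)
```

The same mean_center function can be applied to rows (genes) and to columns (experiments). The resulting pairwise distances would then be passed to an average-linkage agglomeration step.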

Hierarchical clustering analysis indicated the following:

·  Cluster D and clusters N and O displayed opposite gene expression profiles, determining the major split of samples. These three clusters contained a very large number of genes, so that clear functional relationships were not evident. However, several growth factors, morphogens and signal transduction proteins were found in these groups (VEGF, WNT5A, PDGFB in cluster D, WNT4 and BMP4 in cluster N and MADH2, MADH5 and MADH6 in cluster O).

·  Clusters E and G included numerous genes involved in the host immune response, probably identifying tumor samples containing lymphocytes and immune cells. Cluster E contained several interferon-induced proteins (G1P2, G1P3, MX1, IFIT1, IFI16), while cluster G contained several MHC class II molecules.

·  Cluster F contained a number of genes previously proposed as EOC markers, such as KRT17, MEIS1, PAX8 and EPHB3 (Welsh et al, 2001; Schaner et al, 2003).

·  Cluster M was clearly enriched in ECM proteins and contained both FGF2 and FGFR4.

b)  Unsupervised class discovery, associated gene list extraction, and GO-terms distribution comparison

Class discovery (unsupervised clustering) was carried out using ISIS v.2.0 software (von Heydebreck et al, 2001). Briefly, the software generates a large set of average gene expression profiles by standard hierarchical clustering and then, for each average profile, checks whether the clustering suggests one or more binary class distinctions of the set of samples. Finally, for each candidate bipartition, a statistical score (diagonal linear discriminant, DLD) is calculated that quantifies how strongly the two classes are separated by the expression levels of a suitable subset of genes. The procedure was carried out using the default ISIS parameters (p=50, poffs=0, and all possible candidate splits=149), and 24 candidate splits were obtained. All 24 partitions found in the tumor samples were then compared with the available clinical data.
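To give an intuition for how a DLD-type score can rank candidate bipartitions, the sketch below projects the samples onto a diagonal linear discriminant axis (per-gene class-mean difference divided by the pooled within-class variance) and reports the two-sample t-statistic of the projected values. This is an illustrative assumption about the general form of such a score, not the ISIS implementation: the selection of the p best-supporting genes is omitted, and all function names are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pooled_variance(a, b):
    """Pooled within-class variance of two groups of values."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    return ss / (len(a) + len(b) - 2)

def split(values, labels):
    """Return (class-0 values, class-1 values) for a 0/1 labeling."""
    g0 = [v for v, l in zip(values, labels) if l == 0]
    g1 = [v for v, l in zip(values, labels) if l == 1]
    return g0, g1

def dld_separation(genes, labels):
    """genes: one list of expression values per gene; labels: 0/1 per sample.

    Assumes every gene has non-zero pooled variance under the labeling.
    """
    # Diagonal discriminant weights: mean difference over pooled variance.
    weights = []
    for g in genes:
        g0, g1 = split(g, labels)
        weights.append((mean(g1) - mean(g0)) / pooled_variance(g0, g1))
    # Project each sample onto the discriminant axis.
    projected = [sum(w * g[i] for w, g in zip(weights, genes))
                 for i in range(len(labels))]
    # Two-sample t-statistic of the projected values.
    p0, p1 = split(projected, labels)
    se = math.sqrt(pooled_variance(p0, p1) * (1 / len(p0) + 1 / len(p1)))
    return (mean(p1) - mean(p0)) / se
```

Under this sketch, a labeling that matches a real expression difference yields a much larger score than an arbitrary labeling of the same samples.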

Because ISIS software does not provide the list of genes responsible for the partition formed, a two-sample univariate F-test (with randomized variance model and with FDR correction) was used to extract gene lists. A nominal P-value < 0.002 was chosen as a threshold, in order to limit the number of recovered genes. The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The procedure was applied to all discovered partitions.

The number of retrieved genes varied among the different lists associated with the binary partitions, and GO-terms distribution analysis using EASE software v.2.0 (Hosack et al, 2003) was applied to reveal functional associations. The distribution of GO-terms associated with each gene list was compared to the distribution of GO-terms associated with all genes present on the array. Up- and downregulated genes within each list were analyzed separately, using only valid UniGene identifiers. Table A summarizes the results. Briefly, this analysis identified three gene lists (associated with ISIS classes 6, 20 and 24 respectively), which were enriched in GO-terms related to the ECM and one list (associated with ISIS class 15) enriched in genes localized in the intracellular compartment. ECM and its remodeling emerged as the major relevant functional theme in our expression data. The three identified gene lists as well as the three related partitions largely overlapped, and ISIS class 20 was chosen for further analysis, since it was the most balanced in terms of number of samples in each group.
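The comparison of a gene list against the whole array can be sketched as an over-representation test. The example below uses a plain hypergeometric upper-tail probability followed by a Bonferroni correction. It is an assumption-laden illustration only: it is not the EASE score computed by the EASE software, and the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a list of n genes drawn from an array of N genes,
    of which K carry the GO-term and k of those fall in the list."""
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value for n_tests GO-terms tested."""
    return min(1.0, p_value * n_tests)
```

A GO-term would be reported as enriched when its adjusted P-value falls below the chosen threshold (0.05 in Table A).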

c)  Gene list description

The ECM/FGF2 signaling-related classifier, as predicted using our dataset, contained genes related to the ECM and to elements functionally related to FGF2 signaling. The functional category related to ECM included genes encoding ECM structural components as well as genes involved in its remodeling and in cell adhesion processes: collagens (COL3A1, COL5A1, COL6A3, COL9A3), the proteoglycans fibulin 2 (FBLN2) and fibromodulin (FMOD), the fibronectin 1 gene (FN1), thrombospondin 2 (THBS2), Lutheran blood group (LU), fibroblast activation protein-alpha (FAP), lysyl oxidase-like 1 (LOXL1), latent transforming growth factor-beta binding protein 2 (LTBP2), SPARC-like 1 (SPARCL1), proteoglycan 1, secretory granule (PRG1), and osteoblast-specific factor 2 (OSF2). Along with FGF2, this classifier contained FGFR4, one clone corresponding to an immature form of FGFR2, as well as other ECM-related genes reported to be regulated by FGF2 in various cellular contexts: OB cadherin (CDH11), lumican (LUM), biglycan (BGN).

Among the other genes of the top-DLD-classifier, several are associated with the immune/inflammatory response of the host: CXCL2 chemokine (also known as GRO2 oncogene), interleukin 16 (IL16), MHC class II transactivator (MHC2TA) and the Fc fragment of IgG receptor and transporter-alpha (FCGRT), Fc fragment of IgG low-affinity IIIa receptor (FCGR3A), Duffy blood group (FY), lymphocyte-specific protein tyrosine kinase (LCK), tumor necrosis factor receptor superfamily member 6 (TNFRSF6) and Rhesus blood group-associated glycoprotein (RHAG). The top-DLD-classifier also accounted for genes involved in transcription regulation, such as GATA binding protein 1 (GATA1), GA binding protein transcription factors alpha subunit (60 kDa) and beta subunit 2 (47 kDa) (GABPA, GABPB2), NGFI-A binding protein 2 (NAB2), AE binding protein 1 (AEBP1) and RING1 and YY1 binding protein (RYBP).

Complete gene lists are available from the web sites of IFOM (http://www.ifom.it/) and LNCIB (http://www.lncib.it/).

d)  Ascitic cell recovery

Cells present in ascitic fluid were collected by centrifugation, resuspended in RPMI 1640 medium, stratified over a 75-100% Ficoll-Hypaque discontinuous density gradient and centrifuged to harvest tumor-associated lymphocytes and tumor cells. Tumor cells were enriched over the 75% Ficoll density gradient. Contaminating monocytes were removed by plastic adherence for 1 h at 37°C. Purity of ovarian tumor cell populations was determined by flow-cytometric analysis of different tumor markers (Ca125, FR, Herb-B2, EGF-R) and leukocyte differentiation antigens (CD3, CD14, CD16, CD28, CD25).

e)  Analysis of TP53

Genomic DNA was extracted from frozen specimens when available (12 cases). In the remaining 30 cases, methylene blue-stained sections from formalin-fixed, paraffin-embedded tissues were microdissected under the microscope to obtain malignant tissues. Genomic DNA was extracted as described (Birindelli et al, 2001). Samples were screened by PCR-SSCP (single strand conformation polymorphism) (Donghi et al, 1993) or by DG-DGGE (double gradient-denaturing gradient gel electrophoresis) (Gelfi et al, 1997) for the presence of TP53 mutations in the most frequently affected exons (5 through 8) of the gene. Samples with mutations were identified by the presence of one or more new bands or a shift in position compared with a control wild-type cell line and control mutated samples. These cases were subjected to automated DNA sequencing (ABI Prism 377, Applied Biosystems), and each sequence reaction was performed at least twice on both the sense and antisense strands.

f)  cDNA microarrays

After printing, slides were cross-linked at 45 mJ/cm2 and stored in a desiccator. Before hybridization, all slides were treated with 50% formamide for 2 min at 70°C to remove excess DNA, followed by 1% SDS/H2O for 5 min at room temperature to reduce overall background. Pre-hybridization was performed in UltraHyb hybridization buffer (Ambion, Austin, TX) for 1 h at 42°C. All clones were annotated according to their GenBank accession number or their I.M.A.G.E. clone identifier, using the SOURCE (Diehn et al, 2003) and the IFOM EST Annotation Machine (Guffanti et al, 2002) resources.

g)  Target cDNA preparation

The probe labeling reaction was carried out in a final volume of 40 µl (1X first-strand buffer; 0.01 M DTT, 0.1 mM dATP, dGTP, dTTP, 6.25 mM dCTP; 0.33 mM Cy3 or Cy5 dCTP; 1 µCi α-32P-dCTP; 20 U RNase inhibitor from human placenta (Roche Applied Science, Indianapolis, IN); 300 U SuperScript II reverse transcriptase (Life Technologies, Fredrick, MA)). Samples were incubated at 42°C for 2 h and the reactions stopped by addition of 4 µl of 0.5 M EDTA, pH 8. The starting RNA template was removed by alkaline hydrolysis, adding 4 µl of 0.5 M NaOH followed by incubation at 70°C for 15 min, and the mixture was finally neutralized with 4 µl of 0.5 M HCl. Unincorporated nucleotides were removed from labeled probes using Microcon YM-50 columns (Millipore, Bedford, MA).

h)  Slide scanning and image analysis

cDNA microarrays were scanned using the GenePix 4000A microarray scanner at a resolution of 10 µm and analyzed using GenePix Pro v.3.0 software (Axon Instruments, Union City, CA). Image scanning and acquisition processes were carried out as follows:

1.  The Cy5 laser photomultiplier voltage (PMT) of each sample was modulated according to the fixed Cy3 PMT value of the reference to obtain a scatter plot of the fluorescent ratios with a regression ratio of 1, thereby indicating a balance between the two channels.

2.  Removal of poor quality spots ("flagging") was carried out both automatically using GenePix Pro software and manually. Such spots were excluded from further analyses.

i)  Raw data filtering, normalization and gene expression matrix construction

The GenePix Pro GPR raw data files containing the spot intensities were processed using the GenePix post-processing program GP3 (Fielden et al, 2002). The GPR file for each sample was processed to correct, filter and normalize the data in order to obtain reliable Cy3 and Cy5 ratios for each cDNA target.

Raw data filtering procedures to remove low-quality data included:

1. Removal of failed PCR clones;

2. Removal of flagged spots identified during scanning;

3. Removal of spots meeting either of the following criteria:

a. negative or saturated local background corrected signal intensities in both channels;

b. median signal intensity less than the median local background plus two standard deviations of the median local background;

4. Removal of clones with only a single valid measurement among the three replicates.
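The spot-level and clone-level filters listed above can be sketched as follows. Several details are assumptions made for illustration only: the saturation ceiling is taken as the 16-bit maximum, saturation is checked on the raw median signal, the weak-signal criterion is applied when both channels fail it, and all names are hypothetical. This is not the GP3 implementation.

```python
SATURATION = 65535  # assumption: 16-bit scanner ceiling

def channel_out_of_range(signal_median, bg_median):
    """Criterion a: background-corrected signal is negative, or signal is saturated."""
    return (signal_median - bg_median) < 0 or signal_median >= SATURATION

def channel_is_weak(signal_median, bg_median, bg_sd):
    """Criterion b: median signal below median local background + 2 SD."""
    return signal_median < bg_median + 2 * bg_sd

def keep_spot(cy3, cy5, flagged=False):
    """cy3 and cy5 are (signal_median, bg_median, bg_sd) tuples for one spot."""
    if flagged:  # spots flagged during scanning are always dropped
        return False
    channels = (cy3, cy5)
    if all(channel_out_of_range(s, b) for s, b, _ in channels):
        return False
    if all(channel_is_weak(s, b, sd) for s, b, sd in channels):
        return False
    return True

def keep_clone(replicates, min_valid=2):
    """replicates: list of (cy3, cy5, flagged) for the replicate spots of a clone.

    A clone with only one valid replicate measurement is removed.
    """
    valid = sum(1 for cy3, cy5, flagged in replicates if keep_spot(cy3, cy5, flagged))
    return valid >= min_valid
```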

After the filtering procedures, each slide was normalized to balance the Cy3 and Cy5 channels using a global trimmed mean obtained by eliminating the upper and lower 5% of the data. This value was subtracted from each data point. The final output for each sample contained Cy3 and Cy5 local background-corrected signals, Cy5 to Cy3 ratios (log2) and a flag tag for each corresponding clone. The final gene expression matrix was obtained by collating log2 ratios and flag data from the 81 experiments. In all subsequent analyses, control spike genes were removed and a maximum of 10% of missing (invalid) values was allowed. The nearest neighbor method was chosen to estimate missing values. Mean centering was applied to all genes to standardize the dataset. The Pearson metric and average or complete linkage methods were used in hierarchical clustering. The gene expression matrix used for the ISIS class discovery procedure comprised 39 specimens of primary tumors from advanced disease (with the exclusion of the 2 clear cell cases) and was further filtered to about 2000 genes to reduce noise, as suggested by the software authors (von Heydebreck et al, 2001), by keeping only genes showing a row standard deviation greater than 0.27.
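The global trimmed-mean normalization and the row standard deviation filter described above can be sketched as follows. The sketch assumes that the trimmed mean is computed on the log2 ratios of one slide, that missing values are encoded as None, and that the sample standard deviation (n - 1 denominator) is used; the function names are hypothetical and GP3 itself is not reproduced.

```python
import math

def trimmed_mean(values, trim=0.05):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)

def normalize_slide(log_ratios, trim=0.05):
    """Subtract the global trimmed mean from every log2 ratio of one slide."""
    center = trimmed_mean(log_ratios, trim)
    return [v - center for v in log_ratios]

def row_sd(row):
    """Sample standard deviation of the observed (non-None) values of a gene."""
    observed = [v for v in row if v is not None]
    m = sum(observed) / len(observed)
    return math.sqrt(sum((v - m) ** 2 for v in observed) / (len(observed) - 1))

def filter_by_sd(matrix, threshold=0.27):
    """Keep only genes whose row standard deviation exceeds the threshold."""
    return [row for row in matrix if row_sd(row) > threshold]
```

Trimming makes the centering value insensitive to a few extreme ratios, so a single outlier spot does not shift the whole slide.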

j)  Supervised learning

Supervised learning methods implemented in BRB-ArrayTools were used to select the best discriminating genes associated with the known phenotypes. Standard class comparison was done using a two-sample univariate F-test (with randomized variance model and with FDR correction) at a nominal P-value < 0.0025. The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The estimated probability of identifying at least 113 genes as significant (P < 0.0025) by chance, when no real differences exist between the classes, was 0.00606. All permutation tests used at least 1000 permutations.
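The logic of a label-permutation test can be illustrated with a minimal sketch for a single statistic. It assumes the absolute difference of class means as the test statistic and a fixed random seed for reproducibility. It does not reproduce the BRB-ArrayTools procedure, which uses the random variance F-test and a global test on the number of significant genes; all names are hypothetical.

```python
import random

def mean_difference(values, labels):
    """Absolute difference between the class-1 and class-0 means."""
    g0 = [v for v, l in zip(values, labels) if l == 0]
    g1 = [v for v, l in zip(values, labels) if l == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Fraction of label shufflings giving a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With 1000 permutations, the smallest P-value this estimator can return is 1/1001, which is why the number of permutations bounds the resolution of the test.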

k)  Immunohistochemistry

Tumor sections (1-2 µm) were serially cut from formalin-fixed, paraffin-embedded tissue, mounted on poly-L-lysine-coated slides (Sigma, St. Louis, MO), deparaffinized in xylene and hydrated in graded alcohols. Endogenous peroxidase activity was inhibited by treating sections with 0.3% hydrogen peroxide in methanol for 30 min. Slides were washed three times in 0.05 M PBS-0.1% Triton, incubated with normal goat or human albumin diluted 1:50 and 1:100, respectively, in 1% PBS-0.1% BSA-sodium azide, and then incubated overnight with the following primary antibodies:

Antibody / Supplier / Epitope retrieval / Dilution (a) / Controls
FGF2 (147), sc-79, pAb / Santa Cruz / 6-min autoclave in citrate buffer (pH 6) + 5-min in protease XIV 2% in TBS-EDTA / 1:50 / Pos: pancreas; Neg: blocking peptide
FGFR4 (c16), sc-124, pAb / Santa Cruz / 6-min autoclave in citrate buffer (pH 6) / 1:200 / Pos: neuroendocrine carcinoma; Neg: blocking peptide
FN1, A 0245, pAb / Dako / 6-min autoclave in EDTA (pH 8) + 5-min in protease XIV 2% in TBS-EDTA / 1:800 / Pos: myofibroblastic sarcoma

a Antibodies were diluted in 1% PBS-0.1% BSA-sodium azide.


TABLE A. ISIS classes, associated gene lists and GO-terms distribution analysis.

ISIS class (a) / Group 0 (b) / Group 1 (b) / DLD score (c) / No. genes P<0.002 (d) / No. genes FP<10 genes (e) / No. genes FP<10% (f) / Probability (g) / UP group 0 (h) / UP group 1 (h) / GO-terms Bonferroni P<0.05 (i) / GO-term (j)
1 / 24 / 15 / 16.82 / 100 / 79 / 47 / 0 / 47 / 53 / NO
2 / 10 / 29 / 15.2 / 25 / 20 / 3 / 0.051 / 24 / 1 / NO
3 / 32 / 7 / 13.82 / 35 / 20 / 3 / 0.037 / 14 / 21 / NO
4 / 12 / 27 / 13.46 / 298 / 263 / 482 / 0 / - / - / NO
5 / 10 / 29 / 13.25 / 314 / 239 / 514 / 0 / - / - / NO
6 / 12 / 27 / 13.17 / 181 / 147 / 196 / 0.001 / - / - / YES / ECM
7 / 8 / 31 / 13.07 / 92 / 59 / 33 / 0.005 / - / - / NO
8 / 4 / 35 / 13.02 / 86 / 50 / 7 / 0.017 / - / - / NO
9 / 7 / 32 / 12.99 / 159 / 115 / 118 / 0 / - / - / NO
10 / 14 / 25 / 12.98 / 105 / 99 / 87 / 0.001 / - / - / NO
11 / 5 / 34 / 12.97 / 38 / 21 / 3 / 0.06 / 33 / 5 / NO
12 / 12 / 27 / 12.92 / 36 / 30 / 5 / 0.016 / 30 / 6 / NO
13 / 9 / 30 / 12.64 / 9 / 10 / 1 / 0.23 / 7 / 3 / NO
14 / 4 / 35 / 12.62 / 61 / 26 / 5 / 0.038 / - / - / NO
15 / 10 / 29 / 12.46 / 301 / 250 / 478 / 0 / - / - / YES / Intracellular
16 / 33 / 6 / 12.41 / 16 / 11 / 3 / 0.154 / 2 / 14 / NO
17 / 5 / 34 / 12.26 / 450 / 281 / 605 / 0 / - / - / NO
18 / 9 / 30 / 12.07 / 90 / 72 / 38 / 0.002 / - / - / NO
19 / 9 / 30 / 12.00 / 266 / 239 / 323 / 0 / - / - / NO
20 / 21 / 18 / 11.63 / 75 / 67 / 57 / 0.003 / 10 / 65 / YES / ECM
21 / 33 / 6 / 11.42 / 70 / 38 / 6 / 0.012 / 9 / 61 / NO
22 / 6 / 33 / 11.2 / 79 / 56 / 27 / 0.009 / 60 / 19 / NO
23 / 9 / 30 / 10.95 / 106 / 79 / 56 / 0.003 / 77 / 29 / YES / ECM
24 / 6 / 33 / 9.493 / 31 / 16 / 5 / 0.062 / 29 / 2 / NO

a Binary partitions obtained by analysis of 39 advanced EOC by automated class discovery with ISIS software (von Heydebreck et al, 2001).