Supplementary Data

S-Fig. 1. Panels: Overall Survival, CCNE1; Progression Free Survival, CCNE1; Overall Survival, MCM5; Progression Free Survival, MCM5.

Supplemental material

Analysis of gene expression data

Procedure 1

Image addressing, segmentation and flagging

In the first procedure, image addressing and segmentation were carried out with the CSIRO Spot software (CSIRO Australia, http://spot.cmis.csiro.au/spot/), and data analysis with the R software (http://cran.at.r-project.org/). Each spot was located using the array grid and the “seeded region growing” algorithm applied to the image. For each fluorescence image, the average pixel intensity within each spot was determined, and a local background was computed as the “morphX.close.open” value (where X is R for red or G for green), measured as reported (ref). Spots deemed unsuitable for accurate quantification because of array artefacts were manually flagged so that they could be identified and, where necessary, excluded from further analysis.
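The region-growing idea behind the segmentation step can be illustrated with a minimal sketch. It assumes a single spot image stored as a 2-D list, one seed inside the spot (label 1) and one in the background (label 2). Spot's own implementation is not reproduced, the morphological background estimate (“morphX.close.open”) is not modelled, and the function names are hypothetical.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Simplified seeded region growing.

    image: 2-D list of pixel intensities.
    seeds: dict mapping a region label (e.g. 1 = spot, 2 = background)
           to a list of (row, col) seed pixels.
    Returns a 2-D list of labels (0 = never reached).
    """
    n_rows, n_cols = len(image), len(image[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    sums, counts = {}, {}
    heap = []
    tick = 0  # unique tie-breaker so heap entries never compare labels

    def push_neighbours(r, c, label):
        nonlocal tick
        mean = sums[label] / counts[label]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and labels[nr][nc] == 0:
                # Priority: distance of the candidate pixel from the current
                # region mean (evaluated when the pixel is queued).
                heapq.heappush(heap, (abs(image[nr][nc] - mean), tick, nr, nc, label))
                tick += 1

    for label, pixels in seeds.items():
        for r, c in pixels:
            labels[r][c] = label
            sums[label] = sums.get(label, 0.0) + image[r][c]
            counts[label] = counts.get(label, 0) + 1
    for label, pixels in seeds.items():
        for r, c in pixels:
            push_neighbours(r, c, label)

    # Repeatedly absorb the queued pixel most similar to an adjacent region.
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] != 0:
            continue  # already claimed by a more similar region
        labels[r][c] = label
        sums[label] += image[r][c]
        counts[label] += 1
        push_neighbours(r, c, label)
    return labels


def mean_spot_intensity(image, labels, spot_label=1):
    """Average pixel intensity of the pixels assigned to the spot."""
    values = [image[r][c] for r in range(len(image))
              for c in range(len(image[0])) if labels[r][c] == spot_label]
    return sum(values) / len(values)


# Toy 5x5 image: a bright 3x3 spot on a dim background.
img = [[10] * 5] + [[10, 100, 100, 100, 10] for _ in range(3)] + [[10] * 5]
lab = seeded_region_growing(img, {1: [(2, 2)], 2: [(0, 0)]})
print(mean_spot_intensity(img, lab))  # 100.0
```

Because each pixel is always absorbed by the region whose mean it is closest to, the bright 3x3 block and the dim border separate cleanly in this toy case.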

Intensity and log ratio computation

For each array data set, two distinct classes of log2 ratio were computed. The first, MA = log2[(FR - BGR)/(FG - BGG)], is the classic log ratio of the two net fluorescence signals (F - BG), obtained by subtracting the local background (BG) from the spot median intensity (F). The second, MB = log2[(FR/BGR)/(FG/BGG)], is the log ratio of the two relative intensities (F/BG), i.e. the fluorescence (F) of the spot in each channel relative to its own local background (BG). The log2 geometric mean intensities and the log ratios of both classes were arbitrarily set to 0 if a spot did not have a net fluorescence signal greater than 0 in both channels (i.e. an unreliable spot).
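The two log ratio classes and the zeroing rule can be written as a short function. This is a sketch: the function name is hypothetical, and A is assumed to be the log2 geometric mean of the two net signals, because the text does not give its formula. MB also assumes strictly positive backgrounds.

```python
import math


def log_ratios(fr, bgr, fg, bgg):
    """Return (A, MA, MB) for one spot.

    fr, fg  : spot intensities F in the red and green channels
    bgr, bgg: local backgrounds BG in the red and green channels (> 0)
    """
    net_r, net_g = fr - bgr, fg - bgg
    if net_r <= 0 or net_g <= 0:
        # Unreliable spot: no positive net signal in both channels.
        return 0.0, 0.0, 0.0
    a = 0.5 * math.log2(net_r * net_g)       # assumed A: log2 geometric mean of net signals
    ma = math.log2(net_r / net_g)            # MA: ratio of background-subtracted signals
    mb = math.log2((fr / bgr) / (fg / bgg))  # MB: ratio of signal-to-background ratios
    return a, ma, mb


print(log_ratios(1100, 100, 600, 100))  # MA = 1.0 because 1000 / 500 = 2
print(log_ratios(90, 100, 600, 100))    # (0.0, 0.0, 0.0): unreliable spot
```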

LOWESS normalisation

Both kinds of M-values were normalised, correcting for possible intensity-dependent biases, using the LOWESS (ref) smoothing algorithm: in an MA vs. A plot for the first class (ref) and in an MB vs. A plot for the second. Markers, controls, unreliable and flagged spots were excluded from the LOWESS computation, but all M-values were normalised by subtracting the LOWESS estimate. The lowess function of the R stats package was used. The best span parameter f for each interpolation curve was found by iterating the LOWESS algorithm and visually inspecting the smoothed functions. The delta parameter was set to 0.01 times the range of the intensity in each scatter plot, and the iter parameter (the number of robustifying iterations) was set to 1000.
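The normalisation step can be sketched in pure Python. This is an illustration under assumptions, not the R code used in the study. It implements the basic LOWESS scheme (tricube-weighted local linear fits plus bisquare robustness passes) and omits R's `delta` shortcut. The defaults f = 2/3 and 3 robustness passes mirror R's `lowess` defaults rather than the tuned values described above. The fitted curve is evaluated at excluded spots by linear interpolation, which is our choice. Function names are hypothetical.

```python
import math
from bisect import bisect_left
from statistics import median


def lowess_fit(x, y, f=2 / 3, iters=3):
    """Minimal LOWESS: local linear fits with tricube weights, followed by
    `iters` bisquare robustness passes. Returns the fitted value at each x."""
    n = len(x)
    k = max(2, min(n, math.ceil(f * n)))  # points in each local neighbourhood
    robust = [1.0] * n
    fitted = list(y)
    for _ in range(iters + 1):
        for i in range(n):
            # Bandwidth = distance to the k-th nearest neighbour of x[i].
            h = sorted(abs(xj - x[i]) for xj in x)[k - 1] or 1e-12
            w = [robust[j] * (1 - (abs(x[j] - x[i]) / h) ** 3) ** 3
                 if abs(x[j] - x[i]) < h else 0.0 for j in range(n)]
            sw = sum(w)
            if sw <= 0:
                fitted[i] = y[i]
                continue
            xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
            ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
            sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
            sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
            slope = sxy / sxx if sxx > 1e-12 else 0.0
            fitted[i] = ym + slope * (x[i] - xm)
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        s = median([abs(r) for r in resid])
        if s < 1e-12:
            break  # (near-)perfect fit: robustness weights are not needed
        robust = [(1 - (r / (6 * s)) ** 2) ** 2 if abs(r) < 6 * s else 0.0
                  for r in resid]
    return fitted


def interpolate(xs, ys, x0):
    """Piecewise-linear interpolation of a fitted curve (constant beyond the ends)."""
    if x0 <= xs[0]:
        return ys[0]
    if x0 >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x0)
    x1, x2, y1, y2 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y1 if x2 == x1 else y1 + (y2 - y1) * (x0 - x1) / (x2 - x1)


def lowess_normalise(a, m, use, f=2 / 3, iters=3):
    """Fit the M-vs-A trend on spots with use[i] True (i.e. excluding markers,
    controls, unreliable and flagged spots), then subtract it from every M."""
    pts = sorted((ai, mi) for ai, mi, ok in zip(a, m, use) if ok)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    fit = lowess_fit(xs, ys, f, iters)
    return [mi - interpolate(xs, fit, ai) for ai, mi in zip(a, m)]


a = [float(i) for i in range(20)]
m = [0.5 * ai - 2.0 for ai in a]   # purely intensity-dependent bias
m[5] = 10.0                        # a flagged spot, excluded from the fit
use = [i != 5 for i in range(20)]
norm = lowess_normalise(a, m, use)
# Fitted spots end up near 0; the flagged spot keeps its offset from the trend.
```

The pairwise neighbour search makes this quadratic in the number of spots, so it is only meant to show the logic; R's `lowess` is the practical choice for full arrays.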

Z-score standardization and data quantization

To evaluate which genes had probably changed their expression level, an intensity-dependent Z-score (ref) was calculated for each microarray data set and log ratio class (MA and MB). The Z-scores were determined separately for positive and negative normalised log ratios, to account for the asymmetry of the data distribution when evaluating the standard deviation. Again, only spots that were not markers, controls, unreliable or flagged were selected. The selected positive (or negative) data were reflected about the M = 0 axis, and the resulting negative (or positive) values were merged with the selected ones into a new symmetric data set. Intensity-dependent standard deviations were computed on the two resulting data sets. The whole original data set was then standardised using both functions, calculated for each log ratio class (MA and MB). For markers, controls, unreliable or flagged spots the Z-score was set to 0. Otherwise, the Z-score was taken from the data set (positive or negative) matching the sign of the original value. Z-score signs were inverted in dye-swap data sets, and technical replicates were averaged by geometric mean for each patient. When the two technical replicates of a sample were discordant, the mean Z-score for the spot was set to 0. The geometric mean of the MA and MB Z-scores was then calculated for each patient to obtain a single estimate of each spot's expression. Again, if MA and MB differed in sign, the average spot Z-score was set to 0. Before the clustering phase, one more step was performed: a new discrete data set was built from the original one as described in table S1. Both data sets were used in the following steps of the analysis.
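The sign-split, reflection-based standardisation and the replicate-combination rules can be sketched as follows. Two assumptions are not stated in the text. First, the intensity-dependent SD is estimated from the `window` spots nearest in A (50 is a hypothetical value). Second, the “geometric mean” of two signed Z-scores is read as the geometric mean of their magnitudes carrying the common sign. Function names are hypothetical.

```python
import math


def side_sd(a0, side_points, window):
    """Intensity-dependent SD for one sign of M at intensity a0.

    Reflecting the selected values about M = 0 gives a symmetric set with
    mean 0, so its SD equals the root mean square of the original values.
    The `window` spots closest in A are used (a hypothetical smoother).
    """
    near = sorted(side_points, key=lambda p: abs(p[0] - a0))[:window]
    return math.sqrt(sum(mi * mi for _, mi in near) / len(near))


def intensity_z_scores(a, m, use, window=50):
    """Z-score every spot using the SD function of its own sign.
    use[i] is False for markers, controls, unreliable and flagged spots."""
    pos = [(ai, mi) for ai, mi, ok in zip(a, m, use) if ok and mi > 0]
    neg = [(ai, mi) for ai, mi, ok in zip(a, m, use) if ok and mi < 0]
    z = []
    for ai, mi, ok in zip(a, m, use):
        if not ok or mi == 0:
            z.append(0.0)
            continue
        sd = side_sd(ai, pos if mi > 0 else neg, window)
        z.append(mi / sd)
    return z


def combine_signed(z1, z2):
    """Signed geometric mean of two Z-scores; 0 when they disagree in sign.
    (Our reading of "averaged by geometric mean"; the formula is not given.)"""
    if z1 * z2 <= 0:
        return 0.0
    return math.copysign(math.sqrt(z1 * z2), z1)


# Per patient: invert the dye-swap replicate, merge replicates, then MA and MB.
swap_ma, swap_mb = -8.0, -4.0          # Z-scores from the dye-swap array
z_ma = combine_signed(2.0, -swap_ma)   # normal replicate 2.0 -> 4.0
z_mb = combine_signed(1.0, -swap_mb)   # normal replicate 1.0 -> 2.0
print(combine_signed(z_ma, z_mb))      # sqrt(4.0 * 2.0)
```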

Unsupervised filtering and clustering

An unsupervised analysis was performed to find gene signatures correlating with the patients' clinical data. First, a filter was applied to the 13496 genes on the platform (markers and controls were excluded). Only genes with |Z-score| ≥ 2 in at least 4 of the 68 stage I patients were selected: 3294 genes from the continuous data set and 3474 from the discrete data set. A partitional clustering algorithm based on neural networks, the Self-Organizing Map (SOM), was applied to both sets of genes. An 11-row by 6-column starting grid with hexagonal topology was used. The grid was initialised by projecting it onto the plane of the first two principal components calculated for the genes to be clustered. The gene clustering was implemented in R.
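The |Z-score| ≥ 2 filter can be sketched as follows. The data layout (a dictionary from gene to per-patient Z-scores) and the function name are hypothetical, and the SOM step itself is not reproduced.

```python
def filter_genes(z_by_gene, threshold=2.0, min_patients=4):
    """Return the genes with |Z| >= threshold in at least min_patients patients.

    z_by_gene: dict mapping gene id -> list of per-patient Z-scores
               (68 stage I patients in the text).
    """
    kept = []
    for gene, zs in z_by_gene.items():
        n_hits = sum(1 for z in zs if abs(z) >= threshold)
        if n_hits >= min_patients:
            kept.append(gene)
    return kept


example = {
    "gene_a": [2.5] * 4 + [0.0] * 64,   # 4 of 68 patients pass -> kept
    "gene_b": [-3.0] * 3 + [0.0] * 65,  # only 3 patients pass -> dropped
}
print(filter_genes(example))  # ['gene_a']
```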

An agglomerative hierarchical clustering of both genes and patients was then used to define gene signatures associated with groups of samples. The Cluster 3.0 program (ref) was used with the average linkage clustering rule and Euclidean distance. The algorithm was applied to each of the 132 clusters obtained (66 for the quantized data set and 66 for the continuous one).

For each cluster in one of the two data sets (continuous or discrete), the clusters in the other data set (discrete or continuous, respectively) sharing the most genes were identified. In this way a kind of “homology map” between the two data sets was obtained. Each cluster was visually inspected to find groups of samples (defined by gene signatures and sample tree branches) that were confirmed in its homologous clusters. In total, 74 redundant groups were found.
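The “homology map” can be sketched as a best-overlap lookup between the two clusterings. This sketch assumes that “share most genes” means the largest count of common genes, with ties all reported. The function name and data layout are hypothetical.

```python
def homology_map(clusters_a, clusters_b):
    """For each cluster in data set A, list the cluster(s) in data set B
    sharing the largest number of genes.

    clusters_a, clusters_b: dict cluster_id -> set of gene identifiers.
    """
    result = {}
    for ca, genes_a in clusters_a.items():
        overlaps = {cb: len(genes_a & genes_b) for cb, genes_b in clusters_b.items()}
        best = max(overlaps.values(), default=0)
        result[ca] = sorted(cb for cb, n in overlaps.items() if n == best and n > 0)
    return result


continuous = {"c1": {"g1", "g2", "g3"}, "c2": {"g4"}}
discrete = {"d1": {"g1", "g2"}, "d2": {"g3", "g4"}}
print(homology_map(continuous, discrete))  # {'c1': ['d1'], 'c2': ['d2']}
print(homology_map(discrete, continuous))  # d2 ties with c1 and c2
```

Running the lookup in both directions, as the text describes, can give asymmetric results, which is one reason the homologous clusters were then inspected visually.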

Procedure 2

Image analysis and data transformation

Images were analysed using GenePix Pro software version 3.0 (Axon Instruments). GenePix Result files (GPR) and the related scan images were then loaded into the Rosetta Resolver SE software, together with the Agilent cDNA pattern file and information on the hybridised samples and labelling signs (positive for normal labelling and negative for dye-swap labelling). The Axon GenePix error model was applied to transform the data, and the duplicate ratio profiles were then combined to obtain one ratio experiment for each patient. Log ratios, log ratio errors and p-values were thus available for the non-flagged sequences of each ratio experiment.
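How duplicate, sign-labelled ratio profiles can be merged into one value with an error and a p-value is sketched below. The Axon GenePix error model and Rosetta Resolver's combination algorithm are proprietary and are not reproduced here. This sketch substitutes a generic inverse-variance (error-weighted) mean with a normal-approximation p-value. The function name is hypothetical.

```python
import math


def combine_log_ratios(log_ratios, errors, signs):
    """Inverse-variance weighted mean of replicate log ratios.

    signs: +1 for normal labelling, -1 for dye-swap labelling (ratio inverted).
    Returns (combined log ratio, its standard error, two-sided p-value
    under a normal approximation).
    """
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * s * x for w, s, x in zip(weights, signs, log_ratios)) / total
    se = math.sqrt(1.0 / total)
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return mean, se, p


# A normal and a dye-swap hybridisation that agree once the swap is inverted.
print(combine_log_ratios([0.5, -0.5], [0.1, 0.1], [1, -1]))
```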

Unsupervised filtering and clustering

As in procedure 1, the 13496 genes (markers and controls excluded) were subjected to filtering.

Only sequences with at most 20% missing values, a vector standard deviation of at least 0.4 and a p-value below 0.01 in at least 7 of the 68 stage I patients (~10%) were submitted to “weighted-by-error” self-organizing maps on a 6x6 grid. In total, 1116 sequences passed the filter. The 36 nodes obtained from the SOM were subjected to weighted agglomerative hierarchical clustering, using cosine correlation as the metric for sequences, Euclidean distance for patients and average linkage as the linkage method. Subsequently, the patient dendrogram obtained for each cluster was split into two (or sometimes three) groups based on the shape of the tree branches or on the signatures. In this way, 34 patient groups were selected to be tested for statistical correlation with the available clinical parameters.
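The procedure 2 filter and the cosine metric can be sketched as follows. This sketch rests on two assumptions. First, the “at least 7 of 68 patients” condition is applied only to the p-value criterion, which is our reading of the sentence. Second, the sample standard deviation is used, because the SD estimator is not stated. Missing values are represented as `None`, and the function names are hypothetical.

```python
import math
from statistics import stdev


def passes_filter(log_ratios, p_values, max_missing=0.2, min_sd=0.4,
                  p_cut=0.01, min_patients=7):
    """Procedure 2 sequence filter (missing values are None)."""
    n = len(log_ratios)
    present = [x for x in log_ratios if x is not None]
    if n == 0 or len(present) < 2 or (n - len(present)) / n > max_missing:
        return False
    if stdev(present) < min_sd:
        return False
    n_significant = sum(1 for p in p_values if p is not None and p < p_cut)
    return n_significant >= min_patients


def cosine_similarity(u, v):
    """Uncentred (cosine) correlation between two expression profiles."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


profile = [1.0, -1.0] * 34              # 68 stage I patients
p_vals = [0.001] * 7 + [0.5] * 61        # significant in 7 of 68 (~10%)
print(passes_filter(profile, p_vals))    # True
print(passes_filter(profile, [0.001] * 6 + [0.5] * 62))  # False
```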

Metaclustering

To identify identical and complementary groups, and hence reduce their number, a “metacluster analysis” was performed. The presence or absence (coded as 1 and 0) of each patient in each group was recorded in a table. This table was the input for a hierarchical clustering (Euclidean distance and complete linkage rule). In this way the number of groups was reduced from 108 (i.e. 74 + 34) to 94 non-redundant clusters, which were tested for statistical correlation with the available clinical parameters.
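For 0/1 membership vectors over N patients, the Euclidean distance is the square root of the number of patients on which two groups disagree. Identical groups therefore sit at distance 0, and a group and its exact complement sit at the maximum distance sqrt(N), so both relationships can be read from the same distance matrix. The minimal sketch below shows only this detection step. It does not build the complete-linkage dendrogram, and the function names are hypothetical.

```python
import math
from itertools import combinations


def membership_vector(group, patients):
    """1 if the patient belongs to the group, else 0."""
    return [1 if p in group else 0 for p in patients]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def identical_and_complementary(groups, patients):
    """groups: dict group_id -> set of patient ids.
    Returns (identical pairs, complementary pairs) of group ids."""
    vecs = {g: membership_vector(s, patients) for g, s in groups.items()}
    n = len(patients)
    identical, complementary = [], []
    for g1, g2 in combinations(sorted(vecs), 2):
        d = euclidean(vecs[g1], vecs[g2])
        if d == 0:
            identical.append((g1, g2))
        elif math.isclose(d, math.sqrt(n)):
            complementary.append((g1, g2))
    return identical, complementary


patients = ["p1", "p2", "p3", "p4"]
groups = {"G1": {"p1", "p2"}, "G2": {"p1", "p2"}, "G3": {"p3", "p4"}, "G4": {"p1"}}
print(identical_and_complementary(groups, patients))
```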

Gene name       GenBank Acc. Number  Sense                      Antisense                  Ta (°C)
ADAMTS9         BM469961             GTGACACCTCAGAACACAAA       GCTAAGCCGCTGTTTAATGC       60
CDH9            NM_016279            GTCTGGAGTCGGTACATCTG       GGCTGTCCTTGCAATATGCT       60
CTH             NM_001902            GTGTATGGAGGTACAAACAG       CAGCCTTCAATGTCAATCACC      60
GRB14           XM_001131281         CGCTTGGAGGAAAAAAGGAT       GCTGGGACCGGTGGATAG         60
IGFBP1          NM_000596            TCCCCATGCTGCAGAGGCAGGGAG   AGAGCCTTCGAGCCATCATA       60
NDRG1           AL550073             AACAGTTTGGGCTGAAAAGC       ATAAGGACAAGGCCCTCCAC       60
CYP2S1          NM_030622            GATGGCCATGGGGTTTTCTT       ATCAGCTCCTCGCCTTCTC        60
CYP4F12         AK091995             ATTGTCAGGAGAGGCCCAGT       CCCGTCATGGGAGAGGTAAT       60
MGST2           BY796194             CGGGCACAACAAAACTGTGT       CCAGACCCAGACAAGTAGCA       60
NQO1            BU167840             GATATTCCAGTTCCCCCTGC       CGGAAGGGTCCTTTGTCATA       60
NR1I2           AK122990             AGACACTGCAGGTGGCTTCCA      TCTGGGGAGAAGAGGGAGAT       60
TNF             NM_000594            AGAGGGAAGAGTTCCCCAGG       CAGCTTGAGGGTTTGCTACA       60
CCNE1           M74093               TGTACTGAGCTGGGCAAATA       ACACACCTCCATTAACCAATCC     60
MCM5            NM_006739            ACTGCGACAGGTACCTGTGTG      ACACGGATGTAGGAGCTTCG       60
Cyclophilin A   NM_021130            GCGTCTCCTTTGAGCTGTTT       GTCTTGGCAGTGCAGATGAA       60
Actin B         XR_019170            CACCCACACTGTGCCCATCTA      CAGCGGAACCGCTCATTGCCAATGG  60

S-Table I. List of primer pair sequences, GenBank accession numbers and annealing temperatures (Ta) of the genes analysed by real-time RT-PCR

A) PFS

Variable      Hazard ratio  Lower 95% CI  Upper 95% CI  p
Age           1.03          0.99          1.08          0.186
Sub-stage     2.73          1.08          6.86          0.033
Clear cells   1.96          0.73          5.24          0.180
Grading       2.91          1.25          6.76          0.013

B) OS

Variable      Hazard ratio  Lower 95% CI  Upper 95% CI  p
Age           1.05          0.99          1.11          0.139
Sub-stage     1.95          0.78          4.87          0.151
Clear cells   4.19          1.34          13.06         0.013
Grading       2.56          0.98          6.63          0.054

S-Table II. Relationships between PFS (A) or OS (B) and age, sub-stage, clear-cell histotype and grading by univariate Cox proportional hazard models. p is the Welch t-test p-value (significance at p<0.05).

1) Type of cluster: frequencies and percentages of patients in each hierarchical cluster

CLUSTER  N. pts in cluster  Total pts  %    CLUSTER  N. pts in cluster  Total pts  %    CLUSTER  N. pts in cluster  Total pts  %    CLUSTER  N. pts in cluster  Total pts  %

A 19 68 0.2794 Y 30 68 0.4412 AW 12 68 0.1765 BU 20 68 0.2941

B 13 68 0.1912 Z 37 68 0.5441 AX 11 68 0.1618 BV 31 68 0.4559

C 13 68 0.1912 AA 5 68 0.0735 AY 14 68 0.2059 BW 12 68 0.1765

D 6 68 0.0882 AB 34 68 0.5000 AZ 9 68 0.1324 BX 14 68 0.2059

E 28 68 0.4118 AC 35 68 0.5147 BA 28 68 0.4118 BY 12 68 0.1765

F 16 68 0.2353 AD 50 68 0.7353 BB 31 68 0.4559 BZ 19 68 0.2794

G 3 68 0.0441 AE 6 68 0.0882 BC 12 68 0.1765 CA 7 68 0.1029

H 21 68 0.3088 AF 9 68 0.1324 BD 11 68 0.1618 BC 12 68 0.1765

I 25 68 0.3676 AG 5 68 0.0735 BE 13 68 0.1912 CC 35 68 0.5147

J 10 68 0.1471 AH 56 68 0.8235 BF 5 68 0.0735 CD 10 68 0.1471

K 13 68 0.1912 AI 35 68 0.5147 BG 19 68 0.2794 CE 13 68 0.1912

L 35 68 0.5147 AJ 24 68 0.3529 BH 32 68 0.4706 CF 14 68 0.2059

M 36 68 0.5294 AK 8 68 0.1176 BI 17 68 0.2500 CH 19 68 0.2794

Nn 51 68 0.7500 AL 6 68 0.0882 BJ 11 68 0.1618 CI 27 68 0.3971

O 7 68 0.1029 AM 17 68 0.2500 BK 13 68 0.1912 CJ 22 68 0.3235

P 10 68 0.1471 AN 8 68 0.1176 BL 10 68 0.1471 CL 33 68 0.4853

Q 6 68 0.0882 AO 47 68 0.6912 BM 8 68 0.1176 CN 11 68 0.1618

R 11 68 0.1618 AP 13 68 0.1912 BN 10 68 0.1471 CO 21 68 0.3088

S 29 68 0.4265 AQ 52 68 0.7647 BO 22 68 0.3235 CP 34 68 0.5000

T 13 68 0.1912 AR 4 68 0.0588 BP 36 68 0.5294 CQ 13 68 0.1912

U 25 68 0.3676 AS 20 68 0.2941 BQ 32 68 0.4706 CR 23 68 0.3382

V 7 68 0.1029 AT 11 68 0.1618 BR 16 68 0.2353 CS 28 67 0.4179

W 16 68 0.2353 AU 31 68 0.4559 BS 6 68 0.0882

X 33 68 0.4853 AV 6 68 0.0882 BT 4 68 0.0588

2) correlation among clusters and tumor characteristics: frequencies and percentages of patients in each cluster divided by substage (a, b, c)

Substage  a                            b                            c
Cluster   pts in cluster  tot pts  %    pts in cluster  tot pts  %    pts in cluster  tot pts  %
A 5 17 .294 1 4 .25 13 47 .276
B 4 17 .235 0 4 .00 9 47 .191
C 3 17 .176 0 4 .00 10 47 .212
D 2 17 .117 0 4 .00 4 47 .085
E 7 17 .411 1 4 .25 20 47 .425
F 3 17 .176 2 4 .50 11 47 .234
G 1 17 .058 0 4 .00 2 47 .042
H 6 17 .352 1 4 .25 14 47 .297
I 8 17 .470 1 4 .25 16 47 .340
J 3 17 .176 0 4 .00 7 47 .148
K 4 17 .235 0 4 .00 9 47 .191
L 9 17 .529 1 4 .25 25 47 .531
M 9 17 .529 1 4 .25 26 47 .553
Nn 13 17 .764 2 4 .50 36 47 .766
O 2 17 .117 0 4 .00 5 47 .106
P 1 17 .058 0 4 .00 9 47 .191
Q 2 17 .117 0 4 .00 4 47 .085
R 3 17 .176 0 4 .00 8 47 .170
S 5 17 .294 1 4 .25 23 47 .489
T 3 17 .176 1 4 .25 9 47 .191
U 5 17 .294 1 4 .25 19 47 .404
V 0 17 .000 1 4 .25 6 47 .127
W 6 17 .352 1 4 .25 9 47 .191
X 9 17 .529 2 4 .50 22 47 .468
Y 8 17 .470 1 4 .25 21 47 .446
Z 11 17 .647 1 4 .25 25 47 .531
AA 2 17 .117 0 4 .00 3 47 .063
AB 7 17 .411 1 4 .25 26 47 .553
AC 9 17 .529 1 4 .25 25 47 .531
AD 14 17 .823 1 4 .25 35 47 .744
AE 2 17 .117 0 4 .00 4 47 .085
AF 3 17 .176 0 4 .00 6 47 .127
AG 2 17 .117 0 4 .00 3 47 .063
AH 14 17 .823 4 4 1.0 38 47 .808
AI 9 17 .529 1 4 .25 25 47 .531
AJ 5 17 .294 2 4 .50 17 47 .361
AK 2 17 .117 0 4 .00 6 47 .127
AL 3 17 .176 0 4 .00 3 47 .063
AM 5 17 .294 0 4 .00 12 47 .255
AN 3 17 .176 0 4 .00 5 47 .106
AO 13 17 .764 1 4 .25 33 47 .702
AP 3 17 .176 2 4 .50 8 47 .170
AQ 13 17 .764 2 4 .50 37 47 .787
AR 2 17 .117 1 4 .25 1 47 .021
AS 5 17 .294 1 4 .25 14 47 .297
AT 3 17 .176 2 4 .50 6 47 .127
AU 4 17 .235 2 4 .50 25 47 .531
AV 1 17 .058 0 4 .00 5 47 .106
AW 2 17 .117 1 4 .25 9 47 .191
AX 3 17 .176 1 4 .25 7 47 .148
AY 3 17 .176 0 4 .00 11 47 .234
AZ 3 17 .176 1 4 .25 5 47 .106
BA 7 17 .411 1 4 .25 20 47 .425
BB 8 17 .470 2 4 .50 21 47 .446
BC 3 17 .176 0 4 .00 9 47 .191
BD 2 17 .117 1 4 .25 8 47 .170
BE 4 17 .235 0 4 .00 9 47 .191
BF 1 17 .058 0 4 .00 4 47 .085
BG 5 17 .294 0 4 .00 14 47 .297
BH 10 17 .588 1 4 .25 21 47 .446
BI 2 17 .117 2 4 .50 13 47 .276
BJ 4 17 .235 1 4 .25 6 47 .127
BK 4 17 .235 0 4 .00 9 47 .191
BL 3 17 .176 1 4 .25 6 47 .127
BM 3 17 .176 0 4 .0 5 47 .106
BN 4 17 .235 0 4 .0 6 47 .127
BO 4 17 .235 2 4 .5 16 47 .340
BP 9 17 .529 2 4 .5 25 47 .531
BQ 8 17 .470 2 4 .5 22 47 .468
BR 5 17 .294 1 4 .2 10 47 .212
BS 3 17 .176 0 4 .0 3 47 .063
BT 3 17 .176 0 4 .0 1 47 .021
BU 6 17 .352 2 4 .5 12 47 .255
BV 9 17 .529 1 4 .3 21 47 .446
BW 4 17 .235 0 4 .0 8 47 .170
BX 4 17 .235 0 4 .0 10 47 .212
BY 4 17 .235 0 4 .0 8 47 .170
BZ 6 17 .352 0 4 .0 13 47 .276
CA 2 17 .117 0 4 .0 5 47 .106
BC 3 17 .176 0 4 .0 9 47 .191
CC 12 17 .705 2 4 .5 21 47 .446
CD 3 17 .176 0 4 .0 7 47 .148
CE 5 17 .294 0 4 .0 8 47 .170
CF 2 17 .117 0 4 .0 12 47 .255
CH 8 17 .47 0 4 .0 11 47 .234
CI 3 17 .176 2 4 .5 22 47 .468
CJ 6 17 .352 2 4 .0 14 47 .297
CL 8 17 .470 2 4 .5 23 47 .489
CN 3 17 .176 0 4 .0 8 47 .170
CO 2 17 .117 2 4 .5 17 47 .361
CP 12 17 .705 1 4 .3 21 47 .446
CQ 3 17 .176 1 4 .3 9 47 .191
CR 7 17 .411 1 4 .3 15 47 .319
CS 8 17 .470 2 4 .5 18 46 .391

3) correlation among clusters and tumor characteristics (Histotype): frequencies and percentages of patients in each cluster divided by histotype (serous, mucinous, endometrioid, clear cells and undefined)

Histotype  serous                       mucinous                     endometrioid                 undiff.                      clear cells
Cluster    pts in cluster  tot pts  %    pts in cluster  tot pts  %    pts in cluster  tot pts  %    pts in cluster  tot pts  %    pts in cluster  tot pts  %

A 5 24 .2083 5 10 .5000 4 17 .2353 0 1 .0000 5 16 .3125

B 2 24 .0833 0 10 .0000 9 17 .5294 0 1 .0000 2 16 .1250

C 1 24 .0417 0 10 .0000 0 17 .0000 0 1 .0000 12 16 .7500

D 0 24 .0000 6 10 .6000 0 17 .0000 0 1 .0000 0 16 .0000

E 7 24 .2917 8 10 .8000 4 17 .2353 1 1 1.000 8 16 .5000

F 8 24 .3333 1 10 .1000 3 17 .1765 1 1 1.000 3 16 .1875

G 1 24 .0417 0 10 .0000 1 17 .0588 0 1 .0000 1 16 .0625

H 5 24 .2083 3 10 .3000 4 17 .2353 0 1 .0000 9 16 .5625

I 8 24 .3333 2 10 .2000 12 17 .7059 1 1 1.000 2 16 .1250

J 2 24 .0833 0 10 .0000 7 17 .4118 0 1 .0000 1 16 .0625

K 2 24 .0833 1 10 .1000 8 17 .4706 0 1 .0000 2 16 .1250

L 9 24 .3750 5 10 .5000 10 17 .5882 1 1 1.000 10 16 .6250

M 9 24 .3750 5 10 .5000 10 17 .5882 1 1 1.000 11 16 .6875

Nn 16 24 .6667 7 10 .7000 13 17 .7647 1 1 1.000 14 16 .8750

O 0 24 .0000 6 10 .6000 1 17 .0588 0 1 .0000 0 16 .0000

P 1 24 .0417 1 10 .1000 2 17 .1176 0 1 .0000 6 16 .3750

Q 0 24 .0000 6 10 .6000 0 17 .0000 0 1 .0000 0 16 .0000

R 1 24 .0417 0 10 .0000 0 17 .0000 0 1 .0000 10 16 .6250

S 9 24 .3750 4 10 .4000 9 17 .5294 1 1 1.000 6 16 .3750

T 4 24 .1667 3 10 .3000 2 17 .1176 1 1 1.000 3 16 .1875

U 8 24 .3333 4 10 .4000 7 17 .4118 1 1 1.000 5 16 .3125

V 2 24 .0833 1 10 .1000 2 17 .1176 0 1 .0000 2 16 .1250

W 7 24 .2917 2 10 .2000 4 17 .2353 0 1 .0000 3 16 .1875

X 14 24 .5833 4 10 .4000 9 17 .5294 0 1 .0000 6 16 .3750

Y 12 24 .5000 2 10 .2000 8 17 .4706 0 1 .0000 8 16 .5000

Z 12 24 .5000 7 10 .7000 15 17 .8824 1 1 1.000 2 16 .1250

AA 0 24 .0000 5 10 .5000 0 17 .0000 0 1 .0000 0 16 .0000

AB 9 24 .3750 3 10 .3000 10 17 .5882 1 1 1.000 11 16 .6875

AC 9 24 .3750 5 10 .5000 10 17 .5882 1 1 1.000 10 16 .6250

AD 15 24 .6250 5 10 .5000 14 17 .8235 1 1 1.000 15 16 .9375

AE 2 24 .0833 0 10 .0000 4 17 .2353 0 1 .0000 0 16 .0000

AF 2 24 .0833 0 10 .0000 6 17 .3529 0 1 .0000 1 16 .0625

AG 1 24 .0417 0 10 .0000 4 17 .2353 0 1 .0000 0 16 .0000

AH 22 24 .9167 9 10 .9000 9 17 .5294 1 1 1.000 15 16 .9375

AI 9 24 .3750 5 10 .5000 9 17 .5294 1 1 1.000 11 16 .6875

AJ 9 24 .3750 7 10 .7000 2 17 .1176 0 1 .0000 6 16 .3750

AK 2 24 .0833 1 10 .1000 5 17 .2941 0 1 .0000 0 16 .0000

AL 1 24 .0417 0 10 .0000 5 17 .2941 0 1 .0000 0 16 .0000

AM 1 24 .0417 4 10 .4000 0 17 .0000 0 1 .0000 12 16 .7500

AN 1 24 .0417 0 10 .0000 6 17 .3529 0 1 .0000 1 16 .0625

AO 13 24 .5417 5 10 .5000 13 17 .7647 1 1 1.000 15 16 .9375

AP 7 24 .2917 1 10 .1000 2 17 .1176 1 1 1.000 2 16 .1250

AQ 17 24 .7083 8 10 .8000 12 17 .7059 0 1 .0000 15 16 .9375

AR 2 24 .0833 0 10 .0000 1 17 .0588 0 1 .0000 1 16 .0625

AS 5 24 .2083 3 10 .3000 4 17 .2353 0 1 .0000 8 16 .5000

AT 6 24 .2500 0 10 .0000 2 17 .1176 1 1 1.000 2 16 .1250

AU 10 24 .4167 3 10 .3000 9 17 .5294 1 1 1.000 8 16 .5000

AV 0 24 .0000 0 10 .0000 0 17 .0000 0 1 .0000 6 16 .3750

AW 4 24 .1667 2 10 .2000 3 17 .1765 1 1 1.000 2 16 .1250

AX 6 24 .2500 1 10 .1000 4 17 .2353 0 1 .0000 0 16 .0000

AY 1 24 .0417 0 10 .0000 0 17 .0000 0 1 .0000 13 16 .8125

AZ 3 24 .1250 2 10 .2000 3 17 .1765 0 1 .0000 1 16 .0625

BA 12 24 .5000 4 10 .4000 7 17 .4118 0 1 .0000 5 16 .3125

BB 14 24 .5833 5 10 .5000 7 17 .4118 0 1 .0000 5 16 .3125

BC 2 24 .0833 0 10 .0000 8 17 .4706 0 1 .0000 2 16 .1250

BD 8 24 .3333 0 10 .0000 3 17 .1765 0 1 .0000 0 16 .0000

BE 3 24 .1250 4 10 .4000 4 17 .2353 0 1 .0000 2 16 .1250

BF 0 24 .0000 5 10 .5000 0 17 .0000 0 1 .0000 0 16 .0000

BG 6 24 .2500 1 10 .1000 5 17 .2941 0 1 .0000 7 16 .4375

BH 12 24 .5000 4 10 .4000 10 17 .5882 0 1 .0000 6 16 .3750

BI 7 24 .2917 1 10 .1000 3 17 .1765 1 1 1.000 5 16 .3125

BJ 6 24 .2500 1 10 .1000 2 17 .1176 0 1 .0000 2 16 .1250

BK 2 24 .0833 7 10 .7000 3 17 .1765 0 1 .0000 1 16 .0625

BL 8 24 .3333 0 10 .0000 1 17 .0588 0 1 .0000 1 16 .0625

BM 5 24 .2083 1 10 .1000 1 17 .0588 0 1 .0000 1 16 .0625

BN 5 24 .2083 2 10 .2000 2 17 .1176 0 1 .0000 1 16 .0625

BO 9 24 .3750 3 10 .3000 6 17 .3529 0 1 .0000 4 16 .2500

BP 10 24 .4167 5 10 .5000 9 17 .5294 1 1 1.000 11 16 .6875

BQ 14 24 .5833 5 10 .5000 8 17 .4706 0 1 .0000 5 16 .3125

BR 3 24 .1250 3 10 .3000 2 17 .1176 0 1 .0000 8 16 .5000

BS 3 24 .1250 1 10 .1000 1 17 .0588 0 1 .0000 1 16 .0625

BT 2 24 .0833 1 10 .1000 1 17 .0588 0 1 .0000 0 16 .0000

BU 10 24 .4167 4 10 .4000 2 17 .1176 0 1 .0000 4 16 .2500

BV 7 24 .2917 6 10 .6000 8 17 .4706 0 1 .0000 10 16 .6250

BW 6 24 .2500 2 10 .2000 2 17 .1176 0 1 .0000 2 16 .1250

BX 1 24 .0417 6 10 .6000 0 17 .0000 0 1 .0000 7 16 .4375

BY 3 24 .1250 7 10 .7000 2 17 .1176 0 1 .0000 0 16 .0000

BZ 5 24 .2083 2 10 .2000 11 17 .6471 0 1 .0000 1 16 .0625

CA 0 24 .0000 6 10 .6000 0 17 .0000 0 1 .0000 1 16 .0625

BC 2 24 .0833 0 10 .0000 8 17 .4706 0 1 .0000 2 16 .1250

CC 10 24 .4167 5 10 .5000 10 17 .5882 0 1 .0000 10 16 .6250

CD 1 24 .0417 0 10 .0000 0 17 .0000 0 1 .0000 9 16 .5625

CE 2 24 .0833 1 10 .1000 1 17 .0588 0 1 .0000 9 16 .5625

CF 4 24 .1667 1 10 .1000 4 17 .2353 1 1 1.000 4 16 .2500

CH 9 24 .3750 1 10 .1000 5 17 .2941 0 1 .0000 4 16 .2500

CI 7 24 .2917 5 10 .5000 5 17 .2941 1 1 1.000 9 16 .5625

CJ 8 24 .3333 4 10 .4000 7 17 .4118 0 1 .0000 3 16 .1875

CL 15 24 .6250 5 10 .5000 8 17 .4706 0 1 .0000 5 16 .3125

CN 2 24 .0833 0 10 .0000 8 17 .4706 0 1 .0000 1 16 .0625

CO 7 24 .2917 2 10 .2000 6 17 .3529 0 1 .0000 6 16 .3750

CP 11 24 .4583 7 10 .7000 9 17 .5294 0 1 .0000 7 16 .4375

CQ 6 24 .2500 1 10 .1000 2 17 .1176 1 1 1.000 3 16 .1875

CR 5 24 .2083 6 10 .6000 3 17 .1765 0 1 .0000 9 16 .5625

CS 14 24 .5833 2 10 .2000 8 17 .4706 . 0 . 4 16 .2500

4) correlation among clusters and tumor characteristics (Grading): frequencies and percentages of patients in each cluster divided by grading (1, 2, 3)

Grading  1                            2                            3
Cluster  pts in cluster  tot pts  %    pts in cluster  tot pts  %    pts in cluster  tot pts  %

A 5 13 .3846 5 19 .2632 9 36 .2500

B 2 13 .1538 6 19 .3158 5 36 .1389

C 0 13 .0000 1 19 .0526 12 36 .3333

D 4 13 .3077 2 19 .1053 0 36 .0000

E 7 13 .5385 8 19 .4211 13 36 .3611

F 1 13 .0769 2 19 .1053 13 36 .3611

G 0 13 .0000 1 19 .0526 2 36 .0556

H 4 13 .3077 5 19 .2632 12 36 .3333

I 8 13 .6154 9 19 .4737 8 36 .2222

J 2 13 .1538 5 19 .2632 3 36 .0833

K 3 13 .2308 5 19 .2632 5 36 .1389

L 7 13 .5385 6 19 .3158 22 36 .6111

M 7 13 .5385 6 19 .3158 23 36 .6389

Nn 11 13 .8462 11 19 .5789 29 36 .8056

O 4 13 .3077 3 19 .1579 0 36 .0000

P 2 13 .1538 0 19 .0000 8 36 .2222

Q 4 13 .3077 2 19 .1053 0 36 .0000

R 0 13 .0000 1 19 .0526 10 36 .2778

S 7 13 .5385 6 19 .3158 16 36 .4444

T 3 13 .2308 1 19 .0526 9 36 .2500

U 6 13 .4615 4 19 .2105 15 36 .4167

V 2 13 .1538 0 19 .0000 5 36 .1389

W 4 13 .3077 4 19 .2105 8 36 .2222

X 7 13 .5385 13 19 .6842 13 36 .3611

Y 6 13 .4615 6 19 .3158 18 36 .5000

Z 11 13 .8462 14 19 .7368 12 36 .3333

AA 3 13 .2308 2 19 .1053 0 36 .0000

AB 6 13 .4615 5 19 .2632 23 36 .6389

AC 7 13 .5385 6 19 .3158 22 36 .6111

AD 9 13 .6923 12 19 .6316 29 36 .8056

AE 1 13 .0769 4 19 .2105 1 36 .0278

AF 2 13 .1538 4 19 .2105 3 36 .0833

AG 1 13 .0769 3 19 .1579 1 36 .0278

AH 10 13 .7692 13 19 .6842 33 36 .9167

AI 7 13 .5385 5 19 .2632 23 36 .6389

AJ 4 13 .3077 7 19 .3684 13 36 .3611

AK 3 13 .2308 4 19 .2105 1 36 .0278

AL 2 13 .1538 4 19 .2105 0 36 .0000

AM 2 13 .1538 3 19 .1579 12 36 .3333

AN 1 13 .0769 5 19 .2632 2 36 .0556

AO 9 13 .6923 10 19 .5263 28 36 .7778

AP 1 13 .0769 2 19 .1053 10 36 .2778

AQ 12 13 .9231 16 19 .8421 24 36 .6667

AR 0 13 .0000 1 19 .0526 3 36 .0833

AS 4 13 .3077 5 19 .2632 11 36 .3056

AT 0 13 .0000 1 19 .0526 10 36 .2778

AU 6 13 .4615 6 19 .3158 19 36 .5278

AV 0 13 .0000 0 19 .0000 6 36 .1667

AW 3 13 .2308 3 19 .1579 6 36 .1667

AX 3 13 .2308 2 19 .1053 6 36 .1667

AY 0 13 .0000 1 19 .0526 13 36 .3611

AZ 2 13 .1538 4 19 .2105 3 36 .0833

BA 5 13 .3846 13 19 .6842 10 36 .2778

BB 6 13 .4615 13 19 .6842 12 36 .3333

BC 2 13 .1538 6 19 .3158 4 36 .1111

BD 2 13 .1538 5 19 .2632 4 36 .1111

BE 4 13 .3077 6 19 .3158 3 36 .0833

BF 4 13 .3077 1 19 .0526 0 36 .0000

BG 3 13 .2308 5 19 .2632 11 36 .3056

BH 7 13 .5385 6 19 .3158 19 36 .5278

BI 1 13 .0769 8 19 .4211 8 36 .2222

BJ 3 13 .2308 1 19 .0526 7 36 .1944

BK 7 13 .5385 5 19 .2632 1 36 .0278

BL 1 13 .0769 3 19 .1579 6 36 .1667

BM 2 13 .1538 2 19 .1053 4 36 .1111

BN 3 13 .2308 4 19 .2105 3 36 .0833

BO 4 13 .3077 9 19 .4737 9 36 .2500

BP 6 13 .4615 6 19 .3158 24 36 .6667

BQ 7 13 .5385 13 19 .6842 12 36 .3333

BR 2 13 .1538 4 19 .2105 10 36 .2778

BS 2 13 .1538 2 19 .1053 2 36 .0556

BT 2 13 .1538 2 19 .1053 0 36 .0000

BU 5 13 .3846 6 19 .3158 9 36 .2500

BV 6 13 .4615 10 19 .5263 15 36 .4167

BW 3 13 .2308 2 19 .1053 7 36 .1944

BX 3 13 .2308 3 19 .1579 8 36 .2222

BY 6 13 .4615 6 19 .3158 0 36 .0000

BZ 6 13 .4615 8 19 .4211 5 36 .1389

CA 4 13 .3077 2 19 .1053 1 36 .0278

BC 2 13 .1538 6 19 .3158 4 36 .1111

CC 8 13 .6154 12 19 .6316 15 36 .4167

CD 0 13 .0000 1 19 .0526 9 36 .2500

CE 2 13 .1538 2 19 .1053 9 36 .2500

CF 3 13 .2308 2 19 .1053 9 36 .2500

CH 4 13 .3077 5 19 .2632 10 36 .2778

CI 3 13 .2308 5 19 .2632 19 36 .5278

CJ 6 13 .4615 9 19 .4737 7 36 .1944

CL 8 13 .6154 13 19 .6842 12 36 .3333

CN 2 13 .1538 6 19 .3158 3 36 .0833

CO 4 13 .3077 6 19 .3158 11 36 .3056

CP 8 13 .6154 11 19 .5789 15 36 .4167

CQ 1 13 .0769 2 19 .1053 10 36 .2778

CR 5 13 .3846 7 19 .3684 11 36 .3056

CS 6 13 .4615 11 19 .5789 11 35 .3143

Median age at diagnosis: 52

Histopathological   N° of   Percentage   Chemotherapeutic regimens
parameters          cases   (%)          CBDCA   NT
Stage I             21
  a                 13      61.9         5       9
  b                 1       4.76         1
  c                 7       33.34        4       2
Grade
  1                 8       38.10        1       7
  2                 7       33.34        4       3
  3                 6       28.57        5       1
Histotype
  Serous            6       28.57        3       3
  Mucinous          9       42.86        1       8
  Endometrioid      5       23.81        5
  Clear cell        1       4.76         1

S-Table IV. Clinical parameters and chemotherapy regimens of the 21 EOC samples used as the test set. CBDCA, carboplatin; NT, not treated.

Genbank GO Biological Process Symbol Name Acc. N° Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid BC040210 AKR1C2 binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) Solute carrier family 6 (amino acid NM_007231 SLC6A14 transporter), member 14 Organic acid transport Aldo-keto reductase family 1, member C4 (chlordecone reductase; 3-alpha BC020744 AKR1C4 hydroxysteroid dehydrogenase, type I; dihydrodiol dehydrogenase 4) ADP-ribosylation-like factor 6 interacting AC092060 ARL6IP5 protein 5 Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid BC040210 AKR1C2 binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) Solute carrier family 6 (amino acid NM_007231 SLC6A14 transporter), member 14 Carboxylic acid transport Aldo-keto reductase family 1, member C4 (chlordecone reductase; 3-alpha BC020744 AKR1C4 hydroxysteroid dehydrogenase, type I; dihydrodiol dehydrogenase 4) ADP-ribosylation-like factor 6 interacting AC092060 ARL6IP5 protein 5 AB008193 LTB4R Leukotriene B4 receptor NM_004080 DGKB Diacylglycerol kinase, beta 90kDa AL121917 GNAS GNAS complex locus Growth hormone releasing hormone Second-messenger-mediated NM_000823 GHRHR receptor signaling Guanine nucleotide binding protein (G AK126708 GNAI2 protein), alpha inhibiting activity polypeptide 2 CA307692 ADORA2A Adenosine A2a receptor NM_145754 KIFC2 Kinesin family member C2 NM_000928 PLA2G1B Phospholipase A2, group IB (pancreas) AK023766 DNAH1 Dynein, axonemal, heavy polypeptide 1 NM_004240 TRIP10 Thyroid hormone receptor interactor 10 Cytoskeleton organization and BG286577 TUBB Tubulin, beta biogenesis NM_183419 RNF19 Ring finger protein 19 AF177198 TLN1 Talin 1 CF457414 SORBS1 Sorbin and SH3 domain containing 1 DB565402 LASP1 LIM and SH3 protein 1 CA314918 LMO7 LIM domain 7 Minichromosome maintenance
complex DB353455 MCM5 component 5 Regulation of progression Ribosomal protein S6 kinase, 70kDa, NM_003952 RPS6KB2 through cell cycle polypeptide 2 AF195139 PNN Pinin, desmosome associated protein M74093, CCNE1 Cyclin E1 BG761079 AY561635 KLK10 Kallikrein-related peptidase 10 NM_006191 PA2G4 Proliferation-associated 2G4, 38kDa NM_138292 ATM Ataxia telangiectasia mutated Placental growth factor, vascular BC007255 PGF endothelial growth factor-related protein NM_004073 PLK3 Polo-like kinase 3 (Drosophila) NM_003377 VEGFB Vascular endothelial growth factor B DB509863 CCNL2 Cyclin L2 Basic helix-loop-helix domain containing, BC068292 BHLHB2 class B, 2 NM_002509 NKX2-2 NK2 homeobox 2 AF195139 PNN Pinin, desmosome associated protein NM_005169 PHOX2A Paired-like homeobox 2a M74093, CCNE1 BG761079 Cyclin E1 NM_006191 PA2G4 Proliferation-associated 2G4, 38kDa BC001562 NCOA4 Nuclear receptor coactivator 4 AJ492196 ZNF248 Zinc finger protein 248 CB269721 NRIP1 Nuclear receptor interacting protein 1 BC007333 ETV5 Ets variant gene 5 (ets-related molecule) B double prime 1, subunit of RNA NM_018429 BDP1 polymerase III transcription initiation factor IIIB AF317391 BCOR BCL6 co-repressor TAF7 RNA polymerase II, TATA box AF349038 TAF7 binding protein (TBP)-associated factor, 55kDa MYC-associated zinc finger protein Regulation of nucleobase, NM_002383 MAZ (purine-binding transcription factor) nucleoside, nucleotide and BC111408 ZNF276 Zinc finger protein 276 nucleic acid metabolism BE748366 PAX8 Paired box 8 BX385997 SQSTM1 Sequestosome 1 GCN5 general control of amino-acid DB552558 GCN5L2 synthesis 5-like 2 (yeast) DQ895028 SMAD4 SMAD family member 4 Nuclear receptor subfamily 4, group A, CD364918 NR4A1 member 1 Core-binding factor, runt domain, alpha AI584154 CBFA2T3 subunit 2; translocated to, 3 BC012070 ZBTB7B Zinc finger and BTB domain containing 7B CT004126 JUN Jun oncogene AL161658 INSM1 Insulinoma-associated 1 DB509863 CCNL2 Cyclin L2 SWI/SNF related, matrix associated, actin 
DB210960 SMARCA5 dependent regulator of chromatin, subfamily a, member 5 Minichromosome maintenance complex DB353455 MCM5 component 5 AAH07256 ZNF23 Zinc finger protein 23 (KOX 16) NM_005599 NHLH2 Nescient helix loop helix 2 ELK3, ETS-domain protein (SRF accessory NM_005230 ELK3 protein 2) NM_138292 ATM Ataxia telangiectasia mutated SWI/SNF related, matrix associated, actin DA493585 SMARCA1 dependent regulator of chromatin, subfamily a, member 1 Basic helix-loop-helix domain containing, BC068292 BHLHB2 class B, 2 NM_002509 NKX2-2 NK2 homeobox 2 AF195139 PNN Pinin, desmosome associated protein NM_005169 PHOX2A Paired-like homeobox 2a M74093, CCNE1 BG761079 Cyclin E1 NM_006191 PA2G4 Proliferation-associated 2G4, 38kDa BC001562 NCOA4 Nuclear receptor coactivator 4 AJ492196 ZNF248 Zinc finger protein 248 CB269721 NRIP1 Nuclear receptor interacting protein 1 BC007333 ETV5 Ets variant gene 5 (ets-related molecule) TATA box binding protein (TBP)- AB209594 TAF1C associated factor, RNA polymerase I, C, 110kDa B double prime 1, subunit of RNA NM_018429 BDP1 polymerase III transcription initiation factor IIIB AF317391 BCOR BCL6 co-repressor TAF7 RNA polymerase II, TATA box AF349038 TAF7 binding protein (TBP)-associated factor, 55kDa Transcription MYC-associated zinc finger protein NM_002383 MAZ (purine-binding transcription factor) BC111408 ZNF276 Zinc finger protein 276 BE748366 PAX8 Paired box 8 BX385997 SQSTM1 Sequestosome 1 GCN5 general control of amino-acid DB552558 GCN5L2 synthesis 5-like 2 (yeast) DQ895028 SMAD4 SMAD family member 4 Nuclear receptor subfamily 4, group A, CD364918 NR4A1 member 1 Core-binding factor, runt domain, alpha AI584154 CBFA2T3 subunit 2; translocated to, 3 BC012070 ZBTB7B Zinc finger and BTB domain containing 7B CT004126 JUN Jun oncogene AL161658 INSM1 Insulinoma-associated 1 DB509863 CCNL2 Cyclin L2 SWI/SNF related, matrix associated, actin DB210960 SMARCA5 dependent regulator of chromatin, subfamily a, member 5 Minichromosome maintenance 
complex DB353455 MCM5 component 5 AAH07256 ZNF23 Zinc finger protein 23 (KOX 16) NM_005599 NHLH2 Nescient helix loop helix 2 ELK3, ETS-domain protein (SRF accessory NM_005230 ELK3 protein 2) NM_138292 ATM Ataxia telangiectasia mutated SWI/SNF related, matrix associated, actin DA493585 SMARCA1 dependent regulator of chromatin, subfamily a, member 1

S-Table V. Significant GO Biological Process ontologies resulting from DAVID functional annotation enrichment analysis of the 188 genes that distinguish relapsed from non-relapsed samples. For each ontology group, the GenBank accession number, gene symbol and name are reported. Red indicates genes up-regulated in relapsers compared with non-relapsers. Green indicates genes down-regulated in relapsers compared with non-relapsers.

A) PFS

Variable       Hazard ratio  Lower 95% CI  Upper 95% CI  p
CCNE1          1.417         1.1           1.826         0.0069
MCM5           1.797         0.943         3.425         0.075
Grading        3.091         1.313         7.276         0.0098
Histotype      0.728         0.46          1.15          0.173
Chemotherapy   0.417         0.164         1.058         0.066
Chemo*CCNE1*   3.393         0.952         12.089        0.0595
Chemo*MCM5*    6.032         0.772         47.12         0.087

B) OS

Variable       Hazard ratio  Lower 95% CI  Upper 95% CI  p
CCNE1          1.132         0.842         1.523         0.412
MCM5           0.766         0.371         1.580         0.47
Grading        2.656         1.017         6.939         0.046
Histotype      1.062         0.654         1.724         0.8075
Chemotherapy   0.384         0.125         1.178         0.094
Chemo*CCNE1*   1.651         0.358         7.616         0.5202
Chemo*MCM5*    0.670         0.173         2.592         0.562

C) PFS

Variable       Hazard ratio  Lower 95% CI  Upper 95% CI  p
CCNE1          1.238         0.931         1.646         0.1412
Grading        2.415         0.972         5.999         0.0577

S-Table VI. Relationships between PFS (A) or OS (B) and CCNE1, MCM5, histotype, grading and chemotherapy by univariate Cox proportional hazard models, and between PFS (C) and CCNE1 and grading by a multivariate Cox proportional hazard model. Chemo*CCNE1 and Chemo*MCM5 refer to CCNE1 and MCM5, respectively, stratified according to the discriminative gene expression level, for both OS and PFS. p is the p-value (significance at p<0.05).
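The hazard ratios, confidence limits and p-values in the Cox tables can be cross-checked by recomputing the two-sided Wald p-value implied by each hazard ratio and its 95% CI. This check assumes the intervals were built as exp(β ± 1.96·SE) on the log-hazard scale. Because the hazard ratios and limits are rounded, agreement is only approximate. For the three rows below, the recomputed values are about 0.033, 0.0070 and 0.046. The function name is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log hazard ratio, its standard error and the two-sided
    Wald p-value from a hazard ratio and its 95% confidence interval,
    assuming CI = exp(beta +/- z_crit * SE)."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Rows taken from S-Table II (A) and S-Table VI (A, B).
rows = {
    "Sub-stage, PFS (S-Table II A)": (2.73, 1.08, 6.86),
    "CCNE1, PFS (S-Table VI A)": (1.417, 1.1, 1.826),
    "Grading, OS (S-Table VI B)": (2.656, 1.017, 6.939),
}
for name, (hr, lo, hi) in rows.items():
    _, _, p = wald_from_ci(hr, lo, hi)
    print(f"{name}: p = {p:.4f}")
```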