Validated prediction of clinical outcome in sarcomas and multiple types of cancer based on a gene-expression signature related to genome complexity.

Frédéric Chibon1,2, Pauline Lagarde1,2, Sébastien Salas1, Gaëlle Pérot3, Véronique Brouste4, Franck Tirode3, Carlo Lucchesi3, Aurélien de Reynies5, Audrey Kauffmann6, Binh Bui1,2, Philippe Terrier7, Sylvie Bonvalot7, Axel Le Cesne7, Dominique Vince-Ranchère8, Jean-Yves Blay8, Françoise Collin9, Louis Guillou10, Agnès Leroux11, Jean-Michel Coindre1,2,12 and Alain Aurias3.

Supplementary information

Nature Medicine: doi:10.1038/nm.2174

Cohort 1 (n = 183, P = 0.41)

                        0      1 year   2 years  3 years  4 years  5 years
G1  Patients at risk    91     71       57       40       33       29
    Cumulated events    7      16       22       32       36       37
    MFS                 0.92   0.82     0.74     0.61     0.55     0.53
G2  Patients at risk    91     58       44       37       28       24
    Cumulated events    5      22       30       34       35       36
    MFS                 0.95   0.74     0.63     0.57     0.56     0.54

Supplementary Figure 1: Prognostic value of Carter’s signature in sarcoma cohort 1. The 183 patients of cohort 1 were stratified according to Carter’s signature (22). Tumors with a below-average level of Carter’s signature expression are in blue and tumors with an above-average level are in red. n = total number of cases. P values correspond to the log-rank test comparing the survival curves.
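The stratification and comparison described in this legend can be illustrated with a minimal sketch. It assumes that each tumor has a single summary score for the signature, that the cut-off is the cohort mean, and that the standard two-group log-rank statistic (1 degree of freedom) is used; the function names `split_by_mean` and `logrank_test` and the toy data are hypothetical and are not taken from the study.

```python
import math

def split_by_mean(scores):
    """Label each tumor 1 if its score is above the cohort mean, else 0."""
    mean = sum(scores) / len(scores)
    return [1 if s > mean else 0 for s in scores]

def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. metastasis) was observed, 0 if censored
    groups : 0 or 1 for each patient
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still at risk just before time t
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Events occurring exactly at time t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (hypothetical): signature scores, follow-up in years, event flags
scores = [0.2, 0.9, 0.4, 1.1, 0.3, 1.3]
times = [5.0, 1.2, 4.1, 0.8, 3.5, 2.0]
events = [0, 1, 1, 1, 0, 1]
groups = split_by_mean(scores)
chi2, p_value = logrank_test(times, events, groups)
print(groups, round(chi2, 3), round(p_value, 3))
```

In practice a dedicated survival-analysis package would normally be used; the sketch only makes the arithmetic behind the reported P values explicit.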


GISTs
  Independent group (Yamaguchi et al; 2008): n = 32, P = 0.003

Breast Cancers
  Training set (Van’t Veer et al; 2002): n = 78, P = 2.67 × 10⁻⁴
  Validation set (Van de Vijver et al; 2002): n = 295, P = 1.34 × 10⁻⁵

Lymphomas
  Training set (Lenz et al; 2008): n = 136, P = 0.01
  Validation set (Lenz et al; 2008): n = 278, P = 5.34 × 10⁻⁴

Supplementary Figure 2: Metastasis-free survival (MFS) or overall survival (OS) analysis according to the CINSARC signature in 3 other types of tumors. To apply the CINSARC centroid method, a training/validation methodology was used. For GISTs, the centroids are those calculated in sarcoma cohort 1 (training set); for breast cancers and lymphomas, centroids were calculated in each corresponding training set. The CINSARC grade is then assigned by Pearson correlation. Survival probability (Y axis) refers to MFS for GISTs and breast cancers, and to OS for lymphomas. Patients with the lowest CINSARC scores (centroid C1) are in blue and those with the highest (centroid C2) are in red. n = total number of cases. P values correspond to the log-rank test comparing the survival curves.
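The centroid assignment described in this legend can be sketched as follows. This is only an illustration under stated assumptions: each centroid is taken to be the gene-wise mean expression profile of one training group, and a tumor receives the grade of the centroid with which its profile has the higher Pearson correlation. The helper names `pearson`, `centroid` and `assign_grade` and the toy profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumes neither profile is constant (otherwise the product is zero)
    return cov / (sd_x * sd_y)

def centroid(profiles):
    """Gene-wise mean of a list of expression profiles (assumed definition)."""
    return [sum(values) / len(values) for values in zip(*profiles)]

def assign_grade(profile, centroid_c1, centroid_c2):
    """Return 'C1' or 'C2' according to the better-correlated centroid."""
    r1 = pearson(profile, centroid_c1)
    r2 = pearson(profile, centroid_c2)
    return "C1" if r1 >= r2 else "C2"

# Toy training set (hypothetical): three genes, two tumors per group
training_c1 = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]
training_c2 = [[3.0, 2.0, 1.0], [4.0, 2.0, 1.0]]
c1 = centroid(training_c1)
c2 = centroid(training_c2)

# Tumors from a validation set are graded against the fixed centroids
for tumor in ([0.0, 1.0, 2.0], [5.0, 3.0, 0.0]):
    print(tumor, assign_grade(tumor, c1, c2))
```

Keeping the centroids fixed once they are computed on the training set is what allows the validation set to serve as an independent test of the signature.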

                                   Amplified   Arm         Rearranged
Total (%)                          28 (16)     40 (23)     106 (61)
FNCLCC grade (%)
  1 & 2                            14 (61)     24 (65)     29 (28)
  3                                9 (39)      13 (35)     75 (72)
  nd                               5           3           2
Histological type (%)
  Undifferentiated sarcomas        3 (11)      9 (22)      52 (49)
  Leiomyosarcomas                  0           21 (53)     29 (27)
  Dedifferentiated liposarcomas    25 (89)     4 (10)      15 (14)
  Other                                        6 (15)      10 (10)
Location (%)
  External trunk                   9 (32)      26 (65)     100 (94)
    Trunk wall                     1 (3.5)     8 (20)      18 (17)
    Extremities                    7 (25)      18 (45)     81 (76)
    Head and neck                  1 (3.5)     0           1 (1)
  Internal trunk                   19 (68)     14 (35)     6 (6)
CINSARC grade
  C1                               22 (79)     15 (37)     42 (40)
  C2                               6 (21)      25 (63)     64 (60)

Supplementary table 1: CGH profiles and tumor characteristics. For each profile, the percentage in brackets indicates the proportion of tumors with the corresponding characteristic. These data show that “amplified” sarcomas are mainly dedifferentiated liposarcomas (MDM2 amplification) and that 2/3 of sarcomas with such genetics are localized in the internal trunk (among the 19 “amplified” sarcomas of the internal trunk, 18 are dedifferentiated liposarcomas). In contrast, almost all “rearranged” sarcomas originate from the external trunk and correspond to undifferentiated sarcomas and leiomyosarcomas. nd = not determined.


a) Number of Probesets/clones in Input: 92
   Number of corresponding unique Gene IDs found: 73

GO ID        Observed in my selection   Observed on array   Fisher Exact pValue   GO Term
GO:0007067   19                         122                 1.46E-24              mitosis
GO:0051301   19                         174                 7.10E-22              cell division
GO:0007049   21                         422                 1.36E-17              cell cycle
GO:0000775   6                          37                  1.78E-08              chromosome, pericentric region
GO:0000074   8                          181                 1.02E-06              regulation of progression through cell cycle
GO:0005694   6                          118                 1.02E-05              chromosome
GO:0004674   8                          338                 7.96E-05              protein serine/threonine kinase activity
GO:0008283   7                          248                 8.41E-05              cell proliferation
GO:0006270   3                          19                  1.00E-04              DNA replication initiation
GO:0000776   3                          21                  1.20E-04              kinetochore
GO:0003777   4                          62                  1.54E-04              microtubule motor activity
GO:0007018   4                          75                  3.22E-04              microtubule-based movement
GO:0000079   3                          35                  5.23E-04              regulation of cyclin-dependent protein kinase activity
GO:0005813   3                          48                  1.14E-03              centrosome
GO:0005875   3                          54                  1.58E-03              microtubule associated complex
GO:0006468   7                          475                 3.67E-03              protein amino acid phosphorylation
GO:0046982   3                          80                  4.83E-03              protein heterodimerization activity
GO:0005874   4                          178                 6.25E-03              microtubule
GO:0006260   3                          96                  8.13E-03              DNA replication
GO:0016301   3                          184                 4.18E-02              kinase activity

b) Number of Probesets/clones in Input: 118
   Number of corresponding unique Entrez Gene IDs found: 86

GO ID        Observed in my selection   Observed on array   Fisher Exact pValue   GO Term
GO:0007067   23                         122                 4.50E-28              mitosis
GO:0051301   23                         174                 7.15E-25              cell division
GO:0007049   27                         422                 1.48E-21              cell cycle
GO:0000775   8                          37                  9.04E-11              chromosome, pericentric region
GO:0005819   6                          14                  7.45E-10              spindle
GO:0007018   9                          75                  8.12E-10              microtubule-based movement
GO:0003777   8                          62                  1.51E-09              microtubule motor activity
GO:0005876   5                          12                  2.39E-08              spindle microtubule
GO:0000074   10                         181                 9.46E-08              regulation of progression through cell cycle
GO:0008283   11                         248                 1.72E-07              cell proliferation
GO:0005874   8                          178                 7.01E-06              microtubule
GO:0007089   3                          5                   9.18E-06              traversing start control point of mitotic cell cycle
GO:0005875   5                          54                  1.62E-05              microtubule associated complex
GO:0005694   6                          118                 5.52E-05              chromosome
GO:0005871   3                          16                  1.39E-04              kinesin complex
GO:0000079   3                          35                  1.23E-03              regulation of cyclin-dependent protein kinase activity
GO:0004674   7                          338                 1.40E-03              protein serine/threonine kinase activity
GO:0006468   8                          475                 5.32E-03              protein amino acid phosphorylation
GO:0006260   3                          96                  1.79E-02              DNA replication
GO:0008284   3                          145                 4.99E-02              positive regulation of cell proliferation

Supplementary table 2: GO analysis of significantly differentially expressed probesets in the t-tests according to histological grade (a) and number of CGH imbalances (b). Selected pathways are indicated in bold (P < 10⁻⁵). Note that among the 8 pathways significantly related to histological grade, 7 are also involved in chromosomal complexity observed by CGH. This likely means that the biological mechanisms involved in tumor aggressiveness are very close to those related to genome complexity.
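The Fisher exact P values in these tables test whether a GO term is over-represented in the selected genes relative to the array. A minimal sketch of such a test is given below, assuming a one-sided test computed as the upper tail of the hypergeometric distribution. The function name `enrichment_p_value` is hypothetical, and the total number of annotated genes on the array is not given here, so the value 20000 in the usage example is a placeholder assumption and the result is not expected to reproduce the tabulated P values.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided Fisher exact P value for over-representation.

    k : genes in the selection annotated with the GO term
    n : genes in the selection
    K : genes on the array annotated with the GO term
    N : genes on the array (the annotated universe)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Row GO:0007067 (mitosis) of panel (a): 19 of 73 selected genes,
# 122 annotated genes on the array; N = 20000 is an assumed universe size
p = enrichment_p_value(k=19, n=73, K=122, N=20000)
print(f"{p:.2e}")
```

Exact integer arithmetic with `math.comb` avoids the underflow that floating-point factorials would cause for P values as small as those reported here.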

Gene ID          Probeset       Chromosome   Cytoband   Starting nucleotide
ANLN             1552619_a_at   7            p14.2      36395954
ASPM             219918_s_at    1            q31.3      195320014
AURKA            204092_s_at    20           q13.2      54377851
AURKB            239219_at      17           p13.1      8053900
BIRC5            202095_s_at    17           q25.3      73721943
BUB1             215509_s_at    2            q13        111132263
BUB1B            203755_at      15           q15.1      38240671
C13orf34         219544_at      13           q22.1      72215688
CCNA2            203418_at      4            q27        122958003
CCNB1            214710_s_at    5            q13.2      68498643
CCNB2            202705_at      15           q22.2      57184611
CDC2             203213_at      10           q21.2      62208241
CDC20            202870_s_at    1            p34.2      43597214
CDC45L           204126_s_at    22           q11.21     17847467
CDC6             203967_at      17           q21.2      35697671
CDC7             204510_at      1            p22.2      91739031
CDCA2            236957_at      8            p21.2      25396840
CDCA3            223307_at      12           p13.31     6828253
CDCA8            221520_s_at    1            p34.3      37930745
CENPA            204962_s_at    2            p23.3      26862424
CENPE            205046_at      4            q24        104246652
CENPL            1554271_a_at   1            q25.1      172035640
CEP55            218542_at      10           q23.33     95249894
CHEK1            205394_at      11           q24.2      125001852
CKS2             204170_s_at    9            q22.2      91115932
ECT2             219787_s_at    3            q26.31     174003355
ESPL1            38158_at       12           q13.13     51948383
FBXO5            234863_x_at    6            q25.2      153334024
FOXM1            202580_x_at    12           p13.33     2837110
H2AFX            205436_s_at    11           q23.3      118469796
HP1BP3           1554251_at     1            p36.12     20974735
KIAA1794         213007_at      15           q26.1      87637175
KIF11            204444_at      10           q23.33     94342970
KIF14            236641_at      1            q32.1      198787252
KIF15            219306_at      3            p21.31     44778288
KIF18A           221258_s_at    11           p14.1      27999125
KIF20A           218755_at      5            q31.2      137543246
KIF23            244427_at      15           q23        67524442
KIF2C            209408_at      1            p34.1      44978137
KIF4A            218355_at      X            q13.1      69426686
KIFC1            1555278_a_at   11           p11.2      46722142
MAD2L1           1554768_a_at   4            q27        121200631
MCM2             202107_s_at    3            q21.3      128800849
MCM7             210983_s_at    7            q22.1      99528854
MELK             204825_at      9            p13.2      36562872
NCAPH            212949_at      2            q11.2      96365230
NDE1             222625_s_at    16           p13.11     15651581
NEK2             204641_at      1            q32.3      209902744
NUF2             223381_at      1            q23.3      161558377
OIP5             213599_at      15           q15.1      39388609
PAK3 /// UBE2C   202954_at      20           q13.12     43874709
PBK              219148_at      8            p21.1      27723331
PLK4             204886_at      4            q28.1      129021599
PRC1             218009_s_at    15           q26.1      89310278
PTTG1            203554_x_at    5            q33.3      159781442
RAD51AP1         204146_at      12           p13.32     4518293
RNASEH2A         203022_at      19           p13.13     12778438
RRM2             201890_at      2            p25.1      10179961
SGOL2            230165_at      2            q33.1      201146742
SMC2             204240_s_at    9            q31.1      105896427
SPAG5            203145_at      17           q11.2      23928720
SPBC25           209891_at      2            q24.3      169436141
TOP2A            201291_s_at    17           q21.2      35660692
TPX2             210052_s_at    20           q11.21     29790791
TRIP13           204033_at      5            p15.33     946068
TTK              204822_at      6            q14.1      80771104
ZWINT            204026_s_at    10           q21.1      57787212

Supplementary table 3: The 67 Affymetrix® probesets of the CINSARC and their genomic location.


GO.ID        Observed in my selection   Observed on array   Fisher Exact pValue   Z-Score   GO.Term
GO:0000775   10                         37                  1.06E-14              23.58     chromosome, pericentric region
GO:0005819   7                          14                  3.88E-12              27.03     spindle
GO:0005876   6                          12                  1.48E-10              25.02     spindle microtubule
GO:0005694   10                         118                 3.49E-10              12.73     chromosome
GO:0005875   6                          54                  3.42E-07              11.42     microtubule associated complex
GO:0005874   8                          178                 2.32E-06              7.88      microtubule
GO:0000776   4                          21                  5.18E-06              12.42     kinetochore
GO:0005871   3                          16                  9.08E-05              10.67     kinesin complex
GO:0005813   4                          48                  0.0001                7.96      centrosome
GO:0000940   2                          3                   0.0002                16.72     outer kinetochore of condensed chromosome
GO:0030496   2                          7                   0.0008                10.84     midbody
GO:0005657   2                          8                   0.0010                10.12     replication fork
GO:0005814   2                          9                   0.0012                9.52      centriole
GO:0015630   2                          13                  0.0022                7.84      microtubule cytoskeleton
GO:0000922   2                          16                  0.0032                7.02      spindle pole
GO:0000785   3                          75                  0.0059                4.47      chromatin
GO:0000786   2                          32                  0.0111                4.77      nucleosome
GO:0001939   1                          3                   0.0187                8.30      female pronucleus
GO:0005816   1                          3                   0.0187                8.30      spindle pole body
GO:0000930   1                          4                   0.0233                7.15      gamma-tubulin complex
GO:0005664   1                          4                   0.0233                7.15      nuclear origin of replication recognition complex
GO:0015030   1                          4                   0.0233                7.15      Cajal body
GO:0005881   1                          6                   0.0325                5.78      cytoplasmic microtubule
GO:0043234   2                          64                  0.0385                3.10      protein complex

Supplementary table 4: GO analysis of the 67 CINSARC genes. The CINSARC genes are involved in these 24 significant pathways, 17 of which are directly related to mitosis control and chromosome integrity.
