ARTICLE

DOI: 10.1038/s41467-017-02602-0 OPEN

Integrative genomic and transcriptomic analysis of leiomyosarcoma

Priya Chudasama

Leiomyosarcoma (LMS) is an aggressive mesenchymal malignancy with few therapeutic options. The mechanisms underlying LMS development, including clinically actionable genetic vulnerabilities, are largely unknown. Here we show, using whole-exome and transcriptome sequencing, that LMS tumors are characterized by substantial mutational heterogeneity, near-universal inactivation of TP53 and RB1, widespread DNA copy number alterations including chromothripsis, and frequent whole-genome duplication. Furthermore, we detect alternative telomere lengthening in 78% of cases and identify recurrent alterations in telomere maintenance genes such as ATRX, RBL2, and SP100, providing insight into the genetic basis of this mechanism. Finally, most tumors display hallmarks of “BRCAness”, including alterations in homologous recombination DNA repair genes, multiple structural rearrangements, and enrichment of specific mutational signatures, and cultured LMS cells are sensitive towards olaparib and cisplatin. This comprehensive study of LMS genomics has uncovered key biological features that may inform future experimental research and enable the design of novel therapies.

Correspondence and requests for materials should be addressed to S.F. (email: [email protected]). #A full list of authors and their affiliations appears at the end of the paper.

NATURE COMMUNICATIONS | (2018) 9:144 | DOI: 10.1038/s41467-017-02602-0 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-017-02602-0

Leiomyosarcomas (LMS) are malignant tumors of smooth-muscle origin that occur across age groups, accounting for 10% of all soft-tissue sarcomas, and most commonly involve the uterus, retroperitoneum, and large blood vessels. Long-term survival in LMS patients may be achieved through surgical excision and adjuvant radiotherapy. However, local recurrence and/or metastasis develop in approximately 40% of cases1. Patients with disseminated LMS are usually incurable, as reflected by a median survival after development of distant metastases of 12 months2, and cytotoxic chemotherapy is generally administered with palliative intent.

Cytogenetic studies have shown that LMS are genetically complex, often exhibiting chaotic karyotypes, and no pathognomonic chromosomal rearrangements have been detected. More recent investigations employing microarray technologies and targeted sequencing approaches have provided insight into recurrent genetic features of LMS and associated histopathologic characteristics and clinical outcomes3–5. However, systematic genome- and transcriptome-wide investigations of LMS using next-generation sequencing technology are lacking, and clinically actionable genetic vulnerabilities remain unknown.

In this study, we have used whole-exome and RNA sequencing to characterize the molecular landscape of LMS. We identify a perturbed tumor suppressor network, widespread genomic instability, and alternative lengthening of telomeres (ALT) as hallmarks of this disease. Furthermore, our findings uncover genomic imprints of defective homologous recombination repair (HRR) of DNA double-strand breaks as a potential liability of LMS tumors that could be exploited for therapeutic benefit, and provide a map for future studies of additional genetic alterations or deregulated cellular processes as entry points for molecularly targeted interventions.

Results

Mutational landscape of LMS. We performed whole-exome sequencing and transcriptome sequencing in a cohort of 49 patients with LMS (non-uterine, n = 39; uterine, n = 10; newly diagnosed, n = 20; locally recurrent, n = 6; metastatic, n = 23) (Supplementary Data 1). We detected a total of 14,259 (median, 223; range, 79–1101) somatic single-nucleotide variants (SNVs), of which 2522 (median, 39; range, 10–226) were non-silent, and 297 somatic small insertions/deletions (indels; median, 3; range, 0–50) (Fig. 1a and Supplementary Data 1). The median somatic mutation rate was 3.09 (range, 1.05–14.76) per megabase (Mb) of target sequence, comparable to the rates observed in clear-cell kidney cancer or hepatocellular carcinoma6. Recurrence analysis using MutSigCV7 identified TP53 (49%), RB1 (27%), and ATRX (24%) as significantly mutated genes (q < 0.01, Benjamini-Hochberg correction) (Fig. 1a and Supplementary Figure 1a). TP53 mutations clustered in the DNA binding and tetramerization motifs, whereas those affecting RB1 and ATRX were distributed across the entire gene (Fig. 1b). SNVs and indels were also present in other established cancer genes8, albeit at low frequencies (Fig. 1a, Supplementary Figure 1a, and Supplementary Data 1). Network analysis of the integrated collection of SNVs and indels using HotNet29 identified two significantly mutated subnetworks centered on TP53 and RB1 as “hot” nodes (P < 0.05, two-stage multiple hypothesis test and 100 permutations of the global interaction network), which encompassed genes related to DNA damage response and telomere maintenance (TOPORS, ATR, TP53BP1, TELO2), cell cycle and apoptosis regulation (PSMD11, CASP7, XPO1), epigenetic regulation (HIST3H3, SETD7, KMT2C), MAPK signaling and positive regulation of muscle cell proliferation (MAPK14, DUSP10, MEF2C), regulation of mRNA stability (ZFP36L1, SRSF5), and PI3K-AKT signaling (MTOR, LAMA4) (Fig. 1c). These data showed that LMS tumors exhibit substantial mutational heterogeneity and are possibly driven by loss of TP53 and/or RB1 function together with a diverse spectrum of less commonly mutated “hills”, which may be different for each patient10.

Widespread DNA copy number changes and chromothripsis in LMS. We next performed genome-wide analysis of somatic copy-number alterations (CNAs) and identified recurrent losses in regions of chromosomes 10, 11q, 13, 16q, and 17p13 (comprising TP53) and recurrent gains of 17p12 (affecting MYOCD) (Fig. 2a, b and Supplementary Figure 1b), consistent with previous molecular cytogenetic studies5, 11. Most recurrently mutated genes were additionally targeted by CNAs (Supplementary Figure 1a). Furthermore, multiple cancer drivers as well as components of the CINSARC prognostic gene expression signature12 were affected by CNAs in at least 30% of cases, including genes encoding tumor suppressors (PTEN, RB1, TP53), DNA repair (BRCA2, ATM), chromatin modifiers (RBL2, DNMT3A, KAT6B), cytokine receptors (ALK, FGFR2, FLT3, LIFR), and transcriptional regulators (PAX3, FOXO1, CDX2, SUFU) (Fig. 2a). We also detected regions of significant focal gains and losses using GISTIC2.013 (q < 0.25, Benjamini-Hochberg correction) (Fig. 2b and Supplementary Data 1), and clustering of broad and focal CNAs demonstrated that individual tumors had highly rearranged genomes (Supplementary Figure 1b). In addition, chromothripsis14 was present in 17 of 49 samples (35%), with the number of affected chromosomes per tumor ranging from one to five (Fig. 2c and Supplementary Data 1). Thus, variable patterns of widespread CNA and localized chromosome shattering further add to the genomic complexity of LMS.

Transcriptomic characterization of LMS. We next sought to delineate biologically relevant subgroups of LMS defined by different gene expression profiles. Both unsupervised hierarchical clustering (Fig. 3a) and principal component analysis (Supplementary Figure 2a) revealed three distinct subgroups of patients. Analysis using DAVID on the top 100 highly variable genes showed greater than tenfold enrichment (false discovery rate < 0.05) of biological processes related to platelet degranulation, complement activation, and metabolism for subgroup 1; and muscle development and function and regulation of membrane potential for subgroup 2. Subgroup 3 was characterized by low expression of genes separating subgroups 1 and 2, but showed medium to high levels of genes associated with myofibril assembly, muscle filament function, and cell-cell signaling common to subgroups 1 and 2 (Fig. 3a and Supplementary Data 1). Increased expression of ARL4C or CASQ2 and LMOD1, respectively, indicated that subgroups 2 and 3 correspond to previously identified LMS subtypes II and I (Supplementary Figure 2b)15.

Biallelic inactivation of TP53 and RB1 in LMS. Further analysis of transcriptome data, coupled with RT-PCR validation, uncovered high-confidence fusion transcripts arising from chromosomal rearrangements in 34 of 37 cases (total number of fusions, n = 183; range, n = 1–29; Fig. 3b, c, Supplementary Figure 2c, and Supplementary Data 1). While no recurrent fusions were detected, multiple rearrangements targeted TP53 and RB1 (Fig. 4a, b), which were predicted to result in out-of-frame fusion proteins or loss of critical functional domains in the majority of cases. This indicated that TP53 and RB1 are disrupted by a variety of genetic mechanisms in LMS tumors. In accordance, careful examination of exome data additionally revealed protein-damaging microdeletions (20–100 base pairs (bp)), inversions, and exon skipping events (Fig. 4c and Supplementary Figure 3a–c). Furthermore, we identified three cases with pathogenic germline alterations affecting TP53 (hemizygous loss, n = 1) or RB1 (mutation, n = 2). Integration of SNVs, indels, CNAs, fusions, and microalterations demonstrated biallelic disruption of TP53 and RB1 in 92 and 94% of cases, respectively (Fig. 5a). Three tumors with wild-type RB1 displayed loss of CDKN2A expression and overexpression of CCND1 as alternative mechanisms of RB1 suppression (Fig. 5b).
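The q-value thresholds quoted above (q < 0.01 for MutSigCV, q < 0.25 for GISTIC2.0) refer to Benjamini-Hochberg-adjusted P-values. As a minimal sketch of how such an adjustment works, assuming a plain list of per-gene P-values, the following Python function computes the adjusted values. The gene names and P-values are hypothetical placeholders, and the code does not reproduce the internals of MutSigCV or GISTIC2.0.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-gene p-values, for illustration only (not from the study).
example_p = {"GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.5}
q = benjamini_hochberg(list(example_p.values()))
# Genes that would pass a q < 0.01 cut-off in this toy example.
significant = [gene for gene, q_value in zip(example_p, q) if q_value < 0.01]
```

Each P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw P-values increase.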

[Fig. 1 graphic. Panel a: matrix of SNVs/indels by gene (rows, with mutation frequency) and tumor (columns), with bar plots of samples (no.) and alterations (no.) and annotation tracks for alteration type, gender, age (years), and treatment. Panel b: positions of SNVs/indels along the amino acid sequences of TP53, RB1, and ATRX with their protein domains. Panel c: HotNet2 subnetworks, colored by MutSigCV P value (–log10).]


Finally, we detected a single loss-of-function mutation in the basic helix-loop-helix domain of MAX, previously described in hereditary pheochromocytoma16, that was associated with overexpression of CDK4 and CCND2 (Fig. 5b), possibly through enhanced formation of MYC-MAX heterodimers activating the CDK4 and CCND2 promoters or via disruption of the MAD-MAX repressor complex17, 18. These data showed that inactivation of TP53 and RB1 is near-obligatory in LMS.

Whole-genome duplication in LMS. The variant allele frequencies of SNVs and indels affecting TP53 and RB1 were congruent with tumor purity, establishing TP53 and RB1 inactivation as truncal events in LMS development (Fig. 5c). Further investigation of allele-specific copy number profiles revealed that 27 of 49 cases had undergone whole-genome duplication (WGD), resulting in an average ploidy close to 4 (Fig. 5a, d and Supplementary Data 1). In most cases, only mutant TP53 and RB1 were detectable irrespective of ploidy, suggesting that the respective wild-type alleles had been lost before WGD. Accordingly, the allele-specific copy number profiles for a primary tumor/metastasis pair demonstrated that the former had acquired TP53 and RB1 alterations with concomitant loss of wild-type chromosomes 17p and 13 (Fig. 5d). By comparison, the metastasis showed only minor differences regarding mutations and fusion transcripts, but had undergone WGD (Fig. 5d, Supplementary Figure 2c, and Supplementary Data 1), implying that tetraploidization was a progression event preceded by loss of wild-type TP53 and RB1. These data again indicated that LMS is driven by a perturbed tumor suppressor network (Fig. 5e), which gives rise to WGD and gross genomic instability, thereby accelerating tumor evolution, in the majority of cases19–21.

High frequency of ALT in LMS. To achieve replicative immortality, approximately 85% of cancers re-activate TERT expression22. The remaining 15% maintain telomere length via a telomerase-independent mechanism termed ALT, which appears to be particularly prevalent in cells of mesenchymal origin23–25. ALT has been correlated with loss of ATRX, a chromatin remodeling factor that incorporates histone variant H3.3 into telomeric and pericentromeric regions in complex with DAXX26–28. Our finding of recurrent ATRX alterations (SNVs, indels, CNAs; Fig. 1a, b and Supplementary Figure 1a) suggested that ALT might be a common feature of LMS. We therefore tested 49 patient samples for the presence of C-circles, extrachromosomal telomeric repeats that are hallmarks of ALT29. C-circles were detected in 38 of 49 samples (78%; Fig. 6a and Supplementary Data 1), the highest frequency of ALT reported to date for any tumor entity30. ALT-positive cells also display extensive telomere length heterogeneity, including the presence of very long telomeres31 and typically resulting in high telomere content. Quantitative PCR revealed a wide range of telomere content in both ALT-positive and ALT-negative LMS tumors, but no correlation between ALT status and telomere content, both absolute and relative to normal controls, indicating that telomere content is not a relevant marker for ALT in LMS (Fig. 6b). Since the frequency of ALT considerably exceeded that of potentially deleterious ATRX alterations (Figs. 1a, c and 6a, c), we investigated additional genes from the TelNet database and observed that LMS tumors are characterized by recurrent alterations in a broad spectrum of telomere maintenance genes (Fig. 6c). Of these, deletions of RBL2 (P = 0.008) and SP100 (P = 0.02) showed the strongest association with ALT positivity (P-values determined by Fisher exact test). RBL2 has been shown to block ALT by interacting with RINT132. SP100 has been implicated in ALT suppression by sequestering the MRE11/RAD50/NBS complex and is a major component of ALT-associated PML bodies33, 34. These data indicated that mechanisms beyond ATRX loss account for the exceptionally high frequency of ALT in LMS.

“BRCAness” as potentially actionable feature of LMS. Our finding of frequent deletions targeting genes implicated in HRR of DNA double-strand breaks (Fig. 2a), e.g. ATM, BRCA2, and PTEN35–37, prompted us to inquire if LMS tumors show genomic imprints of defective HRR, i.e. a “BRCAness” phenotype6, 38–40, which confers sensitivity to DNA double-strand break-inducing drugs, such as platinum derivatives, and poly(ADP-ribose) polymerase (PARP) inhibitors41. We first interrogated genes that have been described as synthetic lethal to PARP inhibition38, 42 and observed deleterious aberrations in multiple HRR components, including PTEN (57%), BRCA2 (53%), ATM (22%), CHEK1 (22%), XRCC3 (18%), CHEK2 (12%), BRCA1 (10%), and RAD51 (10%), as well as in members of the Fanconi anemia complementation groups, namely FANCA (27%) and FANCD2 (10%) (Fig. 7a). Next, we detected enrichment of five known mutational signatures6 (Alexandrov-COSMIC (AC) 1: clock-like, spontaneous deamination; AC3: associated with defective HRR; AC5: clock-like, mechanism unknown; AC6 and AC26: associated with mismatch repair (MMR) defects). Signature AC3 contributed to the mutational catalog in 98% of samples, and the confidence interval of the exposure to AC3 excluded zero in 57% of samples (Fig. 7b). Comparison of the signatures identified in the LMS cohort against a background of 7042 cancer samples (whole-genome sequencing, n = 507; whole-exome sequencing, n = 6535)6 demonstrated significant enrichment of AC1 (P = 1.31×10^-3), AC3 (P = 2.67×10^-30), and AC26 (P = 9.28×10^-41) in LMS tumors (P-values determined by Fisher exact test followed by Benjamini-Hochberg correction). Finally, clonogenic assays demonstrated that LMS cell lines harboring aberrations of multiple genes that are synthetic lethal to PARP inhibition (Supplementary Figure 4) responded to the PARP inhibitor olaparib in a dose-dependent manner, an effect that was enhanced by a pulse of cisplatin prior to continuous olaparib treatment (Fig. 7c). These data showed that most LMS tumors exhibit phenotypic traits of “BRCAness”, which might provide a rationale for therapies that target defective HRR.

Discussion

This study represents a comprehensive analysis of the genomic alterations that underlie the development of LMS, an aggressive and difficult-to-treat malignancy for which no targeted therapy

Fig. 1 Mutational landscape of adult LMS. a Frequency and type of mutations. Rows represent individual genes, columns represent individual tumors. Genes are sorted according to frequency of SNVs/indels (left). Asterisks indicate significantly mutated genes according to MutSigCV. Bars depict the number of SNVs/indels for individual tumors (top) and genes (right). Established cancer genes are shown in bold. Types of mutations and selected clinical features are annotated according to the color codes (bottom). b Schematic representation of SNVs/indels in TP53, RB1, and ATRX. Protein domains are indicated (Trans transactivation domain, SH3 Src homology 3-like domain, Tetra tetramerization domain, DUF3452 domain of unknown function, RB_A RB1-associated protein domain A, RB_B RB1-associated protein domain B, RB_C RB1-associated protein domain C, EZH2-ID EZH2 interaction domain, SNF2 N-ter SNF2 family N-terminal domain, Hel helicase domain). c Top subnetworks from HotNet2 analysis of genes harboring SNVs/indels. MutSigCV P-values (−log10) for individual genes are annotated according to the color code
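The associations between gene deletions and ALT positivity reported in the Results (for example, RBL2 and SP100) were assessed with the Fisher exact test. The per-gene contingency tables are not given in the text, so the following is only a sketch of the test itself, written with the Python standard library. The deletion counts in the usage example are hypothetical; only the split into 38 ALT-positive and 11 ALT-negative tumors follows the cohort described above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only:
# rows = ALT-positive (38 tumors) and ALT-negative (11 tumors),
# columns = gene deleted and gene not deleted.
p_value = fisher_exact_two_sided(20, 18, 1, 10)
```

The two-sided variant used here sums the probabilities of all tables no more likely than the observed one. Whether the study used a one-sided or two-sided test is not stated in this section.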


[Fig. 2 graphic. Panel a: frequency (%) of gains and losses along chromosomes 1–23, with affected genes labeled. Panel b: GISTIC2.0 q values for focal gains and focal losses by chromosome, with peak regions labeled. Panel c: log2 ratio (tumor vs. control read counts) against position (Mb) for chr 3, chr 9, chr 15, and chr 17.]

Fig. 2 Genomic imbalances in adult LMS. a Overall pattern of CNAs. Chromosomes are represented along the horizontal axis, frequencies of chromosomal gains (red) and losses (blue) are represented along the vertical axis. Established cancer genes (black) and components of the CINSARC signature (blue) affected by CNAs in at least 30% of cases are indicated. b GISTIC2.0 plot of recurrent focal gains (top) and losses (bottom). The green line indicates the cut-off for significance (q = 0.25). c Read-depth plots of case LMS24 showing oscillating CNAs of chromosomes 3, 9, 15, and 17 (red dotted lines), indicative of chromothripsis. Gray lines indicate centromeres. Mb megabase, chr chromosome
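The oscillating copy-number pattern described for Fig. 2c can be illustrated with a small heuristic: many switches between only a few copy-number states along one chromosome. The sketch below assumes that segment-level integer copy-number states are already available. The thresholds `min_switches` and `max_states` are illustrative defaults and not the criteria used in the study, which are not given in this section.

```python
def count_state_switches(segment_states):
    """Count changes between consecutive copy-number states along a chromosome."""
    return sum(
        1 for prev, cur in zip(segment_states, segment_states[1:]) if prev != cur
    )


def looks_oscillating(segment_states, min_switches=10, max_states=3):
    """Heuristic flag: many switches between only a few copy-number states.

    Both thresholds are assumptions chosen for illustration.
    """
    few_states = len(set(segment_states)) <= max_states
    many_switches = count_state_switches(segment_states) >= min_switches
    return few_states and many_switches


# Hypothetical segment states, for illustration only.
shattered = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]   # alternates between two states
stepwise = [2, 2, 3, 3, 4, 4]                   # gradual gain, few switches
flags = (looks_oscillating(shattered), looks_oscillating(stepwise))
```

A real call would also need to consider breakpoint clustering and other criteria. This sketch only captures the oscillation aspect visible in read-depth plots.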


[Fig. 3 graphic. Panel a: heatmap of normalized read counts for subgroups SG1, SG2, and SG3 (rows, genes; columns, tumors). Panel b: structural variant plots of fusion transcripts for LMS21, LMS27, and LMS44. Panel c: number of fusion events per chromosome (interchromosomal and intrachromosomal), per tumor, and per gene.]

Fig. 3 Transcriptomic characterization of adult LMS. a Unsupervised hierarchical clustering based on the top 100 differentially expressed genes showing separation of tumors into three subgroups (SG1–3; dendrogram colors green, brown, and magenta). The heatmap displays normalized read count values for individual genes, which were centered, scaled (z-score), and quantile-discretized. b Structural variant plots of fusion transcripts in three tumors identified by TopHat2 and validated by RT-PCR (blue, intrachromosomal; red, interchromosomal) or visual inspection using Integrative Genomics Viewer (gray). Numbers in parentheses indicate the number of fusions involving the respective gene. c Number of fusion events per chromosome (left), tumor (middle), and gene (right). chr, chromosome
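The heatmap preprocessing named in the Fig. 3 legend (selecting the most variable genes, then centering and scaling each gene to a z-score) can be sketched as follows. This assumes a simple mapping from gene name to normalized read counts across tumors. The gene names and values are hypothetical, the quantile-discretization step is omitted, and the study's exact normalization is not specified in this section.

```python
from statistics import mean, pstdev, pvariance


def top_variable_genes(expression, k):
    """Return the k gene names with the highest variance across tumors."""
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


def z_score(values):
    """Center values to mean 0 and scale to unit standard deviation."""
    mu, sd = mean(values), pstdev(values)
    # A gene with no variation is mapped to zeros instead of dividing by zero.
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


# Hypothetical normalized read counts for three tumors (illustration only).
expression = {
    "GENE_A": [1, 5, 9],
    "GENE_B": [4, 4, 4],
    "GENE_C": [2, 3, 4],
}
selected = top_variable_genes(expression, k=2)
scaled = {gene: z_score(expression[gene]) for gene in selected}
```

The scaled rows are what a hierarchical clustering routine would then receive as input. In the study this selection used the top 100 genes.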


[Fig. 4 graphic. Panel a: structural variant plots of fusion partners of TP53 and RB1. Panel b: "TP53: translocation and fusion with TCERG1" (LMS44_TS; breakpoints within chr5:145,843,633–145,846,882 and chr17:7,589,422–7,592,671) and "RB1: fusion with ATP8A2" (LMS45_TS; breakpoints within chr13:26,271,145–26,275,487 and chr13:48,936,033–48,940,375). Panel c: schematics of wild-type TP53 and RB1 alongside microdeletion, distal inversion, intragenic inversion, and exon skipping events.]

Fig. 4 Genetic lesions targeting TP53 and RB1 in adult LMS. a Structural variant plots of all fusion transcripts involving TP53 and RB1 detected in 37 tumors. b Interchromosomal rearrangement resulting in a non-functional TP53-TCERG1 fusion transcript in case LMS44 (top) and intrachromosomal rearrangement resulting in a non-functional RB1-ATP8A2 fusion transcript in case LMS45 (bottom). TS transcriptome sequencing, chr chromosome. c Schematic representation of different genetic lesions targeting TP53 and RB1. e exon, i intron, chr chromosome, UTR untranslated region

exists. Our findings not only advance current insights into the molecular basis of LMS, which are primarily based on lower-resolution microarray analyses and targeted sequencing of selected cancer genes, but may also have tangible clinical implications.

We observed that widespread genomic imbalances, in particular chromosomal losses affecting tumor suppressor genes such as TP53, RB1, and PTEN, are a hallmark of LMS, in keeping with previous studies3, 43, 44. However, an unexpected finding in this study was the very high frequency of biallelic TP53 and RB1 inactivation in LMS tumors. While it has been known that patients with Li-Fraumeni syndrome or hereditary retinoblastoma, which are associated with germline defects in TP53 and RB1, respectively, have an increased risk for developing LMS as secondary malignancy45, 46, the frequency of TP53 and RB1 disruption in sporadic LMS was reported to be in the range of 50% or lower3, 43, 44. In our study, whole-exome and transcriptome sequencing enabled the discovery that TP53 and RB1 are targeted by diverse genetic mechanisms (SNVs, indels, CNAs, chromosomal rearrangements, and microalterations (e.g. novel deletions affecting the TP53 transcription start site)) in more than 90% of cases, establishing biallelic TP53 and RB1 inactivation as a unifying feature of LMS development. In addition to providing an exhaustive picture of the tumor suppressor landscape of LMS, our data identify chromothripsis and WGD, crucial events in the pathogenesis of various cancers47, 48, as previously unrecognized manifestations of genomic instability in this disease.

Our findings indicate that LMS cells primarily rely on ALT to overcome replicative mortality. However, the high prevalence of ALT in LMS (78% in our cohort) cannot be explained by the frequency of potentially deleterious ATRX alterations observed by us and others (49% and 16–26% of cases, respectively)4, 49, 50. In conjunction with the continuously growing list of putative telomere maintenance genes51, our comprehensive catalog of genomic and transcriptomic alterations in LMS tumors provides an opportunity to select novel candidate drivers of ALT, such as RBL2 and SP100, for future functional and mechanistic investigations.

Treatment of advanced-stage soft-tissue sarcoma, including LMS, is difficult, and for more than 30 years, doxorubicin, ifosfamide, and dacarbazine were the only active drugs in this setting. Additional agents have been tested, including gemcitabine, taxanes, trabectedin, pazopanib, and eribulin. However, none has proven superior to doxorubicin, and molecularly guided therapeutic strategies remain elusive52. Very recent data indicate that the anti-PDGFRA antibody olaratumab in combination with doxorubicin may improve survival, but these results await confirmation from phase 3 clinical trials53. We have found that most LMS tumors exhibit genomic “scarring” suggestive of impaired HRR of DNA double-strand breaks, which might represent a suitable target for therapeutic intervention through repositioning of small-molecule PARP inhibitors38. Given that the concept of “BRCAness” was primarily introduced in BRCA1/2-deficient epithelial cancers, further mechanistic evaluation of the HRR pathway and, most importantly, genomics-guided clinical trials in LMS patients will be necessary to formally establish whether a “BRCAness” phenotype confers sensitivity to these drugs as in breast, ovarian, and prostate cancer. However, preclinical observations54 as well as preliminary data from a phase 1b trial of olaparib and trabectedin in unselected patients with relapsed bone and soft-tissue sarcomas (Grignani et al., ASCO Annual Meeting, 2016) suggest that this might be the case.

Apart from defective HRR, our analysis revealed additional leads for investigations into genetic alterations or deregulated cellular processes that might be exploited for therapeutic benefit.


For example, blockade of the PI3K-AKT-mTOR axis might be effective in LMS tumors (57% in our cohort) harboring PTEN alterations55, 56. Furthermore, it has been shown that ALT renders cancer cells sensitive to ATR inhibitors57. Amplifications of TOP3A (28%), BLM (12%), and DNMT1 (12%) may provide a basis for the combinatorial use of the respective inhibitors with chemotherapeutics or other targeted agents58–60. A recent study reported that the BLM DNA helicase drives an aggravated ALT phenotype in the absence of FANCD2 and FANCA61, suggesting that BLM inhibition59 may provide a means to target FANCD2- and FANCA-deficient LMS cells. Finally, DNA methyltransferase inhibitors enhance the cytotoxic effect of PARP inhibition in

[Fig. 5 graphic. Panel a: integral copy number and alteration status of TP53 and RB1 per tumor (categories include hemizygous loss, rearrangement, small deletion, somatic mutation, germline deletion/mutation, microdeletion, LOH, CNN-LOH, higher-ploidy LOH, biallelic alteration, CDKN2A deletion or loss of expression/CCND1 overexpression, MAX mutation), with WGD and transcriptome sequencing (TS) tracks. Panel b: CDKN2A expression (FPKM) against CCND1 expression (FPKM) by RB1 status, and CDK4 expression (FPKM) against CCND2 expression (FPKM) by MAX status. Panel c: TP53 and RB1 variant allele frequency against tumor purity. Panel d: allele-specific copy number along chromosomes 1–22 and X for "LMS17 (primary tumor) - WGD absent" (ploidy: 1.75, aberrant cell fraction: 86%, goodness of fit: 98.8%) and "LMS21 (metastasis) - WGD present" (ploidy: 3.03, aberrant cell fraction: 84%, goodness of fit: 96.9%). Panel e: pathway schematic linking CDKN2A (p16INK4a, p14ARF), MAX/MAD, CDK4, CCND1, CCND2, MDM2, RB1 (94%), TP53 (92%), E2F1, PI3K, PTEN (57%), PIP2/PIP3, PDK1, AKT, MTORC1, and RPS6K1 to growth arrest, apoptosis, proliferation, and protein synthesis.]

cancer cells62, suggesting a mechanism-based strategy for combination therapy of LMS tumors with BRCAness, a subset of which are characterized by DNMT1 copy number gains that may increase PARP binding to chromatin.

In summary, this comprehensive genomic and transcriptomic analysis has unveiled that LMS is characterized by substantial mutational heterogeneity, genomic instability, universal inactivation of TP53 and RB1, and frequent WGD. Furthermore, we have established that most LMS tumors rely on ALT to escape replicative senescence, and identified recurrent alterations in a broad spectrum of telomere maintenance genes. Finally, our findings uncover “BRCAness” as a potentially actionable feature of LMS tumors, and provide a rich resource for guiding future investigations into the mechanisms underlying LMS development and the design of novel therapeutic strategies.

Methods

Patient samples. For whole-exome and transcriptome sequencing, fresh-frozen tumor specimens and matched normal control samples (Supplementary Data 1) were collected from 49 adult patients who had been diagnosed with LMS according to World Health Organization criteria at four German cancer centers (NCT Heidelberg and Heidelberg University Hospital, Heidelberg; Mannheim University Medical Center, Mannheim; West German Cancer Center, Essen; Eberhard Karls University Hospital, Tübingen). Specimens were obtained from different anatomic sites, and the cohort included both treatment-naïve and previously treated patients (Supplementary Data 1). Samples were pseudonymized, and tumor histology and cellularity were assessed at the Institute of Pathology, Heidelberg University Hospital, prior to further processing. Twelve cases were excluded from transcriptome sequencing due to insufficient quantity and/or quality of RNA. Patient samples were obtained under protocol S-206/2011, approved by the Ethics Committee of Heidelberg University, with written informed consent from all human participants. This study was conducted in accordance with the Declaration of Helsinki.

Cell lines. SK-UT-1, SK-UT-1B, and MES-SA cells were purchased from American Type Culture Collection. SK-LMS-1 cells were provided by Sebastian Bauer (West German Cancer Center, Essen). Cell line identity and purity were verified using the Multiplex Cell Authentication and Contamination Tests (Multiplexion). All cell lines were regularly tested for mycoplasma contamination using the Venor GeM Mycoplasma Detection Kit (Minerva). Cell lines were cultured as follows: SK-LMS-1 in RPMI-1640 (Life Technologies), 15% FBS; SK-UT-1 and SK-UT-1B in MEM (Life Technologies), 10% FBS; MES-SA in McCoy’s medium, 10% FBS. All media were supplemented with 1% penicillin/streptomycin and 1% L-glutamine (Biochrom).

Isolation of analytes. DNA and RNA from tumor specimens and DNA from control samples were isolated at the central DKFZ-HIPO Sample Processing Laboratory using the AllPrep DNA/RNA/Protein Mini Kit (Qiagen), followed by quality control and quantification using a Qubit 2.0 Fluorometer (Invitrogen) and a 2100 Bioanalyzer system (Agilent).

Whole-exome sequencing. Exome capturing was performed using SureSelect Human All Exon V5+UTRs in-solution capture reagents (Agilent). Briefly, 1.5 μg genomic DNA were fragmented to 150–200 bp insert size with a Covaris S2 device, and 250 ng of Illumina adapter-containing libraries were hybridized with exome baits at 65 °C for 16 h. Paired-end sequencing (2×101 bp) was carried out with a HiSeq 2500 instrument (Illumina).

Mapping of whole-exome sequencing data. Reads were mapped to the 1000 Genomes Phase 2 assembly of the Genome Reference Consortium (build 37, version hs37d5) using BWA (version 0.6.2) with default parameters and maximum insert size set to 1000 bp. BAM files were sorted with SAMtools (version 0.1.19), and duplicates were marked with Picard tools (version 1.90). Sequencing coverages and additional quality parameters are summarized in Supplementary Data 1.

Whole-genome sequencing. Whole-genome sequencing libraries were prepared using the TruSeq Nano Library Preparation Kit (Illumina) according to the manufacturer’s instructions. Paired-end sequencing (2×151 bp) was carried out with a HiSeq X instrument (Illumina).

Mapping of whole-genome sequencing data. Reads were mapped to the 1000 Genomes Phase 2 assembly of the Genome Reference Consortium human genome (build 37, version hs37d5) using BWA mem (version 0.7.8) with option -T 0. BAM files were sorted with SAMtools (version 0.1.19)63, and duplicates were marked with Picard tools (version 1.125) using default parameters.

Transcriptome sequencing. RNA sequencing libraries were prepared using the TruSeq RNA Sample Preparation Kit v2 (Illumina), normalized to 10 nM, pooled to 11-plexes, and clustered on a cBot system (Illumina) to a final concentration of 10 pM with a spike-in of 1% PhiX Control v3 (Illumina). Paired-end sequencing (2×101 bp) was carried out with a HiSeq 2000 instrument (Illumina).

Mapping of transcriptome sequencing data. RNA sequencing reads were mapped with STAR (version 2.3.0e)64. For building the index, the 1000 Genomes reference sequence with GENCODE version 17 transcript annotations was used. For alignment, the following parameters were used: alignIntronMax 500,000, alignMatesGapMax 500,000, outSAMunmapped Within, outFilterMultimapNmax 1, outFilterMismatchNmax 3, outFilterMismatchNoverLmax 0.3, sjdbOverhang 50, chimSegmentMin 15, chimScoreMin 1, chimScoreJunctionNonGTAG 0, chimJunctionOverhangMin 15. The output was converted to sorted BAM files with SAMtools, and duplicates were marked with Picard tools (version 1.90).

Detection of SNVs and small indels. Somatic SNVs were detected from matched tumor/normal pairs with our in-house analysis pipeline based on SAMtools mpileup and bcftools with parameter adjustments and using heuristic filtering as previously described65. In brief, SAMtools (version 0.1.19) mpileup was called on the tumor BAM file with parameters -RE -q 20 -ug to consider only reads with a minimum mapping quality of 20 and bases with a minimum base quality of 13. The output was piped to BCFtools (version 0.1.19) view, which, by using parameters -vcgN -p 2.0, reports all positions containing at least one high-quality non-reference base. From these initial SNV calls, the ones with at least five variant reads and a variant allele frequency of at least 5% were retained. Any variant call that was supported by reads from only one strand was discarded if one of the Illumina-specific error profiles occurred in a sequence context of ±10 bases around the SNV. For categorizing variants as germline or somatic, a pileup of the bases in the matched control sample was generated for each SNV position by SAMtools mpileup with parameters -Q 0 -q 1, considering uniquely mapping reads and not putting a restriction to base quality. For high-confidence somatic SNVs, the coverage at the position in the control must be at least ten, and less than 1/30 of the control bases may support the variant observed in the tumor. Variants that were located in regions of low mappability or overlapped with entries of the

Fig. 5 Biallelic inactivation of TP53 and RB1 and whole-genome duplication (WGD) in adult LMS. a Combined analysis of genetic lesions and allele-specific copy number showing frequent biallelic inactivation of TP53 and RB1. In the top panels, samples are plotted from left to right based on their copy number composition, and genetic lesions specific for the A and B alleles as well as the presence or absence of WGD are annotated according to the color code. Asterisks indicate cases with either loss of CDKN2A expression in combination with CCND1 overexpression or MAX mutation. In the bottom panels, allele-specific integral copy numbers are plotted. Cases with retention of a single allele are assigned to the loss-of-heterozygosity (LOH) group, cases with one or more alleles derived from the same parental allele are assigned to the copy number-neutral (CNN) or higher-ploidy LOH groups, and cases with different combinations of maternal and paternal alleles are assigned to the normal or biallelic alteration group, respectively. TS transcriptome sequencing. b Scatter plots showing expression of CDKN2A and CCND1 in cases with wild-type and aberrant RB1 (left) and expression of CDK4 and CCND2 in cases with wild-type and mutant MAX (right). FPKM fragments per kilobase of transcript per million mapped reads. c Scatter plots showing congruency of TP53 and RB1 variant allele frequencies with tumor purity as detected by allele-specific copy number analysis. d Allele-specific copy-number profiles for a primary tumor/metastasis pair showing absence of WGD in the primary tumor (top) and presence of WGD in the metastasis (bottom). Chromosomes are represented along the horizontal axis, copy numbers are indicated along the vertical axis. The purple line indicates the total allele-specific copy number. The blue line indicates the minor allele-specific copy number. e Genes involved in cell cycle regulation or PI3K-AKT-mTOR signaling recurrently affected by genetic alterations in LMS tumors. Blue and red boxes denote genes with inactivating and activating lesions, respectively. Percentage values indicate the collective frequencies of SNVs, indels, CNAs, fusions, microalterations, and aberrant expression affecting the respective genes.
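The grouping rule stated in the caption of panel a can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the `major`/`minor` arguments, and the label for the case in which no allele is retained are assumptions introduced here, and "CNN-LOH" is read as exactly two copies of one parental allele.

```python
def copy_number_group(major: int, minor: int) -> str:
    """Map allele-specific integral copy numbers to a caption group.

    major/minor are the copy numbers of the more and less abundant
    parental allele (hypothetical argument names).
    """
    if minor < 0 or major < minor:
        raise ValueError("expected 0 <= minor <= major")
    if major == 0:
        # Assumption: this case is not one of the groups named in the caption.
        return "no allele retained"
    if minor == 0:
        if major == 1:
            return "LOH"  # a single allele is retained
        if major == 2:
            return "CNN-LOH"  # two copies of the same parental allele
        return "higher-ploidy LOH"  # more than two copies of one parental allele
    # Both parental alleles are present; the caption separates "normal" from
    # "biallelic alteration" by the lesions found, which is not modeled here.
    return "normal or biallelic alteration"
```

For instance, `copy_number_group(2, 0)` falls into the CNN-LOH group, whereas `copy_number_group(2, 1)` retains both parental alleles.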


[Figure 6 graphic, panels a–c. a C-circle dot blots for U2OS, HeLa, and LMS tumor samples with (+pol) and without (−pol) polymerase, marked as ALT-positive or ALT-negative. b Relative telomere content ((T/S) tumor vs. (T/S) control, log2 ratio) and telomere content of tumor sample (T/S) across LMS tumor samples, by ALT status. c Oncoprint of alterations (missense mutation, 3′-UTR mutation, stop-gain mutation, insertion/deletion, hemizygous deletion, homozygous deletion, amplification) and ALT status (positive, negative, n.d.) per tumor, with gene frequencies: ATRX 49%, RBL2 35%, RPA1 35%, TOP3A 33%, SP100 31%, RIF1 29%, TERF2IP 27%, TERF2 24%, TERT 24%, MRE11A 22%, TPP1 22%, ASF1A 20%, CBX3 20%, BLM 18%, DNMT1 18%, ERCC4 18%, HMBOX1 18%, PIN1 18%, SUMO1 18%, UBE2I 18%, HDAC7 16%, RINT1 16%, RPA2 16%, TINF2 16%, CDKN1A 14%, PARP2 14%, PML 14%, POT1 12%, DAXX 12%, FEN1 12%, HMGN5 12%, PCNA 12%, XRCC6 12%, ASF1B 10%, SPEN6 10%, TEN1 10%, SUMO2 8%, TERF1 8%, H3F3A 6%, NSMCE2 4%.]

Fig. 6 High frequency of alternative lengthening of telomeres (ALT) in adult LMS. a Detection of C-circles in LMS tumors and control cell lines (U2OS, positive control; HeLa, negative control). Shown are test samples (top row) and control samples (bottom row). ALT-positive samples, as inferred from the enriched C-circle signal, are indicated in red. ALT-negative samples are indicated in blue. +pol, with polymerase; −pol, without polymerase. b Measurement of telomere content in LMS tumors. Telomere quantitative PCR was performed on tumor and matched control samples, and telomere repeat signals were normalized to a single-copy gene (36B4; T/S ratio). Shown are the telomere contents of tumor samples relative to those of control samples (left) and the absolute telomere contents of tumor samples (right). c Recurrent alterations in telomere maintenance genes in LMS tumors. Rows represent individual genes, columns represent individual tumors. Genes are sorted according to frequency of SNVs, indels, and CNAs (left). Bars depict the number of alterations for individual tumors (top) and genes (right). Types of alterations and ALT status are annotated according to the color codes (bottom). UTR untranslated region; n.d. not determined.
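The two readouts summarized in this caption reduce to simple arithmetic, sketched below in Python. The relative telomere content is the log2 ratio of the tumor T/S ratio to the matched control T/S ratio, and the ALT call uses the thresholds given in the Methods (signal at least twofold above the reaction without polymerase and at least threefold above background). The function and argument names are hypothetical, and the inputs are assumed to be already quantified T/S ratios and dot-blot intensities.

```python
import math


def relative_telomere_content(ts_tumor: float, ts_control: float) -> float:
    """log2 of the tumor T/S ratio over the matched control T/S ratio."""
    if ts_tumor <= 0 or ts_control <= 0:
        raise ValueError("T/S ratios must be positive")
    return math.log2(ts_tumor / ts_control)


def is_alt_positive(with_pol: float, without_pol: float, background: float) -> bool:
    """C-circle call: >= 2x the no-polymerase control and >= 3x background."""
    return with_pol >= 2 * without_pol and with_pol >= 3 * background
```

A positive log2 ratio indicates higher telomere content in the tumor than in the matched control, and a negative value indicates lower content.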

hiSeqDepthTopPt1Pct track from the UCSC Genome Browser, Encode DAC Blacklisted Regions, or Duke Excluded Regions were excluded. High-confidence SNVs were also not allowed to overlap with any two of the following features at the same time: tandem repeats, simple repeats, low complexity, satellite repeats, or segmental duplications. After annotation with RefSeq (version September 2013) using ANNOVAR, somatic, non-silent coding variants of high confidence were selected except for the analysis of mutational signatures, where all high confidence, including non-coding and silent, somatic variants were used. Small indels were identified by Platypus (version 0.5.2; parameters: genIndels = 1, genSNPs = 0, ploidy = 2, nIndividuals = 2) by providing matched tumor and control BAM files.
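The read-support thresholds given for high-confidence somatic SNVs in this section (at least five variant reads, a variant allele frequency of at least 5%, control coverage of at least ten, and fewer than 1/30 of control bases supporting the variant) can be written as a small predicate. The Python sketch below is illustrative only: the `SnvCandidate` container and its field names are assumptions, and the strand-bias, mappability, and repeat filters described in the text are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SnvCandidate:
    """Read counts at one candidate position (hypothetical container)."""
    tumor_variant_reads: int
    tumor_depth: int
    control_variant_reads: int
    control_depth: int


def is_high_confidence_somatic(c: SnvCandidate) -> bool:
    """Apply only the read-support thresholds quoted in the Methods text."""
    if c.tumor_depth <= 0 or c.control_depth < 10:
        return False  # control coverage must be at least ten
    vaf = c.tumor_variant_reads / c.tumor_depth
    if c.tumor_variant_reads < 5 or vaf < 0.05:
        return False  # at least five variant reads and a VAF of at least 5%
    # Fewer than 1/30 of the control bases may support the variant;
    # the comparison is done on integers to avoid rounding issues.
    return c.control_variant_reads * 30 < c.control_depth
```

With these thresholds, a call with 8 of 60 tumor reads and 1 of 40 control reads passes, whereas the same call with 2 of 40 control reads does not.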

[Figure 7 graphic, panels a–c. a Oncoprint of alterations (missense mutation, 3′-UTR mutation, hemizygous deletion, homozygous deletion, amplification) and treatment history (chemotherapy, radiation, chemotherapy + radiation, untreated, unknown) per tumor, with gene frequencies: PTEN 57%, BRCA2 55%, USP11 37%, CDK1 35%, FANCA 27%, PNKP 27%, STK36 27%, LIG1 24%, XRCC3 24%, ATM 22%, CHEK1 22%, MAPK12 18%, RAD51 18%, SHFM1 18%, XRCC1 18%, CHEK2 16%, FANCC 16%, TSSK3 16%, CDK5 14%, FANCD2 14%, XRCC2 14%, BRCA1 12%, NAMPT 12%, RAD54B 12%, XAB2 12%, RAD54L 10%, ATR 8%, DDB1 8%, RAD18 8%, BAP1 6%, PALB2 6%, NBN 4%. b Exposure (no. of SNVs) per tumor for the total and for signatures AC1, AC3, AC5, AC6, and AC26. c Clonogenic assays for SK-LMS-1, SK-UT-1, SK-UT-1B, and MES-SA: untreated (UT), olaparib (1–5 μM), and cisplatin (5 μM) + olaparib (1–5 μM).]


To be considered high confidence, somatic calls (control genotype 0/0) were required to either have the Platypus filter flag PASS or pass custom filters allowing for low variant frequency using a scoring scheme. Candidates with the badReads flag, alleleBias, or strandBias were discarded if the variant allele frequency was <10%. Additionally, combinations of Platypus non-PASS filter flags, bad quality values, low genotype quality, very low variant counts in the tumor, and presence of variant reads in the control were not tolerated. Indels were annotated with ANNOVAR, and somatic high-confidence indels falling into a coding sequence or splice site were extracted. SNVs and indels in LMS cell lines were called without matched control. In addition to filters and annotations described above, calls were further filtered for variants found in ExAC (version 0.3.1; http://exac.broadinstitute.org) with allele frequencies >0.0001, in the 1000 Genomes Phase 3 of the Genome Reference Consortium human genome with allele frequencies >0.001, and in our in-house control data set (n = 655) in more than 5% of the samples. Oncoprints integrating information on SNVs, indels, and CNAs were generated using the R package ComplexHeatmap66.

Supervised analysis of mutational signatures. Using the package YAPSA (Yet Another Package for Signature Analysis)67, a linear combination decomposition of the mutational catalog with predefined signatures from the COSMIC database (http://cancer.sanger.ac.uk/cosmic/signatures, downloaded in June 2016) was computed by non-negative least squares (NNLS). Prior to decomposition, the mutational catalog was corrected for different occurrences of the triplet motifs between the whole genome and the target capture regions used for whole-exome sequencing (function normalizeMotifs_otherRownames() from YAPSA). To increase specificity, the NNLS algorithm was applied twice; after the first execution, only those signatures whose exposures, i.e. contributions in the linear combination, were higher than a certain cut-off were kept, and the NNLS was run again with the reduced set of signatures. As the detectability of different signatures may vary, the following signature-specific cut-offs were determined in a receiver operating characteristic analysis using publicly available data on mutational catalogs of 7042 cancers (whole-genome sequencing, n = 507; whole-exome sequencing, n = 6535)6 and mutational signatures from COSMIC: AC1, 0; AC2, 0.03404847; AC3, 0.139839; AC4, 0.02281439; AC5, 0; AC6, 0.003660315; AC7, 0.02841319; AC8, 0.1870989; AC9, 0.0953648; AC10, 0.0164065; AC11, 0.08238725; AC12, 0.1920715; AC13, 0.03769936; AC14, 0.03080224; AC15, 0.03182855; AC16, 0.3553548; AC17, 0.004075963; AC18, 0.2692715; AC19, 0.04038686; AC20, 0.05066134; AC21, 0.04219805; AC22, 0.03908793; AC23, 0.03900049; AC24, 0.04254174; AC25, 0.02448377; AC26, 0.02830282; AC27, 0.02223076; AC28, 0.0315642; AC29, 0.07392201; AC30, 0.06332517. The cut-offs are also stored in the R package YAPSA and can be retrieved with the following R code: library(YAPSA); data(cutoffs); cutoffCosmicValid_rel_df[6,]. Confidence intervals were computed using the concept of profile likelihoods. Likelihoods were computed from the distribution of the residuals after NNLS decomposition (initial model of the data). To compute the confidence interval of a given signature, the exposure to this signature was perturbed and fixed as compared to the initial model, and the exposures to the remaining signatures computed again by NNLS, yielding an alternative model with one degree of freedom less. Likelihoods were again computed from the distribution of the residuals of the alternative model. Next, a likelihood ratio test for the log-likelihoods of the initial and alternative models was computed, yielding a test statistic and a P-value for the perturbation. To compute the limits of 95% confidence intervals, the perturbations corresponding to P-values of 0.05/2 = 0.025 (two-sided likelihood ratio test) were computed by a Gauss−Newton method (R package pracma). The set of mutational signatures extracted from the LMS cohort was compared to the set of mutational signatures extracted from a background of 7042 cancer samples (whole-genome sequencing, n = 507; whole-exome sequencing, n = 6535)6 by Fisher exact tests and subsequent correction for multiple comparisons according to the Benjamini−Hochberg method.

Detection of germline variants. For TP53 and RB1, non-silent coding variants and splice-site mutations with read support in the matched normal control were filtered for single-nucleotide polymorphisms (SNPs) recorded in the 1000 Genomes Phase 2 assembly of the Genome Reference Consortium human genome with allele frequencies >0.001 or in ExAC (version 0.3.1) with allele frequencies >0.0001 and were visually inspected using Integrative Genomics Viewer to rule out sequencing artifacts.

Identification of driver mutations. Variant Call Format files were processed in combination with the whole-exome sequencing capture design BED file using an in-house pipeline that determines the recurrence of gene-specific mutations and scores the different possibilities of mutations per gene. Average gene expression levels were determined based on the previously calculated fragments per kilobase of transcript per million mapped reads values, which were calculated using Cufflinks68. The resulting files were used as input for MutSigCV7 and processed using default parameters. P-values were corrected for multiple hypothesis testing using the Benjamini−Hochberg procedure, and genes with q < 0.01 were considered significantly mutated.

Identification of significantly mutated gene networks. Network analysis was performed using HotNet2 (version 1.0.1)9, and the global interaction network (HINT+HI2012) was retrieved from the HotNet2 website (http://compbio-research.cs.brown.edu/pancancer/hotnet2). For each node (gene) in the global network, the −log10 P-value from MutSigCV served as the initial heat, which diffuses to adjacent nodes through edges (known interactions) with a weight δ > 0.008450441. Areas accumulating more heat were identified as subnetworks, and significantly mutated subnetworks were determined based on a two-stage multiple hypothesis test69 and 100 permutations of the global interaction network. Significant subnetworks (P < 0.05) were visualized with Cytoscape (version 2.6.2).

Detection of DNA CNAs. For LMS patient samples, copy numbers were estimated from exome data using read-depth plots and an in-house pipeline using VarScan2 copynumber and copyCaller modules. Regions were filtered for unmappable genomic stretches, merged by requiring at least 70 markers per called copy number event, and annotated with RefSeq genes using BEDTools. High-resolution CNA profiles were generated with CNVsvd (manuscript in preparation), which determines the total number of fragments from non-overlapping 250-bp windows based on the whole-exome sequencing capture design. Systematic variance introduced by sequence context or sequencing technology bias was captured through analysis of a reference data set, i.e. all normal controls with sufficient quality statistics, and these estimated local variance components were subsequently used to attenuate systematic variance in all sequenced specimens, including controls. Finally, normalized fragment count statistics were used to estimate CNA profiles. Segmentation was performed with PSCBS70, segmentation files and windows used for CNA estimation were converted to a compound segmentation file and marker files that were used as input for GISTIC2.013, and processing was performed with default parameters. For LMS cell lines, copy numbers were estimated from whole-genome sequencing data using allele-specific copy number estimation from sequencing (ACEseq, manuscript in preparation), which employs tumor coverage and BAF and also estimates tumor cell content and ploidy. Allele frequencies were obtained during pre-processing of whole-genome sequencing data for all SNPs recorded in dbSNP (build 135), and positions with BAF values between 0.1 and 0.9 in the tumor were assumed to be heterozygous in the germline. To improve sensitivity for the detection of allelic imbalances, heterozygous and homozygous SNPs were phased with IMPUTE (version 2)71. In addition, the coverage for 10-kilobase (kb) windows with sufficient mapping quality and read density in an in-house control was recorded for the tumor and corrected for GC content- and replication timing-dependent coverage bias. The genome was segmented using the R package PSCBS70, and segments were clustered according to coverage ratios and BAF values using k-means clustering. The R package mclust was used to determine the optimal number of clusters based on the Bayesian information criterion. Small segments (<9 kb) were attached to the more similar neighbor. Finally, tumor cell content and ploidy of a sample were estimated by fitting different tumor cell content and ploidy combinations to the data. Segments with balanced BAF values were fitted to even-numbered copy number states, whereas unbalanced segments could also be fitted to uneven copy numbers. Finally, estimated tumor cell content and ploidy values were used to compute the total and allele-specific copy number for each segment.

Analysis of allele-specific copy number and tumor purity. Allele-specific copy number profiles and tumor purity of LMS patient samples were analyzed with ASCAT72 and Sequenza73. Input files for ASCAT were generated using an in-house algorithm that extracts fragment counts from tumor and matched normal BAM files at positions listed in dbSNP (build 137), and only sufficiently covered regions with >10 fragments and fragments with an alignment score >30 were considered. For further analysis, SNPs heterozygous in normal samples were used, and allele-specific copy number profiles for matched tumor samples were determined with standard parameters. For Sequenza, standard guidelines as specified in the reference manual were used. In the majority of cases, allele-specific copy numbers and tumor purity estimates were nearly congruent between ASCAT and Sequenza. For

Fig. 7 Evidence for BRCAness in adult LMS. a Alterations in genes reported as synthetic lethal to PARP inhibition. Rows represent individual genes, columns represent individual tumors. Genes are sorted according to frequency of SNVs, indels, and CNAs (left). Bars depict the number of alterations for individual tumors (top) and genes (right). Types of alterations and treatment history are annotated according to the color codes (bottom). UTR untranslated region. b Contribution of mutational signatures to the overall mutational load in LMS tumors. Each bar represents the number of SNVs explained by the respective mutational signature in an individual tumor. Error bars represent 95% confidence intervals. AC Alexandrov-COSMIC. c Clonogenic assays showing dose-dependent sensitivity of LMS cell lines to continuous olaparib treatment (1–5 µM) with or without prior exposure to a 2-h pulse of cisplatin (5 µM). UT untreated.
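The selection step between the two NNLS passes described in the Methods can be illustrated with a short Python sketch. It assumes that exposures from a first pass are already available and that the signature-specific cut-offs apply to relative exposures (suggested by the object name cutoffCosmicValid_rel_df, but not stated explicitly in the text). The function name is hypothetical, only the cut-offs for the five signatures shown in panel b are included, and the NNLS fit itself is not reproduced here.

```python
# Cut-offs quoted in the Methods for the five signatures shown in panel b.
CUTOFFS = {
    "AC1": 0.0,
    "AC3": 0.139839,
    "AC5": 0.0,
    "AC6": 0.003660315,
    "AC26": 0.02830282,
}


def signatures_to_keep(exposures: dict, cutoffs: dict = CUTOFFS) -> list:
    """Return signatures whose relative exposure exceeds their cut-off.

    exposures maps signature name -> number of SNVs attributed in a first
    pass (assumed input); the survivors would enter the second NNLS run.
    """
    total = sum(exposures.values())
    if total <= 0:
        return []
    return [name for name, value in exposures.items()
            if value / total > cutoffs.get(name, 0.0)]
```

For a first-pass result of 50, 10, and 40 SNVs for AC1, AC3, and AC5, the relative exposure of AC3 (0.10) is below its cut-off of 0.139839, so only AC1 and AC5 would be carried into the second pass under these assumptions.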

12 NATURE COMMUNICATIONS | (2018) 9:144 | DOI: 10.1038/s41467-017-02602-0 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-017-02602-0 ARTICLE the remaining cases, optimal allele-specific copy number profiles were selected on TeloTAGGG Telomere Length Assay Kit (Roche) according to the manufacturer’s the basis of tumor purity estimates provided by the pathologist and by comparing instructions. Chemiluminescent signals of amplified C-circles were detected with a tumor purity estimates to the mutations with the most dominant variant allele ChemiDoc MP Imaging System (Bio-Rad). Non-saturated exposures were used for frequencies, which frequently included alterations in TP53. WGD was determined evaluation, and tumor samples were classified as ALT-positive when the signal by taking into account the estimated ploidy and the presence of two or multiple intensity of the complete reaction was at least twofold higher than that of the copies of most parental-specific chromosomes. control without polymerase and at least threefold higher than the background intensity. Detection of chromothripsis. Copy number state profiling of LMS tumors based on exome data to detect the alternating copy number states that are characteristic Telomere quantitative PCR. Telomere quantitative PCR was performed as for chromothripsis was performed with the R package cn.MOPS (version 1.20.0)74 described previously80. Briefly, 10 ng DNA from tumor or control samples was using several exome-specific functions and modifications. The getSegmen- added to 1 µl LightCycler 480 SYBR Green I Master mix (Roche) and 500 nM of tReadCountFromBAM function was used in paired mode within enrichment of kit- each forward and reverse primer in a 10 µl reaction. Primer sequences were as specific target regions, and only properly paired reads with a mapping quality ≥20 follows: Telomere forward: 5′-CGG TTT GTT TGG GTT TGG GTT TGG GTT were used for counting and duplicate reads removed. 
Tumors and control samples TGG GTT TGG GTT-3′; Telomere reverse: 5′-GGC TTG CCT TAC CCT TAC were compared using the referencecn.mops function adjusted with parameters CCT TAC CCT TAC CCT TAC CCT-3′; 36B4 forward: 5′-AGC AAG TGG GAA from the exomecn.mops function and using the DNAcopy algorithm for seg- GGT GTA ATC C-3′; 36B4 reverse: 5′-CCC ATT CTA TCA TCA ACG GGT ACA mentation. In addition, the obtained referencecn.mops log2 ratios were corrected to A-3′. PCR conditions were as follows: 10 min at 95 °C, 40 cycles of 15 s at 95 °C and account for whole-chromosome gains and losses by shifting the ratio according to 60 s at 60 °C. For each tumor and control sample, a T/S ratio (telomere repeat the proportion of total reads per chromosome related to the normal sample. Copy signals normalized to a single copy gene (36B4)) was determined, and the T/S ratios number plots with corrected log2 ratios were used for chromothripsis inference of tumor samples were divided by those of matched control samples. The calcu- based on previously described criteria75. The rationale for adding a more con- lated log2 ratios represent the increase or decrease in telomere content in the tumor servative cut-off was that the number of copy number switches should be con- sample compared to the control sample. sidered in relation to the size of the affected region, as it is more likely to observe ten copy number switches by chance on chromosome 1 than on chromosome 22 Clonogenic assays. LMS cell lines (SK-LMS-1, SK-UT-1, MES-SA, 1×103; SK-UT- due to the size difference. Briefly, copy number plots were evaluated by counting 1B, 2×103) were seeded in six-well plates, and treatment with dimethyl sulfoxide switches between copy number states per chromosome independently by two (DMSO) or olaparib (1–5 µM; Selleck) was initiated 24 h after seeding and con- researchers. 
The main criterion for calling chromothripsis was that the ratio of the tinued for 10 days, with drug replenishment and medium change every 2 days. number of alternating copy number state switches and the length of the affected Pretreatment with cisplatin (5 µM; Selleck) was performed for 2 h. Thereafter, cells region on an individual chromosome in Mb was higher than 0.2, which corre- were washed with phosphate buffered saline (PBS) and incubated with DMSO or sponds to at least ten alternating switches within 50 Mb. In this study, the mini- olaparib as described above. Following drug treatment, cells were fixed with chilled mum number of switches required for calling chromothripsis was set to 6, which methanol for 10 min, stained with 0.5% crystal violet in 25% methanol for 15 min, should then have occurred within 30 Mb or less to satisfy the ≥0.2 cut-off. and photographed after overnight drying.

Analysis of transcriptome sequencing data. After mapping of transcriptome data as described above, expression levels were determined per gene and sample as RPKM using RefSeq as gene model. For each gene, overlapping annotated exons from all transcript variants were merged into non-redundant exon units with a custom Perl script. Non-duplicate reads with mapping quality >0 were counted for all exon units with coverageBed from the BEDtools package76. Read counts were summarized per gene and divided by the combined length of its exon units (in kb) and by the total number of reads (in millions) counted by coverageBed. HTSeq-count (version 0.6.0)77 was used to generate read count data at the exon level using a minimum mapping score of 1, intersection non-empty mode, and GENCODE version 17 as gene model. Size factor and dispersion estimation were calculated for the raw count data before performing Wald statistics using DESeq2 (ref. 78). Regularized logarithm transformation was used for visualization and clustering of read count data. Unsupervised hierarchical clustering was performed using the 100 most variable genes. Normalized read count values for individual genes were centered and scaled (z-score), and quantile discretization was performed. Complete-linkage analysis with Euclidean distance measure was used for clustering. The heatmap was generated using the R package pheatmap. Principal component analysis was performed using singular value decomposition (prcomp) on the 1000 most variable genes to examine the co-variances between samples. Somatic SNVs and indels were annotated with RNA information by generating a pileup of the RNA BAM file using SAMtools. Variants were considered expressed if they were present in at least one high-quality RNA read. Fusion transcripts were determined using the TopHat2 post-alignment pipeline79, and candidates with a score >300 were selected for further analysis. Circos plots were drawn with the R package OmicCircos.

Validation of fusion transcripts. Fusion transcripts were validated using RT-PCR and Sanger sequencing. RNA was reverse-transcribed using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Breakpoint-spanning primers were designed manually and examined for secondary structures using mfold and for off-target binding using Primer-BLAST. Melting temperatures of primers were calculated using the thermodynamic parameters of SantaLucia. Amplifications were carried out using Taq DNA Polymerase (Qiagen) according to the manufacturer’s instructions. PCR products were visualized in 1% agarose gels and purified using the QIAquick PCR Purification Kit or the QIAquick Gel Extraction Kit (Qiagen). Direct sequencing was performed with the forward or reverse primer of the respective amplification.

C-circle analysis. C-circle analysis was performed as described previously29. Briefly, 30 ng genomic DNA from tumor samples was incubated with 1× Φ29 Buffer, 0.2 mg/ml bovine serum albumin (BSA), 0.1% (v/v) Tween 20, 1 mM each of deoxyadenosine triphosphate (dATP), deoxyguanosine triphosphate (dGTP), and thymidine triphosphate (dTTP), and with or without 7.5 U Φ29 DNA polymerase for 8 h at 30 °C, followed by inactivation for 20 min at 65 °C. After addition of 2× SSC, the DNA was dot-blotted with a 96-well dot blotter (Bio-Rad) onto a nylon membrane (Carl Roth), which was dried immediately and baked for 20 min at 120 °C. Hybridization, wash steps, and development were performed using the

Data availability. Sequencing data were deposited in the European Genome-phenome Archive under accession EGAS00001002437.

Received: 21 June 2017 Accepted: 13 December 2017

References
1. Coindre, J. M. et al. Predictive value of grade for metastasis development in the main histologic types of adult soft tissue sarcomas: a study of 1240 patients from the French Federation of Cancer Centers Sarcoma Group. Cancer 91, 1914–1926 (2001).
2. Penel, N. et al. Testing new regimens in patients with advanced soft tissue sarcoma: analysis of publications from the last 10 years. Ann. Oncol. 22, 1266–1272 (2011).
3. Agaram, N. P. et al. Targeted exome sequencing profiles genetic alterations in leiomyosarcoma. Genes Chromosomes Cancer 55, 124–130 (2016).
4. Yang, C. Y. et al. Targeted next-generation sequencing of cancer genes identified frequent TP53 and ATRX mutations in leiomyosarcoma. Am. J. Transl. Res. 7, 2072–2081 (2015).
5. Yang, J. et al. Genetic aberrations in soft tissue leiomyosarcoma. Cancer Lett. 275, 1–8 (2009).
6. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
7. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
8. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
9. Leiserson, M. D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
10. Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
11. Italiano, A. et al. Genetic profiling identifies two classes of soft-tissue leiomyosarcomas with distinct clinical characteristics. Clin. Cancer Res. 19, 1190–1196 (2013).
12. Chibon, F. et al. Validated prediction of clinical outcome in sarcomas and multiple types of cancer on the basis of a gene expression signature related to genome complexity. Nat. Med. 16, 781–787 (2010).
13. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
14. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).


15. Guo, X. et al. Clinically relevant molecular subtypes in leiomyosarcoma. Clin. Cancer Res. 21, 3501–3511 (2015).
16. Comino-Mendez, I. et al. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. Nat. Genet. 43, 663–667 (2011).
17. Adhikary, S. & Eilers, M. Transcriptional regulation and transformation by Myc proteins. Nat. Rev. Mol. Cell Biol. 6, 635–645 (2005).
18. Bouchard, C. et al. Regulation of cyclin D2 gene expression by the Myc/Max/Mad network: Myc-dependent TRRAP recruitment and histone acetylation at the cyclin D2 promoter. Genes Dev. 15, 2042–2047 (2001).
19. Dewhurst, S. & Swanton, C. Tetraploidy and CIN: a dangerous combination. Cell Cycle 14, 3217 (2015).
20. Dewhurst, S. M. et al. Tolerance of whole-genome doubling propagates chromosomal instability and accelerates cancer genome evolution. Cancer Discov. 4, 175–185 (2014).
21. Turajlic, S., McGranahan, N. & Swanton, C. Inferring mutational timing and reconstructing tumour evolutionary histories. Biochim. Biophys. Acta 1855, 264–275 (2015).
22. Barthel, F. P. et al. Systematic analysis of telomere length and somatic alterations in 31 cancer types. Nat. Genet. 49, 349–357 (2017).
23. Bryan, T. M., Englezou, A., Dalla-Pozza, L., Dunham, M. A. & Reddel, R. R. Evidence for an alternative mechanism for maintaining telomere length in human tumors and tumor-derived cell lines. Nat. Med. 3, 1271–1274 (1997).
24. Cesare, A. J. & Reddel, R. R. Alternative lengthening of telomeres: models, mechanisms and implications. Nat. Rev. Genet. 11, 319–330 (2010).
25. Lafferty-Whyte, K. et al. A gene expression signature classifying telomerase and ALT immortalization reveals an hTERT regulatory network and suggests a mesenchymal stem cell origin for ALT. Oncogene 28, 3765–3774 (2009).
26. Heaphy, C. M. et al. Altered telomeres in tumors with ATRX and DAXX mutations. Science 333, 425 (2011).
27. Clynes, D. et al. Suppression of the alternative lengthening of telomere pathway by the chromatin remodelling factor ATRX. Nat. Commun. 6, 7538 (2015).
28. Lovejoy, C. A. et al. Loss of ATRX, genome instability, and an altered DNA damage response are hallmarks of the alternative lengthening of telomeres pathway. PLoS Genet. 8, e1002772 (2012).
29. Henson, J. D. et al. DNA C-circles are specific and quantifiable markers of alternative-lengthening-of-telomeres activity. Nat. Biotechnol. 27, 1181–1185 (2009).
30. Dilley, R. L. & Greenberg, R. A. ALTernative telomere maintenance and cancer. Trends Cancer 1, 145–156 (2015).
31. Bryan, T. M., Englezou, A., Gupta, J., Bacchetti, S. & Reddel, R. R. Telomere elongation in immortal human cells without detectable telomerase activity. EMBO J. 14, 4240–4248 (1995).
32. Kong, L. J., Meloni, A. R. & Nevins, J. R. The Rb-related p130 protein controls telomere lengthening through an interaction with a Rad50-interacting protein, RINT-1. Mol. Cell. 22, 63–71 (2006).
33. Chung, I., Osterwald, S., Deeg, K. I. & Rippe, K. PML body meets telomere: the beginning of an ALTernate ending? Nucleus 3, 263–275 (2012).
34. Jiang, W. Q. et al. Suppression of alternative lengthening of telomeres by Sp100-mediated sequestration of the MRE11/RAD50/NBS1 complex. Mol. Cell. Biol. 25, 2708–2721 (2005).
35. Bassi, C. et al. Nuclear PTEN controls DNA repair and sensitivity to genotoxic stress. Science 341, 395–399 (2013).
36. Gudmundsdottir, K. & Ashworth, A. The roles of BRCA1 and BRCA2 and associated proteins in the maintenance of genomic stability. Oncogene 25, 5864–5874 (2006).
37. Shiloh, Y. ATM and related protein kinases: safeguarding genome integrity. Nat. Rev. Cancer 3, 155–168 (2003).
38. Lord, C. J. & Ashworth, A. BRCAness revisited. Nat. Rev. Cancer 16, 110–120 (2016).
39. Mendes-Pereira, A. M. et al. Synthetic lethal targeting of PTEN mutant cells with PARP inhibitors. EMBO Mol. Med. 1, 315–322 (2009).
40. Turner, N., Tutt, A. & Ashworth, A. Hallmarks of ‘BRCAness’ in sporadic cancers. Nat. Rev. Cancer 4, 814–819 (2004).
41. Helleday, T. The underlying mechanism for the PARP and BRCA synthetic lethality: clearing up the misunderstandings. Mol. Oncol. 5, 387–393 (2011).
42. Radhakrishnan, S. K., Bebb, D. G. & Lees-Miller, S. P. Targeting ataxia-telangiectasia mutated deficient malignancies with poly ADP ribose polymerase inhibitors. Transl. Cancer Res. 2, 155–162 (2013).
43. Ito, M. et al. Comprehensive mapping of p53 pathway alterations reveals an apparent role for both SNP309 and MDM2 amplification in sarcomagenesis. Clin. Cancer Res. 17, 416–426 (2011).
44. Perot, G. et al. Constant p53 pathway inactivation in a large series of soft tissue sarcomas with complex genetics. Am. J. Pathol. 177, 2080–2090 (2010).
45. Kleinerman, R. A. et al. Risk of soft tissue sarcomas by individual subtype in survivors of hereditary retinoblastoma. J. Natl. Cancer Inst. 99, 24–31 (2007).
46. Ognjanovic, S., Olivier, M., Bergemann, T. L. & Hainaut, P. Sarcomas in TP53 germline mutation carriers: a review of the IARC TP53 database. Cancer 118, 1387–1396 (2012).
47. Sansregret, L. & Swanton, C. The role of aneuploidy in cancer evolution. Cold Spring Harb. Perspect. Med. 7, a028373 (2017).
48. Zhang, C. Z., Leibowitz, M. L. & Pellman, D. Chromothripsis and beyond: rapid genome evolution from complex chromosomal rearrangements. Genes Dev. 27, 2513–2530 (2013).
49. Lee, P. J. et al. Spectrum of mutations in leiomyosarcomas identified by clinical targeted next-generation sequencing. Exp. Mol. Pathol. 102, 156–161 (2017).
50. Makinen, N. et al. Exome sequencing of uterine leiomyosarcomas identifies frequent mutations in TP53, ATRX, and MED12. PLoS Genet. 12, e1005850 (2016).
51. Braun, D. M., Chung, I., Kepper, N. & Rippe, K. TelNet – a database for human and yeast genes involved in telomere maintenance. bioRxiv, https://doi.org/10.1101/130153 (2017).
52. Linch, M., Miah, A. B., Thway, K., Judson, I. R. & Benson, C. Systemic treatment of soft-tissue sarcoma – gold standard and novel therapies. Nat. Rev. Clin. Oncol. 11, 187–202 (2014).
53. Tap, W. D. et al. Olaratumab and doxorubicin versus doxorubicin alone for treatment of soft-tissue sarcoma: an open-label phase 1b and randomised phase 2 trial. Lancet 388, 488–497 (2016).
54. Pignochino, Y. et al. PARP1 expression drives the synergistic antitumor activity of trabectedin and PARP1 inhibitors in sarcoma preclinical models. Mol. Cancer 16, 86 (2017).
55. Cuppens, T. et al. Potential targets’ analysis reveals dual PI3K/mTOR pathway inhibition as a promising therapeutic strategy for uterine leiomyosarcomas – an ENITEC group initiative. Clin. Cancer Res. 23, 1274–1285 (2017).
56. Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug. Targets 15, 65–79 (2014).
57. Flynn, R. L. et al. Alternative lengthening of telomeres renders cancer cells hypersensitive to ATR inhibitors. Science 347, 273–277 (2015).
58. Pommier, Y. Drugging topoisomerases: lessons and challenges. ACS Chem. Biol. 8, 82–95 (2013).
59. Nguyen, G. H. et al. A small molecule inhibitor of the BLM helicase modulates chromosome stability in human cells. Chem. Biol. 20, 55–62 (2013).
60. Gnyszka, A., Jastrzebski, Z. & Flis, S. DNA methyltransferase inhibitors and their emerging role in epigenetic therapy of cancer. Anticancer Res. 33, 2989–2996 (2013).
61. Root, H. et al. FANCD2 limits BLM-dependent telomere instability in the alternative lengthening of telomeres pathway. Hum. Mol. Genet. 25, 3255–3268 (2016).
62. Muvarak, N. E. et al. Enhancing the cytotoxic effects of PARP inhibitors with DNA demethylating agents – a potential therapy for cancer. Cancer Cell 30, 637–650 (2016).
63. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
64. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
65. Jones, D. T. et al. Recurrent somatic alterations of FGFR1 and NTRK2 in pilocytic astrocytoma. Nat. Genet. 45, 927–932 (2013).
66. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
67. Dieter, S. M. et al. Mutant KIT as imatinib-sensitive target in metastatic sinonasal carcinoma. Ann. Oncol. 28, 142–148 (2017).
68. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
69. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).
70. Olshen, A. B. et al. Parent-specific copy number in paired tumor-normal studies using circular binary segmentation. Bioinformatics 27, 2038–2046 (2011).
71. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
72. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl. Acad. Sci. USA 107, 16910–16915 (2010).
73. Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
74. Klambauer, G. et al. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res. 40, e69 (2012).
75. Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).
76. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
77. Anders, S., Pyl, P. T. & Huber, W. HTSeq – a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).


78. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
79. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
80. O’Callaghan, N., Dhillon, V., Thomas, P. & Fenech, M. A quantitative real-time PCR method for absolute telomere length. Biotechniques 44, 807–809 (2008).

Acknowledgements
The authors thank N. Paramasivam, J. Park, the DKFZ-HIPO and NCT Precision Oncology Program (POP) Sample Processing Laboratory, the DKFZ Genomics and Proteomics Core Facility, and the DKFZ-HIPO Data Management Group for technical support. We also thank K. Beck, D. Richter, and P. Lichter for infrastructure and program development within DKFZ-HIPO and NCT POP and D. Braun for providing the TelNet gene list. Tissue samples were provided by the NCT Heidelberg Tissue Bank in accordance with its regulations and after approval by the Ethics Committee of Heidelberg University. This work was supported by grants H018, H021, and H028 from DKFZ-HIPO and NCT POP, as well as by the e:Med Systems Medicine Program of the German Federal Ministry of Education and Research within the CancerTelSys Consortium (grant 01ZX1302). M.A.S. is the recipient of a Rubicon Fellowship from Nederlandse Organisatie voor Wetenschappelijk Onderzoek (grant 019.153LW.038). D.H. is a member of the Hartmut Hoffmann-Berling International Graduate School of Molecular and Cellular Biology and of the MD/PhD Program of Heidelberg University. C.S. was supported by an Emmy Noether Fellowship from the German Research Foundation.

Author contributions
P.C., I.C., K.I.D., and S.R. designed and performed experiments. P.C., S.S.M., M.A.S., D.H., S.-H.W., S.R., M.H., A.E., K.K., L.S., B.Kl., B.B., B.H., and M.R. analyzed and interpreted bioinformatics data. B.Ka., C.E.H., G.E., H.G., S.G., H.-G.K., G.O., B.L., S.B., S.S., A.U., G.M., M.R., P.H., and S.F. contributed patient samples. A.S., W.W., and G.M. performed pathology review. M.Z., M.S., R.E., E.S., R.M.H., S.W., C.v.K., H.G., S.G., K.R., B.B., and M.R. provided essential reagents, expertise, and infrastructure. P.C., S.S.M., M.A.S., D.H., I.C., C.S., and S.F. wrote the manuscript, which was reviewed and edited by all co-authors. C.S. and S.F. conceived and supervised the project.

Additional information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-017-02602-0.

Competing interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2018

Priya Chudasama1, Sadaf S. Mughal2,3, Mathijs A. Sanders4,31, Daniel Hübschmann5,6,7, Inn Chung8, Katharina I. Deeg8, Siao-Han Wong2, Sophie Rabe1, Mario Hlevnjak9, Marc Zapatka9, Aurélie Ernst9,10, Kortine Kleinheinz5, Matthias Schlesner5, Lina Sieverling2, Barbara Klink11,12,13, Evelin Schröck11,12,13, Remco M. Hoogenboezem4, Bernd Kasper14, Christoph E. Heilig15, Gerlinde Egerer15, Stephan Wolf16, Christof von Kalle1,10,17,18, Roland Eils5,6,18, Albrecht Stenzinger10,19, Wilko Weichert20,21, Hanno Glimm1,10,17, Stefan Gröschel1,10,15,17,22, Hans-Georg Kopp23,24, Georg Omlor25, Burkhard Lehner25, Sebastian Bauer26,27, Simon Schimmack28, Alexis Ulrich28, Gunhild Mechtersheimer19, Karsten Rippe8, Benedikt Brors2,10, Barbara Hutter2, Marcus Renner19, Peter Hohenberger14,29, Claudia Scholl1,10,30 & Stefan Fröhling1,10,17

1Division of Translational Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany. 2Division of Applied Bioinformatics DKFZ and NCT Heidelberg, 69120 Heidelberg, Germany. 3Faculty of Biosciences, Heidelberg University, 69120 Heidelberg, Germany. 4Department of Hematology, Erasmus Medical Center, 3015 CN Rotterdam, The Netherlands. 5Division of Theoretical Bioinformatics DKFZ, 69120 Heidelberg, Germany. 6Department of Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology, Heidelberg University and BioQuant Center, 69120 Heidelberg, Germany. 7Department of Pediatric Immunology, Hematology and Oncology, Heidelberg University Hospital, 69120 Heidelberg, Germany. 8Research Group Genome Organization and Function, DKFZ and BioQuant Center, 69120 Heidelberg, Germany. 9Division of Molecular Genetics DKFZ, 69120 Heidelberg, Germany. 10German Cancer Consortium (DKTK), 69120 Heidelberg, Germany. 11Institute for Clinical Genetics, Faculty of Medicine Carl Gustav Carus, Technical University Dresden, 01307 Dresden, Germany. 12NCT Dresden, 01307 Dresden, Germany. 13DKTK, 01307 Dresden, Germany. 14Sarcoma Unit, Interdisciplinary Tumor Center Mannheim, Mannheim University Medical Center, Heidelberg University, 68167 Mannheim, Germany. 15Department of Internal Medicine V, Heidelberg University Hospital, 69120 Heidelberg, Germany. 16Genomics and Proteomics Core Facility DKFZ, 69120 Heidelberg, Germany. 17Section for Personalized Oncology, Heidelberg University Hospital, 69120 Heidelberg, Germany. 18DKFZ-Heidelberg Center for Personalized Oncology (HIPO), 69120 Heidelberg, Germany. 19Institute of Pathology, Heidelberg University Hospital, 69120 Heidelberg, Germany. 20Institute of Pathology, Technical University Munich, 81675 Munich, Germany. 21DKTK, 81675 Munich, Germany. 22Research Group Molecular Leukemogenesis DKFZ, 69120 Heidelberg, Germany. 
23Department of Hematology and Oncology, Eberhard Karls University, 72076 Tübingen, Germany. 24DKTK, 72076 Tübingen, Germany. 25Department of Orthopedics, Heidelberg University Hospital, 69118 Heidelberg, Germany. 26Sarcoma Center Western German Cancer Center, 45147 Essen, Germany. 27DKTK, 45147 Essen, Germany. 28Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, 69120 Heidelberg, Germany. 29Department of Surgery, Mannheim University Medical Center, Heidelberg University, 68167 Mannheim, Germany. 30Division of Applied Functional Genomics DKFZ, 69120 Heidelberg, Germany. 31Present address: Wellcome Trust Sanger Institute, Hinxton, UK. Priya Chudasama, Sadaf S. Mughal, Mathijs A. Sanders and Daniel Hübschmann contributed equally to this work.
