SUPPLEMENTAL INFORMATION

Somatic structural variation targets neurodevelopmental genes and identifies SHANK2 as a tumor suppressor in neuroblastoma

Gonzalo Lopez1-3+, Karina L. Conkrite1,2+, Miriam Doepner1,2, Komal S. Rathi4, Apexa Modi1,2,11, Zalman Vaksman1,2,4, Lance M. Farra1,2, Eric Hyson1,2, Moataz Noureddine1,2, Jun S. Wei5, Malcolm A. Smith10, Shahab Asgharzadeh6,7, Robert C. Seeger6,7, Javed Khan5, Jaime Guidry Auvil9, Daniela S. Gerhard9, John M. Maris1,2,8,12, Sharon J. Diskin1,2,4,8,12*

1Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA.

2Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA, USA.

3Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

4Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA.

5Oncogenomics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.

6Division of Hematology, Oncology and Blood and Marrow Transplantation, Keck School of Medicine of the University of Southern California, Los Angeles, CA, USA.

7The Saban Research Institute, Children’s Hospital of Los Angeles, Los Angeles, CA, USA.

8Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

9Office of Cancer Genomics, National Cancer Institute, Bethesda, MD, USA.

10Cancer Therapy Evaluation Program, National Cancer Institute, Bethesda, MD, USA.

11Genomics and Computational Biology, Biomedical Graduate Studies, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

12Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.

+ Equal contribution

* Corresponding Author: Sharon J. Diskin, Ph.D. 3501 Civic Center Boulevard, CTRB 3026 Philadelphia, PA 19104 Phone: (610) 639-7754 Email: [email protected]

Table of Contents

I. SUPPLEMENTAL METHODS
  Copy number segmentation, visualization (IGV) and recurrence analysis
  Filtering of CGI SV calls
  Mapping and annotation of filtered alignment-based SVs (SJ-BP)
  Mapping and annotation of DNA amplifications, deep deletions and copy number breakpoints from WGS (RD-BP) and SNP arrays (CN-BP)
  Co-localization of breakpoints derived from WGS (SJ-BP & RD-BP) and SNP arrays (CN-BP)
  Tumor mutational burden analyses
  Filtering and annotation of likely pathogenic somatic SNVs
II. SUPPLEMENTAL TABLES
  Supplemental Table S1: Neuroblastoma patient clinical and biological characteristics
  Supplemental Table S2: Cohort clinical data and sample description (xls file)
  Supplemental Table S3: Filtered structural variant calls (xls file)
  Supplemental Table S4: Chromothripsis and high-breakpoint density chromosomes (xls file)
  Supplemental Table S5: Recurrently altered genes by alternative breakpoint analyses (xls file)
  Supplemental Table S6: Orthogonally identified recurrently altered genes (xls file)
  Supplemental Table S7: SNVs mapping to recurrently altered genes (xls file)
  Supplemental Table S8: Primer sequences used for Sanger sequencing validations (xls file)
  Supplemental Table S9: Combined gene fusion transcripts and DNA structural variations (xls file)
  Supplemental Table S10: Functional enrichment analysis of recurrently altered genes (xls file)
III. SUPPLEMENTAL FIGURES
  Supplemental Fig S1. CNV and GISTIC analyses for neuroblastoma tumor subtypes
  Supplemental Fig S2. Identification of alignment-based structural variants across TARGET datasets
  Supplemental Fig S3. Orthogonal validation of breakpoint detection methods
  Supplemental Fig S4. TERT rearrangements and expression across neuroblastoma subtypes
  Supplemental Fig S5. Sanger sequencing validation of rearrangements near TERT gene
  Supplemental Fig S6. MYCN and ALK structural variants and Sanger validation
  Supplemental Fig S7. Chromothripsis in chromosome 2 associates with MYCN amplification
  Supplemental Fig S8. Chromothripsis on chromosome 5 associates with TERT rearrangements
  Supplemental Fig S9. Interchromosomal chromothriptic events absent MYCN and TERT variants
  Supplemental Fig S10. Variability of SV burden across subtypes
  Supplemental Fig S11. Schematic of SV breakpoint classification according to impact on known genes
  Supplemental Fig S12. Recurrently altered genes: PTPRD and ATRX
  Supplemental Fig S13. Sanger sequencing validation of ATRX deletions
  Supplemental Fig S14. Chromosome 11 breakpoints frequently disrupt SHANK2 and DLG2
  Supplemental Fig S15. Recurrently altered genes: AUTS2 and CACNA2D3
  Supplemental Fig S16. Recurrently altered genes: LINC00910, CDKN2A and PLXDC1
  Supplemental Fig S17. Sanger sequencing validation of SHANK2 structural variants
  Supplemental Fig S18. Sanger sequencing validation of DLG2 structural variants
  Supplemental Fig S19. Transcriptional effect of structural variants: eQTL and gene fusions
  Supplemental Fig S20. Pathway and disease enrichment of genes altered by structural variants
  Supplemental Fig S21. Neurodevelopmental pathways are down-regulated in high-risk neuroblastoma
  Supplemental Fig S22. Low SHANK2 expression in neuroblastoma is associated with poor survival
  Supplemental Fig S23. Genes of the postsynaptic density are down-regulated in high-risk neuroblastoma
  Supplemental Fig S24. SHANK2 expression in neuroblastoma cell lines
  Supplemental Fig S25. SHANK2 accelerates differentiation of neuroblastoma cells
IV. SUPPLEMENTAL REFERENCES

I. SUPPLEMENTAL METHODS

Copy number segmentation, visualization (IGV) and recurrence analysis

CGI “somaticCnvDetailsDiploidBeta” files provide the estimated ploidy and tumor/blood coverage ratio for every 2-kb window along the genome. We used custom scripts to reformat the coverage data for processing with the “copynumber” R Bioconductor package [1]. We then applied Winsorization (winsorize) for data smoothing, followed by segmentation with the piecewise constant fitting (pcf) algorithm with attributes kmin=2 and gamma=1000. Segmented data were visualized with IGV and used as input to GISTIC2.0 [2]. GISTIC attributes were as follows: -v 30 -refgene hg19 -genegistic 1 -smallmem 1 -broad 1 -twoside 1 -brlen 0.98 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme -js 2 -rx 0.

Filtering of CGI SV calls

The CGI Cancer Pipeline 2.0 produces a full report including quality control, variant calling and CNV analyses. The “somaticAllJunctionsBeta” files provide unfiltered information for individual junctions detected in a tumor genome that were absent in the corresponding normal genome. We filtered high-confidence somatic variants across the entire TARGET repertoire of tumor datasets, including neuroblastoma (NBL, n=135), acute lymphoblastic leukemia (ALL, n=165), acute myeloid leukemia (AML, n=183), osteosarcoma (OS, n=18) and Wilms tumor (WT, n=80). We then removed germline variants that passed CGI filters, as well as artifacts and low-confidence variants often observed across tumor datasets. We used the Database of Genomic Variants (DGV v. 2016-05-15, GRCh37) to remove SVs whose reciprocal overlap with DGV-annotated common events was higher than 50%. We only filtered variants whose type matched in both the CGI SV set and the DGV database. All tumor variants (before and after filtering) are accessible in the manuscript GitHub repository: https://github.com/diskin-lab-chop/NB_structural_variants/.
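The reciprocal-overlap criterion above can be illustrated with a minimal Python sketch. This is not the authors' script: the `SV` record, the function names and the interval representation (half-open coordinates on one chromosome) are assumptions made for illustration; only the 50% cutoff and the type-matching requirement come from the text.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical interval record for a structural variant or DGV event."""
    chrom: str
    start: int
    end: int
    sv_type: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two fractions of a and b covered by their intersection."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def is_common_event(sv: SV, dgv_events: List[SV], min_overlap: float = 0.5) -> bool:
    """True when a DGV event of the same type reciprocally overlaps sv by more than min_overlap."""
    return any(
        sv.sv_type == ev.sv_type and reciprocal_overlap(sv, ev) > min_overlap
        for ev in dgv_events
    )


def filter_svs(svs: List[SV], dgv_events: List[SV], min_overlap: float = 0.5) -> List[SV]:
    """Keep only SVs that do not match a common DGV event."""
    return [sv for sv in svs if not is_common_event(sv, dgv_events, min_overlap)]
```

Taking the minimum of the two coverage fractions is what makes the overlap reciprocal: a small DGV event nested inside a large SV does not trigger removal.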

Mapping and annotation of filtered alignment-based SVs (SJ-BP)

RefSeq gene definitions for hg19 were downloaded from the UCSC Genome Browser (10/31/2018) in order to map SV calls to nearby genes. We used two approaches to map variants. First, numerical changes (tandem duplications and deletions of size <2 Mb) containing whole genes were used to define copy number alterations. Second, we mapped breakpoints relative to gene exonic coordinates. SVs were considered ‘disrupting’ when either one of the breakpoints localized between the transcription start and end of any isoform of a gene, and ‘proximal’ when localized within 100-kb upstream or 25-kb downstream of the most distal isoform of each gene. In addition, we considered ‘intronic’ SVs as those in which both breakpoints mapped to the same intron.
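As a rough illustration of the ‘disrupting’ versus ‘proximal’ rule, the following Python sketch classifies a single breakpoint against one gene. It rests on two assumptions that are not stated in the text: the gene is summarized by the outermost transcription start and end over all isoforms, and upstream/downstream are interpreted relative to the gene strand. The `Gene` record and function name are hypothetical.

```python
from typing import NamedTuple

UPSTREAM_BP = 100_000    # proximal window upstream of the gene (from the text)
DOWNSTREAM_BP = 25_000   # proximal window downstream of the gene (from the text)


class Gene(NamedTuple):
    """Hypothetical gene record spanning the most distal isoform boundaries."""
    chrom: str
    tx_start: int  # leftmost transcribed coordinate over all isoforms
    tx_end: int    # rightmost transcribed coordinate over all isoforms
    strand: str    # "+" or "-"


def classify_breakpoint(chrom: str, pos: int, gene: Gene) -> str:
    """Return 'disrupting', 'proximal' or 'none' for one breakpoint and one gene."""
    if chrom != gene.chrom:
        return "none"
    if gene.tx_start <= pos <= gene.tx_end:
        return "disrupting"
    # On the minus strand, upstream lies at higher genomic coordinates.
    if gene.strand == "+":
        left_window, right_window = UPSTREAM_BP, DOWNSTREAM_BP
    else:
        left_window, right_window = DOWNSTREAM_BP, UPSTREAM_BP
    if gene.tx_start - left_window <= pos < gene.tx_start:
        return "proximal"
    if gene.tx_end < pos <= gene.tx_end + right_window:
        return "proximal"
    return "none"
```

An SV would then be labeled by its most severe breakpoint; the ‘intronic’ class requires exon coordinates and is not covered by this sketch.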

Mapping and annotation of DNA amplifications, deep deletions and copy number breakpoints from WGS (RD-BP) and SNP arrays (CN-BP)

Breakpoints were called from segmentation profiles. Due to artifacts observed at subtelomeric and pericentromeric regions, these regions were excluded. For the rest of the genome, a breakpoint was called when the absolute value of the copy number log-ratio difference between contiguous segments was higher than 0.152 for SNP arrays and 0.304 for WGS; these cutoffs account for 10% and 20% copy number change, respectively (i.e., for diploid regions, ΔCN = 0.2 and ΔCN = 0.4). For the WGS dataset, we called amplifications when CN >= 8 and deep deletions when CN <= 0.5. For SNP arrays, we used less stringent cutoffs for amplifications (CN >= 4.5) and deep deletions (CN <= 0.9). These relaxed criteria were selected because SNP arrays have a narrower dynamic range and lower resolution. We used RefSeq gene definitions for hg19 downloaded from the UCSC Genome Browser (October 31st, 2018 version) to map copy number alterations and breakpoints to nearby genes. Genes were considered amplified or deep deleted when all isoforms were contained within the altered segment boundaries. Breakpoints were considered ‘disrupting’ when the breakpoint localized between the transcription start and end of any isoform of a gene, and ‘proximal’ when localized within 100-kb upstream or 25-kb downstream.
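The thresholding described above can be sketched in Python as follows. The cutoffs are those quoted in the text; the `Segment` record, the function names, and the choice to report a breakpoint at the start of the right-hand segment are illustrative assumptions. Exclusion of subtelomeric and pericentromeric regions is omitted.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical copy number segment with its mean log-ratio."""
    chrom: str
    start: int
    end: int
    log_ratio: float


# Log-ratio difference cutoffs quoted in the text.
BREAKPOINT_CUTOFF = {"snp": 0.152, "wgs": 0.304}
# (amplification, deep deletion) copy number cutoffs quoted in the text.
CN_CUTOFFS = {"wgs": (8.0, 0.5), "snp": (4.5, 0.9)}


def call_breakpoints(segments: List[Segment], platform: str) -> List[Tuple[str, int]]:
    """Call a breakpoint between adjacent segments whose log-ratios differ by more than the cutoff."""
    cutoff = BREAKPOINT_CUTOFF[platform]
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    breakpoints = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue  # never call a breakpoint across chromosomes
        if abs(right.log_ratio - left.log_ratio) > cutoff:
            # Assumption: position the breakpoint at the start of the right segment.
            breakpoints.append((right.chrom, right.start))
    return breakpoints


def classify_copy_number(cn: float, platform: str) -> str:
    """Label a segment copy number as amplification, deep deletion or other."""
    amp_cutoff, del_cutoff = CN_CUTOFFS[platform]
    if cn >= amp_cutoff:
        return "amplification"
    if cn <= del_cutoff:
        return "deep deletion"
    return "other"
```

Keeping the platform-specific cutoffs in lookup tables makes the less stringent SNP array criteria explicit next to the WGS ones.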

Co-localization of breakpoints derived from WGS (SJ-BP & RD-BP) and SNP arrays (CN-BP)

We studied the co-localization of breakpoints derived from alternative measurements. First, we compared SJ-BPs and RD-BPs in each of the 135 WGS samples; overall, 30.5% of SJ-BPs co-localized with an RD-BP (Supplemental Fig. S3A), whereas 62% of RD-BPs matched an SJ-BP (Supplemental Fig. S3B). We next evaluated the co-localization of breakpoints across WGS and SNP platforms within the subset of 52 overlapping samples. Overall, 50.2% of CN-BPs from SNP arrays co-localized with SJ-BPs from the WGS dataset (Supplemental Fig. S3C), whereas only 8.2% of SJ-BPs co-localized with CN-BPs (Supplemental Fig. S3D). Furthermore, when comparing dosage-based breakpoints across platforms (RD-BP and CN-BP), 23.6% of RD-BPs were found to co-localize with CN-BPs (Supplemental Fig. S3E), whereas 66.6% of CN-BPs co-localized with RD-BPs (Supplemental Fig. S3F). Finally, we performed a randomized test by sample shuffling (Ni=1000) in order to evaluate whether each of the co-localization percentages listed above could arise by chance or due to recurrence of SVs across samples. All randomized percentage distributions ranged between 0.7% and 2.3%; in all cases the null hypothesis was rejected (p-value < 0.001, Supplemental Fig. S3G-L).
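A sample-shuffling test of this kind can be sketched in Python as below. Several details are assumptions rather than the authors' procedure: the 10-kb co-localization window, the dictionary layout (sample ID mapped to a list of (chromosome, position) breakpoints, with both dictionaries sharing the same sample IDs), and the empirical p-value formula with a +1 correction.

```python
import random
from typing import Dict, List, Tuple

Breakpoints = Dict[str, List[Tuple[str, int]]]  # sample ID -> [(chrom, position), ...]


def colocalization_pct(bps_a: Breakpoints, bps_b: Breakpoints,
                       pairing: Dict[str, str], window: int) -> float:
    """Percentage of set-A breakpoints within `window` bp of a set-B breakpoint
    in the sample that `pairing` assigns to each set-A sample."""
    total = hits = 0
    for sample, a_list in bps_a.items():
        b_list = bps_b[pairing[sample]]
        for chrom, pos in a_list:
            total += 1
            if any(c == chrom and abs(p - pos) <= window for c, p in b_list):
                hits += 1
    return 100.0 * hits / total if total else 0.0


def shuffle_test(bps_a: Breakpoints, bps_b: Breakpoints, window: int = 10_000,
                 n_iter: int = 1000, seed: int = 0):
    """Compare the observed co-localization with a null built by shuffling sample labels."""
    rng = random.Random(seed)
    samples = sorted(bps_a)
    observed = colocalization_pct(bps_a, bps_b, {s: s for s in samples}, window)
    null = []
    for _ in range(n_iter):
        shuffled = samples[:]
        rng.shuffle(shuffled)  # pair each sample with a randomly chosen sample
        null.append(colocalization_pct(bps_a, bps_b, dict(zip(samples, shuffled)), window))
    # Empirical p-value: fraction of null values at least as large as the observed one.
    p_value = (1 + sum(v >= observed for v in null)) / (n_iter + 1)
    return observed, null, p_value
```

Because shuffling keeps every breakpoint but breaks the sample pairing, any co-localization remaining in the null reflects loci that are recurrently altered across samples rather than agreement between platforms within a tumor.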

Tumor mutational burden analyses

We obtained measures representative of the burden of the different mutation types under study (including SNVs, SVs and BPs). To this end, the density of mutations of each type was calculated as the average number of mutations in a given sample per sequence window (10 Mb for SVs and BPs and 1 Mb for SNVs). Instead of a single density value per sample, we measured mutational densities for each chromosomal arm, excluding short arms with very low mappability (Chrs 13p, 14p, 15p, 21p, 22p and Y). The remaining 41 chromosomal arms in each sample represent single-sample distributions of mutational densities from which quantiles are obtained. We used the interquartile mean (IQM) since it offered a measure robust against outliers while conserving the variability across samples, even in samples with low breakpoint density.
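The per-arm density and IQM summary can be sketched in Python as follows. The function names and input layout are hypothetical, and the IQM shown is a simplified variant that drops the lowest and highest quarter of values (rounded down) without the fractional weighting of an exact interquartile mean.

```python
from typing import Dict, Sequence


def arm_densities(counts_by_arm: Dict[str, int], arm_length_bp: Dict[str, int],
                  window_bp: int) -> Dict[str, float]:
    """Mutations per window for every arm; arms without mutations get density 0."""
    return {
        arm: counts_by_arm.get(arm, 0) / (length / window_bp)
        for arm, length in arm_length_bp.items()
    }


def interquartile_mean(values: Sequence[float]) -> float:
    """Mean of the values remaining after trimming the lowest and highest quarter."""
    if not values:
        raise ValueError("interquartile_mean requires at least one value")
    ordered = sorted(values)
    k = len(ordered) // 4  # number of values trimmed from each tail
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)
```

With 41 arms per sample, this variant trims 10 values from each tail and averages the central 21, so a single chromothriptic arm with an extreme breakpoint count does not dominate the sample's burden estimate.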

Filtering and annotation of likely pathogenic somatic SNVs

The CGI Cancer Pipeline 2.0 provides somatic variant calls for SNVs and small indels. Given the gapped nature of CGI reads, which leads to a high noise-to-signal ratio, we incorporated additional SNV filtering. We first annotated CGI SNV calls using the Variant Effect Predictor (VEP) pipeline [3]. Our filter follows two steps: (1) we collected high-quality somatic non-synonymous coding variants (Phred-like Fisher’s exact test P<0.001) annotated as having a moderate or high functional impact, and combined this set of variants with the COSMIC catalogue of pathogenic variants (release v84); (2) we performed hot-spot analysis of variants from the combined catalogue (step 1) to identify both clonal and low allele frequency pathogenic variants.

II. SUPPLEMENTAL TABLES

Supplemental Table S1: Neuroblastoma patient clinical and biological characteristics.

Covariate                  Category        WGS (CGI)   SNP (Illumina)   RNA-seq (Illumina)   Expression array (Affy-HuEx)
INSS Stage                 Stage 1         -           82               -                    30
                           Stage 2A        -           16               -                    -
                           Stage 2B        1           26               -                    -
                           Stage 3         6           98               6                    1
                           Stage 4         105         650              126                  214
                           Stage 4S        23          37               21                   -
                           Unknown         -           5                -                    2
MYCN amplification status  Amp             29          241              31                   60
                           Not-Amp         106         670              121                  185
                           Unknown         1           3                1                    2
COG Risk Group             High-risk       106         695              127                  215
                           Inter-Risk      14          70               13                   -
                           Low-risk        15          145              13                   30
                           Unknown         -           4                -                    2
Vital status               Alive           74          454              75                   105
                           Dead            61          353              78                   140
                           Unknown         -           107              -                    2
Age at diagnosis group     < 18 months     32          265              29                   32
                           > 18 months     103         646              124                  215
                           Unknown         -           3                -                    -
Gender                     Female          52          408              64                   106
                           Male            83          506              89                   141

A dash indicates that no entry was reported for that category.

Supplemental Table S2: Cohort clinical data and sample description (xls file) A survey of all samples used in this study including clinical information and availability for each genomic profiling platform.

Supplemental Table S3: Filtered structural variant calls (xls file) Resulting SVs in neuroblastoma tumors after additional quality control filtering.

Supplemental Table S4: Chromothripsis and high-breakpoint density chromosomes (xls file) Quantification of different measures of structural variations per chromosomal arm used to infer chromothripsis in WGS samples.

Supplemental Table S5: Recurrently altered genes by alternative breakpoint analyses (xls file) Survey of genes affected by structural variants (SVs) obtained by alternative breakpoint analyses and cohorts (SJ-BP, RD-BP and CN-BP). Altered genes are classified into different lists according to SV impact on “coding” or “non-coding” regions at each gene locus.

Supplemental Table S6: Orthogonally identified recurrently altered genes (xls file) List of 77 genes disrupted by structural variants (SVs) with both alignment-based (SJ-BP) and read-depth-based (RD-BP) evidence from the WGS dataset.

Supplemental Table S7: SNVs mapping to recurrently altered genes (xls file) Pathogenic non-synonymous SNVs found in the coding regions and splice sites of recurrently altered genes. Mutation Annotation Format (MAF) data file.

Supplemental Table S8: Primer sequences used for Sanger sequencing validations (xls file) Primer sequences used for Sanger validation of SVs at ALK, ATRX, DLG2, SHANK2 and TERT; includes the Junction Id, which relates to the variants described in Supplemental Table S3.

Supplemental Table S9: Combined gene fusion transcripts and DNA structural variations (xls file) List of gene fusions, including the junction location and the gene fusion method used to identify each event: STAR-Fusion, FusionCatcher and deFuse (DCC). See Methods.

Supplemental Table S10: Functional enrichment analysis of recurrently altered genes (xls file) Enrichment analysis results for genes harboring recurrent SVs. Analysis derived from ToppGene (see Methods). Separate tabs are included for each of the SV detection methods, and separately for coding and non-coding.

III. SUPPLEMENTAL FIGURES

Supplemental Fig S1. CNV and GISTIC analyses for neuroblastoma tumor subtypes. Integrative Genomics Viewer (IGV) visualization of DNA copy number gains (red) and losses (blue) across neuroblastoma subtypes in the (A) WGS and (B) SNP datasets. (C-H) GISTIC q-plot analysis of neuroblastoma subtypes indicating frequent deletions (blue) and gains (red) obtained from combined LOWR+INTR tumors profiled with (C) WGS and (D) SNP arrays, MNA group profiled with (E) WGS and (F) SNP arrays and HR-NA group profiled with (G) WGS and (H) SNP arrays.

Supplemental Fig S2. Identification of alignment-based structural variants across TARGET datasets. (A) Distribution of the number of discordantly aligned read pairs supporting each structural variant across TARGET histotypes. (B-D) Structural variant genomic sizes for (B) deletions, (C) tandem duplications and (D) inversions + probable inversions.

Supplemental Fig S3. Orthogonal validation of breakpoint detection methods (A-F) Cross-platform co-localization of breakpoints across 52 samples with overlapping WGS and SNP profiles. Overall co-localization percentages are shown in the right-side panel of each bar plot. (G-L) Randomized empirical distributions of co-localization percentages obtained by sample shuffling; each null distribution is used to calculate an empirical p-value for the observed overall co-localization percentage in panels A-F.

[Figure S4. (A) Copy number loss and gain tracks around TERT on chromosome 5 (approximately 0.6-1.8 Mb), with sample groups labeled “26 WGS samples” and “15 SNP samples”; WGS tracks are annotated with the SV type and partner coordinate of the nearest junction, and ‘s’ marks Sanger-validated cases. Gene track: CLPTM1L, TERT, SLC6A19, LPCAT1, SLC12A7, SLC6A18, NKD2, SLC6A3. (B) TERT expression, log2(TPM + 1), and (C) TERT expression, each grouped by TERT SV status, MYCN amplification status and stage.]

Supplemental Fig S4. TERT rearrangements and expression across neuroblastoma subtypes (A) SVs at the TERT locus in 25 WGS samples based on RD-BPs combined with SJ-BPs and in 15 tumors profiled by SNP array using CN-BPs. Nearest upstream TERT junctions derived by sequencing are highlighted in text; ‘S’ at the left of the panel indicates positive validation by Sanger sequencing. (B) Expression of TERT in primary tumors with different genetic backgrounds obtained from the TARGET RNA-seq cohort. (C) Expression of TERT in primary tumors with different genetic backgrounds obtained from the TARGET HumanExon array cohort.

Supplemental Fig S5. Sanger sequencing validation of rearrangements near TERT gene

Primers and additional details described in Supplemental Table S8.

[Figure S6. (A) Copy number segmentation tracks over chromosome 2 (approximately 13-20 Mb), grouped as MNA, HR-NA and LR/IR and split by sex (F, M), above a gene track that includes MYCN. (B) Tracks spanning MYCN and ALK (chromosome 2, approximately 13-33 Mb) for PARNTS, PARETE, PATDXC, PARAHE and PARKGJ. (C) Sanger-validated junctions: PARKGJ chr2:29683820+/chr2:30272560+; PARNTS chr2:16015139-/chr2:29980931-; PARETE chr2:28157496-/chr2:29843162-; PARETE chr2:16065415+/chr2:29901138+.]

Supplemental Fig S6. MYCN and ALK structural variants and Sanger validation (A) IGV track visualization plot shows copy number segmentation data around the MYCN amplicon (Chr 2p24) region. Three groups are separated at the left margin: MNA (red), HR-NA (orange) and S4s (dark green). The left margin also separates samples by gender within each tumor group: female (light blue) and male (light green). (B) IGV track visualization plot shows copy number segmentation data comprising the MYCN and ALK loci in cases with ALK-associated SVs; variant position and type are highlighted for ALK; “S” at the left of the panel indicates that Sanger sequencing validation is available. (C) Sanger sequencing validation of rearrangements near the ALK gene in 3 cases with available DNA (PARETE harbored multiple SVs). Primers and additional details are described in Supplemental Table S8.

Supplemental Fig S7. Chromothripsis in chromosome 2 associates with MYCN amplification Circos plots representing copy number and large structural variants (<100Kb) in samples with high BP and SV density indicative of chromothripsis. Samples with MYCN amplification are highlighted in red text and TERT rearrangements in blue text.

Supplemental Fig S8. Chromothripsis on chromosome 5 associates with TERT rearrangements Circos plots representing copy number and large structural variants (<100Kb) in samples with high BP and SV density indicative of chromothripsis as derived from Supplemental Table S3. Samples with TERT rearrangements are highlighted in blue text.

[Figure S9. Circos plot panels labeled Chr 1, Chr 10, Chr 11, Chr 11 and Chr X; samples shown: PARHYL, PASCUF, PARFWB, PASPER and PARBGP.]

Supplemental Fig S9. Interchromosomal chromothriptic events absent MYCN and TERT variants Circos plots representing copy number and large structural variants (<100Kb) in samples with high BP and SV density indicative of chromothripsis as derived from Fig. 3B.

Supplemental Fig S10. Variability of SV burden across subtypes (A) Number of variants (top) and frequency (bottom) of SVs by variant type across different TARGET histotypes. (B) Number of variants (top) and frequency (bottom) of SVs by variant type across different neuroblastoma subtypes. (C) Localization of structural variants in chromosome 2 represented as a histogram overlaying MNA (red) and HR-NA (orange) tumors. (D) A comparison of the differential frequencies between MNA and HR-NA of structural variants by type when chromosome 2 is ignored (left) versus included (right); orange bars indicate higher frequency in HR-NA and red bars indicate higher frequency in MNA; p-values calculated by Wilcoxon rank-sum test. (E-F) By-chromosome comparison between MNA and HR-NA of the interquartile mean number of SVs, including (E) inversions and (F) deletions. (G-H) Mutational burden analysis for (G) SVs and (H) SNVs using the interquartile mean (IQM) burden across 41 chromosome arms.

[Figure S11. Gene model with upstream proximal region (100 Kbp), UTRs, exons and downstream proximal region (25 Kbp); SV types shown: gain/duplication, deletion, inversion and translocation. Likely functional effect by class (LoF, loss of function; GoF, gain of function; Unk, unknown):
(A) Coding: subgenic gain (Unk), subgenic inversion (LoF), subgenic deletion (LoF), intragenic translocation (Unk). Copy number: focal deletion (LoF), focal gain (GoF). Proximal: proximal deletion, gain, inversion and translocation (all Unk). Intronic: intronic events (Unk).
(B) Coding breakpoint (Unk), proximal breakpoint (Unk), deep deletion (LoF), amplification (GoF).]

Supplemental Fig S11. Schematic of SV breakpoint classification according to impact on known genes Schematic representation of SVs derived from (A) junction breakpoints (SJ-BPs) classified according to their impact on known genes and (B) read-depth and copy number breakpoints (RD-BPs, CN-BPs) classified according to their impact on known genes.

[Figure S12. (A) PTPRD locus, chromosome 9, 8,290,000-10,228,400. (B) ATRX locus, chromosome X, 76,740,000-77,064,800. One track per sample; labels after a sample name give the coordinate of the partner or out-of-range breakpoint.]

Supplemental Fig S12. Recurrently altered genes: PTPRD and ATRX Genomic visualization of sequence junction-based structural variants in the (A) PTPRD and (B) ATRX loci. Boxes represent the spanned region of deletions (blue), duplications (red) and inversions (yellow). Translocations are represented as a vertical line followed by the sample name and genomic coordinate of the destination region. Coordinates of structural variants outside the plotted genomic span are also included in the labels.

[Figure S13. Sanger traces with panels labeled “Chr X left” and “Chr X right” for the following junctions:
PASRFS chrX:76925429+//chrX:77001789+
PASTCN chrX:7696379+/AGCT//AGCT/chrX:76979938+
PASYLD chrX:76918554+/chrX:7692331+
PATCFL chrX:76928010+/chrX:76952072+]

Supplemental Fig S13. Sanger sequencing validation of ATRX deletions Primers and additional details described in Supplemental Table S8.

[Figure S14. Chromosome 11 segmentation plot with cytoband ideogram (p15 to q25); y-axis: LRR; x-axis: chromosome 11 position (0 to 1.4e+08); SHANK2 and DLG2 loci are marked.]

Supplemental Fig S14. Chromosome 11 breakpoints frequently disrupt SHANK2 and DLG2 Chromosome 11 segmentation plot for HR-NA tumors obtained from WGS highlights regions of high breakpoint density near SHANK2 and DLG2 loci.

[Figure S15. (A) AUTS2 locus, chromosome 7, 68,960,000-70,279,000; one track per sample, with translocation labels giving the partner coordinate. (B) CACNA2D3 locus, chromosome 3, approximately 54.05-55.13 Mb; copy number tracks with sequence junction overlay, including interchromosomal junctions in PASRFS (chr16:58245199), PARDCK (chr17:27829984) and PAISNS (chr17:42657036) and a deletion in PARSRJ.]

Supplemental Fig S15. Recurrently altered genes: AUTS2 and CACNA2D3 (A) Sequence junction-based structural variants in the AUTS2 locus. Boxes represent the spanned region of deletions (blue), duplications (red) and inversions (yellow). Translocations are represented as a vertical line followed by the sample name and genomic coordinate of the destination region. Coordinates of structural variants outside the plotted genomic span are also included in the labels. (B) IGV copy number visualization of the CACNA2D3 locus with sequence junction overlay.

[Figure S16. (A) LINC00910 locus, chromosome 17, 41,420,000-41,564,000; gene track: MIR2117, LINC00910, ARL4D. (B) CDKN2A/CDKN2B locus, chromosome 9, 21,940,000-22,092,000. (C) PLXDC1 locus, chromosome 17, 37,190,000-37,403,000; gene track: STAC2, RPL19, PLXDC1, CACNB1, ARL5C. One track per sample; labels after a sample name give the coordinate of the partner or out-of-range breakpoint.]

Supplemental Fig S16. Recurrently altered genes: LINC00910, CDKN2A and PLXDC1. Genomic visualization of breakpoints implicated in eQTL associations at the (A) LINC00910, (B) CDKN2A and (C) PLXDC1 loci. Information was derived from sequence junctions in (A) and from read-depth copy number in (B) and (C). Regions in blue indicate deletions or copy number loss; regions in red indicate duplications and gains. Translocations are represented as a vertical line followed by the sample name and the genomic coordinate of the destination region.

Supplemental Fig S17. Sanger sequencing validation of SHANK2 structural variants. Primers and additional details are described in Supplemental Table S8.


Supplemental Fig S18. Sanger sequencing validation of DLG2 structural variants. Primers and additional details are described in Supplemental Table S8.


Supplemental Fig S19. Transcriptional effect of structural variants: eQTL and gene fusions. (A) Differential expression analysis of genes, comparing samples harboring SVs with unaltered samples using a Wilcoxon rank sum test. Gene names in red map to the Chr 2p24 region near MYCN; gene names in blue map to Chr 5p15 near TERT. Asterisks indicate the p-value cutoff (*** = P < 0.001; ** = P < 0.01; * = P < 0.05); n.a. indicates that the test could not be performed because of too few data points or because the gene was not included in the HumanExon array probeset design. (B) Overlap across three different gene fusion callers from 153 RNA-seq neuroblastoma samples. (C) Overlap across three different gene fusion callers from RNA-seq with evidence of translocation events in 89 samples with both RNA-seq and WGS profiles; see also Supplemental Table S9.
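As an illustration of the comparison described for panel (A), the following is a minimal sketch of a two-sided Wilcoxon rank sum test and of the asterisk coding used in the legend. It is not the analysis code used for the figure: the function names `rank_sum_test` and `stars` are hypothetical, the p-value uses the normal approximation without tie or continuity correction, and the expression values are invented for the example.

```python
import math

def rank_sum_test(altered, unaltered):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns None when either group is empty, mirroring the "n.a." case.
    Assumption: no tie or continuity correction is applied to the variance.
    """
    n1, n2 = len(altered), len(unaltered)
    if n1 == 0 or n2 == 0:
        return None
    # Pool both groups, remembering group membership (0 = altered).
    pooled = sorted([(v, 0) for v in altered] + [(v, 1) for v in unaltered])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

def stars(p):
    """Map a p-value to the asterisk code given in the legend."""
    if p is None:
        return "n.a."
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Invented expression values for one gene (not data from the study).
with_sv = [9.1, 8.7, 9.5, 8.9, 9.3]
without_sv = [5.2, 6.1, 5.8, 6.4, 5.5, 6.0, 5.9, 6.3]
print(stars(rank_sum_test(with_sv, without_sv)))
```

For small groups an exact test would be preferable to the normal approximation used here.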


Supplemental Fig S20. Pathway and disease enrichment of genes altered by structural variants. (A) Functional enrichment analysis bar plots for genes recurrently altered based on SJ-BP analyses across 4 different TARGET tumors (ALL, AML, OS and WT). (B-D) Neuroblastoma enrichment analysis bar plots for genes recurrently altered based on proximal and intronic breakpoint analyses of (B) SJ-BPs, (C) RD-BPs and (D) CN-BPs.


Supplemental Fig S21. Neurodevelopmental pathways are down-regulated in high-risk neuroblastoma. (A-B) Gene Set Enrichment Analysis of high- versus low-risk tumors from the RNA-seq data showing enrichment of (A) autism spectrum disorder genes and (B) neuronal and synapse part genes. (C) Volcano plot showing differential expression between high- and low-risk tumors, highlighting genes with recurrent SVs and their functional classification. (D) Subtype-specific high- versus low-risk differential expression analysis of 77 recurrently altered genes from Fig. 3I, shown as a scatter plot (MNA = x-axis, HR-NA = y-axis). This figure replicates Fig. 4D-G (Affymetrix HuEx data) using the RNA-seq dataset (n=153 samples).


[Figure S22 panels. Heatmap and correlation matrix for neuroblastoma primary tumor samples (n=249; MYCN-amplified, non-amplified high-risk and stage 1 tumors); SHANK2 isoforms NM_012309 and NM_133266 along chromosome 11q; SHANK2 expression (RPM) by stage (St.1, St.2, St.3, St.4, St.4S); Kaplan-Meier curves of overall survival probability against time for high versus low SHANK2 expression.]

Supplemental Fig S22. Low SHANK2 expression in neuroblastoma is associated with poor survival. (A, B) Clustering analysis of SHANK2 exon-level expression from Affymetrix HumanExon arrays in MNa, S4na and S1 tumors. The heatmap (A) shows higher exon expression levels in S1 compared to MNa and S4na. The correlation matrix (B) shows two well-defined clusters associated with the two known coding isoforms of the gene. (C) SHANK2 long isoform (NM_012309) expression decreases in high INSS stages across 498 primary neuroblastomas (SEQC dataset; GSE62564). (D) Kaplan-Meier analysis of SHANK2 long isoform (NM_012309) expression shows association of low expression with poor outcome (SEQC dataset, all samples). (E) Decreased SHANK2 expression (NM_012309) is associated with reduced overall survival within the low- and intermediate-risk subsets (excluding high-risk).
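The Kaplan-Meier analysis mentioned in this legend can be sketched as follows. This is only an illustration of the product-limit estimator, not the code used to produce the figure: the function name `kaplan_meier` is hypothetical, the follow-up times and event indicators are invented, and no significance test between groups is included.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

# Invented follow-up data in months (not data from the study).
low_expression = kaplan_meier([12, 24, 24, 36, 48], [1, 1, 0, 1, 0])
high_expression = kaplan_meier([30, 60, 72, 96, 120], [0, 1, 0, 0, 0])
for label, curve in (("low", low_expression), ("high", high_expression)):
    print(label, [(t, round(s, 2)) for t, s in curve])
```

Comparing the two curves statistically would additionally require a test such as the log-rank test, which this sketch omits.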


Supplemental Fig S23. Genes of the postsynaptic density are down-regulated in high-risk neuroblastoma. “Postsynaptic membrane” genes were obtained from the MSigDB Cellular Component gene set library and subjected to GSEA of differential gene expression (DGE) signatures. DGE signatures were obtained by comparing expression in high-risk tumors against low- and intermediate-risk samples in different datasets: (A) TARGET Affymetrix Human Exon Stage 1 vs. High-risk, (B) SEQC 498 RNA-seq Stage 1 vs. High-risk, (C) TARGET RNA-seq Stage 4S vs. High-risk and (D) SEQC 498 RNA-seq Stage 4S vs. High-risk.


Supplemental Fig S24. SHANK2 expression in neuroblastoma cell lines. SHANK2 RNA-seq expression (FPKM) across 38 neuroblastoma cell lines.

[Figure S25 panels. (A) Be(2)C and (B) SY5Y: Confluence (%) against Time (hours). (C, D) Total Neurite Length (mm) against Time (hours). (E, F) Neurite Length/Cell-Body Area (mm/mm2) against Hours post treatment. Conditions: Vector and SHANK2, each with DMSO or ATRA (1uM for Be(2)C, 5uM for SY5Y). Panel labels: Be(2)C-vector, 1uM ATRA; SY5Y-vector, 5uM ATRA.]

Supplemental Fig S25. SHANK2 accelerates differentiation of neuroblastoma cells. (A, B) Confluence measurements for SHANK2-expressing neuroblastoma cells, vector controls, and vehicle controls in (A) Be(2)C and (B) SY5Y cell lines. (C, D) Total neurite length measured in all cells over time for (C) Be(2)C and (D) SY5Y neuroblastoma cell lines. Arrow indicates time of ATRA introduction. (E, F) Neurite outgrowth normalized to cell body area over time in (E) Be(2)C and (F) SY5Y cell lines.

[Panel labels: Be(2)C-SHANK2, 1uM ATRA; SY5Y-SHANK2, 5uM ATRA.]


IV. SUPPLEMENTAL REFERENCES

1. Nilsen, G. et al. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 13, 591 (2012).
2. Mermel, C.H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011).
3. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
