The Genetic and Epigenetic Basis of Posterior Fossa

by

Stephen Christopher Mack

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Laboratory Medicine and Pathobiology University of Toronto

© Copyright by Stephen C. Mack 2014

The Genetic and Epigenetic Basis of Posterior Fossa Ependymoma

Stephen Christopher Mack

Doctor of Philosophy (PhD) Department of Laboratory Medicine and Pathobiology University of Toronto

2014

Abstract

Ependymomas are childhood brain tumors that occur throughout the central , but are most common in the hindbrain, also known as the posterior fossa (PF). Current standard therapy comprises maximal safe surgery and radiation, but not cytotoxic as it does not further increase survival. Despite histological similarity, from throughout the neuroaxis likely comprise multiple independent entities, each with a distinct molecular pathogenesis. Transcriptional profiling of two large independent cohorts of ependymoma reveals the existence of two demographically, transcriptionally, genetically, and clinically distinct groups of posterior fossa ependymomas, termed Group A and B. Group A patients are younger, have laterally located tumors with a balanced genome, and are much more likely to exhibit recurrence, metastasis at recurrence, and death as compared to Group B patients.

In spite of balanced genomes observed in Group A PF ependymoma, whole-genome and whole- exome sequencing of 47 PF ependymomas reveals an extremely low rate, and zero

ii significant recurrent somatic single nucleotide variants (SNVs). While devoid of recurrent

SNVs and focal copy number aberrations, poor prognosis Group A ependymomas exhibit a CpG island methylator (CIMP). Transcriptional silencing driven by CpG methylation converges exclusively on targets of the polycomb repressive complex 2 (PRC2) that represses expression of differentiation through tri-methylation of H3K27. Group A-CIMP+ PF ependymomas are responsive to clinical drugs that target either DNA or H3K27 methylation both in vitro and in vivo. We conclude that epigenetic modifiers are the first rational therapeutic candidates for this deadly malignancy, which is epigenetically de-regulated but genetically bland.

iii

Acknowledgments

I would like to express my sincere appreciation and gratitude to my supervisor and mentor Dr.

Michael D. Taylor for teaching me how to think, write, speak, dream, and drink like a research scientist. My accomplishments as a PhD student are tied to the great training environment, leadership, funding, and collaborations that Michael has established in his laboratory. It truly has been an amazing experience, and one simply could not wish for a better supervisor and friend. On the same note, I express my special thanks to the Taylor lab members both past and present for their assistance, advice, and friendship throughout my training. I would also like to thank investigators of the Brain Tumour Research Centre namely Dr. Peter B. Dirks, Dr. James

T. Rutka, Dr. Eric Bouffet, and Dr. Abhijit Guha for their support and mentorship throughout my PhD. I would like to thank my committee members Dr. Rod Bremner and Dr. Dwayne

Barber for their advice and guidance throughout my training. Collaboration has been such an important aspect of Michael’s lab; I would like to thank the Heidelberg, Germany DKFZ ependymoma team led by Dr. Stefan M. Pfister, Dr. Hendrik Witt, and Dr. Andrey Korshunov for working with us over the past few years, which I think, has led to some great discoveries, and great times along the way. I would like to thank my family and friends for all of their encouragement; I wouldn’t have made it this far without their tireless motivation. I am grateful to my wife, Dr. Kelsey C. Bertrand, for understanding that science takes patience, for encouraging my research goals, and for inspiring me to make life an adventure. Finally, I would like to thank my parents and sister, Karl, Rebecca, and Stephanie Mack, for their unconditional support, and giving me the tools, foundation, and opportunities needed to pursue all of my aspirations.

iv

Table of Contents

ACKNOWLEDGMENTS ...... IV

TABLE OF CONTENTS ...... V

LIST OF ACRONYMS ...... IX

1 CHAPTER 1: INTRODUCTION ...... 1 1.1 THE CLINICAL BASIS OF EPENDYMOMA ...... 1 1.1.1 Incidence ...... 1 1.1.2 Histopathology ...... 1 1.1.3 Treatment ...... 3 1.2 THE GENETIC BASIS OF EPENDYMOMA ...... 5 1.2.1 Somatic copy number landscape ...... 5 1.2.2 Ependymoma expression profiling reveals significant inter-tumoral heterogeneity and putative cells of origin ...... 7 1.2.3 Cell line and animal models of ependymoma ...... 9 1.3 THE EPIGENETIC BASIS OF EPENDYMOMA ...... 10 1.3.1 Candidate gene approaches identify DNA hypermethylated gene promoters in ependymoma 10 1.3.2 Genome-wide epigenomic profiling identify pathways targeted by aberrant DNA methylation ...... 11 1.3.3 Potential applications of epigenetic modifiers for ependymoma treatment ...... 12 1.3.4 Future steps towards characterizing the ependymoma genome and epigenome ...... 13

2 CHAPTER 2: THESIS RATIONALE AND HYPOTHESES ...... 16

3 CHAPTER 3: IDENTIFICATION AND CHARACTERIZATION OF TWO CLINICALLY AND MOLECULARLY DISTINCT SUBGROUPS OF POSTERIOR FOSSA EPENDYMOMA 17 3.1 CHAPTER 3: SUMMARY ...... 17 3.2 CHAPTER 3: INTRODUCTION ...... 18

v

3.3 CHAPTER 3: RESULTS ...... 19 3.3.1 Ependymoma is comprised of three molecularly distinct diseases ...... 19 3.3.2 Group A and B posterior fossa ependymoma are demographically and clinically distinct . 20 3.3.3 Group A and Group B posterior fossa ependymoma are characterized by distinct copy number landscapes ...... 21 3.3.4 Group A and Group B ependymoma are characterized by distinct biological pathways .... 22 3.3.5 Validation of posterior fossa ependymoma subgroups ...... 24 3.4 CHAPTER 3: DISCUSSION ...... 27 3.5 CHAPTER 3: FIGURES AND TABLES ...... 32 3.6 CHAPTER 3: SUPPLEMENTAL FIGURES ...... 40 3.7 CHAPTER 3: METHODS ...... 45 3.7.1 Patients and tumour samples ...... 45 3.7.2 Nucleic acid isolation ...... 46 3.7.3 microarray processing and data filtering ...... 47 3.7.4 Array-based comparative genomic hybridization ...... 48 3.7.5 Statistical analysis of gene expression based subgroups ...... 49 3.7.6 Identification of gene expression derived subtypes ...... 50 3.7.7 Identification of biological pathways distinguishing PFA from PFB ependymoma ...... 51 3.7.8 Selection of candidate genes and gene signatures ...... 52 3.7.9 Statistical analysis of clinical parameters ...... 52 3.7.10 Construction of ependymoma tissue microarrays ...... 53 3.7.11 ...... 53

4 CHAPTER 4: SOMATIC MUTATIONAL AND EPIGENOMIC LANDSCAPE OF GROUP A AND GROUP B POSTERIOR FOSSA EPENDYMOMA ...... 54 4.1 CHAPTER 4: SUMMARY ...... 54 4.2 CHAPTER 4: INTRODUCTION ...... 55 4.3 CHAPTER 4: RESULTS ...... 56 4.3.1 The somatic mutational landscape of ependymoma ...... 56 4.3.2 DNA methylation profiles of ependymoma are regionally specified ...... 57 4.3.3 Group A posterior fossa ependymoma demonstrate a CpG island methylator phenotype ... 58 4.3.4 Convergence on PRC2 targets ...... 59 4.3.5 Whole genome DNA methylation profiling of ependymoma ...... 60 vi

4.3.6 Group A (CIMP+) ependymoma are highly and specifically inhibited by DNA methylation Inhibitors ...... 61 4.4 CHAPTER 4: DISCUSSION ...... 63 4.5 CHAPTER 4: FIGURES ...... 66 4.6 CHAPTER 4: SUPPLEMENTAL FIGURES ...... 71

4.7 CHAPTER 4: METHODS ...... 87 4.7.1 Patients and tumour samples ...... 87 4.7.2 DNA library preparation and Illumina sequencing ...... 87 4.7.3 DNA sequence data processing ...... 89 4.7.4 Single nucleotide variant (SNV) detection ...... 89 4.7.5 Small insertion and deletion (InDel) detection ...... 91 4.7.6 Computation of recurrently mutated genes ...... 92 4.7.7 Identification of rearrangements and generating of Circos Plots ...... 92 4.7.8 Identification of pathways affected by SNVs ...... 93 4.7.9 Generation of copy number profiles from Illumina 450K methylation data ...... 93 4.7.10 Methyl- binding domain 2 assisted recovery and sample preparation ...... 94 4.7.11 Methylation analysis of MBD2-chip data ...... 95 4.7.12 Illumina Infinium 450K methylation sample preparation and data analysis ...... 95 4.7.13 Subgroup analysis of gene expression and methylation data ...... 96 4.7.14 H3K27me3 and EZH2 ChIP-Seq profiling and analysis in PF ependymoma ...... 97 4.7.15 Sequenom analysis of ependymoma samples ...... 98 4.7.16 Subgroup stratification of ependymoma samples in a validation cohort ...... 98 4.7.17 Pathway analysis of DNA methylation data ...... 99 4.7.18 Whole-genome bisulfite sequencing, DNA preparation, and differentially methylated region analysis ...... 100 4.7.19 Whole-genome bisulfite sequencing DMR and PMD calling ...... 101 4.7.20 Ependymoma short-term primary cell culture and in vitro drug treatment ...... 101 4.7.21 Gene expression profiling of DAC and GSK343 treated cultures ...... 102 4.7.22 Western blot analysis ...... 102 4.7.23 Flank injections and in vivo treatments of immunodeficient mice ...... 103 4.7.24 Cerebellar xenografts and in vivo treatments of immunodeficient mice ...... 103 4.7.25 Limiting dilution assays (LDAs) of primary ependymoma patient samples or ependymoma xenografts ...... 104

vii

5 CHAPTER 5: FUTURE DIRECTIONS ...... 105

5.1 FURTHER EXAMINATION OF TUMOUR HETEROGENEITY IN EPENDYMOMA ...... 105 5.2 FURTHER INVESTIGATION OF GENOMICALLY BALANCED POSTERIOR FOSSA EPENDYMOMA .... 106 5.3 FOUNDATIONS FOR THE PFA EPENDYMOMA CIMP PHENOTYPE ...... 107 5.4 FURTHER ELUCIDATING THE EPIGENOMIC LANDSCAPE OF POSTERIOR FOSSA EPENDYMOMA .. 109 5.5 MOUSE MODELS AND CELLULAR ORIGINS OF POSTERIOR FOSSA EPENDYMOMA ...... 110 5.6 CLINICAL IMPLICATIONS FOR TREATMENT OF POSTERIOR FOSSA EPENDYMOMA ...... 111

6 ATTRIBUTIONS ...... 113

7 COPYRIGHT PERMISSIONS ...... 114

8 REFERENCE LIST ...... 115

viii

List of Acronyms

aCGH array comparative genomic hybridization ACTB beta actin ALU arthrobacter luteus AQP-1/3/4 aquaporin-1/3/4 ATRX ATP-dependent helicase, X-linked helicase II BIRC5 baculoviral IAP repeat containing 5 BLBP brain lipid binding protein ZMYD10 zinc finger, MYND-Type containing 10 CASP8 caspase 8, apoptosis-related cysteine peptidase CCN-B2/D1/G2 cyclin-B2/D1/G2 CDK2/4 cyclin-dependent kinase 2/4 CDKN-1C/2A/2B/2C cyclin dependent kinase inhibitor-1C/2A/2B/2C cHCL consensus hierarchical clustering CHI3L1 chitinase 3-like 1 CIMP CpG island methylator phenotype cNMF consensus non-negative matrix factorization CNS DAC 5'-aza-2'-deoxyazacytidine DAPK death-associated protein kinase 1 DKFZ das deutsche krebforschungszentrum DLK1 delta-like 1 homolog DMR differentially methylated region DNAJC15 DnaJ (Hsp40) homolog, subfamily C, member 15 DNMT-1/3A/3B DNA methyltransferase isoform-1/3A/3B DZNep 3-deazaneplanocin-A EED embryonic ectoderm development EFNA-A3/4 ephrin-A3/A4 EGFR epidermal growth factor EMA epithelial membrane antigen EPHB-2/3/4 ephrin-B2/B3/B4 ERRB-2/4 v-erb-b2 avian erythroblastic leukemia viral homolog 2/4 EZH2 enhancer of zeste homolog 2 FDR false discovery rate FHIT fragile histidine triad GAPDH glyceraldehyde-3-phosphate dehydrogenase GBM glioblastoma multiforme GFAP glial fibrillary acidic protein GO gene ontology GSEA gene set enrichment analysis GSTP1 glutathione S-transferase pi 1 GTR gross total resection H19 imprinted maternally expressed transcript (non-protein coding) H3F3A H3 histone, family 3A

ix

H3K27 histone 3 lysine 27 HDAC histone deactetylase HE hematoxylin and eosin stain HIC1 hypermethylated in 1 HIF-1a hypoxia inducible factor 1 alpha HOX-A7/A9/B6/B7/C6/C10 homeobox-A7/A9/B6/B7/C6/C10 IC50 inhibitory concentration 50 ID-1/2/4 inhibitor of DNA binding-1/2/4 IDH-1/2 isocitrate dehydrogenase-1/2 INDEL small insertion and deletion IRAK3 interleukin-1 receptor-associated kinase 3 IRF7 interferon regulatory factor 7 JNK c-Jun N-terminal kinases KCNN1 potassium intermediate/small conductance calcium-activated channel, subfamily N, member 1 KDM6A lysine (K)-specific demethylase 6A KEGG kyoto encyclopedia of genes and genomes KIF27 kinesin family member 27 LAMA2 laminin alpha 2 LDA limiting dilution assay LINE long interspersed nuclear elements LTR long terminal repeat MAD median absolute deviation MAPK mitogen-activated protein kinase MAPK10 mitogen-activated protein kinase-10 MBD2 methyl CpG binding domain 2 MGMT O-6-methylguanine-DNA methyltransferase MLL-2/3 mixed-lineage leukemia-2/3 NCAM neural cell adhesion molecule NCI national cancer institute NELL2 neural epidermal growth factor like-2 NF2 neurofibromin 2 NOD/SCID non-obese diabetic/severe combined immunodeficiency NOD2 nucleotide-binding oligomerization domain containing 2 OSM oncostatin M PARP poly (ADP-ribose) polymerase family member PCDH proto-cadherin PDGFR platelet derived growth factor receptor PF posterior fossa PFA posterior fossa group A ependymoma PFAM a large collection of protein families PFB posterior fossa group B ependymoma PI3 peptidase inhibitor 3 PLIER probe logarithmic intensity error PMD partially methylated domain PRC2 polycomb repressive 2 complex PSPH phosphoserine phosphatase PTEN phosphatase and tensin homolog x

PTPRN2 protein tyrosine phosphatase, receptor type, N polypeptide 2 RAB3A Ras-related protein Rab-3A RARB retinoic acid receptor, beta RAS rat sarcoma RASSF1A ras association domain family member 1 RGC ROI region of interest S100-A6 S100 calcium binding protein-A6 SCNA somatic copy number alteration SINE short interspersed nuclear elements SNV single nucleotide variant SP ST supratentorial STAG1 stromal antigen 1 STR sub total resection SUZ12 suppressor of zeste 12 protein homolog TCAG the centre for applied TERT TET-1/2 tet methylcytosine dioxygenase 1 TGF-Beta transforming growth factor beta THAP11 thap domain containing 11 TIMP3 TIMP metallopeptidase inhibitor 3 TMA tissue microarray TNC tenascin C TNFRSF-10C/10D tumour necrosis factor receptor superfamily-10C/10D TP53 tumour protein 53 TP73 tumour protein 73 TRAIL TNF-related apoptosis-inducing ligand TSG tumour suppressor gene VEGF vascular endothelial growth factor WHO world health organization

xi 1

1 Chapter 1: Introduction

1.1 The clinical basis of ependymoma

1.1.1 Incidence

Ependymomas are rare, chemo-resistant, central nervous system tumours, arising in both children and adults. The age-adjusted incidence of ependymoma between 2004-2009 was 0.41 per 100,0001, and follows a bimodal distribution with peaks at the average age of 5 and 34.

Ependymomas can arise along the entire neuro-axis occurring in the supratentorial (ST) brain comprising the cerebral hemispheres, the posterior fossa (PF) comprising the cerebellum and brain stem, and along the entire spinal cord (SP). In children, 90% of ependymomas arise intracranially, with nearly two-thirds occurring in the PF2. The incidence of relapse is significantly greater in the pediatric population, and is clinically variable with recurrences observed in some cases 10 – 15 years following treatment of the primary tumour.

1.1.2 Histopathology

Ependymomas were first recognized as a distinct entity by Cushing and Bailey in their 1926 histological description of brain tumours. To this date much of the same criteria is used in accordance with the 2007 classification of brain tumours as outlined by the World Health

2

Organization (WHO)2. The hallmark histological features of ependymoma include: 1)

Perivascular pseudorosettes: composed of ependymal cell processes radially arranged towards blood vessels, and 2) True ependymal rosettes: consisting of tumour cells arranged radially surrounding an empty lumen. In terms of immunohistochemical staining, ependymomas are commonly positive for glial fibrillary acidic protein (GFAP), neural cell adhesion molecule protein (NCAM), and epithelial membrane antigen protein (EMA), which allows for further delineation from other histologically similar brain tumours3.

The WHO recognizes three grades for ependymomas:

Grade I: Comprising and myxopapillary ependymomas, which are

easily recognizable histological entities associated with a better survival and increased

age. (Note: this histological type will not be discussed further in the current thesis).

Grade II: Pertains to ependymomas, which lack Grade III features described below.

Grade III: Also described as anaplastic, this classification corresponds to ependymomas

with “increased cellularity and brisk mitotic activity, often associated with microvascular

proliferation and pseudopalisading necrosis”.

While the Grade I criteria for ependymoma are relatively clear, the parameters used to distinguish Grade II and III ependymomas are highly debated as prognostic differences have

3 been observed in some tumour cohorts but not others3. This discrepancy occurs even after controlling for potentially confounding factors such as age or tumour location. The lack of reproducibility and reliability of histopathological is demonstrated in a systemic review of three independent European trial cohorts by five leading neuropathologists, in which a consensus agreement on the classification of Grade II or III ependymoma was reached in less than half of 221 total cases examined4. While the prognostic utility of histopathological grading is still debated, immunohistochemistry (IHC) and molecular markers have been proposed as potential solutions e.g. Telomerase (TERT) protein expression5-7, V-erb-b2 erythroblastic leukemia viral oncogene homolog 2/4 (ERBB2/4)8 protein expression, or fluorescence in situ hybridization (FISH) of recurrent chromosomal alterations9. These IHC and molecular markers remain to be validated in independent and prospective ependymoma cohorts, a challenge necessitating multi-centre collaborative efforts.

1.1.3 Treatment

To this date, treatment for ependymoma remains aggressive surgical intervention and adjuvant radiotherapy. In spite of the histological challenges in identifying high- versus low- risk ependymoma patients, the extent of surgical resection is the most frequently and consistently reported prognostic indicator of ependymoma patient survival. In other pediatric brain tumours, such as , radiotherapy is typically avoided in children less than 3 years of age due to increased risk of long-term neurological and neuroendocrine sequelae. However, the

4 aggressive nature of ependymoma in infants and young children combined with a lack of effective has provided rationale for the use of conformal and intensity- modulated radiotherapy in infants. Early evidence in prospective clinical trials has demonstrated that conformal radiation is both effective and associated with minimal short-term neurological side effects in the 5-years following treatment, as evidenced by no detectable verbal or visual learning decline, and relatively stable emotional and behavioral functioning10-12.

Whether these outcomes are maintained long-term awaits further evaluation.

Despite improvements in surgical techniques and advances in conformal radiation, recurrence rates remain high in ependymoma patients particularly in the pediatric population13. Although, chemotherapy has been used extensively in the treatment of children with intracranial ependymoma, clinical trial response rates to numerous single agents are less than 12%, with less than 5% of patients experiencing complete responses14. Results from several multi-centre ependymoma clinical trials suggest that there is little evidence that chemotherapy is effective in treatment of this tumour type14,15. As a result, current standard of care for patients with recurrent ependymoma is maximal-safe surgical resection followed by re-irradiation. Despite prolonged overall survival observed from re-irradiation of recurrent ependymoma, the risks of secondary tumours and neurological impairments have yet to be adequately assessed13.

The high recurrence rate of pediatric ependymomas, lack of prognostic histological and molecular markers, and dearth of chemotherapeutic avenues, underscores the importance of

5 understanding the biological basis of ependymoma such that rationale molecular targets can be identified and rapidly translated in the clinic.

1.2 The genetic basis of ependymoma

1.2.1 Somatic copy number landscape

Efforts to identify driver and tumour suppressor genes (TSG) of ependymoma began largely with characterization of these tumours at a DNA copy number level using cytogenetic approaches, DNA-based microarrays and, more recently, whole-genome and whole exome sequencing. Despite higher resolution array technologies, the vast majority of recurrent somatic copy number alterations (SCNA) are broad and involve losses of : 1p, 3, 6q, 9p,

10q, 13q, 16p, 17, 21 and 22q, and gains of chromosome: 1q, 4q, 5, 7, 8, 9, 12q, and 2016.

The most frequent and focal SCNA in ependymoma is a homozygous deletion encompassing the

CDKN2A/Ink4a locus, which is restricted to supratentorial ependymomas (ST). Other non- recurrent events include amplifications of EPHB2, THAP11, PSPH, PCDH cluster, KCNN1,

RAB3A, PTPRN2, and NOTCH1, and deletions of PTEN, STAG1, and TNRC6B. As one example, EPHB2 was validated as a bonafide ST ependymoma oncogene and was used to generate the first mouse model of ST ependymoma. In the case of posterior fossa (PF) ependymomas, recurrent and focal copy number alterations pinpointing driver genes have yet to

6 be discovered, highlighting the difficulty in understanding the biological basis, and identifying novel therapeutic targets for this anatomical subtype of ependymoma.

Chromosome 22 loss has been shown to be the most frequent gross copy number alteration in ependymoma with a frequency ranging from 26% to 71%16. Further, chromosome 22q loss has been observed preferentially in spinal versus intracranial ependymoma, and in adult versus pediatric cases16. The Neurofibromatosis II (NF2) gene is thought to be the candidate TSG of this region, as patients with Neurofibromatosis type II disorder develop a variety of central nervous system tumors including ependymoma, schwannoma, and meningionma16. However

NF2 is mutated exclusively in spinal ependymomas, thus suggesting alternate mechanisms of down-regulation, or another putative chromosome 22q TSG in the case of intracranial ependymoma.

Another broad chromosomal abnormality frequently observed in ependymoma is monosomy 17 with complete or partial loss on both chromosomal p and q arms. Further, several studies, albeit limited to small cohorts, have reported that loss of occurs more frequently on the p-arm in up to 50% of cases. A clear tumour suppressor gene on 17p has yet to be identified, and the consensus is that TP53 is not the candidate TSG of this region as TP53 are exceedingly rare in both adult and childhood ependymoma.

Chromosome 1q gain has also been consistently reported as a frequent genomic alteration occurring in nearly 22% of cases of intracranial ependymoma. Further, an increased incidence

7 of 1q gain has been observed in posterior fossa ependymoma and associated with poor clinical outcome16-18. It is thought that the chromosome 1q25 locus harbors a bonafide oncogene involved in the initiation, maintenance, or progression of ependymoma. Efforts have been made to correlate this region of chromosomal gain with gene expression and have identified CHI3L1 and a family of S100 genes as up-regulated and potential driver oncogenes, however these candidates remain to be functionally validated.

As seen in numerous cohorts, a large proportion of ependymoma, particularly of posterior fossa localization exhibit balanced genomic profiles with very few genomic abnormalities17-20. This has been observed consistently from low- to high- resolution copy number studies, and associated with young age and poor clinical outcome. Alternative mechanisms of gene regulation such as somatic mutations have been proposed to play a role in ependymoma, perhaps significantly in cases with balanced copy number profiles21. However, whole-exome sequencing across a small series of 8 intracranial ependymomas did not identify a single recurrently mutated gene.

1.2.2 Ependymoma gene expression profiling reveals significant inter-tumoral heterogeneity and putative cells of origin

Using gene expression profiling, and unsupervised clustering, ependymomas have been divided into three principal molecular subgroups, which are separated largely according to anatomical location: 1) Supratentorial (ST), 2) Posterior Fossa (PF), and 3) Spinal cord (SP)17. These three

8 subgroups have been further divided into molecularly and biologically distinct subtypes of ST and PF ependymoma as defined by distinct clinical features and copy number alterations18,20,22.

The genes distinguishing supratentorial, posterior fossa, and spinal ependymomas involve mainly families of genes regulating neural precursor cell proliferation and differentiation17.

Supratentorial ependymomas have elevated EPHB-EPHRIN (EPHB2/3/4 and EFNAA3/4),

NOTCH (JAGGED1/2), and cell cycle related genes (CYCLINB2/D1/G2, CDK2/4, and

CDKN1C2C), while posterior fossa ependymomas express many inhibitors of differentiation

(ID1/2/4), and the aquaporin family of genes (AQP1/3/4). Spinal ependymomas however are characterized by expression of various homeobox genes including HOXA7/9, HOXB6/7, and

HOXC6/10.

While these subgroup gene signatures revealed distinct tumorigenic pathways, Taylor et al.

(2005) proposed that these were potential signatures of anatomically distinct cells of origin giving rise to different subgroups of ependymoma17. They further suggested that ependymoma might originate from radial glial cells, a population of primitive neural and multi-potent precursor cells important for neurogenesis and neuronal migration. Indeed, tumour cells isolated from a series of ependymoma primary samples expressed characteristic markers of radial glial cells (RGC), such as BLBP and RC2. These ependymoma cells could be differentiated into neuronal, astrocytic, and oligodenrocytic lineages, thus highlighting the multi-potent capacity and functional similarity to RGCs. Further evidence implicating RGCs as cells of origin of ependymoma was demonstrated by Johnson et al. (2010), in which ex-vivo over-expression of

9

EPHB2 in /INK4A deficient forebrain RGCs led to the formation of the first mouse model of

ST ependymoma18.

1.2.3 Cell line and animal models of ependymoma

In the past, lack of clear driver alterations in ependymoma has hampered the ability to generate animal models of this disease. Laboratories have thus relied upon patient-derived ependymoma cultures grown in vitro and orthotopic xenograft models generated with limited success particularly in the case of PF ependymoma23,24. Identification of the putative cell-of-origin of ependymoma, and drivers of ST ependymoma, namely EPHB2 amplification and recurrent p16/INK4A deletion, has led to the first animal models of this disease8,17. As novel ependymoma targets are discovered, validation and prioritization of candidates will require accurate pre- clinical models of ependymoma. Atkinson et al., (2011) demonstrate the utility and promise of this approach in Group D ST ependymoma cultures, generated by EPHB2 over expression and

CDKN2A/Ink4a deletion in forebrain radial glia25. Here they performed a compound library screen of 7890 compounds, in both tumour and matched normal neural stem cells and identified inhibitors of thymidylate synthase and dehydrogenate reductase, namely 5-fluorouracil as highly and specifically active in mice bearing ST ependymomas. Findings such as these hold therapeutic relevance particularly in ependymoma with currently no chemotherapeutic options available to patients.

10

1.3 The epigenetic basis of ependymoma

1.3.1 Candidate gene approaches identify DNA hypermethylated gene promoters in ependymoma

Aberrant promoter methylation of CpG di-nucleotides is a well-recognized feature seen in numerous solid and liquid cancers26. As such, early studies have focused on candidate gene promoters, hypermethylated in ependymoma, which have also been reported to be methylated in other . RASSF1A has been shown to be the most frequently hypermethylated TSG reported in up to 100% of ependymomas, and occurring in all clinical and pathological subtypes27,28. HIC1 has also been reported to be commonly methylated, in up to 83% of ependymomas, with a higher incidence in intracranial tumours29. Furthermore, the

CDKN2A/INK4a locus, which is focally and recurrently deleted in supratentorial ependymoma18, has been shown to be hypermethylated in 21% of cases, followed by CDKN2B and in 32% and 33% of tumors, respectively30. To a lesser extent, putative TSGs found to be hypermethylated in ependymoma include ZMYD10, GSTP1, DAPK, FHIT, MGMT,

DNAJC15, RARB, TIMP3, THBS1, TP73 and the Tumour Necrosis Factor Related Apoptosis-

Inducing Ligand (TRAIL) gene family CASP8, TNFRSF10C and TNFRSF10D31-34. Despite the frequency of DNA methylation of these potential TSGs in ependymoma, their role and significance in tumour development remains unclear, requiring validation in independent cohorts, and functional investigation in appropriate ependymoma models. It raises the possibility, however, that other genes and pathways may be epigenetically altered in the

11 pathogenesis of ependymoma, and suggests that epigenome-wide examinations of DNA methylation in ependymoma are warranted.

1.3.2 Genome-wide epigenomic profiling identify pathways targeted by aberrant DNA methylation

In the last 5 years, the expansion of microarray and next generation sequencing technologies, has allowed for genome-wide investigations of DNA methylation and histone modifications at unprecedented resolution and throughput. Using the Illumina Golden Gate Methylation Cancer

Panel 1 (1505 CpG sites), Rogers et al., (2012) profiled a series of 73 primary and 25 recurrent ependymomas27. Here they reported that the DNA methylation profiles of ependymoma are distinguished largely according to their location in the central nervous system, supporting the notion that ependymomas arising from different anatomic compartments are molecularly distinct17,18,20,22. Furthermore, they demonstrated that ST and SP ependymomas, together, exhibited a larger number of hypermethylated and down-regulated genes in comparison to PF tumours. These changes in DNA methylation were shown to be associated with alterations in gene expression of de novo and maintenance DNA methyltransferases DNMT1, DNMT3A and

DNMT3B. Interestingly, genes involved in immune cell response (NOD2, IRF7, IRAK3, OSM and PI3), cell growth and death (MAPK10, and TP73), and the c-Jun N-terminal kinases (JNK) pathway were found to be hypermethylated. Understanding the contribution of epigenetic alterations in these pathways may reveal mechanisms of ependymoma tumourigenesis, and potential actionable targets for therapeutic intervention.

12

In contrast to hypermethylation of CpG island promoters in cancer, global hypomethylation is a trend observed in numerous tumour types and is associated with cancer progression. A global decrease in methylation has been observed predominantly at repetitive elements such as LINEs,

SINEs and LTRs, which are important for maintaining genomic stability35. To elucidate the contribution of DNA methylation alterations at repetitive sequences in ependymoma, Xie et al.,

(2010) developed a novel genome-wide approach to generate methylation profiles for thousands of Alu elements (The most abundant class of repetitive elements) and their flanking sequences36.

Here they demonstrated that while the majority of Alu elements and flanking sequences remain unaltered in ependymoma genesis, a small subset of Alu flanking sequences, with low CpG density, exhibited variable methylation patterns. These sequences tended to be hypermethylated in ependymoma at regions proximal to CpG islands and hypomethylated in intergenic regions.

Importantly, several of these patterns were shown to be associated with aggressive primary ependymomas and tumour relapse. The impact that these epigenetic alterations have on genomic stability and the pathways and biological processes that they impinge on remain to be elucidated, however point to alterations in the epigenome that may play an important role in the pathogenesis of ependymoma.

1.3.3 Potential applications of epigenetic modifiers for ependymoma treatment

Characterizing the epigenome of ependymoma may hold therapeutic promise, as these marks such as CpG DNA methylation and histone modification are generally reversible by

13 pharmacologic inhibition. Importantly, inhibitors of DNA methylation (decitabine) and histone deacteylation (Vorinostat) are FDA approved and have shown efficacy in haematological malignancies37,38. This would make them likely rapidly translatable if effective in appropriate ependymoma pre-clinical models. As an example, Milde et al., (2011) established a model reminiscent of high-risk molecular group C supratentorial ependymoma, by in vivo transplantation of primary tumour cells in NOD/SCID immunodeficient mice23. Like primary ependymomas, which are highly chemo-resistant, cell cultures from this model were refractory to temozolomide, vincristine, and cisplatin. However, these cells were sensitive to a panel of histone deactetylase inhibitors (HDACi) including Panobinostat, Entinostat, Valproic Acid, and

Vorinostat. Importantly, cell cultures treated with Vorinostat at therapeutically achievable doses, demonstrated decreased proliferation, cell cycle arrest, decreased self-renewal capacity, and increased neuronal differentiation. These findings were also supported by Rahman et al., (2010), demonstrating that the ependymoma cell line nEPN2 underwent apoptosis in response to treatment with the HDACi, Trichostatin-A39. Given the rapid development of novel pharmacologic inhibitors of epigenetic marks, it raises the question as to whether these, or at least some, epigenetic modifications are central to ependymoma pathogenesis and whether they may represent novel avenues for therapeutic inhibition.

1.3.4 Future steps towards characterizing the ependymoma genome and epigenome

14

Although genomic and transciptomic profiling efforts have identified distinct molecular subtypes of ependymoma and revealed potential drivers of the disease, the vast majority of ependymoma tumors are characterized by either large chromosomal alterations, hampering the identification of driver events, or are characterized by very few genomic abnormalities, observed in some of the most aggressive tumours, enriched in PF patients, and occurring in the youngest patient population18,20.

It remains to be seen whether recurrent somatic single nucleotide variants (SNVs) or structural rearrangements (i.e. fusion transcripts) may contribute to the pathogenesis of ependymoma, as reported in several other adult and pediatric CNS neoplasms. Indeed, somatic SNVs identified in

CNS cancers have been shown to potentially alter the entire epigenetic landscape during tumor formation, such as the IDH1 mutations leading to a CpG hypermethylator phenotype in

Glioblastoma Multiforme (GBM)40, histone H3F3A and ATRX mutations in pediatric GBM41, and a collection of chromatin associated genes shown to be mutated in medulloblastoma such as

MLL2, MLL3, and KDM6A42-45.

DNA methylation profiling efforts have also been important in the molecular stratification of

CNS tumors46, and have also shown promise in distinguishing the principle molecular subgroups of ependymoma and identifying pathways targeted by DNA hypermethylation. As a future step, expanding DNA methylation profiling to platforms with higher CpG coverage surrounding gene promoters might be warranted, and could reveal novel targets, pathways, and mechanisms of epigenetic alteration. Also, given the contributions of aberrant methylation near

15 repeat elements in the ependymoma epigenome, more global investigations beyond gene promoters might be needed, and could be readily examined with whole-genome bisulfite sequencing.

Ultimately, the contribution of genetic and epigenetic changes in ependymoma pathogenesis may not only improve our understanding of the biology of this disease, but also reveal actionable pathways that could be rapidly translated in the clinic. This is important especially in the case of PF ependymoma, a highly recurrent and chemo-resistant brain tumour, for which the mechanisms and drivers of the disease remain largely unknown.

16

2 Chapter 2: Thesis Rationale and Hypotheses

Given that pediatric ependymomas arise most frequently in the posterior fossa, are chemo- resistant, are associated with poor clinical outcome, and exhibit largely balanced genomes, I formulated two hypotheses:

1) Posterior fossa ependymomas may be characterized by significant inter-tumoural

heterogeneity, and potential driver oncogene or tumour suppressor genes may be

uncovered by delineation of gene expression signatures in large and independent primary

ependymoma cohorts.

2) Despite the paucity of focal copy number alterations, posterior fossa ependymomas may

harbor recurrent somatic mutations and epigenetic alterations, which may be revealed by

whole-genome sequencing, whole exome sequencing, and DNA methylome profiling.

This thesis is divided into two parts addressing the hypotheses above:

1) The first chapter details the identification and characterization of two clinically and

molecularly distinct subgroups of posterior fossa ependymoma termed Group A and

Group B.

2) The second chapter describes the somatic mutational and epigenomic landscape of

Group A and B posterior fossa ependymoma

17

3 Chapter 3:

Identification and characterization of two clinically and molecularly distinct subgroups of posterior fossa ependymoma

3.1 Chapter 3: Summary

Despite histological similarity, ependymomas from throughout the neuroaxis likely comprise multiple independent entities, each with a distinct molecular pathogenesis. Transcriptional profiling of two large independent cohorts of ependymoma reveals the existence of two demographically, transcriptionally, genetically, and clinically distinct groups of posterior fossa ependymomas (PF), termed Group A and B. Group A patients are younger, have laterally located tumours with a balanced genome, and are much more likely to exhibit recurrence, metastasis at recurrence, and death as compared to Group B patients. Identification and optimization of immunohistochemical markers for PF ependymoma subgroups allowed validation of our findings on a third independent cohort, using a human ependymoma tissue microarray (TMA), and provides a tool for prospective prognostication and stratification of PF ependymoma patients.

18

3.2 Chapter 3: Introduction

In spite of histopathological similarities, ependymomas are very heterogeneous tumors with disparate mRNA expression profiles, supporting the hypothesis that the histological entity

‘ependymoma’ in fact comprises a group of related diseases17,18. The genetic landscape of ependymoma is also heterogeneous, with subsets of tumors exhibiting frequent gross numerical chromosomal alterations, and others displaying only single focal aberrations, or even a balanced genome16. A recurrent observation in several tumour cohorts is that up to 50% of PF cases have balanced genomic profiles18,19,47-49. A molecular model for prognostication of pediatric and adult ependymomas was recently published based upon tumors with specific genomic aberrations, or the absence of copy number alterations19. Further addressing the genetic heterogeneity of ependymoma, Johnson et al (2010), reported a comprehensive study cataloguing DNA copy-number alterations in subgroups of ependymoma defined by messenger

RNA (mRNA) and microRNA (miRNA) profiles18. Remarkably, they generated a mouse model of supratentorial ependymoma, and present supporting evidence that the cell-of-origin for supratentorial ependymoma resides in the radial glial lineage17,18. Evidence for a molecular

‘driver’ alteration and a cell-of-origin for PF ependymoma are still lacking. We have now applied genomic methodologies to three independent cohorts of human ependymomas in order to uncover the extent of inter-tumoral molecular heterogeneity, and the nature of the clinically relevant subgroups of PF ependymoma.

19

3.3 Chapter 3: Results

3.3.1 Ependymoma is comprised of three molecularly distinct diseases

In order to delineate the genetic basis of ependymoma we performed gene expression profiling on two non-overlapping primary ependymoma cohorts, which were histologically diagnosed as

Grade II or Grade III2. 102 ependymomas were analyzed in Toronto, Canada using the

Affymetrix Exon 1.0 ST gene expression arrays, and 75 ependymomas were analyzed in

Heidelberg, Germany on the Agilent two-color 4x44K gene expression arrays. We then examined the subgroup structure within both datasets, applying analytical methods as described by The Cancer Genome Atlas, asking whether the same subgroups were observed using two distinct bioinformatic methods in two independent datasets50. The 1000 most varying genes in each dataset were used to perform consensus hierarchical clustering (cHCL) from 2 to 10 subgroups (Figure 3.1A-B and Figure S3.1)51. Despite randomizing the initial number of input genes, we consistently found three stable clusters in both datasets that were generated using distinct microarray technologies (Figure S3.1).

To support the subgroup stratification by cHCL, we performed consensus clustering using non- negative matrix factorization (cNMF)52, and concluded that both cHCL and cNMF subgroup assignments were highly concordant (Figure 3.1C). We then used an algorithm called SigClust to perform all pairwise comparisons between HCL-defined subgroups and demonstrated that the three subgroups were statistically significant and distinct (Figure 3.1D)53. Silhouette analysis

20 demonstrated that 96% of samples in the Toronto and Heidelberg datasets had positive silhouettes, and thus were highly representative of their subgroups (Figure 3.1E)54. This method was also used to identify and remove poorly classified samples, which had negative silhouette values as described previously50. Therefore, using both cHCL and cNMF, we identified 3 principal subgroups of ependymoma divided largely anatomically into ependymomas arising from: 1) the supratentorial region (ST), 2) the posterior fossa (PF), and 3) a third group that included a mixture of both spinal cord and posterior fossa (PF + SP) tumors (Figure S3.1). We next evaluated whether a significant number of the same genes defined the same three subgroups in both datasets. To answer this question, we identified common subtypes across the two datasets by Subclass mapping, and verified that the subgroups identified in both datasets were defined by the same transcriptional signature (Figure 3.1F)55. We conclude that there are three transcriptionally defined principal subgroups of ependymoma, and that a subset of posterior fossa ependymomas is more similar to spinal tumors. The subgroup comprised purely of PF ependymomas was designated Group A (PFA), while PF tumors that clustered with spinal ependymomas in the previous analysis (PF + SP) were designated Group B (PFB, Figure 3.2A-

B). We next examined the demographic and clinical differences between Group A and B PF ependymoma.

3.3.2 Group A and B posterior fossa ependymoma are demographically and clinically distinct

21

In the Toronto and Heidelberg datasets, both independently and combined, Group A tumors developed in younger patients (median age of 2.5 years), while Group B tumors occurred predominantly in older patients (median age of 20, Figure 3.2C). Anatomically, 67% of Group

A tumors occurred laterally, invading into surrounding cranial nerves and arteries, while 95% of

Group B ependymomas occurred along the midline (Figure 3.2D). Additionally, Group A tumors demonstrated a propensity to invade into the cerebellum (Figure 3.2E). Patients with 5- years of clinical follow-up, harbouring Group A ependymomas, exhibited a significantly increased incidence of tumour relapse (Figure 3.2F), and mortality (Figure 3.2G). 5-year progression-free and overall survival rates were 47% and 69% for Group A tumors (Figure 3.2H

- left panel), Figure S3.2), and 79% and 95% for Group B tumors (Figure 3.2H - right panel), respectively. Detailed clinical information for Group A and B tumors of the Toronto and

Heidelberg cohort can be found in Witt and Mack et al (2011)20. We conclude that the two subgroups of PF ependymoma identified through transcriptional profiling are both molecularly and clinically distinct, and that their differences are highly clinically significant and relevant.

3.3.3 Group A and Group B posterior fossa ependymoma are characterized by distinct copy number landscapes

We analyzed all samples from both cohorts, studied by gene expression profiling, for which sufficient amounts of DNA were available (n=152), by array comparative genomic hybridization (aCGH). Group A and Group B ependymomas were identified as outlined above, yielding 75 tumors where both subgroup assignment and aCGH data were available. Group A

22 tumors exhibited a largely balanced genomic profile, with an increased occurrence of chromosome 1q gain compared to Group B (Figure 3.3A-B). Normal cell contamination in

Group A was a less likely explanation for the balanced genomic landscape observed as the majority of samples were composed of tumor cells (≥80%) by Hematoxylin and Eosin staining

(Figure S3.3), and as chromosome 1q gains, when present, were found in the majority of cells as shown by FISH (Figure S3.3). Lastly we compared the copy number intensities derived from the aCGH signals for chromosome 1q gain, 6q loss, and 9q gain, and demonstrated that chromosomal aberrations in Group A had similar intensities as aberrations in Group B (Figure

S3.3). This is in line with previous reports that a large proportion of PF ependymoma exhibit a balanced genomic profile18,19,47-49. In contrast, Group B ependymomas exhibited numerous cytogenetic abnormalities involving whole or chromosomal arms, including loss of , 2, 3, 6, 8, 10, 14q, 17q, 22q, and gain of 4, 5q, 7, 9, 11, 12, 15q, 18, 20, 21q

(Figure 3.3A-B). Chromosomal aberrations also classified tumors into the recently described cytogenetic risk groups19. Group A ependymomas included predominantly high-risk groups 2 and 3 (Figure 3.3A), while the vast majority of Group B tumors were classified into the low-risk group 1. We conclude that PF Group A and Group B ependymomas have distinct somatic copy number events in addition to the transcriptional and clinical differences described above.

3.3.4 Group A and Group B ependymoma are characterized by distinct biological processes and signaling pathways

23

In order to identify the biological basis distinguishing Group A and B we used Gene Set

Enrichment analysis56. Gene sets were collected and compiled from the National Cancer

Institute (NCI), Kyoto Encyclopedia of Genes and Genomes (KEGG), Protein Families

(PFAM), Biocarta, and Gene Ontogeny (GO) pathway databases. In order to visualize significant gene sets as interaction networks we used Cytoscape and the Enrichment Map plug- in (Figure 3.4)57.

Group B ependymomas were characterized by gene sets largely involved in ciliogenesis and microtubule assembly, and to a lesser extent mitochondria and oxidative metabolism. In order to validate this pathway, we selected Kinesin Family Member 27 (KIF27), an established marker of cilia signaling in other brain neoplasms58,59, and demonstrated in an independent TMA cohort, that 91% of Group B tumors stained positive, while 17% of Group A tumors showed a pattern of positivity (Figure S3.4).

Group A ependymomas were defined by numerous cancer-related networks namely (HIF-1α signaling, VEGF signaling, cell migration), PDGF signaling, MAPK signaling, EGFR signaling, TGF-β signaling, integrin signaling, extracellular matrix assembly, tyrosine-receptor kinase signaling, and RAS/Small GTPase signaling. Since extracellular matrix signaling was the most significantly enriched Group A pathway. We therefore selected

Tenascin-C (TNC), a type of matrix glycoprotein, which has been reported to be up-regulated in pediatric ependymoma, as a marker of ECM signaling60. 94% of Group A tumours stained positive for TNC, while only 11% of Group B tumors showed positivity (Figure S3.4). We

24 conclude that the marked difference in active signaling pathways between Group A and Group

B supports their distinct natures, and suggests possible leads for future subgroup specific targeted therapies.

3.3.5 Validation of posterior fossa ependymoma subgroups

In order to validate the two subtypes of PF ependymoma, an orthogonal technology was applied to an additional and independent cohort of ependymomas. We selected IHC markers up- regulated in each PF subgroup, and stained a TMA comprised of an entirely non-overlapping cohort of 265 PF ependymomas. Using a Goeman’s Global test, subgroup markers were identified from the transcriptional data by their ability to distinguish between the two pre- defined (A vs B) subgroups (Figure 3.5A, Figure S3.4). Laminin alpha-2 (LAMA2), which exhibited an increase in expression in Group A versus B, was selected as a marker of Group A

(Figure 3.5B). Neural Epidermal Growth Factor Like-2 (NELL2), which exhibited an increase in expression in Group B versus A, was selected as a marker of Group B (Figure 3.5B). Further, to assure clinical applicability of our markers, we prioritized selection and optimization of commercially available antibodies to LAMA2 and NELL2 as markers of Group A and Group B

PF ependymoma, respectively (Figure 3.5C). We then proceeded to stain our TMA validation cohort consisting of 265 PF ependymomas, in which 32% stained positive for LAMA2 and negative for NELL2, 52% stained positive for NELL2 and negative for LAMA2, 8% stained positive for both markers, while 8% stained negative for both markers (Figure 3.5D, Figure

25

S3.4). Detailed clinical information on tumors staining for LAMA2 or NELL2 in the validation cohort is summarized in Table S3 of Witt and Mack et al., (2011)20.

We also observed numerous clinical disparities between Group A and B patients of the IHC validation cohort consonant with the differences observed in the gene expression cohorts.

Patients with Group A ependymomas were significantly younger (median age 4 versus 39 years)

(Figure 3.6A), more commonly male (Figure 3.6B), and were more frequently classified as

WHO grade III ependymomas (Figure 3.6C). Furthermore, patients with Group A ependymomas had a higher incidence of metastases at the time of recurrence, and had a significantly poorer prognosis as compared to Group B patients (Figure 3.6D-F). Progression- free and overall survival rates were of 24% and 48%, for Group A tumors and 92% and 98% for

Group B tumors (Figure 3.6E-F). Interestingly, when cases stained for both or none of the two markers, these tumors had no association with metastasis at time of recurrence, and formed two intermediate patient survival groups, suggesting possible additional inter-tumoral heterogeneity within PF ependymoma. We conclude that Group A tumors are clinically and molecularly distinct from Group B tumors, have a worse prognosis, and can be identified through routine

IHC staining of formalin fixed paraffin embedded sections for LAMA2 and NELL2.

The most frequently reported prognostic indicator in ependymomas is the extent of surgical resection61,62. In line with these reports, the extent of resection, in our datasets was an important and independent variable on a Cox proportional hazard model for the entire current cohort of PF tumors when not accounting for subgroups (Table 3.1 and Figure S3.5)61-68. Because Group A

26 ependymomas are more commonly located laterally in the posterior fossa, we asked whether the observed differences in outcome were secondary to or confounded by a decreased incidence of gross total resection. However, we found that patients with Group B tumors, amenable to complete resection, displayed a 5-year progression-free and overall survival rate of 91% and

100%. Importantly, patients with completely resected Group A tumors, still exhibited worse progression-free and overall survival rates of 18% and 52%, respectively (Figure 3.6G-H).

These findings demonstrate that the difference in prognosis between PF ependymoma subgroups is not solely based on an increased incidence of incompletely resected Group A tumors.

We next evaluated to combination of numerous prognostic factors within Group A and B using a Cox proportional hazard model that included extent of resection, age, gender, WHO grade, use of radiotherapy, NELL2 staining (Group B), and LAMA2 staining (Group A). This analysis revealed that histological markers of Group A and B were the strongest and independent predictors of both overall and progression-free survival (Table 3.2) in patients with PF ependymoma. Indeed, after accounting for PF subgroup, level of resection was no longer prognostic for overall survival and had a lower influence on progression-free survival prediction

(Table 3.2), as compared to an analysis that did not include subgroup assignment (Table 3.1).

Although these data will need to be validated in the setting of a prospective clinical trial, the markers of Group A and B presented represents an effective and clinically applicable tool for rapid and robust risk stratification of PF ependymoma.

27

3.4 Chapter 3: Discussion

Our data supports the existence of two molecularly distinct groups of PF ependymoma that although histologically similar, differ in their demographic, transcriptional, clinical, and outcome characteristics. The strength of this hypothesis is supported by our approach using two non-overlapping cohorts of PF ependymomas, studied in two geographic locations, with two distinct expression array technologies, and then subsequent validation using an orthogonal technology, on a third independent cohort of PF ependymomas. Our study of 583 ependymomas represents the largest cohort of ependymomas analyzed to date.

A prior publication studying a smaller cohort of ependymomas (29 total tumors) demonstrated that ependymomas from different regions of the nervous system (supratentorial, posterior fossa, and spine) had regionally specific transcriptional profiles and somatic copy number alterations suggesting that ependymomas from different regions of the nervous system were separate entities17. It is noteworthy that in this publication, all of the supratentorial tumors cluster together, as do all of the spinal ependymomas. However, the spinal tumors cluster between two groups of posterior fossa ependymomas, in keeping with the findings of the current manuscript where some PF ependymomas (Group B) are more similar to spinal ependymomas than they are to the other subgroup of PF ependymomas (Group A).

28

A recent publication described a mouse model of supratentorial ependymoma, which was generated by transplanting neural stem cells harboring a combination of deletion of Ink4a/Arf-/- and/or overexpression of Ephb218. These mice developed supratentorial tumors that showed similar patterns of gene expression and cytogenetic events to 1 of 4 subgroups of supratentorial ependymoma identified from studying a total of 83 human ependymomas. The same study also indicated molecular heterogeneity within PF tumors, but this aspect was not a focus of the study, and no correlation with clinical outcome was presented for the PF tumors. Inspection of the subgroups from Johnson et al. (2010)18 reveals a subgroup that includes both PF and spinal ependymomas, that is genomically unstable, that includes many adults, (Johnson et al. subgroup

F) and therefore is similar to PF Group B in the current manuscript. The Johnson et al. study also describes three subgroups comprised almost exclusively of PF ependymomas (Johnson et al. subgroups G, H, and I), which show few cytogenetic aberrations other than gain of chromosome 1q, includes a large number of infants, and are therefore similar to the Group A described in the current manuscript. In line with previous reports, we conclude that a subset of ependymomas, labeled in this study Group A, exhibit a largely balanced chromosomal profile9,18,47,69,70. The high degree of genomic instability in Group B and association with an improved prognosis is a paradox that has been observed in other neoplasms of the breast, stomach, and lung71.

In our study, loss was one of the most frequent genomic alterations, occurring often in Group B PF tumors and rarely in Group A tumors. Neurofibromin-2 (NF2), located at chromosome 22q12.2, is thought to be the candidate of this region

29 because patients with NF2 mutations develop numerous neuro-epithelial neoplasms including spinal ependymoma. However, NF2 mutations have been found exclusively in ependymomas of the spinal cord, highly suggesting the existence of another chromosome 22q tumor suppressor gene in the case of PF ependymoma72,73.

Examination of molecular pathways characterizing the two PF ependymoma subgroups revealed two diverse patterns of alteration, again suggesting the existence of two biologically distinct classes of posterior fossa ependymoma. In Group B ependymomas only two pathways: ciliogenesis/microtubule assembly and mitochondrial/oxidative metabolism were exclusively deregulated. A more heterogeneous picture of pathway alteration was seen in Group A tumors, including several canonical cancer-associated pathways. Among them were Angiogenesis (HIF-

1α signaling, VEGF pathway), PDGF signaling, MAPK signaling, EGFR signaling, TGF-β signaling, Tyrosine-receptor kinase signaling, RAS signaling, and Integrin/ECM signaling.

Molecular research into the pathways driving Group A and B may yield targets for subgroup specific therapy. Because Group A patients experience poor outcome, and because there are no currently known effective chemotherapeutic regimens in ependymoma, the therapeutic threshold would be very low in this underserved patient population. Also, the highly distinct Group A and

B signaling pathways may also be reflective of different cells of origin, in keeping with a mechanism that has recently been shown for supratentorial ependymoma18 and for Wnt subgroup medulloblastoma74. The transcriptional profiles of Group A and B presented in this manuscript will serve as a resource to help guide future attempts to pinpoint possible alternative cells of origin giving rise to PF ependymoma.

30

The most highly differentially expressed genes in Groups A and B, in both the Toronto and

Heidelberg datasets, revealed candidate marker genes for distinguishing the two groups; the most striking being up-regulation of LAMA2 in Group A and NELL2 in Group B. Other markers of Group A included previously reported biomarkers of poor patient outcome including

CHI3L1, TNC, VEGF, EGFR, ERRB4, BIRC5, and S100A6 (Figure S1.4)8,47,75-77. This evidence, in addition to the lack of prognostic significance for chromosome 1q gain in Group A patients, suggests that some previously reported markers of poor outcome may have been surrogate markers for Group A.

As Group B ependymomas are much less likely to recur, metastasize, or result in the death of the patient, validation of our results in additional cohorts of patients would suggest that Group B patients could be treated less aggressively than Group A patients. Conversely, the poor outcome for Group A patients underlines the need for rapid development of adjuvant therapies for these patients. We anticipate that analysis of additional cohorts of PF ependymoma will further support the existence of at least two divergent molecular variants that are demographically, genetically, transcriptionally, and clinically distinct22. The antibodies described for LAMA2 and

NELL2 are both commercially available, and therefore should be widely available across the globe for validation of our results, and eventually for use in prognostication and stratification of

PF ependymoma patients. We would also suggest that future clinical trials should prospectively validate IHC staining for LAMA2 and NELL2 on formalin-fixed, paraffin-embedded tumor material. Importantly, to further improve our understanding of the molecular biology of these

31 posterior fossa subgroups, prospective investigations into the cell-of-origin and genetic driver mutations are desperately needed, including modeling of PF ependymoma in the mouse. Finally, the distinct molecular characteristics of these two groups of PF ependymoma suggest that subgroup-specific targeted therapies against subgroup-specific deregulated pathways are needed in future treatments of these tumors.

32

3.5 Chapter 3: Figures and Tables

Figure 3.1 A D TORONTO HEIDELBERG

n=102 0.5 n=75

0.5 TORONTO HEIDELBERG 0.4

0.4 SigClust: p-values SigClust: p-values

0.3 ST PF PF + SP ST PF PF + SP 0.3 ST ------ST ------0.2 0.2 PF 3.4E-14 -- -- PF 1.9E-25 -- -- Change in area under CDF curve 0.1

0.1 PF + SP 2.1E-19 1.5E-17 -- PF + SP 8.9E-56 5.3E-48 -- 0.0 2 4 6 8 10 0.0 2 4 6 8 10 Number of clusters Number of clusters B E TORONTO HEIDELBERG TORONTO HEIDELBERG n=102 n=75 PF PF

PF + SP PF + SP

ST ST

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Silhouette width Silhouette width ST PF PF + SP PF ST PF + SP C F TORONTO HEIDELBERG n=102 n=75 PF + SP PF HEIDELBERG ST

ST PF + SP PF PF + SP PF ST PF + SP PF ST p-value (FDR corrected) Rand Index=0.96, p<0.0001 Rand Index=0.85, p<0.0001 TORONTO 0.0 0.2 0.4 0.6 0.8 1.0

Identification of Three Primary Molecular Classes of Ependymoma

(A) Area under empirical cumulative distribution plots (k=2 to k=10), generated from consensus hierarchical clustering of 102 Toronto and 75 Heidelberg samples identifies strongest statistical support for the existence of three primary subgroups of ependymoma. (k denotes the number of clusters) (B) Consensus HCL heatmaps displaying the three robust subgroups of ependymoma defined by gene expression. (C) Consensus Non-Negative Matrix Factorization (NMF) of 102 Toronto and 75 Heidelberg samples at k=3, demonstrates significant concordance with the consensus HCL subgroup classification. Significance of similarity was determined by a Rand index and permutation testing of the Toronto sample labels (see experimental procedures). (D) Significance of HCL subgroup classifications in both datasets determined by pairwise comparisons between all clusters using SigClust. (E) Silhouette analysis identifies “core” samples defined as tumors with positive silhouette values. (F) Submap analysis demonstrates that the HCL-defined clusters identified in the Toronto cohort are nearly identical to the HCL-defined clusters defined in the Heidelberg cohort. Significance of similarity measured by FDR-corrected p-value.

33

Figure 3.2 A B TORONTO HEIDELBERG C TORONTO 40 TORONTO Group A Group B Group A Group B n=46 p=0.0002 LAMA2 TNC 30 CALB1 PLAG1 20 Age MOXD1 SERPINE2 10 ALDH1L1 CYP1B1 0 COL4A1 Group A Group B Group A HEIDELBERG RELN Group B HCN1 HEIDELBERG NELL2 70 MAL2 n=37 p=0.0001 ELMOD1 60 EYA4 50 AGBL2 40

LINGO2 Age +2.0 TSGA10 30 KCNJ16 20 LGR6 0.0 10 ROPN1L EFHB 0 -2.0 Group A Group B D E F G PF Location Cerebellar Invasion Recurrence after 5 years Mortality after 5 years HEIDELBERG HEIDELBERG HEIDELBERG HEIDELBERG 5% 5% 5%

25% 33% 33% 44% 35% 67% 56% 65% 95% 67% 95% 75% 95%

Group A Group B Group A Group B Group A Group B Group A Group B lateral yes yes yes medial n=37 p<0.0001 no n=37 p=0.04 no n=35 p=0.0424 no n=35 p=0.0164

HEIDELBERG HEIDELBERG 0 0 . . 1 H 1 Group B 8 8 . . 0 0 Group B 6 6 . . 0 0 Group A l probability l probability 4 4 a a . . v v 0 Group A 0 i i v v r r u u 2 S S 2 . . 0 0 0 0 . p=0.017 . p=0.0048 0 0 0 12 24 36 48 60 72 84 96 108 0 12 24 36 48 60 72 84 96 108 PFS (months) OS (months)

Transcriptome Analysis Distinguishes Two Distinct Subgroups of Posterior Fossa Ependymoma

(A) Graphical illustration of global differences between the transcriptomes of Group A and Group B PF ependymomas determined by principal component analysis. Individual tumor samples are represented as spheres, (red = Group A, blue = Group B) and ellipsoids display two standard deviations around each subgroup. (B) Heatmap of the top 100 most discriminating genes between Group A and Group B ependymomas, in both datasets, as calculated by Goeman’s Global Test statistic. (C) Box plots demonstrating the association of Group A ependymomas with younger age at diagnosis, and Group B ependymomas with older age at diagnosis. Median age is also indicated. p-values were determined by a Mann- Whitney test. (D) - (G) Pie charts demonstrating an association of Heidelberg Group A ependymomas with lateral localiza- tion, cerebellar invasion, increased recurrence and death within 5 years from diagnosis as compared with Group B tumors. Statistical significance of PF location, and incidences of recurrence and death was determined by Fisher’s Exact test. (H) Kaplan-Meier survival curves demonstrating a worse progression-free and overall survival in Group A versus Group B ependymomas. Statistical significance was determined by a log-rank test.

34

Figure 3.3 A Group A Group B Gender Age Cyto Risk Gr 1 p < 1 Cyto Risk Gr 2 p < 1 Cyto Risk Gr 3 p = 23 1p Loss p = 3 1q Gain p = 37 1q Loss p = 225 2p Loss p = 225 2q Loss p = 29 3p Loss p = 3 3q Loss p = 3 p Gain p = 5 q Gain p = 23 5q Gain p = 3 6p Loss p = 93 6q Loss p < 1 7p Gain p = 3 7q Gain p = 23 p Loss p = 3 q Loss p = 3 9p Gain p = 1 9q Gain p < 1 1p Loss p = 1 1q Loss p = 5 11p Gain p = 7 11q Gain p = 3 12p Gain p = 2 12q Gain p < 1 1q Loss p = 29 15q Gain p < 1 17q Loss p = 1p Gain p < 1 1q Gain p < 1 2p Gain p = 3 2q Gain p < 1 21q Gain p = 29 22q Loss p = 65

Gender Age Group Aberration B Group A Female < years No No Aberration 1 years 2 6 12 16 22 Male Yes Aberration > 1 years NA action gained or lost r

Identification of Subgroup Specific Copy age F

r Number Alterations in the Posterior Fossa e v

Ependymoma Genome A 1 3 5 7 9 11 13 15 17 19 21 Chromosome (A) Copy number profiling of 75 PF ependy Group B 2 6 12 16 22 disparate genetic landscapes between Group berg copy number datasets have been com

heatmap also displays the association of tumors to cytogenetic risk groups 1, 2 and 3 are also displayed between both subgroups, action gained or lost r (B) Median age F r e v tumors plotted against their chromosomal A 1 3 5 7 9 11 13 15 17 19 21 Chromosome

35

Figure 3.4 Group A

Ciliogenesis and Microtubule Assembly Group B Mitochondrial/Oxidative Metabolism Shared Genes

Integrin Signaling

Cell Adhesion Extracellular Matrix Assembly

Tyrosine Receptor Kinase Signaling Cell Migration

VEGF Signaling

PDGF Signaling EGFR Signaling

Angiogenesis Pathways in Cancer

TGF-B Signaling

Response to Hypoxia and HIF-1a signaling MAPK Signaling

Cell Morphogenesis

RAS Signaling

Gene Set Enrichment Analysis Delineates Biological Pathways and Processes that Define Two Distinct Variants of Posterior Fossa Ependymoma

Gene Set Enrichment Analysis (GSEA) comparing Group A (red) against Group B (blue) PF ependymoma in the Toronto dataset, illustrating distinct pathways and biological processes between both subgroups. (3.5 % FDR, p=0.01) Cytoscape and Enrichment Map were used for visualization of the GSEA results. Nodes represent enriched gene sets, which are grouped and annotated by their similarity according to related gene sets. Enrichment results were mapped as a network of gene sets (nodes). Node size is proportional to the total number of genes within each gene set. Proportion of shared genes between gene sets is represented as the thickness of the green line between nodes. This network map was manually curated removing general and uninformative sub-networks resulting in a simplified network map shown in Figure 4.

36

Figure 3.5 A TORONTO Fold Group A Group B Rank Influence Change LAMA2 2 603.8 5.1 NELL2 1 1588.7 8.1 HEIDELBERG Fold Group A Group B Rank Influence Change LAMA2 1 782.5 9.9 NELL2 12 347.0 3.2 -2.5 0.0 +2.5

B LAMA2-TORONTO LAMA2-HEIDELBERG NELL2-TORONTO NELL2-HEIDELBERG 12 12 5 5 10 10 0 0 8 8 6 Expression (Log2) Expression (Log2) 6 Fold Change (log2ratio) Fold Change (log2ratio)

4 p<0.001 p<0.001 p<0.001 p<0.001 Group A Group B Group A Group B Group A Group B Group A Group B C D Group A 8% 8% 100 µM 100 µM

32%

LAMA2 positive LAMA2 negative 52%

Group B n=265 100 µM 100 µM

NELL2 negative - LAMA2 negative NELL2 positive - LAMA2 negative NELL2 negative - LAMA2 positive NELL2 positive - LAMA2 positive NELL2 positive NELL2 negative

Selection and Optimization of PF Ependymoma Group A and Group B Specific Immunohistochemical Markers

(A) Subgroup-specific expression patterns of selected markers, LAMA2 and NELL2, illustrated by heatmaps in both data sets. Candidate genes were identified using the Goeman’s global test, which assigns a score to each gene based upon its degree of discrimination between defined classes: Group A and Group B. (B) Box plots derived from mRNA expression data displaying overall differences between markers representing Group A (LAMA2) and Group B (NELL2) in the Toronto and Heidelberg cohorts. Comparisons were performed using an unpaired t-test. (C) Representative immunohistochemi- stry (IHC) staining of LAMA2 and NELL2 on an ependymoma tissue microarray (TMA) composed of 265 PF ependymo- mas. (D) Pie chart illustrating the distribution of TMA staining for NELL2 and LAMA2. 84% of posterior fossa ependymo- mas stain positive for a single marker.

37

Figure 3.6 A B C Age Gender Grade 80 p<0.0001 100 p=0.0076 100 p<0.0001 n=265 n=265 n=265 21% 80 80 60 30% 79% 70% 60 48% 60 e

g 40 39 A 52% 58% Percent Percent 40 40 42% 20 20 20 Male Grade 3 4 Female Grade 2 0 NELL2 - NELL2 + NELL2 - NELL2 + NELL2 - NELL2 + LAMA2 + LAMA2 - LAMA2 + LAMA2 - LAMA2 + LAMA2 - D E F None NELL2+/LAMA2- 1 1 1 Both 0.8 0.8 0.8 NELL2+/LAMA2-

0.6 None 0.6 NELL2+/LAMA2- 0.6 None 0.4 0.4 0.4 Both NELL2-/LAMA2+ Both 0.2 0.2

Survival probability 0.2 Survival probability n=265 n=265

Survival probability n=265 p<0.0001 p<0.0001 NELL2-/LAMA2+ p<0.0001 NELL2-/LAMA2+ 0 0 0 0 12 36 60 84 108 132 156 180 0 12 36 60 84 108 132 156 180 0 12 36 60 84 108 132 156 180 Time to Metastasis PFS (months) OS (months) G H Gross-Totally Resected Cases Gross-Totally Resected Cases 1 1

NELL2+/LAMA2- 0.8 0.8 Both NELL2+/LAMA2- 0.6 0.6 Both None

0.4 None 0.4 Survival probability Survival probability 0.2 0.2 n=152 n=152 p<0.0001 NELL2-/LAMA2+ p<0.0001 NELL2-/LAMA2+ 0 0 0 12 36 60 84 108 132 156 180 0 12 36 60 84 108 132 156 180 PFS (months) OS (months)

Clinical and Cytogenetic Characteristics Distinguishing IHC Defined Group A and B Subgroups in a Third Non-Overlapping Posterior Fossa Ependymoma Cohort

(A) - (F) Comparing subgroup-specific demographic and clinical Information in the validation cohort illustrated by: (A) Box plots for age indicating NELL2-/LAMA2+ tumors represent a significantly younger population than NELL2+/LAMA2- tumors, (B-C) Bar graphs demonstrating that NELL2-/LAMA2+ tumors are over-represented by males and WHO Grade III tumors. (D-F) Kaplan-Meier plots demonstrate that NELL2-/LAMA2+ tumors have an earlier time to metastasis, and have a poorer progression-free and overall survival than NELL2+/LAMA2- tumors. Statistical significance of age was determined by a Mann-Whitney test; gender and grade by a Fisher’s Exact test, and time to metastasis and survival by a Log-Rank test. (G–H) Limiting to gross-totally resected cases, Kaplan-Meier curves demonstrating that NELL2-/LAMA2+ tumors have a significantly poorer survival than NELL2+/LAMA2- tumors. Statistical significance was determined by a Log-rank test.

38

Table 3.1 Cox Proportional Hazards models of overall survival and progression-free survival examining clinical variables of posterior fossa ependymomas (TMA Validation Cohort)

Hazard Hazard Ratio Hazard Ratio Variable p-value* Ratio (CI 5%) (CI 95%)

Overall Survival

Age, years (4-18 vs < 4) 0.88 0.29 2.66 0.83

Gender (male vs female) 1.74 0.86 3.52 0.12

Age, years (≥ 18 vs < 4) 0.36 0.11 1.22 0.10

Radiotherapy (yes vs no) 0.45 0.17 1.19 0.11

Resection (GTR vs STR) 0.42 0.21 0.86 0.02

Histology (grade 3 vs grade 2) 3.35 1.37 8.18 0.008

Progression-Free Survival

Age, years (4-18 vs < 4) 0.85 0.39 1.86 0.69

Radiotherapy (yes vs no) 0.72 0.38 1.37 0.31

Gender (male vs female) 1.30 0.83 2.04 0.25

Age, years (≥ 18 vs < 4) 0.42 0.18 1.00 0.05

Histology (grade 3 vs grade 2) 2.74 1.55 4.86 < 0.001

Resection (GTR vs STR) 0.42 0.26 0.67 < 0.001

Note. No. of patients with fossa posterior ependymoma = 265; CI, Confidence Interval; * Wald Test.

39

Table 3.2 Cox Proportional Hazards Model s for Overall Survival and Progression-Free Survival of Posterior Fossa Ependymoma of TMA Validation Cohort

Hazard p- Variable 95% CI Ratio value*

Overall Survival

Gender (male vs female) 1.42 0.67 - 3.04 0.3591

Resection (GTR vs STR) 0.66 0.32 - 1.34 0.2470

Age, years (4-18 vs < 4) 1.81 0.77 - 4.25 0.1746

Histology (grade 3 vs grade 1.92 0.79 - 4.68 0.1486 2)

Age, years (> 18 vs < 4) 0.44 0.16 - 1.24 0.1202

Radiotherapy (yes vs no) 0.29 0.11 - 0.77 0.0135

NELL2 (positive vs negative) 0.12 0.03 - 0.44 0.0012

LAMA2 (positive vs 10.55 2.81 - 39.60 0.0005 negative)

Progression-Free Survival

Gender (male vs female) 1.17 0.74 - 1.85 0.5037

Radiotherapy (yes vs no) 0.80 0.44 - 1.47 0.4744

Age, years (4-18 vs < 4) 0.68 0.34 - 1.35 0.2686

Histology (grade 3 vs grade 1.42 0.79 - 2.57 0.2423 2)

Age, years (> 18 vs < 4) 1.80 0.99 - 3.27 0.0535

Resection (GTR vs STR) 0.53 0.34 - 0.83 0.0061

NELL2 (positive vs negative) 0.32 0.17 - 0.61 0.0005

LAMA2 (positive vs 8.45 4.08 - 17.49 < 0.0001 negative)

40

3.6 Chapter 3: Supplemental Figures

Figure S3.1

A B TORONTO TORONTO HEIDELBERG 1.0 1.0 0.8 0.8

2 3 4 0.6 2 0.6 CDF CDF 3 HEIDELBERG 4 0.4 5 0.4 6 2 7 7 3 8 0.2 8 0.2 4 9 9 5 10 10 6 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 2 3 4 consensus index consensus index

TORONTO 0 C D . 800genes 1000genes 1300genes 2000genes 1 8

. PF+SP 0 e v 6 r . 0 0.4 l probability 0.4 0.4 PF 0.4 4 a . v i 0 v r 0.2 0.2 2 0.2 0.2 ST u . 0 e change in area S v

under CDF cu p=0.0034 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 relati 12 36 60 84 108 k k k k PFS(months) 0 HEIDELBERG . 1 PF+SP 8

800genes 1000genes 1300genes 2000genes . 0 e

v PF r 6 . 0.4 0.4 0 0.4 0.4 l probability 4 a . v 0 i v r 0.2 0.2 0.2 0.2 2 . u ST e change in area 0 v S

under CDF cu p=0.0014 relati 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 12 36 60 84 108 k k k k OS(months) E TORONTO HEIDELBERG n=29 n=60 n=32 n=28 n=18 n=29 100 3% 2% 100 7% 14% 4% 28% 80 41% 80

60 60 95% 100%

Percent Percent 83% 40 83% 44% 40 69% SP SP 20 PF 20 PF 16% 3% ST 3% ST 0 0 ST PF PF+SP ST PF PF+SP (A) Empirical cumulative distribution function (ECDF) plots illustrating that subgroup stability is achieved at a maximal number of three subgroups in both Toronto and Heidelberg datasets. (B) Heatmaps from k=2 to k=4 depicting the three principal subgroups from consensus hierarchical clustering in both Toronto and Heidelberg datasets. Consensus hierar- chical clustering was performed with the top 1000 genes exhibiting the greatest median absolute deviation. (k denotes number of clusters) (C) Area under the ECDF plots demonstrating that the principal number of subgroups in both datasets is three despite varying the number of input genes from 800 to 2000. (D) Kaplan-Meier curves comparing the progression-free and overall survival for the three gene-expression defined subgroups identified in the Heidelberg data set. (E) Box plots illustrating the anatomical distribution of ependymomas (ST, PF, SP) for each gene-expression defined subgroup. While two groups were composed of largely ST or PF ependymomas, the third group was comprised of a mixture of PF and SP ependymomas.

41

Figure S3.2

A B

1.0 1.0 1.0 1.0

0.8 0.8 0.8 0.8

0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 < 18 years < 18 years RT No RT No 0.2 0.2 > 18 years Survival probability > 18 years Survival probability Survival probability 0.2 RT Yes Survival probability 0.2 RT Yes p=0.2297 p=0.6171 p=0.2297 p=0.6171

12 36 60 84 108 12 36 60 84 108 12 36 60 84 108 12 36 60 84 108 Group B - PFS (months) Group B - OS (months) Group B - PFS (months) Group B - OS (months) C D

1.0 1.0 1.0 1.0

0.8 0.8 0.8 0.8

0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 GTR GTR < 4years < 4years 0.2 STR 0.2 STR Survival probability Survival probability Survival probability 0.2 > 4years Survival probability 0.2 > 4years p=0.1762 p=0.4631 p=0.7297 p=0.3514

12 36 60 84 108 12 36 60 84 108 12 36 60 84 108 12 36 60 84 108 Group B - PFS (months) Group B - OS (months) Group A - PFS (months) Group A - OS (months) E F

1.0 RT No 1.0 RT Yes 1.0 1.0 0.8 p=0.0291 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 RT No 0.4 0.4 GTR GTR 0.2 0.2 RT Yes Survival probability Survival probability Survival probability p=0.2161 Survival probability 0.2 STR 0.2 STR p<0.0001 p=0.0051 12 36 60 84 108 12 36 60 84 108 12 36 60 84 108 Group A - PFS (months) Group A - OS (months) 12 36 60 84 108 Group A - PFS (months) Group A - OS (months) G

1.0 1.0

0.8 0.8

0.6 0.6 0.4 0.4 Lateral Lateral

Survival probability 0.2 Medial Survival probability Survival probability 0.2 Medial p=0.0263 p=0.0931 12 36 60 84 108 12 36 60 84 108 Group A - PFS (months) Group A - OS (months)

(A) Kaplan-Meier curves of Heidelberg mRNA-defined Group B tumors, demonstrating no prognostic significance between patients <18years versus >18years for both PFS and OS. Statistical significance for all Kaplan-Meier curves in Figure S2 was calculated using a log-rank test. (B) Kaplan-Meier curves of Heidelberg mRNA-defined Group B tumors, demonstra- ting no prognostic significance in PFS and OS in patients who received or did not receive radiotherapy. (C) Kaplan-Meier curves of Heidelberg mRNA-defined Group B tumors, demonstrating no prognostic significance in PFS and OS between ependymomas with sub-total and gross-total resection. (D) Kaplan-Meier curves of Heidelberg mRNA-defined Group A tumors, demonstrating no prognostic significance between patients <4years versus >4years for both progression-free survival (PFS) and overall survival (OS). (E) Kaplan-Meier curves of Heidelberg mRNA-defined Group A tumors, demons- trating that Group A patients who received radiotherapy had an improved PFS compared to those who did not receive radiotherapy. No prognostic significance was observed in the case of overall survival. (F) Kaplan-Meier curves of Heidel- berg mRNA-defined Group A tumors, demonstrating that sub-total resection is associated with poor PFS and OS compa- red to gross-totally resected tumors. (G) Kaplan-Meier curves of Heidelberg mRNA-defined Group A tumors, demonstra- ting an improved PFS in patients with midline occurring ependymomas, as compared to lateral tumors. No prognostic significance was observed in the case of OS.

42

Figure S3.3 A Sample 2ep6 Sample 2ep27 Sample 2ep36

100 µM 100 µM 100 µM

B 20 µm 20 µm 20 µm

1q25 1q25 1q25 Sample 2ep6 Sample 2ep27 Sample 2ep36 1q25 C Chromosome 1q Chromosome 6q Chromosome 9q OUP B R G OUP A R G

2ep6 2ep36 2ep27

-1 0 1

Log2ratio

(A) – (C) Assessing the presence of normal cell contamination in a series of Group A tumors. (A) Haematoxylin and Eosin (B) Representa- (C) aCGH heatmaps comparing the intensi-

43

Figure S3.4 A TORONTO HEIDELBERG TORONTO HEIDELBERG CHI3L1 CHI3L1 TNC TNC 15 4 15 4

2 2 10 10 0 0 Log2 ratio 5 Log2 ratio 5 -2 -2 Log2 expression Log2 expression p=0.0128, rank 82 p=0.0026, rank 82 p<0.0001, rank 2 p=0.0004, rank 2 0 -4 0 -4 Group A Group B Group A Group B Group A Group B Group A Group B TORONTO HEIDELBERG TORONTO HEIDELBERG VEGF VEGF EGFR EGFR 14 4 12 2 10 12 2 9 1 10 0 8 Log2 ratio 7 Log2 ratio 0 8 -2 Log2 expression Log2 expression 6 p=0.0008, rank 104 p=0.0002, rank 104 p=0.0283, rank 284 p=0.0382, rank 284 6 -4 -1 Group A Group B Group A Group B Group A Group B Group A Group B

TORONTO HEIDELBERG TORONTO HEIDELBERG ERBB4 ERBB4 BIRC5 BIRC5 2 12 9 6 0 4 10 -2 8 8 2 -4 7 Log2 ratio 6 Log2 ratio -6 0 Log2 expression Log2 expression p=0.0157, rank 76 p=0.0004, rank 76 p=0.0297, rank 281 p=0.0166, rank 281 4 -8 6 -2 Group A Group B Group A Group B Group A Group B Group A Group B B PF - GROUP - A PF - GROUP - B 11% (2/18) 6% (1/18) 0% (0/19) 0% (0/19)

n=18 n=19 89% (16/18) 94% (17/18) p=6.45e-7 100% (19/19) 100% (19/19) p=5.66e-11 Positive Positive Negative Negative IHC: NELL2 IHC: LAMA2 IHC: NELL2 IHC: LAMA2

C Ciliogenesis/Microtubule Assembly D ECM / Integrin Signaling E CILIOGENESIS / MICROTUBULE ASSEMBLY (KIF27 positivity) (TNC positivity) 100 µM 100 µM 9% 6% 100 Yes 100 Yes No No 80 80 KIF27 POSITIVE t t KIF27 NEGATIVE 60 60 en en c 84% 91% c 94% 89% ECM / INTEGRIN SIGNALING er 40 er P P 40 100 µM 100 µM

20 20 n=221 n=222 17% p<0.0001 11% p<0.0001 NELL2- NELL2+ NELL2- NELL2+ TNC POSITIVE TNC NEGATIVE LAMA2+ LAMA2- LAMA2+ LAMA2-

(A) Box plots of selected Group A markers as defined by gene expression in both datasets. Markers selected have been previously reported to be associated with poor patient outcome in ependymoma. Statistical significance was determined using an unpaired t-test. Also shown are the ranks for each gene, which were determined by genes exhibiting the highest ‘Influence Score’ in both datasets as determined by the Goeman’s global test. (B) Pie graphs demonstrating a greater than 94% concordance between gene expression (microarray) and protein expression (IHC) for NELL2 and LAMA2. Statistical significance was determined using a Fisher’s Exact Test. (C) – (E) Validation of the most significantly represented signaling pathways in each PF Group (Group A: Extracellular Matrix (ECM)/Integrin Pathway; Group B: Ciliogenesis Pathway) in the TMA validation cohort, by selection of representative pathway markers. (C) Ciliogenesis involvement in Group B tumors could be demonstrated by positive staining of 91% Group B tumors by a Ciliogenesis marker KIF27, in contrast 17% of Group-A tumors showed pattern of positivity. Statistical significance of pathway markers was determined using a Fisher’s Exact Test. (D) 94% of Group-A tumors showed ECM/Integrin pathway activation as determined positive expression of Tenascin C, a representative candidate ECM/Integrin protein. In comparison, only 11% of Group-B tumors showed activation of this pathway. (E) Representative IHC staining for KIF27 and TNC illustrating the clear difference between positive and negative staining cases

44

Figure S3.5

A NELL2-/LAMA2+ NELL2-/LAMA2+ B ALL TMA PF TUMORS ALL TMA PF TUMORS 0 0 . .

No 1q gain 0 No 1q gain 0 . 1 1 . 1 1q gain 1q gain 1 8 8 . . 8 8 . 0 0 p=0.2194 .

p=0.1642 0 0 6 6 . . 6 6 . . 0 0 0 0 l probability l probability l probability l probability 4 4 a a a a . . 4 4 . v . v v v i 0 0 i i i 0 0 v v v v r r r r GTR GTR u u u u 2 2 . . 2 2 S . S S S . 0 0 STR STR 0 0 p=0.0003 p=0.0432

12 36 60 84 108 132 12 36 60 84 108 132 12 60 108 156 12 60 108 156 PFS (months) OS (months) PFS (months) OS (months)

C NELL2-/LAMA2+ NELL2-/LAMA2+ D NELL2+/LAMA2- NELL2+/LAMA2- 0 0 . 0 0 . . . 1 1 GTR

GTR 1 1 STR STR 8 8 8 8 . . . p=0.02 . 0 p=0.31 0 0 0 6 6 6 6 . . . . 0 0 0 0 l probability l probability 4 l probability l probability 4 4 4 . . . . a a a a 0 0 0 0 v v v v i i i i v v v v r r r r GTR u u GTR 2 u u 2 2 . . 2 . S S . S S 0 0 0 STR

0 STR p=0.81 p=0.34 0 0 0 0 . . . . 0 0 0 12 36 60 84 108 12 36 60 84 108 132 12 36 84 132 180 0 12 36 84 132 180 PFS (months) OS (months) PFS (months) OS (months)

E NELL2-/LAMA2+ NELL2-/LAMA2+ F NELL2+/LAMA2- NELL2+/LAMA2- 1 1 1 <4 years <4 years 1 >4 years >4 years 0.8 0.8 0.8 0.8 p=0.2902 p=0.5229 0.6 0.6 0.6 0.6

0.4 0.4 0.4 0.4 <18 years <18 years 0.2 0.2 0.2 0.2 >18 years >18 years Survival probability Survival probability Survival probability Survival probability p=0.8352 p=0.5001 12 36 60 84 108 12 36 60 84 108 132 12 60 108 156 204 12 60 108 156 204 PFS (months) OS (months) PFS (months) OS (months)

G NELL2-/LAMA2+ NELL2-/LAMA2+ H NELL2+/LAMA2- NELL2+/LAMA2- 1 RT no RT no 1 1 1 RT yes RT yes 0.8 0.8 0.8 p=0.7285 p=0.1599 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 RT no RT no 0.2 0.2 0.2 RT yes 0.2 RT yes Survival probability Survival probability Survival probability Survival probability p=0.1166 p=0.2867

12 36 60 84 108 12 36 60 84 108 12 60 108 156 12 60 108 156 PFS (months) OS (months) PFS (months) OS (months) (A) Kaplan-Meier curves of NELL2-/LAMA2+ tumors demonstrating no prognostic difference in PFS and OS between tumors with or without chromosome 1q gain. Chromosome 1q gain was determined by fluorescent in situ hybridization. Statistical significance for all Kaplan-Meier curves in Figure S6 was calculated using a Log-Rank test. (B) Kaplan-Meier curves of the TMA tumor cohort demonstrating that PF ependymoma patients with sub-total resection have a statistically poorer outcome than patients with gross-total resection. (C) Kaplan-Meier curves of NELL2-/LAMA2+ tumors demonstra- ting that level of resection is prognostically significant only in the case of PFS, and not OS. (D) Kaplan-Meier curves of NELL2+/LAMA2- tumors demonstrating that level of resection is not prognostically significant in both PFS and OS. (E) Kaplan-Meier curves of NELL2-/LAMA2+ tumors demonstrating no prognostic difference in PFS and OS between patients <4years and >4years. (F) Kaplan-Meier curves of NELL2+/LAMA2- tumors demonstrating no prognostic difference in PFS and OS between patients <18years and >18years. (G) Kaplan-Meier curves of NELL2-/LAMA2+ tumors demonstrating no prognostic difference in PFS and OS between patients who received or did not receive radiotherapy. (H) Kaplan-Meier curves of NELL2+/LAMA2- tumors demonstrating no prognostic difference in PFS and OS between patients who received or did not receive radiotherapy.

45

3.7 Chapter 3: Methods

3.7.1 Patients and tumour samples

Clinical samples and data were utilized in accordance with research ethics board approval from both The Hospital of Sick Children (Toronto, Ontario) and DKFZ (Heidelberg, Germany).

Informed consent was obtained from all patients. Patient clinical details can be found in

Publication Table S120 of expression profiling data sets, and in Publication Table S320 for the

TMA validation cohort. Only patients with WHO II and III grade ependymomas were analyzed in this study, while all other variants including subependymomas, ependymoblastomas, and myxopapillary ependymomas were excluded. Heidelberg sample cohort: Snap-frozen primary tumor samples (n=75) for gene expression analysis, and paraffin-embedded samples (n=406) for a tissue microarray analysis were collected from a single-center study at the Burdenko

Neurosurgical Institute in Moscow, Russia. All tumors were banked at the time of primary diagnosis between 1993 and 2007, and diagnosed as M0 at the time of surgical resection. At least 80% of tumor cell content was estimated in all Heidelberg and TMA samples by staining cryosections (~5um thick) of each sample with hematoxylin and eosin. Diagnoses were confirmed by histopathologic assessment by at least two neuropathologists, including a central review that utilized the 2007 WHO classification for CNS tumors. The treatment strategy is described below in brief: Patients with grade 2 tumors after complete tumor resection received no adjuvant postoperative treatment and were left under observation. Patients aged 4 years or older, who had grade 3 tumors or grade 2 tumors after incomplete resection received

46 radiotherapy. Evaluation of the extent of surgical resection was based on neurosurgical assessment, and postoperative contrast magnetic resonance imaging within 48 hours of tumor resection. Tumor removal was evaluated as complete with no visible tumor tissue remaining (i.e.

Gross-Totally Resected), or incomplete (i.e. Sub-Totally Resected). All remaining patients younger than 4 years received adjuvant chemotherapy. The anatomical location of tumors within the posterior fossa was divided into those occurring along the midline (i.e. protruding into the 4th ventricle) and those that occurred laterally growing along the cerebellopontine angle. Median and lateral classification was determined by neuro-radiological observation, in which only final reports were used. Tumors in which a significant portion of the tumor (>33%) had protruded through the Foramen of Luschka into the cerebellopontine angle were identified as lateral ependymomas. A tumor with a large medial component, and a small tongue of tumor protruding out of the Foramen of Luschka would be classified as a medial tumor. Toronto sample cohort:

The tumor cohort analyzed in Toronto was composed of 102 snap-frozen samples, which were received from numerous centers.

3.7.2 Nucleic acid isolation

Extraction of high molecular weight DNA from frozen tumor samples of Heidelberg cohort was carried out as previously described78. Briefly, genomic DNA from peripheral blood mononuclear cells of healthy donors (pool of ten male and female donors, respectively, age 25-

40 years) used as a control was isolated by use of the Qiagen DNA Blood Midi-Kit (Hilden,

Germany). Regarding the Heidelberg cohort, RNA was extracted from biopsy samples after

47 milling of the frozen sample in a Micro-Dismembrator S (B. Braun, Melsungen, Germany).

Frozen cell powder was immediately disintegrated in Trizol (Invitrogen, Carlsbad, USA) solution and extracted by RNeasy mini columns (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Extraction of high-molecular weight DNA and high-quality RNA from frozen tumor samples of the Toronto cohort was carried out as previously described79.

3.7.3 Gene expression microarray processing and data filtering

Toronto ependymoma samples were analyzed on the Affymetrix Human Exon 1.0 ST Gene

Chip at the London Regional Genomics Centre (London, Ontario). Sample library preparation, hybridization, and quality control were performed according to Affymetrix recommended protocols. CEL files were imported into Affymetrix Expression Console (Version 1.1) and gene level analysis (CORE content) was performed. Arrays were Quantile normalized (sketch) and summarized using PLIER with PM-GCBG background correction. Probesets were annotated according to the human genome build HG19 (GRCh37).

Processing and hybridization of RNA isolated from 75 ependymomas of the Heidelberg cohort was processed and hybridized to the 4x44K feature Agilent Whole Human Genome Oligo

Microarray according to the manufacturer’s instructions. All hybridization experiments were investigated by labeling tumor sample against reference probes normal brain pool, and scanned in a two-color Agilent Scanner G25505B (Agilent, Santa Clara, USA) according to the manufacturer’s specification. Array raw data were generated from scanned images using Feature

48

Extraction 9.1 Software (Agilent, Santa Clara, USA). Pre-processing of the data, and quality control were conducted with our in-house developed ChipYard framework for microarray data analysis (http://www.dkfz.de/genetics/ChipYard/) using R80 and Bioconductor81 packages.

Feature signals had to fulfill the following criteria to be considered for further analysis: a minimal signal to background ratio ≥ 1.5 in at least one channel; a minimal raw signal of 200; a mean to median spot intensity ≤ the 75% Quantile + 3 times the interquartile range of all features on the array; and a feature replicate standard deviation ≤ 0.25 per array. Normalization of raw signals was performed using vsn82. Based on BLASTing the probes sequence information against the genome, biological annotations were retrieved from EnsEMBL (version 54, NCBI

Build 36 of the human genome reference sequence). Gene expression data for both Toronto

(GSE27279) and Heidelberg (GSE27287) datasets can be found at the NCBI Gene Expression

Omnibus repository.

3.7.4 Array-based comparative genomic hybridization

Array- (or Matrix-) CGH83 at an average probe spacing of 0.4 Mb was carried out as previously described19 for the Heidelberg and Toronto data sets. Microarray data analysis was performed as previously reported84. The extracted raw data was preprocessed and normalized with the R software (R Development Core Team, 2010) using the microarray data analysis framework

ChipYard. To eliminate low-quality spots, three different quality criteria were applied, described elsewhere19. A k-state hidden Markov model was fitted to the normalized data to identify genomic breakpoints. The median absolute deviation (MAD) of balanced regions was used to

49 define the sample experimental variation (SDa) per array. Genomic alterations were defined as log2-ratios larger than 2 times SDa (gains) or smaller than 2 times SDa (losses). Clones with incomplete mapping information, missing data in more than 20% of patient samples or known to harbor copy-number polymorphisms according to the Database of Genomic Variants (February

2007, http://projects.tcag.ca/variation/project.html) were excluded from further analyses85. This resulted in a total of 10471 evaluable clones per each tumor sample. Normalized log2-ratios and raw data sets are available at the NCBI Gene Expression Omnibus (GSE27287) for the Toronto data set and the DKFZ Heidelberg data set19.

3.7.5 Statistical analysis of gene expression based subgroups

Missing values in the Agilent gene expression dataset were imputed using k-nearest neighbor algorithm86. In both datasets, duplicate probesets were filtered to the highest variant probesets.

For HCL consensus clustering, datasets were reduced to 1000 probesets exhibiting the largest median absolute deviation (MAD). For NMF consensus clustering, datasets were reduced to

5000 probesets exhibiting the largest MAD scores. Since the Heidelberg dataset contained negative values, probesets were further preprocessed as previously described87.

50

3.7.6 Identification of gene expression derived subtypes

To detect robust sample clusters, we used hierarchical clustering with agglomerative average linkage as our method for consensus clustering. (R package: ConsensusClusterPlus; 88. Datasets were median-centered and the distance measure was computed as 1 minus the Pearson´s correlation coefficient. Clustering was performed over 1000 iterations at a sub-sampling ratio of

0.8. SigClust was used to compute the significance of each of the identified clusters in a pairwise fashion (R package: SigClust)53. To establish whether samples were representative of their subtype, silhouette analysis was performed to identify “core” samples (R package: cluster)30. Only members with a positive silhouette value were retained for further analysis as representative samples of their subgroup assignment. Subclass mapping using the Submap module (version 3), within GenePattern software was performed to assess the commonality of the subtypes identified in the two data sets55,89. Non-negative matrix factorization (NMF) was used to assess the sample memberships at a 3-subgroup classification. NMF (R package: NMF version 0.5.02) was performed on each dataset for 1000 re-sampling iterations using the parameters described previously52. Sample memberships were then compared against the HCL analysis using a Rand index. The significance of the Rand index was assessed by permutation of the Toronto sample labels and calculating the Rand Index over 10,000 iterations in order to generate a null distribution of Rand index values.

Principal components analysis was performed within Partek Genomics Suite (Partek Inc.) to compare Group A and Group B posterior fossa subtypes with the same 1000 genes used for

51

HCL consensus clustering. Both groups were highlighted by their HCL cluster membership, and encircled by ellipsoids measuring 2 standard deviations around the center of each subgroup.

3.7.7 Identification of biological pathways distinguishing group A from group B posterior fossa ependymoma

Gene Set Enrichment Analysis56, as visualized in Cytoscape (version 2.7.0)90, and the

Enrichment Map software57, were used to identify the biological processes discriminating Group

A from Group B ependymomas. Gene sets were compiled from NCI91, KEGG92, PFAM93,

Biocarta: Pathway Resource (www.biocarta.com), and Gene Ontology

(http://www.geneontology.org) databases. Using the Toronto dataset, GSEA analysis was performed using gene-set permutations with a FDR cutoff of 0.035 and p-value cutoff of 0.01.

The network map was manually curated removing general and uninformative sub-networks and nodes resulting in a simplified network map shown in Figure 4. The complete network map for the Toronto dataset can be found in Witt and Mack et al., Figure S4A20. Due to differences in microarray platforms, different GSEA parameters were needed in order to generate the network map comparing Group A and Group B in the dataset from Heidelberg (Publication supplemental

Figure S4B; FDR<0.10, p-value<0.01)20. GSEA results for both Toronto and Heidelberg datasets can be found in Publication Supplemental Table S220.

52

3.7.8 Selection of candidate genes and gene signatures

We computed the test statistic of Goeman’s Global test for logistic regression to derive the contributions made by each of the genes on discriminating between the two groups94. Individual transcripts were then ranked by their influence. Finally, we selected 100 candidate genes with highest ranks in both datasets, as representative subgroup markers for Group A and Group B.

3.7.9 Statistical analysis of clinical parameters

Estimation of survival time distribution was performed by the method of Kaplan and Meier. For comparisons of two or more survival curves, the log-rank test was used. Comparisons of binary and categorical patient characteristics between age groups were performed by use of a two-sided

Fisher’s exact test. To evaluate the status of recurrence, secondary metastasis and mortality 5 years after diagnosis, we selected patients with a 5-year follow-up within the expression and validation data set. Wilcoxon rank sum test was used to perform two-sample location tests for at least ordinal covariates. Multivariate Cox regression analysis was used to evaluate the impact of subgroup markers together with prognostically relevant clinical and histopathological factors.

The result of a test was always judged as statistically significant when the corresponding two- sided p-value was smaller than 5%. The prognostic value of clinical and molecular factors was assessed by their estimated hazard ratios including 95% confidence intervals. All statistical computations were performed with the statistical software environment R, version 2.9.080 using

53 the R packages Design, version 2.2-0, and aCGH, version 1.18.0, from the Bioconductor project.

3.7.10 Construction of ependymoma tissue microarrays

Hematoxylin and eosin (HE) stained sections from all 406 paraffin blocks were prepared to define representative tumor regions. In addition, 10 samples of non-tumor brain tissues were included as a control. All tissue specimens were arrayed into a recipient block as previously described 95.

3.7.11 Immunohistochemistry

Antibodies against the following antigens were used: LAMA2 (Abnova, Taipei City, Taiwan;

H00003908-M0; dilution 1:2500), NELL2 (Abcam, Cambridge, MA; ab80885; dilution

1:1000), KIF27 (Altas, Stockholm, Sweden; HPA022246; dilution 1:500) and TNC (Novus

Biologicals, Cambridge, UK; NB110-68136; dilution: 1:1000). TMA staining was performed, evaluated and scored as published96

54

4 Chapter 4:

Somatic mutational and epigenomic landscape of Group A and B posterior fossa ependymoma

4.1 Chapter 4: Summary

Ependymomas are common childhood brain tumors that occur throughout the nervous system, but are most common in the pediatric hindbrain. Current standard therapy comprises surgery and radiation, but not cytotoxic chemotherapy as it does not further increase survival. Whole- genome and whole-exome sequencing of 47 hindbrain ependymomas reveals an extremely low mutation rate, and zero significant recurrent somatic SNVs. While devoid of recurrent SNVs and focal copy number aberrations, poor prognosis hindbrain ependymomas exhibit a CpG island methylator phenotype (CIMP). Transcriptional silencing driven by CpG methylation converges exclusively on targets of the polycomb repressor complex 2 (PRC2) that represses expression of differentiation genes through tri-methylation of H3K27. CIMP-positive (CIMP+) hindbrain ependymomas are responsive to clinical drugs that target either DNA or H3K27 methylation both in vitro and in vivo. We conclude that epigenetic modifiers are the first rational therapeutic candidates for this deadly malignancy, which is epigenetically de-regulated but genetically bland.

55 4.2 Chapter 4: Introduction

Ependymomas are malignancies that occur throughout the nervous system, but are more common in the hindbrain in children, as opposed to supratentorial and spinal cord tumors, which are more frequently diagnosed in adulthood. Despite being histologically identical, ependymomas from different regions of the nervous system are biologically and clinically distinct1,2. Current therapy for all ependymoma patients consists of maximal safe surgical resection, followed by radiation therapy3. While adjuvant chemotherapy is routine for most children with malignant brain tumors, it is not part of the current standard of care for ependymoma patients as multiple clinical trials have failed to show any survival benefit after cytotoxic chemotherapies4,5. Even at the time of disease recurrence, chemotherapy has not been shown to be effective for ependymomas; therefore, many children with recurrent ependymoma undergo a full but palliative second course of cranial irradiation6-8. Indeed, while treatment protocols for many other childhood malignancies have changed and improved in the past two decades, ependymoma therapy remains stagnant. The mechanisms underlying the chemo- resistance of ependymoma are not known.

Within each anatomic compartment (supratentorial/hindbrain/spinal), there is additional inter- tumoral heterogeneity in the form of well-documented molecular subtypes of ependymoma9-11.

Ependymoma subtypes are clinically and functionally relevant, as rational therapies may only be effective in a single subtype of the disease12. Ependymomas are thought to arise from the regionally distinct radial glial cells (RGCs). Differences between those RGC populations are likely carried forward in the neoplasm, and may account for a portion of the observed heterogeneity1,9. Hindbrain ependymomas occur within the posterior fossa of the skull, and are clinically referred to as ‘posterior fossa’ (PF) ependymomas. There are two clear and distinct

56 subtypes of PF ependymoma; one that occurs in older children and adults with very good prognosis (Posterior Fossa Group B, or PFB), and another found predominantly in infants, which is associated with poor prognosis in spite of maximally aggressive therapy (Posterior

Fossa Group A, or PFA)10,11.

4.3 Chapter 4: Results

4.3.1 The somatic mutational landscape of ependymoma

To uncover the biology of PF ependymomas, we undertook whole genome sequencing of tumor and matching germline DNA from five ependymomas (3 PFA, 2 PFB), and whole exome sequencing of an additional 42 PF ependymomas and their matching germline DNA (24 PFA, and 18 PFB) (Figure 4.1A-B, Supplemental Figure S4.1, Publication Supplemental Tables S1-

2)97. Unlike some other childhood malignancies, the rate of somatic SNVs did not correlate significantly with the age at diagnosis (Supplemental Figure S4.1)13. Further, the rate of somatic SNVs was extremely low in PF ependymomas, with an average of 5.0 somatic non- synonymous SNVs per exome across the entire cohort (Figure 4.1B), and low in both PFA (4.6

SNVs per tumor) and PFB ependymomas (5.6 SNVs per tumor, Publication Supplemental Table

S3)97. Perhaps the most surprising result was that there were zero significant recurrent mutations across the cohort of 47 PF ependymomas as detected by two different algorithms

MUTSIG14 and MUSIC15 (Figure 4.1C, Publication Supplemental Tables S3-6)97. Despite the absence of significant recurrent SNVs, PFB harbored frequent and recurrent large-scale copy number alterations (CNAs) indicative of chromosomal aneuploidy (Supplemental Figure S4.2).

57 Compared to other malignancies, PF ependymomas have a very low rate of SNVs per megabase, and the lowest number of recurrent significant SNVs, making PF ependymoma the first malignancy for which genome sequencing across a broad cohort (n = 47) has failed to identify any significantly and recurrently mutated genes (Figure 4.1C, Publication Supplemental Tables

S3-7)97.

4.3.2 DNA methylation profiles of ependymoma are regionally specified

A number of other childhood nervous system malignancies, including medulloblastoma, retinoblastoma, glioblastoma, atypical teratoid/rhabdoid tumor, and neuroblastoma, have recently been demonstrated to harbor a paucity of recurrent mutations, with a significant proportion of the recurrent events converging on epigenetic mechanisms13,16-26. Due to the absence of recurrent and significant SNVs and CNAs, we hypothesized that PFA ependymomas could be driven by epigenetic mechanisms. We studied DNA methylation patterns in a discovery cohort of 79 ependymomas using methyl-binding domain-2 (MBD2) protein recovery followed by hybridization to Nimblegen 385K CpG island promoter plus microarrays (MBD2- chip). Unsupervised consensus clustering of CpG methylation profiles yielded three distinct subgroups, composed of supratentorial, posterior fossa, and mixed spinal/posterior fossa tumors respectively, in a pattern highly similar to that yielded by unsupervised clustering of gene expression profiles (Figure 4.2A, Supplemental Figure S4.3)10. The group of pure posterior fossa tumors corresponds to PFA ependymomas, while the PFB ependymomas cluster with the spinal ependymomas. We validated our discovery cohort findings through study of a non- overlapping cohort of 48 PF ependymomas using an orthogonal technology (Illumina Infinium

450k methylation arrays). In these validation experiments, the DNA methylome of PFA

58 ependymomas was very distinct from PFB tumors (Figure 4.2B, Supplemental Figure S4.4).

Unsupervised clustering of CpG methylation signatures was very robust, supporting two major molecular subtypes, even after applying a number of distinct bioinformatic and biostatistical techniques (Supplemental Figure S4.4). We conclude that PFA and PFB ependymomas have very distinct methylomes, and that epigenetic biomarkers could be used to develop a clinically relevant molecular classification of PF ependymomas. To this end, we identified three genes that exhibited increased CpG methylation in most PFA tumors, but not in PFB tumors

(Supplemental Figure S4.5). We determined the presence of CpG hypermethylation representing PFA tumors using a mass spectrometry based technology (Sequenom) on our training cohort (Supplemental Figure S4.5-4.6, and Publication Supplemental Table S8). We were able to validate our Sequenom based biomarker panel on an independent cohort of ependymomas using FFPE tissues to predict both progression free and overall survival

(Supplemental Figure S4.5-4.6 and Publication Supplemental Table S8)97. We conclude that division of PFA from PFB ependymomas using a mass spectrometry based biomarker should be feasible in a clinical setting.

4.3.3 Group A posterior fossa ependymoma demonstrate a CpG island methylator phenotype

We next compared the extent of promoter CpG methylation in PFA ependymomas to that of

PFB ependymomas and found that PFA tumors have a much higher extent of CpG island methylation (Figure 4.2C-E, Supplemental Figure S4.7-4.8, and Publication Tables S9-14)97. In comparison to PFB ependymomas, PFA tumors have more methylated CpG sites (Figure 4.2C), more genes with significant CpG methylation (Figure 4.2D), and more genes that are

59 transcriptionally silenced by CpG hypermethylation (Figure 4.2E). We conclude that PFA ependymomas exhibit a ‘CpG Island Methylator’ or ‘CIMP’ phenotype, and suggest that PFA ependymomas be referred to as PFA CIMP positive (PFA-CIMP+) ependymomas, and PFB as

PFB CIMP negative (PFB-CIMP-)27 (Supplemental Figure S4.9-4.10, Publication Supplemental

Table S15)97. To determine the mechanism by which CpG hypermethylation driving transcriptional silencing promotes the pathogenesis of PFA ependymoma, we performed a pathway analysis in our discovery cohort of PFA and PFB ependymomas studied by MBD2- chip (Figure 4.3A, Publication Supplemental Table S16)97. Although olfactory signaling was the only significant pathway enriched in PFB ependymomas, genes CpG methylated in PFA ependymoma showed a remarkable convergence on genes documented as silenced in embryonic stem (ES) cells by the polycomb repressive complex 2 (PRC2). In our non-overlapping, independent validation dataset studied by Illumina Infinium 450K arrays, we observed no significant pathways in the PFB ependymomas, while the PFA tumors exhibited the same convergence on gene targets that are silenced by PRC2 in ES cells (Figure 4.3B, Publication

Supplemental Table S17)97. The PRC2 complex contains the histone methylase EZH2, which tri-methylates H3K27, thereby driving gene silencing. Genes known to be required for differentiation, and which are silenced by PRC2 have been documented to frequently undergo cancer specific CpG methylation, and it is described that both DNA and histone methylation contribute to ongoing gene silencing in these cancers28.

4.3.4 Convergence on PRC2 targets

We next sought to validate these pathway findings by performing H3K27me3 ChIP-seq in 11 primary PF ependymomas. Our findings demonstrate distinct H3K27me3 signatures in PFA-

60 CIMP+ versus PFB-CIMP- ependymomas (Figure 4.3C, Supplemental Figure S4.11-4.12,

Publication Supplemental Table S18). Furthermore, the gene expression of H3K27me3 target genes can robustly stratify PFA-CIMP+ from PFB-CIMP- tumours, thus highlighting the distinct epigenetic differences between these subgroups (Supplemental Figure S4.11).

Examination of differential H3K27me3 targets demonstrated a convergence and significant overlap with PRC2 targets in ES cells observed exclusively in PFA-CIMP+ tumours (Figure

4.3D). Further, a significant proportion of shared PFA-CIMP+ and ES cell H3K27me3 targets were CpG hypermethylated exclusively in PFA (CIMP+) tumours, a pattern that was not detected in PFB-CIMP- ependymoma (Figure 4.3E-F, Publication Supplemental Tables S18-

19)97. We hypothesize, therefore, that hyperactivity of the PRC2 complex leading to tumor suppressor gene silencing with subsequent gene silencing by DNA CpG hypermethylation contributes to the pathogenesis of PFA-CIMP+ ependymoma.

4.3.5 Whole genome DNA methylation profiling of ependymoma

We next sought to expand our global analysis of CpG methylation by performing whole-genome bisulfite sequencing in 3 PFAs, 3 PFBs, 3 fetal normal brains, and 3 adult normal brains

(Publication Supplemental Tables S20-22)97. Here we observed the same patterns of increased

CpG methylation at CpG islands occurring specifically in PFA-CIMP+ tumours consistent with a CpG island methylator phenotype (Figure 4.4A,C, Supplemental Figure S4.13). In line with other solid tumours we identified additional cancer-specific epigenome patterns including hypomethylation of repetitive elements (LINEs, SINEs, and LTRs) restricted to PFA-CIMP+ ependymoma and subgroup specific partially methylated domains (Figure 4.4B,D, Supplemental

Figure S4.13-4.14) These findings illustrate genome-wide DNA methylation alterations in PFA-

61 CIMP+ ependymoma, concurrent with a silent genome exhibiting few CNAs and no significant and recurrent somatic SNVs.

4.3.6 Group A (CIMP+) ependymoma are highly and specifically inhibited by DNA methylation Inhibitors

While our genomic and epigenomic data suggest that over-activity of the PRC2 complex, and/or subsequent promoter CpG hypermethylation may be involved in driving the pathogenesis of

PFA-CIMP+ ependymomas, they do not address whether or not these mechanisms continue to be necessary for tumor maintenance, and would therefore constitute an effective target for therapy. Functional assessment of CpG and histone methylation in ependymoma is harshly limited by the complete lack of established ependymoma cell lines, xenografts, or transgenic mouse models1,9,10. To this end we established four short-term, patient-derived primary ependymoma cultures from two PFA-CIMP+ tumors, and two childhood ST ependymomas.

We were unable to grow any PFB-CIMP- ependymomas in vitro. Treatment of PFA-CIMP+ cultures with the DNA de-methylating agents 5-aza-2’-deoxycytidine (decitabine) resulted in marked de-repression of genesets enriched in EZH2 targets and known DNA hypermethylated genes in other solid cancers (Figure 4.4E-F) Furthermore, compared to ST ependymoma primary cultures in vitro, Decitabine demonstrated significant anti-neoplastic effect on both

PFA-CIMP+ tumors at low dose nanomolar levels (Figure 4.4G). In order to model the effects of Decitabine on PFA-CIMP+ cultures as early, and representative of the patient tumour as possible, we derived a passage zero ex-vivo culture from a PFA-CIMP+ metastasis. In this ex- vivo culture we demonstrate significant impairment of neurosphere colony formation upon

DNA methylation blockade (Figure 4.4H, Supplemental Figure S4.15) Because decitabine is

62 FDA approved for the treatment of hematopoietic malignancies, we propose that it could be rapidly re-purposed in a clinical trial for children with PFA-CIMP+ ependymoma29. In addition, we observed additive effects on combining decitabine and an FDA approved HDAC inhibitor

(suberoylanilide hydroxamic acid, SAHA) against PFA-CIMP+ ependymoma (Supplemental

Figure S4.16).

The tool compound 3-Deazaneplanocin-A (DZNep) is known to target the PRC2 complex and result in diminished tri-methylation of H3K27 through degradation of PRC2 complex proteins30.

Treatment of PFA-CIMP+ ependymoma, but not ST ependymoma with DZNep is highly effective in the nanomolar range in vitro (IC50 for E517 = 95 nm, E520 = 262 nm) (Figure

4.5A). We also observed additive effects between DZNep and SAHA, and DZNep and

Decitabine (Supplemental Figure S4.16). Treatment of PFA-CIMP+ ependymomas with

DZNep, compared to controls, results in decreased expression of EZH2, decreased tri- methylation of H3K27, and increased cleavage of PARP (Figure 4.5B). In vivo treatment of established xenografts of human PFA-CIMP+ ependymoma with DZNep using either a flank model (Figure 4.5C), or an orthotopic intra-cerebellar xenograft model (Figure 4.5D) results in decreased tumor volume and improved survival. Furthermore, PFA-CIMP+ ependymoma cells isolated from tumor xenografts treated in vivo with DZNep have a markedly reduced colony forming ability compared to controls, suggesting that the compound targeted ependymoma cells with clonogenic or tumor initiating potential (Supplemental Figure S4.15). In addition, treatment with a recently published, and extremely potent, highly selective, S-adenosyl- methionine competitive small molecular inhibitor of EZH2 (GSK-343) results in significant de- repression of gene expression in PFA-CIMP+ ependymoma including genes which are known targets of PRC2 in ES cells (Figure 4.5E-F)31. We found that treatment with GSK343 but not an inactive compound with the same molecular backbone (GSK669) resulted in diminished levels

63 of H3K27me3, and had a potent anti-neoplastic effect against PFA-CIMP+ ependymoma

(Figure 4.5G, Supplemental Figure S4.15). These findings are further supported in a passage zero PFA-CIMP+ ex-vivo culture treated immediately with GSK343, which significantly impaired neurosphere colony formation (Figure 4.5H). We therefore hypothesize that ongoing hypermethylation of promoter CpG islands and H3K27 contribute to the maintenance of PFA-

CIMP+ ependymoma, and that targeting these epigenetic mechanisms represents the first identified rational targets for this chemotherapy resistant, epigenetically dysregulated, and genetically bland childhood cancer.

4.4 Chapter 4: Discussion

Our findings demonstrate that Group A posterior fossa ependymoma, which exhibit no recurrent focal copy number alterations, are also characterized by a stable mutational landscape97. While a number of other pediatric malignancies have a very low incidence of recurrent somatic mutations, we are unaware of any other malignancies with zero significant recurrently mutated genes. The silent genomic architecture of posterior fossa ependymoma was validated in a study led by Parker et al., (2014), in which whole-genome sequencing and RNA-sequencing of 62 PF ependymomas, failed to identify a recurrently mutated gene or fusion transcript98. The possibility of somatic alterations in non-coding space still remains, and may be adequately explored in the future as the cost of high-coverage whole-genome sequencing decreases. To this date, the most notable and frequently reported example is TERT promoter mutation as shown in numerous solid cancers99-101. TERT promoter mutations, however, are non-recurrent and highly infrequent in ependymoma99-101.

64 A number of recent seminal publications have demonstrated that childhood nervous system tumors harbor very few recurrent genetic events, and that many of the recurrent events converge on genes important in epigenetic processes such as CpG hypermethylation, post-translational modification of histones, and even mutation of the actual histone genes13,16-20,22-24. While mutation of IDH1, IDH2, TET1, TET2, and/or DNMT3a have been documented in other types of cancer with a CIMP phenotype, we did not observe any such mutations in PF ependymomas32-34

(Publication Supplemental Table S23)97.

Subgroups of patients with a CIMP phenotype have a better prognosis for some cancer histologies35, but not others2,27,36-40, suggesting that CIMP positive tumors represent distinct subgroups of disease, but that the CIMP phenotype itself is not intrinsically benign or responsive to therapy. PFA-CIMP+ ependymomas have a nearly normal genetic code and a very poor prognosis in comparison to aneuploid PFB (CIMP-) ependymomas where 5-year overall survival exceeds 95% of patients. Many of the cytotoxic chemotherapeutics currently used clinically function through promoting damage to the genomic DNA, which subsequently induces cancer cells with deranged and disorganized genomes to undergo apoptosis. In light of the nearly normal genetic code found in PFA-CIMP+ ependymomas, perhaps it is not surprising that a therapy based on DNA damage has not shown efficacy in clinical trials.

A recurring theme in several solid cancers studied to date, is a convergence of gene expression and DNA methylation networks upon genes known to be silenced by the PRC2 complex in ES cells, particularly in poorly differentiated cancers102. Our data demonstrate hyperactivity of

DNA CpG methylation and disparate PRC2 H3K27me3 signatures in poor prognosis PF ependymomas that may be necessary for tumor maintenance. The cause and effect of this ES cell signature remains unclear, however Kim et al., (2010) suggest that the similarities between

65 cancer and ES cells may be attributed to an active MYC transcriptional network103. Indeed, MYC gene expression has been shown to be up-regulated preferentially in PFA-CIMP+ ependymoma as compared to PFB-CIMP- tumours (Data unpublished). We hypothesize that PF ependymomas may arise from primitive neural precursors, arising in the radial glial cell lineage, which may fail to undergo terminal differentiation during neurogenesis. Evaluating this effect may involve over-expression of active DNA and/or H3K27 methyltransferases in hindbrain radial glial cells during murine development.

Given that aggressive infant ependymomas have no chemotherapeutic options available, the threshold is low for rapid translation of a rationale, novel, and effective drug therapy. Our findings demonstrate that inhibitors that target DNA CpG methylation, PRC2/EZH2, and/or

HDAC inhibitors represent the first rational strategies for therapy of this untreatable disease, and should be considered for testing in future clinical trials for children with PFA-CIMP+ ependymoma.

66 4.5 Chapter 4: Figures

Figure 4.1

A Group A Group B 1 2 5 7 3 4 8 9 3 5 1 5 7 8 6 8 3 7 0 1 2 3 4 6 8 1 2 6 1 2 4 5 6 5 0 4 1 1 1 1 2 2 2 2 3 5 5 1 1 1 1 2 1 1 2 2 2 3 3 3 3 4 6 1 1 1 1 2 2 2 2 3 3 3 3 3 7 1 6 1 2 2 5 P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E E 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 7 7 7 7 Age in years 11 58 9 3 31 6 4 2 2 5 26 8 2 9 9 3 5 5 6 5 54 2 3 1 12 3 2 55 41 21 45 29 49 11 23 10 64 27 27 29 53 2 3 29 62 54 17 Gender Recurrence 1 1 1 1 1 1 1 1 1 1 1 1 NA Death 1 NA WHO grade III III III III III III III III III III III III III III III III III III III III III III III III III Sequencing wgs wgs wgs wgs wgs Tumor coverage 158 162 168 159 109 43 166 155 164 164 139 155 469 28 36 57 137 138 176 137 145 172 286 135 142 154 184 182 157 176 94 155 115 117 113 173 169 155 159 156 177 39 35 172 119 145 186 # Somatic SNVs 8 3 6 0 3 8 17 0 7 4 7 9 1 3 2 2 8 5 7 2 10 1 2 1 3 4 1 4 12 0 10 15 16 1 2 0 4 6 1 14 8 2 1 1 4 7 3 p = n.s # Somatic CNVs 3 0 2 2 14 0 5 0 0 2 2 NA NA 2 0 0 NA NA NA NA NA NA NA NA NA NA NA 13 0 20 2 7 16 13 20 17 5 10 7 4 2 9 0 NA NA NA NA p < 0.0001 Female Yes Grade III wgs Male No Grade II wes Number of significant and recurrently mutated genes B C 10 30 50 70 Reference Summary of Posterior Fossa Ependymoma Sequencing Glioblastoma Multiforme 42 291 Brennan et al., 2013 Breast Cancer 0 547 Koboldt et al., 2012 Total # of SNVs 235 Colon and Rectal 0 224 Muzny et al., 2012 Small-Cell Lung Cancer 2 29 Peifer et al., 2012 Average # of SNVs per tumour 5.0 T Cell Precursor ALL 12 94 Zhang et al., 2012 Minimum # of SNVs per tumour 0 Squamous Cell Lung Cancer 19 0 Hammerman et al., 2012 Melanoma 0 121 Hodis et al., 2012 Maximum # of SNVs per tumour 17 Multiple Myeloma 38 0 Chapman et al., 2011 # of recurrent SNVs 0 Medulloblastoma 39 21 Jones et al., 2012 Lung Adenocarcinoma # of group A specific SNVs 124 23 159 Imielinski et al., 2012 Ovarian Cancer 0 316 Bell et al., 2011 Average # of group A SNVs per tumour 4.6 Neuroblastoma 18 240 Pugh et al., 2013 # of group A recurrent SNVs 0 Prostate 7 0 Berger et al., 2011 Pediatric Rhabdoid Cancer 0 35 Lee et al., 2012 # of group B specific SNVs 111 Retinoblastoma 4 0 Zhang et al., 2012 Average # of group B SNVs per tumour 5.6 Ependymoma 5 42 Current Study 0 2 4 6 8 10 12 14 # # # of group B recurrent SNVs 0 wgs wes Mutations identified per megabase

Somatic single nucleotide variants are rare in the posterior fossa ependymoma genome

(A) Summary of clinical and genomic details of posterior fossa ependymomas stratified according to Group-A and Group-B ependymoma(Wilcoxon rank sumtest). (B) Bar graphs summarizing the numbers and frequencies of SNVs detected by whole-genome and whole-exome sequencing of posterior fossa ependymomas. (C) Comparison of numbers of significant and recurrently mutated genes, and mutation rates, in several whole-genome and whole-exome sequencing studies of adult and pediatric cancers (FDR < 0.1).

67

Figure 4.2

A Discovery cohort B Validation cohort DNA methylation (MBD2-chip) DNA methylation (Illumina Infinium 450K) ST Group B+SP Group A Group A (CIMP+) Group B (CIMP-) Age p=0.0014 Gender p=n.s.

under 4 years n=79 Male 0.0 1.0 Female 4 - 18 years above 18 years Beta Value Methylation Level C D E

Group A (CIMP+) Group B (CIMP-) p<0.0001 p<0.0001 80

8 p<0.0001 800 855 73 60 6 600 4 differentially differentially 40 400 2 Significance q=0.05 20 methylated genes 21 200 0 233 Significant

-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 # # of significant differentially # of significant differentially -log10(FDRcorrected p-value)

Group AGroup B methylated and silenced genes Group A Group B Increased methylation Increased methylation in Group A (CIMP+) in Group B (CIMP-) (CIMP+) (CIMP-) (CIMP+) (CIMP-) Average Beta (Group B - Group A)

DNA methylation profiles suggest that Group A ependymoma demonstrate a CpG island methylator phenotype

(A) Unsupervised hierarchical clustering of 79 ependymoma DNA methylation profiles. (B) Heatmap of 48 PF epend- ymoma DNA methylation profiles. Group-A and Group-B clinical differences were assessed using a two-sided Fisher’s Exact Test. (C) Volcano plot comparing the number of significant methylated CpG sites between Group-A and Group- B(p<0.05,Wilcoxon-test,FDR-corrected). Differences in number of methylated genes (D) and methylated/silenced genes (E) in Group-A versus Group-B (p<0.0001,binomial-distribution-test).

68

Figure 4.3

A B C D

Discovery cohort Validation cohort MBD2-chip methylation profiling Illumina 450K methylation profiling 16 Group A specific 3 Group A Group B H3K27me3 target gene

12 Group B specific H3K27me3 target gene 2

8 1114 Significance Significance

1 968 4 239

q=0.25 p<2.2e-16

-1*log10 (FDRcorrected p-value) 399 0 0 14 q=0.25

ferential H3K27me3 ChIP-seq sites p=n.s. f Embryonic Di n = 5915 PRC2 target gene E F Group A specific

Pyrimine catabolism DNA hypermethylated gene

RNA polymerase I opening 97 Olfactory signaling pathway Olfactory signal transduction p<1.0e-12 577 Group B specific Mammary gland development 115 H3K27me3 marked in ES cells Myc occupied targets in cancer p<2.2e-16 DNA hypermethylated gene EED occupied targets in ES cells EED occupied targets in ES cells PRC2 occupied targets in ES cells PRC2 occupied targets in ES cells SUZ12 occupied targets in ES cells SUZ12 occupied targets in ES cells 43 23 19 p=0.0248 Down-regulated in multiple myeloma Marked by H3K27me3 in liver cancer H3K27me3 marked genes in ES cells p=0.0005 194

CpG islands hypermethylatedin leukemia 653 NANOG, OCT4, SOX2 targets in ES cells Down-regulated in metastatic LNM35 cells Up-regulated in cells transformedby RHOA 952 0 Embryonic Stem Cell 1066 H3K27me3 target gene 972 Methylated in cancer and marked by H3K27me3 Up-regulated in keratinocytes after UVB irradiation 307p<2.2e-16

Regulating RHO-GTPase mediated actin cytoskeleton Group A specific Group B specific 33 H3K27me3 target gene Embryonic Stem Cell H3K27me3 target gene p=n.s. H3K27me3 target gene

Group A (CIMP+) and Group B (CIMP-) ependymoma are distinguished by CpG hypermethylated and H3K27 tri-methylated genes related to PRC2 occupancy in ES cells

(A-B) CpG methylated pathways in Group-A(CIMP+) and Group-B(CIMP-) ependymomas in a discovery and validation cohort. (C) Differential H3K27me3 binding sites distinguishing Group-A and Group-B (p<0.01(MACSv2.0), p<0.05(R:DiffBind)). Venn diagrams comparing: (D) Group A and B H3K27me3 genes with ES cell PRC2 genes (E) Group A H3K27me3 and DNA hypermethylated genes with ES cell H3K27me3 genes. (F) Group B H3K27me3 and DNA hyper- methylated genes with ES cell H3K27me3 genes (binomial-distribution-test).

69

Figure 4.4 A B C D CpG Island Methylation Repetitive Elements Methylation CpG Islands Repetitive Elements

p < 2.2e-16 p < 2.2e-16 p < 2.2e-16 p < 2.2e-16 0.16 Group A Adult NB Group B Fetal NB Adult NB Group A Fetal NB Group B 0.08 0.12 0.06 0.08 0.04 0.04 0.02 LINE SINE LTR LINE SINE LTR 0.00 Group A Group B 0.00 Group A Group B Proportion of Differentially Methylated Regions Proportion of Differentially (CIMP+) (CIMP-) Methylated Regions Proportion of Differentially (CIMP+) (CIMP-) 0 100 Hypermethylated Percent DNA Methylation Hypomethylated E F G 0

5-Aza-2’- 1. 8

Deoxycytidine 0. Top 10 genesets enriched upon Up-regulated 6

5-Aza-2’-Deoxycytidine treatment Down-regulated 0.

of E517 and E520 4 E479 (ST1) p<0.0001 0. E478 (ST2) Pathway Geneset Enrichment 2 E520 (PF1) p = 0.05 210 0.

EZH2 targets 0 E517 (PF2) Normalized cell survival 0. ESR1 targets 200 p<0.0001 1 2 3 4 5 RB1 targets senescent 5-Aza-2'-Deoxycytidine Stem cell cultured vs fresh 147

150 (log10(concentration[nM]) SATB1 targets H Bladder Cancer Cluster 2B 1.0 p = 4.18e-10

TNF response not via P38 100 78 MLL targets

Silenced by methylation in pancreatic cancer 50 Silenced by methylation in multiple myeloma 12 Media alone p<1.00E-15 0

# Significant Differentially Expressed Genes DMSO E517 E520 Fraction Negative 5-aza-2’-Deoxycytidine 1 0. 0 500 1000 1500 2000 Number of Cells

Whole-genome bisulphite sequencing validates a CpG island methylator phenotype in Group A ependymoma

(A-B) Heatmap of DNA methylation at CpG islands and repetitive regions in Group-A(CIMP+) versus Group-B(CIMP-). (C-D) Proportion of hyper-methylated versus hypo-methylated regions at CpG islands and repetitive elements in Group-A and Group-B(p<2.2e-16,binomial-distribution-test). Top 10 pathways up-regulated (E) and differentially expressed (F) upon 5-aza-2’-deoxycytidine (DAC) treatment of E517-PF2 and E520-PF1(p<0.0001, binomial-distribution-test). (G) Survival analysis of E478-ST2, E479-ST1, E517-PF2 and E520-PF1 cells treated 7d with DAC(p=0.05,two-sided-t- test,error-bars=s.e.m., technical:n=6 over biological:n=2). (H) Limiting dilution assay of zero-passage Group-A cells treated 2wks with DAC(p=4.18e-10,chi-square-test).

70

Figure 4.5

A B C D 0

0 Vehicle 1. 1. E479 (ST1) E520 (PF1) 3-Deazaneplanocin-A 8 8 DZNep: - + - + 1200 0. 0. EZH2 6 6 0. 0. Alpha-tubulin DZNep: - + - + 800 4 4 p=0.0087 0. 0. H3K27ME3 E479 (ST1) n=10 H3K4ME3

2 E478 (ST2) 2 400 Fraction Survival p=0.0330 p < 0.0001 0. E520 (PF1) Alpha-tubulin 0. Vehicle Tumour volume (mm3) Tumour Normalized cell survival

0 E517 (PF2) DZNep: - + - + n=18 0 3-Deazaneplanocin-A Flank xenograft (E520-PF1) 0. 0

cleaved-PARP 0. 1 2 3 4 5 19 21 23 25 27 29 31 0 10 20 30 3-Deazaneplanocin-A Alpha-tubulin Number of days following injection Days following Intracranial Engraftment log10(concentration[nM])

E F G H

GSK343 treatment Top 10 genesets enriched upon

GSK343 Treatment of E517 and E520 p<0.0001 0

2. E479 (ST1) Media Up-regulated 1.0 Pathway Geneset Enrichment 214 Down-regulated E520 (PF1) DMSO

5 GSK343

EED Targets 1. GSK343-active 200 SUZ12 Targets p<0.0001 GSK669-inactive 0

EZH2 Targets 134 1. GSK669-inactive Photodynamic Therapy Stress 150

Breast Cancer Basal 5 0.

Adult Tissue Stem Module Fraction Negative 100 Brain HCP With H3K4me3 and H3K27me3 Normalized cell survival p = 0.0022 1

0 GSK343-active p = 2.83e-5

Breast Cancer Basal 0. 36 0. RB1 Targets Confluent 50 1 2 3 4 0 100 200 300 ESR1 Targets GSK EZH2 inhibitor log10(concentration[nM]) Number of Cells

Expressed Genes (FDR<0.01) 0 0 Number Significant Differentially p<1.00E-15 E517 E520

Modulation of H3K27 methylation has anti-neoplastic effects against Group A ependymoma

(A) Survival of E479-ST1, E478-ST2, E520-PF1 and E517-PF2 cells treated 7d with 3-Deazaneplanocin- A(DZNep,p<0.0001,two-sided-t-test,error-bars=s.e.m.,biological:n=3). (B) EZH2, H3K27me3, H3K4me3, and cleaved- PARP protein expression in E520-PF1 and E479-ST1 cells treated 7d with DZNep(500 nM). (C) E520-PF1 flank tumor volumes following treatment with DZNep(p=0.0087,Wilcoxon-test,error-bars=s.d.) (D) Survival of E520-PF1 posterior fossa tumour bearing mice treated with DZNep(p=0.033,log-rank-test). Top 10 pathways up-regulated (E) and differentially expressed (F) upon treatment of E517 and E520 with GSK343. (G) Cell proliferation of E520-PF1 and E479-ST1 cells treated 11d with GSK343(active-inhibitor) or GSK669(inactive-inhibitor) (p=0.0022,two-sided-t-test,error-bars=s.e.m., technical:n=9 over biological:n=3). (H) Limiting dilution assay of passage-zero Group-A cells treated 2wks with GSK343(p=2.83e-5,chi-square-test).

71 4.6 Chapter 4: Supplemental figures

Figure S4.1 A Group A Group A Group A

y y y

x 1 x 1 x 1 22 22 22 6EP2 21 6EP17 21 6EP18 21 20 2 20 2 20 2 19 19 19 18 18 18 3 3 3 17 17 17

16 16 16 4 4 4 15 15 15

14 5 14 5 14 5

13 13 13 6 6 6 12 12 12

7 7 7 1 1 Duplication 1 1 1 1 8 8 Deletion 8 10 10 10 9 9 9 Translocation Group B Group B Inversion

y y

x 1 x 1 6EP6 22 6EP16 22 21 21 B 20 2 20 2 20 19 19 R² = 0.1036 18 18 p = n.s. 3 3 17 17 15

16 16 4 4 10 15 15

14 5 14 5 5

13 13

6 6 Number of non-silent SNVs 0 12 12 0 20 40 60 7 7 1 1 1 1 Age 8 8

10 9 10 9

(A) Circos plots depicting whole genome sequencing profiles in three Group A and two Group B ependymomas. From outer track to inner track depicts the following: 1) Chromosomal location, 2) Copy number alterations, and 3) Chromo- somal translocations and focal insertions, deletions, and duplica- tions. Inter-chromosomal translocations are shown in green, intra-chromosomal translocations show in purple, focal deletions in blue, and focal duplications/insertions in red. (B) Dot plot demonstrating the lack of correlation of non-silent somatic single nucleotide variants with age. An R2 and was computed using a Pearson product moment correlation, which follows a t-distribution for p-value calculation.

72

Figure S4.2 Discovery A Agilent 10K a CGH

n = 79

k2 k3 k4 k5 Group B Group A B Discovery C Validation Agilent 10K a CGH Illumina Infinium 450K

15 n = 48 p=0.0002 50 n=82 p<0.0001 40 Mann-Whitney 10

30 Above Background 20 5

10 fected f Number of Chromosomes A 0 0 GROUP A GROUP B GROUP A GROUP B Number of chromosomal arm aberrations

D Group A PF ependymoma - UHS40 E Group B PF ependymoma - GER-1E31 Copy Number Ratio Copy Number Ratio

Chromosome Chromosome (A) Unsupervised consensus non-negative matrix factorization of a discovery cohort consisting of 79 primary PF epend- ymoma copy number profiles generated by 10K array comparative genomic hybridization (Agilent). Shown are subgroup profiles at various subgroup classification (k) ranks. (B) Comparison of the numbers of chromosomal arm alterations in Group A versus Group B PF ependymoma in a discovery cohort. Statistical significance was assessed using a Wilcoxon- rank sum test. (C) Comparison of the numbers of chromosomal alterations in Group A versus Group B in a validation cohort of 48 PF ependymoma, in which copy number profiles were generated by Illumina Infinium 450K methylation arrays. Statistical significance was assessed using a Wilcoxon-rank sum test. (D-E) Copy number profile of a representa- tive Group A (UHS40) and Group B (GER-1E31) PF ependymoma. Green highlight indicates a significant copy number change above noise. The red indicates a copy neutral ratio of zero.

73

Figure S4.3

A B C Gene expression (Affymetrix Exon 1.0ST)

Group B + SP ST Group A DNA methylation Gene expression 0.5 0.4 0.4 0.3 0.3 dCDF dCDF 0.2 0.2 0.1 0.1 0.0 2 4 6 8 10 0.0 2 4 6 8 10 k clusters k clusters n=47 D E F DNA methylation Gene expression Gene expression (Affymetrix Exon 1.0ST)

Group B Group A ST PF-A PF-A + SP Group A 14 3 0 PF-B+SP PF-B+SP Group B + SP 0 13 1 ST ST ST 0 1 15 methylation (MBD2-chip) A

DN Rand index = 0.8704903, p<0.0001, n=47 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 Silhouette width Silhouette width

(A) Unsupervised consensus hierarchical clustering (HCL) of 47 primary ependymoma gene expression profiles (Affymetrix Exon 1.0ST), which also had matched DNA methylation profiles illustrated in Fig. 2.2A. Clustering was performed by median centering probes, using the average linkage method, and with a subset of 1000 genes exhibiting the greatest standard deviation. (B-C)A methylation (MBD2-chip) and gene expression (Affymetrix Exon 1.0ST) clustering results demonstrating the different levels of statisti- cal confidence for differing number of subgroups. k denotes the number of subgroups. (D) Overlap analysis between DNA methylation and gene expression based subgrouping. The degree of overlap was analyzed using the Rand index with 10,000 sample permutations of sample labels to measure statistical significance. (E-F) Silhouette analysis identifies “core” samples defined as samples with positive silhouette widths for both DNA methylation (MBD2-chip) and gene expression (Affymetrix 1.0ST) based consensus HCL.

74

Figure S4.4 Validation cohort A C KMEANS NMF HCL SOM Group A Group B Group A Group B Group A Group B Group A Group B B

A

n=48 DNA methylation

B D 0.4 0.4 B 0.4

0.2 0.2 0.2

A 0.0 0.0 0.0 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 Gene expression k clusters k clusters k clusters E Group B Group A Group B Group A Group B Group A K2 K3 K4 K-MEANS

F Lorenz curve G Change in Gini 1.0 2 clusters, Gini = 0.476 3 clusters, Gini = 0.544 4 clusters, Gini = 0.599 0.8 5 clusters, Gini = 0.682 6 clusters, Gini = 0.704 0.4 7 clusters, Gini = 0.729 8 clusters, Gini = 0.752 0.6 9 clusters, Gini = 0.762 10 clusters, Gini = 0.768 0.3 L(p) (Gini)

0.4 0.2

0.2 0.1

0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 2 4 6 8 10 p # clusters (A-B) Principal components analysis of gene expression and DNA methylation profiles between Group A and Group B PF ependymomas. DNA methylation profiles were generated on the validation cohort using the Illumina 450K methylation platform. (C) Subgroup analysis distinguishing PFA versus PFB ependymoma by DNA methylation profiles (Illumina) using K-means, NMF, HCL and SOM consensus clustering. (D) Change in the cumulative distribution function plots (dCDF) of DNA methylation clustering results (C) demonstrating the different levels of statistical confidence for differing number of subgroups. k denotes the number of subgroups. (E) Unsupervised k-means consensus clustering of 48 PF ependymoma DNA methylation profiles generated by Illumina Infinium 450K methylation arrays. Shown are heatmaps from rank k2 to k4. (F) Lorenz curves for DNA methylation cluster analysis performed from k2 to k10. (G) Change in Gini values for cluster analysis demonstrating two principle subgroups of PF ependymoma defined by DNA methylation

75

Figure S4.5 A CRIP1 CYP26C1 PKP1 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 DNA Methylation 0.2 0.2 0.2 0.0 0.0 Group A (CIMP+) Group B (CIMP-) Group A (CIMP+) Group B (CIMP-) Group A (CIMP+) Group B (CIMP-) B Number of genes C Number of genes 3 3 3 3 3 3 3 3 3 3 3 2 1 1 1 3 3 3 3 3 3 3 3 3 3 3 2 1 1 1 0.8

0.8 Group A

0.4 Group B 0.4 x x x x x x x x x x x x x x x x x x x x x 0.0 0.0 Misclassification Error Misclassification Error 0 1 2 3 4 5 6 0 1 2 3 4 5 6 Value of threshold Value of threshold D Training cohort E Training cohort (DNA from frozen tissue) (DNA from frozen tissue) 1.00 1.00 Group B (CIMP-)

Group B (CIMP-) 0.75 0.75 0.50 0.50 Group A (CIMP+) Group A (CIMP+) Probability of OS (%) 0.25 0.25 Probability of PFS (%)

P = 0.0033 P = 0.0134 0.00 0.00 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Time Since Diagnosis (years) Time Since Diagnosis (years) A 13 11 6 6 5 5 3 3 1 0 0 A 13 13 12 9 9 6 3 3 1 0 0 B 12 12 12 11 10 10 7 3 1 0 0 B 12 12 12 12 12 12 9 5 2 0 0 F Validation cohort G Validation cohort (DNA from frozen + FFPE tissue) (DNA from frozen + FFPE tissue) 1.00 1.00 Group B (CIMP-) 0.75 0.75 Group B (CIMP-) 0.50 0.50

Group A (CIMP+)

Group A (CIMP+) Probability of OS (%) 0.25 0.25 Probability of PFS (%)

P = 0.0032 P = 0.0061 0.00 0.00 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Time Since Diagnosis (years) Time Since Diagnosis (years) A 42 31 23 15 11 10 8 7 7 7 5 A 46 40 36 32 27 21 17 12 12 10 7 B 14 14 11 9 8 7 2 2 1 1 0 B 16 16 15 14 13 11 5 5 4 1 0 (A) DNA methylation dot plots for the top 3 methylated genes (CRIP1, CYP26C1, and PKP1) for which a Sequenom assay could be designed and optimized for robust stratification. (B) Calculation of misclassification errors based upon a value of threshold and number of genes selected for the predication analysis of microarrays (PAM) algorithm. (C) Calculation of misclassification errors based upon a value of threshold and number of genes selected for the predication analysis of microarrays (PAM) algorithm relative to subgroup. (D-E) Kaplan-Meier curves are shown for training samples classified into Group A (CIMP+) and Group B (CIMP-) for both progression-free and overall survival. Number of cases in each subgroup is shown below the graph. Statistical significance for all Kaplan-Meier curves was measured using a log-rank test. (F-G) Using the markers identified in (A), class prediction was performed on a non-overlapping cohort of 62 PF ependymomas (DNA isolated from frozen or FFPE). Samples of the predicted dataset were provided by The Hospital for Sick Children, MD Anderson Cancer Centre, Children’s Hospital Boston and the University of Michigan. Depicted are Kaplan-Meier curves for progression-free and overall survival, with significance measured using a Log-rank test. Number of cases in each subgroup is shown below the graph.

76

Figure S4.6 A B Training Cohort Validation Cohort 50 80 p=0.0046 p<0.0001 40 60

30 40 Age 20 Age 20 10

0 0 Group A (CIMP+) Group B (CIMP-) Group A (CIMP+) Group B (CIMP-) C D Validation Cohort Validation Cohort Gross Totally Resected Tumours Gross Totally Resected Tumours

100 100 Group B (CIMP-)

80 Group B (CIMP-) 80

60 60

40 40 Group A (CIMP+) Group A (CIMP+) Percent Survival Percent Survival 20 20 p=0.0398 p=0.0164 0 0 0 50 100 150 200 0 50 100 150 200 Progression - Free Survival (Months) Overall Survival (Months) E

(A) Age distribution of Group A (CIMP+) versus Group B (CIMP-) PF ependymo- mas in a training cohort stratified by the Sequenom 3 gene biomarker panel. (B) Age distribu- tion of Group A (CIMP+) versus Group B (CIMP-) PF ependymomas in a validation cohort stratified by the Sequenom 3 gene biomarker panel. (C) Progression-free survival for Group A (CIMP+) versus Group B (CIMP-) PF ependymoma for gross- totally resected tumours stratified by the 3 gene biomarker panel. (D) Overall survival for Group A (CIMP+) versus Group B (CIMP-) PF ependymoma for gross-totally resected tumours stratified by the 3 gene biomarker panel. (E) Multivariate analysis of posterior fossa ependymoma progression free survival including radiation, grade, level of resection and 3-marker subgroup stratification as potential prognostic factors.

77

Figure S4.7 A B Group B + SP Group A Group B + SP Group A Location %! #$ %! %! %! %& %! %& %& #$ %! %! %! %! %! %& !" !" !" !" !" !" !" !" %! %& #$ !" !" #$ #$ !" !" !" !" !" !" !" !" !" !" #$ !" !" !" !" !" !" !" !" !" !" Gender #$ #$ ' " " ' #$ ' ' " ' ' ' ' ' " ' ' #$ " ' #$ #$ ' " ' #$ " " #$ #$ " ' ' ' " ' " " ' " #$ #$ ' " ' " " " ' #$ ' 5 Age ((( #$ ((( ((( ((( ((( #$ ((( ((( +,- ((( +,- +,- +,- #$ +,- )* +,- #$ )* ((( #$ #$ +,- ((( )* #$ )* )* #$ #$ )* ((( ((( )* )* )* ((( ((( )* ((( #$ #$ ((( )* ((( )* )* ((( )* #$ ((( 4 3 2

Significance FDR-pvalue=0.05 1 -1*log10(FDRcorrected-pvalue) 0

-2 -1 0 1 2 MBD2chip-Average-Batman (A) minus MBD2chip-Average-BatmanValue (B) PF <4 years Male p<0.05 (FDRcorrected) PFB+SP 4-18 years Female Methylation Difference ST >18 years Decreased -1 +1 Increased methylation methylation ACEVEDO_LIVER_ C D CANCER_WITH_ H3K27ME3_DN

p < 0.0001 400 REACTOME_RNA_ BERENJENO_ POLYMERASE_I_ TRANSFORMED_BY_ 350 PROMOTER_OPENING RHOA_FOREVER_UP 360 NOUZOVA_ 300 METHYLATED_IN_APL BENPORATH_SUZ12_ BENPORATH_EED_ TARGETS TARGETS REACTOME_ 250 PYRIMIDINE_ CATABOLISM 200 222 BENPORATH_NOS_ 150 TARGETS

100 MCBRYAN_PUBERTAL_ BREAST_4_5WK_UP 50 0 BENPORATH_PRC2_ BENPORATH_ES_WITH_ TARGETS Group B + Spinal Group A H3K27ME3 Number of Significantly Methylated Regions (A) Significant regions of differential DNA methylation between Group A KEGG_OLFACTORY_ Group A TRANSDUCTION (CIMP+) and Group B (CIMP-) as determined by significant analysis of microarray (SAM) analysis (FDR < 0.05) and depicted as a heatmap. Group B + Spinal Clinical associations regarding tumour location, gender, and age are Shared Genes shown in the top panel. (B) Volcano plot demonstrating an over- REACTOME_ representation of significantly methylated regions of interest (ROI) in Group OLFACTORY_ FDR q-value < 0.25 A (CIMP+) versus Group B (CIMP-). Significance of methylation differ- SIGNALING_ ences was determined using a Wilcoxon-rank sum test with FDR correc- PATHWAY tion using the Benjamini-Hochberg method. (C) Bar chart demonstrating a significantly greater number of methylated ROIs in Group A (CIMP+) versus Group B (CIMP-) as measured by a binomial distribution test. (D) Biological processes identified in Group A (CIMP+) and Group B (CIMP-) as determined by a pathway association test using genesets C2 and C5 provided by MSigDB (Broad Institute).

78

Figure S4.8

A B FDR p-value =0.05 6 n=26 1.0 Methylated in Group A 4 0.8 2 0.6 0 0.4 -2 FDR p-value =0.05 Gene Expression (CIMP+) methylation 0.2 A -4 Methylated in Group B

0.0 Group A (CIMP) -6

Group 0.0 0.2 0.4 0.6 0.8 1.0 -6 -4 -2 0 2 4 6 8 Group B (CIMP-) methylation DNA Methylation C D E p<0.0001 2500

2500 800 2172

2000 600 1500 1500 400 1000 ferentially methylated CpG sites f 200

500 500 di

377 p=0.0027 p=0.0196 with silencing per tumour # Methylated CpG sites associated Significant # (CIMP+) # Methylated and Silenced Genes per tumour (CIMP+) (CIMP+) A A A Group B (CIMP-) Group Group B (CIMP-) Group B (CIMP-) Group Group

(A) Group A (CIMP+) DNA methylation profiles plotted against Group B (CIMP-) methylation profiles. Red dotted line denotes equivalent levels of DNA methylation between Group A and B. (B) Starburst plot comparing significant differen- tially methylated genes versus significant differentially expressed genes. Significant genes or methylated probes were identified using a Wilcoxon-Rank sum test with FDR correction of p-values using the BH method. (C) Bar graph compar- ing the differences in number of significantly methylated CpG sites. (D-E) Dot plots demonstrating the differences between methylated and silenced CpG sites and genes, within a given tumour, between Group A (CIMP+) and Group B (CIMP-) ependymoma. Significant differences were calculated using a Wilcoxon-rank sum test with FDR (BH correction).

79

Figure S4.9

A Glioblastoma Multiforme D Posterior Fossa Ependymoma

G-CIMP Proneural Normal Brain Group B Normal Brain Group A (CIMP+)

1

0.8

0.6

0.4

0.2

0 B E Illumina 450K CpG probes

1.0 alue V Beta

0.0 Methylation Level C < 2.2e-16 F < 2.2e-16 < 2.2e-16 < 2.2e-16 0.8 0.8 0.4 0.4

GBM Group B Group A Proneural Normal Brain Normal Brain (CIMP+) (CIMP-) (CIMP+) (A) Unsupervised NMF consensus clustering of CpG probes exhibiting a standard deviation greater than 0.2 is shown at subgroup rank 3 for normal brain, Proneural-GBM, and CIMP+ GBMs tumours. (B) Heatmap of CpG probes demonstrat- ing increased methylation of CpG sites in GBM-CIMP and Proneural-GBM versus Normal Brain. (C) Boxplot of CpG probes demonstrating increased methylation of CpG sites in GBM-CIMP and Proneural-GBM versus Normal Brain. (D) Unsupervised NMF consensus clustering of CpG probes exhibiting a standard deviation greater than 0.2 is shown at subgroup rank 3 for normal brain, Group A (CIMP+), and Group B (CIMP-) PF ependymomas. (E) Heatmap of CpG probes demonstrating increased methylation of CpG sites in Group A (CIMP+) PF ependymoma versus Group B (CIMP-) and Normal Brain. (F) Heatmap of CpG probes demonstrating increased methylation of CpG sites in Group A (CIMP+) PF ependymoma versus Group B (CIMP-) and Normal Brain.

80

Figure S4.10

A Hypermethylated Genes B Hypermethylated and silenced Genes Breast Cancer GBM Breast Cancer GBM CIMP CIMP CIMP CIMP

193 4 1707 1038 97 232

69 0 196 220 1 6

948 132 P = 1.164402e-12

PF Ependymoma CIMP PF Ependymoma CIMP

C D

Hypermethylated ES cell PRC2 Target Genes Group A specific Hypermethylated or K27me3 marked Group A specific Breast Cancer GBM p = 0.03368665 Hypermethylated or K27me3 marked CIMP CIMP p = n.s.

13 220 27 58 4 17 14 59 24 434 81

COSMIC cancer gene consensus PF Ependymoma CIMP

(A) Overlap of genes significantly DNA hypermethylated in Breast, Group A Ependymoma, and GBM CIMPs (FDR < 0.05). Signifi- cance of overlap was calculated by comparing observed versus expected proportions using the Binomial distribution test. (B) Overlap of genes significantly DNA hypermethylated and silenced in Breast, Group A Ependymoma, and GBM CIMPs (FDR < 0.05). (C) Overlap of ES cell PRC2 regulated genes significantly DNA hypermethylated in Breast, Group A Ependymoma, and GBM CIMPs (FDR < 0.05). (D) Overlap of genes DNA hypermethylated or specifically marked by H3K27me3 in Group A or B against the COSMIC cancer gene consensus list.

81

82

83

Figure S4.13

22 B M

21 0 1

0MB

5 0MB 50MB

100MB

20 0MB 50MB

150MB

0MB 19 50MB 200MB

A 0MB 0MB B 18 50MB

0MB 2 100MB 17

150MB 0MB

200MB ESRRG 16

0MB

0MB chr1: 100MB

50MB 217.307 mb 217.309 mb 217.311 mb 217.313 mb 217.315 mb 15 3

217.308 mb 217.31 mb 217.312 mb 217.314 mb 217.316 mb 0MB

100MB 150MB 1.0

14 % Methylation 0MB 0.8 0MB

50MB

100MB 0.6

4 13

150MB 0.4 0MB 0 100

100MB 0MB 0.2 PFA 50MB 50MB

12 PFB

5 0.0 0MB Proportion of Methylation

100MB 150MB CpG island:

0MB 50MB 11 ESRRG

B

M

50MB 0 5

0MB 1

100MB

B

50MB 200MB

M 0

6 100MB

50MB 150MB Group A

10 0MB 0MB

50MB Group B 100MB

100MB 50MB

150MB 7 0MB 0MB 9 50MB

100MB Fetal NB 8 Adult NB C D E DMR DMR < 2.0 e-16 < 2.0 e-16 < 2.0 e-16 < 2.0 e-16 < 2.0 e-16 < 2.0 e-16 < 2.0 e-16 P P P P P CpG Islands Repetitive Regions P P 1.0 1.0 *** 5

*** 4 0.8 0.8

*** 3 0.6 0.6 *** *** vs Group B 2

0.4 A

0.4 1

% Methylation 0.2 % Methylation 0.2 Genomic Enrichment p < 0.0001 p < 0.0001 Group 0.0 0.0 0 CGI CGI- LINE

Group A Group B Group A Group B SINE TFBS Shore

(CIMP+) (CIMP-) (CIMP+) (CIMP-) DNase- Cluster Promoter F G Repetitive Regions CpG Islands Hypermethylated Hypomethylated p < 2.2e-16 Hypermethylated p < 2.2e-16 p < 2.2e-16 Hypomethylated p < 2.2e-16 p < 2.2e-16 p < 2.2e-16 p < 2.2e-16 0.08 0.16 p < 2.2e-16 p < 2.2e-16 p < 2.2e-16 0.06 0.12

p < 2.2e-16 0.04 0.08 ferentially Methylated Regions ferentially Methylated Regions f f 0.02 0.04 p = n.s.

0 0 TR TR TR TR TR TR L L L L L Group A (CIMP+) Group B (CIMP-) Group A (CIMP+) L LINE LINE LINE LINE LINE LINE SINE SINE SINE SINE SINE SINE Proportion of Di versus versus versus Proportion of Di Group A (CIMP+) Group B (CIMP-) Group A (CIMP+) Fetal Normal Adult Normal Brain Group B (CIMP-) versus versus versus Fetal Normal Adult Normal Brain Group B (CIMP-)

(A) From outer to inner track, chromosomal location, then circos plots depicting whole genome DNA methylation profiles in 3 Group As (CIMP+), 3 Group Bs (CIMP-), 3 fetal normal brains, and 3 adult normal brains. DNA methylation was averaged in 10kb windows and shown as a heatmap with hypometh- ylation (pale white) and hypermethylation (dark blue). (B) Example of Group A (CIMP+) specific DNA methylation accumulating at the promoter of ESRRG compared to Group B (CIMP-), Fetal normal brain, and adult normal brain. Location of CpG island is shown by a green bar, and direction of transcription by a purple arrow. (C) Comparing the DNA methylation levels detected in DMRs between Group A (CIMP+) and Group B (CIMP-) at CpG islands. Statistical significance was assessed using a Mann-Whitney Test. (d) Comparing the DNA methylation levels detected in DMRs between Group A (CIMP+) and Group B (CIMP-) at repetitive regions (LINES, SINES, and LTRs). Statistical significance was assessed using a Mann-Whitney Test. (E) Direct comparison of DMRs between Group A (CIMP+) and Group B (CIMP-) at various genetic loci, generated by WGBS, validating the CIMP identified in Figure 2. Statistical significance was determined using the binomial distribution test. (F) Comparison of the proportion of hypermethylated versus hypomethylated CpG island differentially methylated regions (DMRs) in Group A versus Fetal Normal Brain, Group B versus Adult Normal Brain, and Group A versus Group B. Statistical significance was determined using the binomial distribution test. (G) Comparison of the proportion of hypermethyl- ated versus hypomethylated repeat region DMRs in Group A versus Fetal Normal Brain, Group B versus Adult Normal Brain, and Group A versus Group B. Statistical significance was assessed using a binomial distribution test of proportions.

84

Figure S4.14

A chr18 7,173 kb 0 kb 1,000 kb 2,000 kb 3,000 kb 4,000 kb 5,000 kb 6,000 kb 7,000 kb

Group A (CIMP+)

AIRE Group B (CIMP-) F

Group A (CIMP+)

WGBS Group B (CIMP-)

CETN1 LINC00470 Partially Methylated Domains METTL4 LPIN2 TGIF1 DLGAP1 C18orf42 LOC645355 MIR4317 ARHGAP28

B C D

Rank Ordered Genes in Group A PMDs

5 IGF2BP1

Group A specific PMDs Group B specific PMDs 4 4 5 C13orf33 3 n p < 0.0001 p < 0.0001 n o o i RAGE i s 2 s LMX1B s s

e 4 SPRR1B e r r 3 1 FDR < 0.25 RET p Significance p x x E 0 E

3 Low High ene

2 ene f G f G 2 E o o

e e Rank Ordered Genes in Group B PMDs c c

n 1 n FAM50B ia 1 ia r r a a 2 V V 0 0 KRT86 MBP Group A Group B Group A Group B FRMD4B 1 (CIMP+) (CIMP-) (CIMP+) (CIMP-) FDR < 0.25 TMEM16A

Significance PTDSS1 0 Low High

(A) Example at chromosome 18 demonstrating partially methylated domains identified specifically in Group A, which are associated with areas of closed chromatin identified by FAIRE-seq in PF ependymomas. (B-C) Variance of gene expres- sion for genes identified in Group A (CIMP+) and Group B (CIMP-) partially methylated domains. Statistical significance of variation difference was assessed using a Mann- Whitney test (D-E) Ranked list of most genes found within partially methylated domains with the most variable gene expression. Statistical significance of variant gene expression was detected using an F-test with FDR correction using the BH method.

85

Figure S4.15

A B Group A specific H3K27me3 target gene Group A specific DNA hypermethylated gene 1124 42 1319 270 11 7 43 26 597 1092 Significantly Up-Regulated Significantly Up-Regulated by 5-aza-2’deoxycytidine by GSK343 in E517 or E520 479 515 in E517 or E520 Significantly Up-Regulated 30 Group A specific by 5-aza-2’deoxycytidine H3K27me3 target gene in E517 or E520 119 C D Vehicle 1.0 3-Deazaneplanocin-A Treatment of E520-PF1 cells with GSK 343 (500 nm)

Days: D0 D1 D3 D5 D7 D9 H3K27ME3

Alpha-tubulin Fraction Negative p < 0.0001 n = 3 0.1 0 20 40 60 80 100 120 Dose (number of cells) E F

Goodness of fit tests. These test whether the log-dose slope equals 1. Confidence intervals for Rejection of the tests may be due either to batch effects (heterogeneity in 1/(stem cell frequency) the stem cell frequencies or assay success rate) or to a failure of the stem cell hypothesis. Group Lower Estimate Upper Estimated slope is 0.661 dec 2705 1469 798.2 Test Chisq DF P Value Likelihood ratio test of dmso 289 172 102.1 19.1 1 1.25E-05 single-hit model gsk 1046 618 364.7 Score test of heterogeneity 168 1 1.91E-38

(A) Venn diagrams comparing the overlap of genes up-regulated in Group A (CIMP+) cultures (E517, E520) compared to Group A specific DNA hypermethylated genes or Group A specific H3K27me3 marked genes. (B) Comparison of genes up-regulated in E517 or E520 by GSK343 or 5-aza-2’-deoxycytidine with Group A specific H3K27me3 marked genes. (C) Limiting dilution colony forming assay for ependymoma cells isolated from E520-PF1 flank tumours, which were treated in vivo with DZNep (3mg/kg/d) or vehicle. A statistical comparison was made between DMSO (black), and DZNep (orange) using a chi-square test. (D) Western blot analysis of H3K27me3 protein levels following treatment of E520-PF1 cells with GSK343 (500nm) over 9 days. Alpha-Tubulin protein expression was used to assess equal protein loading. (E) Confi- dence intervals for stem cell frequencies associated with LDA data shown in Figure 2.5i and 2.5j. (F) Goodness of fit test for LDAs shown in Fig 2.4h and 2.5h.

86

Figure S4.16

E520 - PF ependymoma 0 DZNep 10uM 0 DZNep 10uM SAHA Decitabine

10uM 10uM

EP1NS - Metastatic Tumour 0 Decitabine 20uM 0 DZNep 20uM SAHA Carboplatin

20uM 20uM

E479 - ST ependymoma 0uM Cisplatin 25uM 0uM SAHA 25uM Decitabine Decitabine

25uM 25uM

E517 - PF ependymoma 0uM Cisplatin 20uM 0uM SAHA 25uM Decitabine Decitabine

25uM 25uM

E520 - PF ependymoma 0uM Cisplatin 20uM 0uM SAHA 25uM Decitabine Decitabine

25uM 25uM 0 100 Percent Survival

Combinations of epigenetic modifiers and chemotherapeutic compounds, used to treat a panel of ependymoma primary cultures over a 7 day course. Experiments were performed in 96-well plates and cell survival was assessed using an MTS assay.

87 4.7 Chapter 4: Methods

4.7.1 Patients and tumour samples

Tumour samples, clinical information, and animal studies were processed in approval with local ethics board from both the institutions as described previously12. From all patients informed consent was obtained, as described previously12. No patient underwent chemotherapy or radiotherapy prior to the surgical removal of the primary tumor. This study included only primary samples for analysis, and further excluded WHO grade I histological variants of ependymoma. Detailed clinical description of patient characteristics is shown for the sequencing cohort in Publication Supplemental Table S297 and for the methylation cohort in Supplementary

Table S8. Tumour subgrouping was based on gene expression profiling or immunohistochemical analysis as described previously12. At least 80% of tumor cell content was estimated in all tumour samples of the sequencing cohort by staining cryosections (~5um thick) of each sample with hematoxylin and eosin. Diagnoses were confirmed by histopathologic assessment by at least two neuropathologists, including a central pathology review that utilized the 2007 WHO classification for CNS tumors2.

4.7.2 DNA library preparation and Illumina sequencing

Tumor and control samples were individually processed, in every case thorough histological examination proved that each tumor consisted of >80% tumor cells, in most cases it was >95%.

DNA from tumor and control samples (blood) were prepared and sequenced individually. The

Agilent SureSelect Human All Exon 50 Mb target enrichment (v3 initially switched to v4

88 subsequently) was used to capture all human exons for deep sequencing, using the vendor’s protocol v2.0.1. The SureSelect Human All Exon Kit targets regions of 50 Mb in total size, which is approximately 1.7% of the human genome. Briefly, 3µg of genomic DNA were sheared with a Covaris S2 to a mean size of 150bp. 500ng of library were hybridized for 24h at

65°C with the SureSelect baits. The captured fragments of the tumor samples and controls were sequenced in 105bp single end mode on an Illumina HiSeq2000 deep sequencing instrument

(based on Illumina, Inc., v3 sequencing chemistry). Median coverage of whole-exome sequenced tumor samples was 157-fold (range 43-469×) and for control samples (blood DNA)

146-fold (range 80-222×). In addition, whole genome libraries (before the exome hybridization step) were sequenced three lanes each in paired end 105bp mode on the HiSeq2000, as described earlier by Jones et al15.

To increase the coverage of the samples for whole exome-sequencing we used the following strategy: Exome capture was initially carried out with Agilent SureSelect (Human All Exon

50 Mb) in-solution reagents using the default Illumina adapters (without barcode). To introduce

Illumina Multiplex barcodes into the existing libraries at a later stage, 15ng final exome- enriched library (without barcode) were used as a template in a 50ul PCR reaction. The

Herculase II Fusion enzyme (Agilent) was used together with the NEBNext Universal PCR primer for Illumina and NEBNext Index primer (NEB #E7335S) with the following conditions:

The initial denaturation step for 2min at 98°C was followed by 4 cycles of 30sec 98°C, 30sec

57°C, 1min 72°C, and a final 10min at 72°C step. 6-7 barcoded samples were then sequenced on the Hiseq2000 in 2x100bp paired end mode.

89 4.7.3 DNA sequence data processing

For each sequencing lane, read pairs were mapped to the human reference genome (hg19, NCBI build 37.1, downloaded from the UCSC genome browser at http://genome.ucsc.edu/) using

BWA43 version 0.5.9-r16 with default parameters and maximum insert size set to 1 kb. We used

SAMtools44 to generate a chromosomal coordinate-sorted BAM file. Post-processing of the aligned reads included merging of lane-level data and removal of duplicate read pairs per sequencing library using Picard tools (version picard-1.48, http://picard.sourceforge.net). Lane, library and sample information was captured in the read group tag in the header of the merged final BAM file. Only uniquely aligned reads (minimum mapping quality of 1) were considered for downstream mutation analysis. Coverage calculations following duplicate removal considered all informative bases of the reference genome (excluding Ns). A mean Phred-scaled base quality of at least 25 across the length of the read was required. For target capture sequencing, only bases of reads overlapping the targets +/-100 bases were considered for coverage calculations. Sequencing statistics are given in Publication Supplemental Table 1 and

297.

4.7.4 Single nucleotide variant (SNV) detection

Our analysis pipeline for single nucleotide variant (SNV) detection integrates publicly available tools with custom in-house software and applies several filtering and annotation steps. SNV calling is based on SAMtools mpileup44 and bcftools44 (version 0.1.17), using parameter adjustments to allow calling of somatic variants. Default settings of bcftools are designed for diploid samples, but due to tumour heterogeneity, polyploidy and normal cell contamination,

90 tumour genomes often have a significantly lower mutant allele frequency than that seen in normal diploid genomes. Therefore, somatic SNVs are often not called by standard tools designed for detection of single nucleotide polymorphisms, e.g. in population studies such as the

1000 Genomes Project (http://www.1000genomes.org/). Initial SNV candidates were identified by using SAMtools mpileup for each tumour sample, considering only reads with a minimum mapping quality of 30 and bases with a minimum base quality of 13, after application of the extended base alignment quality (BAQ) model. BAQ is the Phred-scaled probability of a read base being misaligned45, and it is designed to reduce false SNV calls caused by misalignments.

After the pile up of high quality bases at each position of the input BAM file, bcftools applies the prior and performs the actual SNV calling. We changed the default probability of calling a variant if P(ref|D) < 0.5 to 1.0, which results in all positions containing at least one high quality non-reference base to be reported as a variant. Therefore, this initial set of SNV candidates contains a high fraction of false positive calls, but ensures that true somatic mutations with low allele frequency (well below the expected 50% allele frequency) are reported. This initial SNV call set is then subjected to various filters. SNVs covered by fewer than three reads in the tumour and control sample, with somatic allele frequency <10%, or with only one read supporting the variant were excluded. Additionally, a minimum of 10 high quality reads available at the corresponding position in the control sample were required, in order to be able to distinguish somatic from germline variants. Local sequence context can lead to incorrect base calls, but typically involves reads sequenced from a single strand only. Thus, if the variant call was supported by reads from only one strand, the +/-10 bases around the SNV were automatically screened for Illumina specific error profiles46 and excluded if a profile was matched. For all tumour SNV calls the pipeline generates a pileup of the bases in the normal sample considering only uniquely mapping reads. SNV calls were categorized as germline or somatic according to whether there was evidence for the same event at the same locus in the

91 BAM file of the tumour-matched control sample. Filtered calls were annotated with RefSeq gene annotations, dbSNP build 135 and variants from the 1000 Genomes project. Calls matching the position of known dbSNP (up to version 131) or known 1000 Genome variants were excluded from the high-confidence somatic call set (calls matching the position of dbSNP version>131 but not the position of 1000 Genome variants were retained because cancer relevant somatic mutations, such as several TP53 mutations. have been included in more recent dbSNP versions). Additionally, we filtered out SNVs that were found in at least 1% of the control samples or at least 1% of a set of 162 unrelated controls from other studies44, because they constitute likely unannotated, naturally occurring SNPs and/or false positives, e.g. Artifacts related to sequencing and mapping. The pipeline integrates Annovar47

(http://www.openbioinformatics.org/annovar/) to determine whether the observed amino acid change has synonymous, non-synonymous, nonsense, or splice-site changing properties on the encoded protein. Variants were further annotated with genes listed in the Cancer Gene Census

(http://www.sanger.ac.uk/genetics/CGP/Census/) and entries from the Catalogue of Somatic

Mutations version 57 (COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/), in addition to the full RefSeq gene summary, full gene name, and genomic size. A subset of sequence variants and Indels were validated by capillary sequencing by Sanger using purified PCR products.

Primer sequences are available upon request.

4.7.5 Small insertion and deletion (InDel) detection

Small insertions and deletions were identified with SAMtools and bcftools104. The InDel discovery pipeline is similar to the SNV pipeline (as described above), but using default bcftools parameters, to reduce the known high false positive rate associated with current InDel detection

92 methods for deep sequencing data. To call an indel a germline event, we only required one

InDel supporting read in the matching normal sample, again to reduce the high fraction of false positive somatic InDel calls. Calls overlapping simple repeat or microsatellite regions were excluded as such regions are commonly observed to yield false positive calls. Annotation of

InDels was identical to SNV annotation.

4.7.6 Computation of recurrently mutated genes

To search for genes mutated at significant frequency, we applied the MutSig algorithm, a method that corrects for background mutation rate and gene length. Details can be found at

Broad CGA tools website (http://www.broadinstitute.org/cancer/cga/mutsig) including previously published studies48,49.

4.7.7 Identification of rearrangements and generating of Circos Plots

Structural rearrangements, namely deletions, tandem duplications, inversions, and translocations, were detected using DELLY50, which is based on paired-end mapping51. The structural rearrangement calls were filtered using the corresponding ependymoma germline samples, germline data of additional medulloblastoma samples15, and phase I 1000 Genomes

Project (http://1000genomes.org) genome data to exclude germline structural variants as well as rearrangement calls caused by mapping artefacts. We only considered those rearrangements for further analysis, which were present in at most 0.5% of the 1000 Genomes Project samples assessed and not in the additional germline samples. Two rearrangement calls were considered to be the identical, hence constituting a likely germline variant if they displayed an overlap in

93 terms of genomic coordinates with their end coordinates differing by less than 5 kb.

Furthermore, rearrangement calls with less than 10 supporting pairs as well as supporting pairs with average mapping quality less than 20 were excluded for further analysis. The circular whole-genome plots were generated using Circos52.

4.7.8 Identification of pathways affected by SNVs

A Pathway Association Test was used identify groups of functionally related genes that contained a greater than expected number of SNVs in one or the other ependymoma subtype. For each gene set, SNVs were stratified by subtype (PFA vs. PFB). The number of

SNVs in all genes and the number of SNVs observed in a given gene set were totaled for each subtype. A Fisher's exact test was performed with the null hypothesis that the frequency of

SNVs in a given gene set was equal in the two subtypes. To correct for multiple testing, gene names were randomly shuffled and the analysis repeated to obtain a null p-value for a given gene set. Randomization was done 10,000 times and, for each gene set, the percentage of null p-values that were the same or lower than that obtained from the actual data was used as an estimate of the false discovery rate (FDR).

4.7.9 Generation of copy number profiles from Illumina 450K methylation data

Low-resolution (450K probes) copy number variations were detected from the 450k Infinium methylation array in a custom approach using the sum of both methylated and unmethylated signals. Probes found to be highly variant in the six normal cerebellum samples were excluded from the analysis according to the following criteria: Removal of probes not within the 0.05 and

94 0.85 Quantile of median summed values or over the 0.8 Quantile of the median absolute deviation. Log-ratios of samples to the median value of control samples were calculated, and sample noisiness was determined as the median absolute deviation of adjacent probes. Probes were then combined by joining 20 adjacent probes, and resulting genomic windows less than

100kb in size were iteratively merged with adjacent windows of smaller size. Windows of more than 5Mb were excluded from analysis, resulting in a total of 8,654 windows throughout the genome. For each window, the median probe value was calculated and shifted to minimize the median absolute deviation from all windows to zero for every sample. Segmentation was performed by applying the circular binary algorithm53, as published previously21.

4.7.10 Methyl- binding domain 2 assisted recovery and sample preparation

Genomic DNA was isolated according to previous methods12. DNA (6µg) was immunoprecipitated using the methyl- binding domain 2 protein and quantified using the Qubit fluorometer (Invitrogen). Enrichment was assessed by quantitative PCR for positive controls:

RASSF1A, DLK1, H19, and negative controls: ACTB and GAPDH. Bound and Unbound fractions from the MBD2 pull-down were whole genome amplified (SIGMA-WGA2) in triplicate, pooled, quantified using Qubit, and subjected to another round of quantitative PCR for the above control targets. DNA was sent to Nimblegen to be hybridized to Nimblegen 385K

CpG island promoter plus arrays, in which ‘IP’ fraction was labeled with CY5 and ‘unbound’ fraction with CY3.

95 4.7.11 Methylation analysis of MBD2-chip data

Microarray data was Quantile normalized using the LIMMA Bioconductor package. Log2 ratios were then imported into Agilent Genomics Workbench (Agilent Technologies), following which the BATMAN algorithm was used to infer the methylation statuses associated with each probe.

Mean methylation states were calculated for probes within a 1000 base pair window and termed a region of interest (ROI)42. ROIs were then filtered to those with greater than 4 probes and mapped to autosomal chromosomes. ROIs exhibiting a standard deviation greater than 0.65 were used for subgroup assignment as described below. Comparisons between subgroups were performed using a Wilcoxon-rank sum test, and p-values were corrected for multiple testing using the Benjamini-Hochberg method. For comparisons of DNA methylation and other factors in this manuscript a Wilcoxon test was used and corrected for multiple testing, such that no assumptions were made regarding the normality of the data distributions.

4.7.12 Illumina Infinium 450K methylation sample preparation and data analysis

Genomic DNA was isolated according to previous methods12. DNA (1µg) was used for bisulfite treatment (Qiagen – EpiTect plus) with the use of DNA protect buffer, particularly in the case of

DNA from formalin fixed paraffin embedded tissue. Bisulfite treated DNA was then quantified using spectrophotometry (Nanodrop). >500 ng was sent to The Centre for Applied Genomics

(TCAG – Toronto) for hybridization to Illumina 450K Methylation Arrays. Array pre- processing was performed using GenomeStudio (Illumina) with background subtraction adjustment applied. Arrays were also normalized using the BMIQ method, which produced the same finding of a Group A specific CIMP. Methylation values were then exported as Beta-

96 values (estimate of actual CpG methylation levels). Probes that overlapped with known single nucleotide polymorphisms, which mapped to Chromosome X and Y, and were Illumina control probes, were removed from the analysis. Methylation probes were then filtered to CpG sites, which mapped to promoters containing CpG islands. A Wilcoxon-Rank Sum Test (Mann-

Whitney) was used to identify the differentially methylated CpG sites between Group A

(CIMP+) and Group B (CIMP-). P-values for differentially methylated CpG sites identified were then corrected for multiple testing using the Benjamini-Hochberg method. Significant differences between numbers of CpG sites, genes, or methylated and silenced genes, was calculated using a Binomial Distribution Test. Methylated and Silenced Genes were identified in two ways 1) By identifying genes which were methylated and down-regulated following comparison between Group A and B using a Wilcoxon-Rank Sum test or 2) By Performing a

Pearson correlation between the methylation status of a CpG site with the corresponding down- stream gene. Methylated and Silenced genes (within the same tumor) were identified by genes demonstrating significant and preferential methylation in a particular subgroup and evidence of down-regulation as compared to a collection of normal brain samples. Gene expression data for these samples can be found in our previous publication12.

4.7.13 Subgroup analysis of gene expression and methylation data

To detect robust sample clusters from the gene expression data (Affymetrix Exon 1.0ST) we performed hierarchical clustering using the top 1000 varying probes as described previously in

Witt and Mack., et al12. For clustering of MBD2-chip data we performed consensus hierarchical clustering with agglomerative average linkage as our method for consensus clustering. (R package: ConsensusClusterPlus)54. The change in area under the cumulative distribution

97 function curve was used to identify the principal number of subgroups for a given clustering method. Silhouette analysis was performed to evaluate cluster representation of samples (R package: cluster)55. To evaluate the concordance between gene expression and DNA methylation subgroup stratification we calculated the Rand Index, with the significance assessed by permutation of sample labels and computing the Rand index over 10000 iterations in order to generate a null distribution. Illumina 450K methylation data was clustered using the probes exhibiting a standard deviation of >0.2 as described previously56. A variety of consensus clustering methods was performed including K-means, non-negative matrix factorization, hierarchical clustering, and self-organizing maps were used57. The distance metric used in the case of K-means was Euclidean, whereas a Pearson correlation was used for all other methods.

Principal component analysis was performed within Partek Genomics Suite (Partek Inc.) to compare Group A (CIMP+) and Group B (CIMP-) posterior fossa subtypes with the same genes or CpG sites used for consensus HCL and consensus K-means respectively.

4.7.14 H3K27me3 and EZH2 ChIP-Seq profiling and analysis in PF ependymoma

10-20mg of fresh frozen primary tumour samples were homogenized in 1% Formaldehyde and allowed to incubate at room temperature for 6 – 10 min. Cross-linking was stopped with the addition of 125mM of Glycine, and samples were washed twice with ice-cold PBS containing

(1% BSA and 10% FBS). Samples were then sonicated to ~200 bp fragments using a Biorupter

(Diagenode). The chromatin immunoprecipitation was then performed using 5 mg of EZH2 antibody (#39875-Active Motif) or H3K27me3 antibody (C15410069-Diagenode) overnight at

4C as described previously45. DNA was quantified using PicoGreen (Invitrogen) and libraries were prepared using NEBNext ChIP-seq Illumina Sequencing library preparation kit (NEB).

98 Samples were barcoded (NEB Next Barcodes) and pooled in equimolar amounts such that up to

6 samples could be sequenced by paired-end Illumina HiSeq 2000 sequencing (Illumina).

ChIP-seq reads were aligned using the BWA algorithm with removal of redundant reads (Picard

Algorithm) likely to represent ChIP-seq PCR library artifacts, yielding uniquely mapped ChIP- seq reads. Peaks were identified using MACS (Version 2) with a p-value cutoff of 0.01.

Differential peaks were identified using the R: Bioconductor DiffBind package (p < 0.05) and annotated to the nearest gene +/- 5kb using Cistrome (http://cistrome.org/ap/root). Overlap analysis between H3K27me3 genes or EZH2 target genes was assessed statistically using a binomial distribution test. Unsupervised consensus clustering of H3K27me3 predicted target genes was performed using the top 1000 genes exhibiting the greatest standard deviation.

Supervised analysis of predicted H3K27me3 target genes was also performed using Significant

Analysis of Microarrays with and FDR cutoff of 0.01.

4.7.15 Sequenom analysis of ependymoma samples

Validation of gene methylation was performed using Sequenom Mass Spectrometry. Primers were designed using Sequenom: EpiDesigner and tested on bisulfite-treated universally methylated DNA (Invitrogen) by standard PCR (Qiagen) followed by Sanger Sequencing. For

Bisulfite treated tumour samples, following PCR amplification, amplicons were sent to Genome

Quebec for quantification using Sequenom Mass Spectrometry.

4.7.16 Subgroup stratification of ependymoma samples in a validation cohort

99 Sequenom primers were designed to three highly methylated genes in Group A (CIMP), PKP1,

CRIP1, CYP26C1, as selected by CpG coverage and PCR efficiency. PCR amplification was performed in a training data set consisting of the samples, which were analyzed by Illumina

450K methylation arrays. These three methylated genes were used to train a classification model using the Prediction Analysis for Microarrays algorithm. Class prediction was performed on a non-overlapping cohort of 82 samples collected from The Hospital for Sick Children, Children’s

Hospital Boston, University of Michigan, and the MD Anderson Cancer Centre. Posterior probabilities corresponding to Group A (CIMP+) or Group B (CIMP-) were calculated for each sample, and an odds ratio > 2-fold (Probability Group A / Probability Group B) for either subgroup was used to classify tumours. Survival was graphed throughout the manuscript using

Kaplan-Meier curves and assessed statistically using a Log-Rank test.

4.7.17 Pathway analysis of DNA methylation data

A Pathway Association Test was used to identify groups of functionally related genes that contained a greater than expected number of methylation events in one or the other ependymoma subtype. A gene was considered to be methylated if all profiled sites within

1000bp upstream of the TSS showed a variance of no more than 0.1 and a mean score of

>0.5. For each gene set, methylation events were classified by subtype. The number of methylation events in all genes and the number of methylation events observed in a given gene set were totaled for each subtype. A Fisher's exact test was performed with the null hypothesis that the frequency of methylation events in a given gene set was equal in the two subtypes. To correct for multiple testing, gene names were randomly shuffled and the analysis repeated to obtain a null p-value for a given gene set. Randomization was done 10,000 times and, for each

100 gene set, the percentage of null p-values that were the same or lower than that obtained from the actual data was used as an estimate of the false discovery rate (FDR).

4.7.18 Whole-genome bisulfite sequencing, DNA preparation, and differentially methylated region (DMR) analysis

Strand-specific MethylC-seq libraries were prepared according to Lister et al. 2011 with minor modifications. Adaptor-ligated DNA fragments with insert lengths of 200–250 bp were converted using the EZ DNA Methylation kit (Zymo Research). After PCR amplification in six parallel reactions using the FastStart High Fidelity PCR kit (Roche), library aliquots were pooled per sample and sequenced using the Illumina HiSeq 2000 platform. This yielded an average of 513 million (±102 million (s.d.)) 101 bp paired-end reads per sample.

For analysis of DMR enrichment in specific genomic sites, we first extracted genomic features from UCSC genome browser. Then the percentage of total genomic CpGs for each genomic feature was calculated as a background value. Thereafter, the percentage of total hyper/hypomethylated CpGs in each genomic feature was calculated based on the DMR list.

The enrichment fold change was then set as the ratio between the two percentages above. In order to test the significance of the enrichment /depletion, we randomly permuted the CpGs from all DMRs in the whole genome for 10,000 times and used Fisher's exact test to determine the significance of the difference between the observed and simulated results.

101 4.7.19 Whole-genome bisulfite sequencing DMR and PMD calling

WGBS data was mapped to hg19 using BSMAP (version 2.74)46. The potential duplications were removed afterwards using Picard tools104. BisSNP (Version 0.82.2) was then used to detect and remove SNPs and CpGs with potential technical biases before DMR calling. BSmooth was used to smooth bisulphite sequencing data and call candidate DMRs as described previously47.

PMDs were detected using MethylSeekR48.

4.7.20 Ependymoma short-term primary cell culture and in vitro drug treatment

Primary ependymoma cells were isolated from patients and cultured on Laminin (Sigma) coated plates in Neurobasal media (Invitrogen) consisting of N2 (Invitrogen), B27 (Invitrogen),

Glutamine (Invitrogen), BSA (Sigma), heparin (Sigma), human EGF (Invitrogen), and human basic FGF (Invitrogen). Media was replenished every other day while leaving ~50% conditioned media to encourage continued cell proliferation. Cell viability assays were performed in 96 wells using an Alamar Blue stain (Invitrogen) or MTS Aqueous One (Promega) according to manufacturer’s instructions. 5-aza-2’-deoxycytidine (decitabine-Sigma) was dissolved to a stock concentration of 2mM in PBS and stored in aliquots at -20 C. DAC was prepared fresh and added to treatment media on a daily basis at the appropriate final concentration, for a total of 7 days. 3-dezaneplanocin A (DZNep) (Cayman Chemical) was dissolved to a stock concentration of 25 mM in DMSO and stored in aliquots at -20 C. DZNep treatments were performed every other day along with replenishment of cell culture medium for a total of 7 days. GSK343 (active compound) and GSK669 (inactive compound) were dissolved in DMSO and used to treat cells at varying concentrations with media replenishment every other day for a total of 11 days.

102

4.7.21 Gene expression profiling of DAC and GSK343 treated cultures

Primary cell cultures were treated for 5 days in Decitabine (500nM) or GSK343 (500nM), following which RNA was isolated using the Trizol (Invitrogen) method. RNA libraries were prepared according to manufacturers recommendations and hybridized to Affymetrix Gene

1.0ST arrays. The RMA method with Quantile normalization was used for gene expression array normalization. Differentially expressed genes were detected using Significance of

Microarray Analysis (FDR < 0.01).

4.7.22 Western blot analysis

Ependymoma cell cultures were lysed in PLC lysis buffer containing deoxycholate, with sonication to facilitate the release of nuclear histones. SDS-PAGE analysis was performed in a

12% gel, loading 20 ug of protein, as quantified by BCA (Pierce). Membranes were blocked with 5% bovine serum albumin (Roche) diluted in TBST. Western blot antibodies were used at the following concentrations in overnight incubations (2% BSA): EZH2 (Abcam: ab110646,

1:5000), H3K27me3 (Cell Signaling: #9733, 1:5000), H3K4me3 (Cell Signaling: #9751,

1:5000), cleaved PARP (Cell Signaling: #5625, 1:1000) and Alpha Tubulin (Cell Signaling:

#2148, 1:20000). Secondary antibodies were used at a concentration of 1:5000 for all primary antibodies, and 1:20000 for alpha-tubulin.

103 4.7.23 Flank injections and in vivo treatments of immunodeficient mice

For all animal studies, following engraftment of tumor cells, mice were then randomly assigned a treatment of vehicle versus treatment to control for assignment biases and other confounding factors. 50,000 E520 – PF1 ependymoma cells were injected subcutaneously into flanks of immunodeficient NOD-scid gamma mice. Tumours were allowed to develop for 7 days until either visible or palpable. DZNep or vehicle (Sigma: Cremaphor) was administered 3 consecutive days a week at a dosage of 3 mg/kg/day via intraperitoneal injections. Tumours were monitored and measured continuously using a caliper. Experimental endpoint was determined when tumours reached 15 mm in size. Final tumour volumes were determined using caliper measurements. Investigators were blinded during measurement of tumour volumes. A comparison between tumour volumes of DZNep versus vehicle treated mice was calculated using a Wilcoxon-rank sum test. For all animal studies, adequate sample sizes were chosen such that any result could be appropriately evaluated statistically using a two-sided non-parametric test. For flank xenograft experiments this entailed a Wilcoxon-Rank sum test, and for intracranial experiments this involved a Log-Rank test.

4.7.24 Cerebellar xenografts and in vivo treatments of immunodeficient mice

10,000 cells were xenografted by stereotactic injection into posterior fossas of immunodeficient

NODscid gamma mice of 5-8 weeks old. Tumours were allowed to develop for 7 days, following which DZNep (3mg/kg/d) or vehicle (Sigma: cremaphor) administered by intra- peritoneal injection. Mice were treated according to the same protocol for flank tumor bearing

104 mice (above). Survival of mice was visualized using a Kaplan-Meier curve and quantified using a log rank test.

4.7.25 Limiting dilution assays (LDAs) of primary ependymoma patient samples or ependymoma xenografts

Cells from a lung metastasis resection or tumour xenograft were dissociated according to previously published protocols49. LDAs were performed in a 96-well plate format. LDAs from xenografts were not treated with inhibitors but monitored for neurosphere colony formation. For primary patient samples serial dilutions of cells were performed to reach cell doses of 2000 cells per well at the highest dose and 4 cells per well at the lowest dose. A total of 10 cell doses were tested with 6 technical replicates per dose. Cells were treated with selected compounds about 5 h post-surgery. GSK343 was used at a concentration of 3 uM, and Decitabine was used at 0.5 uM. Fresh media and drugs were added to the cells after 7 days. Wells were scored for sphere formation on day 14. Statistical analysis was performed with the Extreme Limiting Dilution

Analysis web-based software50

105 5 Chapter 5: Future Directions

5.1 Further examination of tumour heterogeneity in ependymoma

We and other groups have demonstrated that supratentorial, posterior fossa, and spinal ependymomas are distinct molecular and clinical entities, which can be further subdivided into

ST-RELA–FUSION+/-, and PF-CIMP+/-. One possible argument for the lack of detectable genetic alterations in genomically stable PF-CIMP+ ependymoma could be a significant degree of inter-tumoral heterogeneity within this subgroup. This was perhaps unrecognized in the past because tumours were grouped with histologically similar ST and SP ependymomas. One might argue that genomic characterization of a larger cohort of PF-CIMP+ ependymomas could reveal novel and recurrent somatic mutations, however these events are predicted to be exceedingly rare. As a simple thought process, combination of PF ependymomas from both Mack et al., 2014 and Parker et al., 2014 (Combined cohort n=73) suggests that the frequency of a recurrent somatic mutation, if identified, would be lower than 1.3%. In other words, 98.7% of PF ependymomas would most likely harbour no recurrent mutations in coding space.

Spatial intra-tumoural heterogeneity is another issue that will need to be addressed in future studies, particularly for design of clinical trials. As shown in other solid cancers such as renal cell carcinoma or glioblastoma multiforme mutational and/or subgroup diversity can be seen in distinct regions of the same tumour sample105,106. This information will be important for accurate subgroup classification of ependymoma samples in future clinical trials, based upon a single, or if needed, multiple biopsies from different locations of the same tumour.

106

5.2 Further investigation of genomically balanced posterior fossa ependymoma

Despite a balanced genome observed in PF ependymoma, it is possible that gains or losses of entire genomes, or specific chromosomes, might have occurred and were undetectable by copy number analysis using array CGH as presented in Chapter 2. The answer to this hypothesis could be addressed with current published data, by extracting allele frequency calls in 500K

SNP array data by Johnson et al., 2010, and mutant allele frequency calls in WGS data by

Parker et al., 2014. Furthermore, the analysis presented for PF ependymoma in this thesis has focused primarily on coding or exonic regions. New evidence in solid cancers such as melanoma and medulloblastoma has demonstrated that inter-genic mutations can have a direct influence upon gene expression through mutation of transcriptional regulatory elements99,100.

This has been demonstrated by TERT promoter somatic mutations, which are associated with increased TERT expression and therefore avoidance of replication induced senescence.

Although our study, in addition to Parker et al., 2014, present whole-genome sequencing from

37 PF ependymomas, the sequencing coverage (~30X) was significantly less than achieved through exome sequencing (~150X). Future sequencing coverage of PF ependymoma genomes will need to be increased in order to potentially identify non-coding somatic SNVs, or the absence of these events across a broad tumour cohort. If non-coding somatic SNVs are identified understanding the functional relevance of these events will require additional layers of genomic evidence. As one possible approach, WGS data could be overlaid with techniques such as FAIRE-seq (Formaldehyde assisted isolation of regulatory elements), MNase-seq

107 (Micrococcal nuclease), or DNase1-seq, which chart genomic regions of open versus closed chromatin, to determine if sequence alterations affect the higher-order structure of chromatin.

Based upon these findings, additional layers of histone modifications could be potentially examined such as enhancer marks: H3K27ac, H3K4me1, or repressive marks: H3K27me3,

H3K9me2/3 among many others. In addition, this also raises a question as to whether non- coding mutations need to be recurrent within the same genomic region, or could non-coding mutations be situated at distinct locations but regulate the same gene or sets of genes through

DNA looping. Delineating these scenarios could be potentially addressed with complementary techniques such as Hi-C, which reveal DNA-DNA interactions across the entire genome.

Another hypothesis is that the analysis presented in this study, and several other cancer sequencing studies, have been focused upon non-synonymous (non-silent) somatic mutations, which cause alterations in the amino acid sequence40-45. Analysis presented by Supek et al.,

2014 demonstrate that synonymous mutations (silent) in genes can create alternative splice variants, which may be bonafide drivers of tumourigenesis107. Potential events such as these could be mined in public data sets by leveraging the WGS sequencing data from PF ependymoma and combining this with RNA-seq data from the same samples.

5.3 Foundations for the PFA ependymoma CIMP

In Chapter 2 of this thesis, I presented evidence suggesting that Group A PF ependymoma may be characterized by a CpG island methylator phenotype (CIMP). In contrast to other cancers,

108 PFA ependymomas lack a clear genetic basis for the CIMP, with absence of IDH1 mutations

(GBM-CIMP), DNMT3A/TET1/2 mutations (AML-CIMP), or SDH mutations (Paraganglioma-

CIMP)40,108,109. Furthermore, PFA ependymomas do not significantly over-express DNA methyltransferases as observed in other solid tumours110. One possible explanation for a PFA

CIMP could be developmental, owing to a lack of necessary chemical and/or electrical stimuli in proliferating cells of origin of ependymoma that could be required for differentiation and cell cycle exit. Properly addressing this hypothesis would require a better understanding of the hierarchy of neuroepithelial cells giving rises to radial glial cells (RGCs), the progeny of RGCs, the transcriptional networks that govern RGC differentiation programs, and the chemical/electrical stimuli needed for induction and regulation. These programs could then be exploited in developing RGC populations by transgenic mouse models and/or chemical modulators that target key regulators of RGC differentiation. As one hypothesis, in a manner similar to GBM-CIMP and AML-CIMP, metabolite imbalance related to aberrant differentiation signals could potentially lead to a CIMP phenotype through pathogenic accumulation of 2- hydroxygluterate, an inhibitor of DNA de-methylation. An alternate mechanism for establishing the PFA ependymoma CIMP might involve re-distribution of the DNA methyltransferases

DNMT1/3A/3B during tumour initiation and/or progression. Despite no significant changes in the gene expression levels of these DNMT enzymes in PFA ependymoma, the locations of the

DNMTs has yet to be investigated, and could be readily assessed using ChIP-seq. Furthermore, because the DNMT proteins dimerize, investigating the coordinated locations of these enzymes could reveal possible mechanisms leading to aberrant DNA methylation.

109 5.4 Further elucidating the epigenomic landscape of posterior fossa ependymoma

Our study has demonstrated that DNA methylation patterns in PFA ependymoma are highly connected with higher-order chromatin structure, specifically the repressive mark H3K27me3.

Furthermore, the genes DNA hypermethylated in PFA ependymoma, and occupied by

H3K27me3, overlap significantly with genes known to be marked by the polycomb repressive complex 2 (PRC2) in embryonic stem cells (ESC). One important aspect that could be addressed in the future is examining whether the H3K27me3 mark is established first, followed by long-term silencing mediated by DNA hypermethylation. Alternatively, could DNA hypermethylation occur first followed by histone-mediated gene silencing, as directed by

EZH2's physical association with DNMT enzymes? Another important and related aspect is that the PRC2 marked ESC genes are bi-valent, and harbor the ‘active’ H3K4me3 mark. During fertilization CpG islands are kept un-methylated by recognition of H3K4 by an enzymatically inactive DNA methyltransferase, DNMT3L. A possible explanation for DNA methylation spreading into CpG islands could be because of inhibition of DNMT3L binding through H3K4 tri-methylation, thus leading to a possible CIMP phenotype

Although this thesis focused primarily upon the histone modification H3K27me3, global examination of other marks such as H3K4me1, H3K27ac, H3K36me3, H3K9me2/3, among many others remains to be explored. Understanding the functional consequences of these marks may reveal mechanisms of disease pathogenesis and targets for therapeutic intervention, by leveraging the recent development of numerous small-molecule inhibitors of epigenetic

110 regulators. The importance of elucidating epigenetic landscapes and the functional plasticity of tumour cells in response to treatment is highlighted in a publication by Knoechel et al., 2014.

Here they demonstrate that tumour resistance to targeted therapies, specifically gamma- secretase inhibitors, can be mediated through epigenetic adaptation, specifically modulation of binding of the histone acetylation ‘reader’ BRD4 protein. This resistance mechanism could be circumvented with up-front combinatory treatment of gamma-secretase inhibitors with the

BRD4 inhibitor, JQ1. Along the same lines, one hypothesis for the chemo-resistant nature of PF ependymoma could be cellular adaptation mediated by epigenetic modifications. In a similar approach, ependymoma primary cultures could be exposed to a panel of chemotherapeutic agents and synthetic lethal epigenetic targets could be identified using chemical and/or genetic library screens.

5.5 Mouse models and cellular origins of posterior fossa ependymoma

In Chapter 3, I emphasized that PFA ependymoma are characterized by distinct DNA and histone methylation patterns, which may be involved in tumour development. Potential animal modeling of these genome-wide epigenetic alterations could be achieved by perturbation of

DNA and histone methyltransferases in murine RGCs. As one possibility, the EZH2-Y641 mutant, which promotes the conversion of histone H3 lysine 27 di-methylation to a trimethylated state, could be over-expressed in RGCs through transgenic models or ex-vivo approaches111. To potentially model the CIMP phenotype a hyperactive DNMT1 isoform could be expressed in the RGCs, or genetic ablation of DNA de-methylation machinery such as TET proteins or AID/APOBEC proteins. These approaches however rely on the concept that PF ependymomas arise from the RGC compartment, which encompass a heterogeneous group of

111 neural progenitor cells both anatomically and temporally. Accurate modeling of PF ependymoma might first require elucidation of specific RGCs that give rise to PF ependymoma and the markers needed for cellular isolation. One strategy for pinpointing the exact cell of origin of posterior fossa ependymoma, might involve generating whole-genome DNA methylation profiles in mice of early neuroepithelial cells, radial glial cells, differentiated , astrocytes, ependymal cells, and oligodendrocytes in different CNS regions of the mouse developing embryo. Since DNA methylation profiles are highly stable, it is possible that genomic regions of methylation in PF ependymoma tumours may be capturing signatures of putative cells of origin of ependymoma. These signatures could then be compared against mouse methylation profiles by mapping conserved CpG sites and predicting neural compartments most similar to PF ependymomas.

5.6 Clinical implications for treatment of posterior fossa ependymoma

In this thesis, I have demonstrated that Group A and B are molecularly distinct entities and can be robustly stratified by routine immunohistochemistry (IHC), gene expression profiling, and

DNA methylation profiling. Given the clinical disparity between these subgroups, it will be important for future ependymoma clinical trials to stratify PF ependymomas based on subgroup, or at least collect tissue so that outcomes can then be analyzed in the setting of molecular subgroups. This is particularly important in the case of ependymoma for which histopathological grading has limited utility, and molecular markers are unavailable. There are several tools for which rapid and robust PFA versus PFB stratification could be achieved such

112 as IHC, gene expression arrays, DNA methylation arrays, or examination of specific DNA methylation markers using Sequenom technology. The advantage of molecular approaches that measure DNA methylation is that it is a stable mark, DNA from FFPE tissue can be used, and very small quantities of DNA are required. Indeed, the 3-marker PFA-CIMP+ assay described in

Chapter 3 of this manuscript requires less than 3ng of DNA from fresh frozen tissue for accurate sample classification.

Given the distinct clinical outcomes and behavior of PFA versus PFB ependymoma, numerous prospective clinical trials could be designed to improve PF ependymoma patient outcomes. In the case of PFB, which have a very good prognosis, a clinical trial could be designed to address whether de-escalation of radiotherapy could be achieved in order to reduce treatment-associated morbidity. In the case of PFA, which have a very poor prognosis, a clinical trial could be designed to address whether more aggressive surgery and/or radiotherapy might be beneficial, or whether novel biological agents such as Decitabine and/or Vorinostat should be investigated.

Given the fact that ependymomas are highly chemo-resistant, an existing FDA approved compound such as Decitabine or Vorinostat could be rapidly translated in a Phase I/II clinical trial. These epigenetic agents could also be tested in combination with chemotherapeutic agents, at least in pre-clinical models, to identify synergistic combinations that may be effective in PFA ependymoma treatment.

113 6 Attributions

The research described in the majority of this thesis stems from two publications for which I was co-first author, and played a central role in the conception, analysis, and interpretation of both studies under the direct supervision of Dr. Michael D. Taylor. Specifically, I organized sample collection and processing, performed gene expression analysis, copy number analysis,

DNA methylation analysis, and ChIP-seq analysis. I also performed and/or managed in vitro and in vivo drug treatments, ChIP experiments, and immunoblot experiments. These functional studies were performed in collaboration with Dr. Peter B. Dirks and Dr. Marco Gallo. Our partners in both studies, Dr. Hendrik Witt, Dr. Stefan M. Pfister, and Dr. Andrey Korshunov, were vital for generating and analyzing whole-genome, exome, and whole-genome bisulfite sequencing, in addition to TMA staining, and additional array processing. Because ependymoma is such a rare tumour, this study would not have been possible without many collaborators across the world whom provided both tissue samples, study design advice, and manuscript feedback.

114 7 Copyright Permissions

Copyright release for the following articles was obtained using RightsLink:

1) Licensed content title: The genetic and epigenetic basis of ependymoma License number: 3322680594487 License date: Feb 05, 2014 Licensed content Publisher: Springer Licensed content publication: Child's Nervous System Licensed content author: Stephen C. Mack Licensed content date: Jan 1, 2009

2) Licensed content title: Emerging Insights into the Ependymoma Epigenome License number: 3322680193331 License date: Feb 05, 2014 Licensed content publisher: John Wiley and Sons Licensed content publication: Brain Pathology Licensed content author: Stephen C. Mack Licensed content date: Feb 25, 2013

3) Licensed content title: Delineation of Two Clinically and Molecularly Distinct Subgroups of Posterior Fossa Ependymoma License number: 3322670685501 License date: Feb 05, 2014 Licensed content publisher: Elsevier Licensed content publication: Cancer Cell Licensed content author: Stephen C. Mack Licensed content date: 16 August 2011

115 8 Reference List

1 Villano, J. L., Parker, C. K. & Dolecek, T. A. Descriptive epidemiology of ependymal tumours in the United States. British journal of cancer 108, 2367-2371, doi:10.1038/bjc.2013.221 (2013).

2 Louis, D. N. et al. The 2007 WHO classification of tumours of the central nervous system. Acta neuropathologica 114, 97-109, doi:10.1007/s00401-007-0243-4 (2007).

3 Godfraind, C. Classification and controversies in pathology of ependymomas. Child's nervous system : ChNS : official journal of the International Society for Pediatric Neurosurgery 25, 1185-1193, doi:10.1007/s00381-008-0804-4 (2009).

4 Ellison, D. W. et al. Histopathological grading of pediatric ependymoma: reproducibility and clinical relevance in European trial cohorts. Journal of negative results in biomedicine 10, 7, doi:10.1186/1477-5751-10-7 (2011).

5 Castelo-Branco, P. et al. Methylation of the TERT promoter and risk stratification of childhood brain tumours: an integrative genomic and molecular study. The lancet oncology 14, 534-542, doi:10.1016/s1470-2045(13)70110-4 (2013).

6 Tabori, U. et al. Human reverse transcriptase expression predicts progression and survival in pediatric intracranial ependymoma. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 24, 1522-1528, doi:10.1200/jco.2005.04.2127 (2006).

7 Wong, V. C., Morrison, A., Tabori, U. & Hawkins, C. E. Telomerase inhibition as a novel therapy for pediatric ependymoma. Brain pathology (Zurich, Switzerland) 20, 780- 786, doi:10.1111/j.1750-3639.2010.00372.x (2010).

8 Gilbertson, R. J. et al. ERBB receptor signaling promotes ependymoma cell proliferation and represents a potential novel therapeutic target for this disease. Clin Cancer Res 8, 3054-3064 (2002).

9 Korshunov, A. et al. Adult and pediatric are genetically distinct and require different algorithms for molecular risk stratification. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 28, 3054-3060, doi:10.1200/jco.2009.25.7121 (2010).

10 Di Pinto, M., Conklin, H. M., Li, C., Xiong, X. & Merchant, T. E. Investigating verbal and visual auditory learning after conformal for childhood ependymoma. International journal of radiation oncology, biology, physics 77, 1002- 1008, doi:10.1016/j.ijrobp.2009.06.003 (2010).

11 Netson, K. L., Conklin, H. M., Wu, S., Xiong, X. & Merchant, T. E. A 5-year investigation of children's adaptive functioning following conformal radiation therapy

116 for localized ependymoma. International journal of radiation oncology, biology, physics 84, 217-223.e211, doi:10.1016/j.ijrobp.2011.10.043 (2012).

12 Willard, V. W., Conklin, H. M., Boop, F. A., Wu, S. & Merchant, T. E. Emotional and Behavioral Functioning After Conformal Radiation Therapy for Pediatric Ependymoma. International journal of radiation oncology, biology, physics, doi:10.1016/j.ijrobp.2013.12.006 (2014).

13 Bouffet, E. et al. Survival benefit for pediatric patients with recurrent ependymoma treated with reirradiation. International journal of radiation oncology, biology, physics 83, 1541-1548, doi:10.1016/j.ijrobp.2011.10.039 (2012).

14 Bouffet, E. & Foreman, N. Chemotherapy for intracranial ependymomas. Child's nervous system : ChNS : official journal of the International Society for Pediatric Neurosurgery 15, 563-570 (1999).

15 Venkatramani, R. et al. Outcome of infants and young children with newly diagnosed ependymoma treated on the "Head Start" III prospective clinical trial. Journal of neuro- oncology 113, 285-291, doi:10.1007/s11060-013-1111-9 (2013).

16 Mack, S. C. & Taylor, M. D. The genetic and epigenetic basis of ependymoma. Child's nervous system : ChNS : official journal of the International Society for Pediatric Neurosurgery 25, 1195-1201, doi:10.1007/s00381-009-0928-1 (2009).

17 Taylor, M. D. et al. Radial glia cells are candidate stem cells of ependymoma. Cancer Cell 8, 323-335, doi:10.1016/j.ccr.2005.09.001 (2005).

18 Johnson, R. A. et al. Cross-species genomics matches driver mutations and cell compartments to model ependymoma. Nature 466, 632-636, doi:10.1038/nature09173 (2010).

19 Korshunov, A. et al. Molecular staging of intracranial ependymoma in children and adults. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 28, 3182-3190, doi:10.1200/jco.2009.27.3359 (2010).

20 Witt, H. et al. Delineation of two clinically and molecularly distinct subgroups of posterior fossa ependymoma. Cancer Cell 20, 143-157, doi:10.1016/j.ccr.2011.07.007 (2011).

21 Bettegowda, C. et al. Exomic sequencing of four rare central nervous system tumor types. Oncotarget 4, 572-583 (2013).

22 Wani, K. et al. A prognostic gene expression signature in infratentorial ependymoma. Acta neuropathologica 123, 727-738, doi:10.1007/s00401-012-0941-4 (2012).

23 Milde, T. et al. A novel human high-risk ependymoma stem cell model reveals the differentiation-inducing potential of the histone deacetylase inhibitor Vorinostat. Acta neuropathologica 122, 637-650, doi:10.1007/s00401-011-0866-3 (2011).

117 24 Yu, L. et al. A clinically relevant orthotopic xenograft model of ependymoma that maintains the genomic signature of the primary tumor and preserves cancer stem cells in vivo. Neuro-oncology 12, 580-594, doi:10.1093/neuonc/nop056 (2010).

25 Atkinson, J. M. et al. An integrated in vitro and in vivo high-throughput screen identifies treatment leads for ependymoma. Cancer cell 20, 384-399, doi:10.1016/j.ccr.2011.08.013 (2011).

26 Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646- 674, doi:10.1016/j.cell.2011.02.013 (2011).

27 Rogers, H. A. et al. Supratentorial and spinal pediatric ependymomas display a hypermethylated phenotype which includes the loss of tumor suppressor genes involved in the control of cell growth and death. Acta neuropathologica 123, 711-725, doi:10.1007/s00401-011-0904-1 (2012).

28 Hamilton, D. W., Lusher, M. E., Lindsey, J. C., Ellison, D. W. & Clifford, S. C. Epigenetic inactivation of the RASSF1A tumour suppressor gene in ependymoma. Cancer letters 227, 75-81, doi:10.1016/j.canlet.2004.11.044 (2005).

29 Waha, A. et al. Analysis of HIC-1 methylation and transcription in human ependymomas. International journal of cancer. Journal international du cancer 110, 542-549, doi:10.1002/ijc.20165 (2004).

30 Rousseau, E. et al. CDKN2A, CDKN2B and p14ARF are frequently and differentially methylated in ependymal tumours. Neuropathology and applied neurobiology 29, 574- 583 (2003).

31 Alonso, M. E. et al. Aberrant CpG island methylation of multiple genes in ependymal tumors. Journal of neuro-oncology 67, 159-165 (2004).

32 Gonzalez-Gomez, P. et al. CpG island methylation status and mutation analysis of the RB1 gene essential promoter region and protein-binding pocket domain in nervous system tumours. British journal of cancer 88, 109-114, doi:10.1038/sj.bjc.6600737 (2003).

33 Koos, B. et al. The evi-1 is overexpressed, promotes proliferation, and is prognostically unfavorable in infratentorial ependymomas. Clin Cancer Res 17, 3631-3637, doi:10.1158/1078-0432.ccr-11-0175 (2011).

34 Michalowski, M. B. et al. Methylation of RASSF1A and TRAIL pathway-related genes is frequent in childhood intracranial ependymomas and benign choroid plexus papilloma. Cancer genetics and cytogenetics 166, 74-81, doi:10.1016/j.cancergencyto.2005.09.004 (2006).

35 Cadieux, B., Ching, T. T., VandenBerg, S. R. & Costello, J. F. Genome-wide hypomethylation in human glioblastomas associated with specific copy number alteration, methylenetetrahydrofolate reductase allele status, and increased proliferation. Cancer Res 66, 8469-8476, doi:10.1158/0008-5472.can-06-1547 (2006).

118 36 Xie, H. et al. Epigenomic analysis of Alu repeats in human ependymomas. Proc Natl Acad Sci U S A 107, 6952-6957, doi:10.1073/pnas.0913836107 (2010).

37 O'Connor, O. A. et al. Clinical experience with intravenous and oral formulations of the novel histone deacetylase inhibitor suberoylanilide hydroxamic acid in patients with advanced hematologic malignancies. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 24, 166-173, doi:10.1200/jco.2005.01.9679 (2006).

38 Shen, L. et al. DNA methylation predicts survival and response to therapy in patients with myelodysplastic syndromes. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 28, 605-613, doi:10.1200/jco.2009.23.4781 (2010).

39 Rahman, R. et al. Histone deacetylase inhibition attenuates cell growth with associated telomerase inhibition in high-grade childhood cells. Molecular cancer therapeutics 9, 2568-2581, doi:10.1158/1535-7163.mct-10-0272 (2010).

40 Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812, doi:10.1126/science.1164382 (2008).

41 Schwartzentruber, J. et al. Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma. Nature 482, 226-231, doi:10.1038/nature10833 (2012).

42 Jones, D. T. et al. Dissecting the genomic complexity underlying medulloblastoma. Nature 488, 100-105, doi:10.1038/nature11284 (2012).

43 Parsons, D. W. et al. The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435-439, doi:10.1126/science.1198056 (2011).

44 Pugh, T. J. et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature 488, 106-110, doi:10.1038/nature11329 (2012).

45 Robinson, G. et al. Novel mutations target distinct subgroups of medulloblastoma. Nature 488, 43-48, doi:10.1038/nature11213 (2012).

46 Sturm, D. et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 22, 425-437, doi:10.1016/j.ccr.2012.08.024 (2012).

47 Mendrzyk, F. et al. Identification of gains on 1q and epidermal growth factor receptor overexpression as independent prognostic markers in intracranial ependymoma. Clin Cancer Res 12, 2070-2079, doi:12/7/2070 [pii] (2006).

48 Modena, P. et al. Identification of tumor-specific molecular signatures in intracranial ependymoma and association with clinical characteristics. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 24, 5223-5233 (2006).

119 49 Puget, S. et al. Candidate genes on chromosome 9q33-34 involved in the progression of childhood ependymomas. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 27, 1884-1892 (2009).

50 Verhaak, R. G. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98-110, doi:10.1016/j.ccr.2009.12.020 (2010).

51 Monti, S., Tamayo, P., Mesirov, J. & Golub, T. R. Consensus Clustering: a resampling- based method for class discovery and visualization of gene expression microarray data. Mach Learn 52, 91-118 (2003).

52 Brunet, J. P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 101, 4164-4169 (2004).

53 Liu, Y., Hayes, D. N., Nobel, A. & Marron, J. S. Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data. Journal of the American Statistical Association 103, 1281-1293, doi:10.1198/016214508000000454 (2008).

54 Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20, 53-65 (1987).

55 Hoshida, Y., Brunet, J. P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Subclass mapping: identifying common subtypes in independent disease data sets. PLoS One 2, e1195, doi:10.1371/journal.pone.0001195 (2007).

56 Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545- 15550 (2005).

57 Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 5, e13984, doi:10.1371/journal.pone.0013984 (2010).

58 Wilson, C. W. et al. Fused has evolved divergent roles in vertebrate Hedgehog signalling and motile ciliogenesis. Nature 459, 98-102 (2009).

59 Rink, J. C., Gurley, K. A., Elliott, S. A. & Sanchez Alvarado, A. Planarian Hh signaling regulates regeneration polarity and links Hh pathway evolution to cilia. Science 326, 1406-1410 (2009).

60 Garcion, E., Faissner, A. & ffrench-Constant, C. Knockout mice reveal a contribution of the extracellular matrix molecule tenascin-C to neural precursor proliferation and migration. Development 128, 2485-2496 (2001).

61 Bouffet, E., Perilongo, G., Canete, A. & Massimino, M. Intracranial ependymomas in children: a critical review of prognostic factors and a plea for cooperation. Med Pediatr Oncol 30, 319-329; discussion 329-331 (1998).

120 62 Merchant, T. E. et al. Preliminary results from a phase II trial of conformal radiation therapy and evaluation of radiation-related CNS effects for pediatric patients with localized ependymoma. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 22, 3156-3162 (2004).

63 Kawabata, Y., Takahashi, J. A., Arakawa, Y. & Hashimoto, N. Long-term outcome in patients harboring intracranial ependymoma. J Neurosurg 103, 31-37, doi:10.3171/jns.2005.103.1.0031 (2005).

64 Perilongo, G. et al. Analyses of prognostic factors in a retrospective review of 92 children with ependymoma: Italian Pediatric Neuro-oncology Group. Med Pediatr Oncol 29, 79-85, doi:10.1002/(SICI)1096-911X(199708)29:2<79::AID-MPO3>3.0.CO;2-O [pii] (1997).

65 Pollack, I. F. et al. Intracranial ependymomas of childhood: long-term outcome and prognostic factors. Neurosurgery 37, 655-666; discussion 666-657 (1995).

66 Robertson, P. L. et al. Survival and prognostic factors following radiation therapy and chemotherapy for ependymomas in children: a report of the Children's Cancer Group. J Neurosurg 88, 695-703, doi:10.3171/jns.1998.88.4.0695 (1998).

67 Sutton, L. N. et al. Prognostic factors in childhood ependymomas. Pediatr Neurosurg 16, 57-65 (1990).

68 van Veelen-Vincent, M. L. et al. Ependymoma in childhood: prognostic factors, extent of surgery, and . J Neurosurg 97, 827-835, doi:10.3171/jns.2002.97.4.0827 (2002).

69 Carter, M. et al. Genetic abnormalities detected in ependymomas by comparative genomic hybridisation. British journal of cancer 86, 929-939, doi:10.1038/sj.bjc.6600180 (2002).

70 Dyer, S. et al. Genomic imbalances in pediatric intracranial ependymomas define clinically relevant groups. Am J Pathol 161, 2133-2141, doi:10.1016/s0002- 9440(10)64491-4 (2002).

71 Birkbak, N. J. et al. Paradoxical Relationship between Chromosomal Instability and Survival Outcome in Cancer. Cancer Res (2011).

72 Ebert, C. et al. Molecular genetic analysis of ependymal tumors. NF2 mutations and chromosome 22q loss occur preferentially in intramedullary spinal ependymomas. Am J Pathol 155, 627-632 (1999).

73 Birch, B. D. et al. Frequent type 2 neurofibromatosis gene transcript mutations in sporadic intramedullary spinal cord ependymomas. Neurosurgery 39, 135-140 (1996).

74 Gibson, P. et al. Subtypes of medulloblastoma have distinct developmental origins. Nature 468, 1095-1099 (2010).

121 75 Korshunov, A., Golanov, A. & Timirgaz, V. Immunohistochemical markers for prognosis of ependymal neoplasms. Journal of neuro-oncology 58, 255-270 (2002).

76 Preusser, M. et al. Survivin expression in intracranial ependymomas and its correlation with tumor cell proliferation and patient outcome. American journal of clinical pathology 124, 543-549, doi:10.1309/pp2g5gaafkv82dtg (2005).

77 Rand, V. et al. Investigation of chromosome 1q reveals differential expression of members of the S100 family in clinical subgroups of intracranial paediatric ependymoma. British journal of cancer 99, 1136-1143, doi:10.1038/sj.bjc.6604651 (2008).

78 Pfister, S. et al. Outcome prediction in pediatric medulloblastoma based on DNA copy- number aberrations of chromosomes 6q, 17q, and the MYC/MYCN loci. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 27, 1627-1636 (2009).

79 Northcott, P. A. et al. The miR-17/92 polycistron is up-regulated in - driven medulloblastomas and induced by N-myc in sonic hedgehog-treated cerebellar neural precursors. Cancer Res 69, 3249-3255 (2009).

80 R Development Core Team. R: A language and environment for statistical computing. . (R Foundation for Statistical Computing, 2009).

81 Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80 (2004).

82 Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A. & Vingron, M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 Suppl 1, S96-104 (2002).

83 Solinas-Toldo, S. et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 20, 399-407, doi:10.1002/(SICI)1098-2264(199712)20:4<399::AID-GCC12>3.0.CO;2-I [pii] (1997).

84 Zielinski, B. et al. Detection of chromosomal imbalances in retinoblastoma by matrix- based comparative genomic hybridization. Genes Chromosomes Cancer 43, 294-301, doi:10.1002/gcc.20186 (2005).

85 Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat Genet 36, 949-951 (2004).

86 Troyanskaya, O. et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520-525 (2001).

87 Kim, P. M. & Tidor, B. Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome research 13, 1706-1718, doi:10.1101/gr.903503 (2003).

122 88 Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572-1573 (2010).

89 Reich, M. et al. GenePattern 2.0. Nat Genet 38, 500-501, doi:10.1038/ng0506-500 (2006).

90 Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498-2504, doi:10.1101/gr.1239303 (2003).

91 Schaefer, C. F. et al. PID: the Pathway Interaction Database. Nucleic acids research 37, D674-679, doi:10.1093/nar/gkn653 (2009).

92 Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. & Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic acids research 38, D355-360, doi:10.1093/nar/gkp896 (2010).

93 Finn, R. D. et al. Pfam: the protein families database. Nucleic acids research 42, D222- 230, doi:10.1093/nar/gkt1223 (2014).

94 Goeman, J. J., van de Geer, S. A., de Kort, F. & van Houwelingen, H. C. A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 20, 93- 99 (2004).

95 Pfister, S. et al. Supratentorial primitive neuroectodermal tumors of the central nervous system frequently harbor deletions of the CDKN2A locus and other genomic aberrations distinct from medulloblastomas. Genes Chromosomes Cancer 46, 839-851, doi:10.1002/gcc.20471 (2007).

96 Neben, K. et al. Microarray-based screening for molecular markers in medulloblastoma revealed STK15 as independent predictor for survival. Cancer Res 64, 3103-3111 (2004).

97 Mack, S. C. et al. Epigenomic alterations define lethal CIMP-positive ependymomas of infancy. Nature, doi:10.1038/nature13108 (2014).

98 Parker, M. et al. C11orf95-RELA fusions drive oncogenic NF-kappaB signalling in ependymoma. Nature, doi:10.1038/nature13109 (2014).

99 Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957-959, doi:10.1126/science.1229259 (2013).

100 Killela, P. J. et al. TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal. Proc Natl Acad Sci U S A 110, 6021-6026, doi:10.1073/pnas.1303607110 (2013).

101 Remke, M. et al. TERT promoter mutations are highly recurrent in SHH subgroup medulloblastoma. Acta neuropathologica 126, 917-929, doi:10.1007/s00401-013-1198-2 (2013).

123 102 Ben-Porath, I. et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet 40, 499-507, doi:10.1038/ng.127 (2008).

103 Kim, J. et al. A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143, 313-324, doi:10.1016/j.cell.2010.09.010 (2010).

104 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079, doi:10.1093/bioinformatics/btp352 (2009).

105 Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. The New England journal of medicine 366, 883-892, doi:10.1056/NEJMoa1113205 (2012).

106 Sottoriva, A. et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc Natl Acad Sci U S A 110, 4009-4014, doi:10.1073/pnas.1219747110 (2013).

107 Supek, F., Minana, B., Valcarcel, J., Gabaldon, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324-1335, doi:10.1016/j.cell.2014.01.051 (2014).

108 Gaidzik, V. I. et al. TET2 mutations in acute myeloid leukemia (AML): results from a comprehensive genetic and clinical analysis of the AML study group. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 30, 1350-1357, doi:10.1200/jco.2011.39.2886 (2012).

109 Letouze, E. et al. SDH mutations establish a hypermethylator phenotype in paraganglioma. Cancer Cell 23, 739-752, doi:10.1016/j.ccr.2013.04.018 (2013).

110 Robertson, K. D. DNA methylation, methyltransferases, and cancer. Oncogene 20, 3139- 3155, doi:10.1038/sj.onc.1204341 (2001).

111 Yap, D. B. et al. Somatic mutations at EZH2 Y641 act dominantly through a mechanism of selectively altered PRC2 catalytic activity, to increase H3K27 trimethylation. Blood 117, 2451-2459, doi:10.1182/blood-2010-11-321208 (2011).

124